Category: Engineering

Hydra Internals: Efficient Storage of Nested Hierarchical Data

Hydra is a distributed data processing and storage system developed at AddThis, which we recently released as open source. It ingests streams of data and builds hierarchical tree structures that are aggregates, summaries, or transformations of the data. Sibling nodes in the tree are stored in lexicographic sorted order. This ordering is often used explicitly by the human when writing queries or implicitly by the query system to optimize the execution of queries. Continue →

Open-Sourcing Ssync: An Out-of-the-Box Distributed Rsync

We periodically have to transfer files to a collection of machines in a cluster. Without a distributed filesystem we rely on user level processes to move these files to their target destinations. Previously we had been making a sequence of N rsync calls to populate a collection of N machines. When looking for an preexisting solution that would improve our workflow, I could not find one that met the following requirements:

Continue →

The Secret Life of Concurrent Data Structures

concurrent data structure is “a particular way of storing and organizing data for access by multiple computing threads (or processes) on a computer.” In this blog entry, we’ll be covering one of the hidden sides of concurrent data structures that are not so documented in the literature. We’ll be looking at insertion and deletion operations, and comparing the relative complexity of implementing these two operations.

Let’s focus our attention to concurrent data structures in a shared-memory environment where multiple threads are concurrently reading and/or writing. Continue →

Building a Distributed System with Akka Remote Actors

At AddThis, we deployed our first production system written in Scala almost two years ago. Since then, a growing stack of new applications are built using this exciting language. Among the many native Scala libraries we have tried and adopted, Akka stands out as the most indispensable.

Akka is a library for building concurrent scalable applications using the Actor Model. Its fault-tolerance model is heavily influenced by Erlang. In this post, I will talk about our experience with using Akka (2.0.x) remote actors to build a distributed system, SAM. Continue →