Hydra is a distributed data processing and storage system developed at AddThis, which we recently released as open source. It ingests streams of data and builds hierarchical tree structures that are aggregates, summaries, or transformations of the data. Sibling nodes in the tree are stored in lexicographic sorted order. This ordering is often used explicitly by the human when writing queries or implicitly by the query system to optimize the execution of queries. Continue reading
Now that Hydra is open source we can start to talk about how to use it for common data processing tasks. In this post we will answer several questions about log files generated using Log-Synth. The Log-Synth files we will be using for this post have four columns:
Today we are happy to announce that Hydra—the core of our data processing platform—is now open source and available on github. It’s freely available under the Apache License for anyone to use, and we look forward to seeing just what people do with it!
We had big plans to expand our server infrastructure this year. So, we put together a rough plan that included capacity planning, hardware selection, hardware testing and validation, roll out, and finally, using the new hardware!
We periodically have to transfer files to a collection of machines in a cluster. Without a distributed filesystem we rely on user level processes to move these files to their target destinations. Previously we had been making a sequence of N rsync calls to populate a collection of N machines. When looking for an preexisting solution that would improve our workflow, I could not find one that met the following requirements:
It is a competitive advantage for websites to be fast and responsive, so we made performance a priority when building Smart Layers. Let’s take a look at some of the mobile and desktop performance best practices that help make Smart Layers blazing fast.
A concurrent data structure is “a particular way of storing and organizing data for access by multiple computing threads (or processes) on a computer.” In this blog entry, we’ll be covering one of the hidden sides of concurrent data structures that are not so documented in the literature. We’ll be looking at insertion and deletion operations, and comparing the relative complexity of implementing these two operations.
Let’s focus our attention to concurrent data structures in a shared-memory environment where multiple threads are concurrently reading and/or writing. Continue reading
At AddThis, we deployed our first production system written in Scala almost two years ago. Since then, a growing stack of new applications are built using this exciting language. Among the many native Scala libraries we have tried and adopted, Akka stands out as the most indispensable.
Akka is a library for building concurrent scalable applications using the Actor Model. Its fault-tolerance model is heavily influenced by Erlang. In this post, I will talk about our experience with using Akka (2.0.x) remote actors to build a distributed system, SAM. Continue reading