Hydra is well suited not only to continuously processing data streams such as web logs, but also to tasks that call for one-off jobs, such as special-purpose data analysis, validation, and troubleshooting. Among these one-off use cases, one of the more interesting and complicated is joining data sets. In this post, I'll walk through an example that joins two data sets.
This morning, we released our 2014 Q2 Engagement Report, which analyzes scrolling behavior on content across the AddThis network. The report breaks this data down broadly by time and operating system, and also goes deeper into how users were referred to the page (e.g., through ad campaigns) and which AddThis tools the pages were using. Here I'll describe the mechanics of how we created the report.
The video game industry has thrived even in the midst of a recession. Revenue is over $60 billion a year and is projected to reach $82 billion by 2017. Even so, marketing is critical to making games stand out in a crowded market. With the growth of social media, how well do searching and sharing reflect video game sales? Are there signals that predict the strength of a release?
Hydra is a distributed data processing and storage system developed at AddThis, which we recently released as open source. It ingests streams of data and builds hierarchical tree structures that are aggregates, summaries, or transformations of the data. Sibling nodes in the tree are stored in lexicographically sorted order. This ordering is often used explicitly by people writing queries, or implicitly by the query system to optimize query execution.
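To illustrate why sorted siblings help, here is a minimal sketch assuming a simple in-memory tree (it is not Hydra's implementation, and `Node`, `ingest`, `child_names`, and `children_with_prefix` are hypothetical names). Each node counts the records that pass through it and keeps its child keys sorted, so a prefix query can binary-search to the first candidate child and stop at the first non-match instead of scanning every sibling.

```python
from bisect import bisect_left, insort


class Node:
    """Hypothetical tree node whose children are kept in lexicographic order.

    Illustration only; this is not Hydra's node class or API.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.count = 0                       # aggregate: records that reached this node
        self._keys: list[str] = []           # child keys, always kept sorted
        self._children: dict[str, "Node"] = {}

    def child(self, key: str) -> "Node":
        """Return the child for key, inserting it in sorted position if missing."""
        node = self._children.get(key)
        if node is None:
            node = Node(key)
            self._children[key] = node
            insort(self._keys, key)          # maintain sibling order on insert
        return node

    def child_names(self) -> list[str]:
        """Child keys in lexicographic order."""
        return list(self._keys)

    def children_with_prefix(self, prefix: str) -> list["Node"]:
        """Binary-search to the first key >= prefix, then scan while keys match.

        Because siblings are sorted, all keys sharing a prefix are contiguous.
        """
        result = []
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            result.append(self._children[self._keys[i]])
            i += 1
        return result


def ingest(root: Node, path: list[str]) -> None:
    """Walk (creating as needed) the nodes along path, counting each visit."""
    node = root
    node.count += 1
    for key in path:
        node = node.child(key)
        node.count += 1


# Usage: build a tree from hypothetical URL paths.
root = Node("root")
for url in ["/blog/hydra", "/blog/joins", "/about", "/blog/engagement"]:
    ingest(root, url.strip("/").split("/"))

blog = root.child("blog")
print(root.child_names())                                   # ['about', 'blog']
print(blog.child_names())                                   # ['engagement', 'hydra', 'joins']
print([n.name for n in blog.children_with_prefix("h")])     # ['hydra']
```

Keeping keys sorted at insert time costs a little on ingestion, but it lets every later range or prefix query skip directly to the relevant slice of siblings.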
Today we are happy to announce that Hydra, the core of our data processing platform, is now open source and available on GitHub. It's freely available under the Apache License for anyone to use, and we look forward to seeing what people do with it!