Today we are happy to announce that Hydra, the core of our data processing platform, is now open source and available on GitHub. It's free for anyone to use under the Apache License, and we look forward to seeing what people do with it!
Where We’ve Come From
For the past few years we've been talking about Hydra at meetups and conferences. We've also been excited to open source libraries such as stream-lib, and even more excited to watch a community grow around them.
Hydra is a large system with many pieces: a distributed job execution system that handles task placement across a heterogeneous cluster, a network-accessible file serving system, and a guardian that enforces local backup and remote replica constraints for the inevitable node failure. However, the primordial idea (circa 2006) was to describe data processing as paths through trees, and queries as navigation through those trees. The technology landscape looked different in 2006, and many of the ideas practitioners now take for granted were new or just beginning to take shape in public: MapReduce was published in 2004, Bigtable in 2006, and Dynamo in 2007, and Apache Hadoop became a top-level project in 2008. Hydra has grown and changed wildly since then, but those trees remain the platonic ideal it has been striving towards.
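To make the "paths through trees" idea a little more concrete, here is a toy sketch. It is not Hydra's code or API, only an illustration under stated assumptions: each record is reduced to a path of keys (here a hypothetical date, country, and domain triple), processing a record walks that path and increments a counter at every node it touches, and a query is simply navigation to a node followed by reading its children. The names `Node`, `insert`, `navigate`, and `breakdown` are hypothetical.

```python
from collections import defaultdict


class Node:
    """One node in a toy 'paths through trees' structure (hypothetical, not Hydra's API)."""

    def __init__(self):
        self.count = 0                      # number of records whose path passed through this node
        self.children = defaultdict(Node)   # child key -> Node, created on first visit

    def insert(self, path):
        """Processing step: walk one record's path from the root, creating nodes as needed."""
        node = self
        node.count += 1
        for key in path:
            node = node.children[key]
            node.count += 1

    def navigate(self, path):
        """Query step: follow a path of keys down the tree; return None if it does not exist."""
        node = self
        for key in path:
            if key not in node.children:    # membership test does not create a node
                return None
            node = node.children[key]
        return node

    def breakdown(self, path):
        """Return {child key: count} for the node at `path`, i.e. one level of drill-down."""
        node = self.navigate(path)
        if node is None:
            return {}
        return {key: child.count for key, child in node.children.items()}


# Each hypothetical event becomes a path: (date, country, domain).
events = [
    ("2013-06-01", "US", "example.com"),
    ("2013-06-01", "US", "example.org"),
    ("2013-06-01", "DE", "example.com"),
    ("2013-06-02", "US", "example.com"),
]

root = Node()
for event in events:
    root.insert(event)

print(root.count)                            # 4
print(root.breakdown(["2013-06-01"]))        # {'US': 2, 'DE': 1}
print(root.breakdown(["2013-06-01", "US"]))  # {'example.com': 1, 'example.org': 1}
```

In this sketch the counts are aggregated while records are processed, so answering a drill-down query only requires following the query's path and reading one node's children rather than rescanning the raw events.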
Of course, that's not to say that all of the ideas in Hydra are brand new. Like most 21st-century computer science, they have strong antecedents in work done decades earlier. IBM's Information Management System is a hierarchical database that is a close conceptual match to the "paths through trees" model. The spirit of exploring data is captured in Charles Bachman's 1973 Turing Award Lecture, “The Programmer as Navigator”:
There is a growing feeling that data processing people would benefit if they were to accept a radically new point of view, one that would liberate the application programmer’s thinking from the centralism of core storage and allow him the freedom to act as a navigator within a database. To do this, he must first learn the various navigational skills; then he must learn the “rules of the road” to avoid conflict with other programmers as they jointly navigate the database information space.
This reorientation will cause as much anguish among programmers as the heliocentric theory did among ancient astronomers and theologians.
Feedback, comments, patches, bug reports, and forks are all welcome. Hydra is at the center of what we do at AddThis, and we hope others find it as useful as we have!