Hydra is Now Open Source

Today we are happy to announce that Hydra, the core of our data processing platform, is now open source and available on GitHub. It’s freely available under the Apache License for anyone to use, and we look forward to seeing just what people do with it!

Where We’ve Come From

For the past few years we’ve been talking about Hydra at meetups and conferences. We’ve also been excited to open source libraries such as stream-lib, and even more excited to watch a community grow around them.

Hydra is a large system with many pieces: a distributed job execution system that handles task placement across a heterogeneous cluster, a network-accessible file serving system, and a guardian of local backup and remote replica constraints for the inevitable node failure. However, the primordial idea (circa 2006) was to describe data processing as paths through trees, and queries as navigation through those trees. The technology landscape looked different in 2006, and many of the ideas practitioners take for granted now were new or just beginning to take shape in public: MapReduce in 2004, Bigtable in 2006, Dynamo in 2007, and Apache Hadoop becoming a top-level project in 2008. Hydra has grown and changed wildly since then, but those trees are the platonic ideal it has been striving towards.
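To make the “paths through trees” idea a little more concrete, here is a minimal sketch in Python. It is not Hydra’s actual API or job configuration format. The names `Node`, `insert`, and `query`, the sample events, and the choice of a per-node hit counter are all assumptions made up for illustration. Each record is written along a path of field values, and a query is simply a walk down that same tree.

```python
class Node:
    """One node in the tree: a hit counter plus child nodes keyed by value."""

    def __init__(self):
        self.hits = 0
        self.children = {}

    def child(self, key):
        # Create the child on first visit, then reuse it.
        if key not in self.children:
            self.children[key] = Node()
        return self.children[key]


def insert(root, record, path):
    """Process one record by walking (and building) a path through the tree.

    `path` is a list of field names (hypothetical); each field's value picks
    the next branch, and every node visited has its counter incremented.
    """
    node = root
    node.hits += 1
    for field in path:
        node = node.child(record[field])
        node.hits += 1


def query(root, *keys):
    """Answer a query by navigating the tree; return 0 if the path is absent."""
    node = root
    for key in keys:
        node = node.children.get(key)
        if node is None:
            return 0
    return node.hits


# Hypothetical sample events, invented for this sketch.
events = [
    {"date": "2014-01-01", "country": "US", "page": "/home"},
    {"date": "2014-01-01", "country": "US", "page": "/blog"},
    {"date": "2014-01-01", "country": "DE", "page": "/home"},
    {"date": "2014-01-02", "country": "US", "page": "/home"},
]

root = Node()
for event in events:
    insert(root, event, ["date", "country", "page"])

us_on_first = query(root, "2014-01-01", "US")
home_de = query(root, "2014-01-01", "DE", "/home")
missing = query(root, "2014-01-03")
```

In this toy version the order of fields in the path decides which questions are cheap to answer: counting events per day is a one-step walk, while counting events per page across all days would need a different path or a full traversal.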

Of course that’s not to say that all of the ideas in Hydra are brand new. Like most 21st-century computer science, there are strong antecedents in work decades earlier. IBM’s Information Management System is a hierarchical database that is a close conceptual match to the “paths through trees” model. The spirit of exploring data is captured in Charles Bachman’s 1973 Turing Award Lecture “The Programmer as Navigator”:

There is a growing feeling that data processing people would benefit if they were to accept a radically new point of view, one that would liberate the application programmer’s thinking from the centralism of core storage and allow him the freedom to act as a navigator within a database. To do this, he must first learn the various navigational skills; then he must learn the “rules of the road” to avoid conflict with other programmers as they jointly navigate the database information space.

This reorientation will cause as much anguish among programmers as the heliocentric theory did among ancient astronomers and theologians.

We invite you to check out the code and start hacking or start reading the documentation for more information.

Feedback, comments, patches, bugs, and forks are all welcome. Hydra is at the center of what we do at AddThis, so we hope that others find this as useful as we have!

  • Guest

    Awesome!

  • Tionna Davidson

    Luv this web site

  • www.f4uonlinecourses.com

    Great.

  • John Masters

    Thank you all for your contributions.

    Open Source allows future development efforts to see which ideas, methods and data models worked best. As each generation builds on the accomplishments of past Open Source efforts, we can expect software to become more stable and secure.

    Do not run code unless you know where it came from. You have to trust Microsoft, Apple and Android that their systems are safe. I run a lot of Adobe and Oracle too. We will never know what’s inside their software, but we have to trust them.

    Open Source allows consumers to inspect the code, or more likely, create standards organizations to certify that software does not contain backdoors, exploits or other unexpected security issues.
    At first these attempts will be crude and of limited effectiveness, but over time we should see the rules get better and the trust in the certification system will also improve.

    If you value privacy and security, you want the software world to become stable and valid. However, building such a stable system could create tools that would allow despots to identify and intimidate opponents.

    Having a trusted organization inspect and certify code must also be balanced with some sort of civil liberties commission to make sure that secure data systems cannot be used for political surveillance or to promote violence against citizens.

  • http://www.dweb3d.com/blog.html Diseño Web: Dweb3d.com

    Great for open source. Congratulations on making the web better.

  • MAS Internet Mktg

    Open source nice

  • http://www.freebanglatutorial.com/ freebangla tutorial

    Great idea.

  • http://sonnha123.blogspot.com/ sonnha

    Wonderful

  • abramsm

    Thanks! We’ve been looking at Helix, Mesos, and friends for cluster process management. Hydra’s cluster management is very application-specific. So a potential deployment model would be to use Helix to manage the process tier and then have Hydra’s application-specific logic maintain consistent state for its own data.