Processing Our Data

The Clearspring blog is currently running a four-part series on how our team processes tens of billions of unique, new data points every day.

In storage terms, we ingest 4 to 5 TB of new data each day, and that could easily double in 12-18 months. That data must be processed as live streams, re-processed from the archive as new algorithms are developed, and queryable in flexible ways. We employ an interesting mix of batch, live and hybrid jobs to do it.
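To put those daily volumes in perspective, here is a rough back-of-the-envelope sketch that converts them into average sustained rates. It assumes decimal units (1 TB = 10^12 bytes) and perfectly even traffic across the day, and it takes 10 billion events per day as an illustrative lower bound for "tens of billions". The function names are made up for this example and real traffic will have peaks well above the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def sustained_mb_per_s(tb_per_day: float) -> float:
    """Average ingest rate in MB/s, assuming 1 TB = 10**12 bytes and 1 MB = 10**6 bytes."""
    return tb_per_day * 10**12 / SECONDS_PER_DAY / 10**6


def events_per_second(events_per_day: float) -> float:
    """Average event rate per second, assuming traffic is spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    for tb in (4, 5):
        print(f"{tb} TB/day is about {sustained_mb_per_s(tb):.1f} MB/s sustained")

    # Assumption: 10 billion/day is only a lower bound for "tens of billions".
    print(f"10 billion events/day is about {events_per_second(10e9):,.0f} events/s")
```

Under those assumptions, 4 to 5 TB a day works out to roughly 46 to 58 MB/s around the clock, and a doubling of volume would double those averages.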

So head on over to the Clearspring blog to learn all about the wizardry!

  • Great job, guys!

  • I’ve always wondered what kind of computing resources (configuration and number of machines) you guys need for this much processing.

    It might be a good idea to distribute this kinda activity over a large number of desktops that aren’t used to their full potential, thereby making these activities a wee bit greener :)

  • Hello,

    Great blog!! I have gone through it. I am working on data processing too, and it will be helpful to me.

    Thanks a lot,

    John

  • Hey! Do you know if they make any plugins to assist with SEO? I’m trying to get my blog to rank for some targeted keywords but I’m not seeing very good success. If you know of any please share. Kudos!