AddThis Blog

New Open Source Stream Summarizing Java Library

Here at Clearspring we like to count interesting things. How many users have visited a site? How many unique URLs are shared every hour? It turns out that counting is a non-trivial problem when there are several billion things to count. And counting is only the first step. What about the frequency of the things you are counting? Maintaining a complete multiset — with billions of elements indexed by multiplicity — for each dimension of interest is rarely practical. So, we’ve developed a set of utilities to help make counting to a billion easy.

Today we are pleased to release those utilites under an open-source license. Stream Lib is a Java library for summarizing streams of data. Included are classes for estimating:

There is a Readme to get your started with the code. We hope that others find this as useful as we have. Feedback, comments, patches, bugs, and forks are all welcome.

Think this is cool? Apply for a job to join the team!