About a year ago we released metrics-reporter-config to help us manage the configuration for all of the metrics our applications were generating using Coda Hale’s Metrics library. Since then the number of metrics generated has continued to grow and currently sits at around 600k, available in both Ganglia and Graphite. That’s more metrics than we would like to have with current tooling (a preferable problem to not having enough metrics), and we are looking into ways to help engineers cull metrics or detect anomalies.
In addition to our custom analytics systems, we run several Apache Cassandra clusters. AddThis has been using Cassandra since way back in version 0.6, and our footprint (in terms of both number of servers and dependent services) has continued to expand along the way. This year we made the leap to virtual nodes and dual-datacenter clusters, and we’re not looking back.
After a bit of work and some valuable feedback, I’m happy to note that starting in version 2.0.2 Cassandra has a new pluggable metrics reporting feature using metrics-reporter-config. Our graphing systems now have even more metrics to contend with, but the improvements for debugging, error detection, and capacity planning are well worth it!
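For anyone who wants to try it, here is a rough sketch of what a reporter config for Cassandra might look like. Treat it as illustrative rather than as our production setup: the Graphite host, port, prefix, and metric pattern are placeholders, and the exact keys your version supports should be checked against the metrics-reporter-config README.

```yaml
# metrics-reporter-config.yaml (illustrative values only)
graphite:
  -
    period: 60                 # report once per minute
    timeunit: 'SECONDS'
    prefix: 'cassandra-node1'  # placeholder prefix for this node's metrics
    hosts:
      - host: 'graphite.example.com'   # placeholder Graphite host
        port: 2003
    predicate:
      color: 'white'           # whitelist: only report metrics matching the patterns
      useQualifiedName: true
      patterns:
        - '^org.apache.cassandra.metrics.Cache.+'   # example pattern, adjust to taste
```

Cassandra should pick the file up when started with the system property `-Dcassandra.metrics.reporter-config-file=metrics-reporter-config.yaml` (for example, added to `JVM_OPTS` in `cassandra-env.sh`). A whitelist predicate like the one above is one way to keep the number of reported metrics under control.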