Phetric: Persistent Metrics for PHP applications

This is part two of three in a series on how we monitor application metrics on AddThis.com. See part 1.

Last month we open sourced two internally developed pieces of our metrics stack. Drew Stephens wrote about the first, MetricCatcher, which enables non-Java applications to use Coda Hale’s Metrics package. The second piece, which we use on AddThis.com to monitor internal application metrics, is Phetric.

AddThis.com attracts millions of page views each month across multiple servers, on a complex site to which we are constantly adding new features and enhancements. To keep an eye on the application as it changes, we gather metrics with Phetric and MetricCatcher and send them on to Ganglia and Graphite for recording and viewing.

Using Phetric

To initialize Phetric, include Sender.php and call Phetric_Sender::init(), which takes the following four arguments (the final two are optional):

  1. The host where we want to send our metrics
  2. The port where MetricCatcher is listening
  3. A string you want to prepend to all metric names
  4. A boolean that controls whether each metric is sent as it comes in rather than waiting for the end (off by default)

All metrics are sent as JSON over UDP. UDP is fire-and-forget: the sender does not wait for a reply, so sending metrics won’t hang your application. Phetric hooks into PHP’s shutdown event to send the metrics after your application code has finished. If you are debugging or running an extremely long-running process, set the auto flush boolean to true to send each metric as it comes in.
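The buffering behaviour described here can be sketched in a few lines. The example is written in Python rather than PHP and is not Phetric's code: the class name `MetricBuffer`, its methods, and the `auto_flush` parameter are hypothetical, chosen only to mirror the four init arguments listed above. It queues metrics in memory, sends them as one JSON list over UDP when the process exits, and sends each one immediately when auto flush is on.

```python
import atexit
import json
import socket
import time


class MetricBuffer:
    """Illustrative stand-in for a Phetric-style sender (all names are hypothetical)."""

    def __init__(self, host, port, prefix="", auto_flush=False):
        self.address = (host, port)
        self.prefix = prefix
        self.auto_flush = auto_flush
        self.pending = []
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # Rough analogue of a PHP shutdown hook: flush when the interpreter exits.
        atexit.register(self.flush)

    def add(self, name, value, metric_type):
        """Queue one metric; send right away only if auto_flush is enabled."""
        self.pending.append({
            "name": self.prefix + name,
            "value": value,
            "type": metric_type,
            "timestamp": time.time(),
        })
        if self.auto_flush:
            self.flush()

    def flush(self):
        """Send everything queued so far as a single JSON list."""
        if not self.pending:
            return
        payload = json.dumps(self.pending).encode("utf-8")
        self.pending = []
        try:
            self.sock.sendto(payload, self.address)
        except OSError:
            # Metrics are best effort; a failed send should never break the app.
            pass
```

Batching until exit keeps the per-request cost to a single datagram, while the auto flush mode trades more packets for visibility during long-running jobs.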

Phetric supports everything offered by Coda Hale’s Metrics package except for health checks. Take a look at the Phetric Readme to see how to implement each metric.

Verifying your metrics

There are two options for verifying the metrics that you are sending as you develop. The first is to point Phetric at an install of MetricCatcher and tail its logs. If you aren’t using MetricCatcher, you can trivially see the metrics that Phetric emits using Netcat. I have this function in my .bashrc to make testing easy:

catcher(){
    while true;
    do
         nc -w 1 -l -u 1420;
    done;
}
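If netcat is not available, a short Python script can do the same job and decode the JSON as well. This is only a sketch: the function names `receive_metrics` and `listen` are made up for the example, and the default port simply matches the one used in the shell function above.

```python
import json
import socket


def receive_metrics(sock):
    """Block until one datagram arrives and return it as a list of metric dicts."""
    data, _sender = sock.recvfrom(65535)
    decoded = json.loads(data.decode("utf-8"))
    # Accept a single object as well as a list, so either shape is handled.
    return decoded if isinstance(decoded, list) else [decoded]


def listen(port=1420, host="127.0.0.1"):
    """Print every metric received on the given UDP port until interrupted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        for metric in receive_metrics(sock):
            print(metric)
```

Call `listen()` from a script or an interactive session and stop it with Ctrl-C.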

In Conclusion

Grab Phetric on GitHub; it’s available under the MIT License. If you want to extend it, please submit a pull request. Have ideas or questions? Open an issue on GitHub.

If you haven’t already, check out Drew Stephens’s previous post on MetricCatcher, where our Phetric metrics end up. Also, check back here soon for the final part of this series.

Advanced Metrics Tracking for Webapps

MetricCatcher is a bookkeeping agent for application metrics. It utilizes Coda Hale’s Metrics package to provide languages that aren’t Java (or aren’t long-running) with the easy-to-use tracking & advanced maths of Metrics.

If you have a Java app and are tracking its performance, the best way to do that is using Coda Hale’s Metrics package, which provides convenient objects for counting happenings in your application. In other languages you don’t have the option of using this great library, and in web apps that start up a new process for each request, simply keeping the persistent data to enable metrics like this is a hassle. That’s where MetricCatcher comes in: toss values at MetricCatcher and it will create corresponding Metric objects, allowing your non-Java app to take advantage of Coda Hale’s fancy maths. Metrics in a Java application can be viewed with JConsole (or even better, VisualVM), but to really realize the power of tracking your application, MetricCatcher can pump its data into Graphite or Ganglia.

We use MetricCatcher to keep tabs on our PHP code using the Phetric library.  The combination of Phetric & MetricCatcher allows the easy creation & updating of metrics without requiring any state to be kept on the PHP side of things.

Running MetricCatcher

Grab MetricCatcher from its repository on the Clearspring GitHub account.

The only configuration that MetricCatcher requires is the location of your Ganglia or Graphite server, which can be defined in the conf/config.properties of the distribution. MetricCatcher will send metrics to whichever metrics servers are defined. Starting & stopping MetricCatcher can be done using the included scripts in the bin directory.

Getting Data In

MetricCatcher listens for JSON on UDP port 1420 for metrics to track—simply feed it lists of Metrics objects, each of which must have a name, type, timestamp, and value. MetricCatcher supports all of the types that Coda Hale’s Metrics provides, except for Health Checks. Note that histograms are either biased (favor more recent data) or uniform (weight all data equally) and are referred to as such. The JSON format looks like this:

{
    "name":"namespace.metric.name",
    "value":numeric_value,
    "type":"[gauge|counter|meter|biased|uniform|timer]",
    "timestamp":unix_time.millis
}

Metrics are sent as a JSON list, so multiple individual metrics can be bundled:

[
    {"name":"foo","value":7,"type":"gauge","timestamp":1320682297.6631},
    {"name":"bar","value":77,"type":"meter","timestamp":1320682297.6631}
]
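As a rough illustration of the format above, the following Python sketch builds a bundled payload and sends it over UDP. The helper names `make_metric` and `send_metrics` are invented for this example and are not part of MetricCatcher or Phetric; the host is an assumption, and 1420 is the port named above.

```python
import json
import socket
import time

# Metric types listed in the format description above.
VALID_TYPES = {"gauge", "counter", "meter", "biased", "uniform", "timer"}


def make_metric(name, value, metric_type, timestamp=None):
    """Return one metric dict with the four required fields."""
    if metric_type not in VALID_TYPES:
        raise ValueError("unknown metric type: %s" % metric_type)
    return {
        "name": name,
        "value": value,
        "type": metric_type,
        # Seconds since the epoch with a fractional part, as in the example above.
        "timestamp": time.time() if timestamp is None else timestamp,
    }


def send_metrics(metrics, host="127.0.0.1", port=1420):
    """Serialize the metrics as one JSON list and send it in a single datagram."""
    payload = json.dumps(list(metrics)).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

For instance, `send_metrics([make_metric("foo", 7, "gauge"), make_metric("bar", 77, "meter")])` sends a list shaped like the bundled example above.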

Where Data Goes

You can view the metrics using a JMX agent (JConsole or VisualVM, as mentioned above), but the best way to view them is to define a metrics-collecting server in the config.properties file. If you do that, MetricCatcher will send its stats there once a minute, so you can check your Graphite or Ganglia server to see the results.

25 Hours of Racing

Last weekend the Clearspring Motor Club racing team, Cobra Kai, raced their Swedish Race Truck at Nelson Ledges Road Course in Ohio as part of the ChumpCar Longer Longest Day. After 25 hours, 25 minutes, and 25 seconds of racing, stopping only to change drivers and refuel the car every 2 hours, the checkered flag dropped with our car in 13th place overall, a great result from a field of 74 cars that started the race.

ChumpCar is a racing series similar to the 24 Hours of LeMons.  The premise is simple: buy a car for less than $500 and run it on a racetrack for hours on end.  The car that accumulates the most laps over the race is declared the winner.  This sort of “crap can” racing isn’t just orbiting a circle track, nor is it a demolition derby.  The tracks are complex and difficult to master.  The danger is real, too—we wear helmets, head-to-toe fire gear, all of the cars have full roll cages, and the safety rules are extensive.

In endurance racing the biggest factor is reliability. As long as you can stay out on the track without problems, you have the basis for a good race. Performance can only take you so far, and since the $500 price of the car includes any performance modifications, options there are limited. Our team has made great use of cutting things off of the car to improve performance. Before our first race we cut the springs and filled them with tennis balls to make the suspension stiffer. At our second race, the ChumpCar 24 Hours at VIR, we turned the race wagon into a truck by chopping the roof off while waiting in line to get into the racetrack. The performance gains from this weight-saving modification were enough to encourage another team at our most recent race to do the same with their Volvo.

Overall we had a great race without any problems that kept us off the track. Ending up in 13th place, ahead of dozens of cars that were turning faster laps when they were on track, is a wonderful achievement. For complete details on our race, check out this post on the CS Motor Club blog. Next time, we’re shooting for the top ten!

Think this is awesome? Check out the jobs page; we’re always looking for new folks at Clearspring.

Add Google Social Tracking

Google launched social tracking for Google Analytics in June, to measure social engagement, social actions and social pages. If you’re already using AddThis with GA, you can integrate with one line of code!

You just need to opt-in:

<script type="text/javascript">
     var addthis_config = {
        /* your GA property ID goes here: */
        data_ga_property: 'UA-123456-1',
        /* set to true to enable social tracking */
        data_ga_social : true
     };
</script>

We’ll track the network (e.g. “facebook”), the social action (e.g. “share”), and the target (e.g., http://example.com/your/blog).

If you haven’t already, upgrade to the new version of GA to see the social reports:
Social reports on GA

Google’s got plenty of documentation on the new social analytics features if you need help interpreting their reports. If you’re totally new to Google Analytics and AddThis, check out our help doc on basic integration.

Questions? Comments? Drop us a line in the forums.

We are excited to announce that we have just gone live with a major upgrade of our analytics software and deployed that software in our brand-spanking-new data centers! These new data centers represent a massive increase in computational power. Our new servers boast over 2,000 CPU cores, 1.8 petabytes of capacity, and around 6,000 GB of RAM. With this new data center roll-out you will notice a major improvement in page-load times when browsing your analytics. In some cases page-load times have improved by over 1000%.

Since we also run a website, we understand how important analytics are to a publisher or site owner. We consider it essential to make our tools as accessible and easy to use as possible.

Performance is and always will be an ongoing effort, so if you ever notice any lag or have trouble retrieving a report, please let us know so we can take a look.

Exciting Network Update

We’re very happy to announce that this Sunday we’ll be making a very big update to our network. We will be increasing our bandwidth into our data center by 10x.

This is just one of the steps we are taking to continue improving our existing and new products’ performance. There will be more updates to come.

It’s important to note that this should not affect any of the AddThis services and everything should remain up and running as usual!

Please feel free to email us if you have any questions.

We recently posted about AddThis Analytics and how we are in the process of rolling out additional capacity to bring you better performance when trying to view your stats.

As part of the upgrade and building out our data center we will need to temporarily limit the amount of data visible in the analytics to the past 30 days. Rest assured that your historical data is safe and will be made available to you in the near future.

If you have any questions or concerns, please feel free to reach out to us and we will be happy to provide you with any further details as necessary.

We highly recommend making your way over to the Clearspring blog to read Part 3 of our Big Data Architecture blog series.

The latest post describes the distributed query system we use to quickly access terabytes of data distributed across hundreds of machines. The query subsystem consists of two key components, QueryMaster and QuerySlave. A single cluster can have multiple QueryMasters, and each processing node in the cluster will have one or more QuerySlaves.

Also, if you need catching up, here are Parts 1 and 2.

Processing Our Data

The Clearspring blog is currently running a four-part series about how our team processes tens of billions of unique, new data points on a daily basis.

In storage terms, we ingest 4 to 5 TB of new data each day; that could easily double in 12-18 months. That data must be processed as live streams, re-processed from the archive as new algorithms are developed, and made available for flexible querying. An interesting mix of batch, live and hybrid jobs is employed.

So head on over to the Clearspring blog to learn all about the wizardry!

New Open Source Stream Summarizing Java Library

Here at Clearspring we like to count interesting things. How many users have visited a site? How many unique URLs are shared every hour? It turns out that counting is a non-trivial problem when there are several billion things to count. And counting is only the first step. What about the frequency of the things you are counting? Maintaining a complete multiset — with billions of elements indexed by multiplicity — for each dimension of interest is rarely practical. So, we’ve developed a set of utilities to help make counting to a billion easy.

Today we are pleased to release those utilities under an open-source license. Stream Lib is a Java library for summarizing streams of data. Included are classes for estimating:

  • Cardinality (aka “counting things”): Instead of storing an entire set of elements, it is possible to construct a compact binary object that provides a tunable estimate of how many distinct elements have been seen. This reduces memory requirements by orders of magnitude. There is a significant body of academic literature on approaches to this problem. We have tried to provide useful implementations of those ideas.
  • Set membership: Bloom filters provide a space-efficient way to test for set membership. They have the useful property that there are no false negatives, only false positives within whatever bounds you specify. The Wikipedia page has a good section on the interesting variants that have been developed in recent years. We have adapted Apache Cassandra’s well-tested implementation for standalone use.
  • Top-k elements: While counting the number of distinct elements with a cardinality estimator is cool, sometimes (well, often) you also want to know something about those elements (such as the most frequent ones). We have some early work in this area (a stochastic topper).
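To make the first two ideas concrete, here is a minimal Python sketch. It is not Stream Lib's code or API (Stream Lib is a Java library). It uses linear counting as one simple example of a cardinality estimator from the literature, plus a textbook Bloom filter. The class names, the SHA-256 based hash helper, and the default sizes are all assumptions made for illustration.

```python
import hashlib
import math


def _hash(item, seed):
    """Deterministic 64-bit hash of a string, varied by seed."""
    digest = hashlib.sha256(("%d:%s" % (seed, item)).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class LinearCounter:
    """Cardinality estimate from a fixed-size bitmap (linear counting)."""

    def __init__(self, num_bits=1 << 16):
        self.num_bits = num_bits  # assumed to be a multiple of 8
        self.bits = bytearray(num_bits // 8)

    def add(self, item):
        pos = _hash(item, 0) % self.num_bits
        self.bits[pos // 8] |= 1 << (pos % 8)

    def estimate(self):
        set_bits = sum(bin(byte).count("1") for byte in self.bits)
        zero_bits = self.num_bits - set_bits
        if zero_bits == 0:
            return float("inf")  # bitmap saturated; a larger one is needed
        # n is approximately -m * ln(fraction of bits still zero)
        return -self.num_bits * math.log(zero_bits / self.num_bits)


class BloomFilter:
    """Set membership with possible false positives but no false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits  # assumed to be a multiple of 8
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        return [_hash(item, seed) % self.num_bits
                for seed in range(1, self.num_hashes + 1)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

Both structures use 8 KB here regardless of how many elements are added; accuracy degrades as the bitmap fills, which is the tunable trade-off between memory and error.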

There is a Readme to get you started with the code. We hope that others find this as useful as we have. Feedback, comments, patches, bugs, and forks are all welcome.

Think this is cool? Apply for a job to join the team!
