Logging application metrics with StatsD

Application monitoring and service metrics

If you don’t track it, you can’t measure it. Real-time service and business metrics should be part of any production application. Knowing how the app is performing is as important as measuring the product impact of your changes.

A good set of service metrics lets you effectively monitor the impact of your changes on app performance. Has your change to multi-threading really achieved the multiple of throughput you were aiming for? Has the last dependency injection change caused a slow memory leak? Are there levels of response codes you’re not expecting in your app?

There are a variety of ways to deal with application-level metrics. The world of .NET offers you performance counters, while Unix has a variety of possibilities. Usually, I would go for the simplest, most frictionless option. That’s why StatsD with Graphite is so appealing.

StatsD

Etsy talks about measuring everything and anything. Their real obsession with metrics led them to release an awesome library that gained great traction - because it’s simple and effective.

In essence, a StatsD server listens on a UDP or TCP port and collects metrics. It aggregates them and, at regular intervals, passes them to a back-end of your choice - most likely Graphite.
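
To make the "listens on a UDP port" part concrete, here is a minimal sketch of sending one metric by hand, without any client library. The class name RawStatsDSender is made up for illustration, and the host and port in main are assumptions (8125 is the port StatsD conventionally listens on, but check your own setup).

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, for illustration only: sends one raw StatsD line over UDP. */
final class RawStatsDSender {

    /** Formats a metric in the plain-text "name:value|type" form. */
    static String formatLine(String name, long value, String type) {
        return name + ":" + value + "|" + type;
    }

    /** Fire-and-forget: UDP gives no delivery guarantee and no reply. */
    static void send(String host, int port, String line) throws Exception {
        byte[] payload = line.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(
                    payload, payload.length, InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed host and port; adjust to wherever your StatsD server runs.
        send("localhost", 8125, formatLine("your.namespace.counter", 1, "c"));
    }
}
```

Because the send is fire-and-forget, a metrics outage doesn't slow the app down, which is a big part of why this approach is so low-friction.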

The advantage of the community is that there is a massive list of client implementations, so integrating StatsD into your app is super-straightforward. There are multiple clients for Node, Python, Java, Ruby, PHP, .NET, Go and more. Check out the entire list. Also, you will find a bunch of server implementations beyond the original Node.js one. If you fancy StatsD on Windows machines, check out statsD.net.

Integrate StatsD client into your JVM app

If you’re looking to add StatsD to your JVM application, the Java client by Tim Group seems like the best choice. It has zero dependencies and it’s pretty straightforward.

Add the dependency to Maven or Gradle.

Maven

    <dependency>
        <groupId>com.timgroup</groupId>
        <artifactId>java-statsd-client</artifactId>
        <version>3.0.1</version>
    </dependency>

Gradle

    'com.timgroup:java-statsd-client:3.1.0'

Then initialise the StatsD client with a prefix, the host of the StatsD server and the port. StatsD has a concept of namespaces, which let you group your metrics - that allows for better visualisation and keeps them neat. The choice of namespace is yours, depending on what suits you. In bigger deployments you might go for something like “application-name.data-centre.box-name.counter-name”.

    import com.timgroup.statsd.StatsDClient;
    import com.timgroup.statsd.NonBlockingStatsDClient;

    public class DiagnosticsService {
        // An instance field: a static final field cannot be assigned in a constructor.
        private final StatsDClient statsd;

        public DiagnosticsService(String host, int portNumber) {
            statsd = new NonBlockingStatsDClient("your.custom.prefix", host, portNumber);
        }

        .....
    }

I tend to have a single StatsD client within the app, as a singleton wrapped by a diagnostics service.
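
As for the namespace itself, a small helper can keep the segments consistent. This is only a sketch: MetricNamespace is a made-up name and the sanitising rule is my own assumption. The idea is that a stray dot (say, in a host name) would add an unintended level to the hierarchy, so anything outside a conservative character set is replaced with an underscore.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical helper that builds a dotted metric namespace from raw segments. */
final class MetricNamespace {

    /** Lower-cases each segment, replaces unsafe characters, then joins with dots. */
    static String of(String... segments) {
        return Arrays.stream(segments)
                .map(s -> s.trim()
                        .toLowerCase(Locale.ROOT)
                        .replaceAll("[^a-z0-9_-]+", "_"))
                .collect(Collectors.joining("."));
    }
}
```

For example, MetricNamespace.of("Checkout API", "eu-west", "box01.internal") gives checkout_api.eu-west.box01_internal, which could then be passed as the client prefix.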

StatsD Metric Types

StatsD supports a range of metric types. These fit 99% of your metric logging scenarios. It also has a concept of a flush interval, at which the aggregated data is sent off to the back-ends.

Counters

Basic counters are incremented each time you log against them. They are reset to 0 at flush. You can also set a sample rate to tell StatsD you’re only sending part of the data set.

    your.namespace.counter:1|c
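
Sampling is easiest to see in code. The sketch below (SampledCounter is a made-up name, not part of any client library) sends a counter only some of the time and appends the sample rate as "|@rate", so the server can scale the count back up. The random source is passed in so that the behaviour is easy to test.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.function.DoubleSupplier;

/** Hypothetical sketch of client-side sampling for counters. */
final class SampledCounter {

    /**
     * Returns the line to send, or empty if this event was sampled out.
     * The random supplier is expected to return values in [0, 1).
     */
    static Optional<String> sample(String name, long delta, double sampleRate,
                                   DoubleSupplier random) {
        if (sampleRate <= 0.0 || sampleRate > 1.0) {
            throw new IllegalArgumentException("sampleRate must be in (0, 1]");
        }
        if (random.getAsDouble() >= sampleRate) {
            return Optional.empty(); // skip this event entirely
        }
        return Optional.of(
                String.format(Locale.ROOT, "%s:%d|c|@%s", name, delta, sampleRate));
    }
}
```

In real use you would pass something like Math::random as the supplier. With a rate of 0.1 roughly one event in ten goes over the wire, as your.namespace.counter:1|c|@0.1.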

Timers

These are great for monitoring response times of any kind. You tell StatsD how long an action took. It then automatically works out percentiles, average (mean), standard deviation, sum, and min/max. Really awesome.

    your.namespace.response_time:300|ms
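
To give a feel for what happens to those timings at flush, here is a rough sketch of the kind of summary that gets computed. It is not StatsD's actual code: TimerSummary is a made-up name, and the nearest-rank percentile is my own choice, so the server's numbers may differ in detail.

```java
import java.util.Arrays;
import java.util.LongSummaryStatistics;

/** Hypothetical sketch: summarises the timings collected in one flush interval. */
final class TimerSummary {

    /** Count, sum, min, max and mean in one pass. */
    static LongSummaryStatistics basicStats(long[] timingsMs) {
        return Arrays.stream(timingsMs).summaryStatistics();
    }

    /** Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it. */
    static long percentile(long[] timingsMs, double pct) {
        if (timingsMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = timingsMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

The percentile is usually the number to watch: a mean can look healthy while a slow tail of requests is hurting real users.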

Gauges

Gauges are single values that can be incremented, decremented or set to a specific value. Unlike counters, gauges aren’t reset to zero at flush time.
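
On the wire, a gauge uses the "g" type, and a leading sign turns an absolute value into a relative adjustment. The sketch below formats both forms; GaugeLines is a made-up name. One consequence of the sign convention is that a gauge cannot be set directly to a negative number, which is why set() rejects negatives here.

```java
/** Hypothetical sketch: formats gauge lines in the StatsD text protocol. */
final class GaugeLines {

    /** Sets the gauge to an absolute, non-negative value, e.g. "queue.depth:333|g". */
    static String set(String name, long value) {
        if (value < 0) {
            // A leading minus sign would be read as a decrement, not as a value.
            throw new IllegalArgumentException("absolute gauge values must be >= 0");
        }
        return name + ":" + value + "|g";
    }

    /** Adjusts the gauge relative to its current value, e.g. "+4" or "-10". */
    static String adjust(String name, long delta) {
        String sign = delta >= 0 ? "+" : ""; // negative numbers already carry their sign
        return name + ":" + sign + delta + "|g";
    }
}
```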

Sets

Sets count the number of unique values seen between flushes.
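
A set line uses the "s" type, for example your.namespace.unique_users:alice|s. The sketch below mimics, in a very simplified way, the bookkeeping behind it: duplicates within one flush interval collapse, and the tally starts again after each flush. UniqueTracker is a made-up name and this is not the server's real implementation.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of how unique values are counted between flushes. */
final class UniqueTracker {

    private final Set<String> seen = new HashSet<>();

    /** Formats the line a client would send for one occurrence. */
    static String line(String name, String value) {
        return name + ":" + value + "|s";
    }

    /** Records one occurrence; repeats of the same value are ignored. */
    void record(String value) {
        seen.add(value);
    }

    /** Returns the number of unique values seen, then starts a fresh interval. */
    int flush() {
        int uniques = seen.size();
        seen.clear();
        return uniques;
    }
}
```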

Logging metrics using the JVM client

Using the Java implementation of the StatsD client is then pretty straightforward.

    import com.timgroup.statsd.NonBlockingStatsDClient;
    import com.timgroup.statsd.StatsDClient;

    public final class StatsDPerformanceService implements DiagnosticsService {

        // Instance fields rather than statics: each service owns its config and client.
        private final DiagnosticsConfig config;
        private final StatsDClient statsd;

        public StatsDPerformanceService(DiagnosticsConfig configuration) {
            config = configuration;
            statsd = new NonBlockingStatsDClient(
                    getPrefix(), config.getHost(), config.getPort());
        }

        // Every metric is namespaced by box name, lower-cased for consistency.
        private String getPrefix() {
            return String.format("yourprefix.%s", config.getBoxName()).toLowerCase();
        }

        // Each method is a no-op when metrics are disabled in the config.
        @Override
        public void incrementCounter(String counterName) {
            if (config.getEnableMetrics()) {
                statsd.incrementCounter(counterName);
            }
        }

        @Override
        public void decrementCounter(String counterName) {
            if (config.getEnableMetrics()) {
                statsd.decrementCounter(counterName);
            }
        }

        @Override
        public void gauge(String gaugeName, long value) {
            if (config.getEnableMetrics()) {
                statsd.gauge(gaugeName, value);
            }
        }

        @Override
        public void recordExecutionTime(String timerName, long value) {
            if (config.getEnableMetrics()) {
                statsd.recordExecutionTime(timerName, value);
            }
        }
    }

You can then also consider helper methods that take a Runnable or a Callable to wrap timings around method calls.

        // Needs: import java.util.concurrent.Callable;
        @Override
        public <T> T executeWithTimer(Callable<T> callable, String counterName) {
            if (callable == null || counterName == null) {
                return null;
            }

            T result = null;
            long startTime = System.nanoTime();
            try {
                result = callable.call();
            } catch (Exception e) {
                // Failures are only printed; the caller gets null back.
                e.printStackTrace();
            }
            long endTime = System.nanoTime();

            // Convert nanoseconds to milliseconds, the unit used for StatsD timers.
            long duration = (endTime - startTime) / 1_000_000;

            recordExecutionTime(counterName, duration);
            return result;
        }

... and execute it like this:

    String result = diagnosticsService.executeWithTimer(
            () -> randomService.getResult(someVar), "SuperAwesomeNameOfTheCounter");
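
The Runnable flavour could look like the sketch below. To keep it self-contained, the recording call is passed in as a callback standing in for recordExecutionTime; RunnableTimer and its constructor are made up for illustration. Unlike the Callable version above, this one records the timing in a finally block, so a failing action is still measured and its exception still reaches the caller.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.ObjLongConsumer;

/** Hypothetical sketch: times a Runnable and reports the duration in milliseconds. */
final class RunnableTimer {

    // Stand-in for recordExecutionTime(timerName, millis) on the diagnostics service.
    private final ObjLongConsumer<String> recorder;

    RunnableTimer(ObjLongConsumer<String> recorder) {
        this.recorder = recorder;
    }

    void executeWithTimer(Runnable runnable, String timerName) {
        if (runnable == null || timerName == null) {
            return;
        }
        long start = System.nanoTime();
        try {
            runnable.run();
        } finally {
            // Runs whether or not the action threw, so failures are timed too.
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            recorder.accept(timerName, elapsedMs);
        }
    }
}
```

Inside the service you would wire it up with something like new RunnableTimer(this::recordExecutionTime).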

Enjoy! StatsD is great - I’ll look at configuring StatsD and Graphite in my next post.