Logmatic.io Blog

The Real User Monitoring plan
you need to bring your vision to life

A how-to guide to RUM collection, analysis
and correlation with backend data

This post is part of the Real User Monitoring series on our blog. Read more on Website monitoring with W3C APIs, Web app performance with Boomerang.js library or Our user tracking day-to-day use cases if you’re interested in further Real User Monitoring specifics.

Hi to all the web & mobile developers out there! Does your app performance seem to fluctuate throughout the day and to depend on the features being used? Is your service usage on the rise, while the support team starts to complain about inconsistent performance hurting the business? You are not alone. That’s why Real User Monitoring is on the rise: it shows you what your users actually experience.

Your app performance is influenced by many factors, many of them directly user-related. And anytime users experience unstable performance, they’ll either grudgingly keep on interacting with the service, or they’ll simply leave your app looking for greener pastures elsewhere. E-retailers proved it early on. Walmart raised the flag in 2014 with their Real User Monitoring experiment, comparing conversion rate with the performance experienced by end users. Their conversion rate dropped very quickly as load time grew. If a page took:

  • Between 2 and 4 seconds to load, conversion rate got divided by 2
  • Between 4 and 8 seconds to load, conversion rate got divided by 4
  • More than 8 seconds to load, conversion rate got divided by 8!

Can you picture the loss of revenue for a company of this size? But even if your company is much smaller, keeping users happy is mandatory for success. That’s why Real User Monitoring is key to getting your business booming.

So let’s get down to it and talk about actual client performance. Why can it drop? How can you get a glimpse of what real users experience? In this article I’ll cover why you need your own Real User Monitoring plan, how to build a good Real User Monitoring strategy, and why we believe Real User Monitoring should be correlated with other better-known types of data to get a true picture of overall app performance.

I. Why can user performance drop?

In real life, apps do not live in the laboratory-like environment they were developed in, with an even set of conditions. They’re constantly interacting with the outside world and dealing with a whole range of user parameters. So your server may take 100ms to answer a query while a specific user could still be waiting 1, 5 or 10 seconds to get the results displayed.

Indeed, a whole bunch of latencies add up. Network latencies, multiplied by the number of round trips required, are further amplified by international access and take a toll on performance. But so do execution times eaten up by different browsers running on different systems.
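
To make the arithmetic concrete, here is a rough sketch of how those latencies stack up for two users hitting the same server. The function name and every number in it are illustrative assumptions, not measurements:

```javascript
// Rough estimate of the delay a user perceives for one page view.
// All numbers passed in below are illustrative assumptions, not measurements.
function estimatePerceivedDelayMs({ serverMs, roundTrips, rttMs, renderMs }) {
  const networkMs = roundTrips * rttMs; // every round trip pays the full latency
  return serverMs + networkMs + renderMs;
}

// Same 100 ms server answer, two hypothetical users.
const nearby = estimatePerceivedDelayMs({ serverMs: 100, roundTrips: 10, rttMs: 20, renderMs: 200 });
const faraway = estimatePerceivedDelayMs({ serverMs: 100, roundTrips: 10, rttMs: 250, renderMs: 900 });

console.log(nearby);  // 500
console.log(faraway); // 3500
```

With identical server time, the second user waits seven times longer, and nothing in your server metrics would show it.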

All of these behaviors are well known and experienced professionals commonly mitigate them with two sets of actions:

  • Server optimization, load balancing, cache and CDN deployment for system architects
  • Code adjustments after performance tracking with Chrome DevTools or Firebug for developers

Once the target performance is reached, Ops teams monitor the systems to ensure it stays steady.

II. From Synthetic Monitoring to Real User Monitoring

1) Synthetic monitoring

Because these standard ways of checking performance could not properly assess the real end-user experience, Website Monitoring, or Synthetic Monitoring, came along with services such as Pingdom, Uptime Robot or StatusCake.
These tools have a set of servers deployed around the world that periodically ping your services and pages. These real-life pings allow them to determine whether your service is up or down, and to measure the overall performance of pages. This is pretty useful, as you get notified anytime something needs your attention.
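
The idea behind such a ping can be sketched in a few lines. This is only a minimal illustration and not how any of these services work internally; `probe` and its parameters are names made up for the example:

```javascript
// Minimal synthetic check: request a URL and report status and elapsed time.
// `fetchFn` and `now` are injectable so the probe can be exercised without a
// network; by default it relies on the global fetch (browsers, Node 18+).
async function probe(url, fetchFn = fetch, now = () => Date.now()) {
  const startedAt = now();
  try {
    const response = await fetchFn(url);
    return { url, up: response.ok, status: response.status, elapsedMs: now() - startedAt };
  } catch (error) {
    // Network failure, DNS error, etc.: the service is unreachable from here.
    return { url, up: false, status: null, elapsedMs: now() - startedAt };
  }
}

// Usage (commented out so the sketch does not hit the network when pasted):
// probe("https://example.com/").then(console.log);
```

Run on a schedule from several locations, a check like this tells you the service answers, but only from those locations and only for the pages it requests.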

It is a good first step, as it pretty much guarantees that your service is up as long as you do not receive any alert. But it is still not representative of your real users’ experience, with the whole range of network, browser, access and action combinations they can come up with in real-life conditions.

And from this need was born Real User Monitoring.

2) Real User Monitoring metrics

Real User Monitoring pushes things… a few steps forward. It takes a more detailed approach that aims to capture and analyze every transaction from each of your application’s real users. RUM insights can be split into two main categories of metrics:

  1. Application performance metrics, answering questions such as:
    • How long do the assets of my web pages take to load? Which ones exactly are responsible for the slowness experienced by users? Are there any scripts slowing down my website (ad banners, for instance)?
    • For which profiles (browser, geography, …) does my application load more slowly?
  2. Usability metrics, answering questions such as:
    • How long did it take for a user to make their first click? How much time elapsed in between two interactions?
    • What specific path did a user who matters more to your business follow (one with a large checkout cart, for example)? What interactions did this user have with your website?
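
As an illustration of the usability side, here is a small sketch that derives the time to first click and the time between interactions. The `createInteractionTracker` helper is a hypothetical name for this example, not part of any library mentioned in this post:

```javascript
// Records click timestamps and derives "time to first click" and the gaps
// between consecutive interactions. Times are in milliseconds.
function createInteractionTracker(startMs) {
  const clicks = [];
  return {
    record(atMs) {
      clicks.push(atMs);
    },
    summary() {
      return {
        timeToFirstClickMs: clicks.length ? clicks[0] - startMs : null,
        gapsMs: clicks.slice(1).map((t, i) => t - clicks[i]),
      };
    },
  };
}

// Browser wiring (skipped outside a browser). performance.now() counts from
// the page's time origin, so the start time is 0.
if (typeof document !== "undefined") {
  const tracker = createInteractionTracker(0);
  document.addEventListener("click", () => tracker.record(performance.now()));
}
```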

Because of the granular level of information it collects, Real User Monitoring generates much more data than more classical monitoring strategies. But dealing with it properly will greatly enhance the quality and sharpness of your application understanding.

Did I raise your interest? Now you wish to know how to implement your own Real User Monitoring strategy? Let’s get to it.

III. Let’s make some Real User Monitoring!

There are 2 main components to consider when setting up Real User Monitoring:

  • A metric collector that usually comes as a little piece of JavaScript code deployed on your webpages
  • A metric container to take care of data storage and analytics, usually backend server(s)

Let’s go over the current options available for Real User Monitoring collectors and containers.

1) Collect Real User Monitoring data from browsers

Real User Monitoring can be applied to mobile apps, but for the sake of simplicity we’ll focus on web browsers here. Please note that some of the solutions listed below are able to collect Real User Monitoring data from any type of device.

We’re looking into collecting the following information:

  • Client bandwidth
  • Time elapsed between the last click and the first byte received – network time duration
  • Rendering time duration
  • Total time elapsed between the first page request and the moment when the page is considered to be totally loaded
  • Download time per asset

Also remember that these measures are taken on every page of your website, for each URL and for every single user. If your website contains lots of content or product pages (as is the case for e-retailers), then the number of data points collected gets high pretty quickly. You have to find a good strategy to take care of it all, as we’ll see later in this article.

So how are we going to collect all of these with JavaScript?

a. Real User Monitoring with W3C Navigation Timing API

The W3C is the consortium responsible for providing proper international standards for browsers, and its Navigation Timing API is its endeavour towards browser performance. It is pretty comprehensive and well written.
You can easily get important indicators such as time spent in network redirections, DNS resolution time or page rendering time. So there is absolutely no need to do it yourself with some homemade techniques using `Date.now()`: the API already implements the most important measures in a meticulous way.

For a quick trial, open your browser on any website and copy-paste the following in the console:

var timing = performance.timing;
var bench = timing.loadEventEnd - timing.navigationStart;
console.debug("Page processed in " + bench + " ms");

You’ve just measured the main RUM performance metric: page loading time.
We dedicated a full blogpost on practical RUM with W3C API if you’re interested in the specifics to implement it: Practical Website Monitoring with W3C API.

Just be aware that this API only works with recent browsers, which explains why it is often not the first choice of developers building their Real User Monitoring stack.
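
Going one step further than the quick trial, the sketch below breaks the same timing record into phases and guards against browsers that lack the API. The `summarizeTiming` function and the names of the fields it returns are choices made for this example; the attributes it reads are those of `performance.timing`:

```javascript
// Breaks a Navigation Timing record (the shape of performance.timing) into phases.
// Returns null when the API is missing or the load event has not finished yet.
function summarizeTiming(timing) {
  if (!timing || !timing.navigationStart || !timing.loadEventEnd) return null;
  return {
    redirectMs: timing.redirectEnd - timing.redirectStart,
    dnsMs: timing.domainLookupEnd - timing.domainLookupStart,
    networkMs: timing.responseStart - timing.navigationStart, // navigation start to first byte
    renderMs: timing.loadEventEnd - timing.responseEnd, // processing after the last byte
    totalMs: timing.loadEventEnd - timing.navigationStart,
  };
}

// Browser wiring (skipped outside a browser, or when the API is unavailable).
if (typeof window !== "undefined" && window.performance && window.performance.timing) {
  window.addEventListener("load", function () {
    // loadEventEnd is only filled in once the load handlers have returned.
    setTimeout(function () {
      console.log(summarizeTiming(window.performance.timing));
    }, 0);
  });
}
```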

b. Boomerang.js: THE Real User Monitoring reference

Boomerang.js is an open source script launched by Yahoo. It uses a variety of techniques to measure page load time depending on what browsers support. For browsers compatible with the NavigationTiming API, data is read out of that API, though there are some bugs in certain parts of the implementation. For browsers that do not support the NavigationTiming API, boomerang uses a cookie to store action times across page loads. Naturally this only works when pages are on the same domain, so you typically don’t get first page load times.
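
The cookie fallback described above can be sketched as follows. This only illustrates the general idea and is not Boomerang’s actual code; the cookie name `nav_start` and both helper functions are hypothetical:

```javascript
// Sketch of the cookie technique: before leaving a page, remember when the
// user acted; on the next page, read that value back to get a start time.
// The cookie name "nav_start" is a hypothetical choice, not Boomerang's.
const COOKIE_NAME = "nav_start";

function buildStartCookie(nowMs) {
  return COOKIE_NAME + "=" + nowMs + "; path=/";
}

function readStartTime(cookieString) {
  const match = cookieString.split("; ").find((part) => part.startsWith(COOKIE_NAME + "="));
  if (!match) return null; // first page of the visit: nothing to compare against
  const value = parseInt(match.slice(COOKIE_NAME.length + 1), 10);
  return Number.isNaN(value) ? null : value;
}

// Browser wiring (skipped outside a browser).
if (typeof document !== "undefined") {
  const startedAt = readStartTime(document.cookie);
  window.addEventListener("load", () => {
    if (startedAt !== null) console.log("Page loaded in " + (Date.now() - startedAt) + " ms");
  });
  window.addEventListener("beforeunload", () => {
    document.cookie = buildStartCookie(Date.now());
  });
}
```

A real implementation also has to discard stale values (a cookie left over from an earlier visit would produce absurd load times), which is one more reason to rely on the library rather than on a sketch like this one.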

To sum it up, it does all the dirty work for you. For more specifics on how to set up your Real User Monitoring boomerang.js strategy, please refer to our Smashing web app performance issues blogpost fully dedicated to the subject.

2) Store & Analyze your Real User Monitoring data

Neither Boomerang.js nor the NavigationTiming API comes with a backend storage service, but one can easily be added with a piece of code or an integration. Possible integrations include:

  • A commercial service such as SOASTA (which has maintained the Boomerang.js project since Yahoo decided to abandon it)
  • Google Analytics (more details here)
  • Piwik. Here is the only article that I have found dealing with this (…it is unfortunately written in French).
  • Your own backend with Hadoop, Elasticsearch or your preferred NoSQL database, provided you build the right APIs and JS calls.
  • A SaaS-based log management tool such as logmatic.io. We’ll cover this topic in the following section.
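
If you go for your own backend, the "JS calls" part could look like the sketch below. The `/rum` endpoint, the payload shape and both function names are assumptions made for the example; only `navigator.sendBeacon` is a standard browser API:

```javascript
// Builds the JSON body for one RUM record. The payload shape is a
// hypothetical choice; adapt it to whatever your backend expects.
function buildRumPayload(metrics, page) {
  return JSON.stringify({ url: page.url, userAgent: page.userAgent, metrics: metrics });
}

// Sends the record to a collection endpoint ("/rum" is a made-up default).
// sendBeacon queues the request so that it survives the page being unloaded,
// and returns false when the request could not be queued.
function sendRumRecord(payload, endpoint = "/rum") {
  if (typeof navigator !== "undefined" && typeof navigator.sendBeacon === "function") {
    return navigator.sendBeacon(endpoint, payload);
  }
  return false; // no beacon support in this environment
}

// Browser usage:
// sendRumRecord(buildRumPayload({ totalMs: 398 }, { url: location.href, userAgent: navigator.userAgent }));
```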

But how can you choose? Let’s talk about the two most common needs we’ve encountered.

a. Dealing with volume

Be aware that the volume of events sent grows pretty fast, so storage and analytics quickly become a challenge. That being said, we would advise you to go for open source solutions such as Piwik, Hadoop, Elasticsearch, etc. only if you have a technical team with both strong knowledge in this area and the availability to work on the project. Otherwise, building this kind of complex architecture without taking it seriously will lead to maintainability issues and higher costs than expected.

If you want a high-end solution that applies all the known techniques for data collection and analytics, then go for a solution like SOASTA. You won’t be disappointed, as we’ve heard only good things about them.

b. Answering marketing needs

If you’re focusing on business metrics and you don’t have technical backing, then you should probably stick with Google Analytics. However, know that GA samples the numbers (keeping all the data would be too costly for a free service) and won’t give you deep insights. It remains a good and affordable start.

Now you know how to collect, store and analyze your Real User Monitoring data, and your RUM strategy should be clear. But there’s still another need we’ve encountered: being able to cross-check Real User Monitoring data against other sources of data.

3) Real User Monitoring and backend visibility

a. Handling overall app performance with log management tools

As a developer accountable for your app performance, you will need to investigate and correlate data coming from backend servers, mobile devices and web browsers at once. A drop in performance can indeed be generated by:

  • An underperforming piece of code or an error (applicative log)
  • Slow queries over a database (slow log)
  • Lack of resources (machine metrics and events)
  • Infrastructure misconfiguration (audit trail, infrastructure logs)
  • Caches and CDN performance (CloudFront, Fastly, etc… logs)
  • etc…

In that case, a log management tool and more precisely a log analytics tool is probably the only solution that has the flexibility to let you do troubleshooting and analytical monitoring over such a wide variety of heterogeneous data.
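
To give an idea of what correlating means in practice, here is a deliberately naive sketch that attaches to each RUM event the backend log entries for the same path within a few seconds. The field names (`path`, `timestamp`) and the matching rule are assumptions for the example; a log analytics tool does this at scale and with far more flexibility:

```javascript
// Attaches to each RUM event the backend log entries that concern the same
// path and fall within `windowMs` of it. Field names are hypothetical.
function correlate(rumEvents, backendLogs, windowMs = 5000) {
  return rumEvents.map((rum) => ({
    rum: rum,
    backend: backendLogs.filter(
      (log) => log.path === rum.path && Math.abs(log.timestamp - rum.timestamp) <= windowMs
    ),
  }));
}

// Made-up sample data: one slow page view and three backend entries.
const rumEvents = [{ path: "/checkout", timestamp: 10000, t_done: 8200 }];
const backendLogs = [
  { path: "/checkout", timestamp: 9000, message: "slow query: 6100 ms" },
  { path: "/checkout", timestamp: 90000, message: "request served in 40 ms" },
  { path: "/home", timestamp: 10000, message: "request served in 12 ms" },
];

console.log(correlate(rumEvents, backendLogs)[0].backend.map((log) => log.message));
// [ 'slow query: 6100 ms' ]
```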

As firm believers in the usefulness of this approach (mixing Real User Monitoring data with other types of data for correlation), we invested in the development of a dedicated SDK.

b. Incorporating logmatic-rum-js into our tool

Logmatic.io has been designed to collect any kind of real-time, heterogeneous data (logs, events and metrics). Once the data is collected, it provides powerful troubleshooting & analytical capabilities packed in a user-friendly interface. If you want to learn more about log collection and how it can really boost the productivity of modern application developers, refer to our most popular article Discover Logging Best Practices – Beyond Application Monitoring. We highly encourage you to read it.

Real User Monitoring data then came as another brick in what web app development teams needed to properly monitor and troubleshoot the overall system. And that’s how logmatic-rum-js started.

We developed our Real User Monitoring script as a smart wrapper of boomerang.js solving the 2 issues we encountered above:

  • Data collection: what data and how to send it?
  • Data storage: how do I store and analyze it?

You can bootstrap it as follows:

  <head>
    <title>Example to report User Monitoring performance to Logmatic.io</title>

    <script type="text/javascript" src="path/to/boomerang.min.js"></script>
    <script type="text/javascript" src="path/to/logmatic.min.js"></script>
    <script type="text/javascript" src="path/to/logmatic-rum.min.js"></script>

    <script>
        // set up your Logmatic account
        logmatic.init('<your_api_key>');
        // see https://github.com/logmatic/logmatic-js to customize the logger as needed

        // set up boomerang
        BOOMR.init({});
    </script>
    ...
  </head>

Every page then reports the important metrics mentioned in the previous sections, as well as its slowest resources, as you can see in the following sample extracted from a local server test:

{
   "severity":"info",
   "message":"[RUM JS] Page '/#!/phones/motorola-xoom' took 398 ms to load",
   "rum":{
      "t_done":398,
      "t_resp":11,
      "t_page":387,
      "rt":{
         "t_domloaded":230
      },
      "restiming":{
         "nb":24,
         "t_max":135,
         "worst_entries":[
            "http://localhost:8000/phone-detail/phone-detail.module.js took 135 ms",
            "http://localhost:8000/phone-detail/phone-detail.component.js took 135 ms",
            "http://localhost:8000/phone-list/phone-list.component.js took 132 ms",
            "http://localhost:8000/core/phone/phone.service.js took 98 ms",
            "http://localhost:8000/core/core.module.js took 95 ms"
         ],
         ...
      }
   },
   "url":"http://localhost:8000/#!/phones/motorola-xoom",
   "domain":"localhost"
}
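
To see where numbers like those in the `restiming` block can come from, here is a sketch that builds a similar summary from the browser’s Resource Timing entries. It is an assumption about how such a summary could be computed, not the actual logmatic-rum-js implementation; the output field names simply mirror the sample above:

```javascript
// Summarizes Resource Timing entries (objects with `name` and `duration`)
// into a count, the longest duration and the slowest assets.
function summarizeResources(entries, limit = 5) {
  const sorted = [...entries].sort((a, b) => b.duration - a.duration);
  return {
    nb: entries.length,
    t_max: sorted.length ? Math.round(sorted[0].duration) : 0,
    worst_entries: sorted
      .slice(0, limit)
      .map((entry) => entry.name + " took " + Math.round(entry.duration) + " ms"),
  };
}

// Browser usage (skipped outside a browser).
if (typeof window !== "undefined" && window.performance && window.performance.getEntriesByType) {
  console.log(summarizeResources(window.performance.getEntriesByType("resource")));
}
```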

So, not that much to do. A front-end colleague explains it all in this article: Our top user tracking best practices. It shows how the library and wrapper work for simpler and more manageable analytics later on, and presents his preferred use cases covering users’ behaviors, context (browser, geography, operating system, etc.) and perceived performance.

Wrapping up

Software vendors and e-retailers are particularly exposed to the strong correlation between user happiness and performance. To avoid damaging sales and brand image, they pretty much have no choice but to embrace Real User Monitoring… And nowadays, collecting Real User Monitoring data can be compared to collecting customer feedback before bad comments pop up. 🙂

Collecting metrics and watching them from afar through analytical tools is not enough: actions are needed too. We feel that the granular view logs offer is the best way to spot where to take action in the midst of large systems. And so we believe that applicative logs need to sit alongside Real User Monitoring data in order to let devops and devs react quickly based on a comprehensive view of what is really happening.
