Logmatic.io Blog

Discover Logging Best Practices
Part 1: Collecting Logs

Beyond Application Monitoring

This post is part of a series about the new era of log analytics. Be sure to check out the post Application Troubleshooting & Investigation Best Practices if you are interested in more advanced tips.

Ensuring the performance of rapidly evolving and increasingly complex apps comes down to getting full visibility into your app (read more about the need for app visibility here). And getting this visibility requires clear data about metrics and events, allowing for improved application monitoring, troubleshooting and understanding.

Fortunately, there is an incredibly convenient and powerful vehicle to extract everything you need to know about your application: LOGS. These little strings can save your life (or at least your job)! Looking around the IT world, logs are increasingly integrated into product development planning and into the everyday life of tech teams.
The usefulness of logs is often underestimated: many think of them as the tail of a hidden file in /var/log, lost among endless files that burn your eyes and that you cannot conveniently use. But there are logging best practices out there waiting to be implemented!

There are fortunately ways to easily benefit from the hidden technical and business value of logs. The era of business intelligence powered directly by raw logs is just starting, opening exciting possibilities: driving tech performance with data, performing A/B testing on your app and your infra, and sharing information easily with business departments. Tech and business teams grow closer as valuable data appears at everyone's fingertips. With a few logging best practices, the tech resources and coding time needed to produce rich analytics usable by a CTO and management teams can indeed be very small.

So how are you going to get the value hidden in your logs? The first step is to collect logs in a way that facilitates their upcoming enrichment, parsing and searching. The second step is to set up a log management tool – whether internal or external – to receive your precious logs and facilitate log search as well as system & application monitoring.
Our purpose here is thus to guide you through the best practices for collecting logs so that you extract the most value from them, and get started on the right foot.

1. Keeping it simple

Keeping things simple and structuring your data goes a long way toward easily extracting their value later on, and spares you from having to transform your logs after the fact. There are 3 things to consider:

A) Structure

The old standard way to write logs is with sentences. Unfortunately, sentences do not prepare the data for application monitoring or other purposes. We advise writing logs in a way that is readable by both machines and humans. This allows your developers, DevOps engineers, and various business departments to assimilate the information generated by your production environment.

So the former log:

logger.info("user 1234 clicked on the save button on the sign-up page");

would now look like the following:

logger.info("userId=1234 clicked on buttonId=save on pageId=sign-up");

Keeping event information intelligible by avoiding complex encodings is the first logging best practice. It lets teams steer clear of lookups and avoid getting “lost in translation”. When you are in rampage debugging mode you don’t want to spend time figuring out what your data means.

Dev teams should get used to this new kind of structured logging. With that purpose in mind, JSON logging is probably the easiest choice, as its standard naturally enforces itself, compared to XML for example.
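To see why this pays off, here is a minimal Python sketch (the sample line and regex are our own illustration, not tied to any library): once events are written as key=value pairs, extracting fields is mechanical rather than a custom parsing job.

import re

line = "userId=1234 clicked on buttonId=save on pageId=sign-up"

# Pull every key=value pair out of the line in one pass.
fields = dict(re.findall(r"(\w+)=([\w-]+)", line))
print(fields)  # {'userId': '1234', 'buttonId': 'save', 'pageId': 'sign-up'}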

B) Standard

Using the JSON format helps you do what we just mentioned in the previous point. It is a pretty straightforward standard that makes for easily readable and parsed data. It does not look like much and is simple to code, yet it can record and transmit large amounts of data. Plus, it is compatible with the data model of most programming languages in use today.

Thanks to the simplicity of the JSON standard, tech teams can manage to collect information in a matter of hours or days, leaving the complexity of former BI projects, spread over weeks and months, far behind.

And the previous log example would read as:

"message" : "user 1234 clicked on the save button on the sign-up page",
"userId" : "1234",
"buttonId" : "save",
"pageId" : "sign-up"

If you want logging to work seamlessly for your tech teams and to enable rich application monitoring, create standards in the code itself so that all team members can easily follow them. Nowadays most logging libraries (in Java, Python, JavaScript…) natively support contextual information by providing metadata fields on your logs. All the information you used to write into the message itself can then go into these contextual parts, which are more structured than plain full-text lines. A simple configuration of your logging library will then transform your old text & meta logs into proper JSON logs.

Here is an example of a typical logger:

logger.info("User clicked on a button",{"userId": "1234", "buttonId": "save", "pageId": "sign-up" });

Which would give you:

    "message": "User clicked on a button",
    "userId": "1234",
    "buttonId": "save",
    "pageId": "sign-up"

C) Normalisation

Good, readable data is specific and detailed. You want to look at your logs and easily understand – or even interpret – the context in which they were collected, when and how they were captured, what they contain, and why they were emitted. Getting the when (time), where (hostname) and who (appname) right is therefore especially important.

Following rules and sticking to them will greatly increase the value you extract from your logs, and will equally enhance your ability to get a bird’s eye view of your stack. Use norms as much as you can. They should be crystal clear:

  • Time value – UNIX epoch, in milliseconds
  • Duration – in milliseconds
  • Log timestamps – UTC time
  • Unique ID – concatenation of field_1, field_2 and field_3
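Applied in code, the norms above might look like this (a Python sketch; the field values joined into the unique ID are hypothetical):

import time
from datetime import datetime, timezone

t_start = time.monotonic()
# ... the work being measured ...
t_end = time.monotonic()

record = {
    # Time value: milliseconds since the UNIX epoch, captured in UTC.
    "timestamp": int(datetime.now(timezone.utc).timestamp() * 1000),
    # Duration: always expressed in milliseconds.
    "duration": int((t_end - t_start) * 1000),
    # Unique ID: concatenation of field_1, field_2 and field_3.
    "uid": "-".join(["prod-api-4", "checkout", "1234"]),
}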

So now that you know how to log, let’s have a look at what type of data you want to collect and put into your logs.

2. Log as much as you can!

As a rule of thumb, the more you log, the better your chances of having the right information when debugging or trying to understand correlations in your infrastructure.

The saying “better safe than sorry” makes a lot of sense here: collecting logs eats up very few resources compared to the human and financial resources needed in case of failure or poor app performance!

This being said, asking yourself why you’re logging what you’re logging is the best way to ensure you’re collecting the right data with the proper information. Why should this log be generated? What information does it bring to me, my team, my company, and what could I do with it? Keep these questions in mind while deciding what data to collect.

A) Data sources across your stack

All the layers in your stack are sources of data ready to be explored. Your users’ activity, your components, and even the services behind your system are all worth looking at for application monitoring and understanding:

  • User – browsers (JavaScript, tracking pixel), mobile devices (iOS, Android, Microsoft), desktop apps.
    The user level is the leaves of your stack: everything in direct interaction with your users lives here, and all your frontend code belongs to this level.
  • HTTP servers & proxies – Apache, Nginx, IIS, HAProxy, Varnish, Untangle, WinGate, Squid.
    The link between your user level and your applicative level. Caching applications sit at this level.
  • Applicative – Java, Ruby, PHP, .NET, Python, Objective-C, C, C++.
    The part of your application running on your backend servers: the main code of your product, not in direct interaction with the user. APIs are at this level.
  • Database – SQL, NoSQL, Hadoop, Mongo, Django, Cassandra.
    The database level is the memory of your product. Though it could sit at the applicative level, we set it apart because of its complexity and its specific KPIs.
  • Platform – OS (Linux, Windows) or PaaS (Heroku, AWS, Microsoft Azure, Google Cloud Platform).
    The platform level lies under all of your applications; most of the time it is an operating system (on a PaaS or a dedicated server) or Docker.
  • Infrastructure – CPU, memory, disk, network, uptime.
    The infrastructure level is the root of your stack, the limit between software and hardware: the physical border of your product.

These six sources form one big team and work together at all times. Think about it, would your favourite football team win the game if the athletes didn’t play together? The same rule applies to your stack: one source is not more important than another. Collect data from each one of them to draw the big picture and master your own game.

B) The core of logs: metrics & events

When it comes to logging best practices, the advice often is: “channel as many logs as possible, chase all errors and exceptions, don’t let any escape you!” You may start feeling overwhelmed by all the logging possibilities (or, if you’re like me, when hearing about endless sources of information to explore you’re thinking “awesome!”). But what looks like a vast ocean of unrelated information actually boils down to two categories:

  • Events: capture infrequent phenomena, with whatever additional information you wish. They are time-related. Events can be triggered by changes (builds, build failures), by alerts or by scaling (adding hosts). Recorded events usually carry enough information to be interpreted on their own. An event is intrinsically linked to its context of emission, so each layer has its own “specific” kind of events:
    • User: user connected, abandoned process, button clicked, payment done, login failure.
      Example – login failure (sshd):
      {"syslog": {"severity": 6, "hostname": "prod-es-f01", "appname": "sshd", "prival": 38, "facility": 4, "version": 0}, "message": "Invalid user visitor from 88.157.192.XXX"}
    • HTTP servers & proxies: response code, user agent, session ID, proxy connections.
      Example – proxy connection (HAProxy):
      {"syslog": {"severity": 6, "hostname": "prod-api-4", "appname": "haproxy", "prival": 174, "facility": 21, "version": 0, "timestamp": …}, "message": "5.50.XXX.XXX:54853 [03/Feb/2016:09:52:34.227] https-in~ http-api/prod-api-5 159/0/0/70/229 200 115 - - ---- 500/500/3/2/0 0/0 \"POST /v1/input/4gb7aQe_XXXXXXXXXX/ HTTP/1.1\""}
    • Applicative: job executions, runtime errors, third-party service error messages, unexpected behavior.
      Example – job execution (Java):
      {"level": "INFO", "thread_name": "…-455", "@version": 1, "logger_name": "…", "message": "… - - [03/Feb/2016:10:04:51 +0000] \"POST … HTTP/1.1\" 200 235 \"-\" \"-\" \"-\" 11", "…": "2016-02-03T10:04:51.965Z", "level_value": 20000}
    • Database: table purged, table access time, slow log, slow queries, SQL statements, transaction traces.
      Example – transaction trace (Mongo):
      {"syslog": {"severity": 6, "hostname": "prod-mongo-3", "appname": "mongo", "prival": 134, "facility": 16, "version": 0}, "message": "2016-02-03T10:09:38.264+0000 [initandlisten] connection accepted from 172.16.XXX.XXX:41134 #140825 (91 connections now open)"}
    • Platform: server restarted, server up/down, boot success, mounted filesystem.
      Example – mounted filesystem (Linux kernel):
      {"syslog": {"severity": 6, "hostname": "test-es-f1", "appname": "kernel", "prival": 6, "facility": 0, "version": 0}, "message": "[224994.933566] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: (null)"}
  • Metrics: provide a value related to your system. They are just as time-related as events, but are collected more or less continuously. A single metric data point is generally only meaningful when put in context. Metrics can be found at each layer of your stack; we are all familiar with them, and they make up the very foundation of application monitoring. There are two main types, listed below (with a toy computation sketched after the list):
    • Quality of service (QS) metrics: measure the outcome of your system or application. They are used to track availability and understand the effectiveness of your application and infrastructure. These metrics should be tightly linked to the business performance of your app and the value your service creates.
    • Resource metrics: measure how many resources are consumed to produce a desired outcome: how much energy your system spends to produce a (hopefully) desired result. CPU, disks or memory are low-level resources, whereas your database, for instance, is a high-level one.
    QS metric examples:
      • Throughput: requests per minute (HTTP servers and proxies: number of API calls per minute)
      • System health: success (HTTP servers: % of 2xx responses) and errors (HTTP servers: % of 5xx responses)
      • Availability: up/down time, measured at the applicative and database levels
      • Performance: response time / latency, from the user level down to the platform level, in ms
    Resource metric examples (Kafka):
      • Utilization: RAM used
      • Saturation: number of queued elements
      • Errors: failed fetch requests, failed request sends, network down
      • Availability: Kafka is reachable / % of time the network is available
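As a toy illustration of a QS metric (a Python sketch; the log records below are made up), the success rate of an HTTP server can be derived from response codes:

# Hypothetical parsed HTTP server logs, one dict per request.
requests = [{"status": 200}, {"status": 201}, {"status": 500}, {"status": 200}]

# QS metric: share of responses in the 2xx range.
success = sum(1 for r in requests if 200 <= r["status"] < 300)
print(f"{100.0 * success / len(requests):.1f}% of responses are 2xx")  # 75.0%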

3. And add a touch of meta information

You’re now almost set to get rolling! One more step though: using metas. Adding context to any log is one of our logging best practices: it allows you to quickly filter on users, customers or any business-centric attribute. Context can massively improve your troubleshooting process and reduce the time wasted searching for the right information!

As you will see shortly, adding context is all about categorizing data. These categories will form your hinge points while in root cause analysis mode:

    • Data identity: can be used to bundle information that cuts across multiple layers of your technology stack. It allows you to trace requests and transactions, or even monitor user experience throughout your infrastructure. Examples of useful data IDs are: user ID (email, login name), appname, transaction ID (web session ID, SQL transaction ID), account ID (company name)
    • Severity level: not all systems in your stack are equally critical. Make a list of all the events and metric values for one system, and rank them by how critical they are for your business. Here is a possible category list:
      • Category 1: Crap, I’m losing a client… the end of the world is coming upon us!!! (Critical, Alert, Emergency)
      • Category 2: Somebody’s butt is going to get kicked (Error)
      • Category 3: Here come the mighty phone calls (Warning)
      • Category 4: Nobody should notice, it’s safe anyway, let’s keep it real and move on (Notice, Informational, Debug)

Logging severity levels by category will allow for much easier application monitoring and alerting later on.
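Putting identity and severity together, here is a hedged Python sketch (field names such as transactionId and the values are illustrative, not a prescribed schema):

import json
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("app")

# Identity fields attached to every event of this request.
context = {"userId": "1234", "transactionId": "txn-8842", "appname": "checkout"}

# Category 3 ("here come the mighty phone calls") maps to a plain Warning.
logger.warning(json.dumps({**context, "message": "payment retried"}))
# WARNING {"userId": "1234", "transactionId": "txn-8842", "appname": "checkout", "message": "payment retried"}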



If you followed all the steps mentioned above, you’ve just laid the foundation for a rewarding log monitoring and easy business intelligence experience!

You can now see that logging efficiently is pretty easy and does not require many tech resources or much coding time. And you’re probably getting the feeling that, once plugged into the right log management tool, it will open unforeseen BI opportunities for you, management and business teams. Good log management of your app will open every department’s eyes and make data-driven decisions a reality.

Sharing is caring! We’re doing our best to create useful posts, please share if you appreciate it!
