I got 99 Problems,
but Multi-line Logging Ain’t one
Maybe you are a Java developper, a PostgreSQL user, or a passionate logger of everything you can find. In any case, you’ve encountered multi-lines logs. Parsing multi-line to collect them into a single event can rapidly become a nightmare.
This post is right for you as we will cover how to handle multi-lines from file to a single originating event. For example, joining Java exception and stacktrace messages into a single event. So take a seat: debugging and monitoring multi-line logs will have a much friendlier face in a few minutes.
First, there are several things you should keep an eye out for when you are dealing with your multi-line logs:
- Logs should be written in realtime into a file, in chronological order.
- Logs should be streamed through a single pipe: in others terms, all lines generated from an originating event are contiguous.
For the purpose of this article we will assume the points above are already fulfilled. A postgres multi-line sequence then looks as follows:
DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children: DEBUG: StartTransactionCommand STATEMENT: select sum(bar) as total from foo my_table where bar <> 1; DEBUG: StartTransaction
And here would be a Java Multi-line Log generated by a Log4j appender:
2014-11-23 23:25:22,119 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680 2014-11-23 23:25:22,222 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1 2014-11-23 23:25:22,481 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.NullPointerException at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:970) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:673) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:394) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.mapred.Child.main(Child.java:249) 2014-11-23 23:25:22,485 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task ...
Does it ring a bell? Not quite friendly looking. But our logging experience has taught us two main lessons to cope with those Postgres, Java or any other multi-line logs:
1. Log in JSON, my apprentice
JSON makes things easier because it will keep each event unified, thus eliminating the multi-line log issue all together. If you are not familiar with the JSON format, check out our article explaining everything you need to know about log collection.
2. When JSON is not an option…
For some, JSON is simply not an option. Some see tasks dangerously piling up in order to be able to log in json. Some others are hitting a brick wall – if they wish to log in JSON, they literally need to re-write pieces of their code, re-think crumbs of their apps or change frameworks to implement it. Some do not have ownership of log format choices.
In all these cases, there is still a way to deal with multi-line logs, and that is:
The almighty log shippers
Log shippers come in all shapes and so to speak sizes. A range of choices is available depending on your system and preferences:
Multi-line is supported only by recent versions of rsyslog and requires at minimum version 8.10.0. Before starting, check the version you’re managing with rsyslogd -v. If your version is too low, you can try upgrading or, as this is sometime easier, switch to Syslog-ng as a replacement.
Multi-line support is only available on imfile module. So, your logs need to be written in specific file before being parsed and polled by RSyslog.
In order to enable multi-line, you need to use the inline syntax. There are no directives available. See here for full details on implementation with Logmatic.io.
Its working are similar to the shipper above. However you can choose between 2 multi-line modes:
- Prefix-garbage which is close to rsyslog in its workings. Once a prefix line-breaker has been defined and has been matched, syslog ng would start a new event, until a new prefix line-breaker is found.
- Prefix-suffix uses a suffix to detect the end of an event.
The in_tail plugin that tails any file also has the ability to do multi-line aggregation. You simply need to set up multi-line format to enable it. Then define a pattern to split groups of lines.
Syntax is quite simple and Regex are ruby-compatible. Fluentd also allows to add format1 through format20 in order to parse different format. See here for full details on implementation with Logmatic.io.
Previous versions of Logstash could handle multi-lines and multi-streams but now, its multi-line filter is deprecated. From now on, Logstash handles multi-line with the multi-line codec. It can be applied to a file or an input and its behavior is similar to the ones of RSyslog, Fluentd,… even if its syntax and line-break rule slightly differs: it aggregates all the lines that don’t start (negate => true) with the pattern (pattern => …) and merged them with the previous line (what => previous). Full details on how to implement it with Logmatic.io here.
Multi-line logs hold true value within! Make sure you don’t let it slip between your fingers because of configuration issues. Turn all of your multi-line logs into your friends, not your enemies. Either by using JSON or one of the log shippers listed above, you shall find your path dear apprentice… Becoming a logging master is not far off!