4 Issues to Consider for your Logging Strategy
What makes an experienced DevOps engineer, developer or CTO use a cloud-based log management tool? Take the example of a CTO I know at a Data Management Platform service. He’s got a good grasp of all the technologies involved in building a log management tool. He also has experience in handling big data analytics. So why did he decide to use an external, cloud-based log management tool instead of building his own?
More interesting still, what would make these same experienced people say “Ok, that’s a log analyzer that really works”? We’ll see this is because they established a logging strategy: they know their log management needs are going to evolve, and that they need their logging tool to scale with them.
I. The initial logging need
The wish for a tool able to analyze machine data is typically driven by troubleshooting needs. Tech teams rightfully get tired of deep-diving into log files, grep-ing here or there for the lost log, looking for the needle in a haystack. Deep-diving in the logs could be seen as a punishment. I remember the distraught looks of colleagues when asked about an intriguing phenomenon. It meant:
“Should I really do that or would you rather have me spend time on new features?”
So their initial pain is all about centralising and finding granular information much faster. This is typically what they expect from a log analyzer. They are suffering first and foremost from the time lost in searching and the frustration it is generating.
To solve this, people typically think of centralizing all those log files and putting a search engine on top of them. Et voilà, you would then have your own log analyzer. Great. It doesn’t look like rocket science. You’ve got many options here, from open source solutions to SaaS tools and even on-premises appliances. Sometimes the initial need even includes a bit of compliance, and the same range of solutions works. So what’s the big deal about developing a logging strategy and choosing whether to build or buy a log management system?
II. The upcoming logging needs
Centralizing and searching logs is only your initial need. It’s going to evolve over time, just as everything in your stack keeps on changing. So you’re going to need a tool that can scale and evolve together with your volume, with your team and with your demands.
1) Scaling volume: housing the growing family
As your business develops your volume of logs increases. More code running, more logs.
And when you scale, the size of your log files grows. And then slowly you start hitting the limits of your code or other people’s code. There is no magic happening here. If you leave things as they are, queries will run slower and slower as more and more data has to be processed.
At first, it was fine: you tested with a small volume (remember that freemium offer, the couple hundred MB you could send, or the little side server you set up for the logs). Piece of cake for what you were testing. Now send in dozens or hundreds of GB of log files. See the change in the experience? Does it work as well?
So you’re now facing two drawbacks. First, searching logs takes more time. That’s going to be quite a lot of coffee breaks waiting for the results to pop up. But second, and more importantly, the quality of your log analytics and visibility will drop. Because it is so painful to search your logs, you will stop analyzing them at the first sign of an answer, since the extra cost of confirming it is unbearable. But did you find the real root cause? Will your app performance and stability really improve?
You’re now left with a not-so-usable product. So yes, your logging strategy starts to suck. Unless you ensure your log management infrastructure, whether built or bought, is adapted to your growing volume. See our scaling article if you’re interested in learning more on full-text search & tagging billions of messages.
2) Is your build or buy “never gonna let you down”?
Ok, we’ve all had that incident where an app started going crazy. Or when one of the developers suddenly turned on the debugging logs. And there you go, a tidal wave of logs suddenly funneled to your log management system. Had you specified the system to be able to absorb all those log files? You have 10 times more data to ingest. You can’t really control the volume of logs in times of crisis like these.
The result depends on whether you chose to build or buy: your log analyzer is either down, slower than ever, or missing the logs you wanted because your quota has been exceeded. So your operations are probably down or suffering, and you are blind. Anyone to deep-dive and grep in those logs? The thing with logs is that the more you need them, the less predictable they are. Your logging strategy should thus factor in the need to deal with unexpected bursts.
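One possible way to picture burst handling is a bounded buffer that, once full, sheds the least important logs first instead of falling over. The sketch below is only an illustration under assumptions that are not from this article: the `BurstBuffer` class, its `capacity` parameter and the `SEVERITY` ranking are all hypothetical names, and a real system would more likely rely on a durable queue than on process memory.

```python
from collections import deque

# Hypothetical severity ranking (an assumption): higher means more important.
SEVERITY = {"DEBUG": 0, "INFO": 1, "WARNING": 2, "ERROR": 3}


class BurstBuffer:
    """Bounded in-memory buffer that sheds the least important logs first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0  # how many events were lost during bursts

    def push(self, level, message):
        """Store an event; return True if the incoming event was kept."""
        if len(self.queue) < self.capacity:
            self.queue.append((level, message))
            return True
        # Buffer is full: locate the oldest event with the lowest severity.
        idx = min(range(len(self.queue)),
                  key=lambda i: SEVERITY[self.queue[i][0]])
        if SEVERITY[self.queue[idx][0]] < SEVERITY[level]:
            # Evict the less important event to make room for the new one.
            del self.queue[idx]
            self.queue.append((level, message))
            self.dropped += 1
            return True
        # The incoming event is no more important than anything queued.
        self.dropped += 1
        return False

    def drain(self, batch_size):
        """Remove and return up to batch_size events, oldest first."""
        batch = []
        while self.queue and len(batch) < batch_size:
            batch.append(self.queue.popleft())
        return batch
```

With a capacity of 3, pushing two DEBUG events, one INFO and then one ERROR keeps the ERROR and evicts the oldest DEBUG. The `dropped` counter at least tells you how blind you were during the burst.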
3) Don’t ask what you can do for your logs but what they can do for you
Let’s look at another situation, quite a usual one. You’ve troubleshot and found the root cause. That line of code is now on your radar screen. So your log management system is working fine. Then the next step is to prevent the incident from happening again. Do you have an alert for that? Can you interface it with a messaging system? Oh right, we changed it. So does it connect easily to the new one? Etc.
At the end of the day, what you want is to have this massive, always-growing amount of log files work for you and interconnect smoothly with your systems. If you need to update your logging tool each time you change something in your system, you might go crazy. Hence your logging strategy should come up with a way to provide effortless interconnections with the rest of your system. Whether you build or buy, your log management software should adapt to and incorporate your needs. And those needs do not only come from your tech team.
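To make the interconnection point concrete, here is a minimal sketch of an alert rule that is decoupled from the messaging system it notifies. Everything in it is an assumption made for illustration: `AlertRule`, the `notifiers` list and the sample pattern are hypothetical names, and a notifier is simply any callable that accepts a string.

```python
import re


class AlertRule:
    """Match log lines against a pattern and fan out to pluggable notifiers."""

    def __init__(self, name, pattern, notifiers):
        self.name = name
        self.pattern = re.compile(pattern)
        # Each notifier is any callable taking one string (chat, email, pager...).
        self.notifiers = list(notifiers)

    def check(self, line):
        """Notify every registered channel if the line matches; return the match status."""
        if self.pattern.search(line) is None:
            return False
        text = f"[{self.name}] {line}"
        for notify in self.notifiers:
            notify(text)
        return True


# Usage: a plain list stands in for the messaging system of the day.
sent = []
rule = AlertRule("payment-timeout", r"PaymentTimeout", [sent.append])
rule.check("2024-01-01 ERROR PaymentTimeout order=42")
rule.check("2024-01-01 INFO checkout ok")
```

In this design, changing the messaging system means swapping one callable in `notifiers`, while the rule itself stays untouched.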
4) More friends, more fun?
The more you get out of your log management system, the more users you have. So imagine the support guys want to know what is happening for the customers calling in. And they should. Or the product guys and marketing have some questions on users’ behaviour (more on how marketing can use logs in our blogpost here). Or the account managers want to know more about their customers’ experience. We see all of that. Those log files have great value. Especially when mixed with events and some other useful metrics.
Now these non-tech guys want dashboards, or data dumped in their BI tool, or integration with their tools. If you went with buying a cloud-based log management system, can it do it? If you went with building your own log management system, can you do it? Surely, a bit of time could sort out their needs. Will that be enough? To me, it sounds like a never-ending game. And of course, more users and more analytics on those logs mean your system has to handle those too. And sometimes (in fact very often) they want to go beyond search and simple analytics. Could we build a flow chart of what they do? What about a top list of users? Or of transactions? Please, I need a pivot table. Or can’t we just have the unique visitor count and the unique URL visited count? And then divide one by the other?
You are in fact just a flowchart away (well, add in metrics, pivot tables…) from unlocking the value of your logs. Does the logging system you have allow for it, and could it allow for new data visualization or calculation whenever you discover you need it?
We now see that common log management needs usually go way past simply searching a few test log lines. Our piece of advice is to start small but keep in mind you’ve put your hand in the cookie jar and you’ll certainly want some more. That’s when choosing a logging strategy comes up.
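As a small illustration of the kind of calculation being asked for, the sketch below computes the unique visitor count, the unique URL count, the ratio of the two and a top list of users. It assumes a deliberately simplified, hypothetical log format of one `user url` pair per line; the `summarize` function and its `top_n` parameter are names made up for this example.

```python
from collections import Counter


def summarize(lines, top_n=3):
    """Compute simple usage metrics from 'user url' log lines (assumed format)."""
    users = Counter()  # number of hits per user
    urls = set()       # distinct URLs seen
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed format
        user, url = parts
        users[user] += 1
        urls.add(url)
    # Unique visitors divided by unique URLs; undefined when no URL was seen.
    ratio = len(users) / len(urls) if urls else None
    return {
        "unique_visitors": len(users),
        "unique_urls": len(urls),
        "visitors_per_url": ratio,
        "top_users": users.most_common(top_n),
    }


stats = summarize([
    "alice /home",
    "bob /home",
    "alice /pricing",
    "alice /docs",
])
```

Here `stats["top_users"]` lists alice first with three hits. The point is less the code than the question it raises: can your tool answer a new question like this one the day somebody asks it?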
As in many IT projects, log analysis starts as a simple need, yet it grows progressively to become something more complex… Efficient and useful log management is hard to do well. It represents a real cost in terms of human and technical resources.
So whether you build or buy your log management system, make sure it is addressing the 4 issues mentioned above. And if you choose to build your own homegrown log management tool, take into account the very real costs of dedicating qualified human resources to it, as well as the costs of developing features, ongoing code maintenance and support.