Our most loved HAProxy
logging strat.

What should you do with your HAProxy logs?

Hello fellas, Pierre here.

If you have more than one web server around, you probably already have a proxy in front of them both to manipulate and load balance your input flow. It this is not the case, I would strongly advise you to use one… (you dam’ fool).

haproxy logo

HAProxy is an open source solution for load balancing and reverse proxying both TCP and HTTP requests. First released in 2001, its usefulness for both community and enterprise users has turned it into a standard for the load balancing and high-availability application industries. They now display an extensive list of famous company users.

As I’ve been working with HAProxy for a while, checking its performance with logs, I have decided to share some of this HAProxy logging experience.

I. A quick background: how does HAProxy work?

HAProxy is organized with front ends and back ends. The former and latter being respectively the client-side and server-side proxies where configuration of the two following points is possible:

  • How and where HAProxy should accept inbound traffic
  • How and where it is supposed to forward it

One of HAProxy key features is that it can continue to operate in the presence of failed backend servers, handling crossover reliably and seamlessly. HAProxy thus provides built-in health checks that will remove a backend if it fails several health checks in a row and will then dynamically route incoming traffic to a variety of backend servers. It brings your both more flexibility and more reliance.

Each client that interacts with HAProxy uses one session. A session is composed of two connections:

  • From the client to HAProxy frontends
  • From HAProxy frontends to the appropriate backend server.

Once a session has been created (i.e. the client can talk to the backend through HAProxy), the client can begin issuing requests. A client will typically use one session for all of its requests, with sessions terminating after receiving no requests during the timeout window, or with a specific terminaison state if an error occurred.

II. What information do HAProxy logs provide?

1) HAProxy logs overview

Note that by default HAProxy only registers logs for sent data, and not for received data (especially with POST method). If you wish to collect information on received data, you need to add the %U options to your logformat, find out more here.

HAProxy generates really helpful default logs. They’re a good base to start with and answer questions about HAProxy global behavior and performance such as:

  • What are my real-time flow typology?
  • What is my overall latency?
  • What are the relations between my frontend and backend servers configurations?
  • How do my rules interact with flows?
  • What amount of data am I sending? Ingesting?
  • Why some session ended with an error terminaison state?

In HTTP, HAProxy provides request details and status so you can analyse your backend servers outside behavior, answering questions such as the following:

  • What are my top URI requests in terms of 4xx errors, 5xx errors? Which one is the slowest to answer? Which one is the slowest in transmitting data and why?
  • Why did some sessions ended with an abnormal termination state? Did it come from the client? Did it come from my servers or from my HAProxy config?
  • How did my system handle security connections?

But let’s get a closer look at what those HAProxy logs contain first.

2) HAProxy Log details

For practical purposes we will take the example of data sent to HAProxy servers only over HTTP connections as it’s the most common usage. Learnings are similar with TCP/UDP connections, though less detailed. Let’s say your configuration is the following:

frontend app-http-in
   mode http
   option httplog
   log global
   default_backend app-backend

backend app-backend
   mode http
   balance roundrobin
   server prod-web-f1 check
   server prod-web-f2 321.321.321.321:3000 check

The check option makes the load balancer periodically perform a health check on this server.

We define the frontend app-http-in~ which will handle http connections and log all information with with the classic format httplog. All logs will then be forwarded to the app-backend backend server prod-web-f1 & prod-web-f2. For a HTTP connection, you thus get this type of log:

haproxy[14389]: [06/Feb/2009:12:14:14.655] app-http-in~ \
app-backend/prod-web-f1 10/0/30/69/109 200 2750 – – —- 1/1/1/1/0 0/0 {1wt.eu} \
{} “GET /index.html HTTP/1.1”

Lots of useful information are extracted from such a simple log:

Table HAProxy

HAProxy Log Format
Extract from the above exampleHAProxy Log Details
client_ip ':' client_port address & port of the client which initiated the connection.
'[' accept_date ']'
Exact date when the TCP connection was received by HAProxy.
app-http-inName of the frontend (or listener) which received and processed the connection.
backend_name '/' server_name app-backend/prod-web-f1
Backend_name: name of the backend (or listener) which was selected to manage the connection to the server.
Server_name: name of the last server to which the connection was sent.
Tq '/' Tw '/' Tc '/' Tr '/' Tt
Tq: Total time spent waiting for the client to send a full HTTP request. In milliseconds.
Tw: Total time spent waiting in the various queues. In milliseconds.
Tc: Total time spent waiting for the connection to establish to the final server, including retries. In milliseconds.
Tr: Total time spent waiting for the server to send a full HTTP response. In milliseconds.
Tt: Total time elapsed between the accept and the last close. In milliseconds.
200HTTP status code returned to the client.
Total number of bytes transmitted to the client.
termination_state---The condition the session was in when the session ended.
actconn '/' feconn '/' beconn '/' srv_conn '/' retries1/1/1/1/0
Actconn: Total number of concurrent connections being processed.
Feconn: Total number of concurrent connections to the frontend.
Beconn: Total number of concurrent connections handled by the backend.
Srv_conn: Total number of concurrent connections still active on the server.
Retries: Number of connection retries experienced by this session when trying to connect to the server.
srv_queue '/' backend_queue0/0
Srv_queue: Total number of requests which were processed before this one in the server queue.
Backend_queue: Total number of requests which were processed before this one in the backend global queue.
'{' captured_request_headers* '}' {haproxy.1wt.eu}
Captured request headers (if their is some).
'{' captured_response_headers* '}' {}
Captured response headers (if their is some).
'"' http_request '"'
"GET /index.html HTTP/1.1"
Complete HTTP request line.

III. What to do with these HAProxy logs

It’s all very well to know all the information you can access, but what can you do with it exactly? Once you have more than one log, you’re able to calculate metrics by aggregating some log attribute values on a given time bucket. For instance:

  • You can count the sum of request hitting your front-end server per second
  • You can make the average of all the requests response time logged during one minute

Of course, tail greping around and scripting is not really what enables you to get those insights. That’s when proper log analytics come in, whether in-house or as-a-service, it will change your life for ever and ever. :-*
Let’s get a closer look at what you can get.

1) Building some interesting HAProxy metrics

From all the information in the original raw HAProxy log events you can build the following metrics:

  • Http request rate (http/s) : Number of requests received by a server during 1 second
  • Http 4xx error rate (http_400/s): Number of requests received by a server with a 4xx status during 1 second
  • Http 5xx error rate (http_500/s): Number of requests received by a server with a 5xx status during 1 second
  • Http request error rate (http_err/s) : Number of requests received by a server with a 4xx or 5xx status during 1 second
  • Ratio of 4xx & 5xx error rate: Ratio between the number of requests received by a server with a 4xx or 5xx status and the total number of requests received by a server.
  • Average response time: Average of all response time logged during a given time bucket.

Studying all those metrics plus correlations between one another will provide you with lots of insights to ensure your HAProxy performance:

  • Ensure continuous service availability:
    • By keeping an eye on peaks and drops on the overall frontend http request rate. In the event of a traffic spike, you can monitor latency experienced by your clients, and identify its source with the average response time per frontend and backend servers.
    • Generally speaking, response times in excess of 500 ms will lead to degradation of application performance and customer experience.
    • By keeping an eye on the maximum retry values per backend servers. If you begin to see more retries than usual (aka more than 0), it is likely that other metrics like the average response time (rising), or the request load balancing (degrading) will change. The root cause of all of this is usually that one of your backend servers went down.
  • Ensure fair application usability:
  • By keeping an eye on response statuses fluctuation. Ideally, all responses forwarded by HAProxy would be class 2xx codes, so an unexpected surge in the number of other code classes would be a sign of trouble.
    Under normal conditions, it is acceptable to (infrequently) receive invalid requests from clients. However, a significant increase in the number of invalid requests received could be a sign of larger issues. More 4xx? That could point to a misconfigured application, or client fearing neither God nor man… More 5xx? Your backend servers seems to be in trouble, hurry up!

  • Ensure proper HAProxy configuration behavior:
    • By watching the average time spent by one request in a queue (Tw), and combining it with the value of Actconn, Feconn, Beconn and Srvconn, you can detect if your configuration is up to date compare to your traffic, or if you need to adjust some settings.
    • By watching retry rate together with errors, you can rule out configuration issues. More errors coupled with low retry rates means the problem is not about configuration. 😉

2) Using your HAProxy log data to enhance your monitoring

Because the most granular data – logs – is in your hands, you can get to incident rootcauses more quickly, especially by slicing your data with your requests or client sessions (slice and dice = log analytics guys!)

So for example, if you:

Take http_400/s & http_500/s
Slice it by request
4xx 5xx status requests

You’ll find the top request URI causing 4xx and 5xx statuses from your backend servers. You will soon see if it’s the same requests causing trouble over and over, or if a pattern is emerging.

Another example I like to do is to:

Take Tt
Substract Tq + Tw + Tc + Tr
Slice it by requests

slowest request to transmit

You thus get the average transmission time for each request and can identify the top slowest request to transmit, giving you clues for future improvements for your HAProxy stack.

One last example:

Get terminaison states count. (The combination of the two first flags of a terminaison states —- give a lot of information about what was happening when the session terminated. For reference, the most common terminaison state are here)
Slice it by client session.

Now being able to see the overall repartition of terminaisons, you can detect a server saturation, network troubles, local system resource starvation, attacks… But zooming in on a specific client session and actually seeing why its session ended sooner than usual will allow you to catch possible correlations between requests and your infrastructure behavior.

Wrapping up

ha proxy monitoring

Related Posts

Get notified when new content is published