Our most loved HAProxy
logging strategies

What should you do with your HAProxy logs?

If you have more than one web server around, you probably already have a proxy in front of them both to manipulate and load balance your input flow. It this is not the case, I would strongly advise you to use one.

haproxy logo

HAProxy is an open source solution for load balancing and reverse proxying for both TCP and HTTP requests. They now display an extensive user list of high profile companies since its release in 2001.

Having worked with HAProxy for a while and checking its performance with logs, I have decided to share some of my HAProxy logging experiences and strategies.

A quick background:
How does HAProxy work?

HAProxy is organized with frontend client side proxies and backend server side proxies. Configuration is possible and should be considered at the following two points:

  • How/where HAProxy should accept inbound traffic
  • How/where it is supposed to forward it

One key feature of HAProxy is that it can continue to operate in the presence of failed backend servers. The load balancer incorporates built-in health checks that remove backends if they fail several consecutive health checks. HAProxy will then dynamically route incoming traffic to other backend servers, bringing you seamless crossover and server reliability.

Each client that interacts with HAProxy uses one session. A session is composed of two connections:

  • The client to HAProxy frontends
  • HAProxy frontends to the appropriate backend server.

Once a session has been created (i.e. the client can talk to the backend through HAProxy), the client can begin issuing requests. Clients typically use one session for all of its requests, with sessions terminating after receiving no requests during the timeout window, or with a specific termination state if an error occurred.

What information
do HAProxy logs provide?

1) HAProxy logs overview

Note that by default HAProxy only registers logs for sent data, and not for received data (especially with POST method). If you wish to collect information on received data, you need to add the %U options to your logformat, find out more here.

HAProxy generates really helpful default logs. They’re a good base to start with to answer questions about aggregate HAProxy behavior and performance such as:

  • What is my real-time flow typology?
  • What is my overall latency?
  • What are the relations between my frontend and backend servers configurations?
  • How do my rules interact with flows?
  • What amount of data am I sending? Receiving?
  • Why some session ended with an error termination state?

In HTTP, HAProxy provides request details and status so you can analyze your backend server’s external behavior, answering questions such as the following:

  • What are my top URI requests in terms of 4xx errors, 5xx errors? Which one is the slowest to answer? Which one is the slowest in transmitting data and why?
  • Why did some sessions ended with an abnormal termination state? Did it come from the client? Did it come from my servers or from my HAProxy config?
  • How did my system handle security connections?

Let’s take a closer look at what those HAProxy logs contain.

2) HAProxy Log details

The following is an example of data sent to HAProxy servers through HTTP connections, the most common type of connection. Learnings are similar with TCP/UDP connections, though less detailed.

Let’s say your configuration is the following:

frontend app-http-in
mode http
option httplog
log global
default_backend app-backend

backend app-backend
mode http
balance roundrobin
server prod-web-f1 check
server prod-web-f2 321.321.321.321:3000 check

The check option makes the load balancer periodically perform a health check on this server.

We define the frontend app-http-in~ which will handle http connections and log all information with with the classic format httplog. All logs will then be forwarded to the app-backend backend server prod-web-f1 & prod-web-f2. For a HTTP connection, you thus get this type of log:

haproxy[14389]: [06/Feb/2009:12:14:14.655] app-http-in~ \
app-backend/prod-web-f1 10/0/30/69/109 200 2750 – – —- 1/1/1/1/0 0/0 {1wt.eu} \
{} “GET /index.html HTTP/1.1”

Lots of useful information can be extracted from such a simple log:

Table HAProxy

HAProxy Log Format
Extract from the above exampleHAProxy Log Details
client_ip ':' client_port address & port of the client which initiated the connection.
'[' accept_date ']'
Exact date when the TCP connection was received by HAProxy.
app-http-inName of the frontend (or listener) which received and processed the connection.
backend_name '/' server_name app-backend/prod-web-f1
Backend_name: name of the backend (or listener) which was selected to manage the connection to the server.
Server_name: name of the last server to which the connection was sent.
Tq '/' Tw '/' Tc '/' Tr '/' Tt
Tq: Total time spent waiting for the client to send a full HTTP request. In milliseconds.
Tw: Total time spent waiting in the various queues. In milliseconds.
Tc: Total time spent waiting for the connection to establish to the final server, including retries. In milliseconds.
Tr: Total time spent waiting for the server to send a full HTTP response. In milliseconds.
Tt: Total time elapsed between the accept and the last close. In milliseconds.
200HTTP status code returned to the client.
Total number of bytes transmitted to the client.
termination_state---The condition the session was in when the session ended.
actconn '/' feconn '/' beconn '/' srv_conn '/' retries1/1/1/1/0
Actconn: Total number of concurrent connections being processed.
Feconn: Total number of concurrent connections to the frontend.
Beconn: Total number of concurrent connections handled by the backend.
Srv_conn: Total number of concurrent connections still active on the server.
Retries: Number of connection retries experienced by this session when trying to connect to the server.
srv_queue '/' backend_queue0/0
Srv_queue: Total number of requests which were processed before this one in the server queue.
Backend_queue: Total number of requests which were processed before this one in the backend global queue.
'{' captured_request_headers* '}' {haproxy.1wt.eu}
Captured request headers (if their is some).
'{' captured_response_headers* '}' {}
Captured response headers (if their is some).
'"' http_request '"'
"GET /index.html HTTP/1.1"
Complete HTTP request line.

What to do with HAProxy logs

So what can you do with all of the new information you can access? Once you have more than one log, you will able to calculate metrics by aggregating some log attribute values on a given time bucket. For instance:

  • You can count the sum of requests hitting your front-end server per second
  • You can see the average of request response times logged during one minute

Let’s take a closer look at what log analytics and log management can do for you.

1) Building some interesting HAProxy metrics

From all the information in the original raw HAProxy log events you can build the following metrics:

  • Http request rate (http/s) : Number of requests received by a server during 1 second
  • Http 4xx error rate (http_400/s): Number of requests received by a server with a 4xx status during 1 second
  • Http 5xx error rate (http_500/s): Number of requests received by a server with a 5xx status during 1 second
  • Http request error rate (http_err/s) : Number of requests received by a server with a 4xx or 5xx status during 1 second
  • Ratio of 4xx & 5xx error rate: Ratio between the number of requests received by a server with a 4xx or 5xx status and the total number of requests received by a server.
  • Average response time: Average of all response time logged during a given time bucket.

Exploring all these metrics and correlations will better help you maintain the performance of your HAProxy:

  • Ensure continuous service availability:

    • Keep an eye on peaks and drops on the overall frontend HTTP request rate. In the event of a traffic spike, you can monitor latency experienced by your clients, and identify its source with the average response time per frontend and backend servers.

      Generally speaking, response times in excess of 500 ms will lead to degradation of application performance and customer experience.
    • Keep an eye on the maximum retry values per backend servers. If you begin to see more retries than usual (aka more than 0), it is likely that other metrics like average response time (rising), or request load balancing (degrading) will change. The root cause of all of this is usually that one of your backend servers went down.

  • Ensure fair application usability:

    • Keep an eye on response status fluctuation. Ideally, all responses forwarded by HAProxy would be class 2xx codes, so an unexpected surge in the number of other code classes would be a sign of trouble.

    • Under normal conditions, it is acceptable to (infrequently) receive invalid requests from clients. However, a significant increase in the number of invalid requests received could be a sign of larger issues. More 4xx? That could point to a misconfigured application. More 5xx? Your backend servers seems to be in trouble.

  • Ensure proper HAProxy configuration behavior:

    • Watch the average time spent by one request in a queue (Tw), and combining it with the value of Actconn, Feconn, Beconn and Srvconn, you can detect if your configuration is up to date compare to your traffic, or if you need to adjust some settings.

    • Also, watch retry rate and errors to rule out configuration issues. More errors coupled with low retry rates means the problem is not about configuration.

2) Using your HAProxy log data to enhance your monitoring

With the information provided by granular logs, slice your data with requests or client sessions to quickly get to incident root causes.

So for example, if you:

Take http_400/s & http_500/s
Slice it by request
4xx 5xx status requests

You’ll find the top request URI causing 4xx and 5xx statuses from your backend servers. Check if it’s the same requests causing trouble over and over, or if a pattern is emerging.

Another example is to:

Take Tt
Substract Tq + Tw + Tc + Tr
Slice it by requests

ha proxy monitoring

You’ll get the average transmission time for each request and can identify the slowest requests to transmit, giving you clues for future improvements for your HAProxy stack.

One last example:

Get termination state counts
Slice it by client session

The combination of the two first flags of a termination state give a lot of information about what was happening when the session terminated. (The most common termination state can be found here)

With the overall repartition of terminations, you can now detect server saturation, network troubles, local system resource starvation, attacks… But zooming in on a specific client session and actually seeing why its session ended sooner than usual will allow you to catch possible correlations between requests and your infrastructure behavior.

Related Posts

Get notified when new content is published