Exploring Snapt is a recurring series of Snapt blog posts that looks at highlighted features of the product.
Snapt monitors every response that goes back to your clients in great detail. We do this primarily to measure performance and prevent errors, but it is also extremely useful when debugging servers.
Here we highlight some of the nifty features, tips and tricks available for observing HTTP responses, performance metrics and more.
HTTP Status Codes
One of the best features in Snapt is the Traffic Watch for HTTP status codes. Available under the Balancer -> Views & Data -> Traffic Watch menu item, the HTTP tab allows you to quickly see the number of responses by HTTP code, per server.
Above you can see nyc2-web0 has generated 1171 errors.
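In general terms, a 4xx status (a client error, such as 404 Not Found) is non-critical for webservers, while a 5xx status (a server error, such as 503 Service Unavailable) is critical. Status codes 100-399 (informational, success and redirection) are expected.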
This allows you to quickly see if a specific server is generating errors.
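To make that rule of thumb concrete, here is a minimal Python sketch that rolls per-server status code tallies (like the counts on the HTTP tab) up into expected, non-critical and critical buckets. It does not read anything from Snapt; the input dictionary and the names `status_class` and `summarise` are hypothetical.

```python
from collections import Counter


def status_class(code: int) -> str:
    """Bucket an HTTP status code using the rule of thumb above."""
    if 100 <= code <= 399:
        return "expected"      # informational, success and redirects
    if 400 <= code <= 499:
        return "non-critical"  # client errors such as 404
    if 500 <= code <= 599:
        return "critical"      # server errors such as 502 or 503
    raise ValueError(f"not an HTTP status code: {code}")


def summarise(counts_by_server: dict[str, dict[int, int]]) -> dict[str, Counter]:
    """Roll per-server {status code: count} tallies up into the three buckets."""
    summary = {}
    for server, counts in counts_by_server.items():
        buckets = Counter()
        for code, n in counts.items():
            buckets[status_class(code)] += n
        summary[server] = buckets
    return summary
```

A server whose "critical" bucket keeps growing while its peers stay at zero is the one to look at first.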
The Live Dashboard is one of the most useful tabs. It is available under Balancer -> Balancer Dashboard, in the Live Dashboard subtab.
It shows a huge array of information; in particular, your HTTP health check status (and check time) and the latency.
Above you can see the Status column says "UP [L7OK] in 4ms". That single column packs in a lot of information:
- UP: the server state, which is either UP or DOWN. In this case the server is online.
- L7OK: the health check's detailed status. L7OK means Layer 7 OK, while L4TOUT would be a Layer 4 timeout, and so on. This is very useful when working out why a service is down.
- 4ms: the time taken to report the status. With L7OK this is the HTTP server's reply time, which makes it great for spotting slow servers.
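The Status column can also be pulled apart programmatically. This sketch assumes the text always has the exact shape of the "UP [L7OK] in 4ms" example above; the regular expression, `CheckStatus` and `parse_status` are hypothetical helpers, not part of Snapt.

```python
import re
from typing import NamedTuple

# Assumed shape of the Status column, based on the "UP [L7OK] in 4ms" example
STATUS_RE = re.compile(r"^(UP|DOWN)\s+\[(\w+)\]\s+in\s+(\d+)ms$")


class CheckStatus(NamedTuple):
    state: str    # UP or DOWN
    check: str    # detailed health check status, e.g. L7OK or L4TOUT
    time_ms: int  # time taken by the check, in milliseconds


def parse_status(text: str) -> CheckStatus:
    """Split a Status cell into its three parts."""
    match = STATUS_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised status: {text!r}")
    state, check, ms = match.groups()
    return CheckStatus(state, check, int(ms))
```

With the parts separated, it is easy to, for example, flag servers that are UP with L7OK but whose `time_ms` is well above the rest.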
The next column to talk about is Latency, which measures the average latency of communication with the server. Depending on your app it may be high, but when hunting for an issue it's useful to compare it against your other servers!
Side note: below are some common Layer 4-7 health check status codes and what they mean:
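As a rough sketch of that comparison, the function below flags any server whose latency is more than a given factor above the median of its peers. The factor of 2 and the name `latency_outliers` are arbitrary assumptions for illustration; tune them to your app.

```python
from statistics import median


def latency_outliers(latency_ms: dict[str, float], factor: float = 2.0) -> list[str]:
    """Return servers whose latency exceeds `factor` times the median of the other servers."""
    outliers = []
    for server, value in latency_ms.items():
        peers = [v for s, v in latency_ms.items() if s != server]
        if not peers:
            continue  # a lone server has nothing to compare against
        if value > factor * median(peers):
            outliers.append(server)
    return sorted(outliers)
```

Comparing against the median of the peers, rather than the mean, keeps one very slow server from dragging up the baseline it is judged against.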
| Code | Meaning |
|---|---|
| L4OK | check passed on layer 4, no upper layers testing enabled |
| L4TOUT | layer 1-4 timeout |
| L4CON | layer 1-4 connection problem, "Connection refused" or "No route to host" |
| L6OK | check passed on layer 6 |
| L6TOUT | layer 6 (SSL) timeout |
| L6RSP | layer 6 invalid response - protocol error |
| L7OK | check passed on layer 7 |
| L7OKC | check conditionally passed on layer 7 |
| L7TOUT | layer 7 (HTTP/SMTP) timeout |
| L7RSP | layer 7 invalid response - protocol error |
| L7STS | layer 7 response error, for example HTTP 5xx |
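For scripting or log processing, the table above can be kept as a lookup. In this sketch the layer is read from the digit after "L", and codes ending in OK or OKC are treated as passes; `CHECK_CODES` and `explain` are hypothetical names, and only the codes listed in the table are covered.

```python
# Health check status codes from the table, with their meanings
CHECK_CODES = {
    "L4OK": "check passed on layer 4, no upper layers testing enabled",
    "L4TOUT": "layer 1-4 timeout",
    "L4CON": 'layer 1-4 connection problem, "Connection refused" or "No route to host"',
    "L6OK": "check passed on layer 6",
    "L6TOUT": "layer 6 (SSL) timeout",
    "L6RSP": "layer 6 invalid response - protocol error",
    "L7OK": "check passed on layer 7",
    "L7OKC": "check conditionally passed on layer 7",
    "L7TOUT": "layer 7 (HTTP/SMTP) timeout",
    "L7RSP": "layer 7 invalid response - protocol error",
    "L7STS": "layer 7 response error, for example HTTP 5xx",
}


def explain(code: str) -> str:
    """Describe a check status code as 'layer N, outcome: meaning'."""
    if code not in CHECK_CODES:
        raise KeyError(f"unknown check status: {code}")
    layer = int(code[1])  # the digit after "L" is the layer
    if code.endswith("OKC"):
        outcome = "conditional pass"
    elif code.endswith("OK"):
        outcome = "pass"
    else:
        outcome = "fail"
    return f"layer {layer}, {outcome}: {CHECK_CODES[code]}"
```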
Another great HTTP/S feature is response observation. To understand it, let's talk about health checks first.
If you use Layer 7 (e.g. HTTP) health checks, Snapt decides whether a server is healthy like this:
- Can I connect to server_ip:80? If yes, continue.
- Can I fetch a webpage from http://server_ip:80/xxx.php? If yes, continue.
- Server is online.
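The three steps above can be sketched with Python's standard library. This illustrates the idea rather than how Snapt performs its checks: it returns the matching code from the status table, `layer7_check` is a hypothetical name, and treating only 2xx and 3xx replies as a pass is an assumption.

```python
import http.client
import socket


def layer7_check(host: str, port: int = 80, path: str = "/", timeout: float = 2.0) -> str:
    """Two-step health check sketch; returns a Layer 4/7 status code."""
    # Step 1 (Layer 4): can we open a TCP connection at all?
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "L4TOUT"
    except OSError:
        return "L4CON"  # e.g. connection refused or no route to host
    sock.close()

    # Step 2 (Layer 7): can we fetch the page?
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except socket.timeout:
        return "L7TOUT"
    except (http.client.HTTPException, OSError):
        return "L7RSP"  # connected, but no valid HTTP reply
    finally:
        conn.close()

    # Step 3: the server is online only if the reply is not an error
    return "L7OK" if 200 <= status < 400 else "L7STS"
```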
Once the check passes, the server will receive traffic. However, it may then start returning 5xx errors to clients on requests the health check never makes, so the check alone will not catch them.
Enabling HTTP Scanning on your Servers in a Group or Backend counts those errors toward the Fall Count. They are treated as failed health checks, so bad servers are picked up quickly!
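Conceptually, HTTP Scanning feeds client-facing errors into the same counter as the health checks. The class below is a simplified model of that idea, not Snapt's implementation: the Fall Count of 3, the rule that any success resets the streak, and the names `ServerHealth`, `health_check` and `observe_response` are all assumptions, and bringing a server back UP (a rise count) is left out.

```python
class ServerHealth:
    """Hypothetical Fall Count model that also counts observed 5xx responses."""

    def __init__(self, fall: int = 3):
        self.fall = fall   # assumed: consecutive failures before marking DOWN
        self.failures = 0
        self.up = True

    def _record(self, ok: bool) -> None:
        if ok:
            self.failures = 0  # assumption: any success resets the streak
        else:
            self.failures += 1
            if self.failures >= self.fall:
                self.up = False

    def health_check(self, passed: bool) -> None:
        """Result of a regular health check."""
        self._record(passed)

    def observe_response(self, status: int) -> None:
        """With HTTP Scanning on, a 5xx sent to a client counts as a failed check."""
        self._record(status < 500)
```

In this model a server that passes its health check but then serves three 5xx responses in a row is marked DOWN, even though the health check URL itself is still fine.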
Automatically monitoring the response times of your webservers is a great idea: it lets you stay ahead of potential failures, or simply of a poor client experience.
For this we use the Alerts menu option, under Balancer -> Configuration -> Alerts and Notices.
In the example above we are alerting on "Response time (ms)". That's the HTTP server's time to reply to a request, and here we alert on anything over 500ms.
This lets you set a reasonable threshold for your app and be notified if it gets slower than expected!
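The alert rule itself is simple to express. This sketch averages each server's recent response times and produces a message when the average exceeds the 500ms threshold from the example; the averaging, the message format and the name `response_time_alerts` are assumptions, and delivery (Slack, email and so on) is not shown.

```python
from statistics import mean

THRESHOLD_MS = 500  # alert threshold from the example above


def response_time_alerts(samples_ms: dict[str, list[float]],
                         threshold_ms: float = THRESHOLD_MS) -> list[str]:
    """Build one alert message per server whose average response time is over the threshold."""
    alerts = []
    for server, samples in sorted(samples_ms.items()):
        if not samples:
            continue  # no traffic, nothing to judge
        avg = mean(samples)
        if avg > threshold_ms:
            alerts.append(f"{server}: average response time {avg:.0f}ms exceeds {threshold_ms:.0f}ms")
    return alerts
```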
Snapt is the leading-edge software load balancer, jam-packed with metrics and tools for DevOps.
Want your load balancer (that you can deploy anywhere) to tell you an app server is running 500ms slower than the rest? Via Slack?
Then you want Snapt! Try it for free today.