3 upstream errors in Nginx ingress with Gunicorn/Flask as backend in Kubernetes environment
Recently I ran into three types of upstream errors. They affected only about 2% of all requests, but they were quite annoying.
Fortunately, I was able to reproduce the errors in the preprod environment with Locust load tests.
connect() failed (110: Connection timed out) while connecting to upstream
Gunicorn was launched with --worker-class gevent --workers 4. It worked well most of the time, but occasionally produced spikes with response times of more than 30 seconds, along with this error.
I didn't find the root cause, but switching to --worker-class gthread --workers 3 --threads 2 solved the problem.
It's less performant but more stable.
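The same worker settings can also be kept in a Gunicorn config file instead of on the command line. A minimal sketch, assuming the file is named gunicorn.conf.py; the values are the ones quoted above and nothing else is tuned:

```python
# gunicorn.conf.py (file name is an assumption; any path can be passed with -c)

# Threaded workers instead of gevent: each worker process handles
# requests in a small pool of OS threads.
worker_class = "gthread"

# Number of worker processes.
workers = 3

# Threads per worker, so up to workers * threads requests run concurrently.
threads = 2
```

It would be loaded with something like gunicorn -c gunicorn.conf.py app:app, where app:app is a placeholder for the actual Flask module and application object.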
upstream prematurely closed connection while reading response header from upstream
This problem was caused by not setting the --keep-alive option, which defaults to 2 seconds.
I don't know what the ideal value is, but 64 helped.
I tested with the uWSGI server too and got the same result.
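As a config-file sketch, the keep-alive change is a single setting. One plausible reading, not confirmed by the tests above, is that the backend should keep idle connections open longer than Nginx does (Nginx's idle timeout for upstream keep-alive connections is 60 seconds by default in recent versions), so that Nginx never reuses a connection Gunicorn has already closed. Treat that explanation as an assumption:

```python
# gunicorn.conf.py (file name is an assumption; any path can be passed with -c)

# Seconds to wait for the next request on a keep-alive connection.
# Gunicorn's default is 2; 64 is the value that removed the
# "upstream prematurely closed connection" errors in the tests above.
keepalive = 64
```

On the command line the equivalent is --keep-alive 64.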
connect() failed (111: Connection refused) while connecting to upstream
This one happened when pods were scaled down: it looks like the pod still receives some connections after the app has already shut down.
To solve this I added a preStop hook with a short sleep. As I understand it, this gives the pod time to complete in-flight requests and to be removed from the load balancer before termination.
lifecycle:
  preStop:
    exec:
      command:
        - sleep
        - "15"
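One thing worth checking with this hook is the time budget: the preStop sleep counts against the pod's termination grace period, so the app only gets what is left after the sleep to finish its work once SIGTERM arrives. A small arithmetic sketch; only the 15-second sleep comes from the manifest above, while the 30-second grace period and the 30-second Gunicorn graceful timeout are the documented defaults and are assumed here, since the actual values aren't shown:

```python
# Shutdown time budget for a pod with a preStop sleep (all values in seconds).
TERMINATION_GRACE_PERIOD = 30   # assumed: Kubernetes default terminationGracePeriodSeconds
PRE_STOP_SLEEP = 15             # from the preStop hook above
GUNICORN_GRACEFUL_TIMEOUT = 30  # assumed: Gunicorn default graceful_timeout


def time_left_after_pre_stop(grace_period: int, pre_stop_sleep: int) -> int:
    """Seconds between SIGTERM (sent once preStop finishes) and the forced kill."""
    return max(grace_period - pre_stop_sleep, 0)


remaining = time_left_after_pre_stop(TERMINATION_GRACE_PERIOD, PRE_STOP_SLEEP)
print(f"time left for graceful shutdown: {remaining}s")
print(f"Gunicorn graceful timeout fits: {GUNICORN_GRACEFUL_TIMEOUT <= remaining}")
```

Under these assumed defaults, requests that need more than the remaining 15 seconds could still be cut off, so a longer grace period or a shorter graceful timeout may be worth considering.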