3 upstream errors in Nginx ingress with a Gunicorn/Flask backend in a Kubernetes environment

Recently I ran into 3 types of upstream errors. They affected only about 2% of all requests, but that was annoying enough to investigate.

Fortunately, I was able to reproduce the errors in the preprod environment with Locust load tests.

connect() failed (110: Connection timed out) while connecting to upstream

Gunicorn was launched with --worker-class gevent --workers 4. It worked well most of the time, but occasionally it produced spikes with response times of more than 30 seconds, accompanied by this error.

I never found the root cause, but switching to --worker-class gthread --workers 3 --threads 2 solved the problem.

It’s less performant but more stable.
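
The same worker settings can also live in a Gunicorn config file instead of command-line flags. This is only a minimal sketch: the file name gunicorn.conf.py and the app:app module path in the usage note are assumptions, not something taken from the setup described here.

```python
# gunicorn.conf.py (assumed file name)
# Equivalent of: --worker-class gthread --workers 3 --threads 2

# Threaded workers instead of gevent greenlets.
worker_class = "gthread"

# Number of worker processes.
workers = 3

# Threads per worker process.
# 3 workers x 2 threads means at most 6 requests are handled at the same time.
threads = 2
```

It would be loaded with something like gunicorn -c gunicorn.conf.py app:app, where app:app stands in for the real Flask application.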

upstream prematurely closed connection while reading response header from upstream

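This problem was caused by not setting the --keep-alive option, which defaults to 2 seconds. With such a short timeout, Gunicorn most likely closes idle keep-alive connections while Nginx is still trying to reuse them.

I don’t know what the ideal value is, but 64 helped.

I tested it with the uWSGI server too and got the same result.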
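
In a Gunicorn config file the option is called keepalive. The sketch below is a hedged illustration. A possible explanation for why 64 works, not verified here, is that it is longer than the 60 seconds ingress-nginx keeps idle upstream connections open by default (upstream-keepalive-timeout). The constant holding that value is purely illustrative and is ignored by Gunicorn.

```python
# gunicorn.conf.py (assumed file name)
# Equivalent of: --keep-alive 64

# Assumption: the proxy in front of Gunicorn keeps idle upstream
# connections open for 60 seconds (the ingress-nginx default).
# This name is not a Gunicorn setting; it only documents the reasoning.
ASSUMED_PROXY_IDLE_TIMEOUT_S = 60

# Seconds Gunicorn waits for the next request on a keep-alive connection.
# The default is 2; the sync worker class ignores this setting.
keepalive = 64
```

If the assumption holds, any value comfortably above the proxy's idle timeout should behave the same way as 64.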

connect() failed (111: Connection refused) while connecting to upstream

This one happened when pods were scaled down. It looks like a pod still receives some connections after the app inside it has already shut down.

To solve this, I added a preStop hook with a short sleep. As I understand it, this gives the pod time to finish in-flight requests and to be removed from the load balancer before the app is terminated.

  lifecycle:
    preStop:
      exec:
        command:
          - sleep
          - "15"
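
One thing to keep in mind is that the preStop sleep counts against the pod's termination grace period (terminationGracePeriodSeconds, 30 seconds by default in Kubernetes). The arithmetic can be sketched as below. The function name and its error handling are hypothetical and only illustrate the time budget.

```python
def shutdown_budget(grace_period_s: int = 30, pre_stop_sleep_s: int = 15) -> int:
    """Return the seconds left for the app to drain requests after SIGTERM.

    Kubernetes runs the preStop hook first, then sends SIGTERM, and sends
    SIGKILL once the grace period (which includes the hook) has elapsed.
    """
    if pre_stop_sleep_s >= grace_period_s:
        # The hook would use up the whole grace period, leaving the app
        # no time to shut down gracefully.
        raise ValueError("preStop sleep must be shorter than the grace period")
    return grace_period_s - pre_stop_sleep_s


# With the defaults above: 30 s grace period minus the 15 s sleep.
remaining = shutdown_budget()
print(remaining)
```

If the app needs longer than the remaining time to finish its requests, terminationGracePeriodSeconds in the pod spec would have to be raised accordingly.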