The Vanishing Thread and PostgreSQL TCP Connection Parameters

Instacart · April 15, 2022, 3:27 p.m.
Like any good story, this one starts in the logs. Our platform engineering team noticed that we occasionally saw very long-running requests that ran well past our SLA. These requests seemed to ignore every application timeout: they timed out after around 15 minutes instead of 5, 10, or 15 seconds. We thought it might be a logging fluke, but after digging further into the logs, it appeared that the Puma threads that raised these delayed timeouts had stopped processing any work and were blocked on something. Log volume from a s...