Fast turnaround of fixes...
Oct. 9th, 2017 04:50 pm

After carefully enabling debugging and tracing a job, I noticed that the start of the job coincided with a periodic cleanup event. (Debugging has to be enabled carefully: switched on across the board, it generates so much data that everything grinds to a halt within minutes.) Checking the source code, I found that the cleanup used a negative match to decide what to remove. That confirmed my suspicion that either a race condition or a type mismatch was to blame.
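The post doesn't say which software was involved or what its cleanup code looks like, so the following is only a hypothetical Python model of why a negative match is fragile. Everything in it (the `Store` class, `resources`, `active_jobs`, the `hook` used to simulate timing) is invented for illustration. A negative-match cleanup deletes everything *not* found in a "keep" set. This goes wrong in two ways: if a job creates its resource before registering itself, a cleanup that fires in between deletes it (the race), and if the keep set and the resources use different key types, nothing ever matches, so live resources look like garbage (the type mismatch).

```python
from typing import Callable, Optional


class Store:
    """Hypothetical model: resources plus a registry of active jobs."""

    def __init__(self) -> None:
        self.resources: set = set()
        self.active_jobs: set = set()

    def periodic_cleanup(self) -> None:
        # Negative match: delete every resource that is NOT listed as active.
        for name in list(self.resources):
            if name not in self.active_jobs:
                self.resources.discard(name)

    def start_job_racy(self, name, hook: Optional[Callable[[], None]] = None) -> None:
        self.resources.add(name)      # 1. create the resource
        if hook:
            hook()                    # simulate the cleanup firing in this window
        self.active_jobs.add(name)    # 2. only now is the resource protected

    def start_job_fixed(self, name, hook: Optional[Callable[[], None]] = None) -> None:
        self.active_jobs.add(name)    # 1. register (protect) first
        if hook:
            hook()                    # cleanup firing here can no longer hurt us
        self.resources.add(name)      # 2. then create the resource


if __name__ == "__main__":
    racy = Store()
    racy.start_job_racy("job-1", hook=racy.periodic_cleanup)
    print("job-1" in racy.resources)   # False: removed mid-startup

    fixed = Store()
    fixed.start_job_fixed("job-1", hook=fixed.periodic_cleanup)
    print("job-1" in fixed.resources)  # True

    mismatch = Store()
    mismatch.active_jobs.add(42)       # registry keyed by int
    mismatch.resources.add("42")       # resource keyed by str
    mismatch.periodic_cleanup()
    print("42" in mismatch.resources)  # False: "42" != 42, so it was deleted
```

Registering before creating is just one way to close the window in this toy model. Holding a lock shared with the cleanup would be another. The developer's actual fix isn't described in the post.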
The person I was working with was in a chat session with the developer. They mentioned that we thought the failure was caused by a race with the periodic event, but didn't provide any further details. Within seconds, my colleague got a reply that effectively restated our hypothesis. A few seconds after that came another message from the developer saying they were already producing a fix and asking us to send the logs to confirm it.
It took longer to work out how to transfer the logs than it took to develop a first cut fix for the problem...