In api.py, we're sorting only by '-id', which means, the first accepted action terminates the iteration. Instead, we should order by '-signoff__id' '-id', so that we're going back over all sign-offs, and then over all actions, back in time of action. dogfood and regression, thus P1 and hotfix-worthy. Patch in hand.
Created attachment 562602 [details] [diff] [review] sort by signoff, then action, with tests This patch is a oneliner and then a ton of tests. I verified that without the one liner, the tests actually break. I basically had to add a completely new locale to the shipping tests for the new scenario, with has fall-out throughout the test, kinda sucks.
Attachment #562602 - Flags: review?(peterbe)
Comment on attachment 562602 [details] [diff] [review] sort by signoff, then action, with tests Review of attachment 562602 [details] [diff] [review]: ----------------------------------------------------------------- Why are we sorting by ID at all? It's generally considered a bad idea. That's what dates are for. I'm too lazy to find some sort of wikipedia article or something that explains why. I noticed that we have `Signoff.when` which defaults to todays date with timezone meaning it'll be set in the same chronological order as auto-incrementing IDs. Why can't we use that?
Attachment #562602 - Flags: review?(peterbe) → review+
I can see how sorting by ID is brittle in general. I wonder, are there perf implications when sorting by date instead of ID? Anyway, guess that's good for a follow-up?
(In reply to Axel Hecht [:Pike] from comment #3) > I can see how sorting by ID is brittle in general. I wonder, are there perf > implications when sorting by date instead of ID? Anyway, guess that's good > for a follow-up? IDs are always indexed using BTree so out-of-the-box they'll be fast to sort on. However, performance is, I guess, a much more delicate matter. Definitely a followup bug. Hence the r+ :)
This landed as https://github.com/mozilla/elmo/commit/5e81440dd3427c2a997fb7861fb2c0784022cde5. Didn't do a deployment on site, though. That's done now on site, too. (Peter has done a dev deployment in the meantime). Marking FIXED.
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
So, this needed a bustage fix, sadly. This broke because we changed how we ordered actions in shipping.api from being ordered by Action to be ordered by Signoff/Action. That's still right, but the code that chunked the Action queries into bunches to get at a time didn't work with that, and thus actions got dropped, resulting in bad data. Hard to see in tests as it only affects if you have more data than is gotten in one chunk, which is 100 by default. We fixed this by not reducing the locales to query over in each iteration. That makes us go through more db items, but we still drop once all locales are covered. Not a great help as long as we don't reduce the locales to begin with, but that's OK for now. The magic bullet is, we don't change the query we slice while we're going through it. The other alternative would have been to redo all locales from scratch that have not been completely handled yet. Then you do reduce the locales, but you go over the same block of data over and over potentially. You could possibly even dead-lock. Thinking about it, that's even harder to get right than the one before. Also renamed chunks to chunk_size on the way, as it was confusing as a variable. Tests for this would be great, but given the amount of data we need and the fact that we block release at this point, that's not blocking. Landed as a hotfix, https://github.com/mozilla/elmo/commit/e28dec456ff71683a16606d96b32b6f24f51b453
You need to log in before you can comment on or make changes to this bug.