Yesterday evening, we ended up with a partially ingested mozilla-inbound push, where we had inserted the row for the resultset, but didn't have a corresponding row for it in the revision table. The revision had to be manually inserted into the DB by camd. This bug is to prevent this from occurring again - presumably by either ensuring that both inserts happen in one transaction (that will roll back if interrupted), or else by making the pushlog ingestion handle the split state more gracefully (eg not updated the cached pushlog rev, and on next run of the cron insert the missing row in the other table without bailing). I wonder if the issue here is the cause of bug 1076752 too? Transcript from #treeherder for reference: 22:04 KWierso|sheriffduty mdoglio, camd: Hrm, I'm not seeing luke's most recent push to mozilla-inbound on TH 22:04 KWierso|sheriffduty but it's been on TBPL for an hour... 22:12 camd KWierso|sheriffduty: hey, sorry to not have gotten back to you. jeads and I are debugging this right now 22:15 jeads camd: https://github.com/mozilla/treeher...er/treeherder/etl/pushlog.py#L87 22:21 jeads camd: https://github.com/mozilla/treeher...er/treeherder/etl/pushlog.py#L76 22:21 camd https://hg.mozilla.org/integration...77285cc36cf6cc2082a6563d6f7ac09 ... 23:04 camd jeads: we have the resultset, but not the revision. the record was never created for it. 23:05 camd can't tell why yet. But I only notice that the specific revision has a value for the field ``tags`` where most of the others don't . However we don't use that field, so I don't think it should matter... 23:07 camd jeads: so we won't ingest that resultset, because we already have it. But we probably have a failed join somewhere because it doesn't have any revisions... ... 23:33 camd jeads: I can confirm the revision hash stored with that resultset is correct. 23:33 camd jeads: now I just have to determine why we don't ingest the revisions when we already have the resultset (and how to remedy that) 23:37 jeads camd: let me know if you have specific questions 23:49 camd jeads: I think I'm good. thanks for all the help on debugging this. but I can see that wintout a transaction, we could get into this partially loaded resultset state. must have happened when we were pushing new code that it got interrupted. 23:50 camd we either need a transaction, or to "pick up where we left off" on the resultset ingestion ... 00:01 camd jeads: I actually think I do have a specific question for you. Are you available for a couple minutes? 00:03 jeads camd: sorry, busy, I don’t think the observed behavior with the memcache can be explained by a not having an explicit transaction 00:04 jeads camd: if that was try, it should correct itself when we cleared the cache for mozilla-inbound 00:04 camd jeads: the problem is that we actually HAVE the resultset in the db. but no revisions for it. 00:04 camd it must have gotten interrupted in the middle of ingestion. 00:05 camd but now it doesn't try to re-ingest the resultset, because it already has it 00:05 camd but I suspect our webservice doesn't return it because the revision is missing? not clear on that quite yet. still digging. 00:06 camd but I think I know where to go from here. :) 00:06 KWierso|sheriffduty camd: :( a new push on top of it didn't magically make it show up 00:07 camd KWierso|sheriffduty: unfortunately, no. I'm working on a fix for this right now. but the fix won't be ready until that push has dropped off of json-pushes, I would think. 00:08 camd I think the best I can do without direct database surgery is to safeguard against this happening in the future. :( 00:09 KWierso|sheriffduty camd: I think that's acceptable 00:10 jeads camd: if that’s the case, if you want to manually get ingestion going again you can insert the revision that’s missing in the revision table for the last result set received 00:10 camd jeads: so, the nice thing is that ingestion is continuing already 00:10 camd but, in the ui, it is just missing that "bad" push 00:11 camd however, it moves on to the next one just fine 00:11 camd jeads: but, yeah. I /could/ manually insert the revision and the resultset/revision map record. 00:12 camd I'm loathe to insert directly into the db, I must admit. :) 00:12 KWierso|sheriffduty camd: would deleting whatever did get inserted and then re-running ingestion be easier? 00:13 camd KWierso|sheriffduty: I'd have to check, but I suspect it actually has gotten the jobs for that resultset. hmm.. checking... 00:14 camd hope, I'm wrong. no jobs for it.... 00:14 camd s/hope/nope 00:15 camd yeah, if i delete the resultset, then it'll probably pick it up again from json-pushes. if I reset the cache value as well... 00:18 camd jeads: do you have an opinion on that? delete the resultset vs. manually inserting the revision? :) 00:19 jeads camd: I would highly recommend going with the manual insertion of the revision 00:19 jeads camd: that is much less risky than a physical delete of a result_set 00:20 camd jeads: makes sense, yeah. ok. going for it. :) 00:20 camd sorry in advance for all the explosions and radioactive fallout... 00:20 jeads camd: feel free to show me your SQL before you execute it, nothing like a second set of eyes 00:20 camd jeads: thanks. sounds good. 00:26 camd jeads: here's for the revision: 00:26 camd INSERT INTO `mozilla_inbound_jobs_1`.`revision` (`revision`, `author`, `comments`, `files`, `repository_id`) VALUES ('e91da4515772', 'Luke Wagner <luke%mozilla.com>', 'Bug 965880 - OdinMonkey: add back masm.storePtr that somehow got lost in rebasing on CLOSED TREE (r=me)', 'js/src/asmjs/AsmJSValidate.cpp', '2'); 00:26 camd this was the json-push: 00:26 camd https://hg.mozilla.org/integratio...77285cc36cf6cc2082a6563d6f7ac09 00:30 jeads camd: I executed it on my local treeherder env, it worked and the data inserted looks good, you will also need an insert statement for the revision_map to map the ids 00:30 jeads but that will require executing this one first 00:31 camd yep, I'll create that once I get the id for this new record. 00:31 camd thanks :) 00:35 camd jeads: KWierso|sheriffduty: it worked! now we'll have to see if jobs get ingested to it 00:35 camd https://treeherder.mozilla.org/ui/...nbound&revision=e91da4515772 00:35 camd looks like it. :) 00:36 jeads camd: super, well at least we have a way of reseting things now, great work and good find
This one was entered first, but the fix went in for bug 1083305.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1083305
You need to log in before you can comment on or make changes to this bug.