Prevent re-occurrence of a result set missing its corresponding row in the revision table

RESOLVED DUPLICATE of bug 1083305

Status

P1
major
RESOLVED DUPLICATE of bug 1083305
4 years ago
4 years ago

People

(Reporter: emorley, Unassigned)

Tracking

Details

(Reporter)

Description

4 years ago
Yesterday evening, we ended up with a partially ingested mozilla-inbound push, where we had inserted the row for the resultset, but didn't have a corresponding row for it in the revision table. The revision had to be manually inserted into the DB by camd.

This bug is to prevent this from occurring again - presumably by either ensuring that both inserts happen in one transaction (that will roll back if interrupted), or else by making the pushlog ingestion handle the split state more gracefully (eg not updated the cached pushlog rev, and on next run of the cron insert the missing row in the other table without bailing).

I wonder if the issue here is the cause of bug 1076752 too?


Transcript from #treeherder for reference:

22:04	KWierso|sheriffduty	mdoglio, camd: Hrm, I'm not seeing luke's most recent push to mozilla-inbound on TH
22:04	KWierso|sheriffduty	but it's been on TBPL for an hour...
22:12	camd	KWierso|sheriffduty: hey, sorry to not have gotten back to you. jeads and I are debugging this right now
22:15	jeads	camd: https://github.com/mozilla/treeher...er/treeherder/etl/pushlog.py#L87
22:21	jeads	camd: https://github.com/mozilla/treeher...er/treeherder/etl/pushlog.py#L76
22:21	camd	https://hg.mozilla.org/integration...77285cc36cf6cc2082a6563d6f7ac09
...
23:04	camd	jeads: we have the resultset, but not the revision. the record was never created for it.
23:05	camd	can't tell why yet. But I only notice that the specific revision has a value for the field ``tags`` where most of the others don't . However we don't use that field, so I don't think it should matter...
23:07	camd	jeads: so we won't ingest that resultset, because we already have it. But we probably have a failed join somewhere because it doesn't have any revisions...
...
23:33	camd	jeads: I can confirm the revision hash stored with that resultset is correct.
23:33	camd	jeads: now I just have to determine why we don't ingest the revisions when we already have the resultset (and how to remedy that)
23:37	jeads	camd: let me know if you have specific questions
23:49	camd	jeads: I think I'm good. thanks for all the help on debugging this. but I can see that wintout a transaction, we could get into this partially loaded resultset state. must have happened when we were pushing new code that it got interrupted.
23:50	camd	we either need a transaction, or to "pick up where we left off" on the resultset ingestion
...
00:01	camd	jeads: I actually think I do have a specific question for you. Are you available for a couple minutes?
00:03	jeads	camd: sorry, busy, I don’t think the observed behavior with the memcache can be explained by a not having an explicit transaction
00:04	jeads	camd: if that was try, it should correct itself when we cleared the cache for mozilla-inbound
00:04	camd	jeads: the problem is that we actually HAVE the resultset in the db. but no revisions for it.
00:04	camd	it must have gotten interrupted in the middle of ingestion.
00:05	camd	but now it doesn't try to re-ingest the resultset, because it already has it
00:05	camd	but I suspect our webservice doesn't return it because the revision is missing? not clear on that quite yet. still digging.
00:06	camd	but I think I know where to go from here. :)
00:06	KWierso|sheriffduty	camd: :( a new push on top of it didn't magically make it show up
00:07	camd	KWierso|sheriffduty: unfortunately, no. I'm working on a fix for this right now. but the fix won't be ready until that push has dropped off of json-pushes, I would think.
00:08	camd	I think the best I can do without direct database surgery is to safeguard against this happening in the future. :(
00:09	KWierso|sheriffduty	camd: I think that's acceptable
00:10	jeads	camd: if that’s the case, if you want to manually get ingestion going again you can insert the revision that’s missing in the revision table for the last result set received
00:10	camd	jeads: so, the nice thing is that ingestion is continuing already
00:10	camd	but, in the ui, it is just missing that "bad" push
00:11	camd	however, it moves on to the next one just fine
00:11	camd	jeads: but, yeah. I /could/ manually insert the revision and the resultset/revision map record.
00:12	camd	I'm loathe to insert directly into the db, I must admit. :)
00:12	KWierso|sheriffduty	camd: would deleting whatever did get inserted and then re-running ingestion be easier?
00:13	camd	KWierso|sheriffduty: I'd have to check, but I suspect it actually has gotten the jobs for that resultset. hmm.. checking...
00:14	camd	hope, I'm wrong. no jobs for it....
00:14	camd	s/hope/nope
00:15	camd	yeah, if i delete the resultset, then it'll probably pick it up again from json-pushes. if I reset the cache value as well...
00:18	camd	jeads: do you have an opinion on that? delete the resultset vs. manually inserting the revision? :)
00:19	jeads	camd: I would highly recommend going with the manual insertion of the revision
00:19	jeads	camd: that is much less risky than a physical delete of a result_set
00:20	camd	jeads: makes sense, yeah. ok. going for it. :)
00:20	camd	sorry in advance for all the explosions and radioactive fallout...
00:20	jeads	camd: feel free to show me your SQL before you execute it, nothing like a second set of eyes
00:20	camd	jeads: thanks. sounds good.
00:26	camd	jeads: here's for the revision:
00:26	camd	INSERT INTO `mozilla_inbound_jobs_1`.`revision` (`revision`, `author`, `comments`, `files`, `repository_id`) VALUES ('e91da4515772', 'Luke Wagner <luke%mozilla.com>', 'Bug 965880 - OdinMonkey: add back masm.storePtr that somehow got lost in rebasing on CLOSED TREE (r=me)', 'js/src/asmjs/AsmJSValidate.cpp', '2');
00:26	camd	this was the json-push:
00:26	camd	https://hg.mozilla.org/integratio...77285cc36cf6cc2082a6563d6f7ac09
00:30	jeads	camd: I executed it on my local treeherder env, it worked and the data inserted looks good, you will also need an insert statement for the revision_map to map the ids
00:30	jeads	but that will require executing this one first
00:31	camd	yep, I'll create that once I get the id for this new record.
00:31	camd	thanks :)
00:35	camd	jeads: KWierso|sheriffduty: it worked! now we'll have to see if jobs get ingested to it
00:35	camd	https://treeherder.mozilla.org/ui/...nbound&revision=e91da4515772
00:35	camd	looks like it. :)
00:36	jeads	camd: super, well at least we have a way of reseting things now, great work and good find
This one was entered first, but the fix went in for bug 1083305.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1083305
(Reporter)

Updated

4 years ago
No longer blocks: 1059400, 1076750
You need to log in before you can comment on or make changes to this bug.