Closed
Bug 1085682
Opened 10 years ago
Closed 10 years ago
Analyse cases where we had to use the pushlog ingestion fallback
Categories
(Tree Management :: Treeherder: Data Ingestion, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: emorley)
Bug 1077136 added a workaround for missing pushes: if we find a job that refers to a push that is unknown to Treeherder, we explicitly try to ingest that push again.
This addressed the initial urgency of the problem and was the best first step for us to take, however:
1) It's a workaround for the real problem.
2) Until a result set has a job, it still doesn't appear (so there's an additional lag).
3) Some result sets may never get a job (e.g. due to DONTBUILD or coalescing) and so never appear.
4) Even when we hit this fallback, the UI (at least currently) doesn't update, so the user has to manually refresh.
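To make the fallback and its limitations concrete, here is a minimal sketch of the idea, not Treeherder's actual code. The function names, the `(repository, revision)` job shape, and the `refetch_push` callback are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def find_missing_pushes(jobs, known_revisions):
    """Group revisions that jobs refer to but that were never ingested.

    jobs: iterable of (repository, revision) pairs (hypothetical shape).
    known_revisions: dict mapping repository -> set of ingested revisions.
    """
    missing = defaultdict(set)
    for repository, revision in jobs:
        if revision not in known_revisions.get(repository, set()):
            missing[repository].add(revision)
    return missing


def ingest_with_fallback(jobs, known_revisions, refetch_push):
    """Ask for each missing push to be fetched again, once per revision.

    refetch_push is a hypothetical callback taking (repository, revision).
    Returns the mapping of repository -> revisions that were re-requested.
    """
    missing = find_missing_pushes(jobs, known_revisions)
    for repository, revisions in missing.items():
        for revision in sorted(revisions):
            refetch_push(repository, revision)
    return missing
```

Because the check is driven entirely by `jobs`, a push that never receives a job never reaches `find_missing_pushes`, which is the limitation described in points 2 and 3.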
In this bug it would be good to:
a) Add additional logging to the initial pushlog ingestion process.
b) Analyse the log output added as part of bug 1077136 and cross-reference it with that from (a), to see if we can find the exact times pushes were missed, which might lead to the root cause (e.g. see if they always occur when we've done a prod push, or at peak load times of day).
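One way the time-of-day part of (b) could be approached is to bucket the fallback warnings by hour. The sketch below assumes every fallback hit is logged in the same format as the warning line quoted in comment 5 of this bug. The regular expression and the function name are illustrative assumptions, not part of Treeherder.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed format, based on the warning quoted in comment 5:
# [2015-01-23 15:52:08,143: WARNING/Worker-4] Found builds4h jobs with missing resultsets. ...
LOG_LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+: WARNING/[^\]]+\] "
    r"Found builds4h jobs with missing resultsets"
)


def fallback_hits_by_hour(lines):
    """Count fallback warnings per hour of day (0-23) in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            when = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            counts[when.hour] += 1
    return counts
```

Passing an open log file to `fallback_hits_by_hour` gives a per-hour histogram, which could then be compared against known prod push times or peak load periods.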
Updated•10 years ago
Comment 1•10 years ago
Note to self:
/var/log/celery/celery_worker_pushlog.log on treeherder-etl[12]
Updated•10 years ago
Assignee: nobody → emorley
Updated•10 years ago
No longer blocks: 1080757
Component: Treeherder → Treeherder: Data Ingestion
Comment 2•10 years ago
I think it's only worth looking into this once bug 1090441 is fixed - since without that, at least a proportion of the pushlog ingestion fallback cases will just be bad luck + races between the normal scheduled pushlog ingestion and the builds-pending.js ingestion.
Comment 3•10 years ago
Though that said, some initial observations:
1) All of the fallback cases I could see were for Try - I'm guessing we're more likely to time out fetching the Try pushlog (or get a 500 for it), and so not get the push before the job is ingested. Perhaps we need to increase the original pushlog ingestion timeout?
2) There were many many duplicate revisions being requested - it seems as though we queue up hundreds of the same revision in the fetch-missing-push-logs task queue.
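The duplicate queueing in point 2 could in principle be avoided by remembering which revisions are already waiting to be fetched. The class below is a hypothetical in-memory sketch of that idea only. The real fetch-missing-push-logs queue is a task queue, so an actual fix would need shared state across workers, and the class and method names here are invented for illustration.

```python
from collections import deque


class MissingPushQueue:
    """FIFO of (repository, revision) pairs that skips entries already pending."""

    def __init__(self):
        self._pending = set()   # keys currently waiting in the queue
        self._queue = deque()   # preserves request order

    def enqueue(self, repository, revision):
        """Queue a re-fetch; return False if the same request is already pending."""
        key = (repository, revision)
        if key in self._pending:
            return False
        self._pending.add(key)
        self._queue.append(key)
        return True

    def pop(self):
        """Remove and return the oldest pending request."""
        key = self._queue.popleft()
        self._pending.discard(key)
        return key

    def __len__(self):
        return len(self._queue)
```

With this shape, requesting the same revision hundreds of times while it is still pending leaves a single entry in the queue, and the revision can be queued again once it has been popped.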
Comment 4•10 years ago
(In reply to Ed Morley [:edmorley] from comment #3)
> 2) There were many many duplicate revisions being requested - it seems as
> though we queue up hundreds of the same revision in the
> fetch-missing-push-logs task queue.
Filed bug 1118068.
The noise in the logs will be much lower once these two additional deps are fixed, so let's hold off here until then.
Comment 5•10 years ago
Saw a few of these whilst debugging bug 1125410:
[2015-01-23 15:52:08,143: WARNING/Worker-4] Found builds4h jobs with missing resultsets. Scheduling re-fetch: defaultdict(<type 'set'>, {'mozilla-aurora': ['dca8fe9d9425']})
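The revision doesn't exist on mozilla-aurora.
The l10n jobs lie and put the wrong revision in builds-4hr: in the entry below, "revision" matches "l10n_revision" (dca8fe9d9425) rather than "fx_revision" (79fcabb42355):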
{
  "builder_id": 239152,
  "buildnumber": 555,
  "endtime": 1422045230,
  "id": 57291749,
  "master_id": 124,
  "properties": {
    "app": "browser",
    "appName": "Firefox",
    "appVersion": "37.0a2",
    "aws_ami_id": "ami-36502b5e",
    "aws_instance_id": "i-8d849d61",
    "aws_instance_type": "r3.xlarge",
    "basedir": "/builds/slave/m-aurora-l64-l10n-dep-00000000",
    "branch": "mozilla-aurora",
    "builddir": "m-aurora-l64-l10n-dep-00000000",
    "buildername": "Firefox mozilla-aurora linux64 l10n dep",
    "buildid": "20150123004028",
    "buildnumber": 555,
    "commit_titles": [
      "update Punjabi Translation: merge"
    ],
    "en_revision": "default",
    "forced_clobber": false,
    "fx_revision": "79fcabb42355",
    "hashType": "sha512",
    "inipath": "dist/l10n-stage/firefox/application.ini",
    "l10n_revision": "dca8fe9d9425",
    "locale": "pa-IN",
    "log_url": "http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-l10n/mozilla-aurora-linux64-l10n-dep-pa-IN-bm71-build1-build555.txt.gz",
    "master": "http://buildbot-master71.srv.releng.use1.mozilla.com:8001/",
    "packageUrl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-l10n/firefox-37.0a2.pa-IN.linux-x86_64.tar.bz2",
    "periodic_clobber": false,
    "placement/availability_zone": "us-east-1a",
    "platform": "linux64",
    "product": "firefox",
    "project": "",
    "purge_actual": "28.19GB",
    "purge_target": "3GB",
    "purged_clobber": true,
    "repository": "",
    "request_ids": [
      60056556
    ],
    "request_times": {
      "60056556": 1422044215
    },
    "revision": "dca8fe9d94258eb617529c22107e0e2c7c222025",
    "scheduler": "mozilla-aurora l10n",
    "slavebuilddir": "m-aurora-l64-l10n-dep-00000000",
    "slavename": "bld-linux64-spot-1019",
    "stage_platform": "linux64",
    "toolsdir": "/builds/slave/m-aurora-l64-l10n-dep-00000000/tools",
    "tree": "fxaurora"
  },
  "reason": "scheduler",
  "request_ids": [
    60056556
  ],
  "requesttime": 1422044215,
  "result": 0,
  "slave_id": 8387,
  "starttime": 1422044283
},
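A check along the following lines could flag entries like the one above before they trigger a pointless re-fetch. It is only a sketch: the function name is hypothetical, and it assumes that the `fx_revision` and `l10n_revision` properties are short prefixes of full changeset hashes, as they appear to be in this entry.

```python
def revision_source(build):
    """Return the name of the short-revision property that "revision" starts with.

    build: a dict shaped like the builds-4hr entry quoted in this comment.
    Returns "fx_revision", "l10n_revision", or None if neither matches.
    """
    props = build.get("properties", {})
    revision = props.get("revision") or ""
    for key in ("fx_revision", "l10n_revision"):
        short = props.get(key)
        if short and revision.startswith(short):
            return key
    return None
```

For the entry quoted above this returns "l10n_revision", meaning the job's revision refers to the l10n repository rather than to a push on mozilla-aurora.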
Updated•10 years ago
Assignee: nobody → emorley
Updated•10 years ago
Priority: P2 → P3
Updated•10 years ago
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED