Closed
Bug 1151629
Opened 9 years ago
Closed 9 years ago
The bugscache population task has a hardcoded limit of 15,000 bugs, which we've now reached
Categories
(Tree Management :: Treeherder: Data Ingestion, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Assigned: emorley)
References
Details
Attachments
(1 file)
I've stared at it until my eyes swim, but it sure looks to me like it has kw:intermittent-failure, and browser_compartments.js in the summary. Still, https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=8c716f35d9ec&filter-searchStr=Windows%207%2032-bit%20mozilla-inbound%20debug%20test%20mochitest-browser-chrome-3 only gets the closed bug 1150259 suggested, not the open one with a (truncated) exact match for the failure.
Comment 1•9 years ago
I seem to be having a similar issue with bug 1151711.
Flags: needinfo?(emorley)
Priority: -- → P1
Comment 2•9 years ago
I'm seeing this with other bugs I filed yesterday as well. Seems widespread.
Flags: needinfo?(mdoglio)
Flags: needinfo?(cdawson)
Updated•9 years ago
Flags: needinfo?(mdoglio)
Comment 3•9 years ago
Heads up sheriffs - until fixed, this basically means we need to search BMO for dupes before filing any new oranges.
Comment 4•9 years ago
Yeah, confirmed. I saw a lot of these issues today; we should get this fixed ASAP.
Comment 5•9 years ago
Affects stage as well.
Assignee | ||
Comment 6•9 years ago
For the link in comment 0, the artefact URL is:
https://treeherder.mozilla.org/api/project/mozilla-inbound/artifact/?job_id=8508222&name=Bug+suggestions&type=json

Excerpt from it:

  "blob": [{
    "search": "179 INFO TEST-UNEXPECTED-FAIL | toolkit/components/aboutperformance/tests/browser/browser_compartments.js | Sanity check (): totalUserTime is monotonic.: 15600 <= 0 - false == true - JS frame :: chrome://mochitests/content/browser/toolkit/components/aboutperformance/tests/browser/browser_compartments.js :: Assert_leq :: line 36",
    "search_terms": ["browser_compartments.js"],
    "bugs": {
      "open_recent": [],
      "all_others": [{
        "crash_signature": "",
        "resolution": "FIXED",
        "summary": "Intermittent browser_compartments.js | Test timed out | Found a tab after previous test timed out: browser/browser_compartments.html?test=0.9079043654642461 | A promise chain failed to handle a rejection: - at browser-test.js:743",
        "relevance": 1.0,
        "keywords": "intermittent-failure",
        "os": "Windows XP",
        "id": 1150259
      }]
    }
  },

-> The correct search term is being used, "browser_compartments.js".

Searching with that term:
https://treeherder.mozilla.org/api/bugscache/?search=browser_compartments.js

Gives:

  {
    "open_recent": [{
      "crash_signature": "",
      "resolution": "",
      "summary": "Intermittent browser_compartments.js | Sanity check (): totalUserTime is monotonic.: 15600 <= 0 - false == true - JS frame :: chrome://mochitests/content/browser/toolkit/components/aboutperformance/tests/browser/browser_compartments.js :: Assert_leq :: li",
      "relevance": 1.0,
      "keywords": "intermittent-failure",
      "os": "Windows 7",
      "id": 1151240
    }],
    "all_others": [{
      "crash_signature": "",
      "resolution": "FIXED",
      "summary": "Intermittent browser_compartments.js | Test timed out | Found a tab after previous test timed out: browser/browser_compartments.html?test=0.9079043654642461 | A promise chain failed to handle a rejection: - at browser-test.js:743",
      "relevance": 1.0,
      "keywords": "intermittent-failure",
      "os": "Windows XP",
      "id": 1150259
    }]
  }

So the bug is there now.
Flags: needinfo?(emorley)
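For anyone repeating the comparison in comment 6 against other failures, it can be sketched roughly as below. This is only an illustration: `bug_ids` and `missing_from_suggestions` are hypothetical helper names (not Treeherder functions), and the dictionaries are trimmed copies of the two responses quoted above rather than live API calls.

```python
def bug_ids(bugs):
    """Return the set of bug ids in a {"open_recent": [...], "all_others": [...]} dict."""
    return {
        bug["id"]
        for group in ("open_recent", "all_others")
        for bug in bugs.get(group, [])
    }


def missing_from_suggestions(artifact_bugs, bugscache_bugs):
    """Ids that the bugscache search returns but the stored artifact lacks."""
    return bug_ids(bugscache_bugs) - bug_ids(artifact_bugs)


# Trimmed copies of the responses quoted in comment 6 (ids only).
artifact_bugs = {"open_recent": [], "all_others": [{"id": 1150259}]}
bugscache_bugs = {"open_recent": [{"id": 1151240}], "all_others": [{"id": 1150259}]}

print(sorted(missing_from_suggestions(artifact_bugs, bugscache_bugs)))
```

A non-empty result means the stored suggestions and the current bugscache contents disagree, which is the symptom reported in comment 0.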
Assignee | ||
Comment 7•9 years ago
If I could have some more examples, it would really help...
Comment 8•9 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #1)
> I seem to be having a similar issue with bug 1151711.
Comment 9•9 years ago
Which was still affected as of ~4h ago on the last push to b2g37
Assignee | ||
Comment 10•9 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #9)
> Which was still affected as of ~4h ago on the last push to b2g37

https://treeherder.mozilla.org/#/jobs?repo=mozilla-b2g37_v2_2&revision=9ab8a3ae0fc3&filter-searchStr=b2g_emulator_vm mozilla-b2g37_v2_2 debug test mochitest-debug-5

https://treeherder.mozilla.org/api/project/mozilla-b2g37_v2_2/artifact/?job_id=97349&name=Bug+suggestions&type=json

  {
    "search": "PROCESS-CRASH | dom/canvas/test/test_2d.composite.canvas.color-burn.html | application crashed [None]",
    "search_terms": ["test_2d.composite.canvas.color-burn.html"],
    "bugs": {
      "open_recent": [],
      "all_others": []
    }
  },

https://treeherder.mozilla.org/api/bugscache/?search=test_2d.composite.canvas.color-burn.html
-> {"open_recent": [], "all_others": []}

Execute:
> SELECT * FROM treeherder.bugscache WHERE id = 1151711

+ ------- + ----------- + --------------- + ------------ + -------------------- + ------------- + ------- + ------------- +
| id | status | resolution | summary | crash_signature | keywords | os | modified |
+ ------- + ----------- + --------------- + ------------ + -------------------- + ------------- + ------- + ------------- +
| NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+ ------- + ----------- + --------------- + ------------ + -------------------- + ------------- + ------- + ------------- +
1 rows

(On both master and slave)
Assignee | ||
Comment 11•9 years ago
The fetch_bugs task runs from the celery_worker queue on rabbitmq1. The queue is not backlogged, now are others:

[emorley@treeherder-rabbitmq1.private.scl3 ~]$ sudo rabbitmqctl list_queues -p treeherder
Listing queues ...
buildapi 0
calculate_eta 0
celery@buildapi.treeherder-etl1.private.scl3.mozilla.com.celery.pidbox 0
celery@buildapi.treeherder-etl2.private.scl3.mozilla.com.celery.pidbox 0
celery@default.treeherder-rabbitmq1.private.scl3.mozilla.com.celery.pidbox 0
celery@hp.treeherder-rabbitmq1.private.scl3.mozilla.com.celery.pidbox 0
celery@log_parser.treeherder-processor1.private.scl3.mozilla.com.celery.pidbox 0
celery@log_parser.treeherder-processor2.private.scl3.mozilla.com.celery.pidbox 0
celery@log_parser.treeherder-processor3.private.scl3.mozilla.com.celery.pidbox 0
celery@pushlog.treeherder-etl1.private.scl3.mozilla.com.celery.pidbox 0
celery@pushlog.treeherder-etl2.private.scl3.mozilla.com.celery.pidbox 0
celeryev.30087534-00c5-4c58-9821-5c9496bf2858 0
celeryev.3244c743-300e-4735-afce-9fc81e418171 0
celeryev.35020252-8bb5-435a-88a5-7bedadc5d3b9 0
celeryev.7e754135-9ba9-4fd8-977e-0a5f590790d0 0
celeryev.89d31514-286d-481a-a823-95c216367f5e 0
celeryev.9c6176ea-fc6c-4af5-b52f-3763b3e1a59a 0
celeryev.a9a7e73f-412d-450f-936a-496feff741b3 0
celeryev.b6d92059-4d6c-410a-95f9-d875d7b63cf0 0
celeryev.cdcb6436-3c46-4d15-b0b7-ba696aeec73b 0
cycle_data 0
default 0
fetch_bugs 0
fetch_missing_push_logs 0
high_priority 0
log_parser 0
log_parser_fail 0
log_parser_hp 0
log_parser_json 0
populate_performance_series 0
process_objects 0
pushlog 0
...done.

Rabbitmq1 looks ok on New Relic:
https://rpm.newrelic.com/accounts/677903/servers/5575925

There are no fetch-bugs exceptions on:
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors

The 7 day transaction view for fetch-bugs doesn't show anything obvious:
https://rpm.newrelic.com/accounts/677903/applications/4180461/transactions?show_browser=false&tw[end]=1428508479&tw[start]=1427903679&type=all#id=5b224f746865725472616e73616374696f6e2f43656c6572792f66657463682d62756773222c22225d
Assignee | ||
Comment 12•9 years ago
s/now/nor/
Comment 13•9 years ago
Bug 1152289
Assignee | ||
Comment 14•9 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #13)
> Bug 1152289

Thanks :-) Should be good with those now.

There are no errors being shown in /var/log/celery/celery_worker.log on rabbitmq1:

[2015-04-08 08:00:00,064: INFO/MainProcess] Received task: fetch-bugs[77925354-132e-4c67-8003-625c2a937f0c]
...
[2015-04-08 08:05:10,681: INFO/MainProcess] Task fetch-bugs[77925354-132e-4c67-8003-625c2a937f0c] succeeded in 310.615266436s: None

The runtime is sometimes above, sometimes below 300s; however, this is pre-existing from March (and not that exceeding 300s should make any difference, as far as I'm aware).
Assignee | ||
Comment 15•9 years ago
So the tl;dr of what I have so far is:
* The bugs aren't even in the bugscache table (this is an issue with populating the bugscache, not generating the summaries etc)
* There are zero exceptions/errors/... being reported
* The job is definitely still running and takes the same amount of time to run (so presumably any issue is say on the insert into the DB, rather than the task bailing early or just being skipped)

Will continue looking..
Assignee | ||
Comment 16•9 years ago
Sigh:
https://github.com/mozilla/treeherder-service/blob/f5c0b53e0ce6b527c5eb2d861adeb72e1e5859ea/treeherder/etl/bugzilla.py#L39

    offset = 0
    limit = 500

    # fetch new pages no more than 30 times
    # this is a safe guard to not generate an infinite loop
    # in case something went wrong
    for i in range(1, 30 + 1):
        # fetch the bugzilla service until we have an empty result
        paginated_url = "{0}&offset={1}&limit={2}".format(
            get_bz_source_url(),
            offset,
            limit
        )

30 * 500 = 15,000 results max

[~/src]$ curl 'https://bugzilla.mozilla.org/bzapi/count?keywords=intermittent-failure'
{"data":15048}

I don't know whether to laugh or cry.
Assignee: nobody → emorley
Status: NEW → ASSIGNED
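To make the arithmetic in comment 16 concrete, here is a small simulation of a capped pagination loop. It is a sketch under assumptions, not the Treeherder code: `fake_bugzilla_page` and `fetch_with_page_cap` are hypothetical names, the "bugs" are just integers, and no network request is made. Only the page size (500), the page cap (30) and the count (15048) come from the comment above.

```python
def fake_bugzilla_page(total, offset, limit):
    """Stand-in for one paginated Bugzilla query; returns a list of fake bug ids."""
    return list(range(offset, min(offset + limit, total)))


def fetch_with_page_cap(total, limit=500, max_pages=30):
    """Mimic a capped loop: stop after max_pages pages without raising."""
    fetched = []
    offset = 0
    for _ in range(max_pages):
        page = fake_bugzilla_page(total, offset, limit)
        if not page:
            break  # an empty page means everything was fetched
        fetched.extend(page)
        offset += limit
    return fetched


total_bugs = 15048  # count returned by the bzapi query in comment 16
cached = fetch_with_page_cap(total_bugs)
print(len(cached), total_bugs - len(cached))
```

With these numbers the loop ends because the page cap is reached, not because an empty page came back, and nothing signals the difference to the caller.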
Assignee | ||
Comment 17•9 years ago
The existing code should have raised an exception rather than carrying on silently. But we can just remove the limit now IMO, since we know the search terms are correct (i.e. we're not fetching all of Bugzilla by accident, just intermittent-failure bugs).
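For illustration, the "raise instead of carrying on silently" behaviour could look something like the sketch below. This is not what landed (the limit was removed instead); `PageLimitReached`, `fetch_all_or_raise` and `make_source` are hypothetical names, and the data source is faked so the example runs without Bugzilla.

```python
class PageLimitReached(Exception):
    """Raised when the page cap is hit while the source still has results."""


def fetch_all_or_raise(fetch_page, limit=500, max_pages=30):
    """Fetch pages until one is empty; raise if the cap is hit first."""
    bugs = []
    for page_number in range(max_pages):
        page = fetch_page(page_number * limit, limit)
        if not page:
            return bugs
        bugs.extend(page)
    # The cap was reached. Probe one more page to see if results were cut off.
    if fetch_page(max_pages * limit, limit):
        raise PageLimitReached(
            "more than %d results; refusing to truncate" % (max_pages * limit)
        )
    return bugs


def make_source(total):
    """Build a fake paginated source holding `total` integer 'bugs'."""
    def fetch_page(offset, limit):
        return list(range(offset, min(offset + limit, total)))
    return fetch_page


try:
    fetch_all_or_raise(make_source(15048))
except PageLimitReached as exc:
    print("error:", exc)
```

The extra probe request is a design choice of this sketch: it lets a source with exactly 15,000 results succeed, while anything larger fails loudly.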
Assignee | ||
Updated•9 years ago
Flags: needinfo?(cdawson)
Summary: Bug 1151240 isn't suggested for browser_compartments.js failures → The bugscache population task has a hardocded limit of 15,000 bugs, which we've now reached
Assignee | ||
Updated•9 years ago
Summary: The bugscache population task has a hardocded limit of 15,000 bugs, which we've now reached → The bugscache population task has a hardcoded limit of 15,000 bugs, which we've now reached
Assignee | ||
Comment 18•9 years ago
Attachment #8589726 -
Flags: review?(mdoglio)
Assignee | ||
Updated•9 years ago
Component: Treeherder → Treeherder: Data Ingestion
Updated•9 years ago
Attachment #8589726 -
Flags: review?(mdoglio) → review+
Comment 19•9 years ago
Commit pushed to master at https://github.com/mozilla/treeherder-service

https://github.com/mozilla/treeherder-service/commit/7be69d14922c10c0a30fcd8b4c842b80e4d89f36
Bug 1151629 - Don't limit the number of bugs retrieved by fetch_bugs

Previously we limited the number of pages to 30, which with a page size of 500, meant a max of 15,000 intermittent-failure bugs retrieved from Bugzilla, with no exceptions or log output to indicate this had occurred.

We now have more than 15,000 intermittent failure bugs, so the limit is being removed, since we're both confident that the search terms are correct, and any other infinite loop would be caught by the existing 600s timeout. The task currently takes ~300s to run, so there is still plenty of headroom. Plus a timeout exception would be immediately visible in New Relic and so much less of a pain to debug.
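The shape of the change described in the commit message (no page cap, with a timeout as the backstop) might be sketched as follows. This is an assumption-laden illustration rather than the committed code: `fetch_until_empty` and `make_source` are hypothetical names, the source is faked, and checking a deadline inside the loop is this sketch's stand-in for the existing 600s task timeout, which is enforced outside the function in the real deployment.

```python
import time


def fetch_until_empty(fetch_page, limit=500, deadline_seconds=600):
    """Page through results until an empty page, with no cap on page count."""
    started = time.monotonic()
    bugs = []
    offset = 0
    while True:
        # Stand-in for the task-level timeout mentioned in the commit message.
        if time.monotonic() - started >= deadline_seconds:
            raise TimeoutError(
                "fetch exceeded %ss at offset %d" % (deadline_seconds, offset)
            )
        page = fetch_page(offset, limit)
        if not page:
            return bugs
        bugs.extend(page)
        offset += limit


def make_source(total):
    """Build a fake paginated source holding `total` integer 'bugs'."""
    def fetch_page(offset, limit):
        return list(range(offset, min(offset + limit, total)))
    return fetch_page


print(len(fetch_until_empty(make_source(15043))))
```

A source that never returns an empty page ends in a visible `TimeoutError` instead of a silently truncated result, which is the trade-off the commit message argues for.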
Assignee | ||
Comment 20•9 years ago
Before...

Execute:
> SELECT COUNT(*) FROM treeherder.bugscache

+ ------------- +
| COUNT(*) |
+ ------------- +
| 15000 |
+ ------------- +
1 rows

And now another hourly fetch-bugs task has completed, after this was deployed:

Execute:
> SELECT COUNT(*) FROM treeherder.bugscache

+ ------------- +
| COUNT(*) |
+ ------------- +
| 15043 |
+ ------------- +
1 rows
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED