Closed
Bug 1294025
Opened 9 years ago
Closed 8 years ago
Intermittent browser/base/content/test/plugins/browser_CTP_crashreporting.js | Uncaught exception - Timed out waiting for plugin binding to be in success state - timed out after 50 tries.
Categories
(Core Graveyard :: Plug-ins, defect, P3)
Core Graveyard
Plug-ins
Tracking
(firefox-esr52 unaffected, firefox54 wontfix, firefox55 fixed, firefox56 fixed)
RESOLVED
FIXED
mozilla56
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox54 | --- | wontfix |
firefox55 | --- | fixed |
firefox56 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: gsvelto)
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:timing])
Attachments
(2 files)
![]() |
||
Updated•9 years ago
|
Priority: -- → P4
Updated•9 years ago
|
Priority: P4 → P3
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 5•8 years ago
|
||
this bug has a few spikes and a many days with no failures:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1294025&startday=2017-05-01&endday=2017-05-30&tree=all
recently we have had many failures since May 23rd and it jumped up high May 25th mostly on linux32 debug and also on linux64 debug browser-chrome e10s!
here is a log for linux32-debug:
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=102914117
and the related screenshot:
https://public-artifacts.taskcluster.net/WB2b4NwwTnO5SyKS78HqDQ/0/public/test_info//mozilla-test-fail-screenshot_UCEGME.png
and what I see in the above log is:
[task 2017-05-29T23:14:10.019309Z] 23:14:10 INFO - Entering test bound
[task 2017-05-29T23:14:10.021042Z] 23:14:10 INFO - Buffered messages logged at 23:13:55
[task 2017-05-29T23:14:10.025967Z] 23:14:10 INFO - TEST-PASS | browser/base/content/test/plugins/browser_CTP_crashreporting.js | Plugin should not be activated -
[task 2017-05-29T23:14:10.028058Z] 23:14:10 INFO - Buffered messages finished
[task 2017-05-29T23:14:10.031400Z] 23:14:10 INFO - TEST-UNEXPECTED-FAIL | browser/base/content/test/plugins/browser_CTP_crashreporting.js | Uncaught exception - Timed out waiting for plugin binding to be in success state - timed out after 50 tries.
[task 2017-05-29T23:14:10.033290Z] 23:14:10 INFO - Leaving test bound
[task 2017-05-29T23:14:10.036124Z] 23:14:10 INFO - Entering test bound
the failure seems to be on this line:
https://dxr.mozilla.org/mozilla-central/source/browser/base/content/test/plugins/browser_CTP_crashreporting.js?q=path%3Abrowser_CTP_crashreporting.js&redirect_type=single#139
I am glad we timeout and handle this faster than the harness timing out
:bsmedberg, I see you as the triage owner, can you get someone from the plug-ins team (or who is knowledgeable about this specific test case) to look at this failure and make it more reliable in the next 2 weeks?
Flags: needinfo?(benjamin)
Whiteboard: [stockwell needswork]
Comment hidden (Intermittent Failures Robot) |
Comment 7•8 years ago
|
||
Joel, do you have a regression range from the 22-25 May that I could peruse? My old tool to generate regression ranges for nightlies is broken. I suspect recent crash reporting stuff gsvelto was working on but I'd be more comfortable to see that before reassigning.
Flags: needinfo?(benjamin) → needinfo?(jmaher)
Comment 8•8 years ago
|
||
I don't have a tighter range, I was just trying to help shed light on what was going on- my primary goal is to triage and get this to the test owners to fix the issues. Right now I don't have bandwidth to do a lot of extra things
Flags: needinfo?(jmaher)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 11•8 years ago
|
||
I suspect bug 1359326 could have made this worse; there might be a race between the crash processing and the crash submission display and it might have been widened by that bug. I'll have a look today.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Comment 15•8 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #11)
> I suspect bug 1359326 could have made this worse; there might be a race
> between the crash processing and the crash submission display and it might
> have been widened by that bug. I'll have a look today.
Did you get a chance to look at this? There are lots of failures now...would like to see this test fixed or disabled soon.
Flags: needinfo?(gsvelto)
Assignee | ||
Comment 16•8 years ago
|
||
(In reply to Geoff Brown [:gbrown] (pto June 12 - 16) from comment #15)
> Did you get a chance to look at this? There are lots of failures now...would
> like to see this test fixed or disabled soon.
Yes, but I couldn't find the root cause and I don't seem to be able to reproduce locally. I think I know a why to mitigate it though. I'll send a try run with the necessary change and I'll re-trigger this test a bunch of times to see if it gets better. If it doesn't we can disable it and then I'll re-enable it in bug 1323979 where I'm changing the way events about crashes are delivered throughout the system. Not clearing the NI for now.
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 18•8 years ago
|
||
I've managed to reproduce this locally using rr's chaos mode. I should be able to fix it on Monday, if not I'll disable it in the meantime. Taking the bug.
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Flags: needinfo?(gsvelto)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 21•8 years ago
|
||
Sorry if it took me a while to figure this one out but it was tricky. Originally I thought about bug 1359326 because it was a rewrite of the crash reporting flow so I thought that might have broken something. As it turns out, it didn't. Bug 1335536 on the other hand broke a promise chain that made crash reporting reliable. The reason why this has only shown up after bug 1359326 landed is that previously that part of the crash reporting chain was mostly made of C++ code that run on a background thread thus not affecting the event loop. The new code is JS, and while it's asynchronous and runs stuff in the background it's using promises to sync everything up, so it must have upset the broken promise chain by "injecting" stuff in the middle of it and revealing the race.
I've got a patch that restores the promise chain and it's currently running on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ea482c07aa3416e68b1bd3b1d148145013b3875a&group_state=expanded
I've re-triggered the affected ~10 times and they don't seem to fail anymore. Before they failed very often, especially on debug builds.
Assignee | ||
Comment 22•8 years ago
|
||
Sorry for the typo, I meant I've re-triggered the affected *tests* ~10 times.
Comment hidden (mozreview-request) |
Comment 24•8 years ago
|
||
mozreview-review |
Comment on attachment 8876677 [details]
Bug 1294025 - Fix the broken promise chain when recording a crash submission attempt;
https://reviewboard.mozilla.org/r/148026/#review152494
sneaky!
I bet making some of this async/await would make it more readable.
Attachment #8876677 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 25•8 years ago
|
||
Thanks for the review! I'm about to land but I'll leave the bug open for now, let's close it when we're sure that the test is not failing anymore.
Keywords: leave-open
Comment 26•8 years ago
|
||
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/aff2e55a7974
Fix the broken promise chain when recording a crash submission attempt; r=bsmedberg
Comment hidden (Intermittent Failures Robot) |
Comment 28•8 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Comment 30•8 years ago
|
||
The change here doesn't seem to have worked - this test is currently perma-failing.
Comment 31•8 years ago
|
||
it sounds like the changes made this worse- should we:
1) back out the changes
2) disable the test for linux debug?
Flags: needinfo?(gsvelto)
Assignee | ||
Comment 32•8 years ago
|
||
I'm fairly stumped but I'm running more tests right now. If they're inconclusive I'll disable the test today and re-enable it once I've refactored all the associated code in bug 1323979.
Flags: needinfo?(gsvelto)
Assignee | ||
Comment 33•8 years ago
|
||
The issue that's causing the failures on Linux debug builds is different than what I was seeing before and it looks like it's a race in the test code itself. Unfortunately verifying it takes hours because I really can't reproduce it easily on my machine so I have to wait the usual try-run turn around time. I'll disable the test in the meantime and I will file a bug to re-enable it once I've found a solution.
Comment 34•8 years ago
|
||
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bacba9805202
Disable browser_CTP_crashreporting.js on Linux debug builds; r=me
Assignee | ||
Updated•8 years ago
|
Whiteboard: [stockwell needswork] → [stockwell needswork][test disabled]
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Comment 36•8 years ago
|
||
bugherder |
Assignee | ||
Comment 37•8 years ago
|
||
OK, I've figured out why the test was perma-failing on Linux debug builds: it was timing out. Now that the promise chain is restored it often took a while longer for the fake crash submission server to respond thus triggering the rest of the test-chain. With the promise chain broken it was working some of the time because the code was just racing ahead to the desired condition without waiting for the actual events to happen. In a sense the test wasn't working, but it didn't explicitly fail either.
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 39•8 years ago
|
||
Try run with test re-enabled and the timeout of a check extended to prevent it from failing. I've retriggered the affected chunks ~15 times to ensure they're stable:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=294adafebc09b08ccd908f00a4db4f35653c43c7&group_state=expanded
Assignee | ||
Comment 40•8 years ago
|
||
This re-enables the test and lengthens the time we wait for the crash submitted status to be reported. I couldn't trigger the issue anymore over ~20 runs so I'm pretty sure it's settled for good.
Attachment #8878469 -
Flags: review?(benjamin)
Updated•8 years ago
|
Attachment #8878469 -
Flags: review?(benjamin) → review+
Comment 41•8 years ago
|
||
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d69a2c18f841
Ensure the test waits long enough for the success condition to be fullfilled; r=bsmedberg
![]() |
||
Comment 42•8 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Updated•8 years ago
|
Whiteboard: [stockwell needswork][test disabled] → [stockwell fixed:timing]
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 46•8 years ago
|
||
These failures seem to have gone away now on central? Time to call this fixed?
Flags: needinfo?(gsvelto)
Flags: needinfo?(gbrown)
![]() |
||
Comment 47•8 years ago
|
||
I think so! Thanks much :gsvelto.
(Failures continue on beta...could the patches be uplifted?)
Flags: needinfo?(gbrown)
Comment hidden (Intermittent Failures Robot) |
Updated•8 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox56:
--- → fixed
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Updated•8 years ago
|
status-firefox55:
--- → affected
Updated•8 years ago
|
status-firefox54:
--- → wontfix
status-firefox-esr52:
--- → unaffected
Comment 50•8 years ago
|
||
bugherder uplift |
Flags: in-testsuite+
Comment hidden (Intermittent Failures Robot) |
Updated•3 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•