Closed Bug 1294025 Opened 8 years ago Closed 7 years ago

Intermittent browser/base/content/test/plugins/browser_CTP_crashreporting.js | Uncaught exception - Timed out waiting for plugin binding to be in success state - timed out after 50 tries.

Categories

(Core Graveyard :: Plug-ins, defect, P3)


Tracking

(firefox-esr52 unaffected, firefox54 wontfix, firefox55 fixed, firefox56 fixed)

RESOLVED FIXED
mozilla56

People

(Reporter: intermittent-bug-filer, Assigned: gsvelto)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:timing])

Attachments

(2 files)

Priority: -- → P4
Priority: P4 → P3
This bug has a few spikes and many days with no failures:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1294025&startday=2017-05-01&endday=2017-05-30&tree=all

Recently we have had many failures: they started on May 23rd and jumped up sharply on May 25th, mostly on linux32 debug and also on linux64 debug browser-chrome e10s.

here is a log for linux32-debug:
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=102914117

and the related screenshot:
https://public-artifacts.taskcluster.net/WB2b4NwwTnO5SyKS78HqDQ/0/public/test_info//mozilla-test-fail-screenshot_UCEGME.png

and what I see in the above log is:
[task 2017-05-29T23:14:10.019309Z] 23:14:10     INFO - Entering test bound 
[task 2017-05-29T23:14:10.021042Z] 23:14:10     INFO - Buffered messages logged at 23:13:55
[task 2017-05-29T23:14:10.025967Z] 23:14:10     INFO - TEST-PASS | browser/base/content/test/plugins/browser_CTP_crashreporting.js | Plugin should not be activated - 
[task 2017-05-29T23:14:10.028058Z] 23:14:10     INFO - Buffered messages finished
[task 2017-05-29T23:14:10.031400Z] 23:14:10     INFO - TEST-UNEXPECTED-FAIL | browser/base/content/test/plugins/browser_CTP_crashreporting.js | Uncaught exception - Timed out waiting for plugin binding to be in success state - timed out after 50 tries.
[task 2017-05-29T23:14:10.033290Z] 23:14:10     INFO - Leaving test bound 
[task 2017-05-29T23:14:10.036124Z] 23:14:10     INFO - Entering test bound 


the failure seems to be on this line:
https://dxr.mozilla.org/mozilla-central/source/browser/base/content/test/plugins/browser_CTP_crashreporting.js?q=path%3Abrowser_CTP_crashreporting.js&redirect_type=single#139

I am glad we time out and handle this ourselves faster than waiting for the harness to time out.
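
The "timed out after 50 tries" wording matches a waitForCondition-style polling helper. Here is a rough sketch of that shape (a reconstruction, not the exact harness code; the default interval and try count are assumed):

// Poll `condition` every `interval` ms, giving up after `maxTries` attempts.
function waitForCondition(condition, msg, interval = 100, maxTries = 50) {
  return new Promise((resolve, reject) => {
    let tries = 0;
    const id = setInterval(async () => {
      if (++tries > maxTries) {
        clearInterval(id);
        reject(new Error(`${msg} - timed out after ${maxTries} tries.`));
        return;
      }
      try {
        if (await condition()) {
          clearInterval(id);
          resolve();
        }
      } catch (e) {
        clearInterval(id);
        reject(e);
      }
    }, interval);
  });
}

// Usage, as in the failing check (the condition name is invented):
// await waitForCondition(() => pluginBindingIsInSuccessState(),
//   "Timed out waiting for plugin binding to be in success state");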

:bsmedberg, I see you as the triage owner; can you get someone from the plug-ins team (or someone who is knowledgeable about this specific test case) to look at this failure and make it more reliable in the next 2 weeks?
Flags: needinfo?(benjamin)
Whiteboard: [stockwell needswork]
Joel, do you have a regression range from 22-25 May that I could peruse? My old tool for generating regression ranges for nightlies is broken. I suspect the recent crash reporting work gsvelto was doing, but I'd be more comfortable seeing that confirmed before reassigning.
Flags: needinfo?(benjamin) → needinfo?(jmaher)
I don't have a tighter range; I was just trying to help shed light on what was going on. My primary goal is to triage and get this to the test owners to fix the issues. Right now I don't have the bandwidth to do a lot of extra things.
Flags: needinfo?(jmaher)
I suspect bug 1359326 could have made this worse; there might be a race between the crash processing and the crash submission display and it might have been widened by that bug. I'll have a look today.
(In reply to Gabriele Svelto [:gsvelto] from comment #11)
> I suspect bug 1359326 could have made this worse; there might be a race
> between the crash processing and the crash submission display and it might
> have been widened by that bug. I'll have a look today.

Did you get a chance to look at this? There are lots of failures now...would like to see this test fixed or disabled soon.
Flags: needinfo?(gsvelto)
(In reply to Geoff Brown [:gbrown] (pto June 12 - 16) from comment #15)
> Did you get a chance to look at this? There are lots of failures now...would
> like to see this test fixed or disabled soon.

Yes, but I couldn't find the root cause and I don't seem to be able to reproduce it locally. I think I know a way to mitigate it, though. I'll send a try run with the necessary change and re-trigger this test a bunch of times to see if it gets better. If it doesn't, we can disable it and then I'll re-enable it in bug 1323979, where I'm changing the way events about crashes are delivered throughout the system. Not clearing the NI for now.
I've managed to reproduce this locally using rr's chaos mode. I should be able to fix it on Monday, if not I'll disable it in the meantime. Taking the bug.
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Flags: needinfo?(gsvelto)
Sorry it took me a while to figure this one out, but it was tricky. Originally I suspected bug 1359326 because it rewrote the crash reporting flow, so I thought it might have broken something. As it turns out, it didn't. Bug 1335536, on the other hand, broke a promise chain that kept crash reporting reliable. The reason this only showed up after bug 1359326 landed is that previously that part of the crash reporting chain was mostly C++ code running on a background thread, so it didn't touch the event loop. The new code is JS, and while it's asynchronous and runs work in the background, it uses promises to synchronize everything, so it must have upset the already-broken promise chain by "injecting" steps into the middle of it, revealing the race.
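
To illustrate the kind of breakage, here is a minimal sketch with invented names (not the actual mozilla-central code): a .then() handler that starts async work but drops the resulting promise lets the outer chain resolve before the work is done.

const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// Stand-in for the async "record a crash submission" step.
let recorded = false;
const submitToServer = () => delay(50).then(() => { recorded = true; });

// Broken chain: the inner promise is dropped, so the returned promise
// resolves before the submission is actually recorded.
const broken = () => Promise.resolve().then(() => { submitToServer(); });

// Restored chain: returning the inner promise makes callers wait for it.
const fixed = () => Promise.resolve().then(() => submitToServer());

broken().then(() => console.log("broken:", recorded)); // "broken: false"
setTimeout(() => {
  recorded = false;
  fixed().then(() => console.log("fixed:", recorded)); // "fixed: true"
}, 100);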

I've got a patch that restores the promise chain and it's currently running on try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ea482c07aa3416e68b1bd3b1d148145013b3875a&group_state=expanded

I've re-triggered the affected tests ~10 times and they don't seem to fail anymore. Before, they failed very often, especially on debug builds.
Comment on attachment 8876677 [details]
Bug 1294025 - Fix the broken promise chain when recording a crash submission attempt;

https://reviewboard.mozilla.org/r/148026/#review152494

sneaky!

I bet making some of this async/await would make it more readable.
Attachment #8876677 - Flags: review?(benjamin) → review+
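
As an aside, the restored chain from the sketch above could be written with async/await per the review suggestion (names are still hypothetical, reusing the stub from the earlier sketch); awaiting makes the dependency explicit and hard to drop by accident:

const fixed = () => Promise.resolve().then(() => submitToServer());
// is equivalent to:
const fixedAwait = async () => {
  await submitToServer();
};
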
Thanks for the review! I'm about to land but I'll leave the bug open for now, let's close it when we're sure that the test is not failing anymore.
Keywords: leave-open
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/aff2e55a7974
Fix the broken promise chain when recording a crash submission attempt; r=bsmedberg
The change here doesn't seem to have worked - this test is currently perma-failing.
It sounds like the changes made this worse. Should we:
1) back out the changes, or
2) disable the test for Linux debug?
Flags: needinfo?(gsvelto)
I'm fairly stumped but I'm running more tests right now. If they're inconclusive I'll disable the test today and re-enable it once I've refactored all the associated code in bug 1323979.
Flags: needinfo?(gsvelto)
The issue that's causing the failures on Linux debug builds is different from what I was seeing before, and it looks like it's a race in the test code itself. Unfortunately, verifying it takes hours because I can't reproduce it easily on my machine, so I have to wait for the usual try-run turnaround time. I'll disable the test in the meantime and file a bug to re-enable it once I've found a solution.
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bacba9805202
Disable browser_CTP_crashreporting.js on Linux debug builds; r=me
Whiteboard: [stockwell needswork] → [stockwell needswork][test disabled]
OK, I've figured out why the test was perma-failing on Linux debug builds: it was timing out. With the promise chain restored, it often takes a while longer for the fake crash submission server to respond and trigger the rest of the test chain. With the promise chain broken, the test passed some of the time because the code just raced ahead to the desired condition without waiting for the actual events to happen. In a sense the test wasn't working, but it didn't explicitly fail either.
Here is a try run with the test re-enabled and the timeout of one check extended to prevent it from failing. I've re-triggered the affected chunks ~15 times to ensure they're stable:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=294adafebc09b08ccd908f00a4db4f35653c43c7&group_state=expanded
This re-enables the test and lengthens the time we wait for the crash-submitted status to be reported. I couldn't trigger the issue anymore over ~20 runs, so I'm pretty sure it's settled for good.
Attachment #8878469 - Flags: review?(benjamin)
Attachment #8878469 - Flags: review?(benjamin) → review+
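
The shape of that change, in terms of the waitForCondition sketch from earlier (hypothetical values and an invented condition name, not the actual patch):

async function waitForSuccessState() {
  // Give the success-status check a larger polling budget than the
  // default 50 tries x 100 ms assumed in the sketch above.
  await waitForCondition(
    () => getCrashSubmitStatus() == "success", // invented condition
    "Timed out waiting for plugin binding to be in success state",
    100,  // interval (ms)
    200,  // maxTries: 4x the default budget
  );
}
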
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d69a2c18f841
Ensure the test waits long enough for the success condition to be fulfilled; r=bsmedberg
Whiteboard: [stockwell needswork][test disabled] → [stockwell fixed:timing]
These failures seem to have gone away on central now. Time to call this fixed?
Flags: needinfo?(gsvelto)
Flags: needinfo?(gbrown)
I think so! Thanks much :gsvelto.

(Failures continue on beta...could the patches be uplifted?)
Flags: needinfo?(gbrown)
Agreed, feel free to close.
Flags: needinfo?(gsvelto)
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Product: Core → Core Graveyard