Closed Bug 1353556 Opened 7 years ago Closed 6 years ago

Intermittent dom/workers/test/serviceworkers/browser_multie10s_update.js | both succeeded. This is wrong. -

Categories

(Core :: DOM: Workers, defect, P3)

defect

Tracking

()

RESOLVED DUPLICATE of bug 1429794

People

(Reporter: intermittent-bug-filer, Assigned: asuth)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:backout])

Andrea, can you look at this?  Its a bit concerning if our new bandaid for this is failing the test.
Flags: needinfo?(amarchesini)
(In reply to Ben Kelly [not reviewing due to deadline][:bkelly] from comment #1)
> Andrea, can you look at this?  Its a bit concerning if our new bandaid for
> this is failing the test.

I think the problem is a timing issue. Here we have 2 processes doing both an update(). This test uses a timeout. Probably we should just extend that timeout value.
Priority: -- → P3
There are 97 failures in the past 7 days.
Platforms: most of them on Windows 7 opt/debug and pgo, windows10-64 debug/opt and pgo,and a few on  windows10-64-ccov debug, windows10-64-devedition opt, windows10-64-nightly opt.
Recent log example: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=154585422&lineNumber=5029
Relevant part of the log:
INFO -  963 INFO TEST-START | dom/workers/test/serviceworkers/browser_multie10s_update.js
5018
19:29:09     INFO -  TEST-INFO | started process screenshot
5019
19:29:09     INFO -  TEST-INFO | screenshot: exit 0
5020
19:29:09     INFO -  Buffered messages logged at 19:29:06
5021
19:29:09     INFO -  964 INFO Entering test bound test_update
5022
19:29:09     INFO -  965 INFO Setting the prefs to having multi-e10s enabled
5023
19:29:09     INFO -  966 INFO Creating the first tab...
5024
19:29:09     INFO -  967 INFO Console message: [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "http://mochi.test:8888/browser/dom/workers/test/serviceworkers/file_multie10s_update.html" line: 0}]
5025
19:29:09     INFO -  968 INFO Creating the second tab...
5026
19:29:09     INFO -  969 INFO Let's start the test...
5027
19:29:09     INFO -  970 INFO Console message: [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "http://mochi.test:8888/browser/dom/workers/test/serviceworkers/file_multie10s_update.html" line: 0}]
5028
19:29:09     INFO -  Buffered messages finished
5029
19:29:09    ERROR -  971 INFO TEST-UNEXPECTED-FAIL | dom/workers/test/serviceworkers/browser_multie10s_update.js | both succeeded. This is wrong. -
5030
19:29:09     INFO -  Stack trace:
5031
19:29:09     INFO -  chrome://mochitests/content/browser/dom/workers/test/serviceworkers/browser_multie10s_update.js:test_update:71
5032
19:29:09     INFO -  GECKO(3236) | [Parent 3236, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
5033
19:29:09     INFO -  972 INFO Leaving test bound test_update
5034
19:29:09     INFO -  GECKO(3236) | [Parent 3236, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
5035
19:29:09     INFO -  GECKO(3236) | MEMORY STAT | vsize 703MB | vsizeMaxContiguous 678MB | residentFast 226MB | heapAllocated 114MB
5036
19:29:09     INFO -  973 INFO TEST-OK | dom/workers/test/serviceworkers/browser_multie10s_update.js | took 3261ms
5037
19:29:09     INFO -  974 INFO checking window state
5038
19:29:09     INFO -  975 INFO TEST-START | dom/workers/test/serviceworkers/browser_userContextId_openWindow.js
5039
19:29:09     INFO -  GECKO(3236) | [Parent 3236, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
5040
19:29:09     INFO -  GECKO(3236) | MEMORY STAT | vsize 709MB | vsizeMaxContiguous 678MB | residentFast 219MB | heapAllocated 100MB
5041
19:29:09     INFO -  976 INFO TEST-OK | dom/workers/test/serviceworkers/browser_userContextId_openWindow.js | took 544ms
5042
19:29:09     INFO -  977 INFO checking window state
5043
19:29:10     INFO -  GECKO(3236) | [Parent 3236, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
5044
19:29:10     INFO -  GECKO(3236) | [Parent 3236, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
5045
19:29:10     INFO -  GECKO(3236) | [DEBUG SHUTDOWN] Shutdown: decoder=05522B60 state machine=0CC7D000
5046
19:29:10     INFO -  GECKO(3236) | [DEBUG SHUTDOWN] Enter: state machine=0857CC30 reader=0CCE45F0
5047
19:29:10     INFO -  GECKO(3236) | [DEBUG SHUTDOWN] Shutdown: reader=085FD000 shutdown demuxer=0CCA0180
5048
19:29:10     INFO -  GECKO(3236) | [DEBUG SHUTDOWN] Shutdown: pool=0CC65640 count=1
5049
19:29:10     INFO -  GECKO(3236) | [DEBUG SHUTDOWN] operator (): pool=0CC65640 shutdown=true count=0
5050
19:29:10     INFO -  GECKO(3236) | [DEBUG SHUTDOWN] TearDownDecoders: reader=085FD000 shut down audio task queue
5051
19:29:10     INFO -  GECKO(3236) | [DEBUG SHUTDOWN] TearDownDecoders: reader=085FD000 shut down video task queue
5052
19:29:10     INFO -  GECKO(3236) | [DEBUG SHUTDOWN] FinishShutdown: state machine=0CC7D000
5053
19:29:10     INFO -  GECKO(3236) | [DEBUG SHUTDOWN] Unregister: decoder=05522B60, count=0
5054
19:29:11     INFO -  GECKO(3236) | Completed ShutdownLeaks collections in process 4656
5055
19:29:11     INFO -  GECKO(3236) | Completed ShutdownLeaks collections in process 6052
5056
19:29:11     INFO -  GECKO(3236) | Completed ShutdownLeaks collections in process 6012
5057
19:29:11     INFO -  GECKO(3236) | Completed ShutdownLeaks collections in process 5380
5058
19:29:11     INFO -  GECKO(3236) | Completed ShutdownLeaks collections in process 3236
5059
19:29:11     INFO -  978 INFO TEST-START | Shutdown
5060
19:29:11     INFO -  979 INFO Browser Chrome Test Summary
5061
19:29:11     INFO -  980 INFO Passed:  81
5062
19:29:11  WARNING -  981 INFO Failed:  1
5063
19:29:11  WARNING -  One or more unittests failed.
5064
19:29:11     INFO -  982 INFO Todo:    0
5065
19:29:11     INFO -  983 INFO Mode:    e10s
5066
19:29:11     INFO -  984 INFO *** End BrowserChrome Test Results ***
:overholt Can you please take a look?
Flags: needinfo?(overholt)
Whiteboard: [stockwell needswork:owner]
I apparently made this near-perma-orange with my landing of bug 1416879 yesterday and am investigating now.
Assignee: nobody → bugmail
Blocks: 1416879
Status: NEW → ASSIGNED
Flags: needinfo?(overholt)
Flags: needinfo?(amarchesini)
Are you sure it was your patch and not my Client handle SWM landing?
I wanted to blame that landing too, but if you look at treeherder for mozilla-inbound, it unfortunately looks like it's my stack.  The test I added runs immediately prior to this test, so it may just be spillover effects.  However, I think this test could use some improvements anyways, like, I'm not sure where the call to unregister() is ever supposed to happen?

Filtered down to just that job, scrolling back, I think you can see via:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&group_state=expanded&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable&filter-resultStatus=success&filter-searchStr=windows7%20opt%20M-e10s(bc3)&fromchange=9cd8ce4f59dd2e9ffbffb74107227e10d0fa0910&selectedJob=154595277
Maybe confused by the auto-update of previous tests SW.

Can you try moving multu-e10s update test to a separate directory so it runs in isolation?
Flags: needinfo?(bugmail)
The actual successful try push with added logs was https://treeherder.mozilla.org/#/jobs?repo=try&revision=be0d92dd03882bc59f8021406b50558d155f2e38&selectedJob=154963090

The logs like https://treeherder.mozilla.org/logviewer.html#?job_id=154963090&repo=try&lineNumber=4921 show that:
- browser_force_refresh.js leaves its registration around with the default scope...
- browser_multie10s_update.js uses the same default scope...
  - it also makes its SW take 3 seconds to be served.  Every time, not just for update fetches...
- file_multie10s_update.html just does getRegistrations() and invokes update() on what it gets back.
- So it's the force_refresh_browser_worker.js worker that gets updated, and since it doesn't have an .sjs with a 3 second delay, the two update requests are able to serialize because they can be processed lickety split.  This results in a test failure.
thanks :asuth for working on this, I see the recent orange factor shows there are no failures in the last few days:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1353556

Is there further work to do here?  Do we know why this stopped failing 3 days ago?
The bug spiked due to changes I made in Bug 1416879.  The spike went away because my changes got backed out.  I've cleaned up this test further in Bug 1416879 and the test that precedes it that was inducing most of the failures seen here.  But I've also had to disable it because of another new failure.  I've filed Bug 1429794 for re-enabling and that will happen soon.

It's possible any relative drop that's occurred here is due to changes :bkelly has been making in the ServiceWorker implementation, or because he's changed some other tests to drop the process count to 1.  Dropping the process count to 1 makes test happier because the multi-e10s propagation issues are mooted when the subsequent test starts with a current snapshot of ServiceWorker state.  Also, the time required to spin up the new content processes helps avoid races.

I will dupe this bug over to the re-enabling bug 1429794 once bug 1416879 lands and sticks.
Flags: needinfo?(bugmail)
Whiteboard: [stockwell disable-recommended] → [stockwell fixed:backout]
Duping to the re-enabling bug, bug 1429794, as promised.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.