Closed Bug 1361823 Opened 7 years ago Closed 7 years ago

Intermittent browser_scratchpad_wrong_window_focus.js | application crashed [@ ScriptLoaderRunnable::LoadScript]

Categories

(Core :: DOM: Workers, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- unaffected
firefox55 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: bkelly)

References

Details

(Keywords: crash, intermittent-failure, Whiteboard: [stockwell fixed])

Attachments

(1 file, 1 obsolete file)

Scratchpad bug triage. Filter on TRIAGE-JD201705

Looks like a new, but frequent intermittent.
Priority: -- → P2
Looks like this is coming from the following merge from autoland to m-c: 

https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=604acb6a6aece0d9f296dbb7ca091f45713c0c9d

Fails only on Linux 64 debug, after the following assertion fails:

> Assertion failure: NS_LoadGroupMatchesPrincipal(loadGroup, principal), at /home/worker/workspace/build/src/dom/workers/ScriptLoader.cpp:891

Can't really see which changeset from the list above could lead to this, so I'm doing a few retriggers on autoland to try to narrow it down.
From the retriggers I made yesterday it looks like it might have been introduced by Bug 1359144:
- no failures on https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=aa42442a1a5654f4f2ddd023f15e77146999cfad
- 2 failures / 15 runs  on https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=d5a0c4a83b726ed4747855de2094e4ea50328209 

Doing a few more retriggers to confirm the investigation above.

Bug 1359144 is changing the way sourcemapping works in the console. And the failing test is indeed checking the console. We check that messages logged from scratchpad have the correct location displayed in the console, so we are probably triggering the sourcemap service.
Tom, it looks like this started to spike after switching the console to the new sourcemap service. So far it only ever crashed on Linux 64 debug (e10s or not, but seems more frequent with e10s disabled)

- log for a failure: https://treeherder.mozilla.org/logviewer.html#?job_id=96595002&repo=autoland&lineNumber=21764

Any idea?
Blocks: 1359144
Flags: needinfo?(ttromey)
See Also: → 1353604
No idea, except that this doesn't seem likely to be a devtools bug.
The source map decoding is run in a worker but it isn't doing anything special.
Component: Developer Tools: Scratchpad → DOM: Service Workers
Flags: needinfo?(ttromey)
Product: Firefox → Core
thanks for looking into this Julian and Tom!  I cannot follow the transition from devtools:scratchpad -> dom:service workers- could you clarify that a big more?
Flags: needinfo?(ttromey)
Whiteboard: [stockwell needswork]
I don't understand why this was ever filed as a scratchpad bug.
I suspected it as a worker bug due to the stack trace:

[task 2017-05-04T14:00:11.735622Z] 14:00:11     INFO - GECKO(1701) | [1701] WARNING: NS_ENSURE_TRUE(loadContext) failed: file /home/worker/workspace/build/src/netwerk/base/nsNetUtil.cpp, line 807
[task 2017-05-04T14:00:11.735918Z] 14:00:11     INFO - GECKO(1701) | Assertion failure: NS_LoadGroupMatchesPrincipal(loadGroup, principal), at /home/worker/workspace/build/src/dom/workers/ScriptLoader.cpp:891
[task 2017-05-04T14:00:11.736123Z] 14:00:11     INFO - GECKO(1701) | #01: ScriptLoaderRunnable::Run [dom/workers/ScriptLoader.cpp:844]
[task 2017-05-04T14:00:11.736203Z] 14:00:11     INFO - 
[task 2017-05-04T14:00:11.736905Z] 14:00:11     INFO - GECKO(1701) | #02: nsThread::ProcessNextEvent [mfbt/Maybe.h:445]
[task 2017-05-04T14:00:11.737506Z] 14:00:11     INFO - 
[task 2017-05-04T14:00:11.737583Z] 14:00:11     INFO - GECKO(1701) | #03: NS_ProcessNextEvent [xpcom/threads/nsThreadUtils.cpp:393]
[task 2017-05-04T14:00:11.738213Z] 14:00:11     INFO - 
[task 2017-05-04T14:00:11.738281Z] 14:00:11     INFO - GECKO(1701) | #04: mozilla::ipc::MessagePump::Run [ipc/glue/MessagePump.cpp:97]
Flags: needinfo?(ttromey)
looking at recent failures:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1361823&startday=2017-05-03&endday=2017-05-05&tree=all

all the failures are in the browser_scratchpad_wrong_window_focus.js test file- but the assertion is not in devtools, it is here:
https://hg.mozilla.org/mozilla-central/annotate/0b255199db9d6a6f189b89b7906f99155bde3726/dom/workers/ScriptLoader.cpp#l891

added from bug 1107516.

:bkelly, I see you added this assertion, can you pick up investigating this intermittent?
Flags: needinfo?(bkelly)
Can you link to the worker being run in this test?
Component: DOM: Service Workers → DOM: Workers
I am not sure how to do that, are there instructions for this?  Maybe this is something the devtools team knows?
Yea, I was hoping Tom could respond.
All I really know is what is written in the bug.
But, since it started spiking after switching to the client-side source mapping service,
and since source map decoding is done in a worker, it seems reasonable to hypothesize
that this is the worker in question.

This code is here:
https://dxr.mozilla.org/mozilla-central/source/devtools/client/shared/source-map/worker.js

The worker is started here:
https://dxr.mozilla.org/mozilla-central/source/devtools/client/shared/source-map/index.js#561
... however I can't explain why the failures would occur in a scratchpad test.
That is a strike against the theory.  I don't know anything about scratchpad, I'm afraid.
Ben, any ideas here?
(In reply to Joel Maher ( :jmaher) from comment #16)
> Ben, any ideas here?

Not really.  I can't reproduce locally running the test individually and I can't run the entire test directory locally (it hangs).

The assertion means the worker is not getting created correctly.  It's load group is not populated with a load context.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c2ced53814d1109ca81172851c75890bb1f55ddc
Looks like it is the source-map worker:

INFO - GECKO(1749) | ### ### LoadScript() resource://devtools/client/shared/source-map/worker.js
INFO - GECKO(1749) | [1749] WARNING: NS_ENSURE_TRUE(loadContext) failed: file /home/worker/workspace/build/src/netwerk/base/nsNetUtil.cpp, line 835
INFO - GECKO(1749) | Assertion failure: NS_LoadGroupMatchesPrincipal(loadGroup, principal), at /home/worker/workspace/build/src/dom/workers/ScriptLoader.cpp:894

I'll try to add more instrumentation to day.
As a start, lets prevent workers with a bad load group from starting.  Instead we will throw in this situation.

This will probably still cause a test failure, but it will show up as an exception in the devtools code.  Then we can try to figure out why devtools is trying to start a worker in this situation.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=23c8955b72565e8a23b62370ede5311a3889d128
Interestingly those new protections don't prevent the original assert.
Comment on attachment 8866450 [details] [diff] [review]
P1 Don't allow a worker to start with a bad load group. r=baku

Andrea, the intermittent test was hitting our assertion on NS_LoadGroupMatchesPrincipal() due to the load context being missing.  My best guess is this is happening because of cycle collection just before the worker thread is started.

This patch fails the load in this case instead of asserting.  I also check the load group in a few more places to ensure we're not getting a bad load group passed to us.

This seems to fix the intermittent.
Attachment #8866450 - Flags: review?(amarchesini)
Attachment #8866450 - Flags: review?(amarchesini) → review+
Attachment #8866365 - Attachment is obsolete: true
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1eab3a251445
Fail the worker script load instead of crashing if the load group is torn down. r=baku
https://hg.mozilla.org/mozilla-central/rev/1eab3a251445
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Keywords: crash
Whiteboard: [stockwell needswork] → [stockwell fixed]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: