Assertion failure: js::GetNonCCWObjectRealm(wrapper) == js::GetContextRealm(cx), at /builds/worker/workspace/build/src/dom/workers/WorkerRunnable.cpp:345
Categories
(Core :: DOM: Workers, defect, P1)
Tracking
()
People
(Reporter: jkratzer, Assigned: edenchuang)
References
(Blocks 2 open bugs)
Details
(Keywords: assertion, sec-high, testcase, Whiteboard: [post-critsmash-triage][adv-main74+r])
Attachments
(3 files, 1 obsolete file)
1.01 KB,
text/html
|
Details | |
14.25 KB,
application/x-javascript
|
Details | |
47 bytes,
text/x-phabricator-request
|
tjr
:
approval-mozilla-beta+
tjr
:
sec-approval+
|
Details | Review |
Testcase found while fuzzing mozilla-central rev debc7af30b7e (built with --enable-debug). Testcase must be served over HTTP in order to reproduce.
Assertion failure: js::GetNonCCWObjectRealm(wrapper) == js::GetContextRealm(cx), at /builds/worker/workspace/build/src/dom/workers/WorkerRunnable.cpp:345
rax = 0x000055ef733e1040 rdx = 0x0000000000000000
rcx = 0x00007fcc4f674484 rbx = 0x00007fcc25fe3940
rsi = 0x00007fcc5aebf8b0 rdi = 0x00007fcc5aebe680
rbp = 0x00007fffd5378330 rsp = 0x00007fffd5378190
r8 = 0x00007fcc5aebf8b0 r9 = 0x00007fcc5c029780
r10 = 0x0000000000000000 r11 = 0x0000000000000000
r12 = 0x00007fffd53781c0 r13 = 0x00007fcc37d2d000
r14 = 0x0000265588fc7190 r15 = 0x00007fffd5378240
rip = 0x00007fcc4b995047
OS|Linux|0.0.0 Linux 4.18.0-20-generic #21~18.04.1-Ubuntu SMP Wed May 8 08:43:37 UTC 2019 x86_64
CPU|amd64|family 6 model 94 stepping 3|1
GPU|||
Crash|SIGSEGV|0x0|0
0|0|libxul.so|mozilla::dom::WorkerRunnable::Run()|hg:hg.mozilla.org/mozilla-central:dom/workers/WorkerRunnable.cpp:debc7af30b7e0ee50653706c19a10511ceefff83|224|0x2d
0|1|libxul.so|mozilla::ThrottledEventQueue::Inner::ExecuteRunnable()|hg:hg.mozilla.org/mozilla-central:xpcom/threads/ThrottledEventQueue.cpp:debc7af30b7e0ee50653706c19a10511ceefff83|252|0x11
0|2|libxul.so|mozilla::ThrottledEventQueue::Inner::Executor::Run()|hg:hg.mozilla.org/mozilla-central:xpcom/threads/ThrottledEventQueue.cpp:debc7af30b7e0ee50653706c19a10511ceefff83|80|0xd
0|3|libxul.so|nsThread::ProcessNextEvent(bool, bool*)|hg:hg.mozilla.org/mozilla-central:xpcom/threads/nsThread.cpp:debc7af30b7e0ee50653706c19a10511ceefff83|1176|0x15
0|4|libxul.so|NS_ProcessNextEvent(nsIThread*, bool)|hg:hg.mozilla.org/mozilla-central:xpcom/threads/nsThreadUtils.cpp:debc7af30b7e0ee50653706c19a10511ceefff83|486|0x11
0|5|libxul.so|mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*)|hg:hg.mozilla.org/mozilla-central:ipc/glue/MessagePump.cpp:debc7af30b7e0ee50653706c19a10511ceefff83|88|0xa
0|6|libxul.so|MessageLoop::RunInternal()|hg:hg.mozilla.org/mozilla-central:ipc/chromium/src/base/message_loop.cc:debc7af30b7e0ee50653706c19a10511ceefff83|315|0x17
0|7|libxul.so|MessageLoop::Run()|hg:hg.mozilla.org/mozilla-central:ipc/chromium/src/base/message_loop.cc:debc7af30b7e0ee50653706c19a10511ceefff83|290|0x8
0|8|libxul.so|nsBaseAppShell::Run()|hg:hg.mozilla.org/mozilla-central:widget/nsBaseAppShell.cpp:debc7af30b7e0ee50653706c19a10511ceefff83|137|0xd
0|9|libxul.so|nsAppStartup::Run()|hg:hg.mozilla.org/mozilla-central:toolkit/components/startup/nsAppStartup.cpp:debc7af30b7e0ee50653706c19a10511ceefff83|276|0xe
0|10|libxul.so|XREMain::XRE_mainRun()|hg:hg.mozilla.org/mozilla-central:toolkit/xre/nsAppRunner.cpp:debc7af30b7e0ee50653706c19a10511ceefff83|4650|0x11
0|11|libxul.so|XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&)|hg:hg.mozilla.org/mozilla-central:toolkit/xre/nsAppRunner.cpp:debc7af30b7e0ee50653706c19a10511ceefff83|4789|0x8
0|12|libxul.so|XRE_main(int, char**, mozilla::BootstrapConfig const&)|hg:hg.mozilla.org/mozilla-central:toolkit/xre/nsAppRunner.cpp:debc7af30b7e0ee50653706c19a10511ceefff83|4870|0x5
0|13|firefox-bin|do_main|hg:hg.mozilla.org/mozilla-central:browser/app/nsBrowserApp.cpp:debc7af30b7e0ee50653706c19a10511ceefff83|213|0x22
0|14|firefox-bin|main|hg:hg.mozilla.org/mozilla-central:browser/app/nsBrowserApp.cpp:debc7af30b7e0ee50653706c19a10511ceefff83|295|0xf
0|15|libc-2.27.so||||0x21b97
0|16|firefox-bin|MOZ_ReportCrash|hg:hg.mozilla.org/mozilla-central:mfbt/Assertions.h:debc7af30b7e0ee50653706c19a10511ceefff83|184|0x5
Updated•5 years ago
|
Comment 1•5 years ago
|
||
Guessing sec-high because it sounds like a compartment mismatch.
Comment 2•5 years ago
|
||
- :jkratzer, Can the fuzzer make patches so that this could be run as a test via mach following an
hg import
? Or is there some way to otherwise easily use mach to run this test under mach with "--debugger rr" in a quasi-automated fashion? If there's a link I should just read, that's fine. Thanks! - :bholley, related to that, I'm interested for things like this if we can get a repro under pernos.co, but I'm worried that to do that I need to push this test to try but that is somewhat explicitly frowned upon I think. Do we have a mechanism to make it less risky / a concern?
My general motivating concern is we're quite overwhelmed as it is with worker sec bugs and our worker sec bug person just went on medical leave for at least 6 weeks, so anything we can do to make it easier to process and analyze the sec-bugs would be handy. (And especially the ability to link someone else to a pernos.co trace versus needing them to reproduce locally under rr, etc., would be nice.) Thanks!
Comment 3•5 years ago
|
||
secure try / secure pernosco are both things that could exist in the future but don't exist now.
Reporter | ||
Comment 4•5 years ago
|
||
(In reply to Andrew Sutherland [:asuth] (he/him) from comment #2)
- :jkratzer, Can the fuzzer make patches so that this could be run as a test via mach following an
hg import
? Or is there some way to otherwise easily use mach to run this test under mach with "--debugger rr" in a quasi-automated fashion? If there's a link I should just read, that's fine. Thanks!
Andrew, I'm not sure if there's an easy way to import this as a test to be run under mach. However, if you're just looking for a simplified way to run this under rr, you could use ffpuppet (https://github.com/MozillaSecurity/ffpuppet/) with the -rr option.
python -m SimpleHTTPServer &
python -m ffpuppet -p prefs.js --xvfb -d -l log -rr ~/builds/firefox/firefox http://localhost:8000/testcase.html
I'll also attach the prefs I used to reproduce this.
Reporter | ||
Comment 5•5 years ago
|
||
Comment 6•5 years ago
|
||
That is very helpful, thank you!
Comment 7•5 years ago
|
||
Given this is a sec-high issue and has an STR, mark this as P1.
Comment 8•5 years ago
|
||
Once we identify where the compartment is wrong we should see if there are any API design issues that made it easy to get wrong. See Bug 1565930 for discussions about compartment mismatch bugs related to Promise APIs.
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Comment 9•5 years ago
|
||
Is this something you might be able to work on for the 71 release?
Assignee | ||
Comment 10•5 years ago
|
||
Yes, I am working on it now. Hope can fix before moving 71 to beta.
Assignee | ||
Comment 11•5 years ago
|
||
The worker sends jobs(CompileScript) to the main thread before the document/inner window(GlobalObject) is recreated and executing the job after recreated(when calling SpinEventLoop in the second sync XHR), but the worker/workerPrivate still keeps the original GlobalObject. I wrote some patch to try to cleanup the event queue, but it has side-effects to make other tests fail.
However, I think cleanup the jobs in the event queue might not be a correct solution, because there are not only worker jobs, and it might be the root cause making other regressions. I think these bugs 1546331, 1497633, 1557732 and 1545345 are related to maintaining worker's lifecycle, especially the delete worker timing. We should have an overall solution for these bugs, and bug 1539508 is for it.
In the next step, I will study from these cases and source code to figure out how to refactor the current architecture.
But it would take some time to do it.
Assignee | ||
Updated•5 years ago
|
Comment 12•5 years ago
|
||
I need to compose a deeper reply, but I think there isn't too much overlap on these.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 13•5 years ago
|
||
Could not reproduce the bug after bug 1588248 landed. But it still could be reproduced in esr68.
Comment 14•5 years ago
|
||
Last week Eden told me that he's trying to figure out what fixed this so we can think about backporting. Is that still correct, Eden?
Assignee | ||
Comment 15•5 years ago
|
||
Yes, it is correct. Writing a patch backporting.
Comment 16•5 years ago
|
||
Taking while Eden's on PTO. I understand the current plan has been to backport the partial fix on bug 1588248 to ESR68.
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Updated•5 years ago
|
Assignee | ||
Comment 17•5 years ago
|
||
Hi Andrew, if it is okay, I'd like to take this bug back.
Following is a rough expected flow when loading the page.
- Loading and create DOM objects.
- Firing DOMContentLoaded event
- The DOMContentLoaded event handler executes a sync XHR then creates an empty worker after sync XHR return.
- DocShell creates the ContentViewer(nsDocumentViewer) and copy the document and set them into nsGlobalWindowOuter.
- Firing load event
In this expected flow, we should only fire the DOMContentLoaded event once.
In step 4, when DocShell creates a ContentViewer and copies the document, it also creates a new realm/GlobalObject, so you can expect the realm/compartment is different before and after DocShell creates the ContentViewer.
However, Sync XHR could make step 4 happens before we create the worker. And the following is the problem flow we meet.
- Loading and create DOMObjects
- Firing DOMContentLoaded event
- The DOMContentLoaded event handler executes a sync XHR, but not yet return.
- DocShell creates the ContentViewer(nsDocumentViewer) and copy the document and set them into nsGlobalWindowOuter.
- Creates an empty worker after Sync XHR returns with the GlobalObject from the event handler.
- DOMContentLoaded event could be fired again for the new document since it was copied before the handler finish.
- Some WorkerRunnable executes then hits the assertion since the realm from GlobalObject (from worker) is different from nsGlobalWindowOuter.
The possible solution could be making sure DocShell creates the ContentViewer and copies the document after DOMContentLoaded finish.
But I need some time to figure out how to achieve it without side-effects.
Comment 18•5 years ago
|
||
Hey, yes! Thanks for the detailed description!
Here's a pernosco trace from my local repro (still rebuilding): https://pernos.co/debug/f--h5b9H8IFFTzqKsZoVpA/index.html
Assignee | ||
Comment 19•5 years ago
|
||
Assignee | ||
Comment 20•5 years ago
|
||
Spend some time to analyze why seeing two DOMContentLoaded event dispatching when loading the attached testcase.html.
The interesting thing is a <meta> tag with a specified charset="UTF8" in the last line.
https://searchfox.org/mozilla-central/source/parser/html/nsHtml5TreeOperation.cpp#964
When parser parses the line of the file, it detects the document needs to reload because of charset changed, then nsDocShell::CharsetChangeStopDocumentLoad() is called to discard the current document.
https://searchfox.org/mozilla-central/source/parser/html/nsHtml5TreeOpExecutor.cpp#774
CharsetChangeStopDocumentLoad() discards the document through a normal stop loading flow, and DOMContentLoaded() is dispatched while executing the Document::EndLoad(). This is the first DOMContentLoaded event.
https://searchfox.org/mozilla-central/source/dom/base/Document.cpp#7354
After nsDocShell::CharsetChangeStopDocumentLoad() success, nsDocShell::CharsetChangeReloadDocument is called for reloading the document with specified charset. And this will create new ContentViewer, document, and WindowInner for reloading, then set them into the nsGlobalWindowOuter. After the document loading finish, the second DOMContentLoaded event is dispatched.
https://searchfox.org/mozilla-central/source/docshell/base/nsDocShell.cpp#12489
We meet the assertion fail since we are switching documents while handling the DOMContentLoaded event on the first document, and the worse is there is a sync XHR in the event handler.
I think we should not dispatch the DOMContentLoad event on the discarded document. And a patch is attached.
Comment 21•5 years ago
|
||
I'm having great trouble to understand why the patch helps with the worker issue in general. One can always
spin event loop in DOMContentLoaded listener (or any time JS runs) and load some new page while that is happening.
Why does some workerrunnable try to run on wrong global? (assuming I understand the assertion correctly)
Why we haven't canceled it? Or if cancelling is for worker side only, why we we don't return early when we realize the global isn't correct anymore.
Or what am I missing? :)
Assignee | ||
Comment 22•5 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #21)
I'm having great trouble to understand why the patch helps with the worker issue in general. One can always
spin event loop in DOMContentLoaded listener (or any time JS runs) and load some new page while that is happening.
Yes, the patch is specific for the charset changing case since it is from the attached case. Like you said, for other cases, one spin event loop in DOMContentLoaded listener (or any time JS runs) and load some new page while that is happening, the assertion might be hit again.
However, maybe the condition in the assertion doesn't fit all cases. I will try to update the assertion condition.
I am not sure if this patch is correct, but I think the patch still has its value for the charset changing case. At least, I didn't find that we should dispatch DOMContentLoaded on the document that discarded by charset changing in the spec. I also compared with other browsers' implementation, no matter the spinning event loop in DOMContentLoaded or not, no one dispatches two DOMContentLoaded events for loading the attached case, only Firefox.
Why does some workerrunnable try to run on wrong global? (assuming I understand the assertion correctly)
In the attached case, the global from DOMContentLoad event is used to create the worker, and runnables for the worker inherits the worker's gloabl and assumes the execution environment should fit with the global. So when executing the runnable the assertion checks the compartments between worker's global and current nsGlobalWindowOuter.
Why we haven't canceled it? Or if cancelling is for worker side only, why we we don't return early when we realize the global isn't correct anymore.
Or what am I missing? :)
The workerRunnable can be a control runnable or a normal runnable. If it is a control runnable, we use it to control workers, i.e. closing Worker. For the case, we cannot cancel it, since we expect to run the runnable to close workers normally.
Comment 23•5 years ago
|
||
If we don't want to dispatch DOMContentLoaded for charset change, let's file a separate bug for that and fix that case there (with a testcase).
But I can't see how the patch would fix the worker related issue - it just hides the issue in one specific case.
This is a security bug (though, I'm not actually sure how critical), so the issue should be fixed properly.
This very much smells like an issue related to WorkerPrivate or WorkerRunnable.
Something there can't deal with the case when global is changing.
Assignee | ||
Comment 24•5 years ago
|
||
Updated•5 years ago
|
Assignee | ||
Comment 25•5 years ago
|
||
According to comments 23, remove the patch for the specific case, reload with charset changing.
When constructing a dedicated worker, the new patch checks if the innerWindow for construction is the current innerWindow or not.
If it is not, we are constructing a worker on a discarded window(RemovedFromDocshell), and create it will cause compartment mismatching for the worker. In the case, the patch returns nullptr and throw InvalidStateError.
Assignee | ||
Comment 26•5 years ago
|
||
Comment on attachment 9126146 [details]
Bug 1557732 - Don't create worker for discarded document.
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Not sure. But it is easy to understand what situation the patch wants to fix.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: 69
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: Yes
- If not, how different, hard to create, and risky will they be?:
- How likely is this patch to cause regressions; how much testing does it need?: The patch should not cause regressions easy.
Assignee | ||
Comment 27•5 years ago
|
||
Hello :tjr, could you help on this sec-approval? Thanks.
Updated•5 years ago
|
Comment 28•5 years ago
|
||
Comment on attachment 9126146 [details]
Bug 1557732 - Don't create worker for discarded document.
Approved to land.
Comment 29•5 years ago
|
||
(Actually, I don't know if this patch applies cleanly to Beta, Eden can you confirm? If not can you attach one?)
Updated•5 years ago
|
Comment 30•5 years ago
|
||
https://hg.mozilla.org/integration/autoland/rev/448b5cd3eaa0
This grafts cleanly to Beta as-landed.
Comment 31•5 years ago
|
||
Comment 32•5 years ago
|
||
uplift |
Updated•5 years ago
|
Updated•5 years ago
|
Comment 33•5 years ago
|
||
Hi, I'm trying to reproduce this issue on our end but I'm getting this error :
svuser@P5287:~/Desktop/Test/ffpuppet-master$ python -m ffpuppet -p prefs.js --xvfb -d -l log --rr ~/home/svuser/Desktop/Foxes/Fuzz/firefox/firefox http://localhost:8000/testcase.html
usage: main.py [-h] [-a ABORT_TOKEN] [-d] [-e EXTENSION] [-l LOG]
[--log-limit LOG_LIMIT] [-m MEMORY]
[--poll-interval POLL_INTERVAL] [-p PREFS] [-P PROFILE]
[-t TIMEOUT] [-u URL] [-v] [--xvfb] [--gdb] [--rr]
[--valgrind]
binary
main.py: error: unrecognized arguments: http://localhost:8000/testcase.html
I'm not that familiar with python but I did start the simple HTTP server using : python -m SimpleHTTPServer in the same folder where I ran the other command.
Am I missing something ?
Jason can you please take a look at this and confirm the Fix in Beta and Nightly ?
Reporter | ||
Comment 34•5 years ago
|
||
Ehsan, I will have jsbugmon check that the issue has been fixed but just FYI, you need to pass -u before the URL or file path.
Reporter | ||
Comment 35•5 years ago
|
||
Testcase is verified as fixed on mozilla-central rev fb4f281c1c54a5199e6e713c1b8115f80d7faa37.
Comment 36•5 years ago
|
||
(In reply to Rares Doghi from comment #33)
Am I missing something ?
Yes, -u http://localhost:8000/testcase.html
is what was missing.
Comment 37•5 years ago
|
||
After setting the -u before the URL i was getting this error :
usage: rr [-h] [--interval INTERVAL] [--exclude EXCLUDE]
[--loglevel {NOTSET,DEBUG,INFO,WARNING,ERROR,CRITICAL}]
...
rr: error: unrecognized arguments: --version
Traceback (most recent call last):
File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/svuser/Desktop/Test/ffpuppet-master/ffpuppet/main.py", line 9, in <module>
main()
File "/home/svuser/Desktop/Test/ffpuppet-master/ffpuppet/main.py", line 189, in main
use_rr=getattr(args, "rr", False))
File "/home/svuser/Desktop/Test/ffpuppet-master/ffpuppet/core.py", line 89, in init
subprocess.check_output(["rr", "--version"])
File "/usr/lib/python3.5/subprocess.py", line 626, in check_output
**kwargs).stdout
File "/usr/lib/python3.5/subprocess.py", line 708, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['rr', '--version']' returned non-zero exit status 2
Jason can you please try Beta as well ? In the mean time I will update the flag on Nightly 75 based on comment 35.
Updated•5 years ago
|
Reporter | ||
Comment 38•5 years ago
|
||
Testcase is verified as fixed on mozilla-beta rev 609993945f74.
Comment 39•5 years ago
|
||
(In reply to Rares Doghi from comment #37)
Also you don't need --rr
to verify the bugs. As you saw here it can add unnecessary overhead sometimes.
Comment 40•5 years ago
|
||
Thank you For checking Jason , also thanks for the help Tyson after removing the --rr this worked, I will update the flags accordingly. This issue is Verified as fixed in Our latest Nightly build and Beta.
Updated•5 years ago
|
Updated•4 years ago
|
Description
•