Crash in mozilla::dom::workers::GetCurrentThreadJSContext() when closing a Web worker with an XHR

RESOLVED FIXED in Firefox 35

Status

()

--
critical
RESOLVED FIXED
4 years ago
a year ago

People

(Reporter: m.gosselin, Assigned: swu)

Tracking

(4 keywords)

32 Branch
mozilla36
x86_64
Windows 7
crash, regression, reproducible, testcase
Points:
---
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox32 wontfix, firefox33+ wontfix, firefox34+ wontfix, firefox35+ fixed, firefox36+ fixed, firefox-esr31 unaffected)

Details

(crash signature)

Attachments

(4 attachments, 2 obsolete attachments)

(Reporter)

Description

4 years ago
Created attachment 8484963 [details]
crash.zip

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0
Build ID: 20140825202822

Steps to reproduce:

Unzip and deploy the attached file to a JSP capable web server, and display crash.html in Firefox 32.


Actual results:

Firefox crashes.


Expected results:

Firefox displays a blank page.
Do you have a crashreport (from about:crashes)?
Flags: needinfo?(m.gosselin)
(Reporter)

Comment 2

4 years ago
Here you are: https://crash-stats.mozilla.com/report/index/49a275cc-2ff5-4cd1-8f94-cdddc2140905
Flags: needinfo?(m.gosselin)

Comment 4

4 years ago
Maybe a dupe of bug 988626.
Could you host a self-contained testcase (public webpage crashing FF), please?
Flags: needinfo?(m.gosselin)
(Reporter)

Comment 5

4 years ago
The crash is still present in 32.01.
A sample of the code that crashes FF is now hosted here: http://tomlev.free.fr/bug-ff32/crash.html
(Reporter)

Updated

4 years ago
Flags: needinfo?(m.gosselin)

Comment 6

4 years ago
CR: https://crash-stats.mozilla.com/report/index/007a230a-5810-41c2-b7b0-006e82140916
Severity: normal → critical
Crash Signature: [@ mozilla::dom::workers::GetCurrentThreadJSContext() ]
Keywords: crash, reproducible, testcase

Updated

4 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true

Updated

4 years ago
Summary: Firefox crashes when closing a Web worker with an XHR → Crash in mozilla::dom::workers::GetCurrentThreadJSContext() when closing a Web worker with an XHR

Comment 7

4 years ago
Regression range I found, but i'm not sure:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=62d33e3ba514&tochange=a2f0e0619332

Alice, do you confirm?
Flags: needinfo?(alice0775)

Comment 8

4 years ago
Regression window(m-i)
Good:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f3d257065b33
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0 ID:20140605191457
Bad:
https://hg.mozilla.org/integration/mozilla-inbound/rev/87ed776cbacd
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0 ID:20140605222857
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=f3d257065b33&tochange=87ed776cbacd

Regressed by: Bug 1014466
Blocks: 1014466
Flags: needinfo?(alice0775)

Comment 9

4 years ago

Ty for confirming.
status-firefox32: --- → affected
tracking-firefox33: --- → ?
tracking-firefox34: --- → ?
tracking-firefox35: --- → ?
Flags: needinfo?(swu)
Keywords: regression
status-firefox32: affected → wontfix
status-firefox33: --- → affected
status-firefox34: --- → affected
status-firefox35: --- → affected
status-firefox-esr31: --- → unaffected
tracking-firefox33: ? → +
tracking-firefox34: ? → +
tracking-firefox35: ? → +
Created attachment 8490384 [details] [diff] [review]
Patch

I can't actually reproduce the crash but this should fix it.

I wonder if we should rethink this whole mechanism though because calling Run() from Cancel() just doesn't work at this stage.  Maybe we should change all the Cancel()s to do this and assert that we don't reenter Run()?
Assignee: nobody → khuey
Status: NEW → ASSIGNED
Attachment #8490384 - Flags: review?(bent.mozilla)
Flags: needinfo?(swu)

Comment 11

4 years ago
When I was bisecting, the crash wasn't automatic, I obtained 3 regression ranges, only the last one was good.
Comment on attachment 8490384 [details] [diff] [review]
Patch

Review of attachment 8490384 [details] [diff] [review]:
-----------------------------------------------------------------

Ok...
Attachment #8490384 - Flags: review?(bent.mozilla) → review+
(Assignee)

Comment 13

4 years ago
I tested the patch, it does fix crash in GetCurrentThreadJSContext().  But it still crashed in later place.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc44fc700 (LWP 11814)]
mozilla::dom::workers::WorkerPrivate::AssertIsOnWorkerThread (this=0x5a5a5a5a5a5a5a5a)
    at /home/sywu/work/mozilla-central/dom/workers/WorkerPrivate.cpp:5901
5901	  MOZ_ASSERT(mPRThread,
(gdb) bt
#0  mozilla::dom::workers::WorkerPrivate::AssertIsOnWorkerThread (this=0x5a5a5a5a5a5a5a5a)
    at /home/sywu/work/mozilla-central/dom/workers/WorkerPrivate.cpp:5901
#1  0x00007ffff1e3d9dc in mozilla::dom::workers::XMLHttpRequest::Unpin (this=0x7fffc8752d50)
    at /home/sywu/work/mozilla-central/dom/workers/XMLHttpRequest.cpp:1835
#2  0x00007ffff1e3e423 in WorkerRun (this=0x7fffc83c9880, aCx=<optimized out>, aWorkerPrivate=<optimized out>)
    at /home/sywu/work/mozilla-central/dom/workers/XMLHttpRequest.cpp:361
#3  (anonymous namespace)::LoadStartDetectionRunnable::ProxyCompleteRunnable::WorkerRun (this=0x7fffc83c9880, 
    aCx=<optimized out>, aWorkerPrivate=0x7fffc5d13800)
    at /home/sywu/work/mozilla-central/dom/workers/XMLHttpRequest.cpp:348
#4  0x00007ffff1e3d5a1 in (anonymous namespace)::LoadStartDetectionRunnable::ProxyCompleteRunnable::Cancel (this=
    0x7fffc83c9880) at /home/sywu/work/mozilla-central/dom/workers/XMLHttpRequest.cpp:377
#5  0x00007ffff1e3ad31 in mozilla::dom::workers::WorkerRunnable::Run (this=0x7fffc83c9880)
    at /home/sywu/work/mozilla-central/dom/workers/WorkerRunnable.cpp:272
#6  0x00007ffff0d683e0 in nsThread::ProcessNextEvent (this=0x7fffd32c1670, aMayWait=false, aResult=0x7fffc44fb8cf)
    at /home/sywu/work/mozilla-central/xpcom/threads/nsThread.cpp:823
#7  0x00007ffff0d8aaa5 in NS_ProcessPendingEvents (aThread=0x7fffd32c1670, aTimeout=4294967295)
    at /home/sywu/work/mozilla-central/xpcom/glue/nsThreadUtils.cpp:207
#8  0x00007ffff1e2a99c in mozilla::dom::workers::WorkerPrivate::ClearMainEventQueue (this=0x7fffc5d13800, 
    aRanOrNot=<optimized out>) at /home/sywu/work/mozilla-central/dom/workers/WorkerPrivate.cpp:4555
#9  0x00007ffff1e2ad82 in mozilla::dom::workers::WorkerPrivate::ScheduleDeletion (this=0x7fffc5d13800, aRanOrNot=
    mozilla::dom::workers::WorkerPrivate::WorkerRan)
    at /home/sywu/work/mozilla-central/dom/workers/WorkerPrivate.cpp:4312
#10 0x00007ffff1e0d530 in (anonymous namespace)::WorkerThreadPrimaryRunnable::Run (this=0x7fffc8603ee0)
    at /home/sywu/work/mozilla-central/dom/workers/RuntimeService.cpp:2822
#11 0x00007ffff0d683e0 in nsThread::ProcessNextEvent (this=0x7fffd32c1670, aMayWait=false, aResult=0x7fffc44fbcff)
    at /home/sywu/work/mozilla-central/xpcom/threads/nsThread.cpp:823
#12 0x00007ffff0d8a9e0 in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=false)
    at /home/sywu/work/mozilla-central/xpcom/glue/nsThreadUtils.cpp:265
#13 0x00007ffff1014f2e in mozilla::ipc::MessagePumpForNonMainThreads::Run (this=0x7fffca2fa700, aDelegate=
    0x7fffcc362aa0) at /home/sywu/work/mozilla-central/ipc/glue/MessagePump.cpp:339
#14 0x00007ffff0ff8a7e in MessageLoop::RunInternal (this=0x7fffcc362aa0)
    at /home/sywu/work/mozilla-central/ipc/chromium/src/base/message_loop.cc:234
#15 0x00007ffff0ff8af2 in RunHandler (this=0x7fffcc362aa0)
    at /home/sywu/work/mozilla-central/ipc/chromium/src/base/message_loop.cc:227
---Type <return> to continue, or q <return> to quit---
#16 MessageLoop::Run (this=0x7fffcc362aa0) at /home/sywu/work/mozilla-central/ipc/chromium/src/base/message_loop.cc:201
#17 0x00007ffff0d67ed1 in nsThread::ThreadFunc (aArg=0x7fffd32c1670)
    at /home/sywu/work/mozilla-central/xpcom/threads/nsThread.cpp:350
#18 0x00007ffff67e6de7 in _pt_root (arg=0x7fffc94698a0)
    at /home/sywu/work/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:212
#19 0x00007ffff7bc4e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#20 0x00007ffff6ee9ccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#21 0x0000000000000000 in ?? ()
(gdb) 

The problem is caused by Unpin() inside XMLHttpRequest::MaybeDispatchPrematureAbortEvents(), which added in bug 1014466.
http://dxr.mozilla.org/mozilla-central/source/dom/workers/XMLHttpRequest.cpp#1761

Once the XHR object been unpinned, the object will be deleted when closing worker.  Then the crash happens later when ProxyCompleteRunnable thread tries to unpin the dead object.

The purpose of unpin in when abort is to allow setting resopnseType again after abort().  Maybe we should find other way to acomplish it?  I don't have a good idea so far.
Flags: needinfo?(khuey)
(Assignee)

Comment 14

4 years ago
Created attachment 8492018 [details] [diff] [review]
Patch: Allow XHR worker to set overrideMimeType/responseType after an aborted send.

This fixed the crash problem casued by Unpin().

When the send is aborted and before a new send, the mSeenLoadStart must be to false, and mReadyState could only be UNSENT/OPENED, we can use it to determined an aborted send.
Attachment #8492018 - Flags: review?(khuey)
Flags: needinfo?(khuey)
It would be nice if this could land soon. Beta6 is going to build today.
thanks
(In reply to Sylvestre Ledru [:sylvestre] from comment #15)
> It would be nice if this could land soon. Beta6 is going to build today.
> thanks

This code is fragile.  I wouldn't be comfortable taking this on branches without a couple weeks testing on mozilla-central.
OK. Thanks for the feedback.
Let's mark it as wontfix for 33 and uplift it to aurora (34) once we are sure it is safe.
status-firefox33: affected → wontfix
Kyle - Last update was from over a month ago. You still going to work on this one?
Flags: needinfo?(khuey)
Comment on attachment 8492018 [details] [diff] [review]
Patch: Allow XHR worker to set overrideMimeType/responseType after an aborted send.

Review of attachment 8492018 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay.  I wanted to simplify this code since it's far too complex, but I'm not going to get to that in the near term :/

I am reasonably confident that Shian-Yow's patch is correct.  If it's not, we should see intermittent crashes in our testing.

r=me if you check a test in too.
Attachment #8492018 - Flags: review?(khuey) → review+
(Assignee)

Comment 20

4 years ago
Created attachment 8521196 [details] [diff] [review]
Part 2: Test case.

Kyle, this is the mochitest to reproduce crash, please help to review.
Assignee: khuey → swu
Attachment #8521196 - Flags: review?(khuey)
https://hg.mozilla.org/mozilla-central/rev/d75aafdbe1a6
https://hg.mozilla.org/mozilla-central/rev/14391d50460e
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Kyle suggests that it's best to let this ride a later train and not uplift a fix to 34. I'm marking 34 as won't fix. If this is shown to be safe and you want to nom the fix for 35, please submit an approval request.
status-firefox34: affected → wontfix

Updated

4 years ago
tracking-firefox36: --- → ?
status-firefox35: affected → fixed
tracking-firefox36: ? → +
(Assignee)

Comment 25

4 years ago
Created attachment 8525286 [details] [diff] [review]
Part 1: Allow XHR worker to set overrideMimeType/responseType after an aborted send. r=khuey (as landed)

Approval Request Comment
[Feature/regressing bug #]: bug 1014466
[User impact if declined]: Potential crash if abort and close the worker during XHR loading.
[Describe test coverage new/current, TBPL]: Manual test and mochitest.
[Risks and why]: Low.  The patch removed code which caused potential crash in bug 1014466, and fixed it in a different way.
[String/UUID change made/needed]: n/a
Attachment #8492018 - Attachment is obsolete: true
Attachment #8525286 - Flags: review+
Attachment #8525286 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 26

4 years ago
Created attachment 8525288 [details] [diff] [review]
Part 2: Test case. (rebased for firefox35)

Approval Request Comment
[Feature/regressing bug #]: bug 1014466
[User impact if declined]: This is the test case.
[Describe test coverage new/current, TBPL]: Mochitest.
[Risks and why]: Low.  This is the test case.
[String/UUID change made/needed]: n/a
Attachment #8525288 - Flags: review?(khuey)
Attachment #8525288 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 27

4 years ago
The test case needs to be rebased for firefox35, because there is a path name difference between mozilla-aurora and mozilla-central (content/base/test -> dom/base/test).
Attachment #8525286 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8525288 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Assignee)

Comment 28

4 years ago
checkin-needed for mozilla-aurora.  Thank you!
Keywords: checkin-needed
You don't need to set checkin-needed on uplifts. We have bug queries for them.
Keywords: checkin-needed
That is, when Release Management doesn't inadvertently set the wrong status flags :)
status-firefox35: fixed → affected
status-firefox36: --- → fixed
You need to log in before you can comment on or make changes to this bug.