Closed Bug 1063538 Opened 9 years ago Closed 9 years ago

Crash in mozilla::dom::workers::GetCurrentThreadJSContext() when closing a Web worker with an XHR

Categories

(Core :: DOM: Workers, defect)

32 Branch
x86_64
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla36
Tracking Status
firefox32 --- wontfix
firefox33 + wontfix
firefox34 + wontfix
firefox35 + fixed
firefox36 + fixed
firefox-esr31 --- unaffected

People

(Reporter: m.gosselin, Assigned: swu)

References

Details

(4 keywords)

Crash Data

Attachments

(4 files, 2 obsolete files)

Attached file crash.zip
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0
Build ID: 20140825202822

Steps to reproduce:

Unzip and deploy the attached file to a JSP capable web server, and display crash.html in Firefox 32.


Actual results:

Firefox crashes.


Expected results:

Firefox displays a blank page.
Do you have a crashreport (from about:crashes)?
Flags: needinfo?(m.gosselin)
Maybe a dupe of bug 988626.
Could you host a self-contained testcase (public webpage crashing FF), please?
Flags: needinfo?(m.gosselin)
The crash is still present in 32.01.
A sample of the code that crashes FF is now hosted here: http://tomlev.free.fr/bug-ff32/crash.html
Flags: needinfo?(m.gosselin)
CR: https://crash-stats.mozilla.com/report/index/007a230a-5810-41c2-b7b0-006e82140916
Severity: normal → critical
Crash Signature: [@ mozilla::dom::workers::GetCurrentThreadJSContext() ]
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Firefox crashes when closing a Web worker with an XHR → Crash in mozilla::dom::workers::GetCurrentThreadJSContext() when closing a Web worker with an XHR
Regression range I found, but i'm not sure:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=62d33e3ba514&tochange=a2f0e0619332

Alice, do you confirm?
Flags: needinfo?(alice0775)
Regression window(m-i)
Good:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f3d257065b33
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0 ID:20140605191457
Bad:
https://hg.mozilla.org/integration/mozilla-inbound/rev/87ed776cbacd
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0 ID:20140605222857
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=f3d257065b33&tochange=87ed776cbacd

Regressed by: Bug 1014466
Blocks: 1014466
Flags: needinfo?(alice0775)

Ty for confirming.
Flags: needinfo?(swu)
Keywords: regression
Attached patch Patch (obsolete) — Splinter Review
I can't actually reproduce the crash but this should fix it.

I wonder if we should rethink this whole mechanism though because calling Run() from Cancel() just doesn't work at this stage.  Maybe we should change all the Cancel()s to do this and assert that we don't reenter Run()?
Assignee: nobody → khuey
Status: NEW → ASSIGNED
Attachment #8490384 - Flags: review?(bent.mozilla)
Flags: needinfo?(swu)
When I was bisecting, the crash wasn't automatic, I obtained 3 regression ranges, only the last one was good.
Comment on attachment 8490384 [details] [diff] [review]
Patch

Review of attachment 8490384 [details] [diff] [review]:
-----------------------------------------------------------------

Ok...
Attachment #8490384 - Flags: review?(bent.mozilla) → review+
I tested the patch, it does fix crash in GetCurrentThreadJSContext().  But it still crashed in later place.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc44fc700 (LWP 11814)]
mozilla::dom::workers::WorkerPrivate::AssertIsOnWorkerThread (this=0x5a5a5a5a5a5a5a5a)
    at /home/sywu/work/mozilla-central/dom/workers/WorkerPrivate.cpp:5901
5901	  MOZ_ASSERT(mPRThread,
(gdb) bt
#0  mozilla::dom::workers::WorkerPrivate::AssertIsOnWorkerThread (this=0x5a5a5a5a5a5a5a5a)
    at /home/sywu/work/mozilla-central/dom/workers/WorkerPrivate.cpp:5901
#1  0x00007ffff1e3d9dc in mozilla::dom::workers::XMLHttpRequest::Unpin (this=0x7fffc8752d50)
    at /home/sywu/work/mozilla-central/dom/workers/XMLHttpRequest.cpp:1835
#2  0x00007ffff1e3e423 in WorkerRun (this=0x7fffc83c9880, aCx=<optimized out>, aWorkerPrivate=<optimized out>)
    at /home/sywu/work/mozilla-central/dom/workers/XMLHttpRequest.cpp:361
#3  (anonymous namespace)::LoadStartDetectionRunnable::ProxyCompleteRunnable::WorkerRun (this=0x7fffc83c9880, 
    aCx=<optimized out>, aWorkerPrivate=0x7fffc5d13800)
    at /home/sywu/work/mozilla-central/dom/workers/XMLHttpRequest.cpp:348
#4  0x00007ffff1e3d5a1 in (anonymous namespace)::LoadStartDetectionRunnable::ProxyCompleteRunnable::Cancel (this=
    0x7fffc83c9880) at /home/sywu/work/mozilla-central/dom/workers/XMLHttpRequest.cpp:377
#5  0x00007ffff1e3ad31 in mozilla::dom::workers::WorkerRunnable::Run (this=0x7fffc83c9880)
    at /home/sywu/work/mozilla-central/dom/workers/WorkerRunnable.cpp:272
#6  0x00007ffff0d683e0 in nsThread::ProcessNextEvent (this=0x7fffd32c1670, aMayWait=false, aResult=0x7fffc44fb8cf)
    at /home/sywu/work/mozilla-central/xpcom/threads/nsThread.cpp:823
#7  0x00007ffff0d8aaa5 in NS_ProcessPendingEvents (aThread=0x7fffd32c1670, aTimeout=4294967295)
    at /home/sywu/work/mozilla-central/xpcom/glue/nsThreadUtils.cpp:207
#8  0x00007ffff1e2a99c in mozilla::dom::workers::WorkerPrivate::ClearMainEventQueue (this=0x7fffc5d13800, 
    aRanOrNot=<optimized out>) at /home/sywu/work/mozilla-central/dom/workers/WorkerPrivate.cpp:4555
#9  0x00007ffff1e2ad82 in mozilla::dom::workers::WorkerPrivate::ScheduleDeletion (this=0x7fffc5d13800, aRanOrNot=
    mozilla::dom::workers::WorkerPrivate::WorkerRan)
    at /home/sywu/work/mozilla-central/dom/workers/WorkerPrivate.cpp:4312
#10 0x00007ffff1e0d530 in (anonymous namespace)::WorkerThreadPrimaryRunnable::Run (this=0x7fffc8603ee0)
    at /home/sywu/work/mozilla-central/dom/workers/RuntimeService.cpp:2822
#11 0x00007ffff0d683e0 in nsThread::ProcessNextEvent (this=0x7fffd32c1670, aMayWait=false, aResult=0x7fffc44fbcff)
    at /home/sywu/work/mozilla-central/xpcom/threads/nsThread.cpp:823
#12 0x00007ffff0d8a9e0 in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=false)
    at /home/sywu/work/mozilla-central/xpcom/glue/nsThreadUtils.cpp:265
#13 0x00007ffff1014f2e in mozilla::ipc::MessagePumpForNonMainThreads::Run (this=0x7fffca2fa700, aDelegate=
    0x7fffcc362aa0) at /home/sywu/work/mozilla-central/ipc/glue/MessagePump.cpp:339
#14 0x00007ffff0ff8a7e in MessageLoop::RunInternal (this=0x7fffcc362aa0)
    at /home/sywu/work/mozilla-central/ipc/chromium/src/base/message_loop.cc:234
#15 0x00007ffff0ff8af2 in RunHandler (this=0x7fffcc362aa0)
    at /home/sywu/work/mozilla-central/ipc/chromium/src/base/message_loop.cc:227
---Type <return> to continue, or q <return> to quit---
#16 MessageLoop::Run (this=0x7fffcc362aa0) at /home/sywu/work/mozilla-central/ipc/chromium/src/base/message_loop.cc:201
#17 0x00007ffff0d67ed1 in nsThread::ThreadFunc (aArg=0x7fffd32c1670)
    at /home/sywu/work/mozilla-central/xpcom/threads/nsThread.cpp:350
#18 0x00007ffff67e6de7 in _pt_root (arg=0x7fffc94698a0)
    at /home/sywu/work/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:212
#19 0x00007ffff7bc4e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#20 0x00007ffff6ee9ccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#21 0x0000000000000000 in ?? ()
(gdb) 

The problem is caused by Unpin() inside XMLHttpRequest::MaybeDispatchPrematureAbortEvents(), which added in bug 1014466.
http://dxr.mozilla.org/mozilla-central/source/dom/workers/XMLHttpRequest.cpp#1761

Once the XHR object been unpinned, the object will be deleted when closing worker.  Then the crash happens later when ProxyCompleteRunnable thread tries to unpin the dead object.

The purpose of unpin in when abort is to allow setting resopnseType again after abort().  Maybe we should find other way to acomplish it?  I don't have a good idea so far.
Flags: needinfo?(khuey)
This fixed the crash problem casued by Unpin().

When the send is aborted and before a new send, the mSeenLoadStart must be to false, and mReadyState could only be UNSENT/OPENED, we can use it to determined an aborted send.
Attachment #8492018 - Flags: review?(khuey)
Flags: needinfo?(khuey)
It would be nice if this could land soon. Beta6 is going to build today.
thanks
(In reply to Sylvestre Ledru [:sylvestre] from comment #15)
> It would be nice if this could land soon. Beta6 is going to build today.
> thanks

This code is fragile.  I wouldn't be comfortable taking this on branches without a couple weeks testing on mozilla-central.
OK. Thanks for the feedback.
Let's mark it as wontfix for 33 and uplift it to aurora (34) once we are sure it is safe.
Kyle - Last update was from over a month ago. You still going to work on this one?
Flags: needinfo?(khuey)
Comment on attachment 8492018 [details] [diff] [review]
Patch: Allow XHR worker to set overrideMimeType/responseType after an aborted send.

Review of attachment 8492018 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay.  I wanted to simplify this code since it's far too complex, but I'm not going to get to that in the near term :/

I am reasonably confident that Shian-Yow's patch is correct.  If it's not, we should see intermittent crashes in our testing.

r=me if you check a test in too.
Attachment #8492018 - Flags: review?(khuey) → review+
Kyle, this is the mochitest to reproduce crash, please help to review.
Assignee: khuey → swu
Attachment #8521196 - Flags: review?(khuey)
https://hg.mozilla.org/mozilla-central/rev/d75aafdbe1a6
https://hg.mozilla.org/mozilla-central/rev/14391d50460e
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Kyle suggests that it's best to let this ride a later train and not uplift a fix to 34. I'm marking 34 as won't fix. If this is shown to be safe and you want to nom the fix for 35, please submit an approval request.
Approval Request Comment
[Feature/regressing bug #]: bug 1014466
[User impact if declined]: Potential crash if abort and close the worker during XHR loading.
[Describe test coverage new/current, TBPL]: Manual test and mochitest.
[Risks and why]: Low.  The patch removed code which caused potential crash in bug 1014466, and fixed it in a different way.
[String/UUID change made/needed]: n/a
Attachment #8492018 - Attachment is obsolete: true
Attachment #8525286 - Flags: review+
Attachment #8525286 - Flags: approval-mozilla-aurora?
Approval Request Comment
[Feature/regressing bug #]: bug 1014466
[User impact if declined]: This is the test case.
[Describe test coverage new/current, TBPL]: Mochitest.
[Risks and why]: Low.  This is the test case.
[String/UUID change made/needed]: n/a
Attachment #8525288 - Flags: review?(khuey)
Attachment #8525288 - Flags: approval-mozilla-aurora?
The test case needs to be rebased for firefox35, because there is a path name difference between mozilla-aurora and mozilla-central (content/base/test -> dom/base/test).
Attachment #8525286 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8525288 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
checkin-needed for mozilla-aurora.  Thank you!
Keywords: checkin-needed
You don't need to set checkin-needed on uplifts. We have bug queries for them.
Keywords: checkin-needed
That is, when Release Management doesn't inadvertently set the wrong status flags :)
You need to log in before you can comment on or make changes to this bug.