Closed
Bug 1334465
Opened 8 years ago
Closed 7 years ago
[e10s] Crash in IPCError-browser | PHttpChannel::Msg_SetPriority Route error: message sent to unknown actor ID
Categories
(Core :: Networking, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla60
People
(Reporter: philipp, Assigned: valentin)
Details
(Keywords: crash, regression, Whiteboard: [necko-active])
Crash Data
Attachments
(1 file)
59 bytes,
text/x-review-board-request
|
bagder
:
review+
ritu
:
approval-mozilla-beta+
ritu
:
approval-mozilla-esr52+
|
Details |
This bug was filed from the Socorro interface and is
report bp-91708d65-f0f7-4262-8302-320302170127.
=============================================================
Crashing Thread (0)
Frame Module Signature Source
0 ntdll.dll KiFastSystemCallRet
1 ntdll.dll NtWaitForMultipleObjects
2 kernel32.dll CreateFileMappingA
3 user32.dll RealMsgWaitForMultipleObjectsEx
4 xul.dll mozilla::widget::WinUtils::WaitForMessage(unsigned long) widget/windows/WinUtils.cpp:792
5 xul.dll nsAppShell::ProcessNextNativeEvent(bool) widget/windows/nsAppShell.cpp:382
6 xul.dll nsBaseAppShell::DoProcessNextNativeEvent(bool) widget/nsBaseAppShell.cpp:138
7 xul.dll nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool) widget/nsBaseAppShell.cpp:289
8 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1189
9 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp:361
10 xul.dll mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:124
11 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:301
12 xul.dll _SEH_epilog4
crash reports in the content process with this signature are regressing in firefox 52 builds and later.
Updated•8 years ago
|
Component: Untriaged → Networking
Comment 1•8 years ago
|
||
FWIW, most (but not all) reports are coming from WinXP. The stacks are all over the place too. I don't think Networking is the right component, but I have no clue what the right one should be either. Any thoughts, Jim?
Flags: needinfo?(jmathies)
![]() |
||
Comment 2•8 years ago
|
||
Network looks right, this is an async ipdl message that ends up getting delivered after the receiver gets torn down. The stacks are going to be all over the place.
http://searchfox.org/mozilla-central/source/netwerk/protocol/http/PHttpChannel.ipdl#180
Flags: needinfo?(jmathies)
Comment 3•8 years ago
|
||
Jason, can you please help find an owner for this?
Flags: needinfo?(jduell.mcbugs)
Comment 4•8 years ago
|
||
Valentin can you look into this? You have repair some of crashes like this.
Looking into the history, the oldest build id is 20161102030205.
Your patch for bug 1294719 landed 5 days before 2.Nov. Bug 1294719 should have repaired such crashes :)
Flags: needinfo?(valentin.gosu)
Updated•8 years ago
|
Whiteboard: [necko-next]
Assignee | ||
Comment 5•8 years ago
|
||
This i
Assignee: nobody → valentin.gosu
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(jduell.mcbugs)
Whiteboard: [necko-next] → [necko-active]
Assignee | ||
Comment 6•8 years ago
|
||
This is very strange, since calls to SendSetPriority are guarded in both the parent and child, and bug 1294719 should have fixed any race condition that may occur when deleting the channel. I'll investigate more.
Updated•8 years ago
|
Updated•8 years ago
|
Comment 7•8 years ago
|
||
Too late for a fix for 53. Looks like most of the crashes have now moved to ESR52.
Comment 8•8 years ago
|
||
Bulk priority update: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Keywords: stale-bug
Comment 9•7 years ago
|
||
433 crashes on ESR52 in the last week. Is there something actionable we can do in this bug?
Flags: needinfo?(valentin.gosu)
Assignee | ||
Comment 10•7 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #9)
> 433 crashes on ESR52 in the last week. Is there something actionable we can
> do in this bug?
I think I figured this out. It turns out HttpChannelParent::mIPCClosed isn't atomic. This turns out to be a problem, as DoSendSetPriority sometimes gets called off-main-thread.
Good news is this patch is very easy to uplift to ESR, and risk free.
Flags: needinfo?(valentin.gosu)
Comment hidden (mozreview-request) |
Comment 12•7 years ago
|
||
mozreview-review |
Comment on attachment 8949700 [details]
Bug 1334465 - Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race
https://reviewboard.mozilla.org/r/219038/#review224944
Attachment #8949700 -
Flags: review?(daniel) → review+
Comment 13•7 years ago
|
||
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/bd315ae86709
Make HttpChannelParent::mIPCClosed atomic r=bagder
Assignee | ||
Comment 14•7 years ago
|
||
Comment on attachment 8949700 [details]
Bug 1334465 - Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race
Approval Request Comment
[Feature/Bug causing the regression]:
unknown. Bug was filed in Firefox 52.
[User impact if declined]:
The content process crashes, with most of the crashes being in esr 52.
[Fix Landed on Version]:
Firefox 60
[Is this code covered by automated tests?]:
No
[Has the fix been verified in Nightly?]:
Just landed.
[Needs manual test from QE? If yes, steps to reproduce]:
No. This is almost impossible to manually reproduce, especially on the latest versions.
[List of other uplifts needed for the feature/fix]:
None.
[Is the change risky?]:
No. It just changes a variable to be atomic in order to avoid data races.
[Why is the change risky/not risky?]:
It is a very simple change, unlikely to cause unintended behaviour.
[String changes made/needed]:
None.
Attachment #8949700 -
Flags: approval-mozilla-esr52?
Attachment #8949700 -
Flags: approval-mozilla-beta?
Comment 15•7 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox60:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Assignee | ||
Comment 16•7 years ago
|
||
Comment on attachment 8949700 [details]
Bug 1334465 - Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race
Canceling uplift as a followup patch is needed. (making the variable atomic doesn't completely solve the race)
Attachment #8949700 -
Flags: approval-mozilla-esr52?
Attachment #8949700 -
Flags: approval-mozilla-beta?
Comment hidden (mozreview-request) |
Comment 18•7 years ago
|
||
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/4ebdab0e3328
Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race r=bagder
Comment 19•7 years ago
|
||
bugherder |
Assignee | ||
Comment 20•7 years ago
|
||
Comment on attachment 8949700 [details]
Bug 1334465 - Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race
Approval Request Comment
[Feature/Bug causing the regression]:
Bug 1274886. Failures started in Firefox 52.
[User impact if declined]:
The content process crashes, with most of the crashes being in esr 52.
[Fix Landed on Version]:
Firefox 60
[Is this code covered by automated tests?]:
No
[Has the fix been verified in Nightly?]:
Just landed.
[Needs manual test from QE? If yes, steps to reproduce]:
No. This is almost impossible to manually reproduce, especially on the latest versions.
[List of other uplifts needed for the feature/fix]:
https://hg.mozilla.org/mozilla-central/rev/bd315ae86709 (also landed in this bug)
[Is the change risky?]:
No. It just makes a variable atomic and makes sure to set it before sending the "supposed to be last" IPC message.
[Why is the change risky/not risky?]:
It is a very simple change, unlikely to cause unintended behaviour.
[String changes made/needed]:
None.
Attachment #8949700 -
Flags: approval-mozilla-esr52?
Attachment #8949700 -
Flags: approval-mozilla-beta?
Comment 21•7 years ago
|
||
Backout by aiakab@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/e78a36e944d2
Backed out changeset 4ebdab0e3328 for frequently failing marionette\test_refresh_firefox.py a=backout
Comment 22•7 years ago
|
||
Backed out changeset 4ebdab0e3328 (bug 1334465) for frequently failing marionette\test_refresh_firefox.py a=backout
Link to the log of failurehttps://treeherder.mozilla.org/logviewer.html#?job_id=161636850&repo=autoland&lineNumber=48779
Backou revision e78a36e944d2
Failing push https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=bd315ae86709c3459a3dbf0778022ff3b1908723&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=success&filter-searchStr=marionette
Status: RESOLVED → REOPENED
status-firefox60:
fixed → ---
Flags: needinfo?(valentin.gosu)
Resolution: FIXED → ---
Target Milestone: mozilla60 → ---
Assignee | ||
Comment 23•7 years ago
|
||
(In reply to Arthur Iakab [arthur_iakab] from comment #22)
> Backed out changeset 4ebdab0e3328 (bug 1334465) for frequently failing
> marionette\test_refresh_firefox.py a=backout
That's really really odd. That patch really shouldn't be causing such issues.
From what I can tell in bug 1425323 failures started a few weeks ago, but for some reason they started to spike with my patch. I'll try to figure out why.
Flags: needinfo?(valentin.gosu)
Comment 24•7 years ago
|
||
Thanks Valentin. Perhaps with this spike we can figure out what's actually causing it. If you want to run this test execute the following command and get all the log info:
> mach marionette test -vv --gecko-log - %path_to_test_file%
Comment 25•7 years ago
|
||
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fc4ad53516e0
Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race r=bagder
![]() |
||
Comment 26•7 years ago
|
||
Relanded as https://hg.mozilla.org/integration/mozilla-inbound/rev/fc4ad53516e01095be35542fd979c9e16d6e6b16 because the backout didn't fix the issue.
See https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=cb9b9f2707a33a1a85d5c2e74e2cd75a8afa337e&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable&filter-resultStatus=pending&filter-resultStatus=running
![]() |
||
Comment 27•7 years ago
|
||
Backed out again. The previous backout only reverted the follow-up fix:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f49cf3d147dee2eb1fd2b726cfbae6d01b719d55
Flags: needinfo?(valentin.gosu)
Comment 28•7 years ago
|
||
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/mozilla-central/rev/bad527a1b855
Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race r=bagder a=Aryx
https://hg.mozilla.org/mozilla-central/rev/db42a5fc7587
Backed out 2 changesets for frequently failing marionette\test_refresh_firefox.py a=Aryx
![]() |
||
Comment 29•7 years ago
|
||
Relanding: https://hg.mozilla.org/mozilla-central/rev/bad527a1b855
Backout of both patches: https://hg.mozilla.org/mozilla-central/rev/db42a5fc7587
![]() |
||
Comment 30•7 years ago
|
||
There are still 2 of those failures on the background push, so far. https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=db42a5fc758778ce332dac0c9026ced63ab42a95&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Assignee | ||
Comment 31•7 years ago
|
||
I see a bunch of failures since the full backout. I take it that confirms that we can land it back again?
Flags: needinfo?(valentin.gosu) → needinfo?(aryx.bugmail)
status-firefox59:
--- → affected
status-firefox60:
--- → affected
Comment on attachment 8949700 [details]
Bug 1334465 - Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race
content crash fix, Beta59+, ESR52+
Attachment #8949700 -
Flags: approval-mozilla-esr52?
Attachment #8949700 -
Flags: approval-mozilla-esr52+
Attachment #8949700 -
Flags: approval-mozilla-beta?
Attachment #8949700 -
Flags: approval-mozilla-beta+
Comment 33•7 years ago
|
||
bugherder uplift |
![]() |
||
Updated•7 years ago
|
Flags: needinfo?(aryx.bugmail)
Comment 34•7 years ago
|
||
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/mozilla-inbound/rev/48b9f6671588
Make HttpChannelParent::mIPCClosed atomic r=bagder
https://hg.mozilla.org/integration/mozilla-inbound/rev/cfc9afe91609
Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race r=bagder
Comment 35•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/48b9f6671588
https://hg.mozilla.org/mozilla-central/rev/cfc9afe91609
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Updated•7 years ago
|
status-firefox58:
--- → wontfix
Keywords: stale-bug
Comment 36•7 years ago
|
||
bugherder uplift |
You need to log in
before you can comment on or make changes to this bug.
Description
•