Closed Bug 1334465 Opened 3 years ago Closed 2 years ago

[e10s] Crash in IPCError-browser | PHttpChannel::Msg_SetPriority Route error: message sent to unknown actor ID

Categories

(Core :: Networking, defect, P1, critical)

52 Branch
All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox-esr45 --- unaffected
firefox51 --- unaffected
firefox52 --- wontfix
firefox-esr52 59+ fixed
firefox53 --- wontfix
firefox54 --- wontfix
firefox55 --- wontfix
firefox58 --- wontfix
firefox59 --- fixed
firefox60 --- fixed

People

(Reporter: philipp, Assigned: valentin.gosu)

Details

(Keywords: crash, regression, Whiteboard: [necko-active])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-91708d65-f0f7-4262-8302-320302170127.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	ntdll.dll 	KiFastSystemCallRet 	
1 	ntdll.dll 	NtWaitForMultipleObjects 	
2 	kernel32.dll 	CreateFileMappingA 	
3 	user32.dll 	RealMsgWaitForMultipleObjectsEx 	
4 	xul.dll 	mozilla::widget::WinUtils::WaitForMessage(unsigned long) 	widget/windows/WinUtils.cpp:792
5 	xul.dll 	nsAppShell::ProcessNextNativeEvent(bool) 	widget/windows/nsAppShell.cpp:382
6 	xul.dll 	nsBaseAppShell::DoProcessNextNativeEvent(bool) 	widget/nsBaseAppShell.cpp:138
7 	xul.dll 	nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool) 	widget/nsBaseAppShell.cpp:289
8 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1189
9 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/glue/nsThreadUtils.cpp:361
10 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:124
11 	xul.dll 	mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:301
12 	xul.dll 	_SEH_epilog4

crash reports in the content process with this signature are regressing in firefox 52 builds and later.
Component: Untriaged → Networking
FWIW, most (but not all) reports are coming from WinXP. The stacks are all over the place too. I don't think Networking is the right component, but I have no clue what the right one should be either. Any thoughts, Jim?
Flags: needinfo?(jmathies)
Network looks right, this is an async ipdl message that ends up getting delivered after the receiver gets torn down. The stacks are going to be all over the place.

http://searchfox.org/mozilla-central/source/netwerk/protocol/http/PHttpChannel.ipdl#180
Flags: needinfo?(jmathies)
Jason, can you please help find an owner for this?
Flags: needinfo?(jduell.mcbugs)
Valentin can you look into this? You have repair some of crashes like this.

Looking into the history, the oldest build id is 20161102030205.

Your patch for bug 1294719 landed 5 days before 2.Nov. Bug 1294719 should have repaired such crashes :)
Flags: needinfo?(valentin.gosu)
Whiteboard: [necko-next]
This i
Assignee: nobody → valentin.gosu
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(jduell.mcbugs)
Whiteboard: [necko-next] → [necko-active]
This is very strange, since calls to SendSetPriority are guarded in both the parent and child, and bug 1294719 should have fixed any race condition that may occur when deleting the channel. I'll investigate more.
Too late for a fix for 53. Looks like most of the crashes have now moved to ESR52.
Bulk priority update: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
433 crashes on ESR52 in the last week. Is there something actionable we can do in this bug?
Flags: needinfo?(valentin.gosu)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #9)
> 433 crashes on ESR52 in the last week. Is there something actionable we can
> do in this bug?

I think I figured this out. It turns out HttpChannelParent::mIPCClosed isn't atomic. This turns out to be a problem, as DoSendSetPriority sometimes gets called off-main-thread.
Good news is this patch is very easy to uplift to ESR, and risk free.
Flags: needinfo?(valentin.gosu)
Comment on attachment 8949700 [details]
Bug 1334465 - Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race

https://reviewboard.mozilla.org/r/219038/#review224944
Attachment #8949700 - Flags: review?(daniel) → review+
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/bd315ae86709
Make HttpChannelParent::mIPCClosed atomic r=bagder
Comment on attachment 8949700 [details]
Bug 1334465 - Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race

Approval Request Comment

[Feature/Bug causing the regression]:
unknown. Bug was filed in Firefox 52.

[User impact if declined]:
The content process crashes, with most of the crashes being in esr 52.

[Fix Landed on Version]:
Firefox 60

[Is this code covered by automated tests?]:
No

[Has the fix been verified in Nightly?]:
Just landed.

[Needs manual test from QE? If yes, steps to reproduce]:
No. This is almost impossible to manually reproduce, especially on the latest versions.
 
[List of other uplifts needed for the feature/fix]:
None.

[Is the change risky?]:
No. It just changes a variable to be atomic in order to avoid data races.

[Why is the change risky/not risky?]:
It is a very simple change, unlikely to cause unintended behaviour.

[String changes made/needed]:
None.
Attachment #8949700 - Flags: approval-mozilla-esr52?
Attachment #8949700 - Flags: approval-mozilla-beta?
https://hg.mozilla.org/mozilla-central/rev/bd315ae86709
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Comment on attachment 8949700 [details]
Bug 1334465 - Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race

Canceling uplift as a followup patch is needed. (making the variable atomic doesn't completely solve the race)
Attachment #8949700 - Flags: approval-mozilla-esr52?
Attachment #8949700 - Flags: approval-mozilla-beta?
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/4ebdab0e3328
Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race r=bagder
Comment on attachment 8949700 [details]
Bug 1334465 - Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race

Approval Request Comment

[Feature/Bug causing the regression]:
Bug 1274886. Failures started in Firefox 52.

[User impact if declined]:
The content process crashes, with most of the crashes being in esr 52.

[Fix Landed on Version]:
Firefox 60

[Is this code covered by automated tests?]:
No

[Has the fix been verified in Nightly?]:
Just landed.

[Needs manual test from QE? If yes, steps to reproduce]:
No. This is almost impossible to manually reproduce, especially on the latest versions.
 
[List of other uplifts needed for the feature/fix]:
https://hg.mozilla.org/mozilla-central/rev/bd315ae86709 (also landed in this bug)

[Is the change risky?]:
No. It just makes a variable atomic and makes sure to set it before sending the "supposed to be last" IPC message.

[Why is the change risky/not risky?]:
It is a very simple change, unlikely to cause unintended behaviour.

[String changes made/needed]:
None.
Attachment #8949700 - Flags: approval-mozilla-esr52?
Attachment #8949700 - Flags: approval-mozilla-beta?
Backout by aiakab@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/e78a36e944d2
Backed out changeset 4ebdab0e3328 for frequently failing marionette\test_refresh_firefox.py a=backout
Backed out changeset 4ebdab0e3328 (bug 1334465) for frequently failing marionette\test_refresh_firefox.py a=backout
Link to the log of failurehttps://treeherder.mozilla.org/logviewer.html#?job_id=161636850&repo=autoland&lineNumber=48779
Backou revision e78a36e944d2
Failing push https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=bd315ae86709c3459a3dbf0778022ff3b1908723&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=success&filter-searchStr=marionette
Status: RESOLVED → REOPENED
Flags: needinfo?(valentin.gosu)
Resolution: FIXED → ---
Target Milestone: mozilla60 → ---
(In reply to Arthur Iakab [arthur_iakab] from comment #22)
> Backed out changeset 4ebdab0e3328 (bug 1334465) for frequently failing
> marionette\test_refresh_firefox.py a=backout

That's really really odd. That patch really shouldn't be causing such issues.
From what I can tell in bug 1425323 failures started a few weeks ago, but for some reason they started to spike with my patch. I'll try to figure out why.
Flags: needinfo?(valentin.gosu)
Thanks Valentin. Perhaps with this spike we can figure out what's actually causing it. If you want to run this test execute the following command and get all the log info:

> mach marionette test -vv --gecko-log - %path_to_test_file%
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fc4ad53516e0
Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race r=bagder
Backed out again. The previous backout only reverted the follow-up fix:

https://hg.mozilla.org/integration/mozilla-inbound/rev/f49cf3d147dee2eb1fd2b726cfbae6d01b719d55
Flags: needinfo?(valentin.gosu)
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/mozilla-central/rev/bad527a1b855
Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race r=bagder a=Aryx
https://hg.mozilla.org/mozilla-central/rev/db42a5fc7587
Backed out 2 changesets for frequently failing marionette\test_refresh_firefox.py a=Aryx
I see a bunch of failures since the full backout. I take it that confirms that we can land it back again?
Flags: needinfo?(valentin.gosu) → needinfo?(aryx.bugmail)
Comment on attachment 8949700 [details]
Bug 1334465 - Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race

content crash fix, Beta59+, ESR52+
Attachment #8949700 - Flags: approval-mozilla-esr52?
Attachment #8949700 - Flags: approval-mozilla-esr52+
Attachment #8949700 - Flags: approval-mozilla-beta?
Attachment #8949700 - Flags: approval-mozilla-beta+
Flags: needinfo?(aryx.bugmail)
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/mozilla-inbound/rev/48b9f6671588
Make HttpChannelParent::mIPCClosed atomic r=bagder
https://hg.mozilla.org/integration/mozilla-inbound/rev/cfc9afe91609
Set mIPCClosed to true before calling SendDeleteSelf in order to avoid race r=bagder
https://hg.mozilla.org/mozilla-central/rev/48b9f6671588
https://hg.mozilla.org/mozilla-central/rev/cfc9afe91609
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
You need to log in before you can comment on or make changes to this bug.