Closed
Bug 1368600
Opened 7 years ago
Closed 6 years ago
Firefox isn't launched with e10s+sandbox after updating it
Categories
(Core :: Security: Process Sandboxing, defect)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox53 | --- | unaffected |
firefox54 | --- | unaffected |
firefox55 | blocking | fixed |
People
(Reporter: m_kato, Assigned: bobowen)
References
Details
(Keywords: regression, Whiteboard: sb+)
Attachments
(1 file)
3.38 KB,
patch
|
jimm
:
review+
|
Details | Diff | Splinter Review |
After updating Nightly, I cannot sometimes launch Nightly with e10s+content sandbox. When debugging this, image_path into memory isn't same as process_path. But I don't know why image_path by GetImageFilePath (GetMappedFileNameW) is *\updated\firefox.exe. 0:008> x @rsi process = 0x00000000`0000027c 0000002f`951fec70 process_path = "\Device\HarddiskVolume3\Program Files\Nightly\firefox.exe" 0000002f`951feb18 mem_info = struct _MEMORY_BASIC_INFORMATION 0000002f`951fec50 image_path = "\Device\HarddiskVolume3\Program Files\Nightly\updated\firefox.exe" 0:008> k Child-SP RetAddr Call Site 0000002f`951feaf0 00007ff7`6170b50c firefox!sandbox::GetProcessBaseAddress+0x9f 0000002f`951fecc0 00007ff7`616f37a0 firefox!sandbox::TargetProcess::Create+0x1cc 0000002f`951fedb0 00007ffe`15d8f1ee firefox!sandbox::BrokerServicesBase::SpawnTarget+0x3f0 0000002f`951ff0f0 00007ffe`15f981be xul!mozilla::SandboxBroker::LaunchApp+0x1c6 0000002f`951ff400 00007ffe`15f97ab5 xul!mozilla::ipc::GeckoChildProcessHost::PerformAsyncLaunchInternal+0x592 0000002f`951ff620 00007ffe`15f99419 xul!mozilla::ipc::GeckoChildProcessHost::PerformAsyncLaunch+0x61 0000002f`951ff750 00007ffe`15f98f49 xul!mozilla::ipc::GeckoChildProcessHost::RunPerformAsyncLaunch+0x45 (Inline Function) --------`-------- xul!mozilla::detail::RunnableMethodArguments<std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,enum base::ProcessArchitecture>::applyImpl+0x1a (Inline Function) --------`-------- xul!mozilla::detail::RunnableMethodArguments<std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,enum base::ProcessArchitecture>::apply+0x1a 0000002f`951ff7b0 00007ffe`15456ac0 xul!mozilla::detail::RunnableMethodImpl<mozilla::ipc::GeckoChildProcessHost * __ptr64 const,bool (__cdecl mozilla::ipc::GeckoChildProcessHost::*)(std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,enum base::ProcessArchitecture) __ptr64,0,0,std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,enum base::ProcessArchitecture>::Run+0x39 0000002f`951ff800 00007ffe`15456a94 xul!MessageLoop::RunTask+0x24 0000002f`951ff830 00007ffe`1552856c xul!MessageLoop::DeferOrRunPendingTask+0x24 0000002f`951ff860 00007ffe`15796c0a xul!MessageLoop::DoWork+0x314 0000002f`951ff910 00007ffe`15797423 xul!base::MessagePumpForIO::DoRunLoop+0x1a (Inline Function) --------`-------- xul!base::MessagePumpWin::RunWithDispatcher+0x33 0000002f`951ff940 00007ffe`157976c3 xul!base::MessagePumpWin::Run+0x3f 0000002f`951ff990 00007ffe`15797686 xul!MessageLoop::RunHandler+0x1b 0000002f`951ff9c0 00007ffe`15797571 xul!MessageLoop::Run+0x3e 0000002f`951ffa10 00007ffe`1579748a xul!base::Thread::ThreadMain+0xdd 0000002f`951ffb70 00007ffe`54ad2774 xul!`anonymous namespace'::ThreadFunc+0xa 0000002f`951ffba0 00007ffe`567e0d61 KERNEL32!BaseThreadInitThunk+0x14 0000002f`951ffbd0 00000000`00000000 ntdll!RtlUserThreadStart+0x21
Reporter | ||
Updated•7 years ago
|
Summary: Firefox doesn't launch with e10s+sandbox after updating it → Firefox isn't launched with e10s+sandbox after updating it
Reporter | ||
Comment 1•7 years ago
|
||
Until rebooting Windows, it cannot be launched.
Assignee | ||
Comment 2•7 years ago
|
||
As part of the sandbox launch, the parent process has to patch things in the suspended child process before it releases it. It has to load the image to calculate where it needs to do that. If the images are getting out of sync, I suspect that isn't going to end well. Not sure what \Device\HarddiskVolume3\Program Files\Nightly\updated\firefox.exe is, I guess it's part of the updater. Given the need to reboot it sounds like there might be some sort of locking problem. rstrong - any chance this could be a similar issue to bug 1352192, where we launch a child process during update?
Flags: needinfo?(robert.strong.bugs)
![]() |
||
Comment 3•7 years ago
|
||
The Nightly\updated\firefox.exe is a copy of Nightly\firefox.exe that is then updated while Firefox is running and it shouldn't require a reboot to launch firefox.exe. With bug 1352192 what happens is one instance updates the installation while another instance is running and when e10s launches a binary it ends up launching the updated binary. This might be related somehow. Any idea how running code could end up referencing the copied firefox.exe?
Flags: needinfo?(robert.strong.bugs)
![]() |
||
Comment 4•7 years ago
|
||
Reference bug that implemented update staging - bug 307181
![]() |
||
Comment 5•7 years ago
|
||
So, this appears to be happening after the update has been staged and before the staged update has been moved to the installation directory.
![]() |
||
Updated•7 years ago
|
Whiteboard: sb+
Comment 6•7 years ago
|
||
[Tracking Requested - why for this release]: Assuming mconley is right, bug 1372222 and bug 1371714 are also this bug. I believe based on that we should block 55b1: I don't think we know how often this is happening to users, but it leaves the browser horribly broken. Is this a recent regression specifically on windows creator update? Or getting more prevalent with e10s-multi or a more recent sandboxing change?
status-firefox55:
--- → affected
tracking-firefox55:
--- → ?
I haven't experienced this before the creator update, but have seen it twice since - in a week. I don't know what sandboxing changes recently happened though, so that could have been a coincidence.
Assignee | ||
Comment 8•7 years ago
|
||
The sandbox code, that is getting mismatched paths, landed as part of an update to the chromium sandbox over 2 months ago. So, it looks like it might be an interaction between our updater, this sandbox code and some change in the Creators update. I'm just manually updating my laptop, to see if I can reproduce.
Comment 9•7 years ago
|
||
CCing Ritu due to the IRC discussion about this potentially being considered a 55b1 blocker.
status-firefox53:
--- → unaffected
status-firefox54:
--- → unaffected
status-firefox-esr52:
--- → unaffected
Keywords: regression
I am going to track this as a blocker for now until we know more on the frequency of this problem.
(In reply to Ritu Kothari (:ritu) from comment #10) > I am going to track this as a blocker for now until we know more on the > frequency of this problem. Do we have a way of doing this? We should at least, and as fast as we can, put in some kind of a message/telemetry/whatever when this happens. After all, we actually make the "terminate content process" call, so we should be able to track when we've made it and if we've ever gotten by that call successfully in the session. And if not, we should restart with e10s off. Bob, do you know if we have a way of measuring the frequency/size of the problem?
Flags: needinfo?(bobowencode)
Assignee | ||
Comment 12•7 years ago
|
||
This should add Telemtry when it fails to launch with the error code.
Attachment #8876859 -
Flags: review?(jmathies)
Attachment #8876859 -
Flags: feedback?(benjamin)
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → bobowencode
Status: NEW → ASSIGNED
With beta staged rollout (starting with Dawn project), we'll push 55.0b1 and 55.0b2 < 25% of beta population. That should give us some indication of how severe this problem is. At the same time, Milan suggested we get in touch with MS to find a good fix. I will gtb 55.0b1 with Bob's patch to get more telemetry data on sandbox launch failures.
![]() |
||
Comment 14•7 years ago
|
||
Comment on attachment 8876859 [details] [diff] [review] Add telemetry for when a sandboxed child process fails to start r+ a telemetry probe that accumulates process creation failure results.
Attachment #8876859 -
Flags: review?(jmathies) → review+
(In reply to Ritu Kothari (:ritu) from comment #13) > ... > > At the same time, Milan suggested we get in touch with MS to find a good fix. Include in this conversation somebody that knows how we restart Windows builds after update, and why we'd have the updated/ in the path (comment 0.) 'Cause I can't explain how we do it, so I won't be able to ask the right questions :)
Assignee | ||
Comment 16•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=242569bedf9b2873927f47752df1dd1b249ef7cc
Flags: needinfo?(bobowencode)
FWIW, I just updated my win10 to creators edition, no trouble at all (so far) launching or browsing on Nightly56 and Beta54 RC.
Comment 18•7 years ago
|
||
Comment on attachment 8876859 [details] [diff] [review] Add telemetry for when a sandboxed child process fails to start Review of attachment 8876859 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/telemetry/Histograms.json @@ +12477,5 @@ > + "kind": "enumerated", > + "n_values": 50, > + "bug_numbers": [1368600], > + "cpp_guard": "XP_WIN", > + "description": "Error code when a Windows sandboxed process fails to launch." What does each error code mean? Could you link from the description to a place where the error codes are defined?
Comment 19•7 years ago
|
||
(In reply to François Marier [:francois] from comment #18) > Comment on attachment 8876859 [details] [diff] [review] > Add telemetry for when a sandboxed child process fails to start > > Review of attachment 8876859 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: toolkit/components/telemetry/Histograms.json > @@ +12477,5 @@ > > + "kind": "enumerated", > > + "n_values": 50, > > + "bug_numbers": [1368600], > > + "cpp_guard": "XP_WIN", > > + "description": "Error code when a Windows sandboxed process fails to launch." > > What does each error code mean? Could you link from the description to a > place where the error codes are defined? Given that this is blocking 55b1 and that Bob won't get to it until tomorrow, I'm ok if this is addressed in a follow-up bug. datareview+ with the above follow-up work.
Thanks Francois. Hi Ryan, can we please get this landed on m-b soon? I'd like to gtb 55.0b1 in the next hour or two if possible.
Flags: needinfo?(ryanvm)
Comment 21•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/4bd58be05340
Assignee | ||
Updated•7 years ago
|
Attachment #8876859 -
Flags: feedback?(benjamin)
Comment 22•7 years ago
|
||
Pushed by bobowencode@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/e62914155f64 Add telemetry with the error code for when a Windows sandboxed child process fails to start. r=jimm, data-review=francois
Assignee | ||
Updated•7 years ago
|
Keywords: leave-open
Assignee | ||
Comment 23•7 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #15) > (In reply to Ritu Kothari (:ritu) from comment #13) > > ... > > > > At the same time, Milan suggested we get in touch with MS to find a good fix. > > Include in this conversation somebody that knows how we restart Windows > builds after update, and why we'd have the updated/ in the path (comment 0.) > 'Cause I can't explain how we do it, so I won't be able to ask the right > questions :) I imagine rstrong is the best person from an updater point of view. From looking at the code, I believe the staged update process is as follows (assuming default 64-bit paths for Nightly): * Copy from "C:\Program Files\Nightly" to "C:\Program Files\Nightly\updated" * Apply update to "C:\Program Files\Nightly\updated" * When restart is triggered ... launch updater.exe and shutdown Firefox. * updater renames "C:\Program Files\Nightly" to "C:\Program Files\Nightly.bak" * updater renames "C:\Program Files\Nightly.bak\updated" to "C:\Program Files\Nightly" * updater deletes "C:\Program Files\Nightly.bak" * updater relaunches Firefox It appears that somehow the "C:\Program Files\Nightly\updated\firefox.exe" is getting mapped into memory, prior to the rename to "C:Program Files\Nightly\firefox.exe". Then even though the child process is getting launched using "C:Program Files\Nightly\firefox.exe", the call to GetMappedFileNameW is still returning "C:\Program Files\Nightly\updated\firefox.exe". I've had no luck in trying to reproduce. I also tried running the firefox.exe in updated using a different profile and then triggering the restart and it seems to work fine, although the updated dir doesn't get deleted.
![]() |
||
Comment 24•7 years ago
|
||
I'm really at a loss as to how GetMappedFileNameW could be returning "C:\Program Files\Nightly\updated\firefox.exe", etc. Staging doesn't save much if any time over applying an update directly to the install directory on Windows. As a matter of fact, staging was broken on Windows for a release and since it falls back to applying directly to the install directory no one noticed it since updates were still successful. If anyone can reproduce this please try setting the app.update.staging preference to false and try to reproduce. Thanks!
![]() |
||
Comment 25•7 years ago
|
||
Bug 1370576 might also be related. I've also asked the reporter to check if it can be reproduced without staging.
Assignee | ||
Comment 26•7 years ago
|
||
(In reply to Bob Owen (:bobowen) from comment #23) ... > I've had no luck in trying to reproduce. > I also tried running the firefox.exe in updated using a different profile > and then triggering the restart and it seems to work fine, although the > updated dir doesn't get deleted. OK, I've managed to reproduce artificially. rstrong's comment made me realise that the above was probably blocking the renaming and it was just updating in place, which is why the updated dir was still there. If I run the ...\updated\firefox.exe with a different profile, then close it and then trigger the restart I get the problem. The mapped image name is being returned as ...\updated\firefox.exe just as in comment 0. So, it does look like the image is getting mapped and then the mapped name is not being updated correctly when it is renamed. What is not clear is how the image is getting mapped before the rename - virus checker maybe? The same steps on my pre-Creators update Win10 VM, does not cause the issue. Just about to update the VM as well to test. Clearly turning off the staged update would make these artificial steps impossible, but it would seem that it would fix the real problem as well.
![]() |
||
Comment 27•7 years ago
|
||
It looks like staging is also causing bug 1370576 and I suspect it is the same root cause.
![]() |
||
Comment 28•7 years ago
|
||
Can the updater terminate running firefox.exe processes vs. waiting for them to shutdown?
![]() |
||
Updated•7 years ago
|
![]() |
||
Comment 29•7 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #28) > Can the updater terminate running firefox.exe processes vs. waiting for them > to shutdown? From what I can tell by the logs in bug 1370576 the firefox.exe process has exited when this happens. Also, if something like this were to be done it would probably be best if it were done by the WatchDog code.
![]() |
||
Comment 30•7 years ago
|
||
Jim, do you have any idea why the copy of firefox.exe in the updated directory is getting mapped even though it isn't launched until after it is in the installation directory? I suspect Windows is doing something under the covers to cause this.
![]() |
||
Comment 31•7 years ago
|
||
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #29) > (In reply to Jim Mathies [:jimm] from comment #28) > > Can the updater terminate running firefox.exe processes vs. waiting for them > > to shutdown? > From what I can tell by the logs in bug 1370576 the firefox.exe process has > exited when this happens. The logs show that the process exited and the updater is able to get an exclusive file lock on firefox.exe so it has exited.
Comment 32•7 years ago
|
||
bugherder uplift |
Uplifted the addressed review comments to Beta as well: https://hg.mozilla.org/releases/mozilla-beta/rev/979d797835ad
Assignee | ||
Comment 33•7 years ago
|
||
(In reply to Bob Owen (:bobowen) from comment #26) > ... > The same steps on my pre-Creators update Win10 VM, does not cause the issue. > Just about to update the VM as well to test. Update finally finished and the issue is now reproducible with those steps, so it definitely points to a change in the Creators update.
![]() |
||
Comment 34•7 years ago
|
||
Bob, could you check that this isn't reproducible when the app.update.staging.enabled preference is set to false? Thanks!
(In reply to Bob Owen (:bobowen) from comment #26) > What is not clear is how the image is getting mapped before the rename - > virus checker maybe? I don't have anything beyond the Windows Defender (which is telling me that my apps and browser control is off and that I should do something about it.)
![]() |
||
Comment 36•7 years ago
|
||
Note: I just landed the patch in Bug 1370576 on mozilla-inbound to disable update staging on Windows.
Assignee | ||
Comment 37•7 years ago
|
||
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #34) > Bob, could you check that this isn't reproducible when the > app.update.staging.enabled preference is set to false? Thanks! As I said I'm manually running the firefox.exe in updated before triggering the rest of the update, so it would definitely stop those steps, but I haven't seen the issue during a normal update.
![]() |
||
Comment 38•7 years ago
|
||
Oh, I thought you reproduced without using artificial steps as well. The reason I was hoping for non artificial steps is because the firefox.exe in updated isn't launched.
I think I might have been two updates behind when I ran into this problem, but I'm not 100% sure of this. As in, I was still running Wednesday's version on Friday, and did an update from the hamburger menu.
Comment 40•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/e62914155f64
Updated•7 years ago
|
Flags: needinfo?(ryanvm)
Assignee | ||
Comment 42•6 years ago
|
||
This should have been fixed even with staged updates with the patch from bug 1385928.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Depends on: 1385928
Keywords: leave-open
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•