Open Bug 1352192 Opened 8 years ago Updated 1 year ago

Crash in Windows sandbox after chromium update due to binary version mismatch during update.

Categories

(Toolkit :: Application Update, defect, P3)

Unspecified
Windows 10
defect

Tracking

Tracking Status
firefox53 + wontfix
firefox54 + wontfix
firefox55 + wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox134 - ---

People

(Reporter: mccr8, Unassigned)

References

Details

(Keywords: crash, Whiteboard: sb+)

Crash Data

This bug was filed from the Socorro interface and is report bp-302f9e81-19ea-4b6a-84a5-a85a42170330.
=============================================================
I see about 40 of these crashes across all branches in the last 7 days, so this isn't a big stability issue, but maybe it is an interesting failure.
Hmm, that spike on the 24th possibly points to an issue with the update to the chromium sandbox code (bug 1337331). That was backed out and re-landed late on the 29th, so it should have been in yesterday's Nightly; if that is the cause, we should see it start to spike again. I can only guess that mPolicy is messed up somehow, because that function in theory always returns sandbox::SBOX_ALL_OK. But the calls above it have worked.
Flags: needinfo?(bobowencode)
Actually over half of the crashes in the last 3 days are on Aurora, so maybe it's not (or not entirely) to do with the chromium update. I'll continue to keep an eye on this.
In the last 24 hours this has started happening on several different 53 Beta versions. mccr8: Any idea what could land that could affect previous versions like this? Some hot fix or something? (Also, NIing lizzard so she's aware of this.) One other weird thing is that the line it reports as failing on beta isn't even in those versions. So, now I'm wondering if bug 1344453 is causing this somehow during update, so that it reports the wrong version number?? The uplift of that bug matches the occurrences of this.
Flags: needinfo?(lhenry)
Flags: needinfo?(continuation)
I'm not really sure how to get the change sets that went into a particular beta. Hopefully Liz can do that.
Flags: needinfo?(continuation)
You can get them from the pushlog, I just fiddle with the url to look at what went into m-b between particular betas, for example: https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=FIREFOX_53_0b8_RELEASE&tochange=FIREFOX_53_0b9_RELEASE
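The URL fiddling described above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and the tag pattern is inferred from the single example URL in the comment rather than from any documented naming rule.

```python
from urllib.parse import urlencode

# Base pushlog page for mozilla-beta, taken from the example URL above.
PUSHLOG_BASE = "https://hg.mozilla.org/releases/mozilla-beta/pushloghtml"


def beta_release_tag(major: int, beta: int) -> str:
    """Return a release tag such as FIREFOX_53_0b8_RELEASE.

    Assumption: the pattern is inferred from the one example URL and is
    only meant for x.0 beta releases.
    """
    return f"FIREFOX_{major}_0b{beta}_RELEASE"


def beta_pushlog_url(from_tag: str, to_tag: str) -> str:
    """Build a pushlog URL listing what landed between two tags."""
    query = urlencode({"fromchange": from_tag, "tochange": to_tag})
    return f"{PUSHLOG_BASE}?{query}"


if __name__ == "__main__":
    print(beta_pushlog_url(beta_release_tag(53, 8), beta_release_tag(53, 9)))
```

For 53 beta 8 and beta 9 this reproduces the URL given in the comment.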
OK new theory. I realised the first spike couldn't have been bug 1344453 as that hadn't landed at all, so that must have been bug 1337331.

Now the two things these changesets have in common (aside from the fact they are mine :-) ), is that they contain updates to the chromium sandbox code. Now the sandbox code is linked into firefox.exe not xul.dll and we pass pointers to the sandbox::BrokerServices from firefox.exe through to xul.dll. We use the broker services to create the sandbox::TargetPolicy (mPolicy) that is blowing up on us.

What I think is happening is during the update, in some circumstances we end up with an old firefox.exe running with a new xul.dll (or possibly vice versa). This also explains why it is reported on the wrong/old versions.

I tried out this theory, by installing Fx53b8 and copying in the xul.dll from Fx53b9 and hey presto, it produces the exact same crash: https://crash-stats.mozilla.com/report/index/f513bfb0-3b7f-4f10-a598-ec26a2170405

So, this really seems to indicate some problems with the update process that means we can get a mismatch between firefox.exe and xul.dll. This would probably have caused other seemingly spurious (and temporary) issues/crashes in the past.

The good news is the spike should be temporary, the bad news is we'll probably get a bit of a rough ride into release as well.
Flags: needinfo?(bobowencode)
Nice. This might be another thing that would be helped by bug 1112937.
Depends on: 1112937
I realised I could confirm this, by looking at some of the dumps. I can see the mismatched older version of firefox.exe and newer version of xul.dll. It was always this way round. In a couple of the ones I looked at firefox.exe had been copied or renamed, which would explain the problem in those cases, but most of them were not like this.
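The check described above (comparing the firefox.exe and xul.dll versions in a dump's module list) could be automated along these lines. This is a rough sketch, not the tooling actually used: the `Module` record, the function names and the version strings are all hypothetical, and it assumes the module list has already been extracted from a crash report into (filename, version) pairs.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Module(NamedTuple):
    """One entry of a crash report's module list (hypothetical shape)."""
    filename: str
    version: str  # dotted numeric version, e.g. "53.0.0.1" (made-up value)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '53.0.0.1' into (53, 0, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def find_mismatch(modules: Iterable[Module],
                  exe: str = "firefox.exe",
                  lib: str = "xul.dll") -> Optional[str]:
    """Describe an exe/lib version mismatch, or return None.

    None is also returned when either module is missing from the list,
    which covers the case where firefox.exe was copied or renamed.
    """
    versions = {m.filename.lower(): m.version for m in modules}
    exe_version = versions.get(exe)
    lib_version = versions.get(lib)
    if exe_version is None or lib_version is None:
        return None
    exe_parsed = parse_version(exe_version)
    lib_parsed = parse_version(lib_version)
    if exe_parsed == lib_parsed:
        return None
    older = exe if exe_parsed < lib_parsed else lib
    return f"{exe} {exe_version} vs {lib} {lib_version} ({older} is older)"


if __name__ == "__main__":
    # Made-up versions: an older firefox.exe loaded with a newer xul.dll.
    report = [Module("firefox.exe", "53.0.0.1"), Module("xul.dll", "53.0.0.2")]
    print(find_mismatch(report))
```

Reports where firefox.exe had been renamed would need the executable name passed in explicitly via `exe`.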
OK, from talking offline with Bob it sounds like this is temporary, went away in beta 9, but we expect we may see the issue again. I am not sure of the possible scope here. Is this something I should be tracking for 53? Does it need more investigation? (no one is assigned now to this and bug 1112937 doesn't look like a priority).
Flags: needinfo?(lhenry) → needinfo?(blassey.bugs)
I'm not sure Brad is the right person to address this. This is essentially an updater bug.

As I said in the email we could back out my change ... that won't stop the updater issue (old version of firefox with newer version of xul or other libraries). In fact with release the updater issue is more likely to cause issues anyway, even without the sandbox update, because there will be a greater difference between the versions, than for a lot of Beta updates.

I just tried to simulate this by copying xul.dll from Fx53b9 into Fx52.0.2 and it fails to start completely, so I think there is a reasonable chance we won't see this crash for release, but the problem will still happen and we possibly don't get any reports about it.

Rob - needinfo, partly just so you are aware of this, but also in case this is a different problem to the one in bug 1112937.
Component: Security: Process Sandboxing → Application Update
Flags: needinfo?(blassey.bugs) → needinfo?(robert.strong.bugs)
Product: Core → Toolkit
Tracking across current channels so that we keep an eye out for bigger problems.
So far, no crashes on 53 release. Noting that the problem may still exist, and is not likely to be common, but we won't see reports for it. I don't think release management needs to track this.
Quick update... I've been able to spend some time on a Windows fix for this. I have the code finished for updater.cpp and the tests. I still have to finish the UI code and hope to be able to within the next week.
Assignee: nobody → robert.strong.bugs
Status: NEW → ASSIGNED
Flags: needinfo?(robert.strong.bugs)
This signature just spiked on beta with 500+ crashes yesterday and almost as many today already (even though it's early in the day). Crash reason is "MOZ_RELEASE_ASSERT(sandbox::SBOX_ALL_OK == result) (SetIntegrityLevel should never fail, what happened?)"
(In reply to Julien Cristau [:jcristau] from comment #14)
> This signature just spiked on beta with 500+ crashes yesterday and almost as
> many today already (even though it's early in the day).
>
> Crash reason is "MOZ_RELEASE_ASSERT(sandbox::SBOX_ALL_OK == result)
> (SetIntegrityLevel should never fail, what happened?)"

This definitely looks like the same issue. I just checked 20 reports and they all have a mismatch between the firefox.exe and xul.dll versions. As I've said before, it's quite possible that this mismatch causes crashes every time we have an update, but when we have a sandbox change it's more obvious because they all crash in one place. Turning off staged updates, because of Windows 10 Creators Update issues, may have made this worse.
Sounds similar to bug 1341373...
(In reply to Julien Cristau [:jcristau] from comment #16)
> Sounds similar to bug 1341373...

Yes, looks like the same root cause.
The user comment and modules at bp-6363c7b7-fa1d-45ed-9a2e-2c4470170718 indicate it might be related to Sandboxie (an external sandboxing app).
Whiteboard: sb?
Whiteboard: sb? → sb+
Assignee: robert.strong.bugs → nobody
Status: ASSIGNED → NEW
Priority: -- → P2
Changing the summary of this, so it makes sense to duplicate the latest incarnation of this issue to this bug.
Summary: Crash in mozilla::SandboxBroker::SetSecurityLevelForContentProcess with "SetIntegrityLevel should never fail, what happened?" → Crash in Windows sandbox after chromium update due to binary version mismatch during update.
Crash Signature: [@ mozilla::SandboxBroker::SetSecurityLevelForContentProcess] → [@ mozilla::SandboxBroker::SetSecurityLevelForContentProcess] [@ mozilla::ipc::GeckoChildProcessHost::GeckoChildProcessHost]
12 of these crashes on the 2-28 Nightly, from at least a few different install times. For instance: bp-eaca2c9c-47bf-426b-b626-2f2fd0180228
Depends on: 1366808
Priority: P2 → P3
Severity: critical → S3
Duplicate of this bug: 1936838
Crash Signature: [@ mozilla::SandboxBroker::SetSecurityLevelForContentProcess] [@ mozilla::ipc::GeckoChildProcessHost::GeckoChildProcessHost] → [@ mozilla::SandboxBroker::SetSecurityLevelForContentProcess] [@ mozilla::ipc::GeckoChildProcessHost::GeckoChildProcessHost] [@ mozilla::SandboxBroker::SetSecurityLevelForGPUProcess]

The bug is linked to a topcrash signature, which matches the following criteria:

  • Top 20 desktop browser crashes on beta (startup)
  • Top 5 desktop browser crashes on Windows on beta (startup)

:bytesized, could you consider increasing the severity of this top-crash bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(bytesized)

Still hoping to fix this via Bug 1891600.

Flags: needinfo?(bytesized)

[Tracking Requested - why for this release]: A fairly heavy spike in crashes on Beta, for Windows users

(In reply to Liz Henry (:lizzard) (relman/hg->git project) from comment #25)

> [Tracking Requested - why for this release]: A fairly heavy spike in crashes on Beta, for Windows users

This is a long term issue that crops up when there are sandbox (and probably other) changes that cause incompatibility between firefox.exe and xul.dll. I'm not sure that tracking this for Fx134 is going to be particularly useful.

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit BugBot documentation.
