Open Bug 1404743 Opened 8 years ago Updated 2 years ago

file not found in Preloader's nsZipArchive Likely/mostly omni jar corruption

Categories

(Core :: XPConnect, defect, P3)

defect

Tracking

()

Tracking Status
thunderbird_esr78 --- wontfix
thunderbird_esr91 --- affected
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox-esr68 --- affected
firefox57 + wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- fix-optional
firefox71 --- fix-optional

People

(Reporter: philipp, Unassigned)

References

Details

(Keywords: crash, leave-open, regression, Whiteboard: [tbird crash])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is report bp-f864931e-66e2-4610-af71-10e530170928. ============================================================= Crashing Thread (16), Name: BGReadURLs Frame Module Signature Source 0 xul.dll nsZipArchive::GetItem(char const*) modules/libjar/nsZipArchive.cpp:450 1 xul.dll mozilla::URLPreloader::BackgroundReadFiles() js/xpconnect/loader/URLPreloader.cpp:374 2 xul.dll mozilla::detail::RunnableMethodImpl<mozilla::layers::ImageBridgeParent* const, void ( mozilla::layers::ImageBridgeParent::*)(void), 1, 0>::Run() xpcom/threads/nsThreadUtils.h:1192 3 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1039 4 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:524 5 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:338 6 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:319 7 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:299 8 xul.dll nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:427 9 nss3.dll PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397 10 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:95 11 ucrtbase.dll o__strtoui64 12 kernel32.dll BaseThreadInitThunk 13 ntdll.dll RtlUserThreadStart these startup crash reports are increasing in volume during 57.0b and look related to the changes in bug 1363482.
Hi Kris, The reported said this bug might be relevant to bug 1363482. Could you take a look of this one?
Flags: needinfo?(kmaglione+bmo)
an affected user is posting here in case we'd need to ask for more information (he's also crashing with the signature from bug 1404741): https://www.reddit.com/r/firefox/comments/73u8h3/on_nightly_restart_to_update_crash_reporter/
See Also: → 1404741
Priority: -- → P1
Startup crash, important volume, tracking.
It looks like most of these crashes are actually bug 1404741 (some of them even have MOZ_RELEASE_ASSERT(globalObj) as the crash reason), but for some reason they're being reported as coming from the BGReadURLs thread.
Depends on: 1404741
Flags: needinfo?(kmaglione+bmo)
Component: General → XPConnect
This may have been fixed in bug 1404741 for beta 7 (releasing today). Let's check back in a few days to make sure.
Currently #36 in the crash list.
Priority: P1 → P3
[Tracking Requested - why for this release]: we discussed this in the regression triage meeting today and we're not sure the volume warrants this being a 57 blocker (or even tracked). Sylvestre, are you ok with that?
Kris, in comment 4 you speculated this was actually bug 1404741 and in https://bugzilla.mozilla.org/show_bug.cgi?id=1404741#c15 you said you thought that bug correlates with ZVFORT32.DLL. I thought I'd check crash reports here to see if they also had ZVFORT32.DLL loaded but I don't see anything on the correlations tab ... can you please take a look?
Flags: needinfo?(kmaglione+bmo)
Andrew, this is a startup crash. So, meaning that we might lose these users. And from the volume, this is new in 57. So, maybe not a blocker but at least tracking.
Flags: needinfo?(sledru)
Sylvestre pointed out that the fact this is a startup crash means we shouldn't fix-optional it.
tracking as regressing startup crash
Flags: needinfo?(amckay)
Hi Andrew, this crash is still worrisome. for two reasons 1) startup crash and 2) steadily increasing crash volume with each beta build. Is there something we can do here? ~2 weeks away from go live.
Flags: needinfo?(overholt)
Sorry, I wrote a comment about this a few days ago, but it looks like it somehow got lost. I'm still looking into this, but so far it seems like some sort of corruption. We're winding up with a null pointer for one of the omnijar archives when we try to read a file from it. While it's technically possible for some instances not to have an app omnijar (or any omnijar if they're running unpacked), the list of entries that we read at startup is generated during the last session, and the cache file is cleared whenever we try to load a profile with a different build ID from the last session, so it should never be possible for the referenced archives to be missing at this point. I'm inclined to suspect this has something to do with interference from third-party anti-virus DLLs, since they seem to be present in most or all of the crashes I've looked at, but that's mostly just a guess at this point.
Flags: needinfo?(kmaglione+bmo)
Kris is the best person to continue investigation. If you could use a hand, Kris, please let me know.
Flags: needinfo?(overholt)
Comment on attachment 8923235 [details] Bug 1404743: Add better diagnostic crash reasons for URLPreloader failure. data-r?rweiss Requesting data review for the addition of a MOZ_CRASH_UNSAFE_PRINTF. 1) Is there documentation that describes the schema for the ultimate data set available publicly, complete and accurate? This is a trivial crash assertion. The only data it includes is the type of omnijar archive that failed to load, and the patch within that archive that we intended to read. That path should always be the path of a built-in file, so no user information is exposed. 2) Is there a control mechanism that allows the user to turn the data collection on and off? (Note, for data collection not needed for security purposes, Mozilla provides such a control mechanism) Yes, the data is only submitted if the user chooses to submit a crash report. 3) If the request is for permanent data collection, is there someone who will monitor the data over time? No, the assertion will be removed when we understand the cause of the crash. This bug will remain open until it's removed. 4) Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under? This is type 1. 5) Is the data collection default-on or default-off? Default on, in the cases where users choose to submit crash reports. 6) Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)? No. 7) Is the data collection covered by the existing Firefox privacy notice? If unsure: escalate to legal. Yes. 8) Does there need to be a check-in in the future to determine whether to renew the data? The removal should be verified when this bug is closed.
Attachment #8923235 - Flags: feedback?(rweiss)
Comment on attachment 8923235 [details] Bug 1404743: Add better diagnostic crash reasons for URLPreloader failure. data-r?rweiss https://reviewboard.mozilla.org/r/194396/#review199700
Attachment #8923235 - Flags: review?(continuation) → review+
Comment on attachment 8923235 [details] Bug 1404743: Add better diagnostic crash reasons for URLPreloader failure. data-r?rweiss https://reviewboard.mozilla.org/r/194396/#review199728 1. Yes, these are crash report probes and will have the standard documentation. 2. Yes, users opt in via crashes. 3. No, this is temporary. 4. Type 1 measurement. 5. Default on, all locales, all countries but only for submitted crash reports. 6. No new identifiers. 7. Covered under policy. 8. No need for future check-in.
Attachment #8923235 - Flags: review+
Attachment #8923235 - Flags: feedback?(rweiss) → feedback+
https://hg.mozilla.org/integration/mozilla-inbound/rev/315c6426fedf93d49891a3d1042c8e177ad13c22 Bug 1404743: Add better diagnostic crash reasons for URLPreloader failure. r=mccr8 data-r=rweiss
Keywords: leave-open
Comment on attachment 8923235 [details] Bug 1404743: Add better diagnostic crash reasons for URLPreloader failure. data-r?rweiss Approval Request Comment [Feature/Bug causing the regression]: N/A [User impact if declined]: This patch should give us better diagnostic information to help diagnose a high volume crash. [Is this code covered by automated tests?]: It is exercised by tests, but the new diagnostic code is not tested. [Has the fix been verified in Nightly?]: N/A [Needs manual test from QE? If yes, steps to reproduce]: N/A [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: No. [Why is the change risky/not risky?]: This patch simply adds diagnostic code that only takes effect in cases where we would otherwise crash. [String changes made/needed]: None
Attachment #8923235 - Flags: approval-mozilla-beta?
Flags: needinfo?(amckay)
Comment on attachment 8923235 [details] Bug 1404743: Add better diagnostic crash reasons for URLPreloader failure. data-r?rweiss Adds diagnostics, low risk, beta57+
Attachment #8923235 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Crash Signature: [@ nsZipArchive::GetItem] → [@ nsZipArchive::GetItem] [@ mozilla::URLPreloader::BackgroundReadFiles ]
Moz crash reason facet in 57.0b14: 1 Failed to get Omnijar AppJar archive for entry (path: "chrome/browser/content/browser/built_in_addons.json") 54 75.00 % 2 Failed to get Omnijar AppJar archive for entry (path: "chrome/en-US/locale/branding/brand.properties") 17 23.61 % 3 Failed to get Omnijar AppJar archive for entry (path: "chrome/de/locale/branding/brand.properties") 1 1.39 % https://crash-stats.mozilla.com/search/?signature=%3Dmozilla%3A%3AURLPreloader%3A%3ABackgroundReadFiles&_facets=moz_crash_reason#facet-moz_crash_reason
It looks like all of the failures are for the app jar, and it doesn't look like there's corruption in any of the path names. The app jar is usually a member of omni.ja, and some of the other crashes I've been looking at suggest a lot of users may be winding up with omnijar corruption, so it's possible that's the cause. I'm going to add more diagnostics to figure out exactly where/why the app jar is failing. I'm also considering adding build-time checksums for omni.ja and comparing/submitting them with every crash report, which may help narrow down other crashes too.
I just had this crash happen to me (https://crash-stats.mozilla.com/report/index/75c0f2e0-9931-4a61-b2d1-1415f0171117). In this case, I had: - Clicked the restart-to-update button - Firefox had quit and was in the process of updating - I accidentally/impatiently clicked the Firefox Dock icon So looking at the code a little bit, it could be that it was trying to read this file while it was being replaced by the updater?
Lots of wildptr crashes -- and also some breakpoints. In 56 there were UAF crashes.
Group: core-security
Given the default pref("network.jar.block-remote-files", true) this can't be used as an exploit against most users -- it's going to be local corruption like comment 26.
Group: core-security → dom-core-security
Keywords: sec-highsec-moderate
Looks as if a Russian user hit one of these signatures in Bug 1458782.
kmag, looks like these crashes are happening fairly often in beta 64. Is there anything new or useful in those crashes or should we count this issue as stalled?

Updating flags based on the fact there are still crashes happening in 68 nightly and 67 beta.

The leave-open keyword is there and there is no activity for 6 months.
:peterv, maybe it's time to close this bug?

Flags: needinfo?(peterv)

bp-dfe24d0d-d7e2-428b-98f9-a75bf0200404 mozilla::URLPreloader::BackgroundReadFiles
crash rate not substantially changed

The leave-open keyword is there and there is no activity for 6 months.
:peterv, maybe it's time to close this bug?

Flags: needinfo?(peterv)

The crash is still happening.

Flags: needinfo?(peterv)

The most likely cause if this is still omni jar corruption. We still have a lot of other crashes that suggest that's a common cause of crashes, and we still unfortunately don't have a way to reliably detect it in crash reports.

In any case, if that is the cause, there really isn't anything we can do here but crash (we obviously can't run without the app omnijar), and this isn't really a security issue.

Flags: needinfo?(kmaglione+bmo)

(In reply to Kris Maglione [:kmag] from comment #36)

The most likely cause if this is still omni jar corruption. We still have a lot of other crashes that suggest that's a common cause of crashes, and we still unfortunately don't have a way to reliably detect it in crash reports.

In any case, if that is the cause, there really isn't anything we can do here but crash (we obviously can't run without the app omnijar), and this isn't really a security issue.

Dan, are you still of the opinion this is sec-moderate?

Flags: needinfo?(dveditz)
Summary: Crash in nsZipArchive::GetItem → Startup Crash in nsZipArchive::GetItem. Likely/mostly omni jar corruption
Whiteboard: [tbird crash]

The vast majority of crashes (211 in the last week) are startup crashes that moved to the breakpoint at mozilla::URLPreloader::BackgroundReadFiles -- I'm not worried about those particularly.

There are still a handful (6-8) of crashes in nsZipArchive::GetItem() that are non-startup and worry me. You're right that a corrupt omnijar is not a reasonable attack vector, but a corrupt WebExtension served from a website would be. For those we open and read the archive as part of the signature-checking process.

Until yesterday this bug's summary didn't include "Startup Crash in". If you want to clone a new bug about the non-startup crash in nsZipArchive::GetItem and make that one the sec-moderate, then this startup crash in mozilla::URLPreloader::BackgroundReadFiles doesn't need to be a security bug. There is still a potential security problem one way or another.

Flags: needinfo?(dveditz)

The leave-open keyword is there and there is no activity for 6 months.
:peterv, maybe it's time to close this bug?

Flags: needinfo?(peterv)

This is still happening. Still not clear if it is actionable.

Flags: needinfo?(peterv)
Version: 57 Branch → unspecified

The leave-open keyword is there and there is no activity for 6 months.
:peterv, maybe it's time to close this bug?

Flags: needinfo?(peterv)

Still happening.

Flags: needinfo?(peterv)

The startup crash and JAR stuff isn't a use after free, so I think we can unhide this. Maybe there's some occasional crash somewhere with a signature that has some corruption but eh.

Group: dom-core-security
Keywords: sec-moderate

The leave-open keyword is there and there is no activity for 6 months.
:peterv, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.

Flags: needinfo?(peterv)
Flags: needinfo?(peterv)
Severity: critical → S2

All non-breakpoint crashes in Preload (except for a few random sigill's on 91esr) are actually MOZ_CRASH_UNSAFE_PRINTF()'s, which appear on mac and linux as nullptrs

Splitting from GetItem, which seems to be a different crash - no crashes in GetItem have preloader in the stack

Crash Signature: [@ nsZipArchive::GetItem] [@ mozilla::URLPreloader::BackgroundReadFiles ] → [@ mozilla::URLPreloader::BackgroundReadFiles ]
Summary: Startup Crash in nsZipArchive::GetItem. Likely/mostly omni jar corruption → file not found in Preloader's nsZipArchive Likely/mostly omni jar corruption

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 20 desktop browser crashes on beta (startup)

For more information, please visit auto_nag documentation.

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit auto_nag documentation.

Severity: S2 → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: