Crash in [@ ErrorLoadingSheet]
Categories
(Toolkit :: Application Update, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | affected |
firefox65 | --- | unaffected |
firefox66 | - | wontfix |
firefox67 | --- | wontfix |
firefox67.0.1 | --- | wontfix |
firefox68 | --- | wontfix |
firefox69 | --- | fix-optional |
People
(Reporter: lizzard, Unassigned)
References
Details
(Keywords: crash, regression)
Crash Data
This bug is for crash report bp-00fa9d97-dc14-429c-be73-6b1c80190313.
We're seeing some crashes for the first time in the last 6 months in this signature with the 66 RC1 build. See also bug 315278.
Top 10 frames of crashing thread:
0 xul.dll static void MOZ_CrashOOL mfbt/Assertions.h:314
1 xul.dll static void ErrorLoadingSheet layout/style/nsLayoutStylesheetCache.cpp:277
2 xul.dll nsLayoutStylesheetCache::LoadSheet layout/style/nsLayoutStylesheetCache.cpp:304
3 xul.dll nsLayoutStylesheetCache::nsLayoutStylesheetCache layout/style/UserAgentStyleSheetList.h:34
4 xul.dll nsLayoutStylesheetCache::Singleton layout/style/nsLayoutStylesheetCache.cpp:187
5 xul.dll nsDocumentViewer::CreateStyleSet layout/base/nsDocumentViewer.cpp:2292
6 xul.dll nsresult nsDocumentViewer::InitPresentationStuff layout/base/nsDocumentViewer.cpp:732
7 xul.dll nsresult nsDocumentViewer::InitInternal layout/base/nsDocumentViewer.cpp:980
8 xul.dll nsDocumentViewer::Init layout/base/nsDocumentViewer.cpp:714
9 xul.dll nsresult nsDocShell::Embed docshell/base/nsDocShell.cpp:6256
Reporter | ||
Comment 1•6 years ago
|
||
Robert, can you take a look so we can check if this is some new problem? It didn't happen for the 65 RC so I'm concerned we may have regressed something.
Comment 2•6 years ago
•
|
||
I've been trying to track down what is happening today. My findings so far
- Nothing changed in the app update client after 66.0b10 so an app update client change is extremely unlikely.
- The omni.ja that contains the css file (svg.css) is updated after firefox.exe and before xul.dll both of which have the correct version. If this were due to an update failure (e.g. low disk space, updater crash, etc.) that failed to rollback to the original state then the file versions would be different.
I did notice that the compatibility.ini file is located in the Windows roaming profile and the startup cache is located in the Windows local profile. I don't know the startup cache code well enough to know for certain but it seems that if a client with a roaming profile updated and then used that roaming profile on another system that had already been updated the profile's compatibility.ini would report it as compatible but the users startup cache in the Windows local profile would be for the old version and not be purged.
I'm looking into what could be added to the crash reports that might help narrow this down.
Comment 3•6 years ago
|
||
I filed bug 1535181 for the possible startup cache issue.
Reporter | ||
Comment 4•6 years ago
|
||
Thanks, I'll keep an eye on it after release.
Now that I look back at the crash signature it isn't new in 66 at all. I must have been searching with some other limit without realizing.
Comment 5•6 years ago
|
||
It does look like it increased significantly in 66.0b11 so something may have changed after 66.0b10 to cause this increase. I went through all of the changesets but there were no app update changes and nothing else stood out.
Comment 6•6 years ago
|
||
I highly doubt bug 315278 is involved per point 2 in comment #2
Comment 7•6 years ago
|
||
This looks very similar to bug 1194856.
:heycam, is the diagnostic results from the spinoffs of bug 1225004 available anywhere? If not, could the diagnostics be added back? Thanks!
Comment 8•6 years ago
|
||
Another weird data point. Bug 1194856 comment #68 states that the error occurred after uninstalling / reinstalling.
Comment 9•6 years ago
|
||
heycam, after reading through bug 1194856 again I don't think adding back the diagnostics will help at this point so clearing the needinfo.
Updated•6 years ago
|
Comment 10•6 years ago
|
||
Changing the priority to p1 as the bug is tracked by a release manager for the current release.
See How Do You Triage for more information
Reporter | ||
Comment 11•6 years ago
|
||
This is showing up as the #2 top crash in 66 release on the first day of the release.
Jimm, is there anything else that could be done here? Do you still think this should be P3?
Comment 12•6 years ago
|
||
Changing platform to all as I see crashes on all 3 platforms in 66 release.
Comment 13•6 years ago
|
||
This bug appears to be essentially the same as Bug 1194856 and Bug 1194856 comment #68 shows this happens with the installer as well.
Also, FennecAndroid doesn't use Application Update.
Given the above two points I'm not sure this bug is properly categorized or that Application Update can do anything here.
Comment 14•6 years ago
•
|
||
We did a little debugging in our standup. In 65 we had debug data that helped. The consensus is this:
- this impacts a small number of users
- the crcs on certain files are wrong, but the file sizes appear correct. This implies some sort of corruption of the files on disk.
- bug 1303764, which is the previous incarnation of this bug (filed in 2016) had a user report that the main cause appeared to be a full disk.
- This isn't application update specific as this happens in cases where updater isn't involved .
The way forward here is to add a check to install/update that validates crc values after applying an update to a file which is something we currently don't do. To accomplish this we'd need crc data embedded in mar files so the work here extends beyond client.
We're going to find some time to work on this (bug 1535687) in Q2. There are no easy fixes here so I'd suggest this not block any release.
Updated•6 years ago
|
Comment 15•6 years ago
|
||
Also, bug 1194856 (filed 2015) appears to be a previous incarnation of this bug which also affected people uninstalling and reinstalling. It is also the bug that added the crc values in the crash reports which has since been backed out.
Note that the crc checks of files after they have been updated won't help the FennecAndroid case since Application Update isn't used by FennecAndroid.
Reporter | ||
Updated•6 years ago
|
Comment 16•6 years ago
|
||
Signs point to omni.ja truncation.
We looked at an instance of @ ErrorLoadingSheet
last week, it includes these annotations:
Error loading sheet: resource://gre/res/svg.css
NS_ERROR_FILE_CORRUPTION reason: nsJARInputStream: !mZs.next_in
Real location: jar:file:///C:/Program%20Files/Mozilla%20Firefox/omni.ja!/res/svg.css
GRE directory: C:\Program Files\Mozilla Firefox
Interesting files in the GRE directory:
C:\Program Files\Mozilla Firefox\browser\chrome.manifest (0 bytes, crc32 = 0x00000000)
C:\Program Files\Mozilla Firefox\browser\omni.ja (41437720 bytes, crc32 = 0x7059911f)
C:\Program Files\Mozilla Firefox\chrome.manifest (0 bytes, crc32 = 0x00000000)
C:\Program Files\Mozilla Firefox\omni.ja (17429973 bytes, crc32 = 0x326fbb3c)
Contents of chrome.manifest:
[[[]]]
GRE omnijar URI string: jar:file:///C:/Program%20Files/Mozilla%20Firefox/omni.ja!/
Interesting files in the GRE omnijar:
chrome/chrome.manifest (2784 bytes, crc32 = 0x0a1409b4)
res/svg.css (2159 bytes, crc32 = 0xeae4bb67)
The released 65.0.2 Windows amd64 en-US omni.ja
should have that size, but the CRC should be 45c7dd43
; browser\omni.ja
is also the right size but the CRC should be a7d7a15d
. The later CRCs (from the files in the GRE omnijar) are correct, but these come from the Zip central directory and thus don't reflect the actual data on disk.
It turns out that the reported CRCs can be explained if the files are truncated (specifically, filled with zeroes past a certain point, this can be checked with crctrunc): omni.ja
from 0x1000000
, browser\omni.ja
from 0xb00000
. Truncating the files in this way produces the same crash in my testing on 64 bit Windows 10.
I'm not sure what to blame this on. It would be interesting to see if the Android crashes look the same, as then there would be a totally different install/update process and filesystem responsible. One thing we could do to catch this earlier would be to check for the Zip "end of central directory" record at the end of the file.
I haven't yet had the opportunity to check more of these reports; this crash could still be a catch-all for any omnijar corruption. We may want to restore the annotations removed with https://phabricator.services.mozilla.com/D14050 , with some sort of support for Android (as Android crashes like edaf2b37-2f6c-4667-8bda-3e3ba0190325 don't report the CRC for res/omni.ja
).
Comment 17•6 years ago
|
||
There are also OS X and Linux crashes. I'll try to find ones that includes the crc annotations.
Comment 18•6 years ago
|
||
Sent a couple of OS X and Linux crashes to agashlin via email
Comment 19•6 years ago
•
|
||
I looked at the handful of crashes rstrong referred, there's a wide variety.
Using omnijars from the future (partial updates?):
- d899de90-4883-4531-a11c-012da0190307 64.0.2 Linux x86 es-ES 20190108160530
- omni.ja is from 65.0.1
- 2bfa61fa-a5cb-465d-bdf9-41c1e0190210 64.0.2 Linux amd64 en-GB 20190108160530
- omni.ja is from 65.0.
- b093afa0-2054-49ba-94d0-f02950190325 64.0.2 OS X 10.14 amd64 en-US 20190108160530
- browser/omni.ja is from 65.0.1 but zeroed from 1201000.
- omni.ja is from 65.0.1.
- f6e60a7e-38fb-4f2b-ae5e-e8bc70190325 64.0.2 OS X 10.11 amd64 nl 20190108160530
Error was in omni.ja!/chrome/toolkit/content/global/components.css but no CRC is reported.- browser/omni.ja same size as 65.0 but no CRC match
- omni.ja is from 65.0.
From the past:
Etc:
- a20c0a5e-a9e5-4455-beed-895480190325 65.0.1 OS X 10.14 amd64 en-US 20190211233335
- omni.ja correct size, directory seems right, but no CRC match
- browser/omni.ja correct size, no CRC match
- 4262eeff-9fa1-45fd-80c2-1312e0181109 63.0.1 Linux amd64 fr 20181101150546
This is actually from Arch Linux (matched the unusual buildid): https://archive.archlinux.org/packages/f/firefox/firefox-63.0.1-1-x86_64.pkg.tar.xz
The CRCs match what's in that archive, and I tried that version in Arch and it seemed to work fine, so who knows what's up.
Edit2: Reverted first edit, I was reading the wrong report.
Comment 20•5 years ago
|
||
With how FennecAndroid had a spike with 66.0.5 the same time as Firefox desktop and FennecAndroid not using app update it makes me think there is something other than app update causing this even though the FennecAndroid numbers are low.
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Comment 21•5 years ago
|
||
This is fairly low volume on 67 release so far, for both Firefox desktop and mobile.
Updated•5 years ago
|
Reporter | ||
Comment 22•5 years ago
|
||
Still fairly low volume so I'm marking this fix-optional for 69 to remove it from triage.
Updated•5 years ago
|
Comment 23•5 years ago
|
||
No crashes starting with version 69. The problem is gone? Or the signature changed?
Comment 24•2 years ago
|
||
Closing because no crashes reported for 12 weeks.
Description
•