Startup Crash in [@ mozilla::URLPreloader::ReadCache] after "MOZ_DIAGNOSTIC_ASSERT(key.mPath.Length() > 0) (Path should be non-empty)"
Categories
(Core :: XPConnect, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | wontfix |
firefox-esr91 | --- | wontfix |
firefox-esr102 | --- | wontfix |
firefox90 | --- | unaffected |
firefox91 | --- | wontfix |
firefox92 | --- | wontfix |
firefox93 | --- | wontfix |
firefox94 | --- | wontfix |
firefox95 | --- | wontfix |
firefox96 | --- | wontfix |
firefox97 | --- | wontfix |
firefox98 | --- | wontfix |
firefox99 | --- | wontfix |
firefox112 | --- | wontfix |
firefox113 | --- | wontfix |
firefox114 | --- | wontfix |
People
(Reporter: RyanVM, Unassigned)
References
(Regression)
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
+++ This bug was initially created as a clone of Bug #1719319 +++
We're starting to see some new reports now from builds with the diagnostic assert added in bug 1719319. Let's use this bug to continue the investigation since bug 1719319 appears to have successfully mitigated the crashes for release users.
Crash report: https://crash-stats.mozilla.org/report/index/99c4953b-57da-4d41-bddd-ec4470210805
MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(false) (Entry should be new and not in any list)
Top 10 frames of crashing thread:
0 xul.dll mozilla::URLPreloader::ReadCache js/xpconnect/loader/URLPreloader.cpp:308
1 xul.dll mozilla::URLPreloader::BackgroundReadFiles js/xpconnect/loader/URLPreloader.cpp:340
2 xul.dll mozilla::detail::RunnableMethodImpl<D3DVsyncSource::D3DVsyncDisplay*, void xpcom/threads/nsThreadUtils.h:1201
3 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1142
4 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:300
5 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:324
6 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:306
7 xul.dll static nsThread::ThreadFunc xpcom/threads/nsThread.cpp:390
8 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:399
9 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:139
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Comment 1•3 years ago
|
||
We are ending the 92 cycle so wontfix 92. This is a startup crash and such crashes are not good for user retention, I think it would be good to at least have an assignee and some investigation done, thanks.
Comment 2•3 years ago
|
||
:kmag is back from vacation. He has patches in bug 1724509. He is looking into landing them.
Comment 3•3 years ago
•
|
||
We mitigated crashes in release builds in bug 1719319, so firefox93 should be unaffected. But we definitely should look into this issue deeper.
Comment 4•3 years ago
•
|
||
(In reply to Hsin-Yi Tsai [:hsinyi] from comment #3)
We mitigated crashes in release builds in bug 1719319, so firefox93 should be unaffected. But we definitely should look into this issue deeper.
Hey Kris, can you please confirm if my understanding about the whether firefox-93 is affected or not right? If my understanding is right, we ppbly would want to hold off landing patches in bug 1724509 until soft-freeze ends.
Reporter | ||
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Reporter | ||
Updated•3 years ago
|
Updated•3 years ago
|
Reporter | ||
Updated•3 years ago
|
Comment 5•3 years ago
|
||
Bug 1717778 is on all branches now (and is unhidden) so this doesn't need to be hidden any more. Maybe we could disable this fatal assert on early beta, too, to avoid bothering release managers every cycle.
Comment 6•3 years ago
|
||
https://searchfox.org/mozilla-central/rev/02bd3e02b72d431f2c600a7328697a521f87d9b6/js/xpconnect/loader/URLPreloader.cpp#336,338
we have pendingURLs outside the lock. Isn't it possible that we end up waiting for the lock and while that is happening some other
call ends up passing cached urls to another list?
Comment 7•3 years ago
|
||
Writing a patch (once a build is ready)
Comment 8•3 years ago
|
||
Oh wait, BackgroundReadFiles seems to have other issues too. Thinking.
Updated•3 years ago
|
Comment 9•3 years ago
|
||
I put up a patch in bug 1755539 that changes a few things around. I don't know how it could lead to this issue, but it might eliminate some weirdness at least.
Updated•3 years ago
|
Comment 10•3 years ago
|
||
My patch didn't help. I guess I can try looking at how the preloader list is written out.
Updated•3 years ago
|
Updated•3 years ago
|
Comment 11•3 years ago
|
||
It looks like my changes in bug 1757227 did finally turn up a useful tidbit. The two crashes since my patch landed are hitting MOZ_DIAGNOSTIC_ASSERT(key.mPath.Length() > 0) (Path should be non-empty)
so we're ending up with multiple empty paths.
Comment 12•3 years ago
|
||
I clicked a bit in searchfox. So we read this CacheKey
from our file. IIUC that means we already wrote it before with an empty path. Now there seem to be quite a few constructors for CacheKey
that seem not to check if we actually get a path, should we have some assert there?
Comment 13•3 years ago
|
||
Yeah, there may be some way we end up with an entry that has a blank path. My question is, how can we end up with two of them? The file is written out by iterating over a hash table where the keyed off of a CacheKey, so how can the same CacheKey appear twice? I spent some time trying to figure out if there's something weird with the hashtable that could allow that, but I couldn't come up with anything.
The type of the hashtable is nsClassHashtable<nsGenericHashKey<CacheKey>, URLEntry>, where the CacheKeys are also URLEntries, so we end up with two copies of the data (one for the key and one for the value) but aside from the small excess usage of space I can't think of anything wrong with that. I couldn't see any place that mutates the values in the hash table.
Updated•3 years ago
|
Comment 14•2 years ago
|
||
:hsinyi given the low volume crashes here, is this still considered an S2?
Comment 15•2 years ago
|
||
(In reply to Dianna Smith [:diannaS] from comment #14)
:hsinyi given the low volume crashes here, is this still considered an S2?
Just talked with mccr8, yeah, it probably makes more sense to downgrade this to S3, given the crash in beta/release has been mitigated.
Note that we sill want to figure the issue out, but it's unfortunately stalled after we gave this several try.
Comment 16•2 years ago
|
||
The severity field for this bug is set to S3. However, the bug has the topcrash
keyword.
:peterv, could you consider increasing the severity of this top-crash bug? If the crash isn't "top" anymore, could you drop the topcrash
keyword?
For more information, please visit auto_nag documentation.
Updated•2 years ago
|
Comment 18•2 years ago
|
||
We've bypassed the assertion that made it a top crash. It is still present on Nightly.
Reporter | ||
Updated•2 years ago
|
Updated•2 years ago
|
Comment 20•2 years ago
|
||
There's a flurry of new crashes under this signature on nightly, if it can be useful I can extract the contents of the strings causing the crash from the reports. Just NI? me if it could help.
Comment 21•2 years ago
•
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #20)
There's a flurry of new crashes under this signature on nightly, if it can be useful I can extract the contents of the strings causing the crash from the reports. Just NI? me if it could help.
Do you intend the string of MOZ_CRASH_REASON? In all the nightly instances that seems to be MOZ_DIAGNOSTIC_ASSERT(key.mPath.Length() > 0) (Path should be non-empty)
, so we have a key without path, apparently. Maybe we should assert if key.mType is any of the expected types or just garbage, too?
Edit: It seems we already knew this from comment 12 and comment 13...
Comment 22•2 years ago
|
||
I mean the actual contents of the key itself.
Comment 23•2 years ago
•
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #22)
I mean the actual contents of the key itself.
That sounds helpful, thanks.
Edit: Assuming that we can see the type this way, too. Can we see the content of buf ?
Comment 24•2 years ago
|
||
Unfortunately the contents of buf
are unavailable as it's been optimized away and VS is unable to reconstruct it. key->mType
is TypeAppJar
and pendingURLs
seems to contain only one entry aside from the sentinel (I don't know what it contains though). If I go up one frame this.mGREPrefix
is jar:file:///C:/Program%20Files/Firefox%20Nightly/omni.ja!/
.
Comment 25•2 years ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 desktop browser crashes on nightly (startup)
:hsinyi, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 26•2 years ago
|
||
This doesn't look a true topcrash. There were 14 reports from 1 install and it seems that contributes to topcrash or topcrash-startup calculation.
Hey :marco, we could perhaps teach our bot to do something differently for the number of reports coming from a single install.
Comment 27•2 years ago
|
||
(In reply to Hsin-Yi Tsai (she/her) [:hsinyi] from comment #26)
This doesn't look a true topcrash. There were 14 reports from 1 install and it seems that contributes to topcrash or topcrash-startup calculation.
Hey :marco, we could perhaps teach our bot to do something differently for the number of reports coming from a single install.
Ah, looks we've already taken the above into consideration in our criteria to some extent - "Top 10 desktop browser crashes on Nightly, if they happen for enough different installations."
Comment 28•2 years ago
|
||
From Socorro, it looks like in the last week the crash happened 18 times to 8 different installations on 110.0a1 and 4 times to 1 installation on 111.0a1.
Comment 29•2 years ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 desktop browser crashes on nightly (startup)
:peterv, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Updated•2 years ago
|
Comment 31•2 years ago
|
||
The assertion is different now.
Comment 32•1 years ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit auto_nag documentation.
Comment 33•1 years ago
|
||
Most of the startup assertions we're seeing are from when we decode multiple
paths that are empty strings. That doesn't seem likely to happen very often
from on disk corruption, so presumably it must be happening before/when we
encode the cache file?
I don't have much hope that this will turn up the cause, but it seems worth a
try, and it definitely shouldn't hurt.
Updated•1 years ago
|
Comment 34•1 years ago
|
||
Comment 35•1 years ago
|
||
Backed out changeset f2fbfecc33d1 (bug 1724336) for causing mochitest failures at URLPreloader.h
Backout: https://hg.mozilla.org/integration/autoland/rev/7dd2e5db976d861f69363f886bfabeabb8807962
Failure log: https://treeherder.mozilla.org/logviewer?job_id=413718954&repo=autoland&lineNumber=7602
Comment 36•1 years ago
|
||
Comment 37•1 years ago
|
||
bugherder |
Comment 38•1 years ago
|
||
Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.
Reporter | ||
Updated•1 years ago
|
Updated•1 year ago
|
Comment 39•11 months ago
|
||
The bug assignee is inactive on Bugzilla, so the assignee is being reset.
Description
•