Startup Crash in [@ mozilla::URLPreloader::ReadCache] after "MOZ_DIAGNOSTIC_ASSERT(key.mPath.Length() > 0) (Path should be non-empty)"
Categories
(Core :: XPConnect, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | wontfix |
firefox-esr91 | --- | wontfix |
firefox-esr102 | --- | wontfix |
firefox90 | --- | unaffected |
firefox91 | --- | wontfix |
firefox92 | --- | wontfix |
firefox93 | --- | wontfix |
firefox94 | --- | wontfix |
firefox95 | --- | wontfix |
firefox96 | --- | wontfix |
firefox97 | --- | wontfix |
firefox98 | --- | wontfix |
firefox99 | --- | wontfix |
firefox112 | --- | wontfix |
firefox113 | --- | wontfix |
firefox114 | --- | wontfix |
People
(Reporter: RyanVM, Assigned: kmag)
References
(Regression)
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
+++ This bug was initially created as a clone of Bug #1719319 +++
We're starting to see some new reports now from builds with the diagnostic assert added in bug 1719319. Let's use this bug to continue the investigation since bug 1719319 appears to have successfully mitigated the crashes for release users.
Crash report: https://crash-stats.mozilla.org/report/index/99c4953b-57da-4d41-bddd-ec4470210805
MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(false) (Entry should be new and not in any list)
Top 10 frames of crashing thread:
0 xul.dll mozilla::URLPreloader::ReadCache js/xpconnect/loader/URLPreloader.cpp:308
1 xul.dll mozilla::URLPreloader::BackgroundReadFiles js/xpconnect/loader/URLPreloader.cpp:340
2 xul.dll mozilla::detail::RunnableMethodImpl<D3DVsyncSource::D3DVsyncDisplay*, void xpcom/threads/nsThreadUtils.h:1201
3 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1142
4 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:300
5 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:324
6 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:306
7 xul.dll static nsThread::ThreadFunc xpcom/threads/nsThread.cpp:390
8 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:399
9 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:139
Reporter | ||
Updated•2 years ago
|
Reporter | ||
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Comment 1•2 years ago
|
||
We are ending the 92 cycle so wontfix 92. This is a startup crash and such crashes are not good for user retention, I think it would be good to at least have an assignee and some investigation done, thanks.
Comment 2•2 years ago
|
||
:kmag is back from vacation. He has patches in bug 1724509. He is looking into landing them.
Comment 3•2 years ago
•
|
||
We mitigated crashes in release builds in bug 1719319, so firefox93 should be unaffected. But we definitely should look into this issue deeper.
Comment 4•2 years ago
•
|
||
(In reply to Hsin-Yi Tsai [:hsinyi] from comment #3)
We mitigated crashes in release builds in bug 1719319, so firefox93 should be unaffected. But we definitely should look into this issue deeper.
Hey Kris, can you please confirm if my understanding about the whether firefox-93 is affected or not right? If my understanding is right, we ppbly would want to hold off landing patches in bug 1724509 until soft-freeze ends.
Reporter | ||
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Reporter | ||
Updated•1 year ago
|
Updated•1 year ago
|
Reporter | ||
Updated•1 year ago
|
Comment 5•1 year ago
|
||
Bug 1717778 is on all branches now (and is unhidden) so this doesn't need to be hidden any more. Maybe we could disable this fatal assert on early beta, too, to avoid bothering release managers every cycle.
Comment 6•1 year ago
|
||
https://searchfox.org/mozilla-central/rev/02bd3e02b72d431f2c600a7328697a521f87d9b6/js/xpconnect/loader/URLPreloader.cpp#336,338
we have pendingURLs outside the lock. Isn't it possible that we end up waiting for the lock and while that is happening some other
call ends up passing cached urls to another list?
Comment 7•1 year ago
|
||
Writing a patch (once a build is ready)
Comment 8•1 year ago
|
||
Oh wait, BackgroundReadFiles seems to have other issues too. Thinking.
Updated•1 year ago
|
Comment 9•1 year ago
|
||
I put up a patch in bug 1755539 that changes a few things around. I don't know how it could lead to this issue, but it might eliminate some weirdness at least.
Updated•1 year ago
|
Comment 10•1 year ago
|
||
My patch didn't help. I guess I can try looking at how the preloader list is written out.
Updated•1 year ago
|
Updated•1 year ago
|
Comment 11•1 year ago
|
||
It looks like my changes in bug 1757227 did finally turn up a useful tidbit. The two crashes since my patch landed are hitting MOZ_DIAGNOSTIC_ASSERT(key.mPath.Length() > 0) (Path should be non-empty)
so we're ending up with multiple empty paths.
Comment 12•1 year ago
|
||
I clicked a bit in searchfox. So we read this CacheKey
from our file. IIUC that means we already wrote it before with an empty path. Now there seem to be quite a few constructors for CacheKey
that seem not to check if we actually get a path, should we have some assert there?
Comment 13•1 year ago
|
||
Yeah, there may be some way we end up with an entry that has a blank path. My question is, how can we end up with two of them? The file is written out by iterating over a hash table where the keyed off of a CacheKey, so how can the same CacheKey appear twice? I spent some time trying to figure out if there's something weird with the hashtable that could allow that, but I couldn't come up with anything.
The type of the hashtable is nsClassHashtable<nsGenericHashKey<CacheKey>, URLEntry>, where the CacheKeys are also URLEntries, so we end up with two copies of the data (one for the key and one for the value) but aside from the small excess usage of space I can't think of anything wrong with that. I couldn't see any place that mutates the values in the hash table.
Updated•1 year ago
|
Comment 14•1 year ago
|
||
:hsinyi given the low volume crashes here, is this still considered an S2?
Comment 15•1 year ago
|
||
(In reply to Dianna Smith [:diannaS] from comment #14)
:hsinyi given the low volume crashes here, is this still considered an S2?
Just talked with mccr8, yeah, it probably makes more sense to downgrade this to S3, given the crash in beta/release has been mitigated.
Note that we sill want to figure the issue out, but it's unfortunately stalled after we gave this several try.
Comment 16•1 year ago
|
||
The severity field for this bug is set to S3. However, the bug has the topcrash
keyword.
:peterv, could you consider increasing the severity of this top-crash bug? If the crash isn't "top" anymore, could you drop the topcrash
keyword?
For more information, please visit auto_nag documentation.
Updated•1 year ago
|
Comment 18•1 year ago
|
||
We've bypassed the assertion that made it a top crash. It is still present on Nightly.
Reporter | ||
Updated•1 year ago
|
Updated•11 months ago
|
Comment 20•6 months ago
|
||
There's a flurry of new crashes under this signature on nightly, if it can be useful I can extract the contents of the strings causing the crash from the reports. Just NI? me if it could help.
Comment 21•6 months ago
•
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #20)
There's a flurry of new crashes under this signature on nightly, if it can be useful I can extract the contents of the strings causing the crash from the reports. Just NI? me if it could help.
Do you intend the string of MOZ_CRASH_REASON? In all the nightly instances that seems to be MOZ_DIAGNOSTIC_ASSERT(key.mPath.Length() > 0) (Path should be non-empty)
, so we have a key without path, apparently. Maybe we should assert if key.mType is any of the expected types or just garbage, too?
Edit: It seems we already knew this from comment 12 and comment 13...
Comment 22•6 months ago
|
||
I mean the actual contents of the key itself.
Comment 23•6 months ago
•
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #22)
I mean the actual contents of the key itself.
That sounds helpful, thanks.
Edit: Assuming that we can see the type this way, too. Can we see the content of buf ?
Comment 24•6 months ago
|
||
Unfortunately the contents of buf
are unavailable as it's been optimized away and VS is unable to reconstruct it. key->mType
is TypeAppJar
and pendingURLs
seems to contain only one entry aside from the sentinel (I don't know what it contains though). If I go up one frame this.mGREPrefix
is jar:file:///C:/Program%20Files/Firefox%20Nightly/omni.ja!/
.
Comment 25•5 months ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 desktop browser crashes on nightly (startup)
:hsinyi, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 26•5 months ago
|
||
This doesn't look a true topcrash. There were 14 reports from 1 install and it seems that contributes to topcrash or topcrash-startup calculation.
Hey :marco, we could perhaps teach our bot to do something differently for the number of reports coming from a single install.
Comment 27•5 months ago
|
||
(In reply to Hsin-Yi Tsai (she/her) [:hsinyi] from comment #26)
This doesn't look a true topcrash. There were 14 reports from 1 install and it seems that contributes to topcrash or topcrash-startup calculation.
Hey :marco, we could perhaps teach our bot to do something differently for the number of reports coming from a single install.
Ah, looks we've already taken the above into consideration in our criteria to some extent - "Top 10 desktop browser crashes on Nightly, if they happen for enough different installations."
Comment 28•5 months ago
|
||
From Socorro, it looks like in the last week the crash happened 18 times to 8 different installations on 110.0a1 and 4 times to 1 installation on 111.0a1.
Comment 29•2 months ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 desktop browser crashes on nightly (startup)
:peterv, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Updated•2 months ago
|
Comment 31•2 months ago
|
||
The assertion is different now.
Comment 32•1 month ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 33•1 month ago
|
||
Most of the startup assertions we're seeing are from when we decode multiple
paths that are empty strings. That doesn't seem likely to happen very often
from on disk corruption, so presumably it must be happening before/when we
encode the cache file?
I don't have much hope that this will turn up the cause, but it seems worth a
try, and it definitely shouldn't hurt.
Updated•1 month ago
|
Comment 34•1 month ago
|
||
Pushed by maglione.k@gmail.com: https://hg.mozilla.org/integration/autoland/rev/f2fbfecc33d1 Add some more URLPreloader path sanity checks. r=mccr8
Comment 35•1 month ago
|
||
Backed out changeset f2fbfecc33d1 (bug 1724336) for causing mochitest failures at URLPreloader.h
Backout: https://hg.mozilla.org/integration/autoland/rev/7dd2e5db976d861f69363f886bfabeabb8807962
Failure log: https://treeherder.mozilla.org/logviewer?job_id=413718954&repo=autoland&lineNumber=7602
Comment 36•1 month ago
|
||
Pushed by maglione.k@gmail.com: https://hg.mozilla.org/integration/autoland/rev/b3b312008642 Add some more URLPreloader path sanity checks. r=mccr8
Comment 37•1 month ago
|
||
bugherder |
Comment 38•1 month ago
|
||
Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.
Reporter | ||
Updated•1 month ago
|
Updated•12 days ago
|
Description
•