Open Bug 1724336 Opened 1 year ago Updated 28 days ago

Startup Crash in [@ mozilla::URLPreloader::ReadCache] after "MOZ_DIAGNOSTIC_ASSERT(false) (Entry should be new and not in any list)"

Categories

(Core :: XPConnect, defect)

Unspecified
Windows
defect

Tracking

()

Tracking Status
firefox-esr78 --- wontfix
firefox-esr91 --- wontfix
firefox90 --- unaffected
firefox91 --- wontfix
firefox92 --- wontfix
firefox93 --- wontfix
firefox94 --- wontfix
firefox95 --- wontfix
firefox96 --- wontfix
firefox97 --- wontfix
firefox98 --- wontfix
firefox99 --- wontfix

People

(Reporter: ryanvm, Unassigned, NeedInfo)

References

(Depends on 1 open bug, Regression)

Details

(Keywords: crash, regression, stalled)

Crash Data

+++ This bug was initially created as a clone of Bug #1719319 +++

We're starting to see some new reports now from builds with the diagnostic assert added in bug 1719319. Let's use this bug to continue the investigation since bug 1719319 appears to have successfully mitigated the crashes for release users.

Crash report: https://crash-stats.mozilla.org/report/index/99c4953b-57da-4d41-bddd-ec4470210805

MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(false) (Entry should be new and not in any list)

Top 10 frames of crashing thread:

0 xul.dll mozilla::URLPreloader::ReadCache js/xpconnect/loader/URLPreloader.cpp:308
1 xul.dll mozilla::URLPreloader::BackgroundReadFiles js/xpconnect/loader/URLPreloader.cpp:340
2 xul.dll mozilla::detail::RunnableMethodImpl<D3DVsyncSource::D3DVsyncDisplay*, void  xpcom/threads/nsThreadUtils.h:1201
3 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1142
4 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:300
5 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:324
6 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:306
7 xul.dll static nsThread::ThreadFunc xpcom/threads/nsThread.cpp:390
8 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:399
9 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:139
Severity: -- → S2
Depends on: 1724509

We are ending the 92 cycle so wontfix 92. This is a startup crash and such crashes are not good for user retention, I think it would be good to at least have an assignee and some investigation done, thanks.

:kmag is back from vacation. He has patches in bug 1724509. He is looking into landing them.

We mitigated crashes in release builds in bug 1719319, so firefox93 should be unaffected. But we definitely should look into this issue deeper.

(In reply to Hsin-Yi Tsai [:hsinyi] from comment #3)

We mitigated crashes in release builds in bug 1719319, so firefox93 should be unaffected. But we definitely should look into this issue deeper.

Hey Kris, can you please confirm if my understanding about the whether firefox-93 is affected or not right? If my understanding is right, we ppbly would want to hold off landing patches in bug 1724509 until soft-freeze ends.

Flags: needinfo?(kmaglione+bmo)
Has Regression Range: --- → yes

Bug 1717778 is on all branches now (and is unhidden) so this doesn't need to be hidden any more. Maybe we could disable this fatal assert on early beta, too, to avoid bothering release managers every cycle.

Group: dom-core-security
Keywords: sec-other

https://searchfox.org/mozilla-central/rev/02bd3e02b72d431f2c600a7328697a521f87d9b6/js/xpconnect/loader/URLPreloader.cpp#336,338
we have pendingURLs outside the lock. Isn't it possible that we end up waiting for the lock and while that is happening some other
call ends up passing cached urls to another list?

Flags: needinfo?(continuation)

Writing a patch (once a build is ready)

Oh wait, BackgroundReadFiles seems to have other issues too. Thinking.

Depends on: 1755539
Flags: needinfo?(continuation)

I put up a patch in bug 1755539 that changes a few things around. I don't know how it could lead to this issue, but it might eliminate some weirdness at least.

Flags: needinfo?(kmaglione+bmo)

My patch didn't help. I guess I can try looking at how the preloader list is written out.

Depends on: 1757227

It looks like my changes in bug 1757227 did finally turn up a useful tidbit. The two crashes since my patch landed are hitting MOZ_DIAGNOSTIC_ASSERT(key.mPath.Length() > 0) (Path should be non-empty) so we're ending up with multiple empty paths.

I clicked a bit in searchfox. So we read this CacheKey from our file. IIUC that means we already wrote it before with an empty path. Now there seem to be quite a few constructors for CacheKey that seem not to check if we actually get a path, should we have some assert there?

Yeah, there may be some way we end up with an entry that has a blank path. My question is, how can we end up with two of them? The file is written out by iterating over a hash table where the keyed off of a CacheKey, so how can the same CacheKey appear twice? I spent some time trying to figure out if there's something weird with the hashtable that could allow that, but I couldn't come up with anything.

The type of the hashtable is nsClassHashtable<nsGenericHashKey<CacheKey>, URLEntry>, where the CacheKeys are also URLEntries, so we end up with two copies of the data (one for the key and one for the value) but aside from the small excess usage of space I can't think of anything wrong with that. I couldn't see any place that mutates the values in the hash table.

Whiteboard: [domcore-s2-revisit]

:hsinyi given the low volume crashes here, is this still considered an S2?

Flags: needinfo?(htsai)

(In reply to Dianna Smith [:diannaS] from comment #14)

:hsinyi given the low volume crashes here, is this still considered an S2?

Just talked with mccr8, yeah, it probably makes more sense to downgrade this to S3, given the crash in beta/release has been mitigated.
Note that we sill want to figure the issue out, but it's unfortunately stalled after we gave this several try.

Severity: S2 → S3
Flags: needinfo?(htsai)
Keywords: stalled

The severity field for this bug is set to S3. However, the bug has the topcrash keyword.
:peterv, could you consider increasing the severity of this top-crash bug? If the crash isn't "top" anymore, could you drop the topcrash keyword?

For more information, please visit auto_nag documentation.

Flags: needinfo?(peterv)
Duplicate of this bug: 1772333
Flags: needinfo?(peterv)

We've bypassed the assertion that made it a top crash. It is still present on Nightly.

Whiteboard: [domcore-s2-revisit]

Kris going to take another look.

Flags: needinfo?(kmaglione+bmo)
You need to log in before you can comment on or make changes to this bug.