Open Bug 1724336 Opened 4 years ago Updated 1 year ago

Startup Crash in [@ mozilla::URLPreloader::ReadCache] after "MOZ_DIAGNOSTIC_ASSERT(key.mPath.Length() > 0) (Path should be non-empty)"

Tracking

()

Status:

REOPENED

Tracking Flags:

Tracking

Status

firefox-esr78

---

wontfix

firefox-esr91

---

wontfix

firefox-esr102

---

wontfix

firefox90

---

unaffected

firefox91

---

wontfix

firefox92

---

wontfix

firefox93

---

wontfix

firefox94

---

wontfix

firefox95

---

wontfix

firefox96

---

wontfix

firefox97

---

wontfix

firefox98

---

wontfix

firefox99

---

wontfix

firefox112

---

wontfix

firefox113

---

wontfix

firefox114

---

wontfix

Tracking

Status

relnote-firefox

thunderbird_esr115

thunderbird_esr140

firefox-esr78

firefox-esr91

firefox-esr102

firefox-esr115

firefox-esr140

firefox90

firefox91

firefox92

firefox93

firefox94

firefox95

firefox96

firefox97

firefox98

firefox99

firefox112

firefox113

firefox114

firefox146

firefox147

firefox148

People

(Reporter: RyanVM, Unassigned)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

Bug 1724336: Add some more URLPreloader path sanity checks. r=mccr8 2 years ago Kris Maglione [:kmag] 48 bytes, text/x-phabricator-request		Details \| Review

Ryan VanderMeulen [:RyanVM]

Reporter

Description

•

4 years ago

+++ This bug was initially created as a clone of Bug #1719319 +++

We're starting to see some new reports now from builds with the diagnostic assert added in bug 1719319. Let's use this bug to continue the investigation since bug 1719319 appears to have successfully mitigated the crashes for release users.

Crash report: https://crash-stats.mozilla.org/report/index/99c4953b-57da-4d41-bddd-ec4470210805

MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(false) (Entry should be new and not in any list)

Top 10 frames of crashing thread:

0 xul.dll mozilla::URLPreloader::ReadCache js/xpconnect/loader/URLPreloader.cpp:308
1 xul.dll mozilla::URLPreloader::BackgroundReadFiles js/xpconnect/loader/URLPreloader.cpp:340
2 xul.dll mozilla::detail::RunnableMethodImpl<D3DVsyncSource::D3DVsyncDisplay*, void  xpcom/threads/nsThreadUtils.h:1201
3 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1142
4 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:300
5 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:324
6 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:306
7 xul.dll static nsThread::ThreadFunc xpcom/threads/nsThread.cpp:390
8 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:399
9 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:139

Severity: -- → S2

Ryan VanderMeulen [:RyanVM]

Reporter

Updated

•

4 years ago

status-firefox90: --- → unaffected

status-firefox91: --- → wontfix

status-firefox92: --- → affected

status-firefox-esr78: --- → unaffected

status-firefox-esr91: --- → affected

Andrew McCreight [:mccr8]

Updated

•

4 years ago

Keywords: sec-other

Kris Maglione [:kmag]

Updated

•

4 years ago

Depends on: 1724509

Ryan VanderMeulen [:RyanVM]

Reporter

Updated

•

4 years ago

status-firefox92: affected → fix-optional

status-firefox93: --- → affected

Pascal Chevrel:pascalc

Updated

•

4 years ago

Keywords: topcrash-startup

Pascal Chevrel:pascalc

Updated

•

4 years ago

status-firefox94: --- → affected

status-firefox-esr78: unaffected → affected

Pascal Chevrel:pascalc

Comment 1

•

4 years ago

We are ending the 92 cycle so wontfix 92. This is a startup crash and such crashes are not good for user retention, I think it would be good to at least have an assignee and some investigation done, thanks.

status-firefox92: fix-optional → wontfix

Hsin-Yi Tsai (she/her) [:hsinyi]

Comment 2

•

4 years ago

:kmag is back from vacation. He has patches in bug 1724509. He is looking into landing them.

Hsin-Yi Tsai (she/her) [:hsinyi]

Comment 3

•

4 years ago

•

Edited

We mitigated crashes in release builds in bug 1719319, so firefox93 should be unaffected. But we definitely should look into this issue deeper.

Hsin-Yi Tsai (she/her) [:hsinyi]

Comment 4

•

4 years ago

•

Edited

(In reply to Hsin-Yi Tsai [:hsinyi] from comment #3)

We mitigated crashes in release builds in bug 1719319, so firefox93 should be unaffected. But we definitely should look into this issue deeper.

Hey Kris, can you please confirm if my understanding about the whether firefox-93 is affected or not right? If my understanding is right, we ppbly would want to hold off landing patches in bug 1724509 until soft-freeze ends.

Flags: needinfo?(kmaglione+bmo)

Ryan VanderMeulen [:RyanVM]

Reporter

Updated

•

4 years ago

status-firefox93: affected → wontfix

status-firefox94: affected → wontfix

status-firefox95: --- → affected

status-firefox-esr78: affected → wontfix

Chris Peterson [:cpeterson]

Updated

•

4 years ago

status-firefox96: --- → affected

Pascal Chevrel:pascalc

Updated

•

4 years ago

status-firefox95: affected → wontfix

Ryan VanderMeulen [:RyanVM]

Reporter

Updated

•

4 years ago

status-firefox96: affected → wontfix

status-firefox97: --- → affected

BMO Automation

Updated

•

4 years ago

Has Regression Range: --- → yes

Ryan VanderMeulen [:RyanVM]

Reporter

Updated

•

3 years ago

status-firefox97: affected → wontfix

status-firefox98: --- → affected

Andrew McCreight [:mccr8]

Comment 5

•

3 years ago

Bug 1717778 is on all branches now (and is unhidden) so this doesn't need to be hidden any more. Maybe we could disable this fatal assert on early beta, too, to avoid bothering release managers every cycle.

Group: dom-core-security

Keywords: sec-other

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 6

•

3 years ago

https://searchfox.org/mozilla-central/rev/02bd3e02b72d431f2c600a7328697a521f87d9b6/js/xpconnect/loader/URLPreloader.cpp#336,338
we have pendingURLs outside the lock. Isn't it possible that we end up waiting for the lock and while that is happening some other
call ends up passing cached urls to another list?

Flags: needinfo?(continuation)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 7

•

3 years ago

Writing a patch (once a build is ready)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 8

•

3 years ago

Oh wait, BackgroundReadFiles seems to have other issues too. Thinking.

Andrew McCreight [:mccr8]

Updated

•

3 years ago

Depends on: 1755539

Andrew McCreight [:mccr8]

Updated

•

3 years ago

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Comment 9

•

3 years ago

I put up a patch in bug 1755539 that changes a few things around. I don't know how it could lead to this issue, but it might eliminate some weirdness at least.

Hsin-Yi Tsai (she/her) [:hsinyi]

Updated

•

3 years ago

Flags: needinfo?(kmaglione+bmo)

Andrew McCreight [:mccr8]

Comment 10

•

3 years ago

My patch didn't help. I guess I can try looking at how the preloader list is written out.

Pascal Chevrel:pascalc

Updated

•

3 years ago

status-firefox98: affected → wontfix

status-firefox99: --- → affected

Andrew McCreight [:mccr8]

Updated

•

3 years ago

Depends on: 1757227

Andrew McCreight [:mccr8]

Updated

•

3 years ago

status-firefox99: affected → wontfix

Andrew McCreight [:mccr8]

Comment 11

•

3 years ago

It looks like my changes in bug 1757227 did finally turn up a useful tidbit. The two crashes since my patch landed are hitting MOZ_DIAGNOSTIC_ASSERT(key.mPath.Length() > 0) (Path should be non-empty) so we're ending up with multiple empty paths.

Jens Stutte [:jstutte]

Comment 12

•

3 years ago

I clicked a bit in searchfox. So we read this CacheKey from our file. IIUC that means we already wrote it before with an empty path. Now there seem to be quite a few constructors for CacheKey that seem not to check if we actually get a path, should we have some assert there?

Andrew McCreight [:mccr8]

Comment 13

•

3 years ago

Yeah, there may be some way we end up with an entry that has a blank path. My question is, how can we end up with two of them? The file is written out by iterating over a hash table where the keyed off of a CacheKey, so how can the same CacheKey appear twice? I spent some time trying to figure out if there's something weird with the hashtable that could allow that, but I couldn't come up with anything.

The type of the hashtable is nsClassHashtable<nsGenericHashKey<CacheKey>, URLEntry>, where the CacheKeys are also URLEntries, so we end up with two copies of the data (one for the key and one for the value) but aside from the small excess usage of space I can't think of anything wrong with that. I couldn't see any place that mutates the values in the hash table.

Hsin-Yi Tsai (she/her) [:hsinyi]

Updated

•

3 years ago

Whiteboard: [domcore-s2-revisit]

Andrew McCreight [:mccr8]

Updated

•

3 years ago

Comment 14

•

3 years ago

:hsinyi given the low volume crashes here, is this still considered an S2?

Flags: needinfo?(htsai)

Hsin-Yi Tsai (she/her) [:hsinyi]

Comment 15

•

3 years ago

(In reply to Dianna Smith [:diannaS] from comment #14)

:hsinyi given the low volume crashes here, is this still considered an S2?

Just talked with mccr8, yeah, it probably makes more sense to downgrade this to S3, given the crash in beta/release has been mitigated.
Note that we sill want to figure the issue out, but it's unfortunately stalled after we gave this several try.

Severity: S2 → S3

Flags: needinfo?(htsai)

Keywords: stalled

BugBot [:suhaib / :marco/ :calixte]

Comment 16

•

3 years ago

The severity field for this bug is set to S3. However, the bug has the topcrash keyword.
:peterv, could you consider increasing the severity of this top-crash bug? If the crash isn't "top" anymore, could you drop the topcrash keyword?

For more information, please visit auto_nag documentation.

Flags: needinfo?(peterv)

Andrew McCreight [:mccr8]

Updated

•

3 years ago

Flags: needinfo?(peterv)

Keywords: topcrash-startup

Andrew McCreight [:mccr8]

Comment 18

•

3 years ago

We've bypassed the assertion that made it a top crash. It is still present on Nightly.

Ryan VanderMeulen [:RyanVM]

Reporter

Updated

•

3 years ago

status-firefox-esr91: affected → wontfix

Hsin-Yi Tsai (she/her) [:hsinyi]

Updated

•

3 years ago

Whiteboard: [domcore-s2-revisit]

Hsin-Yi Tsai (she/her) [:hsinyi]

Comment 19

•

3 years ago

Kris going to take another look.

Flags: needinfo?(kmaglione+bmo)

Gabriele Svelto [:gsvelto]

Comment 20

•

3 years ago

There's a flurry of new crashes under this signature on nightly, if it can be useful I can extract the contents of the strings causing the crash from the reports. Just NI? me if it could help.

Jens Stutte [:jstutte]

Comment 21

•

3 years ago

•

Edited

(In reply to Gabriele Svelto [:gsvelto] from comment #20)

There's a flurry of new crashes under this signature on nightly, if it can be useful I can extract the contents of the strings causing the crash from the reports. Just NI? me if it could help.

Do you intend the string of MOZ_CRASH_REASON? In all the nightly instances that seems to be MOZ_DIAGNOSTIC_ASSERT(key.mPath.Length() > 0) (Path should be non-empty), so we have a key without path, apparently. Maybe we should assert if key.mType is any of the expected types or just garbage, too?

Edit: It seems we already knew this from comment 12 and comment 13...

Gabriele Svelto [:gsvelto]

Comment 22

•

3 years ago

I mean the actual contents of the key itself.

Jens Stutte [:jstutte]

Comment 23

•

3 years ago

•

Edited

(In reply to Gabriele Svelto [:gsvelto] from comment #22)

I mean the actual contents of the key itself.

That sounds helpful, thanks.

Edit: Assuming that we can see the type this way, too. Can we see the content of buf ?

Gabriele Svelto [:gsvelto]

Comment 24

•

3 years ago

Unfortunately the contents of buf are unavailable as it's been optimized away and VS is unable to reconstruct it. key->mType is TypeAppJar and pendingURLs seems to contain only one entry aside from the sentinel (I don't know what it contains though). If I go up one frame this.mGREPrefix is jar:file:///C:/Program%20Files/Firefox%20Nightly/omni.ja!/.

BugBot [:suhaib / :marco/ :calixte]

Comment 25

•

2 years ago

The bug is linked to a topcrash signature, which matches the following criterion:

Top 10 desktop browser crashes on nightly (startup)

:hsinyi, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(htsai)

Keywords: topcrash, topcrash-startup

Hsin-Yi Tsai (she/her) [:hsinyi]

Comment 26

•

2 years ago

This doesn't look a true topcrash. There were 14 reports from 1 install and it seems that contributes to topcrash or topcrash-startup calculation.

Hey :marco, we could perhaps teach our bot to do something differently for the number of reports coming from a single install.

Flags: needinfo?(htsai) → needinfo?(mcastelluccio)

Keywords: topcrash, topcrash-startup

Hsin-Yi Tsai (she/her) [:hsinyi]

Comment 27

•

2 years ago

(In reply to Hsin-Yi Tsai (she/her) [:hsinyi] from comment #26)

This doesn't look a true topcrash. There were 14 reports from 1 install and it seems that contributes to topcrash or topcrash-startup calculation.

Hey :marco, we could perhaps teach our bot to do something differently for the number of reports coming from a single install.

Ah, looks we've already taken the above into consideration in our criteria to some extent - "Top 10 desktop browser crashes on Nightly, if they happen for enough different installations."

Marco Castelluccio [:marco]

Comment 28

•

2 years ago

From Socorro, it looks like in the last week the crash happened 18 times to 8 different installations on 110.0a1 and 4 times to 1 installation on 111.0a1.

Flags: needinfo?(mcastelluccio)

BugBot [:suhaib / :marco/ :calixte]

Comment 29

•

2 years ago

The bug is linked to a topcrash signature, which matches the following criterion:

Top 10 desktop browser crashes on nightly (startup)

:peterv, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(peterv)

Keywords: topcrash, topcrash-startup

Andrew McCreight [:mccr8]

Updated

•

2 years ago

Duplicate of this bug: 1824666

Andrew McCreight [:mccr8]

Updated

•

2 years ago

Flags: needinfo?(peterv)

Andrew McCreight [:mccr8]

Comment 31

•

2 years ago

The assertion is different now.

Summary: Startup Crash in [@ mozilla::URLPreloader::ReadCache] after "MOZ_DIAGNOSTIC_ASSERT(false) (Entry should be new and not in any list)" → Startup Crash in [@ mozilla::URLPreloader::ReadCache] after "MOZ_DIAGNOSTIC_ASSERT(key.mPath.Length() > 0) (Path should be non-empty)"

BugBot [:suhaib / :marco/ :calixte]

Comment 32

•

2 years ago

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash, topcrash-startup

Kris Maglione [:kmag]

Comment 33

•

2 years ago

Attached file Bug 1724336: Add some more URLPreloader path sanity checks. r=mccr8 — Details

Most of the startup assertions we're seeing are from when we decode multiple
paths that are empty strings. That doesn't seem likely to happen very often
from on disk corruption, so presumably it must be happening before/when we
encode the cache file?

I don't have much hope that this will turn up the cause, but it seems worth a
try, and it definitely shouldn't hurt.

Phabricator Automation

Updated

•

2 years ago

Assignee: nobody → kmaglione+bmo

Status: NEW → ASSIGNED

Pulsebot

Comment 34

•

2 years ago

Pushed by maglione.k@gmail.com: https://hg.mozilla.org/integration/autoland/rev/f2fbfecc33d1 Add some more URLPreloader path sanity checks. r=mccr8

Cristina Horotan [:chorotan]

Comment 35

•

2 years ago

Backed out changeset f2fbfecc33d1 (bug 1724336) for causing mochitest failures at URLPreloader.h

Backout: https://hg.mozilla.org/integration/autoland/rev/7dd2e5db976d861f69363f886bfabeabb8807962

Failure push: https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel&revision=f2fbfecc33d1e3d1eefbd953aa850ecf8acb99cb&selectedTaskRun=bn3SXBeARMO7GSJv7DkaQw.0

Failure log: https://treeherder.mozilla.org/logviewer?job_id=413718954&repo=autoland&lineNumber=7602

Pulsebot

Comment 36

•

2 years ago

Pushed by maglione.k@gmail.com: https://hg.mozilla.org/integration/autoland/rev/b3b312008642 Add some more URLPreloader path sanity checks. r=mccr8

Cosmin Sabou [:CosminS]

Comment 37

•

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/b3b312008642

Status: ASSIGNED → RESOLVED

Closed: 2 years ago

status-firefox114: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 114 Branch

BugBot (nomail) [:suhaib / :marco/ :calixte]

Comment 38

•

2 years ago

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.

Keywords: stalled

Ryan VanderMeulen [:RyanVM]

Reporter

Updated

•

2 years ago

Status: RESOLVED → REOPENED

status-firefox112: --- → wontfix

status-firefox113: --- → wontfix

status-firefox114: fixed → affected

status-firefox-esr102: --- → wontfix

Flags: needinfo?(kmaglione+bmo)

Resolution: FIXED → ---

Target Milestone: 114 Branch → ---

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

2 years ago

Regressions: 1832591

Pascal Chevrel:pascalc

Updated

•

2 years ago

status-firefox114: affected → wontfix

BugBot [:suhaib / :marco/ :calixte]

Comment 39

•

2 years ago

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: kmaglione+bmo → nobody

You need to log in before you can comment on or make changes to this bug.