Closed Bug 1599024 Opened 1 year ago Closed 8 months ago

Crash in [@ mozilla::ReadEntry]

Categories

(Toolkit :: General, defect, P3)

Unspecified
Windows 10
defect

RESOLVED FIXED
mozilla76
Tracking Status
firefox-esr68 --- unaffected
firefox70 --- unaffected
firefox71 --- unaffected
firefox72 --- wontfix
firefox74 --- wontfix
firefox75 --- wontfix
firefox76 --- fixed

People

(Reporter: gsvelto, Assigned: dthayer)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression)

Attachments

(2 files)

This bug is for crash report bp-b63756ca-c2bc-4b0e-a7c1-0a0d90191124.

Top 10 frames of crashing thread:

0 xul.dll static class mozilla::Result<mozilla::Ok, nsresult> mozilla::ReadEntry toolkit/components/backgroundhangmonitor/HangDetails.cpp:553
1 xul.dll static class mozilla::Result<mozilla::HangDetails, nsresult> mozilla::ReadHangDetailsFromFile toolkit/components/backgroundhangmonitor/HangDetails.cpp:608
2 xul.dll nsresult mozilla::SubmitPersistedPermahangRunnable::Run toolkit/components/backgroundhangmonitor/HangDetails.cpp:697
3 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1250
4 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
5 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:333
6 xul.dll void MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308
7 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:290
8 xul.dll static void nsThread::ThreadFunc xpcom/threads/nsThread.cpp:458
9 nss3.dll static void _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:399

These are crashes in the background hang monitor with crash reason:

MOZ_CRASH(Unsupported HangEntry type?)

There are 17 crashes (from 4 installations) in nightly 72 starting with buildid 20191123214900. Based on the backtrace, the regression may have been introduced by patch [1], which fixed bug 1594577.

[1] https://hg.mozilla.org/mozilla-central/rev?node=eb8741186210

Blocks: clouseau
Flags: needinfo?(dothayer)
Regressed by: 1594577

To clarify - this will not affect anything other than Nightly, as the background hang monitor is not enabled on anything other than Nightly. (I am not familiar with whether the status-firefoxN flags are intended to be edited by anyone, so I'm not going to mess with that.)

Gabriele, if you personally ran into this, do you happen to still have the profile that caused it? I'm specifically interested in the last_permahang.bin file which ought to be in the root of the profile directory, if you have it.

I need to update the MOZ_CRASH("Unsupported HangEntry type?"); line to be a soft failure, but with 17 crashes this doesn't look like a randomly corrupted file, but more like a bug during the writing of the file.

Flags: needinfo?(dothayer) → needinfo?(gsvelto)

(In reply to Doug Thayer [:dthayer] from comment #2)

> Gabriele, if you personally ran into this, do you happen to still have the profile that caused it? I'm specifically interested in the last_permahang.bin file which ought to be in the root of the profile directory, if you have it.

I found this during crash triage. The first crash has buildid 20191122214053 so this might be caused by a regression in that build.

> I need to update the MOZ_CRASH("Unsupported HangEntry type?"); line to be a soft failure, but with 17 crashes this doesn't look like a randomly corrupted file, but more like a bug during the writing of the file.

From what I can tell, there are already four separate installations among the 25 crashes that have been reported. It's possible that a bug slipped into the writing or reading path and the file is being either miswritten or misread.

Flags: needinfo?(gsvelto)

Giving this a priority to get it out of the triage queue; please adjust as necessary. If you know a better component than Firefox::General for this, please move it.

Priority: -- → P3

I get this crash all the time now with my profile.
I have a last_permahang.bin file.

(In reply to Henrik Gemal from comment #5)

> I get this crash all the time now with my profile.
> I have a last_permahang.bin file.

That'd be useful.

(In reply to Sam Foster [:sfoster] (he/him) from comment #4)

> Giving this a priority to get it out of the triage queue; please adjust as necessary. If you know a better component than Firefox::General for this, please move it.

When this functionality was introduced in bug 909974 it was in Core::XPCOM. Maybe that's the right component?

Attached file last_permahang.bin

Fairly simple. We should fail gracefully and delete the file if it
appears to be corrupted. I wanted to see an example of a corrupted
permahang file before fixing this (it's scoped to nightly only, and
I wanted to make sure there wasn't a bug in how we're writing
these). The example file was just filled with zeroes at the end. This
suggests to me our write calls succeeded when we wrote it, but the
OS didn't get a chance to flush its buffers to disk (but it did
reserve the required space on disk). I'm not familiar enough with
the inner workings of filesystems or the underlying drivers to
know if this is plausible or if they should have stronger guarantees
of atomicity, but it's good enough for me to conclude that the one
instance of a bad last_permahang.bin file I've seen is not directly
due to a bug in the writing code.
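
For illustration only, here is a minimal sketch of the "fail gracefully and delete the file" approach described above. It is not the actual patch and does not use the real HangDetails.cpp code: the record layout (one tag byte followed by a 4-byte little-endian value), the tag values, and the names EntryTag, Entry, ParseEntries and ReadOrDiscard are all assumptions made up for this example. The point is that an unknown tag, such as the zero bytes seen at the end of the example file, becomes a soft failure instead of a crash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <optional>
#include <system_error>
#include <vector>

// Hypothetical entry tags. These are NOT the real HangEntry encoding.
enum class EntryTag : std::uint8_t { Label = 1, Address = 2 };

struct Entry {
  EntryTag tag;
  std::uint32_t value;
};

// Decode a buffer laid out as repeated [tag: 1 byte][value: 4 bytes, LE].
// An unknown tag or a truncated record yields std::nullopt rather than
// aborting the process.
std::optional<std::vector<Entry>> ParseEntries(
    const std::vector<std::uint8_t>& buf) {
  std::vector<Entry> out;
  std::size_t pos = 0;
  while (pos < buf.size()) {
    const std::uint8_t rawTag = buf[pos];
    if (rawTag != 1 && rawTag != 2) {
      return std::nullopt;  // Unsupported tag, e.g. a zero-filled tail.
    }
    if (buf.size() - pos < 5) {
      return std::nullopt;  // Record cut short.
    }
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
      value |= static_cast<std::uint32_t>(buf[pos + 1 + i]) << (8 * i);
    }
    out.push_back({static_cast<EntryTag>(rawTag), value});
    pos += 5;
  }
  assert(pos == buf.size());  // Records are consumed exactly, never overrun.
  return out;
}

// Read and parse the file at `path`. If the contents do not parse, remove
// the file (best effort) so the same bad data is not hit again on the next
// startup, and report failure to the caller.
std::optional<std::vector<Entry>> ReadOrDiscard(
    const std::filesystem::path& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    return std::nullopt;  // Missing or unreadable file: nothing to submit.
  }
  std::vector<std::uint8_t> buf((std::istreambuf_iterator<char>(in)),
                                std::istreambuf_iterator<char>());
  in.close();  // Close before removing; Windows refuses to delete open files.

  auto parsed = ParseEntries(buf);
  if (!parsed) {
    std::error_code ec;
    std::filesystem::remove(path, ec);  // Ignore errors; nothing else to do.
  }
  return parsed;
}
```

A caller would treat an empty result from ReadOrDiscard as "no permahang to report" and carry on. Whether a zero byte is a valid tag in the real format is not something this sketch knows; here it is simply assumed to be invalid.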

Assignee: nobody → dothayer
Status: NEW → ASSIGNED

(In reply to Doug Thayer [:dthayer] from comment #9)

> I'm not familiar enough with
> the inner workings of filesystems or the underlying drivers to
> know if this is plausible or if they should have stronger guarantees
> of atomicity, but it's good enough for me to conclude that the one
> instance of a bad last_permahang.bin file I've seen is not directly
> due to a bug in the writing code.

I've double-checked on crash-stats and the crashes go back six months - which is the retention time we have there. I don't know why I hadn't noticed before. So even in the unlikely chance that it's a bug in the writing code it's not a recent one and definitely not a regression.

Pushed by dothayer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f6bb224e706a
Don't crash when reading bad hangentry type r=froydnj
Status: ASSIGNED → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76