Bug 1455828 (Closed): opened 7 years ago, closed 7 years ago

Crash in PrefValue::Deserialize

Tracking

Status: VERIFIED FIXED

Milestone: mozilla61

Tracking Flags:

                Tracking   Status
firefox-esr52   ---        unaffected
firefox-esr60   ---        unaffected
firefox59       ---        unaffected
firefox60       ---        unaffected
firefox61       +          verified

People

(Reporter: calixte, Assigned: jld)


Details

(Keywords: crash, regression, topcrash)


Calixte Denizet (:calixte)

Reporter

Description • 7 years ago

This bug was filed from the Socorro interface and is report bp-65d48a55-59e4-4d59-bd7e-ceff90180420.
=============================================================

Top 10 frames of crashing thread:

0 libxul.so PrefValue::Deserialize modules/libpref/Preferences.cpp:305
1 libxul.so Pref::Deserialize modules/libpref/Preferences.cpp:912
2 libxul.so mozilla::Preferences::DeserializePreferences modules/libpref/Preferences.cpp:3421
3 libxul.so mozilla::dom::ContentProcess::Init dom/ipc/ContentProcess.cpp:215
4 libxul.so XRE_InitChildProcess toolkit/xre/nsEmbedFunctions.cpp:695
5 firefox content_process_main ipc/contentproc/plugin-container.cpp:50
6 firefox main.cold.3
7 libc-2.23.so libc-2.23.so@0x2082f
8 firefox firefox@0x15d9f
9 firefox double_conversion::BignumDtoa

=============================================================

There are 33 crashes (from 30 installations) in nightly 61 starting with buildid 20180420100056.

:njn, could you investigate, please?

By the way, for all the crashes the uptime is 0, but the startup_crash flag is unset. Is the deserialization occurring during startup or not?

Flags: needinfo?(n.nethercote)

Nicholas Nethercote [inactive]

Comment 1 • 7 years ago

The crashes are all on Linux, and are frequent enough that this is the #1 topcrash on Linux Nightly. As per comment 0, they started in build 20180420100056, which is when bug 1439057 and bug 1447867 landed. So I think it's highly likely that one or both of those is to blame. jld, what do you think? The crash that's occurring is caused by the serialized pref data in the shared memory having an unexpected form. It's hard to know exactly what that invalid form is. Perhaps it's full of zeroes or something? Given the high crash rate, I think those two bugs' patches should be backed out until we work out what's happening. jld, can you do that?

Flags: needinfo?(n.nethercote) → needinfo?(jld)

Ryan VanderMeulen [:RyanVM]

Updated • 7 years ago

tracking-firefox61: --- → +

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Updated • 7 years ago

Blocks: 1447867

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Comment 2 • 7 years ago

I don't really understand what's going on here. The code I changed was only about obtaining the fd; if that works, the rest of it shouldn't be affected. (And on Linux it should be a regular file in the same directory as before.) One observable change is that I stopped seeking to the end of the file, because mmap doesn't use the file offset, but I didn't find anything trying to read/write the fd directly so that shouldn't matter. The other change is the leaf filename, but if something like AppArmor were filtering based on names that should have made the file creation fail, and then we'd never create the child process.
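The claim that dropping the seek to the end can't matter for mmap rests on a POSIX detail: mmap() takes an explicit offset argument and never consults the descriptor's current file position. A minimal sketch in Python (not the Firefox code, which is C++; a temporary file stands in for the pref shared-memory file, and `map_after_seek_to_end` is a hypothetical name):

```python
import mmap
import os
import tempfile


def map_after_seek_to_end(blob: bytes) -> bytes:
    """Write `blob` to a fresh file, move the fd offset to EOF, then mmap it."""
    fd, path = tempfile.mkstemp()
    os.unlink(path)  # only the fd is needed from here on
    try:
        os.write(fd, blob)
        # Mimic the old code's explicit seek to the end of the file.
        os.lseek(fd, 0, os.SEEK_END)
        # mmap's offset argument (default 0) decides where the mapping starts;
        # the fd's current read/write position is not consulted.
        with mmap.mmap(fd, len(blob), access=mmap.ACCESS_READ) as m:
            return m[:]
    finally:
        os.close(fd)
```

The mapping starts at byte 0 of the file whether or not the fd was seeked, so the removed lseek() on its own should not change what the child reads through the mapping.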

Calixte Denizet (:calixte)

Reporter

Comment 3 • 7 years ago

For information, there are no more crashes with this signature since the patches from bug 1439057 and bug 1447867 have been backed out.

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Comment 4 • 7 years ago

Some other differences: the old code is setting O_APPEND via fdopen, as well as seeking to the end. If there's a stray write to the fd somehow, previously it would have written after the pref blob's null terminator and been ignored (because the length is passed on the command line instead of fstat'ing the file), but now it would corrupt the start.

And the old code wasn't setting close-on-exec, while the new code does (via shm_open). This is potentially relevant because (and this is a separate bug I need to file) the IPC fd-shuffling code doesn't appear to unset the close-on-exec flag if an fd winds up mapped to the same fd.

From looking at the code: the byte that isn't a valid pref type also isn't NUL, which means we're not reading past the end of the file (or an unwritten part of it), which rules out a few things.

And we could crash earlier, in the pref type switch, instead of NS_ERRORing: if we get to that case we're going to crash later anyway, but with less context. It would also be possible to include the bad data in the minidump.
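The stray-write scenario above can be reproduced with plain POSIX calls. This is a minimal Python sketch, not Firefox's code: a temporary file stands in for the shared-memory file, the "old" setup is modelled as O_APPEND plus a seek to the end, the "new" one as an fd left at offset 0, and the stray write itself is hypothetical. The reader maps only `len(blob)` bytes, mirroring a length passed out of band rather than obtained with fstat().

```python
import fcntl
import mmap
import os
import tempfile


def blob_seen_by_child(blob: bytes, old_style: bool, stray: bytes = b"XX") -> bytes:
    """Return what a reader mapping len(blob) bytes sees after one stray write."""
    fd, path = tempfile.mkstemp()
    os.unlink(path)  # the name isn't needed once the fd exists
    try:
        os.write(fd, blob)
        if old_style:
            # Old setup as described: O_APPEND (as fdopen in append mode sets)
            # plus an explicit seek to the end of the file.
            flags = fcntl.fcntl(fd, fcntl.F_GETFL)
            fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_APPEND)
            os.lseek(fd, 0, os.SEEK_END)
        else:
            # New setup: file offset left at 0, no O_APPEND.
            os.lseek(fd, 0, os.SEEK_SET)
        os.write(fd, stray)  # the hypothetical stray write
        # The reader trusts an out-of-band length, so bytes appended past
        # the blob fall outside its view.
        with mmap.mmap(fd, len(blob), access=mmap.ACCESS_READ) as m:
            return m[:]
    finally:
        os.close(fd)
```

With the old setup the stray bytes land after the terminator and are never read; with the new one they overwrite the first bytes of the blob, which is the kind of corruption that would surface as a non-NUL, invalid pref type.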

Assignee: nobody → jld

Flags: needinfo?(jld)
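The fd-shuffling concern in comment 4 comes down to another POSIX detail: dup2() clears FD_CLOEXEC on the new descriptor, but when source and target are the same fd it is a no-op that leaves the flags alone, so a close-on-exec flag (which shm_open sets) would survive into exec(). A minimal Python sketch of a fix, using a hypothetical `place_fd_for_child` helper rather than Firefox's actual IPC code:

```python
import fcntl
import os


def has_cloexec(fd: int) -> bool:
    """True if FD_CLOEXEC is set, i.e. the fd would be closed by exec()."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def clear_cloexec(fd: int) -> None:
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags & ~fcntl.FD_CLOEXEC)


def place_fd_for_child(src: int, dst: int) -> None:
    """Hypothetical helper: make `src` available as `dst` in a child before exec()."""
    if src == dst:
        # dup2(fd, fd) just returns fd and leaves its flags untouched, so a
        # close-on-exec flag would survive and the child would lose the fd
        # at exec(). Clear it explicitly instead.
        clear_cloexec(dst)
    else:
        # POSIX dup2() clears FD_CLOEXEC on the new descriptor.
        os.dup2(src, dst)
```

The design point is that the same-fd case needs its own branch; relying on dup2() alone handles every mapping except the one where nothing moves.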

Nicholas Nethercote [inactive]

Comment 5 • 7 years ago

> And we could crash earlier, in the pref type switch, instead of NS_ERRORing:
> if we get to that case we're going to crash later anyway, but with less
> context. It would also be possible to include the bad data in the minidump.

Yes, the current handling of unexpected data is totally unprincipled. We could crash more eagerly, in more places, and use UNSAFE_PRINTF to show some of the data.
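The fail-fast idea discussed above can be sketched as a check at the type switch that stops immediately and reports the offending byte, its offset, and some following data, while also distinguishing a NUL (reading unwritten or past-the-end data) from other garbage. This Python sketch assumes a hypothetical one-byte type tag from the set {B, I, S}; the real Preferences.cpp encoding is not described in this bug, and in Gecko the exception would instead be a crash through the UNSAFE_PRINTF macro mentioned above.

```python
# Hypothetical tag set for illustration only.
VALID_TYPE_TAGS = frozenset(b"BIS")


class PrefBlobError(Exception):
    """Raised at once instead of warning and carrying on with bad data."""


def check_type_byte(blob: bytes, offset: int) -> int:
    """Return the type tag at `offset`, or fail immediately with context."""
    if offset >= len(blob):
        raise PrefBlobError(
            f"offset {offset} is past the end of a {len(blob)}-byte blob")
    tag = blob[offset]
    if tag in VALID_TYPE_TAGS:
        return tag
    # A NUL points at unwritten or past-the-end data; any other byte points
    # at corruption of data that was actually written.
    cause = "NUL byte" if tag == 0 else "unexpected non-NUL byte"
    snippet = blob[offset:offset + 16]  # some of the bad data, for the report
    raise PrefBlobError(
        f"invalid pref type 0x{tag:02x} ({cause}) at offset {offset}; "
        f"following bytes: {snippet!r}")
```

Recording whether the bad byte was NUL is what lets a crash report rule out the read-past-the-end explanation directly, as comment 4 did by reading the code.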

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Updated • 7 years ago

Depends on: 1456902

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Updated • 7 years ago

Depends on: 1456911

Marcia Knous [:marcia]

Comment 6 • 7 years ago

This is currently the top browser crash on Nightly.

Keywords: topcrash

Nicholas Nethercote [inactive]

Comment 7 • 7 years ago

This was fixed by the backouts, as per comment 2. I think we can close it. Bug 1456902 is open for some follow-up work.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → WORKSFORME

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Updated • 7 years ago

Updated • 7 years ago

Status: RESOLVED → VERIFIED

status-firefox61: affected → verified

Resolution: WORKSFORME → FIXED

Target Milestone: --- → mozilla61

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Updated • 7 years ago

No longer depends on: 1456902

Sean Feng [:sefeng]

Comment 8 • 3 years ago

Just by clicking on the signature, we still see a number of crashes. jld, is it still the same cause? Would you mind taking a look? Thanks!

Flags: needinfo?(jld)

Andrew McCreight [:mccr8]

Updated • 3 years ago

Comment 9 • 3 years ago

(In reply to Sean Feng [:sefeng] from comment #8)
> Just by clicking on the signature, we still see a number of crashes. jld, is it still the same cause? Would you mind taking a look? Thanks!

I filed bug 1767026 for this. It looks like a severe enough spike on Nightly that I think it is worth a new bug.

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Comment 10 • 3 years ago

It's not clear if I ever really knew what was going on in this bug, but the comments in bug 1767026 so far suggest that the current crashes might be a new regression.

Flags: needinfo?(jld)



Categories

(Core :: Preferences: Backend, defect)


Security

(public)
