Closed Bug 1603714 Opened 8 months ago Closed 8 months ago

Crash in [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | mozilla::dom::PContentParent::OnMessageReceived] from deserializing UntrustedModulesData

Categories

(External Software Affecting Firefox :: Telemetry, defect)

Unspecified
Windows 10
defect
Not set
normal

Tracking

(firefox-esr68 unaffected, firefox71 unaffected, firefox72+ fixed, firefox73+ fixed)

RESOLVED FIXED
Tracking Status
firefox-esr68 --- unaffected
firefox71 --- unaffected
firefox72 + fixed
firefox73 + fixed

People

(Reporter: pascalc, Assigned: aklotz)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(5 files, 1 obsolete file)

[Tracking Requested - why for this release]:

This bug is for crash report bp-a0d9cbfa-ea1d-463a-86b0-e0dfe0191213.

Top 10 frames of crashing thread:

0 xul.dll mozilla::ipc::FatalError ipc/glue/ProtocolUtils.cpp:165
1 xul.dll mozilla::ipc::IProtocol::HandleFatalError ipc/glue/ProtocolUtils.cpp:404
2 xul.dll mozilla::dom::PContentParent::OnMessageReceived ipc/ipdl/PContentParent.cpp:6312
3 xul.dll void mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:2131
4 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1256
5 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
6 xul.dll void mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:109
7 xul.dll void MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308
8 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:290
9 xul.dll nsBaseAppShell::Run widget/nsBaseAppShell.cpp:137

This signature is spiking on nightly over the last 3 days.

I looked at about 10 of these crash reports and they all had an IPC fatal error message of "Error deserializing 'UntrustedModulesData?'".

Component: DOM: Content Processes → Telemetry
Flags: needinfo?(aklotz)
Product: Core → External Software Affecting Firefox
Regressed by: 1522830
Summary: Crash in [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | mozilla::dom::PContentParent::OnMessageReceived] → Crash in [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | mozilla::dom::PContentParent::OnMessageReceived] from deserializing UntrustedModulesData
Flags: needinfo?(davidp99)

I don't know this code at all but the ParamTraits look pretty accurate to me. There is a little mixture of nsAutoString and nsString with ParamTraits but I'm pretty sure that is legit. The only other part that I wouldn't feel 100% on is this:
https://searchfox.org/mozilla-central/rev/c61720a7d0c094d772059f9d6a7844eb7619f107/toolkit/xre/UntrustedModulesData.h#301
I don't get all of the conditions under which NS_NewLocalFile fails.

Flags: needinfo?(davidp99)
Flags: needinfo?(aklotz)
Assignee: nobody → aklotz
Status: NEW → ASSIGNED
Attached file Data Review Request
Attachment #9116556 - Flags: data-review?(chutten)
Comment on attachment 9116556 [details]
Data Review Request

DATA COLLECTION REVIEW RESPONSE:

    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes. This collection is documented in its definitions file [CrashAnnotations.yaml](https://hg.mozilla.org/mozilla-central/file/tip/toolkit/crashreporter/CrashAnnotations.yaml).

    Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is part of crash reporting which is opt-in on the first and subsequent crashes, as well as being controllable through Firefox's Preferences.

    If the request is for permanent data collection, is there someone who will monitor the data over time?

No. This collection will expire as soon as we get answers.

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

    Is the data collection request for default-on or default-off?

Default off for everyone. Only ever collected on Nightly, opt-in.

    Does the instrumentation include the addition of any new identifiers?

No.

    Is the data collection covered by the existing Firefox privacy notice?

Yes.

    Does there need to be a check-in in the future to determine whether to renew the data?

Yes. :aklotz is responsible for renewing or removing the collection once we get answers.

---
Result: datareview+
Attachment #9116556 - Flags: data-review?(chutten) → data-review+
Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/44b14e9e661f
Add temporary, Nightly-only crash annotations to UntrustedModulesData and dependencies to determine failure location; r=handyman

I've got some data coming back now. From the annotation that I've seen, it looks like there are events that reference modules that are not in the ModulesMap hash table. I'm not sure how/why yet.

[Tracking Requested - why for this release]:
Potential for buffers to be filled with invalid data, leading to unexpected errors.

Regressed by: 1587642
No longer regressed by: 1522830

Untrusted modules is receiving bad data due to a copy/move of MemorySectionNameBuf that does not update the UNICODE_STRING's Buffer field with a pointer to the receiving buffer's location.

I think this is the only cause, but I'll leave the instrumentation in a bit longer to confirm.

The affected code hasn't made it to beta yet. Revoking tracking request.

Actually, that's incorrect.

[Tracking Requested - why for this release]:
Potential for invalid buffer contents. Those buffers are not really used in 72, but we should fix this just in case. The regressing bug did land in 72, even though the subsequent work that revealed the crashes did not land until 73.

Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e0f712f3c394
Add explicit copy constructor and copy assignment operators to nt::MemorySectionNameBuf; r=mhowell

Still seeing crashes in the 20191220095035 nightly which should have the patch from comment 16, e.g. bp-4db0d02f-7b3b-4197-a9f2-f9a470191220

Flags: needinfo?(aklotz)

I still do not yet have a good grasp of the root cause of this data integrity
issue, however I also think that having the browser process crash because of it
is far worse.

This also adds a check in our telemetry processing that the deserialized event
is valid.

We'll probably need to add more diagnostic information in the future, but I'd
like to get rid of the IPC errors in Nightly in time for holidays.

Depends on D58046

Flags: needinfo?(aklotz)
Regressed by: 1522830

Comment on attachment 9117116 [details]
Bug 1603714: Add explicit copy constructor and copy assignment operators to nt::MemorySectionNameBuf; r=mhowell

Beta/Release Uplift Approval Request

  • User impact if declined: Potential for bad memory to be included in various buffers. AFAICT nothing is actually done with those buffers in 72 now that it is in Beta, but I think we should land this just in case.

(Confirming that just this patch needs to be uplifted, not the others in this bug)

  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Very simple, localized patch with good understanding of the issue.
  • String changes made/needed: None
Attachment #9117116 - Flags: approval-mozilla-beta?
See Also: → 1605478
Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5d04eb35d98e
Backed out changeset 44b14e9e661f as this information is no longer needed;
Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c3a21a4b5a45
Do not treat third-party module data integrity error as IPC error; r=mhowell

Comment on attachment 9117116 [details]
Bug 1603714: Add explicit copy constructor and copy assignment operators to nt::MemorySectionNameBuf; r=mhowell

approved for 72.0b11

Attachment #9117116 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: needinfo?(aklotz)
Attachment #9117564 - Attachment description: Bug 1603714: Follow-up: also add move constructor and move assignment operator; → Bug 1603714: Follow-up: also add move constructor and move assignment operator to MemorySectionNameBuf; r=handyman
Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9017c76f2cb5
Follow-up: also add move constructor and move assignment operator to MemorySectionNameBuf; r=handyman

Aaron, can you request uplift for the new patch if you think it is safe for beta? We have a beta build tonight, and then the RC build coming up next Monday.

Flags: needinfo?(aklotz)

Comment on attachment 9117564 [details]
Bug 1603714: Follow-up: also add move constructor and move assignment operator to MemorySectionNameBuf; r=handyman

Beta/Release Uplift Approval Request

  • User impact if declined:
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Simple, understandable patch that is very tightly scoped.
  • String changes made/needed: None
Flags: needinfo?(aklotz)
Attachment #9117564 - Flags: approval-mozilla-beta?

Note that D57889 should land on Beta first, followed by D58142.

Comment on attachment 9117564 [details]
Bug 1603714: Follow-up: also add move constructor and move assignment operator to MemorySectionNameBuf; r=handyman

Fix for memory issue and possible crash. Let's take this for the RC build on Monday.

Note that D57889 should land on Beta first, followed by D58142.

Attachment #9117564 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9116555 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.