Data integrity problems in content process UntrustedModulesData
Categories: Firefox :: Launcher Process, defect, P3
People: Reporter: bugzilla; Assigned: toshi
References: Blocks 1 open bug
Whiteboard: inj+
Attachments: 2 files
Bug 1603714 was triggered because somehow a content process sent UntrustedModulesData up to the parent in which an event referenced a non-existent module.
While I fixed the crash in that bug, unfortunately this condition still results in dropped third-party module pings from the affected machines.
We should add some diagnostics to learn why and correct this, so that we're not throwing away data that otherwise could be very useful.
Comment 1•5 years ago
Bug 1603714 showed there were UntrustedModulesData instances in which a load event pointed to a module which did not exist in the modules list. This patch adds MOZ_DIAGNOSTIC_ASSERT in the following places to narrow down when it happened.

1. When processing load events
  1-1. [Content] UntrustedModulesProcessor::CompleteProcessing: Verify events of a trusted module were eliminated by GetModulesTrust.
  1-2. [Content] UntrustedModulesData::AddNewLoads: Verify a new ModuleRecord matches the event.
  1-3. [Content] UntrustedModulesProcessor::CompleteProcessing: Verify processed data after new items were appended.
2. When processed data is sent
  2-1. [Content] UntrustedModulesProcessor::GetAllProcessedData: Verify processed data before serialization.
  2-2. [Content] ParamTraits<mozilla::UntrustedModulesData>::WriteEvent: Verify processed data before transferring to the browser process.
  2-3. [Browser] ParamTraits<mozilla::UntrustedModulesData>::ReadEvent: A final point to catch this integrity problem. We had an IPC error here.
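The invariant these asserts guard can be illustrated with a small stand-alone sketch. The types and names below are hypothetical simplifications (the real code lives in UntrustedModulesData and uses Mozilla containers); only the checked condition mirrors the description above: every load event must reference a module that exists in the modules table.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified stand-in for a load event; the real event holds
// a pointer to a ModuleRecord keyed by its resolved NT path.
struct LoadEvent {
  std::string resolvedNtName;  // path of the module this event refers to
};

// Returns true when every event references a module present in the table,
// mirroring the invariant the MOZ_DIAGNOSTIC_ASSERTs above check.
bool VerifyConsistency(const std::vector<LoadEvent>& events,
                       const std::unordered_set<std::string>& modules) {
  for (const auto& evt : events) {
    if (modules.find(evt.resolvedNtName) == modules.end()) {
      return false;  // event points at a module missing from the table
    }
  }
  return true;
}
```

In the real code a violation fires a diagnostic assert (and, at point 2-3, surfaced as an IPC read error) instead of returning false.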
Comment 3•5 years ago
bugherder
Comment 5•5 years ago
Backed out changeset 996f931d61b6 (bug 1605478) for build bustages on Windows in Unified_cpp_toolkit_xre0.obj
This started to fail in today's central-as-beta simulations and would start to fail on beta tomorrow after central got merged to beta.
Push that started the failures: https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=292157522&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunnable&revision=1351d08be2769309f449b5cc80e5c449e6d6caf2
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=292157522&repo=try&lineNumber=51272
Backout: https://hg.mozilla.org/mozilla-central/rev/6b1d5e72a65b5d97025a566196fa5cb63b7e4549
Comment 7•5 years ago
bugherder
Comment 9•5 years ago
All instances are hitting MOZ_DIAGNOSTIC_ASSERT(!event.IsTrusted()). Looking at raw dumps, the variable event points to xul.dll but event.mLoadDurationMS is empty. That's why it was not eliminated by the earlier check event.IsXULLoad().
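A minimal model of this failure mode, under the assumption that the early filter only recognizes events carrying a load duration (the names mirror the real ones, but the predicates below are simplified illustrations, not the actual Gecko code):

```cpp
#include <optional>
#include <string>

// Simplified stand-in for the load event; in the real code mLoadDurationMS
// is empty for loads that were not timed by the loader hook.
struct Event {
  std::string mModuleName;
  std::optional<double> mLoadDurationMS;
};

// Hypothetical early filter: only treats an event as a XUL load when a
// duration was recorded -- this models the reported gap, where a bare
// xul.dll event slips past.
bool IsXULLoad(const Event& e) {
  return e.mLoadDurationMS.has_value() && e.mModuleName == "xul.dll";
}

// The later diagnostic check judges trust by module identity alone.
bool IsTrusted(const Event& e) { return e.mModuleName == "xul.dll"; }
```

An `Event{"xul.dll", std::nullopt}` fails `IsXULLoad()` and so is not filtered out, yet `IsTrusted()` returns true, which is exactly the state that trips MOZ_DIAGNOSTIC_ASSERT(!event.IsTrusted()).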
Comment 10•5 years ago
Analyzing the diagnostic asserts that were temporarily introduced, there were load events of xul.dll where mLoadDurationMS was empty. We put them into the array of loading events but excluded them from the array of modules, resulting in a serialization failure of UntrustedModulesData.
The crashing processes loaded external modules of antivirus applications such as Symantec, AVAST, or Comodo. With these modules, xul.dll might be loaded without hitting ntdll!LdrLoadDll (ModuleLoadInfo::IsBare() == false).
The proposed fix is to skip processing a load event of xul.dll regardless of the state of mLoadDurationMS.
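The shape of the proposed fix can be sketched as filtering on module identity alone. This is a simplified assumption of what the patch does, not its literal code:

```cpp
#include <optional>
#include <string>

// Simplified stand-in for the load event (hypothetical field names).
struct Event {
  std::string mModuleName;
  std::optional<double> mLoadDurationMS;
};

// After the fix: skip any xul.dll load event up front, whether or not a
// load duration was recorded, so no trusted-module event ever reaches the
// untrusted-modules bookkeeping or serialization.
bool ShouldSkip(const Event& e) {
  return e.mModuleName == "xul.dll";  // mLoadDurationMS intentionally ignored
}
```

This closes the gap from comment 9, where an empty mLoadDurationMS let a xul.dll event through the duration-dependent early filter.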
Comment 11•5 years ago
There are 2 crashes with signature mozilla::UntrustedModulesData::VerifyConsistency; the moz_crash_reason is MOZ_DIAGNOSTIC_ASSERT(mModules.Get(evt.mModule->mResolvedNtName, nullptr)) (No match in the table).
Comment 12•5 years ago
Aaron, any chance you could review the patch here?
Comment 13•5 years ago
Toshi, do we need separate bugs for the other assertions (comment 11)?
Comment 14•5 years ago
No need for now. I believe that signature is caused by the same root cause as MOZ_DIAGNOSTIC_ASSERT(!event.IsTrusted()).
My patch generates a random number several times, which is used to reduce the number of affected users. The cases of MOZ_DIAGNOSTIC_ASSERT(mModules.Get(evt.mModule->mResolvedNtName, nullptr)) and MOZ_DIAGNOSTIC_ASSERT(false) (Something else) happen when the first random number was lucky enough to pass through the check, but a second or third random number was not.
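The sampling behavior described here, where each diagnostic site draws its own random number so a user can pass one gate yet crash at a later one, can be modeled like this. The class name, rate parameter, and seeding are hypothetical; only the independent per-call draw matches the description:

```cpp
#include <random>

// Hypothetical sketch of sampled diagnostics: each check site calls
// ShouldAssert() independently, so one user may clear the first draw and
// fail a later one -- matching the mix of assertion signatures seen above.
class SampledDiagnostics {
 public:
  SampledDiagnostics(double rate, unsigned seed) : mRate(rate), mRng(seed) {}

  // Returns true when this particular check should fire for this user.
  bool ShouldAssert() {
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(mRng) < mRate;  // a fresh draw per call site
  }

 private:
  double mRate;       // fraction of checks allowed to assert
  std::mt19937 mRng;  // fresh draws per call
};
```

With a rate of 0.0 no site ever asserts; with 1.0 every site does; intermediate rates explain why different users crash at different asserts.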
Comment 15•5 years ago
Comment 16•5 years ago
bugherder
Comment 17•5 years ago
It seems all three diagnostic asserts are gone after the fix. We can back out the asserts.
Comment 18•5 years ago
Backed out changeset e87ab6d3d9ab (Bug 1605478) on toshi's request
Backout link: https://hg.mozilla.org/integration/autoland/rev/250960afe914203f41054e25728407476cd938fe
Comment 19•5 years ago
Comment 20•5 years ago
Sorry for the mismatch.
Comment 21•5 years ago
bugherder