Closed Bug 1197616 Opened 9 years ago Closed 9 years ago

crash (on write, non-shutdown) in mozilla::Telemetry::Accumulate(mozilla::Telemetry::ID, unsigned int)

Categories

(Toolkit :: Telemetry, defect)

x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED DUPLICATE of bug 1141565
Tracking Status
firefox40 --- affected
firefox41 --- affected
firefox42 --- affected
firefox44 --- affected
firefox45 --- affected
firefox46 --- affected

People

(Reporter: jesup, Unassigned)

References

Details

(Keywords: crash, csectype-uaf, sec-high)

Crash Data

This bug was filed from the Socorro interface and is report bp-cec7adac-303a-4e6f-9e78-fce142150820. ============================================================= Spin-off from bug 1193038 to cover non-shutdown crashes, in particular what appear to be random writes (or more likely UAF writes) in ::Accumulate() See also https://crash-stats.mozilla.com/report/index/c05545b5-cc4f-47f7-bb89-129782150817#allthreads At minimum sec-high; I'll let dveditz/abillings/etc decide if it's sec-crit or not.
Marking 40-42 as affected based on crash-stats.
I loaded bp-c05545b5-cc4f-47f7-bb89-129782150817 in MSVC and get these stacks: main thread: ntdll.dll!_KiFastSystemCallRet@0 Unknown ntdll.dll!_ZwWaitForSingleObject@12 Unknown ntdll.dll!_WaitForWerSvc@0 Unknown ntdll.dll!_SendMessageToWERService@8 Unknown ntdll.dll!_ReportExceptionInternal@16 Unknown kernel32.dll!_WerpReportFaultInternal@8 Unknown kernel32.dll!_WerpReportFault@8 Unknown kernel32.dll!_BasepReportFault@8 Unknown kernel32.dll!_UnhandledExceptionFilter@4 Unknown ntdll.dll!___RtlUserThreadStart@8 Unknown ntdll.dll!@_EH4_CallFilterFunc@8 Unknown ntdll.dll!ExecuteHandler2@20 Unknown ntdll.dll!ExecuteHandler@20 Unknown ntdll.dll!_RtlDispatchException@8 Unknown ntdll.dll!_KiUserExceptionDispatcher@8 Unknown > xul.dll!mozilla::Telemetry::Accumulate Line 3721 C++ xul.dll!mozilla::Telemetry::AccumulateDelta_impl<1>::compute Line 111 C++ xul.dll!mozilla::Telemetry::AutoTimer<94,1>::~AutoTimer<94,1> Line 140 C++ xul.dll!nsCSSRendering::PaintGradient Line 2828 C++ xul.dll!nsImageRenderer::Draw Line 4942 C++ xul.dll!nsImageRenderer::DrawBackground Line 5031 C++ xul.dll!nsCSSRendering::PaintBackgroundWithSC Line 3029 C++ xul.dll!nsCSSRendering::PaintBackground Line 1669 C++ xul.dll!nsDisplayBackgroundImage::PaintInternal Line 2703 C++ xul.dll!nsDisplayBackgroundImage::Paint Line 2688 C++ xul.dll!mozilla::FrameLayerBuilder::PaintItems Line 5317 C++ xul.dll!mozilla::FrameLayerBuilder::DrawPaintedLayer Line 5516 C++ xul.dll!mozilla::layers::ClientPaintedLayer::PaintThebes Line 92 C++ xul.dll!mozilla::layers::ClientPaintedLayer::RenderLayerWithReadback Line 143 C++ xul.dll!mozilla::layers::ClientContainerLayer::RenderLayer Line 71 C++ xul.dll!mozilla::layers::ClientLayer::RenderLayerWithReadback Line 384 C++ xul.dll!mozilla::layers::ClientContainerLayer::RenderLayer Line 71 C++ xul.dll!mozilla::layers::ClientLayerManager::EndTransactionInternal Line 275 C++ xul.dll!mozilla::layers::ClientLayerManager::EndTransaction Line 318 C++ xul.dll!nsDisplayList::PaintRoot Line 1657 C++ xul.dll!nsLayoutUtils::PaintFrame Line 3330 C++ xul.dll!PresShell::Paint Line 6118 C++ xul.dll!nsViewManager::ProcessPendingUpdatesPaint Line 456 C++ xul.dll!nsViewManager::ProcessPendingUpdatesForView Line 392 C++ xul.dll!nsViewManager::ProcessPendingUpdates Line 1088 C++ xul.dll!nsRefreshDriver::Tick Line 1811 C++ xul.dll!mozilla::RefreshDriverTimer::TickDriver Line 196 C++ xul.dll!mozilla::RefreshDriverTimer::Tick Line 186 C++ xul.dll!mozilla::VsyncRefreshDriverTimer::RunRefreshDrivers Line 438 C++ xul.dll!mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver Line 373 C++ xul.dll!nsRunnableMethodImpl<void (__thiscall mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::*)(mozilla::TimeStamp),1,mozilla::TimeStamp>::Run Line 828 C++ xul.dll!nsThread::ProcessNextEvent Line 849 C++ xul.dll!mozilla::ipc::MessagePump::Run Line 95 C++ xul.dll!MessageLoop::RunHandler Line 228 C++ xul.dll!MessageLoop::Run Line 202 C++ xul.dll!nsBaseAppShell::Run Line 167 C++ xul.dll!nsAppShell::Run Line 178 C++ xul.dll!nsAppStartup::Run Line 281 C++ xul.dll!XREMain::XRE_mainRun Line 4269 C++ xul.dll!XREMain::XRE_main Line 4353 C++ xul.dll!XRE_main Line 4443 C++ > xul.dll!mozilla::Telemetry::Accumulate Line 3721 C++ xul.dll!CheckPinsForHostname Line 330 C++ xul.dll!mozilla::psm::PublicKeyPinningService::ChainHasValidPins Line 364 C++ xul.dll!mozilla::psm::NSSCertDBTrustDomain::IsChainValid Line 780 C++ xul.dll!mozilla::pkix::BuildForward Line 296 C++ xul.dll!mozilla::pkix::PathBuildingStep::Check Line 195 C++ xul.dll!mozilla::psm::FindIssuerInner Line 111 C++ xul.dll!mozilla::psm::NSSCertDBTrustDomain::FindIssuer Line 139 C++ xul.dll!mozilla::pkix::BuildForward Line 323 C++ xul.dll!mozilla::pkix::PathBuildingStep::Check Line 195 C++ xul.dll!mozilla::psm::FindIssuerInner Line 111 C++ xul.dll!mozilla::psm::NSSCertDBTrustDomain::FindIssuer Line 145 C++ xul.dll!mozilla::pkix::BuildForward Line 323 C++ xul.dll!mozilla::pkix::BuildCertChain Line 360 C++ xul.dll!mozilla::psm::BuildCertChainForOneKeyUsage Line 93 C++ xul.dll!mozilla::psm::CertVerifier::VerifyCert Line 267 C++ xul.dll!mozilla::psm::CertVerifier::VerifySSLServerCert Line 471 C++ xul.dll!nsNSSSocketInfo::IsAcceptableForHost Line 390 C++ xul.dll!nsNSSSocketInfo::JoinConnection Line 423 C++ xul.dll!mozilla::net::nsHttpConnectionMgr::GetSpdyPreferredEnt Line 884 C++ xul.dll!mozilla::net::nsHttpConnection::CanDirectlyActivate Line 721 C++ xul.dll!mozilla::net::nsHttpConnectionMgr::GetSpdyPreferredConn Line 2342 C++ xul.dll!mozilla::net::nsHttpConnectionMgr::nsConnEvent::Run Line 635 C++ xul.dll!nsThread::ProcessNextEvent Line 849 C++ xul.dll!nsSocketTransportService::Run Line 898 C++ xul.dll!nsThread::ProcessNextEvent Line 849 C++ xul.dll!NS_ProcessNextEvent Line 265 C++ xul.dll!mozilla::ipc::MessagePumpForNonMainThreads::Run Line 356 C++ xul.dll!MessageLoop::RunHandler Line 228 C++ xul.dll!MessageLoop::Run Line 202 C++ xul.dll!nsThread::ThreadFunc Line 362 C++ Looking through GetHistogramByEnumId, there is a static "knownHistograms" which is not protected by any lock. In that case if there were two threads accumulating the *same* histogram we could call HistogramGet simultaneously on both threads. But since these use the same ID and there is synchronization in Histogram::FactoryGet, they should end up returning the same pointer. So assuming memory writes of these pointer-size words are atomic, there should be no hazard there. I'll continue looking at the actual values.
At the point of the crash on the PSM thread: Unhandled exception at 0x62424CC6 (xul.dll) in c05545b5-cc4f-47f7-bb89-129782150817.dmp: 0xC0000005: Access violation writing location 0x74C0890C. 3721: HistogramAdd(*h, aSample, gHistograms[aHistogram].dataset); 62424CC6 shl dword ptr [edi+74C08505h],1Bh EDI = 0x00000407 (CERT_PINNING_RESULTS) Except what's weird is that gHistograms is 0x63c72680 not 0x74c08505. And gHistograms is const, anyway, so it shouldn't be doing any math on it. I hate to go straight to "memory corruption" theory for this, so I'm going to run that build and debug it to see that code in action.
This does appear a lot like memory corruption: When running normally: 02A84CC4 75 1C jne mozilla::Telemetry::Accumulate+4Eh (02A84CE2h) 3721: HistogramAdd(*h, aSample, gHistograms[aHistogram].dataset); 02A84CC6 C1 E7 05 shl edi,5 02A84CC9 85 C0 test eax,eax 02A84CCB 74 1B je mozilla::Telemetry::Accumulate+54h (02A84CE8h) 02A84CCD 80 78 45 00 cmp byte ptr [eax+45h],0 From the minidump: 62424CC2 85 CD test ebp,ecx 62424CC4 75 1C jne mozilla::Telemetry::Accumulate+4Eh (62424CE2h) 3721: HistogramAdd(*h, aSample, gHistograms[aHistogram].dataset); 62424CC6 C1 A7 05 85 C0 74 1B shl dword ptr [edi+74C08505h],1Bh 62424CCD 80 78 45 00 cmp byte ptr [eax+45h],0 I really couldn't tell you how/why we're corrupting that one byte of memory. Possibilities include some malware trying to patch our binaries or some other kind of memory corruption. It's pretty unusual that this would happen by accident at the same place every time, though: this memory should be mapped read-only after initial relocations, so you'd have to do some work to corrupt it. dmajor, any thoughts?
Flags: needinfo?(dmajor)
0:009> .formats e7 ^ a7 Evaluate expression: Hex: 00000040 Decimal: 64 Octal: 00000000100 Binary: 00000000 00000000 00000000 01000000 In other words, a single-bit error. I wouldn't be surprised if this is a hardware failure. > It's pretty unusual that this would happen by accident at the same place every time I got the impression that this was a one-off. Are there other crash reports with this error? Are they from different machines? Different libxul base addresses?
Flags: needinfo?(dmajor)
(In reply to dmajor (away October 8-15) from comment #5) > I got the impression that this was a one-off. Are there other crash reports > with this error? Are they from different machines? Different libxul base > addresses? Randell, can you answer these questions? Thanks.
Flags: needinfo?(rjesup)
Flags: needinfo?(rjesup) → needinfo?(gfritzsche)
Crash Signature: [@ mozilla::Telemetry::Accumulate(mozilla::Telemetry::ID, unsigned int)] → [@ mozilla::Telemetry::Accumulate(mozilla::Telemetry::ID, unsigned int)] [@ mozilla::Telemetry::Accumulate]
Flags: needinfo?(gfritzsche)
Benjamin, i don't think i can effectively make this actionable now - do we have anyone who could investigate this better until i can take it over?
Flags: needinfo?(benjamin)
Jesup, are you saying that my analysis from comment 4 is incorrect? That isn't a UAF, it's memory corruption probably from outside, and not something we'd normally track. Georg, you could load additional minidumps to see whether others have the same pattern of memory corruption, especially over multiple builds.
Flags: needinfo?(benjamin) → needinfo?(rjesup)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #9) > Jesup, are you saying that my analysis from comment 4 is incorrect? That > isn't a UAF, it's memory corruption probably from outside, and not something > we'd normally track. There are a bunch with other signatures, much like we had a bunch from nsHostResolver. The one you looked at might be a 1-byte corruption (malware or cosmic ray). But I doubt all of these are. And the thread-safeness of this code is *very* suspect -- and undocumented. And I was answering if the remaining issues here were a 1-off, and they don't appear to be. > Georg, you could load additional minidumps to see whether others have the > same pattern of memory corruption, especially over multiple builds. That would be good, especially with some of the ones that happen multiple times in a similar fashion.
Flags: needinfo?(rjesup)
I can give it a try, although i'm probably not the best person to look at it. I don't have access to the dumps though, who can i ask for the permissions?
Flags: needinfo?(benjamin)
Vladan, aklotz, froydnj and I all have access, and a few others. Send one of us a list of the UUIDs you want dumps for.
Flags: needinfo?(benjamin)
bsmedberg/vladan/froyd: the list from comment 7 are all different crash backtraces (some of which occur a number of times in the last month); that'd be an excellent set to start with. There are 7 there. (NI'ing 3 of you; if someone grabs it please clear the others. Thanks!)
Flags: needinfo?(vladan.bugzilla)
Flags: needinfo?(nfroyd)
Flags: needinfo?(benjamin)
I still haven't been granted minidump permissions, so clearing needinfo while I poke people on the permissions bug 1204548
Flags: needinfo?(vladan.bugzilla)
This crash is still showing up in low volume on 43-46.
Group: toolkit-core-security
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(benjamin)
Resolution: --- → DUPLICATE
Flags: needinfo?(nfroyd)
You need to log in before you can comment on or make changes to this bug.