Closed Bug 1245743 Opened 9 years ago Closed 9 years ago

crash in mozilla::ObservedDocShell::PopMarkers

Categories

(DevTools :: Performance Tools (Profiler/Timeline), defect, P2)

45 Branch
defect

Tracking

(firefox45 wontfix, firefox46+ wontfix, firefox47+ fixed, firefox48 fixed)

RESOLVED FIXED
Firefox 48
Tracking Status
firefox45 --- wontfix
firefox46 + wontfix
firefox47 + fixed
firefox48 --- fixed

People

(Reporter: MatsPalmgren_bugz, Assigned: tromey)

References

Details

(Keywords: crash, csectype-uaf, sec-moderate, Whiteboard: [post-critsmash-triage][adv-main47+] sec-high risk for people who use dev tools to profile)

Crash Data

Attachments

(1 file, 2 obsolete files)

This bug was filed from the Socorro interface and is report bp-34e40358-b92d-4d0d-b3bc-7d8ea2160203. ============================================================= This signature looks like a regression in v45. Relatively low volume there, but it seems to be higher volume in v46/47. Crashes on all platforms. Product Version Percentage Number Of Crashes Firefox 46.0a2 84.36% 151 Firefox 47.0a1 8.94% 16 Firefox 46.0a1 3.35% 6 Firefox 45.0a2 3.35% 6 Filing as security sensitive since the address indicates UAF: Crash Reason EXCEPTION_ACCESS_VIOLATION_READ Crash Address 0xffffffffe5e5e5e5 More Reports: https://crash-stats.mozilla.com/report/list?product=Firefox&signature=mozilla%3A%3AObservedDocShell%3A%3APopMarkers
This is the #3 topcrash in 46. Tracking for 46 and 47.
It'd be interesting to see if this affects only webdevs using the performance tool. That code should never be executed otherwise.
There's > mozilla::SamplerStackFrameRAII::SamplerStackFrameRAII(char const*, js::ProfileEntry::Category) in those stacks, so I would assume this happens while the profiler is running, so for people using the performance tool.
Priority: -- → P2
Whiteboard: sec-high risk for people who use dev tools to profile
This started (to spike) in 46.0a2 Dev Edition with the 2016-02-05 build, and over the last week is the top crash of 46 Dev Edition.
Joe or Dave can you help find someone to work on this issue? Since it's the top crash in aurora for 46 (and seems likely to be for 47 once it's in aurora) and is a sec-high issue for those users, it would be nice to fix this.
Flags: needinfo?(jwalker)
Flags: needinfo?(dcamp)
Victor and Tom, not sure what area of timeline markers this is, but I think one of you is familiar -- could you check this out or know who'd be appropriate?
Flags: needinfo?(vporof)
Flags: needinfo?(ttromey)
Flags: needinfo?(jwalker)
Flags: needinfo?(dcamp)
Looking at the reported crash location, I find it difficult to come up with theories to explain what has gone wrong: http://hg.mozilla.org/releases/mozilla-aurora/annotate/bafa23fb5915/docshell/base/timeline/ObservedDocShell.cpp#l141 The main difficulty I see is that there are previous (dominating) uses of endPayload. So I would expect the crash to occur earlier if something was wrong with this object. I looked into a few ideas with no success. * ObservedDocShell's lifecycle seems correct based on assertions in MarkerStorage constructor and destructor. * mTimelineMarkers.AppendElements(Move(mOffTheMainThreadTimelineMarkers)); ... does in fact do the right thing and doesn't prematurely destroy objects. * The whole loop in ObservedDocShell::PopMarkers is complicated but still seems to handle lifetimes properly * Some timeline markers don't implement Clone; but these all seem to be safe, and in any case if this were the failure the crash should occur elsewhere I think without STR, I'm stuck. Maybe someone else has some insight.
Flags: needinfo?(ttromey)
There were 400 crashes with this signature on aurora 46 in the last week. I'll try asking on crash-debug.
Can anything under ObservedDocShell::PopMarkers call into script?
On mac, the crash address is 0x0 On windows x86, it's pretty consistently 0xffffffffe5e5e5e5 which probably means 0xe5e5e5e5 which IIRC is our jemalloc poison-on-free marker. I'm loading this up in MSVC now to see what exact memory/instruction we're dereferencing.
> On mac, the crash address is 0x0 On Mac (and 64-bit in general, afaict) a crash address of 0x0 is often a lie. :(
endPayload->AddDetails(aCx, *marker); 10C2FA8B mov eax,dword ptr [esp+1Ch] <-- deref mTimelineMarkers.mHdr? 10C2FA8F mov edx,dword ptr [esp+2Ch] <-- `j` 10C2FA93 mov ecx,dword ptr [edx+eax] <-- *(mTimelineMarkers.mHdr + j) 10C2FA96 mov edx,dword ptr [esp+14h] 10C2FA9A push edx 10C2FA9B push dword ptr [esp+34h] 10C2FA9F mov eax,dword ptr [ecx] <-- crash is here, ECX == 0xe5e5e5e5 10C2FAA1 call dword ptr [eax+0Ch] We're recomputing mTimelineMarkers.ElementAt(j) here instead of storing it in a register, and so it seems likely that mTimelineMarkers has been modified during the call to startPayload->AddDetails. Even a resize of mTimelineMarkers could cause this.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #12) > We're recomputing mTimelineMarkers.ElementAt(j) here instead of storing it > in a register, and so it seems likely that mTimelineMarkers has been > modified during the call to startPayload->AddDetails. Even a resize of > mTimelineMarkers could cause this. Thank you. One possibility might be that https://dxr.mozilla.org/mozilla-central/source/docshell/base/timeline/JavascriptTimelineMarker.h#57 causes a GC, which in turn emits a new timeline marker. I'm sure this code was never examined to see whether it was reentrant.
(In reply to Boris Zbarsky [:bz] from comment #9) > Can anything under ObservedDocShell::PopMarkers call into script? I don't believe so.
MozReview-Commit-ID: IllTB7DOdlZ This approach just prevents any reentrant marker addition.
Maybe make that a MOZ_RELEASE_ASSERT? There are often surprising ways things can reenter when GC is involved...
MozReview-Commit-ID: IllTB7DOdlZ
Attachment #8725372 - Attachment is obsolete: true
Comment on attachment 8725717 [details] [diff] [review] don't push new timeline markers while popping markers Avoid pushing new markers while popping markers. Another possible approach would be to empty mTimelineMarkers while popping, then re-append any saved markers at the end. However the current patch seemed simpler to reason about to me, and it also seemed unlikely that markers generated by the timeline machinery itself would be interesting.
Attachment #8725717 - Flags: review?(vporof)
Attachment #8725717 - Flags: review?(vporof) → review+
Assignee: nobody → ttromey
Once this lands and we can verify it helps, we should uplift the fix to aurora and beta as soon as possible.
Rebased.
Attachment #8725717 - Attachment is obsolete: true
Attachment #8728478 - Flags: review+
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 48
We need this on all the branches, except ESR38, right? (assuming the patch fixes the crash.)
Comment on attachment 8728478 [details] [diff] [review] don't push new timeline markers while popping markers >+ MOZ_RELEASE_ASSERT(!mPopping); >+ mPopping = true; >+ auto resetPopping = MakeScopeExit([&] { mPopping = false; }); Btw, we have AutoRestore for this use case. Would be easier to read with that.
(In reply to Olli Pettay [:smaug] from comment #26) > Btw, we have AutoRestore for this use case. Would be easier to read with > that. Thanks. I didn't manage to find this when writing the patch. I'll file a follow up and fix this.
Group: firefox-core-security → core-security-release
Flags: needinfo?(vporof)
Can you link the followup bug from this one? This isn't a topcrash in 46 any more; not even in the top 50 crash signatures. We do still show a few (23 total) crashes in 46 beta 2. I think it is showing up more in aurora/dev edition. bz and smaug, what do you think about uplift/riskiness?
Flags: needinfo?(ttromey)
Flags: needinfo?(bzbarsky)
Flags: needinfo?(bugs)
This seems pretty safe to uplift at first glance, but I'm not exactly an expert on this code.
Flags: needinfo?(bzbarsky)
yeah, should be safe, assuming AddMarker(UniquePtr<AbstractTimelineMarker>&& aMarker) not actually moving marker to anywhere is safe. (got to hate C++ Move semantics). Looks like it is safe.
Flags: needinfo?(bugs)
Let's go with uplifting to aurora, it is not a topcrash and not sec-high so I want to minimize risk for beta. Ritu what do you think?
Flags: needinfo?(rkothari)
Followup was bug 1255459. It's not essential. Is there anything else I need to do here?
Blocks: 1255459
Flags: needinfo?(ttromey)
Hi Tom, could you please nominate uplift to Aurora47? (p.s.: This is not a top-crasher on 47 either but there have been ~200 instances so it's decently bad to take a fix which I hope helps rather than hurts the situation)
Flags: needinfo?(rkothari) → needinfo?(ttromey)
Comment on attachment 8728478 [details] [diff] [review] don't push new timeline markers while popping markers Approval Request Comment [Feature/regressing bug #]: This fixes a crasher affecting the timeline, part of the devtools performance feature. [User impact if declined]: Occasional crashes when using devtools. [Describe test coverage new/current, TreeHerder]: There are no specific tests that trigger this crash. I think it was just too difficult to trigger the crash reliably, as it required provoking a GC while fetching timeline data, and having the GC-generated marker cause an internal array to be resized. [Risks and why]: I consider this patch to be low risk. It prevents a certain path from re-entering the timeline marker code. [String/UUID change made/needed]: None.
Flags: needinfo?(ttromey)
Attachment #8728478 - Flags: approval-mozilla-aurora?
Comment on attachment 8728478 [details] [diff] [review] don't push new timeline markers while popping markers (Potential) crash fix, Aurora47+
Attachment #8728478 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Whiteboard: sec-high risk for people who use dev tools to profile → [post-critsmash-triage]sec-high risk for people who use dev tools to profile
Whiteboard: [post-critsmash-triage]sec-high risk for people who use dev tools to profile → [post-critsmash-triage][adv-main47+] sec-high risk for people who use dev tools to profile
Group: core-security-release
Product: Firefox → DevTools
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: