Closed
Bug 1245743
Opened 9 years ago
Closed 9 years ago
crash in mozilla::ObservedDocShell::PopMarkers
Categories
(DevTools :: Performance Tools (Profiler/Timeline), defect, P2)
Tracking
(firefox45 wontfix, firefox46+ wontfix, firefox47+ fixed, firefox48 fixed)
RESOLVED
FIXED
Firefox 48
People
(Reporter: MatsPalmgren_bugz, Assigned: tromey)
References
Details
(Keywords: crash, csectype-uaf, sec-moderate, Whiteboard: [post-critsmash-triage][adv-main47+] sec-high risk for people who use dev tools to profile)
Crash Data
Attachments
(1 file, 2 obsolete files)
3.32 KB,
patch
|
tromey
:
review+
ritu
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-34e40358-b92d-4d0d-b3bc-7d8ea2160203.
=============================================================
This signature looks like a regression in v45. Relatively low volume there,
but it seems to be higher volume in v46/47. Crashes on all platforms.
Product Version Percentage Number Of Crashes
Firefox 46.0a2 84.36% 151
Firefox 47.0a1 8.94% 16
Firefox 46.0a1 3.35% 6
Firefox 45.0a2 3.35% 6
Filing as security sensitive since the address indicates UAF:
Crash Reason EXCEPTION_ACCESS_VIOLATION_READ
Crash Address 0xffffffffe5e5e5e5
More Reports:
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=mozilla%3A%3AObservedDocShell%3A%3APopMarkers
Comment 1•9 years ago
|
||
This is the #3 topcrash in 46. Tracking for 46 and 47.
status-firefox45:
--- → affected
status-firefox46:
--- → affected
status-firefox47:
--- → affected
tracking-firefox46:
--- → +
tracking-firefox47:
--- → +
Comment 2•9 years ago
|
||
It'd be interesting to see if this affects only webdevs using the performance tool. That code should never be executed otherwise.
Comment 3•9 years ago
|
||
There's
> mozilla::SamplerStackFrameRAII::SamplerStackFrameRAII(char const*, js::ProfileEntry::Category)
in those stacks, so I would assume this happens while the profiler is running, so for people using the performance tool.
Updated•9 years ago
|
Priority: -- → P2
Updated•9 years ago
|
Keywords: csectype-uaf,
sec-moderate
Whiteboard: sec-high risk for people who use dev tools to profile
![]() |
||
Comment 4•9 years ago
|
||
This started (to spike) in 46.0a2 Dev Edition with the 2016-02-05 build, and over the last week is the top crash of 46 Dev Edition.
Comment 5•9 years ago
|
||
Joe or Dave can you help find someone to work on this issue? Since it's the top crash in aurora for 46 (and seems likely to be for 47 once it's in aurora) and is a sec-high issue for those users, it would be nice to fix this.
Flags: needinfo?(jwalker)
Flags: needinfo?(dcamp)
Comment 6•9 years ago
|
||
Victor and Tom, not sure what area of timeline markers this is, but I think one of you is familiar -- could you check this out or know who'd be appropriate?
Flags: needinfo?(vporof)
Flags: needinfo?(ttromey)
Flags: needinfo?(jwalker)
Flags: needinfo?(dcamp)
Assignee | ||
Comment 7•9 years ago
|
||
Looking at the reported crash location, I find it difficult to come up with theories to
explain what has gone wrong:
http://hg.mozilla.org/releases/mozilla-aurora/annotate/bafa23fb5915/docshell/base/timeline/ObservedDocShell.cpp#l141
The main difficulty I see is that there are previous (dominating) uses of endPayload.
So I would expect the crash to occur earlier if something was wrong with this object.
I looked into a few ideas with no success.
* ObservedDocShell's lifecycle seems correct based on assertions in MarkerStorage
constructor and destructor.
* mTimelineMarkers.AppendElements(Move(mOffTheMainThreadTimelineMarkers));
... does in fact do the right thing and doesn't prematurely destroy objects.
* The whole loop in ObservedDocShell::PopMarkers is complicated but still seems
to handle lifetimes properly
* Some timeline markers don't implement Clone; but these all seem to be safe,
and in any case if this were the failure the crash should occur elsewhere
I think without STR, I'm stuck. Maybe someone else has some insight.
Flags: needinfo?(ttromey)
Comment 8•9 years ago
|
||
There were 400 crashes with this signature on aurora 46 in the last week. I'll try asking on crash-debug.
![]() |
||
Comment 9•9 years ago
|
||
Can anything under ObservedDocShell::PopMarkers call into script?
Comment 10•9 years ago
|
||
On mac, the crash address is 0x0
On windows x86, it's pretty consistently 0xffffffffe5e5e5e5 which probably means 0xe5e5e5e5 which IIRC is our jemalloc poison-on-free marker. I'm loading this up in MSVC now to see what exact memory/instruction we're dereferencing.
![]() |
||
Comment 11•9 years ago
|
||
> On mac, the crash address is 0x0
On Mac (and 64-bit in general, afaict) a crash address of 0x0 is often a lie. :(
Comment 12•9 years ago
|
||
endPayload->AddDetails(aCx, *marker);
10C2FA8B mov eax,dword ptr [esp+1Ch] <-- deref mTimelineMarkers.mHdr?
10C2FA8F mov edx,dword ptr [esp+2Ch] <-- `j`
10C2FA93 mov ecx,dword ptr [edx+eax] <-- *(mTimelineMarkers.mHdr + j)
10C2FA96 mov edx,dword ptr [esp+14h]
10C2FA9A push edx
10C2FA9B push dword ptr [esp+34h]
10C2FA9F mov eax,dword ptr [ecx] <-- crash is here, ECX == 0xe5e5e5e5
10C2FAA1 call dword ptr [eax+0Ch]
We're recomputing mTimelineMarkers.ElementAt(j) here instead of storing it in a register, and so it seems likely that mTimelineMarkers has been modified during the call to startPayload->AddDetails. Even a resize of mTimelineMarkers could cause this.
Assignee | ||
Comment 13•9 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #12)
> We're recomputing mTimelineMarkers.ElementAt(j) here instead of storing it
> in a register, and so it seems likely that mTimelineMarkers has been
> modified during the call to startPayload->AddDetails. Even a resize of
> mTimelineMarkers could cause this.
Thank you.
One possibility might be that
https://dxr.mozilla.org/mozilla-central/source/docshell/base/timeline/JavascriptTimelineMarker.h#57
causes a GC, which in turn emits a new timeline marker.
I'm sure this code was never examined to see whether it was reentrant.
Assignee | ||
Comment 14•9 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #9)
> Can anything under ObservedDocShell::PopMarkers call into script?
I don't believe so.
Assignee | ||
Comment 15•9 years ago
|
||
MozReview-Commit-ID: IllTB7DOdlZ
This approach just prevents any reentrant marker addition.
Comment 16•9 years ago
|
||
Maybe make that a MOZ_RELEASE_ASSERT? There are often surprising ways things can reenter when GC is involved...
Assignee | ||
Comment 17•9 years ago
|
||
Assignee | ||
Comment 18•9 years ago
|
||
MozReview-Commit-ID: IllTB7DOdlZ
Attachment #8725372 -
Attachment is obsolete: true
Assignee | ||
Comment 19•9 years ago
|
||
Comment on attachment 8725717 [details] [diff] [review]
don't push new timeline markers while popping markers
Avoid pushing new markers while popping markers.
Another possible approach would be to empty mTimelineMarkers while
popping, then re-append any saved markers at the end. However
the current patch seemed simpler to reason about to me, and it
also seemed unlikely that markers generated by the timeline machinery
itself would be interesting.
Attachment #8725717 -
Flags: review?(vporof)
Updated•9 years ago
|
Attachment #8725717 -
Flags: review?(vporof) → review+
![]() |
||
Updated•9 years ago
|
Assignee: nobody → ttromey
Comment 21•9 years ago
|
||
Once this lands and we can verify it helps, we should uplift the fix to aurora and beta as soon as possible.
Assignee | ||
Updated•9 years ago
|
Attachment #8728478 -
Flags: review+
Assignee | ||
Updated•9 years ago
|
Keywords: checkin-needed
Comment 23•9 years ago
|
||
Keywords: checkin-needed
Comment 24•9 years ago
|
||
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox48:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 48
Comment 25•9 years ago
|
||
We need this on all the branches, except ESR38, right? (assuming the patch fixes the crash.)
Comment 26•9 years ago
|
||
Comment on attachment 8728478 [details] [diff] [review]
don't push new timeline markers while popping markers
>+ MOZ_RELEASE_ASSERT(!mPopping);
>+ mPopping = true;
>+ auto resetPopping = MakeScopeExit([&] { mPopping = false; });
Btw, we have AutoRestore for this use case. Would be easier to read with that.
Assignee | ||
Comment 27•9 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #26)
> Btw, we have AutoRestore for this use case. Would be easier to read with
> that.
Thanks. I didn't manage to find this when writing the patch.
I'll file a follow up and fix this.
Updated•9 years ago
|
Group: firefox-core-security → core-security-release
Updated•9 years ago
|
Flags: needinfo?(vporof)
Comment 28•9 years ago
|
||
Can you link the followup bug from this one?
This isn't a topcrash in 46 any more; not even in the top 50 crash signatures. We do still show a few (23 total) crashes in 46 beta 2. I think it is showing up more in aurora/dev edition.
bz and smaug, what do you think about uplift/riskiness?
Flags: needinfo?(ttromey)
Flags: needinfo?(bzbarsky)
Flags: needinfo?(bugs)
![]() |
||
Comment 29•9 years ago
|
||
This seems pretty safe to uplift at first glance, but I'm not exactly an expert on this code.
Flags: needinfo?(bzbarsky)
Comment 30•9 years ago
|
||
yeah, should be safe, assuming
AddMarker(UniquePtr<AbstractTimelineMarker>&& aMarker) not actually moving marker to anywhere is safe.
(got to hate C++ Move semantics). Looks like it is safe.
Flags: needinfo?(bugs)
Comment 31•9 years ago
|
||
Let's go with uplifting to aurora, it is not a topcrash and not sec-high so I want to minimize risk for beta. Ritu what do you think?
Assignee | ||
Comment 32•9 years ago
|
||
Followup was bug 1255459. It's not essential.
Is there anything else I need to do here?
Blocks: 1255459
Flags: needinfo?(ttromey)
Hi Tom, could you please nominate uplift to Aurora47?
(p.s.: This is not a top-crasher on 47 either but there have been ~200 instances so it's decently bad to take a fix which I hope helps rather than hurts the situation)
Flags: needinfo?(rkothari) → needinfo?(ttromey)
Assignee | ||
Comment 34•9 years ago
|
||
Comment on attachment 8728478 [details] [diff] [review]
don't push new timeline markers while popping markers
Approval Request Comment
[Feature/regressing bug #]:
This fixes a crasher affecting the timeline, part of the devtools performance feature.
[User impact if declined]:
Occasional crashes when using devtools.
[Describe test coverage new/current, TreeHerder]:
There are no specific tests that trigger this crash. I think it was just too difficult
to trigger the crash reliably, as it required provoking a GC while fetching timeline
data, and having the GC-generated marker cause an internal array to be resized.
[Risks and why]:
I consider this patch to be low risk. It prevents a certain path from re-entering the
timeline marker code.
[String/UUID change made/needed]:
None.
Flags: needinfo?(ttromey)
Attachment #8728478 -
Flags: approval-mozilla-aurora?
Comment on attachment 8728478 [details] [diff] [review]
don't push new timeline markers while popping markers
(Potential) crash fix, Aurora47+
Attachment #8728478 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 36•9 years ago
|
||
Updated•9 years ago
|
Whiteboard: sec-high risk for people who use dev tools to profile → [post-critsmash-triage]sec-high risk for people who use dev tools to profile
Updated•9 years ago
|
Whiteboard: [post-critsmash-triage]sec-high risk for people who use dev tools to profile → [post-critsmash-triage][adv-main47+] sec-high risk for people who use dev tools to profile
Updated•8 years ago
|
Group: core-security-release
Updated•7 years ago
|
Product: Firefox → DevTools
You need to log in
before you can comment on or make changes to this bug.
Description
•