Closed Bug 1245743 Opened 8 years ago Closed 8 years ago

crash in mozilla::ObservedDocShell::PopMarkers

Categories

(DevTools :: Performance Tools (Profiler/Timeline), defect, P2)

45 Branch
defect

Tracking

(firefox45 wontfix, firefox46+ wontfix, firefox47+ fixed, firefox48 fixed)

RESOLVED FIXED
Firefox 48
Tracking Status
firefox45 --- wontfix
firefox46 + wontfix
firefox47 + fixed
firefox48 --- fixed

People

(Reporter: MatsPalmgren_bugz, Assigned: tromey)

References

Details

(Keywords: crash, csectype-uaf, sec-moderate, Whiteboard: [post-critsmash-triage][adv-main47+] sec-high risk for people who use dev tools to profile)

Crash Data

Attachments

(1 file, 2 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-34e40358-b92d-4d0d-b3bc-7d8ea2160203.
=============================================================

This signature looks like a regression in v45.  Relatively low volume there,
but it seems to be higher volume in v46/47.  Crashes on all platforms.

Product 	Version Percentage 	Number Of Crashes
Firefox 	46.0a2 	84.36% 	151
Firefox 	47.0a1 	8.94% 	16
Firefox 	46.0a1 	3.35% 	6
Firefox 	45.0a2 	3.35% 	6

Filing as security sensitive since the address indicates UAF:
Crash Reason 	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address 	0xffffffffe5e5e5e5

More Reports:
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=mozilla%3A%3AObservedDocShell%3A%3APopMarkers
This is the #3 topcrash in 46.  Tracking for 46 and 47.
It'd be interesting to see if this affects only webdevs using the performance tool. That code should never be executed otherwise.
There's
> mozilla::SamplerStackFrameRAII::SamplerStackFrameRAII(char const*, js::ProfileEntry::Category)
in those stacks, so I would assume this happens while the profiler is running, so for people using the performance tool.
Priority: -- → P2
Whiteboard: sec-high risk for people who use dev tools to profile
This started (to spike) in 46.0a2 Dev Edition with the 2016-02-05 build, and over the last week is the top crash of 46 Dev Edition.
Joe or Dave can you help find someone to work on this issue? Since it's the top crash in aurora for 46 (and seems likely to be for 47 once it's in aurora) and is a sec-high issue for those users, it would be nice to fix this.
Flags: needinfo?(jwalker)
Flags: needinfo?(dcamp)
Victor and Tom, not sure what area of timeline markers this is, but I think one of you is familiar -- could you check this out or know who'd be appropriate?
Flags: needinfo?(vporof)
Flags: needinfo?(ttromey)
Flags: needinfo?(jwalker)
Flags: needinfo?(dcamp)
Looking at the reported crash location, I find it difficult to come up with theories to
explain what has gone wrong:

http://hg.mozilla.org/releases/mozilla-aurora/annotate/bafa23fb5915/docshell/base/timeline/ObservedDocShell.cpp#l141

The main difficulty I see is that there are previous (dominating) uses of endPayload.
So I would expect the crash to occur earlier if something was wrong with this object.

I looked into a few ideas with no success.

* ObservedDocShell's lifecycle seems correct based on assertions in MarkerStorage
  constructor and destructor.
*   mTimelineMarkers.AppendElements(Move(mOffTheMainThreadTimelineMarkers));
  ... does in fact do the right thing and doesn't prematurely destroy objects.
* The whole loop in ObservedDocShell::PopMarkers is complicated but still seems
  to handle lifetimes properly
* Some timeline markers don't implement Clone; but these all seem to be safe,
  and in any case if this were the failure the crash should occur elsewhere


I think without STR, I'm stuck.  Maybe someone else has some insight.
Flags: needinfo?(ttromey)
There were 400 crashes with this signature on aurora 46 in the last week. I'll try asking on crash-debug.
Can anything under ObservedDocShell::PopMarkers call into script?
On mac, the crash address is 0x0
On windows x86, it's pretty consistently 0xffffffffe5e5e5e5 which probably means 0xe5e5e5e5 which IIRC is our jemalloc poison-on-free marker. I'm loading this up in MSVC now to see what exact memory/instruction we're dereferencing.
> On mac, the crash address is 0x0

On Mac (and 64-bit in general, afaict) a crash address of 0x0 is often a lie.  :(
            endPayload->AddDetails(aCx, *marker);
10C2FA8B  mov         eax,dword ptr [esp+1Ch]  <-- deref mTimelineMarkers.mHdr?
10C2FA8F  mov         edx,dword ptr [esp+2Ch]  <-- `j`
10C2FA93  mov         ecx,dword ptr [edx+eax]  <-- *(mTimelineMarkers.mHdr + j)
10C2FA96  mov         edx,dword ptr [esp+14h]
10C2FA9A  push        edx  
10C2FA9B  push        dword ptr [esp+34h]  
10C2FA9F  mov         eax,dword ptr [ecx]  <-- crash is here, ECX == 0xe5e5e5e5
10C2FAA1  call        dword ptr [eax+0Ch]  

We're recomputing mTimelineMarkers.ElementAt(j) here instead of storing it in a register, and so it seems likely that mTimelineMarkers has been modified during the call to startPayload->AddDetails. Even a resize of mTimelineMarkers could cause this.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #12)

> We're recomputing mTimelineMarkers.ElementAt(j) here instead of storing it
> in a register, and so it seems likely that mTimelineMarkers has been
> modified during the call to startPayload->AddDetails. Even a resize of
> mTimelineMarkers could cause this.

Thank you.

One possibility might be that 

https://dxr.mozilla.org/mozilla-central/source/docshell/base/timeline/JavascriptTimelineMarker.h#57

causes a GC, which in turn emits a new timeline marker.

I'm sure this code was never examined to see whether it was reentrant.
(In reply to Boris Zbarsky [:bz] from comment #9)
> Can anything under ObservedDocShell::PopMarkers call into script?

I don't believe so.
MozReview-Commit-ID: IllTB7DOdlZ

This approach just prevents any reentrant marker addition.
Maybe make that a MOZ_RELEASE_ASSERT? There are often surprising ways things can reenter when GC is involved...
MozReview-Commit-ID: IllTB7DOdlZ
Attachment #8725372 - Attachment is obsolete: true
Comment on attachment 8725717 [details] [diff] [review]
don't push new timeline markers while popping markers

Avoid pushing new markers while popping markers.

Another possible approach would be to empty mTimelineMarkers while
popping, then re-append any saved markers at the end.  However
the current patch seemed simpler to reason about to me, and it
also seemed unlikely that markers generated by the timeline machinery
itself would be interesting.
Attachment #8725717 - Flags: review?(vporof)
Attachment #8725717 - Flags: review?(vporof) → review+
Assignee: nobody → ttromey
Once this lands and we can verify it helps, we should uplift the fix to aurora and beta as soon as possible.
Rebased.
Attachment #8725717 - Attachment is obsolete: true
Attachment #8728478 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/ea934f9d9700
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 48
We need this on all the branches, except ESR38, right? (assuming the patch fixes the crash.)
Comment on attachment 8728478 [details] [diff] [review]
don't push new timeline markers while popping markers

>+  MOZ_RELEASE_ASSERT(!mPopping);
>+  mPopping = true;
>+  auto resetPopping = MakeScopeExit([&] { mPopping = false; });

Btw, we have AutoRestore for this use case. Would be easier to read with that.
(In reply to Olli Pettay [:smaug] from comment #26)

> Btw, we have AutoRestore for this use case. Would be easier to read with
> that.

Thanks.  I didn't manage to find this when writing the patch.
I'll file a follow up and fix this.
Group: firefox-core-security → core-security-release
Flags: needinfo?(vporof)
Can you link the followup bug from this one? 

This isn't a topcrash in 46 any more; not even in the top 50 crash signatures. We do still show a few (23 total) crashes in 46 beta 2. I think it is showing up more in aurora/dev edition. 

bz and smaug, what do you think about uplift/riskiness?
Flags: needinfo?(ttromey)
Flags: needinfo?(bzbarsky)
Flags: needinfo?(bugs)
This seems pretty safe to uplift at first glance, but I'm not exactly an expert on this code.
Flags: needinfo?(bzbarsky)
yeah, should be safe, assuming
AddMarker(UniquePtr<AbstractTimelineMarker>&& aMarker) not actually moving marker to anywhere is safe.
(got to hate C++ Move semantics). Looks like it is safe.
Flags: needinfo?(bugs)
Let's go with uplifting to aurora, it is not a topcrash and not sec-high so I want to minimize risk for beta. Ritu what do you think?
Flags: needinfo?(rkothari)
Followup was bug 1255459.  It's not essential.
Is there anything else I need to do here?
Blocks: 1255459
Flags: needinfo?(ttromey)
Hi Tom, could you please nominate uplift to Aurora47? 

(p.s.: This is not a top-crasher on 47 either but there have been ~200 instances so it's decently bad to take a fix which I hope helps rather than hurts the situation)
Flags: needinfo?(rkothari) → needinfo?(ttromey)
Comment on attachment 8728478 [details] [diff] [review]
don't push new timeline markers while popping markers

Approval Request Comment
[Feature/regressing bug #]:
This fixes a crasher affecting the timeline, part of the devtools performance feature.

[User impact if declined]:
Occasional crashes when using devtools.

[Describe test coverage new/current, TreeHerder]:
There are no specific tests that trigger this crash.  I think it was just too difficult
to trigger the crash reliably, as it required provoking a GC while fetching timeline
data, and having the GC-generated marker cause an internal array to be resized.

[Risks and why]: 
I consider this patch to be low risk.  It prevents a certain path from re-entering the
timeline marker code.

[String/UUID change made/needed]:
None.
Flags: needinfo?(ttromey)
Attachment #8728478 - Flags: approval-mozilla-aurora?
Comment on attachment 8728478 [details] [diff] [review]
don't push new timeline markers while popping markers

(Potential) crash fix, Aurora47+
Attachment #8728478 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Whiteboard: sec-high risk for people who use dev tools to profile → [post-critsmash-triage]sec-high risk for people who use dev tools to profile
Whiteboard: [post-critsmash-triage]sec-high risk for people who use dev tools to profile → [post-critsmash-triage][adv-main47+] sec-high risk for people who use dev tools to profile
Group: core-security-release
Product: Firefox → DevTools
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: