Closed Bug 957006 Opened 10 years ago Closed 10 years ago

Crash in mozilla::Maybe<nsAutoString>::~Maybe() (or sometimes [@ EnterBaseline ], or something else) a minute or so after loading a Wired article in Fennec

Categories

(Core :: DOM: Core & HTML, defect)

27 Branch
All
Android
defect
Not set
critical

Tracking

()

RESOLVED DUPLICATE of bug 952321
Tracking Status
firefox28 --- affected
firefox29 - affected
fennec 29+ ---

People

(Reporter: kbrosnan, Assigned: djvj)

References

(Blocks 1 open bug, )

Details

(Keywords: crash, reproducible, sec-moderate, Whiteboard: [native-crash])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-ebc8b6e8-7560-4267-b3e1-b0a8b2131231.
=============================================================

CCing Rob Miller and dholbert as they submitted crashes maybe they can expand on what they were doing. URLs are all the Wired website.

http://www.wired.com/wiredenterprise/2014/01/coke-iot/
http://www.wired.com/opinion/2013/01/wiretap-backdoors/
wyciwyg://4/http://www.wired.com/threatlevel/2013/12/better-data-security-nail-polish/
wyciwyg://5/http://www.wired.com/design/2013/12/this-is-what-emoji-look-like-irl/?cid=co16436394#slideid-383861


Does not seem to be device specific. Crashes happen on Android API 15+ (Android 4.0 ICS). Looks to be new to Firefox 29.
The stack is kind of sketchy, but it looks a little more DOM-y to me.  The third frame is mozilla::dom::DocumentBinding::get_readyState.
Component: XPCOM → DOM
Sure, that's the DOMString coming of the stack.

But why should its destructor crash?
From the crash report, it looks like we're hitting a SIGILL at address 0x818d6a20 when trying to call the destructor on a string stored in a Maybe, which sounds memory-corruption-y, but maybe I'm paranoid.
I crash whenever I try to load this URL:
 http://www.wired.com/threatlevel/2013/12/better-data-security-nail-polish/
in Nightly (on Android).  I just tried twice and crashed both times.

The first crash had the signature mentioned here:
 bp-332d798f-3c42-47a5-843f-a0f892140107
and the second had a different signature, "[@ XPC_WN_NoHelper_Resolve ]"
 bp-cc13689a-3a2d-44cf-a201-0e0af2140107

(Probably doesn't matter, but FWIW: in both cases, I loaded the page by simply typing "wired threadlevel" into my awesomebar and tapping the result from my history)
Whiteboard: [native-crash]
Since we are seeing reliable steps we are tracking for now and look forward to seeing more development on this issue.
Daniel - Can you look into this? We're not sure who the owner of this bug should be?
Flags: needinfo?(dholbert)
Here's what I can glean from the stacktrace and looking at code a bit:

 * Level 2 of the backtrace, "mozilla::dom::DocumentBinding::get_readyState", is in autogenerated bindings code. It lives in $OBJ/dom/bindings/DocumentBinding.cpp, and it looks like this:
> static bool
> get_readyState(JSContext* cx, JS::Handle<JSObject*> obj, nsIDocument* self, JSJitGetterCallArgs args)
> {
>   DOMString result;
>   self->GetReadyState(result);
>   if (!xpc::NonVoidStringToJsval(cx, result, args.rval())) {
>     return false;
>   }
>   return true;
> 

* We must be getting to Level 1 of the backtrace, "mozilla::Maybe<nsAutoString>::~Maybe()", due to the local var "DOMString result" going out of scope. (The class DOMString has "Maybe<nsAutoString> mString;" as a member-var.)
 http://mxr.mozilla.org/mozilla-central/source/dom/bindings/DOMString.h#137

Beyond that, I don't really know what this code is doing (and I don't have an android dev/test environment set up currently.)

bz knows a lot about the bindings code and DOMString.h (based on hg blame). bz, do you have any ideas here?
Flags: needinfo?(dholbert) → needinfo?(bzbarsky)
Nope.  See comment 2.  :(

What should happen here is that the GetReadyState() call calls "nsIDocument::GetReadyState(nsAString& aReadyState) const", which will invoke "DOMString::operator nsString&()" to be able to make the call.  This will assert that the Maybe<nsAutoString> is not constructed yet, then call construct() on it and return its .ref(), which is the nsAutoString&.

Then GetReadyState will write to that nsAString&, like so:

  aReadyState.Assign(NS_LITERAL_STRING("loading"));

(or whichever stat it's in).  Then we unwind back to the binding code and call NonVoidStringToJsval.  This lands us in "bool NonVoidStringToJsval(JSContext* cx, mozilla::dom::DOMString& str, JS::MutableHandleValue rval)", which checks HasStringBuffer, which is false in this case, and calls "NonVoidStringToJsval(cx, str.AsAString(), rval)".  That passes the actual nsAutoString& in, which calls the nsAString& overload of NonVoidStringToJsval, and then grabs the string data and sticks it in a JSString, etc.

None of this should lead to a crash in ~Maybe, though.... and I have no idea why this is Android-specific.  :(
Flags: needinfo?(bzbarsky)
OK, I got an Android build environment configured and tried loading this in a debug Fennec build.

I got an abort with this appearing in logcat:

F/MOZ_Assert(22693): Assertion failure: typeAlloc->isArgument(), at /Users/dholbert/builds/fennec/mozilla/js/src/jit/LinearScan.cpp:595
F/libc    (22693): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1), thread 22782 (Analysis Helper)
I/Gecko   (22693): [22693] WARNING: NS_ENSURE_TRUE(siteSpecificUA) failed: file /Users/dholbert/builds/fennec/mozilla/dom/base/Navigator.cpp, line 297
I/DEBUG   (22527): unexpected waitpid response: n=22782, status=00000b00
I/DEBUG   (22527): ptrace detach from 22782 failed: No such process
I/DEBUG   (22527): debuggerd committing suicide to free the zombie!
I/DEBUG   (23008): debuggerd: Dec  4 2013 23:15:41
I/ActivityManager( 1363): Process org.mozilla.fennec_dholbert (pid 22693) has died.
Disregard the assertion in comment 9 -- that's apparently an unrelated recent JS regression that's popping up everywhere. Filed bug 958732 for that.

With that hacked around (so that I can load the wired page), I'm unable to reproduce this in my debug Fennec build, unfortunately. :-/
...and now I can't reproduce it in my Nightly, either.

So either this was fixed over the last several days, or the wired page changed, or this is just mysterious about when it reproduces.

Hopefully it's the first of those options; I guess we can see from watching crash-stats.
Ah! I spoke too soon. A little while (maybe 30 seconds to a minute?) after the last time I loaded the Wired article in Android Nightly, it crashed, though with a different signature: [@ EnterBaseline ]
  bp-f252e034-6ebc-4ba3-b1cb-0264f2140110

But then I loaded it a few more times, and waited, and was unable to crash. So, this is looking a bit finicky in when it reproduces. (if that last crash was even part of this bug)
Still no luck reproducing in a debug build on my phone (Nexus 4), though I hit this in Nightly again:
https://crash-stats.mozilla.com/report/index/d62d4ea5-136b-4fed-bb87-46b012140111

Is there any way I can run my Fennec debug build with my Nightly profile?

(I also tried to reproduce on a few-years-old 10" Samsung Tablet, using standard Firefox Nightly, and was unable to reproduce there.)
(In reply to Daniel Holbert [:dholbert] from comment #13)
> Still no luck reproducing in a debug build on my phone (Nexus 4), though I
> hit this in Nightly again:
> https://crash-stats.mozilla.com/report/index/d62d4ea5-136b-4fed-bb87-
> 46b012140111
> 
> Is there any way I can run my Fennec debug build with my Nightly profile?

Not easily. If your phone is rooted, you can copy the profile folder the the sdcard and then try to run the Fennec debug build with "-P" pointing to the sdcard folder. If you are rooted, we can try to do that via IRC.

I'm curious why you think the profile is key. I might be able to make an add-on to move around profiles too.
(In reply to Mark Finkle (:mfinkle) from comment #14)
> Not easily. If your phone is rooted

(It's not rooted, unfortunately).

> I'm curious why you think the profile is key.

Because I'm only aware (so far) of this issue reproducing on this phone, using Nightly, with my current profile.

If it weren't profile-dependent, I'd expect others to have been able to reproduce this (please chime in if you have been able to), and I'd expect to be able to reproduce it with my own local Fennec build* (but I can't), and I'd expect to be able to reproduce it in Nightly on other devices (but I can't; I've tried 2 other devices -- a 7" tablet and a 10" tablet).

* It's possible that it's formfactor-dependent and opt-dependent, I suppose. On Monday, I'll maybe try a local opt Fennec build on my phone and see if it reproduces with that.
Scratch that -- it's not dependent on my profile at all.

I just installed Nightly on my girlfriend's Android phone, and loaded the Wired testcase, and after about a minute of letting the page sit, it crashed. We restarted Nightly, and after another minute of viewing the article, it crashed again.

(Not sure why I haven't been able to repro on my Android tablets; maybe the content is viewport-size-dependaet, or something. FWIW, those STR -- load the Wired page & let it sit for a minute -- reproduces the bug for me at least 30% of the time on my phone.)

Her crashes didn't have the ~Maybe() signature, though (nor have any of my own crashes from today).

Her crashes:
 bp-55be3ba8-b3b5-4580-9216-04b622140112 [@ EnterBaseline ]
 bp-fa7d7b6c-c4b6-450a-b9d2-710e42140112 [@ xpc::WrapperFactory::WrapForSameCompartment(JSContext*, JS::Handle<JSObject*>) ]

I submitted 7 crash reports from my own phone today from loading the Wired article. Six of those were [@ EnterBaseline ], e.g.  bp-14618933-dd54-447a-bf96-1452c2140112 and bp-42deede3-45db-47cc-8b85-020472140112. And the remaining one was [@ PR_SetThreadPrivate | js::jit::IonContext::~IonContext() ]: bp-c8b1600f-a5e7-4aca-a604-1bb632140112

So, it looks like this has a lot of different signatures, and a lot of them are JS-related. So I'm starting to think this is a bug in the JS engine.

Given that this is reproducible, maybe we should track down when this regressed. (if it's a regression.) mfinkle, do you know if we have any Android-focused QA folks who could help with that?
Flags: needinfo?(mark.finkle)
Summary: crash in mozilla::Maybe<nsAutoString>::~Maybe() → Crash in mozilla::Maybe<nsAutoString>::~Maybe() (or sometimes [@ EnterBaseline ], or something else) a minute or so after loading a Wired article in Fennec
(FWIW: My phone is a Nexus 4 w/ Android 4.4.2, and my girlfriend's is a Moto X running Android 4.4. So, we know it's at least reproducible on those models/versions; probably others as well, though apparently not on tablets as far as I've seen.)
Keywords: reproducible
Not sure if it's helpful to connect this to all the various signatures, but given that EnterBaseline comes up often, let's connect it to that in our stats and analysis. ;-)

That said, given that our stackwalker does not deal well with JIT frames at this time, it's very much possible that we see varying signatures with crashes in JS JIT code.
Crash Signature: [@ mozilla::Maybe<nsAutoString>::~Maybe()] → [@ mozilla::Maybe<nsAutoString>::~Maybe()] [@ EnterBaseline ]
(In reply to Daniel Holbert [:dholbert] from comment #16)

> Given that this is reproducible, maybe we should track down when this
> regressed. (if it's a regression.) mfinkle, do you know if we have any
> Android-focused QA folks who could help with that?

NI'ing Aaron and Kevin. We might be able to get some QA help from them or the Softvision folks.
Flags: needinfo?(mark.finkle)
Flags: needinfo?(kbrosnan)
Flags: needinfo?(aaron.train)
NI'ing Naveed since this now looks like a JS crash
Flags: needinfo?(nihsanullah)
None of the above aforementioned URL's are reproducing the crash for me on my Nexus 4 (Android 4.4.2), HTC One (Android 4.4), nor my Galaxy SIV (Android 4.4) and LG Nexus 5 (Android 4.4) nor my Asus Nexus 7 (2013)
Flags: needinfo?(aaron.train)
I would like to see if bug 957475 resolves this. It is sounding kinda similar.
Flags: needinfo?(kbrosnan)
(In reply to Kevin Brosnan [:kbrosnan] from comment #22)
> I would like to see if bug 957475 resolves this. It is sounding kinda
> similar.

Not sure, as that bug affects 27 and 28 and this one is marked to only affect 29.
tracking-fennec: ? → 29+
Assignee: nobody → nicolas.b.pierron
Flags: needinfo?(nihsanullah)
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #23)
> (In reply to Kevin Brosnan [:kbrosnan] from comment #22)
> > I would like to see if bug 957475 resolves this. It is sounding kinda
> > similar.
> 
> Not sure, as that bug affects 27 and 28 and this one is marked to only
> affect 29.

FWIW: I installed Aurora on my phone and got it to crash once, after a few cycles of [load wired URL from comment 4, wait a minute]. Crash report: bp-a6decb48-cd31-40c7-b339-729c22140117

(I tried to get it to crash again, but couldn't, so this might be less reproducible on Aurora.)
(In reply to Daniel Holbert [:dholbert] from comment #25)
> FWIW: I installed Aurora on my phone and got it to crash once, after a few
> cycles of [load wired URL from comment 4, wait a minute]. Crash report:
> bp-a6decb48-cd31-40c7-b339-729c22140117

Eric, do you have any idea what might have cause the crash reported in this crash report?
I would not be able to investigate before Monday.
Flags: needinfo?(efaustbmo)
I looked at this briefly on Friday. I couldn't glean that much from the crashdump.

Here's what I was able to reconstruct:

We are crashing in GetObjectClass() in jsfriendapi.h. This is called from JSObject::is<ProxyObject>(), invoked on line 1508 of IonCaches.cpp in tryAttachProxy().

The actual dereference that crashes is the deref of the TypeObject to get the class.

The assembly the runs (very rougly paraphrased and pared):

reg1 <-- arg
reg1 <-- *(reg1 + 4) /* succeeds, but yields garbage (0xe000df40)
reg2 <-- *(reg1) /* kaboom */

I suspect a bad GC interaction. Use after free, maybe? The object pointer is good enough to dereference once, but contains a bad value, and this is the first time we try to inspect anything going on inside the object in the cache.

If the values inside were bogus, but readable, that would explain how it failed all the shape guards on the Ic to get to the fallback path safely, as well.

I do not own (or have access to for the next few days) a phone with which I could debug this further.

Nicolas, does that give you enough to spring forward with?
Flags: needinfo?(efaustbmo)
I tried to reproduce it locally with the 5 URLs listed above[1] on an Unagi without any crash.

[1] http://people.mozilla.org/~npierron/shortcuts.html
Flags: needinfo?(nicolas.b.pierron)
This was reported against Firefox for Android. Not Firefox OS.
(In reply to Kevin Brosnan [:kbrosnan] from comment #31)
> This was reported against Firefox for Android. Not Firefox OS.

I noticed, but:
 - I do not have any Android device to test on.
 - The JS engine is the same on Firefox for Android and Firefox OS.
 - I am reporting a fact.

The only things which differ between the 2 architectures are:
 - the GC settings.
 - the user agent.
 - the number of cores (?)

And other differences are not going to matter, unless this is not a JS Engine issue.
Reproduces rather quickly on Firefox for Android Nexus 5, are you able to get a device from Paris IT or someone else in the office? Else we should get a device to you.
(In reply to Kevin Brosnan [:kbrosnan] from comment #33)
> Reproduces rather quickly on Firefox for Android Nexus 5, are you able to
> get a device from Paris IT or someone else in the office? Else we should get
> a device to you.

Kannan, can you reproduce and do you know how to debug/investigate Firefox for Android issues?
Flags: needinfo?(kvijayan)
I have a nexus 5 with firefox on it.  I'll try to repro it locally.
Flags: needinfo?(kvijayan)
(In reply to Kannan Vijayan [:djvj] from comment #35)
> I have a nexus 5 with firefox on it.  I'll try to repro it locally.

Were you able to reproduce this bug?
Should I assign it to you?
Flags: needinfo?(kvijayan)
Assignee: nicolas.b.pierron → kvijayan
Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(kvijayan)
Reproduces for me.  Now I just need to figure out how to GDB my phone.  I suspect because it's not rooted.  I'd prefer not to root my phone.
Got a stack trace, from a Feb 01 build (doesn't reproduce on trunk, so I just went back about a month):

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 17982]
0x82ba05b0 in js::jit::TypedOrValueRegister::dataValue (this=0x77fde3f8)
    at /home/kvijayan/Checkouts/mozilla-inbound/js/src/../../js/src/jit/RegisterSets.h:171
171             JS_ASSERT(hasValue());
(gdb) bt 10
#0  0x82ba05b0 in js::jit::TypedOrValueRegister::dataValue (this=0x77fde3f8)
    at /home/kvijayan/Checkouts/mozilla-inbound/js/src/../../js/src/jit/RegisterSets.h:171
#1  0x82ba0700 in js::jit::TypedOrValueRegister::valueReg (this=0x77fde3f8)
    at /home/kvijayan/Checkouts/mozilla-inbound/js/src/../../js/src/jit/RegisterSets.h:210
#2  0x82c3852c in js::jit::GetPropertyIC::tryAttachDOMProxyUnshadowed (this=0x88c23090,
    cx=0x8bf2a540, ion=0x88c23000, obj=..., name=..., resetNeeded=true, returnAddr=0x8c574e14,
    emitted=0x77fde54b)
    at /home/kvijayan/Checkouts/mozilla-inbound/js/src/../../js/src/jit/IonCaches.cpp:1455
#3  0x82c38c16 in js::jit::GetPropertyIC::tryAttachProxy (this=0x88c23090, cx=0x8bf2a540,
    ion=0x88c23000, obj=..., name=..., returnAddr=0x8c574e14, emitted=0x77fde54b)
    at /home/kvijayan/Checkouts/mozilla-inbound/js/src/../../js/src/jit/IonCaches.cpp:1521
#4  0x82c39976 in js::jit::GetPropertyIC::tryAttachStub (this=0x88c23090, cx=0x8bf2a540,
    ion=0x88c23000, obj=..., name=..., returnAddr=0x8c574e14, emitted=0x77fde54b)
    at /home/kvijayan/Checkouts/mozilla-inbound/js/src/../../js/src/jit/IonCaches.cpp:1666
#5  0x82c39b6a in js::jit::GetPropertyIC::update (cx=0x8bf2a540, cacheIndex=0, obj=..., vp=...)
    at /home/kvijayan/Checkouts/mozilla-inbound/js/src/../../js/src/jit/IonCaches.cpp:1702
#6  0x86233d18 in ?? ()
#7  0x86233d18 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
Group: core-security
Comment 3 is private: false
I'm reasonably sure the scratch reg that gets pulled up here is going to come from random memory, since the |payload| register is the second field in ValueOperand, and |isFloat_| is the second field in AnyRegister.

There's a remote possiblity this could be controlled, by engineering regalloc to put a critical value into the clobbered register, and then controlling the clobbering to get access to raw memory.  Hard to pull off, but there's definitely a shadow of an opening for that kind of attack.

Marking sec-moderate because although attack vector is tricky to exploit, the consequences of a successful exploit are arbitrary memory access.
Whiteboard: [native-crash] → [native-crash] [sec-moderate]
This fixes the crash on my setup (tried three runs, all loaded fine).  Can someone apply this patch to a broken build and confirm that it fixes the crash?

I'm using the URL from Comment 4 to test.
Attachment #8378633 - Flags: review?(efaustbmo)
sec-moderate is a keyword not a whiteboard tag.
Keywords: sec-moderate
Whiteboard: [native-crash] [sec-moderate] → [native-crash]
That failure looks an awful lot like bug 952321.  See bug 952321 comment 35 and following...
(In reply to Boris Zbarsky [:bz] (reviews will be slow; ask someone else) from comment #42)
> That failure looks an awful lot like bug 952321.  See bug 952321 comment 35
> and following...

In fact, I think the patch that landed there should obviate the one attached here.

If not, let me know.
Confirmed dup.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
Comment on attachment 8378633 [details] [diff] [review]
fix-bug-957006.patch

Cancelling review, as bug marked a dupe.
Attachment #8378633 - Flags: review?(efaustbmo)
Keywords: qawanted
Group: core-security → core-security-release
Group: core-security-release
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: