Closed Bug 1009675 Opened 10 years ago Closed 10 years ago

Crash when commenting on github.com [@ mozilla::dom::WrapperPromiseCallback::Call(JSContext*, JS::Handle<JS::Value>]

Categories

(Core :: DOM: Core & HTML, defect)

31 Branch
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla33
Tracking Status
firefox30 --- wontfix
firefox31 + verified
firefox32 + verified
firefox33 + verified
firefox-esr24 --- unaffected
b2g-v1.3 --- unaffected
b2g-v1.3T --- unaffected
b2g-v1.4 --- unaffected
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: azakai, Assigned: bzbarsky)

References

(Blocks 1 open bug, )

Details

(5 keywords, Whiteboard: [GGC][testcase in comment 57][adv-main31+])

Crash Data

Attachments

(8 files, 3 obsolete files)

8.19 KB, patch
abillings
: sec-approval+
Details | Diff | Splinter Review
97.49 KB, patch
Details | Diff | Splinter Review
56.50 KB, patch
Details | Diff | Splinter Review
97.33 KB, patch
Details | Diff | Splinter Review
5.66 KB, patch
peterv
: review+
Details | Diff | Splinter Review
1.99 KB, patch
peterv
: review+
Details | Diff | Splinter Review
95.63 KB, patch
Details | Diff | Splinter Review
51.78 KB, patch
Details | Diff | Splinter Review
This bug was filed from the Socorro interface and is 
report bp-0e3f20b3-f50e-401f-b1a9-c2c312140513.
=============================================================

I've seen this twice in an hour today, both times when clicking to respond on a github issue or pull request.

Here's the other report: https://crash-stats.mozilla.com/report/index/d6a754ad-37f8-4ab9-bd2c-f96b22140513
Those are both null-derefs at line 603 of jsfriendapi.h:

600 inline const js::Class *
601 GetObjectClass(JSObject *obj)
602 {
603     return reinterpret_cast<const shadow::Object*>(obj)->type->clasp;
604 }

sadly the bits that would tell me who called GetObjectClass are missing from the stack, because of inlining.  :(

The caller we do have is Promise::RunTask doing callbacks[i]->Call(cx, value), and I guess breakpad thinks this is calling WrapperPromiseCallback::Call.

I _think_ that the AnyCallback::Call that calls doesn't do any GetObjectClass calls in its inline part, but I'm not sure.  WrapperPromiseCallback::Call definitely does GetObjectClass via UNWRAP_OBJECT, but we're checking isObject() before doing that...

Catching this in a debug build that would give us more stack info would be really helpful.
Happening on OS X, too. I merged a pull request on GitHub, and, although it was successful, Nightly (2014-05-13) crashed with this signature. Would a regression window be helpful?
OS: Linux → All
This crash signature first appeared on 2014-02-07 and is the #18 crash for Firefox 31.0a2 with 72 out of 8660 crashes in the last week. 

From the comments and urls in the crash reports, it appears to commonly happen during pull requests or merges on github. 


Crashing thread:
0 	XUL 	mozilla::dom::WrapperPromiseCallback::Call(JS::Handle<JS::Value>) 	obj-firefox/x86_64/dist/include/jsfriendapi.h
1 	XUL 	mozilla::dom::Promise::RunTask() 	dom/promise/Promise.cpp
2 	XUL 	mozilla::dom::PromiseResolverTask::Run() 	dom/promise/Promise.cpp
3 	XUL 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp
4 	XUL 	NS_ProcessPendingEvents(nsIThread*, unsigned int) 	xpcom/glue/nsThreadUtils.cpp
5 	XUL 	nsBaseAppShell::NativeEventCallback() 	widget/xpwidgets/nsBaseAppShell.cpp
6 	XUL 	nsAppShell::ProcessGeckoEvents(void*) 	widget/cocoa/nsAppShell.mm
7 	CoreFoundation 	CoreFoundation@0x12b31 	
8 	CoreFoundation 	CoreFoundation@0x12455 	
9 	CoreFoundation 	CoreFoundation@0x357f5 	
10 	AppKit 	AppKit@0x119b37
Keywords: topcrash
Interestingly, none of the crashes are on Windows.  All are Mac or Linux...  That's pretty odd.

Logan, a regression range would definitely be helpful.
I can't seem to reproduce the issue, unfortunately. It was a very specific situation for me (accepting a pull request). Commenting on issues doesn't appear to make it crash.
Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 ID:20140511004003 CSet: b122b633fe71

I also hit this crash on github when I clicked the middle-button of my mouse onto the comment field:
bp-88979999-6ad3-48ad-89ef-940b62140515

A lot of the comments on crashstats actually talk about github.
Crash Signature: [@ mozilla::dom::WrapperPromiseCallback::Call(JSContext*, JS::Handle<JS::Value>)] → [@ mozilla::dom::WrapperPromiseCallback::Call(JS::Handle<JS::Value>)]
I may have been wrong with my last comment. It actually crashed when I clicked the Submit button to post my comment. I hit one more crash. I will try to figure out a pattern to get it crashing reproducible.
(In reply to Henrik Skupin (:whimboo) from comment #8)
> I may have been wrong with my last comment. It actually crashed when I
> clicked the Submit button to post my comment. I hit one more crash. I will
> try to figure out a pattern to get it crashing reproducible.

Unable to reproduce this on 31a2
Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0
ID: 20140429004000
I'm going through the list of top crashers on nightly 32a1, and this is currently in the top 10, happening mostly on OS X and Linux (there were at least a couple of reports on Windows). Most of the comments in the reports have to do with interactions in GitHub, like commenting on GitHub; pushing a Git commit; clicking on various anchor tag links on github.com.

In crash stats, the column corresponding to "First Appearance" says that this started happening on 32a1 around 5/2, but it looks like it started earlier by looking at the reports in 31a2.

0 	XUL 	mozilla::dom::WrapperPromiseCallback::Call(JSContext*, JS::Handle<JS::Value>) 	obj-firefox/x86_64/dist/include/jsfriendapi.h
1 	XUL 	mozilla::dom::Promise::RunTask() 	dom/promise/Promise.cpp
2 	XUL 	mozilla::dom::PromiseResolverTask::Run() 	dom/promise/Promise.cpp
3 	XUL 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp
4 	XUL 	NS_ProcessPendingEvents(nsIThread*, unsigned int) 	xpcom/glue/nsThreadUtils.cpp
5 	XUL 	nsBaseAppShell::NativeEventCallback() 	widget/xpwidgets/nsBaseAppShell.cpp
6 	XUL 	nsAppShell::ProcessGeckoEvents(void*) 	widget/cocoa/nsAppShell.mm
7 	CoreFoundation 	CoreFoundation@0x7f731 	
8 	CoreFoundation 	CoreFoundation@0x70ea2 	
9 	CoreFoundation 	CoreFoundation@0x7062f 	
10 	libnss3.dylib 	PR_GetCurrentThread 	
11 	XUL 	XPCJSContextStack::Pop() 	obj-firefox/x86_64/dist/include/nsTArray-inl.h
12 	CoreFoundation 	CoreFoundation@0x700b5 	
13 	HIToolbox 	HIToolbox@0x2ea0d 	
14 	HIToolbox 	HIToolbox@0x2e685 	
15 	HIToolbox 	HIToolbox@0x2e5bc 	
16 	AppKit 	AppKit@0x243de 	
17 	AppKit 	AppKit@0xa506d8 	
18 	AppKit 	AppKit@0xa50795 	
19 	AppKit 	AppKit@0xa50779 	
20 	AppKit 	AppKit@0xa5075e 	
21 	libobjc.A.dylib 	libobjc.A.dylib@0x1d3fe 	
22 	libobjc.A.dylib 	libobjc.A.dylib@0x51c4 	
23 	XUL 	nsAutoRetainCocoaObject::~nsAutoRetainCocoaObject() 	widget/cocoa/nsCocoaUtils.h
24 	AppKit 	AppKit@0x23a2b 	
25 	CoreFoundation 	CoreFoundation@0x63abb 	
26 	CoreFoundation 	CoreFoundation@0x78fd2 	
27 	libobjc.A.dylib 	libobjc.A.dylib@0x5080 	
28 	AppKit 	AppKit@0x23947 	
29 	AppKit 	AppKit@0x3bb703 	
30 	AppKit 	AppKit@0x3bb6d0 	
31 	AppKit 	AppKit@0x3bb6f0 	
32 	XUL 	-[ToolbarWindow sendEvent:] 	widget/cocoa/nsCocoaWindow.mm
33 	XUL 	-[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] 	widget/cocoa/nsAppShell.mm
34 	AppKit 	AppKit@0x17b2c 	
35 	AppKit 	AppKit@0xa50779 	
36 	AppKit 	AppKit@0xa50795 	
37 	AppKit 	AppKit@0xa5078f 	
38 	XUL 	nsAppShell::Run() 	widget/cocoa/nsAppShell.mm
39 	XUL 	nsAppStartup::Run() 	toolkit/components/startup/nsAppStartup.cpp
40 	XUL 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp
41 	XUL 	XREMain::XRE_main(int, char**, nsXREAppData const*) 	toolkit/xre/nsAppRunner.cpp
42 	XUL 	XRE_main 	toolkit/xre/nsAppRunner.cpp
43 	firefox 	main 	browser/app/nsBrowserApp.cpp
44 	firefox 	start
Version: 24 Branch → 32 Branch
Just hit this: https://crash-stats.mozilla.com/report/index/f17f2d50-964d-4083-a2e6-1369c2140516 

I was submitting a comment on a GitHub pull request
All of that has the issues from comment 1.  We need a stack from a debug build or info on regression range or something more to go on.
I created a Mozmill script earlier today to create comments and close/reopen issues via automation. But even after a couple of minutes no crash happened. Not sure about per-requisits we might have to setup first.
Maybe it's unrelated, but this happened after I referenced an issue in the comment box.
What I tried so far on an issue is:

* referencing another issue
* referencing a user
* setting/removing labels
* open/close the issue
* close the issue with adding a comment

Nothing broke it so far. When I re-read the last comments I wonder if that is more PR related.
Top crash, tracking!
Crash Signature: [@ mozilla::dom::WrapperPromiseCallback::Call(JS::Handle<JS::Value>)] → [@ mozilla::dom::WrapperPromiseCallback::Call(JS::Handle<JS::Value>)] [@ mozilla::dom::WrapperPromiseCallback::Call(JSContext*, JS::Handle<JS::Value>) ]
I just saw this commenting on a pull request. [0]  As soon as I click submit, FF crashes.  I just submitted a second comment and did not experience a crash, so there must be more to it.

[0] https://github.com/mozilla/calculator/pull/71/files#r12855957
I'd really appreciate it if people who use github regularly did that with debug builds for a while here.  A crash in a debug build might give use some information about what's actually going on here...
Noob question; I just need to add to my .mozconfig:

ac_add_options --enable-debug-symbols
export MOZ_DEBUG_SYMBOLS=1 
export CFLAGS="-gdwarf-2"
export CXXFLAGS="-gdwarf-2"

and run a build, and paste the link to the crash report here?
nvm, got all set up with a debug build.
haven't had any luck trying to reproduce.
Sorry but I cannot use a debug version of Aurora with that profile. I get segfaults one after the other. I may have to file another bug, and investigate first.
Henrik, are you on Linux and not setting the JS_DISABLE_SLOW_SCRIPT_SIGNALS environment variable?  See https://developer.mozilla.org/en-US/docs/Debugging_Mozilla_with_gdb#I_keep_getting_a_SIG32_in_the_debugger._How_do_I_fix_it.3F
Looks like that this wasn't the problem. I still get lots of those SIGSEVs, something similar to:

Assertion failure: offset < length(), at /mozilla/code/firefox/aurora/js/src/jsscript.h:942

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff2ea094c in offsetToPC (offset=<optimized out>, this=<optimized out>) at /mozilla/code/firefox/aurora/js/src/jsscript.h:942
942	        JS_ASSERT(offset < length());

Any debug symbols seem to have been optimized out, so stacks I get contain 'libxul.so +0x02F0ED67' like lines. That's with debug builds from ftp.m.o, and self-build debug builds.
I hit a lot of SIGSEVs when running Firefox. But I finally hit this crash with a debug build of Aurora. Sadly I hit 'c' in the debugger, so no way to further debug it. Boris, please let me know which commands I have to issue to give you more details.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff292b8ef in mozilla::dom::WrapperPromiseCallback::Call(JS::Handle<JS::Value>) () from /mozilla/bin/aurora/libxul.so
(gdb) bt
#0  0x00007ffff292b8ef in mozilla::dom::WrapperPromiseCallback::Call(JS::Handle<JS::Value>) () from /mozilla/bin/aurora/libxul.so
#1  0x00007ffff292985d in mozilla::dom::Promise::RunTask() () from /mozilla/bin/aurora/libxul.so
#2  0x00007ffff292afc0 in mozilla::dom::PromiseResolverMixin::RunInternal() () from /mozilla/bin/aurora/libxul.so
#3  0x00007ffff292afda in mozilla::dom::PromiseResolverTask::Run() () from /mozilla/bin/aurora/libxul.so
#4  0x00007ffff1bcff41 in nsThread::ProcessNextEvent(bool, bool*) () from /mozilla/bin/aurora/libxul.so
#5  0x00007ffff1bba4a9 in NS_ProcessNextEvent(nsIThread*, bool) () from /mozilla/bin/aurora/libxul.so
#6  0x00007ffff1be2e3b in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) () from /mozilla/bin/aurora/libxul.so
#7  0x00007ffff203fcb3 in MessageLoop::Run() () from /mozilla/bin/aurora/libxul.so
#8  0x00007ffff27319f5 in nsBaseAppShell::Run() () from /mozilla/bin/aurora/libxul.so
#9  0x00007ffff301e091 in nsAppStartup::Run() () from /mozilla/bin/aurora/libxul.so
#10 0x00007ffff2fea98a in XREMain::XRE_mainRun() () from /mozilla/bin/aurora/libxul.so
#11 0x00007ffff2fed069 in XREMain::XRE_main(int, char**, nsXREAppData const*) () from /mozilla/bin/aurora/libxul.so
#12 0x00007ffff2fed2e8 in XRE_main () from /mozilla/bin/aurora/libxul.so
#13 0x00000000004063bb in do_main(int, char**, nsIFile*) ()
#14 0x0000000000403834 in main ()
Henrik, that backtrace doesn't have any filenames or line numbers.  :(

In fact, this stack is identical to the breakpad stacks, but breakpad thinks the actual crash is in jsfriendapi... Are you using an --enable-debug _and_ --enable-optimize build or something?  :(
Oh, and as far as what to look for, I'd like to know which thing here is actually null and how it got into the code that crashes.
Yeah, that's what I was wondering too. Optimizations are disabled. So I talked with Glandium on IRC, and we setup a better mozconfig for me now. Once the build is ready I will continue.
Summary: crash in mozilla::dom::WrapperPromiseCallback::Call(JSContext*, JS::Handle<JS::Value>) → Crash when commenting on github.com [@ mozilla::dom::WrapperPromiseCallback::Call(JSContext*, JS::Handle<JS::Value>]
Ok, I was able to fix my constant SIGSEVs by putting 'handle SIG32 noprint' in my .gdbinit file. Lets hope the next crash doesn't take that long, given that using a debug build is absolutely slow on my machine and causes lot of hangs.
This is the #9 topcrash for Firefox 31 right now, with 99/4154 crashes in the last 7 days. There don't appear to be crashes with this signature for other versions of Firefox. 77 of the 99 crashes are in Mac OSX 10.9 - maybe that will help reproduce the bug.
I've been trying to reproduce on Mac for weeks now; using debug builds for all my GitHub interactions.  But I just don't use github that much.  :(
For what it's worth, I have a workable (and portable) Mac ASan build at http://people.mozilla.org/~stmichaud/bmo/firefox-asan.dmg.  It's a universal build, though only the 64-bit component actually uses ASan.

But we've never had portable ASan builds before -- even semi-official ones like mine.  So there's probably a large backlog of bugs waiting to be discovered, which may interfere with diagnosing this one.  (One was discovered only the day after I posted my build -- bug 1021612.)  But it may be worth a try.

I will keep updating my Mac ASan build to current m-c code, roughly once a week.
> I will keep updating my Mac ASan build to current m-c code, roughly once a week.

I'll also fix LLVM bugs (locally) that interfere with its usability on the Mac.  I've already fixed one.
BTW, pretty sure this is not *just* GitHub. I forgot when it crashed for me, but I'm pretty sure it wasn't while I was using GitHub.
I get it, and I really only use Linux. I have had it without github as well. Non debug build (from source)
> I get it, and I really only use Linux.

Then you might want to try one of the official Linux ASan builds, here:
https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-asan/
> Non debug build (from source)

Were you able to get a better stack than the one breakpad gives here?

Note that I'm 99% sure this is not an issue ASan will help with: we have a null deref, not a bogus-memory deref...
I should attach gdb to it. Will do that.
If you're going to attach gdb, don't use a debug build -- debug builds have their symbols stripped.  Use a "regular" (non-debug) m-c nightly instead (which doesn't have its symbols stripped, because it needs to support the Gecko profiler).

Gdb, like ASan builds, is much more likely to give you a good (complete) stack than Breakpad.

Also, I'm not yet convinced these are null-dereferences -- see bug 1018360.  But I'm currently checking.
The Windows crash addresses are 0x2b2b2b2b2b2b2b2b ("++++++++") or 0x4b4b4b4b4b4b4b4b ("KKKKKKKK").  I think that's reason enough to mark this bug security sensitive.

Those who have permission to download minidumps can do that and then run minidump-stackwalk on them.  The latest version of minidump-stackwalk displays the entire "thread state" (the values of all the user-level registers) for the crashing thread.

All the minidumps I looked at had either

rax = 0x2b2b2b2b2b2b2b2b

or

rax = 0x4b4b4b4b4b4b4b4b

These were the only register values > 0x7fffffffffffffff -- so the crashes must have happened dereferencing the "address" in rax.

Here's how to download and build the latest minidump-stackwalk (on OS X):

1) Install a reasonably recent XCode (I have 4.5.2) and the latest XCode commandline tools.
2) Visit https://github.com/mozilla/socorro and click "download zip".
3) CC=clang CXX=clang++ make breakpad
4) CC="clang -Wno-switch" CXX="clang++ -Wno-switch" make stackwalker
Group: core-security
> These were the only register values > 0x7fffffffffffffff

I meant to say:

These were the only register values > 0x7fffffffffff
Needless to say, people should start testing with ASan builds.
At least for the 2014-06-06-03-02-06-mozilla-central nightly on OS X, the crash address is also 0x2b2b2b2b2b2b2b2b ("++++++++").

I loaded XUL from this m-c nightly into a disassembler (Hopper Disassembler, http://www.hopperapp.com/), then found the following assembly code at the address where these crashes happen in that copy of XUL (offset 0x1370818):

0000000001370809 488D8538FEFFFF  lea  rax, qword [ss:rbp-0x2a0+var_216]
0000000001370810 49894718        mov  qword [ds:r15+0x18], rax
0000000001370814 488B4308        mov  rax, qword [ds:rbx+0x8]
0000000001370818 488B00          mov  rax, qword [ds:rax]
000000000137081b 8B4808          mov  ecx, dword [ds:rax+0x8]

Then I downloaded minidumps corresponding to several of this bug's crashes in this particular m-c nightly and ran the latest minidump-stackwalk on them.  They all showed the following as part of the crashing thread's crash state:

rax = 0x2b2b2b2b2b2b2b2b
0x2b2b2b2b2b2b2b2b and 0x4b4b4b4b4b4b4b4b could conceivably be values used for "poisoning" memory that's been freed.  But I thought we always used 0x5a5a5a5a5a5a5a5a for memory poisoning.
> If the OS X and Linux crashes are anything like the crashes on Windows

They're not.  They all seem to list 0x0 as "Address" in the summary table at https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Adom%3A%3AWrapperPromiseCallback%3A%3ACall%28JSContext*%2C+JS%3A%3AHandle%3CJS%3A%3AValue%3E%29&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&hang_type=any&date=2014-06-09+02%3A00%3A00&range_value=1#tab-reports

https://crash-stats.mozilla.com/report/index/753e42bd-744a-4498-9b4e-a20892140606 is an example of a 014-06-06-03-02-06 build in crash-stats and it claims 0.

All that said, here's what firebot has to say about 0x2b2b2b2b and 0x4b4b4b4b, if that's where we're crashing:

*firebot* 0x2b2b2b2b is The poison value written over the nursery after it is
          swept. When debugging a GGC crash, 0x2b or not 0x2b: that is the question.
*firebot* 0x4b4b4b4b is the poison value (JS_SWEPT_TENURED_PATTERN) written over
          tenured heap things and arenas after they are swept.

Terrence, can I somehow enable some sort of "gc more often" mode at runtime after I've loaded github?
Flags: needinfo?(terrence)
> They all seem to list 0x0 as "Address" in the summary table

But that's wrong.  See bug 1018360.
Oh, good to know!

So knowing that we have a swept object, not a null pointer, is actually pretty useful.  It means we should be looking for GC hazards in particular, I'd think.

I'll do some auditing of the Promise code tomorrow with that in mind.
Was able to catch this in a debug build with bz's suggestion to force GC more often:

* thread #1: tid = 0x71f414, 0x00000001015f8860 XUL`js::GetObjectClass(obj=0x00000001141f7638) + 16 at jsfriendapi.h:610, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x00000001015f8860 XUL`js::GetObjectClass(obj=0x00000001141f7638) + 16 at jsfriendapi.h:610
    frame #1: 0x00000001015f8965 XUL`mozilla::dom::GetDOMClass(obj=0x00000001141f7638) + 21 at BindingUtils.h:185
    frame #2: 0x0000000103b9b16f XUL`tag_nsresult mozilla::dom::UnwrapObject<mozilla::dom::Promise, mozilla::dom::Promise*>(obj=0x00000001141f7638, value=0x00007fff5fbfcb78, protoID=Promise, protoDepth=0) + 31 at BindingUtils.h:223
    frame #3: 0x0000000103b97ee7 XUL`tag_nsresult mozilla::dom::UnwrapObject<(obj=0x00000001141f7638, value=0x00007fff5fbfcb78)331, mozilla::dom::Promise, mozilla::dom::Promise*>(JSObject*, mozilla::dom::Promise*&) + 39 at BindingUtils.h:259
    frame #4: 0x0000000103ba08d2 XUL`mozilla::dom::WrapperPromiseCallback::Call(this=0x0000000130760340, aCx=0x00000001117df170, aValue=Handle<JS::Value> at 0x00007fff5fbfcc88) + 946 at PromiseCallback.cpp:244
    frame #5: 0x0000000103b95b41 XUL`mozilla::dom::Promise::RunTask(this=0x000000012d20cce0) + 897 at Promise.cpp:1025
    frame #6: 0x0000000103b96447 XUL`mozilla::dom::Promise::RunResolveTask(this=0x000000012d20cce0, aValue=Handle<JS::Value> at 0x00007fff5fbfcf18, aState=Resolved, aAsynchronous=SyncTask) + 871 at Promise.cpp:1219
    frame #7: 0x0000000103b9cc9f XUL`mozilla::dom::PromiseResolverMixin::RunInternal(this=0x000000012b37b658) + 207 at Promise.cpp:116
    frame #8: 0x0000000103b9cd5c XUL`mozilla::dom::PromiseResolverTask::Run(this=0x000000012b37b640) + 28 at Promise.cpp:144
    frame #9: 0x00000001016b6272 XUL`nsThread::ProcessNextEvent(this=0x0000000100321aa0, aMayWait=false, aResult=0x00007fff5fbfd163) + 1570 at nsThread.cpp:766
    frame #10: 0x000000010159f36a XUL`NS_ProcessPendingEvents(thread=0x0000000100321aa0, timeout=20) + 154 at nsThreadUtils.cpp:210
    frame #11: 0x0000000103454859 XUL`nsBaseAppShell::NativeEventCallback(this=0x0000000111642600) + 201 at nsBaseAppShell.cpp:98
    frame #12: 0x00000001033e03dd XUL`nsAppShell::ProcessGeckoEvents(aInfo=0x0000000111642600) + 445 at nsAppShell.mm:286

It's still open (and I'll try to leave it) so if I can do some more investigation, let me know.
Took me some time but I think I have a solid STR:

1) Open https://github.com/mozilla/rust/issues/414
2) Open the web console
3) Paste and execute this:

setInterval(function () $("#discussion_bucket").updateContent("<div id='discussion_bucket'>" + new Array(100).join("<h1>foobar</h1>") + "</div>"), 10)

4) Wait a few seconds until Firefox crashes.
BTW, you don't need to be logged in to try the STR above.
Last good nightly: 2014-03-28
First bad nightly: 2014-03-29

Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6fa163ff81a3&tochange=4f3443da36a1

I guess this has been around since we enabled generational GC in bug 619558, then?
Version: 32 Branch → 31 Branch
Whiteboard: [GGC]
(In reply to Boris Zbarsky [:bz] from comment #47)
> > If the OS X and Linux crashes are anything like the crashes on Windows
> 
> They're not.  They all seem to list 0x0 as "Address" in the summary table at
> https://crash-stats.mozilla.com/report/
> list?signature=mozilla%3A%3Adom%3A%3AWrapperPromiseCallback%3A%3ACall%28JSCon
> text*%2C+JS%3A%3AHandle%3CJS%3A%3AValue%3E%29&product=Firefox&query_type=cont
> ains&range_unit=weeks&process_type=any&hang_type=any&date=2014-06-
> 09+02%3A00%3A00&range_value=1#tab-reports
> 
> https://crash-stats.mozilla.com/report/index/753e42bd-744a-4498-9b4e-
> a20892140606 is an example of a 014-06-06-03-02-06 build in crash-stats and
> it claims 0.
> 
> All that said, here's what firebot has to say about 0x2b2b2b2b and
> 0x4b4b4b4b, if that's where we're crashing:
> 
> *firebot* 0x2b2b2b2b is The poison value written over the nursery after it is
>           swept. When debugging a GGC crash, 0x2b or not 0x2b: that is the
> question.
> *firebot* 0x4b4b4b4b is the poison value (JS_SWEPT_TENURED_PATTERN) written
> over
>           tenured heap things and arenas after they are swept.
> 
> Terrence, can I somehow enable some sort of "gc more often" mode at runtime
> after I've loaded github?

Yes, absolutely. In an --enable-debug or --enable-gczeal build, you can set the environment JS_GC_ZEAL to a flag to trigger several helpful tools. Use JS_GC_ZEAL=help to get a list of the various modes.
Flags: needinfo?(terrence)
> you can set the environment JS_GC_ZEAL to a flag to trigger several helpful tools.

That fails the "at runtime after I've loaded github" criterion.  ;)
From terrence on IRC:

  the C interface you want is JS_SetGCZeal(cx, zealMode, frequency)
Tim, thank you for coming up with those STR and for the stack.  And Steven, thank you _very_ much for figuring out the right direction to be poking here!

So we're on this line in WrapperPromiseCallback::Call:

244         nsresult r = UNWRAP_OBJECT(Promise, valueObj, returnedPromise);

and we have:

(gdb) p valueObj.ptr->type_.value
$7 = ('js::types::TypeObject' *) 0x2b2b2b2b2b2b2b2b

valueObj came from here:

    JS::Rooted<JSObject*> valueObj(aCx, &retValue.toObject());

and retValue came from here:

  JS::Rooted<JS::Value> retValue(aCx,
    mCallback->Call(value, rv, CallbackObject::eRethrowExceptions));

What I think we have here is a rooting hazard that the analysis is missing for some reason.  In particular, in AnyCallback::Call we do this (with some error handling elided):

    CallSetup s(this, aRv, aExceptionHandling);
    return Call(s.GetContext(), JS::UndefinedHandleValue, value, aRv);

where the Call() function returns a JS::Value.  So between when that inner call returns and when we ourselves return there's an unrooted Value hanging out.  But ~CallSetup can GC, and then we end up with a dead object.

Here's a testcase that reliably reproduces the crash, in every build going back to Firefox 29 (when we enabled Promise).  All you have to do is install https://www.squarefree.com/extensions/domFuzzLite3.xpi and run this:

  <script>
    var s = document.querySelector("script");
    var obs = new MutationObserver(function() { fuzzPriv.forceGC(); });
    obs.observe(s, { childList: true });
    Promise.resolve(5).then(function() {
      s.appendChild(document.createTextNode("x"));
      return {};
    })
  </script>

or use your favorite less-reliable method of triggering GC if you don't want the extension.
Looks like we only enabled exact rooting in Firefox 28, so builds before that should not be affected at all.  Firefox 28 is _probably_ affected only if there is a non-Promise API that's calling WebIDL callbacks that return "any" or "object".

Oh, and the reason the testcase in comment 57 crashes is that ~CallSetup runs mutation observers, so that lets us explicitly trigger a gc in ~CallSetup.

Steve Fink is going to fix the analysis; I'll fix the bindings bug here.
Assignee: nobody → bzbarsky
Bug 1022773 is the analysis bug.
See Also: → 1022773
While this might show up on the release population, it doesn't currently show up as a topcrasher there and that is likely due to more GitHub devs using pre-release channels.  Will track this for FF30 and it could be a potential ride-along, depending on risk, but right now doesn't look like a chemspill driver - let's wait and watch what happens in the next day or two as we absorb users on FF30 release.
For what it's worth, I suspect risk will be pretty low and the bigger worry for me is potential exploitability than stability...

I agree we shouldn't block the release on this, but we should strongly consider it as a ride-along.
Peter, if you won't be able to get to this stuff pretty soon, let me know, please.  This needs backporting to beta 31.
Attachment #8437929 - Flags: review?(peterv)
Attachment #8437930 - Flags: review?(peterv)
Attachment #8437931 - Flags: review?(peterv)
Comment on attachment 8437929 [details] [diff] [review]
part 1.  Change the return value of getRetvalDeclarationForType to allow more than two states for the outparam bit

Review of attachment 8437929 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/bindings/Codegen.py
@@ +5808,5 @@
> +                args.append(CGGeneric("result"))
> +            else:
> +                assert resultOutParam is "ptr"
> +                args.append(CGGeneric("&result"))
> +                

Trailing whitespace.
Attachment #8437929 - Flags: review?(peterv) → review+
Comment on attachment 8437930 [details] [diff] [review]
part 2.  Return WebIDL 'any' values as handles

Review of attachment 8437930 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/html/content/src/nsGenericHTMLElement.h
@@ +132,2 @@
>    {
> +    return GetItemValue(aCx, GetWrapperPreserveColor(), aRetval, aError);

I'd drop |return |.

::: dom/base/DOMRequest.cpp
@@ +82,5 @@
>  
>  NS_IMETHODIMP
>  DOMRequest::GetResult(JS::MutableHandle<JS::Value> aResult)
>  {
> +  GetResult(aResult);

This doesn't seem right, I think you wanted to pass null for the context.

::: dom/base/DOMRequest.h
@@ +60,5 @@
> +  /*  void GetResult(JS::MutableHandle<JS::Value> aRetval) const
> +  {
> +    GetResult(nullptr, aRetval);
> +  }
> +  */

And this can go.

::: dom/base/nsGlobalWindow.cpp
@@ +10587,5 @@
> +nsGlobalWindow::GetInterface(JSContext* aCx, nsIJSID* aIID,
> +                             JS::MutableHandle<JS::Value> aRetval,
> +                             ErrorResult& aError)
> +{
> +  return dom::GetInterface(aCx, this, aIID, aRetval, aError);

I'd drop |return |.

::: dom/bindings/BindingUtils.h
@@ +1661,2 @@
>  {
> +  return GetInterfaceImpl(aCx, aThis, aThis, aIID, aRetval, aError);

I'd drop |return |.

::: dom/events/MessageEvent.cpp
@@ +78,2 @@
>    JS::Rooted<JS::Value> data(aCx, mData);
>    if (!JS_WrapValue(aCx, &data)) {

Could just do

  aData.set(mData);
  if (!JS_WrapValue(aCx, aData)) {

?

::: js/xpconnect/src/event_impl_gen.py
@@ +311,2 @@
>          fd.write("  nsresult rv = NS_ERROR_UNEXPECTED;\n")
> +        fd.write("  if (m%s && !XPCVariant::VariantDataToJS(m%s, &rv, aRetval)) {\n" % (firstCap(a.name), firstCap(a.name)))

Hmm, this used to return null if the member was null, shouldn't we keep that behaviour?
Attachment #8437930 - Flags: review?(peterv) → review+
> Could just do

Yes, done.

> Hmm, this used to return null if the member was null

Good catch.  Fixed like so:

        fd.write("  nsresult rv = NS_ERROR_UNEXPECTED;\n")
        fd.write("  if (!m%s) {\n" % firstCap(a.name))
        fd.write("    aRetval.setNull();\n")
        fd.write("  } else if (!XPCVariant::VariantDataToJS(m%s, &rv, aRetval)) {\n" % (firstCap(a.name)))
        fd.write("    aRv.Throw(NS_ERROR_FAILURE);\n")
        fd.write("  }\n")

Applied the rest of the comment 65 and comment 66 comments.
Comment on attachment 8437931 [details] [diff] [review]
part 3.  Return WebIDL 'object' values as handles

Review of attachment 8437931 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/base/src/nsDocument.cpp
@@ +5998,2 @@
>    JSObject* constructorObject = JS_GetFunctionObject(constructor);
> +  if (!constructorObject) {

This shouldn't ever happen, no? IIRC it returns the same object.

::: dom/base/nsGlobalWindow.h
@@ +1000,5 @@
>    {
>      if (mDoc) {
>        mDoc->WarnOnceAbout(nsIDocument::eWindow_Content);
>      }
> +    return GetContent(aCx, aRetval, aError);

I'd drop |return |.

::: dom/nfc/MozNDEFRecord.h
@@ +70,1 @@
>      }

This could just be:

  if (mType) {
    JS::ExposeObjectToActiveJS(mType);
  }
  retval.set(mType);

@@ +76,5 @@
> +      JS::ExposeObjectToActiveJS(mId);
> +      retval.set(mId);
> +    } else {
> +      retval.set(nullptr);
> +    }

Same here.

@@ +90,1 @@
>      }

And here.
Attachment #8437931 - Flags: review?(peterv) → review+
> This shouldn't ever happen, no? IIRC it returns the same object.

Indeed.  Changed to just do:

  aRetval.set(JS_GetFunctionObject(constructor));

Applied the other comments.
Attachment #8437929 - Attachment is obsolete: true
Attachment #8437930 - Attachment is obsolete: true
Attachment #8437931 - Attachment is obsolete: true
Comment on attachment 8438523 [details] [diff] [review]
part 1.  Change the return value of getRetvalDeclarationForType to allow more than two states for the outparam bit

This is actually a sec-approval request for all three patches.

[Security approval request comment]
How easily could an exploit be constructed based on the patch?  Not very easily, imo.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?  No.

Which older supported branches are affected by this flaw?  Anything based on Gecko 28 or later.  I think that means just 29, 30, beta 31, aurora 32

If not all supported branches, which bug introduced the flaw?

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?  I have backports for Aurora 32 and beta 31.  It should be pretty simple to create them for 30 and 29 as needed, I expect.

How likely is this patch to cause regressions; how much testing does it need?  This has moderate risk of me and Peter missing something about the mass-changes...  In some ways the more testing we can get the better.
Attachment #8438523 - Flags: sec-approval?
Part 1 and part 3 apply on Aurora as-is
Attachment #8438524 - Attachment is obsolete: true
Comment on attachment 8438523 [details] [diff] [review]
part 1.  Change the return value of getRetvalDeclarationForType to allow more than two states for the outparam bit

sec-approval+ for this to go into trunk. Since there is some risk, putting it in sooner is probably better.
Attachment #8438523 - Flags: sec-approval? → sec-approval+
Attachment #8438663 - Flags: review?(peterv)
Attachment #8438664 - Flags: review?(peterv)
Blocks: ExactRooting
Comment on attachment 8438523 [details] [diff] [review]
part 1.  Change the return value of getRetvalDeclarationForType to allow more than two states for the outparam bit

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 753203, effectively.
User impact if declined: Crashes that are pretty easy to trigger from content and
   may be exploitable.
Testing completed (on m-c, etc.): Passes tests.
Risk to taking this patch (and alternatives if risky): Medium-ish risk; there
   were a lot of changes here...
String or IDL/UUID changes made by this patch:  None.
Attachment #8438523 - Flags: approval-mozilla-beta?
Attachment #8438523 - Flags: approval-mozilla-aurora?
Attachment #8438558 - Flags: approval-mozilla-aurora?
Attachment #8438526 - Flags: approval-mozilla-aurora?
Attachment #8438558 - Attachment is obsolete: true
Attachment #8438558 - Flags: approval-mozilla-aurora?
Attachment #8438524 - Attachment is obsolete: false
Attachment #8438558 - Attachment is obsolete: false
Attachment #8438558 - Flags: approval-mozilla-aurora?
Attachment #8438680 - Flags: approval-mozilla-beta?
Attachment #8438683 - Flags: approval-mozilla-beta?
Attachment #8438663 - Flags: review?(peterv) → review+
Attachment #8438664 - Flags: review?(peterv) → review+
For the record, this is the updated analysis's hazard output for this bug:

Function '_ZN7mozilla3dom11AnyCallback4CallEN2JS6HandleINS2_5ValueEEERNS_11ErrorResultENS0_14CallbackObject17ExceptionHandlingE|JS::Value mozilla::dom::AnyCallback::Call(class JS::Handle<JS::Value>, mozilla::ErrorResult*, uint32)' has unrooted '<returnvalue>' of type 'JS::Value' live across GC call '_ZN7mozilla3dom14CallbackObject9CallSetupD1Ev|void mozilla::dom::CallbackObject::CallSetup::~CallSetup()' at dist/include/mozilla/dom/PromiseBinding.h:126
    dist/include/mozilla/dom/PromiseBinding.h:129
GC Function: _ZN7mozilla3dom14CallbackObject9CallSetupD1Ev|void mozilla::dom::CallbackObject::CallSetup::~CallSetup()
    void mozilla::dom::CallbackObject::CallSetup::~CallSetup(int32)
    uint8 JS_ReportPendingException(JSContext*)
    uint8 js_ReportUncaughtException(JSContext*)
    void js::CallErrorReporter(JSContext*, int8*, JSErrorReport*)
    IndirectCall: hook
Steve, that looks lovely.
Attachment #8438523 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8438526 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8438558 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Since there are some risks, I've only approved the aurora change. I will approve the beta changes for beta 3.
This hasn't actually landed on beta 31 yet, has it?
Flags: needinfo?(lhenry)
Correct.  It has not.
Flags: needinfo?(lhenry)
Beta 2 built. Taking the patches for beta 3.
Attachment #8438680 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8438683 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8438523 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
If it applies cleanly there, might be worth it just to reduce the differences between the branches.  It's not high priority, though, since afaik exact marking is not enabled on b2g30.
Flags: needinfo?(bzbarsky)
Lots of tedious unbitrotting trying to get the beta patches to apply to b2g30.
Given that exact rooting is required to hit this, marking v1.4 as unaffected since the benefits of any backport don't justify the effort in getting these patches to apply.
Depends on: 1029652
Whiteboard: [GGC][testcase in comment 57] → [GGC][testcase in comment 57][adv-main31+]
Rerpoduced the original issue using the following build:
- http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/05/2014-05-13-03-02-01-mozilla-central/
- Used the STR from comment # 52 and reproduced the issue several times

Went through the following verification:

Win 8.1: [PASSED]
- fx 33.0a1 [20140714030201]: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-07-14-03-02-01-mozilla-central/
- fx 32.0a2 [20140714004001]: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-07-14-00-40-01-mozilla-aurora/
- fx 31.0b9 [20140710141843]: http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/31.0b9/win32/en-US/

Ubuntu 13.10 x64: [PASSED]
- fx 33.0a1 [20140714030201]: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-07-14-03-02-01-mozilla-central/
- fx 32.0a2 [20140714004001]: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-07-14-00-40-01-mozilla-aurora/
- fx 31.0b9 [20140710141843]: http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/31.0b9/linux-x86_64/en-US/

OSX 10.9.4 x64: [PASSED]
- fx 33.0a1 [20140714030201]: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-07-14-03-02-01-mozilla-central/
- fx 32.0a2 [20140714004001]: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-07-14-00-40-01-mozilla-aurora/
- fx 31.0b9 [20140710141843]: http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/31.0b9/mac/en-US/

- ran through the STR using several different tabs
- ran through the STR using several different windows
- ran through the STR using private windows/tabs
- ran through the STR using e10s windows/tabs
Status: RESOLVED → VERIFIED
QA Whiteboard: [qa!]
Group: core-security → core-security-release
Group: core-security-release
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.