Persona is no longer an option for authentication on BMO. For more details see Persona Deprecated.
Last Comment Bug 650519 - Firefox crash [@ js_GCThingIsMarked(void*, unsigned int) ]
: Firefox crash [@ js_GCThingIsMarked(void*, unsigned int) ]
[Aurora][Fx5] fixed-in-tracemonkey
: crash, crashreportid, regression
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: 5 Branch
: x86 Windows 7
: -- critical (vote)
: mozilla6
Assigned To: Andrew McCreight [:mccr8]
: Jason Orendorff [:jorendorff]
: 647467 649579 652965 (view as bug list)
Depends on:
Blocks: 641910
  Show dependency treegraph
Reported: 2011-04-16 10:53 PDT by Raul Malea
Modified: 2013-12-27 14:29 PST (History)
19 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

possible fix for this crash, by stopping two classes from adding JS strings to the cycle collector graph (2.59 KB, patch)
2011-04-19 15:14 PDT, Andrew McCreight [:mccr8]
no flags Details | Diff | Splinter Review
by stopping two classes from adding JS strings to the cycle collector graph (fix ASSERT debug build bustage) (2.59 KB, patch)
2011-04-19 15:53 PDT, Andrew McCreight [:mccr8]
gal: review+
Details | Diff | Splinter Review
don't examine mark bits on static strings (1.82 KB, patch)
2011-04-21 11:57 PDT, Andrew McCreight [:mccr8]
gal: review+
Details | Diff | Splinter Review
avoid passing static strings by testing for strings in NoteScriptChild (1.93 KB, patch)
2011-04-29 12:32 PDT, Andrew McCreight [:mccr8]
no flags Details | Diff | Splinter Review
add safe function to check JS object gray bits (3.24 KB, text/plain)
2011-05-04 09:51 PDT, Andrew McCreight [:mccr8]
no flags Details
add safe function to check JS object gray bits (3.24 KB, patch)
2011-05-04 15:31 PDT, Andrew McCreight [:mccr8]
gal: review+
Details | Diff | Splinter Review
add safe function to check JS object gray bits (addressed gal's review points) (3.52 KB, patch)
2011-05-04 16:25 PDT, Andrew McCreight [:mccr8]
no flags Details | Diff | Splinter Review
add a safe function to check grey bits, call it in the cycle collector (4.11 KB, patch)
2011-05-05 08:31 PDT, Andrew McCreight [:mccr8]
gal: review+
sayrer: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Raul Malea 2011-04-16 10:53:40 PDT
Aurora ln10 (romanian ) build 16.04.2011 and Nighlty ln10(romanian) build 15.04.2011 crashed when i try opened my default profile (fx4). In new profile this builds opened and run normally.

Crashed reports (both Aurora and Nightly):

App tabs opened: Gmail, facebook, Bugzilla, mozillazine, babelzilla,, narro,, my wordpress controlpanel

My addons list :

Abduction! 3.0.10
About startup 0.1.5
about:me 0.5
about:response 0.1
Add-on Collector 2.0.1
Add-on Compatibility Reporter 0.8.3
AddonFox 1.1.7
Adobe Acrobat
Adobe Contribute Toolbar 6.0 [DISABLED]
Blog Comment Autofill 
British English Dictionary 1.19.1
Bugzilla Tweaks 1.9
Converter 1.0.9
Crash Report Helper (Asistent raportare incidente) 1.2
Default 4.0 [DISABLED]
Dicționar românesc de corectare ortografică. 1.11
Download Statusbar 0.9.8
DownloadHelper 4.8.6
DownThemAll! 2.0.2
F1 by Mozilla Labs 0.8.3
FFixer 2.3.1
FileList V3 
fireblur 1260925626
Firefox 4 Menu Button with icon+(Tabs in Titlebar) 
Firefox B 1260925626 [DISABLED]
Flagfox 4.1.1
FlashGames Maximizer [DISABLED]
FoxTab 1.4.2
German Dictionary 2.0.2
Google Earth Plugin
Google Shortcuts 2.1.6
Google Talk Plugin
Google Talk Plugin Video Accelerator
Google Update
Greasefire 1.0.4 [DISABLED]
Greasemonkey 0.9.2
Hard blockers counter 1.0.1 [DISABLED]
IE Tab 2 (FF 3.6+)
IE Tab Plug-in
ImTranslator 4.01
Java Deployment Toolkit
Java(TM) Platform SE 6 U24
Massive Extender 1.0b5
Microsoft Office 2010 14.0.4730.1010
Microsoft Office 2010 14.0.4761.1000
Minefield - Better Add-on Bar/Status Bar  [DISABLED]
Movie Torrent Search Linker (for IMDB, Rotten Tomatoes and Yahoo Movies)
Mozilla Labs: Prospector - Home Dash 10 [DISABLED]
Mozilla QA Companion 1.2.3
MozMill Crowd 0.1.2
Nightly Tester Tools 3.1.5
NVIDIA 3D Vision
Personas 1.6.2
Power Twitter 1.60
Prism for Firefox 1.0b4 [DISABLED]
Puppy Dogs... 1291695684 [DISABLED]
QuickTime Plug-in 7.6.9
Read It Later 2.1.1
RealJukebox NS Plugin
RealPlayer Version Plugin
RealPlayer(tm) G2 LiveConnect-Enabled Plug-In (32-bit)
RealPlayer(tm) HTML5VideoShim Plug-In (32-bit)
Sage 1.4.10
Shockwave Flash
Shockwave for Director
Show Just Image 2 2.5.2
Silverlight Plug-In 4.0.60129.0
startup.service 0.2
Status-4-Evar 2011.04.06.18
StumbleUpon 3.81
Stylish 1.1.2
Test Pilot 1.1
TortoiseSVN Menu 0.2.5
Ubiquity 0.6 [DISABLED]
URL Fixer 2.0.3
Userscripts Updater 
Windows Live™ Photo Gallery 15.4.3508.1109
WiseStamp 2.2.1
Yahoo Application State Plugin
Yoono 7.6.2
YouTube Enhancer 2010-12-18
Comment 1 Raul Malea 2011-04-16 10:59:15 PDT
I used master windows password. 
My graphics is:


Descrierea adaptorului NVIDIA GeForce 8600 GT 
ID furnizor 10de
ID dispozitiv 0402
RAM pentru adaptor 512
Drivere pentru adaptor  nvd3dum nvwgf2um,nvwgf2um
Versiune driver
Dată driver  2-23-2011
Direct2D activat true
DirectWrite activat  true (6.1.7601.17563, font cache n/a)
Motor de afișare WebGL  Google Inc. -- ANGLE -- OpenGL ES 2.0 (ANGLE
Accelerare GPU Windows 1/1 Direct3D 10
Comment 3 (mostly gone) XtC4UaLL [:xtc4uall] 2011-04-17 04:39:32 PDT
Frame 	Module 	Signature 	Source
0 	mozjs.dll 	js_GCThingIsMarked(void*,unsigned int) 	js/src/jsgc.cpp:523
1 	xul.dll 	GCGraphBuilder::NoteScriptChild(unsigned int,void*) 	xpcom/base/nsCycleCollector.cpp:1680
2 	xul.dll 	xul.dll@0xc65bdb 	
3 	xul.dll 	NoteChild 	obj-firefox/xpcom/build/nsCycleCollectionParticipant.cpp:46
4 	xul.dll 	nsJSArgArray::cycleCollection::Trace(void*,void (*)(unsigned int,void*,void*),void*) 	dom/base/nsJSEnvironment.cpp:4001
5 	xul.dll 	nsJSArgArray::cycleCollection::Traverse(void*,nsCycleCollectionTraversalCallback&) 	dom/base/nsJSEnvironment.cpp:3988
6 	xul.dll 	nsCycleCollector::MarkRoots(GCGraphBuilder&) 	xpcom/base/nsCycleCollector.cpp:1773
7 	xul.dll 	nsCycleCollector::BeginCollection(int,nsICycleCollectorListener*) 	xpcom/base/nsCycleCollector.cpp:2636
8 	xul.dll 	nsCycleCollectorRunner::Run() 	xpcom/base/nsCycleCollector.cpp:3316
9 	xul.dll 	nsThread::ProcessNextEvent(int,int*) 	xpcom/threads/nsThread.cpp:618
10 	xul.dll 	nsThreadStartupEvent::Run() 	xpcom/threads/nsThread.cpp:202
11 	nspr4.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:426
12 	nspr4.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:122
13 	mozcrt19.dll 	_callthreadstartex 	obj-firefox/memory/jemalloc/crtsrc/threadex.c:348
14 	mozcrt19.dll 	_threadstartex 	obj-firefox/memory/jemalloc/crtsrc/threadex.c:326
15 	kernel32.dll 	BaseThreadInitThunk 	
16 	ntdll.dll 	__RtlUserThreadStart 	
17 	ntdll.dll 	_RtlUserThreadStart

Can you pinpoint the Crashes to a certain Addon (Extension/Plugin)?
Comment 4 Andrew McCreight [:mccr8] 2011-04-18 09:23:04 PDT
Something is awry with js_GCThingIsMarked, either with the implementation or the way people are invoking it.  I have found crashes for pretty much every place it is invoked, except in nsXPConnect::Traverse (which may be a hint, as the call there is guarded by some auxiliary tests about the nature of the object being checked), in 4.0, 5.0 or 6.0.  The change I introduced in Bug 641910 seems to cause the crashes to be much more frequent, but I'm not sure if that is due to way it is invoked, or if the function is being called much more frequently with this change.
Comment 5 Andrew McCreight [:mccr8] 2011-04-18 10:04:19 PDT
On a brief run of a bunch of pages, 563822 out of 2313193 (24%) of invocations of js_GCThingIsMarked were in the cycle collector, so frequency of invocation doesn't seem to entirely explain why crashes are much more frequent in the code I added.
Comment 6 Andrew McCreight [:mccr8] 2011-04-18 10:30:54 PDT
These crashes are rare with 4.0, so if they are causing you trouble, you can switch back to the main release from Aurora or Nightly.
Comment 7 Andrew McCreight [:mccr8] 2011-04-18 17:13:52 PDT
Almost all of the crashes of this in 5.0a2 are caused by traversals of nsJSArgArray.  There are also a couple of traversals of nsJSScriptTimeoutHandler.  Hopefully this will provide some traction to figure out what is causing the bug.
Comment 8 Sheila Mooney 2011-04-19 10:08:20 PDT
This appears to be only on Aurora and not on the trunk.
Comment 9 Andrew McCreight [:mccr8] 2011-04-19 10:15:41 PDT
I'm not sure exactly how version numbers correspond to names, but it is showing up for me on Crash Stats in version 6.0a1 and 5.0a2.
Comment 10 Andrew McCreight [:mccr8] 2011-04-19 15:14:28 PDT
Created attachment 527125 [details] [diff] [review]
possible fix for this crash, by stopping two classes from adding JS strings to the cycle collector graph

After some discussion with Andreas and Gregor, I have a possible solution.  This is currently the number 5 crasher on Aurora.

nsJSScriptTimeoutHandler is adding a JSFlatString to the cycle collector graph, which can possibly be a statically allocated string, which would cause a crash when FF tries to examine the mark bits.  Similarly, Andreas said that nsJSArgArray can frequently contain strings, which will cause the same problem.  This patch removes the traversal of the field of the timeout handler that contains a string, and changes the guard for the array elements to prevent primitives from being added to the CC graph.  These two classes are responsible for 100% of the crashes with this signature that I have seen on Nightly and Aurora when digging through the crash reports.

I also added a JS_ASSERT to js_GCThingIsMarked to hopefully catch this in a slightly more controlled fashion.
Comment 11 Andrew McCreight [:mccr8] 2011-04-19 15:25:47 PDT
Also, independent of causing crashes, strings are acyclic and thus should never be added to the cycle collector graph.
Comment 12 christian 2011-04-19 15:48:11 PDT
It is currently the number 5 crasher for Aurora.  #24 for Nightly. Tracking...
Comment 13 Andrew McCreight [:mccr8] 2011-04-19 15:53:29 PDT
Created attachment 527138 [details] [diff] [review]
by stopping two classes from adding JS strings to the cycle collector graph (fix ASSERT debug build bustage)

I used a . instead of a -> in the assertion...
Comment 14 Andrew McCreight [:mccr8] 2011-04-19 16:56:11 PDT
The assertion I added is triggering on my try-server builds.  I'll look into what the problem is. (Ignore the debug build failures...)
Comment 15 Nicholas Nethercote [:njn] 2011-04-19 20:01:52 PDT
(In reply to comment #4)
> Something is awry :)
Comment 16 Peter Van der Beken [:peterv] 2011-04-20 00:49:28 PDT
Was js_GCThingIsMarked changed? It seems weird that an API with that name can't handle all GCThings.
Comment 17 Peter Van der Beken [:peterv] 2011-04-20 01:20:59 PDT
XPCTraceableVariant might have the same problem?
Comment 18 Andreas Gal :gal 2011-04-20 01:39:02 PDT
Yeah we might want to rename that.
Comment 19 Peter Van der Beken [:peterv] 2011-04-20 02:25:47 PDT
(In reply to comment #17)
> XPCTraceableVariant might have the same problem?

and IDBCursor adnd IDBRequest?
Comment 20 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-04-20 08:39:34 PDT
Can't we just fix js_GCThingIsMarked?
Comment 21 Andrew McCreight [:mccr8] 2011-04-20 08:56:11 PDT
In my current patch, it seems like the assertion basically fires off immediately when the CC starts up.  I'll need to figure out what kind of object is causing the problem.  Hopefully this doesn't mean that the change to nsJSArgArray isn't also wrong.  In addition to every debug run failing, there were a number of failures from timeouts on the opt builds.  I'm not sure what the deal is with that.

js_GCThingIsMarked was changed by Gregor's compartment GC patch, but the previous version also looked up in a mark bit map, so I'm not sure how it would have worked before that either.  My guess is that my patch changed things so the entire CC graph is pushed into the function, whereas before whatever corner of the CC graph has these two types wasn't getting examined in there.

It does seem a little sketchy to require a delicate globally enforced invariant like this to prevent crashes.  Judging by Aurora data, it seems okay for now, but there's nothing to prevent somebody from adding a JS string to the cycle collector.  If the guard is too weak, there are crashes.  If the guard is too strong, there are leaks.
Comment 22 Gregor Wagner [:gwagner] 2011-04-20 09:25:40 PDT
I looked up the old version and I didn't do any changes in the behavior. We also went straight into the bitmap and tested if the bit corresponding to the address is set.

The definition of gcthings has changed. We started with static strings that are no longer on the heap about 2 years ago and recently we added shapes to the heap. Calling js_GCThingIsMarked with a static string will result in a crash right now.

We can catch static strings and always return true for the optimized builds (so we don't crash) and we can also add an assertion that triggers for static strings in debug builds.

If static strings are not the reason we could use the logic from the conservative stack scanner to see if the actual pointer we feed into the function points to a real gcthing.
Comment 23 Andrew McCreight [:mccr8] 2011-04-20 09:36:46 PDT
The assertion I added seems to be firing off on a JS number.  I'm basing this guess  on the fact that thing-> == "Number", for the thing that is causing the assertion.  They do seem to be primitives, judging by the use of "setPrimitiveThis", which also applies to booleans.  Next, I'm going to look at what is adding a JS number to the CC graph.  It is coming in via js::gc::Mark<JSObject>.
Comment 24 Andrew McCreight [:mccr8] 2011-04-20 10:55:17 PDT
So, this particular number is being reached via MarkObjectSlot, from a ChromeWindow.  It seems like that would be a fairly common thing.

Bill McCloskey said that the right way to do the assert is to check to see if it is a static string, because everything else should be okay.  I changed the assertion to this:
...and it at least lets me browse around a bit without failing.  I'll run Mochitest on my machine and see how that goes.

We could also check for this, and return something without looking at the mark bit.  I'm not really sure what the right value to return would be, though.

It might be nice to eventually have a CC-specific variant that will ignore primitive objects, that the GC does care about.
Comment 25 Andrew McCreight [:mccr8] 2011-04-20 17:08:37 PDT
With this patch in place, Mochitest 1-5 and Crashtest orange out on Linux debug builds, due to a crash.  I can confirm that on my own Mac, OSX64 debug builds for at least Mochitest 2 consistently crash on the exact same test, so this probably isn't just platform wonkiness or transient errors.

The Linux stack traces all look like more or less like this (with registers and "Found by" deleted):

Crash reason:  SIGSEGV
Crash address: 0xdadadada

Thread 0 (crashed)
 0!nsCharTraits<short unsigned int>::length [nsCharTraits.h : 384 + 0x3]
 1!nsDependentString::nsDependentString [nsTDependentString.h : 89 + 0xa]
 2!nsGlobalWindow::RunTimeout [nsGlobalWindow.cpp : 9096 + 0xba]
 3!nsGlobalWindow::TimerCallback [nsGlobalWindow.cpp : 9456 + 0x1e]
 4!nsTimerImpl::Fire [nsTimerImpl.cpp : 424 + 0x14]
 5!nsTimerEvent::Run [nsTimerImpl.cpp : 516 + 0x12]

From Googling, it looks like 0xdadadada is written to JS objects when they are freed.

On my own mac, I was able to stop the Mochitest 2 crash by backing out the change to the nsJSScriptTimeoutHandler, adding this line back:

mExpr is a JS string, in a timeout handler, so combining the above facts together my theory is that somehow making the cycle collector not traverse mExpr is causing the JS GC to free mExpr, which is later causing the crash when something tries to access it.  I have no idea how that would happen, as removing objects from the cycle collector graph should only cause leaks, not make objects get freed too early.
Comment 26 Peter Van der Beken [:peterv] 2011-04-20 23:55:59 PDT
(In reply to comment #25)
> I have no idea how that would happen, as
> removing objects from the cycle collector graph should only cause leaks, not
> make objects get freed too early.

That's not true for JS objects. Who's responsible for rooting mExpr after your change? AFAICT nothing, so it's kinda normal that it becomes garbage after the next GC. You probably need to add explicit JS_Add*Root/JS_Remove*Root calls.
Comment 27 Andreas Gal :gal 2011-04-21 00:07:26 PDT
Ouch. Good point. We could just change the macro to trace but not add to the CC if its not an object?
Comment 28 Andrew McCreight [:mccr8] 2011-04-21 08:46:59 PDT
One of these crashes that shows up very rarely in 4.0 showed up in 5.0:

The call stack looks like 

It doesn't look like this could be caused by a string (because the thing is the result of GetFlatJSObjectPreserveColor(), so hopefully it can't be a string), but as I said this is a very rare crash in 4.0 (ranked < 300 I think) so probably not worth worrying about too much.
Comment 29 Andrew McCreight [:mccr8] 2011-04-21 11:57:35 PDT
Created attachment 527607 [details] [diff] [review]
don't examine mark bits on static strings

What peterv is saying is that the Trace function is used by both the CC and the GC, so we can't skip children there.  This patch instead fixes js_GCThingIsMarked so it returns false for static strings.  This should fix the function for all uses, and makes it so the name should be accurate.  Seems a little ugly, but this seems similar to how the rest of the JS GC deals with this problem.  Static strings don't have any marking bits associated with them, so they are never marked any color.  Fanciness like pruning primitives from the CC can be done in a later patch.  I also removed a completely bogus cast.

I'm trying this out on the try server, for a Linux debug build.  A random chunk of mochitest I tried was okay, where it wasn't before.
Comment 30 Andrew McCreight [:mccr8] 2011-04-21 14:05:41 PDT
Linux debug passed all tests on the try server.
Comment 31 Andrew McCreight [:mccr8] 2011-04-22 09:11:28 PDT
I ran this patch on the try server, and it passed everything on the try server across all platforms, except for what look like 3 known oranges, plus another orange on a test that seems to have trouble around midnight, and was run around midnight.  Linux64 debug had some kind of immediate infrastructure failure, so I guess I should retry that for completeness.
Comment 32 Andrew McCreight [:mccr8] 2011-04-26 14:34:53 PDT
An alternate form of this patch would leave out the changes to nsCycleCollector.cpp, which I think just removes a no-op that is incorrect (the value goes from void* -> JSObject* -> void* so nothing should happen).  I've pushed that to try make sure that is stable.  I could then find a JS GC person to review it if Andreas is too busy.
Comment 33 Marcia Knous [:marcia - use ni] 2011-04-26 14:59:16 PDT
*** Bug 652965 has been marked as a duplicate of this bug. ***
Comment 34 Alex Vincent [:WeirdAl] 2011-04-27 15:13:32 PDT
I see there's currently no testcase for this bug yet, and I have a theory derived from comment 7 and comment 23.

window.setTimeout returns a number - or at least, that's how JS variables store it.  That number is how users can cancel the timeout, by calling window.clearTimeout later...

Could a setTimeout stress test capture this bug?
Comment 35 Andrew McCreight [:mccr8] 2011-04-27 15:25:32 PDT
Comment 23 was about a patch I proposed that was wrong, so it shouldn't have anything to do with the original problem.

To trigger the problem according to my theory, the cycle collector has to attempt to add a JSStaticAtom to the CC graph.  I think these are JS strings that are very short (like "a"), and the runtime just statically allocates them.  The field mExpr of the timeout handler and the contents of JSArgArrays seem to be where this seems to happen.  Once that's been built, it needs to be reachable from the DOM via XPConnect, but not from any Javascript roots.  Then you have to leave things like that until a cycle collection happens.  I don't know the best way to go about getting that to happen.
Comment 36 Nicholas Nethercote [:njn] 2011-04-27 16:20:28 PDT
(In reply to comment #35)
> I think these are JS strings
> that are very short (like "a"), and the runtime just statically allocates them.

That's right, as an optimization the JS engine statically allocates a bunch of common 1, 2 and 3 char strings.  For details, look for "unit string", "length-2 strings" and "integer strings" in js/src/jsstr.cpp.
Comment 37 Andreas Gal :gal 2011-04-28 17:53:39 PDT
Comment on attachment 527607 [details] [diff] [review]
don't examine mark bits on static strings

This might slow us down, so lets do some profiling after this and see whether it bites (I think the rest of the GC heap walk code is shitty enough that this is actually not going to make a difference for now).
Comment 38 Andrew McCreight [:mccr8] 2011-04-29 07:58:54 PDT
It looks like every other use of this function is guarded in a way that keeps strings from hitting it, so that's the implicit invariant that kept it from breaking before.
Comment 39 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-04-29 10:02:07 PDT
Maybe if we remove those other guards then the addition of this check won't really change perf then?
Comment 40 Andrew McCreight [:mccr8] 2011-04-29 10:48:13 PDT
I don't think they can be easily removed, as they do things like check whether a GCthing is a string or an object, and do different things based on that.
For instance, in one place the guard is GetGCThingTraceKind, which calls JSAtom::isStatic, so you'd have to lift the call to xpc_IsGrayGCThing above the call to GetGCThingTraceKind, then have a variant of GetGCThingTraceKind that doesn't call JSAtom::isStatic.  In NoteJSChild, the determination of the kind of the object is done outside of the function entirely.

The easiest way to avoid perf problems, is to leave js_GCThingIsMarked in its current fragile state and change nsCycleCollector.cpp to something like:

if (!JSAtom::isStatic(child) && !xpc_IsGrayGCThing(child) && !WantAllTraces())

Along these same lines, I could check the GCThingKind and filter out strings.  This might filter out some more things from the CC graph, but would make the test more expensive.

As I said, these solutions leave js_GCThingIsMarked in its current fragile state, but renaming it (xpc_IsGrayGCThingThatIsNotAStaticString or xpc_IsGrayGCThingThatIsActuallyAGCThing?) and adding some assertions might help a bit.

The real underlying problem here is that statically allocated JS strings are subtypes of js::gc::Cell, but don't actually behave like Cells.  Maybe this could be fixed a bit by statically allocating some mark bits for them, but I think that cases where the mark bits are written to would still have to be guarded.
Comment 41 Andrew McCreight [:mccr8] 2011-04-29 11:54:02 PDT
The test would actually be something like:
    if (langID == nsIProgrammingLanguage::JAVASCRIPT &&
        (JSAtom::isStatic(child) || !xpc_IsGrayGCThing(child)) &&
Or possibly "js::gc::GetGCThingTraceKind(child) == JSTRACE_STRING" in place of the isStatic declaration, but I get a warning:
js::gc::GetGCThingTraceKind(child) == JSTRACE_STRING
Comment 42 Andrew McCreight [:mccr8] 2011-04-29 12:00:19 PDT
Sorry, I clicked Save Changes by mistake.  I get a warning "../../dist/include/jsstr.h:630: warning: inline function ‘static bool JSAtom::isStatic(const void*)’ used but never defined".  But it looks like I can call js_GetGCThingTraceKind(child), so I'll go with that.
Comment 43 Andrew McCreight [:mccr8] 2011-04-29 12:32:53 PDT
Created attachment 529175 [details] [diff] [review]
avoid passing static strings by testing for strings in NoteScriptChild

This should fix the static string problem without altering js_GCThingIsMarked.  Getting the trace kind is a little more heavyweight than we strictly need for the crash, but I'm not sure how to get at the IsStatic function outside of the JS directory.  I also added an assertion to js_GCThingIsMarked.  I'll see how it does on try, applied to the latest Aurora.
Comment 44 Andrew McCreight [:mccr8] 2011-04-29 12:36:27 PDT
I'm also taking suggestions for what is the right way to format that monster test.
Comment 45 Brendan Eich [:brendan] 2011-04-29 14:28:57 PDT
Comment on attachment 527607 [details] [diff] [review]
don't examine mark bits on static strings

The other approach is to JS_ASSERT(!JSAtom::isStatic(thing)) and rely on DEBUG build coverage.

In this patch, shouldn't the new code return true, not false, i.e. impute liveness to static strings?

Comment 46 Brendan Eich [:brendan] 2011-04-29 14:52:19 PDT
Comment on attachment 529175 [details] [diff] [review]
avoid passing static strings by testing for strings in NoteScriptChild

> js_GCThingIsMarked(void *thing, uintN color = BLACK)
> {
>     JS_ASSERT(thing);
>     AssertValidColor(thing, color);
>+    JS_ASSERT(!JSAtom::isStatic(thing));

Ok, that is more like it ;-).

>     // skip over non-grey JS children
>-    if (langID == nsIProgrammingLanguage::JAVASCRIPT) {
>-        JSObject *obj = static_cast<JSObject*>(child);
>-        if (!xpc_IsGrayGCThing(obj) && !WantAllTraces())
>-            return;
>-    }
>+    if (langID == nsIProgrammingLanguage::JAVASCRIPT &&
>+        (js_GetGCThingTraceKind(child) == JSTRACE_STRING || !xpc_IsGrayGCThing(child)) &&
>+        !WantAllTraces())
>+        return;

Nits: brace consequent even if one line when if condition and/or else clause require more than one line. (We also brace both then and else if either requires braces.)

I think the old if-if style was better since the langID condition could become part of a switch or if-else-if chain (in theory -- unlikely in reality), so it's more future proof and conceptually nicer to avoid the A&&(B||C)&&D complexity.

Did you want a review on this patch? If so, count this as my r+.

Comment 47 Andrew McCreight [:mccr8] 2011-04-29 15:12:39 PDT
Given the trail of obsoleted r+s from Andreas on this bug, I was going to wait for a successful try run before soliciting another review, but thanks, I was going to ask for a review eventually.

I could imagine there being other JS-specific optimizations that could be done in the future, but I'll just wait to split it up until that happens.
Comment 48 Brendan Eich [:brendan] 2011-04-29 18:37:07 PDT
js_ is not public API, so can assert instead of testing. If we must have a JS_IsGCThingMarked (note my reordering to put Is at the front), it can test.

Comment 49 Andrew McCreight [:mccr8] 2011-04-29 19:06:27 PDT
That's good to know.  As things currently stand, the only caller of js_GCThingIsMarked is xpc_IsGrayGCThing (which is just a wrapper that passes a constant for the second argument), which is in xpcpublic.h, so I guess a test would go there if we were to add one.
Comment 50 Brendan Eich [:brendan] 2011-04-29 19:21:27 PDT
I should add that while we do add tests at public API entry points that would be asserts in internal underpinnings, we also do not test a lot at API entry points. You have to be "this tall" to ride the JS API roller coaster. There are judgment calls where we can test in public and assert underneath, such as this present case.

Comment 51 Andrew McCreight [:mccr8] 2011-04-29 19:22:56 PDT
Hmm, since js_* isn't public, I guess that means I shouldn't call js_GetGCThingTraceKind inside of xpcom?  Which would explain why the Windows build of my patch is busted with this error:

nsCycleCollector.obj : error LNK2001: unresolved external symbol
"__declspec(dllimport) unsigned long __cdecl js_GetGCThingTraceKind(void *)"

I'll look around for some kind of public way to distinguish JS strings from JS objects and JS shapes or whatnot.  I appear to have added the only use of xpc_IsGrayGCThing outside of the js directory.
Comment 52 Brendan Eich [:brendan] 2011-04-29 19:26:58 PDT
Hmm, jsgc.h in my tm tip is JS_FRIEND_API, which is enough to export the entry point from the DLL/DSO, or should be.

Adding a public API near the other JS_*GCThing* APIs would be ok, and could test.

Comment 53 Andrew McCreight [:mccr8] 2011-05-02 16:38:07 PDT
The problem is probably that js_GetGCThingTraceKind is declared JS_FRIEND_API in jsgc.h, but not jsgc.cpp.
Comment 54 Andrew McCreight [:mccr8] 2011-05-02 17:57:15 PDT
Adding a JS_FRIEND_API annotation to jsgc.cpp didn't work, so I'll look into adding to the public API.
Comment 55 Brendan Eich [:brendan] 2011-05-02 18:05:14 PDT
The .h file JS_{FRIEND,PUBLIC}_API decorator should work, it applies to the defn in the .cpp file too.

Something's up, and if there's any justice, JS_PUBLIC_API should fail likewise. Don't tell me we have to update a .def file like on Win3.1?

Comment 56 Andrew McCreight [:mccr8] 2011-05-03 16:54:28 PDT
Using dumpbin /EXPORTS, I found that mozjs.dll is exporting js_GetGCThingTraceKind with a return type of "unsigned int", but nsCycleCollector was expecting "unsigned long".  In fact, it appears that on Windows (at least on my Vista computer) that uint32 is defined as "unsigned int" in js/jsgc.cpp and as "unsigned long" in xpcom/base/nsCycleCollector.cpp, and that on Windows they are different types.  So I'll add a public API call for it, which Andreas says should help.
Comment 57 Andrew McCreight [:mccr8] 2011-05-04 09:51:40 PDT
Created attachment 530048 [details]
add safe function to check JS object gray bits

Here's an alternate approach to this problem.  Instead of having to export this whole new language about checking JStraceKinds, I added a new function xpc_GCThingIsGrayCCThing to xpcpublic.h that takes a GCthing, and only checks if it is gray if the object is a JS object or an XML object.  One thing I don't really like about this is that I inlined the macro ADD_TO_CC from nsXPConnect here, because it doesn't seem like that's a concept we need to export from xpcpublic.  Maybe that should be factored differently.

The existing calls to xpc_IsGrayGCThing remain as they are.  They are all within the /js/src/xpconnect/src directory, so maybe xpc_IsGrayGCThing could be moved from xpcpublic to somewhere else.  It might also be nice to get rid of xpc_IsGrayGCThing and replace the existing uses with specialized versions (in a separate patch):
1. a version that takes a JSObject* instead of a void*.  This covers the places where the GCthing is known to be a JSObject, in xpc_UnmarkGrayObject and XPCJSRuntime::SuspectWrappedNative.

2. a version that takes the trace kind of the object, and asserts that the kind is ADD_TO_CC, but does not actually check it in non-debug builds.  This covers the uses in NoteJSChild, nsXPConnect::Traverse and UnmarkGrayChildren, where the trace kind of the GCThing is externally supplied and tested.

Along with xpc_GCThingIsGrayCCThing, this covers every use of the ill-named xpc_IsGrayGCThing.  This change would document the existing invariants that make invoking xpc_IsGrayGCThing okay, without incurring any additional overhead.  I don't know if it would make sense to export all three invariants these via xpcpublic.h or not.  As I said, except for the use I added in the cycle collector, all of the existing uses are in the xpconnect directory.  Anyway, just a thought I had.
Comment 58 Andrew McCreight [:mccr8] 2011-05-04 15:31:31 PDT
Created attachment 530177 [details] [diff] [review]
add safe function to check JS object gray bits

forgot to check patch for the previous upload of this
Comment 59 Andreas Gal :gal 2011-05-04 15:51:07 PDT
Comment on attachment 530177 [details] [diff] [review]
add safe function to check JS object gray bits

Review of attachment 530177 [details] [diff] [review]:

::: js/src/xpconnect/src/xpcpublic.h
@@ +150,1 @@
 inline JSBool

This needs an explanation. JSStaticAtoms are not GC things, they are allocated statically in the data segment. Thats why it crashes. Explain that.

@@ +161,5 @@
+xpc_GCThingIsGrayCCThing(void *thing)
+    uint32 kind = js_GetGCThingTraceKind(thing);
+    return (kind == JSTRACE_OBJECT || kind == JSTRACE_XML) &&
+        xpc_IsGrayGCThing(thing);

Indent xpc_ with the (, or even better does it fit on one line?

::: xpcom/base/nsCycleCollector.cpp
@@ +1677,5 @@
     // skip over non-grey JS children
+    if (langID == nsIProgrammingLanguage::JAVASCRIPT &&
+        !xpc_GCThingIsGrayCCThing(child) &&
+        !WantAllTraces())
+        return;

Multi-line conditions want a { } for clarity.
Comment 60 Andrew McCreight [:mccr8] 2011-05-04 16:25:39 PDT
Created attachment 530191 [details] [diff] [review]
add safe function to check JS object gray bits (addressed gal's review points)

I addressed all of Andreas's comments.  I'm going to push this to try.
Comment 61 Andrew McCreight [:mccr8] 2011-05-04 19:27:03 PDT
Well, due to the fact that xpc_GCThingIsGrayCCThing is inlined, causing it to have a call to js_GCThingTraceKind in nsCycleCollector, this hits the exact same build problem with unsigned int vs unsigned long.  I guess I'll make it not inlined and see if that fixes it.
Comment 62 Andrew McCreight [:mccr8] 2011-05-05 08:31:22 PDT
Created attachment 530323 [details] [diff] [review]
add a safe function to check grey bits, call it in the cycle collector

Basically the same as the last patch, except that instead of inlining the function, I move it into nsXPConnect.cpp to avoid build problems on Windows.

I did a try run on this over night, built against Mozilla Central, and it is entirely green except for the two known oranges on Linux{64} opt Moth, the usual JP Win debug reds, and some funkiness in Android opt that looks about the same as mobile, and doesn't show up on the M-C tinderbox anyways.
Comment 63 Blake Kaplan (:mrbkap) 2011-05-05 15:17:10 PDT
Comment 64 Andrew McCreight [:mccr8] 2011-05-08 12:54:36 PDT
*** Bug 649579 has been marked as a duplicate of this bug. ***
Comment 65 Paul Biggar 2011-05-09 07:11:08 PDT
*** Bug 647467 has been marked as a duplicate of this bug. ***
Comment 66 Andrew McCreight [:mccr8] 2011-05-09 13:31:10 PDT
pbiggar reported in another bug that this patch fixed at least one apparent instance of this crash, which is heartening.

My patch has been on tracemonkey for a few days without any apparent issues.  I'm going to do a try run against moz-central, and then find somebody to land it there if there's no Tracemonkey-Mozilla Central merge in the next day or two.  Once that is done, it should stop showing up in crash-reports for Nightly.
Comment 67 Chris Leary [:cdleary] (not checking bugmail) 2011-05-10 15:12:05 PDT
cdleary-bot mozilla-central merge info:
Comment 68 Sheila Mooney 2011-05-11 10:34:05 PDT
Did this get checked into Aurora as well?
Comment 69 Andrew McCreight [:mccr8] 2011-05-11 10:34:40 PDT
Comment 70 Andrew McCreight [:mccr8] 2011-05-12 09:58:43 PDT
Comment on attachment 530323 [details] [diff] [review]
add a safe function to check grey bits, call it in the cycle collector

This landed a day or two ago on Moz Central and there haven't been any instances of this crash on nightly builds since then.  Of course, there haven't been any crashes on Aurora builds since then, either.  The patch has also been on Tracemonkey for a week without any apparent problems.
Comment 71 JP Rosevear [:jpr] 2011-05-12 14:45:24 PDT
Please land for Aurora by Monday May 16 or the approval will potentially be lost.  Please mark as status-firefox5 fixed when you do.
Comment 72 Andrew McCreight [:mccr8] 2011-05-12 14:46:46 PDT
sicking said he can land it for me when he gets approval for another patch of his.
Comment 73 Jonas Sicking (:sicking) No longer reading bugmail consistently 2011-05-13 15:40:09 PDT
checked in

Thanks for providing a |hg export| patch! Makes checking in so much easier.
Comment 74 Raul Malea 2011-05-14 16:13:54 PDT
Very good jobs! Thanks!

(bug spam :P)

Note You need to log in before you can comment on or make changes to this bug.