Closed Bug 650519 Opened 14 years ago Closed 14 years ago

Firefox crash [@ js_GCThingIsMarked(void*, unsigned int) ]

Categories

(Core :: JavaScript Engine, defect)

5 Branch
x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla6
Tracking Status
firefox5 + fixed

People

(Reporter: raul.malea, Assigned: mccr8)

References

Details

(Keywords: crash, crashreportid, regression, Whiteboard: [Aurora][Fx5] fixed-in-tracemonkey)

Crash Data

Attachments

(1 file, 7 obsolete files)

Aurora ln10 (romanian ) build 16.04.2011 and Nighlty ln10(romanian) build 15.04.2011 crashed when i try opened my default profile (fx4). In new profile this builds opened and run normally.


Crashed reports (both Aurora and Nightly):

https://crash-stats.mozilla.com/report/index/bp-59e6c021-7526-4ca9-8e46-504342110416

https://crash-stats.mozilla.com/report/index/bp-bc4e29d7-101c-4895-900a-a63132110416

https://crash-stats.mozilla.com/report/index/bp-2c3aec9f-1a55-4420-bf1c-a9d7a2110416

https://crash-stats.mozilla.com/report/index/bp-31cd77a2-950b-498d-b0fe-1561e2110416

https://crash-stats.mozilla.com/report/index/bp-755d797c-509d-4686-b8b3-031142110416

App tabs opened: Gmail, facebook, Bugzilla, mozillazine, babelzilla, userbase.kde.org, narro, crowdin.net, my wordpress controlpanel


My addons list :

Abduction! 3.0.10
About startup 0.1.5
about:me 0.5
about:response 0.1
Add-on Collector 2.0.1
Add-on Compatibility Reporter 0.8.3
AddonFox 1.1.7
Adobe Acrobat 10.0.1.434
Adobe Contribute Toolbar 6.0 [DISABLED]
Blog Comment Autofill 
British English Dictionary 1.19.1
Bugzilla Tweaks 1.9
ChatZilla 0.9.86.1
Converter 1.0.9
Crash Report Helper (Asistent raportare incidente) 1.2
Default 4.0 [DISABLED]
Dicționar românesc de corectare ortografică. 1.11
Download Statusbar 0.9.8
DownloadHelper 4.8.6
DownThemAll! 2.0.2
F1 by Mozilla Labs 0.8.3
FFixer 2.3.1
FileList V3 
fireblur 1260925626
Firefox 4 Menu Button with icon+(Tabs in Titlebar) 
Firefox B 1260925626 [DISABLED]
Flagfox 4.1.1
FlashGames Maximizer 3.1.3.86 [DISABLED]
FlashGot 1.2.9.2
FoxTab 1.4.2
German Dictionary 2.0.2
Google Earth Plugin 1.0.0.1
Google Shortcuts 2.1.6
Google Talk Plugin 1.9.2.0
Google Talk Plugin Video Accelerator 0.1.43.5
Google Update 1.2.183.39
Greasefire 1.0.4 [DISABLED]
Greasemonkey 0.9.2
Hard blockers counter 1.0.1 [DISABLED]
IE Tab 2 (FF 3.6+) 2.12.21.1
IE Tab Plug-in 2.2.0.1
ImTranslator 4.01
Java Deployment Toolkit 6.0.240.7 6.0.240.7
Java(TM) Platform SE 6 U24 6.0.240.7
Massive Extender 1.0b5
Microsoft Office 2010 14.0.4730.1010
Microsoft Office 2010 14.0.4761.1000
Minefield - Better Add-on Bar/Status Bar  [DISABLED]
Movie Torrent Search Linker (for IMDB, Rotten Tomatoes and Yahoo Movies) 0.3.7.9
Mozilla Labs: Prospector - Home Dash 10 [DISABLED]
Mozilla QA Companion 1.2.3
MozMill Crowd 0.1.2
Nightly Tester Tools 3.1.5
NoScript 2.1.0.2
NVIDIA 3D Vision 7.17.12.6099
NVIDIA 3D VISION 7.17.12.6099
Personas 1.6.2
Picasa 3.0.0.0
Power Twitter 1.60
Prism for Firefox 1.0b4 [DISABLED]
Puppy Dogs... 1291695684 [DISABLED]
QuickTime Plug-in 7.6.9 7.6.9.0
Read It Later 2.1.1
RealJukebox NS Plugin 12.0.1.633
RealPlayer Version Plugin 12.0.1.633
RealPlayer(tm) G2 LiveConnect-Enabled Plug-In (32-bit)  12.0.1.633
RealPlayer(tm) HTML5VideoShim Plug-In (32-bit)  12.0.1.633
ReminderFox 1.9.9.3.1
Sage 1.4.10
Shockwave Flash 10.2.153.1
Shockwave for Director 11.5.9.620
Show Just Image 2 2.5.2
Silverlight Plug-In 4.0.60129.0
startup.service 0.2
Status-4-Evar 2011.04.06.18
StumbleUpon 3.81
Stylish 1.1.2
Test Pilot 1.1
TortoiseSVN Menu 0.2.5
Ubiquity 0.6 [DISABLED]
URL Fixer 2.0.3
Userscripts Updater 
Windows Live Photo Gallery 15.4.3508.1109
WiseStamp 2.2.1
Yahoo Application State Plugin 1.0.0.7
Yoono 7.6.2
YouTube Enhancer 2010-12-18
I used master windows password. 
My graphics is:

Grafică

Descrierea adaptorului NVIDIA GeForce 8600 GT 
ID furnizor 10de
ID dispozitiv 0402
RAM pentru adaptor 512
Drivere pentru adaptor  nvd3dum nvwgf2um,nvwgf2um
Versiune driver  8.17.12.6724
Dată driver  2-23-2011
Direct2D activat true
DirectWrite activat  true (6.1.7601.17563, font cache n/a)
Motor de afișare WebGL  Google Inc. -- ANGLE -- OpenGL ES 2.0 (ANGLE 0.0.0.541)
Accelerare GPU Windows 1/1 Direct3D 10
Whiteboard: [Aurora][Fx5]
Frame 	Module 	Signature 	Source
0 	mozjs.dll 	js_GCThingIsMarked(void*,unsigned int) 	js/src/jsgc.cpp:523
1 	xul.dll 	GCGraphBuilder::NoteScriptChild(unsigned int,void*) 	xpcom/base/nsCycleCollector.cpp:1680
2 	xul.dll 	xul.dll@0xc65bdb 	
3 	xul.dll 	NoteChild 	obj-firefox/xpcom/build/nsCycleCollectionParticipant.cpp:46
4 	xul.dll 	nsJSArgArray::cycleCollection::Trace(void*,void (*)(unsigned int,void*,void*),void*) 	dom/base/nsJSEnvironment.cpp:4001
5 	xul.dll 	nsJSArgArray::cycleCollection::Traverse(void*,nsCycleCollectionTraversalCallback&) 	dom/base/nsJSEnvironment.cpp:3988
6 	xul.dll 	nsCycleCollector::MarkRoots(GCGraphBuilder&) 	xpcom/base/nsCycleCollector.cpp:1773
7 	xul.dll 	nsCycleCollector::BeginCollection(int,nsICycleCollectorListener*) 	xpcom/base/nsCycleCollector.cpp:2636
8 	xul.dll 	nsCycleCollectorRunner::Run() 	xpcom/base/nsCycleCollector.cpp:3316
9 	xul.dll 	nsThread::ProcessNextEvent(int,int*) 	xpcom/threads/nsThread.cpp:618
10 	xul.dll 	nsThreadStartupEvent::Run() 	xpcom/threads/nsThread.cpp:202
11 	nspr4.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:426
12 	nspr4.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:122
13 	mozcrt19.dll 	_callthreadstartex 	obj-firefox/memory/jemalloc/crtsrc/threadex.c:348
14 	mozcrt19.dll 	_threadstartex 	obj-firefox/memory/jemalloc/crtsrc/threadex.c:326
15 	kernel32.dll 	BaseThreadInitThunk 	
16 	ntdll.dll 	__RtlUserThreadStart 	
17 	ntdll.dll 	_RtlUserThreadStart

Can you pinpoint the Crashes to a certain Addon (Extension/Plugin)?
Severity: blocker → critical
Summary: Aurora and Nighlty crashed when default profile (Fx4) is opened. → Aurora and Nighlty crashed when default profile (Fx4) is opened [@ js_GCThingIsMarked(void*, unsigned int) ]
Assignee: nobody → general
Component: General → JavaScript Engine
Product: Firefox → Core
QA Contact: general → general
Something is awry with js_GCThingIsMarked, either with the implementation or the way people are invoking it.  I have found crashes for pretty much every place it is invoked, except in nsXPConnect::Traverse (which may be a hint, as the call there is guarded by some auxiliary tests about the nature of the object being checked), in 4.0, 5.0 or 6.0.  The change I introduced in Bug 641910 seems to cause the crashes to be much more frequent, but I'm not sure if that is due to way it is invoked, or if the function is being called much more frequently with this change.
Blocks: 641910
Summary: Aurora and Nighlty crashed when default profile (Fx4) is opened [@ js_GCThingIsMarked(void*, unsigned int) ] → js_GCThingIsMarked(void*, unsigned int) crashes
On a brief run of a bunch of pages, 563822 out of 2313193 (24%) of invocations of js_GCThingIsMarked were in the cycle collector, so frequency of invocation doesn't seem to entirely explain why crashes are much more frequent in the code I added.
blocking2.0: --- → ?
These crashes are rare with 4.0, so if they are causing you trouble, you can switch back to the main release from Aurora or Nightly.
Almost all of the crashes of this in 5.0a2 are caused by traversals of nsJSArgArray.  There are also a couple of traversals of nsJSScriptTimeoutHandler.  Hopefully this will provide some traction to figure out what is causing the bug.
This appears to be only on Aurora and not on the trunk.
I'm not sure exactly how version numbers correspond to names, but it is showing up for me on Crash Stats in version 6.0a1 and 5.0a2.
After some discussion with Andreas and Gregor, I have a possible solution.  This is currently the number 5 crasher on Aurora.

nsJSScriptTimeoutHandler is adding a JSFlatString to the cycle collector graph, which can possibly be a statically allocated string, which would cause a crash when FF tries to examine the mark bits.  Similarly, Andreas said that nsJSArgArray can frequently contain strings, which will cause the same problem.  This patch removes the traversal of the field of the timeout handler that contains a string, and changes the guard for the array elements to prevent primitives from being added to the CC graph.  These two classes are responsible for 100% of the crashes with this signature that I have seen on Nightly and Aurora when digging through the crash reports.

I also added a JS_ASSERT to js_GCThingIsMarked to hopefully catch this in a slightly more controlled fashion.
Attachment #527125 - Flags: review?(gal)
Assignee: general → continuation
Also, independent of causing crashes, strings are acyclic and thus should never be added to the cycle collector graph.
blocking2.0: ? → ---
It is currently the number 5 crasher for Aurora.  #24 for Nightly. Tracking...
I used a . instead of a -> in the assertion...
Attachment #527125 - Attachment is obsolete: true
Attachment #527125 - Flags: review?(gal)
Attachment #527138 - Flags: review?(gal)
Attachment #527138 - Flags: review?(gal) → review+
The assertion I added is triggering on my try-server builds.  I'll look into what the problem is. (Ignore the debug build failures...)
http://tbpl.mozilla.org/?tree=MozillaTry&rev=78649e59b57b
http://tbpl.mozilla.org/?tree=MozillaTry&rev=c271271d877b
(In reply to comment #4)
> Something is awry

arewereliableyet.com? :)
Was js_GCThingIsMarked changed? It seems weird that an API with that name can't handle all GCThings.
XPCTraceableVariant might have the same problem?
Yeah we might want to rename that.
(In reply to comment #17)
> XPCTraceableVariant might have the same problem?

and IDBCursor adnd IDBRequest?
Can't we just fix js_GCThingIsMarked?
In my current patch, it seems like the assertion basically fires off immediately when the CC starts up.  I'll need to figure out what kind of object is causing the problem.  Hopefully this doesn't mean that the change to nsJSArgArray isn't also wrong.  In addition to every debug run failing, there were a number of failures from timeouts on the opt builds.  I'm not sure what the deal is with that.

js_GCThingIsMarked was changed by Gregor's compartment GC patch, but the previous version also looked up in a mark bit map, so I'm not sure how it would have worked before that either.  My guess is that my patch changed things so the entire CC graph is pushed into the function, whereas before whatever corner of the CC graph has these two types wasn't getting examined in there.

It does seem a little sketchy to require a delicate globally enforced invariant like this to prevent crashes.  Judging by Aurora data, it seems okay for now, but there's nothing to prevent somebody from adding a JS string to the cycle collector.  If the guard is too weak, there are crashes.  If the guard is too strong, there are leaks.
I looked up the old version and I didn't do any changes in the behavior. We also went straight into the bitmap and tested if the bit corresponding to the address is set.

The definition of gcthings has changed. We started with static strings that are no longer on the heap about 2 years ago and recently we added shapes to the heap. Calling js_GCThingIsMarked with a static string will result in a crash right now.

We can catch static strings and always return true for the optimized builds (so we don't crash) and we can also add an assertion that triggers for static strings in debug builds.

If static strings are not the reason we could use the logic from the conservative stack scanner to see if the actual pointer we feed into the function points to a real gcthing.
The assertion I added seems to be firing off on a JS number.  I'm basing this guess  on the fact that thing->clasp.name == "Number", for the thing that is causing the assertion.  They do seem to be primitives, judging by the use of "setPrimitiveThis", which also applies to booleans.  Next, I'm going to look at what is adding a JS number to the CC graph.  It is coming in via js::gc::Mark<JSObject>.
So, this particular number is being reached via MarkObjectSlot, from a ChromeWindow.  It seems like that would be a fairly common thing.

Bill McCloskey said that the right way to do the assert is to check to see if it is a static string, because everything else should be okay.  I changed the assertion to this:
  JS_ASSERT(!JSAtom::isStatic(thing));
...and it at least lets me browse around a bit without failing.  I'll run Mochitest on my machine and see how that goes.

We could also check for this, and return something without looking at the mark bit.  I'm not really sure what the right value to return would be, though.

It might be nice to eventually have a CC-specific variant that will ignore primitive objects, that the GC does care about.
With this patch in place, Mochitest 1-5 and Crashtest orange out on Linux debug builds, due to a crash.  I can confirm that on my own Mac, OSX64 debug builds for at least Mochitest 2 consistently crash on the exact same test, so this probably isn't just platform wonkiness or transient errors.

The Linux stack traces all look like more or less like this (with registers and "Found by" deleted):

Crash reason:  SIGSEGV
Crash address: 0xdadadada

Thread 0 (crashed)
 0  libxul.so!nsCharTraits<short unsigned int>::length [nsCharTraits.h : 384 + 0x3]
 1  libxul.so!nsDependentString::nsDependentString [nsTDependentString.h : 89 + 0xa]
 2  libxul.so!nsGlobalWindow::RunTimeout [nsGlobalWindow.cpp : 9096 + 0xba]
 3  libxul.so!nsGlobalWindow::TimerCallback [nsGlobalWindow.cpp : 9456 + 0x1e]
 4  libxul.so!nsTimerImpl::Fire [nsTimerImpl.cpp : 424 + 0x14]
 5  libxul.so!nsTimerEvent::Run [nsTimerImpl.cpp : 516 + 0x12]

From Googling, it looks like 0xdadadada is written to JS objects when they are freed.

On my own mac, I was able to stop the Mochitest 2 crash by backing out the change to the nsJSScriptTimeoutHandler, adding this line back:
  NS_IMPL_CYCLE_COLLECTION_TRACE_JS_MEMBER_CALLBACK(mExpr)

mExpr is a JS string, in a timeout handler, so combining the above facts together my theory is that somehow making the cycle collector not traverse mExpr is causing the JS GC to free mExpr, which is later causing the crash when something tries to access it.  I have no idea how that would happen, as removing objects from the cycle collector graph should only cause leaks, not make objects get freed too early.
(In reply to comment #25)
> I have no idea how that would happen, as
> removing objects from the cycle collector graph should only cause leaks, not
> make objects get freed too early.

That's not true for JS objects. Who's responsible for rooting mExpr after your change? AFAICT nothing, so it's kinda normal that it becomes garbage after the next GC. You probably need to add explicit JS_Add*Root/JS_Remove*Root calls.
Ouch. Good point. We could just change the macro to trace but not add to the CC if its not an object?
One of these crashes that shows up very rarely in 4.0 showed up in 5.0: https://crash-stats.mozilla.com/report/index/55b850c8-6269-4552-974e-8b2c82110420

The call stack looks like 
 XPCJSRuntime::SuspectWrappedNative
 WrappedNativeSuspecter
 JS_DHashTableEnumerate
 XPCWrappedNativeScope::SuspectAllWrappers

It doesn't look like this could be caused by a string (because the thing is the result of GetFlatJSObjectPreserveColor(), so hopefully it can't be a string), but as I said this is a very rare crash in 4.0 (ranked < 300 I think) so probably not worth worrying about too much.
What peterv is saying is that the Trace function is used by both the CC and the GC, so we can't skip children there.  This patch instead fixes js_GCThingIsMarked so it returns false for static strings.  This should fix the function for all uses, and makes it so the name should be accurate.  Seems a little ugly, but this seems similar to how the rest of the JS GC deals with this problem.  Static strings don't have any marking bits associated with them, so they are never marked any color.  Fanciness like pruning primitives from the CC can be done in a later patch.  I also removed a completely bogus cast.

I'm trying this out on the try server, for a Linux debug build.  A random chunk of mochitest I tried was okay, where it wasn't before.
Attachment #527138 - Attachment is obsolete: true
Attachment #527607 - Flags: review?(gal)
Linux debug passed all tests on the try server.
I ran this patch on the try server, and it passed everything on the try server across all platforms, except for what look like 3 known oranges, plus another orange on a test that seems to have trouble around midnight, and was run around midnight.  Linux64 debug had some kind of immediate infrastructure failure, so I guess I should retry that for completeness.
http://tbpl.mozilla.org/?tree=MozillaTry&rev=f26e54fd04ce
http://tbpl.mozilla.org/?tree=MozillaTry&rev=45f577ed0fa1
An alternate form of this patch would leave out the changes to nsCycleCollector.cpp, which I think just removes a no-op that is incorrect (the value goes from void* -> JSObject* -> void* so nothing should happen).  I've pushed that to try make sure that is stable.  I could then find a JS GC person to review it if Andreas is too busy.
Summary: js_GCThingIsMarked(void*, unsigned int) crashes → Firefox crash [@ js_GCThingIsMarked(void*, unsigned int) ]
I see there's currently no testcase for this bug yet, and I have a theory derived from comment 7 and comment 23.

window.setTimeout returns a number - or at least, that's how JS variables store it.  That number is how users can cancel the timeout, by calling window.clearTimeout later...

Could a setTimeout stress test capture this bug?
Comment 23 was about a patch I proposed that was wrong, so it shouldn't have anything to do with the original problem.

To trigger the problem according to my theory, the cycle collector has to attempt to add a JSStaticAtom to the CC graph.  I think these are JS strings that are very short (like "a"), and the runtime just statically allocates them.  The field mExpr of the timeout handler and the contents of JSArgArrays seem to be where this seems to happen.  Once that's been built, it needs to be reachable from the DOM via XPConnect, but not from any Javascript roots.  Then you have to leave things like that until a cycle collection happens.  I don't know the best way to go about getting that to happen.
(In reply to comment #35)
> I think these are JS strings
> that are very short (like "a"), and the runtime just statically allocates them.

That's right, as an optimization the JS engine statically allocates a bunch of common 1, 2 and 3 char strings.  For details, look for "unit string", "length-2 strings" and "integer strings" in js/src/jsstr.cpp.
Comment on attachment 527607 [details] [diff] [review]
don't examine mark bits on static strings

This might slow us down, so lets do some profiling after this and see whether it bites (I think the rest of the GC heap walk code is **** enough that this is actually not going to make a difference for now).
Attachment #527607 - Flags: review?(gal) → review+
It looks like every other use of this function is guarded in a way that keeps strings from hitting it, so that's the implicit invariant that kept it from breaking before.
Maybe if we remove those other guards then the addition of this check won't really change perf then?
I don't think they can be easily removed, as they do things like check whether a GCthing is a string or an object, and do different things based on that.
For instance, in one place the guard is GetGCThingTraceKind, which calls JSAtom::isStatic, so you'd have to lift the call to xpc_IsGrayGCThing above the call to GetGCThingTraceKind, then have a variant of GetGCThingTraceKind that doesn't call JSAtom::isStatic.  In NoteJSChild, the determination of the kind of the object is done outside of the function entirely.

The easiest way to avoid perf problems, is to leave js_GCThingIsMarked in its current fragile state and change nsCycleCollector.cpp to something like:

if (!JSAtom::isStatic(child) && !xpc_IsGrayGCThing(child) && !WantAllTraces())

Along these same lines, I could check the GCThingKind and filter out strings.  This might filter out some more things from the CC graph, but would make the test more expensive.

As I said, these solutions leave js_GCThingIsMarked in its current fragile state, but renaming it (xpc_IsGrayGCThingThatIsNotAStaticString or xpc_IsGrayGCThingThatIsActuallyAGCThing?) and adding some assertions might help a bit.

The real underlying problem here is that statically allocated JS strings are subtypes of js::gc::Cell, but don't actually behave like Cells.  Maybe this could be fixed a bit by statically allocating some mark bits for them, but I think that cases where the mark bits are written to would still have to be guarded.
The test would actually be something like:
    if (langID == nsIProgrammingLanguage::JAVASCRIPT &&
        (JSAtom::isStatic(child) || !xpc_IsGrayGCThing(child)) &&
        !WantAllTraces())
        return;
Or possibly "js::gc::GetGCThingTraceKind(child) == JSTRACE_STRING" in place of the isStatic declaration, but I get a warning:
js::gc::GetGCThingTraceKind(child) == JSTRACE_STRING
Sorry, I clicked Save Changes by mistake.  I get a warning "../../dist/include/jsstr.h:630: warning: inline function ‘static bool JSAtom::isStatic(const void*)’ used but never defined".  But it looks like I can call js_GetGCThingTraceKind(child), so I'll go with that.
This should fix the static string problem without altering js_GCThingIsMarked.  Getting the trace kind is a little more heavyweight than we strictly need for the crash, but I'm not sure how to get at the IsStatic function outside of the JS directory.  I also added an assertion to js_GCThingIsMarked.  I'll see how it does on try, applied to the latest Aurora.
I'm also taking suggestions for what is the right way to format that monster test.
Comment on attachment 527607 [details] [diff] [review]
don't examine mark bits on static strings

The other approach is to JS_ASSERT(!JSAtom::isStatic(thing)) and rely on DEBUG build coverage.

In this patch, shouldn't the new code return true, not false, i.e. impute liveness to static strings?

/be
Comment on attachment 529175 [details] [diff] [review]
avoid passing static strings by testing for strings in NoteScriptChild

> js_GCThingIsMarked(void *thing, uintN color = BLACK)
> {
>     JS_ASSERT(thing);
>     AssertValidColor(thing, color);
>+    JS_ASSERT(!JSAtom::isStatic(thing));

Ok, that is more like it ;-).

>     // skip over non-grey JS children
>-    if (langID == nsIProgrammingLanguage::JAVASCRIPT) {
>-        JSObject *obj = static_cast<JSObject*>(child);
>-        if (!xpc_IsGrayGCThing(obj) && !WantAllTraces())
>-            return;
>-    }
>+    if (langID == nsIProgrammingLanguage::JAVASCRIPT &&
>+        (js_GetGCThingTraceKind(child) == JSTRACE_STRING || !xpc_IsGrayGCThing(child)) &&
>+        !WantAllTraces())
>+        return;

Nits: brace consequent even if one line when if condition and/or else clause require more than one line. (We also brace both then and else if either requires braces.)

I think the old if-if style was better since the langID condition could become part of a switch or if-else-if chain (in theory -- unlikely in reality), so it's more future proof and conceptually nicer to avoid the A&&(B||C)&&D complexity.

Did you want a review on this patch? If so, count this as my r+.

/be
Given the trail of obsoleted r+s from Andreas on this bug, I was going to wait for a successful try run before soliciting another review, but thanks, I was going to ask for a review eventually.

I could imagine there being other JS-specific optimizations that could be done in the future, but I'll just wait to split it up until that happens.
js_ is not public API, so can assert instead of testing. If we must have a JS_IsGCThingMarked (note my reordering to put Is at the front), it can test.

/be
That's good to know.  As things currently stand, the only caller of js_GCThingIsMarked is xpc_IsGrayGCThing (which is just a wrapper that passes a constant for the second argument), which is in xpcpublic.h, so I guess a test would go there if we were to add one.
I should add that while we do add tests at public API entry points that would be asserts in internal underpinnings, we also do not test a lot at API entry points. You have to be "this tall" to ride the JS API roller coaster. There are judgment calls where we can test in public and assert underneath, such as this present case.

/be
Hmm, since js_* isn't public, I guess that means I shouldn't call js_GetGCThingTraceKind inside of xpcom?  Which would explain why the Windows build of my patch is busted with this error:

nsCycleCollector.obj : error LNK2001: unresolved external symbol
"__declspec(dllimport) unsigned long __cdecl js_GetGCThingTraceKind(void *)"
(__imp_?js_GetGCThingTraceKind@@YAKPAX@Z)

I'll look around for some kind of public way to distinguish JS strings from JS objects and JS shapes or whatnot.  I appear to have added the only use of xpc_IsGrayGCThing outside of the js directory.
Hmm, jsgc.h in my tm tip is JS_FRIEND_API, which is enough to export the entry point from the DLL/DSO, or should be.

Adding a public API near the other JS_*GCThing* APIs would be ok, and could test.

/be
The problem is probably that js_GetGCThingTraceKind is declared JS_FRIEND_API in jsgc.h, but not jsgc.cpp.
Adding a JS_FRIEND_API annotation to jsgc.cpp didn't work, so I'll look into adding to the public API.
The .h file JS_{FRIEND,PUBLIC}_API decorator should work, it applies to the defn in the .cpp file too.

Something's up, and if there's any justice, JS_PUBLIC_API should fail likewise. Don't tell me we have to update a .def file like on Win3.1?

/be
Using dumpbin /EXPORTS, I found that mozjs.dll is exporting js_GetGCThingTraceKind with a return type of "unsigned int", but nsCycleCollector was expecting "unsigned long".  In fact, it appears that on Windows (at least on my Vista computer) that uint32 is defined as "unsigned int" in js/jsgc.cpp and as "unsigned long" in xpcom/base/nsCycleCollector.cpp, and that on Windows they are different types.  So I'll add a public API call for it, which Andreas says should help.
Here's an alternate approach to this problem.  Instead of having to export this whole new language about checking JStraceKinds, I added a new function xpc_GCThingIsGrayCCThing to xpcpublic.h that takes a GCthing, and only checks if it is gray if the object is a JS object or an XML object.  One thing I don't really like about this is that I inlined the macro ADD_TO_CC from nsXPConnect here, because it doesn't seem like that's a concept we need to export from xpcpublic.  Maybe that should be factored differently.

The existing calls to xpc_IsGrayGCThing remain as they are.  They are all within the /js/src/xpconnect/src directory, so maybe xpc_IsGrayGCThing could be moved from xpcpublic to somewhere else.  It might also be nice to get rid of xpc_IsGrayGCThing and replace the existing uses with specialized versions (in a separate patch):
1. a version that takes a JSObject* instead of a void*.  This covers the places where the GCthing is known to be a JSObject, in xpc_UnmarkGrayObject and XPCJSRuntime::SuspectWrappedNative.

2. a version that takes the trace kind of the object, and asserts that the kind is ADD_TO_CC, but does not actually check it in non-debug builds.  This covers the uses in NoteJSChild, nsXPConnect::Traverse and UnmarkGrayChildren, where the trace kind of the GCThing is externally supplied and tested.

Along with xpc_GCThingIsGrayCCThing, this covers every use of the ill-named xpc_IsGrayGCThing.  This change would document the existing invariants that make invoking xpc_IsGrayGCThing okay, without incurring any additional overhead.  I don't know if it would make sense to export all three invariants these via xpcpublic.h or not.  As I said, except for the use I added in the cycle collector, all of the existing uses are in the xpconnect directory.  Anyway, just a thought I had.
forgot to check patch for the previous upload of this
Attachment #530048 - Attachment is obsolete: true
Attachment #530177 - Flags: review?(gal)
Comment on attachment 530177 [details] [diff] [review]
add safe function to check JS object gray bits

Review of attachment 530177 [details] [diff] [review]:

::: js/src/xpconnect/src/xpcpublic.h
@@ +150,1 @@
 inline JSBool

This needs an explanation. JSStaticAtoms are not GC things, they are allocated statically in the data segment. Thats why it crashes. Explain that.

@@ +161,5 @@
+xpc_GCThingIsGrayCCThing(void *thing)
+{
+    uint32 kind = js_GetGCThingTraceKind(thing);
+    return (kind == JSTRACE_OBJECT || kind == JSTRACE_XML) &&
+        xpc_IsGrayGCThing(thing);

Indent xpc_ with the (, or even better does it fit on one line?

::: xpcom/base/nsCycleCollector.cpp
@@ +1677,5 @@
     // skip over non-grey JS children
+    if (langID == nsIProgrammingLanguage::JAVASCRIPT &&
+        !xpc_GCThingIsGrayCCThing(child) &&
+        !WantAllTraces())
+        return;

Multi-line conditions want a { } for clarity.
Attachment #530177 - Flags: review?(gal) → review+
I addressed all of Andreas's comments.  I'm going to push this to try.
Attachment #527607 - Attachment is obsolete: true
Attachment #529175 - Attachment is obsolete: true
Well, due to the fact that xpc_GCThingIsGrayCCThing is inlined, causing it to have a call to js_GCThingTraceKind in nsCycleCollector, this hits the exact same build problem with unsigned int vs unsigned long.  I guess I'll make it not inlined and see if that fixes it.
Basically the same as the last patch, except that instead of inlining the function, I move it into nsXPConnect.cpp to avoid build problems on Windows.

I did a try run on this over night, built against Mozilla Central, and it is entirely green except for the two known oranges on Linux{64} opt Moth, the usual JP Win debug reds, and some funkiness in Android opt that looks about the same as mobile, and doesn't show up on the M-C tinderbox anyways.
http://tbpl.mozilla.org/?tree=Try&rev=f806dbbe6aaf
Attachment #530323 - Flags: review?(gal)
Attachment #530177 - Attachment is obsolete: true
Attachment #530191 - Attachment is obsolete: true
Attachment #530323 - Flags: review?(gal) → review+
Whiteboard: [Aurora][Fx5] → [Aurora][Fx5][needs landing] tracemonkey
Keywords: checkin-needed
Whiteboard: [Aurora][Fx5][needs landing] tracemonkey → [Aurora][Fx5][needs push to t-m]
http://hg.mozilla.org/tracemonkey/rev/624c69edea94
Keywords: checkin-needed
Whiteboard: [Aurora][Fx5][needs push to t-m] → [Aurora][Fx5] fixed-in-tracemonkey
pbiggar reported in another bug that this patch fixed at least one apparent instance of this crash, which is heartening.

My patch has been on tracemonkey for a few days without any apparent issues.  I'm going to do a try run against moz-central, and then find somebody to land it there if there's no Tracemonkey-Mozilla Central merge in the next day or two.  Once that is done, it should stop showing up in crash-reports for Nightly.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Did this get checked into Aurora as well?
No.
Comment on attachment 530323 [details] [diff] [review]
add a safe function to check grey bits, call it in the cycle collector

This landed a day or two ago on Moz Central and there haven't been any instances of this crash on nightly builds since then.  Of course, there haven't been any crashes on Aurora builds since then, either.  The patch has also been on Tracemonkey for a week without any apparent problems.
Attachment #530323 - Flags: approval-mozilla-aurora?
Attachment #530323 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Keywords: checkin-needed
Whiteboard: [Aurora][Fx5] fixed-in-tracemonkey → [Aurora][Fx5] fixed-in-tracemonkey [needs push to Aurora]
Please land for Aurora by Monday May 16 or the approval will potentially be lost.  Please mark as status-firefox5 fixed when you do.
sicking said he can land it for me when he gets approval for another patch of his.
checked in

http://hg.mozilla.org/releases/mozilla-aurora/rev/4187f7b5edaf

Thanks for providing a |hg export| patch! Makes checking in so much easier.
Keywords: checkin-needed
Whiteboard: [Aurora][Fx5] fixed-in-tracemonkey [needs push to Aurora] → [Aurora][Fx5] fixed-in-tracemonkey
Very good jobs! Thanks!

(bug spam :P)
Target Milestone: --- → mozilla6
Crash Signature: [@ js_GCThingIsMarked(void*, unsigned int) ]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: