Bug 648022 - Firefox 4.2a1pre Crash Report [@ js_XDRScript(JSXDRState*, JSScript**) ] [@ js_XDRAtom ] [@ js_XDRScriptAndSubscripts(JSXDRState*, JSScript**) ]
Status: RESOLVED FIXED
Whiteboard: [sg:critical?] fixed-in-tracemonkey
Keywords: crash, regression
Product: Core
Classification: Components
Component: JavaScript Engine
Version: Trunk
Platform: x86 Windows 7
Importance: -- critical
Target Milestone: ---
Assigned To: Michael Wu [:mwu]
Depends on: 652185
Blocks:
Reported: 2011-04-06 10:18 PDT by Marcia Knous [:marcia - use ni]
Modified: 2015-10-07 18:44 PDT
CC: 23 users
Tracking flags: +/fixed, +/fixed, +/fixed, unaffected, unaffected, unaffected


Attachments
Back out bug 643927 (9.72 KB, patch)
2011-05-06 14:35 PDT, Michael Wu [:mwu]
no flags
Back out second part of bug 632253 (4.84 KB, patch)
2011-05-06 15:19 PDT, Michael Wu [:mwu]
igor: review+
Back out bug 643927 and second part of bug 632253 (10.75 KB, patch)
2011-05-06 15:37 PDT, Michael Wu [:mwu]
sayrer: approval-mozilla-aurora+
Backout bug 518230 (17.81 KB, patch)
2011-05-09 14:47 PDT, Michael Wu [:mwu]
igor: review+
sayrer: approval-mozilla-aurora+
Return on bad idx's (560 bytes, patch)
2011-05-17 14:52 PDT, Michael Wu [:mwu]
no flags
Backout bug 518230 (updated for latest mozilla-aurora) (9.98 KB, patch)
2011-06-10 15:50 PDT, Michael Wu [:mwu]
bugzilla: approval-mozilla-aurora+
Backout bug 518230 (for mozilla-central) (8.27 KB, patch)
2011-06-10 16:50 PDT, Michael Wu [:mwu]
no flags

Description Marcia Knous [:marcia - use ni] 2011-04-06 10:18:28 PDT
Seen while reviewing trunk top changers. See http://tinyurl.com/3bb45hf for the reports.

https://crash-stats.mozilla.com/report/index/64affb79-7ce6-437d-8e7d-441812110405

Frame 	Module 	Signature 	Source
0 	mozjs.dll 	js_XDRScript 	js/src/jsscript.cpp:439
1 	mozjs.dll 	JS_XDRScriptObject 	js/src/jsxdrapi.cpp:735
2 	xul.dll 	ReadScriptFromStream 	js/src/xpconnect/loader/mozJSComponentLoader.cpp:367
3 	xul.dll 	mozJSComponentLoader::ReadScript 	js/src/xpconnect/loader/mozJSComponentLoader.cpp:899
4 	xul.dll 	mozJSComponentLoader::GlobalForLocation 	js/src/xpconnect/loader/mozJSComponentLoader.cpp:1044
5 	xul.dll 	nsRefPtr<nsPresContext>::~nsRefPtr<nsPresContext> 	obj-firefox/dist/include/nsAutoPtr.h:969
6 	xul.dll 	mozJSComponentLoader::JarKey 	js/src/xpconnect/loader/mozJSComponentLoader.cpp:599
7 	kernel32.dll 	InterlockedExchangeAdd 	

Comment 1 Wesley W. Garland 2011-04-07 12:40:16 PDT
Bug 630209 recently landed (post-Firefox 4) and changed some of the rooting API/semantics for JSScripts; the scrobj and JSScript are now used as a single unit (a JSObject *) in the API. That bug also introduced JS_XDRScriptObject(), replacing JS_XDRScript().
Comment 3 chris hofmann 2011-04-11 16:52:12 PDT
Around the #4 topcrash on mozilla-central, so this needs to block wide distribution of Aurora.
Comment 4 Wesley W. Garland 2011-04-13 05:54:15 PDT
Is anybody looking at this?

436 JSAtom *name;
437 if (xdr->mode == JSXDR_ENCODE)
438   name = JS_LOCAL_NAME_TO_ATOM(names[i]);
439 if (!js_XDRAtom(xdr, &name))
440   return false;
441 if (xdr->mode == JSXDR_DECODE) {
442   BindingKind kind = (i < nargs) 

I think we're exploding due to a bad name.  Maybe it's straight-up uninitialized?  Hard to say with the flow control in there, but I have a feeling that maybe this is what should be there:

436 JSAtom *name;
437 if (xdr->mode == JSXDR_ENCODE) {
438   name = JS_LOCAL_NAME_TO_ATOM(names[i]);
439   if (!js_XDRAtom(xdr, &name))
440     return false;
    }
441 if (xdr->mode == JSXDR_DECODE) {
442   BindingKind kind = (i < nargs) 

This is code that was refactored with Bug 614493 - Move name information from JSFunction into JSScript.
Comment 5 Brendan Eich [:brendan] 2011-04-13 06:02:45 PDT
Wes: js_XDRAtom, like the other XDR functions, is a single symmetric codec entry point, so it must be called in both the encode and decode cases. In the encode case, the variable passed by reference to it must be initialized; in the decode case, it need not be, and the XDR function fills in the out param.

Igor, can you take this?

/be
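
To make the pattern concrete, here is a minimal, self-contained C++ sketch of a symmetric XDR entry point; the XDRState and Atom types and the buffer layout are simplified stand-ins, not the actual SpiderMonkey internals:

#include <cstdint>
#include <vector>

// Simplified stand-ins for the real engine types (hypothetical).
struct Atom { uint32_t index; };

struct XDRState {
    enum Mode { ENCODE, DECODE } mode;
    std::vector<uint32_t> buf;   // the serialized stream
    size_t cursor = 0;           // read position when decoding
};

// One entry point serves both directions, as described above:
// ENCODE requires *atomp to be initialized and writes it out;
// DECODE ignores the incoming value and fills in the out param.
static bool XDRAtom(XDRState* xdr, Atom* atomp) {
    if (xdr->mode == XDRState::ENCODE) {
        xdr->buf.push_back(atomp->index);
        return true;
    }
    if (xdr->cursor >= xdr->buf.size())
        return false;            // truncated stream: fail the decode
    atomp->index = xdr->buf[xdr->cursor++];
    return true;
}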
Comment 6 Wesley W. Garland 2011-04-13 07:31:52 PDT
Thanks for the explanation, Brendan - that de-mystifies a bit of this code, but it also means I have no clue where this bug is coming from.

It's almost as though names[] is mutating (it looks like it gets validated as non-explody up around line 405), but I can't see how it could mutate. The data in there isn't GC'd; it's in cx->tempPool.
Comment 7 Michael Wu [:mwu] 2011-04-13 10:29:17 PDT
It can't be related to the line above |name = JS_LOCAL_NAME_TO_ATOM(names[i]);| since this is called from ReadScriptFromStream, which means xdr->mode == JSXDR_DECODE.

I have seen one crash in js_XDRScript with xdr->mode == JSXDR_ENCODE, but it's in a different place and isn't the startup crash this one seems to be. https://crash-stats.mozilla.com/report/index/b7b05f57-9b88-4952-a71c-7b6552110412
Comment 8 Benjamin Smedberg [:bsmedberg] 2011-04-13 12:16:31 PDT
We need to figure out what caused this recent regression, and do a backout for Aurora hopefully within the next week.
Comment 9 chris hofmann 2011-04-15 14:26:57 PDT
First started showing up on 3/31 in builds from 3/30.


         js_XDRScript(JSXDRState*, JSScript**)
date     total    breakdown by build
         crashes  count build, count build, ...
20110328
20110329
20110330
20110331  4       4 4.2a1pre2011033003
20110401 11       6 4.2a1pre2011040103, 4 4.2a1pre2011033003, 1 4.2a1pre2011033112
20110402 21      19 4.2a1pre2011040203, 2 4.2a1pre2011040103
20110403 19      17 4.2a1pre2011040203, 2 4.2a1pre2011040303
Comment 10 christian 2011-04-15 14:33:29 PDT
Cedar merge, ugh:

http://hg.mozilla.org/mozilla-central/rev/422bbd8245a7
Comment 11 Sheila Mooney 2011-04-20 10:10:04 PDT
Is someone working on isolating a regression window for this one?
Comment 12 Daniel Holbert [:dholbert] 2011-04-20 12:35:13 PDT
(In reply to comment #11)
> Is someone working on isolating a regression window for this one?
(I'm not sure anyone can, beyond comment 9, since there aren't any reliable STR)
Comment 13 Daniel Holbert [:dholbert] 2011-04-20 12:46:57 PDT
(In reply to comment #9)
> First started showing up on 3/31 in builds from 3/30.

Unless I'm misreading the data, dbaron's query from comment 2 shows some hits from a 3/29 build (though the crashes in that build occurred on April 8th and 9th).

So comment 10 might not be the right regression push-range... (it's on 3/30)
Comment 14 Daniel Holbert [:dholbert] 2011-04-20 13:10:24 PDT
Some of the 3/29-build crashes are inside of nsComponentManagerImpl::GetServiceByContractID / nsComponentManagerImpl::CreateInstanceByContractID.

In the push-range from the first win32 build on 3/28 to the last win32 build on 3/29 [1], there's a cset in the build-system merge that included a number of ContractID-related changes. Perhaps one of those changes is involved?

[1] http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=dacd66ab4dc4&tochange=820512b8223e
[2] warning, this is a huge cset that hangs Firefox on and off while it's loading: http://hg.mozilla.org/mozilla-central/pushloghtml?changeset=25caa0ec50f2
Comment 15 Daniel Holbert [:dholbert] 2011-04-20 13:35:23 PDT
(d'oh, nevermind @ the ContractID lead in comment 14 -- that cset (25caa0ec50f2) is labeled "merge m-c to b-s", and it's part of a push to merge b-s back to m-c.  If I manually diff the cset before that push vs. the cset after that push, there are only a few makefile differences.)
Comment 16 Gregor Wagner [:gwagner] 2011-04-21 09:30:51 PDT
Maybe related to bug 605707?
Comment 17 Michael Wu [:mwu] 2011-04-21 15:24:12 PDT
I suspect this is related to the linux crash @ js_XDRAtom - https://crash-stats.mozilla.com/report/index/c026959f-265a-4182-99e8-4e08b2110421 . Guess js_XDRAtom might've gotten inlined on Windows.

The crash is at http://hg.mozilla.org/mozilla-central/file/fbe7830f27c0/js/src/jsxdrapi.cpp#l649 which suggests that the index that was read is invalid.
Comment 18 Igor Bukanov 2011-04-21 15:56:53 PDT
(In reply to comment #17)
> I suspect this is related to the linux crash @ js_XDRAtom -
> https://crash-stats.mozilla.com/report/index/c026959f-265a-4182-99e8-4e08b2110421
> . Guess js_XDRAtom might've gotten inlined on Windows.

I do not see how that may result in something bad. The callers of js_XDRAtom root the storage they use to store atoms, or store the atoms in GC things.
Comment 19 Michael Wu [:mwu] 2011-04-21 16:03:53 PDT
(In reply to comment #18)
> (In reply to comment #17)
> > I suspect this is related to the linux crash @ js_XDRAtom -
> > https://crash-stats.mozilla.com/report/index/c026959f-265a-4182-99e8-4e08b2110421
> > . Guess js_XDRAtom might've gotten inlined on Windows.
> 
> I do not see how that may result in something bad. The callers of js_XDRAtom
> root the storage they use to store atoms, or store the atoms in GC things.

Not saying that's bad. Just saying that the crash is probably actually in js_XDRAtom, not in js_XDRScript. And the js_XDRAtom crashes on linux suggest that the index into the atoms table might be wrong/corrupted.
Comment 20 Igor Bukanov 2011-04-22 11:01:12 PDT
Another possible suspect is js_XDRRegExpObject. It does JS_XDRString to read the source of the regexp, then calls NewBuiltinClassInstance, then RegExp::create, then initRegExp, which in turn may allocate the initial regexp shape before setting the private data of the new regexp object to the regexp. If the compiler keeps the only reference to the source string in the RegExp, then we have the same problem as in bug 605707. I am not sure that this is the cause of the crashes, so I will file another bug to fix it.
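
The hazard described above can be sketched as follows; the types and functions below are hypothetical stand-ins for the real RegExp/XDR code, meant only to show the shape of the bug:

// Hypothetical sketch of the rooting hazard, not SpiderMonkey code.
struct JSString { const char* chars; };          // GC-managed in reality
struct RegExpPrivate { JSString* source; };

JSString* XDRReadString() { return new JSString{"abc"}; }  // unrooted result
void AllocShapeMayGC() { /* in the real engine, allocation can trigger GC */ }

void initRegExpLike() {
    JSString* source = XDRReadString();
    RegExpPrivate* priv = new RegExpPrivate{source};
    // HAZARD: until priv is stored into a rooted object's private slot,
    // `source` may be reachable only through `priv`, which the GC cannot
    // see. If this allocation triggers a GC, `source` can be collected:
    AllocShapeMayGC();
    // ...and priv->source is now dangling when the object is initialized.
}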
Comment 21 Sheila Mooney 2011-05-02 10:14:23 PDT
Can we get an update on this? It's still in the top 3 crashes and a regression. Have we isolated a regression window? We need to figure out what caused this and back it out.
Comment 22 Igor Bukanov 2011-05-05 12:38:21 PDT
(In reply to comment #21)
> Can we get an update on this? It's still in the top 3 crashes and a regression.

I am not sure which regression is implied here. I filed bug 652185 since I was not sure if fixing that issue alone would be sufficient to address the crashes in this bug.

> Have we isolated a regression window? We need to figure out what caused this
> and back it out.

If fixing bug 652185 is not enough to silence the crash reports, then we may consider backing out bug 630209.
Comment 23 Michael Wu [:mwu] 2011-05-05 15:25:41 PDT
I took one of the minidumps and stuck it into a debugger:

# get index? (JS_XDRUInt32)
5f896f88 ffd2            call    edx
5f896f8a 83c408          add     esp,8
5f896f8d 85c0            test    eax,eax
5f896f8f 0f84d7d30800    je      mozjs!js_XDRScript+0x8d68c (5f92436c)
5f896f95 833e01          cmp     dword ptr [esi],1
5f896f98 0f8512020000    jne     mozjs!js_XDRScript+0x4d0 (5f8971b0)
5f896f9e 8b44244c        mov     eax,dword ptr [esp+4Ch]

# check for -1 index
5f896fa2 83f8ff          cmp     eax,0FFFFFFFFh
5f896fa5 0f84d70d0000    je      mozjs!js_XDRScript+0x10a2 (5f897d82)

# dereference a bunch of stuff to get to the atom pointer?
5f896fab 8b4e24          mov     ecx,dword ptr [esi+24h]
5f896fae 8b5110          mov     edx,dword ptr [ecx+10h]
5f896fb1 8b0482          mov     eax,dword ptr [edx+eax*4] < crash here

As far as I can tell, this is the same point that the Linux crash crashes in - http://hg.mozilla.org/mozilla-central/file/fbe7830f27c0/js/src/jsxdrapi.cpp#l649

Not sure why things are ending up this way, though.
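
Read back into C++, the annotated instructions correspond to roughly the following; the member names are guesses reconstructed from the disassembly, not the actual jsxdrapi.cpp source:

#include <cstdint>
#include <vector>

// Guessed-at stand-ins for the structures the disassembly walks.
struct Atom {};
struct AtomMap { std::vector<Atom*> vector; };
struct Script  { AtomMap atomMap; };
struct XDRState { int mode; Script* script; };

bool DecodeAtomIndex(XDRState* xdr, uint32_t idx, Atom** atomp) {
    if (idx == UINT32_MAX) {      // the cmp eax,0FFFFFFFFh check
        *atomp = nullptr;         // sentinel meaning "no atom"
        return true;
    }
    // Crash site: there is no bounds check, so a corrupt index read
    // from the stream indexes past the end of the atom map
    // (the mov eax,dword ptr [edx+eax*4] above):
    *atomp = xdr->script->atomMap.vector[idx];
    return true;
}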
Comment 24 Daniel Veditz [:dveditz] 2011-05-05 16:17:09 PDT
It doesn't look like bug 652185 fixed this on mozilla-central; we're still seeing this in recent builds, e.g. bp-769705c8-2349-4c60-85b5-a75232110505

backout time?
Comment 25 Igor Bukanov 2011-05-06 11:58:49 PDT
 (In reply to comment #23)
> As far as I can tell, this is the same point that the Linux crash crashes in
> -
> http://hg.mozilla.org/mozilla-central/file/fbe7830f27c0/js/src/jsxdrapi.
> cpp#l649

So this looks like a regression from bug 643927.
Comment 26 Michael Wu [:mwu] 2011-05-06 12:33:28 PDT
(In reply to comment #25)
>  (In reply to comment #23)
> > As far as I can tell, this is the same point that the Linux crash crashes in
> > -
> > http://hg.mozilla.org/mozilla-central/file/fbe7830f27c0/js/src/jsxdrapi.
> > cpp#l649
> 
> So this looks like a regression from bug 643927.

I guess that's possible. I'm really not sure, though. That patch seemed fairly correct, and it doesn't seem likely that the index is randomly corrupted either...

BTW I just noticed a recent crash report in js_XDRAtom on windows on that same line - https://crash-stats.mozilla.com/report/index/1ff8e8bc-6fad-4e90-b941-a8b912110504 .
Comment 27 Michael Wu [:mwu] 2011-05-06 14:35:52 PDT
Created attachment 530742 [details] [diff] [review]
Back out bug 643927

If this doesn't fix it, bug 518230 might be next...
Comment 28 Igor Bukanov 2011-05-06 14:50:49 PDT
(In reply to comment #27)
> Created attachment 530742 [details] [diff] [review] [review]
> Back out bug 643927

Wait, but that reintroduces a GC hazard. I suppose we should back out that fix and the initial optimization bug.
Comment 29 Michael Wu [:mwu] 2011-05-06 15:19:17 PDT
Created attachment 530754 [details] [diff] [review]
Back out second part of bug 632253
Comment 30 Igor Bukanov 2011-05-06 15:22:11 PDT
Comment on attachment 530754 [details] [diff] [review]
Back out second part of bug 632253

Review of attachment 530754 [details] [diff] [review]:
-----------------------------------------------------------------

OK, let's start with that backout.
Comment 31 Michael Wu [:mwu] 2011-05-06 15:37:24 PDT
Created attachment 530761 [details] [diff] [review]
Back out bug 643927 and second part of bug 632253

Requesting approval for a backout to try to fix a crash.
Comment 32 Michael Wu [:mwu] 2011-05-09 12:21:33 PDT
Try server run for the backout:

http://tbpl.mozilla.org/?tree=Try&rev=9118d02d7265
Comment 33 Michael Wu [:mwu] 2011-05-09 14:47:18 PDT
Created attachment 531152 [details] [diff] [review]
Backout bug 518230

Plan B.
Comment 34 Robert Sayre 2011-05-09 19:25:36 PDT
which patch needs approval here?
Comment 35 Michael Wu [:mwu] 2011-05-09 19:29:42 PDT
(In reply to comment #34)
> which patch needs approval here?

Just the one that I requested approval on. The second patch backs out more stuff in case the first backout isn't enough.
Comment 37 Michael Wu [:mwu] 2011-05-11 18:54:09 PDT
The crash signature has changed but it's still basically the same crash at |if (!js_XDRAtom(xdr, &name))|.

https://crash-stats.mozilla.com/report/index/38df0cde-9e14-4faf-bede-bfac32110511
Comment 38 JP Rosevear [:jpr] 2011-05-12 14:45:15 PDT
Please land for Aurora by Monday May 16 or the approval will potentially be lost.  Please mark as status-firefox5 fixed when you do.
Comment 39 Robert Sayre 2011-05-12 14:46:50 PDT
Guys, what's going on here? Is this fixed on Aurora?
Comment 40 Michael Wu [:mwu] 2011-05-12 14:52:34 PDT
(In reply to comment #39)
> Guys, what's going on here? Is this fixed on Aurora?

The first backout wasn't enough. Going to request approval for another backout, which should fix it for good.
Comment 41 Michael Wu [:mwu] 2011-05-16 09:26:13 PDT
I believe this backout should fix things for Aurora. However, we still need to figure out what's going on and what we should do for trunk.

http://hg.mozilla.org/releases/mozilla-aurora/rev/1ae19a9279d0
Comment 42 Sheila Mooney 2011-05-17 11:05:31 PDT
Our next merge from mozilla-central to Aurora is next Tuesday, May 24th. Can we get a fix on the trunk so we don't propagate this to Aurora again?
Comment 43 christian 2011-05-17 11:07:01 PDT
(In reply to comment #42)
> Our next merge from mozilla-central to Aurora is next Tuesday, May 24th. Can
> we get a fix on the trunk so we don't propagate this to Aurora again?

Marking it as tracking-firefox6 to make sure we don't lose this / ship a regression.
Comment 44 Michael Wu [:mwu] 2011-05-17 14:52:18 PDT
Created attachment 533082 [details] [diff] [review]
Return on bad idx's

I'm running out of ideas. Let's try testing one theory.
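
The attached patch is only 560 bytes and its diff is not quoted in this bug, but the title suggests a defensive check along these lines (hypothetical names, mirroring the reconstruction after comment 23):

#include <cstdint>
#include <vector>

struct Atom {};
struct AtomMap { std::vector<Atom*> vector; };

// Hypothetical bounds check: fail the decode on a bad index rather
// than dereferencing past the end of the atom map.
static bool CheckAtomIndex(const AtomMap& map, uint32_t idx) {
    if (idx == UINT32_MAX)
        return true;                    // sentinel for "no atom"
    return idx < map.vector.size();     // false => corrupt stream, bail out
}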
Comment 45 Igor Bukanov 2011-05-17 15:43:16 PDT
(In reply to comment #44)
> Created attachment 533082 [details] [diff] [review] [review]
> Return on bad idx's
> 
> I'm running out of ideas. Let's try testing one theory.

We must find the real cause of the regression. So the previous backouts have not helped, right?
Comment 46 Michael Wu [:mwu] 2011-05-17 16:06:58 PDT
(In reply to comment #45)
> (In reply to comment #44)
> > Created attachment 533082 [details] [diff] [review] [review] [review]
> > Return on bad idx's
> > 
> > I'm running out of ideas. Let's try testing one theory.
> 
> We must find the real cause of the regression. So the previous backouts have
> not helped, right?

Oh, this is for trunk. I haven't seen this crash on Aurora and I don't expect to see it again, though it has only been one day.
Comment 47 Robert Kaiser 2011-05-20 08:16:56 PDT
On Aurora, it seems like we're still seeing crashes with the js_XDRScriptAndSubscripts signature, but those happen at early startup and so prevent people from updating. In fact, we only see it in build IDs up to 2011051600 on Aurora, so the comment 41 backout seems to have helped there; we just have people stranded on earlier builds which won't launch properly.
Comment 48 :Ehsan Akhgari 2011-05-24 11:07:02 PDT
I have transplanted the backout changeset for this bug on Aurora for Firefox 6: <http://hg.mozilla.org/releases/mozilla-aurora/rev/65aaadba0020>
Comment 49 Johnny Stenback (:jst, jst@mozilla.com) 2011-05-26 13:37:29 PDT
Marking fixed for Firefox 6 then.
Comment 50 Robert Kaiser 2011-06-03 08:50:42 PDT
Michael, Igor: ping?

The backouts fixed Aurora and Beta, but on trunk we are still missing a fix for the actual problem behind this...
Comment 51 Michael Wu [:mwu] 2011-06-03 08:59:29 PDT
(In reply to comment #50)
> Michael, Igor: ping?
> 
> The backouts fixed Aurora and Beta, but on trunk we are still missing a fix
> for the actual problem behind this...

I enabled crc32 checking on the omnijar startup cache in bug 661305, in case it's a corrupted omni.jar that's causing this crash. If this doesn't stop the crashes, I'm just gonna back out bug 518230 entirely.
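
The kind of check described here can be sketched with zlib's crc32(); this is a generic illustration of validating a cached entry before trusting it, not the actual bug 661305 patch:

#include <zlib.h>
#include <cstddef>
#include <cstdint>

// Generic sketch: recompute an entry's CRC-32 and compare it with the
// checksum stored alongside it; reject the entry on mismatch.
bool EntryChecksumOK(const unsigned char* data, size_t len,
                     uint32_t expectedCrc) {
    uLong crc = crc32(0L, Z_NULL, 0);              // zlib's initial value
    crc = crc32(crc, data, static_cast<uInt>(len));
    return crc == expectedCrc;
}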
Comment 52 Robert Kaiser 2011-06-07 12:34:53 PDT
js_XDRAtom is still around: https://crash-stats.mozilla.com/report/list?signature=js_XDRAtom

For the other two signatures, it's hard to say on trunk, probably due to the low number of users, but the only reports we have at all for those are 13 for js_XDRScript on the 2011-05-26 build and one for this one on the build from the day before.
Comment 53 Sheila Mooney 2011-06-09 09:27:10 PDT
We are still seeing this signature on Aurora (6.0a2). It's no longer appearing on trunk or 5.0.
Comment 54 Robert Kaiser 2011-06-09 15:19:18 PDT
I can't see js_XDRScript on any build of any version with a higher build ID than 20110527 - but then, the 3 reports on that build ID are on 5.0b3.

js_XDRScriptAndSubscripts still seems to happen on current aurora builds but is not seen anywhere else.

js_XDRAtom still happens across all versions on current builds, mostly on trunk and aurora, though.
Comment 55 Sheila Mooney 2011-06-10 10:25:13 PDT
Can someone investigate why we are seeing js_XDRScriptAndSubscripts only on Aurora?
Comment 56 Michael Wu [:mwu] 2011-06-10 15:50:43 PDT
Created attachment 538622 [details] [diff] [review]
Backout bug 518230 (updated for latest mozilla-aurora)

It looks like not all the backouts made it over when Aurora was branched. This is the missing backout, which should fix js_XDRScriptAndSubscripts. Going to follow up with a version for mozilla-central to fix js_XDRAtom.
Comment 57 Michael Wu [:mwu] 2011-06-10 16:50:24 PDT
Created attachment 538635 [details] [diff] [review]
Backout bug 518230 (for mozilla-central)

This just backs out the xdr parts of bug 518230.
Comment 58 Michael Wu [:mwu] 2011-06-13 18:18:02 PDT
http://hg.mozilla.org/tracemonkey/rev/3acacde59381
Comment 59 Johnathan Nightingale [:johnath] 2011-06-15 11:56:05 PDT
Comment on attachment 538622 [details] [diff] [review]
Backout bug 518230 (updated for latest mozilla-aurora)

As we understand it, we are just approving the rest of the backout.
Comment 61 Sheila Mooney 2011-06-20 09:10:44 PDT
So as of 06/17/2011 we are getting an explosive crash with this signature. On the 06/16/2011 build we had 38 crashes; then it jumped to over 200 for each day after, with yesterday at over 300. There still seems to be some problem.
Comment 62 Robert Kaiser 2011-06-20 12:00:15 PDT
(In reply to comment #61)
> So as of 06/17/2011 we are getting an explosive crash with this signature.
> On the 06/16/2011 build we had 38 crashes; then it jumped to over 200 for
> each day after, with yesterday at over 300. There still seems to be some
> problem.

As I said in today's CrashKill meeting, all those crashes are with build IDs from before this landed, so the backout on Aurora is probably fine; for some reason, people on non-recent builds just had a run of crashes in the last few days.
Comment 63 Michael Wu [:mwu] 2011-06-20 17:08:48 PDT
Tracemonkey landed on mozilla-central today.
Comment 64 Sheila Mooney 2011-06-21 14:21:17 PDT
Michael, so I see 2 of these crashes in an Aurora build from June 21st. Is it possible that we will still get a few of these crashes, or should it be eliminated entirely? I cannot find any on 5.0 in the last 4 weeks.
Comment 65 Michael Wu [:mwu] 2011-06-21 16:33:07 PDT
Thanks for the heads up.

These two crash reports don't make much sense. They're crashing in a function that I removed/renamed in the backout, yet they still claim to come from builds made after the backout landed. Not sure what to make of that...
Comment 66 Robert Kaiser 2011-06-22 04:37:12 PDT
(In reply to comment #65)
> Thanks for the heads up.
> 
> These two crash reports don't make much sense. They're crashing in a
> function that I removed/renamed in the backout, yet they still claim to come
> from builds made after the backout landed. Not sure what to make of that...

In that case we should try to verify that these are "our" builds, i.e. from our pristine source, and not something someone built themselves (or some state of Aurora that a distro happened to rebuild from without updating the source).
Comment 67 Michael Wu [:mwu] 2011-06-22 05:34:02 PDT
(In reply to comment #66)
> In that case we should try to verify that these are "our" builds, i.e. from
> our pristine source, and not something someone built themselves (or some
> state of Aurora that a distro happened to rebuild from without updating the
> source).

As I understand it, getting working crash reports from non-mozilla builds isn't possible. The crash stacks require debugging information from the build and hopefully only our own build machines can upload that debugging information to the symbol server. A hash is used to match crash reports with the right symbols.

However - I've found something interesting in comparing two crash reports with different build IDs:
Build ID 20110611042006: https://crash-stats.mozilla.com/report/index/de2099d9-5809-4c7a-9839-3a7ae2110621
Build ID 20110621042010: https://crash-stats.mozilla.com/report/index/d7e991bb-a3ef-4316-a314-dbc312110621

In both cases, the module list reports a xul.dll with version 6.0.0.4179 and a debug identifier of F0C503A2E6D84C00888AE3CF29B650952. We might have broken updates here.
Comment 68 Robert Kaiser 2011-06-22 06:57:04 PDT
(In reply to comment #67)
> (In reply to comment #66)
> As I understand it, getting working crash reports from non-mozilla builds
> isn't possible

That's oversimplifying things, as you just need a proper account to upload symbols, but it probably turns out mostly true in the end.

> In both cases, the module list reports a xul.dll with version 6.0.0.4179
> and a debug identifier of F0C503A2E6D84C00888AE3CF29B650952. We might have
> broken updates here.

That's surely a possible explanation, yes. We had update problems like this before, though we should have fixed all the major cases by now. Things still might go wrong in some strange ways at times, though; this is computers, after all. ;-)
Comment 69 Ted Mielczarek [:ted.mielczarek] 2011-06-22 06:57:37 PDT
Interestingly, the xul.dlls and mozjs.dlls are both the same in both of those reports, and the changeset both DLLs are built from is the same in both cases, so it's not a mismatched xul+js in this case, anyway.
Comment 70 Michael Wu [:mwu] 2011-06-22 17:25:07 PDT
I've filed bug 666451 to try to figure out how often this sort of update breakage occurs.
Comment 71 Robert Strong [:rstrong] (use needinfo to contact me) 2011-06-22 18:39:50 PDT
BTW: there are already bug 635834 and bug 666065 for checking version number mismatches, which should be the same.
Comment 72 Michael Wu [:mwu] 2011-06-22 18:53:46 PDT
(In reply to comment #71)
> BTW: there are already bug 635834 and bug 666065 for checking version
> number mismatches, which should be the same.

AIUI that only checks mismatches between DLLs. In this case, we've found mismatches between DLLs and application.ini (and likely other non-DLL files, since just updating application.ini is unlikely to cause crashes).
Comment 73 Robert Strong [:rstrong] (use needinfo to contact me) 2011-06-22 19:13:29 PDT
Understood... just pointing it out for reference of similar work.

When you get the data, please check whether the application.ini is for a newer or older version. It is always possible to update application.ini, whereas DLLs can be locked, which prevents updating them; whether it is newer or older will provide a hint as to what may be going on.

Also, if application.ini is in fact incorrect, this would prevent the application from running and would display an error with release builds, since there is a version check against the Gecko version in that file. It has been a long time since I have seen any reports of that happening, and IIRC the reports were often due to a user performing a system restore, which excludes several file types such as .ini.
Comment 74 Robert Kaiser 2011-06-23 08:41:21 PDT
https://crash-stats.mozilla.com/report/list?signature=js_XDRAtom continues to be high on trunk, is that a different bug that needs to be filed?
Comment 75 Michael Wu [:mwu] 2011-06-23 16:43:42 PDT
(In reply to comment #74)
> https://crash-stats.mozilla.com/report/list?signature=js_XDRAtom continues
> to be high on trunk, is that a different bug that needs to be filed?

I don't think so. The oldest build ID with the crash is 20110620121237 which is around when the backout landed. There are no js_XDRAtom crashes after 6/20 on trunk.
Comment 76 Robert Kaiser 2011-06-23 16:52:20 PDT
(In reply to comment #75)
> (In reply to comment #74)
> > https://crash-stats.mozilla.com/report/list?signature=js_XDRAtom continues
> > to be high on trunk, is that a different bug that needs to be filed?
> 
> I don't think so. The oldest build ID with the crash is 20110620121237 which
> is around when the backout landed. There are no js_XDRAtom crashes after
> 6/20 on trunk.

I see 9 listed with build IDs from the 22nd; see the following one, which has 20110622030205 as the build ID:
https://crash-stats.mozilla.com/report/index/0fd414b9-28c1-40bd-9952-b3f8e2110623

But right, the higher volume seems to be builds from 20th or earlier.
Comment 77 Michael Wu [:mwu] 2011-06-23 17:03:55 PDT
(In reply to comment #76)
> I see 9 listed with build IDs from the 22nd; see the following one, which
> has 20110622030205 as the build ID:
> https://crash-stats.mozilla.com/report/index/0fd414b9-28c1-40bd-9952-
> b3f8e2110623
> 

Compare the xul.dll version with the crash report in https://crash-stats.mozilla.com/report/index/003142e0-98d4-42ee-8f78-181802110623 . The DLLs are the same but the build ID is different: 20110620030203. So this particular report looks like another broken update.
Comment 78 Robert Strong [:rstrong] (use needinfo to contact me) 2011-06-23 17:14:29 PDT
(In reply to comment #77)
> (In reply to comment #76)
> > I see 9 listed with build IDs from the 22nd, see the following one that has
> > 20110622030205 as the build ID:
> > https://crash-stats.mozilla.com/report/index/0fd414b9-28c1-40bd-9952-
> > b3f8e2110623
> > 
> 
> Compare the xul.dll version with the crash report in
> https://crash-stats.mozilla.com/report/index/003142e0-98d4-42ee-8f78-
> 181802110623 . The DLLs are the same but the build ID is different:
> 20110620030203. So this particular report looks like another broken update.
The build id should be 20110620030833 vs. 20110620030203 as reported in the crash report.

20110620030203 and 20110620030833 are both builds from 2011-06-20. Since we typically only update once per day, could this be misreporting, or did we offer two updates on 2011-06-20?
Comment 79 Robert Strong [:rstrong] (use needinfo to contact me) 2011-06-23 17:21:09 PDT
Ben / Nick, can either of you verify whether we offered two updates on Windows on 2011-06-20?
Comment 80 Robert Strong [:rstrong] (use needinfo to contact me) 2011-06-23 17:28:46 PDT
note: I only see one update for 2011-06-20
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011-06-20-03-mozilla-central/

If that is the case, then it is possible that it was an hourly build (20110620030203) that was updated to 20110620030833 without application.ini being updated.
Comment 81 Michael Wu [:mwu] 2011-06-23 17:31:24 PDT
Looking at tbpl, I see two nightlies with build ids starting with 20110620:

20110620030833
20110620121237

Neither of which is 20110620030203.
Comment 82 Nick Thomas [:nthomas] 2011-06-23 17:50:56 PDT
The update system only knows about 20110620030833 for m-c win32 nightlies on the 20th. But there was a 20110620030203 for the Win64 nightly that day. How does crash-stats handle the win32/win64 difference?
Comment 83 Robert Kaiser 2011-06-23 18:22:00 PDT
(In reply to comment #82)
> How
> does crash-stats handle the win32/win64 difference?

It reports the build ID and the architecture; if it says "amd64", then I guess it's probably a 64-bit build.
Comment 84 Robert Strong [:rstrong] (use needinfo to contact me) 2011-06-23 18:22:52 PDT
Thanks Nick! It appears that the buildid and file versions are correct for a Win64 build.
Comment 85 Robert Strong [:rstrong] (use needinfo to contact me) 2011-06-24 01:26:44 PDT
(In reply to comment #67)
> (In reply to comment #66)
> > In that case we should try to verify that this are "our" builds, i.e. from
> > our pristine source, and not something someone built himself (or some state
> > of aurora a distro has and happened to rebuild something from without
> > updating the source).
> 
> As I understand it, getting working crash reports from non-mozilla builds
> isn't possible. The crash stacks require debugging information from the
> build and hopefully only our own build machines can upload that debugging
> information to the symbol server. A hash is used to match crash reports with
> the right symbols.
> 
> However - I've found something interesting in comparing two crash reports
> with different build IDs:
> Build ID 20110611042006:
> https://crash-stats.mozilla.com/report/index/de2099d9-5809-4c7a-9839-
> 3a7ae2110621
This one has the correct build ID for the DLL versions.

> Build ID 20110621042010:
> https://crash-stats.mozilla.com/report/index/d7e991bb-a3ef-4316-a314-
> dbc312110621
and this one doesn't.

Another strange thing about this one is that the install time is 2011-06-21 19:31:05, with an age of 5 seconds since it was installed, while the DLLs reported are from 10 days earlier.

Perhaps a startup crash with the new install while another installed version is already running?
Comment 86 Ted Mielczarek [:ted.mielczarek] 2011-06-24 07:03:41 PDT
FYI, "Install Time" is just calculated as "time since the version with this Build ID was first run":
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/nsExceptionHandler.cpp#977
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/nsExceptionHandler.cpp#913

We store build-id named files in "%APPDATA%\Mozilla\Firefox\Crash Reports" with timestamps in them.
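
In outline, the scheme works like the following simplified sketch (not the actual nsExceptionHandler.cpp code; the file naming is an assumption based on the links above):

#include <ctime>
#include <filesystem>
#include <fstream>
#include <string>

// Simplified sketch: a per-build-ID stamp file records when that build
// was first run; "install time" is read back from it on later runs.
std::time_t GetOrSetInstallTime(const std::filesystem::path& crashReportsDir,
                                const std::string& buildId) {
    std::filesystem::path stamp = crashReportsDir / ("InstallTime" + buildId);
    std::time_t t = 0;
    std::ifstream in(stamp);
    if (in >> t)
        return t;                       // seen this build before
    t = std::time(nullptr);             // first run of this build: record now
    std::ofstream(stamp) << t;
    return t;
}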
