Closed Bug 691847 Opened 13 years ago Closed 13 years ago

Firefox Startup Crash @ js::Bindings::getLocalNameArray

Categories

(Core :: JavaScript Engine, defect)

7 Branch
x86
Windows 7
defect
Not set
critical

Tracking


VERIFIED FIXED
mozilla11
Tracking Status
firefox8 + ---
firefox9 - ---
firefox10 - ---
firefox11 + verified

People

(Reporter: marcia, Assigned: mwu)

References

(Blocks 1 open bug)

Details

(Keywords: crash, Whiteboard: [qa!])

Crash Data

Attachments

(4 files)

Seen while reviewing the Firefox 7 explosive report:

https://crash-stats.mozilla.com/report/list?signature=js::Bindings::getLocalNameArray%28JSContext*,%20js::Vector%3CJSAtom*,%20int,%20js::TempAllocPolicy%3E*%29

The comments don't seem to be terribly useful and there are no correlations showing, although I can see if I can hunt some down manually.

Bug 672262 was previously filed for this same signature.

Frame 	Module 	Signature 	Source
0 	mozjs.dll 	js::Bindings::getLocalNameArray 	js/src/jsscript.cpp:179
1 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:4598
2 	mozcrt19.dll 	arena_dalloc_small 	obj-firefox/memory/jemalloc/crtsrc/jemalloc.c:4054
3 	mozjs.dll 	js::Execute 	js/src/jsinterp.cpp:908
4 	mozjs.dll 	js::ExternalExecute 	js/src/jsinterp.cpp:944
5 	mozjs.dll 	JS_ExecuteScriptVersion 	js/src/jsapi.cpp:4932
6 	xul.dll 	mozJSComponentLoader::GlobalForLocation 	js/src/xpconnect/loader/mozJSComponentLoader.cpp:1138
7 	xul.dll 	mozJSComponentLoader::ImportInto 	js/src/xpconnect/loader/mozJSComponentLoader.cpp:1364
8 	xul.dll 	mozJSComponentLoader::Import 	js/src/xpconnect/loader/mozJSComponentLoader.cpp:1271
9 	xul.dll 	nsXPCComponents_Utils::Import 	js/src/xpconnect/src/xpccomponents.cpp:3726
10 	xul.dll 	NS_InvokeByIndex_P 	xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:102
11 	xul.dll 	XPC_WN_CallMethod 	js/src/xpconnect/src/xpcwrappednativejsops.cpp:1592
12 	mozjs.dll 	js::Invoke 	js/src/jsinterp.cpp:656
13 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:4091
14 	mozcrt19.dll 	arena_dalloc_small 	obj-firefox/memory/jemalloc/crtsrc/jemalloc.c:4054
15 	mozjs.dll 	js::Execute 	js/src/jsinterp.cpp:908
16 	mozjs.dll 	js::ExternalExecute 	js/src/jsinterp.cpp:944
17 	mozjs.dll 	JS_ExecuteScriptVersion 	js/src/jsapi.cpp:4932
18 	xul.dll 	mozJSComponentLoader::GlobalForLocation 	js/src/xpconnect/loader/mozJSComponentLoader.cpp:1138
19 	xul.dll 	mozJSComponentLoader::LoadModuleImpl 	js/src/xpconnect/loader/mozJSComponentLoader.cpp:668
20 	xul.dll 	mozJSComponentLoader::LoadModuleFromJAR 	js/src/xpconnect/loader/mozJSComponentLoader.cpp:636
21 	xul.dll 	PL_DHashTableOperate 	obj-firefox/xpcom/build/pldhash.c:625
22 	xul.dll 	nsFactoryEntry::GetFactory 	xpcom/components/nsComponentManager.cpp:1970
23 	xul.dll 	nsComponentManagerImpl::CreateInstanceByContractID 	xpcom/components/nsComponentManager.cpp:1292
24 	xul.dll 	nsComponentManagerImpl::GetServiceByContractID 	xpcom/components/nsComponentManager.cpp:1698
25 	xul.dll 	CallGetService 	obj-firefox/xpcom/build/nsComponentManagerUtils.cpp:94
26 	xul.dll 	nsGetServiceByContractIDWithError::operator 	obj-firefox/xpcom/build/nsComponentManagerUtils.cpp:288
27 	xul.dll 	nsAppStartupNotifier::Observe 	embedding/components/appstartup/src/nsAppStartupNotifier.cpp:100
28 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3394
29 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:107
30 	firefox.exe 	firefox.exe@0x4043 	
31 	firefox.exe 	_RTC_Initialize 	
32 	mozcrt19.dll 	_initterm 	obj-firefox/memory/jemalloc/crtsrc/crt0dat.c:852
33 	firefox.exe 	firefox.exe@0x2087 	
34 	ntdll.dll 	WinSqmSetIfMaxDWORD 	
35 	ntdll.dll 	_RtlUserThreadStart 	
36 	firefox.exe 	firefox.exe@0x1cef 	
37 	firefox.exe 	firefox.exe@0x1cef
Hmm, interestingly, those all seem to happen with 7.0 and not 7.0.1!
Umm, I'll take that back, it exists in 7.0.1 as well, from what I see.
OK, so it happens with the 7.0 and 7.0.1 release versions, but at quite low volume. It is very high in volume with 7.0b6, though (see https://crash-stats.mozilla.com/topcrasher/byversion/Firefox/7.0b6/7), but it only started spiking on that build over the last weekend, i.e. after 7.0 final had already been released, and possibly even after 8.0b1 had become public.
Another crash report: https://crash-stats.mozilla.com/report/index/5d0e3012-610a-408d-968b-488fd2111107
The user tried to upgrade Firefox from 7.0.1 to 8, and at the end of the install process he was told to restart his computer, which he did. But the computer stayed stuck on the startup screen for 5 minutes, so he did a hard reset. After checking the integrity of his computer, he started Firefox and got this crash report. Now, each time he tries to uninstall/reinstall Firefox, the installer says an install is in progress and that he has to restart his computer to finish it.
This has exploded to topcrash #2 on 7 in yesterday's data. :(
With some tutoring by billm I pulled a minidump up in a debugger for this, and it does seem to confirm that fun->script() is NULL.  Which makes no sense, of course.  I think it's time for skidmarks.  That'll be first-ish thing tomorrow morning.
Is there a known workaround for this bug? On various Firefox support boards, some users are hitting this crash at startup and are unable to start Firefox and browse. :/
Do you think properly removing and reinstalling Firefox (without erasing the profile data) could fix it?
topcrash #1 in Firefox 7 data yesterday by far. :(

2523 crashes per million ADU and >18k processed crashes in a day; with the 10% throttling taken into account, this means we received 180-190k (!) crashes with this signature yesterday.
Hello Jeff - Do you have any update on this? We will likely be discussing this at the 2 PM meeting today.

(In reply to Jeff Walden (remove +bmo to email) from comment #6)
> With some tutoring by billm I pulled a minidump up in a debugger for this,
> and it does seem to confirm that fun->script() is NULL.  Which makes no
> sense, of course.  I think it's time for skidmarks.  That'll be first-ish
> thing tomorrow morning.
Blocks: 701456
I have a user on the phone who I think has this problem.  What should I be asking?
Sorry, had a late night last night, haven't gotten started yet -- doing so now.

This may just be me inexplicably being stupid, but I'm having trouble finding the JSOP_GETUPVAR_DBG code at all.  I see it in my mozilla-release tree, but not in my mozilla-aurora or mozilla-beta.  (All three trees are at least older than the merge that happened Tuesday.)  So this may only affect the current release, as far as our code goes.  I'll still try to skidmark up something, but be aware that it looks like (unless the crash has morphed) I'll only be able to test against the mozilla-release tree -- at best.

As for your user (who is probably no longer on the phone), what would be useful would be if he could get the crash in a debugger, so that |fun| could be examined.  I suspect a user talking to SUMO is unlikely to be able to do this, unfortunately.  :-\

...and more fun.  The jscrashreport.h stuff that billm pointed me at yesterday for skidding seems not to be in my mozilla-release tree.  This will require some thinking.
Talking to billm about the jscrashreport.h stuff now...

The change that removed the crashing code landed for the Firefox 8 release. So none of mozilla-release, mozilla-aurora, or mozilla-beta has the code, and none of them should show any crashes for this.

I guess I look for the Firefox 7 source tree now to try and hack it?  :-(
After some investigation of this crash, I believe it may be related to the other known crash we have with http://www.trusteer.com/product/trusteer-rapport.

    * Bug 699776
    * Bug 681521

There are no correlations showing up in crash stats, but I see a user that has rookscom.dll in their report.

I am installing Trusteer Rapport now to see if I can reproduce. Note that this software is distributed by various online banks as well.
To follow up on Jeff's comment, the code that's crashing was removed in bug 620316. That landed for Firefox 8. So we no longer ship any releases that will crash here.

So is this a problem that happens (only?) when people upgrade from Firefox 7? That would explain the big increase recently.
FYI we have turned off ->8.0 automatic updates because of this startup crash.
A lot of the comments mention System Restore and power failures.  Could this be some sort of partially applied update?
Of course, System Restore might just be what normal users do when things break ...
I suspect they are just doing system restore and I doubt that there are a ton of users that have a power failure in the middle of an update
When Sheila and I sat down and looked at the data there was a huge spike associated with Firefox 7 (about 35k crashes, which you have to multiply because of throttling) and a total of 3 (just 3, not 3k) crashes with this signature for Firefox 8.  So it's specific to a reporter with a reported version of Firefox 7.

My guess would be that it's triggered as part of the update and it crashes and leaves the browser in an unstartable state with a reporter that says it's Firefox 7

Cheng, do you think it's possible we get a set of binaries and profile for a user with this problem?
https://crash-stats.mozilla.com/report/index/208e5df9-126c-4f9d-b598-82d472111110 is one report that shows the RapportTanzan7.dll in the module list.
I have tested a few times in the lab and so far I have not been able to reproduce the crash with the Trusteer Rapport Emerald Build 1108.55. I tested on a Win XP machine going from 7.0.1->8 as well as on a Windows 7 machine following the same path.

Trying some other avenues now. I downloaded Trusteer from http://www.trusteer.com/webform/download-rapport. I am not sure how different the bank versions are from what they are serving on their site.

When I look at their .bin directory I do see a myriad of dlls including the 7 and 8 dlls.
https://crash-stats.mozilla.com/report/index/d68e81d0-9094-4147-8ac3-244512111110 is a user with this crash who doesn't have trusteer.  I don't think that this is actually a strong correlation and it's possibly a red herring.

Bug 699776 is the bug for startup crashes with Trusteer.
Bug 701535 is logged on the Socorro side to try to get some correlations. I will await what that correlation report says before speculating further on the cause.
(In reply to Marcia Knous [:marcia] from comment #21)
> I have tested a few times in the lab and so far I have not been able to
> reproduce the crash with the Trusteer Rapport Emerald Build 1108.55. I tested
> on a Win XP machine going from 7.0.1->8 as well as on a Windows 7 machine
> following the same path.

Look at my comment here: https://bugzilla.mozilla.org/show_bug.cgi?id=699776#c3
Trusteer claims the issue is fixed in Rapport Build 1108.47. You need to find and test previous versions of Rapport to reproduce the crashes.
If I could find an old version, I would gladly test it. Right now the only one I can find is from their site directly. I am going to wait for the bug logged in Comment 23 to show correlations.

(In reply to Loic from comment #24)
> (In reply to Marcia Knous [:marcia] from comment #21)
> > I have tested a few times in the lab and so far I have not been able to
> > reproduce the crash with the Trusteer Rapport Emerald Build 1108.55. I tested
> > on a Win XP machine going from 7.0.1->8 as well as on a Windows 7 machine
> > following the same path.
> 
> Look at my comment here:
> https://bugzilla.mozilla.org/show_bug.cgi?id=699776#c3
> Trusteer claims the issue is fixed in Rapport Build 1108.47. You need to
> find and test previous versions of Rapport to reproduce the crashes.
I was able to ferret out some correlations from chofmann's report.
(In reply to Marcia Knous [:marcia] from comment #25)
> If I could find an old version, I would gladly test one. Right now the only
> one I can find is from their site directly. I am going to wait for bug
> logged in Comment 23 to show correlations.

I have attached a previous version of Trusteer Rapport, v3.5.1105.45 from July 2011 (for Windows). You can use http://www.virustotal.com to get more info about the exe.
I attached the specifics from that signature from the module report in Comment 26.

https://crash-analysis.mozilla.com/crash_analysis/20111110/20111110_Firefox_7.0.1-interesting-addons-with-versions.txt.gz, which is the addon report, shows no correlations for js::Bindings::getLocalNameArray(JSContext*, js::Vector<JSAtom*, int, js::TempAllocPolicy>*)|EXCEPTION_ACCESS_VIOLATION_READ based on 17737 crashes.

So it looks as if right now all we have to go on is what is showing up in the module report.
Attachment #573694 - Attachment mime type: text/plain → application/octet-stream
https://bug701535.bugzilla.mozilla.org/attachment.cgi?id=573688 shows a fairly high frequency correlation to mozsqlite3.dll	- 32287.
The original change which made us start doing |fun->script().bindings....| was this one, which has been around since the start of the year:

http://hg.mozilla.org/releases/mozilla-release/rev/0d9a5752b1cf

There have been slight tweaks -- the name-array to vector of names change -- but nothing which should affect the crash.  The patch moved the nvars/nargs fields from JSFunction to JSScript, so we deref fun->script() where before we did not.  But it should have been the case all along that fun->script() was always non-null, so it shouldn't have been crashing regardless.

I've never been able to see a problem with this, in a fair bit of looking at it many different times.  Maybe some other JS hacker can see something I've missed here?  I am grasping at straws with this.  :-\
(In reply to Robert Strong [:rstrong] (do not email) from comment #18)
> I suspect they are just doing system restore and I doubt that there are a
> ton of users that have a power failure in the middle of an update
I checked the debug identifiers on firefox.exe, mozjs.dll, mozcrt19.dll, mozcpp19.dll, plc4.dll, plds4.dll, nspr4.dll, mozsqlite3.dl, nssutil3.dll, softokn3.dll, nss3.dll, ssl3.dll, smime3.dll, and xul.dll for
https://crash-stats.mozilla.com/report/index/d68e81d0-9094-4147-8ac3-244512111110

and they were correct for Firefox 7.0.1. So this doesn't appear to be due to a busted update.
(In reply to Robert Strong [:rstrong] (do not email) from comment #31)
> So this doesn't appear to be due to a busted update.

Rob, could you explain how the update process works? It seems like this probably has something to do with updates, given the recent spike. Does the updater run JavaScript at all? What code runs in the old version and what code runs in the new version?

Also, this might be a silly question, but could we have forgotten to bump the bytecode version at some point?
(In reply to Bill McCloskey (:billm) from comment #32)
> (In reply to Robert Strong [:rstrong] (do not email) from comment #31)
> > So this doesn't appear to be due to a busted update.
> 
> Rob, could you explain how the update process works? It seems like this
> probably has something to do with updates, given the recent spike. Does the
> updater run JavaScript at all?
Not at all... it is a separate binary that doesn't rely on gecko.

> What code runs in the old version and what
> code runs in the new version?
Not sure what you mean by this. The updater updates the files on the file system.
http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/updater/

> Also, this might be a silly question, but could we have forgotten to bump
> the bytecode version at some point?
For partial updates a diff is performed between the old and the new. For complete updates the new files are placed on the file system.

The fact that the version numbers and debug identifiers are correct for Firefox 7.0.1 in the crash report implies that the binary files are all for the same version.
(In reply to Robert Strong [:rstrong] (do not email) from comment #33)
> 
> The fact that the version numbers and debug identifiers are correct for
> Firefox 7.0.1 in the crash report implies that the binary files are all for
> the same version.

If you want us to check, we could run the frankeninstall report for 7.0.1.  (See bug 666065 for reference)
I spot-checked three of the crash reports, and they all had the correct 7.0.1 binary files based on the debug signature and, where applicable, the file version. That report just provides the version numbers for the binary files, so I don't think it will show anything much different, but if anyone wants it checked I'm fine with running my log-parsing script against the output of the logs.
(In reply to Robert Strong [:rstrong] (do not email) from comment #33)
> The fact that the version numbers and debug identifiers are correct for
> Firefox 7.0.1 in the crash report implies that the binary files are all for
> the same version.

One other point on this subject - if we think these installs aren't frankeninstalls, yet installing 8 on top fixes the issue, could that mean that 8 binaries were never laid down but some sort of profile update did still occur?
The fact that system restores (on top of updates) are causing this bug (see crash comments) could mean an out of sync profile and binaries.
Installing 8 on top of 7.0.1 might very well fix it without it being a frankenfox. For example, the startupCache might not be getting invalidated properly.

A system restore will screw up lots of things and always has because system restore only restores binary files. Also, I suspect that many of the system restore comments are due to performing a system restore after Firefox has already stopped working.
(In reply to Robert Strong [:rstrong] (do not email) from comment #38)
> Installing 8 on top of 7.0.1 might very well fix it without it being a
> frankenfox. For example, the startupCache might not be getting invalidated
> properly.
> 
> A system restore will screw up lots of things and always has because system
> restore only restores binary files.
Specifically, IIRC, unless a complete restore point has been created, which is often not the case; we definitely don't create one at this time because that could add several minutes to the update process. We are evaluating creating the restore point with the service in the future, after we have the service. It is important to note that this has always been the case regarding System Restore and Firefox.

> Also, I suspect that many of the system
> restore comments are due to performing a system restore after Firefox has
> already stopped working.
nthomas tested restoring to 7 and commented as follows on IRC
"so on XP installing Fx7, creating a restore point, update to 8.0, restore to snapshot will give you fx7 dll's with fx8's omni.jar
and firefox crashes on start, https://crash-stats.mozilla.com/report/index/bp-1030a175-94f5-4579-9a89-b8c932111110"

blizzard and I are discussing the possibility of a perf bug causing users to restore back to 7 en masse.

That could also just be one of a couple of causes of this bug, but may point us in the right direction nonetheless.
(In reply to Alex Keybl [:akeybl] from comment #40)
> nthomas tested restoring to 7 and commented as follows on IRC
> "so on XP installing Fx7, creating a restore point, update to 8.0, restore
> to snapshot will give you fx7 dll's with fx8's omni.jar
> and firefox crashes on start,
> https://crash-stats.mozilla.com/report/index/bp-1030a175-94f5-4579-9a89-
> b8c932111110"
Yes, that and other files aren't binary files and typically won't be restored via system restore which has always been the case.

> blizzard and I are discussing the possibility of a perf bug causing users to
> restore back to 7 en masse.
> 
> That could also just be one of a couple of causes of this bug, but may point
> us in the right direction nonetheless.
I think that is the right track to take.
Early returns on the mass emailing:

1) Almost everyone (about 15) who replied had already fixed the problem by reinstalling Firefox so no useful data there.

2) I asked back for information about system restore: 

"I had to do a system restore shortly after Firefox and MacAfee both installed major updates - the restore must have set me back to Firefox 7, which was apparently corrupted or incomplete. As soon as I got online with IE (which had its own bizarre glitches) and downloaded version 8 and re-installed it, everything worked fine."

"Thank you for getting back to me. Yes, I had restored my system just prior to the crashes (and hadn't made the correlation). 

As a web developer, firefox+firebug is essential, and I was forced to uninstall/reinstall firefox 8. It solved the launch issue for me, but unfortunately leaves me unable to provide further assistance."

"reinstalled mozilla, restored system, now it all works ok."

===

Other useful replies:

"When Mozilla wanted to upgrade it didn't give me enough time to turn off my firewall thus it crashed. My HP computer tech fixed it by installing it. Thank you for the e-mail" -- This one sounds like it wasn't related to a system restore(?)

I also have one install folder (in my instructions I had them make the upload before reinstalling, so hopefully it's a 7.0.1 one, but I don't have Windows so I can't check). -- Ask me in person and I'll pass it along. It didn't come with an email address, sadly, so I can't tell which email (if any) it goes with.

Separately, I have a (after fixing) about:support from a user who had to reinstall everything. It has the following add-ons:
Name 	Version 	Enabled 	ID
Add to Amazon Wish List Button	1.7	false	amznUWL2@amazon.com
AVG Safe Search 	9.0.0.911	false	{3f963a5b-e555-4543-90e2-c3908898db71}

More notably, it looks like she lost everything since the profile has almost no modified prefs.
So can we reproduce this crash just by replacing Firefox 7's omni.jar with the Firefox 8 version and then starting it?

Even if that's not what's causing the problem for everyone, it might be helpful to step through in a debugger and see why we're crashing in this particular spot.
(In reply to Alex Keybl [:akeybl] from comment #40)
> nthomas tested restoring to 7 and commented as follows on IRC
> "so on XP installing Fx7, creating a restore point, update to 8.0, restore
> to snapshot will give you fx7 dll's with fx8's omni.jar
> and firefox crashes on start,
> https://crash-stats.mozilla.com/report/index/bp-1030a175-94f5-4579-9a89-
> b8c932111110"

omni.jar contains compiled versions of JS modules and components. I don't know a lot about this but I'm guessing it is some kind of bytecode that is likely Firefox version specific, so trying to run the code generated from one version of Firefox on another is probably a bad thing. Perhaps that is the source of this? The stack certainly looks like it's during module or component load. Does this compiled form have any kind of sanity checks to make sure it is run against the right version?
[jwalden@wheres-wally tmp]$ unzip -v omni.jar | grep -v 'css$' | grep -v 'jsm$' | grep -v 'dtd$' | grep -v '\.js$' | grep -v 'xml$' | grep -v 'properties$' | grep -v 'gif$' | grep -v 'png$' | grep -v 'xul$' | grep -v 'xml$' | grep -v 'html$' | grep -v 'svg$' | grep -v 'rdf$' | grep -v 'wav$' | grep -v 'xsl$' | grep -v 'manifest$' | grep -v 'htmlmathml-f.ent$'
warning [omni.jar]:  5050841 extra bytes at beginning or within zipfile
  (attempting to process anyway)
error [omni.jar]:  reported length of central directory is
  -5050841 bytes too long (Atari STZip zipfile?  J.H.Holm ZIPSPLIT 1.1
  zipfile?).  Compensating...
Archive:  omni.jar
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
  399799  Defl:X   177475  56% 01-01-2010 00:00 c34bce7a  components/browser.xpt
--------          -------  ---                            -------
16504995          4941667  70%                            1380 files

That's the only binary file I see in an omni.jar (from a trunk build) that's not in essence a data file.  .xpt specifies components and interfaces and things like that, so it could be problematic, but it seems unlikely to be at fault in the way this bug crashes.  I'd expect some sort of clearly-XPCOM-related crash if a bad .xpt file were at fault.

In short, I think we have a useful technique for reproducing this, even if the method of production is rather novel.  I'll take a debugger to a frankenbrowser in the morning and see what I can get from it.  I predict this will point out the problem without too much trouble.
I found this bug by following a crash report sent from my mum's computer. Nothing unusual happened when she updated from 7 to 8, except that FF crashed after the update (even in FF safe mode - at the first prompt, asking if we wanted to reset stuff or continue in safe mode). I uninstalled & reinstalled, without help. I system restored, without help. I uninstalled, removed registry entries (HKLM and HKCU) and Firefox directories (%appdata% local and roaming) and re-installed - same problem. The only way I've managed to get FF working is finding and downloading 7.0.1 again.

System is Win7 64-bit, running AVG. Happy to provide further info if requested. Mostly just wanted to point out that it's not all power failures, AV update clashes, and random system restores during an update :)
(In reply to Bill McCloskey (:billm) from comment #43)
> So can we reproduce this crash just by replacing Firefox 7's omni.jar with
> the Firefox 8 version and then starting it?

FTR I just tried this with Thunderbird. When attempting to start Thunderbird it just didn't do anything, no errors, no crash reports - nothing.
(In reply to Mantrid from comment #46)
> I found this bug by following a crash report sent from my mum's computer.
> Nothing unusual happened when she updated from 7 to 8, except that FF
> crashed after the update (even in FF safe mode - at the first prompt, asking
> if we wanted to reset stuff or continue in safe mode). I uninstalled &
> reinstalled, without help. I system restored, without help. I uninstalled,
> removed registry entries (HKLM and HKCU) and Firefox directories (%appdata%
> local and roaming) and re-installed - same problem. The only way I've
> managed to get FF working is finding and downloading 7.0.1 again.
> 
> System is Win7 64-bit, running AVG. Happy to provide further info if
> requested. Mostly just wanted to point out that it's not all power failures,
> AV update clashes, and random system restores during an update :)

Hi!  I think Cheng will contact you tomorrow.  It's interesting that re-installing didn't solve the problem for you.  Can you mail me at blizzard@mozilla.com with a phone number or non-obscured email address?  (I'm going to assume that your -mozilla address will only take bugzilla mail.)

Do you have a link to the crash report?  I think I want to verify the problem you saw was the same one we're talking about here, at least by signature.
Crash report is at https://crash-stats.mozilla.com/report/index/112c5b4b-d6f1-4ad7-976a-3ab7c2111110

You can e-mail me on the address listed here. It's just a wildcard domain I use, so I can track and block addresses. It's amazing how many places sell addresses or get hacked :/
(In reply to Jeff Walden (remove +bmo to email) from comment #45)
> [jwalden@wheres-wally tmp]$ unzip -v omni.jar | grep -v 'css$' | grep -v
> 'jsm$' | grep -v 'dtd$' | grep -v '\.js$' | grep -v 'xml$' | grep -v
> 'properties$' | grep -v 'gif$' | grep -v 'png$' | grep -v 'xul$' | grep -v
> 'xml$' | grep -v 'html$' | grep -v 'svg$' | grep -v 'rdf$' | grep -v 'wav$'
> | grep -v 'xsl$' | grep -v 'manifest$' | grep -v 'htmlmathml-f.ent$'
> warning [omni.jar]:  5050841 extra bytes at beginning or within zipfile
>   (attempting to process anyway)
> error [omni.jar]:  reported length of central directory is
>   -5050841 bytes too long (Atari STZip zipfile?  J.H.Holm ZIPSPLIT 1.1
>   zipfile?).  Compensating...
> Archive:  omni.jar
>  Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
> --------  ------  ------- ---- ---------- ----- --------  ----
>   399799  Defl:X   177475  56% 01-01-2010 00:00 c34bce7a 
> components/browser.xpt
> --------          -------  ---                            -------
> 16504995          4941667  70%                            1380 files
> 
> That's the only binary file I see in an omni.jar (from a trunk build) that's
> not in essence a data file.  .xpt specifies components and interfaces and
> stuff like that, so it could be problematic, but it seem unlikely to be so
> in the way of this bug.  I'd expect some sort of clearly-XPCOM-related crash
> if it were a bad .xpt file at fault.

The compiled forms have the same names as their original JS files, so .js and .jsm, which you excluded in your search. Basically everything under the jsloader directory of omni.jar is a binary version of a shipped component or module. You also might need to actually package the build to have them generated; I don't know where that happens, but I see them on my installed nightly.
Mantrid, do you have the crash info for the first crash (the one before you did a system restore)? I'm willing to bet it was a different crash, and that when users system restore they hit this one.
Cheng has done some outreach and it sounds like almost everyone has had problems as a result of a windows system restore.  People that he's gotten responses from who have re-installed Firefox generally don't have problems.

It also turns out that there were a lot of other reasons why people might have done a system restore, given external factors in the market on Tuesday, and tried to update their virus scanners as well.

Marcia has reproduced this problem in the lab as well, so we're just waiting on that install directory (and profile just in case) to see if we can look into the root cause with Waldo or another JS hacker.
In looking at the explosive report for Firefox 7 today - https://crash-analysis.mozilla.com/rkaiser/2011-11-10/2011-11-10.firefox.7.explosiveness.html - https://crash-stats.mozilla.com/report/list?signature=js%3A%3ABindings%3A%3AgetLocalNameArray - Mac and Linux users seem to be describing the same type of issue trying to update 7.0.1.
Crash Signature: [@ js::Bindings::getLocalNameArray(JSContext*, js::Vector<JSAtom*, int, js::TempAllocPolicy>*) ] → [@ js::Bindings::getLocalNameArray(JSContext*, js::Vector<JSAtom*, int, js::TempAllocPolicy>*) ] [@ js::Bindings::getLocalNameArray ]
Marcia is going to get some email addresses for the Mac users and Cheng is going to do some outreach and see if we can get a Mac example since that's an excellent canary.
(In reply to Robert Strong [:rstrong] (do not email) from comment #33)
> > Also, this might be a silly question, but could we have forgotten to bump
> > the bytecode version at some point?
> For partial updates a diff is performed between the old and the new. For
> complete updates the new files are placed on the file system.

The diff won't address a failure to update the bytecode version that's checked against a version stored in the startup cache, though.

> The fact that the version numbers and debug identifiers are correct for
> Firefox 7.0.1 in the crash report implies that the binary files are all for
> the same version.

Right. Let's say the installer/updater is off the hook. It still seems to me we could have a problem with some users' startup caches holding bytecodes that were retired and whose opcodes were recycled.

Here's a diff -u of `grep '^date:' ...` on the hg logs for jsopcode.tbl and jsxdrapi.h:

--- /tmp/od     2011-11-11 12:05:07.000000000 -0800
+++ /tmp/xd     2011-11-11 12:01:42.000000000 -0800
@@ -1,213 +1,173 @@
-date:        Thu Nov 10 12:34:24 2011 -0800
 date:        Wed Nov 09 17:43:46 2011 +0100
 date:        Wed Nov 09 17:42:27 2011 +0100
+date:        Thu Nov 03 16:39:08 2011 -0400
 date:        Tue Nov 01 09:01:04 2011 -0700
 date:        Thu Sep 22 17:35:25 2011 +0100
-date:        Sat Sep 17 19:14:22 2011 -0700
-date:        Sat Sep 17 16:32:43 2011 +0100
-date:        Tue Sep 06 22:34:23 2011 -0700
+date:        Fri Sep 02 08:46:00 2011 +0200
+date:        Thu Sep 08 16:36:51 2011 -0500
 date:        Sat Sep 03 12:22:10 2011 +0200
 date:        Fri Sep 02 16:52:13 2011 -0500
-date:        Wed Aug 17 06:48:14 2011 -0700
 date:        Sun Aug 14 19:51:16 2011 -0700
-date:        Thu Jul 28 09:16:53 2011 -0700
+date:        Tue Aug 09 13:29:11 2011 -0700
 date:        Thu Jul 21 18:53:37 2011 -0700
 date:        Sat Jul 16 13:47:58 2011 -0700
-date:        Sat Jul 16 08:25:22 2011 -0700
 date:        Fri Jul 15 14:40:15 2011 -0700
 date:        Fri Jul 08 18:46:04 2011 -0700
+date:        Tue Aug 09 10:21:12 2011 -0500
 date:        Wed Jul 27 18:23:16 2011 -0500
 date:        Mon Jul 18 10:58:56 2011 -0500
 date:        Fri Jul 08 17:58:10 2011 -0700
 date:        Fri Jul 01 16:24:32 2011 -0700

Lines starting with - mean jsopcode.tbl changed without jsxdrapi.h changing. Lines starting with + mean the opposite. It sure looks to me as though we had opcode changes without JSXDR_BYTECODE_VERSION changing.

/be
Brendan, can you make sure that the bytecode versions are different between 7.0.1 and the 8 we just released?
(In reply to Dave Townsend (:Mossop) from comment #50)
> The compiled forms have the same names as their original JS files, so .js
> and .jsm which you excluded in your search.

Egad.  I would love it if we had some other extension that made this difference clear.  Would it be possible to change this, and use a new extension, or an extra extension (.js.compiled), or something?  (New bug if so.)
[jwalden@wheres-wally release]$ ack JSXDR_BYTECODE
js/src/jsiter.h
54: * JSXDR_BYTECODE_VERSION.

js/src/jsemit.h
937: * Don't forget to update JSXDR_BYTECODE_VERSION in jsxdrapi.h for all such

js/src/jsxdrapi.h
225:#define JSXDR_BYTECODE_VERSION      (0xb973c0de - 89)

content/xul/document/public/nsIXULPrototypeCache.h
83:// See also JSXDR_BYTECODE_VERSION in jsxdrapi.h, which tracks incompatible JS
[jwalden@wheres-wally release]$ hg id
58f3edbff1b9 (GECKO701_2011092812_RELBRANCH) FIREFOX_7_0_1_BUILD1/SEAMONKEY_2_4_1_BUILD1/SEAMONKEY_2_4_1_RELEASE/THUNDERBIRD_7_0_1_BUILD1/THUNDERBIRD_7_0_1_RELEASE

So nothing's checking bytecode version in 7.0.1.  :-(
We had a big "oh s#!7" moment in the JS pit, realizing that the startup cache redo assumes the omnijar written by the installer matches the executable for all sorts of things, definitely including bytecode version.

That's not a safe assumption, at least in part due to Windows System Restore, which restores EXEs and DLLs but not Omnijars.

There are other ways to get inconsistent executable vs. startup cache. We must defend. Mwu can you take this one?

/be
(In reply to Brendan Eich [:brendan] from comment #59)
> We had a big "oh s#!7" moment in the JS pit, realizing that the startup
> cache redo assumes the omnijar written by the installer matches the
> executable for all sorts of things, definitely including bytecode version.
> 
> That's not a safe assumption, at least in part due to Windows System
> Restore, which restores EXEs and DLLs but not Omnijars.
> 
> There are other ways to get inconsistent executable vs. startup cache. We
> must defend. Mwu can you take this one?
> 
> /be

I'll take it, but note that this will just be a small band-aid over a gaping wound. Any mismatches between the dlls/exes and omnijars cannot be tolerated, and simply verifying the bytecode is not enough. We will still have severe problems even with bytecode checking if we cannot ensure omnijars are correct.

Note that the profile startup cache version checking/invalidation is even more strict than bytecode version checking - it attempts to invalidate all of the startup cache for any install update.

I'll also open another bug to rename omni.jar to omnijar.dll to ensure system restore does the right thing. We'll also want to add a buildid to omnijar to ensure we never attempt to start with a mismatched omnijar.
Assignee: general → mwu
This adds a bytecode version check to xdr serialization. It's cheap to check so we might as well do it all the time and save spidermonkey users from shooting themselves in the foot.
Attachment #573932 - Flags: review?
Attachment #573932 - Flags: review? → review?(jwalden+bmo)
Comment on attachment 573932 [details] [diff] [review]
Check bytecode version during xdr deserialization

Doing this for every script still seems wrong, but if it's not much of a size hit and no noticeable cycle hit, maybe. So I agree we want defense at the JS engine level, to avoid anyone embedding it feeling our pain like this again. But can we find a way to do this once per JSXDRState?

/be
Comment on attachment 573932 [details] [diff] [review]
Check bytecode version during xdr deserialization

Looks correct to me -- as far as I understand things.

I don't know anything about the XDR API, so to confirm: it is not incumbent on users to do any of their own bytecode version checking, and they should be relying on checks done within the XDR methods themselves?  That would seem logical, that XDR itself would do those checks.  But I just want to confirm that, to be safe.

I'm guessing we'd land this on trunk and on branches, of course.  Something else would have to happen for release trees or what-have-you.  It's not entirely clear to me yet what that will be, or if it will involve bytecode version checks like this.
Attachment #573932 - Flags: review?(jwalden+bmo) → review+
(Rather, I should say I don't know "much", not "anything".  :-)  I'm familiar with the idea at a high level, but not with the exact way it's supposed to be used.)
(In reply to Brendan Eich [:brendan] from comment #62)
> Doing this for every script still seems wrong, but if it's not much of a
> size hit and no noticeable cycle hit, maybe. So I agree we want defense at
> the JS engine level, to avoid anyone embedding it feeling our pain like this
> again. But can we find a way to do this once per JSXDRState?
> 

It seems safer to do this per file. Also, gecko users only use JSXDRState once and throw it away, so it wouldn't save us anything. Other users, maybe, but it's only 4 bytes per script.

(In reply to Jeff Walden (remove +bmo to email) from comment #63)
> I don't know anything about the XDR API, so it is not incumbent on users to
> do any of their own bytecode version checking, and that they should be
> relying on checks done within the XDR methods themselves?  That would seem
> logical, that XDR itself would do those checks.  But I just want to confirm
> that, to be safe.
> 

We currently require users to do their own bytecode checking, but now we're removing that requirement.

> I'm guessing we'd land this on trunk and on branches, of course.  Something
> else would have to happen for release trees or what-have-you.  It's not
> entirely clear to me yet what that will be, or if it will involve bytecode
> version checks like this.

This should only be needed on builds that have startup cache built into omnijar. IIRC, that didn't happen until after Firefox 4, so fixing Firefox 8 and after should be enough.
See bug #701944, but what we really need is something that can tell us whether omnijar matches the binary. I'm not sure a bytecode version check will tell us that (probably not, or maybe too late?). What we need is a simple check we can implement that throws up a simple win32 dialog.
https://hg.mozilla.org/mozilla-central/rev/470d6053b749

Marking fixed, but as noted in the comments, the bigger issue hasn't been fixed. Bug 701944 will help users identify broken installs and bug 701875 will prevent system restore from breaking Firefox installs.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Attachment #573932 - Flags: approval-mozilla-release?
Attachment #573932 - Flags: approval-mozilla-beta?
Attachment #573932 - Flags: approval-mozilla-aurora?
Would it be possible to get a try build of FF8+this patch for testing?
I believe a try build of FF9+this patch would also be desirable for QA to be able to test whether there is a startup crash after restoring from 9->8.
(In reply to Alex Keybl [:akeybl] from comment #69)
> I believe a try build of FF9+this patch would also be desirable for QA to be
> able to test whether there is a startup crash after restoring from 9->8.

Pushed on top of FIREFOX_9_0b1_RELEASE.

Status:  https://tbpl.mozilla.org/?tree=Try&rev=1bbf9e6288b1
Builds: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/nthomas@mozilla.com-1bbf9e6288b1
(In reply to Alex Keybl [:akeybl] from comment #68)
> Would it be possible to get a try build of FF8+this patch for testing?

Attachment 573932 [details] [diff] doesn't apply to mozilla-release on FIREFOX_8_0_RELEASE. It fails the first hunk of js/src/jsxdrapi.cpp, because JS_XDRScriptObject is different.
(In reply to Michael Wu [:mwu] from comment #72)
> Created attachment 574015 [details] [diff] [review]
> Check bytecode version during xdr deserialization (FF8)

Pushed on top of FIREFOX_8_0_RELEASE.

Status: https://tbpl.mozilla.org/?tree=Try&rev=8e4c5469bb44
Builds: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/nthomas@mozilla.com-8e4c5469bb44
Hi.  I have a few questions about this patch.

1. Does this prevent us from crashing when we load bytecode from the omni.jar file?

2. Will this cause us to properly invalidate the local startup cache file that's in a person's profile?  (Which we were concerned wasn't happening?)  Is that related to this at all?

3. What's the end user experience with this patch when we end up with a mis-matched omni.jar file?  Right now we get a crash.  In previous releases we got a non-startup.  Like, nothing happened at all.
(In reply to Christopher Blizzard (:blizzard) from comment #74)
> Hi.  I have a few questions about this patch.
> 
> 1. Does this prevent us from crashing when we load bytecode from the
> omni.jar file?
> 

It prevents us from loading bytecode that we shouldn't be trying to load, so yes.

> 2. Will this cause us to properly invalidate the local startup cache file
> that's in a person's profile?  (Which we were concerned wasn't happening?) 
> Is that related to this at all?
> 

Not related. The profile startup cache should always be invalidated after an update.

> 3. What's the end user experience with this patch when we end up with a
> mis-matched omni.jar file?  Right now we get a crash.  In previous releases
> we got a non-startup.  Like, nothing happened at all.

This only prevents a crash in the JS engine. In the best case, the user gets a UI that mostly works, and in the worst case, nothing might happen or we might just crash in a different part of gecko. We need bug 701944 and bug 701875 to really fix the issue.
https://crash-analysis.mozilla.com/rkaiser/2011-11-11/2011-11-11.firefox.8.explosiveness.html is showing a new signature ([@ JSObject::getGlobal() ] ) which is Firefox 8 beta users who are crashing when trying to move to Firefox 9. https://crash-stats.mozilla.com/report/list?signature=JSObject%3A%3AgetGlobal%28%29 are those crashes which all mention updating.

Will this fix address this scenario as well?
(In reply to Marcia Knous [:marcia] from comment #76)
> https://crash-analysis.mozilla.com/rkaiser/2011-11-11/2011-11-11.firefox.8.
> explosiveness.html is showing a new signature ([@ JSObject::getGlobal() ] )
> which is Firefox 8 beta users who are crashing when trying to move to
> Firefox 9.
> https://crash-stats.mozilla.com/report/
> list?signature=JSObject%3A%3AgetGlobal%28%29 are those crashes which all
> mention updating.
> 
> Will this fix address this scenario as well?

It'll fix the crash, but their install would still have the wrong omnijar.
QA is reporting that "7.0.1->8 + bug 691847->7.0.1: No crash. Launches normally."

Is this expected behavior? I thought this bug only resolved 8.0.1->9->8.0.1. Thanks for the clarification.
If the script in omni.jar is invalid due to a failed bytecode check, I would guess fallback behavior is to load the script from scratch.  If that's the case, the script might then "work" fine.  I'd guess that's what's happening for QA there.

But of course this wouldn't affect other cached resources.  I noticed in my initial 7-with-8-omni debugging that the profile manager UI (what you get with ./firefox -P) was busted, almost certainly from the omni.jar mismatch.  There might be a similar bust-y thing in an 8->9->8 situation, but I wouldn't know what it is -- someone more familiar with the contents of omni.jar, and how they depend on the Gecko they depend on, would know better.
(In reply to Alex Keybl [:akeybl] from comment #78)
> QA is reporting that "7.0.1->8 + bug 691847->7.0.1: No crash. Launches
> normally."
> 
> Is this expected behavior? I thought this bug only resolved 8.0.1->9->8.0.1.
> Thanks for the clarification.

It turns out I didn't test it correctly. Re-testing shows:

7.0.1->8+bug 691847->7.0.1 no crash, no launch.
8->9+bug 691847->8 no crash, no launch.
(In reply to juan becerra [:juanb] from comment #80)
> It turns out I didn't test it correctly. Re-testing shows:
> 
> 7.0.1->8+bug 691847->7.0.1 no crash, no launch.

I can reproduce this but don't understand it, as I would have expected a crash. The differences between the mozconfigs for the 8.0 release build and the simulacrum on try are
* no official branding on try
* telemetry not enabled on try
* pgo not enabled on try
Hard to say if that could cause any significant differences or not.
Comment on attachment 573932 [details] [diff] [review]
Check bytecode version during xdr deserialization

[Triage Comment]
We don't want to hide the system restore startup crash reports from ourselves on aurora/beta (as QA has reported we would), so denying there.

Sent email to RS and Waldo about backing out of m-c if appropriate.
Attachment #573932 - Flags: approval-mozilla-beta?
Attachment #573932 - Flags: approval-mozilla-beta-
Attachment #573932 - Flags: approval-mozilla-aurora?
Attachment #573932 - Flags: approval-mozilla-aurora-
Just FYI, this is back with a vengeance in terms of crash rates now that 8.0.1 has been released. The actual number hasn't increased that much (16920 processed crashes yesterday, i.e. ~170,000 crashes that got reported to us) but due to the reduced number of people reporting in with 7 now, this now amounts to >11,000 crashes per million ADU.

We should at least try to email all people with that signature that have left an email in the report to ask them to re-install Firefox or Thunderbird (whatever they are seeing the crash with) from a fresh download and that will make the application work again.

Who can do that?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #83)
> Just FYI, this is back with a vengeance in terms of crash rates now that
> 8.0.1 has been released. The actual number hasn't increased that much (16920
> processed crashes yesterday, i.e. ~170,000 crashes that got reported to us)
> but due to the reduced number of people reporting in with 7 now, this now
> amounts to >11,000 crashes per million ADU.
> 
> We should at least try to email all people with that signature that have
> left an email in the report to ask them to re-install Firefox or Thunderbird
> (whatever they are seeing the crash with) from a fresh download and that
> will make the application work again.
> 
> Who can do that?

Cheng would be the right person to do that, but I'll leave it up to him to decide whether our current sumo front page strategy is more productive. It may be good to email those people who put their addresses in after a certain date, since any older crash reports with email have likely been resolved by now (we previously saw >90% of contacted users had already re-installed).
Target Milestone: --- → mozilla11
Tracking for Firefox 11 to make sure we don't land this before we've fixed the system restore issue.

Is there anything we can do to ensure that this doesn't fail silently? Would that require a separate bug?
(In reply to Alex Keybl [:akeybl] from comment #85)
> Is there anything we can do to ensure that this doesn't fail silently? Would
> that require a separate bug?

My understanding is that this will cause us to crash silently in this situation. Should we resolve in JS code or is that a crash reporter problem?
Attachment #574015 - Flags: approval-mozilla-release? → approval-mozilla-release-
Whiteboard: [qa+]
(In reply to Paul Silaghi [QA] from comment #87)
> Are any of these related to this bug ?
There are two kinds of stack, both unrelated to this bug:
Frame 	Module 	Signature [Expand] 	Source
0 	mozjs.dll 	js::Bindings::getLocalNameArray 	js/src/jsscript.cpp:204
1 	mozjs.dll 	js_XDRScript 	js/src/jsscript.cpp:399
2 	mozjs.dll 	js_XDRFunctionObject 	js/src/jsfun.cpp:1409
3 	mozjs.dll 	js_XDRScript 	js/src/jsscript.cpp:663
4 	mozjs.dll 	JS_XDRScript 	js/src/jsxdrapi.cpp:742
...

0 	mozjs.dll 	js::Bindings::getLocalNameArray 	js/src/jsscript.cpp:208
1 	mozjs.dll 	js_NewPrinter 	js/src/jsopcode.cpp:1020
2 	mozjs.dll 	DecompileExpression 	js/src/jsopcode.cpp:5512
3 	mozjs.dll 	js_DecompileValueGenerator 	js/src/jsopcode.cpp:5416
4 	mozjs.dll 	js_ReportValueErrorFlags 	js/src/jscntxt.cpp:1220
5 	mozjs.dll 	js_ReportIsNotFunction 	js/src/jsfun.cpp:2399
6 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:600
7 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:3506
...
Based on comment 88, marking the bug as verified fixed on FF 11.
No other crashes related to this bug have been reported.
Status: RESOLVED → VERIFIED
Whiteboard: [qa+] → [qa!]