Closed
Bug 1262731
Opened 9 years ago
Closed 8 years ago
startup crash in mozalloc_abort | NS_DebugBreak | NS_InitXPCOM2, because JS_Init() fails
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
VERIFIED
FIXED
mozilla50
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
Details
(Keywords: crash, regression, topcrash, Whiteboard: [startupcrash])
Crash Data
Attachments
(5 files, 6 obsolete files)
5.59 KB,
patch
|
sfink
:
review+
n.nethercote
:
checkin+
|
Details | Diff | Splinter Review |
2.44 KB,
text/plain
|
Details | |
1.51 KB,
text/plain
|
Details | |
2.22 KB,
patch
|
Details | Diff | Splinter Review | |
2.57 KB,
patch
|
ted
:
review+
gchang
:
approval-mozilla-aurora+
gchang
:
approval-mozilla-beta+
n.nethercote
:
checkin+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-709fc1b1-2c91-44c4-9844-db1772160406.
=============================================================
I see 156 crashes with this signature, all since March 31. All the reports I looked at showed us hitting a NS_RUNTIMEABORT because JS_Init() fails.
It's a mix of Firefox and Thunderbird, and even one SeaMonkey. It's mostly in 48.0a1, which suggests it's gotten a lot more frequent recently.
I see 7 different ways that JS_Init() could fail early and return false. Maybe we should change its API (or introduce a compansion JS_Init2() function) to return a reason for the failure, and then the NS_RUNTIMEABORT message could be chosen accordingly? That would give us a better idea what's going wrong.
Assignee | ||
Comment 1•9 years ago
|
||
It's a start-up crash, of course. It occurred 7 times in Nightly 20160406030221, all to the same user. So it's the kind of crash that could cause someone to switch browsers quickly.
There are now 35 crashes over the last three days, making this 3% of all crashes and the #2 topcrash.
Assignee | ||
Comment 3•9 years ago
|
||
I'll try to write a diagnostic patch to give us more info.
Assignee: nobody → n.nethercote
Assignee | ||
Comment 4•9 years ago
|
||
I looked at all the functions called by JS_Init(). They more or less all have a
single reason they can go wrong, so I think one string per fallible function
should be good enough to make progress on this bug.
Attachment #8739252 -
Flags: review?(sphink)
Comment 5•9 years ago
|
||
Comment on attachment 8739252 [details] [diff] [review]
Add JS_InitWithFailureDiagnostic()
Review of attachment 8739252 [details] [diff] [review]:
-----------------------------------------------------------------
r=me for diagnostic usage, though I probably would've gone for a MOZ_RELEASE_ALWAYS_TRUE or something, to avoid needing to add all these strings. But since you've gone and done it, fine by me.
Attachment #8739252 -
Flags: review?(sphink) → review+
Assignee | ||
Comment 6•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/8ea65a181e4b63568eb30a82629beb27cdf1f2cb
Bug 1262731 - Add JS_InitWithFailureDiagnostic(). r=sfink.
Comment 7•9 years ago
|
||
must be a regression, no?
interesting that there are more Thunderbird crashes than FF.
I'm crashing on every TB start bp-331cc40c-0cad-40ce-a89b-959972160408
Severity: critical → blocker
Keywords: dogfood
Updated•9 years ago
|
Keywords: topcrash-thunderbird
Comment 8•9 years ago
|
||
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #7)
> must be a regression, no?
>
> interesting that there are more Thunderbird crashes than FF.
> I'm crashing on every TB start bp-331cc40c-0cad-40ce-a89b-959972160408
I forgot about bug 1262561
Keywords: dogfood,
topcrash-thunderbird
Comment 9•9 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Updated•9 years ago
|
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
Target Milestone: mozilla48 → ---
Assignee | ||
Comment 10•9 years ago
|
||
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #8)
> >
> > interesting that there are more Thunderbird crashes than FF.
> > I'm crashing on every TB start bp-331cc40c-0cad-40ce-a89b-959972160408
>
> I forgot about bug 1262561
We have crash reports now with the diagnostic in, and all the ones I've seen have:
> ABORT: JS_InitWithFailureDiagnostic: u_init() failed
Which is a failure in ICU initialization, which is consistent with bug 1262561.
Furthermore, the last crash like this seen in Firefox was in Nightly 20160408030212 (April 8). There are no crashes like this in builds from April 9 and 10. So this is good! But it's unclear to me how it got fixed...
Trying to untangle things here:
- Bug 1262561 is about this NS_InitXPCOM2() abort in Thunderbird. It is marked as fixed.
- It depended on the following two bugs:
* Bug 1262567, which is the Thunderbird port of bug 1186060: "Use Visual Studio 2015 Update 1 for release builds"
* Bug 1262636, which is the Thunderbird port of bug 1239083: "Stop using ICU's autoconf build system"
Those two Firefox bugs landed more than 2 days ago, so I don't see why the April 9 Nightly would be the one where the crash disappeared. Maybe something else got fixed then -- ted, any ideas?
Assignee | ||
Comment 11•9 years ago
|
||
BTW, I'm inclined to keep the JS_InitWithFailureDiagnostic() patch in the tree, given that it will be useful if something else goes wrong during JS initialization in the future.
Comment 12•9 years ago
|
||
I haven't touched anything related to ICU since landing bug 1239083 last week, so I don't have any real clues here.
Flags: needinfo?(ted)
Comment 13•9 years ago
|
||
This signature started popping up in Firefox 46.0b10 but it was not present in 46.0b9
Assignee | ||
Comment 14•9 years ago
|
||
[Tracking Requested - why for this release]:
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #13)
> This signature started popping up in Firefox 46.0b10 but it was not present
> in 46.0b9
The obvious question is whether 46.0b10 has the vs2015 and ICU build changes from the blocking bugs.
KaiRo, what's the general procedure for seeing what changes have gone into a particular release, whether it be Nightly, Aurora, Beta or Release? Thank you.
Comment 15•9 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #14)
> KaiRo, what's the general procedure for seeing what changes have gone into a
> particular release, whether it be Nightly, Aurora, Beta or Release? Thank
> you.
Looking at the changelog/pushlog. Releases and betas you can look by tags, for nightly/aurora you need to find the changesets in the builds or on archive.m.o
For the range I mentioned, the pushlog is https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=FIREFOX_46_0b9_RELEASE&tochange=FIREFOX_46_0b10_RELEASE
Flags: needinfo?(kairo)
Comment 16•9 years ago
|
||
That said, it looks like it's gone again in b11 and no other signature rose that same amount. Either it spread into multiple smaller signatures, or it was fixed again, or it only happens with certain compiler optimizations and we hit those with that one build.
Comment 17•9 years ago
|
||
Sounds like this was from Dragana's temporary diagnostic patch in bug Bug 1257216. That landed in beta 10 and backed out in beta 11. Nicholas does that help?
Flags: needinfo?(n.nethercote)
Assignee | ||
Comment 18•9 years ago
|
||
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #17)
> Sounds like this was from Dragana's temporary diagnostic patch in bug Bug
> 1257216. That landed in beta 10 and backed out in beta 11. Nicholas does
> that help?
Do you mean the "Increase toolkit.asyncshutdown.timeout.crash for WinXP" patch? That affected shutdown, but this bug is about XPCOM and JS initialization at start-up, so I don't think they are related.
Flags: needinfo?(n.nethercote)
Comment 19•9 years ago
|
||
Ah, ok. Thanks for the explanation, I jumped to conclusions!
It also looks like in beta 10 we had 565 crashes all for one installation. Maybe this was someone doing automated tests (fuzzing?) Since we don't see the signature for beta 11 or 46.0 RC, and this is one install's crash spike, I don't think we need to track this for 46.
Comment 20•9 years ago
|
||
Just seen this on Ubuntu wily 15.04 / Firefox Aurora Mozilla Firefox 48.0a2 48.0~a2~hg20160509r318685-0u.
Backtrace:
```
(gdb) backtrace
#0 0x00007fb45120abd9 in raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/pt-raise.c:36
#1 0x00007fb443a9afa7 in nsProfileLock::FatalSignalHandler (signo=11, info=0x7ffdda92ceb0, context=0x7ffdda92cd80)
at /build/firefox-S3Cgxu/firefox-48.0~a2~hg20160509r318685/toolkit/profile/nsProfileLock.cpp:181
#2 <signal handler called>
#3 mozalloc_abort (
msg=msg@entry=0x7ffdda92d34c "[4058] ###!!! ABORT: JS_InitWithFailureDiagnostic: u_init() failed: file /build/firefox-S3Cgxu/firefox-48.0~a2~hg20160509r318685/xpcom/build/XPCOMInit.cpp, line 709")
at /build/firefox-S3Cgxu/firefox-48.0~a2~hg20160509r318685/memory/mozalloc/mozalloc_abort.cpp:33
#4 0x00007fb4420b715b in Abort (
aMsg=0x7ffdda92d34c "[4058] ###!!! ABORT: JS_InitWithFailureDiagnostic: u_init() failed: file /build/firefox-S3Cgxu/firefox-48.0~a2~hg20160509r318685/xpcom/build/XPCOMInit.cpp, line 709")
at /build/firefox-S3Cgxu/firefox-48.0~a2~hg20160509r318685/xpcom/base/nsDebugImpl.cpp:447
#5 NS_DebugBreak (aSeverity=<optimised out>, aSeverity@entry=3, aStr=0x7fb444a15eb8 "JS_InitWithFailureDiagnostic: u_init() failed", aExpr=aExpr@entry=0x0,
aFile=aFile@entry=0x7fb444532cef "/build/firefox-S3Cgxu/firefox-48.0~a2~hg20160509r318685/xpcom/build/XPCOMInit.cpp", aLine=aLine@entry=709)
at /build/firefox-S3Cgxu/firefox-48.0~a2~hg20160509r318685/xpcom/base/nsDebugImpl.cpp:434
#6 0x00007fb4421119c1 in NS_InitXPCOM2 (aResult=aResult@entry=0x7fb43f1c5488, aBinDirectory=<optimised out>, aAppFileLocationProvider=<optimised out>)
at /build/firefox-S3Cgxu/firefox-48.0~a2~hg20160509r318685/xpcom/build/XPCOMInit.cpp:709
#7 0x00007fb442111bca in NS_InitXPCOM2 (aResult=aResult@entry=0x7fb43f1c5488, aBinDirectory=<optimised out>, aAppFileLocationProvider=<optimised out>)
at /build/firefox-S3Cgxu/firefox-48.0~a2~hg20160509r318685/xpcom/build/XPCOMInit.cpp:485
#8 0x00007fb443a9df04 in ScopedXPCOMStartup::Initialize (this=0x7fb43f1c5488) at /build/firefox-S3Cgxu/firefox-48.0~a2~hg20160509r318685/toolkit/xre/nsAppRunner.cpp:1533
#9 0x00007fb443aa2177 in XREMain::XRE_main (this=this@entry=0x7ffdda92d9c8, argc=argc@entry=1, argv=argv@entry=0x7ffdda92eed8, aAppData=aAppData@entry=0x7ffdda92dbc8)
at /build/firefox-S3Cgxu/firefox-48.0~a2~hg20160509r318685/toolkit/xre/nsAppRunner.cpp:4447
#10 0x00007fb443aa23c6 in XRE_main (argc=1, argv=0x7ffdda92eed8, aAppData=0x7ffdda92dbc8, aFlags=<optimised out>) at /build/firefox-S3Cgxu/firefox-48.0~a2~hg20160509r318685/toolkit/xre/nsAppRunner.cpp:4559
#11 0x0000561a1e05a3a4 in do_main (argc=1, argv=0x7ffdda92eed8, envp=<optimised out>, xreDirectory=0x7fb450174780) at /build/firefox-S3Cgxu/firefox-48.0~a2~hg20160509r318685/browser/app/nsBrowserApp.cpp:220
#12 0x0000561a1e059a5a in main (argc=1, argv=0x7ffdda92eed8, envp=0x7ffdda92eee8) at /build/firefox-S3Cgxu/firefox-48.0~a2~hg20160509r318685/browser/app/nsBrowserApp.cpp:360
```
Comment 21•9 years ago
|
||
The output
> ABORT: JS_InitWithFailureDiagnostic: u_init() failed: file /firefox-48.0~a2~hg20160509r318685/xpcom/build/XPCOMInit.cpp, line 709")
is mentioned in bug 1262731, bug 1264836, bug 1272246, bug 1272209. (In case some might be dups.)
Comment 22•8 years ago
|
||
[Tracking Requested - why for this release]:
this crash is spiking up in early 48.0b1 data, where it's making up 11.75% of crashes currently, always on startup. it seems to happen on all windows versions but predominantly on windos xp (two thirds of the time)
Comment 23•8 years ago
|
||
I wonder if there's some Antivirus removing the ICU .dat file or something like that?
Assignee | ||
Comment 24•8 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #23)
> I wonder if there's some Antivirus removing the ICU .dat file or something
> like that?
I just tried removing that file on my Windows installation of Beta and I got exactly this crash signature (https://crash-stats.mozilla.com/report/index/e662c47f-52d4-4a38-96a1-4e28a2160610). So it's a plausible hypothesis.
Assignee | ||
Comment 25•8 years ago
|
||
I did some digging into ICU init failure modes. I built with UDATA_DEBUG enabled and got this output when icudt56l.dat is present and Firefox starts up normally.
Assignee | ||
Comment 26•8 years ago
|
||
Here's the UDATA_DEBUG output when icudt56l.dat is missing.
Assignee | ||
Comment 27•8 years ago
|
||
So basically, ICU starts by looking for an individual file (cnvalias.icu). That fails, so it looks for the all-in-one file (icudt56l.dat), which fails, which causes u_init() to fail.
I traced through this with a debugger, and the point where the failure is detected is here:
https://dxr.mozilla.org/mozilla-central/rev/f8e3b81a79f45ef8647c98281a9a00d1ddb28b73/intl/icu/source/common/udata.cpp#764
The internal error code is U_FILE_ACCESS_ERROR, which has the value 4.
I could augment my previous diagnostic patch to print the ICU error number. If all the crashes with the extra diagnostic show an error code of 4, then we'll know it's because the icudt56l.dat file is missing, or in some other way broken. (It's possible there are other code paths under u_init() that can cause an U_FILE_ACCESS_ERROR failure.)
Assignee | ||
Comment 28•8 years ago
|
||
sfink: please review the code.
ted: do you think this is a helpful change? It changes the abort message on ICU
init failure to something like this:
> ABORT: JS_InitWithFailureDiagnostic: u_init() failed with U_FILE_ACCESS_ERROR
Attachment #8761914 -
Flags: review?(sphink)
Attachment #8761914 -
Flags: feedback?(ted)
Updated•8 years ago
|
Attachment #8761914 -
Flags: feedback?(ted) → feedback+
Comment 29•8 years ago
|
||
I think it can't hurt, certainly. I'm pretty sure we'll find that all of these crashes are just "we couldn't load the ICU data file".
Comment 30•8 years ago
|
||
Comment on attachment 8761914 [details] [diff] [review]
Add more detail to JS_InitWithFailureDiagnostic()'s failure message when ICU fails
Review of attachment 8761914 [details] [diff] [review]:
-----------------------------------------------------------------
Pedantically, this is incorrect on OOM. In fact, an OOM would make it appear to succeed. The "proper" way to do this would be to use an infallible allocator.
But this seems good enough for its intended purpose. (If jorendorff were still doing his oom static analysis thing, it probably would flag this, but I think he abandoned it in favor of the Result<> stuff.)
Attachment #8761914 -
Flags: review?(sphink) → review+
Assignee | ||
Comment 31•8 years ago
|
||
> Pedantically, this is incorrect on OOM. In fact, an OOM would make it appear
> to succeed. The "proper" way to do this would be to use an infallible
> allocator.
Good point.
How would I do infallible allocation in SM?
Alternatively, I could separate success/failure indication from the error message, e.g. by returning |bool| and having an outparam for the message.
I'll fix it one way or the other before landing.
Assignee | ||
Comment 32•8 years ago
|
||
This is an updated version that handles OOM properly. But I now think this
isn't the best approach, because:
(a) it gives us a little more information about the ICU failure, but we could
do better;
(b) returning an error code instead of a string would probably be better.
In terms of (a), we could instead do a more specific check, e.g. check if the
icu .dat file is present, and if it is, how big it is, and include that info in
the abort message. That would give a clearer diagnosis.
Assignee | ||
Updated•8 years ago
|
Attachment #8761914 -
Attachment is obsolete: true
Assignee | ||
Comment 33•8 years ago
|
||
> In terms of (a), we could instead do a more specific check, e.g. check if the
> icu .dat file is present, and if it is, how big it is, and include that info
> in
> the abort message. That would give a clearer diagnosis.
I'm trying to work out how to do this. Some pieces:
- The filename can be obtained via ICU_DATA_FILE -- we'll need to pass it in, probably by specifying a -D flag in xpcom/build/moz.build for XPCOMInit.cpp.
- The file path... it's probably the distdir? I'm not sure how to get that from C++ code.
- Use PR_Access and/or PR_GetFileInfo to get info about the file.
Updated•8 years ago
|
Crash Signature: [@ mozalloc_abort | NS_DebugBreak | NS_InitXPCOM2] → [@ mozalloc_abort | NS_DebugBreak | NS_InitXPCOM2]
[@ Abort | JS_InitWithFailureDiagnostic: u_init() failed | mozalloc_abort | NS_DebugBreak | NS_InitXPCOM2]
Comment 34•8 years ago
|
||
(In reply to Nicholas Nethercote [:njn] (on vacation until July 11) from comment #33)
> - The filename can be obtained via ICU_DATA_FILE -- we'll need to pass it
> in, probably by specifying a -D flag in xpcom/build/moz.build for
> XPCOMInit.cpp.
This is correct. You should just be able to do `DEFINES['ICU_DATA_FILE'] = CONFIG['ICU_DATA_FILE']`.
> - The file path... it's probably the distdir? I'm not sure how to get that
> from C++ code.
It's NS_GRE_DIR:
https://dxr.mozilla.org/mozilla-central/rev/51377a64158941f89ed73f388ae437cfa494c030/xpcom/build/XPCOMInit.cpp#696
Comment 36•8 years ago
|
||
Tracking as it is an important crash (startup)
status-firefox47:
--- → wontfix
Comment 37•8 years ago
|
||
tagging this so that it shows up as regression in firefox 48 (introduced by bug 926980, bug 1239083).
status-firefox46:
wontfix → ---
status-firefox47:
wontfix → ---
status-firefox49:
--- → affected
status-firefox50:
--- → affected
Keywords: regression
Assignee | ||
Updated•8 years ago
|
Attachment #8762243 -
Attachment is obsolete: true
Assignee | ||
Updated•8 years ago
|
Attachment #8739252 -
Flags: checkin+
Assignee | ||
Comment 38•8 years ago
|
||
I was going to ask ted to review this, but he's not accepting review requests.
glandium, maybe you can take a look?
I tested this on Windows, Mac and Linux. It'll give a clearer abort cause.
But I'm also wondering if we should just switch back to embedding the ICU data
into the binary. AIUI the motivation for switching to a separate file was
that on Mac we included the data twice due to universal binaries, is that
right?
Attachment #8771198 -
Flags: review?(mh+mozilla)
Comment 39•8 years ago
|
||
That's correct. What's the platform breakdown on these crashes? If they're mostly Windows we could switch back on Windows-only. We're already building a linkable version of the data file (we link it in the JS shell), we'd just have to do some moz.build fiddling to do the right thing.
Comment 40•8 years ago
|
||
The other motivation for the change was to be able to commit the pre-built data file to the tree instead of having to run ICU's build to generate it, which is a pretty good thing, but my workaround to link the data into the JS shell is a simple yasm file that includes the binary data.
Assignee | ||
Comment 41•8 years ago
|
||
> That's correct. What's the platform breakdown on these crashes? If they're
> mostly Windows we could switch back on Windows-only
Among crashes in the past week it's 100% Windows. And it's the #10 top crasher on 48 Beta.
Comment 42•8 years ago
|
||
Comment on attachment 8771198 [details] [diff] [review]
Abort more informatively when the ICU data file is missing or unreadable
Review of attachment 8771198 [details] [diff] [review]:
-----------------------------------------------------------------
::: xpcom/build/XPCOMInit.cpp
@@ +708,5 @@
> + // JS_InitWithFailureDiagnostic() will fail anyway. By checking here the
> + // cause of the abort will be clearer.)
> + nativeGREPath.Append(FILE_PATH_SEPARATOR);
> + nativeGREPath.Append(NS_STRINGIFY(ICU_DATA_FILE));
> + if (PR_Access(nativeGREPath.get(), PR_ACCESS_EXISTS) == PR_FAILURE) {
nsIFile has a `boolean exists()` method. (so, in C++, something like nsresult Exists(bool *)). There are also isFile and isReadable.
@@ +709,5 @@
> + // cause of the abort will be clearer.)
> + nativeGREPath.Append(FILE_PATH_SEPARATOR);
> + nativeGREPath.Append(NS_STRINGIFY(ICU_DATA_FILE));
> + if (PR_Access(nativeGREPath.get(), PR_ACCESS_EXISTS) == PR_FAILURE) {
> + NS_RUNTIMEABORT("ICU data file is missing");
Is there a less abrupt way we could fail here? Like, can't we have the js engine be happy with not having the data, but have the intl api degraded? That sounds like a better outcome than crashing at startup with a message that users won't understand (if it's presented to them at all. It most certainly isn't for most people on Linux and Mac. I don't know if that opens a popup on Windows)
What happens if we still go through startup but don't call u_setDataDirectory?
Attachment #8771198 -
Flags: review?(mh+mozilla)
Assignee | ||
Comment 43•8 years ago
|
||
> Is there a less abrupt way we could fail here? Like, can't we have the js
> engine be happy with not having the data, but have the intl api degraded?
I don't know. I'm just trying to get a clearer idea of what went wrong by making the crash reports more informative.
> What happens if we still go through startup but don't call
> u_setDataDirectory?
I tried commenting out the u_setDataDirectory() call. The call to u_init() fails, which meands JS_InitWithDiagnosticFailure() fails, which causes us to abort.
Assignee | ||
Comment 44•8 years ago
|
||
I switched to using nsIFile methods.
Assignee | ||
Updated•8 years ago
|
Attachment #8771198 -
Attachment is obsolete: true
Assignee | ||
Comment 45•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Attachment #8771253 -
Attachment is obsolete: true
Nick what's the status? Do we want to land this patch for 48? Comment #41 says this is a top crash there.
Flags: needinfo?(n.nethercote)
Comment 47•8 years ago
|
||
Jeff can you comment on https://bugzilla.mozilla.org/show_bug.cgi?id=1262731#c42 please?
Flags: needinfo?(jwalden+bmo)
Assignee | ||
Comment 48•8 years ago
|
||
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #46)
> Nick what's the status? Do we want to land this patch for 48? Comment #41
> says this is a top crash there.
The status is unclear. My patch would give slightly more clarity about what's going wrong -- i.e. is the file missing, as we suspect? -- but it won't get us any closer to a fix.
The best option, IMO, is to do what Ted suggested in comment 39 and comment 40: on Windows, link the data file back into the binary, which is how it used to be done. Ted would have to be the one to do that, AFAIK.
Another possibility is to try and continue with the Intl API degraded, as suggested in comment 42. But I don't know if that's feasible, and it would result in a small number of users having different behaviour, which doesn't sound good.
Flags: needinfo?(n.nethercote)
Comment 49•8 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #48)
> Another possibility is to try and continue with the Intl API degraded, as
> suggested in comment 42. But I don't know if that's feasible, and it would
> result in a small number of users having different behaviour, which doesn't
> sound good.
Doesn't sound good, but it sounds better than users not being able to start Firefox at all.
Assignee | ||
Comment 50•8 years ago
|
||
(In reply to Naveed Ihsanullah [:naveed] from comment #47)
> Jeff can you comment on
> https://bugzilla.mozilla.org/show_bug.cgi?id=1262731#c42 please?
I tried making ICU start-up deliberately fail and then used the Intl API. I got a JS exception instead of the normal behaviour. So, better than a crash, but has the potential to lead to lots of mysterious behaviour.
I'm looking at the "don't have a separate file on Windows" option. IIUC it's not that hard. I'm testing a patch locally right now.
Flags: needinfo?(jwalden+bmo)
Assignee | ||
Comment 51•8 years ago
|
||
I've tested this locally and it seems to work. I'll do a try run shortly.
glandium said on IRC "I'm tempted to say we should keep things similar on all
platforms, whether we do use a separate data file or not". But then we'd go
back to the data duplication on Mac... I can live with Windows being different.
Attachment #8772215 -
Flags: review?(ted)
Assignee | ||
Comment 52•8 years ago
|
||
The previous version of this patch broke Windows packaging on try. Hopefully
this will work better; I just started another try run.
Attachment #8772228 -
Flags: review?(ted)
Assignee | ||
Updated•8 years ago
|
Attachment #8772215 -
Attachment is obsolete: true
Attachment #8772215 -
Flags: review?(ted)
Assignee | ||
Comment 53•8 years ago
|
||
> The previous version of this patch broke Windows packaging on try. Hopefully
> this will work better; I just started another try run.
Looking better:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3fb8d98fbbda (original, Windows is busted)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=235423a0c30e (with Windows package error fixed)
Comment 54•8 years ago
|
||
Comment on attachment 8772228 [details] [diff] [review]
Don't use a separate ICU data file on Windows
Review of attachment 8772228 [details] [diff] [review]:
-----------------------------------------------------------------
::: browser/installer/package-manifest.in
@@ +100,5 @@
> @BINPATH@/api-ms-win-*.dll
> @BINPATH@/ucrtbase.dll
> #endif
> #endif
> +#ifdef MOZ_ICU_DATA_ARCHIVE
This won't work because MOZ_ICU_DATA_ARCHIVE isn't in DEFINES, you'll have to set it here:
https://dxr.mozilla.org/mozilla-central/rev/4c05938a64a7fde3ac2d7f4493aee1c5f2ad8a0a/browser/installer/Makefile.in#121
::: build/autoconf/icu.m4
@@ +85,5 @@
> dnl JS standalone so that embedders don't have to deal with it.
> + dnl We also don't do it on Windows because sometimes the file goes
> + dnl missing -- possibly due to overzealous antivirus software? --
> + dnl which prevents the browser from starting up :(
> + if test -z "$JS_STANDALONE" -a -z "$MOZ_SYSTEM_ICU" -a "$OS_TARGET" != WINNT; then
I bet we can move this all to moz.configure now! (You don't have to do that in this patch.)
Assignee | ||
Comment 55•8 years ago
|
||
In my local testing I couldn't see any effect from the additional
browser/installer/Makefile.in change, but I'll take your word that it's
necessary.
Attachment #8773100 -
Flags: review?(ted)
Assignee | ||
Updated•8 years ago
|
Attachment #8772228 -
Attachment is obsolete: true
Attachment #8772228 -
Flags: review?(ted)
Updated•8 years ago
|
Attachment #8773100 -
Flags: review?(ted) → review+
Assignee | ||
Comment 56•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/59e9f5b2c2469df928fda147e70709613d21b80e
Bug 1262731 - Don't use a separate ICU data file on Windows. r=ted.
Assignee | ||
Updated•8 years ago
|
Keywords: leave-open
Assignee | ||
Updated•8 years ago
|
Attachment #8773100 -
Flags: checkin+
Assignee | ||
Comment 57•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=65c401d9043b was my final try push, BTW
Assignee | ||
Comment 58•8 years ago
|
||
Comment on attachment 8773100 [details] [diff] [review]
Don't use a separate ICU data file on Windows
Approval Request Comment
[Feature/regressing bug #]: bug 1239083.
[User impact if declined]: Firefox won't start. A topcrash on Beta.
[Describe test coverage new/current, TreeHerder]: just landed on inbound. All the existing Intl tests should cover this.
[Risks and why]: low? It's a build system change. Any problems are likely to be obvious, e.g. Firefox won't start.
[String/UUID change made/needed]: none.
Attachment #8773100 -
Flags: approval-mozilla-beta?
Attachment #8773100 -
Flags: approval-mozilla-aurora?
Comment 59•8 years ago
|
||
Comment on attachment 8773100 [details] [diff] [review]
Don't use a separate ICU data file on Windows
Review of attachment 8773100 [details] [diff] [review]:
-----------------------------------------------------------------
This patch fixes a topcrash. Let's take it in 48 beta 10 and aurora.
Attachment #8773100 -
Flags: approval-mozilla-beta?
Attachment #8773100 -
Flags: approval-mozilla-beta+
Attachment #8773100 -
Flags: approval-mozilla-aurora?
Attachment #8773100 -
Flags: approval-mozilla-aurora+
Comment 60•8 years ago
|
||
not on central right now but will uplift as soon as this hits central
Flags: needinfo?(cbook)
Comment 61•8 years ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Comment 62•8 years ago
|
||
Updated•8 years ago
|
Summary: crash in mozalloc_abort | NS_DebugBreak | NS_InitXPCOM2, because JS_Init() fails → startup crash in mozalloc_abort | NS_DebugBreak | NS_InitXPCOM2, because JS_Init() fails
Whiteboard: [startupcrash]
Assignee | ||
Comment 63•8 years ago
|
||
I can verify that this has been fixed.
- Nightly: The last Nightly build on which this crash occurred was 20160721030216, which matches the m-c push date.
- Aurora: The last Aurora build on which this crash occurred was 20160721004019, which also matches the m-c push date.
- Beta: In the last week there are only 3 of these crashes in 48.0b10. I'm not sure what caused those (possibly a different kind of ICU data corruption to what the landed patch addresses) but in comparison 48.0b9 had 17,380 crashes.
Status: RESOLVED → VERIFIED
Comment 64•8 years ago
|
||
Any idea why crashes with this signature were ~56% of startup crashes for 48.0b9 a few days ago [1] and are ~94% now [2]?
Maybe users on 48.0b9 can't update anymore because of this crash?
[1]: https://crash-stats.mozilla.com/search/?product=Firefox&version=48.0b9&uptime=%3C60&date=%3E%3D2016-07-20&date=%3E2016-07-21&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
[2]: https://crash-stats.mozilla.com/search/?product=Firefox&version=48.0b9&uptime=%3C60&date=%3E%3D2016-07-26&date=%3E2016-07-27&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
Comment 65•8 years ago
|
||
Ted, what do you think about comment #64? Thanks
Flags: needinfo?(ted)
Flags: needinfo?(n.nethercote)
Comment 66•8 years ago
|
||
Looking at the comments, many users say that Firefox started crashing after a system restore, after forcibly turning off the computer, after a power outage, after updating to Windows 10.
Also, interestingly, looks like 100% of crashes with this signature occur when e10s is disabled (~87% of crashes occur when e10s is disabled overall in Beta): https://raw.githubusercontent.com/mozilla/stab-crashes/1936b4980cd619b45d0ca82f782bcc1b2542b3bb/plots/beta/Abort%20%7C%20JS_InitWithFailureDiagnostic%3A%20u_init()%20failed%20%7C%20mozalloc_abort%20%7C%20NS_DebugBreak%20%7C%20NS_InitXPCOM2.png
Comment 67•8 years ago
|
||
Do we know if they crash only once or always?
Comment 68•8 years ago
|
||
(In reply to Sylvestre Ledru [:sylvestre] from comment #67)
> Do we know if they crash only once or always?
I see a lot of crashes with the same install time, so I guess they're crashing multiple times.
I can reproduce the crash by installing 48.0b9 and removing the ICU data file before starting Firefox. I always crash (I'm not able to launch Firefox).
I'll test 48.0b10 next.
Comment 69•8 years ago
|
||
(In reply to Marco Castelluccio [:marco] from comment #68)
> (In reply to Sylvestre Ledru [:sylvestre] from comment #67)
> > Do we know if they crash only once or always?
>
> I see a lot of crashes with the same install time, so I guess they're
> crashing multiple times.
>
> I can reproduce the crash by installing 48.0b9 and removing the ICU data
> file before starting Firefox. I always crash (I'm not able to launch
> Firefox).
>
> I'll test 48.0b10 next.
Well, obviously I can't reproduce with 48.0b10, since the ICU data file is no more (so I
can't simply remove it to trigger the crash).
Comment 70•8 years ago
|
||
I don't know offhand, but that's plausible. The comments about system restore etc support my theory from comment 23. This sounds similar to the situation that caused us to rename omni.jar -> omni.ja.
I think the e10s stat is not interesting since this is a startup crash, so we probably don't have the actual e10s enabled/disabled data yet.
Flags: needinfo?(ted)
Assignee | ||
Comment 71•8 years ago
|
||
I agree that e10s is irrelevant here. I don't have anything else to add, sorry.
Flags: needinfo?(n.nethercote)
Comment 72•8 years ago
|
||
OK, so, let's try to summarize that:
* we think it might be caused by anti virus removing the dat file
* we had a startup crash spike since July 22
* maybe this spike can be explained by an (incorrect) update of an antivirus removing this file
* Nicholas fixed in beta 10
* we don't have any signature of this crash in beta 10 or the two RC
While we can't really explain the spike, I guess we can say that it is not blocking the 48 release.
Nicholas, do you agree with all that? Thanks
Flags: needinfo?(n.nethercote)
Assignee | ||
Comment 73•8 years ago
|
||
(In reply to Sylvestre Ledru [:sylvestre] from comment #72)
> OK, so, let's try to summarize that:
> * we think it might be caused by anti virus removing the dat file
> * we had a startup crash spike since July 22
> * maybe this spike can be explained by an (incorrect) update of an antivirus
> removing this file
> * Nicholas fixed in beta 10
> * we don't have any signature of this crash in beta 10 or the two RC
>
> While we can't really explain the spike, I guess we can say that it is not
> blocking the 48 release.
We know it's something going wrong with the loading of the data file -- it must be missing, corrupted, or something else like that. Missing due to AV is our best guess.
> Nicholas, do you agree with all that? Thanks
Yes.
Flags: needinfo?(n.nethercote)
Comment 74•8 years ago
|
||
Crash volume for signature 'Abort | JS_InitWithFailureDiagnostic: u_init() failed | mozalloc_abort | NS_DebugBreak | NS_InitXPCOM2':
- nightly(version 50):457 crashes from 2016-06-06.
- aurora (version 49):1403 crashes from 2016-06-07.
- beta (version 48):59015 crashes from 2016-06-06.
- release(version 47):0 crashes from 2016-05-31.
- esr (version 45):0 crashes from 2016-04-07.
Crash volume on the last weeks:
W. N-1 W. N-2 W. N-3 W. N-4 W. N-5 W. N-6 W. N-7
- nightly 99 3 1 4 23 0 0
- aurora 515 47 64 25 26 0 0
- beta 7819 1896 1521 1755 1037 0 0
- release 0 0 0 0 0 0 0
- esr 0 0 0 0 0 0 0
Affected platform: Windows
Comment 75•8 years ago
|
||
Andrei, I was wondering if you could help identifying which anti virus is causing that mess?
This is impacting 48 beta 9.
I would like to have a better understanding of the situation before we release 48
Flags: needinfo?(andrei.vaida)
Comment 76•8 years ago
|
||
(In reply to Sylvestre Ledru [:sylvestre] from comment #75)
> I would like to have a better understanding of the situation before we
> release 48
There should be nothing to worry about for the release channel: 47 didn't have the file that was removed by $whatever (AV or otherwise) and caused crashes in 48 betas. And 48 release won't have it either. So $whatever caused the problem by removing the file in the first place is not going to cause problems by removing the file because the file is not there to be removed anymore.
The remaining problem, however, is people who were on the beta channel, had the file removed, and won't get autoupdated because they can't even start Firefox.
Comment 77•8 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #76)
> The remaining problem, however, is people who were on the beta channel, had
> the file removed, and won't get autoupdated because they can't even start
> Firefox.
Is there anything we can do here? I guess they will have to reinstall...
Comment 78•8 years ago
|
||
I've found a post on the Google Chrome help forum with a similar problem: https://productforums.google.com/forum/#!topic/chrome/oWe87QVrmZA.
Comment 79•8 years ago
|
||
(In reply to Sylvestre Ledru [:sylvestre] from comment #77)
> (In reply to Mike Hommey [:glandium] from comment #76)
> > The remaining problem, however, is people who were on the beta channel, had
> > the file removed, and won't get autoupdated because they can't even start
> > Firefox.
> Is there anything we can do here? I guess they will have to reinstall...
Short of actively developing something that can autoupdate a firefox that crashes on startup, there is nothing that can be done. And even then, that would only save future problems like this one, not this one.
Comment 80•8 years ago
|
||
(In reply to Sylvestre Ledru [:sylvestre] from comment #75)
> Andrei, I was wondering if you could help identifying which anti virus is
> causing that mess?
> This is impacting 48 beta 9.
>
> I would like to have a better understanding of the situation before we
> release 48
I've reproduced the crash on 48.0b9-build1 by manually removing the icudt56l.dat file.
Unfortunately, I was unable to find a scenario in which this happens as a result of an antivirus software's interference.
I've tested on Windows XP Professional x64 Edition (SP2, build 3790.srv03_sp2_rtm.070216-1710), using the following antivirus solutions:
- AVG Antivirus Business Edition
- AVG Internet Security 2016
- Avast Free AntiVirus 2016 (which also installs the Avast Online Security 12.0.88 and Avast SafePrice 10.3.5.39 add-ons)
- Comodo Internet Security 2016 (which also installs GeekBuddy)
- ESET NOD32 Antivirus (Trial, 9.0.386.0)
I've also tried looking for other clues, to see if this file was ever suspected of being malicious, but Virus Total's analysis shows that this file is trusted [1] on all its 57 sources.
According to Comment 66, this looks like a bug that requires very specific conditions to manifest itself, but at this point it's like a needle in the haystack.
[1] https://www.virustotal.com/en/file/b5df330ddb768687717b552be3699e5c442e9c69085dd8bc3da1abedb8375df4/analysis/
Flags: needinfo?(andrei.vaida)
Comment 81•8 years ago
|
||
The System Restore theory from comment 70 seems pretty plausible too. I wonder if icudt56l.dat would be present if you deleted the intact Firefox install dir and then ran a System Restore.
Comment 82•8 years ago
|
||
Windows 7 represents ~88% of the crashes, so I got the impacted Windows 7 version:
6.1.7601 Service Pack 1 32394 60.98 %
6.1.7600 20688 38.95 %
6.1.7600 Service Pack 1 14 0.03 %
6.1.7601 Service Pack 2 10 0.02 %
6.1.7600 Service Pack 3 8 0.02 %
6.1.7601 4 0.01%
and I found this:
https://support.microsoft.com/en-us/kb/3161647
an update has been made the 2016-06-21 (crashes appeared the 22nd of June)
So maybe we have a correlation here.
Comment 83•8 years ago
|
||
I just confirmed that comment 70/81 are in fact a way of getting into this state. In a Windows 7 VM:
* Install Fx48b9, confirm that icudt56l.dat is present.
* Create a new System Restore point.
* Delete the Firefox install directory.
* Restore your earlier Restore Point.
* icudt56l.dat is no longer present in the install dir and "Could load XPCOM" errors trying to launch Firefox.
Comment 84•8 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #83)
> * icudt56l.dat is no longer present in the install dir and "Could load
> XPCOM" errors trying to launch Firefox.
*Couldn't* of course :)
Comment 85•8 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #84)
> (In reply to Ryan VanderMeulen [:RyanVM] from comment #83)
> > * icudt56l.dat is no longer present in the install dir and "Could load
> > XPCOM" errors trying to launch Firefox.
>
> *Couldn't* of course :)
We have a support article on this that suggests a reinstall. Could probably use some cleanup work though.
https://support.mozilla.org/en-US/questions/943596
You need to log in
before you can comment on or make changes to this bug.
Description
•