Closed Bug 978450 Opened 6 years ago Closed 6 years ago

Homescreen keeps crashing 1/3 nightly (Peak, Keon, Hamachi)

Categories

(Core :: JavaScript Engine: JIT, defect, critical)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

RESOLVED FIXED
1.4 S3 (14mar)
blocking-b2g 1.4+
Tracking Status
b2g-v1.4 --- fixed

People

(Reporter: past, Unassigned)

Details

(4 keywords, Whiteboard: [b2g-crash])

Attachments

(1 file)

Attached file Logcat
The latest Peak nightly from Geeksphone (March 1st) results in a homescreen that keeps crashing and restarting. I tried both updating to the 1/3 build using the Settings app and flashing the Geeksphone-provided tarball. I also had the Dialer app crash, after a brief period where the homescreen remained stable, until I eventually reverted to an older nightly. Logcat is attached.
(In reply to Panos Astithas [:past] from comment #0)
> Created attachment 8384129 [details]
> Logcat
> 
> The latest Peak nightly from Geeksphone (March 1st) results in a homescreen
> that keeps crashing and restarts. I tried both updating to the 1/3 build
> using the settings app and by flashing the Geeksphone-provided tarball. I've
> also had the Dialer app crash too, after a brief period where the homescreen
> remained stable, until I eventually reverted to an older nightly. Logcat is
> attached.

Can you get a crash report URL? See https://wiki.mozilla.org/B2G/QA/Tips_And_Tricks#Getting_crashes_off_the_Device.

This might be related to the automation problem we are seeing in bug 978458.
Thanks for the link. Here are the latest crashes that I found on the device:

https://crash-stats.mozilla.com/report/index/0d162a2c-0705-4c5a-9716-4d1662140301
https://crash-stats.mozilla.com/report/index/12d85604-166b-412e-9a44-42e942140301
https://crash-stats.mozilla.com/report/index/8d64e5cd-bf92-40bd-82e6-dba3f2140301
https://crash-stats.mozilla.com/report/index/ffb5b976-cb76-4fdd-bf2b-8a48d2140301

Top of the stack:

0 @0x431899e8 	
1 js::jit::EnterBaselineMethod(JSContext*, js::RunState&) 	/home/geeksphone/FOS/peak/gecko/js/src/jit/BaselineJIT.cpp
2 js::Invoke 	/home/geeksphone/FOS/peak/gecko/js/src/vm/Interpreter.cpp
3 js_fun_call 	/home/geeksphone/FOS/peak/gecko/js/src/jsfun.cpp
4 js::Invoke 	/home/geeksphone/FOS/peak/gecko/js/src/jscntxtinlines.h
5 Interpret 	/home/geeksphone/FOS/peak/gecko/js/src/vm/Interpreter.cpp
6 js::Invoke 	/home/geeksphone/FOS/peak/gecko/js/src/vm/Interpreter.cpp
7 JS_CallFunction(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSFunction*>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) 	/home/geeksphone/FOS/peak/gecko/js/src/jsapi.cpp
8 mozJSComponentLoader::ObjectForLocation(nsIFile*, nsIURI*, JSObject**, JSScript**, char**, bool, JS::MutableHandle<JS::Value>) 	/home/geeksphone/FOS/peak/gecko/js/xpconnect/loader/mozJSComponentLoader.cpp
9 mozJSComponentLoader::ImportInto(nsACString_internal const&, JS::Handle<JSObject*>, JSContext*, JS::MutableHandle<JSObject*>) 	/home/geeksphone/FOS/peak/gecko/js/xpconnect/loader/mozJSComponentLoader.cpp
10 mozJSComponentLoader::Import(nsACString_internal const&, JS::Handle<JS::Value>, JSContext*, unsigned char, JS::MutableHandle<JS::Value>) 	/home/geeksphone/FOS/peak/gecko/js/xpconnect/loader/mozJSComponentLoader.cpp
11 nsXPCComponents_Utils::Import(nsACString_internal const&, JS::Handle<JS::Value>, JSContext*, unsigned char, JS::MutableHandle<JS::Value>) 	/home/geeksphone/FOS/peak/gecko/js/xpconnect/src/XPCComponents.cpp
12 NS_InvokeByIndex 	/home/geeksphone/FOS/peak/gecko/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp
13 XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) 	/home/geeksphone/FOS/peak/gecko/js/xpconnect/src/XPCWrappedNative.cpp
14 XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) 	/home/geeksphone/FOS/peak/gecko/js/xpconnect/src/XPCWrappedNativeJSOps.cpp
15 js::Invoke 	/home/geeksphone/FOS/peak/gecko/js/src/jscntxtinlines.h
16 Interpret 	/home/geeksphone/FOS/peak/gecko/js/src/vm/Interpreter.cpp
Crash Signature: [@ @0x0 | js::jit::EnterBaselineMethod(JSContext*, js::RunState&) ]
Keywords: crash
Severity: normal → critical
Component: Gaia::Homescreen → JavaScript Engine
Product: Firefox OS → Core
Whiteboard: [b2g-crash]
Nicolas, can you reproduce this?
Component: JavaScript Engine → JavaScript Engine: JIT
Flags: needinfo?(nicolas.b.pierron)
I'm not sure this is the same issue, since I don't have Wi-Fi and my dumps are not being sent, but I see exactly the same behaviour on a Fugu. When the Homescreen eventually launches, any app I open crashes; only rarely can I launch one.
I reproduced the issue on a Keon device too. If the homescreen does launch, any app will crash.
If I roll back to before bug 976120 (m-c 171153:6cf927291112) was checked in, I don't see the crash.
We get the same issue on Hamachis. After the flash the HomeScreen app is crashing. I can't send a report because I can't get to the WiFi settings screen.

The GaiaUI-tests are failing because of this.

Gaia      a980b8f54956ed470667033630b02492efdf4a07
Gecko     https://hg.mozilla.org/mozilla-central/rev/0085a162499f
BuildID   20140301160203
Version   30.0a1
ro.build.version.incremental=324
ro.build.date=Thu Dec 19 14:04:55 CST 2013
I *may* know what's going on here, would one of you be able to test a patch?
Yes of course, please attach it :)

FWIW it seems that everything is working fine when I flash Gecko and b2g starts, but then after a reboot the bug happens. Happened twice already.
Summary: Homescreen keeps crashing in the Peak 1/3 nightly → Homescreen keeps crashing 1/3 nightly (Peak, Keon, Hamachi)
I can confirm the issue on hamachi.

Found on:
Alcatel One Touch Fire production (got from T-mobile Poland)
B2G version: 1.4.0.0-prerelease master
Platform version: 30.0a1
Build Identifier: 20140301160203
Git commit info: 2014-02-28
blocking-b2g: --- → 1.4?
Although I guess it does not add much value, I can also confirm the issue on the Unagi using today's build Gecko-db5f706.Gaia-5684544.
I'm bisecting right now.
And Inari:
 Gaia      a980b8f54956ed470667033630b02492efdf4a07                         
 Gecko     https://hg.mozilla.org/mozilla-central/rev/0085a162499f         
 BuildID   20140301160203                                                   
 Version   30.0a1                                                          
 ro.build.version.incremental=eng.cltbld.20140227.192705                    
 ro.build.date=Thu Feb 27 19:49:33 EST 2014
Breaking down on tinderbox builds (to get slightly more granularity than nightly builds)

The Gaia commit for the first breaking tinderbox build was:
Gaia a980b8f54956ed470667033630b02492efdf4a07
and Gecko revision:
Gecko 8abc76dedec2

Linked to here:
https://pvtbuilds.mozilla.org/pvt/mozilla.org/b2gotoro/tinderbox-builds/mozilla-central-hamachi-eng/20140228130531/
Relevant JS Engine bugs in push log:

* bug 939562
* bug 977117
* bug 977224
* bug 978047
* bug 930477
* bug 957004
This is as deep as we can go from pvtbuilds, as we don't have mozilla-inbound device images right now. We'll need someone on the JS team to link the crash stack to one of those 6 bugs above to identify the regressing cause & get it backed out.
blocking-b2g: 1.4? → 1.4+
(In reply to Florin Strugariu [:Bebe] from comment #13)
> And Inari:
>  Gaia      a980b8f54956ed470667033630b02492efdf4a07                         
>  Gecko     https://hg.mozilla.org/mozilla-central/rev/0085a162499f         
>  BuildID   20140301160203                                                   
>  Version   30.0a1                                                          
>  ro.build.version.incremental=eng.cltbld.20140227.192705                    
>  ro.build.date=Thu Feb 27 19:49:33 EST 2014

The ro.build.date is outdated compared to the BuildID and the Gecko changeset.

Assuming this was not another issue (*), AWFY failed to report frequently when the problem appeared:

http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=39101e03fc13a8b7447b1555881cce5d439ba255&tochange=20c705d00e7c48ff5c82558af56be53a1a8f7f4c

(*) There is another issue, where benchmark cannot run because the keyboard position might have changed, and the marionette harness is typing "http:++" instead of "http://".
Keywords: smoketest
(In reply to Jason Smith [:jsmith] from comment #17)
> This is as deep as we can go from pvtbuilds, as we don't have
> mozilla-inbound device images right now. We'll need someone on the JS team
> to link the crash stack to one of those 6 bugs above to identify the
> regressing cause & get it backed out.

The stack trace does not tell us anything except that this is in Gecko's JS, as baseline fails after a "Cu.import".  Then there is nothing else we can learn from this back trace.

Also, it would be nice to generate stack traces with the JITs disabled, by setting prefs such as:
  javascript.options.baselinejit.content -> false
  javascript.options.baselinejit.chrome -> false
  javascript.options.ion.content -> false

This can be done with ./edit-prefs.sh.
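
As a rough illustration of the settings listed above, this is how those three preferences would be written as `user_pref` lines in a Gecko profile's preferences file. The `render_prefs` helper is hypothetical and only formats the lines; how ./edit-prefs.sh applies them on the device is not shown here.

```python
# Hypothetical helper: formats preference overrides as user_pref() lines.
# The pref names come from the comment above; the helper itself is illustrative.
JIT_PREFS = {
    "javascript.options.baselinejit.content": False,
    "javascript.options.baselinejit.chrome": False,
    "javascript.options.ion.content": False,
}


def render_prefs(prefs):
    """Return one user_pref(...) line per boolean preference."""
    lines = []
    for name, value in prefs.items():
        # Preference files use JavaScript literals, so booleans are lowercase.
        literal = "true" if value else "false"
        lines.append(f'user_pref("{name}", {literal});')
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_prefs(JIT_PREFS))
```

If a crash still reproduces with these set to false, the resulting stack would point away from JIT-compiled code; if it disappears, that is consistent with a JIT-related regression.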

So far I have been unable to set up a local Gecko build to test on an Unagi.  I have set up AWFY to build Unagi images for each modification made in the js directory, and I should have these images tomorrow.
Flags: needinfo?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #19)
> (In reply to Jason Smith [:jsmith] from comment #17)
> > This is as deep as we can go from pvtbuilds, as we don't have
> > mozilla-inbound device images right now. We'll need someone on the JS team
> > to link the crash stack to one of those 6 bugs above to identify the
> > regressing cause & get it backed out.
> 
> The stack trace does not tell us anything except that this is in Gecko's JS,
> as baseline fails after a "Cu.import".  Then there is nothing else we can
> learn from this back trace.
> 
> Also, it would be nice to generate stack traces, while disabling the jits,
> such as
>   javascript.options.baselinejit.content -> false
>   javascript.options.baselinejit.chrome -> false
>   javascript.options.ion.content -> false
> 
> This can be done with ./edit-prefs.sh .
> 
> So far I am unable to configure a gecko locally to test on an Unagi.  I
> setup AWFY to make Unagi's images for each modifications made in the js
> directory, and I would have these images tomorrow.

Tomorrow isn't good enough - we need this fixed asap within the next hour or two. We've got a busted m-c build with an entire b2g organization blocked here on testing. Someone needs to get on this now and get the regressing patch backed out.
(In reply to Jason Smith [:jsmith] from comment #20)
> (In reply to Nicolas B. Pierron [:nbp] from comment #19)
> > So far I am unable to configure a gecko locally to test on an Unagi.  I
> > setup AWFY to make Unagi's images for each modifications made in the js
> > directory, and I would have these images tomorrow.
> 
> Tomorrow isn't good enough - we need this fixed asap within the next hour or
> two. We've got a busted m-c build with an entire b2g organization blocked
> here on testing. Someone needs to get on this now and get the regressing
> patch backed out.

I am sorry if everybody is blocked on inbound, but as long as people are landing on inbound, AWFY's priority is to check these commits.

When it has idle time, it builds images of the listed commits. Since I listed 20 changes, this will take a while and is unlikely to complete in the next 2 hours.
Bisection leads to:

178889   508848ad378a   2014-02-26 10:25 +0100   jdemooij
  Bug 939562 part 3 - Move JIT flags from ContextOptions to RuntimeOptions. r=bent,bholley,luke

I'll try to revert this on the latest central, if it reverts cleanly, and see how it behaves.
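
For readers unfamiliar with the technique, the bisection above can be sketched as a binary search over an ordered list of revisions. This is a minimal sketch, not the tooling actually used here: `first_bad`, the revision labels, and the `crashes` predicate are all hypothetical, and on a real device the predicate would be a manual "flash, reboot, check the homescreen" step.

```python
def first_bad(revisions, is_bad):
    """Return the first revision for which is_bad() is true.

    Assumes revisions are ordered oldest to newest, the first one is good,
    the last one is bad, and every revision after the regression is bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return revisions[hi]


if __name__ == "__main__":
    # Made-up revision labels; "r5" stands in for the regressing changeset.
    revs = [f"r{i}" for i in range(10)]

    def crashes(rev):
        # Stand-in for flashing the build, rebooting, and observing a crash.
        return int(rev[1:]) >= 5

    print(first_bad(revs, crashes))
```

Each step halves the candidate range, so the number of builds to flash grows with the logarithm of the number of changesets rather than linearly.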
(In reply to Julien Wajsberg [:julienw] from comment #22)
> Bisection leads to:
> 
> 178889   508848ad378a   2014-02-26 10:25 +0100   jdemooij
>   Bug 939562 part 3 - Move JIT flags from ContextOptions to RuntimeOptions.
> r=bent,bholley,luke

OK, that change must be exposing an existing bug, likely because we JIT more code now.

Thanks for your help Julien, we should back it out then. Too bad we don't have tests for this on inbound that would have caught this.
Blocks: 939562
Ed is working on backing out bug 939562 right now.
The last part of bug 939562 has been backed out (and also 978456 which landed on top of it).
Duplicate of this bug: 978816
I just want to report that it works fine for me with 508848ad378a reverted, both on my Buri and my Fugu (which was crashing a lot).

Should we close this bug now?
Yup.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Does anyone have any idea why this didn't break on the emulator builds we monitor with TBPL?  If there's a known class of potential regressions that our test automation is missing, we should at least open a bug about it.
Nightly device builds have been re-spun. Results will be found here (inside Ed Morley's backout tbpl push):

https://tbpl.mozilla.org/?showall=all&rev=c8bea55437c1
(In reply to Jed Davis [:jld] from comment #29)
> Does anyone have any idea why this didn't break on the emulator builds we
> monitor with TBPL?  If there's a known class of potential regressions that
> our test automation is missing, we should at least open a bug about it.

The emulator does not catch misaligned accesses, and spidermonkey tends to hit this deficiency more frequently than other code.
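
To make that concrete, here is a toy model (not SpiderMonkey or emulator code) of a 32-bit load under two policies: a strict one that faults when the address is not a multiple of 4, and a lenient one that performs the load anyway. All names are hypothetical, and real ARM alignment behaviour depends on the instruction and CPU configuration.

```python
import struct


class AlignmentFault(Exception):
    """Raised by the strict policy on a misaligned 32-bit load."""


def load_u32(memory, addr, strict):
    """Read a little-endian 32-bit value from a bytes-like object."""
    if strict and addr % 4 != 0:
        # Strict policy: behaves like hardware that traps on misalignment.
        raise AlignmentFault(f"misaligned 32-bit load at offset {addr}")
    # Lenient policy: behaves like an environment that tolerates it.
    return struct.unpack_from("<I", memory, addr)[0]


if __name__ == "__main__":
    mem = bytes(range(16))
    print(hex(load_u32(mem, 1, strict=False)))  # lenient: load succeeds
    try:
        load_u32(mem, 1, strict=True)  # strict: the same load faults
    except AlignmentFault as exc:
        print("fault:", exc)
```

The same access passes under the lenient policy and fails under the strict one, which is the shape of a bug that test runs on a tolerant environment would not surface.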
Duplicate of this bug: 978458
(In reply to Zac C (:zac) from comment #14)
> Breaking down on tinderbox builds (to get slightly more granularity than
> nightly builds)
> 
> The Gaia commit for the first breaking tinderbox build was:
> Gaia a980b8f54956ed470667033630b02492efdf4a07
> and Gecko revision:
> Gecko 8abc76dedec2
> 
> Linked to here:
> https://pvtbuilds.mozilla.org/pvt/mozilla.org/b2gotoro/tinderbox-builds/
> mozilla-central-hamachi-eng/20140228130531/

Can I have access to this link? Or is it restricted to employees only?
No more crashes on:

Alcatel One Touch Fire production (got from T-mobile Poland)
B2G version: 1.4.0.0-prerelease master
Platform version: 30.0a1
Build Identifier: 20140303114510
Git commit info: 2014-03-03 10:34:58 dfae3744

Even after several restarts.
(In reply to Jed Davis [:jld] from comment #29)
> Does anyone have any idea why this didn't break on the emulator builds we
> monitor with TBPL?  If there's a known class of potential regressions that
> our test automation is missing, we should at least open a bug about it.

Yep, I've discussed this with our automation friends today: this bug could have been caught if TBPL was running the integration tests on the emulator.

We have a bug on file for this: Bug 916368.
(In reply to Marcela Oniga from comment #33)

> > Linked to here:
> > https://pvtbuilds.mozilla.org/pvt/mozilla.org/b2gotoro/tinderbox-builds/
> > mozilla-central-hamachi-eng/20140228130531/
> 
> Can I have access to this link? Or it is restricted to employees only..

Yes this is restricted, sorry about this.
(In reply to Julien Wajsberg [:julienw] from comment #35)
> (In reply to Jed Davis [:jld] from comment #29)
> > Does anyone have any idea why this didn't break on the emulator builds we
> > monitor with TBPL?  If there's a known class of potential regressions that
> > our test automation is missing, we should at least open a bug about it.
> 
> Yep, I've discussed this with our automation friends today: this bug could
> have been caught if TBPL was running the integration tests on the emulator.
> 
> We have a bug, this is Bug 916368.

Oh, and we actually do run this automation on a device; Bug 978458 (dupe) was created thanks to this. But it's not on TBPL.
Target Milestone: --- → 1.4 S3 (14mar)