Closed Bug 907957 Opened 6 years ago Closed 6 years ago

On-demand decompression startup failure with Firefox on the HTC One X (Android 4.2.2)

Categories

(Firefox for Android :: General, defect, critical)

24 Branch
ARM
Android
defect
Not set
critical

Tracking

()

VERIFIED FIXED
Firefox 26
Tracking Status
firefox23 --- unaffected
firefox24 + verified disabled
firefox25 + fixed
firefox26 + verified
fennec 24+ ---

People

(Reporter: ezh, Assigned: glandium)

References

Details

(Keywords: regression, reproducible)

Attachments

(1 file, 2 obsolete files)

Updated my HTC One X to Android 4.2.2 and Sense 5 (official update). Firefox 23  runs fine. FF24 beta, 25 aurora, 26 nightly are crashing. For 1-2 seconds I see the main screen and then it shuts down without and error. But stays in running .

A pal of my with the same issue with his own HTC One X.

If needed I may record a video.
Does the crash reporter appear? Are you submitting a crash-report? If not, can you capture a logcat from the device and attach it to this bug?

https://wiki.mozilla.org/Mobile/Fennec/Android#Using_logcat
No crash report. Could you please tell me what command should I run and what next? I read the URL, installed an terminal, but have problem understand what should I do. Tnx.
Same on latest builds of FF beta, Aurora and Nightly.
Hardware: x86_64 → ARM
Using a Stock HTC One X+ Firefox 24, 25 and 26 start up on Android 4.1.1. Updating the phone to 4.2.2. shortly.
Upgrading to Android 4.2.2 caused this issue. Regression range is May-20 build good, May-21 build bad. This points to on demand compression.
Blocks: 848764
tracking-fennec: --- → ?
Flags: needinfo?(mh+mozilla)
Summary: Firefox beta+ (24+) crashes (?) on HTC One X after update to A4.2.2 and Sense 5 → Firefox beta 24+ crashes on HTC One X or One X+ after update to Android 4.2.2 and Sense 5
Severity: normal → critical
Summary: Firefox beta 24+ crashes on HTC One X or One X+ after update to Android 4.2.2 and Sense 5 → On-demand decompression startup failure with Firefox on the HTC One X (Android 4.2.2)
Actually it is not quite a crash, since it remains in tasks.
Thanks to Kevin Brosnan giving me remote access to an HTC One X+, I was able to find a root cause and a possible work around.

Can you test this build http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mh@glandium.org-58391ce5fe5b/try-android/fennec-26.0a1.en-US.android-arm.apk ?
Assignee: nobody → mh+mozilla
Flags: needinfo?(mh+mozilla)
This is the patch I used in that build. I'll tag it for review when I have confirmation it works.
For the record, for some reason, the segfault signal handler on that device is never going a correct siginfo_t.si_addr (address of the segfault), without which on-demand decompression can't work.
Test build working fine on HTC One X.
:glandium, do we belive any other devices may be impacted here or would this be limited to HTC in particular or a OS issue for any device that may be on 4.2.2 ? Has QA tried any other devices yet ?

Looks like you have a work around and the test build work fine given comment #10.Is that the lowest risk option we can go with for Fx24, given we only have 2 more Beta's ? Please let us know if there are any other lower risk alternatives(disabling or backing out 848764 for specific cases ?) keeping in mind the testing time we will get on this .

I've looped in support team and emailed them about the issue to see if we have seen any reports on SUMO, will keep this bug updated with the feedback we get.
(In reply to bhavana bajaj [:bajaj] from comment #11)
> :glandium, do we belive any other devices may be impacted here or would this
> be limited to HTC in particular or a OS issue for any device that may be on
> 4.2.2 ? Has QA tried any other devices yet ?
>
> Looks like you have a work around and the test build work fine given comment
> #10.Is that the lowest risk option we can go with for Fx24, given we only
> have 2 more Beta's ? Please let us know if there are any other lower risk
> alternatives(disabling or backing out 848764 for specific cases ?) keeping
> in mind the testing time we will get on this .

A less risky alternative is to just do as in https://hg.mozilla.org/mozilla-central/rev/4655d7317a03
but for the HTC devices, although it's hard to get a comprehensive list of failing devices without testing them all, which is just impossible. I was hoping the Xperia Z issue was the same (bug 886736) but it turns out it's not :(
Attachment #795951 - Flags: review?(nfroyd)
Comment on attachment 795951 [details] [diff] [review]
Detect if a segfault signal handler is useless. If it is, disable on-demand decompression

Review of attachment 795951 [details] [diff] [review]:
-----------------------------------------------------------------

I love segfault handlers.

::: mozglue/linker/ElfLoader.cpp
@@ +917,5 @@
> +  action.sa_flags = SA_SIGINFO | SA_NODEFER;
> +  action.sa_restorer = NULL;
> +  if (sys_sigaction(SIGSEGV, &action, &this->action))
> +    return;
> +  else {

No need for the else block, since you've returned in the if block.

@@ +918,5 @@
> +  action.sa_restorer = NULL;
> +  if (sys_sigaction(SIGSEGV, &action, &this->action))
> +    return;
> +  else {
> +    /* See comment above SEGVHandler::test_handler */

I think the bulk of the comment for test_handler actually belongs here, since it's describing why you're going through this dance.

Maybe something like "Some devices don't provide useful information to their SIGSEGV handlers.  To check if we're on such a device, setup a temporary handler and deliberately trigger a segfault.  The handler will set oldStack.ss_size if the provided information is bogus."

Whatever we do, I think the comments rightly go above the if, or maybe even before the initialization of |action|.

@@ +925,5 @@
> +                                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
> +    if (stackPtr.get() != MAP_FAILED) {
> +      *((volatile int*)stackPtr.get()) = 123;
> +      if (oldStack.ss_size)
> +        return;

Should we restore the original signal handler here?

@@ +956,2 @@
>    /* Restore original signal handler */
>    sys_sigaction(SIGSEGV, &this->action, NULL);

Don't we only want to do this if we've managed to register a handler?

@@ +962,5 @@
> + * impossible for on-demand decompression to work.
> + * Set a temporary test signal handler and trigger a segfault.
> + * The signal handler will set a flag when the test fails.
> + * Abuse oldStack.ss_size as that flag, since its value doesn't matter at this
> + * point. */

...and then here: "Test handler for a deliberately triggered SIGSEGV that determines whether useful information is provided to signal handlers, particularly whether si_addr is filled in properly."

WDYT?
Attachment #795951 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd (:froydnj) from comment #13)
> Comment on attachment 795951 [details] [diff] [review]
> Detect if a segfault signal handler is useless. If it is, disable on-demand
> decompression
> 
> Review of attachment 795951 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I love segfault handlers.
> 
> ::: mozglue/linker/ElfLoader.cpp
> @@ +917,5 @@
> > +  action.sa_flags = SA_SIGINFO | SA_NODEFER;
> > +  action.sa_restorer = NULL;
> > +  if (sys_sigaction(SIGSEGV, &action, &this->action))
> > +    return;
> > +  else {
> 
> No need for the else block, since you've returned in the if block.

I'm using the else to begin a new block, so that .... oh, I forgot to switch a local variable here. The intent was for it to be unmapped at the end of that block and/or on the failure return.
(In reply to Mike Hommey [:glandium] from comment #14)
> I'm using the else to begin a new block, so that .... oh, I forgot to switch
> a local variable here. The intent was for it to be unmapped at the end of
> that block and/or on the failure return.

But then i can't pass it to the handler. So I need to release it manually.
With some fixes, review comments addressed, and a new interface for the js engine in bug 909709.
Attachment #796331 - Flags: review?(nfroyd)
Attachment #795951 - Attachment is obsolete: true
Depends on: 910024
Blocks: 910248
Comment on attachment 796331 [details] [diff] [review]
Detect if a segfault signal handler is useless. If it is, disable on-demand decompression

Review of attachment 796331 [details] [diff] [review]:
-----------------------------------------------------------------

I like this better?

::: mozglue/linker/ElfLoader.cpp
@@ +913,5 @@
>  {
> +  /* Initialize oldStack.ss_flags to an invalid value when used to set
> +   * an alternative stack, meaning we haven't got information about the
> +   * original alternative stack and thus don't mean to restore it */
> +  oldStack.ss_flags = 0;

We are assuming the ss_flags will always get set to a non-zero value through calling sigaltstack, correct?  (Probably SS_DISABLE?)

The man pages sound ambiguous on whether ss_flags get set or not: "The oss.ss_flags may return either of the following values...".  That reads to me like ss_flags might not get set, though I can see how the interpretation that ss_flags will be set and be one of the following values would be reasonable.

If the current setup works, then it works.  Just seems a little fragile.

@@ +934,5 @@
> +  if (sys_sigaction(SIGSEGV, &action, NULL))
> +    return;
> +  stackPtr.Assign(MemoryRange::mmap(NULL, PageSize(), PROT_NONE,
> +                                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
> +  if (stackPtr.get() != MAP_FAILED) {

I guess there's really not anything else reasonable to do in the failure case here. =/
Attachment #796331 - Flags: review?(nfroyd) → review+
You made me relook at the sigaltstack man page, and that's a good thing, because my logic was completely wrong. Hopefully, this is now fixed. With a bit more error handling, too.
Attachment #796981 - Flags: review?(nfroyd)
Attachment #796331 - Attachment is obsolete: true
Comment on attachment 796981 [details] [diff] [review]
Detect if a segfault signal handler is useless. If it is, disable on-demand decompression.

Review of attachment 796981 [details] [diff] [review]:
-----------------------------------------------------------------

Third time's the charm!  I think everything looks OK here.  Do address the SA_ONSTACK lossage.

::: mozglue/linker/ElfLoader.cpp
@@ +959,5 @@
> +      stack_t stack;
> +      stack.ss_sp = stackPtr;
> +      stack.ss_size = stackSize;
> +      stack.ss_flags = 0;
> +      if (sigaltstack(&stack, NULL))

For consistency with the sigaltstack call to retrieve |oldStack|, let's make this |== 0|.  Or lose the |== 0| on the other one.

@@ -925,3 @@
>    action.sa_sigaction = &SEGVHandler::handler;
> -  sigemptyset(&action.sa_mask);
> -  action.sa_flags = SA_SIGINFO | SA_NODEFER | SA_ONSTACK;

You lost the SA_ONSTACK flag when setting up the actual handler.  I guess you don't need SA_ONSTACK for the test handler.
Attachment #796981 - Flags: review?(nfroyd) → review+
tracking-fennec: ? → 24+
Blocks: 909709
Status: NEW → ASSIGNED
Flags: needinfo?(kbrosnan)
Keywords: verifyme
(In reply to Eugene Savitsky from comment #10)
> Test build working fine on HTC One X.

Hi Eugene, can you please help us by verifying the latest nightly and confirm this is fixed for you ?
https://hg.mozilla.org/mozilla-central/rev/8e76a8ebcf76
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 26
While reviewing the past months changes to the linker to see if the patch would work properly on beta, I stumbled upon bug 882608, which is not currently fixed on beta. So on beta, landing the patch from this bug, or the less risky alternative would both trade crashes with different crashes (but not make non-crashy devices crashy).

So the options are:
- Land bug 882608 and this bug on beta
- Land bug 882608 and the less risky alternative on beta (note this means we would still crash on other currently unidentified devices that have the same problem ; bug 910024 is there to try to identify at least some of them)
- Disable on-demand decompression on beta (iow, backout bug 848764 on beta)

I'd go with the first or last options. But considering we're not sure what's going on with bug 886736 yet, I'd tend for the latter.
26a1, 2013-09-01

Works fine on HTC One X with A4.2.2 and Sense 5. 

Thank you for fixing it!
Verified fixed on the HTC One X+ as well.
Status: RESOLVED → VERIFIED
Flags: needinfo?(kbrosnan)
Keywords: verifyme
needinfoing' :glandium here to help with the backout on beta given comment#23 .
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(mh+mozilla)
Comment on attachment 796981 [details] [diff] [review]
Detect if a segfault signal handler is useless. If it is, disable on-demand decompression.

Can we uplift this to aurora?
Attachment #796981 - Flags: approval-mozilla-aurora?
Requesting QA verification to ensure the on-demand decompression is disabled on beta for the upcoming build.
:glandium, QA has this bug on radar as a testcase for the above. Anything else which makes this to be disabled in an obvious way that you think is verifiable will be helpful?
(In reply to bhavana bajaj [:bajaj] from comment #29)
> Requesting QA verification to ensure the on-demand decompression is disabled
> on beta for the upcoming build.
> :glandium, QA has this bug on radar as a testcase for the above. Anything
> else which makes this to be disabled in an obvious way that you think is
> verifiable will be helpful?

The backout of bug 848764 should trigger an artificial decrease in RSS which should be appear in a mail to dev-tree-management. (Landing of bug 848764 artificially increased it)

Other than that, checking /proc/<pid>/maps should show only a few zones for /dev/shmem/lib*.so instead of hundreds.
Attachment #796981 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.