Schedule Android 4.2 x86 Opt set S4 (only) on all trunk trees and make them ride the trains

RESOLVED FIXED

Status

RESOLVED FIXED
5 years ago
5 months ago

People

(Reporter: gbrown, Assigned: armenzg)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Reporter)

Description

5 years ago
https://tbpl.mozilla.org/?tree=Cedar&rev=d528cc2b5446 demonstrates the high stability of Android 4.2 x86 Opt set S4. In 111 runs of S4, there are only 2 robocop failures and 1 xpcshell failure.
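
As a rough check on the stability figure, here is a minimal sketch of the arithmetic. It uses only the counts quoted above and assumes each failure happened in a different run, which the description does not actually state.

```python
# Counts taken from the description: 111 runs of S4,
# with 2 robocop failures and 1 xpcshell failure.
runs = 111
failures = {"robocop": 2, "xpcshell": 1}

# Assumption: each failure occurred in a distinct run.
total_failures = sum(failures.values())
failure_rate = total_failures / runs

print(f"{total_failures} failures in {runs} runs = {failure_rate:.1%}")
```

That works out to roughly a 2.7% failure rate per run.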

Unfortunately, S1, S2, and S3 are not ready for trunk at this time.

I would like to start running S4 on trunk trees, to give us some on-going x86 test coverage and help identify and contain the remaining low-frequency intermittent failures.
(Assignee)

Comment 1

5 years ago
edmorley, philor: given the information above, do you have any objections?
Assignee: nobody → armenzg
Flags: needinfo?(philringnalda)
Flags: needinfo?(emorley)
(Assignee)

Updated

5 years ago
Depends on: 940399

Comment 2

5 years ago
sgtm :-)
Flags: needinfo?(emorley)
Since bug 848621 is now esr24-only, I think it's more likely that your xpcshell failure is some new failure in the emulator, but it's no skin off my nose if we run them everywhere, as long as we're not planning on deciding to leave them running but hidden if there does turn out to be more trouble with them.
Flags: needinfo?(philringnalda)
(Assignee)

Comment 4

5 years ago
Created attachment 8361834 [details] [diff] [review]
enable_s4.diff
Attachment #8361834 - Flags: review?(bugspam.Callek)
(Assignee)

Comment 5

5 years ago
Created attachment 8361836 [details]
differences.txt

This attachment shows the builders that are affected.
(Assignee)

Comment 6

5 years ago
I'm only enabling it on gecko 29 branches. Please let me know if you need more in the future.

Updated

5 years ago
Attachment #8361834 - Flags: review?(bugspam.Callek) → review+
in production
(Assignee)

Comment 9

5 years ago
We should start seeing it in here:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=android.*x86
(Assignee)

Comment 10

5 years ago
philor is letting me know that we have some intermittent oranges:
https://tbpl.mozilla.org/php/getParsedLog.php?id=33335041&tree=B2g-Inbound#error0
https://tbpl.mozilla.org/?tree=B2g-Inbound&jobname=android.*x86

gbrown: could you please file it so we can star it?
(Assignee)

Comment 11

5 years ago
philor says that filing the bug with one of these "search terms" will help star tbpl failures:
https://tbpl.mozilla.org/php/getLogExcerpt.php?type=annotated&regenerate=1&id=33335041&debug=1

He also recommends this:
<philor> none of them good
<philor> probably "Intermittent Androidx86 testFormHistory | application crashed [Unknown top frame]"
(Assignee)

Comment 12

5 years ago
I re-triggered the job.
(Reporter)

Comment 13

5 years ago
I filed bug 962121.
(Assignee)

Comment 16

5 years ago
It seems that this is sticking.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED

Comment 17

4 years ago
Could you make these jobs run by default when people push to try with '-p all -u all -t all'?  Also, in the meantime, how can one explicitly request an S4 run?
(Assignee)

Comment 18

4 years ago
IIRC you have to choose one of the 4 suites that run within the S4 job to get triggered:
http://hg.mozilla.org/build/buildbot-configs/file/default/mozilla-tests/mobile_config.py#l825

Off the top of my head, I don't know why it would not be running by default on try.

Callek, do you know why the android-x86 jobs don't run by default on try? I can't figure it out by looking at the buildbot-configs.

Does anybody know if we decided to drop x86?
Is it ever going to become a tier-1?
Flags: needinfo?(bugspam.Callek)
(Reporter)

Comment 19

4 years ago
btw, the try -all issue is explicitly tracked in bug 964589.

Last I heard we were in wait-and-see mode on the future of x86: We don't want to put much additional effort into x86 testing, but we also are not ready to shut it down. :blassey -- can you verify?
Flags: needinfo?(blassey.bugs)

Comment 20

4 years ago
Thanks for the response!  I'd really appreciate it if we could disable Android-x86 S4 on trunk, since I was just backed out for an S4-only crash which doesn't have symbolicated crash stacks:
  https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=10616214c160
and I'm having trouble getting https://wiki.mozilla.org/Auto-tools/Projects/Mozharness to work so I can reproduce locally.

Comment 21

4 years ago
(On the subject, if we can't turn off Android-x86, I'd really appreciate any help getting crash stacks; it's probably an obvious platform-dependent bug that I could fix if I could just see where it was crashing.)
(Assignee)

Comment 22

4 years ago
Luke, I've added a section with jobs that are known to not be able to run locally:
https://wiki.mozilla.org/ReleaseEngineering/Mozharness/How_to_run_tests_as_a_developer#Known_issues

Android tests are one of them. You will need a loaner.

Comment 23

4 years ago
Is the lack of symbolicated crash stacks also a known issue?

Comment 24

4 years ago
Also, is there a bug for turning off the Android x86 tests?  Without some help figuring out what's going on here, I'm probably going to just need to block my bug on that.
(Assignee)

Comment 25

4 years ago
(In reply to Luke Wagner [:luke] from comment #23)
> Is the lack of symbolicated crash stacks also a known issue?

I am not aware of it. Maybe gbrown knows.

(In reply to Luke Wagner [:luke] from comment #24)
> Also, is there a bug for turning off the Android x86 tests?  Without some
> help figuring out what's going on here, I'm probably going to just need to
> block my bug on that.

No, there is no bug.
Let's see what blassey says.
(In reply to Geoff Brown [:gbrown] from comment #19)
> btw, the try -all issue is explicitly tracked in bug 964589.
> 
> Last I heard we were in wait-and-see mode on the future of x86: We don't
> want to put much additional effort into x86 testing, but we also are not
> ready to shut it down. :blassey -- can you verify?

You got it
Flags: needinfo?(blassey.bugs)
That's a pretty neat state: it doesn't meet the visibility requirements several times over (no active owner, no useful crashes, not easily run on try), but we don't want to shut it off, but since it doesn't meet the visibility requirements it should be hidden, but the second it is hidden Luke is waiting to break it because it doesn't meet the visibility requirements by having useful crash stacks.

Comment 28

4 years ago
Agreed with philor :)

> but the second it is hidden Luke
> is waiting to break it because it doesn't meet the visibility requirements
> by having useful crash stacks.

Geoff Brown is magnanimously helping me to get a callstack in bug 1091916 and this callstack may point to an obvious fix.  Failing that, could we consider disabling S4 on trunk?  Otherwise, this will be holding back a pretty nice improvement in bug 1091912.
Flags: needinfo?(blassey.bugs)
(In reply to Phil Ringnalda (:philor) from comment #27)
> That's a pretty neat state: it doesn't meet the visibility requirements
> several times over (no active owner, no useful crashes, not easily run on
> try), but we don't want to shut it off, but since it doesn't meet the
> visibility requirements it should be hidden, but the second it is hidden Luke
> is waiting to break it because it doesn't meet the visibility requirements
> by having useful crash stacks.

I don't understand the "no active owner" bit here. There is a bug open to get it more easily run on try.
(In reply to Luke Wagner [:luke] from comment #24)
> Also, is there a bug for turning off the Android x86 tests?  Without some
> help figuring out what's going on here, I'm probably going to just need to
> block my bug on that.

There are no plans to turn off x86 tests or builds.
(In reply to Luke Wagner [:luke] from comment #28)
> Agreed with philor :)
> 
> > but the second it is hidden Luke
> > is waiting to break it because it doesn't meet the visibility requirements
> > by having useful crash stacks.
> 
> Geoff Brown is magnanimously helping me to get a callstack in bug 1091916
> and this callstack may point to an obvious fix.  Failing that, could we
> consider disabling S4 on trunk?  Otherwise, this will be holding back a
> pretty nice improvement in bug 1091912.
Can you not reproduce the crash locally? One nice thing about our x86 tests is they're run in emulators, so no hardware is needed. If debugging on real hardware would be easier, you should be able to borrow a Razr i from QA.
Flags: needinfo?(blassey.bugs)

Comment 30

4 years ago
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #29)
> Can you not reproduce the crash locally?

Comment 22, and my attempt earlier today, seem to indicate 'no'.
If I read that correctly, the problem is running the tests locally in mozharness on Android, not running the tests themselves. gbrown tells me test set 4 is the xpcshell tests and robocop tests. You can find instructions on how to run those here:

https://wiki.mozilla.org/Mobile/Fennec/Android#xpcshell
and
https://wiki.mozilla.org/Mobile/Fennec/Android#Robocop

Updated

4 years ago
Flags: needinfo?(bugspam.Callek)
(Reporter)

Comment 32

4 years ago
btw, I checked on our Android xpcshell crash handling and it seems to be working normally and providing crash dumps with symbols, even on x86. There may be something "special" about Luke's crash.

For example, from this try run:

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=be4f82d8fc6f

23:06:22  WARNING -  PROCESS-CRASH | /builds/slave/talos-slave/test/build/tests/xpcshell/tests/dom/encoding/test/unit/test_shift_jis.js | application crashed [@ XRE_XPCShellMain]
23:06:22     INFO -  Crash dump filename: /tmp/tmp1Wtxr1/2ad6baa5-9362-dace-443869ce-7ae44a00.dmp
23:06:22     INFO -  Operating system: Android
23:06:22     INFO -                    0.0.0 Linux 2.6.29 #1 PREEMPT Thu Nov 7 22:27:50 UTC 2013 i686 Android/full/generic:4.2/JOP40C/eng.ubuntu.20131107.195638:eng/test-keys
23:06:22     INFO -  CPU: x86
23:06:22     INFO -       GenuineIntel family 6 model 3 stepping 3
23:06:22     INFO -       1 CPU
23:06:22     INFO -  
23:06:22     INFO -  Crash reason:  SIGSEGV
23:06:22     INFO -  Crash address: 0x0
23:06:22     INFO -  
23:06:22     INFO -  Thread 0 (crashed)
23:06:22     INFO -   0  libxul.so!XRE_XPCShellMain [XPCShellImpl.cpp:be4f82d8fc6f : 1514 + 0x0]
23:06:22     INFO -      eip = 0xb53dc033   esp = 0xbffff410   ebp = 0xbffff628   ebx = 0xb7aa6f84
23:06:22     INFO -      esi = 0x00000016   edi = 0xbffffb3c   eax = 0xb49d254c   ecx = 0x00000000
23:06:22     INFO -      edx = 0xbffff48c   efl = 0x00010246
23:06:22     INFO -      Found by: given as instruction pointer in context
23:06:22     INFO -   1  xpcshell!main [xpcshell.cpp:be4f82d8fc6f : 43 + 0xd]
23:06:22     INFO -      eip = 0x080485da   esp = 0xbffff630   ebp = 0xbffff668   ebx = 0x08049fd8
23:06:22     INFO -      esi = 0x0000001b   edi = 0xbffff6c4
23:06:22     INFO -      Found by: call frame info
23:06:22     INFO -   2  libc.so + 0x19ae3
23:06:22     INFO -      eip = 0xb7f44ae4   esp = 0xbffff670   ebp = 0xbffff734
23:06:22     INFO -      Found by: previous frame's frame pointer
23:06:22     INFO -   3  0xbffffb57
23:06:22     INFO -      eip = 0xbffffb58   esp = 0xbffff73c   ebp = 0xbffffb3c
23:06:22     INFO -      Found by: previous frame's frame pointer
23:06:22     INFO -   4  0x2f617460
23:06:22     INFO -      eip = 0x2f617461   esp = 0xbffffb44   ebp = 0x642f3d5f
23:06:22     INFO -      Found by: previous frame's frame pointer
23:06:22     INFO -  
23:06:22     INFO -  Thread 1
23:06:22     INFO -   0  libc.so + 0x26837
23:06:22     INFO -      eip = 0xb7f51837   esp = 0xb2101bdc   ebp = 0xb2101c28   ebx = 0x00000003
23:06:22     INFO -      esi = 0xffffffff   edi = 0xffffffff   eax = 0xfffffffc   ecx = 0xb4960300
23:06:22     INFO -      edx = 0x00000020   efl = 0x00000246
23:06:22     INFO -      Found by: given as instruction pointer in context
23:06:22     INFO -   1  libxul.so!event_base_loop [event.c:be4f82d8fc6f : 1607 + 0x9]
23:06:22     INFO -      eip = 0xb524b59d   esp = 0xb2101c30   ebp = 0xb2101ca8
23:06:22     INFO -      Found by: previous frame's frame pointer
23:06:22     INFO -   2  libxul.so!base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) [message_pump_libevent.cc:be4f82d8fc6f : 340 + 0xb]
23:06:22     INFO -      eip = 0xb524f069   esp = 0xb2101cb0   ebp = 0xb2101d08   ebx = 0xb7aa6f84
23:06:22     INFO -      esi = 0xb4901a90   edi = 0xb2101d80
23:06:22     INFO -      Found by: call frame info
23:06:22     INFO -   3  libxul.so!MessageLoop::RunInternal() [message_loop.cc:be4f82d8fc6f : 233 + 0x5]
23:06:22     INFO -      eip = 0xb5256f5f   esp = 0xb2101d10   ebp = 0xb2101d28   ebx = 0xb7aa6f84
23:06:22     INFO -      esi = 0xb2101d80   edi = 0xb2101d80
23:06:22     INFO -      Found by: call frame info
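
To check quickly whether a log like the one above carries symbols, one option is to pull out the PROCESS-CRASH line and any frames of the form `module!function [file:...]`. The following is only a sketch: the regular expressions are inferred from this one excerpt, and `summarize_crash` is a hypothetical helper, not part of mozharness or any Mozilla tool.

```python
import re

# Harness line announcing a crash:
# "PROCESS-CRASH | <test path> | application crashed [<signature>]"
CRASH_RE = re.compile(
    r"PROCESS-CRASH \| (?P<test>\S+) \| application crashed \[(?P<sig>[^\]]*)\]"
)

# Symbolicated frame line: "<index>  <module>!<function> [<file>:<rev> : <line> + <off>]"
# Frames such as "libc.so + 0x19ae3" have no "!" and are skipped.
FRAME_RE = re.compile(
    r"INFO -\s+(?P<index>\d+)\s+(?P<module>[^\s!]+)!(?P<function>.+?) \[(?P<file>[^:\]]+):"
)


def summarize_crash(log_text):
    """Return (test, signature, frames) for the first crash in log_text, or None."""
    crash = CRASH_RE.search(log_text)
    if crash is None:
        return None
    # Only look at frames that appear after the crash announcement.
    frames = [
        (m.group("module"), m.group("function"), m.group("file"))
        for m in FRAME_RE.finditer(log_text, crash.end())
    ]
    return crash.group("test"), crash.group("sig"), frames


# Lines copied from the log excerpt above.
sample = """\
23:06:22  WARNING -  PROCESS-CRASH | /builds/slave/talos-slave/test/build/tests/xpcshell/tests/dom/encoding/test/unit/test_shift_jis.js | application crashed [@ XRE_XPCShellMain]
23:06:22     INFO -  Thread 0 (crashed)
23:06:22     INFO -   0  libxul.so!XRE_XPCShellMain [XPCShellImpl.cpp:be4f82d8fc6f : 1514 + 0x0]
23:06:22     INFO -   1  xpcshell!main [xpcshell.cpp:be4f82d8fc6f : 43 + 0xd]
23:06:22     INFO -   2  libc.so + 0x19ae3
"""

test, signature, frames = summarize_crash(sample)
has_symbols = bool(frames)
print(signature, has_symbols, frames)
```

An empty `frames` list, or a signature such as "Unknown top frame" as in comment 11, would suggest the crash was reported without usable symbols.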
(Reporter)

Comment 33

4 years ago
(In reply to Geoff Brown [:gbrown] from comment #32) 
> There may be something "special" about Luke's crash.

That turned out to be the crashing tests themselves: They are all multi-process xpcshell tests, which are not supported on Android. We are disabling those tests on Android in bug 1093719.
Component: General Automation → General
Product: Release Engineering → Release Engineering