Closed Bug 960674 Opened 11 years ago Closed 11 years ago

Schedule Android 4.2 x86 Opt set S4 (only) on all trunk trees and make them ride the trains

Categories

(Release Engineering :: General, defect)

x86_64
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gbrown, Assigned: armenzg)

References

Details

Attachments

(2 files)

https://tbpl.mozilla.org/?tree=Cedar&rev=d528cc2b5446 demonstrates the high stability of Android 4.2 x86 Opt set S4. In 111 runs of S4, there are only 2 robocop failures and 1 xpcshell failure. Unfortunately, S1, S2, and S3 are not ready for trunk at this time. I would like to start running S4 on trunk trees, to give us some on-going x86 test coverage and help identify and contain the remaining low-frequency intermittent failures.
edmorley, philor: given the information above, do you have any objections?
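For scale, the counts quoted above (3 failures in 111 runs) work out to an overall failure rate of a little under 3%. A minimal sketch of that arithmetic in Python; the variable and function names are illustrative only and do not come from any Mozilla tooling:

```python
# Counts taken from the comment above; the dictionary keys are labels chosen for this sketch.
total_runs = 111
failures = {"robocop": 2, "xpcshell": 1}


def failure_rate(failed: int, total: int) -> float:
    """Return the fraction of runs that failed."""
    return failed / total


overall = failure_rate(sum(failures.values()), total_runs)
print(f"overall: {overall:.1%}")  # overall: 2.7%
for suite, count in failures.items():
    print(f"{suite}: {failure_rate(count, total_runs):.1%}")
```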
Assignee: nobody → armenzg
Flags: needinfo?(philringnalda)
Flags: needinfo?(emorley)
Depends on: 940399
sgtm :-)
Flags: needinfo?(emorley)
Since bug 848621 is now esr24-only, I think it's more likely that your xpcshell failure is some new failing in the emulator, but it's no skin off my nose if we run them everywhere, as long as we're not planning on deciding to leave them running but hidden if there does turn out to be more trouble with them.
Flags: needinfo?(philringnalda)
Attached patch enable_s4.diff
Attachment #8361834 - Flags: review?(bugspam.Callek)
Attached file differences.txt
This attachment shows the builders that are affected.
I'm only enabling it on gecko 29 branches. Please let me know if you need more in the future.
Attachment #8361834 - Flags: review?(bugspam.Callek) → review+
in production
philor is letting me know that we have some intermittent oranges:
https://tbpl.mozilla.org/php/getParsedLog.php?id=33335041&tree=B2g-Inbound#error0
https://tbpl.mozilla.org/?tree=B2g-Inbound&jobname=android.*x86
gbrown: could you please file it so we can star it?
philor says that filing the bug with one of these "search terms" will help star tbpl failures:
https://tbpl.mozilla.org/php/getLogExcerpt.php?type=annotated&regenerate=1&id=33335041&debug=1
He also recommends this:
<philor> none of them good
<philor> probably "Intermittent Androidx86 testFormHistory | application crashed [Unknown top frame]"
I re-triggered the job.
I filed bug 962121.
It seems that this is sticking.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Could you make these jobs run by default when people push to try with '-p all -u all -t all'? Also, in the meantime, how can one explicitly request an S4 run?
IIRC you have to choose one of the 4 suites that run within the S4 job to get it triggered:
http://hg.mozilla.org/build/buildbot-configs/file/default/mozilla-tests/mobile_config.py#l825
Off the top of my head, I don't know why it would not be running by default on try.
Callek, do you know why the android-x86 jobs don't run by default on try? I can't figure it out by looking at the buildbot-configs.
Does anybody know if we decided to drop x86? Is it ever going to become a tier-1?
Flags: needinfo?(bugspam.Callek)
btw, the try -all issue is explicitly tracked in bug 964589. Last I heard we were in wait-and-see mode on the future of x86: We don't want to put much additional effort into x86 testing, but we also are not ready to shut it down. :blassey -- can you verify?
Flags: needinfo?(blassey.bugs)
Thanks for the response! I'd really appreciate it if we could disable Android-x86 S4 on trunk. I was just backed out for an S4-only crash which doesn't have symbolicated crash stacks:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=10616214c160
and I'm having trouble getting https://wiki.mozilla.org/Auto-tools/Projects/Mozharness to work so I can reproduce locally.
(On the subject, if we can't turn off Android-x86, I'd really appreciate any help getting crash stacks; it's probably an obvious platform-dependent bug that I could fix if I could just see where it was crashing.)
Luke, I've added a section with jobs that are known not to be runnable locally:
https://wiki.mozilla.org/ReleaseEngineering/Mozharness/How_to_run_tests_as_a_developer#Known_issues
Android tests are among them. You will need a loaner.
Is the lack of symbolicated crash stacks also a known issue?
Also, is there a bug for turning off the Android x86 tests? Without some help figuring out what's going on here, I'm probably going to just need to block my bug on that.
(In reply to Luke Wagner [:luke] from comment #23)
> Is the lack of symbolicated crash stacks also a known issue?

I am not aware of it. Maybe gbrown knows.

(In reply to Luke Wagner [:luke] from comment #24)
> Also, is there a bug for turning off the Android x86 tests? Without some
> help figuring out what's going on here, I'm probably going to just need to
> block my bug on that.

No, there is no bug. Let's see what blassey says.
(In reply to Geoff Brown [:gbrown] from comment #19)
> btw, the try -all issue is explicitly tracked in bug 964589.
>
> Last I heard we were in wait-and-see mode on the future of x86: We don't
> want to put much additional effort into x86 testing, but we also are not
> ready to shut it down. :blassey -- can you verify?

You got it
Flags: needinfo?(blassey.bugs)
That's a pretty neat state: it doesn't meet the visibility requirements several times over (no active owner, no useful crashes, not easily run on try), but we don't want to shut it off, but since it doesn't meet the visibility requirements it should be hidden, but the second it is hidden Luke is waiting to break it because it doesn't meet the visibility requirements by having useful crash stacks.
Agreed with philor :)

> but the second it is hidden Luke
> is waiting to break it because it doesn't meet the visibility requirements
> by having useful crash stacks.

Geoff Brown is magnanimously helping me to get a callstack in bug 1091916 and this callstack may point to an obvious fix. Failing that, could we consider disabling S4 on trunk? Otherwise, this will be holding back a pretty nice improvement in bug 1091912.
Flags: needinfo?(blassey.bugs)
(In reply to Phil Ringnalda (:philor) from comment #27)
> That's a pretty neat state: it doesn't meet the visibility requirements
> several times over (no active owner, no useful crashes, not easily run on
> try), but we don't want to shut it off, but since it doesn't meet the
> visibility requirements it should be hidden, but the second it is hidden Luke
> is waiting to break it because it doesn't meet the visibility requirements
> by having useful crash stacks.

I don't understand the "no active owner" bit here. There is a bug open to get it more easily run on try.

(In reply to Luke Wagner [:luke] from comment #24)
> Also, is there a bug for turning off the Android x86 tests? Without some
> help figuring out what's going on here, I'm probably going to just need to
> block my bug on that.

There are no plans to turn off x86 tests or builds.

(In reply to Luke Wagner [:luke] from comment #28)
> Agreed with philor :)
>
> > but the second it is hidden Luke
> > is waiting to break it because it doesn't meet the visibility requirements
> > by having useful crash stacks.
>
> Geoff Brown is magnanimously helping me to get a callstack in bug 1091916
> and this callstack may point to an obvious fix. Failing that, could we
> consider disabling S4 on trunk? Otherwise, this will be holding back a
> pretty nice improvement in bug 1091912.

Can you not reproduce the crash locally? One nice thing about our x86 tests is they're run in emulators, so no hardware is needed. If debugging on real hardware would be easier, you should be able to borrow a Razr i from QA.
Flags: needinfo?(blassey.bugs)
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #29)
> Can you not reproduce the crash locally?

comment 22, and trying earlier today, seems to indicate 'no'.
If I read that correctly, the problem is running the tests locally in mozharness on Android, not running the tests themselves. gbrown tells me test set 4 is the xpcshell tests and robocop tests. You can find instructions on how to run those here:
https://wiki.mozilla.org/Mobile/Fennec/Android#xpcshell
and
https://wiki.mozilla.org/Mobile/Fennec/Android#Robocop
Flags: needinfo?(bugspam.Callek)
btw, I checked on our Android xpcshell crash handling and it seems to be working normally and providing crash dumps with symbols, even on x86: There may be something "special" about Luke's crash.

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=be4f82d8fc6f

23:06:22 WARNING - PROCESS-CRASH | /builds/slave/talos-slave/test/build/tests/xpcshell/tests/dom/encoding/test/unit/test_shift_jis.js | application crashed [@ XRE_XPCShellMain]
23:06:22 INFO - Crash dump filename: /tmp/tmp1Wtxr1/2ad6baa5-9362-dace-443869ce-7ae44a00.dmp
23:06:22 INFO - Operating system: Android
23:06:22 INFO - 0.0.0 Linux 2.6.29 #1 PREEMPT Thu Nov 7 22:27:50 UTC 2013 i686 Android/full/generic:4.2/JOP40C/eng.ubuntu.20131107.195638:eng/test-keys
23:06:22 INFO - CPU: x86
23:06:22 INFO - GenuineIntel family 6 model 3 stepping 3
23:06:22 INFO - 1 CPU
23:06:22 INFO -
23:06:22 INFO - Crash reason: SIGSEGV
23:06:22 INFO - Crash address: 0x0
23:06:22 INFO -
23:06:22 INFO - Thread 0 (crashed)
23:06:22 INFO - 0 libxul.so!XRE_XPCShellMain [XPCShellImpl.cpp:be4f82d8fc6f : 1514 + 0x0]
23:06:22 INFO - eip = 0xb53dc033 esp = 0xbffff410 ebp = 0xbffff628 ebx = 0xb7aa6f84
23:06:22 INFO - esi = 0x00000016 edi = 0xbffffb3c eax = 0xb49d254c ecx = 0x00000000
23:06:22 INFO - edx = 0xbffff48c efl = 0x00010246
23:06:22 INFO - Found by: given as instruction pointer in context
23:06:22 INFO - 1 xpcshell!main [xpcshell.cpp:be4f82d8fc6f : 43 + 0xd]
23:06:22 INFO - eip = 0x080485da esp = 0xbffff630 ebp = 0xbffff668 ebx = 0x08049fd8
23:06:22 INFO - esi = 0x0000001b edi = 0xbffff6c4
23:06:22 INFO - Found by: call frame info
23:06:22 INFO - 2 libc.so + 0x19ae3
23:06:22 INFO - eip = 0xb7f44ae4 esp = 0xbffff670 ebp = 0xbffff734
23:06:22 INFO - Found by: previous frame's frame pointer
23:06:22 INFO - 3 0xbffffb57
23:06:22 INFO - eip = 0xbffffb58 esp = 0xbffff73c ebp = 0xbffffb3c
23:06:22 INFO - Found by: previous frame's frame pointer
23:06:22 INFO - 4 0x2f617460
23:06:22 INFO - eip = 0x2f617461 esp = 0xbffffb44 ebp = 0x642f3d5f
23:06:22 INFO - Found by: previous frame's frame pointer
23:06:22 INFO -
23:06:22 INFO - Thread 1
23:06:22 INFO - 0 libc.so + 0x26837
23:06:22 INFO - eip = 0xb7f51837 esp = 0xb2101bdc ebp = 0xb2101c28 ebx = 0x00000003
23:06:22 INFO - esi = 0xffffffff edi = 0xffffffff eax = 0xfffffffc ecx = 0xb4960300
23:06:22 INFO - edx = 0x00000020 efl = 0x00000246
23:06:22 INFO - Found by: given as instruction pointer in context
23:06:22 INFO - 1 libxul.so!event_base_loop [event.c:be4f82d8fc6f : 1607 + 0x9]
23:06:22 INFO - eip = 0xb524b59d esp = 0xb2101c30 ebp = 0xb2101ca8
23:06:22 INFO - Found by: previous frame's frame pointer
23:06:22 INFO - 2 libxul.so!base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) [message_pump_libevent.cc:be4f82d8fc6f : 340 + 0xb]
23:06:22 INFO - eip = 0xb524f069 esp = 0xb2101cb0 ebp = 0xb2101d08 ebx = 0xb7aa6f84
23:06:22 INFO - esi = 0xb4901a90 edi = 0xb2101d80
23:06:22 INFO - Found by: call frame info
23:06:22 INFO - 3 libxul.so!MessageLoop::RunInternal() [message_loop.cc:be4f82d8fc6f : 233 + 0x5]
23:06:22 INFO - eip = 0xb5256f5f esp = 0xb2101d10 ebp = 0xb2101d28 ebx = 0xb7aa6f84
23:06:22 INFO - esi = 0xb2101d80 edi = 0xb2101d80
23:06:22 INFO - Found by: call frame info
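The PROCESS-CRASH line in the log above carries both the failing test path and the crash signature, which is the kind of text used earlier in this bug as a search term for starring. A minimal Python sketch of extracting those two fields, assuming the line format shown in this log; the regular expression and the function name `parse_process_crash` are hypothetical and are not part of mozharness or tbpl:

```python
import re

# Assumed format, based only on the log line quoted in this bug:
#   ... PROCESS-CRASH | <test path> | application crashed [<signature>]
CRASH_RE = re.compile(
    r"PROCESS-CRASH \| (?P<test>\S+) \| application crashed \[(?P<signature>[^\]]*)\]"
)


def parse_process_crash(line):
    """Return the test path and crash signature from a PROCESS-CRASH line, or None."""
    match = CRASH_RE.search(line)
    if match is None:
        return None
    # Drop the leading "@ " marker when a symbolicated top frame is present.
    signature = match.group("signature").lstrip("@ ").strip()
    return {"test": match.group("test"), "signature": signature}


sample = (
    "23:06:22 WARNING - PROCESS-CRASH | "
    "/builds/slave/talos-slave/test/build/tests/xpcshell/tests/dom/encoding/test/unit/test_shift_jis.js"
    " | application crashed [@ XRE_XPCShellMain]"
)
result = parse_process_crash(sample)
print(result["signature"])  # XRE_XPCShellMain
```

A line without symbols, such as one ending in `application crashed [Unknown top frame]`, would yield the signature text unchanged, which makes unsymbolicated crashes easy to spot.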
(In reply to Geoff Brown [:gbrown] from comment #32)
> There may be something "special" about Luke's crash.

That turned out to be the crashing tests themselves: They are all multi-process xpcshell tests, which are not supported on Android. We are disabling those tests on Android in bug 1093719.
Component: General Automation → General