run talos/raptor tests on windows10 aarch64 laptops

RESOLVED FIXED in mozilla68

Status

enhancement
P1
normal
RESOLVED FIXED
3 months ago
18 days ago

People

(Reporter: jmaher, Assigned: stephend)

Tracking

(Depends on 1 bug)

Version 3
mozilla68
Points:
---

Firefox Tracking Flags

(firefox68 affected)

Details

Attachments

(1 attachment, 1 obsolete attachment)

Reporter

Description

3 months ago

now that bug 1530737 is resolved, we can run talos/raptor tests on try server:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=022a6d1d82df08e169e29de49085ba88ac293cfe

as you can see in that try push there is a lot of orange.

Some of the orange is an issue with the laptops where the job doesn't even run, but many of the cases are genuine failures.

Specifically:
talos: damp, ps (both of these hit taskcluster max time)
raptor: tp6-3

Assignee: nobody → stephen.donner
Status: NEW → ASSIGNED

:stephend could you see if you can replicate the failures locally? If these are easy to resolve together then we can take care of them all in this bug. If not, we can open new bugs as dependencies.

Flags: needinfo?(stephen.donner)
Assignee

Comment 2

3 months ago

(In reply to Dave Hunt [:davehunt] [he/him] ⌚️UTC from comment #1)

:stephend could you see if you can replicate the failures locally? If these are easy to resolve together then we can take care of them all in this bug. If not, we can open new bugs as dependencies.

Absolutely; looking!

Flags: needinfo?(stephen.donner)
Assignee

Updated

3 months ago
Depends on: 1532560

Stephen, what is left to do on this bug to get Raptor working on aarch64? Are you still working on it?

Flags: needinfo?(stephen.donner)

Raptor is running on aarch64, and can be seen here: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&tier=1%2C2%2C3&searchStr=windows%2Caarch64%2Crap&revision=4f6327fecae4df135bc1787f75e1432230b510f7

I don't see any Talos jobs running though. What else needs to be done to get these enabled?

Flags: needinfo?(jmaher)
Reporter

Comment 5

2 months ago

I was waiting on bug 1531876 to be resolved, damp and ps were broken in the original runs.

Flags: needinfo?(jmaher)
Assignee

Comment 6

2 months ago

Just landed https://hg.mozilla.org/try/rev/03593782f9c13b4f635cdf5a138b23d8901c40a1 which casts the proverbial net a little wider than Geoff's previous Try push, over in https://treeherder.mozilla.org/#/jobs?repo=try&revision=022a6d1d82df08e169e29de49085ba88ac293cfe.

Assuming I too see pageload-timeout failures in this run (for damp/ps), is the following the correct approach?

  1. Locally, bump the current max-run-time: 1200 to 1800 or something else more reasonable, via https://searchfox.org/mozilla-central/rev/8d78f219702286c873860f39f9ed78bad1a6d062/taskcluster/ci/test/talos.yml#275
  2. Push to try[0]; are the following platforms/tasks enough/too much[1]?

     "tasks": [
     +        "build-win64-aarch64-shippable/opt",
     +        "build-win64-aarch64/debug",
     +        "build-win64-aarch64/opt",
     +        "test-windows10-aarch64/opt-talos-damp-e10s",
     +        "test-windows10-aarch64/opt-talos-other-e10s"
     +    ],

  3. Once satisfied with (how many runs? 10 or 20, Joel? Sorry, I know you answered this once for me already...), make a permanent commit to add - windows-talos to https://searchfox.org/mozilla-central/rev/dd7e27f4a805e4115d0dbee70e1220b23b23c567/taskcluster/ci/test/test-platforms.yml#207-213 with any appropriately adjusted max-run-time value here: https://searchfox.org/mozilla-central/rev/8d78f219702286c873860f39f9ed78bad1a6d062/taskcluster/ci/test/talos.yml#275

[0] https://wiki.mozilla.org/Performance_sheriffing/Talos/Running#Try_Server
[1] https://hg.mozilla.org/try/rev/03593782f9c13b4f635cdf5a138b23d8901c40a1
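
For illustration only, here is a minimal Python sketch of how the task list in step 2 could be built and checked before pushing. The helper names (`aarch64_try_tasks`, `render_task_config`) are hypothetical and not part of any Mozilla tooling; only the five task labels come from the list above.

```python
import json

# Label pieces copied from the task list above. Generating the labels
# programmatically is an assumption made purely for illustration.
BUILD_VARIANTS = ["-shippable/opt", "/debug", "/opt"]
TALOS_SUITES = ["damp", "other"]


def aarch64_try_tasks(build_variants=BUILD_VARIANTS, suites=TALOS_SUITES):
    """Return the build and Talos test task labels for windows10-aarch64."""
    builds = [f"build-win64-aarch64{variant}" for variant in build_variants]
    tests = [f"test-windows10-aarch64/opt-talos-{suite}-e10s" for suite in suites]
    return builds + tests


def render_task_config(tasks):
    """Serialize the task list as a JSON object with a single "tasks" key."""
    return json.dumps({"tasks": tasks}, indent=4)


if __name__ == "__main__":
    print(render_task_config(aarch64_try_tasks()))
```

Keeping the builds and the two Talos suites in separate lists makes it easy to widen or narrow the push without retyping full labels.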

Reporter

Comment 8

a month ago

That could be a failure to close the browser; either way, that looks unstable :)

Assignee

Comment 9

a month ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #5)

I was waiting on bug 1531876 to be resolved, damp and ps were broken in the original runs.

That's this bug; did you perhaps mean bug 1532560, or another bug?

no, I meant this bug, comment 0 mentions 2 talos tests that are broken.

(In reply to Stephen Donner [:stephend] from comment #7)

Looks like it caught a crasher, here: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=239180509&repo=try&lineNumber=3126

This crash is:

23:23:12 INFO - Thread 0 (crashed)
23:23:12 INFO - 0 xul.dll!CrashReporter::CreateMinidumpsAndPair(void *,unsigned long,nsTSubstring<char> const &,nsIFile *,nsIFile * *) [nsExceptionHandler.cpp:03593782f9c13b4f635cdf5a138b23d8901c40a1 : 3529 + 0xc8]
23:23:12 INFO - Found by: given as instruction pointer in context

Gabriele, is that a stack you have seen yet? At least I cannot find a bug logged about that.

Flags: needinfo?(gsvelto)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #11)

(In reply to Stephen Donner [:stephend] from comment #7)

Looks like it caught a crasher, here: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=239180509&repo=try&lineNumber=3126

This crash is:

23:23:12 INFO - Thread 0 (crashed)
23:23:12 INFO - 0 xul.dll!CrashReporter::CreateMinidumpsAndPair(void *,unsigned long,nsTSubstring<char> const &,nsIFile *,nsIFile * *) [nsExceptionHandler.cpp:03593782f9c13b4f635cdf5a138b23d8901c40a1 : 3529 + 0xc8]
23:23:12 INFO - Found by: given as instruction pointer in context

Gabriele, is that a stack you have seen yet? At least I cannot find a bug logged about that.

No, and the crash reason is very odd: EXCEPTION_NONCONTINUABLE_EXCEPTION. I've never encountered it before. Windows documentation suggests that it might happen in case of code mismatches (like an x86-64 program trying to load an x86 DLL), but the stack doesn't suggest that here.

Flags: needinfo?(gsvelto)

(In reply to Gabriele Svelto [:gsvelto] from comment #12)

No, and the crash reason is very odd: EXCEPTION_NONCONTINUABLE_EXCEPTION. I've never encountered it before. Windows documentation suggests that it might happen in case of code mismatches (like an x86-64 program trying to load an x86 DLL), but the stack doesn't suggest that here.

I filed bug 1544360 for that now.

Assignee

Comment 14

a month ago

How do I disable talos-damp[0] and talos-perf-reftest-singletons[1], cleanly?

For the former, we have the above bug 1544360 crasher filed for aarch64; so I assume that I at least:

  1. In (?:windows10-64|windows7-32|linux64)(?:-qr)?/opt: ['mozilla-central', 'try'] from https://searchfox.org/mozilla-central/rev/1b2636e8517aa48422ed516affe4d28cb7fa220a/taskcluster/ci/test/talos.yml#74, remove the mozilla-central portion, leaving only try to remain, with an accompanying comment # bug 1544360?
  2. Once that's landed, and a similar process followed for talos-perf-reftest-singletons aka ps, I then enable the full suite on windows10-aarch64 by adding - windows-talos to the windows10-aarch64/opt test-sets: block in https://searchfox.org/mozilla-central/rev/1b2636e8517aa48422ed516affe4d28cb7fa220a/taskcluster/ci/test/test-platforms.yml#207-214, a la https://searchfox.org/mozilla-central/rev/1b2636e8517aa48422ed516affe4d28cb7fa220a/taskcluster/ci/test/test-platforms.yml#199?

Sorry for continued pings on this, but it's obviously pretty important to get right, and have landed ASAP.

[0] https://searchfox.org/mozilla-central/rev/1b2636e8517aa48422ed516affe4d28cb7fa220a/taskcluster/ci/test/talos.yml#65-80
[1] https://searchfox.org/mozilla-central/rev/1b2636e8517aa48422ed516affe4d28cb7fa220a/taskcluster/ci/test/talos.yml#287-302

Flags: needinfo?(stephen.donner) → needinfo?(gbrown)

(In reply to Stephen Donner [:stephend] from comment #14)

Note that (?:windows10-64|windows7-32|linux64)(?:-qr)?/opt: ['mozilla-central', 'try'] does not apply to aarch64, but does apply to many Windows and Linux runs. It looks like this bug is about aarch64, right?

To only disable on windows10 aarch64, use something like

https://searchfox.org/mozilla-central/rev/1b2636e8517aa48422ed516affe4d28cb7fa220a/taskcluster/ci/test/talos.yml#411
    run-on-projects:
        by-test-platform:
            windows10-aarch64/opt: []

or windows10-aarch64/opt: ['try'] if you want it to run on try only.
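
To see why the shared key does not cover the aarch64 laptops, here is a small Python sketch. It assumes that by-test-platform keys are treated as anchored regular expressions and checked in order; `resolve_by_test_platform` is a simplified, hypothetical stand-in, not the real taskgraph resolver.

```python
import re

# Key quoted in comment 14; it names x86/x64 Windows and Linux platforms only.
SHARED_KEY = r"(?:windows10-64|windows7-32|linux64)(?:-qr)?/opt"
# Dedicated key suggested above for the aarch64 laptops.
AARCH64_KEY = r"windows10-aarch64/opt"


def resolve_by_test_platform(alternatives, platform, default=None):
    """Return the value of the first key whose regex matches the whole platform.

    Assumption: keys are anchored regular expressions tried in insertion
    order. This is an illustrative simplification only.
    """
    for pattern, value in alternatives.items():
        if re.fullmatch(pattern, platform):
            return value
    return default


run_on_projects = {
    AARCH64_KEY: ["try"],
    SHARED_KEY: ["mozilla-central", "try"],
}

if __name__ == "__main__":
    for platform in ("windows10-aarch64/opt", "windows10-64-qr/opt", "linux64/opt"):
        print(platform, "->", resolve_by_test_platform(run_on_projects, platform, default=[]))
```

Because `windows10-64` is not a prefix of `windows10-aarch64`, the shared pattern never matches the aarch64 platform, so editing it would change the other Windows and Linux runs without affecting the laptops.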

Flags: needinfo?(gbrown)
Assignee

Comment 16

a month ago

Thanks; gave it a whirl over in https://treeherder.mozilla.org/#/jobs?repo=try&revision=36371ee0b10df27e86f9692afff789b24786f5d2, which should, I hope:

  • enable Talos tests on Windows (`windows10-aarch64` in this case) to run on try
  • limit talos-damp (damp) and talos-perf-reftest-singletons to try for windows10-aarch64
Assignee

Updated

a month ago
Priority: -- → P1

Comment 19

a month ago
Pushed by sdonner@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6356a349f12f
Only run talos-perf-reftest-singletons via try, on windows10-aarch64. r=jmaher
Attachment #9060005 - Attachment is obsolete: true

Comment 20

a month ago
bugherder
Status: ASSIGNED → RESOLVED
Last Resolved: a month ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

It appears that the 'bcv' job is the only job running. I am not sure why, based on a quick look at talos.yml. :stephend, is there pending work to do here?

Status: RESOLVED → REOPENED
Flags: needinfo?(stephen.donner)
Resolution: FIXED → ---

It looks like bcv is running tier 1, while all the others are running as tier 3. I think they should all be tier 2? See also bug 1546595.

thanks :gbrown, this sounds like we just need to fix the tier status

Status: REOPENED → RESOLVED
Last Resolved: a month ago → 29 days ago
Flags: needinfo?(stephen.donner)
Resolution: --- → FIXED
Assignee

Updated

28 days ago
Depends on: 1547220
Assignee

Updated

28 days ago
Depends on: 1547044
Assignee

Updated

28 days ago
Depends on: 1547272
Assignee

Updated

18 days ago
See Also: → 1549272
Assignee

Updated

18 days ago
See Also: → 1549273