Closed Bug 1532560 Opened 2 years ago Closed 2 years ago

Only run raptor-tp6-3 tests on AARM64 in try

Categories

(Testing :: Raptor, defect, P1)

Version 3
ARM64
Windows 10
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: stephend, Assigned: stephend)

References

(Blocks 2 open bugs)

Details

(Keywords: crash)

Attachments

(1 file, 5 obsolete files)

We should disable raptor-tp6-imgur-firefox[0] (from raptor-tp6-3) on Windows 10 AARM64, as it currently causes (or at least exacerbates) Blue Screens of Death (tm).

This is due to an auto-play video on the test page at https://imgur.com/gallery/m5tYJL6, which crashes Windows via a VIDEO_SCHEDULER_INTERNAL_ERROR.

I haven't yet been able to get logs that show the crash (or even the lead-up to it).

This is just the first bug of more to come, from the investigative work I'm catching up on over in bug 1531876.

[0] https://searchfox.org/mozilla-central/rev/92d11a33250a8e368c8ca3e962e15ca67117f765/testing/raptor/raptor/tests/raptor-tp6-3.ini#29-34

I verified that I get the same crash on my laptop running raptor-tp6-3.

(In reply to Geoff Brown [:gbrown] from comment #2)

> I verified that I get the same crash on my laptop running raptor-tp6-3.

Thanks, Geoff. If you or others beat me to finding a regression range/root cause for this, obviously please post back!

Looks like Denis is also experiencing the same/similar issue on his Lenovo as Geoff, I, and others are, over in bug 1521335. I believe this bug is essentially a DUPE of that one, but I'm leaving it open for cleaner bug # references in related test commits.

See Also: → 1521335

I've added blocks for both 66 and 67 uplift tracking bugs. Please remove if not in scope/applicable.

Rob, Dave: what should we do, here? :nbp, :jya, and :froydnj recommend we try to black-list the current hardware decoder, instead of xfailing/disabling the test.

I tested disabling hardware acceleration locally via about:config (media.hardware-video-decoding.enabled toggled/set to false) on https://imgur.com/gallery/m5tYJL6, and scrolling/navigating/reloading is all currently stable.
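For anyone who wants to reproduce the workaround without clicking through about:config, the same pref can be written into a profile's user.js. The following is only a minimal sketch: the helper name `format_user_pref` is made up for illustration, and it assumes the standard `user_pref(...)` line format of user.js. It is not how Raptor itself applies prefs.

```python
import json


def format_user_pref(name, value):
    """Render one preference as a user.js line (hypothetical helper)."""
    # json.dumps produces JavaScript-compatible literals:
    # False -> false, True -> true, strings are double-quoted.
    return f"user_pref({json.dumps(name)}, {json.dumps(value)});"


# The pref named in the comment above, set to false to disable
# hardware video decoding.
line = format_user_pref("media.hardware-video-decoding.enabled", False)
print(line)
```

Appending the printed line to user.js in a test profile should have the same effect as toggling the pref by hand, assuming the profile is the one Firefox is launched with.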

Flags: needinfo?(rwood)
Flags: needinfo?(dave.hunt)

(In reply to Stephen Donner [:stephend] from comment #5)

> Rob, Dave: what should we do, here? :nbp, :jya, and :froydnj recommend we try to black-list the current hardware decoder, instead of xfailing/disabling the test.

Right, so if the failure is only on that one platform, then we can't disable the test in the test INI, as that would disable it everywhere. It will need to be done at the taskcluster config level: set that test so that on Win 10 AARM64 it only runs on try.

i.e., in the raptor-tp6-3-firefox test entry [0], add:

run-on-projects:
    by-test-platform:
        whatever-the-win10-aarm64-name-is: ['try']

[0] https://searchfox.org/mozilla-central/rev/fbb251448feb7276f9b1d0a88f9c0cb1cd144ce4/taskcluster/ci/test/raptor.yml#107
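To make the effect of that `by-test-platform` block concrete, here is a rough sketch of how such a keyed value could be resolved for a given platform. Everything in it is an assumption for illustration: the function `resolve_by_test_platform` is hypothetical (it is not the taskgraph implementation), the key `windows10-aarch64/.*` is a guess at the platform name, and the `default` project list is invented.

```python
import re


def resolve_by_test_platform(value, test_platform):
    """Resolve a `by-test-platform` keyed value (simplified, hypothetical).

    Order used here: exact key match, then keys treated as full-match
    regular expressions, then the `default` key.
    """
    if not isinstance(value, dict) or "by-test-platform" not in value:
        return value  # not keyed by platform, use as-is
    alternatives = value["by-test-platform"]
    if test_platform in alternatives:
        return alternatives[test_platform]
    for pattern, result in alternatives.items():
        if pattern != "default" and re.fullmatch(pattern, test_platform):
            return result
    if "default" in alternatives:
        return alternatives["default"]
    raise KeyError(f"no match for {test_platform!r} and no default")


# Assumed config: the platform key and the default list are placeholders.
RUN_ON_PROJECTS = {
    "by-test-platform": {
        "windows10-aarch64/.*": ["try"],
        "default": ["mozilla-central", "try"],
    }
}

print(resolve_by_test_platform(RUN_ON_PROJECTS, "windows10-aarch64/opt"))
print(resolve_by_test_platform(RUN_ON_PROJECTS, "linux64/opt"))
```

The point of keying at this level is that the test stays enabled everywhere else; only the one platform is narrowed to try.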

Flags: needinfo?(rwood)

if we do make this try only, please make it tier-2 as well, if not tier-3

Assignee: nobody → stephen.donner
Attached file Bug 1532560 - Fix test-platform name (obsolete)

Depends on D23193

Attachment #9050414 - Attachment description: Fix bug 1532560: Only run raptor-tp6-3 tests on AARM64 in try → Fix Bug 1532560: Only run raptor-tp6-3 tests on AARM64 in try

(In reply to Stephen Donner [:stephend] from comment #5)

> Rob, Dave: what should we do, here? :nbp, :jya, and :froydnj recommend we try to black-list the current hardware decoder, instead of xfailing/disabling the test.
>
> I tested disabling hardware acceleration locally via about:config (media.hardware-video-decoding.enabled toggled/set to false) on https://imgur.com/gallery/m5tYJL6, and scrolling/navigating/reloading is all currently stable.

:rwood Are we able to set preferences for specific platforms? If so, perhaps that will allow us to continue running the test on this platform.

Flags: needinfo?(dave.hunt) → needinfo?(rwood)

(In reply to Dave Hunt [:davehunt] [he/him] ⌚️UTC from comment #11)

> :rwood Are we able to set preferences for specific platforms? If so, perhaps that will allow us to continue running the test on this platform.

No, Raptor prefs are set in each Raptor test INI (e.g. [0]), not at the platform level. This will need to be done at the taskcluster config level.

[0] https://searchfox.org/mozilla-central/rev/4763b8d576ce52625d245d1ab6d9404ea025b026/testing/raptor/raptor/tests/raptor-wasm-misc-baseline.ini#20

Flags: needinfo?(rwood)

(In reply to Robert Wood [:rwood] from comment #13)

Applied all patches locally and pushed to try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=5a299877151d6048b17c49f050dbe5ecad097ec7

^ tp6-3 still shows as tier 1; your 'fix platform name' commit needs an update (see phab). Thanks!

Attachment #9051378 - Attachment is obsolete: true
Attachment #9051373 - Attachment is obsolete: true
Attachment #9050414 - Attachment is obsolete: true

(In reply to Stephen Donner [:stephend] from comment #16)

> OK, pushed https://treeherder.mozilla.org/#/jobs?repo=try&revision=74889fdc771a48d36c0d5ca001e0cd8f28cc2a80 to try

I've added raptor tp6-3 on windows10-aarch64 opt to your try run (via 'add new jobs' on treeherder).

Attachment #9050414 - Attachment is obsolete: false
Status: NEW → ASSIGNED
Priority: -- → P1
Summary: Disable raptor-tp6-imgur-firefox on AARM64, due to operating-system level crashes → Only run raptor-tp6-3 tests on AARM64 in try
Pushed by sdonner@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1ccb52b0784e
Fix Bug 1532560:  Only run raptor-tp6-3 tests on AARM64 in try r=jmaher,rwood,gbrown
Attachment #9051812 - Attachment is obsolete: true

(In reply to Arthur Iakab [arthur_iakab] from comment #19)

> <snip>
>
> Stephen can you please take a look?

Yep; on it - sorry about this bustage. I've pushed https://hg.mozilla.org/try/rev/de8777a6aa392fb34a81f4cabf4d396774a0c926 to try.

Attachment #9050414 - Attachment is obsolete: true
Flags: needinfo?(stephen.donner)

this is running in automation, I will do some work in bug 1541835 to hack on this.

Attachment #9054994 - Attachment is obsolete: true

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #23)

> this is running in automation, I will do some work in bug 1541835 to hack on this.

Thanks, Joel! While I'm still catching up/trying to grok the changes, is there more left for me to do, here? I don't want to miss anything.

Flags: needinfo?(jmaher)

right now this is running properly, so I believe this specific bug is all done; we haven't started running talos on here as that had some issues last time we ran it all.

Flags: needinfo?(jmaher)

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #25)

> right now this is running properly, so I believe this specific bug is all done; we haven't started running talos on here as that had some issues last time we ran it all.

Thanks so much (as always), Joel!

I still owe (perhaps among other things) a writeup of the process, both for myself as well as general reference.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED