Closed Bug 1736430 Opened 3 years ago Closed 3 years ago

Perma tests/jit-test/jit-test/tests/wasm/large-memory.js | Unknown (code -11, args "--ion-eager --ion-offthread-compile=off --more-compartments") [0.0 s] | (code 138, args "--ion-eager --ion-offthread-compile=off --more-compartments") [0.2 s]

Categories

(Core :: JavaScript: WebAssembly, defect, P3)

defect

Tracking


RESOLVED DUPLICATE of bug 1736531
Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- unaffected
firefox93 --- unaffected
firefox94 --- unaffected
firefox95 --- affected

People

(Reporter: intermittent-bug-filer, Assigned: lth)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: intermittent-failure, regression)

This looks to be from Bug 1727084. Lars, could you please take a look, as it's perma-failing on central?
This one too.

Added jobs that point to the culprit here and here.

Flags: needinfo?(lhansen)
Summary: Perma tests/jit-test/jit-test/tests/wasm/large-memory.js | Unknown (code -11, args "--ion-eager --ion-offthread-compile=off --more-compartments") [0.0 s] → Perma tests/jit-test/jit-test/tests/wasm/large-memory.js | Unknown (code -11, args "--ion-eager --ion-offthread-compile=off --more-compartments") [0.0 s] | (code 138, args "--ion-eager --ion-offthread-compile=off --more-compartments") [0.2 s]
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
Priority: P5 → P3
Component: JavaScript Engine → JavaScript: WebAssembly

To summarize:

jit-test/tests/wasm/large-memory.js fails with what appears to be a SIGSEGV on macOS 11 x64 and on Android 8 arm64. This test was not changed by the memory64 patch set, so something was perturbed in a way that makes it fail. This could be a code bug, or the introduction of a concurrent test case that causes turbulence.

If the systems are underprovisioned on real memory or swap, the huge memory demands of the jit-test/tests/wasm/memory64/basic.js test, which could be running concurrently, could in principle cause overcommit issues, but that's not a completely obvious candidate.

Both failing builds are opt builds, but this could be happenstance; I'm not sure non-opt builds even run on these devices. The failure appears with various command-line parameters; in particular, it also occurs with --disable-wasm-huge-memory.

I'm not able to repro the Mac failure locally, but I have an MBP with a newer OS and lots of RAM, so that's not much data. With a full jit-test --tbpl run, memory use on the system never got very high.

The next step here is probably to disable jit-test/tests/wasm/memory64/basic.js on the affected devices to see whether that changes the outcome of the other test. If it does, the problem has to do with the provisioning of the test systems; if it does not, the memory64 patches introduced a bug.
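For reference, disabling a jit-test on particular configurations is normally done with a |jit-test| directive on the first line of the test file. A minimal sketch, assuming the shell's getBuildConfiguration() helper and an "android" key; the "arm64" key stands in for whatever predicate actually matches the failing macOS workers and is an assumption here:

  // |jit-test| skip-if: getBuildConfiguration()['android'] || getBuildConfiguration()['arm64']

With such a directive at the top of jit-test/tests/wasm/memory64/basic.js, the harness skips the test on matching configurations and leaves other platforms alone.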

Flags: needinfo?(lhansen)

Set release status flags based on info from the regressing bug 1727084

With basic.js disabled, the Windows ccov failure didn't reproduce, but the macOS and Android failures persisted.

Yeah. I found another problem last night that may be the cause of the present bug; I will test that this morning.

I can't repro locally with the failing artifact, and the other bug I thought might be connected almost certainly isn't, so I'm going to have to bisect this on try in the hope of tracking down some non-obvious problem. I will update this comment as I progress.

Range of updates:

Newest:
changeset: ______:80388e7f335c
user: Lars T Hansen <lhansen@mozilla.com>
date: Mon Oct 18 09:58:16 2021 +0000
summary: Bug 1727084 - Memory64 - Test cases and testing code. r=yury
Try run on that patch: https://treeherder.mozilla.org/jobs?repo=try&revision=eeb728b75e64e981f8032344d14a599dd60f64a4 shows the desired failure (last entry for "OS X 11 WebRender Shippable").

changeset: ______:13ee9674ee35
user: Lars T Hansen <lhansen@mozilla.com>
date: Mon Oct 18 09:58:14 2021 +0000
summary: Bug 1727084 - Memory64 - Bulk memory operations. r=yury
Try run on that patch: https://treeherder.mozilla.org/jobs?repo=try&revision=42ec18a0685d80b97690f133693826e914920e94&selectedTaskRun=YVqRvipjQ5utWHMJ64OhRA.0 shows the desired failure.

changeset: ______:b658cfe4b173
user: Lars T Hansen <lhansen@mozilla.com>
date: Mon Oct 18 09:58:14 2021 +0000
summary: Bug 1727084 - Memory64 - Expose the index type via js-types. r=yury
Try run on that patch: https://treeherder.mozilla.org/jobs?repo=try&revision=29ad95f7325351c973d8d4d61d5ea5e6e0769bc5&selectedTaskRun=GgOEvTsrTfq5MhGkr2gHxg.0 shows no failure.

changeset: ______:caca657178e4
user: Lars T Hansen <lhansen@mozilla.com>
date: Mon Oct 18 09:58:14 2021 +0000
summary: Bug 1727084 - Memory64 - Allow larger-than-4GB allocations. r=yury
Try run on that patch: https://treeherder.mozilla.org/jobs?repo=try&revision=1412f60882731b418d484c82cef651223d42fc7e shows no failure.

Oldest:
changeset: ____:83e52246d0ea
user: Lars T Hansen <lhansen@mozilla.com>
date: Mon Oct 18 09:58:13 2021 +0000
summary: Bug 1727084 - Memory64 - Huge-memory status depends on index type. r=yury
(Unknown)

Try run on the patch before the oldest patch: https://treeherder.mozilla.org/jobs?repo=try&revision=1470bbfa0c0dfd6bf40b1c4cc6fa0da9318e08ba&selectedTaskRun=MPGnd5CaQoqshprHrr8PBg.0 shows no failure, as expected and desired.

In conclusion, it looks like the bulk memory change created this problem.

Blocks: wasm64

The test run succeeds if I remove the bulk memory tests from large-memory.js, so I think we have a smoking gun. Tomorrow I'll try to narrow it down to one of memory.copy, memory.fill, memory.init, and memory.grow.
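As a starting point for that narrowing, here is a minimal isolation sketch, assuming the shell's wasmEvalText and assertEq helpers and a 64-bit build; the page count and offsets are illustrative rather than taken from large-memory.js:

  let ins = wasmEvalText(`
    (module
      (memory (export "mem") 32769)            ;; just over 2GB
      (func (export "fill") (param i32 i32 i32)
        (memory.fill (local.get 0) (local.get 1) (local.get 2)))
      (func (export "copy") (param i32 i32 i32)
        (memory.copy (local.get 0) (local.get 1) (local.get 2))))
  `);
  ins.exports.fill(0x7ffffff8, 0x55, 16);      // fill 16 bytes straddling the 2GB boundary
  ins.exports.copy(0, 0x7ffffff8, 16);         // copy them back to the start of memory
  assertEq(new Uint8Array(ins.exports.mem.buffer)[8], 0x55);

Analogous snippets with a data segment (memory.init) and Memory.prototype.grow would cover the other two suspects; whichever of these crashes on the affected workers should point at the faulty path.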

It's worth noting that there have been no failures for several days, and I can't repro on current central. It is possible that this is somehow related to bug [redacted] (now fixed), though it's a little hard to see precisely how. It's also possible that there have been no failures simply because there has been little activity over the weekend. But since I had been able to repro every time I tried with patches in that queue, and I can't now, it's possible that the problem has been fixed.

Oh, there's an important detail: the name of the job that fails is "OS X 11 WebRender Shippable opt test-macosx1100-64-shippable-qr/opt-jittest-1proc Jit" and the artifact is "macosx1100-64-shippable-qr". I took this to mean x64, but it is not: the OrangeFactor graph shows that all failures are on an M1 Mac Mini (in addition to the Pixel 2). That is, this is exclusively an arm64 bug, which makes it vastly more likely that bug [redacted] (now fixed) was the cause.

I'm not sure who to blame for the mixup here; in truth, macosx builds are multi-arch, so "64" is technically correct, even if confusing (to me anyhow).

Sebastian, re comment 12, the fact that this is an arm64 bug is very well hidden. Consider the failure on https://treeherder.mozilla.org/jobs?repo=try&revision=42ec18a0685d80b97690f133693826e914920e94&selectedTaskRun=YVqRvipjQ5utWHMJ64OhRA.0. If I select the failing run, I find no indication of architecture in any of the panes. If I inspect the task (from the meatball menu), ditto. And if I open the log and scroll to the top, I get this confusing collection of facts:

Worker Type (releng-hardware/gecko-t-osx-1100-m1) settings:
[taskcluster 2021-10-23T07:01:19.596Z]   {
[taskcluster 2021-10-23T07:01:19.596Z]     "arch": "x86_64",
[taskcluster 2021-10-23T07:01:19.596Z]     "config": {
[taskcluster 2021-10-23T07:01:19.596Z]       "deploymentId": ""
[taskcluster 2021-10-23T07:01:19.596Z]     },
[taskcluster 2021-10-23T07:01:19.596Z]     "disk_size": "228.27 GiB",
[taskcluster 2021-10-23T07:01:19.596Z]     "generic-worker": {
[taskcluster 2021-10-23T07:01:19.596Z]       "engine": "simple",
[taskcluster 2021-10-23T07:01:19.596Z]       "go-arch": "arm64",
[taskcluster 2021-10-23T07:01:19.596Z]       "go-os": "darwin",
[taskcluster 2021-10-23T07:01:19.596Z]       "go-version": "go1.16.4",
[taskcluster 2021-10-23T07:01:19.596Z]       "release": "https://github.com/taskcluster/taskcluster/releases/tag/v30.0.2",
[taskcluster 2021-10-23T07:01:19.596Z]       "revision": "6fdba0dad3ef52d4c547a794901f75b7171e3172",
[taskcluster 2021-10-23T07:01:19.596Z]       "source": "https://github.com/taskcluster/taskcluster/commits/6fdba0dad3ef52d4c547a794901f75b7171e3172",
[taskcluster 2021-10-23T07:01:19.596Z]       "version": "30.0.2"
[taskcluster 2021-10-23T07:01:19.596Z]     },
[taskcluster 2021-10-23T07:01:19.596Z]     "ip": "10.155.0.59",
[taskcluster 2021-10-23T07:01:19.596Z]     "machine-setup": {
[taskcluster 2021-10-23T07:01:19.596Z]       "config": "https://github.com/mozilla-platform-ops/ronin_puppet"
[taskcluster 2021-10-23T07:01:19.596Z]     },
[taskcluster 2021-10-23T07:01:19.596Z]     "memory": "16 GB",
[taskcluster 2021-10-23T07:01:19.596Z]     "model_identifier": "Macmini9,1",
[taskcluster 2021-10-23T07:01:19.596Z]     "processor_cores": "8",
[taskcluster 2021-10-23T07:01:19.596Z]     "processor_count": "1",
[taskcluster 2021-10-23T07:01:19.596Z]     "processor_name": "Unknown",
[taskcluster 2021-10-23T07:01:19.596Z]     "processor_speed": "2.4 GHz",
[taskcluster 2021-10-23T07:01:19.596Z]     "system_version": "macOS 11.2.3 (20D91)",
[taskcluster 2021-10-23T07:01:19.596Z]     "workerGroup": "macstadium-vegas",
[taskcluster 2021-10-23T07:01:19.596Z]     "workerId": "macmini-m1-49"
[taskcluster 2021-10-23T07:01:19.596Z]   }

There are several clues here that this is an arm64 / M1 machine, yet "arch" is plainly stated to be "x86_64".

This situation seems suboptimal. Where might I file a bug about making it harder to make the same mistake about the architecture that I did?

Flags: needinfo?(aryx.bugmail)
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE

Sorry for the trouble. Please file a bug in Firefox Build System :: Task Configuration and needinfo glandium and CC me. Thank you.

Flags: needinfo?(aryx.bugmail)
Has Regression Range: --- → yes