Closed Bug 1645075 Opened 4 years ago Closed 4 years ago

Perma Linux SM(tsan) js/src/jit-test/tests/wasm/tables.js | /builds/worker/checkouts/gecko/js/src/jit-test/lib/wasm.js line 12 > WebAssembly.Module:222:1 RuntimeError: indirect call to null (code 3, args " when Gecko 79 merges to Beta on 2020-06-29

Categories

(Core :: JavaScript: WebAssembly, defect, P1)

x86_64
Linux
defect

Tracking


RESOLVED WORKSFORME
mozilla79
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox77 --- unaffected
firefox78 --- unaffected
firefox79 + fixed

People

(Reporter: aryx, Assigned: rhunt)

References

(Regression)

Details

(Keywords: regression)

central-as-beta simulation: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunnable&revision=41d33c6c9caa97604eb73694179e1ec0d5d426f7&selectedTaskRun=O-MiVozyT_mSxGNPNsC9sQ.0

Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3d1e9c77a42dec977bd7a22e2668af56b2587145&tochange=b2df79a80c0303df9d710800ae37dce56847eef5

Log: https://treeherder.mozilla.org/logviewer.html#?job_id=305909020&repo=try

[task 2020-06-11T10:25:12.056Z] TEST-PASS | js/src/jit-test/tests/wasm/streaming.js | Success (code 0, args "--ion-eager --ion-check-range-analysis --ion-extra-checks --no-sse3") [3.1 s]
[task 2020-06-11T10:25:12.066Z] /builds/worker/checkouts/gecko/js/src/jit-test/lib/wasm.js line 12 > WebAssembly.Module:222:1 RuntimeError: indirect call to null
[task 2020-06-11T10:25:12.066Z] Stack:
[task 2020-06-11T10:25:12.066Z] call@/builds/worker/checkouts/gecko/js/src/jit-test/lib/wasm.js line 12 > WebAssembly.Module:wasm-function[7]:0xde
[task 2020-06-11T10:25:12.066Z] @/builds/worker/checkouts/gecko/js/src/jit-test/tests/wasm/tables.js:235:10
[task 2020-06-11T10:25:12.066Z] Exit code: 3
[task 2020-06-11T10:25:12.066Z] FAIL - wasm/tables.js
[task 2020-06-11T10:25:12.066Z] TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/wasm/tables.js | /builds/worker/checkouts/gecko/js/src/jit-test/lib/wasm.js line 12 > WebAssembly.Module:222:1 RuntimeError: indirect call to null (code 3, args "--ion-eager --ion-check-range-analysis --ion-extra-checks --no-sse3") [1.0 s]
[task 2020-06-11T10:25:12.066Z] INFO exit-status : 3
[task 2020-06-11T10:25:12.066Z] INFO timed-out : False
[task 2020-06-11T10:25:12.066Z] INFO stderr 2> /builds/worker/checkouts/gecko/js/src/jit-test/lib/wasm.js line 12 > WebAssembly.Module:222:1 RuntimeError: indirect call to null
[task 2020-06-11T10:25:12.066Z] INFO stderr 2> Stack:
[task 2020-06-11T10:25:12.066Z] INFO stderr 2> call@/builds/worker/checkouts/gecko/js/src/jit-test/lib/wasm.js line 12 > WebAssembly.Module:wasm-function[7]:0xde
[task 2020-06-11T10:25:12.066Z] INFO stderr 2> @/builds/worker/checkouts/gecko/js/src/jit-test/tests/wasm/tables.js:235:10
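For context, the following is not the tables.js test itself, just a minimal sketch of how an "indirect call to null" trap arises: call_indirect lands on a table slot that was never initialized. The module text and names below are illustrative assumptions, written against the SpiderMonkey shell's wasmTextToBinary builtin; in the log above the trap is reported against a module compiled at lib/wasm.js line 12, presumably the wasmEvalText helper the test goes through.

// Minimal illustrative sketch (not the actual failing test).
const bytes = wasmTextToBinary(`(module
  (type $v2v (func))
  (func $f)
  (table 2 funcref)              ;; two slots, initially null
  (elem (i32.const 0) $f)        ;; only slot 0 gets a function
  (func (export "call") (param i32)
    (call_indirect (type $v2v) (local.get 0))))`);
const ins = new WebAssembly.Instance(new WebAssembly.Module(bytes));
ins.exports.call(0);  // ok: calls $f through the table
ins.exports.call(1);  // traps: RuntimeError: indirect call to null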

Lars, can you check the push log for what regressed this, please?

Flags: needinfo?(lhansen)

The test case could indicate a new problem with call_indirect, maybe fallout from ongoing work. But since this shows up in the beta simulation, it's more likely a problem with reference types? Ryan, could you take a look?

Flags: needinfo?(lhansen)
Flags: needinfo?(rhunt)

Okay, I've tried everything I can think of to reproduce this locally and haven't been able to.

I'm on a 64-bit Linux system, building the linked revision from try as standalone SpiderMonkey with TSAN enabled, and running the test with the args provided. I've also tried larger runs of tests, the default args, non-TSAN builds, debug builds, and release builds. I also tried just toggling wasm features on a normal build and wasn't seeing anything, so it's not obviously a feature issue.

It's a bit mysterious, but I feel like I must be missing something obvious. I'll keep trying to reproduce this, or maybe pull down the build to see if I can figure out what's going on from it.
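For anyone retracing the repro attempt, here is a hypothetical sketch of replaying the failing configuration by hand in a standalone shell. The real harness is js/src/jit-test/jit_test.py, which sets up libdir and the test's lib dependencies itself; the paths and load() calls below are assumptions, not the harness's exact behavior.

// Hypothetical repro sketch; run in a JS shell started with the failing config:
//   js --ion-eager --ion-check-range-analysis --ion-extra-checks --no-sse3
// (paths relative to a mozilla-central checkout)
const libdir = "js/src/jit-test/lib/";         // normally provided by the harness
load(libdir + "wasm.js");                      // wasmEvalText and friends
load("js/src/jit-test/tests/wasm/tables.js");  // the failing test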

Flags: needinfo?(rhunt)

Scanning the pushlog... We could try backing out Dmitry's patch since it futzes with the ABI for indirect calls and is a suspect in this matter: https://hg.mozilla.org/mozilla-central/rev/f1beae5af8565899c72dfccafe9a7eacdb0c708e. (I may not have time to get to that until Monday.)

We frequently see failures on test hardware but not locally because the test hardware is pretty feeble, especially with respect to core counts. So we could perhaps try to lower the core count, either through the command-line switch or by manipulating the core affinity for the test case.

Ryan, could you clarify what breaks the test? What exactly corresponds to [1] bad, [2] good?

Flags: needinfo?(rhunt)

(In reply to Dmitry Bezhetskov from comment #8)

> Ryan, could you clarify what breaks the test? What exactly corresponds to [1] bad, [2] good?

I was trying to narrow down the changeset that caused the test failure, as I can't reproduce this locally. [1] was a revision that failed the test and [2] was a revision that passed it, so the regressing changeset had to be somewhere in between. However, I messed up the commits I chose and left a 5-day gap between them, so there's not much we can tell from it.

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #9)

> The regressor should be in the push log mentioned in comment 0: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3d1e9c77a42dec977bd7a22e2668af56b2587145&tochange=b2df79a80c0303df9d710800ae37dce56847eef5

First bad (ignore the lines at the top): https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunnable&revision=2148cea246618988ca0e4be17793b2622ce3e26f&selectedTaskRun=SNGLGUXgT--bfMSMJMXb7w.0
Base revision: https://hg.mozilla.org/mozilla-central/rev/b2df79a80c0303df9d710800ae37dce56847eef5

Last good: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception&revision=09ee4be5cc8391808c4f6c132a9733bfb2d9c99f&searchStr=SM%28tsan%29&selectedTaskRun=CHe8JyPGSdKwKslhetHY1g.0
Base revision: https://hg.mozilla.org/mozilla-central/rev/3d1e9c77a42dec977bd7a22e2668af56b2587145

Ah, thank you! I should have read more closely before setting out to do my own bisection.

Flags: needinfo?(rhunt)

Sorry for the confusion; I noticed the revision for the last good central-as-beta sim was wrong. The actual pushlog (after some bisection) is https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=63dc5e9b1b02b0aebd6badfe5eaef7bb9aa8f430&tochange=7f7b983390650cbc7d736e92fd3e1f629a30ac02

df279d4082d84c4205c67a5903f808583c69c098 is already affected.

Assignee: nobody → rhunt
Severity: -- → S2
Status: NEW → ASSIGNED
OS: Unspecified → Linux
Priority: -- → P1
Hardware: Unspecified → x86_64

Bisection shows this started when bug 1643013 landed.

Regressed by: 1643013
Has Regression Range: --- → yes

That's amazing. Adam and I will look at it tomorrow.

It isn't even the one that added a slot to the global object...

And the feature that patch works on is preffed off by default.

The issue is gone in beta simulations based on recent central revisions. Is further action needed here?

That's incredibly weird. But if it's not showing up anymore, I don't think we have any hope of solving the issue unless it comes back. So I'd say we should close this.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME
Target Milestone: --- → mozilla79