Open Bug 1599295 Opened 2 years ago Updated 6 days ago

Intermittent http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/<test name> | No test results reported. (SCRIPT)

Categories

(Core :: JavaScript Engine, defect, P2)

Unspecified
Android
defect

Tracking

()

REOPENED

People

(Reporter: intermittent-bug-filer, Assigned: sfink)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])

Attachments

(2 files, 1 obsolete file)

Filed by: shindli [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=278093947&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/MJSp9HspTauHnzKiUCZtMA/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/MJSp9HspTauHnzKiUCZtMA/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1


[task 2019-11-25T23:18:26.849Z] 23:18:16 INFO - REFTEST TEST-LOAD | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/constructor_bad-args.js | 965 / 3192 (30%)
[task 2019-11-25T23:18:26.849Z] 23:18:16 INFO - REFTEST TEST-PASS | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/constructor_bad-args.js | item 1
[task 2019-11-25T23:18:26.849Z] 23:18:16 INFO - REFTEST TEST-END | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/constructor_bad-args.js
[task 2019-11-25T23:18:26.849Z] 23:18:16 INFO - REFTEST TEST-START | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/detached-array-buffer-checks.js
[task 2019-11-25T23:18:26.849Z] 23:18:16 INFO - REFTEST TEST-LOAD | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/detached-array-buffer-checks.js | 966 / 3192 (30%)
[task 2019-11-25T23:18:26.849Z] 23:18:16 INFO - REFTEST TEST-PASS | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/detached-array-buffer-checks.js | item 1
[task 2019-11-25T23:18:26.849Z] 23:18:16 INFO - REFTEST TEST-END | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/detached-array-buffer-checks.js
[task 2019-11-25T23:18:26.849Z] 23:18:16 INFO - REFTEST TEST-START | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/entries.js
[task 2019-11-25T23:18:26.849Z] 23:18:16 INFO - REFTEST TEST-LOAD | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/entries.js | 967 / 3192 (30%)
[task 2019-11-25T23:18:26.849Z] 23:18:16 INFO - REFTEST TEST-PASS | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/entries.js | item 1
[task 2019-11-25T23:18:26.849Z] 23:18:16 INFO - REFTEST TEST-END | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/entries.js
[task 2019-11-25T23:18:26.849Z] 23:18:26 INFO - REFTEST TEST-START | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/every-and-some.js
[task 2019-11-25T23:18:26.849Z] 23:18:26 INFO - REFTEST TEST-LOAD | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/every-and-some.js | 968 / 3192 (30%)
[task 2019-11-25T23:18:26.849Z] 23:18:26 INFO - REFTEST TEST-PASS | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/every-and-some.js | item 1
[task 2019-11-25T23:18:26.849Z] 23:18:26 INFO - REFTEST TEST-END | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/every-and-some.js
[task 2019-11-25T23:18:26.849Z] 23:18:26 INFO - REFTEST TEST-START | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/fill-detached.js
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-LOAD | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/fill-detached.js | 969 / 3192 (30%)
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-PASS | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/fill-detached.js | item 1
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-END | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/fill-detached.js
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-START | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/fill.js
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-LOAD | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/fill.js | 970 / 3192 (30%)
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-PASS | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/fill.js | item 1
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-END | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/fill.js
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-START | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/filter-species.js
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-LOAD | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/filter-species.js | 971 / 3192 (30%)
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-PASS | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/filter-species.js | item 1
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-END | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/filter-species.js
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-START | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/filter-validation.js
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-LOAD | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/filter-validation.js | 972 / 3192 (30%)
[task 2019-11-25T23:18:26.853Z] 23:18:26 WARNING - REFTEST TEST-UNEXPECTED-FAIL | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/filter-validation.js | No test results reported. (SCRIPT)
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-END | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/filter-validation.js
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-START | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/find-and-findIndex.js
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-LOAD | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/find-and-findIndex.js | 973 / 3192 (30%)
[task 2019-11-25T23:18:26.853Z] 23:18:26 WARNING - REFTEST TEST-UNEXPECTED-FAIL | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/find-and-findIndex.js | No test results reported. (SCRIPT)
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-END | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/find-and-findIndex.js
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-START | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/forEach.js
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-LOAD | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/forEach.js | 974 / 3192 (30%)
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-PASS | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/forEach.js | item 1
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-END | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/forEach.js
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-START | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/from-iterable-validation.js
[task 2019-11-25T23:18:26.853Z] 23:18:26 INFO - REFTEST TEST-LOAD | http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/from-iterable-validation.js | 975 / 3192 (30%)

Summary: Intermittent http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/filter-validation.js | No test results reported. (SCRIPT) → Intermittent http://10.7.205.220:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/<test name> | No test results reported. (SCRIPT)

(In reply to Andreea Pavel [:apavel] from comment #1)

Started with this merge: https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=a0b2e7b06070b778426c2030e7192a1396399a01

Tom can it be from bug 1562667?

Based on the content of the failed test (filter-validation.js), I don't think it's related to Bug 1562667.
The failure test doesn't test SharedArrayBuffer or Atomic so it's probably not related to 73f35057a2b20a8b28f408175dbcb38439cba657 (Bug 1562667 - P4a - Enable SAB by default on nightly;)
5337448b7a79da8a3ff7619bb3cd01e33df01e28(Bug 1562667 - P4b - Disable jsref test for Atomic on Nightly) disable some js tests, but the disabled tests are "test262" and the failing tests are "non262".

I wonder if it's related to other commits of different bugs in that merge, but I have no idea who or which one should I ask for. So, redirect the ni back.

Flags: needinfo?(ttung) → needinfo?(apavel)

André, I'm not really familiar with the testing framework for js ref test. Could you verify my understanding in Comment #3?

I believe the failures are not related because the failures are testing different objects (TypeArrays instead of SharedArrayBuffer, or Atomic) and the failures are for different folders ("non262" rather than "test262").

Please feel free to let me know if I misunderstood anything or it could be related to my changes. Thanks in advance!

Flags: needinfo?(andrebargull)

The test case uses anyTypedArrayConstructors, which is defined in shell.js and uses TypedArray sub-classes which default to SharedArrayBuffer instead of ArrayBuffer.

Flags: needinfo?(andrebargull)

(In reply to André Bargull [:anba] from comment #5)

The test case uses anyTypedArrayConstructors, which is defined in shell.js and uses TypedArray sub-classes which default to SharedArrayBuffer instead of ArrayBuffer.

I see. That means it's possible that it's related. I will look into it. Thanks!

Flags: needinfo?(apavel)

If it's related to Bug 1562667, then it should only happen on Nightly since Bug 1562667 enables SAB only on Nightly.

(Mark it block the resab for now. If it's not related to SAB, then we could remove the depenedency)

Blocks: resab

(In reply to Intermittent Failures Robot from comment #2)

20 failures in 947 pushes (0.021 failures/push) were associated with this bug yesterday.

Repository breakdown:

  • mozilla-central: 12
  • autoland: 8

Platform breakdown:

  • android-hw-p2-8-0-arm7-api-16: 20

For more details, see:
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?bug=1599295&startday=2019-11-27&endday=2019-11-27&tree=all

It all happened on Android Pixel

OS: Unspecified → Android

It's still hard for me to identify if it's a bug for SAB or not. It probably needs someone in the JS team to take a look into this.

Are there some restriction for SABs when accessing them from an <iframe>?

When running in the browser, this newGlobal is used. So these test cases are accessing SharedArrayBuffers from a different <iframe>. Can this somehow lead to issues?

There are 58 total failures in the last 7 days on android-hw-p2-8-0-arm7-api-16 pgo, this is a tier2 failure and there are no failures since the 30th of November, likely due to few pushes during the weekend.

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=278864225&repo=autoland&lineNumber=11372

[task 2019-11-30T08:42:30.388Z] 08:42:19 INFO - REFTEST TEST-START | http://10.7.205.210:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/filter-validation.js
[task 2019-11-30T08:42:30.388Z] 08:42:19 INFO - REFTEST TEST-LOAD | http://10.7.205.210:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/filter-validation.js | 972 / 6106 (15%)
[task 2019-11-30T08:42:30.388Z] 08:42:30 WARNING - REFTEST TEST-UNEXPECTED-FAIL | http://10.7.205.210:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/filter-validation.js | No test results reported. (SCRIPT)
[task 2019-11-30T08:42:30.388Z] 08:42:30 INFO - REFTEST TEST-END | http://10.7.205.210:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/filter-validation.js
[task 2019-11-30T08:42:30.388Z] 08:42:30 INFO - REFTEST TEST-START | http://10.7.205.210:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/find-and-findIndex.js
[task 2019-11-30T08:42:30.388Z] 08:42:30 INFO - REFTEST TEST-LOAD | http://10.7.205.210:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/find-and-findIndex.js | 973 / 6106 (15%)
[task 2019-11-30T08:42:30.388Z] 08:42:30 WARNING - REFTEST TEST-UNEXPECTED-FAIL | http://10.7.205.210:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/find-and-findIndex.js | No test results reported. (SCRIPT)
[task 2019-11-30T08:42:30.388Z] 08:42:30 INFO - REFTEST TEST-END | http://10.7.205.210:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/find-and-findIndex.js
[task 2019-11-30T08:42:30.388Z] 08:42:30 INFO - REFTEST TEST-START | http://10.7.205.210:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/forEach.js
[task 2019-11-30T08:42:30.388Z] 08:42:30 INFO - REFTEST TEST-LOAD | http://10.7.205.210:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/forEach.js | 974 / 6106 (15%)
[task 2019-11-30T08:42:30.388Z] 08:42:30 WARNING - REFTEST TEST-UNEXPECTED-FAIL | http://10.7.205.210:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/forEach.js | No test results reported. (SCRIPT)
[task 2019-11-30T08:42:30.388Z] 08:42:30 INFO - REFTEST TEST-END | http://10.7.205.210:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/forEach.js

Whiteboard: [stockwell needswork:owner]

(In reply to André Bargull [:anba] from comment #10)

Are there some restriction for SABs when accessing them from an <iframe>?

Not sure the answer to this.

It's still not clear to me how this could be related to Bug 1562667. The changes in Bug 1562667 are flipping the pref on by default on Nightly and disable a few tests because of that.

If these tests were disabled due to SAB in the past and thus enabling the pref causes the failures, then it makes sense to me (from the viewpoint of failures) and we should disable these tests (also, file bugs for enabling them in the future). If they weren't disabled before Bug 1562667, then I have no idea why it could be related.

The tests use SharedArrayBuffers when available, so when bug 1562667 re-enabled SABs for web-content, the tests started to use them again, too.

Some of the now failing tests were added in bug 1122396, and at that point of time they seem to have succeeded on Android (or at least I don't see any indication that the parts testing SABs were somehow disabled on Android), so I have to guess that there's some kind of new issue that now leads to these intermittents.

Duplicate of this bug: 1600405
Duplicate of this bug: 1600406

Try-build with the <iframe> related parts enabled: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e8520cdde71487794c2b71e99f73d656b936d9b7
Try-build with the <iframe> related parts disabled: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f70225204b5d96a4c8d6eae967aaae9b0e803dfa

This does seem to indicate there's some kind of issue with SharedArrayBuffers and <iframe>.

<iframe> should not matter one bit. Lars, can you take a look?

Flags: needinfo?(lhansen)

Given the symptoms (ARM32 + SAB + intermittent), we should expect to find an obscure ARM32 JIT problem, either some kind of bug where a loop never terminates (maybe because a length field is read improperly?), or a timeout because some SAB code is not optimized as well as it needs to be and we didn't notice this before. Ryan did some things with the length fields recently and it's entirely possible that some hidden assumption was violated.

The lengths used in the test cases are pretty short, so an outright bug seems more likely that a missing optimization. Yet with the newGlobal thing, we're going to have cross-compartment wrappers that may have to be handled and it's possible that there's a missing optimization around those for SABs that slow us down, or broken code generation for those ditto.

I'll look into it a bit, see what comes of it.

Flags: needinfo?(lhansen)

I've triggered another try-build with some additional logging. And based on the output logs for the tests where the execution is suddenly stopped (Note: Not all try-builds have already been started at the time of writing this comment):

We're in this loop for the case when

Above try push is green.

Duplicate of this bug: 1601485

(In reply to Andreea Pavel [:apavel] from comment #21)

Above try push is green.

That's a false positive due the functions I've used for logging.

Priority: P5 → P2
Duplicate of this bug: 1603446

Cf comment 3. I expect this is an older bug that is only becoming visible because we're now enabling the SAB feature and therefore running tests in automation. Fun times.

Edit: to be more explicit, what comment 3 says is not true, SharedArrayBuffer is being tested here, that's what comment 20 shows.

One thing we don't know is what stops the execution. The logs show the test case stopping during the same one-second interval as they are started, so it's not a timeout. But I'm a little suspicious since the times are unvarying down to the millisecond.

I added some more logging and did some test runs to try to pin down exactly where in that loop we quit, but no luck so far. https://treeherder.mozilla.org/#/jobs?repo=try&revision=d1db8f9b0eed3a1ca7ce3d310364eec3c0bdb4bf. When we quit during these runs it's before that loop, or the test does not run at all (as in, there's some kind of major malfunction elsewhere).

Hi Lars, this has reached the disable-recommended queue, are there any updates? Can this be disabled?

Flags: needinfo?(lhansen)

(In reply to Andreea Pavel [:apavel] from comment #30)

Hi Lars, this has reached the disable-recommended queue, are there any updates? Can this be disabled?

Ideally not, as the test case is not the problem. There is probably a real bug here. I'm going to kick this up to Steven, since this is not a WebAssembly bug, I just ended up handling it because of the SharedArrayBuffer.

Flags: needinfo?(lhansen) → needinfo?(sdetar)

Sean, could you look at this. Since this is ARM32 related.

Flags: needinfo?(sdetar) → needinfo?(sstangl)
Whiteboard: [stockwell disable-recommended]
Whiteboard: [stockwell disable-recommended]

Hi Sean, did you have the time to look? This has been in the disable recommended queue for a week now and failure rate is still high.

Flags: needinfo?(sstangl)
Flags: needinfo?(sstangl)
Blocks: 1606624

Note that we'd like to ship SharedArrayBuffer and atomics to Release in Fx 75 (without postMessage() support). Making this only block that rather than the overall effort of "resab", which includes postMessage() changes. Please let me know if Fixing this in the Fx 74 timeframe is unrealistic.

No longer blocks: resab

There are 49 total failures in the last 7 days on android-hw-p2-8-0-arm7-api-16 pgo and debug.

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=284103835&repo=mozilla-central&lineNumber=10712

[task 2020-01-08T23:55:17.458Z] 23:55:17 INFO - REFTEST TEST-START | http://10.7.205.223:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/filter-validation.js
[task 2020-01-08T23:55:17.458Z] 23:55:17 INFO - REFTEST TEST-LOAD | http://10.7.205.223:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/filter-validation.js | 974 / 6118 (15%)
[task 2020-01-08T23:55:17.458Z] 23:55:17 WARNING - REFTEST TEST-UNEXPECTED-FAIL | http://10.7.205.223:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/filter-validation.js | No test results reported. (SCRIPT)
[task 2020-01-08T23:55:17.458Z] 23:55:17 INFO - REFTEST TEST-END | http://10.7.205.223:8854/jsreftest/tests/jsreftest.html?test=non262/TypedArray/filter-validation.js

Steven can you assign someone else to take a look?

Flags: needinfo?(sstangl) → needinfo?(sdetar)
Whiteboard: [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

Anne is this something you can work on? It has reached the disable recommended queue.

Flags: needinfo?(annevk)

I can't unfortunately. If you leave the bug open I don't mind if it the test is disabled for now, but the test failures really need to be fixed soonish as we want to ship this to release in 75.

Flags: needinfo?(annevk)

(In reply to Anne (:annevk) from comment #44)

I can't unfortunately. If you leave the bug open I don't mind if it the test is disabled for now, but the test failures really need to be fixed soonish as we want to ship this to release in 75.

This is failing in multiple tests so i'm not sure if disabling is the way to go.

Jason saw you are the owner of this module https://wiki.mozilla.org/Modules/All#Core, can you please take a look and assign someone to fix this?

Flags: needinfo?(jorendorff)

Update:
There have been 34 failures within the last 7 days, all of them on Android 8.0 Pixel2 debug/opt.
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=286791732&repo=autoland&lineNumber=10716

Gentle reminder. This has been in our disable-recommended filter for some time now.

Flags: needinfo?(sphink)
Flags: needinfo?(sdetar)
Flags: needinfo?(jorendorff)

Update: there are 28 total failures in the last 7 days and 198 failures in the last 30 days, all on android-hw-p2-8-0-arm7-api-16.
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2020-01-28&endday=2020-02-04&tree=trunk&bug=1599295
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=287410407&repo=mozilla-central

Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Flags: needinfo?(sphink)

Steve can you please take a look at this?

Flags: needinfo?(sphink)

I checked a logcat, but only found (lightly edited):

02-04 04:43:23.673  6164  6181 I Gecko   : {"action":"test_start","time":1580791403672,"thread":null,"pid":null,"source":"reftest","test":"js/src/tests/non262/TypedArray/slice.js"}
02-04 04:43:23.673  6164  6181 I Gecko   : REFTEST TEST-LOAD | http://10.7.205.151:8854/jsreftest/tests/js/src/tests/jsreftest.html?test=non262/TypedArray/slice.js | 1028 / 3259 (31%)
02-04 04:43:23.673  6164  6181 I Gecko   : {"action":"log","time":1580791403673,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"START http://10.7.205.151:8854/jsreftest/tests/js/src/tests/jsreftest.html?test=non262/TypedArray/slice.js"}
02-04 04:43:24.538  6255  6270 E Web Content: [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "http://10.7.205.151:8854/jsreftest/tests/js/src/tests/jsreftest.html?test=non262/TypedArray/slice.js" line: 0}]
02-04 04:43:24.547  6164  6181 I Gecko   : {"action":"log","time":1580791404547,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] OnDocumentLoad triggering AfterOnLoadScripts"}
02-04 04:43:24.579  6164  6181 I Gecko   : {"action":"log","time":1580791404579,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"Initializing canvas snapshot"}
02-04 04:43:24.584  6255  6270 W Web Content: [JavaScript Warning: "Use of Mutation Events is deprecated. Use MutationObserver instead." {file: "resource://reftest/reftest-content.js" line: 975}]
02-04 04:43:24.586  6164  6181 I Gecko   : {"action":"log","time":1580791404585,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] AfterOnLoadScripts belatedly entering WaitForTestEnd"}
02-04 04:43:24.587  6164  6181 I Gecko   : {"action":"log","time":1580791404587,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] WaitForTestEnd: Adding listeners"}
02-04 04:43:24.588  6164  6181 I Gecko   : {"action":"log","time":1580791404588,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"Initializing canvas snapshot"}
02-04 04:43:24.591  6164  6181 I Gecko   : {"action":"log","time":1580791404591,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress"}
02-04 04:43:24.612  6164  6181 I Gecko   : {"action":"log","time":1580791404612,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT"}
02-04 04:43:24.613  6164  6181 I Gecko   : {"action":"log","time":1580791404613,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: dispatching MozReftestInvalidate"}
02-04 04:43:24.616  6164  6181 I Gecko   : {"action":"log","time":1580791404616,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress"}
02-04 04:43:24.633  6164  6181 I Gecko   : {"action":"log","time":1580791404633,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: STATE_WAITING_FOR_REFTEST_WAIT_REMOVAL"}
02-04 04:43:24.633  6164  6181 I Gecko   : {"action":"log","time":1580791404633,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress"}
02-04 04:43:24.646  6164  6181 I Gecko   : {"action":"log","time":1580791404646,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: STATE_WAITING_FOR_SPELL_CHECKS"}
02-04 04:43:24.646  6164  6181 I Gecko   : {"action":"log","time":1580791404646,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: STATE_WAITING_FOR_APZ_FLUSH"}
02-04 04:43:24.648  6164  6181 I Gecko   : {"action":"log","time":1580791404648,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: APZ flush not required"}
02-04 04:43:24.650  6164  6181 I Gecko   : {"action":"log","time":1580791404649,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress"}
02-04 04:43:24.650  6164  6181 I Gecko   : {"action":"log","time":1580791404650,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: STATE_WAITING_TO_FINISH"}
02-04 04:43:24.651  6164  6181 I Gecko   : {"action":"log","time":1580791404651,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: Doing sync flush to compositor"}
02-04 04:43:24.652  6164  6181 I Gecko   : {"action":"log","time":1580791404652,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: Completed"}
02-04 04:43:24.652  6164  6181 I Gecko   : {"action":"log","time":1580791404652,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] RecordResult fired"}
02-04 04:43:24.653  6164  6181 I Gecko   : {"action":"log","time":1580791404653,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"RecordResult fired"}
02-04 04:43:24.654  6164  6181 I Gecko   : {"action":"test_status","time":1580791404654,"thread":null,"pid":null,"source":"reftest","test":"js/src/tests/non262/TypedArray/slice.js","subtest":"No test results reported. (SCRIPT)\n","status":"FAIL","expected":"PASS","extra":{"status_msg":"UnexpectedFail"}}
02-04 04:43:24.654  6164  6181 I Gecko   : {"action":"test_end","time":1580791404654,"thread":null,"pid":null,"source":"reftest","test":"js/src/tests/non262/TypedArray/slice.js","status":"OK"}
02-04 04:43:24.654  6164  6181 I Gecko   : {"action":"log","time":1580791404654,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"Loading a blank page"}
02-04 04:43:24.692  6164  6181 I Gecko   : {"action":"assertion_count","time":1580791404692,"thread":null,"pid":null,"source":"reftest","test":"js/src/tests/non262/TypedArray/slice.js","min_expected":0,"max_expected":0,"count":0}

which seems to be the same as a passing test, up until the "No test results reported." line.

It does seem to report a pid, which doesn't change after the crash. So maybe the whole process isn't crashing.

Tomorrow, I'll take a look at what happens when an uncaught exception gets thrown. That's my only idea so far.

Are there any updates here Steve?

Just got back to this. The one thing I can say is that the symptom matches having an uncatchable exception being thrown. I added Object.prototype.throwUncatchable(), injected it into a random ref test (non262/get-set/getset-002.js), and pushed to try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=dfc7d666164d550760be97eb412cac294fe79728&selectedJob=289085389

You can see that it produces the same symptom (which also appears in that push for the original failures in this bug).

I'll look at where that might be happening and instrument the relevant code paths next.

It's failing the SharedArrayRawBuffer::Allocate call at https://searchfox.org/mozilla-central/rev/96f1457323cc598a36f5701f8e67aedaf97acfcf/js/src/vm/SharedArrayObject.cpp#226

It doesn't seem to happen if I cut down the number of tests being run (though it seems to timeout instead?)

So, bug #1: this path should be reporting OOM. But there's still the question of why this is happening on ARM32. I have two other try pushes going to narrow this down.

At least for TypedArray/slice.js, the problem is that we allow a max of 1000 live shared buffers before failing (and then incorrectly not setting an error), and we hit the 1000 limit. From my printouts to logcat:

02-25 01:29:01.381  6518  6535 I Gecko   : Returning rawbuf = 0xd78b0fcc
02-25 01:29:01.382  6518  6535 I Gecko   : Allocate(length=12) called
02-25 01:29:01.382  6518  6535 I Gecko   : Returning rawbuf = 0xd7716fcc
02-25 01:29:01.382  6518  6535 I Gecko   : Allocate(length=4) called
02-25 01:29:01.382  6518  6535 I Gecko   : Returning rawbuf = 0xd7714fcc
02-25 01:29:01.382  6518  6535 I Gecko   : Allocate(length=0) called
02-25 01:29:01.382  6518  6535 I Gecko   : Returning rawbuf = 0xd7713fcc
02-25 01:29:01.383  6518  6535 I Gecko   : Allocate(length=0) called
02-25 01:29:01.383  6518  6535 I Gecko   : Returning rawbuf = 0xd7712fcc
02-25 01:29:01.383  6518  6535 I Gecko   : Allocate(length=0) called
02-25 01:29:01.383  6518  6535 I Gecko   : Returning rawbuf = 0xd7711fcc
02-25 01:29:01.383  6518  6535 I Gecko   : Allocate(length=0) called
02-25 01:29:01.383  6518  6535 I Gecko   : large allocation failure: 1000 exceeds max allowed live mapped buffers 1000
02-25 01:29:01.387  6518  6535 I Gecko   : failing allocation, will continue failing until liveBufferCount drops
02-25 01:29:01.387  6518  6535 I Gecko   : MapBufferMemory(4096,4096) return nullptr
02-25 01:29:01.390  6518  6535 I GeckoDump: TEST-INFO |  FAILED!  uncaught exception: out of memory 

In various cases we trigger an emergency GC when we hit limits like this, I wonder why not here.

(In reply to Lars T Hansen [:lth] from comment #60)

In various cases we trigger an emergency GC when we hit limits like this, I wonder why not here.

It looks like we're undercounting the memory for these buffers when calculating GC presssure -- we're only charging for the length, when in fact these map 1 or 2 extra pages. I'll upload a patch that appears to fix these failures, after I disentangle it from all my diagnostic code and make sure it still compiles.

It is still possible that you could hit this problem with lots of 0- or 1-element SAB allocations, since they'll only take up 4MB or 8MB (I'm not sure which) for the 1000 allocations. But if that happens, it should be relatively to track down now that it'll properly report OOM instead of silently failing.

Flags: needinfo?(sphink)
Assignee: nobody → sphink
Status: NEW → ASSIGNED
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c012bf46a54f
Report out of memory errors in shared array buffer creation path. r=jonco
https://hg.mozilla.org/integration/autoland/rev/dd22925d7a0d
Account for the full mapped size of shared array buffer mappings. r=jonco
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla75
Regressions: 1618502

(In reply to Dorel Luca [:dluca] from comment #66)

This is still happening on central:

The symptom has changed enough (even though it's the same basic cause) that I'd like to call that type of failure bug 1618502.

Lars, Steve, is this still bad enough to not push shared memory features to release? And if yes, does the argument change if postMessage() throws and therefore the memory is effectively not shareable? (In that case this should maybe block a different bug.)

At this point, I regard this failure as more of a testing nuisance than anything that should block release. Which is not to say that I shouldn't fix it somehow. But the only failure that has been observed is from artificially low limits on allocating these buffers.

When running these tests, its seems like they're not grabbing large chunks of VM space in the first place. Does this still happen? There's a comment that on Android Wasm will be making 6GB allocations for bounds checking or something, but I couldn't find a code path that does that.

So then I guess my question is: if liveBufferCount is only necessary to prevent hogging lots of VM space, and otherwise the regular malloc accounting is good enough, then should this perhaps only be used when allocating more VM than is actually being used? I'll put this question to Lars in the form of a patch.

Comment on attachment 9135488 [details]
Bug 1599295 - Possible fix -- only count against maximum if we are mapping more than using

Revision D68076 was moved to bug 1618502. Setting attachment 9135488 [details] to obsolete.

Attachment #9135488 - Attachment is obsolete: true
No longer blocks: 1606624
You need to log in before you can comment on or make changes to this bug.