Closed Bug 1520496 Opened 1 year ago Closed 1 year ago

Intermittent testGCAllocator | /builds/worker/workspace/build/src/js/src/jsapi-tests/testGCAllocator.cpp:137:CHECK failed: positionIsCorrect("x--xx--xx-oox---", stagingArea, chunkPool, tempChunks, UseLastDitchAllocator)

Categories

(Core :: JavaScript: GC, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla67
Tracking Status
firefox67 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: ehoogeveen)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Filed by: aciure [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=222210767&repo=mozilla-inbound

https://queue.taskcluster.net/v1/task/XdJQgraXSdeCWxAMJEfcBA/runs/0/artifacts/public/logs/live_backing.log

06:22:32 INFO - TEST-PASS | testGCCellPtr | ok
06:22:32 INFO - testGCAllocator
06:22:32 WARNING - TEST-UNEXPECTED-FAIL | testGCAllocator | /builds/worker/workspace/build/src/js/src/jsapi-tests/testGCAllocator.cpp:137:CHECK failed: positionIsCorrect("x--xx--xx-oox---", stagingArea, chunkPool, tempChunks, UseLastDitchAllocator)
06:22:32 INFO - testFunctionProperties
06:22:32 INFO - TEST-PASS | testFunctionProperties | ok

Could this be related to bug 1502733 which just landed?

Component: JavaScript Engine → JavaScript: GC
Flags: needinfo?(emanuel.hoogeveen)

That bug enabled the test on more platforms, so that seems likely. It's odd to see intermittent failures here given that the test has been (as far as I know) consistently passing for years on other configurations, but maybe bug 1502733 changed something. Either way I'll have a look.

Flags: needinfo?(emanuel.hoogeveen)

So this failure is on OSX opt, which is odd as that should be using the new algorithm to begin with. But it's also a platform that has been running these tests since they were created.

I wonder if this is a case where the binary search actually failed, returning a smaller address range than the system has and making it fall back to the old algorithm. If so, we might need to make it more careful, but it still doesn't explain why the tests would fail when running the old algorithm.

If I'm reading the results right, it's correctly determining that addresses are given out in increasing order on OSX, and the test that's failing is the very last one, checking the last ditch algorithm. I wonder if my small changes to the chunk alignment code broke that somehow, perhaps only for OSX (being the only Unix-based platform where addresses are given out in increasing order).

Blocks: 1502733

Bug 1522294 should fix the intermittent failures here, but I'd like to use this bug to tighten up the address range checking (since it's apparently failing far more frequently than I would have expected).

Depends on: 1522294

With the patch in bug 1522294 landed the assertion failures here should be gone, but the fact that we got assertion failures at all indicates that we're falling back to the old allocator. Not often, but once every few thousand pushes is still a fair bit.

This patch tightens up the address range detection code by adding a third stage that checks larger addresses again with double the tries. This should almost never get beyond the first set of tries in practice. I moved the inner loop into a separate function to avoid duplicating it once more, and tweaked the first stage to check 47-bit addresses as well before moving onto the binary search, which should be faster on x86-64 platforms (and makes the failure in this bug less likely - there's a much lower chance of stumbling on an allocated region in the 47-bit range than in the 40-bit step of the binary search).

Last non-Phabricator review? I'll have to get that set up at some point...

Assignee: nobody → emanuel.hoogeveen
Status: NEW → ASSIGNED
Attachment #9046085 - Flags: review?(sphink)
Comment on attachment 9046085 [details] [diff] [review]
Make address range check more reliable.

Review of attachment 9046085 [details] [diff] [review]:
-----------------------------------------------------------------

Heh, this stuff is crazy, and it's hard to believe it all gets hit. But apparently it does. I can't spot anything wrong here. (I'm not sure that means much, but whatever.)

Thanks!
Attachment #9046085 - Flags: review?(sphink) → review+
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67

This one can ride the trains I think. The failures should be gone with bug 1522294, and the situation addressed by this patch is harmless except on platforms with 48-bit pointers - where the failure state should be several orders of magnitude less likely.

You need to log in before you can comment on or make changes to this bug.