Intermittent Hit MOZ_CRASH(ElementAt(aIndex = 18446744073709551615, aLength = 920)) at /builds/worker/checkouts/gecko/mfbt/Assertions.cpp:51 | single tracking bug
Categories
(Core :: Networking: Cache, defect, P2)
People
(Reporter: intermittent-bug-filer, Assigned: jstutte)
References
(Blocks 1 open bug)
Details
(Keywords: intermittent-failure, intermittent-testcase, Whiteboard: [necko-triaged][necko-priority-new])
Crash Data
Attachments
(1 file)
Filed by: nbeleuzu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=444144151&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/bzWHdERYQNqmJb1zf2rZDQ/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/bzWHdERYQNqmJb1zf2rZDQ/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
[task 2024-01-22T15:55:32.879Z] 15:55:32 INFO - TEST-START | /css/css-flexbox/flexbox-with-multi-column-property.html
[task 2024-01-22T15:55:32.882Z] 15:55:32 INFO - PID 12084 | 1705938932881 Marionette INFO Testing http://web-platform.test:8000/css/css-flexbox/flexbox-with-multi-column-property.html == http://web-platform.test:8000/css/css-flexbox/reference/flexbox-with-multi-column-property-ref.html
[task 2024-01-22T15:55:32.895Z] 15:55:32 INFO - PID 12084 | [Child 12509, Main Thread] WARNING: JSWindowActorChild::SendRawMessage (MarionetteReftest, MarionetteReftestParent:reftestWait) not sent: !CanSend() || !mManager || !mManager->CanSend(): file /builds/worker/checkouts/gecko/dom/ipc/jsactor/JSWindowActorChild.cpp:61
[task 2024-01-22T15:55:32.904Z] 15:55:32 INFO - PID 12084 | Hit MOZ_CRASH(ElementAt(aIndex = 18446744073709551615, aLength = 920)) at /builds/worker/checkouts/gecko/mfbt/Assertions.cpp:51
[task 2024-01-22T15:55:32.909Z] 15:55:32 INFO - PID 12084 | ExceptionHandler::GenerateDump cloned child ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2024-01-22T15:55:32.910Z] 15:55:32 INFO - PID 12084 | 14073
[task 2024-01-22T15:55:32.910Z] 15:55:32 INFO - PID 12084 | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2024-01-22T15:55:33.025Z] 15:55:33 INFO - PID 12084 | [WARN firefox_on_glean::private::timing_distribution] Elapsed milliseconds larger than fits into 32-bytes. Saturating at u32::MAX.
[task 2024-01-22T15:55:33.038Z] 15:55:33 INFO - PID 12084 | [Child 12695, IPC I/O Child] WARNING: [6DC4289873E3BC2C.564AC89C54A0EB5C]: Ignoring message 'EVENT_MESSAGE' to peer 1.1 due to a missing broker: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:352
[task 2024-01-22T15:55:33.042Z] 15:55:33 WARNING - Exception in TextExecutor.run:
[task 2024-01-22T15:55:33.043Z] 15:55:33 WARNING - Traceback (most recent call last):
[task 2024-01-22T15:55:33.043Z] 15:55:33 WARNING - File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/base.py", line 313, in run_test
[task 2024-01-22T15:55:33.043Z] 15:55:33 WARNING - result = self.do_test(test)
[task 2024-01-22T15:55:33.044Z] 15:55:33 WARNING - File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 1115, in do_test
[task 2024-01-22T15:55:33.044Z] 15:55:33 WARNING - result = self.implementation.run_test(test)
[task 2024-01-22T15:55:33.044Z] 15:55:33 WARNING - File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 1192, in run_test
[task 2024-01-22T15:55:33.044Z] 15:55:33 WARNING - rv = self.executor.protocol.marionette._send_message("reftest:run",
[task 2024-01-22T15:55:33.044Z] 15:55:33 WARNING - File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/decorators.py", line 34, in _
[task 2024-01-22T15:55:33.044Z] 15:55:33 WARNING - m._handle_socket_failure()
[task 2024-01-22T15:55:33.044Z] 15:55:33 WARNING - File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/marionette.py", line 820, in _handle_socket_failure
[task 2024-01-22T15:55:33.045Z] 15:55:33 WARNING - reraise(exc_cls, exc, tb)
[task 2024-01-22T15:55:33.045Z] 15:55:33 WARNING - File "/builds/worker/workspace/build/tests/web-platform/tests/tools/third_party/six/six.py", line 719, in reraise
[task 2024-01-22T15:55:33.045Z] 15:55:33 WARNING - raise value
[task 2024-01-22T15:55:33.045Z] 15:55:33 WARNING - File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/decorators.py", line 24, in _
[task 2024-01-22T15:55:33.045Z] 15:55:33 WARNING - return func(*args, **kwargs)
[task 2024-01-22T15:55:33.045Z] 15:55:33 WARNING - File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/marionette.py", line 769, in _send_message
[task 2024-01-22T15:55:33.046Z] 15:55:33 WARNING - msg = self.client.request(name, params)
[task 2024-01-22T15:55:33.046Z] 15:55:33 WARNING - File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/transport.py", line 382, in request
[task 2024-01-22T15:55:33.046Z] 15:55:33 WARNING - return self.receive()
[task 2024-01-22T15:55:33.046Z] 15:55:33 WARNING - File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/transport.py", line 229, in receive
[task 2024-01-22T15:55:33.046Z] 15:55:33 WARNING - raise socket.error("No data received over socket")
[task 2024-01-22T15:55:33.046Z] 15:55:33 WARNING - OSError: No data received over socket
Comment 1•5 months ago
There is a second failure with this index underflow. Both have CacheStorageService in their stack.
Jan, could you investigate this issue?
Comment 2•5 months ago
This belongs to the other cache, networking cache.
Comment 3•5 months ago
This specific crash is a release assert so it isn't exactly a security issue, but it does look like there was an integer underflow so we can leave it hidden for now. This is inside an nsTArray Sort call, so this is likely related to bug 1839051. Maybe Jens can tell if there's something funky with ExpirationComparator that violates some assumption of the new sorting.
Comment 4•5 months ago
(It looks like 18446744073709551615 is -1: it is 2^64 - 1, the two's-complement bit pattern of -1 read back as an unsigned 64-bit index.)
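A minimal sketch of where such a value comes from (the function name is illustrative, not from the actual code): subtracting 1 from an unsigned 64-bit zero wraps to exactly the index reported in the crash annotation.

```cpp
#include <cstdint>

// The crash shows aIndex = 18446744073709551615, which is 2^64 - 1:
// the classic signature of an unsigned underflow such as 0 - 1.
uint64_t UnderflowedIndex() {
  uint64_t index = 0;
  index -= 1;  // wraps around to UINT64_MAX
  return index;
}
```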
Assignee
Comment 5•5 months ago
The comparator as such looks pretty trivial. But is there a chance that mozilla::net::DoUpdateExpirationTime can run while we are sorting? AFAICS CacheEntry::GetExpirationTime (which the comparator uses) asserts that it is called on a certain thread, but CacheEntry::SetExpirationTime does not. Not sure if that means it is supposed to be called from other threads, too? If so, we would definitely need to make a copy of the array behind a mutex before sorting, or otherwise add a mutex that encloses the entire sort.
:jesup, wdyt?
Edit: There is also this scary comment: // Aligned assignment, thus atomic. - which indeed hints at some concurrency here. It is worth mentioning that sorting in general can only work reliably if the comparator's result for a given pair (a, b) does not change while sorting, and that std::sort in particular does not do any additional bounds checks if things go wrong.
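The safe pattern hinted at above can be sketched as follows. This is not the actual Necko code; Entry and SortedByExpiration are hypothetical stand-ins. The key idea is that each entry's sort key is read exactly once into a local snapshot, so the comparator only ever sees immutable captured keys and concurrent updates to the live field cannot flip comparison results mid-sort.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cache entry whose expiration time may be
// rewritten from another thread while a purge is sorting.
struct Entry {
  std::atomic<uint32_t> mExpirationTime;
};

// Capture each entry's key once, then sort the snapshot: the comparator
// compares captured values, never the mutable field itself.
std::vector<Entry*> SortedByExpiration(const std::vector<Entry*>& entries) {
  std::vector<std::pair<uint32_t, Entry*>> snapshot;
  snapshot.reserve(entries.size());
  for (Entry* e : entries) {
    snapshot.emplace_back(e->mExpirationTime.load(), e);  // key read once
  }
  std::sort(snapshot.begin(), snapshot.end(),
            [](const auto& a, const auto& b) { return a.first < b.first; });
  std::vector<Entry*> result;
  result.reserve(snapshot.size());
  for (const auto& p : snapshot) {
    result.push_back(p.second);
  }
  return result;
}
```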
Assignee
Comment 6•5 months ago
(removing severity to make this show up in triage)
Assignee
Comment 7•5 months ago
There is also this crash in the same place, but the signature is a bit too general to be added here.
Assignee
Comment 8•5 months ago
A CacheEntry's mExpirationTime can be updated asynchronously while we sort. This is at best undefined behavior and at worst can derail std::sort's algorithm out of bounds. While we are here, we can also use the snapshot copy to iterate while purging (which modifies the underlying array), and use the sorted order to break out of the iteration earlier.
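The early-exit part of that refactoring can be sketched like this. CountPurgeable and its parameters are illustrative names, not the actual patch: once the snapshot is sorted by expiration time ascending, a purge pass can stop at the first entry that has not expired yet.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: walk a snapshot sorted by expiration time and
// stop at the first non-expired entry, since everything after it
// expires even later.
template <typename Entry>
std::size_t CountPurgeable(
    const std::vector<std::pair<uint32_t, Entry*>>& sortedSnapshot,
    uint32_t now) {
  std::size_t purged = 0;
  for (const auto& p : sortedSnapshot) {
    if (p.first > now) {
      break;  // sorted order: every remaining entry expires later
    }
    ++purged;  // the real code would purge p.second here
  }
  return purged;
}
```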
Comment 9•5 months ago
Site-scout has reported this issue a few times as well. I've been trying to reproduce it but I can't.
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment 12•5 months ago
Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/1d43a9352af4 Refactor MemoryPool::PurgeExpired and PurgeByFrecency to reduce overhead and protect against races while sorting. r=necko-reviewers,valentin
Comment 13•5 months ago
bugherder
Comment 14•5 months ago
The patch landed in nightly and beta is affected.
:jstutte, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set status-firefox123 to wontfix.
For more information, please visit the BugBot documentation.
Assignee
Comment 15•5 months ago
My gut feeling is we need no uplift, but I leave the final say to :valentin here.
The crash volume isn't huge, and this is a fairly complex refactoring, so I think it's OK to let it ride the trains.
Comment 18•4 months ago
Copying crash signatures from duplicate bugs.
Assignee
Comment 20•2 months ago
The instances that showed up on Fenix are unrelated to MemoryPool, AFAICT. They happen on the MediaSupervisor thread and flag MozPromise<mozilla::media::TimeUnit, mozilla::SeekRejectValue, true>::AssertIsDead(), which I believe means we destroy a promise that has not resolved or rejected yet. I think we do not have a bug on file for those.