Closed Bug 1875859 Opened 5 months ago Closed 5 months ago

Intermittent Hit MOZ_CRASH(ElementAt(aIndex = 18446744073709551615, aLength = 920)) at /builds/worker/checkouts/gecko/mfbt/Assertions.cpp:51 | single tracking bug

Categories

(Core :: Networking: Cache, defect, P2)

defect

Tracking

()

RESOLVED FIXED
124 Branch
Tracking Status
firefox123 --- wontfix
firefox124 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jstutte)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, intermittent-testcase, Whiteboard: [necko-triaged][necko-priority-new])

Crash Data

Attachments

(1 file)

Filed by: nbeleuzu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=444144151&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/bzWHdERYQNqmJb1zf2rZDQ/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/bzWHdERYQNqmJb1zf2rZDQ/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1


[task 2024-01-22T15:55:32.879Z] 15:55:32     INFO - TEST-START | /css/css-flexbox/flexbox-with-multi-column-property.html
[task 2024-01-22T15:55:32.882Z] 15:55:32     INFO - PID 12084 | 1705938932881	Marionette	INFO	Testing http://web-platform.test:8000/css/css-flexbox/flexbox-with-multi-column-property.html == http://web-platform.test:8000/css/css-flexbox/reference/flexbox-with-multi-column-property-ref.html
[task 2024-01-22T15:55:32.895Z] 15:55:32     INFO - PID 12084 | [Child 12509, Main Thread] WARNING: JSWindowActorChild::SendRawMessage (MarionetteReftest, MarionetteReftestParent:reftestWait) not sent: !CanSend() || !mManager || !mManager->CanSend(): file /builds/worker/checkouts/gecko/dom/ipc/jsactor/JSWindowActorChild.cpp:61
[task 2024-01-22T15:55:32.904Z] 15:55:32     INFO - PID 12084 | Hit MOZ_CRASH(ElementAt(aIndex = 18446744073709551615, aLength = 920)) at /builds/worker/checkouts/gecko/mfbt/Assertions.cpp:51
[task 2024-01-22T15:55:32.909Z] 15:55:32     INFO - PID 12084 | ExceptionHandler::GenerateDump cloned child ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2024-01-22T15:55:32.910Z] 15:55:32     INFO - PID 12084 | 14073
[task 2024-01-22T15:55:32.910Z] 15:55:32     INFO - PID 12084 | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2024-01-22T15:55:33.025Z] 15:55:33     INFO - PID 12084 | [WARN  firefox_on_glean::private::timing_distribution] Elapsed milliseconds larger than fits into 32-bytes. Saturating at u32::MAX.
[task 2024-01-22T15:55:33.038Z] 15:55:33     INFO - PID 12084 | [Child 12695, IPC I/O Child] WARNING: [6DC4289873E3BC2C.564AC89C54A0EB5C]: Ignoring message 'EVENT_MESSAGE' to peer 1.1 due to a missing broker: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:352
[task 2024-01-22T15:55:33.042Z] 15:55:33  WARNING - Exception in TextExecutor.run:
[task 2024-01-22T15:55:33.043Z] 15:55:33  WARNING - Traceback (most recent call last):
[task 2024-01-22T15:55:33.043Z] 15:55:33  WARNING -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/base.py", line 313, in run_test
[task 2024-01-22T15:55:33.043Z] 15:55:33  WARNING -     result = self.do_test(test)
[task 2024-01-22T15:55:33.044Z] 15:55:33  WARNING -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 1115, in do_test
[task 2024-01-22T15:55:33.044Z] 15:55:33  WARNING -     result = self.implementation.run_test(test)
[task 2024-01-22T15:55:33.044Z] 15:55:33  WARNING -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 1192, in run_test
[task 2024-01-22T15:55:33.044Z] 15:55:33  WARNING -     rv = self.executor.protocol.marionette._send_message("reftest:run",
[task 2024-01-22T15:55:33.044Z] 15:55:33  WARNING -   File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/decorators.py", line 34, in _
[task 2024-01-22T15:55:33.044Z] 15:55:33  WARNING -     m._handle_socket_failure()
[task 2024-01-22T15:55:33.044Z] 15:55:33  WARNING -   File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/marionette.py", line 820, in _handle_socket_failure
[task 2024-01-22T15:55:33.045Z] 15:55:33  WARNING -     reraise(exc_cls, exc, tb)
[task 2024-01-22T15:55:33.045Z] 15:55:33  WARNING -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/third_party/six/six.py", line 719, in reraise
[task 2024-01-22T15:55:33.045Z] 15:55:33  WARNING -     raise value
[task 2024-01-22T15:55:33.045Z] 15:55:33  WARNING -   File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/decorators.py", line 24, in _
[task 2024-01-22T15:55:33.045Z] 15:55:33  WARNING -     return func(*args, **kwargs)
[task 2024-01-22T15:55:33.045Z] 15:55:33  WARNING -   File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/marionette.py", line 769, in _send_message
[task 2024-01-22T15:55:33.046Z] 15:55:33  WARNING -     msg = self.client.request(name, params)
[task 2024-01-22T15:55:33.046Z] 15:55:33  WARNING -   File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/transport.py", line 382, in request
[task 2024-01-22T15:55:33.046Z] 15:55:33  WARNING -     return self.receive()
[task 2024-01-22T15:55:33.046Z] 15:55:33  WARNING -   File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/transport.py", line 229, in receive
[task 2024-01-22T15:55:33.046Z] 15:55:33  WARNING -     raise socket.error("No data received over socket")
[task 2024-01-22T15:55:33.046Z] 15:55:33  WARNING - OSError: No data received over socket

There is a second failure with this index underflow. Both have CacheStorageService in their stack.

Jan, could you investigate this issue?

Group: dom-core-security
Component: Audio/Video: Playback → Storage: Cache API
Flags: needinfo?(jvarga)

This belongs to the other cache, networking cache.

Component: Storage: Cache API → Networking: Cache
Flags: needinfo?(jvarga)

This specific crash is a release assert so it isn't exactly a security issue, but it does look like there was an integer underflow so we can leave it hidden for now. This is inside an nsTArray Sort call, so this is likely related to bug 1839051. Maybe Jens can tell if there's something funky with ExpirationComparator that violates some assumption of the new sorting.

Flags: needinfo?(jstutte)
Group: dom-core-security → network-core-security

(It looks like 18446744073709551615 is -1.)

The comparator as such looks pretty trivial. But is there a chance that mozilla::net::DoUpdateExpirationTime can happen while sorting? AFAICS CacheEntry::GetExpirationTime asserts to be on a certain thread (and is used here by the comparator), but CacheEntry::SetExpirationTime does not, not sure if that means it is supposed to be called from other threads, too? If so we would definitely need to make a copy of the array behind a mutex before sorting or otherwise add some mutex that encloses the entire sorting.

:jesup, wdyt?

Edit: There is also this scary comment: // Aligned assignment, thus atomic. - that hints indeed to some concurrency here? It is worth mentioning that sorting in general can only work reliably if the comparator results for given a and b do not change while sorting and std::sort in particular is not doing any additional bounds checks if things go wrong.

Flags: needinfo?(jstutte)
Group: network-core-security

(removing severity to make this show up in triage)

Severity: S4 → --
Priority: P5 → --
Crash Signature: [@ mozilla::detail::InvalidArrayIndex_CRASH | nsTArray_Impl<T>::ElementAt | nsTArray_Impl<T>::Sort<T>(mozilla::net::(anonymous namespace)::ExpirationComparator const&) | mozilla::net::CacheStorageService::MemoryPool::PurgeExpired ]
Crash Signature: [@ mozilla::detail::InvalidArrayIndex_CRASH | nsTArray_Impl<T>::ElementAt | nsTArray_Impl<T>::Sort<T>(mozilla::net::(anonymous namespace)::ExpirationComparator const&) | mozilla::net::CacheStorageService::MemoryPool::PurgeExpired ] → [@ mozilla::detail::InvalidArrayIndex_CRASH | nsTArray_Impl<T>::ElementAt | nsTArray_Impl<T>::Sort<T>(mozilla::net::(anonymous namespace)::ExpirationComparator const&) | mozilla::net::CacheStorageService::MemoryPool::PurgeExpired ] [@ mozilla::detail::In…

There is also this crash in the same place, but the signature is a bit general to be added.

See Also: → 1877259

A CacheEntry's mExpirationTime can be updated asynchronously while sorting. This is at best undefined behavior and at worst can derail std::sort's algorithm out of bounds. While we are here, we also can use the snapshot copy to iterate while purging and thus modifying the underlying array and make use of the sorted array to break the iteration earlier.

Assignee: nobody → jstutte
Status: NEW → ASSIGNED
Attachment #9377090 - Attachment description: WIP: Bug 1875859 - Harden MemoryPool::PurgeExpired against races on mExpirationTime while sorting. r?#necko-reviewers → Bug 1875859 - Harden MemoryPool::PurgeExpired against races on mExpirationTime while sorting. r?#necko-reviewers
Severity: -- → S3
Flags: needinfo?(rjesup)
Priority: -- → P2
Whiteboard: [necko-triaged][necko-priority-new]
Attachment #9377090 - Attachment description: Bug 1875859 - Harden MemoryPool::PurgeExpired against races on mExpirationTime while sorting. r?#necko-reviewers → WIP: Bug 1875859 - Refactor MemoryPool::PurgeExpired and PurgeByFrecency to reduce overhead and protect against races while sorting. r?#necko-reviewers

Site-scout has reported this issue a few times as well. I've been trying to reproduce it but I can't.

Attachment #9377090 - Attachment description: WIP: Bug 1875859 - Refactor MemoryPool::PurgeExpired and PurgeByFrecency to reduce overhead and protect against races while sorting. r?#necko-reviewers → Bug 1875859 - Refactor MemoryPool::PurgeExpired and PurgeByFrecency to reduce overhead and protect against races while sorting. r?#necko-reviewers
See Also: → 1879800
Blocks: 1879814
Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1d43a9352af4
Refactor MemoryPool::PurgeExpired and PurgeByFrecency to reduce overhead and protect against races while sorting. r=necko-reviewers,valentin
Status: ASSIGNED → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → 124 Branch

The patch landed in nightly and beta is affected.
:jstutte, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox123 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(jstutte)

My gut feeling is we need no uplift, but I leave the final say to :valentin here.

Flags: needinfo?(jstutte) → needinfo?(valentin.gosu)

The crash volume isn't huge, and this is a fairly complex refactoring, so I think it's OK to let it ride the trains.

Flags: needinfo?(valentin.gosu)
Duplicate of this bug: 1870504

Copying crash signatures from duplicate bugs.

Crash Signature: mozilla::detail::InvalidArrayIndex_CRASH | std::sort<T>(mozilla::ArrayIterator<T>, mozilla::ArrayIterator<T>, nsTArray_Impl<T>::Sort<T>(mozilla::net::(anonymous namespace)::ExpirationComparator const&)::{lambda(auto:1 const&, auto:2 const&)#1}) ] → mozilla::detail::InvalidArrayIndex_CRASH | std::sort<T>(mozilla::ArrayIterator<T>, mozilla::ArrayIterator<T>, nsTArray_Impl<T>::Sort<T>(mozilla::net::(anonymous namespace)::ExpirationComparator const&)::{lambda(auto:1 const&, auto:2 const&)#1}) ] [@ mozi…

The instances that showed up on Fenix are unrelated to MemoryPool, AFAICT. They happen on the MediaSupervisor thread and flag aMozPromise<mozilla::media::TimeUnit, mozilla::SeekRejectValue, true>::AssertIsDead() which I believe means we destroy a promise that did not resolve or reject, yet. I think we do not have a bug for those.

Crash Signature: , auto:2 const&)#1}) ] [@ mozilla::detail::InvalidArrayIndex_CRASH] → , auto:2 const&)#1}) ] [@ mozilla::detail::InvalidArrayIndex_CRASH]
Crash Signature: mozilla::detail::InvalidArrayIndex_CRASH | std::sort<T>(mozilla::ArrayIterator<T>, mozilla::ArrayIterator<T>, nsTArray_Impl<T>::Sort<T>(mozilla::net::(anonymous namespace)::ExpirationComparator const&)::{lambda(auto:1 const&, auto:2 const&)#1}) ] [@ moz… → mozilla::detail::InvalidArrayIndex_CRASH | std::sort<T>(mozilla::ArrayIterator<T>, mozilla::ArrayIterator<T>, nsTArray_Impl<T>::Sort<T>(mozilla::net::(anonymous namespace)::ExpirationComparator const&)::{lambda(auto:1 const&, auto:2 const&)#1}) ]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: