Closed Bug 1768594 Opened 2 years ago Closed 2 years ago

Intermittent awsy/test_memory_usage.py TestMemoryUsage.test_open_tabs | application crashed [@ zone_size]

Categories

(Core :: JavaScript Engine, defect, P3)

Unspecified
macOS
defect

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: crash, intermittent-failure)

Crash Data

Filed by: bszekely [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=377507344&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/WnajqWeNRwmT2l6PxaIF7w/runs/0/artifacts/public/logs/live_backing.log


[task 2022-05-10T06:03:26.362Z] 06:03:26     INFO - TEST-START | awsy/test_memory_usage.py TestMemoryUsage.test_open_tabs
[task 2022-05-10T06:03:26.375Z] 06:03:26     INFO - areweslimyet run by 0 pages, 1 iterations, 15 perTabPause, 30 settleWaitTime
[task 2022-05-10T06:03:26.385Z] 06:03:26     INFO - setting up
[task 2022-05-10T06:03:26.417Z] 06:03:26     INFO - mozproxy mozproxy_dir used for mitmproxy downloads and exe files: /opt/worker/tasks/task_165216248867635/build/tests/html/testing/mozproxy
[task 2022-05-10T06:03:26.419Z] 06:03:26     INFO - mozproxy Playback tool: mitmproxy
[task 2022-05-10T06:03:26.419Z] 06:03:26     INFO - mozproxy Playback tool version: 5.1.1
[task 2022-05-10T06:03:26.419Z] 06:03:26     INFO - mozproxy create mitmproxy 5.1.1 dir
[task 2022-05-10T06:03:26.419Z] 06:03:26     INFO - mozproxy downloading mitmproxy binary
[task 2022-05-10T06:03:26.533Z] 06:03:26     INFO - mozproxy b'INFO - File mitmproxy-5.1.1-osx.tar.gz not present in local cache folder /builds/tooltool_cache'
<...>
[task 2022-05-10T06:17:50.077Z] 06:17:50     INFO - PROCESS-CRASH | awsy/test_memory_usage.py TestMemoryUsage.test_open_tabs | application crashed [@ zone_size]
[task 2022-05-10T06:17:50.077Z] 06:17:50     INFO - Mozilla crash reason: MOZ_DIAGNOSTIC_ASSERT(size != 0)
[task 2022-05-10T06:17:50.077Z] 06:17:50     INFO - Crash dump filename: /var/folders/_h/_ptw4k5n7g5288_0xrxn1mzc000014/T/tmp1rf3j1ot.profile/minidumps/8B4EAA84-648F-4C56-902F-A8CBF63B5445.dmp
[task 2022-05-10T06:17:50.077Z] 06:17:50     INFO - Operating system: Mac OS X
[task 2022-05-10T06:17:50.077Z] 06:17:50     INFO -                   10.15.7 19H524
[task 2022-05-10T06:17:50.077Z] 06:17:50     INFO - CPU: amd64
[task 2022-05-10T06:17:50.077Z] 06:17:50     INFO -      family 6 model 158 stepping 10
[task 2022-05-10T06:17:50.077Z] 06:17:50     INFO -      12 CPUs
[task 2022-05-10T06:17:50.077Z] 06:17:50     INFO - 
[task 2022-05-10T06:17:50.077Z] 06:17:50     INFO - Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
[task 2022-05-10T06:17:50.077Z] 06:17:50     INFO - Crash address: 0x0
[task 2022-05-10T06:17:50.077Z] 06:17:50     INFO - Mac Crash Info:
[task 2022-05-10T06:17:50.077Z] 06:17:50     INFO - 
[task 2022-05-10T06:17:50.077Z] 06:17:50     INFO - Process uptime: 629 seconds
[task 2022-05-10T06:17:50.077Z] 06:17:50     INFO - 
[task 2022-05-10T06:17:50.077Z] 06:17:50     INFO - Thread 0 MainThread (crashed)
[task 2022-05-10T06:17:50.077Z] 06:17:50     INFO -  0  libmozglue.dylib!zone_size [zone.c:f1e87b2a9a25aa0d27de002ba9c6c1e170ed6e32 : 113 + 0x24c]
[task 2022-05-10T06:17:50.078Z] 06:17:50     INFO -      rax = 0x000000010c82dcc3    rdx = 0x0000000000000020
[task 2022-05-10T06:17:50.078Z] 06:17:50     INFO -      rcx = 0x000000010c835368    rbx = 0x0000000000000047
[task 2022-05-10T06:17:50.078Z] 06:17:50     INFO -      rsi = 0x00000001134477d8    rdi = 0x000000010c8353b8
[task 2022-05-10T06:17:50.078Z] 06:17:50     INFO -      rbp = 0x00007ffeedeedcf0    rsp = 0x00007ffeedeedce0
[task 2022-05-10T06:17:50.078Z] 06:17:50     INFO -       r8 = 0x00000000000072fe     r9 = 0xffffffff00000000
[task 2022-05-10T06:17:50.078Z] 06:17:50     INFO -      r10 = 0x0000000109b3d328    r11 = 0x000000010c900c40
[task 2022-05-10T06:17:50.078Z] 06:17:50     INFO -      r12 = 0x0000000000000000    r13 = 0x0000000000000000
[task 2022-05-10T06:17:50.078Z] 06:17:50     INFO -      r14 = 0x0000000000000000    r15 = 0x0000000000000020
[task 2022-05-10T06:17:50.078Z] 06:17:50     INFO -      rip = 0x000000010c7a3796
[task 2022-05-10T06:17:50.078Z] 06:17:50     INFO -     Found by: given as instruction pointer in context

Another GlobalHelperThreadState::addSizeOfIncludingThis crash.

Component: AWSY → JavaScript Engine
Product: Testing → Core
See Also: → 1722406
See Also: → 1638464

Steve fixed one issue with this code in bug 1638464, but it looks like one remains.

It looks like thread 26 is holding a mutex while tracing a GlobalHelperThreadState, which seems like it could be related, though I don't know how it would cause this issue.

[task 2022-05-10T06:17:50.146Z] 06:17:50 INFO - Thread 26 DOM Worker
[task 2022-05-10T06:17:50.147Z] 06:17:50 INFO - 0 libsystem_kernel.dylib!__psynch_mutexwait + 0xa
[task 2022-05-10T06:17:50.147Z] 06:17:50 INFO - 1 libsystem_pthread.dylib!_pthread_mutex_firstfit_lock_slow + 0xdd
[task 2022-05-10T06:17:50.147Z] 06:17:50 INFO - 2 libmozglue.dylib!mozilla::detail::MutexImpl::lock() [Mutex_posix.cpp:f1e87b2a9a25aa0d27de002ba9c6c1e170ed6e32 : 118 + 0xa]
[task 2022-05-10T06:17:50.148Z] 06:17:50 INFO - 3 XUL!js::GlobalHelperThreadState::trace(JSTracer*) [HelperThreads.cpp:f1e87b2a9a25aa0d27de002ba9c6c1e170ed6e32 : 2522 + 0xb]
[task 2022-05-10T06:17:50.148Z] 06:17:50 INFO - 4 XUL!js::gc::GCRuntime::traceRuntimeCommon(JSTracer*, js::gc::GCRuntime::TraceOrMarkRuntime) [RootMarking.cpp:f1e87b2a9a25aa0d27de002ba9c6c1e170ed6e32 : 350 + 0xe]
[task 2022-05-10T06:17:50.148Z] 06:17:50 INFO - 5 XUL!js::gc::GCRuntime::traceRuntimeForMajorGC(JSTracer*, js::gc::AutoGCSession&) [RootMarking.cpp:f1e87b2a9a25aa0d27de002ba9c6c1e170ed6e32 : 247 + 0xf]

Okay, based on the line number, the actual line we're crashing on inside the helper state code is this:
gcParallelWorklist_.sizeOfExcludingThis(mallocSizeOf) +

The actual crash is on a jemalloc assertion: MOZ_DIAGNOSTIC_ASSERT(size != 0)

Steve, any ideas? Thanks.

Flags: needinfo?(sphink)
Severity: S4 → S3
Priority: -- → P3

(In reply to Andrew McCreight [:mccr8] from comment #2)

It looks like thread 26 is holding a mutex while tracing a GlobalHelperThreadState, which seems like it could be related, though I don't know how it would cause this issue.

Rather, GlobalHelperThreadState::addSizeOfIncludingThis() should already have the lock held, and thread 26 is blocked trying to acquire the lock. The lock would have been taken at the top of CollectGlobalStats.

So I guess the jemalloc bookkeeping for an element ofgcParallelWorklist_ (an intrusive linked list via mozilla::LinkedList) is corrupted here. The list is only added to in js::GCParallelTask::startWithLockHeld. But I guess some other (neighboring?) object could be corrupting the memory.

From the specific assert (though I don't know how mccr8 worked that out), it seems like the pointer is in a valid jemalloc arena, its chunk's mapbits say the pointer is for a larger allocation (CHUNK_MAP_LARGE is set), but the same mapbits have zero for their upper size bits. Which I guess just means there is malloc bookkeeping corruption.

I'm not sure how to proceed with this other than somehow arranging for this list scan to happen more often, particularly during a GC.

I will note that this seems to only happen on OSX.

Flags: needinfo?(sphink)
OS: Unspecified → macOS
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.