Open Bug 1594450 Opened 1 year ago Updated 1 year ago

Intermittent /service-workers/cache-storage/serviceworker/cache-match.https.html | application crashed [@ mozilla::dom::cache::Context::RemoveActivity(mozilla::dom::cache::Context::Activity*)]

Categories

(Core :: DOM: Service Workers, defect, P3)

defect

Tracking

()

People

(Reporter: intermittent-bug-filer, Unassigned)

Details

(Keywords: crash, intermittent-failure)

Crash Data

Filed by: malexandru [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=274833763&repo=autoland
Full log: https://queue.taskcluster.net/v1/task/MZE06CvkT9q6-XxaG3p_9g/runs/0/artifacts/public/logs/live_backing.log


[task 2019-11-06T14:58:37.869Z] 14:58:37 INFO - TEST-OK | /service-workers/cache-storage/serviceworker/cache-keys.https.html | took 896ms
[task 2019-11-06T14:58:37.869Z] 14:58:37 INFO - TEST-START | /service-workers/cache-storage/serviceworker/cache-match.https.html
[task 2019-11-06T14:58:37.869Z] 14:58:37 INFO - Closing window 26
[task 2019-11-06T14:58:39.839Z] 14:58:39 INFO - mozcrash .........................
[task 2019-11-06T14:58:39.839Z] 14:58:39 INFO - mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/VB3fzn-7SPyCzUdAhdgpCQ/artifacts/public/build/target.crashreporter-symbols.zip
[task 2019-11-06T14:58:42.508Z] 14:58:42 INFO - mozcrash Copy/paste: /builds/worker/workspace/build/linux64-minidump_stackwalk /tmp/tmpE2MHTQ/26d4002a-0e84-99c2-0ed8-a87a78067516.dmp /tmp/tmpJFjUKQ
[task 2019-11-06T14:58:45.143Z] 14:58:45 INFO - mozcrash Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/26d4002a-0e84-99c2-0ed8-a87a78067516.dmp
[task 2019-11-06T14:58:45.254Z] 14:58:45 INFO - PROCESS-CRASH | /service-workers/cache-storage/serviceworker/cache-match.https.html | application crashed [@ mozilla::dom::cache::Context::RemoveActivity(mozilla::dom::cache::Context::Activity*)]
[task 2019-11-06T14:58:45.254Z] 14:58:45 INFO - Crash dump filename: /tmp/tmpE2MHTQ/26d4002a-0e84-99c2-0ed8-a87a78067516.dmp
[task 2019-11-06T14:58:45.254Z] 14:58:45 INFO - Operating system: Android
[task 2019-11-06T14:58:45.254Z] 14:58:45 INFO - 0.0.0 Linux 3.10.0+ #260 SMP PREEMPT Fri May 19 12:48:14 PDT 2017 x86_64
[task 2019-11-06T14:58:45.254Z] 14:58:45 INFO - CPU: amd64
[task 2019-11-06T14:58:45.254Z] 14:58:45 INFO - family 6 model 6 stepping 3
[task 2019-11-06T14:58:45.254Z] 14:58:45 INFO - 4 CPUs
[task 2019-11-06T14:58:45.254Z] 14:58:45 INFO -
[task 2019-11-06T14:58:45.254Z] 14:58:45 INFO - GPU: UNKNOWN
[task 2019-11-06T14:58:45.254Z] 14:58:45 INFO -
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - Crash reason: SIGSEGV /SEGV_MAPERR
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - Crash address: 0x7c79880d
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - Process uptime: not available
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO -
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - Thread 34 (crashed)
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - 0 libxul.so!mozilla::dom::cache::Context::RemoveActivity(mozilla::dom::cache::Context::Activity*) [Context.cpp:db7fb8a98e5aadc3581cfdaa8dc23b4fa2fbaeb7 : 999 + 0x7]
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - rax = 0x000000007c79880d rdx = 0x0000000000000008
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - rcx = 0x00007c798851be18 rbx = 0x00007c7975911ad8
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - rsi = 0x00007c7975911ad0 rdi = 0x00007c7977fc0002
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - rbp = 0x00007c797b343d20 rsp = 0x00007c797b343ce0
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - r8 = 0x0000000000000000 r9 = 0x0000000000000000
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - r10 = 0x0000000000000000 r11 = 0x0000000000000246
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - r12 = 0x00007c7975911ac0 r13 = 0x0000000000000001
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - r14 = 0x00007c7977fc0002 r15 = 0x00007c797bfa6e40
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - rip = 0x00007c7983ca7f6f
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - Found by: given as instruction pointer in context
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - 1 libxul.so!mozilla::dom::cache::Context::ActionRunnable::Clear() [Context.cpp:db7fb8a98e5aadc3581cfdaa8dc23b4fa2fbaeb7 : 539 + 0x9]
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - rbp = 0x00007c797b343d40 rsp = 0x00007c797b343d30
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - rip = 0x00007c7983ca6f03
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - Found by: previous frame's frame pointer
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - 2 libxul.so!mozilla::dom::cache::Context::ActionRunnable::Run() [Context.cpp:db7fb8a98e5aadc3581cfdaa8dc23b4fa2fbaeb7 : 654 + 0x8]
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - rbp = 0x00007c797b343d60 rsp = 0x00007c797b343d50
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - rip = 0x00007c7983ca6e8e
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - Found by: previous frame's frame pointer
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - 3 libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:db7fb8a98e5aadc3581cfdaa8dc23b4fa2fbaeb7 : 1225 + 0xa]
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - rbp = 0x00007c797b3442a0 rsp = 0x00007c797b343d70
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - rip = 0x00007c7982bd3192
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - Found by: previous frame's frame pointer
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - 4 libxul.so!NS_ProcessNextEvent(nsIThread*, bool) [nsThreadUtils.cpp:db7fb8a98e5aadc3581cfdaa8dc23b4fa2fbaeb7 : 486 + 0x10]
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - rbp = 0x00007c797b3442d0 rsp = 0x00007c797b3442b0
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - rip = 0x00007c7982bd47e1
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - Found by: previous frame's frame pointer
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - 5 libxul.so!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) [MessagePump.cpp:db7fb8a98e5aadc3581cfdaa8dc23b4fa2fbaeb7 : 333 + 0xb]
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - rbp = 0x00007c797b344310 rsp = 0x00007c797b3442e0
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - rip = 0x00007c7982f6321b
[task 2019-11-06T14:58:45.255Z] 14:58:45 INFO - Found by: previous frame's frame pointer
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - 6 libxul.so!MessageLoop::Run() [message_loop.cc:db7fb8a98e5aadc3581cfdaa8dc23b4fa2fbaeb7 : 290 + 0xc]
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - rbp = 0x00007c797b344350 rsp = 0x00007c797b344320
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - rip = 0x00007c7982f37f3f
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - Found by: previous frame's frame pointer
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - 7 libxul.so!nsThread::ThreadFunc(void*) [nsThread.cpp:db7fb8a98e5aadc3581cfdaa8dc23b4fa2fbaeb7 : 458 + 0x8]
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - rbp = 0x00007c797b3443a0 rsp = 0x00007c797b344360
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - rip = 0x00007c7982bd1899
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - Found by: previous frame's frame pointer
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - 8 libnss3.so!_pt_root [ptthread.c:db7fb8a98e5aadc3581cfdaa8dc23b4fa2fbaeb7 : 201 + 0x7]
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - rbp = 0x00007c797b3443e0 rsp = 0x00007c797b3443b0
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - rip = 0x00007c79886b9311
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - Found by: previous frame's frame pointer
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - 9 libnss3.so!pt_recvfrom_cont [ptio.c:db7fb8a98e5aadc3581cfdaa8dc23b4fa2fbaeb7 : 0 + 0x5]
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - rbp = 0x00007c797b3443e0 rsp = 0x00007c797b3443d0
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - rip = 0x00007c79886b921e
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - Found by: stack scanning
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - 10 libc.so + 0x89772
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - rsp = 0x00007c797b3443f0 rip = 0x00007c79a57f4772
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - Found by: stack scanning
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - 11 libnss3.so!pt_recvfrom_cont [ptio.c:db7fb8a98e5aadc3581cfdaa8dc23b4fa2fbaeb7 : 0 + 0x5]
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - rsp = 0x00007c797b344428 rip = 0x00007c79886b921e
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - Found by: stack scanning
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - 12 libc.so + 0x299ec
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - rsp = 0x00007c797b344440 rip = 0x00007c79a57949ec
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - Found by: stack scanning
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - 13 libc.so + 0x896c0
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - rsp = 0x00007c797b344448 rip = 0x00007c79a57f46c0
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - Found by: stack scanning
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - 14 libc.so + 0x1ca66
[task 2019-11-06T14:58:45.256Z] 14:58:45 INFO - rsp = 0x00007c797b344450 rip = 0x00007c79a5787a66
[task 2019-11-06T14:58:45.257Z] 14:58:45 INFO - Found by: stack scanning
[task 2019-11-06T14:58:45.257Z] 14:58:45 INFO - 15 libnss3.so!pt_recvfrom_cont [ptio.c:db7fb8a98e5aadc3581cfdaa8dc23b4fa2fbaeb7 : 0 + 0x5]
[task 2019-11-06T14:58:45.257Z] 14:58:45 INFO - rsp = 0x00007c797b3444b8 rip = 0x00007c79886b921e
[task 2019-11-06T14:58:45.257Z] 14:58:45 INFO - Found by: stack scanning
[task 2019-11-06T14:58:45.257Z] 14:58:45 INFO -
[task 2019-11-06T14:58:45.257Z] 14:58:45 INFO - Thread 0

tl;dr: Looks like memory corruption and is not immediately actionable. If we could see if this reproduces under rr that would be ideal. The crash address 7c79880d directly seems to correspond to a memory space used by the dynamic linker/dalvik, but that hasn't been used yet or has been completely freed. Per minidump_dump, an excerpt around there:

75d16000-75e83000 rw-p 00000000 00:04 750                                /dev/ashmem/dalvik-zygote space (deleted)
75e83000-75e84000 rw-p 00000000 00:04 898                                /dev/ashmem/dalvik-non moving space (deleted)
75e84000-75ea9000 rw-p 00001000 00:04 898                                /dev/ashmem/dalvik-non moving space (deleted)
75ea9000-79917000 ---p 00026000 00:04 898                                /dev/ashmem/dalvik-non moving space (deleted)
79917000-79d16000 rw-p 03a94000 00:04 898                                /dev/ashmem/dalvik-non moving space (deleted)
7c79880d GOES HERE
11cde640000-11cde680000 rw-p 00000000 00:00 0                            [anon:js-gc-heap]
168c5180000-168c51c0000 rw-p 00000000 00:00 0                            [anon:js-gc-heap]
2a9cedc0000-2a9cee00000 rw-p 00000000 00:00 0                            [anon:js-gc-heap]
2b5b2b00000-2b5b2b40000 rw-p 00000000 00:00 0                            [anon:js-gc-heap]
39bbd380000-39bbd3c0000 rw-p 00000000 00:00 0                            [anon:js-gc-heap]
4a98e4c0000-4a98e500000 rw-p 00000000 00:00 0                            [anon:js-gc-heap]

And if we viewed the location as a right-shifted pointer, I think that would put it in one of these bottom 2 mappings:

7c7982b36000-7c7987ed4000 r-xp 000fd000 fd:20 7135                       /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so
7c7987ed4000-7c79880d4000 ---p 00000000 00:00 0 
7c79880d4000-7c79884e7000 r--p 0549b000 fd:20 7135                       /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so

Longer Analysis

Aside: There doesn't seem to be a [DUMPID].extra file that includes the threads. I'm really confused how we get so many different scenarios for how the minidump processing workings in our tests.

Superficially, this looks like a lifecycle issue, but I don't think it is. We're on the PBackground thread and we seem to be exactly in the MOZ_ALWAYS_TRUE(mActivityList.RemoveElement(aActivity)); line of Context::RemoveActivity called from ContextActivity::Clear called from Context::ActionRunnable::Run in the STATE_COMPLETING state.

However, it's not immediately clear how this could be going wrong given that:

  • The state machine explicitly knows to clear on PBackground. This isn't the case where the runnable's destructor is running on the originally dispatching thread due to thread scheduling.
  • There should be sufficient strong references:
    • The Context::ActionRunnable is held strongly by the message loop while it's executing. This is the aActivity passed to RemoveActivity
    • The runnable holds a strong reference to the Context. whcih is the mContext. Additionally, there's a diagnostic assert on mContext existing, and we would have crashed with a NULL pointer deref if it somehow wasn't.

Given that the lifecycle management looks strong, the most straightforward explanations are then:

  • Memory corruption via rogue-write. Per the tl;dr section at the top, this looks like a right-shifted libxul pointer address, which is weird.
  • The debug-only NS_ASSERT_OWNINGTHREAD(Context); invariants around mutation of mActivityList are being violated and a different thread caused the underlying nsTObserverArray<Activity*> to reallocate its storage somewhat concurrently. This doesn't really make sense given the memory address we're crashing on, though.
Priority: -- → P3

Oh, and crash disassembly:

https://hg.mozilla.org/integration/autoland/annotate/db7fb8a98e5aadc3581cfdaa8dc23b4fa2fbaeb7/dom/cache/Context.cpp#l993
  993   mActivityList.AppendElement(aActivity);
   7c7983ca7eef:	48 8b 83 80 00 00 00 	mov rax, qword ptr [rbx + 0x80]
   7c7983ca7ef6:	8b 30                	mov esi, dword ptr [rax]
   7c7983ca7ef8:	48 ff c6             	inc rsi
   7c7983ca7efb:	6a 08                	push 8
   7c7983ca7efd:	5a                   	pop rdx
   7c7983ca7efe:	e8 15 ae ea fe       	call 0x7c7982b52d18
   7c7983ca7f03:	48 8b 83 80 00 00 00 	mov rax, qword ptr [rbx + 0x80]
   7c7983ca7f0a:	8b 08                	mov ecx, dword ptr [rax]
   7c7983ca7f0c:	4c 89 74 c8 08       	mov qword ptr [rax + rcx*8 + 8], r14
   7c7983ca7f11:	48 8b 83 80 00 00 00 	mov rax, qword ptr [rbx + 0x80]
   7c7983ca7f18:	ff 00                	inc dword ptr [rax]
  994 }
   7c7983ca7f1a:	5b                   	pop rbx
   7c7983ca7f1b:	41 5e                	pop r14
   7c7983ca7f1d:	5d                   	pop rbp
   7c7983ca7f1e:	c3                   	ret 
  991   MOZ_DIAGNOSTIC_ASSERT(aActivity);
   7c7983ca7f1f:	48 8d 05 d3 03 7b 02 	lea rax, qword ptr [rip + 0x27b03d3]
   7c7983ca7f26:	48 8b 0d 5b d1 83 04 	mov rcx, qword ptr [rip + 0x483d15b]
   7c7983ca7f2d:	48 89 01             	mov qword ptr [rcx], rax
   7c7983ca7f30:	c7 04 25 00 00 00 00 	mov dword ptr [0], 0x3df
   7c7983ca7f37:	df 03 00 00          	
   7c7983ca7f3b:	e8 e0 f4 e8 fe       	call 0x7c7982b37420
  996 void Context::RemoveActivity(Activity* aActivity) {
   7c7983ca7f40:	55                   	push rbp
   7c7983ca7f41:	48 89 e5             	mov rbp, rsp
   7c7983ca7f44:	41 57                	push r15
   7c7983ca7f46:	41 56                	push r14
   7c7983ca7f48:	41 54                	push r12
   7c7983ca7f4a:	53                   	push rbx
   7c7983ca7f4b:	48 83 ec 20          	sub rsp, 0x20
   7c7983ca7f4f:	64 48 8b 04 25 28 00 	mov rax, qword ptr fs:[0x28]
   7c7983ca7f56:	00 00                	
   7c7983ca7f58:	48 89 45 d8          	mov qword ptr [rbp - 0x28], rax
  998   MOZ_DIAGNOSTIC_ASSERT(aActivity);
   7c7983ca7f5c:	48 85 f6             	test rsi, rsi
   7c7983ca7f5f:	0f 84 be 00 00 00    	je 0x7c7983ca8023
   7c7983ca7f65:	49 89 fe             	mov r14, rdi
  999   MOZ_ALWAYS_TRUE(mActivityList.RemoveElement(aActivity));
   7c7983ca7f68:	48 8b 87 80 00 00 00 	mov rax, qword ptr [rdi + 0x80]
   7c7983ca7f6f:	8b 08                	mov ecx, dword ptr [rax]
^^^^^^^^^^^^^^^^	^^^^^^^^^^^^^^^^^^^^^	^^^^^^^^^^^^^^^^^^^^^^^^
   7c7983ca7f71:	48 85 c9             	test rcx, rcx
   7c7983ca7f74:	0f 84 8d 00 00 00    	je 0x7c7983ca8007
   7c7983ca7f7a:	4d 8d be 80 00 00 00 	lea r15, qword ptr [r14 + 0x80]
   7c7983ca7f81:	48 83 c0 08          	add rax, 8
   7c7983ca7f85:	48 c1 e1 03          	shl rcx, 3
   7c7983ca7f89:	48 89 c3             	mov rbx, rax
   7c7983ca7f8c:	48 39 33             	cmp qword ptr [rbx], rsi
   7c7983ca7f8f:	74 0c                	je 0x7c7983ca7f9d
   7c7983ca7f91:	48 83 c3 08          	add rbx, 8
   7c7983ca7f95:	48 83 c1 f8          	add rcx, -8
   7c7983ca7f99:	75 f1                	jne 0x7c7983ca7f8c
   7c7983ca7f9b:	eb 6a                	jmp 0x7c7983ca8007
   7c7983ca7f9d:	48 29 c3             	sub rbx, rax
   7c7983ca7fa0:	48 83 fb f8          	cmp rbx, -8
   7c7983ca7fa4:	74 61                	je 0x7c7983ca8007
   7c7983ca7fa6:	48 c1 fb 03          	sar rbx, 3
   7c7983ca7faa:	4c 8d 65 c8          	lea r12, qword ptr [rbp - 0x38]
   7c7983ca7fae:	49 89 1c 24          	mov qword ptr [r12], rbx
   7c7983ca7fb2:	41 c6 44 24 08 01    	mov byte ptr [r12 + 8], 1
   7c7983ca7fb8:	6a 01                	push 1
   7c7983ca7fba:	5e                   	pop rsi
   7c7983ca7fbb:	4c 89 e7             	mov rdi, r12
   7c7983ca7fbe:	e8 df ac ea fe       	call 0x7c7982b52ca2
   7c7983ca7fc3:	41 80 7c 24 08 00    	cmp byte ptr [r12 + 8], 0
   7c7983ca7fc9:	74 7e                	je 0x7c7983ca8049
   7c7983ca7fcb:	48 8d 7d c8          	lea rdi, qword ptr [rbp - 0x38]
   7c7983ca7fcf:	e8 cc ae ea fe       	call 0x7c7982b52ea0
   7c7983ca7fd4:	49 8b 0f             	mov rcx, qword ptr [r15]
   7c7983ca7fd7:	8b 31                	mov esi, dword ptr [rcx]
   7c7983ca7fd9:	48 39 f0             	cmp rax, rsi
   7c7983ca7fdc:	77 70                	ja 0x7c7983ca804e
   7c7983ca7fde:	6a 01                	push 1
   7c7983ca7fe0:	5a                   	pop rdx
   7c7983ca7fe1:	6a 08                	push 8
   7c7983ca7fe3:	41 58                	pop r8
   7c7983ca7fe5:	4c 89 ff             	mov rdi, r15
   7c7983ca7fe8:	48 89 de             	mov rsi, rbx
   7c7983ca7feb:	31 c9                	xor ecx, ecx
You need to log in before you can comment on or make changes to this bug.