Intermittent Tier 2 testing/firefox-ui/tests/functional/safebrowsing/test_warning_pages.py TestSafeBrowsingWarningPages.test_warning_pages | application crashed [@ mozilla::Telemetry::CanRecordExtended()]
Categories
(Toolkit :: Telemetry, defect, P1)
Tracking
()
People
(Reporter: intermittent-bug-filer, Assigned: chutten)
References
Details
(5 keywords, Whiteboard: [post-critsmash-triage][adv-main71+r])
Crash Data
Attachments
(2 files)
Filed by: rgurzau [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=264324953&repo=autoland
Full log: https://queue.taskcluster.net/v1/task/a3n_iDH3QOudZIKpTJPeXA/runs/0/artifacts/public/logs/live_backing.log
[task 2019-08-30T20:40:44.246Z] 20:40:44 INFO - ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2019-08-30T20:40:44.246Z] 20:40:44 INFO - ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2019-08-30T20:40:44.345Z] 20:40:44 INFO - Exiting due to channel error.
[task 2019-08-30T20:40:44.345Z] 20:40:44 INFO - [GFX1-]: Receive IPC close with reason=AbnormalShutdown
[task 2019-08-30T20:40:44.345Z] 20:40:44 INFO - [Child 904, Chrome_ChildThread] WARNING: pipe error (3): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 358
[task 2019-08-30T20:40:44.345Z] 20:40:44 INFO - Exiting due to channel error.
[task 2019-08-30T20:40:44.347Z] 20:40:44 INFO - [Child 1058, Chrome_ChildThread] WARNING: pipe error (3): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 358
[task 2019-08-30T20:40:44.348Z] 20:40:44 INFO - Exiting due to channel error.
[task 2019-08-30T20:40:44.410Z] 20:40:44 INFO - mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/HMWDQuhTTFGsU-Bpgf6jwQ/artifacts/public/build/target.crashreporter-symbols.zip
[task 2019-08-30T20:40:48.231Z] 20:40:48 INFO - mozcrash Copy/paste: /builds/worker/workspace/build/linux64-minidump_stackwalk /builds/worker/workspace/build/tmpQousue.mozrunner/minidumps/4aead703-b6df-7310-9936-7040b9a16c41.dmp /tmp/tmpe_tevC
[task 2019-08-30T20:40:53.055Z] 20:40:53 INFO - mozcrash Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/4aead703-b6df-7310-9936-7040b9a16c41.dmp
[task 2019-08-30T20:40:53.055Z] 20:40:53 INFO - mozcrash Saved app info as /builds/worker/workspace/build/blobber_upload_dir/4aead703-b6df-7310-9936-7040b9a16c41.extra
[task 2019-08-30T20:40:53.209Z] 20:40:53 INFO - PROCESS-CRASH | testing/firefox-ui/tests/functional/safebrowsing/test_warning_pages.py TestSafeBrowsingWarningPages.test_warning_pages | application crashed [@ mozilla::Telemetry::CanRecordExtended()]
[task 2019-08-30T20:40:53.210Z] 20:40:53 INFO - Crash dump filename: /builds/worker/workspace/build/tmpQousue.mozrunner/minidumps/4aead703-b6df-7310-9936-7040b9a16c41.dmp
[task 2019-08-30T20:40:53.210Z] 20:40:53 INFO - Operating system: Linux
[task 2019-08-30T20:40:53.211Z] 20:40:53 INFO - 0.0.0 Linux 4.4.0-1014-aws #14taskcluster1-Ubuntu SMP Tue Apr 3 10:27:00 UTC 2018 x86_64
[task 2019-08-30T20:40:53.211Z] 20:40:53 INFO - CPU: amd64
[task 2019-08-30T20:40:53.212Z] 20:40:53 INFO - family 6 model 85 stepping 4
[task 2019-08-30T20:40:53.213Z] 20:40:53 INFO - 1 CPU
[task 2019-08-30T20:40:53.213Z] 20:40:53 INFO -
[task 2019-08-30T20:40:53.214Z] 20:40:53 INFO - GPU: UNKNOWN
[task 2019-08-30T20:40:53.214Z] 20:40:53 INFO -
[task 2019-08-30T20:40:53.214Z] 20:40:53 INFO - Crash reason: SIGSEGV /SEGV_ACCERR
[task 2019-08-30T20:40:53.214Z] 20:40:53 INFO - Crash address: 0x7f355d0b537f
[task 2019-08-30T20:40:53.214Z] 20:40:53 INFO - Process uptime: not available
[task 2019-08-30T20:40:53.214Z] 20:40:53 INFO -
[task 2019-08-30T20:40:53.214Z] 20:40:53 INFO - Thread 17 (crashed)
[task 2019-08-30T20:40:53.215Z] 20:40:53 INFO - 0 libxul.so!mozilla::Telemetry::CanRecordExtended() [Telemetry.cpp:466e8f97c54f14584c6c23d8a8057255e5957abb : 1997 + 0x0]
[task 2019-08-30T20:40:53.215Z] 20:40:53 INFO - rax = 0x000005e3704fc2f0 rdx = 0x00000000430bd000
[task 2019-08-30T20:40:53.215Z] 20:40:53 INFO - rcx = 0x000005e36e455b1a rbx = 0x00000000430bd000
[task 2019-08-30T20:40:53.215Z] 20:40:53 INFO - rsi = 0x0000000000000002 rdi = 0x00007f35430bd000
[task 2019-08-30T20:40:53.215Z] 20:40:53 INFO - rbp = 0x00007f354e88d630 rsp = 0x00007f354e88d5d8
[task 2019-08-30T20:40:53.215Z] 20:40:53 INFO - r8 = 0x89c4ddff1b344a00 r9 = 0x89c4ddff1b344a00
[task 2019-08-30T20:40:53.216Z] 20:40:53 INFO - r10 = 0x00007f354e88d5b0 r11 = 0x0000000000000246
[task 2019-08-30T20:40:53.216Z] 20:40:53 INFO - r12 = 0x00007f35430bd000 r13 = 0x0000000000000001
[task 2019-08-30T20:40:53.216Z] 20:40:53 INFO - r14 = 0x00007f354e88d601 r15 = 0x00007f354241a000
[task 2019-08-30T20:40:53.216Z] 20:40:53 INFO - rip = 0x00007f355d0b537f
[task 2019-08-30T20:40:53.216Z] 20:40:53 INFO - Found by: given as instruction pointer in context
[task 2019-08-30T20:40:53.216Z] 20:40:53 INFO - 1 libxul.so!mozilla::layers::AsyncPanZoomController::ReportCheckerboard(mozilla::TimeStamp const&) [AsyncPanZoomController.cpp:466e8f97c54f14584c6c23d8a8057255e5957abb : 4352 + 0x5]
[task 2019-08-30T20:40:53.216Z] 20:40:53 INFO - rbx = 0x00000000430bd000 rbp = 0x00007f354e88d630
[task 2019-08-30T20:40:53.216Z] 20:40:53 INFO - rsp = 0x00007f354e88d5e0 r12 = 0x00007f35430bd000
[task 2019-08-30T20:40:53.217Z] 20:40:53 INFO - r13 = 0x0000000000000001 r14 = 0x00007f354e88d601
[task 2019-08-30T20:40:53.217Z] 20:40:53 INFO - r15 = 0x00007f354241a000 rip = 0x00007f355a34effe
[task 2019-08-30T20:40:53.217Z] 20:40:53 INFO - Found by: call frame info
[task 2019-08-30T20:40:53.217Z] 20:40:53 INFO - 2 libxul.so!mozilla::layers::ForEachNode<mozilla::layers::ForwardIterator, mozilla::layers::LayerMetricsWrapper, (lambda at /builds/worker/workspace/build/src/gfx/layers/TreeTraversal.h:189:14), (lambda at /builds/worker/workspace/build/src/gfx/layers/apz/src/APZSampler.cpp:111:7)> [TreeTraversal.h:466e8f97c54f14584c6c23d8a8057255e5957abb : 145 + 0x1a]
[task 2019-08-30T20:40:53.217Z] 20:40:53 INFO - rbx = 0x00007f35430bd000 rbp = 0x00007f354e88d690
[task 2019-08-30T20:40:53.217Z] 20:40:53 INFO - rsp = 0x00007f354e88d640 r12 = 0x00007f354e88d7c0
[task 2019-08-30T20:40:53.218Z] 20:40:53 INFO - r13 = 0x0000000000000000 r14 = 0x00007f354e88d6b0
[task 2019-08-30T20:40:53.218Z] 20:40:53 INFO - r15 = 0x00007f354241a000 rip = 0x00007f355a381b70
[task 2019-08-30T20:40:53.218Z] 20:40:53 INFO - Found by: call frame info
[task 2019-08-30T20:40:53.218Z] 20:40:53 INFO - 3 libxul.so!mozilla::layers::ForEachNode<mozilla::layers::ForwardIterator, mozilla::layers::LayerMetricsWrapper, (lambda at /builds/worker/workspace/build/src/gfx/layers/TreeTraversal.h:189:14), (lambda at /builds/worker/workspace/build/src/gfx/layers/apz/src/APZSampler.cpp:111:7)> [TreeTraversal.h:466e8f97c54f14584c6c23d8a8057255e5957abb : 142 + 0xe]
[task 2019-08-30T20:40:53.218Z] 20:40:53 INFO - rbx = 0x00007f354241a000 rbp = 0x00007f354e88d6f0
Comment hidden (Intermittent Failures Robot) |
Comment 2•5 years ago
|
||
Happened when loading https://www.itisatrap.org/firefox/its-an-attack.html
:
[task 2019-08-30T20:40:44.218Z] 20:40:44 INFO - 1567197644213 Marionette TRACE [39] Received DOM event DOMContentLoaded for https://www.itisatrap.org/firefox/its-an-attack.html
[task 2019-08-30T20:40:44.246Z] 20:40:44 INFO - ExceptionHandler::GenerateDump cloned child 1119
[task 2019-08-30T20:40:44.246Z] 20:40:44 INFO - ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2019-08-30T20:40:44.246Z] 20:40:44 INFO - ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2019-08-30T20:40:44.345Z] 20:40:44 INFO - Exiting due to channel error.
[task 2019-08-30T20:40:44.345Z] 20:40:44 INFO - [GFX1-]: Receive IPC close with reason=AbnormalShutdown
The crash itself seems to be security sensitive:
[task 2019-08-30T20:40:53.214Z] 20:40:53 INFO - Crash reason: SIGSEGV /SEGV_ACCERR
[task 2019-08-30T20:40:53.214Z] 20:40:53 INFO - Crash address: 0x7f355d0b537f
[task 2019-08-30T20:40:53.214Z] 20:40:53 INFO - Process uptime: not available
[task 2019-08-30T20:40:53.214Z] 20:40:53 INFO -
[task 2019-08-30T20:40:53.214Z] 20:40:53 INFO - Thread 17 (crashed)
[task 2019-08-30T20:40:53.215Z] 20:40:53 INFO - 0 libxul.so!mozilla::Telemetry::CanRecordExtended() [Telemetry.cpp:466e8f97c54f14584c6c23d8a8057255e5957abb : 1997 + 0x0]
[task 2019-08-30T20:40:53.215Z] 20:40:53 INFO - rax = 0x000005e3704fc2f0 rdx = 0x00000000430bd000
[task 2019-08-30T20:40:53.215Z] 20:40:53 INFO - rcx = 0x000005e36e455b1a rbx = 0x00000000430bd000
[task 2019-08-30T20:40:53.215Z] 20:40:53 INFO - rsi = 0x0000000000000002 rdi = 0x00007f35430bd000
[task 2019-08-30T20:40:53.215Z] 20:40:53 INFO - rbp = 0x00007f354e88d630 rsp = 0x00007f354e88d5d8
[task 2019-08-30T20:40:53.215Z] 20:40:53 INFO - r8 = 0x89c4ddff1b344a00 r9 = 0x89c4ddff1b344a00
[task 2019-08-30T20:40:53.216Z] 20:40:53 INFO - r10 = 0x00007f354e88d5b0 r11 = 0x0000000000000246
[task 2019-08-30T20:40:53.216Z] 20:40:53 INFO - r12 = 0x00007f35430bd000 r13 = 0x0000000000000001
[task 2019-08-30T20:40:53.216Z] 20:40:53 INFO - r14 = 0x00007f354e88d601 r15 = 0x00007f354241a000
[task 2019-08-30T20:40:53.216Z] 20:40:53 INFO - rip = 0x00007f355d0b537f
Assignee | ||
Comment 3•5 years ago
|
||
Shall I presume that the stack pointer being at Telemetry::CanRecordExtended
is more likely meaning TelemetryImpl::CanRecordExtended
instead?
If so, the only thing I can think of is if sTelemetry
is non-null, but is also not the singleton TelemetryImpl
. This would mean APZ is calling in to CanRecordExtended
before CreateTelemetryInstance
has been called (which is called the first time the Telemetry service is requested). Since APZ accesses the C++ API directly, it doesn't ask for the service, so this is entirely possible.
I think we should throw sTelemetry
in a DataMutex or Atomic or some other mutual exclusion data access thingy. Might not be the problem causing this crash, but it seems like a decent idea anyway.
ni?Dexter for a second pair of eyes.
Comment 4•5 years ago
|
||
(In reply to Chris H-C :chutten from comment #3)
If so, the only thing I can think of is if
sTelemetry
is non-null, but is also not the singletonTelemetryImpl
. This would mean APZ is calling in toCanRecordExtended
beforeCreateTelemetryInstance
has been called (which is called the first time the Telemetry service is requested). Since APZ accesses the C++ API directly, it doesn't ask for the service, so this is entirely possible.
Yes, I agree, this is entirely possible.
I think we should throw
sTelemetry
in a DataMutex or Atomic or some other mutual exclusion data access thingy. Might not be the problem causing this crash, but it seems like a decent idea anyway.
I agree on using Atomic or similar here. I wonder why this even crashes, though. The first thing we do in TelemetryImpl::CanRecordExtended
is to check for sTelemetry
being valid.
Assignee | ||
Comment 5•5 years ago
•
|
||
(In reply to Alessio Placitelli [:Dexter] from comment #4)
I agree on using Atomic or similar here. I wonder why this even crashes, though. The first thing we do in
TelemetryImpl::CanRecordExtended
is to check forsTelemetry
being valid.
The first thing we check is if sTelemetry
is non-null, which is different from "valid". Imagine if we caught it with only one half of the pointer written (to my knowledge C++ doesn't require atomic assignment of pointer types).
And, come to think of it, there's nothing stopping sTelemetry from being nulled out after we've done the check, either. t1 could check sTelemetry
and it could be 100% fine, and then t2 comes in and shuts down Telemetry, and then t1 continues and tries to use sTelemetry
.
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 6•5 years ago
|
||
It could very well happen that someone calls into a static method of
TelemetryImpl during Telemetry shutdown. There could be a race where sTelemetry
is non-null at the top of a method like CanRecordExtended, but is null by the
time we try to use it.
So let's put a DataMutex around it. This isn't trying to mutex the data itself.
Instead I'm just trying to protect places where the pointer is being read
or changed.
Comment 7•5 years ago
|
||
(In reply to Chris H-C :chutten from comment #5)
(In reply to Alessio Placitelli [:Dexter] from comment #4)
I agree on using Atomic or similar here. I wonder why this even crashes, though. The first thing we do in
TelemetryImpl::CanRecordExtended
is to check forsTelemetry
being valid.The first thing we check is if
sTelemetry
is non-null, which is different from "valid". Imagine if we caught it with only one half of the pointer written (to my knowledge C++ doesn't require atomic assignment of pointer types).
It does not require the type to be written "all at once", as it were (e.g. 64-bit types on 32-bit machines), but we can assume that word-size things are written with word-size load/store instructions. That is, they are written "all at once", and this assumption would apply to pointers on all the machines we care about.
Whether those stores are visible to other threads is a different matter. And whether the non-nullness remains true in the body of the if
after being checked in the conditional is also up for debate in a multi-threaded world, as you pointed out.
Assignee | ||
Comment 8•5 years ago
|
||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Comment 9•5 years ago
|
||
https://hg.mozilla.org/integration/autoland/rev/8dd82e13bdf6ec7198b4c4ea62e6c51d43e4f3ec
https://hg.mozilla.org/integration/autoland/rev/8c5640edebc663298f77cab04a849690e9e042d1
Comment 10•5 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/8dd82e13bdf6
https://hg.mozilla.org/mozilla-central/rev/8c5640edebc6
Comment 11•5 years ago
|
||
This grafts cleanly to Beta and ESR68. Is this something we should consider backporting or can it ride Fx71 to release?
Assignee | ||
Comment 12•5 years ago
|
||
I don't think we're in a hurry here. I think it can ship in 71.
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•3 years ago
|
Description
•