Closed Bug 1573767 Opened 6 years ago Closed 2 years ago

Intermittent telemetry/marionette/tests/client/test_event_ping.py TestEventPing.test_event_ping | application crashed [@ libxul.so + 0x292e529]

Categories

(Toolkit :: Telemetry, defect, P4)

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Unassigned)

Details

(Keywords: crash, intermittent-failure, regression)

Filed by: aciure [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=261512689&repo=autoland
Full log: https://queue.taskcluster.net/v1/task/aWR-32TSTn-qkTSt3i9v3Q/runs/0/artifacts/public/logs/live_backing.log


[task 2019-08-14T05:02:08.984Z] 05:02:08 INFO - pingserver pings_handler received 'first-shutdown' ping
[task 2019-08-14T05:03:11.152Z] 05:03:11 INFO - ExceptionHandler::GenerateDump cloned child ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2019-08-14T05:03:11.152Z] 05:03:11 INFO - 991
[task 2019-08-14T05:03:11.152Z] 05:03:11 INFO - ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2019-08-14T05:03:11.227Z] 05:03:11 INFO - mozcrash Copy/paste: /builds/worker/workspace/build/linux64-minidump_stackwalk /builds/worker/workspace/build/tmpKZhHb6.mozrunner/minidumps/3ddf6268-5faf-e199-393a-dd2dff991e46.dmp /tmp/tmpQWYAxb
[task 2019-08-14T05:03:12.934Z] 05:03:12 INFO - mozcrash Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/3ddf6268-5faf-e199-393a-dd2dff991e46.dmp
[task 2019-08-14T05:03:12.937Z] 05:03:12 INFO - mozcrash Saved app info as /builds/worker/workspace/build/blobber_upload_dir/3ddf6268-5faf-e199-393a-dd2dff991e46.extra
[task 2019-08-14T05:03:12.974Z] 05:03:12 INFO - PROCESS-CRASH | telemetry/marionette/tests/client/test_event_ping.py TestEventPing.test_event_ping | application crashed [@ libxul.so + 0x292e529]
[task 2019-08-14T05:03:12.974Z] 05:03:12 INFO - Crash dump filename: /builds/worker/workspace/build/tmpKZhHb6.mozrunner/minidumps/3ddf6268-5faf-e199-393a-dd2dff991e46.dmp
[task 2019-08-14T05:03:12.975Z] 05:03:12 INFO - Operating system: Linux
[task 2019-08-14T05:03:12.975Z] 05:03:12 INFO - 0.0.0 Linux 4.4.0-1014-aws #14taskcluster1-Ubuntu SMP Tue Apr 3 10:27:00 UTC 2018 x86_64
[task 2019-08-14T05:03:12.975Z] 05:03:12 INFO - CPU: amd64
[task 2019-08-14T05:03:12.976Z] 05:03:12 INFO - family 6 model 85 stepping 4
[task 2019-08-14T05:03:12.976Z] 05:03:12 INFO - 1 CPU
[task 2019-08-14T05:03:12.977Z] 05:03:12 INFO -
[task 2019-08-14T05:03:12.977Z] 05:03:12 INFO - GPU: UNKNOWN
[task 2019-08-14T05:03:12.977Z] 05:03:12 INFO -
[task 2019-08-14T05:03:12.977Z] 05:03:12 INFO - Crash reason: SIGSEGV /SEGV_MAPERR
[task 2019-08-14T05:03:12.977Z] 05:03:12 INFO - Crash address: 0x0
[task 2019-08-14T05:03:12.977Z] 05:03:12 INFO - Process uptime: not available
[task 2019-08-14T05:03:12.978Z] 05:03:12 INFO -
[task 2019-08-14T05:03:12.978Z] 05:03:12 INFO - Thread 23 (crashed)
[task 2019-08-14T05:03:12.978Z] 05:03:12 INFO - 0 libxul.so + 0x292e529
[task 2019-08-14T05:03:12.978Z] 05:03:12 INFO - rax = 0x00007fe72e2ea380 rdx = 0x0000000000000074
[task 2019-08-14T05:03:12.978Z] 05:03:12 INFO - rcx = 0x0000555914b99c30 rbx = 0x00007fe72df912a0
[task 2019-08-14T05:03:12.978Z] 05:03:12 INFO - rsi = 0x00007fe72e5028c0 rdi = 0x00007fe72e2ea3b4
[task 2019-08-14T05:03:12.978Z] 05:03:12 INFO - rbp = 0x00007fe72cefee90 rsp = 0x00007fe72cefee10
[task 2019-08-14T05:03:12.978Z] 05:03:12 INFO - r8 = 0x0000000000000001 r9 = 0x000000000000c000
[task 2019-08-14T05:03:12.978Z] 05:03:12 INFO - r10 = 0x0000000000000024 r11 = 0x00007fe754cc8e10
[task 2019-08-14T05:03:12.978Z] 05:03:12 INFO - r12 = 0x00007fe72df91330 r13 = 0x0000000000000001
[task 2019-08-14T05:03:12.978Z] 05:03:12 INFO - r14 = 0x00007fe72df912d0 r15 = 0x00007fe72df91300
[task 2019-08-14T05:03:12.979Z] 05:03:12 INFO - rip = 0x00007fe744d43529
[task 2019-08-14T05:03:12.979Z] 05:03:12 INFO - Found by: given as instruction pointer in context
[task 2019-08-14T05:03:12.979Z] 05:03:12 INFO - 1 libxul.so + 0x40952c1
[task 2019-08-14T05:03:12.979Z] 05:03:12 INFO - rbp = 0x00007fe72cefeec0 rsp = 0x00007fe72cefeea0
[task 2019-08-14T05:03:12.979Z] 05:03:12 INFO - rip = 0x00007fe7464aa2c1
[task 2019-08-14T05:03:12.979Z] 05:03:12 INFO - Found by: previous frame's frame pointer
[task 2019-08-14T05:03:12.979Z] 05:03:12 INFO - 2 libnspr4.so + 0x2acbe
[task 2019-08-14T05:03:12.980Z] 05:03:12 INFO - rbp = 0x00007fe72cefef10 rsp = 0x00007fe72cefeed0
[task 2019-08-14T05:03:12.980Z] 05:03:12 INFO - rip = 0x00007fe755f73cbe
[task 2019-08-14T05:03:12.980Z] 05:03:12 INFO - Found by: previous frame's frame pointer
[task 2019-08-14T05:03:12.980Z] 05:03:12 INFO - 3 libpthread-2.23.so!start_thread [pthread_create.c : 333 + 0x11]
[task 2019-08-14T05:03:12.980Z] 05:03:12 INFO - rsp = 0x00007fe72cefef20 rip = 0x00007fe755bbf6ba
[task 2019-08-14T05:03:12.980Z] 05:03:12 INFO - Found by: stack scanning
[task 2019-08-14T05:03:12.980Z] 05:03:12 INFO - 4 libc-2.23.so!__clone + 0x6d
[task 2019-08-14T05:03:12.981Z] 05:03:12 INFO - rsp = 0x00007fe72cefefc0 rip = 0x00007fe754c4841d
[task 2019-08-14T05:03:12.981Z] 05:03:12 INFO - Found by: stack scanning
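
As a rough sanity check on the frames above: assuming each `libxul.so + 0x...` value is an offset from the address at which the module was loaded, subtracting it from that frame's `rip` should give the same base for both libxul.so frames. This is only an illustrative sketch; the names `LIBXUL_FRAMES` and `load_base` are made up for this example and are not part of mozcrash or minidump_stackwalk.

```python
# (rip, module-relative offset) pairs copied from frames 0 and 1 of the log above.
LIBXUL_FRAMES = [
    (0x00007FE744D43529, 0x292E529),  # frame 0: libxul.so + 0x292e529
    (0x00007FE7464AA2C1, 0x40952C1),  # frame 1: libxul.so + 0x40952c1
]


def load_base(rip: int, offset: int) -> int:
    """Return the implied module load address (assumption: offset is module-relative)."""
    return rip - offset


# Collect the distinct bases implied by the two frames.
bases = {load_base(rip, offset) for rip, offset in LIBXUL_FRAMES}

for base in sorted(bases):
    print(hex(base))
```

If both frames imply one page-aligned base, the offsets are at least internally consistent, which is what a symbol lookup would rely on once symbols are available.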

This has happened exactly once, and the stack is unsymbolicated, so there's nothing I can do here.

Raphael, do you know if there's a reason why t-t-c tests that crash have stacks without symbols?

Flags: needinfo?(rpierzina)

Chris, no, I don't know. I'm not familiar with crash stacks in general. Would you mind explaining them to me?

Flags: needinfo?(rpierzina) → needinfo?(chutten)

Sure. When Firefox crashes, it's because it tried to do something naughty and the OS killed it. Sometimes it's asking for more memory when there isn't any (out of memory crash, or OOM). Sometimes it's trying to access memory it's not allowed to (segmentation fault, or segfault, which is the SIGSEGV in this log). Firefox can also crash itself if we catch ourselves in code about to do something bad (ASSERT failures). (There are more, but these are three common cases.)

In any case, the line of code trying to do something bad was called by some other code, which was called by some other code, which was called by some other code, and so forth. The whole pile of these frames of code is a stack. In this case the stack is:

0 libxul.so + 0x292e529
1 libxul.so + 0x40952c1
2 libnspr4.so + 0x2acbe
3 libpthread-2.23.so!start_thread [pthread_create.c : 333 + 0x11]
4 libc-2.23.so!__clone + 0x6d

This isn't really helpful. It'd be much nicer if it told us which source file and which line was the culprit, not some memory offset into a library (libxul being the library for most of Firefox). This mapping of offsets to files and lines requires "symbols".
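
To make the difference concrete, here is a minimal sketch that parses the five frames quoted above and lists the ones with no function name, i.e. the ones that would need symbols. The regular expression is my own guess at this log's frame layout, not an official format, and `FRAME_RE`, `parse_frames` and `unsymbolicated` are hypothetical names used only for illustration.

```python
import re

# Frames copied from the stack quoted above.
STACK = """\
0 libxul.so + 0x292e529
1 libxul.so + 0x40952c1
2 libnspr4.so + 0x2acbe
3 libpthread-2.23.so!start_thread [pthread_create.c : 333 + 0x11]
4 libc-2.23.so!__clone + 0x6d
"""

# Assumed layout: index, module, optional "!function",
# optional "[file : line + 0x..]", optional "+ 0xoffset".
FRAME_RE = re.compile(
    r"^(?P<index>\d+)\s+(?P<module>[^\s!]+)"
    r"(?:!(?P<function>\S+))?"
    r"(?:\s+\[(?P<file>[^\]:]+?)\s*:\s*(?P<line>\d+)\s*\+\s*0x[0-9a-f]+\])?"
    r"(?:\s+\+\s+(?P<offset>0x[0-9a-f]+))?\s*$"
)


def parse_frames(text):
    """Parse each non-empty line into a dict of named groups; skip lines that do not match."""
    frames = []
    for raw in text.splitlines():
        match = FRAME_RE.match(raw.strip())
        if match:
            frames.append(match.groupdict())
    return frames


def unsymbolicated(frames):
    """Return the frames that carry only a module and an offset, with no function name."""
    return [frame for frame in frames if frame["function"] is None]


frames = parse_frames(STACK)
for frame in unsymbolicated(frames):
    print(frame["index"], frame["module"], frame["offset"])
```

Only the two system-library frames come back with a function name, and only `start_thread` has a file and line; the Firefox frames in libxul.so and libnspr4.so are bare offsets.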

Without symbols I can't venture a guess whereabouts we even are inside Firefox code. Without this happening more than once, I can't do population inference to see if maybe it's only happening on a specific version or a specific OS or anything. And without it happening reliably, I can't reproduce it locally and debug it live with my tools.

So the first step is figuring out why this crash (and, I think, some others) has no symbols. Maybe it has something to do with how t-t-c is run. Do you know who we could reach out to about this sort of question?

Flags: needinfo?(chutten) → needinfo?(rpierzina)

Thank you :chutten for this super helpful explanation!

No, I don't really know who we could reach out to. Maybe asking on IRC or Slack would be best?

Flags: needinfo?(rpierzina)

The priority flag is not set for this bug.
:chutten, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(chutten)

Seems to be a case of "crashes that happen when the test is running" rather than "crashes caused by the test", though without symbolicated stacks it's hard to say for sure.

See also bug 1509241 for another symbol-free low-freq bucket of crashes.

Flags: needinfo?(chutten)
Priority: -- → P4
See Also: → 1509241

These reports are MOZ_DIAGNOSTIC_ASSERT(Request::mDisconnected) failures. They come from MozPromise::AssertIsDead(), which appears to be called only as part of destroying a promise chain.

Sounds like an unclean shutdown situation, but without a symbolicated stack I couldn't tell you which promise or why.

Depends on: 1594515
Severity: critical → S2

Since the crash volume is low (less than 5 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit auto_nag documentation.

Severity: S2 → S3
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE