Closed Bug 1454112 Opened 6 years ago Closed 6 years ago

Intermittent awsy/test_memory_usage.py TestMemoryUsage.test_open_tabs | application crashed [@ libc-2.23.so + 0x8f2d0]

Categories

(Testing :: AWSY, defect, P5)

Version 3
defect

Tracking

(firefox-esr52 unaffected, firefox59 unaffected, firefox60 unaffected, firefox61+ fixed)

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 --- unaffected
firefox59 --- unaffected
firefox60 --- unaffected
firefox61 + fixed

People

(Reporter: intermittent-bug-filer, Assigned: aosmond)

References

Details

(Keywords: crash, intermittent-failure, Whiteboard: [stockwell fixed:backout][retriggered][MemShrink])

Crash Data

Filed by: apavel [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=173592603&repo=mozilla-inbound

https://queue.taskcluster.net/v1/task/Rd-pt1AZQw-ILooGpDpbEA/runs/0/artifacts/public/logs/live_backing.log

[task 2018-04-13T21:03:37.330Z] 21:03:37     INFO - PROCESS-CRASH | awsy/test_memory_usage.py TestMemoryUsage.test_open_tabs | application crashed [@ libc-2.23.so + 0x8f2d0]
[task 2018-04-13T21:03:37.331Z] 21:03:37     INFO - Crash dump filename: /tmp/tmp4N6F25.profile/minidumps/0e581b86-d55e-21b4-f839-d76fd2b9d553.dmp
[task 2018-04-13T21:03:37.331Z] 21:03:37     INFO - Operating system: Linux
[task 2018-04-13T21:03:37.332Z] 21:03:37     INFO -                   0.0.0 Linux 4.4.0-1014-aws #14taskcluster1-Ubuntu SMP Tue Apr 3 10:27:00 UTC 2018 x86_64
[task 2018-04-13T21:03:37.333Z] 21:03:37     INFO - CPU: amd64
[task 2018-04-13T21:03:37.333Z] 21:03:37     INFO -      family 6 model 62 stepping 4
[task 2018-04-13T21:03:37.333Z] 21:03:37     INFO -      4 CPUs
[task 2018-04-13T21:03:37.334Z] 21:03:37     INFO - 
[task 2018-04-13T21:03:37.334Z] 21:03:37     INFO - GPU: UNKNOWN
[task 2018-04-13T21:03:37.335Z] 21:03:37     INFO - 
[task 2018-04-13T21:03:37.336Z] 21:03:37     INFO - Crash reason:  SIGBUS
[task 2018-04-13T21:03:37.336Z] 21:03:37     INFO - Crash address: 0x7efe4fe29000
[task 2018-04-13T21:03:37.337Z] 21:03:37     INFO - Process uptime: not available
[task 2018-04-13T21:03:37.338Z] 21:03:37     INFO - 
[task 2018-04-13T21:03:37.338Z] 21:03:37     INFO - Thread 12 (crashed)
[task 2018-04-13T21:03:37.338Z] 21:03:37     INFO -  0  libc-2.23.so + 0x8f2d0
[task 2018-04-13T21:03:37.339Z] 21:03:37     INFO -     rax = 0x00007efe4fd6f000   rdx = 0x00007efe4fe6e240
[task 2018-04-13T21:03:37.339Z] 21:03:37     INFO -     rcx = 0x00007efe4fe29000   rbx = 0x00007efe4eb3a380
[task 2018-04-13T21:03:37.340Z] 21:03:37     INFO -     rsi = 0x00000000000000ff   rdi = 0x00007efe4fd6f000
[task 2018-04-13T21:03:37.340Z] 21:03:37     INFO -     rbp = 0x00007efe7cb483a0   rsp = 0x00007efe7cb48368
[task 2018-04-13T21:03:37.341Z] 21:03:37     INFO -      r8 = 0x0000000000000000    r9 = 0x0000000000000001
[task 2018-04-13T21:03:37.341Z] 21:03:37     INFO -     r10 = 0x0000000000000001   r11 = 0x0000000000000246
[task 2018-04-13T21:03:37.341Z] 21:03:37     INFO -     r12 = 0x00000000000ff250   r13 = 0x00007efe7cb483b8
[task 2018-04-13T21:03:37.342Z] 21:03:37     INFO -     r14 = 0x0000000000000001   r15 = 0x00007efe4fd6f000
[task 2018-04-13T21:03:37.342Z] 21:03:37     INFO -     rip = 0x00007efe7b6462d0
[task 2018-04-13T21:03:37.343Z] 21:03:37     INFO -     Found by: given as instruction pointer in context
[task 2018-04-13T21:03:37.343Z] 21:03:37     INFO -  1  libxul.so!mozilla::image::imgFrame::InitForDecoder [imgFrame.cpp:837a6f4efa3eaf3cff106349b735e718efb4d4a6 : 300 + 0x25]
[task 2018-04-13T21:03:37.344Z] 21:03:37     INFO -     rbp = 0x00007efe7cb483e0   rsp = 0x00007efe7cb483b0
[task 2018-04-13T21:03:37.344Z] 21:03:37     INFO -     rip = 0x00007efe6b468052
[task 2018-04-13T21:03:37.345Z] 21:03:37     INFO -     Found by: previous frame's frame pointer
[task 2018-04-13T21:03:37.345Z] 21:03:37     INFO -  2  libxul.so!mozilla::image::Decoder::AllocateFrameInternal [Decoder.cpp:837a6f4efa3eaf3cff106349b735e718efb4d4a6 : 343 + 0x20]
[task 2018-04-13T21:03:37.346Z] 21:03:37     INFO -     rbp = 0x00007efe7cb48490   rsp = 0x00007efe7cb483f0
[task 2018-04-13T21:03:37.346Z] 21:03:37     INFO -     rip = 0x00007efe6b452293
[task 2018-04-13T21:03:37.347Z] 21:03:37     INFO -     Found by: previous frame's frame pointer
[task 2018-04-13T21:03:37.348Z] 21:03:37     INFO -  3  libxul.so!mozilla::image::Decoder::AllocateFrame [Decoder.cpp:837a6f4efa3eaf3cff106349b735e718efb4d4a6 : 292 + 0x1b]
[task 2018-04-13T21:03:37.348Z] 21:03:37     INFO -     rbp = 0x00007efe7cb484d0   rsp = 0x00007efe7cb484a0
[task 2018-04-13T21:03:37.349Z] 21:03:37     INFO -     rip = 0x00007efe6b452421
[task 2018-04-13T21:03:37.350Z] 21:03:37     INFO -     Found by: previous frame's frame pointer
[task 2018-04-13T21:03:37.351Z] 21:03:37     INFO -  4  libxul.so!mozilla::image::nsJPEGDecoder::ReadJPEGData [nsJPEGDecoder.cpp:837a6f4efa3eaf3cff106349b735e718efb4d4a6 : 389 + 0x5]
[task 2018-04-13T21:03:37.351Z] 21:03:37     INFO -     rbp = 0x00007efe7cb48600   rsp = 0x00007efe7cb484e0
[task 2018-04-13T21:03:37.352Z] 21:03:37     INFO -     rip = 0x00007efe6b497350
[task 2018-04-13T21:03:37.353Z] 21:03:37     INFO -     Found by: previous frame's frame pointer
[task 2018-04-13T21:03:37.354Z] 21:03:37     INFO -  5  libxul.so!mozilla::image::nsJPEGDecoder::DoDecode [nsJPEGDecoder.cpp:837a6f4efa3eaf3cff106349b735e718efb4d4a6 : 198 + 0x19]
[task 2018-04-13T21:03:37.355Z] 21:03:37     INFO -     rbp = 0x00007efe7cb486f0   rsp = 0x00007efe7cb48610
[task 2018-04-13T21:03:37.355Z] 21:03:37     INFO -     rip = 0x00007efe6b49a12e
[task 2018-04-13T21:03:37.356Z] 21:03:37     INFO -     Found by: previous frame's frame pointer
[task 2018-04-13T21:03:37.357Z] 21:03:37     INFO -  6  libxul.so!mozilla::image::Decoder::Decode [Decoder.cpp:837a6f4efa3eaf3cff106349b735e718efb4d4a6 : 134 + 0xd]
[task 2018-04-13T21:03:37.358Z] 21:03:37     INFO -     rbp = 0x00007efe7cb48740   rsp = 0x00007efe7cb48700
[task 2018-04-13T21:03:37.358Z] 21:03:37     INFO -     rip = 0x00007efe6b453da2
[task 2018-04-13T21:03:37.359Z] 21:03:37     INFO -     Found by: previous frame's frame pointer
[task 2018-04-13T21:03:37.360Z] 21:03:37     INFO -  7  libxul.so!mozilla::image::DecodedSurfaceProvider::Run [DecodedSurfaceProvider.cpp:837a6f4efa3eaf3cff106349b735e718efb4d4a6 : 139 + 0x10]
[task 2018-04-13T21:03:37.360Z] 21:03:37     INFO -     rbp = 0x00007efe7cb487b0   rsp = 0x00007efe7cb48750
[task 2018-04-13T21:03:37.361Z] 21:03:37     INFO -     rip = 0x00007efe6b45729b
[task 2018-04-13T21:03:37.361Z] 21:03:37     INFO -     Found by: previous frame's frame pointer
[task 2018-04-13T21:03:37.362Z] 21:03:37     INFO -  8  libxul.so!mozilla::image::DecodePoolWorker::Run [DecodePool.cpp:837a6f4efa3eaf3cff106349b735e718efb4d4a6 : 284 + 0x6]
[task 2018-04-13T21:03:37.362Z] 21:03:37     INFO -     rbp = 0x00007efe7cb48840   rsp = 0x00007efe7cb487c0
[task 2018-04-13T21:03:37.363Z] 21:03:37     INFO -     rip = 0x00007efe6b4513f6
[task 2018-04-13T21:03:37.363Z] 21:03:37     INFO -     Found by: previous frame's frame pointer
[task 2018-04-13T21:03:37.363Z] 21:03:37     INFO -  9  libxul.so!nsThread::ProcessNextEvent [nsThread.cpp:837a6f4efa3eaf3cff106349b735e718efb4d4a6 : 1096 + 0xa]
[task 2018-04-13T21:03:37.364Z] 21:03:37     INFO -     rbp = 0x00007efe7cb48dd0   rsp = 0x00007efe7cb48850
[task 2018-04-13T21:03:37.365Z] 21:03:37     INFO -     rip = 0x00007efe6a7d5f91
[task 2018-04-13T21:03:37.365Z] 21:03:37     INFO -     Found by: previous frame's frame pointer
[task 2018-04-13T21:03:37.366Z] 21:03:37     INFO - 10  libxul.so!NS_ProcessNextEvent [nsThreadUtils.cpp:837a6f4efa3eaf3cff106349b735e718efb4d4a6 : 519 + 0xe]
[task 2018-04-13T21:03:37.366Z] 21:03:37     INFO -     rbp = 0x00007efe7cb48e00   rsp = 0x00007efe7cb48de0
[task 2018-04-13T21:03:37.367Z] 21:03:37     INFO -     rip = 0x00007efe6a7dd55b
[task 2018-04-13T21:03:37.367Z] 21:03:37     INFO -     Found by: previous frame's frame pointer
[task 2018-04-13T21:03:37.368Z] 21:03:37     INFO - 11  libxul.so!mozilla::ipc::MessagePumpForNonMainThreads::Run [MessagePump.cpp:837a6f4efa3eaf3cff106349b735e718efb4d4a6 : 364 + 0xd]
[task 2018-04-13T21:03:37.368Z] 21:03:37     INFO -     rbp = 0x00007efe7cb48e40   rsp = 0x00007efe7cb48e10
[task 2018-04-13T21:03:37.369Z] 21:03:37     INFO -     rip = 0x00007efe6ac13a09
[task 2018-04-13T21:03:37.369Z] 21:03:37     INFO -     Found by: previous frame's frame pointer
[task 2018-04-13T21:03:37.369Z] 21:03:37     INFO - 12  libxul.so!MessageLoop::Run [message_loop.cc:837a6f4efa3eaf3cff106349b735e718efb4d4a6 : 326 + 0x5]
[task 2018-04-13T21:03:37.370Z] 21:03:37     INFO -     rbp = 0x00007efe7cb48e70   rsp = 0x00007efe7cb48e50
[task 2018-04-13T21:03:37.370Z] 21:03:37     INFO -     rip = 0x00007efe6abe8c82
[task 2018-04-13T21:03:37.371Z] 21:03:37     INFO -     Found by: previous frame's frame pointer
[task 2018-04-13T21:03:37.371Z] 21:03:37     INFO - 13  libxul.so!nsThread::ThreadFunc [nsThread.cpp:837a6f4efa3eaf3cff106349b735e718efb4d4a6 : 425 + 0x8]
[task 2018-04-13T21:03:37.372Z] 21:03:37     INFO -     rbp = 0x00007efe7cb48ec0   rsp = 0x00007efe7cb48e80
[task 2018-04-13T21:03:37.372Z] 21:03:37     INFO -     rip = 0x00007efe6a7d40c6
[task 2018-04-13T21:03:37.373Z] 21:03:37     INFO -     Found by: previous frame's frame pointer
[task 2018-04-13T21:03:37.373Z] 21:03:37     INFO - 14  libnspr4.so!_pt_root [ptthread.c:837a6f4efa3eaf3cff106349b735e718efb4d4a6 : 201 + 0x7]
[task 2018-04-13T21:03:37.374Z] 21:03:37     INFO -     rbp = 0x00007efe7cb48f10   rsp = 0x00007efe7cb48ed0
[task 2018-04-13T21:03:37.374Z] 21:03:37     INFO -     rip = 0x00007efe7b3a3b81
[task 2018-04-13T21:03:37.375Z] 21:03:37     INFO -     Found by: previous frame's frame pointer
[task 2018-04-13T21:03:37.375Z] 21:03:37     INFO - 15  libpthread-2.23.so + 0x76ba
[task 2018-04-13T21:03:37.376Z] 21:03:37     INFO -     rbx = 0x0000000000000000   rbp = 0x0000000000000000
[task 2018-04-13T21:03:37.376Z] 21:03:37     INFO -     rsp = 0x00007efe7cb48f20   r12 = 0x0000000000000000
[task 2018-04-13T21:03:37.377Z] 21:03:37     INFO -     r13 = 0x00007ffc1f984e1f   r14 = 0x00007efe7cb499c0
[task 2018-04-13T21:03:37.377Z] 21:03:37     INFO -     r15 = 0x00007efe60c805e0   rip = 0x00007efe7c6356ba
[task 2018-04-13T21:03:37.377Z] 21:03:37     INFO -     Found by: call frame info
[task 2018-04-13T21:03:37.378Z] 21:03:37     INFO - 16  libc-2.23.so + 0x10741d
[task 2018-04-13T21:03:37.378Z] 21:03:37     INFO -     rsp = 0x00007efe7cb48fc0   rip = 0x0
[task 2018-04-13T21:03:37.378Z] 21:03:37     INFO -     Found by: stack scanning
I did some triggers to find where this fail started and it lead me to this push:

https://hg.mozilla.org/integration/mozilla-inbound/rev/2efe54944e8cb91f1198d0c55ccb440de5b37997

https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=test-linux64-qr%2Fopt-awsy-e10s&tochange=a434fac58370fc14bceb525e1ca8b7125638b6cd&fromchange=69f26860c7d23c1c71db6778f342829ee7c8b118&selectedJob=173732495

As you can see I retriggered this test on several pushes and that test failed a lot. 

@Joel can you please take a look? or maybe NI on Andrew (aosmond)?
Flags: needinfo?(jmaher)
Crash Signature: [@ libc-2.23.so + 0x8f2d0] → [@ libc-2.23.so + 0x8f2d0] [@ libc-2.23.so + 0x15e61b]
Crash Signature: [@ libc-2.23.so + 0x8f2d0] [@ libc-2.23.so + 0x15e61b] → [@ libc-2.23.so + 0x8f2d0] [@ libc-2.23.so + 0x15e61b] [@ mozilla::ipc::Shmem::Alloc]
Crash Signature: [@ libc-2.23.so + 0x8f2d0] [@ libc-2.23.so + 0x15e61b] [@ mozilla::ipc::Shmem::Alloc] → [@ libc-2.23.so + 0x8f2d0] [@ libc-2.23.so + 0x8f29c] [@ libc-2.23.so + 0x15e61b] [@ mozilla::ipc::Shmem::Alloc]
Thanks Cristian, this is very clear as we go from 100% green to just about 100% orange.

Andrew, can you look at fixing this perma fail that we see from your push?
Blocks: 1453801
Flags: needinfo?(jmaher) → needinfo?(aosmond)
Whiteboard: [stockwell needswork][retriggered]
A permafailing AWSY test seems bad, from a MemShrink perspective.
Whiteboard: [stockwell needswork][retriggered] → [stockwell needswork][retriggered][MemShrink]
This looks like we're running out of space in /dev/shm; see also bug 1245239.

SIGBUS crashes in random places in graphics generally mean that.  SIGBUS crashes in Shmem::Create (where the length is stored in-band at the end of the memory region) could indicate the length got mixed up somewhere, but if they happen immediately after another process took SIGBUS somewhere else, that probably means that the first process exhausted /dev/shm and crashed, and then the second “OOM”ed on the first page of a new shmem because /dev/shm was completely full at that point; for this, see also bug 1446618 comment #2

If so, this should all go away — for Linux ≥ 3.17, which includes Mozilla CI — with bug 1440203, after which shared memory will be subject to the same (lack of) limits as regular anonymous mmaps.
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #6)
> Shmem::Create

By which I mean Shmem::Alloc.
I'm looking into it. I know we are running out, it is just how I managed to do that with a patch that should have stopped us from keeping unused image buffers around is puzzling me :).
I had hoped the patch in this try would help (https://treeherder.mozilla.org/#/jobs?repo=try&revision=d6ba965a0a69c958e3b6c81ab90bee1dd9337c96) but while it corrects an issue locally where we end up holding onto images much longer than we should, it didn't fix the awsy test.
Consistently crashing on these two links:

https://www.ethoswatches.com/brands/casio-watches.html  
https://www.ethoswatches.com/product/casio-g-shock-ga-1000-1adr-men-s-watch.html  


multiple crash report(different signatures):
Firefox 61.0a1 Crash Report [@ mozilla::ipc::Shmem::Alloc ] 
 https://crash-stats.mozilla.com/report/index/1f9c5b26-205d-4bce-a09e-28d7b0180417
Firefox 61.0a1 Crash Report [@ avx::memset32 ] 
 https://crash-stats.mozilla.com/report/index/a057f81c-faa1-4532-9d03-cc7500180417
Firefox 61.0a1 Crash Report [@ mozilla::wr::ShmSegmentsWriter::Write ] 
 https://crash-stats.mozilla.com/report/index/ac251e09-4c02-4600-8265-1002b0180417
Firefox 61.0a1 Crash Report [@ libc-2.23.so@0x8f2d0 ] 
 https://crash-stats.mozilla.com/report/index/a6ef3233-e5c0-4ec6-85f8-cd9a60180417

Had commented here as well with more crash reports and associated ones in the past week - 
https://bugzilla.mozilla.org/show_bug.cgi?id=1255130#c37
When bug 1453801 relands, I will ensure this regression is avoided then.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(aosmond)
Resolution: --- → FIXED
Whiteboard: [stockwell needswork][retriggered][MemShrink] → [stockwell fixed:backout][retriggered][MemShrink]
Assignee: nobody → aosmond
Target Milestone: --- → mozilla61
You need to log in before you can comment on or make changes to this bug.