Closed Bug 1556188 Opened 5 years ago Closed 4 years ago

Crash in [@ IPCError-browser | ShutDownKill]

Categories

(Core :: Gecko Profiler, defect, P2)

69 Branch
defect

RESOLVED DUPLICATE of bug 1279293
Tracking Status
firefox69 --- affected

People

(Reporter: auroraofearth, Unassigned, NeedInfo)

Details

Crash Data

This bug is for crash report bp-1222a43f-806f-4120-aa4a-db92d0190601.

Top 10 frames of crashing thread:

0 libpthread-2.28.so libpthread-2.28.so@0x1129c 
1 libxul.so profiler_add_marker_for_thread tools/profiler/core/platform.cpp:3969
2 libxul.so mozilla::BackgroundHangThread::ReportHang toolkit/components/backgroundhangmonitor/BackgroundHangMonitor.cpp:500
3 libxul.so mozilla::BackgroundHangThread::NotifyWait toolkit/components/backgroundhangmonitor/BackgroundHangMonitor.cpp:240
4 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1076
5 libxul.so <name omitted> xpcom/threads/nsThreadUtils.cpp:486
6 libxul.so nsThread::Shutdown xpcom/threads/nsThread.cpp:882
7 libxul.so mozilla::ChildProfilerController::ShutdownAndMaybeGrabShutdownProfileFirst tools/profiler/gecko/ChildProfilerController.cpp:62
8 libxul.so mozilla::dom::ContentChild::ShutdownInternal dom/ipc/ContentChild.cpp:3045
9 libxul.so mozilla::dom::ContentChild::RecvShutdown dom/ipc/ContentChild.cpp:2989

ShutDownKill just means that a content process was doing something and didn't shut down in time; it's usually not about IPC. The content process seems to be blocked on a lock in the profiler:

PSAutoLock lock(gPSMutex);

The lock holder is probably Thread 21, which is in dl_iterate_phdr. Moving this bug to the profiler component.
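
For illustration only, here is a minimal sketch of the contention pattern described above: one thread holds a global mutex while doing slow work, and a second caller (standing in for profiler_add_marker_for_thread) cannot get the lock within its time budget. The names gSketchMutex and MarkerBlockedWhileLockHeld are hypothetical, and std::timed_mutex is an assumption chosen so the wait can be bounded; the real gPSMutex and PSAutoLock may behave differently.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the profiler's global mutex (gPSMutex in the report).
static std::timed_mutex gSketchMutex;

// Returns true if a "marker" caller could not take the lock within `budget`
// while another thread was holding it.
bool MarkerBlockedWhileLockHeld(std::chrono::milliseconds budget) {
  assert(budget.count() >= 0);

  std::promise<void> held;     // fulfilled once the holder owns the lock
  std::promise<void> release;  // fulfilled when the holder may let go
  std::future<void> releaseSignal = release.get_future();

  // Stand-in for the thread stuck in slow work (dl_iterate_phdr in the report).
  std::thread holder([&] {
    std::lock_guard<std::timed_mutex> lock(gSketchMutex);
    held.set_value();
    releaseSignal.wait();  // keep the lock until the caller has given up
  });

  held.get_future().wait();

  // Stand-in for the shutdown path trying to add a marker.
  bool acquired = gSketchMutex.try_lock_for(budget);
  if (acquired) {
    gSketchMutex.unlock();
  }

  release.set_value();
  holder.join();
  return !acquired;
}
```

With an untimed lock, as in the stack above, the second caller would simply wait until the holder finishes, which is what lets the parent's shutdown timer expire first.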

Component: IPC → Gecko Profiler

The crash signature graph scared me, but restricting to crash reports that have "profiler" in the stack shows "only" a dozen reports since March:
https://crash-stats.mozilla.org/signature/?reason=%3DDUMP_REQUESTED&proto_signature=~profiler&signature=IPCError-browser%20%7C%20ShutDownKill&date=%3E%3D2018-12-05T12%3A02%3A00.000Z&date=%3C2019-06-05T12%3A02%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-build_id&_sort=-date&page=1

That's Linux only, though. With the "DUMP_REQUESTED" filter removed, there are more crashes, mostly on Windows with a few on macOS, and some of these also appear to be in or near the profiler:
https://crash-stats.mozilla.org/signature/?proto_signature=~profiler&signature=IPCError-browser%20%7C%20ShutDownKill&date=%3E%3D2018-12-05T12%3A02%3A00.000Z&date=%3C2019-06-05T12%3A02%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1

Most have profiler_get_backtrace waiting for the lock at the top of the main thread, a locked_profiler_stream_json_for_this_process elsewhere (that would be the one holding the lock), and maybe some other threads trying to do profiler work.
Could it just be that the profiler buffer serialization is taking too long?

It seems to start with build 68.0a1 20190319095054.
Interestingly, the generic crash signature graph also jumps from hundreds per day to thousands per day around that date! Did we do something special then, e.g., shorten the shutdown timer?

Jed, any idea about this 19 March phenomenon? (Not trying to shift the blame, I think this bug can stay with Gecko Profiler and that the profiler should still be improved; but just wondering what caused the massive jump for this signature.)

To be investigated further...

Flags: needinfo?(jld)
OS: Linux → All
Priority: -- → P2
Hardware: x86_64 → All

I skimmed the output of hg log -r '880331515823 % fe798624cda0', which should be all the changes in 20190319095054 but not the previous build (using https://buildhub.moz.tools/ to map the build to a revision, then the Hg web interface's “last build without” to step backwards), but I don't see anything that looks relevant. The only thing that's even tangentially IPC-related that I can see in that range is bug 1533842. Of course, it's possible that a regression was introduced in an earlier build but didn't happen to produce a crash report.
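
For readers unfamiliar with the revset used above: in Mercurial, 'A % B' is shorthand for only(A, B), i.e. the changesets that are ancestors of A (including A) but not ancestors of B. A minimal sketch of that set difference on a toy history follows; the Dag type, the helper names, and the r1..r5 revisions are all hypothetical and have nothing to do with the real mozilla-central graph.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy history: each changeset maps to its list of parents.
using Dag = std::map<std::string, std::vector<std::string>>;

// All ancestors of `head`, including `head` itself (like the revset ::head).
std::set<std::string> Ancestors(const Dag& dag, const std::string& head) {
  std::set<std::string> seen;
  std::vector<std::string> stack{head};
  while (!stack.empty()) {
    std::string node = stack.back();
    stack.pop_back();
    if (!seen.insert(node).second) continue;  // already visited
    auto it = dag.find(node);
    if (it == dag.end()) continue;
    for (const auto& parent : it->second) stack.push_back(parent);
  }
  return seen;
}

// Changesets reachable from `a` but not from `b` (like the revset a % b).
std::set<std::string> Only(const Dag& dag, const std::string& a,
                           const std::string& b) {
  std::set<std::string> excluded = Ancestors(dag, b);
  std::set<std::string> result;
  for (const auto& node : Ancestors(dag, a)) {
    if (excluded.count(node) == 0) result.insert(node);
  }
  return result;
}

// Hypothetical history: r1 <- r2, r2 <- r3, r2 <- r4, and r5 merges r3 and r4.
Dag ToyHistory() {
  return {{"r1", {}},
          {"r2", {"r1"}},
          {"r3", {"r2"}},
          {"r4", {"r2"}},
          {"r5", {"r3", "r4"}}};
}
```

In this toy history, Only(ToyHistory(), "r5", "r3") contains r4 and r5: the changes present in the newer build's revision but absent from the older one.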

Flags: needinfo?(jld)

Hello Aurora!

Are you still able to reproduce https://bugzilla.mozilla.org/show_bug.cgi?id=1556188 on the current Nightly (https://nightly.mozilla.org)?

Please update and report back.

Thanks!
Alex

Flags: needinfo?(auroraofearth)
Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE