Crash in [@ GeckoViewStreamingTelemetry::SendBatchRunnable::Run]
Categories
(Toolkit :: Telemetry, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox-esr78 | --- | unaffected |
firefox77 | --- | wontfix |
firefox78 | --- | wontfix |
firefox79 | --- | fixed |
People
(Reporter: yoasif, Assigned: chutten)
Details
(Keywords: crash)
Crash Data
Attachments
(1 file)
This bug is for crash report bp-2c5238c7-34d8-4f08-bde6-a77540200607.
Top 10 frames of crashing thread:
0 libxul.so GeckoViewStreamingTelemetry::SendBatchRunnable::Run toolkit/components/telemetry/geckoview/streaming/GeckoViewStreamingTelemetry.cpp:123
1 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1236
2 libxul.so mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:87
3 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:290
4 libxul.so nsBaseAppShell::Run widget/nsBaseAppShell.cpp:137
5 libxul.so nsAppStartup::Run toolkit/components/startup/nsAppStartup.cpp:271
6 libxul.so XREMain::XRE_mainRun toolkit/xre/nsAppRunner.cpp:4664
7 libxul.so XREMain::XRE_main toolkit/xre/nsAppRunner.cpp:4811
8 libxul.so XRE_main toolkit/xre/nsAppRunner.cpp:4865
9 libxul.so GeckoStart toolkit/xre/nsAndroidStartup.cpp:46
Mentioned https://www.reddit.com/r/firefox/comments/gy1w7v/please_help_with_fenix/ for Fenix.
Updated•4 years ago
|
Assignee | ||
Comment 1•4 years ago
|
||
Crashing address is 0xe5e5e5e5e5e5f5
which looks like an offset to a marker value to me. Suggests that mDelegate
has been freed by jemalloc maybe.
Reports on this signature are consistent with a solitary user having a really hard time with Fenix every month or two for a week at a stretch.
mDelegate
should not possibly be null. Access to gDelegate
is protected by mutex, and we were holding it from when we last checked it was non-null to when we stuffed it in the SendBatchRunnable
which will keep it alive inside its RefPtr
. We haven't changed the GV Streaming code in quite some time, so it seems likely that this is either uncovering something incredibly unlikely that hasn't been seen before, or perhaps some change outside of GV Streaming has complicated matters (maybe the delegate now frees itself even if it has a refcount? But that code, too, hasn't changed since last August)
Maybe a check for !gDelegate
doesn't work if it hasn't been explicitly set to nullptr
and so the fact that it's set to 0xe5e5e5e5e5e5e5
allows it to pass the check? Is there a different way I should be checking if a RefPtr
is pointing to something that's been deallocated? (aklotz, do you know?)
Comment 2•4 years ago
|
||
Having taken a cursory look over this code, I don't see any smoking guns here. :-(
Comment 3•4 years ago
|
||
I looked briefly at this; the assembly leading up to the crash is:
# load `mDelegate`
7f817bcad4: 60 12 40 f9 ldr x0, [x19, #0x20]
7f817bcad8: e9 17 40 f9 ldr x9, [sp, #0x28]
# pointer to `histogramName`
7f817bcadc: e1 43 00 91 add x1, sp, #0x10
# load `mDelegate`'s vtable
7f817bcae0: 08 00 40 f9 ldr x8, [x0]
# load a pointer to `samples` (I think)
7f817bcae4: 22 41 00 91 add x2, x9, #0x10
# load the address of the virtual method `ReceiveHistogramSamples`
# CRASH
7f817bcae8: 08 09 40 f9 ldr x8, [x8, #0x10]
# call the method
7f817bcaec: 00 01 3f d6 blr x8
How in the world is the vtable holding a pointer to jemalloc poision? Are there terrible things going on in the Java interface bridge, or did we manage to royally scribble over static memory?
Comment 4•4 years ago
|
||
Come to think of it, I think I got needinfo'd recently in bug 1635400 where it sounded like perhaps there was some JNI shenanigans going on. I wonder if these are related in any way...
Assignee | ||
Comment 5•4 years ago
|
||
:esawin, you wrote the JNI interface layer back in August. Do you have any idea what's up?
Comment 6•4 years ago
•
|
||
I see that StreamingTelemetryDelegate
is (in theory, at least?) used from different threads, but does not use a thread-safe ref counting mechanism. I think this may explain things. :chutten I think you should try changing StreamingTelemetryDelegate
to derive from AtomicRefCounted
and see if that helps.
Apparently the right thing to do is to use NS_INLINE_DECL_THREADSAFE_REFCOUNTING(StreamingTelemetryDelegate)
Assignee | ||
Comment 8•4 years ago
|
||
Ah. If the refcnt isn't atomic, I can imagine that interesting things could indeed happen when the delegate is dropped after a runnable is dispatched. (or if the delegate is changed). It'd be weird as all get out for this to be run across, but I guess that matches the frequency of the crash.
Let's make the refcounting threadsafe and see what we see.
Assignee | ||
Comment 9•4 years ago
|
||
Comment 10•4 years ago
|
||
Comment 11•4 years ago
|
||
bugherder |
Updated•4 years ago
|
Description
•