Closed Bug 1127464 Opened 5 years ago Closed 5 years ago

still crashing in nsObserverService::RemoveObserver(nsIObserver*, char const*)

Categories

(Firefox for Android :: Toolbar, defect, critical)

34 Branch
ARM
Android
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Firefox 39
Tracking Status
firefox35 --- wontfix
firefox36 + wontfix
firefox37 + wontfix
firefox38 + wontfix
firefox39 --- fixed
fennec 37+ ---

People

(Reporter: snorp, Assigned: snorp)

References

Details

(Keywords: crash, reproducible, topcrash-android-armv7)

Crash Data

Attachments

(1 file, 3 obsolete files)

+++ This bug was initially created as a clone of Bug #1062758 +++

https://crash-stats.mozilla.com/report/index/3bee9ab7-9790-4b87-b55d-bd95d2150124

We are somehow exiting in the middle of a composite, possible due to divide-by-zero. No STR.
Assignee: nobody → snorp
[Tracking Requested - why for this release]:
tracking-fennec: --- → 37+
Messing with the summary to prevent gmail from being stupid.
Summary: crash in nsObserverService::RemoveObserver(nsIObserver*, char const*) → still crashing in nsObserverService::RemoveObserver(nsIObserver*, char const*)
We seem to have a signal handler installed to catch SIGFPE (and as a result, do not crash/exit), but I wonder if some versions of Android have their own which does exit.
Frame 22 in that crash is libgsl.so@0xd2a7, which on a 4.4.2 mako build is a "blx exit@plt" in ioctl_kgsl_sharedmem_alloc. Two instructions before, it's calling an os_alog function so there presumably is something in logcat.

So, there are two issues here:
- we should MOZ_CRASH during exit() instead of crashing while running some of our destructors. Since I want to keep our linker code working outside of our usecase, I'd rather make us crash optionally and enable that on our builds only, something like a MOZ_LINKER_CRASH_AT_EXIT define.
- there's an OOM happening, which we might be able to do something about, considering the "many pictures (~450)" comment in the crash report.
This crash is reproducible whilst scrolling around the embedded Google map on http://www.flightradar24.com/data/flights/ua931/

https://crash-stats.mozilla.com/report/index/efc1f999-1b89-457a-9d9c-9bbdc2150203

Device: Galaxy Note 3 (4.4.2)
From some logcats that mfinkle sent me:

01-27 08:21:10.454  7364  7382 D skia    : SkGrPixelRef::onReadPixels failed to alloc bitmap for result!
01-27 08:21:10.454  7364  7382 D skia    : SkROLockPixelsPixelRef::onLockPixels failed!
01-27 08:21:10.674  7364  7513 W Adreno-GSL: <sharedmem_gpumem_alloc_id:1489>: sharedmem_gpumem_alloc: mmap failed errno 12 Out of memory
01-27 08:21:10.674  7364  7513 E Adreno-GSL: <ioctl_kgsl_sharedmem_alloc:1590>: ioctl_kgsl_sharedmem_alloc: FATAL ERROR : (null)

So Skia was trying to do a readback and failed to allocate, but then there was some other allocation attempt in GL that caused the exit().
Milan there may be some sort of Skia or GL-related leak here
Component: General → Graphics, Panning and Zooming
Catalin it seems like we already had a crash with the STR you gave there. Can you remember which one that is? Did we fix it?
Flags: needinfo?(catalin.suciu)
https://bugzilla.mozilla.org/show_bug.cgi?id=1098227
This is the other crash and I can't reproduce it using the initial steps
Flags: needinfo?(catalin.suciu)
Looks like it is too late for 36...
An alternative to this patch would be turning the MOZ_CRASH() into a debug assert and just skipping the finalization. I like this one better because the app is still going down unexpectedly (crashing), and we will get a report. But it won't look like an actual crash like this one, we should get the assertion message.
Comment on attachment 8569446 [details] [diff] [review]
Assert when we unexpectedly unload libraries on Android

Review of attachment 8569446 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/ElfLoader.cpp
@@ +492,5 @@
>  void
>  ElfLoader::Init()
>  {
>    Dl_info info;
> +  expect_shutdown = false;

might as well set it in the ElfLoader constructor... except there is no constructor at the moment. Meh.

That said, as I said in comment 4, defaults should be sensible to non-Fennec use, and defaulting to false means a voluntary crash will always happen in that case.

@@ +591,5 @@
>  
>  void
>  ElfLoader::__wrap_cxa_finalize(void *dso_handle)
>  {
> +  if (!ElfLoader::Singleton.IsShutdownExpected()) { \

What are the backslashes for? You can also remove ElfLoader::
Attachment #8569446 - Flags: review?(mh+mozilla) → review-
(In reply to Mike Hommey [:glandium] from comment #13)
> Comment on attachment 8569446 [details] [diff] [review]
> Assert when we unexpectedly unload libraries on Android
> 
> Review of attachment 8569446 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: mozglue/linker/ElfLoader.cpp
> @@ +492,5 @@
> >  void
> >  ElfLoader::Init()
> >  {
> >    Dl_info info;
> > +  expect_shutdown = false;
> 
> might as well set it in the ElfLoader constructor... except there is no
> constructor at the moment. Meh.
> 
> That said, as I said in comment 4, defaults should be sensible to non-Fennec
> use, and defaulting to false means a voluntary crash will always happen in
> that case.
> 

Yeah, forgot about that. I'll add some #ifdef stuff.

> @@ +591,5 @@
> >  
> >  void
> >  ElfLoader::__wrap_cxa_finalize(void *dso_handle)
> >  {
> > +  if (!ElfLoader::Singleton.IsShutdownExpected()) { \
> 
> What are the backslashes for? You can also remove ElfLoader::

Oh, I had it in a macro at first, but then realized I only needed to check the one spot. Oops.
One interesting thing is that finalizers do not get called if you exit(0), so we won't be able to catch that.
Attachment #8569446 - Attachment is obsolete: true
Attachment #8570513 - Flags: review?(mh+mozilla)
Comment on attachment 8570513 [details] [diff] [review]
Assert when we unexpectedly unload libraries on Android

Review of attachment 8570513 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/ElfLoader.cpp
@@ +489,5 @@
>    }
>  }
>  
> +ElfLoader::ElfLoader()
> +  : expect_shutdown(false)

Trivial enough to go in the class definition. Default should be true.

@@ +497,5 @@
>  void
>  ElfLoader::Init()
>  {
>    Dl_info info;
> +  expect_shutdown = false;

redundant with the constructor.

@@ +596,5 @@
>  
>  void
>  ElfLoader::__wrap_cxa_finalize(void *dso_handle)
>  {
> +#ifdef MOZ_LINKER_CRASH_UNEXPECTED_EXIT

This shouldn't be a compile-time thing. Just make the load*Libs functions in APKOpen.cpp set the flag to false.
Attachment #8570513 - Flags: review?(mh+mozilla) → review-
Attachment #8570513 - Attachment is obsolete: true
Attachment #8571374 - Flags: review?(mh+mozilla)
Ugh, this seems to have some problems with PR_LoadLibrary. I guess I need to check that the library is libxul or something before asserting.
Comment on attachment 8571374 [details] [diff] [review]
Assert when we unexpectedly unload libraries on Android

Review of attachment 8571374 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/android/APKOpen.cpp
@@ +407,2 @@
>    GeckoStart(args, &sAppData);
> +  ElfLoader::Singleton.ExpectShutdown(true);

Note, considering gecko doesn't survive the shutdown, we might as well never allow it for gecko.

::: mozglue/linker/ElfLoader.h
@@ +459,5 @@
>    friend int __wrap_dlclose(void *handle);
>    const char *lastError;
>  
>  private:
> +  ElfLoader() : expect_shutdown(true){}

space before {
Attachment #8571374 - Flags: review?(mh+mozilla) → review+
Comment on attachment 8571374 [details] [diff] [review]
Assert when we unexpectedly unload libraries on Android

Review of attachment 8571374 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/ElfLoader.cpp
@@ +591,5 @@
>  ElfLoader::__wrap_cxa_finalize(void *dso_handle)
>  {
> +  if (!Singleton.IsShutdownExpected()) {
> +    MOZ_CRASH("Unexpected shutdown");
> +  }

Ah, wrt comment 19, you want thsi part in ~ElfLoader instead.
(In reply to Mike Hommey [:glandium] from comment #20)
> Comment on attachment 8571374 [details] [diff] [review]
> Assert when we unexpectedly unload libraries on Android
> 
> Review of attachment 8571374 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: mozglue/android/APKOpen.cpp
> @@ +407,2 @@
> >    GeckoStart(args, &sAppData);
> > +  ElfLoader::Singleton.ExpectShutdown(true);
> 
> Note, considering gecko doesn't survive the shutdown, we might as well never
> allow it for gecko.

We do survive if the main loop has exited, typically, because we have a chance to clean up after ourselves. This avoids the xpcom freak-out that happens when exit() is called randomly.
https://hg.mozilla.org/mozilla-central/rev/c3e4f5f4a4d0
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 39
37 and 38 are marked as affected. wdyt about uplifting the fix?
Flags: needinfo?(snorp)
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #26)
> 37 and 38 are marked as affected. wdyt about uplifting the fix?

This isn't *really* fixed. The patch just keeps us from crashing in a stupid way. The root cause is an OOM (at least it seems to be most of the time). I don't see a reason to uplift it right now.
Flags: needinfo?(snorp)
From speaking with snorp, we're going to leave this one closed even though the issue hasn't really been resolved. There's nothing to uplift but we expect that the patch will produce different signatures that should help in the diagnosis of the OOM issue that this bug is about.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Lawrence did you mean to reopen this?
Flags: needinfo?(lmandel)
Nope.
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Flags: needinfo?(lmandel)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.