If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

image caches should not be statically allocated (ABORT: imgCacheEntry release isn't thread-safe!: '_mOwningThread.GetThread() == PR_GetCurrentThread()')

NEW
Unassigned

Status

()

Core
ImageLib
8 years ago
2 years ago

People

(Reporter: Joe Drew (not getting mail), Unassigned)

Tracking

Trunk
x86
Mac OS X
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Reporter)

Description

8 years ago
Created attachment 413454 [details] [diff] [review]
allocate caches on the heap

Currently, the image caches are all static members of the imgLoader class. This means that, in the case that we crash or for some other reason call exit() on a non-main thread, we will call the caches' destructors on a non-main thread, leading to an abort:

###!!! ABORT: imgCacheEntry release isn't thread-safe!: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file /builds/slave/mozilla-central-linux-debug/build/modules/libpr0n/src/imgLoader.h, line 81

In general, it's not safe to destroy or construct things statically, and since we already have an entry/exit point for the cache (XPCOM creation and shutdown), we should just use that more aggressively, and allocate the hash tables on the heap inside them, with only a pointer to these hash tables remaining static.

This patch also ensures that we never try to use the cache after we've shut down XPCOM, and thus inserts a lot of null checks.
Attachment #413454 - Flags: review?(bobbyholley)

Comment 1

8 years ago
Comment on attachment 413454 [details] [diff] [review]
allocate caches on the heap

>@@ -677,19 +678,23 @@ nsresult imgLoader::InitCache()
>+  sCache = new imgCacheTable();
>+  sCacheQueue = new imgCacheQueue();
>+  if (!sCache || !sCache->Init() || !sCacheQueue)
>       return NS_ERROR_OUT_OF_MEMORY;

please delete + null sCache + sCacheQueue before returning.

>+  sChromeCache = new imgCacheTable();
>+  sChromeCacheQueue = new imgCacheQueue();
>+  if (!sChromeCache || !sChromeCache->Init() || !sChromeCacheQueue)
>       return NS_ERROR_OUT_OF_MEMORY;

please delete + null everything here I guess.
Comment on attachment 413454 [details] [diff] [review]
allocate caches on the heap

 
 nsresult imgLoader::ClearChromeImageCache()
 {
-  return EvictEntries(sChromeCache);
+  return EvictEntries(*sChromeCache);
 }
 
 nsresult imgLoader::ClearImageCache()
 {
-  return EvictEntries(sCache);
+  return EvictEntries(*sCache);
 }
 
 void imgLoader::MinimizeCaches()
 {
-  EvictEntries(sCacheQueue);
-  EvictEntries(sChromeCacheQueue);
+  EvictEntries(*sCacheQueue);
+  EvictEntries(*sChromeCacheQueue);
 }
 
MinimizeCaches, at the very least, is called by an observer that doesn't seem to check for null caches. As such, I'd think we'd need a null check here. Overall, can't we just make EvictEntries take a pointer, if that's how we're treating the caches these days?

-  if (queue.GetNumElements() == 0)
-    NS_ASSERTION(queue.GetSize() == 0, 
+  if (!cache || !queue)
+    return;
+
+  if (queue->GetNumElements() == 0)
+    NS_ASSERTION(queue->GetSize() == 0, 

Any reason this isn't NS_ABORT_IF_FALSE?

Looks good otherwise. r+ with those changes + timeless'.
Attachment #413454 - Flags: review?(bobbyholley) → review+
I'm not convinced that it's worth writing code (and paying the costs for heap allocation and null checks) to handle exit() being called on a different thread. For one, I don't think that we should be calling exit() after crashing and I can't really think of any reason another thread would be justified in calling exit(). 

Can someone justify supporting calling exit() from a different thread?
I think it's worth avoiding static construction/destruction in shared libraries in general.  Static constructors and destructors run while the shared library loader is holding a lock (although with different variations between platforms).  Depending on platform, things like (a) loading other shared libraries (which can be triggered by do_CreateInstance) or (b) calling dladdr (which can be triggered by many in-process debugging tools like trace-malloc or nsTraceRefcnt) may need this same lock (in the trace-malloc case, while holding the malloc lock).  This creates risks for AB-BA deadlocks that worsen with the presence of such debugging tools.  (Doing anything that can call into the component manager or service manager during a static constructor is a problem even without debugging tools, I think.)

Can you find a way to do this without null-checks?  If imgLoader::InitCache fails, you shouldn't be doing anything with these tables anyway?  Can you just skip deleting them at shutdown if something has happened such that they might still be manipulated later (e.g., objects still alive)?
(Reporter)

Updated

8 years ago
Duplicate of this bug: 539169

Comment 6

8 years ago
jrmuizel: roughly (per bug 539169 comment 2), exit will happen on the main thread at the wrong time for a long time in the future (until we rewrite something or another).

Unfortunately, users do not expect Quit from the dock to result in a crash.

And as dbaron kinda notes, xpcom conventions forbid doing interesting work in static constructors/destructors.
Duplicate of this bug: 537320

Comment 8

8 years ago
Created attachment 424111 [details]
crash data

I'm crashing a lot during debugging, seems like it might be because of this issue. This is the full Apple crash report.

Comment 9

8 years ago
This is a frustrating bug - can we get an owner here?
(Reporter)

Comment 10

8 years ago
Oh, I own this - I just haven't come up with a nicer patch yet.
Assignee: nobody → joe

Updated

7 years ago
Blocks: 549767

Updated

7 years ago
No longer blocks: 549767

Comment 11

7 years ago
FYI: I see this daily in the crash test automation. The domains I've reproduced it on so far are:

addons.mozilla.org, blogs.ubc.ca, clbaddestfemale.wordpress.com, comparescreenshots.slicx.com, developer.mozilla.org, devices.live.com, forum.wawa-mania.ws, forums.nwp4life.com, jeremy.visser.name, member.yam.com, mercedesalkatreszek.cegvalaszto.hu, micro.msi.com, picleecher.net, readygames.com.br, romakhin.ru, screenshotcomparison.com, secure.provinet.net, services.addons.mozilla.org, sexynudesgirls.org, shineeshawols.wordpress.com, slipstreamtv.com, svithue.net, tfile.ru, thetravisty.com, tvdirect.jimdo.com, usuarios.multimania.es, www.accountcentralonline.com, www.alemdizi.com, www.biggby.com, www.cajasol.es, www.casinosbo.com, www.diziizleyelim.com, www.epubbud.com, www.esikis.net, www.fridcar.hu, www.hazirfilm.com, www.horogay.com, www.houseofpain.com, www.htcsense.com, www.izleyin.gen.tr, www.king.org, www.kvfan.net, www.minecraft.net, www.myyearbook.com, www.ndr.de, www.oberbank-banking.at, www.picleecher.net, www.pornoxo.com, www.provincial.com, www.sexyavenue.com, www.simbirskrealty.ru, www.softdownload.com.br, www.spanishdict.com, www.strawmachine-supplier.com, www.tnaondemand.com, www.usaa.com, www.veetle.com, www.vidyoara.com, www.watchop.com, www.ziddu.com

Note that I only tested the urls on these sites because our users are crashing there.
Blocks: 532972

Comment 12

7 years ago
Joe,

This continues to be a common Abort during crash automation testing. It can cause issues for automated testing by potentially hiding other crashes.

If it isn't important enough to fix, then perhaps it should be a non-fatal ASSERTION rather than an Abort?
(In reply to comment #12)
> Joe,
> 
> This continues to be a common Abort during crash automation testing. It can
> cause issues for automated testing by potentially hiding other crashes.
> 
> If it isn't important enough to fix, then perhaps it should be a non-fatal
> ASSERTION rather than an Abort?

Is the Abort happening during exit?

Comment 14

7 years ago
I think so. Sometimes it happens during normal shutdown of the browser

Assertion failure: ready > 0, at c:/work/mozilla/builds/2.0.0/mozilla/nsprpub/pr/src/md/windows/w32poll.c:322
nsStringStats
 => mAllocCount:          10856
 => mReallocCount:         1174
 => mFreeCount:            5403  --  LEAKED 5453 !!!
 => mShareCount:          15003
 => mAdoptCount:           1129
 => mAdoptFreeCount:       1118  --  LEAKED 11 !!!
###!!! ASSERTION: nsScriptCacheCleaner not thread-safe: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file c:/work/mozilla/builds/2.0.0/mozilla/content/base/src/nsFrameMessageManager.cpp, line 729
###!!! ABORT: imgCacheEntry release isn't thread-safe!: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file c:\work\mozilla\builds\2.0.0\mozilla\modules\libpr0n\src\imgLoader.h, line 85

other times I get an out of memory loading the page which I believe causes an exit

nsStringStats
 => mAllocCount:        1733731
 => mReallocCount:          888
 => mFreeCount:            5672  --  LEAKED 1728059 !!!
 => mShareCount:        1758318
 => mAdoptCount:          10886
 => mAdoptFreeCount:      10884  --  LEAKED 2 !!!
out of memory
###!!! ASSERTION: nsScriptCacheCleaner not thread-safe: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file c:/work/mozilla/builds/2.0.0/mozilla/content/base/src/nsFrameMessageManager.cpp, line 733
###!!! ABORT: imgCacheEntry release isn't thread-safe!: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file c:\work\mozilla\builds\2.0.0\mozilla\modules\libpr0n\src\imgLoader.h, line 85
Attachment #413454 - Flags: review+ → review-

Comment 15

6 years ago
will this fix land someday ? I got stuck at exactly the same issue.
Or is there an actual workaround for a proper cleanup ?

Comment 16

6 years ago
ping
(Reporter)

Comment 17

6 years ago
Please don't do that. https://bugzilla.mozilla.org/page.cgi?id=etiquette.html

When there are updates, you'll hear them.

Comment 18

6 years ago
ahh oki. Just saw it in another issue and thought it is usable this way. Never mind.
(Reporter)

Updated

4 years ago
Assignee: joe → nobody

Comment 19

2 years ago
We don't see this in automation anymore.
No longer blocks: 532972
You need to log in before you can comment on or make changes to this bug.