1131330 - shutdownhang in _VEC_memset via CacheFileMetadata

Reporter

Description

•

9 years ago

This bug was filed from the Socorro interface and is 
report bp-750265a8-35d0-45dc-8296-6db392150202.
=============================================================

In recent builds of 36 beta this is 0.83% of crashes but that relative percentage will probably rise as the media stuff gets sorted out.

0 	_VEC_memset 	f:\dd\vctools\crt\crtw32\string\i386\p4_memset.c
1 	je_free 	memory/mozjemalloc/jemalloc.c
2 	mozilla::net::CacheFileMetadata::~CacheFileMetadata() 	netwerk/cache2/CacheFileMetadata.cpp
3 	mozilla::net::CacheFileMetadata::`scalar deleting destructor'(unsigned int) 	
4 	mozilla::net::CacheFileMetadata::Release() 	netwerk/cache2/CacheFileMetadata.cpp
5 	nsRefPtr<XPCWrappedNative>::~nsRefPtr<XPCWrappedNative>() 	xpcom/base/nsRefPtr.h
6 	mozilla::net::CacheFile::~CacheFile() 	netwerk/cache2/CacheFile.cpp
7 	mozilla::net::CacheFile::`scalar deleting destructor'(unsigned int) 	
8 	nsBaseAppShell::Release() 	widget/nsBaseAppShell.cpp
9 	nsRefPtr<XPCWrappedNative>::~nsRefPtr<XPCWrappedNative>() 	xpcom/base/nsRefPtr.h
10 	mozilla::net::CacheEntry::~CacheEntry() 	netwerk/cache2/CacheEntry.cpp
11 	mozilla::net::CacheEntry::`scalar deleting destructor'(unsigned int) 	
12 	mozilla::net::CacheEntry::Release() 	netwerk/cache2/CacheEntry.cpp
13 	nsBaseHashtableET<nsCStringHashKey, nsRefPtr<mozilla::net::CacheEntry> >::`scalar deleting destructor'(unsigned int) 	
14 	nsTHashtable<nsBaseHashtableET<nsCStringHashKey, nsRefPtr<mozilla::net::CacheEntry> > >::s_ClearEntry(PLDHashTable*, PLDHashEntryHdr*) 	xpcom/glue/nsTHashtable.h
15 	nsTHashtable<StaticAtomEntry>::`scalar deleting destructor'(unsigned int) 	
16 	nsBaseHashtableET<nsCStringHashKey, nsAutoPtr<mozilla::net::CacheEntryTable> >::`scalar deleting destructor'(unsigned int) 	
17 	nsTHashtable<nsBaseHashtableET<nsCStringHashKey, nsAutoPtr<mozilla::net::CacheEntryTable> > >::s_ClearEntry(PLDHashTable*, PLDHashEntryHdr*) 	xpcom/glue/nsTHashtable.h
18 	mozilla::net::CacheStorageService::Shutdown() 	netwerk/cache2/CacheStorageService.cpp
19 	mozilla::net::CacheObserver::Observe(nsISupports*, char const*, wchar_t const*) 	netwerk/cache2/CacheObserver.cpp
20 	nsObserverList::NotifyObservers(nsISupports*, char const*, wchar_t const*) 	xpcom/ds/nsObserverList.cpp
21 	nsObserverService::NotifyObservers(nsISupports*, char const*, wchar_t const*) 	xpcom/ds/nsObserverService.cpp
22 	nsXREDirProvider::DoShutdown() 	toolkit/xre/nsXREDirProvider.cpp
23 	ScopedXPCOMStartup::~ScopedXPCOMStartup() 	toolkit/xre/nsAppRunner.cpp
24 	ScopedXPCOMStartup::`scalar deleting destructor'(unsigned int) 	
25 	XREMain::XRE_main(int, char** const, nsXREAppData const*) 	toolkit/xre/nsAppRunner.cpp
26 	XRE_main 	toolkit/xre/nsAppRunner.cpp

(Away)

Reporter

Comment 1

•

9 years ago

At first glance these "hangs" are during jemalloc's poisoning of freed memory. As an outsider to the cache code, there are many unknowns from my perspective:

* Is this stack merely a victim of some previous stack that took too long? Maybe something like bug 1124880 ate most of the 60 second watchdog timer, and this code just happened to come after it with little time left? I don't know the relative order of these shutdown items.

* It's easy to blame memory poisoning and try to hack around it. But my instinct is that's not the "right" root-cause solution.

* Maybe these buffers are too large and/or too many? In minidumps I see the buffers are 4k, which seems to come from kMinMetadataRead and kAlignSize. Did this TODO ever get resolved?
> #define kMinMetadataRead 1024  // TODO find optimal value from telemetry
> #define kAlignSize       4096
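One way these two constants could combine into a buffer size, as a minimal sketch. It assumes the reader picks a start offset at least kMinMetadataRead bytes before the end of the file and then rounds that offset down to a kAlignSize boundary. The helper name `MetadataReadBufferSize` is hypothetical, and this is not claimed to be the actual CacheFileMetadata code.

```cpp
#include <cstdint>

// Values quoted above from the cache code.
constexpr int64_t kMinMetadataRead = 1024;
constexpr int64_t kAlignSize = 4096;

// Hypothetical helper: size of the buffer needed to read the tail of a
// cache file of `fileSize` bytes. ASSUMPTION: the read starts at least
// kMinMetadataRead bytes before the end of the file, and that start offset
// is rounded down to a kAlignSize boundary.
constexpr int64_t MetadataReadBufferSize(int64_t fileSize) {
  int64_t offset =
      fileSize < kMinMetadataRead ? 0 : fileSize - kMinMetadataRead;
  offset = (offset / kAlignSize) * kAlignSize;  // round down to a block start
  return fileSize - offset;
}
```

Under that assumption, any file of at least kMinMetadataRead bytes gets a buffer between kMinMetadataRead and kMinMetadataRead + kAlignSize - 1 bytes, which is the same order of magnitude as the 4k buffers seen in the minidumps.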

Flags: needinfo?(honzab.moz)

(Away)

Reporter

Comment 2

•

9 years ago

[Tracking Requested - why for this release]: 
Not entirely sure this meets your tracking threshold for 36, but worst case you'll just say no.

37 and 38 are affected in much lower volume. I'm guessing because those builds don't accumulate a lot of cache. So I assume we'll see higher volumes when those trains hit beta.

status-firefox36: --- → affected

status-firefox37: --- → affected

status-firefox38: --- → affected

tracking-firefox36: --- → ?

tracking-firefox37: --- → ?

tracking-firefox38: --- → ?

Michal Novotny [:michal]

Comment 3

•

9 years ago

(In reply to David Major [:dmajor] (UTC+13) from comment #1)
> * Maybe these buffers are too large and/or too many? In minidumps I see the
> buffers are 4k, which seems to come from kMinMetadataRead and kAlignSize.
> Did this TODO ever get resolved?
> > #define kMinMetadataRead 1024  // TODO find optimal value from telemetry
> > #define kAlignSize       4096

I did a quick test on my profile and the average metadata size is around 1kB. The memory limit for all metadata is 250kB by default, so we should keep around 250 CacheFileMetadata instances in memory. In reality, it will probably be fewer because we waste some memory when we read the metadata from disk due to kMinMetadataRead and kAlignSize.
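The estimate is plain division, and a small sketch shows how strongly it depends on the per-instance buffer size. The helper name is hypothetical, 1 kB is taken as 1024 bytes (an assumption), and the 4 kB case simply reuses the buffer size reported from the minidumps in comment 1.

```cpp
#include <cstdint>

// Default metadata memory limit mentioned above (250 kB), in bytes.
// ASSUMPTION: 1 kB is counted as 1024 bytes.
constexpr uint64_t kMetadataMemoryLimit = 250 * 1024;

// Hypothetical helper: how many metadata instances fit under the limit if
// each one holds `bytesPerInstance` bytes of buffer.
constexpr uint64_t MaxInstancesInMemory(uint64_t bytesPerInstance) {
  return bytesPerInstance == 0 ? 0 : kMetadataMemoryLimit / bytesPerInstance;
}

static_assert(MaxInstancesInMemory(1024) == 250, "250 instances at 1 kB each");
static_assert(MaxInstancesInMemory(4096) == 62, "62 instances at 4 kB each");
```

Either way the number of instances stays in the low hundreds as long as the limit is actually enforced.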


(In reply to David Major [:dmajor] (UTC+13) from comment #2)
> 37 and 38 are affected in much lower volume. I'm guessing because those
> builds don't accumulate a lot of cache. So I assume we'll see higher volumes
> when those trains hit beta.

I'm not sure I fully understand what you mean by "accumulate a lot of cache". You'll cache 250 entries after 5 minutes of browsing.

(Away)

Reporter

Comment 4

•

9 years ago

Like I said, I don't know anything about this code. I'm just going on the crash data that I have, which is that most of these crashes are after many hours of uptime. I guessed that it involves accumulating a bunch of stuff: it shouldn't take too long to clobber a 4k buffer so I assumed there were tons of them.

Honza Bambas (:mayhemer)

Updated

•

9 years ago

Flags: needinfo?(honzab.moz) → needinfo?(michal.novotny)

Michal Novotny [:michal]

Comment 5

•

9 years ago

Honza, please check whether my computation in comment #3 makes sense. There is nothing we can do about it in CacheFileMetadata. If we optimize the buffer size further during metadata reading, we would keep more CacheFileMetadata instances in memory (more instances would fit within the 250kB limit), so in the end it won't help during shutdown. Maybe we should lower the 250kB limit and/or release metadata from memory after some time.

Flags: needinfo?(michal.novotny) → needinfo?(honzab.moz)

Robert Kaiser

Comment 6

•

9 years ago

David, should _VEC_memset be added to the prefix skiplist? If so, can you file a bug for that?

Lawrence Mandel [:lmandel] (use needinfo)

Comment 7

•

9 years ago

Tracking this crash. Realistically, unless we have a safe fix by tomorrow, this is unlikely to make 36.

Honza - Are you going to take this bug?

tracking-firefox36: ? → +

tracking-firefox37: ? → +

tracking-firefox38: ? → +

Sylvestre Ledru [:Sylvestre]

Comment 8

•

9 years ago

Too late for 36. Honza, let us know if you have plans for this for 37. Thanks.

status-firefox36: affected → wontfix

Lawrence Mandel [:lmandel] (use needinfo)

Comment 9

•

9 years ago

ni on jduell to see about getting this moving.

Flags: needinfo?(jduell.mcbugs)

Honza Bambas (:mayhemer)

Comment 10

•

9 years ago

I first want to understand why releasing a few kB of memory blocks the shutdown.  It's a memory release operation, it should be fast.  Is releasing DOM trees also taking such a long time?

Should we move to some kind of an "arena" allocation strategy and release all at once?
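For readers unfamiliar with the idea: an arena hands out pieces of one large block and gives the whole block back in a single call, so teardown cost does not grow with the number of objects. The class below is a minimal, hypothetical bump allocator written only to illustrate that; it is not an existing Gecko API.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical bump-pointer arena: every allocation is carved out of one
// block, and the destructor releases that block with a single free().
class BumpArena {
 public:
  explicit BumpArena(std::size_t capacity)
      : mBase(static_cast<char*>(std::malloc(capacity))),
        mCapacity(capacity),
        mUsed(0) {
    if (!mBase) throw std::bad_alloc();
  }
  ~BumpArena() { std::free(mBase); }  // one release for all allocations

  BumpArena(const BumpArena&) = delete;
  BumpArena& operator=(const BumpArena&) = delete;

  // Returns `size` bytes aligned to max_align_t, or nullptr when the arena
  // is full. Individual allocations are never freed on their own.
  void* Allocate(std::size_t size) {
    constexpr std::size_t kAlign = alignof(std::max_align_t);
    std::size_t start = (mUsed + kAlign - 1) / kAlign * kAlign;
    if (start > mCapacity || size > mCapacity - start) return nullptr;
    mUsed = start + size;
    return mBase + start;
  }

  std::size_t Used() const { return mUsed; }

 private:
  char* mBase;
  std::size_t mCapacity;
  std::size_t mUsed;
};
```

Whether this would actually shorten shutdown depends on how the allocator treats the one large block when it is freed.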

Michal, any IO or locking that could be delayed by IO involved here?

Flags: needinfo?(honzab.moz) → needinfo?(michal.novotny)

Honza Bambas (:mayhemer)

Comment 11

•

9 years ago

I also don't see this happen that often.  Some of the reports seem to be duplicates (roughly speaking) and may come from slow/overwhelmed machines being shut down with all the applications running.

I really don't know what we should improve here except removal of some of the locking we do in dtors.  

My intention is btw to cache even more (to enlarge the intermediate memory cache).  How are DOM and JS objects released from memory not causing these problems?

(Away)

Reporter

Comment 12

•

9 years ago

(In reply to Honza Bambas (:mayhemer) from comment #10)
> I first want to understand why releasing a few kB of memory blocks the
> shutdown.  It's a memory release operation, it should be fast.

Our implementation of free() clobbers released memory with 0x5a for safety. It should be fast for small objects, unless there are tons of those objects, or there is some kind of pathological memory locality / cache thrashing issue. (I don't know about the DOM question)

The allocator uses a different kind of block for allocations >= 1MB. During free() it just unmaps the block, no need to write 0x5a. So using an arena would avoid this hang, but it feels like a hacky workaround.
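To illustrate the cost being described, here is a minimal sketch of a poisoning free path. It is not mozjemalloc's code: the function names are hypothetical, the caller passes the size explicitly, and only the 0x5a fill value comes from the description above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Fill value described above for freed memory.
constexpr unsigned char kFreedPoison = 0x5a;

// Hypothetical helper: overwrite a block that is about to be released.
// The cost is one write per byte, so it scales with the total bytes freed.
inline void PoisonBlock(void* ptr, std::size_t size) {
  if (ptr && size) {
    std::memset(ptr, kFreedPoison, size);
  }
}

// Hypothetical free wrapper: poison first, then hand the block back.
// Note: an optimizing compiler may elide a memset that directly precedes
// free(); this sketch ignores that.
inline void PoisonAndFree(void* ptr, std::size_t size) {
  PoisonBlock(ptr, size);
  std::free(ptr);
}
```

With a few hundred buffers of a few kB each, the total written this way is on the order of a megabyte, which is why the poisoning alone looks like an unlikely cause of a 60 second hang.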

Does anyone know if there is any potential to this theory?
> * Is this stack merely a victim of some previous stack that took too long?
> Maybe something like bug 1124880 ate most of the 60 second watchdog timer,
> and this code just happened to come after it with little time left? I don't
> know the relative order of these shutdown items.

Perhaps we could add some instrumentation to see how much time the cache shutdown is actually using.
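Such instrumentation could be as small as a scoped timer around the shutdown call. The sketch below uses only the C++ standard library; the class name and the output variable are hypothetical, and a real patch would presumably report through Gecko's own telemetry rather than a plain integer.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical RAII timer: stores the elapsed wall-clock time, in
// milliseconds, into `*out` when the scope ends.
class ScopedShutdownTimer {
 public:
  explicit ScopedShutdownTimer(int64_t* out)
      : mOut(out), mStart(std::chrono::steady_clock::now()) {}

  ~ScopedShutdownTimer() {
    auto elapsed = std::chrono::steady_clock::now() - mStart;
    *mOut =
        std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
  }

  ScopedShutdownTimer(const ScopedShutdownTimer&) = delete;
  ScopedShutdownTimer& operator=(const ScopedShutdownTimer&) = delete;

 private:
  int64_t* mOut;
  std::chrono::steady_clock::time_point mStart;
};
```

Holding one of these for the duration of the cache shutdown routine would show how much of the watchdog window the cache itself consumes, as opposed to whatever ran before it.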

Jason Duell

Comment 13

•

9 years ago

It does sound fishy to me that freeing the cache items (even with memory poisoning) should take long enough to cause a shutdownhang.   :dmajor's idea to capture cache shutdown times might be useful to figure out if we're a victim of some other stack, or if there actually is an issue (perhaps with locking, as Honza suggests).

Flags: needinfo?(jduell.mcbugs)

Lawrence Mandel [:lmandel] (use needinfo)

Comment 14

•

9 years ago

OK. Who can take on this investigation?

Tobias B. Besemer [:BesTo] (QA)

Comment 15

•

9 years ago

(In reply to Honza Bambas (:mayhemer) from comment #11)
> I also don't see this happen that often.

Now that FF Session Restore works better again (or I trust it more again), I almost always kill the task because shutting down FF takes really too long.


> Some of the reports seem to be duplicates (roughly speaking) and may come from slow/overwhelmed
> machines being shut down with all the applications running.

Yes, my machine is slow/overwhelmed!
On the HD side this comes mostly from heavy fragmentation caused by the cache, and on the memory side from the resources FF needs to browse e.g. Facebook after a few minutes.
(Private ~2.3GB, Reserved ~2.7GB. But FF no longer crashes since the changes in memory management, so it seems I no longer have a chance to send in memory dumps for analysis. Session Restore still fails while FF is running, though; I see this in the Browser Console.)


> My intention is btw to cache even more (to enlarge the intermediate memory
> cache).  How are DOM and JS objects released from memory not causing these
> problems?

I'm not really sure which cache is being discussed here: the cache in memory or on the HD?
I set my cache on the HD to "manual" and set the size much higher. Can this be a problem?
I also have a lot of problems with hanging scripts, e.g. from Google or Facebook. Can such slow, nearly hanging scripts be a problem during shutdown?

Tobias B. Besemer [:BesTo] (QA)

Comment 16

•

9 years ago

Btw. my last crash report:
https://crash-stats.mozilla.com/report/index/7dd6d66a-c8b2-49d9-8d77-bf1b02150219

Tobias B. Besemer [:BesTo] (QA)

Comment 17

•

9 years ago

OK, I tried to reproduce this crash ... but as always, it didn't crash this time.

But here are some infos and logs anyway; maybe someone can analyze them and it helps.

I had 13 windows open with a lot of tabs, but only 3 windows had any tabs loaded; normally only Facebook, GMail and some Mozilla stuff.

I attach an anonymized memory report. The big pages should be FB.

I think the JS engine has memory leaks or no longer frees memory, so after some minutes of browsing the browser is up at ~2.5GB of memory.

I know that in Windows profiles the two dom.max script timeouts are normally much higher than on other OSes, but I think that only freezes FF and doesn't help the user!

Normally a lot of scripts become slow in the background and hang, but they never reach the 40 sec. after which the dialog appears that lets you stop the scripts.
Also, stopping scripts with this dialog is really difficult, takes a long time and isn't something for normal users ...

Anyway: even if you lower the timeout and stop the hanging scripts, the browser still quickly reaches its memory limits!

Then, when I look in the Browser Console, I often see logs like this:

"Could not write session state file " Error: : NS_ERROR_OUT_OF_MEMORY: 
Stack trace:
postMessage@resource://gre/modules/PromiseWorker.jsm:324:1
TaskImpl_run@resource://gre/modules/Task.jsm:315:40
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:873:21
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:749:7
this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:691:37
 "postMessage@resource://gre/modules/PromiseWorker.jsm:324:1
TaskImpl_run@resource://gre/modules/Task.jsm:315:40
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:873:21
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:749:7
this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:691:37
" SessionFile.jsm:314

"Could not write session state file " Error: TypeError: invalid 'in' operand exn
Stack trace:
postMessage@resource://gre/modules/PromiseWorker.jsm:324:1
TaskImpl_run@resource://gre/modules/Task.jsm:315:40
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:873:21
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:749:7
this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:691:37
 "postMessage@resource://gre/modules/PromiseWorker.jsm:324:1
TaskImpl_run@resource://gre/modules/Task.jsm:315:40
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:873:21
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:749:7
this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:691:37
" SessionFile.jsm:314

ThreadSources.prototype._fetchSourceMap threw an exception: Error: Request failed for 'file:///internal://sourcemap1/': Component returned failure code: 0x80520001 (NS_ERROR_FILE_UNRECOGNIZED_PATH) [nsIChannel.asyncOpen]
Stack: fetch@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/DevToolsUtils.js:529:25
ThreadSources.prototype._fetchSourceMap@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/script.js:5511:20
ThreadSources.prototype.fetchSourceMap@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/script.js:5465:18
SourceActor.prototype._getSourceText@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/script.js:2370:12
resolve@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/deprecated-sync-thenables.js:40:40
then@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/deprecated-sync-thenables.js:20:43
then@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/deprecated-sync-thenables.js:58:9
SourceActor.prototype.onSource@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/script.js:2470:12
SourceActor.prototype.onDisablePrettyPrint@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/script.js:2665:12
DSC_onPacket@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/main.js:1422:15
LocalDebuggerTransport.prototype.send/<@resource://gre/modules/devtools/dbg-client.jsm -> resource://gre/modules/devtools/transport/transport.js:545:11
makeInfallible/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/DevToolsUtils.js:82:14
makeInfallible/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/DevToolsUtils.js:82:14
Line: 529, column: 24 DevToolsUtils.js:58:0

"Could not write session state file " Error: : NS_ERROR_OUT_OF_MEMORY: 
Stack trace:
postMessage@resource://gre/modules/PromiseWorker.jsm:324:1
TaskImpl_run@resource://gre/modules/Task.jsm:315:40
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:873:21
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:749:7
this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:691:37
 "postMessage@resource://gre/modules/PromiseWorker.jsm:324:1
TaskImpl_run@resource://gre/modules/Task.jsm:315:40
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:873:21
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:749:7
this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:691:37
" SessionFile.jsm:314

"Could not write session state file " Error: TypeError: invalid 'in' operand exn
Stack trace:
postMessage@resource://gre/modules/PromiseWorker.jsm:324:1
TaskImpl_run@resource://gre/modules/Task.jsm:315:40
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:873:21
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:749:7
this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:691:37
 "postMessage@resource://gre/modules/PromiseWorker.jsm:324:1
TaskImpl_run@resource://gre/modules/Task.jsm:315:40
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:873:21
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:749:7
this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:691:37
"

When I now try to shut down FF the normal way, it takes ~2 min. until the windows disappear and an additional ~3 min. during which the process still runs in the background with ~2.5GB (I see it in the Task Manager) until FF is really closed. (This is why I "close" FF in 99% of the situations by killing the task.)

If FF crashes, it happens during those last 3 min.

Sorry for posting everything in this bug, but I don't know what belongs to this bug and what should be fixed in another bug.
Anyway: as long as FF has so many problems while running and before closing, I think those should maybe be investigated first, because who knows what else crashes in the background and whether the Session Store is the only thing that fails ...

(Btw.: A bug where users can attach memory reports with high memory usage for analysis would be nice. And if FF used less memory, it should have fewer problems while closing, too.)

Tobias B. Besemer [:BesTo] (QA)

Comment 18

•

9 years ago

Attached file Example memory report — Details

Tobias B. Besemer [:BesTo] (QA)

Comment 19

•

9 years ago

OK, here is some more information about my problems and why I think that all those shutdown problems/bugs/crashes are just the last link in the chain and not the cause, why these problems aren't about "slow systems", why the crash statistics don't really show the problem, and why this is also a reason FF has lost market share to Chrome in recent years.

For ~2 years I have been part of the Facebook Editor Community and started to do QA for them, too.
So I thought it wasn't a good idea to use Chrome for this and switched back to FF after some years.
I got all those hanging scripts and started to log them for FB, too.
Then I did QA for FF again and reported hanging scripts (and other problems) to Google, too.

I read a little about the hanging scripts in FF and the (old) reasons (Yahoo) for setting the timeout higher. I also read (and saw) that the values on Windows systems are much higher than on other systems. So I set dom.max_chrome_script_run_time=20 and dom.max_script_run_time=5. (After some tests I saw that these are good values for doing QA for FB & G.)

When I leave my PC, e.g. overnight, I just set FF to Offline Mode.
What I expect some hours later, when I come back, is that FF runs "almost" like before, with the same amount of memory, performance, ...

This is how it goes when I come back:
At first I have problems logging in to the system again; this shows me that the scripts are hanging again.
Once logged in, I need ~30-60 min. to stop all scripts, log them, restart FF and be able to work again.
Normally I have hanging scripts from all ambitious pages like GMail (and Talkgadget/Hangouts), Youtube (ytimg.com), G+ and FB (fbstatic-a.akamaihd.net). (And sometimes some jQuery stuff/pages.)
To be clear: none of those scripts were hanging when I left the PC, and FF was running fine!
While the scripts are hanging, the light of my HD (2TB, WD, should be fast) is lit constantly because of heavy activity on it.
FF is down to ~1.6-1.8GB in Task Manager and will be back at ~2.1-2.3GB once all scripts are stopped.
I think the high activity on my HD comes from Windows trying to manage "some problems" and writing and reading the page file all the time.
Normally, once I have managed to stop all hanging scripts because they needed longer than 5 sec. to respond, FF goes back to its high memory use and the HD light goes off.

Stopping the scripts (even when the timeout is 5 sec.) is a really difficult and time-consuming task, because FF is hanging at that moment, too. (And sometimes the system gets really slow because of Windows activity in the page file, on the HD, in memory.)
OK, my test system isn't the newest, but I just want to browse the web and it was working before, so it shouldn't be a problem, should it?

I think there are 3 possible solutions to fix the complete hanging of FF while the scripts are hanging:
- Pause the script until the user has made a decision;
- Run the scripts at a lower priority than the FF GUI (or maybe in their own Exe);
- Let FF stop the scripts like Chrome does.

(Btw.: I think a separate Exe for the JS engine would be nice [in future], because it would allow using/testing/comparing with e.g. V8 from Chrome.)

Even when I get the scripts stopped and logged, FF uses far too much memory!
So maybe the Session Store fails in the background with only a message in the Browser Console.
(No normal user will ever see this!)

It looks like this:
"Could not write session state file " Error: : NS_ERROR_OUT_OF_MEMORY:
Stack trace:
postMessage@resource://gre/modules/PromiseWorker.jsm:324:1
TaskImpl_run@resource://gre/modules/Task.jsm:315:40
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:873:21
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:749:7
this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:691:37
"postMessage@resource://gre/modules/PromiseWorker.jsm:324:1
TaskImpl_run@resource://gre/modules/Task.jsm:315:40
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:873:21
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:749:7
this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:691:37
"

Then I know it is really time to restart FF!
(Often my daily update of FF is available then, too, so I can install that.)
To avoid wasting more time and creating crash reports that maybe nobody can investigate, I just kill the task in the Task Manager.

I will also attach to the bug a new memory report of this scenario and some screenshots of the information visible in about:memory.

Tobias B. Besemer [:BesTo] (QA)

Comment 20

•

9 years ago

Attached file memory-report_(2).json.gz — Details

Tobias B. Besemer [:BesTo] (QA)

Comment 21

•

9 years ago

Attached image 2015-02-21 05_23_47-about_memory - Firefox Developer Edition (Build 20150219004126).png — Details

Maybe it helps a dev.
Mem after stopping hanging scripts.

Tobias B. Besemer [:BesTo] (QA)

Comment 22

•

9 years ago

Attached image 2015-02-21 05_25_14-about_memory - Firefox Developer Edition (Build 20150219004126).png — Details

Maybe it helps a dev.
Mem after stopping hanging scripts.

Tobias B. Besemer [:BesTo] (QA)

Comment 23

•

9 years ago

Attached image 2015-02-21 05_26_14-about_memory - Firefox Developer Edition (Build 20150219004126).png — Details

Maybe it helps a dev.
Mem after stopping hanging scripts.

Tobias B. Besemer [:BesTo] (QA)

Comment 24

•

9 years ago

Attached file memory-report_newstart.json.gz — Details

Memory Report after restart of FF with all scripts working.

Loaded Pages:
about:memory
https://bugzilla.mozilla.org/attachment.cgi?bugid=1131330
http://dict.leo.org/
https://mail.google.com/mail/u/0/#inbox (+ Talkgadget)
https://plus.google.com/u/0/+TobiasBesemer/posts
https://www.youtube.com/ (logged in)
https://www.beta.facebook.com/notifications

Tobias B. Besemer [:BesTo] (QA)

Comment 25

•

9 years ago

Another nice log from the Browser Console that no normal user ever sees:

TelemetryStopwatch: key "FX_SESSION_RESTORE_SERIALIZE_DATA_MS" was already initialized TelemetryStopwatch.jsm:52:0
TelemetryStopwatch: key "FX_SESSION_RESTORE_SERIALIZE_DATA_LONGEST_OP_MS" was already initialized TelemetryStopwatch.jsm:52:0
TelemetryStopwatch: key "FX_SESSION_RESTORE_WRITE_STATE_LONGEST_OP_MS" was already initialized TelemetryStopwatch.jsm:52:0
uncaught exception: out of memory <unknown>
uncaught exception: out of memory <unknown>
TelemetryStopwatch: key "FX_SESSION_RESTORE_SERIALIZE_DATA_MS" was already initialized TelemetryStopwatch.jsm:52:0
TelemetryStopwatch: key "FX_SESSION_RESTORE_SERIALIZE_DATA_LONGEST_OP_MS" was already initialized TelemetryStopwatch.jsm:52:0
TelemetryStopwatch: key "FX_SESSION_RESTORE_WRITE_STATE_LONGEST_OP_MS" was already initialized TelemetryStopwatch.jsm:52:0
uncaught exception: out of memory <unknown>
no element found request:1:1
uncaught exception: out of memory <unknown>
TelemetryStopwatch: key "FX_SESSION_RESTORE_SERIALIZE_DATA_MS" was already initialized TelemetryStopwatch.jsm:52:0
TelemetryStopwatch: key "FX_SESSION_RESTORE_SERIALIZE_DATA_LONGEST_OP_MS" was already initialized TelemetryStopwatch.jsm:52:0
TelemetryStopwatch: key "FX_SESSION_RESTORE_WRITE_STATE_LONGEST_OP_MS" was already initialized TelemetryStopwatch.jsm:52:0
uncaught exception: out of memory <unknown>
TelemetryStopwatch: key "FX_SESSION_RESTORE_SERIALIZE_DATA_MS" was already initialized TelemetryStopwatch.jsm:52:0
TelemetryStopwatch: key "FX_SESSION_RESTORE_SERIALIZE_DATA_LONGEST_OP_MS" was already initialized TelemetryStopwatch.jsm:52:0
TelemetryStopwatch: key "FX_SESSION_RESTORE_WRITE_STATE_LONGEST_OP_MS" was already initialized TelemetryStopwatch.jsm:52:0
uncaught exception: out of memory <unknown>

I think it's time to kill the task again ...

Michal Novotny [:michal]

Comment 26

•

9 years ago

(In reply to Honza Bambas (:mayhemer) from comment #10)
> Should we move to some kind of an "arena" allocation strategy and release
> all at once?

Is it possible that those people changed their browser.cache.disk.metadata_memory_limit preference? With the default value we simply cannot end up with a lot of entries in memory.

> Michal, any IO or locking that could be delayed by IO involved here?

I don't see any IO here. The Cache IO thread exists at that time, so all IO would be posted to the background thread. I've checked several reports and Cache IO thread is doing nothing in all reports I've seen so far.

Flags: needinfo?(michal.novotny)

Honza Bambas (:mayhemer)

Comment 27

•

9 years ago

(In reply to Michal Novotny (:michal) from comment #26)
> (In reply to Honza Bambas (:mayhemer) from comment #10)
> > Should we move to some kind of an "arena" allocation strategy and release
> > all at once?
> 
> Is it possible that those people changed their
> browser.cache.disk.metadata_memory_limit preference? With the default value
> we simply cannot end up with a lot of entries in memory.

I wouldn't be so sure.  There are mechanisms that may keep entries alive and take us well over the memory limit if there is nothing to release, and if there is a bug we both eat memory and keep a ton of entries alive.  Telemetry on entry lifetime [1] shows there is a large number of entries with very long lifetimes.  There simply might be a bug!
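The point that a limit alone does not bound the number of live entries can be shown with a small model. Everything in it is hypothetical (the struct, the function, the sizes); it only assumes that a purge can release entries nobody references and has to skip the rest.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of one cached entry.
struct PoolEntry {
  std::size_t bytes;  // memory held by the entry
  bool referenced;    // true while something still keeps the entry alive
};

// Hypothetical purge: walks the pool in order and drops unreferenced
// entries until the pool fits under `limit`. Referenced entries are
// skipped, so the result can stay above the limit. Returns the bytes
// still held afterwards.
inline std::size_t PurgeToLimit(std::vector<PoolEntry>& pool,
                                std::size_t limit) {
  std::size_t total = 0;
  for (const PoolEntry& e : pool) total += e.bytes;

  std::vector<PoolEntry> kept;
  for (const PoolEntry& e : pool) {
    if (total > limit && !e.referenced) {
      total -= e.bytes;  // released
    } else {
      kept.push_back(e);  // still needed, or already under the limit
    }
  }
  pool.swap(kept);
  return total;
}
```

If most entries are still referenced when the purge runs, the pool stays over the limit and every one of those entries is still there to be destroyed at shutdown.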

> 
> > Michal, any IO or locking that could be delayed by IO involved here?
> 
> I don't see any IO here. The Cache IO thread exists at that time, so all IO
> would be posted to the background thread. I've checked several reports and
> Cache IO thread is doing nothing in all reports I've seen so far.

That is not exactly what I was asking.  I'm interested in anything that could do I/O (on whatever thread) that the main thread could somehow be blocked on, or be waiting for the result of, before it can go on with the shutdown.  But that is a corner case; I would first focus on the memory pool limit logic.



[1] http://telemetry.mozilla.org/#filter=release%2F35%2FHTTP_CACHE_ENTRY_ALIVE_TIME%2Fsaved_session%2FFirefox&aggregates=multiselect-all!Submissions!Mean!5th percentile!25th percentile!median!75th percentile!95th percentile&evoOver=Builds&locked=true&sanitize=true&renderhistogram=Graph

Michal Novotny [:michal]

Comment 28

•

9 years ago

(In reply to Honza Bambas (:mayhemer) from comment #27)
> That is not exactly what I was asking for.  I'm interested in anything that
> could do I/O (on whatever thread) that main thread could somehow be blocked
> or waiting for result of before it can go on with the shutdown.  But that is
> a corner case, I would first focus on the memory pool limits logic.

There are two cases that I'm aware of but both happen after CacheStorageService::Shutdown() finishes so there is definitely no relation to this bug:

1) CacheFileIOManager::Shutdown() posts an event to the IO thread and waits until it finishes.
2) CacheIndex::Shutdown() writes the journal to disk synchronously. This is actually part of CacheFileIOManager::Shutdown().

There is a telemetry probe in CacheFileIOManager::Shutdown(), but I cannot see any report for it. Maybe it is collected so late during the shutdown that we never post the data to the telemetry server, but this is just a guess.

Tobias B. Besemer [:BesTo] (QA)

Comment 29

•

9 years ago

(In reply to Michal Novotny (:michal) from comment #26)
> (In reply to Honza Bambas (:mayhemer) from comment #10)
> > Should we move to some kind of an "arena" allocation strategy and release
> > all at once?
> Is it possible that those people changed their
> browser.cache.disk.metadata_memory_limit preference? With the default value
> we simply cannot end up with a lot of entries in memory.

No.
I have the default:
browser.cache.disk.metadata_memory_limit = 250

Tobias B. Besemer [:BesTo] (QA)

Comment 30

•

9 years ago

I have set cache management to manual and set the cache size to 1GB.

(I have a 2TB HD, but here in Germany we normally have much slower broadband than in the USA, so a bigger cache is better. I also think that the default value for the cache should in general be set by Firefox/Mozilla.)

Can this cause problems?

Tobias B. Besemer [:BesTo] (QA)

Comment 31

•

9 years ago

I upgraded to FF38.0a2.

Memory usage looks much better in this version.
Scripts seem to work better, too.

So I started a test ...
I browsed for some time and then left the PC with the browser open, without switching to Offline Mode.

When I came back after some hours I had no problems with the scripts.
But memory usage was high anyway.
But there were no "out of memory" messages in the Browser Console.
And no heavy HD activity from page file management.

I made an anonymized memory report and some screenshots of the non-anonymized version that may be helpful for the devs.
I attach everything as a ZIP file.

(Btw.: The "anonymize" option in "about:memory" sits next to "Save memory reports" but is also applied - if you don't deselect it - to "Show memory reports". So IMHO it either works wrongly or (as I prefer to think) is in the wrong position to be understood clearly. Is there a bug open for this?)

After I made the reports I tried to shut down FF.

Closing the windows took around 3 min.
After 3 more min., during which the process was still in the Task Manager, FF crashed with a "[@ shutdownhang | _VEC_memset]".
This is the report for it:
https://crash-stats.mozilla.com/report/index/d7a03e0e-109b-4865-a9cb-01f402150225

Tobias B. Besemer [:BesTo] (QA)

Comment 32

•

9 years ago

Attached file FF38.0a2_-_20150223_-_memory-report_(by_BesTo).zip — Details

Honza Bambas (:mayhemer)

Comment 33

•

9 years ago

(In reply to Tobias B. Besemer from comment #32)
> Created attachment 8569764 [details]
> FF38.0a2_-_20150223_-_memory-report_(by_BesTo).zip

  {
   "process": "Main Process (pid 107192)",
   "path": "explicit/network/cache2/io",
   "kind": 1,
   "units": 0,
   "amount": 22656,
   "description": "Memory used by the cache IO manager."
  },
  {
   "process": "Main Process (pid 107192)",
   "path": "explicit/network/cache2/index",
   "kind": 1,
   "units": 0,
   "amount": 371808,
   "description": "Memory used by the cache index."
  },
  {
   "process": "Main Process (pid 107192)",
   "path": "explicit/network/cache2/service",
   "kind": 1,
   "units": 0,
   "amount": 139584,
   "description": "Memory used by the cache storage service."
  },
  {
   "process": "Main Process (pid 107192)",
   "path": "explicit/network/cache2/disk-storage(a,)",
   "kind": 1,
   "units": 0,
   "amount": 265998,
   "description": "Memory used by the cache storage."
  },
  {
   "process": "Main Process (pid 107192)",
   "path": "explicit/network/cache2/memory-storage(/M)",
   "kind": 1,
   "units": 0,
   "amount": 64624240,
   "description": "Memory used by the cache storage."
  },
  {
   "process": "Main Process (pid 107192)",
   "path": "explicit/network/cache2/memory-storage(a,/M)",
   "kind": 1,
   "units": 0,
   "amount": 192,
   "description": "Memory used by the cache storage."
  },
  {
   "process": "Main Process (pid 107192)",
   "path": "explicit/network/cache2/disk-storage()",
   "kind": 1,
   "units": 0,
   "amount": 9576790,
   "description": "Memory used by the cache storage."
  }, 


* cache2/memory-storage(/M) = 64'624'240
* explicit/network/cache2/disk-storage() = 9'576'790
* others look sane.

If not adjusted manually, an automatic max size for memory cache is 33'554'432, so we are somewhat out of line here.  The disk storage usage can be unstored data chunks, since as mentioned in comment 15 the machine is slow.  (We have priority and non-priority buckets each 10M by default.)  Anyway, after a long inactivity time, that seems improbable.
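
As a rough cross-check of the numbers above, here is a minimal Python sketch that sums the memory-storage entries and compares the total with the automatic limit. It assumes that "amount" is in bytes (units 0) and that the report entries are available as a list of dicts shaped like the excerpt; the function name is made up for illustration.

```python
MEMORY_CACHE_AUTO_LIMIT = 33_554_432  # automatic max size quoted above, assumed bytes

PREFIX = "explicit/network/cache2/"


def memory_storage_total(reports):
    """Sum the 'amount' of every cache2 memory-storage entry."""
    total = 0
    for report in reports:
        path = report["path"]
        if path.startswith(PREFIX + "memory-storage"):
            total += report["amount"]
    return total


# Entries copied from the excerpt above (only the fields used here).
reports = [
    {"path": "explicit/network/cache2/memory-storage(/M)", "amount": 64624240},
    {"path": "explicit/network/cache2/memory-storage(a,/M)", "amount": 192},
    {"path": "explicit/network/cache2/disk-storage()", "amount": 9576790},
]

total = memory_storage_total(reports)
over_limit = total > MEMORY_CACHE_AUTO_LIMIT
ratio = total / MEMORY_CACHE_AUTO_LIMIT  # a bit under twice the limit
```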


@Tobias, thanks for the report.  Now, can you please try your test with a new profile [1]?  And, can you please share (best sent to my bugzilla email) prefs.js file from your default profile [2] you are reproducing this bug with?

[1] http://kb.mozillazine.org/Creating_a_new_Firefox_profile_on_Windows

    or:
    run cmd
    cd C:\Program Files\Nightly (or where your installation resides)
    firefox.exe -profile C:\path\to\new-profile -no-remote


[2] http://kb.mozillazine.org/Profile_folder_-_Firefox

Flags: needinfo?(Tobias.Besemer)

Tobias B. Besemer [:BesTo] (QA)

Comment 34

•

9 years ago

(In reply to Honza Bambas (:mayhemer) from comment #33)

> * others look sane.

Don't think so!

After restarting FF with Session Restore, FF needs 520MB in Task Manager and has 605MB reserved.
After opening some pages FF needs 815MB and has 895MB reserved.
Miles away from using 2,500MB and having 3,000MB reserved.

On the 1st screenshot you see at the top that there is a "heap-unclassified" that needs ~650MB.
Then in "js-non-window/zones" under "strings" there are a lot of strings loaded.
Every string has thousands of copies.

On the 2nd screenshot you see that even the browser identification string supposedly has 42980 copies and needs ~4MB !!!

On the 3rd screenshot you see that there are also 129MB "string(<non-notable string>)" with a "malloc-heap" / "latin1" with 129MB.
Also a "gc-heap" with 10MB.
I also need 252MB for "compartment" and 497MB for "window-objects".

On the 4th screenshot you see that I need ~10-20MB for each Facebook page.
ATM I need 21MB for https://www.beta.facebook.com/notifications.
For comparison:
For https://de.wikipedia.org/wiki/Versionsgeschichte_von_Mozilla_Firefox with a logged-in user I need 8.65MB ATM.
For a search on dict.leo.org I need 5.50MB ATM.
And for this bug I need 5.45MB ATM.

On the 5th screenshot you see that I need 27.5MB for https://theinternetoffendsme.wordpress.com/2013/04/09/the-real-story-behind-facebook-moderation-and-your-petty-reports/ !
ATM I need 34.79MB for this page, of which DOM needs ~10MB, the layout 8.23MB and the frames 4.10MB !!!

On the 6th screenshot you see that FF loads 3 images from my HD (for the Download Manager?) with "moz-icon" and needs 19MB for each !!!

On the 9th screenshot you see that "about:newtab" needs 202 event-counts.
That sounds like a lot to me, doesn't it?

On the 10th screenshot you see that there are ~4,6 event-counters used for browser.xul+devtools.
ATM I need 4,222 event-counters for the same.
There is also the webconsole with 2,788 event-counts.
I don't have it open ATM so I don't have a comparison.


> The disk storage usage can be unstored data chunks, since as mentioned in
> comment 15 the machine is slow.

I don't know what you mean by "slow".
I just want to browse the web - what kind of machine do I need for that with FF?
With this "slow" machine I had many more windows with many more tabs open in Chrome, with no "out of memory", no hangs and no crashes!
It seems Mozilla still doesn't understand why it keeps losing market share to Chrome ...
One explanation for you:
I - as a power user - have switched to Chrome.
All my friends - as power users - have switched to Chrome.
In the past we all recommended to newbies the browser we all used at that time: Firefox.
Later, we all recommended to newbies the browser we all used at that time: Chrome.
Now, when somebody asks me whether he should use Chrome or FF, I say: "Your decision!"
Any more questions about losing market share ???



> @Tobias, thanks for the report.  Now, can you please try your test with a
> new profile [1]?

Sorry, no.
I really do know how to create a new one, but this profile holds all my logins, cookies, bookmarks ... and testing would take hours or days during which I can't use the PC for working with my normal profile.
And ATM I need this slow machine for working! ;-)
But if I should try changing settings in this profile, deleting or replacing files, ... I will do that!



> And, can you please share (best sent to my bugzilla email)
> prefs.js file from your default profile [2] you are reproducing this bug
> with?

Yes, I will send you the prefs.js.

(Btw.: My prefs.js is 165KB and in the past I got errors about that in the Browser Console directly after starting FF. But I haven't looked at this for a long time.)

Honza Bambas (:mayhemer)

Comment 35

•

9 years ago

(In reply to Tobias B. Besemer from comment #34)
> (In reply to Honza Bambas (:mayhemer) from comment #33)
> 
> > * others look sane.
> 
> Don't think so!

I was talking about what cache2 reported.  I didn't look at the rest at all; it's not a concern for this particular bug.

> > The disk storage usage can be unstored data chunks, since as mentioned in
> > comment 15 the machine is slow.

Comment 15: "Yes, my machine is slow/overwhelmed!"


> > @Tobias, thanks for the report.  Now, can you please try your test with a
> > new profile [1]?
> 
> Sorry, no.
> I really do know how to create a new one, but this profile holds all my
> logins, cookies, bookmarks ... and testing would take hours or days during
> which I can't use the PC for working with my normal profile.
> And ATM I need this slow machine for working! ;-)
> But if I should try changing settings in this profile, deleting or
> replacing files, ... I will do that!

Please leave the profile as is.  You are a precious person to us!  You can easily reproduce this problem and are willing to communicate, which doesn't happen that often ;)  Changes to the profile may destroy the way to reproduce.  If you can create a backup of your profile, that would be great.  If you do so, please also back up the local part of the profile where the HTTP cache resides.  The regular (roaming) part is under %APPDATA%\Mozilla\Firefox\Profiles\xxxxxxxx.default and the local part is under %LOCALAPPDATA%\Mozilla\Firefox\Profiles\xxxxxxxx.default.  Thanks.


Trying with a new profile takes only a few steps:
- run cmd.exe (Windows-R, type "cmd", press enter)
- then type:
- cd C:\Program Files (x86)\Mozilla Firefox\
- firefox -profile C:\Temp\Firefox-New-Profile\ -no-remote

A new Firefox window will open w/o any bookmarks, cookies and browsing history.  Give it a try for a while and see how the memory and performance go.

This step tries to eliminate the influence of preference changes and extensions.  It's better to create a new profile since, as I said above, experimenting with your regular profile settings might ruin the way to reproduce this highly critical bug.

> 
> 
> 
> > And, can you please share (best sent to my bugzilla email)
> > prefs.js file from your default profile [2] you are reproducing this bug
> > with?
> 
> Yes, I will send you the prefs.js.

Thanks.

Lawrence Mandel [:lmandel] (use needinfo)

Comment 36

•

9 years ago

Honza - Are you going to take this bug? If so, 37 is marked as affected and, with only 4 Betas left, it is preferable to get a fix ASAP.

Flags: needinfo?(honzab.moz)

Tobias B. Besemer [:BesTo] (QA)

Comment 37

•

9 years ago

(In reply to Honza Bambas (:mayhemer) from comment #35)
> (In reply to Tobias B. Besemer from comment #34)
> > (In reply to Honza Bambas (:mayhemer) from comment #33)

Sorry for not answering for so long! I was busy with other stuff ...


> > > The disk storage usage can be unstored data chunks, since as mentioned in
> > > comment 15 the machine is slow.
> Comment 15: "Yes, my machine is slow/overwhelmed!"

There was a "but because of FF" after that ... ;-)
But yes, I had the old bill in my hands and the workstation is older than I thought ... ;-)


> > > @Tobias, thanks for the report.  Now, can you please try your test with a
> > > new profile [1]?
> > Sorry, no.
> > I really do know how to create a new one, but this profile holds all my
> > logins, cookies, bookmarks ... and testing would take hours or days during
> > which I can't use the PC for working with my normal profile.
> > And ATM I need this slow machine for working! ;-)
> Trying with a new profile takes only a few steps:

Should I really write something in reply to this ??? ^^

OK, but anyway ... I did it! ;-)

I didn't send you my prefs.js because I had a look at it and it contains a lot of data, e.g. for Sync and NoScript ... also you have no @Mozilla.org email address ... ^^
... So maybe you should ask for one ... ^^ ;-)

OK, but because I was able to reproduce it with a new profile, my prefs.js shouldn't be of interest anymore ... ;-)


> A new Firefox window will open w/o any bookmarks, cookies and browsing
> history.  Give it a try for a time and see how the memory and performance
> goes.

OK, first things first:
- FF38.0a2 seems to use less memory! :-)
- Hanging scripts are now much better! :-)

(But Firefox still hangs for some seconds when I stop a script ... So maybe a little bit less priority?)

I had several different scenarios on shutdown:
- Not much memory used and it works: FF closes the windows immediately and removes itself from memory;
- Much memory used and it doesn't work: As described above, a long time to close the windows, then the process stays in memory and then crashes;
- Not much memory used and it doesn't work: The crash anyway (this was new for me);
- Much memory used but it works anyway (memory counts down in Task Manager);
- Much memory used but a different crash: Crashed during shutdown with mozalloc_abort.

My crashes from the last days:
https://crash-stats.mozilla.com/report/index/7a8003f7-21c9-4ba0-954c-a69cf2150228
https://crash-stats.mozilla.com/report/index/6a7049b7-f1e0-4844-8d93-b8b4f2150306
https://crash-stats.mozilla.com/report/index/90cf406f-8da5-4e29-b527-84eb92150308
https://crash-stats.mozilla.com/report/index/aeb7af60-b491-45d9-a17b-c8b202150308
https://crash-stats.mozilla.com/report/index/e7459fc8-90db-472f-8949-0b3242150312
https://crash-stats.mozilla.com/report/index/8104ba93-e201-4d31-a57a-8e51a2150312

The 4th seems to be from GFX and is included just for completeness. ^^
The 5th is the mozalloc_abort.
The 6th is a @ js::gc::AllocateNonObject (while working[?]).

To reproduce it's normally really simple:
- New profile;
- Log in to Facebook;
- Open e.g. your own profile (no stream needed);
- Wait some hours (e.g. overnight);
- Try to shut down.

Normally - as far as I can remember - the browser can also be in offline mode while it runs for hours.
I get some/a lot of notifications, but I think you can reproduce the crash w/o getting any.
Normally the crash should be reproducible with every page (also Google, ...) that uses complex JS.

As the crash only appears on Windows systems, with complex JS and a long runtime ... and as, as far as I know, Windows has a not so good garbage collection ... I guess that there is a memory leak in JS (or something similar, e.g. pointers, ...) that gets bigger and bigger over hours. Because Windows doesn't manage that, it becomes a problem ... and at shutdown FF can't handle it and crashes.
(The last crash, which happened while I was away and not while shutting down FF, is also a JS crash ...)

I will also upload a new memory report.
Not many tabs open, but a really huge memory use!


Hope that all helps ... Cheers! ^^

Tobias B. Besemer [:BesTo] (QA)

Comment 38

•

9 years ago

Attached file memory-report.json.gz — Details

Flags: needinfo?(Tobias.Besemer)

Tobias B. Besemer [:BesTo] (QA)

Comment 39

•

9 years ago

Btw.: Can I now have my name in the credits of FF/Mozilla under "About" for QA ??? ^^

Tobias B. Besemer [:BesTo] (QA)

Comment 40

•

9 years ago

Bug 1143257 - crash in OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xrealloc | mozilla::CycleCollectedJSRuntime::DeferredFinalize(nsISupports*)

Bug 1143258 - crash in js::gc::AllocateNonObject<JSFatInlineString, int>(js::ExclusiveContext*)

status-firefox-esr38: --- → affected

Tobias B. Besemer [:BesTo] (QA)

Comment 41

•

9 years ago

Bug 1143260 - crash in shutdownhang | RtlEnterCriticalSection | PR_Lock | mozilla::net::CacheFile::~CacheFile()

Tobias B. Besemer [:BesTo] (QA)

Updated

•

9 years ago

Keywords: topcrash-win

Liz Henry (:lizzard) (relman/hg->git project)

Comment 42

•

9 years ago

Tobias, thanks for the help reproducing this bug and giving us more details! Have a look here https://www.mozilla.org/credits/faq/  That explains the criteria and the steps to follow to request your name to go into about:credits. 

Honza, the last 37 Beta goes to build this Thu, Mar 19 since this is a short release cycle. We may end up doing more than one RC build and this is a top crasher so I want to leave this tracked and keep an eye on it.

Liz Henry (:lizzard) (relman/hg->git project)

Comment 43

•

9 years ago

Honza I just realized I forgot to needinfo you. We do still want to take this in 37 even though we're getting to the end of the beta cycle,  because it's a top crash. Are you able to take on this bug?

Lawrence Mandel [:lmandel] (use needinfo)

Comment 44

•

9 years ago

With no progress and it now being the end of the 37 cycle, I'm marking this bug as wontfix for 37. This is a topcrash. Can this please be prioritized for 38?

status-firefox37: affected → wontfix

Flags: needinfo?(jduell.mcbugs)

Honza Bambas (:mayhemer)

Comment 45

•

9 years ago

(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #44)
> With no progress and it now being the end of the 37 cycle, I'm marking this
> bug as wontfix for 37. This is a topcrash. Can this please be prioritized
> for 38?

Hi, sorry, I was sick for almost 2 weeks.  I'll take care of this bug as a priority now.

Flags: needinfo?(honzab.moz)

Honza Bambas (:mayhemer)

Updated

•

9 years ago

Flags: needinfo?(jduell.mcbugs)

Honza Bambas (:mayhemer)

Updated

•

9 years ago

Comment 46

•

9 years ago

I'm looking ATM at https://crash-stats.mozilla.com/report/list?product=Firefox&signature=shutdownhang+|+_VEC_memset#tab-comments for interesting user comments ...

This one is interesting:
https://crash-stats.mozilla.com/report/index/3d0e566b-e078-40bb-a714-f154b2150323
User Comments: I've had frequent Firefox crashes -- I tend to leave my computer on overnight and Firefox simply cannot handle that. My entire computer is so screwed up from it that I have to reboot. Thinking about uninstalling Firefox -- not worth the hassle!

A lot of users are also reporting (like me) that scripts were hanging (in the past) and also that plug-ins hang. It seems to be Flash, but I don't know if this is really related to this crash or just something that has been happening a lot with FF lately ...

Tobias B. Besemer [:BesTo] (QA)

Comment 47

•

9 years ago

https://crash-stats.mozilla.com/report/index/990bdd29-d2f2-4bd7-af52-aa4a82150326
User Comments: Microsoft Windows Msg :"Plugin Container for Firefox has stopped working"
The only program I left open was an Open Office spreadsheet when placing system in sleep mode. When opening system this morning, received message that Firefox had crashed along with message above.

Maybe the crash is also related to hanging scripts.
In my tests over the last few days I noticed that when a script is hanging while FF tries to shut down, the shutdown stops until I have decided what FF should do. That is normally pointless because I am trying to shut down FF ... (FF can continue the script or stop it - it shouldn't matter.)
Is it possible that scripts can still hang in the background and prevent FF from finishing shutdown even when no FF window is displayed in Windows anymore ???

Tobias B. Besemer [:BesTo] (QA)

Comment 48

•

9 years ago

https://crash-stats.mozilla.com/report/index/b12b0ab3-73af-473f-832f-cde872150327
User Comments: set netbook on sleep mode, opened and it crashed

I think this problem after sleep/hibernate (also in Bug 1143866) is similar to what I described before and comes from the hanging scripts.

(In reply to Tobias B. Besemer from comment #19)
> This is how it goes when I come back:
> At first I have problems to log in in the system again - this shows me that
> the scripts hang again.
> When logged in I need ~30-60 min. to stop all scripts, log them, restart FF
> and be able to work again.
> Normally I have hanging scripts from all ambitious page like GMail (and
> Talkgadet/Hangouts), Youtube (ytimg.com), G+ and FB
> (fbstatic-a.akamaihd.net). (And sometimes some jQuery Stuff/Pages.)
> To be clear: All those scripts weren't hanging when I left the PC and FF was
> running fine!

Some users also report high memory use and a lot of pagefile activity.

Tobias B. Besemer [:BesTo] (QA)

Comment 49

•

9 years ago

https://crash-stats.mozilla.com/report/index/0280cb72-0ebb-4422-b4d3-d9ba82150327
User Comments: I had left my computer on overnight with the Gmail page open, when I opened the screen, a message about stopping the script came up. I said to stop script. The screen remained frozen. I closed the browser. When I tried to reopen the browser, this message came up.

Tobias B. Besemer [:BesTo] (QA)

Comment 50

•

9 years ago

I see a lot of reports w/ FF36 and frustrated users in the 279 user comments ...
... maybe somebody should think about "status-firefox37: wontfix" again ...

Tobias B. Besemer [:BesTo] (QA)

Comment 51

•

9 years ago

Btw.: Shutdown has worked much better for me over the last few days! :-)

Tobias B. Besemer [:BesTo] (QA)

Comment 52

•

9 years ago

It seems I am no longer able to reproduce this crash.

I now get these signatures:

[@ shutdownhang | mozilla::dom::OwningNonNull<mozilla::dom::PositionCallback>::~OwningNonNull<mozilla::dom::PositionCallback>() ]
https://crash-stats.mozilla.com/report/index/2a983d25-8616-48bf-9d03-bb1702150329

[@ shutdownhang | nsStringBuffer::Release() ]
https://crash-stats.mozilla.com/report/index/934ebd31-4bd7-4c19-905e-1e43d2150330

Tobias B. Besemer [:BesTo] (QA)

Updated

•

9 years ago

OS: Windows NT → Windows

Sylvestre Ledru [:Sylvestre]

Comment 53

•

9 years ago

Honza, any news on this? Thanks

Flags: needinfo?(honzab.moz)

Sylvestre Ledru [:Sylvestre]

Updated

•

9 years ago

status-firefox39: --- → ?

status-firefox40: --- → ?

Tobias B. Besemer [:BesTo] (QA)

Comment 54

•

9 years ago

FF39: WFM

Tobias B. Besemer [:BesTo] (QA)

Comment 55

•

9 years ago

But 64bit.

Honza Bambas (:mayhemer)

Comment 56

•

9 years ago

I'm starting to be afraid that we (the cache) are simply caught in the 65 sec window after shutdown because of some timeout that is ~60 seconds (a favorite interval ;)).

According to Tobias's comments, it seems so.  The crash stack has changed.
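
One way to read this hypothesis, as a toy Python model rather than the real nsTerminator code: the watchdog blames whatever happens to be running when its deadline passes, so a ~60 second wait elsewhere can push an otherwise short cache teardown over the 65 second mark. The phase names and durations below are invented for illustration.

```python
def blamed_phase(phases, watchdog_secs=65.0):
    """Return the name of the phase running when the watchdog fires.

    phases is a list of (name, duration_in_seconds) run back to back.
    Returns None if shutdown completes before the deadline.
    """
    elapsed = 0.0
    for name, duration in phases:
        elapsed += duration
        if elapsed > watchdog_secs:
            return name
    return None


# Hypothetical shutdown: something unrelated waits ~60 s, then the cache
# needs 10 s to free its entries.
with_timeout = blamed_phase([("unrelated 60s timeout", 60.0),
                             ("cache teardown", 10.0)])

# The same cache teardown on its own finishes well within the deadline.
cache_only = blamed_phase([("cache teardown", 10.0)])
```

In this model the crash signature points at the cache even though most of the time was spent elsewhere, which would also fit the signature changing between builds.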

Better shutdown tracking (what all happens there) would be great.

Anyway, I currently don't have ideas how to quickly proceed here.

Flags: needinfo?(honzab.moz)

Tobias B. Besemer [:BesTo] (QA)

Comment 57

•

9 years ago

I ran some tests with FF39.0a2 32bit, too.
No problems in these tests.
But I will run some more in the coming days.

I have now set FF39.0a2 to "unaffected" (because WFM doesn't exist).


I have a new top shutdown problem that doesn't create a crash or a signature.
Maybe somebody can have a look at it.
Bug 1152113 - FF39.0a2 64bit stays in task manager when try to shut FF down

status-firefox39: ? → unaffected

Sylvestre Ledru [:Sylvestre]

Comment 58

•

9 years ago

This is too late for 38. Hopefully, 39 won't have this coming back.

status-firefox38: affected → wontfix

Tobias B. Besemer [:BesTo] (QA)

Comment 59

•

9 years ago

Versions 39.0a2 & 40.0a1 are still affected. Much less, but still affected.

Aurora last 7 days:
https://crash-stats.mozilla.com/report/list?date=2015-05-10&range_unit=days&range_value=7&signature=shutdownhang+|+_VEC_memset#tab-graph

Nightly last 7 days:
https://crash-stats.mozilla.com/report/list?date=2015-05-10&range_unit=days&range_value=7&signature=shutdownhang+|+_VEC_memset#tab-graph

What about "status-firefox-esr38": wontfix?

status-firefox39: unaffected → affected

status-firefox40: ? → affected

Tobias B. Besemer [:BesTo] (QA)

Comment 60

•

9 years ago

Last 28 days:
https://crash-stats.mozilla.com/report/list?range_unit=days&range_value=28&signature=shutdownhang+|+_VEC_memset

Operating System:
Operating System 	Percentage 	Number Of Crashes
Windows 7 		73.70% 		20102
Windows XP 		8.79% 		2397
Windows Vista 		7.74% 		2110
Windows 8.1 		7.05% 		1923
Windows 8 		1.73% 		472
Windows 10 		0.96% 		261
Windows Unknown 	0.03% 		9

Crashing Thread
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::`anonymous namespace'::RunWatchdog(void*) 	toolkit/components/terminator/nsTerminator.cpp
1 	nss3.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c
2 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c
3 	msvcr120.dll 	_callthreadstartex 	f:\dd\vctools\crt\crtw32\startup\threadex.c:376
4 	msvcr120.dll 	msvcr120.dll@0x2c000 	
5 	kernel32.dll 	BaseThreadInitThunk 	
6 	ntdll.dll 	__RtlUserThreadStart 	
7 	ntdll.dll 	_RtlUserThreadStart

Comments 101. 

Some comments:
can not get on firefox or when i do can not get my mail from prtcnet.org 6.60
Submitted: 2015-12-01T23:25:15+00:00
https://crash-stats.mozilla.com/report/index/f0e312ac-e6c3-4639-ba45-178e52151201

i was playing farmville and it just stopped
Submitted: 2015-11-27T21:08:57+00:00
https://crash-stats.mozilla.com/report/index/edd3e709-2a38-47e1-bedd-22bf62151127

Maybe caused by a video downlader
Submitted: 2015-11-28T20:24:18+00:00
https://crash-stats.mozilla.com/report/index/55c3f7db-1c25-4cb8-8c66-fda0e2151128

this is almost an daily event .... can you fix this ???
Submitted: 2015-12-01T01:23:29+00:00
https://crash-stats.mozilla.com/report/index/5fad3029-6c93-4852-8281-d91a32151201

This is happening too frequently on Microsoft Vista!!! FIX THIS!!!
Submitted: 2015-12-02T09:28:40+00:00
https://crash-stats.mozilla.com/report/index/207a95e6-d177-4e8b-9e40-6b0aa2151202

About 12 time that FF crashed when doing a Windows 8.1 restart. 3 FF windows open, about 10 tabs in each.
Submitted: 2015-11-29T21:24:57+00:00
https://crash-stats.mozilla.com/report/index/f64a868c-dbb2-42dc-9490-b33ed2151129

URL: https://crash-stats.mozilla.com/repor...

status-firefox42: --- → affected

status-firefox43: --- → affected

status-firefox44: --- → affected

status-firefox45: --- → affected

status-thunderbird_esr38: --- → affected

Component: Networking: Cache → Hardware Abstraction Layer (HAL)

Wayne Mery (:wsmwk)

Comment 61

•

8 years ago

#53 crash for Thunderbird 38.5.1. 
At least one user with this signature also crashed with 
shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | _PR_MD_WAIT_CV | _PR_WaitCondVar | PR_WaitCondVar | PR_Wait | mozilla::ReentrantMonitor::Wait | nsEventQueue::GetEvent aka bug 1149287/bug 1224815

Whiteboard: [tbird crash]

alex_mayorga

Comment 62

•

8 years ago

¡Hola!

Ended up here from bp-b9860f85-2e51-43c8-83c2-8709b2160227

12805 crashes in the past week at https://crash-stats.mozilla.com/report/list?product=Firefox&signature=shutdownhang+|+_VEC_memset

¡Gracias!
Alex

Calixte Denizet (:calixte)

Comment 63

•

8 years ago

In release, the crash has been spiking since 2016-05-04 (increased by ~36% until 2016-05-08) and is #11 in top-crashes for 46.0.1 and #10 for 47.0b3.

status-firefox46: --- → affected

status-firefox47: --- → affected

Honza Bambas (:mayhemer)

Updated

•

8 years ago

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → DUPLICATE

Calixte Denizet (:calixte)

Comment 65

•

8 years ago

Crash volume for signature 'shutdownhang | _VEC_memset':
  - beta (48): 4314
  - esr (45): 3144

Affected platform: Windows

status-firefox48: --- → affected

status-firefox-esr45: --- → affected

Honza Bambas (:mayhemer)

Comment 66

•

8 years ago

(In reply to Calixte Denizet (:calixte) from comment #65)
> Crash volume for signature 'shutdownhang | _VEC_memset':
>   - beta (48): 4314

Expected to be fixed by https://hg.mozilla.org/releases/mozilla-beta/rev/ec62fd62c9a7, which landed on 2016-06-21.

Are the beta crashes with a buildid after that date?

>   - esr (45): 3144

This is expected.

> 
> Affected platform: Windows

Flags: needinfo?(cdenizet)

Calixte Denizet (:calixte)

Comment 67

•

8 years ago

(In reply to Honza Bambas (:mayhemer) from comment #66)
> (In reply to Calixte Denizet (:calixte) from comment #65)
> > Crash volume for signature 'shutdownhang | _VEC_memset':
> >   - beta (48): 4314
> 
> Expected to be fixed by
> https://hg.mozilla.org/releases/mozilla-beta/rev/ec62fd62c9a7, which has
> landed on 2016-06-21.
> 
> Are the beta crashes with a buildid after that date?

According to [1] there are 752 crashes with this signature after the patch landed (48.0b3 was released on 2016-06-23).


[1] https://crash-stats.mozilla.com/search/?product=Firefox&release_channel=beta&version=48.0b3&version=48.0b4&version=48.0b5&version=48.0b6&version=48.0b7&version=48.0b8&version=48.0b9&signature=%3Dshutdownhang%20%7C%20_VEC_memset&date=%3E%3D2016-06-21&_sort=-date&_facets=signature&_facets=build_id&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
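
The check asked for above could be sketched like this in Python. It assumes each crash record carries a build_id in the usual YYYYMMDDHHMMSS form; the record layout and the sample build IDs are hypothetical, not taken from crash-stats.

```python
from datetime import date, datetime


def build_date(build_id):
    """Parse a 14-digit build ID (YYYYMMDDHHMMSS) into a calendar date."""
    return datetime.strptime(build_id, "%Y%m%d%H%M%S").date()


def crashes_after_landing(crashes, landed):
    """Keep only crashes whose build was made after the landing date."""
    return [c for c in crashes if build_date(c["build_id"]) > landed]


landed = date(2016, 6, 21)  # landing date of the patch, from comment 66

# Hypothetical sample records.
crashes = [
    {"build_id": "20160620091522", "version": "48.0b2"},
    {"build_id": "20160623122823", "version": "48.0b3"},
]

still_crashing = crashes_after_landing(crashes, landed)
```

Filtering on the build ID rather than the crash date matters here, because crashes reported after the landing date can still come from older builds.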


> 
> >   - esr (45): 3144
> 
> This is expected.
> 
> > 
> > Affected platform: Windows

Flags: needinfo?(cdenizet)

Honza Bambas (:mayhemer)

Comment 68

•

8 years ago

(In reply to Calixte Denizet (:calixte) from comment #67)
> (In reply to Honza Bambas (:mayhemer) from comment #66)
> > (In reply to Calixte Denizet (:calixte) from comment #65)
> > > Crash volume for signature 'shutdownhang | _VEC_memset':
> > >   - beta (48): 4314
> > 
> > Expected to be fixed by
> > https://hg.mozilla.org/releases/mozilla-beta/rev/ec62fd62c9a7, which has
> > landed on 2016-06-21.
> > 
> > Are the beta crashes with a buildid after that date?
> 
> According to [1] there are 752 crashes with this signature after the patch
> has landed (48.0b3 has been released on 2016-06-23).
> 
> 
> [1]
> https://crash-stats.mozilla.com/search/
> ?product=Firefox&release_channel=beta&version=48.0b3&version=48.
> 0b4&version=48.0b5&version=48.0b6&version=48.0b7&version=48.0b8&version=48.
> 0b9&signature=%3Dshutdownhang%20%7C%20_VEC_memset&date=%3E%3D2016-06-
> 21&_sort=-
> date&_facets=signature&_facets=build_id&_columns=date&_columns=signature&_col
> umns=product&_columns=version&_columns=build_id&_columns=platform#crash-
> reports
> 
> 
> > 
> > >   - esr (45): 3144
> > 
> > This is expected.
> > 
> > > 
> > > Affected platform: Windows

Michal, any idea what's still wrong here?

Flags: needinfo?(michal.novotny)

Michal Novotny [:michal]

Comment 69

•

8 years ago

(In reply to Honza Bambas (:mayhemer) from comment #68)
> > According to [1] there are 752 crashes with this signature after the patch
> > has landed (48.0b3 has been released on 2016-06-23).
> > 
> > [1]
> > https://crash-stats.mozilla.com/search/
> > ?product=Firefox&release_channel=beta&version=48.0b3&version=48.
> > 0b4&version=48.0b5&version=48.0b6&version=48.0b7&version=48.0b8&version=48.
> > 0b9&signature=%3Dshutdownhang%20%7C%20_VEC_memset&date=%3E%3D2016-06-
> > 21&_sort=-
> > date&_facets=signature&_facets=build_id&_columns=date&_columns=signature&_col
> > umns=product&_columns=version&_columns=build_id&_columns=platform#crash-
> > reports
> 
> Michal, any idea what's still wrong here?

No, but I can't find any cache-related shutdown hang using the link above. For some reason crash-stats returns "No results were found" on the second and any further result page, so I could check only the first 50 results, and none of those crashes is caused by CacheFileMetadata or any other cache code.

Calixte, please open a new bug for those crashes with the correct signature.

Flags: needinfo?(michal.novotny)

Example memory report 9 years ago Tobias B. Besemer [:BesTo] (QA) 631.00 KB, application/x-gzip		Details
memory-report_(2).json.gz 9 years ago Tobias B. Besemer [:BesTo] (QA) 718.78 KB, application/x-gzip		Details
2015-02-21 05_23_47-about_memory - Firefox Developer Edition (Build 20150219004126).png 9 years ago Tobias B. Besemer [:BesTo] (QA) 38.06 KB, image/png		Details
2015-02-21 05_25_14-about_memory - Firefox Developer Edition (Build 20150219004126).png 9 years ago Tobias B. Besemer [:BesTo] (QA) 114.19 KB, image/png		Details
2015-02-21 05_26_14-about_memory - Firefox Developer Edition (Build 20150219004126).png 9 years ago Tobias B. Besemer [:BesTo] (QA) 36.57 KB, image/png		Details
memory-report_newstart.json.gz 9 years ago Tobias B. Besemer [:BesTo] (QA) 588.56 KB, application/x-gzip		Details
FF38.0a2_-_20150223_-_memory-report_(by_BesTo).zip 9 years ago Tobias B. Besemer [:BesTo] (QA) 574.53 KB, application/x-zip-compressed		Details
memory-report.json.gz 9 years ago Tobias B. Besemer [:BesTo] (QA) 257.96 KB, application/x-gzip		Details