Closed
Bug 705761
Opened 12 years ago
Closed 8 years ago
chromehang | ZwQueryFullAttributesFile in nsDiskCacheDevice::Shutdown()
Categories
(Core :: Networking: Cache, defect)
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: jimm, Assigned: michal)
References
Details
(Keywords: hang, Whiteboard: [Snappy:P1])
Crash Data
Attachments
(1 file)
1.76 KB,
patch
|
jduell.mcbugs
:
review+
briansmith
:
review-
|
Details | Diff | Splinter Review |
All appear to be in the same place, occurring while deleting a "trash" folder. 0|0|ntdll.dll|ZwQueryFullAttributesFile|||0xa 0|1|KERNELBASE.dll||||0x81cc 0|2|xul.dll|nsAString_internal::ReplacePrepInternal(unsigned int,unsigned int,unsigned int,unsigned int)|hg:hg.mozilla.org/mozilla-central:xpcom/string/src/nsTSubstring.cpp:c58bad0b4640|209|0x10 0|3|xul.dll|nsAString_internal::Replace(unsigned int,unsigned int,nsSubstringTuple const &)|hg:hg.mozilla.org/mozilla-central:xpcom/string/src/nsTSubstring.cpp:c58bad0b4640|530|0xf 0|4|xul.dll|nsAString_internal::Assign(wchar_t const *,unsigned int)|hg:hg.mozilla.org/mozilla-central:xpcom/string/src/nsTSubstring.cpp:c58bad0b4640|336|0x15 0|5|xul.dll|GetFileInfo|hg:hg.mozilla.org/mozilla-central:xpcom/io/nsLocalFileWin.cpp:c58bad0b4640|460|0xf 0|6|KERNELBASE.dll||||0x13c5 0|7|xul.dll|nsLocalFile::ResolveAndStat()|hg:hg.mozilla.org/mozilla-central:xpcom/io/nsLocalFileWin.cpp:c58bad0b4640|814|0xd 0|8|xul.dll|nsLocalFile::AppendInternal(nsString const &,bool)|hg:hg.mozilla.org/mozilla-central:xpcom/io/nsLocalFileWin.cpp:c58bad0b4640|1107|0x44 0|9|xul.dll|IsShortcutPath|hg:hg.mozilla.org/mozilla-central:xpcom/io/nsLocalFileWin.cpp:c58bad0b4640|320|0x4b 0|10|xul.dll|nsLocalFile::IsDirectory(bool *)|hg:hg.mozilla.org/mozilla-central:xpcom/io/nsLocalFileWin.cpp:c58bad0b4640|2471|0x4 0|11|xul.dll|nsLocalFile::Remove(bool)|hg:hg.mozilla.org/mozilla-central:xpcom/io/nsLocalFileWin.cpp:c58bad0b4640|1833|0x13 0|12|xul.dll|nsCOMPtr_base::assign_from_qi(nsQueryInterface,nsID const &)|hg:hg.mozilla.org/mozilla-central:obj-firefox/xpcom/build/nsCOMPtr.cpp:c58bad0b4640|96|0x18 0|13|xul.dll|nsDirEnumerator::HasMoreElements(bool *)|hg:hg.mozilla.org/mozilla-central:xpcom/io/nsLocalFileWin.cpp:c58bad0b4640|652|0x28 0|14|xul.dll|nsCOMPtr<nsIFile>::nsCOMPtr<nsIFile>(nsQueryInterface)|hg:hg.mozilla.org/mozilla-central:obj-firefox/dist/include/nsCOMPtr.h:c58bad0b4640|623|0xe 0|15|xul.dll|nsLocalFile::Remove(bool)|||0xc2 0|16|xul.dll|nsLocalFile::Remove(bool)|||0xc2 0|17|xul.dll|DeleteDirThreadFunc|hg:hg.mozilla.org/mozilla-central:netwerk/cache/nsDeleteDir.cpp:c58bad0b4640|53|0xe 0|18|xul.dll|DeleteDir(nsIFile *,bool,bool,unsigned int)|hg:hg.mozilla.org/mozilla-central:netwerk/cache/nsDeleteDir.cpp:c58bad0b4640|115|0x7 0|19|xul.dll|GetTrashDir(nsIFile *,nsCOMPtr<nsIFile> *)|hg:hg.mozilla.org/mozilla-central:netwerk/cache/nsDeleteDir.cpp:c58bad0b4640|144|0xf 0|20|xul.dll|nsLocalFile::Exists(bool *)|hg:hg.mozilla.org/mozilla-central:xpcom/io/nsLocalFileWin.cpp:c58bad0b4640|2252|0x7 0|21|xul.dll|nsDiskCacheDevice::Shutdown()|||0x11
![]() |
Reporter | |
Updated•12 years ago
|
Crash Signature: [@ chromehang | ZwQueryFullAttributesFile ] → [@ chromehang | ZwQueryFullAttributesFile ]
[@ chromehang | ZwClose ]
[@ chromehang | NtClose ]
![]() |
Reporter | |
Comment 1•12 years ago
|
||
Those last two signatures include some other cases, but profile shutdown and the deleting of this folder appear to be the most common.
Comment 3•12 years ago
|
||
Most, but not all, of the dozen hangs I looked at in NtQueryFullAttributesFile are also in ::Shutdown().
Crash Signature: [@ chromehang | ZwQueryFullAttributesFile ]
[@ chromehang | ZwClose ]
[@ chromehang | NtClose ] → [@ chromehang | ZwQueryFullAttributesFile ]
[@ chromehang | ZwClose ]
[@ chromehang | NtClose ]
[@ chromehang | NtQueryFullAttributesFile]
Comment 4•12 years ago
|
||
Stack in comment one is due to browser invalidating our disk cache. Is there a reason this bug isn't under networking?
Comment 5•12 years ago
|
||
Nevermind, it's something in netwerk that removes .Trash, no idea what that directory is used for
Updated•12 years ago
|
Component: XPCOM → Networking: Cache
QA Contact: xpcom → networking.cache
Whiteboard: [Snappy:P1]
Comment 6•12 years ago
|
||
(In reply to Taras Glek (:taras) from comment #5) > Nevermind, it's something in netwerk that removes .Trash, no idea what that > directory is used for That's nsDeleteDir, isn't it? (ref bug 670911)
Comment 7•12 years ago
|
||
Is this on trunk or a release? Michal fixed a race condition bug with the trash deletion that IIRC should be in aurora at this point. Might be a dupe of bug 695003?
Comment 8•12 years ago
|
||
Oh, I'm wrong--bug 695003 hasn't landed yet. We did fix another cache deletion issue in bug 681407, but the stack trace is different. But if this is somehow related to either 1) multiple threads deleting the same directory, or 2) the delete thread continuing to delete stuff too late in XPCOM shutdown, then bug 695003 might still be our answer. Assigning to michal for now--let me know if you get stuck or can't get to it soon. Do we have any idea how common this is? Would be nice to help us prioritize. If this is low-frequency, one answer might just be to land bug 695003 and see if this goes away.
Assignee: nobody → michal.novotny
Comment 9•12 years ago
|
||
(In reply to Jason Duell (:jduell) from comment #8) > Do we have any idea how common this is? With combined signatures, it's #1 top crasher in 11.0a1 from build 20111126 to build 20111129 (hang detector turn on and plugin time out lower than chrome time out).
Depends on: 695003
Updated•12 years ago
|
Crash Signature: [@ chromehang | ZwQueryFullAttributesFile ]
[@ chromehang | ZwClose ]
[@ chromehang | NtClose ]
[@ chromehang | NtQueryFullAttributesFile] → [@ chromehang | ZwQueryFullAttributesFile ]
[@ chromehang | ZwClose ]
[@ chromehang | NtClose ]
[@ chromehang | NtQueryFullAttributesFile]
[@ chromehang | NtOpenFile]
[@ chromehang | ZwOpenFile]
[@ chromehang | ZwQueryDirectoryFile]
![]() |
Reporter | |
Comment 10•12 years ago
|
||
Generally happens more on Win7 than other os, and as scoob pointed out if you combine the various signatures this shows up under it's very common. ZwQueryFullAttributesFile: https://crash-stats.mozilla.com/report/list?range_value=7&range_unit=days&date=2011-11-30&signature=chromehang%20|%20ZwQueryFullAttributesFile&version=Firefox%3A11.0a1 To find the actual stack, you have to look at the raw dump in these. See bug 706027 for background on the hang detector.
![]() |
Reporter | |
Comment 11•12 years ago
|
||
Correction - bug 429592 is the hang detector bug.
Comment 12•12 years ago
|
||
BTW what counts as a "hang?" for the hang detector? Is it doing a Turing-complete analysis to make sure the program will never exit (sorry-that's a joke :) or are we just hitting a timeout? The cache deletion may take a while (and used to cause a crash in come cases--that's fixed) but not an actual permanent hang. As long as this isn't a permanent hang, I'm fairly confident bug 695003 will fix this for all but the (rare) cases where a user has selected "clear private data" AND their browser session has accumulated a large cache, in which case it's still a *feature* that we wait for the whole cache to be deleted: (see bug 695003 comment 27).
Comment 13•12 years ago
|
||
By "clear private data" I mean Prefs | Privacy | custom settings for history | clear history when APP closes. Don't have data on how many users use it, but suspect it's a small percentage (and only a small percent of those will have long enough sessions to suffer long shutdown hangs, hopefully. And they asked for it :)
Comment 14•12 years ago
|
||
I think it triggers when main thread hasn't responded in 30 seconds.
![]() |
||
Comment 15•12 years ago
|
||
(In reply to Jason Duell (:jduell) from comment #12) > BTW what counts as a "hang?" for the hang detector? See the patch in bug 429592 for details, but it's when our main thread doesn't respond to pings from another thread for 30 seconds, AFAIK.
![]() |
Reporter | |
Comment 16•12 years ago
|
||
On windows it's tied into the main event dispatch loop: http://mxr.mozilla.org/mozilla-central/source/widget/src/windows/nsAppShell.cpp#318
Assignee | ||
Comment 17•12 years ago
|
||
As Jason already wrote, most of the crashes should be now fixed by bug 695003. This patch suspends the hang detector in case that user wants to clear the cache on shutdown.
Attachment #578835 -
Flags: review?(jduell.mcbugs)
Assignee | ||
Comment 18•12 years ago
|
||
There are also a rare cases in the crash stats that will be fixed neither by 695003 nor by this bug (e.g. https://crash-stats.mozilla.com/report/index/ff8ade26-f283-4c3c-9321-76a2b2111128). I'll fill a follow-up bug for this issue.
Assignee | ||
Comment 19•12 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #18) > I'll fill a follow-up bug for this issue. bug #707436
Comment 20•12 years ago
|
||
(In reply to Jason Duell (:jduell) from comment #13) > By "clear private data" I mean Prefs | Privacy | custom settings for history > | clear history when APP closes. Don't have data on how many users use it, > but suspect it's a small percentage (and only a small percent of those will > have long enough sessions to suffer long shutdown hangs, hopefully. And > they asked for it :) I just posted on dev.platform, saying that we should remove this feature, partially for this reason. Please comment in that thread.
Comment 21•12 years ago
|
||
https://crash-stats.mozilla.com/report/index/bp-3239f21b-f08d-4d8f-9601-299852111212 Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:11.0a1) Gecko/20111212 Firefox/11.0a1 ID:20111212031122 Should I file another bug or this one covers this hang?
Comment 22•12 years ago
|
||
Comment on attachment 578835 [details] [diff] [review] fix Review of attachment 578835 [details] [diff] [review]: ----------------------------------------------------------------- Michal: I don't think we'll need this once bug 709262 lands, correct?
Attachment #578835 -
Flags: review?(jduell.mcbugs)
Comment 23•12 years ago
|
||
Can verify that the incidence of this has gone down significantly? We shouldn't be seeing this hang (or at best should see it a very small % of the time, for users who have "Clear private data at shutdown set", and/or hit bug 707436) as of builds after 12/4, ie after https://bugzilla.mozilla.org/show_bug.cgi?id=695003#c34
Assignee | ||
Comment 24•12 years ago
|
||
Comment on attachment 578835 [details] [diff] [review] fix (In reply to Jason Duell (:jduell) from comment #22) > Michal: I don't think we'll need this once bug 709262 lands, correct? See https://bugzilla.mozilla.org/show_bug.cgi?id=709964#c1
Attachment #578835 -
Flags: review?(jduell.mcbugs)
![]() |
||
Comment 25•12 years ago
|
||
(In reply to Jason Duell (:jduell) from comment #23) > Can verify that the incidence of this has gone down significantly? All chromehang signatures have faded away as we disabled the hang monitor that made them happen on 2011-11-29 in bug 705748, so Nightly builds since 2011-11.30 don't see those signatures any more. That didn't make the hangs go away, it just made them not report crashes any more.
Assignee | ||
Comment 26•12 years ago
|
||
(In reply to alex_mayorga from comment #21) > https://crash-stats.mozilla.com/report/index/bp-3239f21b-f08d-4d8f-9601- > 299852111212 > > Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:11.0a1) Gecko/20111212 > Firefox/11.0a1 ID:20111212031122 > > Should I file another bug or this one covers this hang? This is a different hang, file another bug.
Comment 27•12 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #26) > (In reply to alex_mayorga from comment #21) > > https://crash-stats.mozilla.com/report/index/bp-3239f21b-f08d-4d8f-9601- > > 299852111212 > > > > Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:11.0a1) Gecko/20111212 > > Firefox/11.0a1 ID:20111212031122 > > > > Should I file another bug or this one covers this hang? > > This is a different hang, file another bug. Filed bug 710705
Comment 28•12 years ago
|
||
Comment on attachment 578835 [details] [diff] [review] fix Review of attachment 578835 [details] [diff] [review]: ----------------------------------------------------------------- OK, this will now be a rare case, but if you want to suspend the hang monitor for it I'm OK with that.
Attachment #578835 -
Flags: review?(jduell.mcbugs) → review+
Comment 29•12 years ago
|
||
Comment on attachment 578835 [details] [diff] [review] fix I think that this will be very rare now, but this still isn't the right fix. The right fix will be whatever we do for bug 648232, where we block the UI the "right" way without making the UI unresponsive (e.g. modal dialog box, like we use in PSM for long-running operations). I don't think we need this patch to wallpaper over things because bug 705748 already temporarily disabled in bug 705748. I suggest we make this (or a follow-up bug) depend on bug 648232, and implement the fix for bug 648232 with this in mind. If we do that, then we could do the "clear cache" operation upon closing the preferences dialog box, which would cover the edge case that we are worried about.
Attachment #578835 -
Flags: review-
Comment 30•12 years ago
|
||
(In reply to Brian Smith (:bsmith) from comment #29) > I don't think we need this patch to wallpaper over things because bug 705748 > already temporarily disabled in bug 705748. In English: I don't think we need this patch to wallpaper over the problem because bug 705748 already temporarily disables the hang detector completely.
Comment 31•10 years ago
|
||
Well after chain-crashing about 12 more times in a row, I figured I'd toss in some wisdom from one of my MOZILLA CRASH REPORTs (under the RAW DUMP tab). https://crash-stats.mozilla.com/report/index/532a04c4-abae-449e-b5de-8304e2131023 OS|Windows NT|5.1.2600 Service Pack 3 CPU|x86|AuthenticAMD family 15 model 107 stepping 2|2 Crash|EXCEPTION_BREAKPOINT|0x3e119c|10 Module|mozalloc.dll|24.0.0.5001|mozalloc.pdb|C4CA4FDDBAD64EB8A36992C68FF8B63F1|0x003e0000|0x003e5fff|0 10|0|mozalloc.dll|mozalloc_abort(char const * const)|hg:hg.mozilla.org/releases/mozilla-release:memory/mozalloc/mozalloc_abort.cpp:7c3b0732e765|30|0x0 10|1|xul.dll|NS_DebugBreak|hg:hg.mozilla.org/releases/mozilla-release:xpcom/base/nsDebugImpl.cpp:7c3b0732e765|417|0xb And of course a whole bunch of other stuff, but the critical break is mozalloc.dll|mozalloc_abort(char const * const) (Which probably explains why the thing chews up 1.7 GB of RAM & 1.5 GB of disk space before the freakout goes into chain-loop crashing.) Bug 901346 - crash in mozalloc_abort(char const* const) | js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) https://bugzilla.mozilla.org/show_bug.cgi?id=901346 2012-01-08 20:06 PST by rtrappman Bug 716443 - Large Data Submission causes crash: [@ mozalloc_abort(char const* const) | mozalloc_handle_oom() | xul.dll@0x126615 ] via js::Shape::set https://bugzilla.mozilla.org/show_bug.cgi?id=716443 2013-08-04 09:36 PDT by Marcia Knous [:marcia]
Comment 32•10 years ago
|
||
Thankfully this crash in mozalloc_abort(char const* const) is "resolved" as in STILL CRASHING EVERYONE'S COMPUTERS. The new definition of "Resolved" = "Never Fixed, Ignored for a decade".
Gridlock: you're in the wrong bug. Also, if you take a look at http://dxr.mozilla.org/mozilla-central/source/memory/mozalloc/mozalloc_abort.cpp?from=mozalloc_abort#l22, you will realize that mozalloc_abort crashes *by definition* - it is one of our main assertion failure mechanisms.
Comment 34•10 years ago
|
||
https://bugzilla.mozilla.org/show_bug.cgi?id=637282 Bug 637282 - Crash [@ mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::plugins::PPluginInstance::Transition(mozilla::plugins::PPluginInstance::State, mozilla::ipc::Trigger, mozilla::plugins::PPluginInstance::State*) ] Reported: 2011-02-28 05:08 PST by Scoobidiver (away) SO why exactly are we filling the a CHAR length variable with the square of a constant variable? char = Character or small integer. 1byte = signed: -128 to 127 unsigned: 0 to 255 What is the value of the CONST variable and is it bigger than one byte? If it is, then the source of the crash is pretty obvious to even a first year computer programmer. Conflict stemming from implicit retyping (which can lead to buffer overflows which corrupt your active dataspaces). http://www.4d.com/docs/CMU/CMU10048.HTM Some functions return variables of a very precise data type. Assigning the result of one of such variables to a variable already typed differently will cause a data type conflict if you are not careful.
Comment 35•10 years ago
|
||
(In reply to David Rajchenbach Teller [:Yoric] from comment #33) > Gridlock: you're in the wrong bug. Also, if you take a look at > http://dxr.mozilla.org/mozilla-central/source/memory/mozalloc/mozalloc_abort. > cpp?from=mozalloc_abort#l22, you will realize that mozalloc_abort crashes > *by definition* - it is one of our main assertion failure mechanisms. I see... and why is this happening on WindowsXP if this deliberate bug is supposed to be catching only UNIX Malloc errors? #if defined(XP_WIN) || defined(XP_OS2) # define MOZALLOC_EXPORT __declspec(dllexport) #endif Going to the source code, I have the message in the code. MOZALLOC_EXPORT #if !defined(__arm__) MOZ_NORETURN #endif void mozalloc_abort(const char* const msg); #endif /* ifndef mozilla_mozalloc_abort_h */ And yet the error is mozalloc_abort(char const* const) mozalloc_abort(char const * const) It is just some very messy looking pointer coding to me and the error message looks like a byte-length character variable is being squared by a constant reference variable which is exceptionally confusing to a casual viewer. Are you certain there isn't some unintentional hijinks in play with Wild Pointers or Dangling Pointers? It looks like this is something that should have a second process in play watching the memory allocation exceptions to ensure that a cleanup is double-checking to make certain memory is actually being released rather than being left to zombie memory allocations after being hooked for process observation. (Like a maid patiently checking that the garbage in the trash cans have all been tossed into the incinerator)
Comment 36•10 years ago
|
||
Sorry about the annoyance, but given the quirky aspects of C++ versus standard C the error message appears like a bad variable initialization math expression rather than an odd choice of pointer allocation exception handling.
Gridlock, let's move this conversation to IRC. Here, we are adding noise in an unrelated bug. ( irc.mozilla.org, channels #introduction or #firefox )
Comment 38•10 years ago
|
||
I wanted to at least briefly answer comment #36, since I'm not sure I understand your confusion. 1) |const char * const| is the same type as |char const * const| - a read-only pointer to one or more read-only chars. The latter is arguably more consistent , but allowing both |const char| and |char const| is just a quirk of the language. To be clear: |char const *| is a non-const pointer to const chars, and |char * const| is a const pointer to non-const chars. 2) This parameter just refers to a null-terminated ASCII string, presumably intended to indicate why the malloc failed (in your case, it sounds like it ran out of address space). There's no type punning or other weirdness going on. I hope that clears things up - if not, I must be missing something about what you're saying.
![]() |
||
Comment 39•10 years ago
|
||
chromehang crashing is turned off by default in current versions. You have turned it on specifically in your preferences, I think, otherwise you cannot get to even crashing this way.
![]() |
||
Comment 40•10 years ago
|
||
Oh, and BTW, in mozalloc_abort crashes, the most important info is in the abort message, which you will find in the app notes field: xpcom_runtime_abort(###!!! ABORT: HangMonitor triggered: file e:/builds/moz2_slave/rel-m-rel-w32_bld-000000000000/build/xpcom/threads/HangMonitor.cpp, line 112) But then, this code is only active if the preference hangmonitor.timeout is set to anything else than the default value of 0. Check about:config to see if that is set to anything else and reset it to disable this code.
Comment 41•9 years ago
|
||
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #29) > Comment on attachment 578835 [details] [diff] [review] > fix > > I think that this will be very rare now, but this still isn't the right fix. > The right fix will be whatever we do for bug 648232, where we block the UI > the "right" way without making the UI unresponsive (e.g. modal dialog box, > like we use in PSM for long-running operations). > > I don't think we need this patch to wallpaper over things because bug 705748 > already temporarily disabled in bug 705748. > > I suggest we make this (or a follow-up bug) depend on bug 648232, and > implement the fix for bug 648232 with this in mind. If we do that, then we > could do the "clear cache" operation upon closing the preferences dialog > box, which would cover the edge case that we are worried about. This was fixed in 32. So this bug can be closed?
Flags: needinfo?(brian)
Comment 42•9 years ago
|
||
I have no idea what's happening with the disk cache nowadays.
Flags: needinfo?(brian)
![]() |
||
Comment 44•8 years ago
|
||
We are fully async with cache2, and this code is triggered not any more (even for app cache).
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(honzab.moz)
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•