Closed Bug 1267980 Opened 4 years ago Closed 4 years ago

crash in shutdownhang | memset | free_impl | mozilla::net::CacheFileChunk::~CacheFileChunk

Categories

(Core :: Networking, defect, critical)

48 Branch
All
Windows
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox48 --- fixed
firefox49 --- fixed

People

(Reporter: marco, Assigned: michal)

References

Details

(Keywords: crash, Whiteboard: [necko-active])

Crash Data

Attachments

(2 files, 4 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-ca6c7447-db95-412a-a303-22ebc2160426.
=============================================================

Filing this bug because there is no associated report to that signature yet.

This is #49 top crasher on Firefox 48.
See Also: → 1267981
Flags: needinfo?(michal.novotny)
Assigning to Michal for investigation. Please dupe if we already have a bug for this.
Assignee: nobody → michal.novotny
Whiteboard: [necko-active]
It seems that there are a lot of entries and freeing them takes too much time. But why is releasing a memory so slow? Especially in CacheFileChunk we're releasing a single large 256kB buffer. Also it is strange that this bug and bug 1267981 are Windows only bugs.

We could probably try to measure how much time it takes to free all CacheEntry instances together with all their members.

If it's really slow then easiest is IMO to leak the memory during shutdown. We could also create a memory pool for buffers in cache code, but I'd like to avoid that, especially if we're not sure that freeing memory in large blocks would solve the problem.

Honza, do you have any opinion?
Flags: needinfo?(michal.novotny) → needinfo?(honzab.moz)
(In reply to Michal Novotny (:michal) from comment #2)
> we're releasing a single large 256kB buffer. Also it is strange that this
> bug and bug 1267981 are Windows only bugs.

It's not: https://crash-stats.mozilla.com/signature/?signature=shutdownhang+%7C+memset_pattern4+%7C+memset+%7C+arena_dalloc+%7C+free+%7C+mozilla%3A%3Anet%3A%3ACacheFileChunk%3A%3A%7ECacheFileChunk#aggregations

We do poisoning on the memory, that is the slow part.

> 
> If it's really slow then easiest is IMO to leak the memory during shutdown.

That's not a bad idea, actually.

> We could also create a memory pool for buffers in cache code, 

There are more arguments against it, we don't want large allocations and mainly to allocate memory that will always be to some, probably significant, ratio unused.



Nick or Benjamin, how to allocate memory that would avoid poisoning it during its deallocation?
Flags: needinfo?(n.nethercote)
Flags: needinfo?(honzab.moz)
Flags: needinfo?(benjamin)
(In reply to Honza Bambas (:mayhemer) from comment #3)
> > If it's really slow then easiest is IMO to leak the memory during shutdown.
> That's not a bad idea, actually.
If you do that, be sure to guard the cleanup code behind NS_FREE_PERMANENT_DATA rather than DEBUG, or LSan will report leaks.
> Nick or Benjamin, how to allocate memory that would avoid poisoning it
> during its deallocation?

I don't think it's possible to avoid it for heap allocations.

You could instead use mmap/VirtualAlloc, but that requires some platform-specific code. There are plenty of examples in the codebase where we do that.
Flags: needinfo?(n.nethercote)
(In reply to Nicholas Nethercote [:njn] from comment #5)
> You could instead use mmap/VirtualAlloc, 

Please see bug 1015957 comment 2 why we have decided not to do it.  However, if there is a potential to make this work somehow well, I would be all for it!
Leaking memory during shutdown is perfectly reasonable in non-leakchecking builds. I don't know the details about NS_FREE_PERMANENT_DATA or whatnot that are required to get that correct.
Flags: needinfo?(benjamin)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #7)
> Leaking memory during shutdown is perfectly reasonable in non-leakchecking
> builds. I don't know the details about NS_FREE_PERMANENT_DATA or whatnot
> that are required to get that correct.

In other words, leaking this in release builds is another option, OK.


Michal, how hard would it be to leak chunk buffers?
Flags: needinfo?(michal.novotny)
Attached patch fix (obsolete) — Splinter Review
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7065be6d8bfd
Flags: needinfo?(michal.novotny)
Attachment #8749946 - Flags: review?(honzab.moz)
Why are you introducing a new flag when we already have CacheFileIOManager::mShutdownDemanded set on (almost) the exact same place?  Rather have just one?
Michal, comment 10.
Flags: needinfo?(michal.novotny)
Attached patch patch v2 (obsolete) — Splinter Review
(In reply to Honza Bambas (:mayhemer) from comment #10)
> Why are you introducing a new flag when we already have
> CacheFileIOManager::mShutdownDemanded set on (almost) the exact same place? 
> Rather have just one?

It is set too late. The crashes reported in this bug happen before CacheFileIOManager::Shutdown is called. But you're right we can have a single flag. I moved mShutdownDemandedTime into CacheObserver and it is now used in both cases. Note that I had to change TimeStamp to PRIntervalTime to be able to make it atomic because it is now used on multiple threads. Also sMaxShutdownIOLag should be IMO atomic and I changed it to unsigned int because I was too lazy to add mozilla::Preferences::AddAtomicIntVarCache() method.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=58dfc0520bcd
Attachment #8749946 - Attachment is obsolete: true
Attachment #8749946 - Flags: review?(honzab.moz)
Flags: needinfo?(michal.novotny)
Attachment #8750217 - Flags: review?(honzab.moz)
Comment on attachment 8750217 [details] [diff] [review]
patch v2

Sorry, this patch is incomplete, I didn't remove mShutdownDemanded from CacheFileIOManager...
Attachment #8750217 - Attachment is obsolete: true
Attachment #8750217 - Flags: review?(honzab.moz)
Attached patch patch v3 (obsolete) — Splinter Review
complete patch

https://treeherder.mozilla.org/#/jobs?repo=try&revision=890c7ec9a3fa
Attachment #8750256 - Flags: review?(honzab.moz)
(In reply to Michal Novotny (:michal) from comment #12)
> Created attachment 8750217 [details] [diff] [review]
> patch v2
> 
> (In reply to Honza Bambas (:mayhemer) from comment #10)
> > Why are you introducing a new flag when we already have
> > CacheFileIOManager::mShutdownDemanded set on (almost) the exact same place? 
> > Rather have just one?
> 
> It is set too late. The crashes reported in this bug happen before
> CacheFileIOManager::Shutdown is called. But you're right we can have a
> single flag. I moved mShutdownDemandedTime into CacheObserver and it is now
> used in both cases. 

OK.

> Note that I had to change TimeStamp to PRIntervalTime to
> be able to make it atomic because it is now used on multiple threads. 

If there is no lock to go with, then yes, OK.

> Also
> sMaxShutdownIOLag should be IMO atomic and I changed it to unsigned int
> because I was too lazy to add mozilla::Preferences::AddAtomicIntVarCache()
> method.

There is one! :)  https://hg.mozilla.org/mozilla-central/annotate/043082cb7bd8/modules/libpref/Preferences.h#l288

Feel free to make sMaxShutdownIOLag uint.

> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=58dfc0520bcd
(In reply to Honza Bambas (:mayhemer) from comment #15)
> There is one! :) 
> https://hg.mozilla.org/mozilla-central/annotate/043082cb7bd8/modules/libpref/
> Preferences.h#l288
> 
> Feel free to make sMaxShutdownIOLag uint.

That's exactly what I did. I changed it from int to uint to be able to use the existing method :)
Crash Signature: [@ shutdownhang | memset | free_impl | mozilla::net::CacheFileChunk::~CacheFileChunk] → [@ shutdownhang | memset | free_impl | mozilla::net::CacheFileChunk::~CacheFileChunk] [@ shutdownhang | memset | free_impl | mozilla::net::CacheFileMetadata::~CacheFileMetadata] [@ shutdownhang | memset | arena_dalloc_large | je_free | mozilla::net::Cac…
Duplicate of this bug: 1267981
Comment on attachment 8750256 [details] [diff] [review]
patch v3

Review of attachment 8750256 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache2/CacheObserver.cpp
@@ +493,5 @@
> +    return false;
> +  }
> +
> +  if ((PR_IntervalNow() - sShutdownDemandedTime) >
> +      PR_SecondsToInterval(sMaxShutdownIOLag)) {

could we cache the PR_SecondsToInterval(sMaxShutdownIOLag) result?
Attachment #8750256 - Flags: review?(honzab.moz) → review+
https://hg.mozilla.org/mozilla-central/rev/7061e837b509
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Comment on attachment 8751204 [details] [diff] [review]
patch v4 - sMaxShutdownIOLag converted to PRIntervalTime is cached

Approval Request Comment
[Feature/regressing bug #]: HTTP cache v2
[User impact if declined]: Shutdown hang/crash
[Describe test coverage new/current, TreeHerder]: on m-c for almost a month, heavily exercised code
[Risks and why]: very very small, this just effectively bypasses freeing memory after shutdown
[String/UUID change made/needed]: none
Attachment #8751204 - Flags: approval-mozilla-aurora?
On m-a must land before bug 1268569.
Blocks: 1268569
Approval Request Comment
please see comment 22
Attachment #8759596 - Flags: approval-mozilla-aurora?
Attachment #8751204 - Flags: approval-mozilla-aurora?
Comment on attachment 8759596 [details] [diff] [review]
[m-a version] patch v4 - sMaxShutdownIOLag converted to PRIntervalTime is cached

hg qref is my friend for the next time ;)
Attachment #8759596 - Attachment is obsolete: true
Attachment #8759596 - Flags: approval-mozilla-aurora?
Approval Request Comment, please see comment 22.

This is the one.
Attachment #8759598 - Flags: approval-mozilla-aurora?
Rather repost to not let it be lost:

This patch must on m-a land before bug 1268569.
Comment on attachment 8759598 [details] [diff] [review]
[m-b version] patch v4 - sMaxShutdownIOLag converted to PRIntervalTime is cached

This now becomes an m-b patch and a?
Attachment #8759598 - Attachment description: [m-a version] patch v4 - sMaxShutdownIOLag converted to PRIntervalTime is cached → [m-b version] patch v4 - sMaxShutdownIOLag converted to PRIntervalTime is cached
Attachment #8759598 - Flags: approval-mozilla-aurora? → approval-mozilla-beta?
Comment on attachment 8759598 [details] [diff] [review]
[m-b version] patch v4 - sMaxShutdownIOLag converted to PRIntervalTime is cached

Fix a shutdown issue, taking it.
Should be in 48 beta 2
Attachment #8759598 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
has problems to apply to beta:

renamed 1267980 -> diff-bug1267980-v4.patch
applying diff-bug1267980-v4.patch
patching file netwerk/cache2/CacheFileChunk.cpp
Hunk #1 FAILED at 127
Hunk #2 FAILED at 443
Hunk #3 FAILED at 509
Hunk #4 FAILED at 541
4 out of 4 hunks FAILED -- saving rejects to file netwerk/cache2/CacheFileChunk.cpp.rej
patching file netwerk/cache2/CacheFileIOManager.cpp
Hunk #1 FAILED at 530
Hunk #2 FAILED at 728
Hunk #3 FAILED at 1169
Hunk #4 FAILED at 1267
Hunk #5 FAILED at 1972
Hunk #6 FAILED at 2299
6 out of 6 hunks FAILED -- saving rejects to file netwerk/cache2/CacheFileIOManager.cpp.rej
patching file netwerk/cache2/CacheFileIOManager.h
Hunk #1 FAILED at 426
1 out of 1 hunks FAILED -- saving rejects to file netwerk/cache2/CacheFileIOManager.h.rej
patching file netwerk/cache2/CacheFileMetadata.cpp
Hunk #1 FAILED at 127
Hunk #2 FAILED at 296
Hunk #3 FAILED at 645
Hunk #4 FAILED at 822
4 out of 4 hunks FAILED -- saving rejects to file netwerk/cache2/CacheFileMetadata.cpp.rej
patching file netwerk/cache2/CacheFileUtils.cpp
Hunk #1 FAILED at 502
1 out of 1 hunks FAILED -- saving rejects to file netwerk/cache2/CacheFileUtils.cpp.rej
patching file netwerk/cache2/CacheFileUtils.h
Hunk #1 FAILED at 140
1 out of 1 hunks FAILED -- saving rejects to file netwerk/cache2/CacheFileUtils.h.rej
patching file netwerk/cache2/CacheObserver.cpp
Hunk #1 FAILED at 88
Hunk #2 FAILED at 242
Hunk #3 FAILED at 474
Hunk #4 FAILED at 506
4 out of 4 hunks FAILED -- saving rejects to file netwerk/cache2/CacheObserver.cpp.rej
patching file netwerk/cache2/CacheObserver.h
Hunk #1 FAILED at 65
Hunk #2 FAILED at 97
2 out of 2 hunks FAILED -- saving rejects to file netwerk/cache2/CacheObserver.h.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working directory
errors during apply, please fix and qrefresh diff-bug1267980-v4.patch

can you take a look, thanks!
Flags: needinfo?(michal.novotny)
The patch seems to be already in beta tree https://hg.mozilla.org/releases/mozilla-beta/rev/ec62fd62c9a7
Flags: needinfo?(michal.novotny)
(In reply to Michal Novotny (:michal) from comment #31)
> The patch seems to be already in beta tree
> https://hg.mozilla.org/releases/mozilla-beta/rev/ec62fd62c9a7

Updating flags according it.  Thanks.
Duplicate of this bug: 1131330
You need to log in before you can comment on or make changes to this bug.