Bug 1177278 (Closed) - Opened 9 years ago, Closed 9 years ago

Large OOMs in CacheFileMetadata::WriteMetadata

Categories: Core :: Networking: Cache
Type: defect
Severity: major
Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla42
Tracking Status
firefox39 --- wontfix
firefox40 + wontfix
firefox41 + fixed
firefox42 + fixed

People: Reporter: michal, Assigned: michal

Keywords: crash

Attachments: 1 file, 1 obsolete file

This is a follow-up bug for the OOMs not fixed in bug 1171724. The crash reports for these signatures look strange.

David, can we trust "oom allocation size" in the crash reports? There are quite a lot of reports with sizes of 2-3 GB, which don't seem realistic. I checked the code and I don't see any potential underflow when computing the buffer size.

Nick, the crashes in CacheFileMetadata::EnsureBuffer() mostly come from the predictor trying to set an extra-large metadata element, usually 4-16 MB, but sometimes as large as 128 MB. What is the size limit of the data generated by the predictor? I can't imagine metadata that large even for very long URLs.

Anyway, I'd like to fix this by introducing a limit on metadata size. 64 kB seems like a reasonable limit, since 98% of entries have metadata smaller than 4 kB.
Flags: needinfo?(hurley)
Flags: needinfo?(dmajor)
> David, can we trust "oom allocation size" in the crash reports? There
> are quite a lot of reports with sizes of 2-3 GB, which don't seem realistic.

You can trust that the "oom allocation size" was the actual size requested from moz_xmalloc. You'll need to determine whether it's reasonable to be making such an allocation in the first place.

If you find yourself in an "impossible" situation, you may want to consider indirect factors: memory overwriting, memory poisoning, etc.
Flags: needinfo?(dmajor)
Depends on: 1171724
Crash Signature: const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | nsCOMPtr<nsIFile>::~nsCOMPtr<nsIFile>() | mozilla::net::CacheFileMetadata::WriteMetadata(unsigned int, mozilla::net::CacheFileMetadataListener*)] → const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | nsCOMPtr<nsIFile>::~nsCOMPtr<nsIFile>() | mozilla::net::CacheFileMetadata::WriteMetadata(unsigned int, mozilla::net::CacheFileMetadataListener*)] [@ OOM | large | mozalloc_abort(char cons…
Depends on: 1181258
Flags: needinfo?(hurley)
Crash Signature: , mozilla::net::CacheFileMetadataListener*)] [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mozilla::net::CacheFileMetadata::WriteMetadata] [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | mozilla::net::Cache… → , mozilla::net::CacheFileMetadataListener*)] [@ OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | nsRefPtr<T>::~nsRefPtr<T>() | mozilla::net::CacheFileMetadata::WriteMetadata(unsigned int, mozilla::net::C…
Attached patch fix (obsolete) — Splinter Review
This patch limits the size of elements stored in metadata to 64 kB. The patch also fixes two other bugs:

- browser.cache.disk.max_entry_size and browser.cache.memory.max_entry_size were handled as unsigned integers, but they are in fact signed, because -1 is a special value meaning no limit
- due to the metadata offset limitation, the cached data cannot be larger than 4 GB, but there was no check to handle exceeding this limit. This was only a theoretical bug, since CacheObserver::EntryIsTooBig() shouldn't allow such a big entry unless somebody manually tweaks the limits in preferences.
Attachment #8633424 - Flags: review?(honzab.moz)
Comment on attachment 8633424 [details] [diff] [review]
fix

Review of attachment 8633424 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache2/CacheFileMetadata.cpp
@@ +264,5 @@
>  
>    mIsDirty = false;
>  
> +  mWriteBuf = static_cast<char *>(moz_xmalloc(CalcMetadataSize(mElementsSize,
> +                                                               mHashCount)));

please make this fallible (if we fail writing, just doom the file)

@@ +994,5 @@
>        aSize = kInitialBufSize;
>      }
>  
>      mBufSize = aSize;
>      mBuf = static_cast<char *>(moz_xrealloc(mBuf, mBufSize));

so, this is now easy to make fallible, right?

::: netwerk/cache2/CacheFileOutputStream.cpp
@@ +103,5 @@
>      return NS_ERROR_FILE_TOO_BIG;
>    }
>  
> +  // We use a 64-bit offset when accessing the file; unfortunately we use a
> +  // 32-bit metadata offset, so we cannot handle data bigger than 4GB.

funny design :D

::: netwerk/cache2/CacheObserver.cpp
@@ +65,5 @@
>  static uint32_t const kDefaultPreloadChunkCount = 4;
>  uint32_t CacheObserver::sPreloadChunkCount = kDefaultPreloadChunkCount;
>  
>  static uint32_t const kDefaultMaxMemoryEntrySize = 4 * 1024; // 4 MB
> +int32_t CacheObserver::sMaxMemoryEntrySize = kDefaultMaxMemoryEntrySize;

nit: change also type of the default const.
Attachment #8633424 - Flags: review?(honzab.moz) → review+
[Tracking Requested - why for this release]:
This is in the Top 10 Crash Scores for the 39 release and 40 Beta, as large OOMs get higher score factors. We should never OOM-crash on large allocations but fail "gracefully" instead. Requesting tracking because it's in the top 10 scores.
Tracked for 40, 41, and 42 due to the Top 10 Crash ranking; OOM crashes should fail gracefully, as noted in comment 4.
Attached patch patch v2Splinter Review
Attachment #8633424 - Attachment is obsolete: true
Attachment #8637185 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/05b5f7f9f579
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
40 and 41 are marked as affected. This is a decent amount of code change for this late in Beta, but it's also a serious enough issue to consider uplifting the patch. What do you think about uplifting this fix for beta 8?
Flags: needinfo?(michal.novotny)
Personally, I'd be willing to live with this OOM on 40 at this point. Looking at the code, I don't think the chance of regression here is worth 0.7% of crashes.
I think it's low risk to uplift this patch. If all memory allocations succeed, the only change made by this patch is the new size limit for metadata elements, and that limit is much bigger than is normally needed, so it shouldn't be a problem. If some memory allocation fails, we should simply not use the cache entry and get the content from the server. Even if there is a bug in the existing code that doesn't handle NS_ERROR_OUT_OF_MEMORY correctly (I just discovered bug 1188387), the worst that can happen is that the channel won't deliver the data. IMO that's still better than crashing.
Flags: needinfo?(michal.novotny)
Keywords: crash
(In reply to Michal Novotny (:michal) from comment #11)
> I think it's a low risk to uplift this patch.

If you think it's worth uplifting, can you request approval, please?
Flags: needinfo?(michal.novotny)
Comment on attachment 8637185 [details] [diff] [review]
patch v2

Approval Request Comment
[Feature/regressing bug #]: this code has probably been in cache2 from the beginning
[User impact if declined]: OOM crash
[Describe test coverage new/current, TreeHerder]: did some manual testing, but in general this is hard to test
[Risks and why]: low, see comment #11
[String/UUID change made/needed]: none
Flags: needinfo?(michal.novotny)
Attachment #8637185 - Flags: approval-mozilla-beta?
Attachment #8637185 - Flags: approval-mozilla-aurora?
Comment on attachment 8637185 [details] [diff] [review]
patch v2

Even though this is considered low risk, I agree with David that the extra regression risk isn't worth taking. Given that this fix missed beta 8 and that 40 is already in good shape from a stability perspective, I think we should take this change in 41. Beta- Aurora+
Attachment #8637185 - Flags: approval-mozilla-beta?
Attachment #8637185 - Flags: approval-mozilla-beta-
Attachment #8637185 - Flags: approval-mozilla-aurora?
Attachment #8637185 - Flags: approval-mozilla-aurora+