Closed Bug 1131348 Opened 5 years ago Closed 5 years ago

Large OOMs in content serialize code

Categories

(Core :: DOM: Core & HTML, defect, critical)

x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox36 --- wontfix
firefox37 + wontfix
firefox38 + fixed
firefox39 --- fixed

People

(Reporter: dmajor, Assigned: wchen)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-008f38d1-704c-4c08-946d-ee8442150202.
=============================================================

In total about 0.72% of beta crashes. Usually 500k or 1M allocations. Socorro puts some garbage on the stacks but generally the real stacks go like:

> nsAString_internal::Replace(unsigned int, unsigned int, nsAString_internal const&)
> nsDocumentEncoder::SerializeNodeStart(nsINode*, int, int, nsAString_internal&, nsINode*)
> nsDocumentEncoder::SerializeToStringRecursive(nsINode*, nsAString_internal&, bool, unsigned int)

> nsAString_internal::Replace(unsigned int, unsigned int, nsAString_internal const&)
> nsXMLContentSerializer::AppendToString(nsAString_internal const&, nsAString_internal&)
> nsXMLContentSerializer::AppendToStringConvertLF(nsAString_internal const&, nsAString_internal&)
> nsXMLContentSerializer::SerializeAttr(nsAString_internal const&, nsAString_internal const&, nsAString_internal const&, nsAString_internal&, bool)
How invasive would it be to make these fallible?
Flags: needinfo?(bugs)
Hmm, do we have proper stacks for these crashes? Like showing where the calls are coming?
But yeah, I think we could make this stuff fallible.
htmlelement.innerHTML uses different, faster code paths these days anyway.
Flags: needinfo?(bugs)
They come from JS calls to XMLSerializerBinding::serializeToString.

More stacks available here: https://crash-stats.mozilla.com/report/list?product=Firefox&signature=OOM+|+large+|+NS_ABORT_OOM%28unsigned+int%29+|+nsAString_internal%3A%3AReplace%28unsigned+int%2C+unsigned+int%2C+nsAString_internal+const%26%29#tab-reports

(Just ignore the frames that don't make any sense)
Oh, it is XMLSerializer. Not at all what I thought it would be.

Anyhow, better to throw some sane exception to the web pages using XMLSerializer and causing OOM.


Henri, would you have time to look at this?
Flags: needinfo?(hsivonen)
(In reply to Olli Pettay [:smaug] (no new review requests before Feb 22, please) from comment #4)
> Henri, would you have time to look at this?

I don't, unfortunately.
Flags: needinfo?(hsivonen)
Flags: needinfo?(bugs)
[Tracking Requested - why for this release]:
The crash signature of this bug is #8 with ~1.4% of all 37.0b3 crashes, I think we should track and put more emphasis on this.
Tracking topcrash.

Andrew - Can you please help find an owner?
Flags: needinfo?(overholt)
William has offered to investigate.
Assignee: nobody → wchen
Flags: needinfo?(overholt)
William - We have very little time left in the 37 cycle. Last Beta goes to build on this Thu, Mar 19. Do you have an update on this top crash?
Flags: needinfo?(wchen)
Here is a patch that converts a lot of the Append calls (which looks like is the caller of Replace that OOMs) to the fallible version.

Unfortunately this patch makes the code ugly, and it's quite invasive. I fixed a few instances manually before I gave up and used VIM commands to mass replace the rest (I did look over the changes and made adjustments before uploading the patch).

Almost all the crash reports are missing frames that I want to see. From the few that actually show frames from the serializer, it seems like nsXMLContentSerializer::AppendToString is eventually calling nsAString_internal::ReplaceReplace that causes the OOM. Even changing only nsXMLContentSerializer::AppendToString to be infallible is very invasive.

Alternatively, we can try to be more precise and only use the fallible version of AppendToString in nsXMLContentSerializer::AppendElementStart because that's the only function I have found in the crash reports, see https://crash-stats.mozilla.com/report/index/1ec97c5b-fea9-432c-a940-0b2002150310 for an example of a more complete stack. But these stacks are fairly rare and the OOM can easily be happening in other places. Let me know if you rather take the more precise approach and I can get a new patch up pretty quickly.
Flags: needinfo?(wchen)
Attachment #8578517 - Flags: review?(bugs)
Comment on attachment 8578517 [details] [diff] [review]
Use fallible append in content serializers.

String API changes should be reviewed by someone else.
Attachment #8578517 - Flags: review?(nfroyd)
Attachment #8578517 - Flags: review?(bugs)
The main issue the patch has that some methods return bool and some nsresult.
So whenever NS_ENSURE_TRUE(somemethod(),) is used, one needs to check whether the method return bool
or nsresult.
Comment on attachment 8578517 [details] [diff] [review]
Use fallible append in content serializers.

>
> nsXMLContentSerializer::AppendElementStart(Element* aElement,
>                                            Element* aOriginalElement,
>                                            nsAString& aStr)
> {
>   NS_ENSURE_ARG(aElement);
> 
>   nsIContent* content = aElement;
> 
>   bool forceFormat = false;
>-  if (!CheckElementStart(content, forceFormat, aStr)) {
>-    return NS_OK;
>+  nsresult rv = NS_OK;
>+  if (!CheckElementStart(content, forceFormat, aStr, rv)) {
>+    return rv;
>   }
> 
>+  NS_ENSURE_SUCCESS(rv, NS_ERROR_OUT_OF_MEMORY);
Nit, NS_ENSURE_SUCCESS(rv, rv); would make more sense
Flags: needinfo?(bugs)
Attachment #8578517 - Flags: review?(bugs) → review+
Comment on attachment 8578517 [details] [diff] [review]
Use fallible append in content serializers.

Review of attachment 8578517 [details] [diff] [review]:
-----------------------------------------------------------------

String parts look good.  r=me
Attachment #8578517 - Flags: review?(nfroyd)
Attachment #8578517 - Flags: review?(bugs)
Attachment #8578517 - Flags: review+
Comment on attachment 8578517 [details] [diff] [review]
Use fallible append in content serializers.

Review of attachment 8578517 [details] [diff] [review]:
-----------------------------------------------------------------

Sigh, bugzilla.  Re-adding Olli's r+.
Attachment #8578517 - Flags: review?(bugs) → review+
https://hg.mozilla.org/mozilla-central/rev/be159a7b732a
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
This but affects 37 but the William and I agree that the fix is too risky to take right at the end of the Beta cycle. If the fix is shown to be good in Nightly, let's look at uplifting to 38.
William, please let me know when you know if you want to uplift that to 38. Thanks
Flags: needinfo?(wchen)
Comment on attachment 8578517 [details] [diff] [review]
Use fallible append in content serializers.

Approval Request Comment
[Feature/regressing bug #]: Feature has been around since hg@1
[User impact if declined]: Browser crashing when trying to serialize a large DOM tree.
[Describe test coverage new/current, TreeHerder]: Passes all tests on TreeHerder. No new serializer bugs since landing on m-c.
[Risks and why]: Low, each change causes functions to return early instead of crashing in OOM situations. The biggest risk I can think of is that returning early in some functions may leave the serializer with an incorrect internal state and cause it to return incorrect output.
[String/UUID change made/needed]: None
Flags: needinfo?(wchen)
Attachment #8578517 - Flags: approval-mozilla-aurora?
Attachment #8578517 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Duplicate of this bug: 1114435
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.