Closed Bug 943519 Opened 11 years ago Closed 10 years ago

Use fallible allocator in nsTextFragment::Append

Categories

(Core :: DOM: Core & HTML, defect)

x86_64
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla29

People

(Reporter: benjamin, Assigned: Dexter)

References

(Blocks 1 open bug)

Details

(Whiteboard: [mentor=benjamin@smedbergs.us][lang=c++])

Attachments

(3 files, 4 obsolete files)

The #3 cause of OOM crashes appears to be nsTextFragment::Append. There are some insanely large allocations, plus some more normal-sized ones:

bp-aa8c6972-e74c-4214-a9bc-3927a2131122 OOMAllocationSize=696913888
bp-154f2784-a0db-4045-a29e-24cad2131122 OOMAllocationSize=360711886
bp-bd28a98b-b3c0-43b2-a2d3-7d2152131122 OOMAllocationSize=73001

Allocation point:
unicode: http://hg.mozilla.org/releases/mozilla-release/annotate/d20d499b219f/content/base/src/nsTextFragment.cpp#l317
or ascii: http://hg.mozilla.org/releases/mozilla-release/annotate/d20d499b219f/content/base/src/nsTextFragment.cpp#l339

What I'm not certain of is how errors should propagate. I would imagine that the correct thing to do is terminate parsing, but I'm not sure if the parser is already prepared for OOM error codes. http://hg.mozilla.org/releases/mozilla-release/annotate/d20d499b219f/parser/html/nsHtml5TreeOperation.cpp#l140 is one callsite which does seem to have error handling, but I'm not sure how far down it goes. hsivonen, do you know?
Flags: needinfo?(hsivonen)
Hmm, nsGenericDOMDataNode::SetTextInternal can already return NS_ERROR_OUT_OF_MEMORY.  Does this mean that the parser is not capable of handling that?
The parser does not propagate/handle the error correctly at the moment. The calls to Perform drop the return value in https://mxr.mozilla.org/mozilla-central/source/parser/html/nsHtml5TreeOpExecutor.cpp .

The calls to Perform should change to check the return value and if it is not a success, call MarkAsBroken (https://mxr.mozilla.org/mozilla-central/source/parser/html/nsHtml5TreeOpExecutor.cpp#240) with the return value as the argument and then break out of the loop inside which Perform was being called.
Flags: needinfo?(hsivonen)
Whiteboard: [mentor=benjamin@smedbergs.us][lang=c++]
Attached patch bug943519.patchSplinter Review
I've added error handling to 2 of the 4 Perform calls. The return value of the other 2 is void.
--
I picked this issue as my first bug (since it is mentored) to start contributing to our beloved Firefox. I've been strictly following the "Contributing to Mozilla codebase" guide, so I hope I got everything right!
Attachment #8340748 - Flags: review?(benjamin)
Attachment #8340748 - Flags: review?(benjamin)
Comment on attachment 8340748 [details] [diff] [review]
bug943519.patch

Henri is the reviewer for the HTML parser code, although because of bug 945799 I'm having trouble actually marking him for review.

Are you going to attach another patch to actually use the fallible allocator and return failure?
Attachment #8340748 - Flags: feedback?(hsivonen)
Assignee: nobody → alessio.placitelli
Flags: needinfo?(alessio.placitelli)
Attachment #8340748 - Flags: review?(hsivonen)
Comment on attachment 8340748 [details] [diff] [review]
bug943519.patch

Thanks.

(Indeed, the two other Perform() calls are calls to a different Perform().)
Attachment #8340748 - Flags: review?(hsivonen)
Attachment #8340748 - Flags: review+
Attachment #8340748 - Flags: feedback?(hsivonen)
Attachment #8340748 - Flags: feedback+
As far as I've understood by reading previous fixed bugs and some article on MDN, switching from infallible allocations to fallible ones basically means using moz_malloc/ new (fallible) in place of nsMemory::Alloc. Is that correct?

In the proposed patch I've changed all the uses of nsMemory::Alloc/Free/realloc to the corresponding moz_malloc/free/realloc. The only reference left to nsMemory is nsMemory::Clone, which allocates memory as well: is there a fallible equivalent function which I have yet to find?
Attachment #8342236 - Flags: review?(benjamin)
Flags: needinfo?(alessio.placitelli) → needinfo?(benjamin)
Comment on attachment 8342236 [details] [diff] [review]
fallibleAlloc.patch - Using fallible allocations instead of infallible ones.

nsMemory::Clone is alloc+memcpy see http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsMemory.cpp#29 and you can probably just use that pattern in the case here.

However, currently many of the methods you're touching have void return types, which means that failure to allocate is silently ignored, for example:

nsTextFragment::SetTo
nsTextFragment::Append

Currently these methods null-check and simply return, but I don't think that can be the correct behavior. We should at least propagate errors back to nsGenericDOMDataNode::SetTextInternal so that the HTML parser patch can do the correct thing and terminate parsing.

Although technically I'm not the module owner for this code, I think it would be perfectly fine for the methods which currently return void to return a simple bool success/fail code. nsresults are too heavy for this purpose. I suspect hsivonen should do the followup review as well.
Attachment #8342236 - Flags: review?(benjamin) → review-
Attached patch FallibleAllocator_contd.patch (obsolete) — Splinter Review
As suggested by bsmedberg, I've changed the return type for the following functions:

nsTextFragment::SetTo
nsTextFragment::Append

Failed allocations make those function return false instead of the default value of true. Errors are propagated back and handled in nsGenericDOMDataNode::SetTextInternal (hopefully correctly!).
Attachment #8342428 - Flags: review?(hsivonen)
Flags: needinfo?(benjamin)
Comment on attachment 8342428 [details] [diff] [review]
FallibleAllocator_contd.patch

Are parts of attachment 8342236 [details] [diff] [review] meant to survive with this patch? It would be good to roll the surviving parts of that patch into the same patch with these changes.

Please don't use the variable name "rv" for types other than nsresult. You could call it "bool ok".
Thank you hsivonen. Yes, parts of attachment 8342236 [details] [diff] [review] are meant to survive this new patch. I will follow your advice and combine the two patches (also modifying the variable name).
Attached patch fallibleAlloc.patch (obsolete) — Splinter Review
I've combined the fallible allocation patches together and changed the variable names to "bool ok" instead of "rv", as suggested!
Attachment #8342236 - Attachment is obsolete: true
Attachment #8342428 - Attachment is obsolete: true
Attachment #8342428 - Flags: review?(hsivonen)
Attachment #8343661 - Flags: review?(hsivonen)
Comment on attachment 8343661 [details] [diff] [review]
fallibleAlloc.patch

This looks generally good. Two comments. One trivial, the other less so.

>-    nsMemory::Free(m2b); // m1b == m2b as far as nsMemory is concerned
>+    moz_free(m2b); // m1b == m2b as far as nsMemory is concerned

The trivial comment: Please update the comment to talk about moz_free instead of nsMemory.

>@@ -99,19 +99,23 @@ nsTextFragment::operator=(const nsTextFr
> {
>   ReleaseText();
> 
>   if (aOther.mState.mLength) {
>     if (!aOther.mState.mInHeap) {
>       m1b = aOther.m1b; // This will work even if aOther is using m2b
>     }
>     else {
>-      m2b = static_cast<PRUnichar*>
>-                       (nsMemory::Clone(aOther.m2b, aOther.mState.mLength *
>-                                    (aOther.mState.mIs2b ? sizeof(PRUnichar) : sizeof(char))));
>+      size_t m2bSize = aOther.mState.mLength *
>+        (aOther.mState.mIs2b ? sizeof(PRUnichar) : sizeof(char));
>+
>+      m2b = static_cast<PRUnichar*>(moz_malloc(m2bSize));
>+      if (m2b) {
>+        memcpy(m2b, aOther.m2b, m2bSize);
>+      }

If the allocation fails, m2b becomes nullptr, but no error is propagated. That's the state this code was in before the infallible malloc was introduced, but won't that make us crash at a later point in time?

Is there a reason why the allocation here should not be moz_xmalloc()? That could still lead to an OOM crash, but at least it would be an up-front OOM-crash-looking crash instead of a deferred bad memory access crash.
Thank you again for your review, I think you have a point with your second comment. You're right, it would be much better to have an up-front OOM-crash-looking crash, instead of a subtle deferred bad memory access crash. Updating the patch right now.
Can't we propagate an error? The whole point of this patch was to avoid the infallible allocators, since they are showing up in crash-stats.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #14)
> Can't we propagate an error? The whole point of this patch was to avoid the
> infallible allocators, since they are showing up in crash-stats.

I suppose it would be possible to let m2b become nullptr and then have all methods that access it first null-check it return a failure value if it is null.

(If there's a way to propagate the error on the spot, I fail to see what it would be.)

Alessio, does adding null-checking like that to methods that access m2b look sensible?
Could we instead make the nsTextFragment an 0-length string and propagate error? I'm perfectly fine with aborting if we can't allocate small fixed-size buffers such as "". What I'm trying to avoid is aborting if there is a 6/60/600MB string, which is what we're seeing.
Attached patch fallibleAlloc.patch (obsolete) — Splinter Review
Sorry for the delay, I've modified the operator= method to use an infallible 1-byte allocation ('\0') if the fallible allocation fails. Does this sound reasonable?
Attachment #8343661 - Attachment is obsolete: true
Attachment #8343661 - Flags: review?(hsivonen)
Attachment #8347256 - Flags: review?(hsivonen)
Comment on attachment 8347256 [details] [diff] [review]
fallibleAlloc.patch

>@@ -99,19 +99,30 @@ nsTextFragment::operator=(const nsTextFr
> {
>   ReleaseText();
> 
>   if (aOther.mState.mLength) {
>     if (!aOther.mState.mInHeap) {
>       m1b = aOther.m1b; // This will work even if aOther is using m2b
>     }
>     else {
>-      m2b = static_cast<PRUnichar*>
>-                       (nsMemory::Clone(aOther.m2b, aOther.mState.mLength *
>-                                    (aOther.mState.mIs2b ? sizeof(PRUnichar) : sizeof(char))));
>+      size_t m2bSize = aOther.mState.mLength *
>+        (aOther.mState.mIs2b ? sizeof(PRUnichar) : sizeof(char));
>+
>+      m2b = static_cast<PRUnichar*>(moz_malloc(m2bSize));
>+      if (m2b) {
>+        memcpy(m2b, aOther.m2b, m2bSize);
>+      } else {
>+        // allocate a 0-length string to propagate the allocation error

Looks like a 1-length string that contains U+0000. Please update the comment to say "allocate a buffer for a single REPLACEMENT CHARACTER" (see below)

>+        m2b = static_cast<PRUnichar*>(moz_xmalloc(sizeof(char)));

Please change sizeof(char) to sizeof(PRUnichar).

>+        m2b[0] = '\0';

Please change this to 
m2b[0] = 0xFFFD; // REPLACEMENT CHARACTER
in order to make sure that JavaScript  compilation fails in an obvious way if this text node happens to be a child of the script element.

>+        mState.mIs2b = false;

And them true here.

>+        mState.mInHeap = true;
>+        mState.mLength = 1;
>+      }

r+ with those changes.
Attachment #8347256 - Flags: review?(hsivonen) → review+
Henri, thank you very much for your review, I've updated the patch based on your suggestions.
Attachment #8347256 - Attachment is obsolete: true
Attachment #8347948 - Flags: review?(hsivonen)
Comment on attachment 8347948 [details] [diff] [review]
fallibleAlloc.patch

Thanks.
Attachment #8347948 - Flags: review?(hsivonen) → review+
Keywords: checkin-needed
12:42:18     INFO -  28636 INFO TEST-PASS | /tests/content/html/content/test/test_bug595449.html | fieldset.elements should contain 1 elements
12:42:18     INFO -  28637 INFO TEST-PASS | /tests/content/html/content/test/test_bug595449.html | fieldset.elements[0] should be instance of function HTMLTextAreaElement() {
12:42:18     INFO -      [native code]
12:42:18     INFO -  }
12:42:18     INFO -  28638 INFO TEST-PASS | /tests/content/html/content/test/test_bug595449.html | fieldset.elements should contain 0 elements
12:42:18     INFO -  28639 INFO TEST-PASS | /tests/content/html/content/test/test_bug595449.html | fieldset.elements should contain 0 elements
12:42:18     INFO -  28640 INFO TEST-PASS | /tests/content/html/content/test/test_bug595449.html | fieldset.elements should contain 0 elements
12:42:18     INFO -  28641 ERROR TEST-UNEXPECTED-FAIL | /tests/content/html/content/test/test_bug595449.html | fieldset.elements should contain 1 elements - got 0, expected 1
12:42:18     INFO -  28642 ERROR TEST-UNEXPECTED-FAIL | /tests/content/html/content/test/test_bug595449.html | fieldset.elements[0] should be instance of function HTMLSelectElement() {
12:42:18     INFO -      [native code]
12:42:18     INFO -  }
12:42:18     INFO -  28643 INFO TEST-PASS | /tests/content/html/content/test/test_bug595449.html | fieldset.elements should contain 0 elements
12:42:18     INFO -  28644 INFO TEST-PASS | /tests/content/html/content/test/test_bug595449.html | fieldset.elements should contain 1 elements
12:42:18     INFO -  28645 INFO TEST-PASS | /tests/content/html/content/test/test_bug595449.html | fieldset.elements[0] should be instance of function HTMLInputElement() {
12:42:18     INFO -      [native code]
12:42:18     INFO -  }

So somehow this test is failing: http://hg.mozilla.org/mozilla-central/annotate/cd35ef59eae0/content/html/content/test/test_bug595449.html#l70
It appears that it's only failing on the "B2G ICS Emulator Opt" builds. This is going to be a PITA to debug locally, so I submitted a try build with more logging of the failure paths in these patches. Here's the log of the failure:

https://tbpl.mozilla.org/php/getParsedLog.php?id=32109489&tree=Try&full=1

None of the OOM paths were hit on the run, but only the HTML parser error paths:

13:26:43     INFO -  12-17 21:17:24.840   703   703 I GeckoDump: 28780 INFO TEST-PASS | /tests/content/html/content/test/test_bug595449.html | fieldset.elements should contain 1 elements
13:26:43     INFO -  12-17 21:17:24.840   703   703 I GeckoDump: 28781 INFO TEST-PASS | /tests/content/html/content/test/test_bug595449.html | fieldset.elements[0] should be instance of function HTMLTextAreaElement() {
13:26:43     INFO -  12-17 21:17:24.840   703   703 I GeckoDump:     [native code]
13:26:43     INFO -  12-17 21:17:24.840   703   703 I GeckoDump: }
13:26:43     INFO -  12-17 21:17:24.850   703   703 I GeckoDump: 28782 INFO TEST-PASS | /tests/content/html/content/test/test_bug595449.html | fieldset.elements should contain 0 elements
13:26:43     INFO -  12-17 21:17:24.860   703   703 I GeckoDump: 28783 INFO TEST-PASS | /tests/content/html/content/test/test_bug595449.html | fieldset.elements should contain 0 elements
13:26:43     INFO -  12-17 21:17:24.870   703   703 I GeckoDump: 28784 INFO TEST-PASS | /tests/content/html/content/test/test_bug595449.html | fieldset.elements should contain 0 elements
13:26:43     INFO -  12-17 21:17:24.891   703   703 I Gecko   : bsmedberg - iter->Perform failed at line 632
13:26:43     INFO -  12-17 21:17:24.911   703   703 I GeckoDump: 28785 ERROR TEST-UNEXPECTED-FAIL | /tests/content/html/content/test/test_bug595449.html | fieldset.elements should contain 1 elements - got 0, expected 1
13:26:43     INFO -  12-17 21:17:24.911   703   703 I GeckoDump: 28786 ERROR TEST-UNEXPECTED-FAIL | /tests/content/html/content/test/test_bug595449.html | fieldset.elements[0] should be instance of function HTMLSelectElement() {
13:26:43     INFO -  12-17 21:17:24.911   703   703 I GeckoDump:     [native code]
13:26:43     INFO -  12-17 21:17:24.911   703   703 I GeckoDump: }

So I'm surprised that there are B2G-only failure paths here: I don't know yet which operations are failing: hsivonen, are there ways to dump an nsHtml5TreeOperation* to see what operation it is, or is that only something which can be done in a live debugger?
Flags: needinfo?(hsivonen)
Benjamin, FWIW you could do these builds locally and if you can reproduce the problem then you might even be able to attach gdb and do some debugging, although in my experience gdb is not really very stable in this configuration but even printf-debugging with a local build will give you a much faster turn-around time.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #24)
> hsivonen, are there ways to dump an
> nsHtml5TreeOperation* to see what operation it is

There is no existing operation dumping code. nsHtml5TreeOperation::Perform is what documents the layout of the operations, so you could duplicate that method to get a skeleton switch() on the opcode and replace the code under each case with appropriate printfs.
Flags: needinfo?(hsivonen)
Alessio was going to look at this, but it was really bugging me all day so I spent the time and got a b2g emulator build going myself:

http://hg.mozilla.org/mozilla-central/annotate/7ec385eb27f5/parser/html/nsHtml5TreeOperation.cpp#l385 is the failing line. This fails because the form processor require PSM, which isn't allowed in the content process. I really don't know whether a change in HTML parsing behavior is expected as a result of this, but I believe the thing we need to return to the previous state is to return NS_OK in this condition rather than the failure code.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #27)
> Alessio was going to look at this, but it was really bugging me all day so I
> spent the time and got a b2g emulator build going myself:
> 
> http://hg.mozilla.org/mozilla-central/annotate/7ec385eb27f5/parser/html/
> nsHtml5TreeOperation.cpp#l385 is the failing line. This fails because the
> form processor require PSM, which isn't allowed in the content process. I
> really don't know whether a change in HTML parsing behavior is expected as a
> result of this, but I believe the thing we need to return to the previous
> state is to return NS_OK in this condition rather than the failure code.

I think that's being fixed in bug 101019, too.
Attached patch bug943519-keygenSplinter Review
I'm sorry I didn't give you a change to try this yourself, Dexter!
Attachment #8349729 - Flags: review?(hsivonen)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #29)
> Created attachment 8349729 [details] [diff] [review]
> bug943519-keygen
> 
> I'm sorry I didn't give you a change to try this yourself, Dexter!

No problem Benjamin, thanks for the help! I will give the b2g emulator build a shot later on ;)
Depends on: 966744
OS: Windows 7 → All
Depends on: 998356
Depends on: CVE-2014-1592
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.