Closed Bug 890284 Opened 6 years ago Closed 4 years ago

Stop splitting textnodes into 4KB chunks in the XML parser

Categories

(Core :: XML, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox44 --- fixed

People

(Reporter: bzbarsky, Assigned: bzbarsky)

References

Details

Attachments

(5 files, 2 obsolete files)

Like bug 194231, but for XML.

Henri, do you have time for this, or know who else might be a good bet to fix it?
Flags: needinfo?(hsivonen)
 I'm hoping to be able to look in the next couple of weeks.  As for who else, Ms2ger maybe?
Flags: needinfo?(hsivonen)
Possibly. needinfo'ing myself so I don't forget
Flags: needinfo?(Ms2ger)
I doubt I'll end up doing this.
Flags: needinfo?(Ms2ger)
Duplicate of this bug: 1196926
Please fix this. Firefox is the only browser with this serious bug. Web apps that reply on correct XML parsing will fail under unpredictable circumstances, simply when they download too big data. 

Again, the testcase:

URL: http://quizzicalweb.com/bugreport/firefox_bug_1196926.htm
Hi guys, I registered just to vote for this bug. It is platform independent, not just x86 Mac OS X as stated in the description - I have this same problem with Firefox 40.0.3 on Windows 7 x64.

In our company we use SAP NetWeaver Java AS EJB webservice which returns JSON in XML TextNode which is longer than 4096 characters - it fails so we have to use this workaround: https://bugzilla.mozilla.org/show_bug.cgi?id=194231#c68

What a nuisance, since it works without problem in IE 11, Edge and Chrome.

THIS IS A SERIOUS BUG REPORTED ON 2002-07-13 AND NEVER SOLVED SINCE!!!
https://bugzilla.mozilla.org/show_bug.cgi?id=157299

They say it is intended to work like that for performance reasons. Maybe that was the case in 2002. Today, I see it as a pure bull****.
Peter, Henri is out for a few more weeks and I think that leaves you and sicking as viable reviewers for this stuff....
Attachment #8662175 - Flags: review?(peterv)
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
Comment on attachment 8662175 [details] [diff] [review]
Stop splitting textnodes in the XML content sink

Review of attachment 8662175 [details] [diff] [review]:
-----------------------------------------------------------------

r- for not removing the loop. Feel free to ignore if I'm missing a reason why it's needed.

::: dom/xml/nsXMLContentSink.cpp
@@ +1437,5 @@
>    // Copy data from string into our buffer; flush buffer when it fills up
>    int32_t offset = 0;
>    while (0 != aLength) {
>      int32_t amount = mTextSize - mTextLength;
>      if (0 == amount) {

So if |0 < amount < aLength| we'll go through this loop twice, but I don't see what that gains us. Couldn't we convert the |0 == amount| check into a check for |amount < aLength| and drop the loop? (You'd need to correct the mTextSize calculation, probably to |mTextSize += aLength - amount|).
Attachment #8662175 - Flags: review?(peterv) → review-
Good catch.  Loop removed.  Interdiffs coming up.
Attachment #8664443 - Flags: review?(peterv)
Attachment #8662175 - Attachment is obsolete: true
Comment on attachment 8664443 [details] [diff] [review]
Stop splitting textnodes in the XML content sink

Review of attachment 8664443 [details] [diff] [review]:
-----------------------------------------------------------------

Great, thanks!
Attachment #8664443 - Flags: review?(peterv) → review+
I backed this out for being the likely cause of some linux asan failures in mochitest-5:

https://treeherder.mozilla.org/logviewer.html#?job_id=14724678&repo=mozilla-inbound

https://hg.mozilla.org/integration/mozilla-inbound/rev/e882f6a95406
Flags: needinfo?(bzbarsky)
I won't have a build environment for a week or so, unfortunately, so can't do anything about that until then.  That said:

1)  Were the asan failures consistent or intermittent?
2)  Did they go away after the backout?
Flags: needinfo?(wkocher)
1) Yes, quite consistent: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=2b5a7e0e2082&filter-searchStr=asan%20m%285&tochange=e882f6a95406
2) We'll see once the backout gets builds and tests.

There were a few other patches landed in the same push as yours, so maybe one of bug 1202696, bug 1197211, or bug 1182112 caused it, but none of them seems like as likely to me.
Flags: needinfo?(wkocher)
Yeah, hmm.  None of those seem terribly likely, including this one.  But OK, let's see what the tree says.  At least it's consistent...
So I tried the simplest possible change: just always set mConstrainSize to false.  That same test ends up timing out....
So this is exciting.  In a local ASAN build with my patch, I do see a very long pause right after "doTest testNum 7" as on try... but as soon as I blur the window the test proceeds and finishes.
Something in this test is just weirdly timing-dependent.  A debug build passes.  If I change the timers from 10ms to 200ms the test passes.  If I run under rr, the test passes.

Given that test_prompt.html is already skipped on Linux because it times out there (I bet for similar reasons), can we just skip this test under ASAN somehow?
Flags: needinfo?(wkocher)
Looks like "skip-if = asan" should do it.[1]


1. http://mxr.mozilla.org/mozilla-central/source/browser/base/content/test/general/browser.ini#348
Flags: needinfo?(wkocher)
OK, so I did some more digging, and this is _really_ fun.

This particular test in fact sends an XML file with a large textnode in it.  About 1MB of textnode, to be exact; see the "huge" case in toolkit/components/passwordmgr/test/authenticate.sjs.  The text is made up of lots of copies of "123456789\n" and the XML parser feeds that to the content sink as a sequence of "123456789", "\n", "123456789", "\n", etc.  Probably because it has to do newline normalization and counting anyway.

The upshot is that we do two reallocs for every 10 chars in this textnode, the textnode is about 1e6 chars, so we do around 200k reallocs.  I am guessing these are a lot more expensive with ASAN than without, so the test is actually timing out.

When running locally, various things one might do (like moving focus) can trigger a flush, which would empty the large buffer and mean many fewer reallocs are needed, which is why the test passed for me then, I expect.
This more or less restores the old behavior, but just uses FlushText(false) to avoid starting a new textnode.
Attachment #8671458 - Flags: review?(peterv)
Attachment #8664443 - Attachment is obsolete: true
Flags: needinfo?(bzbarsky)
Comment on attachment 8671460 [details] [diff] [review]
Complete patch against tip; may be easier to review maybe

Review of attachment 8671460 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/xml/nsXMLContentSink.cpp
@@ +1429,1 @@
>    if (0 == mTextSize) {

It seems that after your patch mTextSize is either 0 (if mText is null) or NS_ACCUMULATION_BUFFER_SIZE (if mText is non-null)? Could we get rid of mTextSize and check for a null mText here?

@@ +1438,3 @@
>    int32_t offset = 0;
>    while (0 != aLength) {
>      int32_t amount = mTextSize - mTextLength;

This would be |amount = NS_ACCUMULATION_BUFFER_SIZE - mTextLength|.

@@ +1442,5 @@
> +      nsresult rv = FlushText(false);
> +      if (NS_WARN_IF(NS_FAILED(rv))) {
> +        return rv;
> +      }
> +      amount = mTextSize - mTextLength;

This could be |amount = NS_ACCUMULATION_BUFFER_SIZE - mTextLength;| or an assert that mTextLength == 0 and |amount = NS_ACCUMULATION_BUFFER_SIZE|.
Attachment #8671460 - Flags: review+
Yes, we can nix mTextSize as you suggest.  I'll do that.
Actually, per IRC conversation, we'll just make mText be a PRUnichar[NS_ACCUMULATION_BUFFER_SIZE] and nix that "is the buffer here?" check altogether.
Attachment #8671458 - Flags: review?(peterv)
https://hg.mozilla.org/mozilla-central/rev/b836be6a5d20
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
You need to log in before you can comment on or make changes to this bug.