Closed Bug 194231 Opened 21 years ago Closed 13 years ago

4k+ text blocks create adjacent DOM text nodes

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: jst, Unassigned)

References

Details

Attachments

(2 files)

This ain't what the spec says, we should create one text node that holds all the
text, no matter how much of it there is...
Maybe the content sink needs to hold on to the last text node and append text
instead of creating a new node if certain conditions are satisfied?
We just have these overly complex buffers in the sink for perf reasons, it was
more efficient to create 4k buffers than continuously reallocate and copy stuff.
But maybe our string classes are now smart enough and allocate new buffers as
needed inside the strings without actually copying? It should be possible to
remove a nice chunk of code and make it simpler.
Text nodes do not use our string classes internally... Or are you talking about
different code?

Our string classes _may_ be smart enough to do what you want, depending on how
you use them (eg using += on an nsString _will_ reallocate, but using dependent
concatenations will not....)
re: what Boris said in comment #2.
"if certain conditions are satisfied" 

Please let that also take into consideration escaped data.
-------

I've attached a zip file containing an issue I was trying to pin down and show
to Heikki re: Mozilla's parsing of a returned SOAP payload via XMLHTTP and
the DOMParser. The XML sample is a real SOAP response from our SOAP
listener. SOAP/IIS automatically encodes it and IE handles it seamlessly
(this started after SOAP 2.0 was released; guess you guys had them worried with
the release of NS 6.1's XML capabilities :-) ).

We are currently having to serialize the XMLHTTP return (having wrapped the
result XML doc in a dummy node so as to end up with a valid doc after
splitting), split it, unescape it, then load it back into a DOM.

IE allows us to just get the ResultNode and run. 

Not saying IE is right, not complaining about Moz, just trying to make it more
efficient for Web Developers and allow Moz to maintain a leg up on WebServices.

Also, I attached the files so you all could see what real-world idiots, I mean
developers (such as myself), are trying to do with your technologies.
I turned off the escape option in the WSDL file just to play, and now I remember
why we left it enabled: we thought Mozilla's DOMParser was choking on a
character being returned in the Result node. It turns out to be this 4k limit.

Here is what we get without the encoding option enabled (note that it returns
the string escaped; does it do that only for the error report, or does it
automatically escape nodeValues?):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<parsererror xmlns="http://www.mozilla.org/newlayout/xml/parsererror.xml">
	XML Parsing Error: not well-formed
	Location: file:///
	Line Number 1, Column 372:
	<sourcetext>
		&lt;?xml version="1.0" encoding="UTF-8"
standalone="no"?&gt;&lt;soap-env:envelope soap-env:encodingstyle="http:/ ..
[snip]...

	</sourcetext>
</parsererror>
that should read "our interpreter" not WSDL file, sorry. 

interpreter = jscript files that get launched by our appserver for us to handle
specific controls and things; it also allows the server to be extended easily. We
are having to catch the payload before it returns, escape it, then do the string
parsing/splitting in Moz to make it work.

Just so everyone knows, calling document.normalize() on a document should
combine the adjacent text nodes if they do end up in a document. This shouldn't
be needed, of course, but if someone is looking for a workaround, this would be it.
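For anyone unsure what normalize() actually does here, this is a rough sketch of the merging it performs on adjacent text children. It operates on plain mock objects (the function name `mergeAdjacentText` and the node shape are illustrative, not a real DOM API) so it can run outside a browser:

```javascript
// Sketch of the merging Node.normalize() performs on adjacent text
// children. Real DOM text nodes have nodeType 3 and a `data` field;
// these mocks imitate just that much.
function mergeAdjacentText(children) {
  var merged = [];
  for (var i = 0; i < children.length; i++) {
    var node = children[i];
    var last = merged[merged.length - 1];
    if (node.nodeType === 3 && last && last.nodeType === 3) {
      last.data += node.data; // fold into the preceding text node
    } else {
      merged.push(node);
    }
  }
  return merged;
}
```

In a real page, of course, the workaround is just the one-liner `document.normalize();`.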
Mass-reassigning bugs to dom_bugs@netscape.com
Assignee: jst → dom_bugs
*** Bug 209980 has been marked as a duplicate of this bug. ***
DOM3 says "When a document is first made available via the DOM, there is only
one Text node for each block of text."
http://www.w3.org/TR/2003/WD-DOM-Level-3-Core-20030609/core.html#ID-1312295772
This description is exactly the same as in DOM2 and DOM1.
However, Mozilla splits a contiguous block of text in an HTML tag into multiple
4 KB text nodes.
I think Mozilla violates the DOM specification.
I agree that splitting into 4 KB buffers is required for performance reasons.
But the splitting should have been done internally within a single text node
instead of splitting into multiple text nodes.
*** Bug 245092 has been marked as a duplicate of this bug. ***
*** Bug 280026 has been marked as a duplicate of this bug. ***
Blocks: 265353
*** Bug 292110 has been marked as a duplicate of this bug. ***
*** Bug 293358 has been marked as a duplicate of this bug. ***
*** Bug 305194 has been marked as a duplicate of this bug. ***
*** Bug 306524 has been marked as a duplicate of this bug. ***
QA Contact: desale → ian
*** Bug 338826 has been marked as a duplicate of this bug. ***
*** Bug 346004 has been marked as a duplicate of this bug. ***
The odd thing is, if you reference the textContent field of the parent node, you will get the full text.  It's only among the #text nodes themselves that this problem persists.  Unfortunately, textContent is not very portable at the moment (so far as my testing has indicated).

It's difficult for me to fathom performance as a significant reason not to change this approach when it seems that this behavior is contradicted by the parent node's textContent field.
textContent concatenates all text nodes together; see the definition in the DOM spec.
*** Bug 346850 has been marked as a duplicate of this bug. ***
Do we have any good performance tests for this?
Is there any way to determine whether two text nodes right next to each other are the result of Mozilla splitting them or not?  Or is the only way to assume that if the current node is 4K in length, the next node is the result of a split of the current node?
There is no real way to tell, no.
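Since there is no reliable way to tell, a defensive sketch is to treat every run of consecutive text siblings as one logical block and join them. The helper name `fullTextFrom` and the mock node shape are illustrative; nodeType 3 and 4 are the real DOM codes for text and CDATA nodes:

```javascript
// Defensive read: starting from a (possibly split) text node, walk
// forward through consecutive text/CDATA siblings and join their data.
function fullTextFrom(textNode) {
  var text = '';
  var node = textNode;
  while (node && (node.nodeType === 3 || node.nodeType === 4)) {
    text += node.data;
    node = node.nextSibling;
  }
  return text;
}
```

This works the same whether the adjacent nodes came from a 4 KB split or were genuinely separate in the source.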
Is there any workaround that I (we web-developers) can use to get the whole text and not the first 4K without checking a specific browser or version? 

I have to say that I don't see the purpose of splitting text. Taking the point of view of a web developer, why should he paste his data back together? If you load an image you don't have to do that either. If I buy a banana, I don't want the shopkeeper to cut it in pieces just so he can wrap it up efficiently.

I think that improving the performance of FF is not something that web developers must be involved with. And is it really such a performance boost?

Looking at the first comment of this issue, 2003-02-20, I'm wondering if something is going to be changed or fixed.
(In reply to comment #28)
> Is there any workaround that I (we web-developers) can use to get the whole
> text and not the first 4K without checking a specific browser or version? 

You can call the normalize method on the document to make sure adjacent text nodes are being merged.
Then there is the textContent property that gives you all the text content of e.g. an element node, whether that has adjacent text child nodes or not does not matter for textContent.


> improving performance of FF is not something that web-developers
> must be involved with

True, which is why this bug is open.

> And is it realy such a performance boost?

Yes.  The current layout algorithm ends up being O(N^2) in the length of the text.  So limiting at 4KB often makes the difference between rendering the text and completely hanging.

I think we should revisit this after the new textframe lands; I'd be interested in doing some measurements at that point.
Depends on: 367177
New text frame won't fix this, and I don't think this should be a priority for Gecko 1.9, unless some volunteer steps up to do it. New textframe will provide a better base for fixing this, though.
We might need to revisit bug 371839 when we fix this.
Depends on: 371839
It also appears that if your text is wrapped in a CDATA block, the text is not broken into chunks.  This may help some folks as a workaround until the bug is fixed.
(In reply to comment #29)
> You can call the normalize method on the document to make sure adjacent text
> nodes are being merged.

can you please, send me some example of code?

> Then there is the textContent property that gives you all the text content of
> e.g. an element node, whether that has adjacent text child nodes or not does
> not matter for textContent.
> 

I've used loops to merge data:

//firefox 6kb xml tag limit bug fix
var txt = '';
try {
  for (var q = 0; q < 100; q++) {
    var pointer = (q == 0) ? textArr[i].firstChild : pointer.nextSibling;
    txt += pointer.data;
  }
} catch (e) {}

But it is kind of slow, and not needed for IE.
(In reply to comment #35)
> (In reply to comment #29)
> > You can call the normalize method on the document to make sure adjacent text
> > nodes are being merged.
> 
> can you please, send me some example of code?
> 
 
document.normalize();
(In reply to comment #35)
> (In reply to comment #29)
> > You can call the normalize method on the document to make sure adjacent text
> > nodes are being merged.
> 
> can you please, send me some example of code?
> 
> > Then there is the textContent property that gives you all the text content of
> > e.g. an element node, whether that has adjacent text child nodes or not does
> > not matter for textContent.
> > 
> 
> I've used loops to merge data:
> 
> //firefox 6kb xml tag limit bug fix
> var txt = ''; try { for (var q = 0; q<100; q++) { var pointer = (q==0) ?
> textArr[i].firstChild : pointer.nextSibling ; txt += pointer.data;}} catch (e)
> {}
> 
> But, it is kinda slowy and not needed for IE
> 

Because document.normalize() didn't work for me, I used the following. This code example you asked for handles the data and uses the normalize method only when it is available.

I use the adapter pattern for this issue.

when iterating through an xml-doc:

someNode=xGetElementsByTagName("someTag",elChild)[0]; //x-library needed!
node=nodeValue(someNode);

the function:

function nodeValue(xmlTag) {
  var content;
  if (xmlTag.normalize && xmlTag.firstChild.textContent) {
    xmlTag.normalize(); // normalize() takes no arguments
    content = xmlTag.firstChild.textContent;
  } else if (xmlTag.firstChild.nodeValue) {
    content = xmlTag.firstChild.nodeValue;
  } else {
    content = null;
  }
  return content;
}

So the var node contains the value of the someTag tag, whether it is more or less than 4K.

This is a code example you can use right away without worrying about the details. As mentioned, you need the function xGetElementsByTagName() from the x-library by Mike Foster.
Because my knowledge is limited, anyone is welcome to expand this function and make it better.
Do we have good testcases measuring the performance now that new text frame has landed?
> Do we have good testcases measuring the performance

See testcases linked off bug 384260, for example.  You might want some non-plaintext testcases too, of course; you can always toss one of those files inside a <body> to test that.
Also bug 359555.  And you can get to others through the dependency trees there, I think...
Here is another testcase. Simple XML for simple HTML body ;)

See https://bugzilla.mozilla.org/show_bug.cgi?id=423442
Component: DOM: Core → DOM: Core & HTML
QA Contact: ian → general
Another workaround is to avoid using firstChild altogether...  The recursive call below is the trick -- it processes through all the large node pages back to back so you can combine them however you want (it just lists the data here).

// and just my own start-up code here, get there however you like:
allnodes = xmlDoc.getElementsByTagName('*').item(0);
listNodes(allnodes);

// recursive listing function
function listNodes(nodes) {
  for (var i=0; i &lt; nodes.childNodes.length; i++) {
    var node = nodes.childNodes[i];

    if (node.nodeType == 1) {
      document.write(node.nodeName + "&lt;br/&gt;");
    }
    else if (node.nodeType == 3 || node.nodeType == 4) {
      document.write(node.nodeValue + "&lt;br/&gt;");
    }

    if (node.hasChildNodes()) {
      listNodes(node);
    }
  }
}  // end of listNodes

Hope this helps.

Mark Omohundro
ajamyajax.com
ok... 

try:

for (var i=0; i < nodes.childNodes.length; i++) {

instead of:

for (var i=0; i &lt; nodes.childNodes.length; i++) {

and <br/>

instead of:

&lt;br/&gt;
Really, calling document.normalize() should work just fine. If it doesn't please file a separate bug so we can fix it, but my testing shows it works.

This of course doesn't mean that this bug is any less important. But it's a workaround that can be used in the meantime.
Mr. Sicking (Jonas):

I for one don't use .firstChild, so it's not easy to plug in an xml document.normalize() node test, but I will try to do so in the near future.  But since you mentioned it, have you tried any of your own tests with > 32K worth of node text, by any chance?

I just read on this site how Opera might have a text node limit of 32K: http://www.howtocreate.co.uk/tutorials/javascript/dombasics and when I ran a quick test of this on my own, both Firefox and Opera seemed to have a problem with 32k+ nodes...  Again, just my own test, but I thought you might like to know.
And there might be some legacy browser issues also...  These are the kind of issues that steer people like me to "generic" JavaScript workarounds for some problems.  Thank you.

p.s. I'm new here, but also thank you very much for all your fine work on Firefox, and with this board.  And to all.  You probably don't hear that too often, but I for one sure appreciate it.
Thanks for the compliments :)

What type of problems did you run into with 32k+ textnodes in firefox? Slowness or things simply not working correctly? Either way please file bugs.

The only hard limit that I know of for textnode sizes is at around 500 megs of data in a single textnode. Of course, way before then you will probably run into performance issues.
Well, maybe that 32k problem I mentioned wasn't a bug... Earlier I ran an xml file with a 50k text node through an old test app and it hung, but I just tried the same 50k data in a different xml and it loaded just fine in Firefox (up to the 4k max per the topic of this thread).

If I can recreate any 32k+ problems consistently, I will file a bug report as you suggested. And will try to see if I can get document.normalize() to work also. 

Thanks again.
Jonas was right.  I got the normalize method to work in Firefox for the 4k+ issue.  Those who use firstChild might try something like this:

for (var i=0; i < nodes.length; i++) {
  if (nodes[i].hasChildNodes()) {
    nodes[i].normalize();
  }

  var elementvar1 =
    nodes[i].getElementsByTagName('elementname1')[0].firstChild.nodeValue;

  // element 2-x processing here...
}

Results might vary depending on the XML file, but hope that helps.  Bye.
Or you can just do:

document.addEventListener("load", function() {document.normalize();}, true);
I'm seeing <4k nodes split in inconsistent places when FF (3.0.3, at least) is loading it for the first or second time... does this sound like the same thing, or a separate issue? This is in the document coming from responseXML. Same node, same text, same script, etc., giving different results; 95%+ of the time it's fine, but on the first load or two it'll split, and not always at the same place, and not always the same tags...
I can confirm comment #51 on Firefox 3.0.4, it is random. The normalize function fixes the issue. I noticed that there is more of a chance of getting a complete node on the local network as opposed to over a VPN / over the internet.

-Ryan
No longer blocks: 306814
I don't know English well, so my friend is helping me translate.

If a string is over 4096 characters, a string variable in the JavaScript engine in Firefox doesn't keep the whole string.

Safari, Opera, Chrome, and IE keep it, so I think this is a bug.

This source shows an example of the bug:
-----------------------------------------------------------------------------------------
var t = xslt.getElementsByTagName("text")[0].firstChild.nodeValue; // over 4096 characters
if (t.length == 4096)
{
    t = '<pre class="Errorff">파이어폭스는 4096자 이상의 값을 변수에 저장할 수 없습니다. 이 문서는 다른 브라우저로 읽으셔야 합니다.<br/>Firefox cannot store data that contains more than 4096 characters in a variable.<br/>This article should be opened in another browser.</pre>' + t;
}
-----------------------------------------------------------------------------------------
This source doesn't keep strings of 4096+ characters.

If you want to check the bug on my site, you can go there and compare Firefox with another browser:
http://terassia.plecore.com/?code=A23E&cat=%EA%B0%95%EC%9D%98&view=52
Virats, thank you for your comment. This is a bug in Firefox. If you care to work around the problem on your site you can use document.normalize() before accessing the nodeValue or what not of a text node. I'm guessing this problem is triggered on your site by this line of code:

$setInHTML("ID_BOARD_TEXT",this.viewMacro(xslt.getElementsByTagName("text")[0].firstChild.nodeValue));

so if you normalize before that the problem should be worked around. Alternatively you could replace .firstChild.nodeValue with .textContent, depending on what other children there are in your "text" element.
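To illustrate why .textContent sidesteps the split while .firstChild.nodeValue does not, here is a simplified sketch of textContent for an element with only text children, run against mock nodes (`textContentOf` and the mock node shape are illustrative, not the real DOM implementation):

```javascript
// .firstChild.nodeValue sees only the first fragment of a split text
// run; textContent concatenates all of them. Sketched here for an
// element whose children are all text nodes, using plain objects.
function textContentOf(element) {
  var text = '';
  for (var i = 0; i < element.childNodes.length; i++) {
    text += element.childNodes[i].data;
  }
  return text;
}

// Mock of an element whose text got split into two adjacent nodes.
var el = {childNodes: [
  {nodeType: 3, data: 'first 4096 bytes...'},
  {nodeType: 3, data: 'the rest'}
]};
// el.childNodes[0].data misses "the rest"; textContentOf(el) does not.
```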
Johnny Stenback, thank you for your reply.

I don't know English well, so my friend is helping me.

That source was a simplified example of my real source, but you found the original source. (I'm surprised.)

I solved my problem. The source below is the fix:

if ($browser().indexOf("msie") == -1) // if the browser is not IE
{
  $setInHTML("ID_BOARD_TEXT", this.viewMacro(xslt.getElementsByTagName("text")[0].textContent));
}
else
{
  $setInHTML("ID_BOARD_TEXT", this.viewMacro(xslt.getElementsByTagName("text")[0].firstChild.nodeValue));
}

Although my problem is solved, I still want the 4k .firstChild.nodeValue problem fixed.

Thank you for reading.

Have a nice day!
Attached image screenshot.png
Is this still a problem now that we have the HTML 5 parser?  The quirksmode testpage WFM ...
HTML5 parser fixed this.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
according to comment #63.
Status: RESOLVED → VERIFIED
Encountered this bug again in FF 10.0 when parsing an XML String with text nodes > 4kB.
Yes, it's still a problem for XML.  That's tracked in a separate bug, I believe.
The problem still exists, but I found a workaround on the net.
(Works for me.)

var message = '';
var tmp = xmlDocument.getElementsByTagName('message')[0];
for (var i = 0; i < tmp.childNodes.length; i++)
{
  message += tmp.childNodes[i].data;
}

Source:
http://spaghetticode.wordpress.com/2009/09/02/firefox-and-javascript-4kb-limit-for-dom-text-nodes/#comment-3
This bug is still here in 23.r2, and it still requires client-side code modification as the only solution.

Why does this have "fixed" status?
This bug as filed (for the HTML parser) is fixed.  See comment 67.
That said, I don't see an obvious bug tracking the XML case, so I filed bug 890284.