Closed
Bug 194231
Opened 21 years ago
Closed 13 years ago
4k+ text blocks create adjacent DOM text nodes
Categories
(Core :: DOM: Core & HTML, defect)
Core
DOM: Core & HTML
VERIFIED
FIXED
People
(Reporter: jst, Unassigned)
References
Details
Attachments
(2 files)
This ain't what the spec says; we should create one text node that holds all the text, no matter how much of it there is...
Comment 1•21 years ago
Maybe the content sink needs to hold on to the last text node and append text instead of creating a new node if certain conditions are satisfied?
Comment 2•21 years ago
We just have these overly complex buffers in the sink for perf reasons; it was more efficient to create 4k buffers than to continuously reallocate and copy stuff. But maybe our string classes are now smart enough to allocate new buffers as needed inside the strings without actually copying? It should be possible to remove a nice chunk of code and make it simpler.
Comment 3•21 years ago
Text nodes do not use our string classes internally... or are you talking about different code? Our string classes _may_ be smart enough to do what you want, depending on how you use them (e.g. using += on an nsString _will_ reallocate, but using dependent concatenations will not...).
Comment 4•21 years ago
Re: what Boris said in comment #2, "if certain conditions are satisfied": please let that also take escaped data into consideration.

I've attached a zip file containing an issue I was trying to pin down and show to Heikki regarding Mozilla's parsing of a returned SOAP payload via XMLHTTP and the DOMParser. The sample XML is a real SOAP response from our SOAP listener. SOAP/IIS automatically encodes it and IE handles it seamlessly (this happened after SOAP 2.0 was released; guess you guys had them worried with the release of NS 6.1's XML capabilities :-) ).

We are currently having to serialize the XMLHTTP return (having wrapped the result XML doc in a dummy node so as to end up with a valid doc after splitting), split it, unescape it, then load it back into a DOM. IE allows us to just get the ResultNode and run. Not saying IE is right, and not complaining about Moz; just trying to make it more efficient for web developers and allow Moz to maintain a leg up on web services. I also attached the files so you all could see what real-world idiots, I mean developers (such as myself), are trying to do with your technologies.
Comment 5•21 years ago
I turned off the escape option in the WSDL file just to play, and now I remember why we left it enabled: we thought Mozilla's DOMParser was choking on a character being returned in the Result node. It turns out to be this 4k limit. Here is what we get without the encoding option enabled. Note that it returns the string as escaped; does it do that only for the error report, or does it automatically escape nodeValues?

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<parsererror xmlns="http://www.mozilla.org/newlayout/xml/parsererror.xml">
XML Parsing Error: not well-formed
Location: file:///
Line Number 1, Column 372:
<sourcetext>
<?xml version="1.0" encoding="UTF-8" standalone="no"?><soap-env:envelope soap-env:encodingstyle="http:/ .. [snip]...
</sourcetext>
</parsererror>
Comment 6•21 years ago
That should read "our interpreter", not "WSDL file"; sorry. Interpreter = JScript files that get launched by our appserver for us to handle specific controls and things; it also allows the server to be extended easily. We are having to catch the payload before it returns, escape it, then do the string parsing/splitting in Moz to make it work.
Comment 7•21 years ago (Reporter)
Just so everyone knows, calling document.normalize() on a document should combine the adjacent text nodes if they do end up in a document. This shouldn't be needed, of course, but if someone is looking for a workaround, this would be it.
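As a sketch of that workaround (the helper name is hypothetical, and a plain object stands in for a DOM element so the example is self-contained), assuming only the DOM contract that normalize() merges adjacent text-node children:

```javascript
// Sketch of the normalize() workaround described above: merge adjacent
// text-node children first, then read the single remaining first child.
function fullText(el) {
  el.normalize(); // after this, adjacent text children are merged into one node
  return el.firstChild ? el.firstChild.nodeValue : "";
}
```

In a real page this would be something like `fullText(someElement)`, or simply calling `document.normalize()` once before reading any nodeValue.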
Comment 8•21 years ago (Reporter)
Mass-reassigning bugs to dom_bugs@netscape.com
Assignee: jst → dom_bugs
Comment 9•21 years ago
*** Bug 209980 has been marked as a duplicate of this bug. ***
Comment 10•21 years ago
DOM3 says "When a document is first made available via the DOM, there is only one Text node for each block of text." http://www.w3.org/TR/2003/WD-DOM-Level-3-Core-20030609/core.html#ID-1312295772 This description is exactly the same as in DOM2 and DOM1. However, Mozilla splits a consecutive block of text in an HTML tag into multiple text nodes of 4KB each. Mozilla violates the DOM specification, I think. I agree that splitting into 4KB buffers is required for performance reasons, but the splitting should have been done internally in a single text node instead of splitting into multiple text nodes.
Comment 11•20 years ago
*** Bug 245092 has been marked as a duplicate of this bug. ***
Comment 12•19 years ago
*** Bug 280026 has been marked as a duplicate of this bug. ***
Comment 13•19 years ago
test case at: http://www.quirksmode.org/dom/tests/textnodesize.html
Comment 14•19 years ago
*** Bug 292110 has been marked as a duplicate of this bug. ***
Comment 15•19 years ago
*** Bug 293358 has been marked as a duplicate of this bug. ***
Comment 16•19 years ago
*** Bug 305194 has been marked as a duplicate of this bug. ***
Comment 17•19 years ago
*** Bug 306524 has been marked as a duplicate of this bug. ***
Blocks: 306814
Updated•18 years ago
QA Contact: desale → ian
Comment 18•18 years ago
*** Bug 338826 has been marked as a duplicate of this bug. ***
Comment 19•18 years ago
*** Bug 346004 has been marked as a duplicate of this bug. ***
Comment 20•18 years ago
The odd thing is, if you reference the textContent field of the parent node, you will get the full text; it's only among the #text nodes themselves that this problem persists. Unfortunately, textContent is not very portable at the moment (as far as my testing has indicated). It's difficult for me to fathom performance as a significant reason not to change this approach when this behavior is contradicted by the parent node's textContent field.
Comment 21•18 years ago
textContent concatenates all text nodes together; see the definition in the DOM spec.
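For illustration, the textContent getter behaves roughly like the recursive concatenation below. This is a sketch, not the real Gecko implementation; plain objects stand in for DOM nodes, and nodeType 3 is Node.TEXT_NODE:

```javascript
// Rough model of the DOM textContent getter: concatenate the data of every
// descendant text node, however many adjacent pieces the parser created.
var TEXT_NODE = 3; // Node.TEXT_NODE
function textOf(node) {
  if (node.nodeType === TEXT_NODE) return node.data;
  var out = "";
  for (var i = 0; i < node.childNodes.length; i++) {
    out += textOf(node.childNodes[i]);
  }
  return out;
}
```

This is why reading the parent's textContent returns the full text even when its children were split into 4 KB chunks.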
Comment 22•18 years ago
*** Bug 346850 has been marked as a duplicate of this bug. ***
Comment 23•18 years ago
Do we have any good performance tests for this?
Comment 24•18 years ago
Is there any way to determine whether two text nodes right next to each other are the result of Mozilla splitting them? Or is the only way to test to assume that if the current node is 4K in length, then the next node is the result of a split of the current node?
There is no real way to tell, no.
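Given that there is no reliable marker, a defensive reader can simply concatenate every run of adjacent text siblings instead of guessing which ones came from a split. A minimal sketch (mock node objects here; in a real DOM you would start from element.firstChild):

```javascript
// Concatenate a run of adjacent text siblings, whatever their origin,
// stopping at the first non-text sibling (or the end of the child list).
var TEXT_NODE = 3; // Node.TEXT_NODE
function joinAdjacentText(first) {
  var s = "";
  for (var n = first; n && n.nodeType === TEXT_NODE; n = n.nextSibling) {
    s += n.data;
  }
  return s;
}
```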
Comment 28•17 years ago
Is there any workaround that I (we web developers) can use to get the whole text and not just the first 4K, without checking for a specific browser or version? I have to say that I don't see the purpose of splitting text. From the point of view of a web developer, why should he have to paste his data back together? If you load an image you don't have to do that either; if I buy a banana I don't want the shopkeeper to cut it in pieces so he can wrap it up efficiently. I think that improving the performance of FF is not something that web developers should have to be involved with. And is it really such a performance boost? Looking at the first comment of this issue, 2003-02-20, I'm wondering if something is ever going to be changed or fixed.
Comment 29•17 years ago
(In reply to comment #28)
> Is there any workaround that I (we web-developers) can use to get the whole
> text and not the first 4K without checking a specific browser or version?

You can call the normalize method on the document to make sure adjacent text nodes are merged. Then there is the textContent property, which gives you all the text content of e.g. an element node; whether it has adjacent text child nodes or not does not matter for textContent.
Comment 30•17 years ago
> improving performance of FF is not something that web-developers
> must be involved with

True, which is why this bug is open.

> And is it realy such a performance boost?

Yes. The current layout algorithm ends up being O(N^2) in the length of the text, so limiting at 4KB often makes the difference between rendering the text and completely hanging. I think we should revisit this after the new textframe lands; I'd be interested in doing some measurements at that point.
Depends on: 367177
The new textframe won't fix this, and I don't think this should be a priority for Gecko 1.9 unless some volunteer steps up to do it. The new textframe will provide a better base for fixing this, though.
Comment 33•17 years ago
It also appears that if your text is wrapped in a CDATA block, the text is not broken into chunks. This may help some folks as a workaround until the bug is fixed.
Comment 35•17 years ago
(In reply to comment #29)
> You can call the normalize method on the document to make sure adjacent text
> nodes are being merged.

Can you please send me some example code?

> Then there is the textContent property that gives you all the text content of
> e.g. an element node, whether that has adjacent text child nodes or not does
> not matter for textContent.

I've used loops to merge data:

// firefox 6kb xml tag limit bug fix
var txt = '';
try {
  for (var q = 0; q < 100; q++) {
    var pointer = (q == 0) ? textArr[i].firstChild : pointer.nextSibling;
    txt += pointer.data;
  }
} catch (e) {}

But it is kinda slow and not needed for IE.
Comment 36•17 years ago
(In reply to comment #35)
> (In reply to comment #29)
> > You can call the normalize method on the document to make sure adjacent text
> > nodes are being merged.
>
> can you please, send me some example of code?

document.normalize();
Comment 37•17 years ago
(In reply to comment #35)
> can you please, send me some example of code?

Because document.normalize() didn't work for me, I used the following. This code example handles the data and uses the normalize method only when it is available; I use the adapter pattern for this issue.

When iterating through an xml-doc:

someNode = xGetElementsByTagName("someTag", elChild)[0]; // x-library needed!
node = nodeValue(someNode);

The function:

function nodeValue(xmlTag) {
  if (xmlTag.firstChild.textContent && xmlTag.normalize) {
    xmlTag.normalize();
    content = xmlTag.firstChild.textContent;
  } else if (xmlTag.firstChild.nodeValue) {
    content = xmlTag.firstChild.nodeValue;
  } else {
    content = null;
  }
  return content;
}

So the var node contains the value of the someTag tag whether it is more or less than 4K. This is a code example you can use right away without worrying about the details. As mentioned, you need the function xGetElementsByTagName() from the x-library by Mike Foster. Because my knowledge is limited, anyone is welcome to expand this function and make it better.
Comment 38•17 years ago
Do we have good testcases measuring the performance now that new text frame has landed?
Comment 39•17 years ago
> Do we have good testcases measuring the performance

See the testcases linked off bug 384260, for example. You might want some non-plaintext testcases too, of course; you can always toss one of those files inside a <body> to test that.
Comment 40•17 years ago
Also bug 359555. And you can get to others through the dependency trees there, I think...
Comment 41•16 years ago
Here is another testcase. Simple XML for simple HTML body ;) See https://bugzilla.mozilla.org/show_bug.cgi?id=423442
Comment 43•16 years ago
Another workaround is to avoid using firstChild altogether. The recursive call below is the trick; it processes all the large node pages back-to-back so you can combine them however you want (it just lists the data here).

// just my own start-up code here; get there however you like:
allnodes = xmlDoc.getElementsByTagName('*').item(0);
listNodes(allnodes);

// recursive listing function
function listNodes(nodes) {
  for (var i = 0; i < nodes.childNodes.length; i++) {
    var node = nodes.childNodes[i];
    if (node.nodeType == 1) {
      document.write(node.nodeName + "<br/>");
    } else if (node.nodeType == 3 || node.nodeType == 4) {
      document.write(node.nodeValue + "<br/>");
    }
    if (node.hasChildNodes()) {
      listNodes(node);
    }
  }
}
// end of listNodes

Hope this helps.
Mark Omohundro, ajamyajax.com
Comment 44•16 years ago
ok... in the code above, read the loop as:

for (var i=0; i < nodes.childNodes.length; i++) {

wherever the markup got escaped, and <br/> instead of: lt;br/>
Really, calling document.normalize() should work just fine. If it doesn't, please file a separate bug so we can fix it, but my testing shows it works. This of course doesn't mean that this bug is any less important, but it's a workaround that can be used in the meantime.
Comment 46•16 years ago
Mr. Sicking (Jonas): I for one don't use .firstChild, so it's not easy to plug in an xml document.normalize() node test, but I will try to do so in the near future. But since you mentioned it, have you tried any of your own tests with > 32K worth of node text, by any chance? I just read on this site how Opera might have a text node limit of 32K: http://www.howtocreate.co.uk/tutorials/javascript/dombasics and when I ran a quick test of this on my own, both Firefox and Opera seemed to have a problem with 32k+ nodes. Again, just my own test, but I thought you might like to know. And there might be some legacy browser issues also. These are the kind of issues that steer people like me toward "generic" JavaScript workarounds for some problems. Thank you.

p.s. I'm new here, but also thank you very much for all your fine work on Firefox and with this board. And to all: you probably don't hear this too often, but I for one sure appreciate it.
Thanks for the compliments :) What type of problems did you run into with 32k+ textnodes in firefox? Slowness or things simply not working correctly? Either way please file bugs. The only hard limit that I know of for textnode sizes is at around 500 megs of data in a single textnode. Of course, way before then you will probably run into performance issues.
Comment 48•16 years ago
Well, maybe that 32k problem I mentioned wasn't a bug. Earlier I ran an XML file with a 50k text node through an old test app and it hung, but I just tried the same 50k data in a different XML file and it loaded just fine in Firefox (up to the 4k max per the topic of this thread). If I can recreate any 32k+ problems consistently, I will file a bug report as you suggested, and I'll try to see if I can get document.normalize() to work also. Thanks again.
Comment 49•16 years ago
Jonas was right. I got the normalize method to work in Firefox for the 4k+ issue. Those who use firstChild might try something like this:

for (var i=0; i < nodes.length; i++) {
  if (nodes[i].hasChildNodes()) {
    nodes[i].normalize();
  }
  var elementvar1 = nodes[i].getElementsByTagName('elementname1')[0].firstChild.nodeValue;
  // element 2-x processing here...
}

Results might vary depending on the XML file, but hope that helps. Bye.
Or you can just do: document.addEventListener("load", function() {document.normalize();}, true);
Comment 51•16 years ago
I'm seeing <4k nodes split in inconsistent places when FF (3.0.3, at least) is loading a document for the first or second time. Does that sound like the same thing, or a separate issue? This is in the document coming from responseXML. Same node, same text, same script, etc., giving different results; 95%+ of the time it's fine, but on the first load or two it'll split, and not always at the same place, and not always the same tags...
Comment 52•16 years ago
That's bug 449219.
Comment 53•16 years ago
I can confirm comment #51 on Firefox 3.0.4; it is random. The normalize function fixes the issue. I noticed that there is more of a chance of getting a complete node on the local network as opposed to over a VPN / the internet. -Ryan
Comment 57•15 years ago
I don't know English well, so my friend is helping me translate. If a string has more than 4096 characters, a string variable in Firefox's JavaScript engine doesn't keep the whole string; Safari, Opera, Chrome, and IE do keep it, so I think it is a bug. This source shows an example of the bug:

var t = xslt.getElementsByTagName("text")[0].firstChild.nodeValue; // more than 4096 characters
if (t.length == 4096) {
  t = '<pre class="Errorff">파이어폭스는 4096자 이상의값을 변수에 저장할 수 없습니다. 이 문서는 다른 브라우저로 읽으셔야합니다.<br/>Firefox cannot store the data that contains more than 4096 characters in a variable.<br/>This article should be opened by another browser.</pre>' + t;
}

This source doesn't keep strings over 4096 characters. If you want to check the bug on my site, you can go there and compare Firefox with another browser: http://terassia.plecore.com/?code=A23E&cat=%EA%B0%95%EC%9D%98&view=52
Comment 58•15 years ago (Reporter)
Virats, thank you for your comment. This is a bug in Firefox. If you care to work around the problem on your site, you can use document.normalize() before accessing the nodeValue (or whatever) of a text node. I'm guessing this problem is triggered on your site by this line of code:

$setInHTML("ID_BOARD_TEXT",this.viewMacro(xslt.getElementsByTagName("text")[0].firstChild.nodeValue));

so if you normalize before that, the problem should be worked around. Alternatively, you could replace .firstChild.nodeValue with .textContent, depending on what other children there are in your "text" element.
Comment 59•15 years ago
Johnny Stenback, thank you for your reply. I don't know English well, so my friend helps me. That source was an example to make it easier to understand, but you found the original source (I'm surprised). I solved my problem; the source below is the fix:

if ($browser().indexOf("msie") == -1) // if the browser is not IE
{
  $setInHTML("ID_BOARD_TEXT", this.viewMacro(xslt.getElementsByTagName("text")[0].textContent));
}
else
{
  $setInHTML("ID_BOARD_TEXT", this.viewMacro(xslt.getElementsByTagName("text")[0].firstChild.nodeValue));
}

Although my problem is solved, I still want the 4k .firstChild.nodeValue problem to be fixed. Thank you for reading. Have a nice day!
Comment 62•14 years ago
Is this still a problem now that we have the HTML 5 parser? The quirksmode testpage WFM ...
Comment 64•13 years ago
HTML5 parser fixed this.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 66•12 years ago
Encountered this bug again in FF 10.0 when parsing an XML string with text nodes > 4kB.
Comment 67•12 years ago
Yes, it's still a problem for XML. That's tracked in a separate bug, I believe.
Comment 68•12 years ago
The problem still exists, but I found a workaround on the net (works for me):

var message = '';
var tmp = xmlDocument.getElementsByTagName('message')[0];
for (i = 0; i < tmp.childNodes.length; i++) {
  message += tmp.childNodes[i].data;
}

Source: http://spaghetticode.wordpress.com/2009/09/02/firefox-and-javascript-4kb-limit-for-dom-text-nodes/#comment-3
Comment 69•11 years ago
This bug is still here in 23.r2 and still requires client-side code modification as the only solution. Why does this have "fixed" status?
Comment 70•11 years ago
This bug as filed (for the HTML parser) is fixed. See comment 67.
Comment 71•11 years ago
That said, I don't see an obvious bug tracking the XML case, so I filed bug 890284.