315 bytes, text/html
640 bytes, text/html
2.86 KB, text/html
7.20 KB, patch
|Details | Diff | Splinter Review|
8.44 KB, patch
|Details | Diff | Splinter Review|
I created a simple HTML file: <html> <body bgcolor="white" "bogus extra stuff""""> </body> </html> I opened the file up in Mozilla 0.8 under Win2K, and went to View Page Source. I expected to see the same HTML code that I typed in, despite the fact that the HTML code contained errors. Instead, some of the double-quotes got cleaned up: <html> <body bgcolor="white" bogus extra stuff> </body> </html>
Same in here, under Linux. The question is whether it's a bug: a html file should look like that: <TAG OPT1="value1"> and not <TAG "value1"> ...
sending to parser for triage
Confirming as per user comment, setting OS=All.
View source should be showing the source as it really is -- otherwise its usefulness for debugging web pages is not very great (and why does anyone ever look at view source otherwise?).
because we want to see what content mozilla decided was worth parsing ;-b otherwise, we'd use telnet to get the real content :) *sigh* this is at least normal severity.
I appreciate that this is truly annoying, but can it actually alter the semantics of content so that developers are misled rather than just inconvenienced? Futuring for now, will reconsider given a compelling test case.
The attached testcase sucks as HTML goes. Nevertheless... Moving the closing quote to before </html> will make things work OK. But the fact remains that view source just silently lost half the source of that page, and more importantly has lost all indications that an error of any kind has occured. I'm not completely sure that it's the same bug, but it seems related...
17 years ago
*** This bug has been marked as a duplicate of 57724 ***
Reopening 57724 dependencies for independent resolution.
Created attachment 82208 [details] Single quotes on event handler morphed to double-quotes Outer single quotes are converted to double quotes both in "View->Source" and via the "Save Page As..." dialogue. In the latter case, the saved page will not work because this change breaks the nested quotes.
Roland, I cannot reproduce the error you describe when viewing the source of that attachment (linux trunk build 2002-05-01-21). I see single quotes... The "save as" issue is a separate issue (with the "web page, complete" mode only) and should be filed as a separate bug.
By the definitions on <http://bugzilla.mozilla.org/bug_status.html#severity> and <http://bugzilla.mozilla.org/enter_bug.cgi?format=guided>, crashing and dataloss bugs are of critical or possibly higher severity. Only changing open bugs to minimize unnecessary spam. Keywords to trigger this would be crash, topcrash, topcrash+, zt4newcrash, dataloss.
Please actually read bugs before changing the severity, ok?
See also comment #3 of this bug: 223753
*** Bug 233609 has been marked as a duplicate of this bug. ***
This "cleaning up" of the source before interpretation can also cause security issues. Eg my website allows users to add certain simple HTML tags to their posts, but not others. If, however, a user enters this: <B ONCLICK="window.open('http://www.badsite.com')" then mozilla will automatically append the closing > and render the next part of the website's own content with this onclick link. Admittedly, it's my fault in this case for not having a good enough regexp for filtering out bad tags (which I've now fixed), but I do wonder whether Mozilla should be displaying "what an attacker means" rather than "what the designer said". The following was the vulnerable html cleanup code I had used. I've simplified $allowed for clarity. $allowed='\s*\/?\s*(b|i|u|s|pre|tt|ul|ol|li|p|)\s*'; $memo=preg_replace("/<((?!($allowed>))[^<>]*)>/is", "<\\1>", $memo);
--> me since I'm working on this.
Fixing summary to reflect the bug that this is covering (our awful handling of unclosed tags and the end of the document). For the record: start tags disappear altogether, end tags get duplicated.
bz, rbs, do you have any more information on the stuff here? If you know anyone who may, please add them to the CC list. My explanation for this bug turned into a 300 word mini-essay on end-of-file handling at the tokenizer level, so instead of filling up this bug with it, I'll attach it (it's also available at: http://www.owlnet.rice.edu/~mrbkap/bug70828explain.html). My explanation is pretty long and comprehensive, but the short of it is that I have a partial patch with a couple of problems left to fix. I'm not sure about some parts of the patch (see the explanation). Any suggestions on my explanation or solution for this are welcome! Also, I can generalize my explanation and put it up as parser documentation (on EOF handling in the tokenizer) somewhere (since htmlparser documentation is terribly lacking), I just need to know what format/where to put it.
Created attachment 160805 [details] Explanation of this bug. I'm attaching this to the bug so that if/when I take the explanation down from the internet, people looking into parser code can still find it here (and don't receive a 404).
Created attachment 161187 [details] [diff] [review] work in progress This is a partial fix for this bug. I'm still unsure as to the nsScanner::IsIncremental()/mIsFinalChunk decision and there are a couple of (easier) bugs that I'd like to fix in this code before I spend more time on this bug.
Comment on attachment 161187 [details] [diff] [review] work in progress I've decided to go with the existing code's idea and fix this (for the most part) with nsScanner::IsIncremental(). I'll post a new patch later this week.
Created attachment 162547 [details] [diff] [review] patch v2 This patch fixes this bug. Basically, because of document.write(), mIsFinalChunk is useless to us (this is the reason it isn't really used anywhere in the parser). Basically, I went through and each place we can receive EOF, and added a check to see if the document was out of data. This also fixes the bug where the document: "<script>foo;</script " would show up as "<script>foo;</script</script" in view-source. I'm not sure if I was just unobservant, but my debug build seems to be viewing source very slowly (1/2 to 1/4 of a second lag on slashdot.org). I'm unsure if it's my patch, because when I locally backed it out, I still saw the lag. If someone could test and see if this does adversely view-source performance, it would be great.
Comment on attachment 162547 [details] [diff] [review] patch v2 rbs, could you take a look at this? An additional note: the changes to nsScanner::ReadTagIdentifier are because if we ran out of data before reaching an invalid character, we wouldn't end up appending what we had already.
Note that testing performance in a debug build is very pointless....
Comment on attachment 162547 [details] [diff] [review] patch v2 I think you are better off at this point to find out how to run the parser regression tests. They will add another degree of confidence. This patch is invasive and hard to assess. I am witholding my r= pending what you see from the regression tests.
Comment on attachment 162547 [details] [diff] [review] patch v2 Rerequesting r= after running the parser regression tests. The only two testcases to change were quote002.html and quote003.html, both of which exhibit the problem that this bug is trying to fix (unclosed start tag at the end of the document gets dropped).
Comment on attachment 162547 [details] [diff] [review] patch v2 r=rbs
Comment on attachment 162547 [details] [diff] [review] patch v2 bz, asking you for sr since I'm touching nsScanner::ReadTagIdentifier. again.
Comment on attachment 162547 [details] [diff] [review] patch v2 This patch misses a case: the document |<a b| will lose the 'b'. I have a fix in my tree, but don't have time to attach it now. The patch is pretty much the same, except for an extra check in CAttributeToken::Consume().
Created attachment 163144 [details] [diff] [review] patch v2 This makes us not lose the last attribute in an unclosed tag.
Comment on attachment 163144 [details] [diff] [review] patch v2 This should be the final patch. The differences between this patch are all in CAttributeToken::Consume(). I also fixed some typos and cleaned up some comments.
Comment on attachment 163144 [details] [diff] [review] patch v2 In an attempt to get this in faster. I'm playing with sr's. Trying roc first. rbs, I'm asking for r= on this one (for the attribute changes).
Comment on attachment 163144 [details] [diff] [review] patch v2 r=rbs
Fix checked in (albeit with the wrong bug number in the comment).