Closed Bug 97886 Opened 24 years ago Closed 20 years ago

JavaScript rendered as HTML after document.write

Categories

(Core :: DOM: HTML Parser, defect, P3)

Tracking

RESOLVED FIXED
Future

People

(Reporter: munyer, Assigned: mrbkap)

References

Details

(Keywords: dom0)

Attachments

(4 files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Macintosh; U; PPC; en-US; rv:0.9.3+) Gecko/20010831
BuildID: 2001083110
For a project which I'm not allowed to disclose, I needed a way to make HTML pages that could detect when a script has failed to execute (for any reason at all, including complete removal by a security-conscious proxy) and insert other content where that script was intended to be. The only syntax I could find that would meet those requirements was the HTML code shown below.
This idiom is rather subtle (one might even say sneaky) but it works, and it complies with all the standards. In fact, if I understand the history of HTML correctly, this code should work in EVERY correctly functioning browser, all the way back to the very first one (Tim Berners-Lee's original NeXT Step browser). All it requires of pre-JS1.2 browsers is the ability to ignore SGML comments and unrecognized tags and attributes, and, if the browser recognizes <SCRIPT> tags, to respect the LANGUAGE attribute.
This code worked in MSIE 4, and (with a little tweaking) in NN 4 as well. MSIE 5 and 6 didn't cause any trouble. Unfortunately, now I'm getting reports that these pages don't work in NN 6.1. I've been able to replicate the problem in today's Mozilla build.
I've included three example pages below. The first two work correctly; the third fails. The only difference between these pages is in the way they load the script. Page 1 uses an inline script; page 2 uses a "<SCRIPT SRC=>" tag to load an external script; and page 3 uses an inline script to document.write a "<SCRIPT SRC=>" element which then loads the external script.
Reproducible: Always
Steps to Reproduce:
1. Disable JavaScript.
2. View pages 1, 2 and 3 (attached below).
3. Re-enable JavaScript.
4. View the same three pages with JavaScript.
5. If you feel like it, repeat the above with MSIE 4/5/6.
Actual Results: Pages 1 and 2 work correctly. Page 3 works without JavaScript. But when JavaScript is enabled, page 3 interprets the /* and the "fallback content" as HTML, even though these data are inside a SCRIPT element and therefore should be interpreted as JavaScript.
Expected Results: Page 3 should work the same as pages 1 and 2.
If at first you don't understand how this idiom works (I wouldn't blame you), no problem; do the following and you'll understand.
1. Use the W3C HTML validator to make an SGML parse tree of page 1 below. This tree shows how the page would be parsed by any agent that recognizes the scripts but does not execute them (a proxy, a robot, Lynx, NN 2, NN 6 with JS disabled, etc.).
2. Imagine this parse tree with both SCRIPT elements removed. That's how the output of a script-stripping proxy would look.
3. Load page 1 into a text editor. Manually simulate the effect of the first script by replacing the entire SCRIPT element (from <SCRIPT> to </SCRIPT>, including everything in between) with the characters the script would feed to document.write. Use the W3C validator again, and study the new parse tree. This shows how a browser should proceed to parse the document after that first script has been executed successfully.
4. Go back to the original version and remove the <SCRIPT> and </SCRIPT> tags, but do not remove the content between them. Run the validator again. This shows how an extremely old browser, like Mosaic or NN 1.0, would parse this page.
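Editorial sketch: the three manual transformations in steps 2 to 4 above can be approximated with plain string replacement. This is only a rough aid under stated assumptions: the helper names and `samplePage` are hypothetical, the sample is NOT one of the attached test pages, and the regular expressions are a crude stand-in for real HTML parsing.

```javascript
// Hypothetical helpers; the regexes only approximate SCRIPT element matching.
const SCRIPT_ELEMENT = /<script\b[^>]*>[\s\S]*?<\/script\s*>/gi;
const FIRST_SCRIPT_ELEMENT = /<script\b[^>]*>[\s\S]*?<\/script\s*>/i;
const SCRIPT_TAG = /<\/?script\b[^>]*>/gi;

// Step 2: what a script-stripping proxy would leave behind.
function stripScriptElements(html) {
  return html.replace(SCRIPT_ELEMENT, "");
}

// Step 3: replace the first SCRIPT element with the text it would have
// passed to document.write. A function is used as the replacement so that
// "$" sequences in `written` are inserted literally.
function simulateFirstWrite(html, written) {
  return html.replace(FIRST_SCRIPT_ELEMENT, () => written);
}

// Step 4: a browser that predates SCRIPT ignores the unknown tags
// but keeps the text between them.
function dropScriptTagsOnly(html) {
  return html.replace(SCRIPT_TAG, "");
}

// Hypothetical sample page, chosen only to exercise the helpers.
const samplePage =
  '<p>a</p><script>document.write("<b>hi</b>");</script><p>z</p>';
```

Feeding each result to a validator, as the steps describe, would then show the parse tree for the corresponding kind of user agent.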
Confirming with today's CVS build on Linux.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Mac System 9.x → All
Hardware: Macintosh → All
Sounds similar to bug 66013.
QA Contact: bsharma → moied
Keywords: dom0
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla0.9.7
--> 0.9.8
Target Milestone: mozilla0.9.7 → mozilla0.9.8
Out of time. Mass move to 0.9.9
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Target Milestone: mozilla0.9.9 → mozilla1.0
Mass moving bugs to 1.1
Target Milestone: mozilla1.0 → mozilla1.1
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:0.9.8+) Gecko/20020301
I have another site whose behavior matches the summary of this bug. Go to http://www.myrealbox.com and select the "Switch to secure connection" link under the login box at the top left of the page. The next page attempts to put up a Username/Password box, but barfs on some JavaScript and renders it into the page instead. The JavaScript looks OK to me, so I'm not sure what's up.
[was 1.1alpha]
Target Milestone: mozilla1.1alpha → Future
I found a similar problem, but my examples do not use document.write(). The pages often render with javascript as HTML as the original poster described. URL: http://www.bulliondirect.com/catalog/selectProducts.do?category=9 validates as HTML 4.01 Transitional compliant via http://validator.w3.org/detailed.html. Incorrectly renders javascript about 70% of the time in the following browsers: Mozilla 1.6 Gecko/20040113, Mozilla 1.5 Gecko/20030916, FireBird 0.7 Gecko/20031007, Netscape 7.1 Gecko/20030624. If you keep hitting Reload, you may get a properly rendered page. Page source is identical regardless of correct rendering. Saving a local copy of the bad rendering then opening the local copy always renders correctly. Strange how by reloading the page, the browser can randomly decide to render the same source differently. Some sort of timing issue? Looks like this bug's been around for nearly 2.5 years now.
I'm not really sure who to CC on this...
This smells like a race condition in the parser. If I step through this with a debugger and give it some time (maybe to load the external script?), this works fine. However, if I let it run right through, I get a bogus text token in the parser (null start and end points) and this breaks.
I'm wondering if we process the inline script, calling CNavDTD::BuildModel on the result, which then tries to queue the request (which should block the parser), but when we return from processing the inline script, we *don't* block the parser (return NS_OK). This would seem to make sense to me, since if I then wait for the script to load (i.e., manually block the parser in the debugger), this works...
Does this sound at all plausible? Does anybody know this code in depth? If this is indeed the problem, it would seem to be another indication (to me at least) that the parser blocking model is seriously broken (also see bug 220542). Maybe we need to consider keeping some sort of stack of the elements blocking the parser so that out-of-order continue calls don't mess us up?
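Editorial sketch: one possible reading of the "stack of blocking elements" idea, written as a toy model. Nothing here is Gecko code; `BlockerStack`, `block`, `unblock`, and the blocker ids are hypothetical names chosen for illustration. The assumption is that the parser may resume only once every blocker has finished, no matter in which order the continue calls arrive.

```javascript
// Toy model only; all names are hypothetical.
class BlockerStack {
  constructor() {
    this.blockers = [];        // ids of elements blocking the parser, innermost last
    this.finished = new Set(); // blockers that reported completion out of order
  }

  // An element (for example a pending script load) starts blocking the parser.
  block(id) {
    this.blockers.push(id);
  }

  // A blocker reports completion. It is only released once it is on top of
  // the stack, so an early "continue" from an outer blocker cannot resume
  // the parser while an inner blocker is still pending.
  unblock(id) {
    this.finished.add(id);
    while (
      this.blockers.length > 0 &&
      this.finished.has(this.blockers[this.blockers.length - 1])
    ) {
      this.finished.delete(this.blockers.pop());
    }
  }

  get isBlocked() {
    return this.blockers.length > 0;
  }
}
```

For example, after `block("outer")` and `block("inner")`, calling `unblock("outer")` first leaves the model blocked; it unblocks only after `unblock("inner")` also arrives.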
Assignee: harishd → parser
Status: ASSIGNED → NEW
QA Contact: moied → mrbkap
Blake, there's a lot of complexity in the blocking code, yes... I'd try putting printfs in the content sink to see whether we're blocking the parser there or not, then go on from there.
The root cause of this bug is that the parser uses the wrong parser context to parse the result of the external script. When the outer (inline) script writes the inner one, the parser pushes a new context onto its stack with key id = 0x01 (more on that later). This context is only popped off once the tokenizer reaches EOF on its scanner (i.e., it's been exhausted). The idea is that a nested write call (document.write("document.write('...')");) would generate a different parser key, so a new parser context would be established, and the parser would continue on its merry way. However, what happens here is that when the external script loads, the tokenizer of the inline script hasn't finished yet (since we block parsing to load scripts, which stops tokenization at the script), so we have a parser context already pushed on.
Now, we take a break from the action and examine NS_GENERATE_PARSER_KEY(). This macro in nsHTMLDocument.cpp is supposed to generate increasing keys for parser contexts as writes nest. It relies on mWriteLevel to do so.
Back to the action! In this case, however, the previous write() has finished, and mWriteLevel is 1 (0 before the call). So the generated parser key is the same for the external script as for the inline script. The parser hasn't popped off the inline script's context (which is now non-incremental), and so it continues to use this context for the external script. Now when the (non-incremental) scanner is used to parse |<script ...>/*|, the tokenizer thinks that it needs to fake an end tag (giving |<script ...></script>|). So the /* [message] is parsed as text because the parser uses the wrong context.
This problem is unfortunately very complex. I think the solution is going to be to generate better parser keys based on something like the content ID of the current script element AND mWriteLevel (is this possible?). Critiques welcome.
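Editorial sketch: the key collision described above, reduced to a toy model. This is not Gecko code and does not reproduce NS_GENERATE_PARSER_KEY; `ToyParser`, `levelOnlyKey`, `scriptAwareKey`, and `runScenario` are hypothetical names, and the second key strategy is only a guess at what "content ID AND mWriteLevel" could look like.

```javascript
// Toy model only; none of these names are Gecko APIs.
// Strategy 1: the key depends on the write nesting level alone.
const levelOnlyKey = (writeLevel, scriptId) => writeLevel;
// Strategy 2 (assumed shape of the proposed fix): also include the script's id.
const scriptAwareKey = (writeLevel, scriptId) => `${scriptId}:${writeLevel}`;

class ToyParser {
  constructor(keyFor) {
    this.keyFor = keyFor;
    this.writeLevel = 0; // stands in for mWriteLevel
    this.contexts = [];  // stack of parser contexts
  }

  // Models one document.write call made by the script `scriptId`.
  // `blocksOnExternalScript` is true when the written markup contains an
  // external script, so tokenization stops before EOF and the pushed
  // context stays on the stack after write() returns.
  write(scriptId, blocksOnExternalScript) {
    this.writeLevel += 1;
    const key = this.keyFor(this.writeLevel, scriptId);
    const top = this.contexts[this.contexts.length - 1];
    const reusedExistingContext = top !== undefined && top.key === key;
    if (!reusedExistingContext) {
      this.contexts.push({ key, owner: scriptId });
      // EOF reached: the context is popped again.
      if (!blocksOnExternalScript) this.contexts.pop();
    }
    this.writeLevel -= 1;
    return { key, reusedExistingContext };
  }
}

// The inline script writes the external script element (and blocks);
// later the external script runs and writes its own output.
function runScenario(keyFor) {
  const parser = new ToyParser(keyFor);
  const inline = parser.write("inline", true);
  const external = parser.write("external", false);
  return { inline, external };
}
```

With `levelOnlyKey` both writes get the same key, so the external script's output lands in the inline script's leftover context; with `scriptAwareKey` the keys differ and a fresh context is pushed.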
Assignee: parser → mrbkap
Depends on: 280713
I'm marking this fixed, since I just checked in bug 280713, which, while it doesn't solve the root of the problems here, is thorough enough to really cover this up. If you think this should remain open to fix the problem of using the wrong parser context, please reopen.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
It's worth noting here that this was really fixed for good in bug 271184.
Depends on: 271184