Closed Bug 188609 Opened 22 years ago Closed 20 years ago

Need separate tokenizer for View Source

Categories

(Core :: DOM: HTML Parser, defect)

Type: defect
Priority: Not set
Severity: minor

Tracking

Status: RESOLVED WONTFIX
Target Milestone: Future

People

(Reporter: choess, Assigned: mrbkap)

Currently "View Source" passes through the HTML tokenizer (so we can do syntax
highlighting), which often results in mangling of page source. We've discussed
creating a separate tokenizer before; while I doubt anyone has time to do this
now, I wanted to get this bug on the table (and also track the bugs caused by
our current setup).
Blocks: 57724
Or we could just pass this crap off to the system text editor....
Blocks: 189202
Status: NEW → ASSIGNED
Target Milestone: --- → Future
Blocks: 172947
Blocks: 182215
Blocks: 204573
I've been thinking about this, and I may even get a patch going at some point. In
the meantime, I have a couple of questions.

First, would it be best if the new tokenizer could leverage the old tokenizer's
code in some way? That may be tricky, and it keeps some of the complexity of view
source in the tokenizer, but code duplication is a bad thing and some of it may be
avoidable. Is it worth the extra complexity?

Second, how forgiving should the new tokenizer be? If a tag is missing its closing
'>', should we still colorize subsequent tags as if the '>' were there (without,
of course, actually adding it), or colorize the following words as though they
were attributes? I think this comes down to who the target audience is: being less
forgiving makes mistakes easier for developers to spot, but may make the page less
readable for everyone else.
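To make the trade-off concrete, here is a minimal standalone sketch of the
"forgiving" option (all names are invented for illustration; this is not the
existing tokenizer code). It ends a tag at the next '<' or at end of input even
when no '>' was ever seen, so later markup still gets colorized, and it records
that the tag was unterminated so a highlighter could flag it:

// Illustration only -- invented names, not the real parser API.
// A "forgiving" scan ends the tag when it reaches the next '<' (or the
// end of the buffer) even though no '>' was seen, so subsequent markup
// can still be colorized; a strict scan would treat everything up to
// the next '>' as part of the broken tag.
#include <cstddef>
#include <string>

struct TagSpan {
  size_t start;      // offset of the '<'
  size_t end;        // one past the last character belonging to the tag
  bool terminated;   // did we actually see the closing '>'?
};

static TagSpan ScanTagForgiving(const std::string& src, size_t start) {
  TagSpan span = {start, start + 1, false};
  for (size_t i = start + 1; i < src.size(); ++i) {
    if (src[i] == '>') {           // well-formed tag
      span.end = i + 1;
      span.terminated = true;
      return span;
    }
    if (src[i] == '<') {           // missing '>': pretend the tag ended here
      span.end = i;
      return span;
    }
  }
  span.end = src.size();           // ran off the end of the buffer
  return span;
}

Either audience could then be served by how the highlighter renders spans with
terminated == false: normal tag coloring for readability, or an error style for
developers.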
Any sharing of code would be great.

As for forgiveness, I'd love it if we could blink erroneous things and tokenize
as if the errors were corrected....
Assigning to me since I'm working on this.

bz, how forgiving should view-source be with XML constructs like CDATA?
Currently, we make a mockery of it (see bug 84430). A real XML parser would be
completely unforgiving, so I'm inclined to make view-source the same way.
However, the current method of handling it in CCDataSection is basically: find
the string ]]. If the next character is a '>', good, we're done. Otherwise, skip
everything until we find a '>' and then quit. I could follow this route (except
adding everything between the ]] and > to our string), or be ultra-strict. What
do you think?
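For concreteness, here is roughly what that loose scan looks like as a standalone
sketch (invented names and simplified types; not the actual CCDataSection code):

// Sketch of the "loose" scan described above (invented names, not the
// actual parser code). Starting just inside a CDATA section, look for
// "]]"; if the very next character is '>', the section ended cleanly.
// Otherwise skip ahead to the next '>' and give up. A strict XML-style
// scan would instead treat anything other than "]]>" as an error.
#include <cstddef>
#include <string>

// Returns the offset just past the CDATA terminator, or std::string::npos
// if the input ran out first. *aWellFormed reports whether "]]>" was
// found exactly or whether characters had to be skipped to reach a '>'.
static size_t SkipCDataLoose(const std::string& src, size_t pos,
                             bool* aWellFormed) {
  size_t marker = src.find("]]", pos);
  if (marker == std::string::npos) {
    *aWellFormed = false;
    return std::string::npos;             // unterminated section
  }
  if (marker + 2 < src.size() && src[marker + 2] == '>') {
    *aWellFormed = true;
    return marker + 3;                    // found "]]>" exactly
  }
  size_t gt = src.find('>', marker + 2);  // junk between "]]" and '>'
  *aWellFormed = false;
  return gt == std::string::npos ? std::string::npos : gt + 1;
}

The open question above is just what to do with the characters between the "]]"
and the '>' in the non-well-formed branch: append them to the text so nothing is
lost, or refuse to colorize the section at all.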
Assignee: harishd → mrbkap
Status: ASSIGNED → NEW
I have no real opinion beyond "we shouldn't lose content"....
The more I think about this, the less convinced I am that it's the right way to
go. I'm starting to feel that having two tokenizers doing almost exactly the same
thing isn't worth the maintenance cost and duplicated code.

The remaining bugs should be fixable with the current tokenizer (though a few
may need to be handled in the view-source DTD). If I run into any major problems
with the existing tokens, I'll just derive from CHTMLToken or something like that. 
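(To sketch the "derive from CHTMLToken" idea in standalone form, with invented
names rather than the real token classes: the point would be a token that
remembers the exact text it was scanned from, so view-source can emit it verbatim
instead of a re-serialized form.)

// Standalone sketch only -- invented names, not the actual CHTMLToken
// hierarchy. A view-source-specific token keeps the verbatim source
// text it was scanned from, so the view-source DTD can output exactly
// what the page contained.
#include <string>
#include <utility>

class Token {
 public:
  virtual ~Token() = default;
};

class VerbatimToken : public Token {
 public:
  explicit VerbatimToken(std::string aOriginal)
      : mOriginal(std::move(aOriginal)) {}

  // Exactly what the scanner consumed for this token, untouched.
  const std::string& Original() const { return mOriginal; }

 private:
  std::string mOriginal;
};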

We're never going to be able to show the unparsed HTML (just a representation),
but it's certainly possible to make that representation look exactly like the
original with the current codebase.

As an aside, I'm now trying to collect all view-source bugs and resolve out the
dupes, so if you see any not attached to either this bug or bug 57724, please
attach them.
Officially marking WONTFIX. The last of the real "major" munging errors is now
fixed. We don't need another tokenizer. Yay!
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → WONTFIX