Closed
Bug 188609
Opened 22 years ago
Closed 20 years ago
Need separate tokenizer for View Source
Categories
(Core :: DOM: HTML Parser, defect)
RESOLVED
WONTFIX
Future
People
(Reporter: choess, Assigned: mrbkap)
Currently "View Source" passes through the HTML tokenizer (so we can do syntax
highlighting), which often results in mangling of page source. We've discussed
creating a separate tokenizer before; while I doubt anyone has time to do this
now, I wanted to get this bug on the table (and also track the bugs caused by
our current setup).
Comment 1•22 years ago
Or we could just pass this crap off to the system text editor....
Comment 2•21 years ago (Assignee)
I've been thinking about this, and I may even get a patch going at some point. I
was wondering about a couple of questions.
First, would it be best if the new tokenizer could leverage the old tokenizer's
code in some way? This may be tricky, and keeps some of the complexity of view
source in the tokenizer, but code duplication is a bad thing, and some may be
avoidable. Is it worth the extra complexity?
Second, how forgiving should the new tokenizer be? If there is a missing '>' at
the end of a tag, should it still colorize subsequent tags like there was a '>'
(though, of course, it would not add the '>') or colorize subsequent words as
though they were attributes? I think this question comes down to who is the
'target audience'. By being less forgiving, it makes mistakes easier to see for
developers, but for others, may make a page less readable.
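To make the trade-off concrete, here is a minimal sketch of the less forgiving option, written in Python for brevity rather than in the parser's own code. The function name `split_tags` and the span kinds are hypothetical, and the sketch assumes that a '<' seen before the closing '>' means the '>' is missing; it ignores quoted attribute values and comments, which a real tokenizer would have to handle.

```python
def split_tags(source):
    """Split source into (kind, text) spans: 'text', 'tag', or 'bad_tag'.

    Hypothetical helper for illustration only. No characters are added or
    dropped, so joining the span texts reproduces the input exactly.
    """
    spans = []
    i = 0
    n = len(source)
    while i < n:
        if source[i] != "<":
            # Plain text runs up to the next '<' (or end of input).
            j = source.find("<", i)
            if j == -1:
                j = n
            spans.append(("text", source[i:j]))
            i = j
            continue
        gt = source.find(">", i + 1)
        lt = source.find("<", i + 1)
        if gt != -1 and (lt == -1 or gt < lt):
            # Normal case: the tag is closed before another tag opens.
            spans.append(("tag", source[i:gt + 1]))
            i = gt + 1
        else:
            # Missing '>': end the span where the next tag starts and flag
            # it, without inserting the missing character.
            end = lt if lt != -1 else n
            spans.append(("bad_tag", source[i:end]))
            i = end
    return spans
```

A highlighter could render 'bad_tag' spans in an error style so the mistake stays visible, while the following tag is still colorized normally.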
Comment 3•21 years ago
Any sharing of code would be great.
As for forgiveness, I'd love it if we could blink erroneous things and tokenize
as if the errors were corrected....
Comment 4•20 years ago (Assignee)
Assigning to me since I'm working on this.
bz, how forgiving should view-source be with XML constructs like CDATA?
Currently, we make a mockery of it (see bug 84430). A real XML parser would be
completely unforgiving, so I'm inclined to make view-source the same way.
However, the current method of handling it in CCDataSection is basically: find
the string ]]. If the next character is a '>', good, we're done. Otherwise, skip
everything until we find a '>' and then quit. I could follow this route (except
adding everything between the ]] and > to our string), or be ultra-strict. What
do you think?
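The scan described above can be sketched roughly as follows. This is an illustration in Python, not the actual CCDataSection code: the function name `scan_cdata`, its parameters, and its return shape are all assumptions made for the example. With `keep_skipped=True` it follows the proposed variant that keeps everything between the ]] and the '>'; with `keep_skipped=False` it mimics the current behavior of dropping that text.

```python
def scan_cdata(text, start=0, keep_skipped=True):
    """Scan a CDATA body that begins at text[start].

    Hypothetical helper for illustration only. `start` is assumed to point
    just past the opening '<![CDATA['. Returns (body, end, well_formed),
    where `end` is the index at which scanning should resume.
    """
    close = text.find("]]", start)
    if close == -1:
        # No ']]' at all: keep the rest of the input so nothing is lost.
        return text[start:], len(text), False
    after = close + 2
    if text.startswith(">", after):
        # Good case: ']]' is immediately followed by '>'.
        return text[start:close], after + 1, True
    # Otherwise skip ahead to the next '>' (or end of input) and quit.
    gt = text.find(">", after)
    stop = len(text) if gt == -1 else gt
    end = min(stop + 1, len(text))
    # Either keep the skipped text in the body or drop it.
    body = text[start:stop] if keep_skipped else text[start:close]
    return body, end, False
```

An ultra-strict version would instead report an error as soon as the ]] is not followed by '>', rather than resynchronizing on the next '>'.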
Assignee: harishd → mrbkap
Status: ASSIGNED → NEW
Comment 5•20 years ago
I have no real opinion past "we shouldn't lose content"....
Comment 6•20 years ago (Assignee)
The more I think about this, the more I am not convinced this is the right way
to go. I'm starting to feel that having two tokenizers doing almost exactly the
same thing isn't worth the upkeep costs/duplicated code.
The remaining bugs should be fixable with the current tokenizer (though a few
may need to be handled in the view-source DTD). If I run into any major problems
with the existing tokens, I'll just derive from CHTMLToken or something like that.
We're never going to be able to show the unparsed HTML (just a representation),
but it's certainly possible to make that representation look exactly like the
original with the current codebase.
As an aside, I'm now trying to collect all view-source bugs and resolve out the
dupes, so if you see any not attached to either this bug or bug 57724, please
attach them.
Comment 7•20 years ago (Assignee)
Officially marking WONTFIX. The last of the real "major" munging errors is now
fixed. We don't need another tokenizer. Yay!
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → WONTFIX