Closed Bug 57724 Opened 24 years ago Closed 19 years ago

View source munging pages (does not display original page source as sent by server)

Categories

(Core :: DOM: HTML Parser, defect, P3)

RESOLVED FIXED
Future

People

(Reporter: jruderman, Assigned: mrbkap)

Details

(Keywords: helpwanted, perf, testcase)

Attachments

(4 files, 3 obsolete files)

These are bugs where view-source doesn't display a page as originally sent.

Assigned to me for tracking only; steal if you want to use this bug to fix 
things.
Adding five dependencies.
Depends on: 44186, 49030, 55583, 57717, 57722
Keywords: meta
Jesse:
you also added Bug 49030. I (the reporter of 49030) just checked the page in
question (http://www.handyshop.de), and it looks fine now. I think 49030 should
be marked as fixed.
OS: Windows 98 → All
Hardware: PC → All
Depends on: 43267
Added related bugs involving view-source.
Depends on: 6119, 40867
This bug doesn't depend on bug 40867 directly.  It depends on bug 40867 through 
bug 55583 (and maybe through bug 6119).
No longer depends on: 40867
It's a tracking bug, right?  Isn't it easier to track the indirect dependencies
if you list them directly?  (Or do you tend to look at the dependency tree?)
I tend to look at the dependency tree, at least if there are more than three 
blockers.  By just listing direct dependencies it's easier to see what the 
problems with view-source are.
Depends on: 63137
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9
Target Milestone: mozilla0.9 → mozilla0.9.1
Adding bug 67938.
Depends on: 67938
Tracking bugs should probably belong to QA. Giving this bug to janc.
Assignee: harishd → janc
Status: ASSIGNED → NEW
updated qa contact.
QA Contact: janc → bsharma
I don't know who should get this, but I do know it's not mine...
returning to Jesse
Assignee: janc → davidr8
Depends on: 70828
Resetting target milestone.  Several of the bugs tracked by this bug are 
futured :(
Target Milestone: mozilla0.9.1 → ---
Bug 78342 should probably be added to this bug, or duped against one of the
dependencies if I missed it.
Depends on: 83221
Adding bug 70828, view source makes double-quotes inside tags disappear.
Many of the bugs that this bug depends on seem to indicate that the parser
cleans up the HTML. It seems that (for view source) the document should not go
through the parser, or at least should go through the parser without being changed.
-> nobody (removing tracking bugs from the list of bugs I own)
Priority: P3 → --
-> nobody
Assignee: jesse → nobody
Depends on: 87726
Depends on: 91045
Some of the bugs that depend on this are already fixed in my tree. I'll fix the
others too (except bug 55583, which is not directly related to the others).
This is no longer a meta bug. I'll dupe the bugs that depend on this.
Assignee: nobody → clarence
No longer depends on: 6119, 43267, 44186, 49030, 55583, 57717, 57722, 63137, 67938, 70828, 83221, 87726, 91045
Keywords: meta
Priority: -- → P3
Summary: [meta] View-source munging pages → View-source munging pages
Target Milestone: --- → mozilla0.9.4
Status: NEW → ASSIGNED
*** Bug 63137 has been marked as a duplicate of this bug. ***
*** Bug 87726 has been marked as a duplicate of this bug. ***
*** Bug 57722 has been marked as a duplicate of this bug. ***
*** Bug 89033 has been marked as a duplicate of this bug. ***
*** Bug 43267 has been marked as a duplicate of this bug. ***
*** Bug 57717 has been marked as a duplicate of this bug. ***
*** Bug 70828 has been marked as a duplicate of this bug. ***
*** Bug 83221 has been marked as a duplicate of this bug. ***
*** Bug 91045 has been marked as a duplicate of this bug. ***
*** Bug 70918 has been marked as a duplicate of this bug. ***
*** Bug 49030 has been marked as a duplicate of this bug. ***
So it's not lost in the duping, my testcase is attached to bug 43267, for a
specific manifestation of this bug (extraneous quote removal).
Marking mostfreq as it has 13 dups.
Keywords: mostfreq
*** Bug 92196 has been marked as a duplicate of this bug. ***
*** Bug 91240 has been marked as a duplicate of this bug. ***
Andreas: your comments in bug 92196 mention generated content.  Are you
generating content with a stylesheet (using the "content" attribute)?  If so,
please check that it's selectable and copyable from view source.  The initial
reason view source was moved from XML to HTML was that Mozilla had a bug in
which generated content was not selectable.  See bug 12460....
Boris, no, the tokenizer will produce an additional token for view source.
Andreas: Good.  :)  Another question.  While you're in there, can you possibly
get line numbers working?  The whole mLineNumber thing is badly off (for example,

<table width="100%"
       height="100%">

will get counted as one line because the newline is included in the attribute
value -- '"100%"\n'.  Similar problems happen with doctype declarations and the
like.  CDATA sections seem to count as one line each, making this _really_ bad
on pages with inline JS or CSS).  
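
A minimal sketch of the miscount, for illustration only. This is not Mozilla's
tokenizer: the token list and both function names are hypothetical, and the
only point is that newlines swallowed into an attribute token have to be added
to the running line number.

```python
# Hypothetical token stream for the <table> example above. The first attribute
# token carries the newline (and the indentation) that follows its value.
TOKENS = [
    ("start_tag", "<table"),
    ("attribute", ' width="100%"\n      '),
    ("attribute", ' height="100%"'),
    ("delimiter", ">"),
    ("newline", "\n"),
]


def count_lines_newline_tokens_only(tokens):
    """Buggy counting: only dedicated newline tokens advance the line number."""
    return 1 + sum(1 for kind, _ in tokens if kind == "newline")


def count_lines_all_tokens(tokens):
    """Counting that reads the newline count of every token, whatever its kind."""
    return 1 + sum(text.count("\n") for _, text in tokens)
```

With this stream the first function undercounts by one line, because the
newline inside the attribute token is never seen.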

I was looking at getting that working at some point, but it would involve
changes to the tokenizer, for the reasons I just mentioned.  Luckily, you're
already changing the tokenizer....

just a thought... feel free to ignore this. :)
Boris, I've already done this for most tokens in the tokenizer, but that is not
enough to make it work. Some code (probably in CNavDTD) has to read the newline
count for *all* tokens. At the moment the newline count seems to be ignored for
at least attribute and end tag tokens and for all code inside <noscript>. Also,
it will not work for Mac-style line breaks (the scanner simply counts
LFs and I do the same).
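
A small sketch of the difference between counting LFs only and counting all
three line-break styles. The function names are made up for this example and
do not correspond to anything in the scanner.

```python
import re

# CRLF must be tried first so that it counts as one break, not two.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def count_lf_only(text):
    """Counts only LF characters, as described for the scanner above."""
    return text.count("\n")


def count_line_breaks(text):
    """Counts Unix (LF), Windows (CRLF) and old Mac (CR) line breaks."""
    return len(_LINE_BREAK.findall(text))
```

For a CR-only document the LF count stays at zero no matter how many lines
there are, which is the Mac problem mentioned above.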
OK. Well, I was hoping.  :)
Andreas: in the attachment you added to bug 92196 the &amp;s are
incorrect since the source doesn't contain &amp;s but only &-chars.

And why is Mozilla trying to highlight stuff within the <xmp> tags?
The only tag recognized after an <xmp> should be a </xmp>, if I understood
the HTML-2.0 definition correctly. The same goes for <listing>.

I know that the page is HTML-2.0 but those pages are still out there. 
carljohan, as I wrote in bug 92196, you'll be able to suppress the "amp;"s.

For <xmp> and <listing> (and <plaintext>), you're mostly right. Their contents
should be CDATA (but note that these elements were deprecated even in HTML 2.0).
I'll investigate if I can simply fix it by setting the CDATA flag in
nsElementTable.cpp. If not I'll file a separate bug on it.
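
A rough sketch of the CDATA-style behaviour discussed here, where the only
thing recognized after <xmp> is the matching </xmp>. It is an illustration
under simple assumptions (no attributes containing ">"), and the function name
is hypothetical; it is not how nsElementTable.cpp does it.

```python
import re


def split_raw_text_element(source, tag="xmp"):
    """Split source into (text up to and including the start tag,
    raw element content, end tag and everything after it).

    Returns None if the start tag does not occur. Nothing inside the
    element is treated as markup except the matching end tag.
    """
    start = re.search(r"<%s\b[^>]*>" % re.escape(tag), source, re.IGNORECASE)
    if start is None:
        return None
    end_tag = re.compile(r"</%s\s*>" % re.escape(tag), re.IGNORECASE)
    end = end_tag.search(source, start.end())
    # An unterminated element swallows the rest of the document.
    content_end = end.start() if end else len(source)
    return (source[:start.end()],
            source[start.end():content_end],
            source[content_end:])
```

The three pieces always concatenate back to the input, so a highlighter built
on this would show "<b>" inside <xmp> as plain text rather than as a tag.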
Depends on: 92718
Clarence: Could you please post a patch ( even if incomplete )? I would like to
take a look at it. Btw, watch out when you touch nsElementTable.cpp...it's very
fragile. 

Note: All these problems started because of view source coloring.
Attached patch: patch (work in progress) (obsolete)
The patch is not a final version, but most of it should work.
The basic idea is to create more smaller tokens instead of a single complex
token for view-source, e.g. separate tokens for delimiters, attribute names
and attribute values. The delimiter tokens slow down view-source notably, so
I made a pref for viewing them as plain text ("view_source.styled_delimiters").
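
To illustrate the "more, smaller tokens" idea, here is a minimal sketch of a
tokenizer that emits separate tokens for delimiters, names, attribute values
and whitespace, and never drops a character. It is not the patch: the token
kinds, the regular expression and the function name are assumptions made for
this example.

```python
import re

# Each alternative is a hypothetical token kind. "other" catches any single
# character, so nothing between "<" and ">" can be lost.
_TAG_PART = re.compile(
    r"""(?P<whitespace>\s+)
      | (?P<value>"[^"]*"|'[^']*')
      | (?P<delimiter>[=/])
      | (?P<name>[^\s=/"'<>]+)
      | (?P<other>.)""",
    re.VERBOSE | re.DOTALL,
)


def tokenize(source):
    """Return a list of (kind, text) pairs whose texts concatenate to source."""
    tokens = []
    pos = 0
    while pos < len(source):
        lt = source.find("<", pos)
        if lt == -1:
            tokens.append(("text", source[pos:]))
            break
        if lt > pos:
            tokens.append(("text", source[pos:lt]))
        gt = source.find(">", lt)
        if gt == -1:
            # Unterminated tag: keep it verbatim as text instead of guessing.
            tokens.append(("text", source[lt:]))
            break
        tokens.append(("delimiter", "<"))
        for match in _TAG_PART.finditer(source, lt + 1, gt):
            tokens.append((match.lastgroup, match.group()))
        tokens.append(("delimiter", ">"))
        pos = gt + 1
    return tokens
```

Because every delimiter becomes its own token, the token count (and with it
the number of styled spans) grows quickly, which is consistent with the
slowdown reported above for styled delimiters.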

> Note: All these problems started because of view source coloring.

I need it for rewriting HTML source in bug 40873 too. And maybe composer also
could use it.
No longer depends on: 92718
Attached patch: patch (still work in progress) (obsolete)
The new patch no longer freezes or crashes on plain text documents, and it
includes a modified version of the patch attached to bug 91437.
Clarence: This patch is huge!!!! The bigger the patch the harder it is to get
into 0.9.4. Btw, I haven't started reviewing your patch yet. 

> The delimiter tokens slow down view-source notably, so
> I made a pref for viewing them as plain text 
> ("view_source.styled_delimiters")

Who would want to view source in color if it's too slow? IMO, we need a completely  
different approach to solve the coloring problem ( very low priority though ).
> This patch is huge!!!! The bigger the patch the harder it is to get
> into 0.9.4.

Basically, it's a rewrite of most of nsHTMLTokenizer and nsHTMLTokens. They were
probably not written to handle view-source and can't do it properly without
major changes. I thought about writing a separate tokenizer for view-source, but I
wanted to keep view-source in sync with the tokenizing process. I also
found problems not directly related to view-source that I wanted to fix too
(I'll summarize them later). Finally, I noticed that performance in the tokenizer
could be much better (I do not know how much it adds to the total page load
time, but I hope that it will be noticeably faster).

Yes, my patch has some risks and should get checked in early in a milestone
cycle. I hope to finish it within a week, but even that might be too late for
0.9.4. I'd rather have a good solution for 0.9.5 than a quick hack for 0.9.4.

> Who would want to view source in color if it's too slow? IMO, we need a
> completely different approach to solve the coloring problem ( very low
> priority though ).

If it shows the real source and has good syntax highlighting (reflecting what
the tokenizer recognizes and what it does not), I'll accept a slow view-source. I see
no chance of accelerating view-source other than by accelerating layout altogether.
What we could do is incremental display for view-source. If we did that, it
wouldn't matter as much if it's slow. And, BTW, view-source without syntax
highlighting should go through the tokenizer as text/plain (if we want to use the
tokenizer for text/plain at all; my last patch at least optimizes this).
While you're recoding nsHTMLTokens.cpp, please take a look at BUG 92721.
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Attached patch: patch v 0.5 (obsolete)
Currently I'm trying to resolve performance issues. My patch makes the
tokenizer about 15% faster (depending on page), but most of that time is
lost in the HTML content sink later. Some pages (especially Bugzilla bug
lists) load even slower than before (up to 1.5%).

View-source with syntax highlighting will be typically about 10-15% slower.
I think I can reduce that number, but there will be a price to pay for
correcting the errors. Very large documents slow down more (40% for a 1MB
bug list), and if stylable delimiters are enabled (default in the patch,
but should later be disabled by default) load times rise about 100-200%.
Time spent in layout increases exponentially with the number of tokens in
a <pre> element.

Good news about performance: My patch makes the tokenizer about 99.5% faster
for plain text documents. This includes view-source without syntax
highlighting, reducing total load time about 20-80% (depending on size of
document and proportion between markup and text).
> Time spent in layout increases exponentially with the number of tokens in
> a <pre> element.

Um.  That's very unfortunate... has anyone considered filing a bug on layout
regarding that??

Also, would it be at all beneficial to create multiple small <pre> elements
instead of one big honkin' one?  It seems that this would alleviate the
exponential growth problem a bit....
Filed bug 97229 for the frame construction time problem. Yes, I thought about
dividing the <pre> too; I think it's worth implementing (see the numbers in
bug 97229).

The reason for the slowdown in the content sink was that I've changed whitespace
tokens to use nsSlidingSubstring instead of nsString. It's very slow for strings
consisting of only one char, but maybe that's only on my system.
Andreas, you may want to email scc@mozilla.org and ask him how to efficiently
use the string classes to do what you're trying to do with
nsSlidingSubstring....  Chances are, he can suggest a more effective approach. 
Things like

+    mTypeID = nsHTMLTags::LookupTag(nsAutoString(mTextValue));

can almost certainly be done more efficiently (without creating and initializing
an nsAutoString, for example).
Keywords: mostfreq
> +    mTypeID = nsHTMLTags::LookupTag(nsAutoString(mTextValue));

I have undone this already. It was just a hack to use nsSlidingSubstring
without changing LookupTag(), but it turned out that it had no benefit over
the current approach without changing more of the parser code. Where I can't
see an improvement I leave the code as it is now.
Will this fix bug 98149?
*** Bug 98149 has been marked as a duplicate of this bug. ***
Adding "patch".  Is this considered blocked by bug 97229?
Keywords: patch
QA Contact: bsharma → moied
Attaching my old testcase from a dupe, and a new instance I came across today
here so they can be verified against when this patch is ready.
Keywords: testcase
This isn't going to make 0.9.5, obviously, so targeting for 0.9.6.  Andreas:
perhaps you could freshen the patch and/or list what the remaining issues if any
are before we try to get reviews?

Adding perf since the htmlparser rewrite of this has performance impacts (some
positive).
Keywords: perf
Target Milestone: mozilla0.9.5 → mozilla0.9.6
Is this rewrite absolutely necessary ( though I appreciate the effort )? It
looks scary. Remember the existing code has evolved for quite some time and I'm
not sure how well the patch above could be tested for regressions. Is this
the only solution to fix view-source problems? I would prefer a simpler solution
focusing only on the problem ( view-source ). I'm very nervous about the patch (
though I haven't looked into it thoroughly ). Please don't make things
more complicated than they are now.
Hmm.  Perhaps the best thing to do here would be to arrange for a branch.  After
seeing the havoc linktoolbar wreaked, this seems way too scary to just check
into the tree, regardless of review.  OTOH, the view source implementation does
have sort of an air of being tacked on, and it would be nice to see it improved.
A branch would let us spin through a whole bunch of regression tests and so
forth without disturbing the stability of the main tree. I think it's something
we should seriously consider.
Whether we use this patch or not, we need view-source to show an accurate
representation of the source, otherwise it is useless!  I know this patch is
intended to fix a lot of view-source bugs. Do we have a list of those bugs in
one place?
I do like the idea of fixing viewsource problems, but not at the cost of
jeopardizing layout. If the intention is to fix viewsource problems, then we need
a solution that targets just that problem.
Sorry for not working on this for a long time.
But I have a new patch now. Please note my detailed explanations at
http://c07.de/mozilla/bug57724/changes.html (will attach this here too).
Binaries for Linux are available at http://c07.de/mozilla/bug57724/ .
Attached patch: patch, v0.9
Attachment #44427 - Attachment is obsolete: true
Attachment #44660 - Attachment is obsolete: true
Attachment #46746 - Attachment is obsolete: true
> No newline tokens for plain text documents (performance reasons). Line breaks
> are contained in text tokens instead.

How big a performance win is this?  The reason I ask is that if you have actually
implemented all this line-counting correctly we should be able to do things like
insert named anchors in the source view at the beginnings of lines.  This would
allow scrolling to those anchors and the like.

The other concern I have is that this patch will break line wrapping in view
source as far as I can tell.  Perhaps we can put the id="viewsource" on the
<body> and change viewsource.js appropriately?

Forgot to say something about performance:
- Normal performance (not view-source) is only slightly better, about -0.5%
  (page loading time without images etc.).
- Performance for view source depends largely on the size of the page. Same
  performance for typical pages, up to 10% slower for small and/or simple pages
  (e.g. +7% for the Mozilla homepage), much faster for large and/or complex
  pages (e.g. -90% for a 1MB bug list).
- Much faster for view-source without syntax highlighting and other text/plain
  documents (typical -50%).
- Poorer performance if you enable the new pref for styled delimiters, except
  for very large documents. Can be more than 100% slower for small
  documents with many comments like <!-------------------------------------->.
>> No newline tokens for plain text documents (performance reasons). Line breaks
>> are contained in text tokens instead.

>How big a performance win is this?  The reason I ask is that if you have actually
>implemented all this line-counting correctly we should be able to do things like
>insert named anchors in the source view at the beginnings of lines.  This would
>allow scrolling to those anchors and the like.

Newline tokens are created only after markup (>), not for line breaks in text.
There is no change in current behavior other than ignoring markup in plain
text. Creating newline tokens for all line breaks would probably cost some
performance.

>The other concern I have is that this patch will break line wrapping in view
>source as far as I can tell.  Perhaps we can put the id="viewsource" on the
><body> and change viewsource.js appropriately?

Will test this.
My patch doesn't break line wrapping.
Andreas, you tested this on a document big enough to trigger the splitting into
512-token blocks?  And looked at the _end_ of the file?

Basically, it looks like line wrapping will only work for the first <pre> with
your changes, since that's the only one that has the #viewsource id attached to it.
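
A minimal sketch of the splitting issue, assuming the 512-token block size
mentioned above. The function names are hypothetical; the sketch puts the
id="viewsource" on a single <body> wrapper so that every block falls under
it, instead of attaching it to the first <pre> only.

```python
from html import escape

BLOCK_SIZE = 512  # block size mentioned in this discussion


def split_into_blocks(tokens, block_size=BLOCK_SIZE):
    """Cut the token list into consecutive blocks of at most block_size."""
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]


def render(tokens):
    """Emit one <pre> per block inside a single container carrying the id."""
    pres = [
        "<pre>" + "".join(escape(text) for text in block) + "</pre>"
        for block in split_into_blocks(tokens)
    ]
    return '<body id="viewsource">' + "".join(pres) + "</body>"
```

Any wrapping rule keyed on #viewsource would then reach the last block as
well as the first, which is the case to check at the _end_ of a large file.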
Boris, you're right. Need to change that.
Some comments:

The fact that the XML flag isn't set for XHTML as text/html is a bug, see bug
107904. We shouldn't support the weirdnesses of <plaintext>, <xmp>, and
<listing>, all of which are obsolete; in fact, there's already a bug open to remove
<plaintext> support, bug 88987. I will create testcases to check out some
fun SGML stuff (particularly attribute handling) soon.
I've been using this for the past several days and haven't noticed any
weirdnesses yet. Loading of view source seems slow on some pages, but not to a
greater extent than I experience with normal builds.

Pushing off to 0.9.7 while I'm at it, as this clearly won't be in for 0.9.6...
Target Milestone: mozilla0.9.6 → mozilla0.9.7
*** Bug 107611 has been marked as a duplicate of this bug. ***
Blocks: 105937
Blocks: 49030
Blocks: 70828
Blocks: 63137
Blocks: 87726
Blocks: 57722
Blocks: 89033
Blocks: 43267
Blocks: 57717
Blocks: 83221
Blocks: 91045
Blocks: 70918
Blocks: 92196
Blocks: 91240
Blocks: 98149
Blocks: 107611
Blocks: 91046
0.9.7 is out and the bug is still there... so I'd suggest to adjust the "target
milestone" setting!
-> 1.0.1. I doubt this patch can be adequately tested before 1.0. :-(
Target Milestone: mozilla0.9.7 → mozilla1.0.1
What does 1.0.1 mean?? You want to release the "rockstable bugfree" 1.0 version
including this bug??
The patch in this bug involves a destabilizing major tokenizer rewrite with
unknown consequences for lots of parsing issues.  So yes, unless this lands very
soon (0.9.8 timeframe) this should not land before 1.0.
one of the purposes of the massive bug reopening/dependency creation was that
the more serious of these bugs could be solved individually, without emplacing
the entire patch here. (as far as that goes, if we're going to do a major
rewrite, I think we should go whole-hog and do a separate view-source tokenizer).
I like Christopher's proposal for a separate viewsource tokenizer. This IMO
would reduce maintenance cost and would limit regressions to viewsource only.
Agreed.  A separate view-source tokenizer would be the way to solve the problem
without abandoning syntax highlighting.  In a separate view-source tokenizer,
*only* the parsing needed for syntax highlighting would be done, and it could be
audited to make sure it lost no information, even for totally invalid pages.  It
 could probably be significantly simpler than the regular tokenizer!  Meanwhile,
the regular tokenizer could continue to do whatever 'corrections' and
optimizations it wanted.
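
A sketch of what "audited to make sure it lost no information" could look
like. The highlighter below is deliberately naive (anything of the form <...>
is a tag, everything else is text), and all names are invented for this
example. The point is only the round-trip check: strip the highlighting markup,
unescape, and compare with the original.

```python
import re
from html import escape, unescape

_TAG = re.compile(r"(<[^>]*>)")


def highlight(source):
    """Wrap tag-like pieces in a span; escape everything, change nothing."""
    parts = []
    for piece in _TAG.split(source):
        if not piece:
            continue
        if _TAG.fullmatch(piece):
            parts.append('<span class="tag">' + escape(piece) + "</span>")
        else:
            parts.append(escape(piece))
    return "<pre>" + "".join(parts) + "</pre>"


def recovered_source(highlighted):
    """Undo highlight(): drop our own markup, then unescape the rest."""
    return unescape(re.sub(r"<[^>]*>", "", highlighted))


def loses_no_information(source):
    """True if the highlighted view can be turned back into the exact source."""
    return recovered_source(highlight(source)) == source
```

A check like this can be run over arbitrarily broken pages, including ones
with doubled quotes or stray "<" characters, which are the cases the regular
tokenizer "corrects".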

However, that might take a while to implement.  In the interests of getting
something done about this in the *near* future, how about a way to turn off
syntax highlighting, and have 'view source' *just* show the raw document?

This could be a hidden pref.  However, I'm pretty sure there's enough demand for
'view source' to show each page *exactly* as it came from the server, that that
should be the *default*.  Better to have a defaulted-to-off preference "Turn on
syntax coloring in view source (and potentially lose information)".
I have to cast my vote in favour of a "raw", non-tokenised mode for view-source.
 That would fix most of these bugs for the web developers, and the majority of
normal users won't care.  Would this be hard to implement?  (I suspect easier
than a separate view-source tokeniser)
It would be pretty trivial to do that, in fact (just have nsViewSourceChannel
reset the type to text/plain).  The losses are:

1)  lack of syntax highlighting (a _major_ reason people prefer
    Mozilla's/Netscape's "view source" to IE's).
2)  Some intl pages which set a charset via a META tag may no longer be
    view-sourceable in the right charset.  We _do_ set the default charset on
    the webshell in the view source window under some conditions, but to be
    truthful I'm not sure what those are.  In any case, that would only apply
    to "view source" from the menu, not to "view-source:" urls or to view source
    launched from the JS console (!).

In my opinion, item #2 is a _major_ drawback to the proposal.

Item #1 is avoidable if we use the existing "view source highlighting" pref to
control this behavior.  Item #2 needs careful thought and input from intl folks
who are more familiar with the issues.
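
To make item #2 concrete, here is a hedged sketch of a META charset prescan
on the raw bytes, which a text/plain view would otherwise skip. This is an
illustration, not what nsViewSourceChannel or the webshell does; the function
names, the 1024-byte limit and the iso-8859-1 fallback are all assumptions.

```python
import re

# Very rough prescan: look for charset=... inside a <meta ...> tag.
_META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def sniff_meta_charset(raw, limit=1024):
    """Return the charset named in a META tag near the start, or None."""
    match = _META_CHARSET.search(raw[:limit])
    return match.group(1).decode("ascii").lower() if match else None


def decode_for_plain_view(raw, fallback="iso-8859-1"):
    """Decode raw bytes for a plain-text source view, honouring META if found."""
    charset = sniff_meta_charset(raw) or fallback
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong label: fall back instead of failing to display.
        return raw.decode(fallback, errors="replace")
```

Without such a prescan, a page that declares its charset only in a META tag
would be decoded with whatever default applies and displayed as gibberish.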
The following observation might be related. If you copy something from view
source and paste it to another application, additional empty lines are added at
some places.

pi
*** Bug 142044 has been marked as a duplicate of this bug. ***
What's the status on this? We have a patch for a long time now, but no activity.

Also, maybe the blocking-list could be duped to this bug, looks like they all
are special cases of this bug.

pi
> What's the status on this? We have a patch for a long time now, but no activity.

See comment 61 and comments thereafter by both Harish and Andreas.

The blocking bugs _were_ duped.  They were _unduped_ because many of them can
be fixed independently without a complete parser rewrite (which is what this
patch is).
I guess the target milestone should be moved then.

pi
Keywords: helpwanted
Target Milestone: mozilla1.0.1 → Future
Depends on: 149867
example:

I'm working on a PHP project that contains a telephone list. It
works with the following arguments (only the important ones are mentioned):

SM = sub menu (number) to show
V  = variable that decides whether the form or the results are shown
(0 = form, 1 = execute SQL and show results, without the form)
Suchwort = words I look up in my database (split on blanks into separate
words with explode). It's a field in my form.

mypage.php?SM=6
==> shows me the page with the form to fill in, which is sent like this:
<form action="mypage.php?SM=6&V=1" method ="post">

mypage.php?SM=6&V=1
==> examines different variables to see whether they are set. If they are set,
they control the order, the count of records per page, etc.
One check I do is: if (empty($Suchwort)) { $V=0; }
That means: if someone comes to the page via history/bookmark/etc., it doesn't
matter whether V=1 or V=0.
When I get results back for my query and look at the source code (Ctrl-U) of
my page, I get the form back.
==> I get a new request for mypage.php?SM=6&V=1 . And there's the problem: I
don't send $Suchwort via the address, I send it through a form. Mozilla doesn't
remember that, finds an empty $Suchwort, and resets V to zero.
If I test Ctrl-U on a page called with mypage.php?SM=6&V=1&Suchwort=123, it
works without any problem.

I think it would be better to read the page from the cache than to send a new
request. With dynamic pages you run into problems when hunting bugs in
your source code or when examining pages on the web.

I'm using: en-US, rv:1.2a, Gecko/20020907 MultiZilla/v1.1.22
Frank: wrong bug. this bug is for view source displaying the right page but with
the source slightly modified.
quotation from first entry:
These are bugs where view-source doesn't display a page as originally sent.
======

I thought it would fit, because "doesn't display a page as originally sent" is
very general and open to different interpretations ;-)

I found several bugs for "view-source" but none of them fit. So I thought I
would add the problem here (general).
Frank, this is why one reads all the comments on a bug before commenting.
Blocks: 154120
Depends on: 188609
By the definitions on <http://bugzilla.mozilla.org/bug_status.html#severity> and
<http://bugzilla.mozilla.org/enter_bug.cgi?format=guided>, crashing and dataloss
bugs are of critical or possibly higher severity.  Only changing open bugs to
minimize unnecessary spam.  Keywords to trigger this would be crash, topcrash,
topcrash+, zt4newcrash, dataloss.
Severity: normal → critical
Severity: critical → minor
Keywords: dataloss
This is not minor.  It's normal.  Please fix that.
Severity: minor → normal
As a note, I'm considering removing the whole view-source feature entirely and
just using an editor to view the source.  The amount of time being spent on
view-source is utterly disproportional to its usefulness (or rather the
usefulness it offers over a text editor).
Take a look at bug 8589, which would do that elegantly.  I tried to take a stab
at it, but I couldn't work out the undocumented internals of Mozilla well enough
to do it.  If you have the knowledge of Mozilla internals to do it, pleeeeeeease
do and we will love you forever!
Well, since the view-source channel now returns the application/x-view-source
MIME type all the time, all that would be required would be to not register an
internal handler for that type (see nsContentDLF.cpp).  Then when you try to
view source you would get the normal helper app dialog, select an application to
view with, make sure "ask me every time" is unchecked, and be done with it.

This is essentially a one-line change (not including the code that could be
removed as a result, of course).
If internal view-source is removed, some of the code should probably be kept and
moved to Composer for things like bug 58730.  (In which case the source is
munged anyway to achieve more compliant HTML).
bz:
Anyway, how should removing view-source (which I really like because of syntax
highlighting and such stuff) help to resolve this bug?
I thought this bug is because the Cache doesn't give us the right source for the
pages in the given cases - if that's true, the text editor would happen to get
that wrong source in those cases, which would mean that bug wouldn't be solved.
KaiRo: No, it's the parser who munges the pages, not cache. aiui, if you disable
syntax highlighting, it might work...
KaiRo, you're _way_ in the wrong bug, like Biesi said.

Aaron, if the composer people want the code they can get it out of CVS (Attic,
to be exact).  Again, this is presuming we decide to remove the internal
view-source.  Given all the problems it's causing, I will likely do it by summer
unless someone fixes the major issues with it (poor tokenizer, poor performance,
etc).
bz, biesi:
Oops, sorry, it seems I'm still a bit confused by the word "munge" ;-)
Anyway, thanks for clarifying that I was thinking/writing of the wrong bug here...

I still don't like view-source going away (we're getting nearer and nearer to do
everything the same as IE, it seems) - but I have no right to complain as long
as I'm not able to code anything which could make our situation better here :(
Viewsource should map to the source document 1:1. This is not completely true in
Mozilla because the viewsource content is processed by the same tokenizer that
is tuned to handle quirky content. IMO, the viewsource content should be handled
by a) a new tokenizer or b) avoiding the tokenizer completely ( maybe by using a
serializer instead ). Syntax highlighting might be an issue but I'm sure we can
come up with something :-)
In "view source", I see a normal space where there is &nbsp; in the original
document. Is there a bug for that or should I file a new one?

BTW, sorry if I get it wrong, but aren't the dependencies reversed? I.e. if this
is a tracking bug, then this bug should _depend on_ fixing all the concrete
view-source bugs, not vice versa...
> Is there a bug for that or should I file a new one?

Is that regular view source, or view selection source?

The dependencies are correct.  This is not a tracking bug, this is a bug on the
root cause of all those other bugs -- the fact that we use the normal HTML
tokenizer for view source.
> > In "view source", I see a normal space where there is &nbsp; in the
> > original document. Is there a bug for that or should I file a new one?
> Is that regular view source, or view selection source?

Ah, sorry, I didn't realize there was a difference. It only does this in view
selection source. Is that a different bug than this?

Also, sorry about my misunderstanding about the dependencies.
> Is that a different bug than this?

Yes, bug 155635.  And it's fixed in current builds.
No longer blocks: 49030
Can someone change the summary of this bug to use real words (i.e. not
"munging") so that people have a better chance of finding this bug when they
search Bugzilla?        

How about "View Source does not display original page source as sent by server."
Summary: View-source munging pages → View source munging pages (does not display original page source as sent by server)
Blocks: majorbugs
Ah, maybe this is the bug report I am looking for. I've just commented on
bug 55583, but that may deal with a different matter, although the Summary
describes my problem very well.

I am a web developer and thus I need to see the html code that a server
sends to the client. But if I use View Page Source on a dynamically created page
resulting from a POST request, Mozilla asks me to send the POST request again.
This means: Mozilla does not show the original html source code which it should
get from the cache or somewhere. The problem is that resending the POST
request can result in a completely different page (e.g. if a transaction has
finished after the first POST request).

Want to try yourself?

Go to 

http://democam.mobotixserver.de/cgi-bin/store2rom

press the button, wait for the page to return and then do
right-click -> View Page Source.
Now a popup window will appear telling you that the "page you are trying
to view contains POSTDATA that has expired from cache".

Nope, wrong bug.  Notice this bug has to do with PARSER.
Daniel: nope, that would be a different bug - this one is about view-source
showing not exactly what the server sent, such as not displaying redundant
double-quotes in HTML tags etc.

Anyway, your testcase works perfectly for me - perhaps you're using an old
version? Or perhaps you didn't wait until the page finished loading before
trying to view the source (I believe the problem you describe happens then,
though I'm not sure)?
Mh, if it is really about "what the server sent" then check out this
page which embeds a timestamp on every page sent out (only precise
to the minute):

http://www.theregister.co.uk

- Open the URL using Mozilla on a *Windows* system. (Linux works ok)
- Enter something in the text box (upper left corner) and press Go!.
- See the time in the head bar on the page.
- Wait one minute or brew a tea.
- Now "View Page Source", confirm the pop up window. 
- When the source is displayed find "Updated:"  You will see that the time
  has changed compared to the one shown on the rendered page.

I think, this fits the summary "does not display original page source as sent by
server". If this is the wrong bug could you please point me to a more
appropriate one?

Cheers
Daniel
Germany
Daniel, please read the whole bug, not just the summary (and in the case of a
tracking bug its dependencies) before commenting.

This bug covers the fact that what view-source shows is a tokenized version of
the source that comes from the server; the tokenizer sometimes "fixes" the
source or loses data...

Your problem is totally unrelated to this bug; see bug 166786.
Boris, 
thanks for pointing me in the right direction. Then I'm off to bug 166786. :-)

Goodbye.
Boris, do you mean that mozilla gets a page and tokenizes it without keeping the
original source before doing anything else?

The contents of bug 188609 do not satisfy me.

What can I do to get the real page source?
I haven't contributed to mozilla yet outside of submitting, commenting
or voting for bugs. What can I begin with?
That's correct.  When doing view-source of a page, we start by getting the
source from cache, tokenize it, then generate a document that has highlighted
source.

The tokenizing is only needed for syntax highlighting; if highlighting is off
the tokenizing could very well be skipped.

I'm not sure what you can begin with, because I'm not sure what you want to do;
feel free to email me to discuss (the discussion does not seem directly relevant
to this bug).
Many text editors manage to syntax-highlight code without screwing it up in any
way.  So can we.
Actually, I've managed to write malformed code that's made every single text
editor I tried either give up on highlighting it or actually get so confused
that it changed the data in the file (eg by not loading all of it).

But yes, we can do better.  All we need is to write a separate tokenizer for
view-source, as discussed, instead of using the same one as pageload.  Wanna do
it?  Or are you only willing to make comments that don't really take this bug
anywhere?
I'd be quite happy to write a view-source tokeniser algorithm.

But my attempts at finding where it would fit in the program code are driving me
mad - I guess I need more than just the XUL and JS sources...?
Yes.  The tokenizer would be instantiated from nsParser.cpp (right now that
instantiates an nsHTMLTokenizer).
Remember while writing it that it has to handle malformed html, or stuff that
isn't HTML at all; the output must be character-for-character the same as the
input, except possibly for the colors.  (This shouldn't be too hard, but I want
to make the requirements of the problem clear.)  Probably each token should hold the
exact original character string it's associated with, and there should be at
least one token type for "I don't know what this is".
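To make the requirements above concrete, here is a minimal sketch in Python.
It is not Gecko code (the real tokenizer is C++ and is instantiated from
nsParser.cpp); the names Token, tokenize and render_plain are made up for
illustration, and the tag and comment rules are deliberately crude.  The only
property it is meant to show is that every token keeps its exact original
characters, and that anything unrecognizable lands in an "unknown" token, so
joining the tokens gives back the input.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    kind: str  # "tag", "comment", "text" or "unknown" (hypothetical kinds)
    text: str  # exact original characters, never normalized


def tokenize(source: str) -> List[Token]:
    """Split source into tokens without dropping or changing any character."""
    tokens: List[Token] = []
    i, n = 0, len(source)
    while i < n:
        if source.startswith("<!--", i):
            end = source.find("-->", i + 4)
            if end == -1:
                # Unterminated comment: keep the rest verbatim as "unknown".
                tokens.append(Token("unknown", source[i:]))
                break
            tokens.append(Token("comment", source[i:end + 3]))
            i = end + 3
        elif source[i] == "<":
            end = source.find(">", i + 1)
            if end == -1:
                # Unterminated tag: same treatment, nothing is thrown away.
                tokens.append(Token("unknown", source[i:]))
                break
            tokens.append(Token("tag", source[i:end + 1]))
            i = end + 1
        else:
            end = source.find("<", i)
            if end == -1:
                end = n
            tokens.append(Token("text", source[i:end]))
            i = end
    return tokens


def render_plain(tokens: List[Token]) -> str:
    """Reassemble the source; coloring would wrap each token, not alter it."""
    return "".join(t.text for t in tokens)
```

With this shape, redundant double-quotes or odd whitespace inside a tag survive
untouched, because the tokenizer never looks inside the tag text it stores.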
Is a tokenizer really required for view source? How important really is (poorly)
attempting to color-code HTML in the browser? (in the editor I can understand).
And please, no "I'll stop using Mozilla and shoot myself if you remove colored
view source" comments from the non-developers.

If MNG and other useful features are being removed to slim down Moz/Firebird,
why not ditch this? It's buggy, a time sink for someone to redo it, and excess
code (which leads to excess bugs).

IE somehow manages to get by without a page source tokenizer. And it's not
something I hear the masses clamoring for. (The real masses, not this bug's CC list)

Unfortunately I suspect this may lead to "View Selection Source" going away too,
which is (arguably) semi-useful to a small percentage of users. But that tends
not to work well half the time anyway. Perhaps it would be a candidate for an
XPI extension.
Do you ever use "view source"? I can tell you for a fact that the highlighting
makes it a lot more readable... As a result, view-source is the #1 tool for me
when I'm debugging web page issues.

Please don't drag MNG in here -- if the imagelib owner wants to cut stuff,
that's his decision, but he has no control over the parser and content code.

Yes, view-source could be made into an extension, with a lot of work (a bunch of
extensions, one per platform, since it uses C++, actually).
One last thing.  Size-wise, view-source code is about 40KB, if anyone cares.
Here's a thought for implementing.  Instead of rewriting the tokenizer, add
fields to each token for "source-start" and "source-end", byte offsets into the
raw source.  Then to display color-coded source, we just display the raw source
and walk the parse tree to decide how to color it.
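A rough Python sketch of the offset idea above, under some loud assumptions:
Span, find_spans and highlight are hypothetical names, the offsets are
character offsets into a Python string rather than byte offsets, and a flat
list of spans stands in for the parse tree.  The point is only that the
displayed text is always sliced from the raw source, so the highlighter
cannot change it.

```python
import html
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    kind: str
    start: int  # "source-start": offset of the first character
    end: int    # "source-end": offset one past the last character


# Crude stand-in for a real tokenizer: anything between "<" and ">" is a tag.
TAG_RE = re.compile(r"<[^>]*>")


def find_spans(source: str) -> List[Span]:
    """Record where each token sits in the raw source, without copying it."""
    spans: List[Span] = []
    pos = 0
    for m in TAG_RE.finditer(source):
        if m.start() > pos:
            spans.append(Span("text", pos, m.start()))
        spans.append(Span("tag", m.start(), m.end()))
        pos = m.end()
    if pos < len(source):
        spans.append(Span("text", pos, len(source)))
    return spans


def highlight(source: str, spans: List[Span]) -> str:
    """Wrap slices of the raw source in markup; only escaping is applied."""
    out = []
    for s in spans:
        raw = html.escape(source[s.start:s.end], quote=False)
        out.append(f'<span class="{s.kind}">{raw}</span>')
    return "".join(out)
```

Because the spans cover the source with no gaps, joining the slices in order
gives back the original text exactly, whatever the tokenizer thought it saw.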
I (used to) use view source routinely to debug pages which were not displaying
properly.  The Mozilla coloring tokenization process meant that view source *did
not display* the source of the page.  The current coloring system is *worse than
nothing* for debugging web pages.  I've ended up using "Save Page..." and
opening the saved file in a text editor every time.

Blocks: 220386
Blocks: 172947
View source is downcasing tags in XHTML 1.0 strict. I wish to see the source as
it is sent from the server even if it is incorrect.
That is indeed the whole point of this bug.
View source is again not showing the original source :( After a POST, it shows
the page from before the POST. (However, I have only just noticed that I can
reload the source :) so that is possible again.) Cache is enabled and set to
50 MB.

Sorry if this report is not in the best place here, but I could not find the
original bug.

Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6a) Gecko/20031030
*** Bug 225209 has been marked as a duplicate of this bug. ***
Blocks: 240636
*** Bug 251528 has been marked as a duplicate of this bug. ***
Blocks: 262562
(In reply to comment #127)
> Here's a thought for implementing.  Instead of rewriting the tokenizer, add
> fields to each token for "source-start" and "source-end", byte offsets into the
> raw source.  Then to display color-coded source, we just display the raw source
> and walk the parse tree to decide how to color it.

Sorry I've been quiet on this bug for a while.  I think this would be a good
idea.  Not only would it solve the problem once and for all, it would save
having to maintain two separate tokeniser algs in parallel, and consequently any
chance of the two disagreeing over whether something is HTML or XHTML.
That assumes we have the raw source as some sort of contiguous thing in memory
all at once.  That almost never happens; the source is parsed incrementally as
it comes in from the networking library.
Oh.  I would've thought it saved it verbatim to the cache.  If it doesn't, it
should.
1) Not everything is saved in the cache.
2) You're suggesting we get the source twice (once to construct the parse tree,
   and once to do this walk thing).
(In reply to comment #137)
> 1) Not everything is saved in the cache.

Why not?  Even if there's some dreaded 'expire' or 'no cache' directive on the
page, surely it should be there temporarily?

> 2) You're suggesting we get the source twice (once to construct the parse tree,
>    and once to do this walk thing).

What are you on about?  I'm suggesting that we get the source once, save it to
the cache and tokenise/parse it.
When reading data from the cache, we can't get it all at once either -- it comes
in chunks.
We don't have to wait until the whole page is retrieved and then parse it out of
the cache.  As each chunk comes in, send it both to the cache and to the tokeniser.
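The chunk-by-chunk flow described above might look roughly like this in
Python.  This is an illustration only, not how necko or the parser are
actually wired together: IncrementalTokenizer and load are invented names,
and a plain list stands in for the cache.  It shows each chunk going to both
consumers as it arrives, with the tokenizer holding back an unfinished tag
until the next chunk.

```python
from typing import Iterable, List


class IncrementalTokenizer:
    """Toy tokenizer that accepts its input in arbitrary chunks."""

    def __init__(self) -> None:
        self.tokens: List[str] = []
        self._pending = ""  # unfinished tail carried over to the next chunk

    def feed(self, chunk: str) -> None:
        data = self._pending + chunk
        i = 0
        while i < len(data):
            if data[i] == "<":
                end = data.find(">", i)
                if end == -1:
                    break  # tag is not complete yet; wait for more data
                self.tokens.append(data[i:end + 1])
                i = end + 1
            else:
                end = data.find("<", i)
                if end == -1:
                    end = len(data)
                self.tokens.append(data[i:end])
                i = end
        self._pending = data[i:]

    def close(self) -> None:
        # End of stream: whatever is left is emitted verbatim.
        if self._pending:
            self.tokens.append(self._pending)
            self._pending = ""


def load(chunks: Iterable[str], cache: List[str],
         tokenizer: IncrementalTokenizer) -> None:
    """Send every incoming chunk to the cache and to the tokenizer."""
    for chunk in chunks:
        cache.append(chunk)    # stored verbatim
        tokenizer.feed(chunk)  # tokenized as it arrives
    tokenizer.close()
```

Text runs may be split at chunk boundaries in this sketch (one run can come
out as two text tokens); tags are never split.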
That's what we do right now for normal HTML loads.

But when you want to do view source, we'd have to read the data from cache,
generate a parse tree, then walk all the data _again_ to display it, per your
suggestion.  Assuming I understand your suggestion correctly.
Why not use the parse tree we already keep in memory, as character offsets into
the cached source?
We don't have a parse tree in memory (at any point in time, actually).  We have
a DOM.  Bloating all DOMs significantly for the benefit of view-source is not an
option.  Further, the DOM need not correspond to the original source in any way
(see JS manipulation of the DOM).  Finally, the default tokenizer/parser just
drops a lot of stuff on the floor altogether; stuff that needs to show up in
view-source.
(In reply to comment #143)
> We don't have a parse tree in memory (at any point in time, 
> actually).

What _do_ we use to view source at the moment then?  Is there any reason it
can't still be used to view source under the modifications proposed from comment
127 onwards?

> Finally, the default tokenizer/parser just drops a lot of stuff on 
> the floor altogether; stuff that needs to show up in view-source.

The view-source code would display the cached source as is, using the tokenise
tree/parse tree/whatever merely to syntax-highlight.
> What _do_ we use to view source at the moment then? 

We read it from cache and retokenize/reparse it (with slightly different rules
so we don't lose content).  The content sink builds a different content model
from the normal HTML one from the parse tree.  Please see nsViewSourceHTML.cpp.

> The view-source code would display the cached source as is, using the tokenise
> tree/parse tree/whatever merely to syntax-highlight.

That really can't work if the tree doesn't match the source, is the point.
What possible causes are there of the tree not matching the source?
I thought I covered that in comment 143?

In any case, with Blake Kaplan's recent tokenizer work we're using the same
tokenizer in both cases and just not dropping things in view-source.  You'll
note that most of the bugs blocked by this bug are in fact fixed; of the
remaining ones, most are not actually bugs but rfes.

So there is no need to do a wholesale rewrite of anything, is the point.
(In reply to comment #147)
> I thought I covered that in comment 143?

OK, so it wouldn't make sense to use the DOM as the whatever.

> In any case, with Blake Kaplan's recent tokenizer work we're using the same
> tokenizer in both cases and just not dropping things in view-source.  You'll
> note that most of the bugs blocked by this bug are in fact fixed; of the
> remaining ones, most are not actually bugs but rfes.

OK.  I guess we can wait and see what Blake comes up with.
Blocks: XiHashed
Taking, as I ended up fixing the rest of these bugs.
Assignee: c → mrbkap
Status: ASSIGNED → NEW
...and marking this bug as FIXED! The rest of the bugs depending on this one are
either not bugs in view source (such as bug 105937) or simply RFEs (in other
words, we're now showing (hopefully!) the exact source code of the page, with
some slight coloring tweaks remaining to be more expressive; but this bug as
reported is no longer a problem).
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Congratulations, and thank you!
No longer blocks: majorbugs
I'm seeing this problem again, in Firefox build 20050823 with bfcache enabled.

I can reliably reproduce it like this:

   1. Visit http://www.flinthomes.net/
   2. Near the lower-left of the page, under "MLS QuickSearch", hit "Find" (you
can leave the textbox empty)
   3. On the resulting page, view the source.  It's not the same source that was
rendered (it's as if the page is re-requested without the appropriate POST
variables).

I couldn't find a more specific or recent bug dealing with this, so re-opening
this one.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Please file new bugs on new problems.  If the problem only happens with bfcache
enabled, please make sure that your bug blocks bug 274784 and cc me.

As for the rest, this bug was about the right source being dealt with but not
being _shown_ correctly due to parsing issues.  So it sounds like a totally
separate issue.
Status: REOPENED → RESOLVED
Closed: 20 years ago → 19 years ago
Resolution: --- → FIXED
Blocks: 320585
I reported the same bug a few years ago at http://www.mobildiscounter.de
same bug here at http://www.autoradio-test.org some months ago
(In reply to Jens from comment #154)
> I reported the same bug a few years ago at
> http://www.mobildiscounter.de

(In reply to Carry from comment #155)
> same bug here at http://www.autoradio-test.org some months ago

View Source on these pages works for me. If you still see the problem in a fresh build, please file a new bug with detailed steps to reproduce.