Closed
Bug 57724
Opened 24 years ago
Closed 19 years ago
View source munging pages (does not display original page source as sent by server)
Categories
(Core :: DOM: HTML Parser, defect, P3)
Core
DOM: HTML Parser
Tracking
()
RESOLVED
FIXED
Future
People
(Reporter: jruderman, Assigned: mrbkap)
References
Details
(Keywords: helpwanted, perf, testcase)
Attachments
(4 files, 3 obsolete files)
These are bugs where view-source doesn't display a page as originally sent. Assigned to me for tracking only; steal if you want to use this bug to fix things.
Reporter | ||
Comment 1•24 years ago
|
||
Adding five dependencies.
Jesse: you also added Bug 49030. I (the reporter of 49030) just checked the page in question (http://www.handyshop.de), and it looks fine now. I think 49030 should be marked as fixed.
Comment 3•24 years ago
|
||
Added related bugs involving view-source.
Reporter | ||
Comment 4•24 years ago
|
||
This bug doesn't depend on bug 40867 directly. It depends on bug 40867 through bug 55583 (and maybe through bug 6119).
No longer depends on: 40867
Comment 5•24 years ago
|
||
It's a tracking bug, right? Isn't it easier to track the indirect dependencies if you list them directly? (Or do you tend to look at the dependency tree?)
Reporter | ||
Comment 6•24 years ago
|
||
I tend to look at the dependency tree, at least if there are more than three blockers. By just listing direct dependencies it's easier to see what the problems with view-source are.
Tracking bugs should probably belong to the QA. Giving bug to janc.
Assignee: harishd → janc
Status: ASSIGNED → NEW
Comment 10•23 years ago
|
||
I don't know who should get this, but I do know it's not mine... returning to Jesse
Assignee: janc → davidr8
Reporter | ||
Comment 11•23 years ago
|
||
Resetting target milestone. Several of the bugs tracked by this bug are futured :(
Target Milestone: mozilla0.9.1 → ---
Comment 12•23 years ago
|
||
Bug 78342 should probably be added to this bug, or duped against one of the dependencies if I missed it.
Reporter | ||
Comment 13•23 years ago
|
||
Adding bug 70828, view source makes double-quotes inside tags disappear.
Comment 14•23 years ago
|
||
Many of the bugs that this bug depends on seem to indicate that the parser cleans up the HTML. It seems that (for view source) the document should not go through the parser, or at least should go through the parser without being changed.
Reporter | ||
Comment 15•23 years ago
|
||
-> nobody (removing tracking bugs from the list of bugs I own)
Priority: P3 → --
Comment 17•23 years ago
|
||
Some of the bugs that depend on this are already fixed in my tree. I'll fix the others too (exept bug 55583 which is not directly related to the others). This is no longer a meta bug. I'll dupe the bugs that depend on this.
Updated•23 years ago
|
Status: NEW → ASSIGNED
Comment 18•23 years ago
|
||
*** Bug 63137 has been marked as a duplicate of this bug. ***
Comment 19•23 years ago
|
||
*** Bug 87726 has been marked as a duplicate of this bug. ***
Comment 20•23 years ago
|
||
*** Bug 57722 has been marked as a duplicate of this bug. ***
Comment 21•23 years ago
|
||
*** Bug 89033 has been marked as a duplicate of this bug. ***
Comment 22•23 years ago
|
||
*** Bug 43267 has been marked as a duplicate of this bug. ***
Comment 23•23 years ago
|
||
*** Bug 57717 has been marked as a duplicate of this bug. ***
Comment 24•23 years ago
|
||
*** Bug 70828 has been marked as a duplicate of this bug. ***
Comment 25•23 years ago
|
||
*** Bug 83221 has been marked as a duplicate of this bug. ***
Comment 26•23 years ago
|
||
*** Bug 91045 has been marked as a duplicate of this bug. ***
Comment 27•23 years ago
|
||
*** Bug 70918 has been marked as a duplicate of this bug. ***
Comment 28•23 years ago
|
||
*** Bug 49030 has been marked as a duplicate of this bug. ***
Comment 29•23 years ago
|
||
So it's not lost in the duping, my testcase is attached to bug 43267, for a specific manifestation of this bug (extraneous quote removal).
Comment 31•23 years ago
|
||
*** Bug 92196 has been marked as a duplicate of this bug. ***
Comment 32•23 years ago
|
||
*** Bug 91240 has been marked as a duplicate of this bug. ***
Comment 33•23 years ago
|
||
Andreas: your comments in bug 92196 mention generated content. Are you generating content with a stylesheet (using the "content" attribute)? If so, please check that it's selectable and copyable from view source. The initial reason view source was moved from XML to HTML was that Mozilla had a bug in which generated content was not selectable. See bug 12460....
Comment 34•23 years ago
|
||
Boris, no, the tokenizer will produce an additional token for view source.
Comment 35•23 years ago
|
||
Andreas: Good. :) Another question. While you're in there, can you possibly get line numbers working? The whole mLineNumber thing is badly off (for example, <table width="100%" height="100%"> will get counted as one line because the newline is included in the attribute value -- '"100%"\n'. Similar problems happen with doctype declarations and the like. CDATA sections seem to count as one line each, making this _really_ bad on pages with inline JS or CSS). I was looking at getting that working at some point, but it would involve changes to the tokenizer, for the reasons I just mentioned. Luckily, you're already changing the tokenizer.... just a thought... feel free to ignore this. :)
Comment 36•23 years ago
|
||
Boris, I've already done this for most tokens in the tokenizer, but that is not enough to make it work. Some code (probably in CNavDTD) has to read the newline count for *all* tokens. At the moment the newline count seems to be ignored for at least attribute and end tag tokens and for all code inside <noscript>.Also, it will not work for mac style line breaks (the scanner counts simply LFs and I do the same).
Comment 37•23 years ago
|
||
OK. Well, I was hoping. :)
Comment 38•23 years ago
|
||
Andreas: in the attachment you added to bug 92196 the &s are incorrect since the source doesn't contain &s but only &-chars. And why is mozilla trying to highlight stuff within the <xmp> tags? The only tag recogninzed after a <xmp> should be a </xmp> if I understood the HTML-2.0 definition correctly. The same goes for <listing>. I know that the page is HTML-2.0 but those pages are still out there.
Comment 39•23 years ago
|
||
carljohan, as I wrote in bug 92196, you'll be able to suppress the "amp;"s. For <xmp> and <listing> (and <plaintext>), you're mostly right. Its contents should be CDATA (but note that these elements were deprecated even in HTML 2.0). I'll investigate if I can simply fix it by setting the CDATA flag in nsElementTable.cpp. If not I'll file a separate bug on it.
Comment 40•23 years ago
|
||
Clarence: Could you please post a patch ( even if incomplete )? I would like to take a look at it. Btw, watch out when you touch nsElementTable.cpp...it's very fragile. Note: All these problems started because of view source coloring.
Comment 41•23 years ago
|
||
Comment 42•23 years ago
|
||
The patch is not a final version, but most of it should work. The basic idea is to create more smaller tokens instead of a single complex token for view-source, e.g. separate tokens for delimiters, attribute names and attribute values. The delimiter tokens slow down view-source notably, so I made a pref for viewing them as plain text ("view_source.styled_delimiters"). > Note: All these problems started because of view source coloring. I need it for rewriting HTML source in bug 40873 too. And maybe composer also could use it.
Comment 43•23 years ago
|
||
Comment 44•23 years ago
|
||
The new patch does no longer freeze/crash on plain text documents and it includes a modified version of the patch attached to bug 91437.
Comment 45•23 years ago
|
||
Clarence: This patch is huge!!!! The bigger the patch the harder it is to get
into 0.9.4. Btw, I haven't started reviewing your patch yet.
> The delimiter tokens slow down view-source notably, so
> I made a pref for viewing them as plain text
> ("view_source.styled_delimiters")
Who would want to view source in color if it's too slow? IMO, we need a completely
different approach to solve the coloring problem ( very low priority though ).
Comment 46•23 years ago
|
||
> This patch is huge!!!! The bigger the patch the harder it is to get > into 0.9.4. Basically, it's a rewrite of most of nsHTMLTokenizer and nsHTMLTokens. Probably they were not written to handle view-source and can't do it properly without major changes. I thought about writing a own tokenizer for view-source, but I wanted to have view-source being in sync with the tokenizing process. And I found problems not directly related to view-source that I wanted to fix too (I'll summarize them later). Finally I noticed that performance in the tokenizer could be much better (I do not know how much it adds to the total page load time, but I hope that it will be notably faster). Yes, my patch has some risks and should get checked in early in a milestone cycle. I hope to finish it within a week, but even that might be too late for 0.9.4. I'd rather have a good solution for 0.9.5 than a quick hack for 0.9.4. > Who would want to view source in color if it's too slow? IMO, we need a completely > different approach to solve the coloring problem ( very low priority though ). If it shows the real source and has good syntax highlighting (reflecting what the tokenizer recognizes and what not), I'll accept a slow view-source. I see nochance to accelerate view-source besides of accelerating layout altogether. What we could do is incremental display for view-source. If we'd do that it wouldn't matter as much if it's slow. And, BTW, view-source without syntax highlighting should go through tokenizer as text/plain (if we want to use the tokenizer for text/plain at all (my last patch at least optimizes this)).
Comment 47•23 years ago
|
||
While you're recoding nsHTMLTokens.cpp, please take a look at BUG 92721.
Updated•23 years ago
|
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Comment 48•23 years ago
|
||
Comment 49•23 years ago
|
||
Currently I'm trying to resolve performance issues. My patch makes the tokenizer about 15% faster (depending on page), but most of that time is lost in the HTML content sink later. Some pages (especially Bugzilla bug lists) load even slower than before (up to 1.5%). View-source with syntax highlighting will be typically about 10-15% slower. I think I can reduce that number, but there will be a price to pay for correcting the errors. Very large documents slow down more (40% for a 1MB bug list), and if stylable delimiters are enabled (default in the patch, but should later be disabled by default) load times rise about 100-200%. Time spent in layout increases exponentially with the number of tokens in a <pre> element. Good news about performance: My patch makes tokenizer about 99.5% faster for plain text documents. This includes view-source without syntax highlighting, reducing total load time about 20-80% (depending on size of document and proportion between markup and text).
Comment 50•23 years ago
|
||
> Time spent in layout increases exponentially with the number of tokens in
> a <pre> element.
Um. That's very unfortunate... has anyone considered filing a bug on layout
regarding that??
Also, would it be at all beneficial to create multiple small <pre> elements
instead of one big honkin' one? It seems that this would alleviate the
exponential growth problem a bit....
Comment 51•23 years ago
|
||
Filed bug 97229 for the frame construction time problem. Yes, I though about dividing the <pre> too, I think it's worth to implement it (see numbers in bug 97229). The reason for the slowdown in the content sink was that I've changed whitespace tokens to use nsSlidingSubstring instead of nsString. It's very slow for strings consisting of only one char, but maybe that's only on my system.
Comment 52•23 years ago
|
||
Andreas, you may want to email scc@mozilla.org and ask him how to efficiently use the string classes to do what you're trying to do with nsSlidingSubstring.... Chances are, he can suggest a more effective approach. Things like + mTypeID = nsHTMLTags::LookupTag(nsAutoString(mTextValue)); can almost certainly be done more efficiently (without creating and initializing an nsAutoString, for example).
Keywords: mostfreq
Comment 53•23 years ago
|
||
> + mTypeID = nsHTMLTags::LookupTag(nsAutoString(mTextValue));
I have undone this already. It was just a hack to use nsSlidingSubstring
without changing LookupTag(), but it turned out that it had no benefit over
the current approach without changing more of the parser code. Where I can't
see an improvement I leave the code as it is now.
Comment 54•23 years ago
|
||
Will this fix bug 98149?
Comment 55•23 years ago
|
||
*** Bug 98149 has been marked as a duplicate of this bug. ***
Comment 57•23 years ago
|
||
Comment 58•23 years ago
|
||
Comment 59•23 years ago
|
||
Attaching my old testcase from a dupe, and a new instance I came across today here so they can be verified against when this patch is ready.
Keywords: testcase
Comment 60•23 years ago
|
||
This isn't going to make 0.9.5, obviously, so targeting for 0.9.6. Andreas: perhaps you could freshen the patch and/or list what the remaining issues if any are before we try to get reviews? Adding perf since the htmlparser rewrite of this has performance impacts (some positive).
Keywords: perf
Target Milestone: mozilla0.9.5 → mozilla0.9.6
Comment 61•23 years ago
|
||
Is this rewrite absolutely necessary ( though I appreciate the effort )? It looks scary. Remember the existing code has evolved for quite some time and I'm not sure how well the patch above could be tested for regressions. Is this is the only solution to fix view-source problems? I would prefer a simpler solution focusing only on the problem ( view-source ). I'm very nervous about the patch ( though I haven't looked into it throughly ). Please don't make things complicated than what it's now.
Comment 62•23 years ago
|
||
Hmm. Perhaps the best thing to do here would be to arrange for a branch. After seeing the havoc linktoolbar wreaked, this seems way too scary to just check into the tree, regardless of review. OTOH, the view source implementation does have sort of an air of being tacked on, and it would be nice to see it improved. A branch would let us spin through a whole bunch of regression tests and so forth without disturbing the stability of the main tree-I think it's something we should seriously consider.
Comment 63•23 years ago
|
||
Whether we use this patch or not, we need view-source to show an accurate representation of the source, otherwise it is useless! I know this patch is intended to fix a lot of view-source bugs. Do we have a list of those bugs in one place?
Comment 64•23 years ago
|
||
I do like the idea of fixing viewsource problems but not at the cost of jeopardizing layout. If the intention is to fix viewsource problems then we need to a solution to target just that problem.
Comment 65•23 years ago
|
||
Sorry for not working on this for a long time. But I have a new patch now. Please note my detailed explanations at http://c07.de/mozilla/bug57724/changes.html (will attach this here too). Binaries for Linux are available at http://c07.de/mozilla/bug57724/ .
Comment 66•23 years ago
|
||
Comment 67•23 years ago
|
||
Updated•23 years ago
|
Attachment #44427 -
Attachment is obsolete: true
Updated•23 years ago
|
Attachment #44660 -
Attachment is obsolete: true
Updated•23 years ago
|
Attachment #46746 -
Attachment is obsolete: true
Comment 68•23 years ago
|
||
> No newline tokens for plain text documents (performance reasons). Line breaks
> are contained in text tokens instead.
How big a performance win is this? The reason I ask is the if you have actually
implemented all this line-counting correctly we should be able to do things like
insert named anchors in the source view at the beginnings of lines. This would
allow scrolling to those anchors and the like.
The other concern I have is that this patch will break line wrapping in view
source as far as I can tell. Perhaps we can put the id="viewsource" on the
<body> and change viewsource.js appropriately?
Comment 69•23 years ago
|
||
Forgot to say something about performance: - Normal performance (not view-source) is only slightly better, about -0.5% (page loading time without images etc.). - Performance for view source depends largely on the size of the page. Same performance for typical pages, up to 10% slower for small and/or simple pages (e.g. +7% for the Mozilla homepage), much faster for large and/or complex pages (e.g. -90% for an 1MB bug list). - Much faster for view-source without syntax highlighting and other text/plain documents (typical -50%). - Poorer performance if you enable the new pref for styled delimiters, except for very large documents. Can be more than 100% slower for small documents with many comments like <!-------------------------------------->.
Comment 70•23 years ago
|
||
>> No newline tokens for plain text documents (performance reasons). Line breaks >> are contained in text tokens instead. >How big a performance win is this? The reason I ask is the if you have actually >implemented all this line-counting correctly we should be able to do things like >insert named anchors in the source view at the beginnings of lines. This would >allow scrolling to those anchors and the like. Newline tokens are created only after markup (>), not for line breaks in text. There is no change in current behavior other than ignoring markup in plain text. Creating newline tokens for all line breaks would probably cost some performance. >The other concern I have is that this patch will break line wrapping in view >source as far as I can tell. Perhaps we can put the id="viewsource" on the ><body> and change viewsource.js appropriately? Will test this.
Comment 71•23 years ago
|
||
My patch doesn't break line wrapping.
Comment 72•23 years ago
|
||
Andreas, you tested this on a document big enough to trigger the splitting into 512-token blocks? And looked at the _end_ of the file? Basically, it looks like line wrapping will only work for the first <pre> with your changes, since that's the only one that has the #viewsource id attached to it.
Comment 73•23 years ago
|
||
Boris, you're right. Need to change that.
Comment 74•23 years ago
|
||
Some comments: The fact that the XML flag isn't set for XHTML as text/html is a bug, see bug 107904. We shouldn't support the weirdnesses of <plaintext>, <xmp>, and <listing>, all of which are obsolete-in fact, there's already a bug to remove <plaintext> support open, bug 88987. I will create testcases to check out some fun SGML stuff (particularly attribute handling) soon.
Comment 75•23 years ago
|
||
I've been using this for the past several days and haven't noticed any weirdnesses yet. Loading of view source seems slow on some pages, but not to a greater extent than I experience with normal builds. Pushing off to 0.9.7 while I'm at it, as this clearly won't be in for 0.9.6...
Target Milestone: mozilla0.9.6 → mozilla0.9.7
Comment 76•23 years ago
|
||
*** Bug 107611 has been marked as a duplicate of this bug. ***
Comment 77•23 years ago
|
||
0.9.7 is out and the bug is still there... so I'd suggest to adjust the "target milestone" setting!
Comment 78•23 years ago
|
||
->1.01, I doubt this patch can be adequately tested before 1.0. :-(
Target Milestone: mozilla0.9.7 → mozilla1.0.1
Comment 79•23 years ago
|
||
What does 1.0.1 mean?? You want to release the "rockstable bugfree" 1.0 version including this bug??
Comment 80•23 years ago
|
||
The patch in this bug involves a destabilizing major tokenizer rewrite with unknown consequences for lots of parsing issues. So yes, unless this lands very soon (0.9.8 timeframe) this should not land before 1.0.
Comment 81•23 years ago
|
||
one of the purposes of the massive bug reopening/dependency creation was that the more serious of these bugs could be solved individually, without emplacing the entire patch here. (as far as that goes, if we're going to do a major rewrite, I think we should go whole-hog and do a separate view-source tokenizer).
Comment 82•23 years ago
|
||
I like Christopher's proposal for a separate viewsource tokenizer. This IMO would reduce maintenance cost and would limit regressions to viewsource only.
Comment 83•22 years ago
|
||
Agreed. A separate view-source tokenizer would be the way to solve the problem without abandoning syntax highlighting. In a separate view-source tokenizer, *only* the parsing needed for syntax highlighting would be done, and it could be audited to make sure it lost no information, even for totally invalid pages. It could probably be significantly simpler than the regular tokenizer! Meanwhile, the regular tokenizer could continue to do whatever 'corrections' and optimizations it wanted. However, that might take a while to implement. In the interests of getting something done about this in the *near* future, how about a way to turn off syntax highlighting, and have 'view source' *just* show the raw document? This could be a hidden pref. However, I'm pretty sure there's enough demand for 'view source' to show each page *exactly* as it came from the server, that that should be the *default*. Better to have a defaulted-to-off preference "Turn on syntax coloring in view source (and potentially lose information)".
Comment 84•22 years ago
|
||
I have to cast my vote in favour of a "raw", non-tokenised mode for view-source. That would fix most of these bugs for the web developers, and the majority of normal users won't care. Would this be hard to implement? (I suspect easier than a seperate view-source tokeniser)
Comment 85•22 years ago
|
||
It would be pretty trivial to do that, in fact (just have nsViewSourceChannel reset the type to text/plain). The losses are: 1) lack of syntax highlighting (a _major_ reason people prefer Mozilla's/Netscape's "view source" to IE's). 2) Some intl pages which set a charset via a META tag may no longer be view-sourceable in the right charset. We _do_ set the default charset on the webshell in the view source window under some conditions, but to be truthful I'm not sure what those are. In any case, that would only apply to "view source" from the menu not to "view-source:" urls or to view source launched from the JS console (!). In my opinion, item #2 is a _major_ drawback to the proposal. Item #1 is avoidable if we use the existing "view source highlighting" pref to control this behavior. Item #2 needs careful thought and input from intl folks who are more familiar with the issues.
Comment 86•22 years ago
|
||
The following observation might be related. If you copy something from view source and paste it to another application, additional empty lines are added at some places. pi
Comment 87•22 years ago
|
||
*** Bug 142044 has been marked as a duplicate of this bug. ***
Comment 88•22 years ago
|
||
What's the status on this? We have a patch for a long time now, but no activity. Also, maybe the blocking-list could be duped to this bug, looks like they all are special cases of this bug. pi
Comment 89•22 years ago
|
||
> What's the status on this? We have a patch for a long time now, but no activity. See comment 61 and comments thereafter by both Harish and Andreas. The blocking bugs _were_ dupped. They were _undupped_ because many of them can be fixed independently without a complete parser rewrite (which is what this patch is).
Comment 90•22 years ago
|
||
I guess the target milestone should be moved then. pi
Updated•22 years ago
|
Keywords: helpwanted
Target Milestone: mozilla1.0.1 → Future
Comment 91•22 years ago
|
||
example: I'm writing on a php-project. In that project I've a telephone list. It will work with the following arguments (only the important will be mentioned) SM = sub menu (number) to show V = variable that decides if the form to fill in or the results will be shown (0 = form, 1 = execute sql and show results - without the form) Suchwort = words I'll look up in my database (will be divided thru explode in different words by blank). It's a field in my form. mypage.php?SM=6 ==> shows me the page with the form to fill in, which will be send like that: <form action="mypage.php?SM=6&V=1" method ="post"> mypage.php?SM=6&V=1 ==> examins different variables, if they are set. If they are set, they control the order, count of records per page,.. One check I'll do is: if (empty($Suchwort)) { $V=0; } That means: if someone comes over history/bookmark/.. to the page, it's not interesting, if V=1 or V=0. When I get back results back for my query and look up the sourcecode (Strg-U) of my page I'll get back the form. ==> I get a new request for mypage.php?SM=6&V=1 . And there's the problem: I don't send $Suchwort via adress, I send it through a form. mozilla doesn't remember that and finds an empty $Suchwort and resets V to the value zero. If testing Strg-U on a page called with mypage.php?SM=6&V=1&Suchwort=123 it will work without any problem. I think it would be better to analyze the page in the cache than sending a new request. If using dynamic pages you're getting problems when searching bugs in your source-code or when examining pages in www. I'm using: en-US, rv:1.2a, Gecko/20020907 MultiZilla/v1.1.22
Comment 92•22 years ago
|
||
Frank: wrong bug. this bug is for view source displaying the right page but with the source slightly modified.
Comment 93•22 years ago
|
||
quotation from first entry: These are bugs where view-source doesn't display a page as originally sent. ====== I thought, it'll fit, because "doesn't display a page as originally sent" is very general and opens different ways to interpret ;-) I found different bugs for "view-source" but no bug would fit. SDo I thought, I add the problem here (general).
Comment 94•22 years ago
|
||
Frank, this is why one reads all the comments on a bug before commenting.
Comment 95•22 years ago
|
||
By the definitions on <http://bugzilla.mozilla.org/bug_status.html#severity> and <http://bugzilla.mozilla.org/enter_bug.cgi?format=guided>, crashing and dataloss bugs are of critical or possibly higher severity. Only changing open bugs to minimize unnecessary spam. Keywords to trigger this would be crash, topcrash, topcrash+, zt4newcrash, dataloss.
Severity: normal → critical
Comment 96•22 years ago
|
||
This is not minor. It's normal. Please fix that.
Updated•22 years ago
|
Severity: minor → normal
Comment 97•22 years ago
|
||
As a note, I'm considering removing the whole view-source feature entirely and just using an editor to view the source. The amount of time being spent on view-source is utterly disproportional to its usefulness (or rather the usefulness it offers over a text editor).
Comment 98•22 years ago
|
||
Take a look at bug 8589, which would do that elegantly. I tried to take a stab at it, but I couldn't work out the undocumented internals of Mozilla well enough to do it. If you have the knowledge of Mozilla internals to do it, pleeeeeeease do and we will love you forever!
Comment 99•22 years ago
|
||
Well, since the view-source channel now returns the application/x-view-source MIME type all the time, all that would be required would be to not register an internal handler for that type (see nsContentDLF.cpp). Then when you try to view source you would get the normal helper app dialog, select an application to view with, make sure "ask me every time" is unchecked, and be done with it. This is essentially a one-line change (not including the code that could be removed as a result, of course).
Comment 100•22 years ago
|
||
If internal view-source is removed, some of the code should probably be kept and moved to Composer for things like bug 58730. (In which case the source is munged anyway to achieve more compliant HTML).
Comment 101•22 years ago
|
||
bz: Anyway, how should removing view-source (which I really like because of syntax highlighting and such stuff) help to resolve this bug? I thought this bug is because the Cache doesn't give us the right source for the pages in the given cases - if that's true, the text editor would happen to get that wrong source in those cases, which would mean that bug wouldn't be solved.
Comment 102•22 years ago
|
||
KaiRo: No, it's the parser who munges the pages, not cache. aiui, if you disable syntax highlighting, it might work...
Comment 103•22 years ago
|
||
KaiRo, you're _way_ in the wrong bug, like Biesi said. Aaron, if the composer people want the code they can get it out of CVS (Attic, to be exact). Again, this is presuming we decide to remove the internal view-source. Given all the problems it's causing, I will likely do it by summer unless someone fixes the major issues with it (poor tokenizer, poor performance, etc).
Comment 104•22 years ago
|
||
bz, biesi: Oops, sorry, it seems I'm still a bit confused by the word "munge" ;-) Anyway, thanks for claifying I was thinking/writing of the wrong bug here... I still don't like view-source going away (we're getting nearer and nearer to do everything the same as IE, it seems) - but I have no right to complain as long as I'm not able to code anything which could make our situation better here :(
Comment 105•22 years ago
|
||
Viewsource should map to the source document 1:1. This is not completely true in mozilla because the viewsource content is processed by the same tokenizer that is tuned to handle quriky content. IMO, the viewsource content should be handled by a) a new tokenizer or b) avoid tokenizer completely ( may be by using a serializer instead ). Syntax highlighting might be an issue but Im sure we can come up with something :-)
Comment 106•22 years ago
|
||
In "view source", I see a normal space where there is in the original document. Is there a bug for that or should I file a new one? BTW, sorry if I get it wrong, but aren't the dependencies reversed? I.e. if this is a tracking bug, then this bug should _depend on_ fixing all the concrete view-source bugs, not vice versa...
Comment 107•22 years ago
|
||
> Is there a bug for that or should I file a new one?
Is that regular view source, or view selection source?
The dependencies are correct. This is not a tracking bug, this is a bug on the
root cause of all those other bugs -- the fact that we use the normal HTML
tokenizer for view source.
Comment 108•22 years ago
|
||
> > In "view source", I see a normal space where there is in the
> > original document. Is there a bug for that or should I file a new one?
> Is that regular view source, or view selection source?
Ah, sorry, I didn't realize there was a difference. It only does this in view
selection source. Is that a different bug than this?
Also, sorry about my misunderstanding about the dependencies.
Comment 109•22 years ago
|
||
> Is that a different bug than this? Yes, bug 155635. And it's fixed in current builds.
Comment 110•21 years ago
|
||
Can someone change the summary of this bug to use real words (i.e. not "munging") so that people have a better chance of finding this bug when they search Bugzilla? How about "View Source does not display original page source as sent by server."
Updated•21 years ago
|
Summary: View-source munging pages → View source munging pages (does not display original page source as sent by server)
Comment 111•21 years ago
|
||
Ah, maybe this is the bug report I am looking for. I've just been commented at bug 55583 but that may deal with a different matter although the Summary describes my problem very good. I am a web developer and thus I need to see the html code that a server sends to the client. But if I use View Page Source on a dynamically created page resulting from a POST request, Mozilla asks me to send the POST request again. This means: Mozilla does not show the original html source code which it should get from the cache or somewhere. The problem is that the resending the POST request can result in a completely different page (e.g. if a transaction has finished after the first POST request). Want to try yourself? Go to http://democam.mobotixserver.de/cgi-bin/store2rom press the button, wait for the page to return and then do right-click -> View Page Source. Now a popup window will appear telling you that the "page you are trying to view contains POSTDATA that has expired from cache".
Comment 112•21 years ago
|
||
Nope, wrong bug. Notice this bug has to do with PARSER.
Comment 113•21 years ago
|
||
Daniel: nope, that would be a different bug - this one is about view-source showing not exactly what the server sent, such as not displaying redundant double-quotes in HTML tags etc. Anyway, your testcase works perfectly for me - perhaps you're using an old version? Or perhaps you didn't wait until the page finished loading before trying to view the source (I believe the problem you describe happens then, though I'm not sure)?
Comment 114•21 years ago
|
||
Mh, if it is really about "what the server sent" then check out this page which embedds a timestamp on every page sent out (only precise to the minute): http://www.theregister.co.uk - Open the URL using Mozilla on a *Windows* system. (Linux works ok) - Enter something in the text box (upper left corner) and press Go!. - See the time in the head bar on the page. - Wait one minute or brew a tea. - Now "View Page Source", confirm the pop up window. - When the source is displayed find "Updated:" You will see that the time has changed compared to the one shown on the rendered page. I think, this fits the summary "does not display original page source as sent by server". If this is the wrong bug could you please point me to a more appropriate one? Cheers Daniel Germany
Comment 115•21 years ago
|
||
Daniel, please read the whole bug, not just the summary (and in the case of a tracking bug its dependencies) before commenting. This bug covers the fact that what view-source shows is a tokenized version of the source that comes from the server; the tokenizer sometimes "fixes" the source or loses data... Your problem is totally unrelated to this bug; see bug 166786.
Comment 116•21 years ago
|
||
Boris, thanks for pointing me into the right direction. Then I'm off to bug 166786. :-) Goodbye.
Comment 117•21 years ago
|
||
Boris, you mean that mozilla gets a page and tokenizes it without keeping the original source before doing anything else. The contents of bug 188609 does not satisfy me. What can I do to have the real page source ? I haven't contributed to mozilla yet outside of submitting, commenting or voting for bugs. What can I begin with ?
Comment 118•21 years ago
|
||
That's correct. When doing view-source of a page, we start by getting the source from cache, tokenize it, then generate a document that has highlighted source. The tokenizing is only needed for syntax highlighting; if highlighting is off the tokenizing could very well be skipped. I'm not sure what you can begin with, because I'm not sure what you want to do; feel free to email me to discuss (the discussion does not seem directly relevant to this bug).
Comment 119•21 years ago
|
||
Many text editors manage to syntax-highlight code without screwing it up in any way. So can we.
Comment 120•21 years ago
|
||
Actually, I've managed to write malformed code that's made every single text editor I tried either give up on highlighting it or actually get so confused that it changed the data in the file (eg by not loading all of it). But yes, we can do better. All we need is to write a separate tokenizer for view-source, as discussed, instead of using the same one as pageload. Wanna do it? Or are you only willing to make comments that don't really take this bug anywhere?
Comment 121•21 years ago
|
||
I'd be quite happy to write a view-source tokeniser algorithm. But my attempts at finding where it would fit in the program code are driving me mad - I guess I need more than just the XUL and JS sources...?
Comment 122•21 years ago
|
||
Yes. The tokenizer would be instantiated from nsParser.cpp (right now that instantiates an nsHTMLTokenizer).
Comment 123•21 years ago
|
||
Remember while writing it that it has to handle malformed html, or stuff that isn't HTML at all; the output must be character-for-character the same as the imput, except possibly for the colors. (This shouldn't be too hard, but I want to make the requirements of the problem clear.) Probably each token should hold the exact original character string it's associated with, and there should be at least one token type for "I don't know what this is".
Comment 124•21 years ago
|
||
Is a tokenizer really required for view source? How important really is (poorly) attempting to color-code HTML in the browser? (in the editor I can understand). And please, no "I'll stop using Mozilla and shoot myself if you remove colored view source" comments from the non-developers. If MNG and other useful features are being removed to slim down Moz/Firebird, why not ditch this? It's buggy, a time sink for someone to redo it, and excess code (which leads to excess bugs). IE somehow manages to get by without a page source tokenizer. And it's not something I hear the masses clamoring for. (The real masses, not this bugs CC list) Unfortunately I suspect this may lead to "View Selection Source" going away too, which is (arguably) semi-useful to a small percentage of users. But that tends not to work well half the time anyway. Perhaps it would be a candidate for an XPI extension.
Comment 125•21 years ago
|
||
Do you ever use "view source"? I can tell you for a fact that the highlighting makes it a lot more readable... As a result, view-source is the #1 tool for me when I'm debugging web page issues. Please don't drag MNG in here -- if the imagelib owner wants to cut stuff, that's his decision, but he has no control over the parser and content code. Yes, view-source could be made into an extension, with a lot of work (a bunch of extensions, one per platform, since it uses C++, actually).
Comment 126•21 years ago
|
||
One last thing. Size-wise, view-source code is about 40KB, if anyone cares.
Comment 127•21 years ago
|
||
Here's a thought for implementing. Instead of rewriting the tokenizer, add fields to each token for "source-start" and "source-end", byte offsets into the raw source. Then to display color-coded source, we just display the raw source and walk the parse tree to decide how to color it.
Comment 128•21 years ago
|
||
I (used to) use view source routinely to debug pages which were not displaying properly. The Mozilla coloring tokenization process meant that view source *did not display* the source of the page. The current coloring system is *worse than nothing* for debugging web pages. I've ended up using "Save Page..." and opening the saved file in a text editor every time.
Comment 129•21 years ago
|
||
View source is downcasing tags in XHTML 1.0 strict. I wish to see the source as it is sent from the server even if it incorrect.
Comment 130•21 years ago
|
||
That is indeed the whole point of this bug.
Comment 131•21 years ago
|
||
view source does not show the original source again:( After a post, it shows the page before post. (However as I saw, I can reload the source :), I just recognize it now, that allows again) Cache is enabled and has 50 Mb. Sorry if this announce is not in the best place here, but I could not find the original bug. Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6a) Gecko/20031030
Comment 132•21 years ago
|
||
*** Bug 225209 has been marked as a duplicate of this bug. ***
Comment 133•20 years ago
|
||
*** Bug 251528 has been marked as a duplicate of this bug. ***
Comment 134•20 years ago
|
||
(In reply to comment #127) > Here's a thought for implementing. Instead of rewriting the tokenizer, add > fields to each token for "source-start" and "source-end", byte offsets into the > raw source. Then to display color-coded source, we just display the raw source > and walk the parse tree to decide how to color it. Sorry I've been quiet on this bug for a while. I think this would be a good idea. Not only would it solve the problem once and for all, it would save having to maintain two separate tokeniser algs in parallel, and consequently any chance of the two disagreeing over whether something is HTML or XHTML.
Comment 135•20 years ago
|
||
That assumes we have the raw source as some sort of contiguous thing in memory all at once. That almost never happens; the source is parsed incrementally as it comes in from the networking library.
Comment 136•20 years ago
|
||
Oh. I would've thought it saved it verbatim to the cache. If it doesn't, it should.
Comment 137•20 years ago
|
||
1) Not everything is saved in the cache. 2) You're suggesting we get the source twice (once to construct the parse tree, and once to do this walk thing).
Comment 138•20 years ago
|
||
(In reply to comment #137) > 1) Not everything is saved in the cache. Why not? Even if there's some dreaded 'expire' or 'no cache' directive on the page, surely it should be there temporarily? > 2) You're suggesting we get the source twice (once to construct the parse tree, > and once to do this walk thing). What are you on about? I'm suggesting that we get the source once, save it to the cache and tokenise/parse it.
Comment 139•20 years ago
|
||
When reading data from the cache, we can't get it all at once either -- it comes in chunks.
Comment 140•20 years ago
|
||
We don't have to wait until the whole page is retrieved and then parse it out of the cache. As each chunk comes in, send it both to the cache and to the tokeniser.
Comment 141•20 years ago
|
||
That's what we do right now for normal HTML loads. But when you want to do view source, we'd have to read the data from cache, generate a parse tree, then walk all the data _again_ to display it, per your suggestion. Assuming I understand your suggestion correctly.
Comment 142•20 years ago
|
||
Why not use the parse tree we already keep in memory, as character offsets into the cached source?
Comment 143•20 years ago
|
||
We don't have a parse tree in memory (at any point in time, actually). We have a DOM. Bloating all DOMs significantly for the benefit of view-source is not an option. Further, the DOM need not correspond to the original source in any way (see JS manipulation of the DOM). Finally, the default tokenizer/parser just drops a lot of stuff on the floor altogether; stuff that needs to show up in view-source.
Comment 144•20 years ago
|
||
(In reply to comment #143) > We don't have a parse tree in memory (at any point in time, > actually). What _do_ we use to view source at the moment then? Is there any reason it can't still be used to view source under the modifications proposed from comment 127 onwards? > Finally, the default tokenizer/parser just drops a lot of stuff on > the floor altogether; stuff that needs to show up in view-source. The view-source code would display the cached source as is, using the tokenise tree/parse tree/whatever merely to syntax-highlight.
Comment 145•20 years ago
|
||
> What _do_ we use to view source at the moment then? We read it from cache and retokenize/reparse it (with slightly different rules so we don't lose content). The content sink builds a different content model from the normal HTML one from the parse tree. Please see nsViewSourceHTML.cpp. > The view-source code would display the cached source as is, using the tokenise > tree/parse tree/whatever merely to syntax-highlight. That really can't work if the tree doesn't match the source, is the point.
Comment 146•20 years ago
|
||
What possible causes are there of the tree not matching the source?
Comment 147•20 years ago
|
||
I thought I covered that in comment 143? In any case, with Blake Kaplan's recent tokenizer work we're using the same tokenizer in both cases and just not dropping things in view-source. You'll note that most of the bugs blocked by this bug are in fact fixed; of the remaining ones, most are not actually bugs but rfes. So there is no need to do a wholesale rewrite of anything, is the point.
Comment 148•20 years ago
|
||
(In reply to comment #147) > I thought I covered that in comment 143? OK, so it wouldn't make sense to use the DOM as the whatever. > In any case, with Blake Kaplan's recent tokenizer work we're using the same > tokenizer in both cases and just not dropping things in view-source. You'll > note that most of the bugs blocked by this bug are in fact fixed; of the > remaining ones, most are not actually bugs but rfes. OK. I guess we can wait and see what Blake comes up with.
Assignee | ||
Comment 149•20 years ago
|
||
Taking, as I ended up fixing the rest of these bugs.
Assignee: c → mrbkap
Status: ASSIGNED → NEW
Assignee | ||
Comment 150•20 years ago
|
||
...and marking this bug as FIXED! The rest of the bugs depending on this one are either not bugs in view source (such as bug 105937) or simply RFEs (in other words, we're now showing (hopefully!) the exact source code of the page, with some slight coloring tweaks remaining to be more expressive; but this bug as reported is no longer a problem).
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Comment 151•20 years ago
|
||
Congratulations, and thank you!
Comment 152•19 years ago
|
||
I'm seeing this problem again, in FireFox build 20050823 with bfcache enabled. I can reliably reproduce it like this: 1. Visit http://www.flinthomes.net/ 2. Near the lower-left of the page, under "MLS QuickSearch", hit "Find" (you can leave the textbox empty) 3. On the resulting page, view the source. It's not the same source that was rendered (it's as if the page is re-requested without the appropriate POST variables). I couldn't find a more specific or recent bug dealing with this, so re-opening this one.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 153•19 years ago
|
||
Please file new bugs on new problems. If the problem only happens with bfcache enabled, please make sure that your bug blocks bug 274784 and cc me. As for the rest, this bug was about the right source being dealt with but not being _shown_ correctly due to parsing issues. So it sounds like a totally separate issue.
Status: REOPENED → RESOLVED
Closed: 20 years ago → 19 years ago
Resolution: --- → FIXED
Comment 154•12 years ago
|
||
I have reported the same bug for a view years ago at http://www.mobildiscounter.de
Comment 155•12 years ago
|
||
same bug here at http://www.autoradio-test.org some month ago
Comment 156•12 years ago
|
||
(In reply to Jens from comment #154) > I have reported the same bug for a view years ago at > http://www.mobildiscounter.de (In reply to Carry from comment #155) > same bug here at http://www.autoradio-test.org some month ago View Source on these pages works for me. If you still see the problem in a fresh build, please file a new bug with detailed steps to reproduce.
You need to log in
before you can comment on or make changes to this bug.
Description
•