Closed Bug 66334 Opened 24 years ago Closed 23 years ago

view source needs to be a protocol handler

Categories

(Core Graveyard :: Embedding: APIs, defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.9

People

(Reporter: jud, Assigned: chak)

References

Details

Attachments

(9 files)

Currently view-source is implemented by putting the docshell into a view-source 
mode and some other trickery. we should have a view source protocol handler 
("view-source") which fetches url data (hitting the cache if possible) and 
returns the content type as text/plain (thus it will not be treated as html).

This will remove the view-source hacks currently exposed, and then anyone can 
"view source."
Jud,

One benefit of the current "trickery" is that it can embellish the source with
color/highlighting and other nice things.  That got turned off or removed at
some point, but it used to render the source much prettier.

I'm not sure you could do that in a protocol handler (it used to be in the
parser because it required some knowledge of html source coding conventions).

How about a generalized mechanism that let's one override the content-type?
Then we could offer features that let the user force a given URL to be displayed
as plain text (or plain text as html, whatever), in cases where the server is
misconfigured/confused?
> view source protocol handler ("view-source")

Is that an officially approved scheme? If not, please use the private namespace.

> which fetches url data (hitting the cache if possible) and

For the record: Please pay attention not to worse bug 55583 (i.e. don't make it
harder to fix that bug).

> How about a generalized mechanism that let's one override the content-type?

Nice idea.
OS: Windows 2000 → All
bill: colorization would likely be lost unless we passed some type of cookie 
around in the content type. maybe view-source would be "text/plain/source" or 
something. the parser (or whoever's doing the colorizing) could key off of that 
instead.

Ben:
pick a scheme out of a hat.

IMO, if we do colorizing (lower priority) based on content type, and make 
view-source a protocol handler, 55583 gets fixed for free.
> text/plain/source

Use paramemters: type/plain formatting="source" (or similar, not sure about the
exact syntax)
you're right. sorry:

"text/plain; fmt=source"
Not quite. Read RFC 2646. If we need a proprietary attribute for this value, we
should probably prefix it with something like "moz-". (I don't know of any
convention for naming vendor-specific values to media type attributes.)
cc to self
Is there some reason why we couldn't just emit text/html to support colorized 
text?

It seems as if it should be up to the view-source protocol handler, whether it 
just spit out the source as 'text/plain' or ran it through the parser to gather 
more contextual info and generated text/html to 'pretty print' the source...

-- rick
Hardware: PC → All
hmmm, wouldn't that modify the source itself (whitespace, etc) so a "save as" 
wouldn't work well?
I interpreted Jud's initial comment to mean that he wanted to replace *all* the
special view-source "trickery" with a new protocol handler.  I just wanted to
point out that not all the trickery (code for which might still be there, for
all I know) could go into the protocol handler because it would have to know
about HTML syntax, etc.

But now it seems that some "trickery" will remain outside the protocol handler
(if necessary, to do any special formatting).  So the issue boils down to:

1. Do we move the code that directs the parser/layout to do "view-source"
(whatever that might entail) from the docshell to a new protocol handler?

2. If so, how do we implement that protocol handler (i.e., how does that
protocol handler then drive the parser/layout to render the source)?

Is that a reasonable analysis?

The reason for asking ourselves the first question has not been stated
explicitly.  I suspect it has something to do with the fact that nsIDocShell
isn't publicly "exposed" and therefore embedding clients wouldn't have access to
"view-source" if it were provided via that interface.  Is that correct?

Using a protocol handler instead of the docshell view mode sounds reasonable.
But I wonder if it won't lead to some unpleasant complications.  For example,
what if one is viewing text/plain and chooses to "view-source" on that?  It
seems like the original content type would still be pertinent, no?

It looks as if the doc shell already supports "view-source:" as a prefix on the
URL (http://lxr.mozilla.org/seamonkey/source/docshell/base/nsDocShell.cpp#3090).
Why not just leave that alone and pretend we've already implemented the new
protocol handler?

Lastly, I don't quite see how resolution of this impacts bug 55583.  That bug is
about the fact that we don't always keep the page source around (and even if we
did, there's no way to ask for that saved version to be loaded).  Jud, are you
implying that the "view-source:" protocol handler would take advantage of some
new and otherwise unavailable Necko mechanism to get at the always saved source?

If not, then I think we'd still be stuck in the same predicament.  Namely, that
sometimes viewing the page source hits the server again.
bill: 
1. yes.
2. all it does is pass the source directly into the parser as text/plain, it 
doesn't do any colorization (we don't support this anyway), or source 
manipulation. it's is simply _view-source_.

Your assessment of why we'd want to do this is correct (among other things).

text/plain view source == text/plain view source. they're the same thing.

it impacts 55583 in that if we never modify the source, then it's obviously the 
*plain* source, and 55583 is nullified. as a protocol handler, it could leverage 
the copy that necko cached. this would imply we always had a cache hit for 
view-source which may not be valid. I guess that would indicate that the parser 
should be able to re-generate (or keep a copy around) the original source file. 

hmmm, maybe the view-source handler (or someone capturing view-source: urls) 
just tells the parser to cough up the original document.

maybe each docshell maintains a copy of the original, top level, content it's 
displaying, and it calls docshell::loadstream() with that content when we want 
to view source?
valeski, the cache does not always hold a copy. That's the problem in that bug
(and similar SaveAs etc. bugs).
I assume that save-as is an orthogonal issue...  I don't think that you would 
want to save the 'view-source:http://foo...' URL.  Rather, you would want to 
save the 'http://foo...' URL instead.
save-as view-source:url sounds reasonable to me, if that means that a normal 
save-as url results in saving rendered source.  maybe we need to resurrect 
the wysiwyg protocol.  I really should learn how to convert my modification 
of the dom dumper into a js protocol handler.
See also bug 15372.
Hmmm... catching up on this bug but hasn't pollmann already written viewsource? 
:)
Who, me?  The only protocol I've written is wyciwyg, and that's not checked in
yet...  :)
doh! wrong number. 
Blocks: 70229
Marking future for the moment.

Target Milestone: --- → Future
-> chak who's working on this. I'm resetting the milestone to -- so chak can
assign a new one.
Assignee: adamlock → chak
Target Milestone: Future → ---
Blocks: 72836
Setting target milestone
Target Milestone: --- → mozilla0.9
I've been looking at fixing this bug and noticed that some of the comments in 
this bug are out of sync with today's state of things as far as colorization is 
concerned. We do have code to display colorized source, so it does not make 
sense to move it to the realm of a view-source protocol handler.

Here's the general approach i'm taking:

0. Remove the code which checks for the "view-source:" prefix on URLs in 
nsDocShell.cpp and puts itself into a view-source mode. We'll use a protocol 
handler instead.
1. Write a new viewsource protocol handler which internally uses http to fetch 
the page source
2. Initially the view-source proto handler was giving out a mime type of the 
form:

"<orig_mime_type>; x-disp-fmt=view-source"

where <orig_mim_type> = the original mime type the HTTP channel gave us, so this 
could be text/html,text/xml or whatever. 
The intent here was not to lose the original mime type.

Doing this i discovered that we need to change code in a lot of places to handle 
document/viewer creation etc. for this "new" mime type.
(I could use some help here on how to fit this new mime type into the existing 
scheme of things without changing too much of what we already have)

3. Instead of emitting a modified mime type (of the form "text/html; 
x-disp-fmt=view-source") the view source protocol handler just returns the 
original mime type as it got from the http channel. 
Since the docshell needs to differentiate between the text/html which came from 
a view-source protocol handler Vs the regular http handler i do the following
(this is mainly needed to determine whether to create a view or a view-src 
viewer):

// in nsDocShell.cpp...
nsresult nsDocShell::NewContentViewerObj(const char* aContentType,
   nsIRequest *request, nsILoadGroup* aLoadGroup, 
   nsIStreamListener** aContentHandler, nsIContentViewer** aViewer)
{
   // We need to create a view-source type viewer if we can get 
   // nsIViewSourceChannel from the request
   nsCOMPtr<nsIViewSourceChannel> viewSrcChannel = do_QueryInterface(request);
   if(viewSrcChannel)
        mViewMode = viewSource;

   // Once we have the proper viewMode everything we currently have will
   // just work

   // Nothing changes from here on down in the existing code
  char id[256];
  PR_snprintf(id, sizeof(id), NS_DOCUMENT_LOADER_FACTORY_CONTRACTID_PREFIX 
"%s;1?type=%s",
     (const char*)((viewSource == mViewMode) ? "view-source" : "view"),
      aContentType);
...
...
...
...
}

Doing it this way resulted in minimal changes to the existing code.

Comments?
the mime type emmitted from the view source channel should be "text/plain;
<orig-type>" (rather than "<orig-type>; <some-view-source-type>") then it will
work (modulo colorization of course).

Text colorazation doesn't work today, so lets not try and address it here
(someone file a separate bug if they want).

The docshell shouldn't have any knowledge about view-source (and won't need it
when the content-types are switched); view-source is just a protocol handler. if
you change the mime-type around as I suggest, life will be good. You can even
leave out the "; ..." part for now for simplicity, I don't think it would be
propagated properly anyway, given the current usage of GetContentType() calls.
> Text colorazation doesn't work today

The mozilla build i have shows colorized HTML when i do a View/PageSource. Is 
there more to it (w.r.t colorization) than what we currently have?

> If you change the mime-type around as I suggest, life will be good. 

It does work when i emit text/plain as you suggest. However, whatever 
colorization we had earlier (before this mime type switch) is lost.
*** Bug 74855 has been marked as a duplicate of this bug. ***
*** Bug 60224 has been marked as a duplicate of this bug. ***
my NS build (week old) doesn't support colorization. if we have colorization,
it's spotty, and we'll break it w/ this change. that's fine w/ me because we're
changing the architecture for view-source; things break w/ arch changes.
View Source colorization has been supported for a long long time now.  It's a
pref, and it was off by default until some very recent builds.  Try setting:

user_pref("browser.view_source.syntax_highlight", true);

to see this.

In newer builds the pref is "view_source.syntax_highlight", is on by default,
and has UI.

Colorization is done by inserting <span> nodes into the document in the parser
and then styling the spans.  So I would guess the content-type for view source
needs to be text/html for the colorization to work...
yes. the current colorization model requires content viewer interjection. we can
manipulate the viewer used in the docshell to get this affect.

However, the view-source protocol handler approach doesn't currently support
colorization its primary goals are:

1. create view-source content that is *exactly* what the server sent back (there
are bugs in the current model which manipluates the actual content by inserting
various things (as you point out)).
2. remove docshell knowledge of view-source special casing (view-source "mode").

Colorization is a feature that will indeed break w/ this new model. It's give
and take. We either fix the two things above w/ the new model (and break
colorization), or we leave them for the sake of the defaulted off colorization.
IMO, we're fixing real bugs in favor of a defaulted off "feature."

regarding the last code change Chak posted in the docshell. QI'ing for the sake
of determining "what" something is (and then doing nothing w/ it) is generally
overkill (and introduces stub/meaningless interfaces). You should be able to
make that last code fragment work by using the "<orig-type>; view-source"
content type model, then GetcontentType() on the channel and look for the
"view-source" string in the content type. Just a cleaner way to make the decision.

Before we go ahead and break colorization (which is on by default as of a few
days ago, by the way) with this architecture change, we should consider whether
there is a viable way to still do colorization once this change is made.  It
does not have to be the current method, but a method should still exist.

The two things this architecture change fixes should indeed be fixed, but view
source colorization is one of Netscape/Mozilla's best features...
>regarding the last code change Chak posted in the docshell. QI'ing for the sake
>of determining "what" something is (and then doing nothing w/ it) is generally
>overkill (and introduces stub/meaningless interfaces). You should be able to
>make that last code fragment work by using the "<orig-type>; view-source"
>content type model, then GetcontentType() on the channel and look for the
>"view-source" string in the content type. Just a cleaner way to make the 
>decision.

That's exactly what i did first (please see my posting from yesterday where i 
mention this and ask for help) but that check needs to be done in more than one 
place - i'm not really sure where all since the contenttype seems to be checked 
for at several locations. 

I know for sure it needs to be done in nsDocShell::NewContentViewerObj() and 
here's what i originally had:

nsDocShell::NewContentViewerObj()
{
  const char *pContentType = aContentType;

  // Check ContentType to see if it's a view-source type
  // If it's a view-source: ContentType will be of the form
  //
  //    text/html; x-view-type=view-source
  //

  nsCAutoString strContentType; strContentType.Append(pContentType);
  char *cPtr = (char *) PL_strcasestr(
                            strContentType.get(), "; x-view-type=view-source"
                        );
  if(cPtr)
  {
     mViewMode = viewSource;

     *cPtr = '\0';
     pContentType = strContentType.get();
  }
...
...

}
after talking w/ chak about this. we'll go the protocol handler route and emit
the "<orig-type>; view-source-foo-moniker" and create a content viewer for the
load. this will maintain colorization, and leave all the content manipulation
bugs. The primary gain from the view-source handler will be the removal of the
"view-source:" check hack at the beginning of CreateFixupURI. it will also allow
for arbitrary view-source handler replacement. 

if we want to address the content manipulation bugs by emittingn text/plain, we
can easily throw that switch (we could make it a pref right from the outset).
Judson, that sounds like a good compromise.  We could make the current view
source colorization pref be the pref for this, perhaps?

A question.  I am not sure what it means to make view source be a protocol
handler.  Does this mean that it will be possible to, say, make view source open
in external apps more easily?  If so, then the colorization point is semi-moot
-- people who want colorization can make view source open in an external app of
choice that does highlighting.
we'd probably want to use a new pref name.

one of the things view-source being a protocol handler gives us is mime type
manipluation capability which allows content handling decisions to be made more
easily/correctly. In an embedding situation, an external app could say it
preferred the viewsource content type and handle it. Unfortunately, because the
docshell says it will handle this type, we won't ever kick it out the true
external application handling code.

we could add a pref that would enable/disable mozilla view-source handling, that
that would give external apps a crack at handling it instead.
After various discussions with Rick, Jud and Vidur here's the approach we're 
currently taking:

1. ViewSource handler emits a ContentType type of 
"<orig_type>; x-view-type=view-source"
where <orig_type> is the original content type returned from the server to the 
ViewSourceChannel i.e. it will be something like "text/html", "text/xml" etc.

2. Change docshell so that it knows nothing about viewsource mode etc.

3. Change nsLayoutDLF so that it can recognnize content type of the form 
"<orig_type>; x-view-type=view-source" to create viewers for doing "view-source" 
with "<orig_type>".

4. Fix viewer, mfcembed and mozilla to use the new "view-source:" urls rather 
than setting the docshells mode to viewsource(note that docshell does not have 
the "viewMode" attribute anymore)

Patches forthcoming....
Attached patch Patch to layoutSplinter Review
Attached patch Patch to netwerkSplinter Review
Attached patch Patch to viewerSplinter Review
Blocks: 74486
Blocks: 74862
layout -
* I'm not direct string buffer maniplulation is a good idea. some string impls 
might be sharing the buffers, and someone manipulating one, could cuase another 
use to fail. I'm not up to speed though. 
* the logic looks good though.

docshell -
* r=valeski

netwerk -
* we'll need to add the new dir to the mac.
* nsViewSourceChannel::Init() - you can avoid all the stringification if you 
just pass the "uri" arg into nsIIOService::NewChannelFromURI();
* cancel/suspect/resume should pass down to the transport.
* VSChannel::Get|SetOriginalURI() - shouldn't you be caching the channel's URI, 
and handing it back here?
* ::GetContentType() - rather than constructing the content type evertime this 
is called (an arbitrary number of times), you should cobble the content type 
together in your :OnStartRequest() callback (the earliest possible point at 
which you can get at a valid content type). You should also set mContentType to 
UNKNOWN in your init method.
* ::SetContentLenght() - pass this through to the underlying channel?

viewsource.js -
* r=valeski

mfcembed - 
* r=valeski

viewer -
* is the aURL.get() needed? I think you can just pass in aURL.

what about the other embedding apps and mozilla?
layout -
> * I'm not direct string buffer maniplulation is a good idea. some string impls 
> might be sharing the buffers, and someone manipulating one, could cuase 
another 
> use to fail. I'm not up to speed though. 
> * the logic looks good though.
Now we do not directly mess with the internal buffer - use the public methods 
which do the same:

  nsCAutoString strContentType; strContentType.Append(aContentType);
  PRInt32 idx = strContentType.Find("; x-view-type=view-source", PR_TRUE, 0, 
-1);
  if(idx != -1)
  { // Found "; x-view-type=view-source" param in content type. 
      // Set aCommand to view-source
      aCommand = "view-source";

     // Null terminate at the ";" in "text/html; x-view-type=view-source"
     // The idea is to end up with the original content type i.e. without 
     // the x-view-type param was added to it.
     strContentType.SetCharAt('\0', idx);
     aContentType = strContentType.get(); //This will point to the "original" 
mime type
  }

netwerk -
> * we'll need to add the new dir to the mac.
Will talk to Conrad

> * nsViewSourceChannel::Init() - you can avoid all the stringification if you 
> just pass the "uri" arg into nsIIOService::NewChannelFromURI();

We can't since the URI at this stage has a "view-source:" prefix. We need to 
strip that before creating an http 
or whatever channel - this is what the keyword: channel does.

> * cancel/suspect/resume should pass down to the transport.

How do i do this i.e how do i get the underlying transport interface etc.?

> * VSChannel::Get|SetOriginalURI() - shouldn't you be caching the channel's 
URI, 
> and handing it back here?

Currently i'm not cacheing this and just redirecting these methods to the 
underlying channel since they already 
handle/cache this data

> * ::GetContentType() - rather than constructing the content type evertime this 
> is called (an arbitrary number of times), you should cobble the content type 
> together in your :OnStartRequest() callback (the earliest possible point at 
> which you can get at a valid content type). You should also set mContentType 
to 
> UNKNOWN in your init method.

Will make these changes

> * ::SetContentLenght() - pass this through to the underlying channel?

Done

> viewer - is the aURL.get() needed? I think you can just pass in aURL.

Not needed. Removed the get().

[On a side note i still have'nt got a feel for how to use these (numerous) 
string classes we have here.  Everytime i need to do something (however simple) 
i 'm having to search the source for it's usage and everyone's using it their 
own way]

> what about the other embedding apps and mozilla?

Thought i covered all of them. What else does view-source?
cancel/resume/etc - 
* you can just call the underlying channel's version of these.

you're right, you covered all the apps; sorry about that.
Updated Patches based on jud's comments forthcoming:

patch.layout.1, patch.netwerk.1, patch.viewer.1 will follow...
BTW, can someone help me with adding the new "viewsource" dir for the mac since 
i do not have access to a mac?
- layout
* r=valeski

- nsViewSourceChannel.cpp
* ::Init(). You can just remove

+    const char *protocolSpec = path.get();
+    if (!protocolSpec) return NS_ERROR_OUT_OF_MEMORY;

and pass "path" directly (no .get() or casting) into ->NewChannel(). That'll 
remove the local var.

* nit. NS_WITH_SERVICE() has been depricated in favor of 
do_GetService(contractID, &rv);

* GetContentType() changes haven't been made.

- viewer
* r=valeski
after talking to chak, we'll leave the contenttype stuff as is in the most 
recent netwerk patch (for consistencies sake (other channels do the same) and to 
avoid housekeeping in OnStartReq()).
r=valeski on netwerk.
Leaving GetContentType() as is per Jud's comments above.

However, i fixed the other two issues Jud brought up:

1. ::Init(). You can just remove

   const char *protocolSpec = path.get();
   if (!protocolSpec) return NS_ERROR_OUT_OF_MEMORY;

and pass "path" directly (no .get() or casting) into ->NewChannel(). That'll 
remove the local var.

2. Use do_GetService() instead of NS_WITH_SERVICE
sr=rpotts.

Is the nsIViewSourceChannel interface still needed?  If so, then the comment 
above the getOriginalContentType attribute in nsIViewSourceChannel.idl is wrong 
- it says that getContentType(...) returns 'text/x-view-source'
Though not currently being used by anyone i decided to leave the 
GetOriginalContentType() in there since we may find a use for it after the nsDLF  
re-org Jud's working on.

I fixed the comment error in the idl file. Thanks for catching that.
These changes are now checked in.

Thanks you all for all your suggestions and help in getting this resolved.

Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
This is nice! I should change the js console to use viewsource: instead of
opening the view source window directly.
Moving API bug ownership to David Epstein.
QA Contact: mdunn → depstein
Correction: Changing QA contact for the Embed API bugs to David Epstein.
typed "view-source:http://www.xxx.com" into the URL bar. Page source views fine.
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: