Closed Bug 113032 Opened 23 years ago Closed 21 years ago

jar: anchors within pages drop the HTML file name

Categories

(Core :: Networking, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.4alpha

People

(Reporter: typrase, Assigned: andreas.otte)

References

()

Details

(Keywords: helpwanted)

Attachments

(2 files, 6 obsolete files)

Build 2001112012 (0.9.6 release), RHL 6.1, Classic skin

Just for fun, I was browsing Javadoc HTML files packed into a ZIP file, using
the syntax:

jar:file:///path/to/javadoc.zip!/org/pkg/ClassName.html

This works for the most part. You can see the HTML pages and click between them.
(The URL above should let you download similar ZIP files, e.g. OpenAPIs.zip. Let
me know if you have trouble reproducing and need more exact instructions.)

However anchors in these pages seem not to work. Internal hyperlinks within
Javadoc-generated pages (for example ClassName.html) seem to be missing; I am
not sure what is wrong there, it may not be related, so don't worry too much.

The main thing is that hand-written HTML pages (in the OpenAPIs.zip example, try
org/openide/text/doc-files/api.html) which have internal links specified like this:

<a href="#someanchor">...</a>
...
<a name="someanchor">...</a>

do not work. When browsed as file: URLs (unpacked HTML files), it is fine;
clicking such a link scrolls the page to the anchor. However when wrapped in the
jar: URL, the location toolbar shows e.g.:

jar:file:///space/src/r33/nb_all/openide/OpenAPIs.zip!/org/openide/text/doc-files#create-ed-kit

Of course the final component should be:

..../doc-files/api.html#create-ed-kit

And the browser pane rather than display the HTML page shows:

<html><body></body></html>

(i.e. trivial HTML tags displayed as plain text).

So it seems that the algorithm for applying a simple anchor (with no page name)
to a jar: URL is broken, as it drops the file name.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Component: Networking: File → Parser
Keywords: helpwanted
Target Milestone: --- → Future
It looks like Resolve(...) in netwerk/protocol/jar/src/nsJARURI.cpp is too
simplistic and does not handle the cases covered by Resolve(...) in
netwerk/base/src/nsStandardURL.cpp, specifically the clause checking for
relative URIs beginning with '#' which should only replace the anchor and not
the path. I know little C++ (much less the Moz. macros) or I might suggest a
patch.
Component: Parser → Networking
Mozilla1.1b, RH7.1

Not only anchors but also links like <a href="../../org/pkg/AnotherClass.html">
don't work when jar:file:///path/to/javadoc.zip!/org/pkg/Main.html

Note that although it fails to create correct URL when it is used within one
frame or without frames these links are working when <a> tag contains abtribute
target that refers to another frame.
reset to default owners.
Assignee: dougt → new-network-bugs
Summary: jar:file: URLs work, but anchors within pages drop the HTML file name → jar: anchors within pages drop the HTML file name
Does anyone have a nice jar that contains anchors to test the patch against?
Jesse?
Sure. Using Mozilla 1.2.1, click here:

jar:http://www.netbeans.org/download/dev/javadoc/InputOutputAPI.zip!/org/openide/windows/package-summary.html

Standard Javadoc output, ZIP'd up. Seems to display fine.

Try clicking "Description". Broken, should link to

jar:http://www.netbeans.org/download/dev/javadoc/InputOutputAPI.zip!/org/openide/windows/package-summary.html#package-description

but links to the nonsense

jar:http://www.netbeans.org/download/dev/javadoc/InputOutputAPI.zip!/org/openide/windows/#package-description

Now go back to the first page and click "InputOutput". Fine, displays

jar:http://www.netbeans.org/download/dev/javadoc/InputOutputAPI.zip!/org/openide/windows/InputOutput.html

Most of the internal links are broken. For example, click "NULL". Should display

jar:http://www.netbeans.org/download/dev/javadoc/InputOutputAPI.zip!/org/openide/windows/InputOutput.html#NULL

but in fact the links just takes you back to

jar:http://www.netbeans.org/download/dev/javadoc/InputOutputAPI.zip!/org/openide/windows/InputOutput.html

(at the top of the page, no anchor).

Try the links at the top of the page - "Package" ... "Tree". The first three
work as expected. They have relative URLs referring to a different file with no
"../" component, e.g. "package-summary.html" or "class-use/InputOutput.html".
But the last three do not work, just returning to the same page. These all use
relative URLs with "../" in them.

So as far as I can tell, rendering of a particular URL, including the anchor,
works fine. But the URL handler does not produce the correct absolute URL given
a relative link when passed:

- an anchor with no filename, meaning within the same page

- an up-link using some number of "../" components

It seems that *some* up-links work, e.g. in

jar:http://www.netbeans.org/download/dev/javadoc/CompilerAPI.zip!/org/openide/compiler/doc-files/api.html

the link near the top with HTML

<a href="../../cookies/CompilerCookie.html"><code>CompilerCookie</code></a>

correctly jumps to

jar:http://www.netbeans.org/download/dev/javadoc/CompilerAPI.zip!/org/openide/cookies/CompilerCookie.html

Not sure what the difference is. My guess is that when the "../" components
equal the number of directory components in the URL after "!/", so that at the
end of all the "../"s but before the down-links you are at the root of the JAR,
the handler fails.
The attached patch does not seem to work.

From

jar:http://www.netbeans.org/download/dev/javadoc/CompilerAPI.zip!/org/openide/compiler/doc-files/api.html

the link "Overview" (../../../../overview-summary.html) points to the bogus

jar:http://www.netbeans.org/download/dev/javadoc/CompilerAPI.zip!/org/openide/compiler/doc-files

For what I presume are similar reasons, the stylesheet does not work: compare to
the otherwise identical page using the http: protocol:

http://www.netbeans.org/download/dev/javadoc/CompilerAPI/org/openide/compiler/doc-files/api.html

So up-links pointing all the way up to the root are still broken with the patch
(though apparently in a slightly different way: -> .../doc-files rather than
back to the same .../doc-files/api.html).

Similarly, in

jar:http://www.netbeans.org/download/dev/javadoc/InputOutputAPI.zip!/org/openide/windows/InputOutput.html

the link "NULL" now points to the (different but also bogus)

jar:http://www.netbeans.org/download/dev/javadoc/InputOutputAPI.zip!/org/openide/windows

It is simple to check what the intended behavior should be. Just go to

http://www.netbeans.org/download/dev/javadoc/

and in one browser tab, open one of the unpacked API sets, like

http://www.netbeans.org/download/dev/javadoc/XXX/index.html

and in the other tab type in the URL

jar:http://www.netbeans.org/download/dev/javadoc/XXX.zip!/index.html

and pay attention to where links take you. Obviously they should behave
identically - they are built from the same HTML sources.
I think there was a typo in your patch: rather than

char first = path.First();

I believe you meant

char first = relativePath.First();

I don't follow C++ too well so this is just a guess, but with that replacement
at least anchors within a page appear to work. Links all the way up to the root
of the ZIP and back still do not.
Try this patch instead, based on Andreas' code but with several fixes to that:

1. Examine the first char of relativePath, not path.

2. Handle relativePath == "", in which case cannot call First() on it.

3. net_ResolveRelativePath apparently:

a. Does not handle a root of "", i.e. from "a/", "../b" gives an error.

b. Does not even appear to handle a root "/" correctly, i.e. from "/a/", "../b"
still gives an error.

c. But from "/XXX/a/", "../b" correctly gives "/XXX/b", so do that. A complete
hack and probably net_ResolveRelativePath should be corrected instead, but this
seems to work.

I tried to read

http://www.mozilla.org/projects/xpcom/string-guide.html

but was completely baffled, so I have no idea if the string handling here is
correct. Maybe it leaks memory etc.

Anyway I tried a build (using somewhat old but post-1.2 dev sources plus this
patch) and it seemed to work correctly on everything I threw at it,
jar-protocol-wise. All the link problems I enumerated in the bug report appear
to be fixed by that patch. Commented-out printf's are handy for seeing what it
is doing.
Ah ... okay path versus relativePath was a typo. And I'm now considering using
the same code to resolve relative paths as does nsStandardURL. Seems to be in
much better shape.
Attachment #115927 - Attachment is obsolete: true
Attachment #115963 - Attachment is obsolete: true
Attachment #115971 - Attachment is obsolete: true
This patch should do it. Please throw at it what you have. Unrelated to this
(with or without the patch) I see a lot of assertions when navigating the jar file:

###!!! ASSERTION: Overwriting an existing document channel!: '(loadFlags & nsICh
annel::LOAD_REPLACE) || !(mDocumentRequest.get())', file nsDocLoader.cpp, line 5
32
Break: at file nsDocLoader.cpp, line 532
************************************************************
* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [n
sIURI.hostPort]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame
 :: chrome://navigator/content/nsBrowserStatusHandler.js :: anonymous :: line 35
0"  data: no]
************************************************************
Document jar:http://www.netbeans.org/download/dev/javadoc/CompilerAPI.zip!/org/o
penide/compiler/doc-files/api.html loaded successfully


darin should look at this.
Comment on attachment 115972 [details] [diff] [review]
even better patch to net_ResolveRelativePath

andreas: don't worry too much about those assertions.  it's caused by having
two channels in the load group with the LOAD_DOCUMENT_URI load flag set.  the
jar channel just needs to be smarter about configuring the load flags for the
"inner" channel.

adding this patch to my review list... i should get to this in the next few
days.  thx andreas!
Attachment #115972 - Flags: review?(darin)
reassigning bug to andreas (bounce it back to me if you don't have time to see
this patch through).
Assignee: new-network-bugs → andreas.otte
I have time, after 03-10-03 ;-)
Comment on attachment 115972 [details] [diff] [review]
even better patch to net_ResolveRelativePath

>Index: mozilla/netwerk/base/src/nsURLHelper.cpp

>+                if (path.Length() == 0) 
>+                    return NS_ERROR_MALFORMED_URI;
>                 PRInt32 offset = path.Length() - (needsDelim ? 1 : 2);

hmm.. what if path == "/", then wouldn't offset == -1?	needsDelim == FALSE
when path.Last == '/', right?


>Index: mozilla/netwerk/protocol/jar/src/nsJARURI.cpp
>+    switch (first) {
...
>+    case '?':
>+        pos = path.RFind("?");
>+        if (pos >= 0)
>+            path.Truncate(pos);
>+        break;
>+    case '#':
>+    case '\0':
>+        pos = path.RFind("#");
>+        if (pos >= 0)
>+            path.Truncate(pos);
>+        break;

how about this instead:

      case '\0':
	  first = '#';
      case '?':
      case '#':
	  pos = path.RFindChar(first);
	  if (pos >= 0)
	      path.Truncate(pos);
	  break;   


>+    default:
>+        pos = path.RFind("/");

use RFindChar instead.
Attachment #115972 - Flags: review?(darin) → review-
>(From update of attachment 115972 [details] [diff] [review])
>>Index: mozilla/netwerk/base/src/nsURLHelper.cpp
>
>>+                if (path.Length() == 0) 
>>+                    return NS_ERROR_MALFORMED_URI;
>>                 PRInt32 offset = path.Length() - (needsDelim ? 1 : 2);
>
>hmm.. what if path == "/", then wouldn't offset == -1?	needsDelim == FALSE
>when path.Last == '/', right?

I don't see how that could happen: If path is only '/' that code can not be
reached because name is not "..". And if path gets truncated to '/' during the
process needsDelim will aready be PR_TRUE.
Status: NEW → ASSIGNED
Attached patch updated patch (obsolete) — Splinter Review
Added updated patch, also removed long overdue detection of \ as delimiter
Attachment #115972 - Attachment is obsolete: true
Attachment #116958 - Flags: review?(darin)
Attachment #116958 - Flags: review?(darin)
Attached patch corrected a little problem (obsolete) — Splinter Review
Attachment #116958 - Attachment is obsolete: true
Attachment #116959 - Flags: review?(darin)
After thinking more about this offset == -1 can happen, so I added an
additional check.
Attachment #116959 - Attachment is obsolete: true
Attachment #116989 - Flags: review?(darin)
Attachment #116959 - Flags: review?(darin)
Comment on attachment 116989 [details] [diff] [review]
added a check for offset < 0

>Index: mozilla/netwerk/base/src/nsURLHelper.cpp

>-		needsDelim = !(last == '/' || last == '\\' );
>+		needsDelim = !(last == '/');

are you sure this is the right thing to do?  is it going to cause
some URLs with '\\' to suddenly stop working?  what are the tradeoffs
here?


>           case '/':
>-          case '\\':

same question here.


>Index: mozilla/netwerk/protocol/jar/src/nsJARURI.cpp

>+    PRInt32 pos = nsnull;

  PRInt32 pos = 0;
Removing the \-handling would make this jar-resolver consistent with everything
else in necko. Remember we removed handling \ from the nsStandardUrl-Resolver a
long time ago. All that urifixup is now done in doc/webshell. As I said this is
long overdue and looking at the old code \-handling would not have worked
because while it was resolving ".." it was only looking for /.
Comment on attachment 116989 [details] [diff] [review]
added a check for offset < 0

ok, fair enough... r/sr=darin with "PRInt32 pos = 0" ;-)
Attachment #116989 - Flags: review?(darin) → review+
Already in my code, one wonders what comes up looking at code that hasn't been
touched for a while.
 
Attachment #116989 - Flags: superreview?(bzbarsky)
Comment on attachment 116989 [details] [diff] [review]
added a check for offset < 0

>                 PRInt32 offset = path.Length() - (needsDelim ? 1 : 2);
>+                // First check for errors
>+                if (path.Length() == 0 || offset < 0 ) 
>+                    return NS_ERROR_MALFORMED_URI;

This is redundant -- path.Length() == 0 implies offset < 0.  Just do the offset
check?

>+    case '\0':
>+        first = '#';

Be nice -- add a "// falling through on purpose" comment here, ok?  To be
truthful, I don't see why you can't just do:

+    char first = relativePath.Length() > 0 ? relativePath.First() : '#';

And dispense with this oddity....

sr=me with the nits picked.
Attachment #116989 - Flags: superreview?(bzbarsky) → superreview+
Thanks Boris, fixed the nits, this will go in over the weekend
Target Milestone: Future → mozilla1.4alpha
fix is in.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Appears to be working OK in the trunk, thanks!
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: