jar: anchors within pages drop the HTML file name

RESOLVED FIXED in mozilla1.4alpha



16 years ago
15 years ago


(Reporter: Jesse Glick, Assigned: Andreas Otte)




Firefox Tracking Flags

(Not tracked)




(2 attachments, 6 obsolete attachments)



16 years ago
Build 2001112012 (0.9.6 release), RHL 6.1, Classic skin

Just for fun, I was browsing Javadoc HTML files packed into a ZIP file, using
the syntax:


This works for the most part. You can see the HTML pages and click between them.
(The URL above should let you download similar ZIP files, e.g. OpenAPIs.zip. Let
me know if you have trouble reproducing and need more exact instructions.)

However anchors in these pages seem not to work. Internal hyperlinks within
Javadoc-generated pages (for example ClassName.html) seem to be missing; I am
not sure what is wrong there, it may not be related, so don't worry too much.

The main thing is that hand-written HTML pages (in the OpenAPIs.zip example, try
org/openide/text/doc-files/api.html) which have internal links specified like this:

<a href="#someanchor">...</a>
<a name="someanchor">...</a>

do not work. When browsed as file: URLs (unpacked HTML files), it is fine;
clicking such a link scrolls the page to the anchor. However when wrapped in the
jar: URL, the location toolbar shows e.g.:


Of course the final component should be:


And the browser pane rather than display the HTML page shows:


(i.e. trivial HTML tags displayed as plain text).

So it seems that the algorithm for applying a simple anchor (with no page name)
to a jar: URL is broken, as it drops the file name.
Ever confirmed: true


16 years ago
Component: Networking: File → Parser
Keywords: helpwanted
Target Milestone: --- → Future

Comment 1

16 years ago
It looks like Resolve(...) in netwerk/protocol/jar/src/nsJARURI.cpp is too
simplistic and does not handle the cases covered by Resolve(...) in
netwerk/base/src/nsStandardURL.cpp, specifically the clause checking for
relative URIs beginning with '#' which should only replace the anchor and not
the path. I know little C++ (much less the Moz. macros) or I might suggest a
Component: Parser → Networking

Comment 2

16 years ago
Mozilla1.1b, RH7.1

Not only anchors but also links like <a href="../../org/pkg/AnotherClass.html">
don't work when jar:file:///path/to/javadoc.zip!/org/pkg/Main.html

Note that although it fails to create correct URL when it is used within one
frame or without frames these links are working when <a> tag contains abtribute
target that refers to another frame.

Comment 3

16 years ago
reset to default owners.
Assignee: dougt → new-network-bugs
Summary: jar:file: URLs work, but anchors within pages drop the HTML file name → jar: anchors within pages drop the HTML file name

Comment 4

15 years ago
Created attachment 115927 [details] [diff] [review]
patch to hopefully fix the anchor problem 

Does anyone have a nice jar that contains anchors to test the patch against?

Comment 5

15 years ago
Sure. Using Mozilla 1.2.1, click here:


Standard Javadoc output, ZIP'd up. Seems to display fine.

Try clicking "Description". Broken, should link to


but links to the nonsense


Now go back to the first page and click "InputOutput". Fine, displays


Most of the internal links are broken. For example, click "NULL". Should display


but in fact the links just takes you back to


(at the top of the page, no anchor).

Try the links at the top of the page - "Package" ... "Tree". The first three
work as expected. They have relative URLs referring to a different file with no
"../" component, e.g. "package-summary.html" or "class-use/InputOutput.html".
But the last three do not work, just returning to the same page. These all use
relative URLs with "../" in them.

So as far as I can tell, rendering of a particular URL, including the anchor,
works fine. But the URL handler does not produce the correct absolute URL given
a relative link when passed:

- an anchor with no filename, meaning within the same page

- an up-link using some number of "../" components

It seems that *some* up-links work, e.g. in


the link near the top with HTML

<a href="../../cookies/CompilerCookie.html"><code>CompilerCookie</code></a>

correctly jumps to


Not sure what the difference is. My guess is that when the "../" components
equal the number of directory components in the URL after "!/", so that at the
end of all the "../"s but before the down-links you are at the root of the JAR,
the handler fails.

Comment 6

15 years ago
The attached patch does not seem to work.



the link "Overview" (../../../../overview-summary.html) points to the bogus


For what I presume are similar reasons, the stylesheet does not work: compare to
the otherwise identical page using the http: protocol:


So up-links pointing all the way up to the root are still broken with the patch
(though apparently in a slightly different way: -> .../doc-files rather than
back to the same .../doc-files/api.html).

Similarly, in


the link "NULL" now points to the (different but also bogus)


It is simple to check what the intended behavior should be. Just go to


and in one browser tab, open one of the unpacked API sets, like


and in the other tab type in the URL


and pay attention to where links take you. Obviously they should behave
identically - they are built from the same HTML sources.

Comment 7

15 years ago
I think there was a typo in your patch: rather than

char first = path.First();

I believe you meant

char first = relativePath.First();

I don't follow C++ too well so this is just a guess, but with that replacement
at least anchors within a page appear to work. Links all the way up to the root
of the ZIP and back still do not.

Comment 8

15 years ago
Created attachment 115963 [details] [diff] [review]
Revised patch that seems to work - definitely needs review

Try this patch instead, based on Andreas' code but with several fixes to that:

1. Examine the first char of relativePath, not path.

2. Handle relativePath == "", in which case cannot call First() on it.

3. net_ResolveRelativePath apparently:

a. Does not handle a root of "", i.e. from "a/", "../b" gives an error.

b. Does not even appear to handle a root "/" correctly, i.e. from "/a/", "../b"
still gives an error.

c. But from "/XXX/a/", "../b" correctly gives "/XXX/b", so do that. A complete
hack and probably net_ResolveRelativePath should be corrected instead, but this
seems to work.

I tried to read


but was completely baffled, so I have no idea if the string handling here is
correct. Maybe it leaks memory etc.

Anyway I tried a build (using somewhat old but post-1.2 dev sources plus this
patch) and it seemed to work correctly on everything I threw at it,
jar-protocol-wise. All the link problems I enumerated in the bug report appear
to be fixed by that patch. Commented-out printf's are handy for seeing what it
is doing.

Comment 9

15 years ago
Ah ... okay path versus relativePath was a typo. And I'm now considering using
the same code to resolve relative paths as does nsStandardURL. Seems to be in
much better shape.

Comment 10

15 years ago
Created attachment 115971 [details] [diff] [review]
updated patch with a fix to net_ResolveRelativePath
Attachment #115927 - Attachment is obsolete: true

Comment 11

15 years ago
Created attachment 115972 [details] [diff] [review]
even better patch to net_ResolveRelativePath
Attachment #115963 - Attachment is obsolete: true
Attachment #115971 - Attachment is obsolete: true

Comment 12

15 years ago
This patch should do it. Please throw at it what you have. Unrelated to this
(with or without the patch) I see a lot of assertions when navigating the jar file:

###!!! ASSERTION: Overwriting an existing document channel!: '(loadFlags & nsICh
annel::LOAD_REPLACE) || !(mDocumentRequest.get())', file nsDocLoader.cpp, line 5
Break: at file nsDocLoader.cpp, line 532
* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [n
sIURI.hostPort]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame
 :: chrome://navigator/content/nsBrowserStatusHandler.js :: anonymous :: line 35
0"  data: no]
Document jar:http://www.netbeans.org/download/dev/javadoc/CompilerAPI.zip!/org/o
penide/compiler/doc-files/api.html loaded successfully

darin should look at this.

Comment 13

15 years ago
Comment on attachment 115972 [details] [diff] [review]
even better patch to net_ResolveRelativePath

andreas: don't worry too much about those assertions.  it's caused by having
two channels in the load group with the LOAD_DOCUMENT_URI load flag set.  the
jar channel just needs to be smarter about configuring the load flags for the
"inner" channel.

adding this patch to my review list... i should get to this in the next few
days.  thx andreas!
Attachment #115972 - Flags: review?(darin)

Comment 14

15 years ago
reassigning bug to andreas (bounce it back to me if you don't have time to see
this patch through).
Assignee: new-network-bugs → andreas.otte

Comment 15

15 years ago
I have time, after 03-10-03 ;-)

Comment 16

15 years ago
Comment on attachment 115972 [details] [diff] [review]
even better patch to net_ResolveRelativePath

>Index: mozilla/netwerk/base/src/nsURLHelper.cpp

>+                if (path.Length() == 0) 
>+                    return NS_ERROR_MALFORMED_URI;
>                 PRInt32 offset = path.Length() - (needsDelim ? 1 : 2);

hmm.. what if path == "/", then wouldn't offset == -1?	needsDelim == FALSE
when path.Last == '/', right?

>Index: mozilla/netwerk/protocol/jar/src/nsJARURI.cpp
>+    switch (first) {
>+    case '?':
>+        pos = path.RFind("?");
>+        if (pos >= 0)
>+            path.Truncate(pos);
>+        break;
>+    case '#':
>+    case '\0':
>+        pos = path.RFind("#");
>+        if (pos >= 0)
>+            path.Truncate(pos);
>+        break;

how about this instead:

      case '\0':
	  first = '#';
      case '?':
      case '#':
	  pos = path.RFindChar(first);
	  if (pos >= 0)

>+    default:
>+        pos = path.RFind("/");

use RFindChar instead.
Attachment #115972 - Flags: review?(darin) → review-

Comment 17

15 years ago
>(From update of attachment 115972 [details] [diff] [review])
>>Index: mozilla/netwerk/base/src/nsURLHelper.cpp
>>+                if (path.Length() == 0) 
>>+                    return NS_ERROR_MALFORMED_URI;
>>                 PRInt32 offset = path.Length() - (needsDelim ? 1 : 2);
>hmm.. what if path == "/", then wouldn't offset == -1?	needsDelim == FALSE
>when path.Last == '/', right?

I don't see how that could happen: If path is only '/' that code can not be
reached because name is not "..". And if path gets truncated to '/' during the
process needsDelim will aready be PR_TRUE.

Comment 18

15 years ago
Created attachment 116958 [details] [diff] [review]
updated patch

Added updated patch, also removed long overdue detection of \ as delimiter
Attachment #115972 - Attachment is obsolete: true


15 years ago
Attachment #116958 - Flags: review?(darin)


15 years ago
Attachment #116958 - Flags: review?(darin)

Comment 19

15 years ago
Created attachment 116959 [details] [diff] [review]
corrected a little problem
Attachment #116958 - Attachment is obsolete: true


15 years ago
Attachment #116959 - Flags: review?(darin)

Comment 20

15 years ago
Created attachment 116989 [details] [diff] [review]
added a check for offset < 0

After thinking more about this offset == -1 can happen, so I added an
additional check.
Attachment #116959 - Attachment is obsolete: true


15 years ago
Attachment #116989 - Flags: review?(darin)


15 years ago
Attachment #116959 - Flags: review?(darin)

Comment 21

15 years ago
Comment on attachment 116989 [details] [diff] [review]
added a check for offset < 0

>Index: mozilla/netwerk/base/src/nsURLHelper.cpp

>-		needsDelim = !(last == '/' || last == '\\' );
>+		needsDelim = !(last == '/');

are you sure this is the right thing to do?  is it going to cause
some URLs with '\\' to suddenly stop working?  what are the tradeoffs

>           case '/':
>-          case '\\':

same question here.

>Index: mozilla/netwerk/protocol/jar/src/nsJARURI.cpp

>+    PRInt32 pos = nsnull;

  PRInt32 pos = 0;

Comment 22

15 years ago
Removing the \-handling would make this jar-resolver consistent with everything
else in necko. Remember we removed handling \ from the nsStandardUrl-Resolver a
long time ago. All that urifixup is now done in doc/webshell. As I said this is
long overdue and looking at the old code \-handling would not have worked
because while it was resolving ".." it was only looking for /.

Comment 23

15 years ago
Comment on attachment 116989 [details] [diff] [review]
added a check for offset < 0

ok, fair enough... r/sr=darin with "PRInt32 pos = 0" ;-)
Attachment #116989 - Flags: review?(darin) → review+

Comment 24

15 years ago
Already in my code, one wonders what comes up looking at code that hasn't been
touched for a while.


15 years ago
Attachment #116989 - Flags: superreview?(bzbarsky)
Comment on attachment 116989 [details] [diff] [review]
added a check for offset < 0

>                 PRInt32 offset = path.Length() - (needsDelim ? 1 : 2);
>+                // First check for errors
>+                if (path.Length() == 0 || offset < 0 ) 
>+                    return NS_ERROR_MALFORMED_URI;

This is redundant -- path.Length() == 0 implies offset < 0.  Just do the offset

>+    case '\0':
>+        first = '#';

Be nice -- add a "// falling through on purpose" comment here, ok?  To be
truthful, I don't see why you can't just do:

+    char first = relativePath.Length() > 0 ? relativePath.First() : '#';

And dispense with this oddity....

sr=me with the nits picked.
Attachment #116989 - Flags: superreview?(bzbarsky) → superreview+

Comment 26

15 years ago
Created attachment 117229 [details] [diff] [review]
final patch for reference

Thanks Boris, fixed the nits, this will go in over the weekend


15 years ago
Target Milestone: Future → mozilla1.4alpha

Comment 27

15 years ago
fix is in.
Last Resolved: 15 years ago
Resolution: --- → FIXED

Comment 28

15 years ago
Appears to be working OK in the trunk, thanks!
You need to log in before you can comment on or make changes to this bug.