Closed Bug 515401 Opened 10 years ago Closed 10 years ago

Content sink should handle <base> tags in an HTML5-compliant way

Categories

(Core :: HTML: Parser, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla2.0

People

(Reporter: justin.lebar+bug, Assigned: sicking)

References

Details

(Keywords: html5, student-project, Whiteboard: [good first bug])

Attachments

(4 files)

Our content sink currently handles <base> tags as follows: The document's base URI is resolved from the last <base> tag before the <body>.  <base>s which appear after the <body> change the base of elements which appear after them.

HTML5 [1] specifies that all but the first <base> tag are ignored.

[1] http://www.whatwg.org/specs/web-apps/current-work/#the-base-element
> <base>s which appear after the <body> change the base of elements which appear
> after them.

I'm pretty sure this is required for web compat.  If the HTML5 spec doesn't say anything about it, it might need changing.

What does IE do here?
IE does what HTML5 defines (they changed their behavior in IE7).
Attached file Testcase
Firefox resolves links relative to the <base> which appears above them, but IE8 does not.

It seems to me that IE8's behavior isn't exactly as HTML5 specifies.  HTML5 says to use the first <base>, but in this testcase, IE8 ignores all the <base>s.  Based upon my reading of the spec, the conformant thing to do here would be to resolve all links relative to the first <base>, even though it appears in the middle of the document.
> IE does what HTML5 defines (they changed their behavior in IE7).

In quirks mode too?
I get the same behavior in my testcase when I turn on compatibility mode in IE8. I'll work on getting VMs with IE6 and IE7, but they might take a while to build.
What happens on your testcase if you remove the doctype?
Same behavior in the testcase without the doctype in IE8, both in and out of compatibility mode.
Hmm.  Ok, in that case I'd be only too happy to rip out all the complexity we have around base-in-body.

As far as the HTML5 behavior, should the spec be changed for <base> not in head?
I can verity comment #2: IE6 handles <base> in <body> like we do, but IE7 behaves like IE8 and ignores <base> in <body>.

re comment #8: The suggestion is to change the spec so <base> after <body> is ignored?  I'd be OK with that.
FWIW, webkit appears to take the last <base> regardless of where it appears and apply it to the whole document.  (In my testcase, Chrome resolves all the links relative to http://mozilla.org, the href of the last base.)
> The suggestion is to change the spec so <base> after <body> is ignored? 

It's what IE does, right?
(In reply to comment #11)
> > The suggestion is to change the spec so <base> after <body> is ignored? 
> 
> It's what IE does, right?
AFAICT, yes, that's what IE7 and IE8 do.
Regarding multiple <base> tags: I talked a lot with hixie about this when he wrote that part of the HTML5 spec. He ran some pretty extensive testing which showed that very very few pages broke if we remove the support for multiple <base> tags that we have now.

I don't know if he ran any tests on pages with a single <base>, but where that base appeared in the <body>. However if IE doesn't honor that then I definitely agree we should nuke it.
Keywords: html5
Duplicate of this bug: 522658
Keywords: student-project
Whiteboard: [good first bug]
Fixing this will obsolete AddBaseTagInfo(), right? (I noticed the HTML5 parser never calls it.)
Yes.  And it would also allow simplifying nsGenericHTMLElement::GetBaseURI and the like.
Patch coming which cleans up our implementation of <base> in several ways:

* Get rid of support for multiple <base> tags. IE has dropped it and Hixie ran
  some pretty thorough tests showing it was essentially not relied upon
* Make us follow the HTML5 spec and only honor <base> elements that are a
  direct child of <head>
* Get rid of the code in the various content sinks to call
  nsIDocument::SetBaseURI and nsIDocument::SetBaseTarget
* Make the element implementation itself call nsIDocument::SetBaseURI as needed
* Make nsIDocument::GetBaseTarget traverse the DOM as needed to see if a base
  target has been set. This should be a rare operation so perf shouldn't matter
* Get rid of CheckLoadURI check from nsIDocument::SetBaseURI as there really
  should be no reason to keep it and it's not a good extra layer of security
  anyway.
Assignee: nobody → jonas
Comment on attachment 440099 [details] [diff] [review]
Patch to fix

Patch clearly does what was promised in comment #17.
Attachment #440099 - Flags: review?(bnewman) → review+
Checked in.

http://hg.mozilla.org/mozilla-central/rev/4ac0e4ef6e6b
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Looks like the test case here fails if the HTML5 parser is enabled.
The changes to the HTML5 parser and to the XML content sink look similar. sicking, did you test this patch with XML content sink?
It's a test case bug.
Attachment #442048 - Flags: review?(Olli.Pettay)
The HTML5 parser doesn't move <base> to <head>, so content/html/content/test/file_bug209275_3.html became bogus when this bug was fixed.
Attachment #442051 - Flags: review?(Olli.Pettay)
Attachment #442048 - Flags: review?(Olli.Pettay) → review+
Attachment #442051 - Flags: review?(Olli.Pettay) → review+
I also pushed http://hg.mozilla.org/mozilla-central/rev/060ddfe5aa48 as an orange fix but didn't investigate further why that change helped. (nextElementSibling was undefined.)
> (nextElementSibling was undefined.)

Presumably because head.firstChild was now a textnode?
Depends on: 600809
Duplicate of this bug: 609323
Duplicate of this bug: 609323
Duplicate of this bug: 553074
Target Milestone: --- → mozilla2.0
You need to log in before you can comment on or make changes to this bug.