Open Bug 1180623 Opened 9 years ago Updated 2 years ago

Stop choking at badly formatted application/xhtml+xml documents

Categories

(Core :: DOM: HTML Parser, defect)

41 Branch
defect

People

(Reporter: julienw, Unassigned)

Attachments

(3 files)

We're now in a world where HTML documents are parsed in a permissive way. The XML-for-web era never took off and is now basically dead.

Now is a good time to stop displaying an error when encountering a badly formatted application/xhtml+xml document, and instead parse it with the HTML5 parser.

This is especially encountered on GMail's basic mobile interface, which is served to Firefox OS, and as a result it seriously impairs the system. I've seen this when trying to display an attachment and when loading the search interface.

How to reproduce:
1. Set your user agent to something mobile (e.g. the Firefox OS user agent: Mozilla/5.0 (Mobile; rv:41.0) Gecko/41.0 Firefox/41.0).
2. Load gmail.com and log in.
3. Tap the 'search' (magnifying glass) button.
OS: Unspecified → All
Hardware: Unspecified → All
Version: unspecified → 41 Branch
Attached image Error on gmail
Of course there is an issue at GMail's end too. But I think this is a good example of why we should relax our behavior here.
Attached file error_gmail.xhtml
Copy of the XHTML file triggering the issue.
I reported the issue for GMail at https://webcompat.com/issues/1347.
Bug 1180625 is probably a more workable approach.
Opera, back in the Presto days, decided to re-parse as HTML when encountering this kind of error. https://dev.opera.com/blog/no-more-xml-parsing-failed-errors/
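To illustrate the re-parse idea, here is a minimal sketch in Python. It is not how Gecko or Presto implemented anything; it only shows the control flow of "try strict XML, fall back to a permissive HTML parser on a well-formedness error". The names `parse_with_fallback` and `TagCollector` are made up for this example, and Python's `html.parser` stands in for a real HTML5 parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Permissive HTML parser that records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_with_fallback(markup):
    """Try strict XML first; on a well-formedness error, re-parse as HTML.

    Returns a (mode, tag_names) pair, where mode is "xml" or "html".
    """
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # The strict parser gave up: fall back to the permissive one
        # instead of showing an error page.
        collector = TagCollector()
        collector.feed(markup)
        collector.close()
        return "html", collector.tags
    # Strip the "{namespace}" prefix ElementTree adds to qualified names.
    return "xml", [el.tag.rsplit("}", 1)[-1] for el in root.iter()]
```

A well-formed document goes through the XML path, while a document with, say, an unclosed `<br>` ends up on the HTML path rather than failing outright.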
(In reply to :Ms2ger from comment #4)
> Bug 1180625 is probably a more workable approach.

But not really a short-term approach.
>> Bug 1180625 is probably a more workable approach.
>
> But not really a short-term approach.

There is an implementation (in Rust) at https://github.com/Ygg01/xml5ever
That looks like Bug 1036987
On the other hand, content designed for iOS and Android doesn't typically use XHTML either; only one site is encountering this issue.
I wonder if it is possible to reparse as HTML but disable all JavaScript and other active content.
(In reply to Yuhong Bao from comment #10)
> I wonder if it is possible to reparse as HTML but disable all JavaScript and
> other active content.

Why would this be useful ?

(In reply to Yuhong Bao from comment #9)
> On the other hand, content designed for iOS and Android doesn't typically use
> XHTML either; only one site is encountering this issue.

Albeit a very widely used website.
(In reply to Julien Wajsberg [:julienw] from comment #11)
> (In reply to Yuhong Bao from comment #10)
> > I wonder if it is possible to reparse as HTML but disable all JavaScript and
> > other active content.
> 
> Why would this be useful ?
XSS, and scripts may not expect that the content is being parsed as HTML.
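As a rough sketch of what "reparse as HTML with active content disabled" could mean, the Python below re-serializes a document while dropping `script` elements and `on*` event-handler attributes. This is an assumption about the intent of the comment, not a described Gecko mechanism, and it is far from a complete sanitizer (for example, it does not touch `javascript:` URLs). The class and function names are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class InertHTML(HTMLParser):
    """Re-serializes HTML without scripts or inline event handlers."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True  # swallow everything up to </script>
            return
        # Drop inline event handlers such as onclick or onload.
        kept = [(k, v) for k, v in attrs if not k.startswith("on")]
        rendered = "".join(
            ' {}="{}"'.format(k, escape(v or "", quote=True)) for k, v in kept
        )
        self.out.append("<{}{}>".format(tag, rendered))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags: emit only the start tag; skip <script/>.
        if tag != "script":
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
            return
        self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if not self.in_script:
            # Character references were decoded on input; re-escape them.
            self.out.append(escape(data, quote=False))


def make_inert(markup):
    parser = InertHTML()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

The page would then render as static content only, which addresses the XSS concern at the cost of leaving script-driven pages non-functional.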
In the failing website (basic mobile gmail, see bug 1036987), the issue comes from the fact that the page uses a script element without CDATA blocks, and that script uses the "<" character to do a comparison.

Wondering if we could simply infer CDATA blocks for scripts? Is there a usage for not using CDATA for such cases?
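The failure mode can be reproduced with any strict XML parser. The sketch below uses Python's `xml.etree.ElementTree`; the script bodies are hypothetical stand-ins, not the actual GMail source, and `is_well_formed` is a helper invented for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A bare "<" inside a script element is read as the start of a tag.
BARE = (
    '<html xmlns="%s"><head><script>'
    "if (a < b) { go(); }"
    "</script></head></html>" % XHTML_NS
)

# The same script wrapped in a CDATA section (commented out for JS).
WITH_CDATA = (
    '<html xmlns="%s"><head><script>//<![CDATA[\n'
    "if (a < b) { go(); }\n"
    "//]]></script></head></html>" % XHTML_NS
)


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


def script_text(markup):
    """Return the text content of the first script element."""
    root = ET.fromstring(markup)
    return root.find(".//{%s}script" % XHTML_NS).text
```

Here `is_well_formed(BARE)` is false while `is_well_formed(WITH_CDATA)` is true, and the CDATA version still exposes the comparison unchanged to the script engine.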
(In reply to Julien Wajsberg [:julienw] from comment #13)
> Wondering if we could simply infer CDATA blocks for scripts? Is there a
> usage for not using CDATA for such cases?

How would that work, exactly?

If you start tweaking the parser’s behavior, you should specify it to give other browsers a chance to interoperate without reverse-engineering. This is what XML5 does.
Chromium tries to display something but the page is also non-functional.

(I forced the Firefox OS UA)
(In reply to Simon Sapin (:SimonSapin) from comment #14)
> (In reply to Julien Wajsberg [:julienw] from comment #13)
> > Wondering if we could simply infer CDATA blocks for scripts? Is there a
> > usage for not using CDATA for such cases?
> 
> How would that work, exactly?

I was more thinking out loud.

In the current case, the issue is with the script part; that's why I thought we could do something here.

But I can see the site here is really broken. It likely works only on very permissive UAs. I don't think it is really our role to fix it, at least not by tweaking the parser.

> 
> If you start tweaking the parser’s behavior, you should specify it to give
> other browsers a chance to interoperate without reverse-engineering. This is
> what XML5 does.

Agreed.
Webcompat Priority: --- → ?
Webcompat Priority: ? → ---
Severity: normal → S3