Open Bug 436006 Opened 16 years ago Updated 7 months ago

"ASSERTION: Not a UTF-8 string" with <a name="%e2g">

Tracking

()

Status:

NEW

People

(Reporter: jruderman, Unassigned)

References

(
URL
)

Details

(Keywords: assertion, testcase)

Attachments

(4 files)

testcase 16 years ago Jesse Ruderman 85 bytes, text/html		Details
stack trace 16 years ago Jesse Ruderman 6.21 KB, text/plain		Details
Testcase 15 years ago Simon Montagu :smontagu 12.03 KB, text/html; charset=iso-8859-1		Details
Attempt to express the behaviour described on whatwg.org in the testcase 15 years ago Simon Montagu :smontagu 12.61 KB, text/html; charset=iso-8859-1		Details

Jesse Ruderman

Reporter

Description

•

16 years ago

Attached file testcase — Details

Loading the testcase triggers:

###!!! ASSERTION: Not a UTF-8 string. This code should only be used for converting from known UTF-8 strings.: 'Error', file ../../../dist/include/string/nsUTF8Utils.h, line 596

###!!! ASSERTION: not a UTF8 string: 'Error', file ../../dist/include/string/nsUTF8Utils.h, line 130

###!!! ASSERTION: Input wasn't UTF8 or incorrect length was calculated: 'Error', file /Users/jruderman/central/mozilla/xpcom/string/src/nsReadableUtils.cpp, line 287

I found this bug by simply loading random web sites and looking for interesting assertion failures.

Jesse Ruderman

Reporter

Comment 1

•

16 years ago

Attached file stack trace — Details

Jesse Ruderman

Reporter

Comment 2

•

16 years ago

The stack trace blames HTMLContentSink::AddAttributes.

Boris Zbarsky [:bzbarsky]

Comment 3

•

16 years ago

Yeah, we nsUnescape @name on <html:a>.  Which could introduce all sorts of random bytes...

Flags: wanted1.9.0.x?

Samuel Sidler (old account; do not CC)

Updated

•

16 years ago

Flags: wanted1.9.1?

Flags: wanted1.9.0.x?

Flags: wanted1.9.0.x+

Josh Aas

Updated

•

16 years ago

OS: Mac OS X → All

Simon Montagu :smontagu

Comment 5

•

15 years ago

So I was going to ask why we assume that @name is in UTF-8, rather than the document encoding, and found the answer in bug 87966 comment 5: the HTML spec requires it (http://www.w3.org/TR/html4/appendix/notes.html#non-ascii-chars).

So strictly |<a name=%e2g>| is invalid, but surely the code in nsHTMLContentSink::AddAttributes should at least check for non-UTF8 @name, and maybe even try falling back to the page encoding?

What do other UAs do?

Simon Montagu :smontagu

Comment 6

•

15 years ago

Clarification: when I say "non-UTF8" I mean "non-UTF8 when unescaped".

Boris Zbarsky [:bzbarsky]

Comment 7

•

15 years ago

I believe that other UAs treat it as UTF-8, but it's worth experimenting.

Simon Montagu :smontagu

Comment 8

•

15 years ago

As far as I can tell, no other UA unescapes non-ASCII characters in name attributes, ever, in any encoding. I tested Safari 3.2.3 on Mac, and IE7, IE8, Opera9.52 and Opera9.64 on Windows.

Given that, I suggest that we should test whether the name is valid UTF-8 after unescaping, and leave it as escaped if not.

Simon Montagu :smontagu

Comment 9

•

15 years ago

The trouble with that suggestion seems to be that it breaks non-UTF-8 hrefs to non-UTF-8 anchors.

Simon Montagu :smontagu

Comment 10

•

15 years ago

Attached file Testcase — Details

This testcase incorporates Jesse's "named anchors" bookmarklet from squarefree.com to display the anchor names.

I have a patch that makes all the anchors and links to them work as expected, which I will attach after more testing (e.g. with escaped codepoints that are valid in neither UTF-8 nor the page encoding)

Assignee: nobody → smontagu

Simon Montagu :smontagu

Updated

•

15 years ago

Attachment #381731 - Attachment mime type: text/html → text/html; charset=iso-8859-1

Boris Zbarsky [:bzbarsky]

Comment 11

•

15 years ago

Simon, how would that testcase fare if we instead just implemented http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#scroll-to-fragid ?

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

Depends on: 483660

Simon Montagu :smontagu

Comment 12

•

15 years ago

If I am interpreting that whatwg reference correctly, then the other UAs mentioned in comment 8 are conforming to it and we are not, at least with respect to parsing the name attributes.

With respect to following the hrefs, IE's behaviour seems so be the most conformant: basically the links should only work (i.e. scroll to the anchors), when the link encoding is the same as the anchor encoding (ignoring NCRs as a separate can of worms for the time being).

I think that for us to implement it would mean cutting out the whole block in nsHTMLContentSink::AddAttributes that unescapes the name attribute, and modifying nsPresShell::GoToAnchor and its callers to search for id and name attributes separately with different input, and changing the fallback if UTF-8 conversion fails to use the escaped string rather than attempting to decode the unescaped string in the document charset. Does that sound right?

Those callers also assert as they stand, by the way, but we can prevent that by calling IsUTF8 before conversion instead of attempting to convert all input as UTF-8 and falling back if the conversion fails.

Simon Montagu :smontagu

Comment 13

•

15 years ago

Attached file Attempt to express the behaviour described on whatwg.org in the testcase — Details

Boris Zbarsky [:bzbarsky]

Comment 14

•

15 years ago

> Does that sound right?

Yes.  Would that work with your first testcase?

Simon Montagu :smontagu

Updated

•

15 years ago

Attachment #382008 - Attachment mime type: text/html → text/html; charset=iso-8859-1

Simon Montagu :smontagu

Comment 15

•

15 years ago

(In reply to comment #14)
> Would that work with your first testcase?

What are you asking exactly? The two testcases are the same, except for the prose describing the expected results, and except that I forgot to specify the charset in the second one until this moment ;-)

Boris Zbarsky [:bzbarsky]

Comment 16

•

15 years ago

I guess my question was really two questions:

1)  Does the HTML5 spec give the expected behavior on the first testcase?
2)  If not, is that because of a case that we think should actually work, or just
    because the testcase is exhaustive in terms of testing things that we make
    work right now?

Simon Montagu :smontagu

Comment 17

•

15 years ago

So after reading a bunch of related bugs that Boris pointed me to, I would say that in the long term we probably want to do what HTML5 says, but making any single change in that direction would be like pushing over a domino, and we can't really do it without considering where every other domino would fall.

In the short term, we might want to take a patch as described in comment 10, which makes anchors as in http://mtv3.fi (from bug 496336) not be totally broken, and is, I think, internally consistent with what we do already.

Simon Montagu :smontagu

Comment 18

•

15 years ago

That said, I'm not sure that there is a "short term" at this point -- we probably want to do the "long term" fix by the next release, and it's much too late to get the "short term" fix in for 1.9.1.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 19

•

15 years ago

Yeah, I don't think any changes here would be safe enough for 1.9.1. And since it's not flagged as a regression (is this correct) I doubt that it's worth taking on a branch.

If this is indeed a regression I think we should try to get it into 1.9.1.x though.

Samuel Sidler (old account; do not CC)

Updated

•

15 years ago

Flags: wanted1.9.1? → wanted1.9.1.x?

Simon Montagu :smontagu

Comment 20

•

15 years ago

It's not a regression: the code that causes the assertion was checked in back in 2001 in bug 87996.

Samuel Sidler (old account; do not CC)

Updated

•

15 years ago

Flags: wanted1.9.1.x?

Flags: wanted1.9.0.x+

Flags: blocking1.9.2?

Johnny Stenback (:jst)

Comment 21

•

15 years ago

Not holding 1.9.2 for this.

Flags: wanted-next+

Flags: blocking1.9.2?

Flags: blocking1.9.2-

Jesse Ruderman

Reporter

Comment 22

•

15 years ago

WFM on trunk.

Simon Montagu :smontagu

Comment 23

•

15 years ago

I still see the assertions on trunk (OS X and Linux)

Jesse Ruderman

Reporter

Comment 24

•

14 years ago

I only see the assertions if I set html5.enable to false.

Boris Zbarsky [:bzbarsky]

Comment 25

•

14 years ago

That's because the html5 parser knocked over one of the dominoes mentioned in comment 17 but not the others.  So now we don't assert, but we're not web-compatible either.

Jesse Ruderman

Reporter

Comment 26

•

14 years ago

Now I get the assertions with the default setting (html5.enable true).

Henri Sivonen (:hsivonen)

Comment 27

•

14 years ago

(In reply to comment #26)
> Now I get the assertions with the default setting (html5.enable true).

That's because I carefully copied and pasted the old bogus code to the HTML5 side in order to keep fragment refs working in the Gecko 2.0 timeframe.

BugBot [:suhaib / :marco/ :calixte]

Comment 28

•

2 years ago

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: smontagu → nobody

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Mathew Hodson

Updated

•

7 months ago