Closed
Bug 333535
Opened 19 years ago
Closed 19 years ago
Pango warnings about invalid UTF8 when loading page
Categories
(Core :: Layout: Text and Fonts, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: bzbarsky, Assigned: smontagu)
References
()
Details
Attachments
(3 files, 4 obsolete files)
|
300 bytes,
text/html
|
Details | |
|
1.66 KB,
patch
|
Details | Diff | Splinter Review | |
|
1.06 KB,
patch
|
rbs
:
review+
rbs
:
superreview+
|
Details | Diff | Splinter Review |
STEPS TO REPRODUCE:
1) Build cairo-gtk2 seamonkey build (might work with Firefox, but don't have
one on hand).
2) Load http://gamma.nic.fi/~tapio18/Opetus/index2.php3
ACTUAL RESULTS:
(Gecko:21667): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()
(repeated a bunch of times).
EXPECTED RESULTS: We shouldn't pass bogus data to pango. Or it could be that pango's check needs fixing, of course.
| Reporter | ||
Comment 1•19 years ago
|
||
So as far as I can tell, we're passing a string with embedded nulls (so non-null byte followed by 3 null bytes) into pango??
Is the issue that someone assumes out m1b thing on text fragments means ASCII (when it really means ISO-8859-1 in this case)?
Flags: blocking1.9a1?
Comment 2•19 years ago
|
||
note bug 252950.
Flags: blocking1.9a1? → blocking1.9+
| Reporter | ||
Comment 3•19 years ago
|
||
| Reporter | ||
Comment 4•19 years ago
|
||
Load the index.html
Attachment #240545 -
Attachment is obsolete: true
| Reporter | ||
Comment 5•19 years ago
|
||
Still load index.html
| Assignee | ||
Comment 6•19 years ago
|
||
Which version of pango do you see this with? I can't reproduce it with 1.12.3
Comment 7•19 years ago
|
||
Now, we don't use |pango_layout_*|, therefore, we cannot look the warning message.
But if the warning is true, we may have the bug.
| Reporter | ||
Comment 8•19 years ago
|
||
Simon, this is pango 1.8.1.
| Reporter | ||
Comment 9•19 years ago
|
||
> Now, we don't use |pango_layout_*|
Is that a very recent change? I'm using a tree pulled at MOZ_CO_DATE="Sat Sep 23 00:37:57 CDT 2006"
If this problem will be hidden but not gone if I update, please let me know so that I can avoid doing so until I debug it.
Bug 339513 was checked in on 9-25. I think that may be what changed it.
| Assignee | ||
Comment 11•19 years ago
|
||
I was testing on a tree from Thu Sep 28 08:00:13 IDT 2006
| Reporter | ||
Comment 12•19 years ago
|
||
Analysis coming up, as far as I can take it.
Attachment #240546 -
Attachment is obsolete: true
Attachment #240548 -
Attachment is obsolete: true
| Reporter | ||
Comment 13•19 years ago
|
||
So when I hit that warning in Pango we have:
#0 pango_layout_set_text (layout=0x8978528, text=0xbfffc118 "ä", length=3)
at pango-layout.c:790
790 g_warning ("Invalid UTF8 string passed to pango_layout_set_text()");
(gdb) x/3ub text
0xbfffc118: 195 164 0
Note the null byte at position 2. This is what triggers the warning, per the docs:
* Note that g_utf8_validate() returns %FALSE if @max_len is
* positive and NUL is met before @max_len bytes have been read.
The null byte is already in there in the gfxPangoTextRun:
(gdb) frame 1
#1 0xb6ba4cb7 in gfxPangoTextRun::EnsurePangoLayout (this=0x90a1b08, aContext=0x8fe60f0)
at ../../../../mozilla/gfx/thebes/src/gfxPangoFonts.cpp:836
836 pango_layout_set_text (mPangoLayout, u8str.Data(), u8str.Length());
(gdb) p mString.mLength
$64 = 2
(gdb) x/2uh mString->mData
0x8fc0398: 228 0
Looking up the stack to nsTextFrame::MeasureText, we see:
(gdb) frame 9
#9 0xb71ae534 in nsTextFrame::MeasureText (this=0x8d8ad94, aPresContext=0x91065e0,
aReflowState=@0xbfffcac0, aTx=@0xbfffc820, aTs=@0xbfffc980, aTextData=@0xbfffc7e0)
at ../../../mozilla/layout/generic/nsTextFrame.cpp:5534
5534 aReflowState.rendContext->GetTextDimensions(pWordBuf, lastWordLen, lastWordDimensions);
(gdb) p lastWordLen
$65 = 2
(gdb) x/2uh pWordBuf
0xbfffc840: 228 0
And examining the content node:
(gdb) p mContent->GetText()->Is2b()
$73 = 0 (as expected for ISO-8859-1 content)
(gdb) p mContent->GetText()->GetLength()
$74 = 3
(gdb) p mContent->GetText()->m1b[0]
$68 = 32 ' '
(gdb) p/u mContent->GetText()->m1b[1]
$70 = 228
(gdb) p/u mContent->GetText()->m1b[2]
$71 = 98
So the text fragment is, as expected, ISO-8859-1, and contains three characters -- space, 'a' with umlaut, and 'b'.
So somewhere between that and text transformer returning us a UTF16 buffer with an embedded null something wacky is happening... Doesn't seem to be a GFX bug, I guess? Or is there gfx-specific code in the transformer?
Oh, and while we don't call pango_layout_set_text anymore we're still passing this bogus data around as far as I can tell. I'm kinda surprised it renders....
Severity: normal → major
Component: GFX: Thebes → Layout: Fonts and Text
QA Contact: thebes → layout.fonts-and-text
| Reporter | ||
Comment 14•19 years ago
|
||
With this patch, I assert on the testcase in question.
Someone who understands the text transformer should really look into this....
| Assignee | ||
Comment 15•19 years ago
|
||
I guess this could happen if nsTextTransformer converts its buffer from ISO-8859-1 to Unicode and then assumes the Unicode buffer is a char* buffer and converts it to Unicode again.
| Reporter | ||
Comment 16•19 years ago
|
||
So basically what comment 1 said?
Comment 17•19 years ago
|
||
>So somewhere between that and text transformer returning us a UTF16 buffer with
>an embedded null something wacky is happening...
I wonder if this has any bearing with the enigmatic bug 211264.
| Assignee | ||
Comment 18•19 years ago
|
||
I'm not sure what comment 1 means. When nsTextTransformer encounters non-ASCII ISO-8859-1 it converts it to Unicode and sets its internal NS_TEXT_TRANSFORMER_TRANSFORMED_TEXT_IS_ASCII flag to false. The problem seems to be that that flag can get set back again at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/layout/generic/nsTextTransformer.cpp&rev=1.125&mark=862#833 if NS_TEXT_TRANSFORMER_LEAVE_AS_ASCII is true.
| Assignee | ||
Comment 19•19 years ago
|
||
Attachment #241079 -
Flags: superreview?(rbs)
Attachment #241079 -
Flags: review?(rbs)
| Assignee | ||
Comment 20•19 years ago
|
||
(In reply to comment #17)
> I wonder if this has any bearing with the enigmatic bug 211264.
That was a spin-off from bug 113779, which was about horked layout on pages with non-ASCII characters, so it sounds very similar.
Comment 21•19 years ago
|
||
Comment on attachment 241079 [details] [diff] [review]
Set the LeaveAsAscii flag to false on non-ASCII
What about the _B(ackward) scaning functions used when handling selection.
More generally, why isn't that SetTransformedTextIsAscii(LeaveAsAscii() && !HasMultibyte())?
| Assignee | ||
Comment 22•19 years ago
|
||
That sounds better, yes.
Attachment #241079 -
Attachment is obsolete: true
Attachment #241096 -
Flags: superreview?(rbs)
Attachment #241096 -
Flags: review?(rbs)
Attachment #241079 -
Flags: superreview?(rbs)
Attachment #241079 -
Flags: review?(rbs)
Comment 23•19 years ago
|
||
Comment on attachment 241096 [details] [diff] [review]
with !hasMultibyte()
r+sr=rbs
Attachment #241096 -
Flags: superreview?(rbs)
Attachment #241096 -
Flags: superreview+
Attachment #241096 -
Flags: review?(rbs)
Attachment #241096 -
Flags: review+
| Assignee | ||
Comment 24•19 years ago
|
||
Boris, can you confirm that this fixes the original problem with Pango-WARNING? I only know that it fixes the assertion in attachment 241067 [details] [diff] [review]. And do you want me to check that in also?
| Reporter | ||
Comment 25•19 years ago
|
||
Yes, that patch fixes the Pango warnings over here. Thanks!
I'm not sure whether adding the assertion in general is OK, esp. since I think pages can put null UTF16 bytes into the middles of text nodes (eg with entities or with DOM createTextNode). Someone more familiar than I with what the text transformer output _should_ look like should make that call.
| Assignee | ||
Updated•19 years ago
|
Assignee: nobody → smontagu
| Assignee | ||
Comment 26•19 years ago
|
||
Checked in.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
| Assignee | ||
Comment 27•19 years ago
|
||
Backed out to test if this caused bug 355457.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
| Assignee | ||
Comment 28•19 years ago
|
||
Backing out had no effect on Ts/Tp, so relanded
Status: REOPENED → RESOLVED
Closed: 19 years ago → 19 years ago
Resolution: --- → FIXED
Updated•3 years ago
|
Flags: needinfo?(joeymunoz6974)
Updated•3 years ago
|
Flags: needinfo?(joeymunoz6974)
You need to log in
before you can comment on or make changes to this bug.
Description
•