Closed Bug 333535 Opened 19 years ago Closed 19 years ago

Pango warnings about invalid UTF8 when loading page

Categories

(Core :: Layout: Text and Fonts, defect)

x86
Linux
defect
Not set
major

Tracking

()

RESOLVED FIXED

People

(Reporter: bzbarsky, Assigned: smontagu)

References

()

Details

Attachments

(3 files, 4 obsolete files)

STEPS TO REPRODUCE: 1) Build cairo-gtk2 seamonkey build (might work with Firefox, but don't have one on hand). 2) Load http://gamma.nic.fi/~tapio18/Opetus/index2.php3 ACTUAL RESULTS: (Gecko:21667): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text() (repeated a bunch of times). EXPECTED RESULTS: We shouldn't pass bogus data to pango. Or it could be that pango's check needs fixing, of course.
So as far as I can tell, we're passing a string with embedded nulls (so non-null byte followed by 3 null bytes) into pango?? Is the issue that someone assumes out m1b thing on text fragments means ASCII (when it really means ISO-8859-1 in this case)?
Flags: blocking1.9a1?
Depends on: 252950
Blocks: 334730
Flags: blocking1.9a1? → blocking1.9+
Attached file The right testcase (obsolete) —
Load the index.html
Attachment #240545 - Attachment is obsolete: true
Still load index.html
Which version of pango do you see this with? I can't reproduce it with 1.12.3
Now, we don't use |pango_layout_*|, therefore, we cannot look the warning message. But if the warning is true, we may have the bug.
Simon, this is pango 1.8.1.
> Now, we don't use |pango_layout_*| Is that a very recent change? I'm using a tree pulled at MOZ_CO_DATE="Sat Sep 23 00:37:57 CDT 2006" If this problem will be hidden but not gone if I update, please let me know so that I can avoid doing so until I debug it.
Bug 339513 was checked in on 9-25. I think that may be what changed it.
I was testing on a tree from Thu Sep 28 08:00:13 IDT 2006
Attached file Minimal testcase
Analysis coming up, as far as I can take it.
Attachment #240546 - Attachment is obsolete: true
Attachment #240548 - Attachment is obsolete: true
So when I hit that warning in Pango we have: #0 pango_layout_set_text (layout=0x8978528, text=0xbfffc118 "ä", length=3) at pango-layout.c:790 790 g_warning ("Invalid UTF8 string passed to pango_layout_set_text()"); (gdb) x/3ub text 0xbfffc118: 195 164 0 Note the null byte at position 2. This is what triggers the warning, per the docs: * Note that g_utf8_validate() returns %FALSE if @max_len is * positive and NUL is met before @max_len bytes have been read. The null byte is already in there in the gfxPangoTextRun: (gdb) frame 1 #1 0xb6ba4cb7 in gfxPangoTextRun::EnsurePangoLayout (this=0x90a1b08, aContext=0x8fe60f0) at ../../../../mozilla/gfx/thebes/src/gfxPangoFonts.cpp:836 836 pango_layout_set_text (mPangoLayout, u8str.Data(), u8str.Length()); (gdb) p mString.mLength $64 = 2 (gdb) x/2uh mString->mData 0x8fc0398: 228 0 Looking up the stack to nsTextFrame::MeasureText, we see: (gdb) frame 9 #9 0xb71ae534 in nsTextFrame::MeasureText (this=0x8d8ad94, aPresContext=0x91065e0, aReflowState=@0xbfffcac0, aTx=@0xbfffc820, aTs=@0xbfffc980, aTextData=@0xbfffc7e0) at ../../../mozilla/layout/generic/nsTextFrame.cpp:5534 5534 aReflowState.rendContext->GetTextDimensions(pWordBuf, lastWordLen, lastWordDimensions); (gdb) p lastWordLen $65 = 2 (gdb) x/2uh pWordBuf 0xbfffc840: 228 0 And examining the content node: (gdb) p mContent->GetText()->Is2b() $73 = 0 (as expected for ISO-8859-1 content) (gdb) p mContent->GetText()->GetLength() $74 = 3 (gdb) p mContent->GetText()->m1b[0] $68 = 32 ' ' (gdb) p/u mContent->GetText()->m1b[1] $70 = 228 (gdb) p/u mContent->GetText()->m1b[2] $71 = 98 So the text fragment is, as expected, ISO-8859-1, and contains three characters -- space, 'a' with umlaut, and 'b'. So somewhere between that and text transformer returning us a UTF16 buffer with an embedded null something wacky is happening... Doesn't seem to be a GFX bug, I guess? Or is there gfx-specific code in the transformer? Oh, and while we don't call pango_layout_set_text anymore we're still passing this bogus data around as far as I can tell. I'm kinda surprised it renders....
Severity: normal → major
Component: GFX: Thebes → Layout: Fonts and Text
QA Contact: thebes → layout.fonts-and-text
With this patch, I assert on the testcase in question. Someone who understands the text transformer should really look into this....
I guess this could happen if nsTextTransformer converts its buffer from ISO-8859-1 to Unicode and then assumes the Unicode buffer is a char* buffer and converts it to Unicode again.
So basically what comment 1 said?
>So somewhere between that and text transformer returning us a UTF16 buffer with >an embedded null something wacky is happening... I wonder if this has any bearing with the enigmatic bug 211264.
I'm not sure what comment 1 means. When nsTextTransformer encounters non-ASCII ISO-8859-1 it converts it to Unicode and sets its internal NS_TEXT_TRANSFORMER_TRANSFORMED_TEXT_IS_ASCII flag to false. The problem seems to be that that flag can get set back again at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/layout/generic/nsTextTransformer.cpp&rev=1.125&mark=862#833 if NS_TEXT_TRANSFORMER_LEAVE_AS_ASCII is true.
Attachment #241079 - Flags: superreview?(rbs)
Attachment #241079 - Flags: review?(rbs)
(In reply to comment #17) > I wonder if this has any bearing with the enigmatic bug 211264. That was a spin-off from bug 113779, which was about horked layout on pages with non-ASCII characters, so it sounds very similar.
Comment on attachment 241079 [details] [diff] [review] Set the LeaveAsAscii flag to false on non-ASCII What about the _B(ackward) scaning functions used when handling selection. More generally, why isn't that SetTransformedTextIsAscii(LeaveAsAscii() && !HasMultibyte())?
That sounds better, yes.
Attachment #241079 - Attachment is obsolete: true
Attachment #241096 - Flags: superreview?(rbs)
Attachment #241096 - Flags: review?(rbs)
Attachment #241079 - Flags: superreview?(rbs)
Attachment #241079 - Flags: review?(rbs)
Comment on attachment 241096 [details] [diff] [review] with !hasMultibyte() r+sr=rbs
Attachment #241096 - Flags: superreview?(rbs)
Attachment #241096 - Flags: superreview+
Attachment #241096 - Flags: review?(rbs)
Attachment #241096 - Flags: review+
Boris, can you confirm that this fixes the original problem with Pango-WARNING? I only know that it fixes the assertion in attachment 241067 [details] [diff] [review]. And do you want me to check that in also?
Yes, that patch fixes the Pango warnings over here. Thanks! I'm not sure whether adding the assertion in general is OK, esp. since I think pages can put null UTF16 bytes into the middles of text nodes (eg with entities or with DOM createTextNode). Someone more familiar than I with what the text transformer output _should_ look like should make that call.
Blocks: 211264
Assignee: nobody → smontagu
Checked in.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Backed out to test if this caused bug 355457.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Backing out had no effect on Ts/Tp, so relanded
Status: REOPENED → RESOLVED
Closed: 19 years ago19 years ago
Resolution: --- → FIXED
Flags: needinfo?(joeymunoz6974)
Flags: needinfo?(joeymunoz6974)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: