nsTextFrame::MeasureText()'s fast text measuring codepath crashes on RISC machines (unaligned memory access)

RESOLVED FIXED

Status

()

defect
--
critical
RESOLVED FIXED
17 years ago
10 years ago

People

(Reporter: roland.mainz, Assigned: Mitch)

Tracking

({crash, perf})

Trunk
All
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [whitebox])

Attachments

(5 attachments, 5 obsolete attachments)

[follow-up of bug 36146 ("GetWidth optimizations need to be implemented on Unix
(text measurement performance)")]
nsTextFrame::MeasureText()'s fast text measuring codepath (enabled via
|NS_RENDERING_HINT_FAST_MEASURE|) crashes on RISC machines which require natural
alignment of datatypes (like SPARC).

Currently the fast text measuring code is disabled on all platforms except x86
for this reason.

Steps to reproduce:
1. % export MOZILLA_GFX_ENABLE_FAST_MEASURE=1 # this will enable
|NS_RENDERING_HINT_FAST_MEASURE| (see
http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsRenderingContextGTK.cpp#225)
even on non-x86 platforms
2. ./mozilla # start Mozilla

Stack trace looks like this (see
http://bugzilla.mozilla.org/show_bug.cgi?id=36146#c44 for the full version):
-- snip --
Reading libimgpng.so
t@1 (l@1) signal BUS (invalid address alignment) in TransformTextToUnicode at
line 4529 in file "nsTextFrame.cpp"
 4529       *cp2-- = PRUnichar(*cp1--);
(dbx) print cp2 cp1
dbx: syntax error on "cp1"
Use the `help' command for more information.
(dbx) print cp2, cp1
cp2 = 0xffbe46b7
cp1 = 0xffbe46b7 "on"
(dbx) print aText, aNumChars
aText = 0xffbe46b7 "on"
aNumChars = 1
(dbx) where
current thread: t@1
=>[1] TransformTextToUnicode(aText = 0xffbe46b7 "on", aNumChars = 1), line 4529
in "nsTextFrame.cpp"
  [2] nsTextFrame::MeasureText(this = 0x7ef8f0, aPresContext = 0x6ccdd8,
aReflowState = STRUCT, aTx = CLASS, aLb = 0x738458, aTs = STRUCT, aTextData =
STRUCT), line 5036 in "nsTextFrame.cpp"
  [3] nsTextFrame::Reflow(this = 0x7ef8f0, aPresContext = 0x6ccdd8, aMetrics =
STRUCT, aReflowState = STRUCT, aStatus = 0), line 5341 in "nsTextFrame.cpp"
  [4] nsLineLayout::ReflowFrame(this = 0xffbe5448, aFrame = 0x7ef8f0,
aReflowStatus = 0, aMetrics = (nil), aPushedFrame = 0), line 1081 in
"nsLineLayout.cpp"
  [5] nsInlineFrame::ReflowInlineFrame(this = 0x7ef884, aPresContext =
0x6ccdd8, aReflowState = STRUCT, irs = STRUCT, aFrame = 0x7ef8f0, aStatus = 0),
line 713 in "nsInlineFrame.cpp"
  [6] nsInlineFrame::ReflowFrames(this = 0x7ef884, aPresContext = 0x6ccdd8,
aReflowState = STRUCT, irs = STRUCT, aMetrics = STRUCT, aStatus = 0), line 523
in "nsInlineFrame.cpp"
etc. etc.
-- snip --
Depends on: 36146
Keywords: perf
Priority: -- → P2
Target Milestone: --- → Future
Whiteboard: [whitebox]
.
Assignee: attinasi → font
Component: Layout → Layout: Fonts and Text
Priority: P2 → --
QA Contact: cpetersen0953 → ian
Target Milestone: Future → ---
Priority: -- → P3
Target Milestone: --- → Future
*** Bug 224337 has been marked as a duplicate of this bug. ***
I fixed something like this in the mail codebase regarding misalignments
a while agi.

I'll build a debug version of mozilla and submit a patch in a couple of
days unless soemone beats me to it.

*** This bug has been marked as a duplicate of 224337 ***
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → DUPLICATE
oops, I marked the wrong bug.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
I'm a bit confused with this bug. Why does the 'Summary' mention RISC machines
and the 'OS' is tagged Linux ??

I've just submitted a patch in bug 224337 that may fix this as well.
Can someone check and confirm please ?
Greg has pointed out that Linux runs on more than just x86, so the Summary
statement is valid. Confusion cleared.
Mitch:
This bug is about the "fast text measuing codepath" which is disabled on
Solaris/SPARC (the fast text measuring codepath was initially added for
Win32/GDI where it is very expensive to use GDI for text measuring, however for
X11 it isn't that important since the X client has all the font metrics after
XLoadQueryFont() (e.g. without the need to issue expensive system calls)).

You can _enable_ it on RISC machines using an environment variable - which
triggers the crash described in this bug.

Just run Mozilla like this (sh/ksh/bash):
% (MOZILLA_GFX_ENABLE_FAST_MEASURE=1 ./mozilla)

The crash will occur in the gfx/fontmetrics layer
(gfx/src/gtk/nsFontMetricsGTK.cpp, gfx/src/xlib/nsFontMetricsXlib.cpp) but AFAIK
the issue sits in nsTextFrame.cpp (rbs and smontagu are more or less the experts
in this area (see CC:))
Ok, it looks like it crashes with or without my "fix" with the environment 
variable set to MOZILLA_GFX_ENABLE_FAST_MEASURE=1

I don't know why then there is a depencency on this bug from bug 224337.
Maybe someone should break the relationship ?

I can have a look at this too if you like ?
Mitch wrote:
> Ok, it looks like it crashes with or without my "fix" with the environment 
> variable set to MOZILLA_GFX_ENABLE_FAST_MEASURE=1
>
> I don't know why then there is a depencency on this bug from bug 224337.
> Maybe someone should break the relationship ?

AFAIK bug 224337 isn't related at all (except that it is another crash due
violation of the alignment requirements).

> I can have a look at this too if you like ?

Sure...
Assignee: font → Mitch.DSouza
Status: REOPENED → NEW
This is easy to figure out too but i need an explanation from someone.
Looking at the registers at the faulting instruction.                 
                                                     
(dbx) regs                                           
current thread: t@1
current frame:  [1]
g0-g3    0x00000000 0x00000b0c 0x00000030 0x00003c00
g4-g7    0x00000000 0xfea05e7c 0x00000000 0xff320000
o0-o3    0xffbf8a20 0x0091a8d0 0x00000000 0xffbf8334
o4-o7    0xffbf84ac 0x00000005 0xffbf8318 0xfd21f564
l0-l3    0xffbf8a37 0x00000079 0xffbf8a29 0xfd128b08
l4-l7    0x009be208 0xffbf84ac 0x00000005 0xfdf711b8
i0-i3    0xffbf8a29 0x00000008 0x009799e4 0x00000000                
i4-i7    0xfc3faa80 0x00000005 0xffbf8398 0xfd220a48
y        0x00000000                                                   
ccr      0x00000000                              
pc       0xfd21f610:TransformTextToUnicode+0xb0 sth     %l1, [%l0]
npc      0xfd21f614:TransformTextToUnicode+0xb4 ba      TransformTextToUnicode+0x58
(dbx)                                                                              
     
We are storing %l1 into the address of %l0 =  0xffbf8a29 which is
clearly a misaligned address. Then i asked myself and ask the readers
to explain why infact we storing a half word (16bits) into that address.
Looking at the function responsible:                                    
                                    
                                                           
// Transforms characters in place from ascii to Unicode
static void                                            
TransformTextToUnicode(char* aText, PRInt32 aNumChars)
{                                                     
  // Go backwards over the characters and convert them.
  unsigned char*  cp1 = (unsigned char*)aText + aNumChars - 1;
  PRUnichar*      cp2 = (PRUnichar*)aText + (aNumChars - 1);  
                                                            
  while (aNumChars-- > 0) {                                 
    // XXX: If you crash here then you may see the issue described
    // in http://bugzilla.mozilla.org/show_bug.cgi?id=36146#c44
    *cp2-- = PRUnichar(*cp1--);
  }                                                  
}

Keep in mind that sizeof(PRUnichar) = 2, so everytime we execute

        *cp2-- = PRUnichar(*cp1--);

we decrement cp2 by 2 and cp1 by 1. How does this ever work ?
If according to the comment we transform charaters in place, 
surely the declaration of PRUnichar should also be a unsigned char ?

Fixing this however uncovers even more bugs. Before we progress there,
though i'd like someone to comment on the above ?


             
comment 3 already noted that that's where the problem is.

PRUnichar is a 16-bit character type.  Accept that.

The problem is that the code in question needs to be redesigned not to use
unaligned access.  (That said, that part of the code could probably use a much
larger redesign at some point, so I'm not convinced this is worth fixing.  Was
bug 36146 a significant performance gain?)
Fair enough. Let's close this then. 
I'm not convinced that it isn't, either, since the fix might not be that hard.
Ok, i'll try a stab at a fix tomorrow.
> Keep in mind that sizeof(PRUnichar) = 2, so everytime we execute
> 
>        *cp2-- = PRUnichar(*cp1--);
> 
> we decrement cp2 by 2 and cp1 by 1. How does this ever work ?

The decrements have nothing to do with any bug, other than the overall bad
design dbaron mentioned.  The misalignment bug occurs at the initialization of cp2.

/be
Posted patch Proposed diffSplinter Review
Yes i realise that the decrement has nothing to do with the crash per-se,
but wondered how the whole thing worked since at first glance it'd seem  
that cp2 would fall off the begining of the array, but i later realised
that aNumChars is smaller than the size of aText and so i don't believe this
would ever happen.							    
							    
Anyhow the attached diff changes the store from a 'sth' to a 'stub' so
we only do byte assignments. This adds an extra store in the loop, but
i'm not sure there's an easier way to do it. As mentioned in comment 13
though, with this change in place, we uncover more crashes due to misaligned
access elsewhere in the codepath. Roland, David is it worth fixing this
and then plodding thru all misaligned crashes one by one ?
							    
P.s. I wouldn't know a unicode character if it hit me in the face, so
someone needs to check that the proposed fix actually works on other
platforms.
Keywords: crash
Hi,

This old bus (bus error in TransformTextToUnicode() due to unaligned memory access) has recently started to affect Debian's firefox (around version 1.5.0.1) builds on sparc, see http://bugs.debian.org/362170. I'm not sure why we haven't seen it before, but at the moment firefox is totally unusable on sparc, as it dies with bus error immediately after launch. Any ideas?

Best regards,

Jurij Smakov
Debian Sparc port maintainer
We're seeing this issue on HP-UX 11.23/IA and IRIX 6.5.
Comment on attachment 137234 [details] [diff] [review]
Proposed diff

dbaron: since the builds aren't working, can we introduce this crummy code until someone gets around to rewriting it?
Attachment #137234 - Flags: superreview?(dbaron)
Attachment #137234 - Flags: review?(dbaron)
(In reply to comment #21)
> We're seeing this issue on HP-UX 11.23/IA and IRIX 6.5.

BTW, this is with 1.5.0.3.

Tru64 UNIX 5.1 exhibits this error as well. When the error occurs, the following occurs:
Unaligned access pid=483396 <firefox-bin> va=0x11fff6e8d pc=0x3000c58177c ra=0x3000c58176c inst=0xa0a30000

This error occurs in places other than that fixed by attachment #137234 [details] [diff] [review] (comment #13 already mentions this). For example, after applying the patch on IRIX 6.5, a SIGBUS occurs later. A backtrace gives:
>  0 g_utf16_to_utf8(0x7ffbcfcd, 0x8, 0x0, 0x0, 0x0, 0x50, 0x50, 0x49d3f48) ["/opt/build/glib-2.6.5/glib/gutf8.c":1065, 0x4994c24]
   1 nsFontMetricsPango::GetTextDimensions(const unsigned short*,unsigned int,nsTextDimensions&,int*,nsRenderingContextGTK*)(0x0, 0x54cf5d8, 0x1090bfc0, 0x0, 0x0, 0x50, 0x50, 0x49d3f48) ["/opt/build/mozilla/gfx/src/gtk/nsFontMetricsPango.cpp":538, 0x54a2c0c]
   2 nsRenderingContextGTK::GetTextDimensions(const unsigned short*,unsigned int,nsTextDimensions&,int*)(0x7ffbc9e8, 0x8, 0x7ffbcfcd, 0x10914ac8, 0x0, 0x50, 0x50, 0x49d3f48) ["/opt/build/mozilla/gfx/src/gtk/nsRenderingContextGTK.cpp":1269, 0x5491d90]
   3 nsTextFrame::MeasureText(nsPresContext*,const nsHTMLReflowState&,nsTextTransformer&,nsILineBreaker*,nsTextFrame::TextStyle&,nsTextFrame::TextReflowData&)(0x10930a90, 0x7ffbc570, 0x7b513e0, 0x7b5c104, 0x7ffbcfcc, 0x50000, 0x0, 0x0) ["/opt/build/mozilla/layout/generic/nsTextFrame.cpp":5700, 0x5c7ebb4]
   4 nsTextFrame::Reflow(nsPresContext*,nsHTMLReflowMetrics&,const nsHTMLReflowState&,unsigned int&)(0x2f, 0x2a, 0x21, 0x1e, 0x7ffbca08, 0x0, 0x47, 0x67f16b8) ["/opt/build/mozilla/layout/generic/nsTextFrame.cpp":5987, 0x5c79ccc]
   ...
I don't think it's going to be thateasy. I've hit three more bus errors after I've patched the original one:

That one is the same as mentioned in comment #23:

#0  0xf755fc24 in g_utf16_to_utf8 () from /usr/lib/libglib-2.0.so.0
#1  0x001ee870 in nsFontMetricsPango::GetTextDimensions (this=0x132c588, aString=0xff
44f785, aLength=8, aDimensions=@0xff44f650, aFontID=0x0, aContext=0xe85680)
    at nsFontMetricsPango.cpp:539
#2  0x001e4aa8 in nsRenderingContextGTK::GetTextDimensions (this=0xe85680, aString=0x
ff44f785, aLength=8, aDimensions=@0xff44f650, aFontID=0x0)
    at nsRenderingContextGTK.cpp:1270
#3  0x002eaa70 in nsTextFrame::MeasureText (this=0x133a9f0, aPresContext=0x12d02f8, a
ReflowState=@0xff44fa40, aTx=<value optimized out>, aLb=0x123f018, 
    aTs=@0xff44f940, aTextData=@0xff44f980) at nsTextFrame.cpp:5697

After I patch it up, I get:

#0  0x002e1724 in RevertSpacesToNBSP (aBuffer=0xffeff785, aWordLen=<value optimized o
ut>) at nsTextFrame.cpp:6207
#1  0x002e9660 in nsTextFrame::ComputeTotalWordDimensions (this=0x1339250, aPresConte
xt=0x1241c70, aLineBreaker=0x12a4270, aLineLayout=@0xffefffdc, 
    aReflowState=@0xffeffa40, aNextFrame=0x133928c, aBaseDimensions=@0xffeff650, aWor
dBuf=0xffeff785, aWordLen=8, aWordBufSize=86, aCanBreakBefore=1)
    at nsTextFrame.cpp:6227
#2  0x002ea804 in nsTextFrame::MeasureText (this=0x1339250, aPresContext=0x1241c70, a
ReflowState=@0xffeffa40, aTx=<value optimized out>, aLb=0x12a4270, 
    aTs=@0xffeff940, aTextData=@0xffeff980) at nsTextFrame.cpp:5710

After this one is fixed, it dies here:

#0  0x000a51dc in nsJISx4051LineBreaker::Next (this=0x1237980, aText=0xff289765, aLen
=9, aPos=0, oNext=0xff28925c, oNeedMoreText=0xff289258)
    at nsJISx4501LineBreaker.cpp:478
#1  0x002e7bdc in nsTextFrame::ComputeWordFragmentDimensions (this=0x133acb0, aPresCo
ntext=<value optimized out>, aLineBreaker=0x1237980, aLineLayout=@0xff289fbc, 
    aReflowState=@0xff289a20, aNextFrame=0x133acec, aContent=0x126f868, aText=0x126f8
68, aMoreSize=0xff289330, aWordBuf=0xff289765, aRunningWordLen=@0xff2893a4, 
    aWordBufSize=86, aCanBreakBefore=1) at nsTextFrame.cpp:6385
#2  0x002e9834 in nsTextFrame::ComputeTotalWordDimensions (this=0x133acb0, aPresConte
xt=0x12d1698, aLineBreaker=0x1237980, aLineLayout=@0xff289fbc, 
    aReflowState=@0xff289a20, aNextFrame=0x133acec, aBaseDimensions=@0xff289630, aWor
dBuf=0xff289765, aWordLen=8, aWordBufSize=86, aCanBreakBefore=1)
    at nsTextFrame.cpp:6259
#3  0x002ea84c in nsTextFrame::MeasureText (this=0x133acb0, aPresContext=0x12d1698, a
ReflowState=@0xff289a20, aTx=<value optimized out>, aLb=0x1237980, 
    aTs=@0xff289920, aTextData=@0xff289960) at nsTextFrame.cpp:5710

That's where I pretty much ran out of space :-). Clearly, the same unaligned array keeps causing crashes in various places. I think there should be some global solution for this problem, otherwise this incremental fixing can go on forever.
(In reply to comment #24)
> I don't think it's going to be thateasy. I've hit three more bus errors after
> I've patched the original one:

Can you post your patches?
This patch "fixes" two more bus errors in nsTextFrame.cpp.
This crash actually happens in a Glib function, so the proper solution would be to bug them to fix their stuff... Anyway, another memcpy() saves the day :-)
I've came up with a preliminary patch which fixes these problems for me (by introducing new GetUnichar() and SetUnichar() functions which handle unaligned accesses correctly for the architectures which require it). It is available at http://www.wooyd.org/debian/firefox/firefox-bus-error.patch

I'll ask debian sparc people to test the (presumably) fixed binaries.
*** Bug 329087 has been marked as a duplicate of this bug. ***
I now had two positive reports (besides my own) from people who tested the patched up binaries, so I'm posting the patch here. Most probably the #if in functions GetUnichar() and SetUnichar() will need to be extended to include all architectures which might encounter the alignment problems.
Attachment #222217 - Attachment is obsolete: true
Attachment #222218 - Attachment is obsolete: true
Comment on attachment 222992 [details] [diff] [review]
Complete fix for alignment problems

Thanks for working on this.

>diff -aur a/gfx/src/gtk/nsFontMetricsPango.cpp b/gfx/src/gtk/nsFontMetricsPango.cpp
>@@ -535,8 +535,13 @@
>     PangoLayout *layout = pango_layout_new(mPangoContext);
I suspect this can fail, but that'll be someone else's problem (probably a coverity report).

>-    gchar *text = g_utf16_to_utf8(aString, aLength,
>+    // Just copy the aString to ensure the alignment,
>+    // it is not used anywhere else.
this can fail:
>+    PRUnichar* dummy = new PRUnichar[aLength];
and we don't usually use new for PRUnichar, it gives no benefit (other than more fun ways to die, either NS_Alloc (?) or malloc)

you'll crash here if you don't check.
>+    memcpy(dummy, aString, aLength*sizeof(PRUnichar));
>+    gchar *text = g_utf16_to_utf8(dummy, aLength,
>                                   NULL, NULL, NULL);
but given that you're doing this all locally, should you perhaps try to use a stack alloc function instead?
>+    delete [] dummy;

>diff -aur a/intl/lwbrk/src/nsJISx4501LineBreaker.cpp b/intl/lwbrk/src/nsJISx4501LineBreaker.cpp
>@@ -487,13 +487,13 @@
>   PRUint32 cur;
>   for (cur = aPos; cur < aLen; ++cur)
>   {
please cache this GetUnichar result:
>-    if (IS_SPACE(aText[cur]))
>+    if (IS_SPACE(GetUnichar(&aText[cur])))
>     {
>       *oNext = cur;
>       *oNeedMoreText = PR_FALSE;
>       return NS_OK;
>     }
so that you don't have to recompute it here:
>-    if (IS_CJK_CHAR(aText[cur]))
>+    if (IS_CJK_CHAR(GetUnichar(&aText[cur])))
>       goto ROUTE_CJK_NEXT;
>   }
>   *oNext = aLen;
>@@ -503,13 +503,13 @@
> ROUTE_CJK_NEXT:
>   PRInt8 c1, c2;
>   cur = aPos;

again, please cache.
>-  if(NEED_CONTEXTUAL_ANALYSIS(aText[cur]))
>+  if(NEED_CONTEXTUAL_ANALYSIS(GetUnichar(&aText[cur])))
>   {
>-    c1 = this->ContextualAnalysis((cur>0)?aText[cur-1]:0,
>-                                  aText[cur],
>-                                  (cur<(aLen-1)) ?aText[cur+1]:0);
>+    c1 = this->ContextualAnalysis((cur>0)?GetUnichar(&aText[cur-1]):0,
to save for here:
>+                                  GetUnichar(&aText[cur]),
>+                                  (cur<(aLen-1)) ?GetUnichar(&aText[cur+1]):0);
>   } else  {
to save for here:
>-    c1 = this->GetClass(aText[cur]);
>+    c1 = this->GetClass(GetUnichar(&aText[cur]));
>   }
>   
>   if(CLASS_THAI == c1) 
>@@ -521,13 +521,13 @@
> 
>   for(cur++; cur <aLen; cur++)
>   {
>-     if(NEED_CONTEXTUAL_ANALYSIS(aText[cur]))
>+     if(NEED_CONTEXTUAL_ANALYSIS(GetUnichar(&aText[cur])))
especially note that macros are allowed to result in multiple calls to arguments, which can be really inefficient.

>diff -aur a/intl/unicharutil/util/nsUnicharUtils.cpp b/intl/unicharutil/util/nsUnicharUtils.cpp
>@@ -340,3 +340,28 @@
> 
>+PRUnichar
>+GetUnichar(const void *ptr)
>+{
>+    PRUnichar result;
please use a single #ifdef SOMETHING_THAT_MEANS_WE_NEED_TO_DO_THIS (it can be as short as you like as long as it's meaningful), define/undef it at the top of the file. that allows people to add their favorite hardware to the list in a single place

isn't mips is the classic example?
>+#if defined(__sparc__) || defined(__alpha__)
>+    *((char *) &result) = *((char *) ptr);
>+    *((char *) &result + 1) = *((char *) ptr + 1);


>+SetUnichar(void *ptr, PRUnichar aChar)
>+{

instead of having to change it in a number of places.
>+#if defined(__sparc__) || defined(__alpha__)

>diff -aur a/layout/generic/nsTextFrame.cpp b/layout/generic/nsTextFrame.cpp
>@@ -5101,8 +5101,8 @@
>   
>   while (aNumChars-- > 0) {
>     // XXX: If you crash here then you may see the issue described
>-    // in http://bugzilla.mozilla.org/show_bug.cgi?id=36146#c44
>-    *cp2-- = PRUnichar(*cp1--);
>+    // in http://bugzilla.mozilla.org/show_bug.cgi?id=161826
I know the old code used a c++ style cast, but we almost never (afaik, and i'm not going to check using mxr-test) use c++ casts for PRUnichar (or other basic types).
>+    SetUnichar(cp2--, PRUnichar(*cp1--));
>   }
> }
>  
>@@ -6199,9 +6199,9 @@
> {
>   PRUnichar* end = aBuffer + aWordLen;
>   for (; aBuffer < end; aBuffer++) {
>-    PRUnichar ch = *aBuffer;
>+    PRUnichar ch = GetUnichar(aBuffer);
>     if (ch == ' ') {
>-      *aBuffer = CH_NBSP;
>+      SetUnichar(aBuffer, CH_NBSP);
>     }
>   }
> }
this last section is a reason to use -p in your diff command. without more context or -p, i have no clue what function you're patching.

Lastly, please if at all possible switch to using cvs to generate diffs, it enables patchviewer in bugzilla to do magical things as well as allowing many other tools to work, and lets people figure out how your changes fit in a world where the file may have changed (yes, i know that intl files don't change often, but nsTextFrame is in layout, and layout does like to change, sometimes fairly dramatically).
(In reply to comment #31)
Thanks a lot for all the comments! I'll try to fix the problems you pointed out. Since I've never hacked on firefox before, here are some followup questions.

> >+    PRUnichar* dummy = new PRUnichar[aLength];
> and we don't usually use new for PRUnichar, it gives no benefit (other than
> more fun ways to die, either NS_Alloc (?) or malloc)

Is there a way to make sure that NS_Alloc will allocate memory aligned on the 2-byte boundary in this case? I'm pretty sure that using the 'new' allocation will do the right thing.

[..]
> but given that you're doing this all locally, should you perhaps try to use a
> stack alloc function instead?

What function would that be? No clue here :-)

[..]
> Lastly, please if at all possible switch to using cvs to generate diffs, it
> enables patchviewer in bugzilla to do magical things as well as allowing many
> other tools to work, and lets people figure out how your changes fit in a world
> where the file may have changed (yes, i know that intl files don't change
> often, but nsTextFrame is in layout, and layout does like to change, sometimes
> fairly dramatically).

Which cvs tree should I use?

Perhaps those questions are too basic to keep in bugzilla, I would not mind if you would reply to me privately.
(In reply to comment #32)
> Is there a way to make sure that NS_Alloc will allocate memory aligned on the
> 2-byte boundary in this case? I'm pretty sure that using the 'new' allocation
> will do the right thing.

malloc (and thus NS_Alloc) generally aligns memory to the largest alignment requirement or preference (i.e., for faster performance) the system has.  We rely in many places on it being 4-byte aligned (i.e., use the lowest 2 bits for other things), and it's almost always at least 8-byte aligned.
the alternative function i referenced is alloca, http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_CRT__alloca.asp but that's probably only a good idea if we can show that the size that we'd be allocating will always safely fit in the stack. and i'm not sure we can show that.

smontagu? :)
Here's a rework of the patch based on the comments. Unfortunately, could not test the resulting binary, since the build of CVS checkout fails (some missing symbols, AFAICT it does not link to libpangoxft as it should). Here's the .mozconfig I used:

. $topsrcdir/browser/config/mozconfig
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/ff-opt-static
ac_add_options --enable-optimize
ac_add_options --disable-debug
ac_add_options --enable-static
ac_add_options --disable-shared
ac_add_options --disable-tests
ac_add_options --enable-pango
mk_add_options MOZ_CO_PROJECT=browser
Attachment #222992 - Attachment is obsolete: true
(In reply to comment #35)
> Created an attachment (id=223651) [edit]
> Fix for alignment problems, take 2
> 
> Here's a rework of the patch based on the comments.

This fixes the problem on Tru64 UNIX, IRIX, and HP-UX 11.23/IA. Oddly, Solaris 10/SPARC works without the patch. However, Solaris 8 and 9/SPARC have problems. I'm rebuilding with the patch on Solaris 2.6-10/SPARC now.

BTW, why not force this patch on all platforms? I don't see how it would break anything.
BTW, another possible solution is to specify single-byte alignment flags to the compiler:
  HP-UX 11.23/IA:
      +unum          Allow pointers to access non-natively aligned data.
                     This option alters the way that the compiler accesses
                     dereferenced data.  Use of this option may reduce the
                     efficiency of generated code.  (See also #pragma pack
                     in the HP aC++ Online Programmer's Guide for a more
                     efficient method.)
                        1    Assume single byte alignment.  Dereferences are
                             performed with a series of single-byte loads
                             and stores.
                        2    Dereferences are performed with a series of
                             two-byte loads and stores.
                        4    Dereferences are performed with a series of
                             four-byte loads and stores.

  Solaris/SPARC:
     -misalign (SPARC platform) Permits misaligned data, which
               would otherwise generate an error, in memory.

  Tru64 UNIX 5.1:
  -[no]misalign
       Use -misalign to specify that code generated for indirect load
       instructions and store instructions not generate alignment faults for
       arbitrarily aligned addresses.  Using this flag increases the size of
       generated code and might have a negative impact on performance. The
       default condition, implied by -nomisalign is for code generated for
       indirect load instructions and store instructions to generate align-
       ment faults for arbitrarily aligned addresses.

I cannot easily find a similar flag for IRIX 6.5 though.
I changed malloc and free to PR_Malloc and PR_Free, and added necessary casts and moved gchar *text initialization to avoid compilation problems with gcc 4.x.

Note that GetUnichar and SetUnichar might benefit from being defined as inline in the .h file.
Attachment #223651 - Attachment is obsolete: true
Attachment #224243 - Flags: review?
Attachment #224243 - Flags: review? → review?(dbaron)
What ever happened to the idea of trying to use aligned memory instead of working around the fact that it isn't?  (Given the workaround approach, what makes you think this limited set of patches is sufficient?)
(In reply to comment #39)
> Given the workaround approach, what makes you think this limited set of patches is sufficient?)

The fact that firefox does work now on sparc ?
Work while doing what?  Have you, say, run the CSS1 test suite?  Have you tested Chinese, Japanese, Arabic, Hebrew, Thai and Indic scripts?  Have you tested multiple graphics backends?  All of those could easily trigger additional codepaths that would need fixing.
Attachment #224243 - Flags: review?(dbaron) → review-
Note that I'm not actually suggesting that you do all those things; I think this is the wrong approach.
(In reply to comment #41)
> Work while doing what?  Have you, say, run the CSS1 test suite?  Have you
> tested Chinese, Japanese, Arabic, Hebrew, Thai and Indic scripts?  Have you
> tested multiple graphics backends?  All of those could easily trigger
> additional codepaths that would need fixing.

I have looked into doing a "proper" solution, and gave up. The buffer(s) which are accessed here can hold either ASCII or Unichar data, and, depending on circumstances, the pointers are cast either into char* or unichar*. Occasionally, one type of data is converted into another in place. To disentangle this mess one would need to have a clear understanding of how it all currently works and a lot of free time to rework it in a sane way :-). In the meantime, having a browser which can at least make it as far as its own welcome page, will be definitely appreciated by the users.  
*** Bug 356608 has been marked as a duplicate of this bug. ***
Depends on: 333659
No longer depends on: 351242
I suspect this was fixed on the trunk by bug 343445.  If not, somebody should figure out the remaining **sources** of misalignment so we can fix them.
Comment on attachment 137234 [details] [diff] [review]
Proposed diff

This is a tiny subset of attachment 224243 [details] [diff] [review].  review- for the same reasons attachment 224243 [details] [diff] [review] is (comment 39 / comment 41), plus that it's insufficient given all the stuff that's in attachment 224243 [details] [diff] [review].

I'd be willing to take the workaround on a branch if the real problem is fixed on the trunk, though.
Attachment #137234 - Flags: superreview?(dbaron)
Attachment #137234 - Flags: superreview-
Attachment #137234 - Flags: review?(dbaron)
Attachment #137234 - Flags: review-
I believe I'm seeing this problem on ia64 in version 2.0.0.1.  I get this message when running firefox:

firefox-bin(27853): unaligned access to 0x60000fffff51243f, ip=0x4000000000874720

Then when using gdb, it points me at nsTextFrame::MeasureText():

(gdb) list *0x4000000000874720
0x4000000000874720 is in nsTextFrame::MeasureText(nsPresContext*, nsHTMLReflowState const&, nsTextTransformer&, nsILineBreaker*, nsTextFrame::TextStyle&, nsTextFrame::TextReflowData&) (/data/f
irefox/mozilla/layout/generic/nsTextFrame.cpp:5133).
5128      PRUnichar*      cp2 = (PRUnichar*)aText + (aNumChars - 1);
5129
5130      while (aNumChars-- > 0) {
5131        // XXX: If you crash here then you may see the issue described
5132        // in http://bugzilla.mozilla.org/show_bug.cgi?id=36146#c44
5133        *cp2-- = PRUnichar(*cp1--);
5134      }
5135    }
5136
5137    PRUint32


There are multiple other unaligned access messages that I'll be creating new bugs for soon.
(In reply to comment #47)
> There are multiple other unaligned access messages that I'll be creating new
> bugs for soon.

If they're about access to the same unaligned objects, please don't.
David,

Actually it looks like comment #24 lists the location of the 'unaligned access' messages I've seen so far.

However, before I got your message, I already filed one of them as bug #366366.

Bryan
Duplicate of this bug: 366366
Duplicate of this bug: 377394
This bug is rearing it's ugly head again, but in a slightly different form.
With firefox-2.0.0.8 the backtrace on sparc is given below at the end of
this comment entry.

We do that in-place conversion to unicode, the string pointer is
unaligned, then we pass that unaligned unicode string pointer down
into the Pango font metrics code, which in turn passes the unaligned
unicode pointer into g_utf16_to_utf8 which does the unaligned accesses.

The answer to these bugs is not to continue papering around it by adding
more special accessors all over the place.  If you're passing around
pointers to a foo object, the object should be aligned properly.

This in-place unicode conversion code path has been biting people for
more than 5 years.  Please I would really appreciate it if someone would
fix this bug properly.

Program received signal SIGBUS, Bus error.
[Switching to Thread -143315168 (LWP 9543)]
0xf718345c in g_utf16_to_utf8 () from /usr/lib/libglib-2.0.so.0
(gdb) bt
#0  0xf718345c in g_utf16_to_utf8 () from /usr/lib/libglib-2.0.so.0
#1  0xf501f8b8 in nsFontMetricsPango::GetTextDimensions (this=0x9e1548, aString=0xfffb39af, aLength=10, aDimensions=@
0xfffb38b8, 
    aFontID=0x0, aContext=0xa0e710) at nsFontMetricsPango.cpp:543
#2  0xf5013a10 in nsRenderingContextGTK::GetTextDimensionsInternal (this=0xa0e710, aString=0xfffb39af, aLength=10, 
    aDimensions=@0xfffb38b8, aFontID=0x0) at nsRenderingContextGTK.cpp:1257
#3  0xf502658c in nsRenderingContextImpl::GetTextDimensions (this=0xa0e710, aString=0xfffb39af, aLength=10, 
    aDimensions=@0xfffb38b8, aFontID=0x0) at nsRenderingContextImpl.cpp:636
#4  0xf52c89d4 in nsTextFrame::MeasureText (this=0xa2219c, aPresContext=0x6ebb78, aReflowState=@0xfffb3cc0, 
    aTx=<value optimized out>, aLb=0x9e17d8, aTs=@0xfffb3bb8, aTextData=@0xfffb3bf8) at nsTextFrame.cpp:5724
I suspect this is fixed now (the right way) on the trunk (Firefox 3.0 alphas/betas, etc.), but somebody should probably test.
Thanks for the info David.

I think I know a way to fix this without much fuss in 2.0.0.x and
I'll attach my patch after my build finishes and I can confirm that
my fix works.
The idea is that we bump the pointer up by a char if it
is unaligned, and return that fixed up pointer we used
to the caller.  I am pretty sure this works within the
constraints of these word buffer objects.

Probably, this makes most if not all of the
GetUnichar/SetUnichar business unnecessary.
Posted patch alignsparc.patch (obsolete) — Splinter Review
This works for me on sparc
Posted patch align.patchSplinter Review
Based on CVS, it doesn't give any warning on ia64 either.
Attachment #294901 - Attachment is obsolete: true
Attachment 286275 [details] [diff] looks like the right approach (but does the patch ensure that the buffer is big enough?).

I think comment 39, comment 41, and comment 42 apply to attachment 294965 [details] [diff] [review].
I don't like the ad-hoc definition of NEED_STRICT_ALIGNMENT, isn't that readily available from autoconf?
armin76, think you could address dbaron's comment?
Priority: P3 → --
QA Contact: ian → layout.fonts-and-text
Target Milestone: Future → ---
(In reply to comment #60)
> armin76, think you could address dbaron's comment?

Nope, i just updated the patch, i have no clue what it does, sorry :)
From comment 53, sounds like this is fixed on all active branches.
Status: NEW → RESOLVED
Closed: 16 years ago10 years ago
Resolution: --- → FIXED
Summary: nsTextFrame::MeasureText()'s fast text measuring codepath crashes on RISC machines → nsTextFrame::MeasureText()'s fast text measuring codepath crashes on RISC machines (unaligned memory access)
You need to log in before you can comment on or make changes to this bug.