Bug 790139 (CVE-2012-4190) - crash in FT2FontEntry::CreateFontEntry when visiting any site in CM10
Status: VERIFIED FIXED
Whiteboard: [native-crash][startupcrash]
Keywords: crash, reproducible, sec-critical, sec-vector, topcrash
Product: Core
Classification: Components
Component: Graphics
Version: Trunk
Platform: ARM Android
Importance: -- critical with 3 votes
Target Milestone: ---
Assigned To: Brad Lassey [:blassey] (use needinfo?)
QA Contact: Kevin Brosnan [:kbrosnan]
Mentors:
Duplicates: 756140 790528 791781 792260 795570 795710
Depends on:
Blocks:
Reported: 2012-09-10 21:56 PDT by Marco Zehe (:MarcoZ) on PTO until August 15
Modified: 2013-01-10 12:35 PST


Attachments
logcat log file when crashing Firefox (779 bytes, text/plain)
2012-09-12 23:53 PDT, Daniel Lombraña González
no flags
logcat file when crashing Firefox in reader mode CM10 (394.02 KB, text/plain)
2012-09-14 12:17 PDT, Daniel Lombraña González
no flags
stack (4.04 KB, text/plain)
2012-09-20 05:52 PDT, Ted Mielczarek [:ted.mielczarek]
no flags
patch (717 bytes, patch)
2012-09-27 13:55 PDT, Brad Lassey [:blassey] (use needinfo?)
blassey.bugs: review-
logging patch (4.24 KB, patch)
2012-10-01 09:02 PDT, Brad Lassey [:blassey] (use needinfo?)
no flags
Valgrind trace -- invalid writes (32.09 KB, text/plain)
2012-10-02 10:24 PDT, Julian Seward [:jseward]
no flags
patch to force us to use in tree setlcdfilter (1.37 KB, text/plain)
2012-10-05 11:09 PDT, Brad Lassey [:blassey] (use needinfo?)
no flags
patch to force us to use in tree setlcdfilter (2.84 KB, patch)
2012-10-05 11:19 PDT, Brad Lassey [:blassey] (use needinfo?)
jfkthame: review+
akeybl: approval-mozilla-aurora+
akeybl: approval-mozilla-beta-
akeybl: approval-mozilla-release+

Description Marco Zehe (:MarcoZ) on PTO until August 15 2012-09-10 21:56:02 PDT
This bug was filed from the Socorro interface and is 
report bp-07fb607e-9105-4fad-9ecc-1534e2120911 .
============================================================= 

According to community member Peter Vagner, this happens on http://nightly.mozilla.org, and often when he turns on Reader Mode. Having AdBlock installed doesn't seem to have any bearing. I have been unable to reproduce it so far, but will keep trying.
Comment 1 Peter Vágner 2012-09-11 00:30:34 PDT
Hello,
I have some more details:
I am currently using a Samsung Galaxy S II GT-I9100 with CM-10 20120908, but I update the ROM quite frequently.
Originally I assumed the crashes might be related to the accessibility support. After learning that Marco is unable to reproduce it, I started to believe this might have something to do with my device/software combination. So I tried loading this bug's URL in the Nightly running on the device, intending to comment on the issue directly from there. At first the page loaded fine and I was able to read the content. Then I tried to log in but was unsuccessful, so I used my PC to reset the password. When I returned to the phone and opened this bug's URL again, the crash occurred there as well.
I am not going to uninstall and reinstall Nightly, in case someone knowledgeable recommends doing something that might help.
I tried dumping adb logcat into a file, but I ended up with a 2 MB file, and I am not sure that is helpful.
Comment 2 Scoobidiver (away) 2012-09-11 01:48:08 PDT
It's our #1 top crasher.
Can you narrow the logcat down to the time of the crash and attach it to this bug?
Comment 3 Scoobidiver (away) 2012-09-12 00:57:49 PDT
*** Bug 790528 has been marked as a duplicate of this bug. ***
Comment 4 Daniel Lombraña González 2012-09-12 23:52:11 PDT
Hi,

I can confirm this bug right now: 

http://crash-stats.mozilla.com/report/index/bp-5af1177d-740d-4892-beba-681f42120913

It seems that general browsing has been fixed: yesterday I could barely load any page with Nightly, and today it is working more or less fine. The only problem is that as soon as I enable Reader Mode on any page, it crashes. I've reproduced this behavior with and without Adblock Plus, so I guess that doesn't matter.

I ran the logcat command on the phone itself and have attached the output. I don't see much related information in it, but if I'm doing something wrong, let me know, OK?

I'm also using CM10.
Comment 5 Daniel Lombraña González 2012-09-12 23:53:29 PDT
Created attachment 660724
logcat log file when crashing Firefox
Comment 6 Peter Vágner 2012-09-13 00:31:59 PDT
While trying to capture the logcat I did the following:
- connected the USB cable with debugging turned on,
- typed nightly.mozilla.org into the awesome bar,
- typed adb logcat > logcat.log in a terminal running on the PC,
- tapped the search button and waited a few seconds for the crash reporter dialog,
- submitted the crash report, closing Nightly,
- ended up on the Android desktop,
- exited the adb window by pressing Ctrl+C.
I don't know how to make a less verbose log with only the relevant info. Mine is about 2 MB in size.
If you can recommend some reading material, I would be happy to cooperate if it helps to identify and fix the issue.
Comment 7 Peter Vágner 2012-09-13 00:44:14 PDT
Some more things I have noticed:
On my device auto-updates are not working. When I go to Menu -> Settings -> About Firefox and tap Check for Updates, I get the prompt "no update available".
I updated manually to the Sep 12 Nightly build, but the crashes persist. I can always reproduce them by going to http://nightly.mozilla.org/

Might the updater issue be related somehow?
Comment 8 Daniel Lombraña González 2012-09-13 00:53:58 PDT
Hi,

Like Peter, I experienced the update problem on my previous CM9 install. Now, after reinstalling everything with CM10, the updates are coming through again. However, Reader Mode still fails every time I enable it in the browser, regardless of the page.

Cheers,

Daniel
Comment 9 Peter Vágner 2012-09-13 00:56:47 PDT
I installed CM-10 at the end of July and have been updating it occasionally. Since the 6 Sep update of CM-10, updates no longer work for me.
I am not sure whether this is because of CM-10 20120906 and newer or Firefox from 5 Sep or newer.
Comment 10 Scoobidiver (away) 2012-09-13 01:03:59 PDT
(In reply to Peter Vágner from comment #6)
> I dont know how to make less verbose log with relevant info. Mine is about 2
> MB in size.
Try to note the exact time of the crash from your phone's clock, and select everything in the log that happens within two seconds before and after it.

(In reply to Peter Vágner from comment #7)
> Might the updater issue be related somehow?
It's bug 789791 and not related because this bug happens in en-US builds and before September 5.
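
The time-window selection suggested above can be sketched in C. This is only a minimal sketch, not a tool that was used in this bug. It assumes the log was captured with `adb logcat -v time`, so that each line starts with an `MM-DD HH:MM:SS.mmm` timestamp; the function names are hypothetical. It ignores the date and does not handle a window that crosses midnight.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse the "MM-DD HH:MM:SS.mmm" prefix of a `logcat -v time` line.
 * Returns milliseconds since midnight, or -1 if the line has no such prefix. */
long logcat_line_ms(const char *line)
{
    int month, day, hour, min, sec, ms;
    if (sscanf(line, "%2d-%2d %2d:%2d:%2d.%3d",
               &month, &day, &hour, &min, &sec, &ms) != 6)
        return -1;
    return ((hour * 60L + min) * 60L + sec) * 1000L + ms;
}

/* Return 1 if the line is stamped within window_ms of crash_ms, else 0.
 * Lines without a timestamp are rejected. */
int in_crash_window(const char *line, long crash_ms, long window_ms)
{
    long t = logcat_line_ms(line);
    if (t < 0)
        return 0;
    return labs(t - crash_ms) <= window_ms;
}

/* Copy only the lines inside the window from in to out.
 * Lines longer than the buffer are split, and the unstamped tail is dropped. */
void filter_logcat(FILE *in, FILE *out, long crash_ms, long window_ms)
{
    char line[4096];
    while (fgets(line, sizeof line, in))
        if (in_crash_window(line, crash_ms, window_ms))
            fputs(line, out);
}
```

For a crash noted at 00:31:59 on the phone clock, `crash_ms` would be ((0 * 60 + 31) * 60 + 59) * 1000 = 1919000, and a two-second margin corresponds to `window_ms` = 2000.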
Comment 11 Daniel Lombraña González 2012-09-14 08:54:54 PDT
I updated today to the latest version and see the same problem: enabling Reader Mode crashes. (By the way, does the browser download updates automatically in the background? Right now I only get a notification saying: apply downloaded version.)
Comment 12 Scoobidiver (away) 2012-09-14 09:01:19 PDT
(In reply to Daniel Lombraña González from comment #11)
> Updated today to the latest version [...] and the same problem
That's expected: nobody is assigned to fix it because we are waiting for the logcat.
Note that custom kernels are not officially supported.
Comment 13 Daniel Lombraña González 2012-09-14 12:17:13 PDT
Created attachment 661312
logcat file when crashing Firefox in reader mode CM10

build.board: aries
build.bootloader: unknown
build.brand: samsung
build.cpu_abi: armeabi-v7a
build.cpu_abi2: armeabi
build.device: GT-I9000
build.display: cm_galaxysmtd-userdebug 4.1.1 JRO03L eng..20120909.192301 test-keys
build.fingerprint: samsung/GT-I9000/GT-I9000:2.3.5/GINGERBREAD/XXJVT:user/release-keys
build.hardware: aries
build.host: cyanogenmod
build.id: JRO03L
build.manufacturer: samsung
build.model: GT-I9000
build.product: GT-I9000
build.radio: unknown
build.serial: 39307058339800EC
build.tags: test-keys
build.time: 1347243800000
build.type: userdebug
build.user: unknown
version.codename: REL
version.incremental: eng..20120909.192301
version.release: 4.1.1
version.sdk_int: 16
Comment 14 Daniel Lombraña González 2012-09-14 12:17:43 PDT
Scoobidiver, do you need anything else?
Comment 15 Scoobidiver (away) 2012-09-15 10:32:38 PDT
Daniel, with today's build do you hit this bug or bug 791462?
Comment 16 Daniel Lombraña González 2012-09-16 14:42:40 PDT
Hi,

I keep getting the same crash. Do you want another logcat file?

Daniel
Comment 17 Scoobidiver (away) 2012-09-16 14:46:26 PDT
(In reply to Daniel Lombraña González from comment #16)
> I keep getting the same crash. Do you want another logcat file?
No. I thought the crash signature had morphed into a better one, but it's a new one.
Comment 18 Peter Vágner 2012-09-16 23:40:32 PDT
Hello,
I skipped a few builds, but with build 2012-09-16 I no longer get crashes while browsing, e.g. on nightly.mozilla.org. I still get crashes when enabling Reader Mode.
Anyway, it is very usable for me again.
Thanks for what you all are doing.
Comment 19 Scoobidiver (away) 2012-09-19 01:21:53 PDT
*** Bug 792260 has been marked as a duplicate of this bug. ***
Comment 20 Russell Haley 2012-09-19 01:47:44 PDT
This crash can be triggered on Nightly and Aurora on CM10 M1 and CM10 Nightly 20120919 by going to about:firefox and clicking the "Release Notes" link. The issue also occurs with some other pages on the internet, with enough frequency that I am temporarily using the native browser and dealing with the ads. Reader Mode is not necessary.

Here is a logcat:

http://pastebin.com/iq7hhhRf
Comment 21 Scoobidiver (away) 2012-09-20 02:26:12 PDT
Based on crash stats, it seems there's a better stack trace in today's build.
For those who crash when entering Reader Mode, can you provide the crash ID with the latest Nightly to confirm?
Comment 22 Daniel Lombraña González 2012-09-20 04:41:40 PDT
Hi Scoobidiver,

I have a problem: with the latest version, the crashes are worse. Now each time I try to access about:crashes, Firefox crashes as soon as I try to zoom in or touch the screen. How can I send you the ID of the crash?
Comment 23 Scoobidiver (away) 2012-09-20 04:54:27 PDT
Do you crash when you zoom in or touch the screen only in about:crashes or on any web pages?
Do you still crash while entering Reader Mode?
Comment 24 Daniel Lombraña González 2012-09-20 05:12:03 PDT
It crashes more often again on loading pages, but there is no pattern to this yet. However, on the about:crashes page it crashes as soon as I touch the screen. The same happens if I try to zoom, which obviously involves touching the screen :-) Reader Mode is still failing :(
Comment 26 Ted Mielczarek [:ted.mielczarek] 2012-09-20 05:52:50 PDT
Created attachment 662983
stack

Here's a more reasonable stack from 07fb607e-9105-4fad-9ecc-1534e2120911. I got this using my dump-lookup tool: http://hg.mozilla.org/users/tmielczarek_mozilla.com/dump-lookup/
Comment 27 Guillermo López :willyaranda (probably SLOW response) 2012-09-20 05:55:43 PDT
My problem is totally unrelated to Reader Mode, as this happens with any page, as comment #24 says.
Comment 28 Scoobidiver (away) 2012-09-20 06:00:18 PDT
(In reply to Guillermo López (:willyaranda) from comment #27)
> My problem is totally unrelated with the Reader mode, as this happens with
> any page, as comment #24 says.
Different steps to reproduce can generate the same crash signature.
Comment 29 Robert Kaiser 2012-09-20 06:50:11 PDT
Hmm, FT2FontEntry::CreateFontEntry is freetype and that reminds me heavily of what bug 756140 comment #13 also is talking about - "Unlike Android, Cyanogenmod builds freetype with subpixel rendering enabled".

Maybe there's some connection of all those libxul.so@foo crashes there.
Comment 30 Kartikaya Gupta (email:kats@mozilla.com) 2012-09-20 08:36:24 PDT
I attached gdb and reproduced the crash that AaronMT was seeing on his TF201 with CyanogenMod 10. This was the backtrace I got (definitely looks font-related):

0x643c83d8 in ft_module_get_service () from /Users/kats/zspace/mozilla-git/obj-android-debug/dist/bin/libxul.so
Loading symbols... Done
(gdb) bt
#0  0x643c83d8 in ft_module_get_service () from /Users/kats/zspace/mozilla-git/obj-android-debug/dist/bin/libxul.so
#1  0x64400058 in sfnt_init_face () from /Users/kats/zspace/mozilla-git/obj-android-debug/dist/bin/libxul.so
#2  0x643d15b0 in tt_face_init () from /Users/kats/zspace/mozilla-git/obj-android-debug/dist/bin/libxul.so
#3  0x643c4ef4 in open_face () from /Users/kats/zspace/mozilla-git/obj-android-debug/dist/bin/libxul.so
#4  0x643c5f8a in FT_Open_Face () from /Users/kats/zspace/mozilla-git/obj-android-debug/dist/bin/libxul.so
#5  0x643c5002 in FT_New_Memory_Face () from /Users/kats/zspace/mozilla-git/obj-android-debug/dist/bin/libxul.so
#6  0x63dc6206 in FT2FontEntry::CreateFontEntry (aProxyEntry=..., aFontData=0x67840000 "", aLength=106400) at /Users/kats/zspace/mozilla-git/gfx/thebes/gfxFT2FontList.cpp:170
#7  0x63dc3f98 in gfxAndroidPlatform::MakePlatformFont (this=<optimized out>, aProxyEntry=<optimized out>, aFontData=<optimized out>, aLength=<optimized out>)
    at /Users/kats/zspace/mozilla-git/gfx/thebes/gfxAndroidPlatform.cpp:164
#8  0x63dbdd40 in gfxUserFontSet::LoadFont (this=0x65c66880, aProxy=0x677ebf00, aFontData=0x67830000 "wOFF", aLength=@0x5df28654: 63712)
    at /Users/kats/zspace/mozilla-git/gfx/thebes/gfxUserFontSet.cpp:690
#9  0x63dbe316 in gfxUserFontSet::OnLoadComplete (this=0x65c66880, aProxy=0x677ebf00, aFontData=0x67830000 "wOFF", aLength=63712, aDownloadStatus=0)
    at /Users/kats/zspace/mozilla-git/gfx/thebes/gfxUserFontSet.cpp:472
#10 0x633bb6f2 in nsFontFaceLoader::OnStreamComplete (this=0x6a1ee3d0, aLoader=0x6a1ee400, aContext=<optimized out>, aStatus=0, aStringLen=63712, aString=0x67830000 "wOFF")
    at /Users/kats/zspace/mozilla-git/layout/style/nsFontFaceLoader.cpp:211
#11 0x6317de16 in nsStreamLoader::OnStopRequest (this=0x6a1ee400, request=<optimized out>, ctxt=<optimized out>, aStatus=0) at /Users/kats/zspace/mozilla-git/netwerk/base/src/nsStreamLoader.cpp:95
#12 0x634645ac in nsCORSListenerProxy::OnStopRequest (this=0x679d70b0, aRequest=0x679b9834, aContext=0x0, aStatusCode=0)
    at /Users/kats/zspace/mozilla-git/content/base/src/nsCrossSiteListenerProxy.cpp:604
#13 0x631921ae in nsHTTPCompressConv::OnStopRequest (this=<optimized out>, request=<optimized out>, aContext=<optimized out>, aStatus=<optimized out>)
    at /Users/kats/zspace/mozilla-git/netwerk/streamconv/converters/nsHTTPCompressConv.cpp:94
#14 0x6317d5c2 in nsStreamListenerTee::OnStopRequest (this=0x650294e0, request=0x679b9834, context=0x0, status=0) at /Users/kats/zspace/mozilla-git/netwerk/base/src/nsStreamListenerTee.cpp:49
#15 0x631ea93a in mozilla::net::nsHttpChannel::OnStopRequest (this=0x679b9800, request=<optimized out>, ctxt=<optimized out>, status=0)
    at /Users/kats/zspace/mozilla-git/netwerk/protocol/http/nsHttpChannel.cpp:4974
#16 0x63164162 in nsInputStreamPump::OnStateStop (this=0x679dc8e0) at /Users/kats/zspace/mozilla-git/netwerk/base/src/nsInputStreamPump.cpp:552
#17 0x63164238 in nsInputStreamPump::OnInputStreamReady (this=0x679dc8e0, stream=<optimized out>) at /Users/kats/zspace/mozilla-git/netwerk/base/src/nsInputStreamPump.cpp:374
#18 0x63d3cfd2 in nsInputStreamReadyEvent::Run (this=0x677fe540) at /Users/kats/zspace/mozilla-git/xpcom/io/nsStreamUtils.cpp:82
#19 0x63d4c57e in nsThread::ProcessNextEvent (this=0x5e1510f0, mayWait=<optimized out>, result=0x5df288b7) at /Users/kats/zspace/mozilla-git/xpcom/threads/nsThread.cpp:624
#20 0x63d17f86 in NS_ProcessNextEvent_P (thread=0x5e1510f0, mayWait=<optimized out>) at /Users/kats/zspace/mozilla-git/obj-android-debug/xpcom/build/nsThreadUtils.cpp:220
#21 0x63bf23b6 in mozilla::ipc::MessagePump::Run (this=0x5e1532e0, aDelegate=0x5e17f0e0) at /Users/kats/zspace/mozilla-git/ipc/glue/MessagePump.cpp:82
#22 0x63d7ad1a in MessageLoop::RunInternal (this=0x5e17f0e0) at /Users/kats/zspace/mozilla-git/ipc/chromium/src/base/message_loop.cc:208
#23 0x63d7ad78 in RunHandler (this=0x5e17f0e0) at /Users/kats/zspace/mozilla-git/ipc/chromium/src/base/message_loop.cc:201
#24 MessageLoop::Run (this=0x5e17f0e0) at /Users/kats/zspace/mozilla-git/ipc/chromium/src/base/message_loop.cc:175
#25 0x63b4581a in nsBaseAppShell::Run (this=0x5e1625c0) at /Users/kats/zspace/mozilla-git/widget/xpwidgets/nsBaseAppShell.cpp:163
#26 0x63a107a0 in nsAppStartup::Run (this=0x600d0880) at /Users/kats/zspace/mozilla-git/toolkit/components/startup/nsAppStartup.cpp:296
#27 0x6314b59a in XREMain::XRE_mainRun (this=0x5df28ae4) at /Users/kats/zspace/mozilla-git/toolkit/xre/nsAppRunner.cpp:3834
#28 0x6314ccb6 in XREMain::XRE_main (this=0x5df28ae4, argc=<optimized out>, argv=0x5e16b048, aAppData=<optimized out>) at /Users/kats/zspace/mozilla-git/toolkit/xre/nsAppRunner.cpp:3911
#29 0x6314ce12 in XRE_main (argc=7, argv=0x5e16b048, aAppData=0x57094c34, aFlags=<optimized out>) at /Users/kats/zspace/mozilla-git/toolkit/xre/nsAppRunner.cpp:3987
#30 0x63152890 in GeckoStart (data=0x5cb43af0, appData=0x57094c34) at /Users/kats/zspace/mozilla-git/toolkit/xre/nsAndroidStartup.cpp:73
#31 0x5707afec in Java_org_mozilla_gecko_GeckoAppShell_nativeRun (jenv=0x56f34ec0, jc=<optimized out>, jargs=0x21000005) at /Users/kats/zspace/mozilla-git/mozglue/android/APKOpen.cpp:983
#32 0x40770ff4 in ?? ()
#33 0x40770ff4 in ?? ()
Comment 31 Mark Finkle (:mfinkle) (use needinfo?) 2012-09-20 14:29:55 PDT
Just wanted to confirm that this is only happening on CM10? Not happening on CM9?
Comment 32 Aaron Train [:aaronmt] 2012-09-21 08:22:00 PDT
A CM9 crash was reported here: https://support.mozilla.org/en-US/questions/937636#answer-369758 - his issue sounds exactly like what I saw on my TF201, for which Kats got this stack.
Comment 33 Craig 2012-09-21 11:43:33 PDT
That post from Aaron was from me. I wasn't using Reader Mode, so I can't confirm that the crash I was experiencing is related to Reader Mode. However, the crashes on CM10 are ridiculous and happen all of the time. The crashes on CM9 are far fewer in my experience, but still happen enough to cause concern. Let me know if I can help in any way.
Comment 34 Scoobidiver (away) 2012-09-23 04:46:04 PDT
Removing the js::gc::MarkObjectRange signature as it seems unrelated although a top crasher.
Comment 35 Scoobidiver (away) 2012-09-24 08:20:25 PDT
Based on crash stats, it only happens with Android API Version 16 (CM10).
Comment 36 Aaron Train [:aaronmt] 2012-09-24 14:17:53 PDT
This might be helpful: I noticed two changes to FreeType in CM10 from Sept 1st, a new renderer and a new version of FreeType.
http://review.cyanogenmod.com/#/q/freetype,n,z
Comment 37 Daniel Lombraña González 2012-09-25 00:22:44 PDT
Dear all,

I've just updated to the latest nightly of CM10, version 24092012, and the bug is still there: Firefox crashes as soon as I load a page or touch the screen on the about:crashes page.

Do you need something to help you debug this issue?
Comment 38 Mark Finkle (:mfinkle) (use needinfo?) 2012-09-25 11:35:28 PDT
(In reply to Daniel Lombraña González from comment #37)
> Dear all,
> 
> I've just updated to the latest nightly from CM10, version 24092012, and the
> bug stills there: Firefox crashes as soon as I load a page or I touch the
> screen in the about:crashes page.
> 
> Do you need something to help you debug this issue?

Unless you have a stack that points to FT2FontEntry::CreateFontEntry, we can't be sure it's the same crash. We have at least two other bugs where Fennec crashes on CM10. See bug 756140 and bug 792342.
Comment 39 Daniel Lombraña González 2012-09-25 23:26:30 PDT
I'll wait! After reading those bug reports I think that all of them share the same user experience problem: as soon as you load a page, the browser crashes :-( Good luck with the bug hunting!!
Comment 40 Brad Lassey [:blassey] (use needinfo?) 2012-09-27 13:55:54 PDT
Created attachment 665635
patch

Speculative fix based on https://crash-stats.mozilla.com/report/index/d0c34bd6-ff3c-4481-87fc-5fc512120927 and other reports with the same signature
Comment 41 Jonathan Kew (:jfkthame) 2012-09-27 14:45:51 PDT
Comment on attachment 665635
patch

Review of attachment 665635:
-----------------------------------------------------------------

This looks like a band-aid for an issue we don't actually understand. I think I'd be happier about taking a complete FreeType update to the current release (2.4.10) than speculatively patching a line like this if we can't explain the root of the problem.

::: modules/freetype2/src/base/ftobjs.c
@@ +4325,4 @@
>            {
>              FT_ASSERT( cur[0]->clazz );
>  
> +            if ( cur[0] && cur[0]->clazz && cur[0]->clazz->get_interface )

If either cur[0] or cur[0]->clazz is NULL here, that seems to imply some pretty serious breakage within FreeType - it's loaded a NULL module? Surely that shouldn't happen; but if it does, wallpapering it here is likely to just defer the crash to some other place where the module is used.
Comment 42 Brad Lassey [:blassey] (use needinfo?) 2012-09-28 20:57:47 PDT
I've been able to confirm that this patch avoids the crash.
Comment 43 Jonathan Kew (:jfkthame) 2012-09-30 05:27:51 PDT
Could you check (in gdb) whether the FT library's module list at the end of FT_Init_FreeType (after FT_Add_Default_Modules has returned) contains any NULL module records, or records with NULL clazz pointer? Assuming the pointers are all valid at that time, the question we need to answer is how the module list can be getting changed/corrupted subsequently.
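
The check asked for above (any NULL module record, or a record with a NULL clazz pointer) can be sketched as a small scan. This is a sketch under stated assumptions: the struct types below are hypothetical stand-ins that model only the fields named in this bug (`num_modules`, `modules`, `clazz`), not FreeType's real internal records, and the array size of 32 is chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the library and module records. */
typedef struct { const char *name; } ModuleClass;
typedef struct { const ModuleClass *clazz; } ModuleRec;
typedef struct {
    unsigned int num_modules;
    ModuleRec   *modules[32];   /* size is an assumption for this sketch */
} LibraryRec;

/* Return the index of the first damaged entry (NULL module record or
 * NULL clazz pointer), or -1 if every registered module looks intact. */
int first_bad_module(const LibraryRec *lib)
{
    for (unsigned int i = 0; i < lib->num_modules; i++) {
        if (lib->modules[i] == NULL || lib->modules[i]->clazz == NULL)
            return (int)i;
    }
    return -1;
}
```

Running such a scan once right after initialization and again just before the crashing lookup would distinguish a list that was bad from the start from one that was damaged later.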
Comment 44 Michael Loney 2012-09-30 09:28:13 PDT
*** Bug 795710 has been marked as a duplicate of this bug. ***
Comment 45 ricardohenrylee 2012-09-30 17:46:07 PDT
(In reply to Brad Lassey [:blassey] from comment #42)
> I've been able to confirm that this patch avoids the crash

@blassey or @jfkthame would it be possible for you to upload the builds somewhere for us to test?
Comment 46 Brad Lassey [:blassey] (use needinfo?) 2012-10-01 09:02:19 PDT
Created attachment 666574
logging patch

With this logging patch, I see this:

I/GeckoFT ( 4097): checking 18 modules, starting at 0x60c8ecd4 for library: 0x60c8ecc0
I/GeckoFT ( 4097): checking 18 modules, starting at 0x60c8ef14 for library: 0x60c8ef00




I/GeckoFT ( 4097): cur[0]->clazz is null for 0x60c8ecfc while searching 18 modules, starting at 0x60c8ecd4 for library: 0x60c8ecc0
I/GeckoFT ( 4097): cur[0]->clazz is null for 0x60c8ecfc while searching 18 modules, starting at 0x60c8ecd4 for library: 0x60c8ecc0
^C
Comment 47 Jonathan Kew (:jfkthame) 2012-10-01 10:22:15 PDT
OK, so the modules list was intact when first created. Subsequently, *something* set one of those clazz pointers to NULL. I don't think there's anything within FreeType that should do this (AFAIK), so my worry is that we have a bug elsewhere that's stomping on a location that it shouldn't (writing via a wild/uninitialized pointer, maybe?).

Alternatively, it could be a bug within FreeType, leading to damaging its own library record, but I'm inclined to think that's less likely. Unless maybe CM10 includes a bad font that causes FT to stumble, leading to internal corruption.

Running this under valgrind to see if it can detect any use of an uninitialized pointer would be awesome...
Comment 48 Jonathan Kew (:jfkthame) 2012-10-01 10:51:44 PDT
Alternatively, provided the problem is consistent (the rogue NULL is always at the same position, relative to the FT library), you might be able to catch it by setting a gdb data watchpoint on the address after the library has been initialized, to detect when it changes.
Comment 49 Jonathan Kew (:jfkthame) 2012-10-01 10:52:28 PDT
Comment on attachment 666574
logging patch

Review of attachment 666574:
-----------------------------------------------------------------

::: modules/freetype2/src/base/ftobjs.c
@@ +4328,5 @@
>              FT_ASSERT( cur[0]->clazz );
> +	    
> +
> +	    if (!cur[0])
> +	      __android_log_print(ANDROID_LOG_INFO, "GeckoFT", "cur[0] is null for %p while searching %d modules, starting at %p for library: %p", library->num_modules, library->modules, library);

BTW, you're missing "cur" as a parameter here, before library->num_modules. Though it doesn't matter right now as this isn't the log message that's actually being hit.
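
The missing-argument problem noted above can be illustrated with the argument list completed. This is a sketch only: `snprintf` stands in for `__android_log_print` so the example is self-contained, the helper name `format_null_module_msg` is hypothetical, and `%u` is used for the module count on the assumption that it is an unsigned integer.

```c
#include <stdio.h>

/* Format the "cur[0] is null" message with one argument per conversion:
 * cur fills the first %p, which the reviewed line left without a match. */
int format_null_module_msg(char *buf, size_t len, void *cur,
                           unsigned int num_modules, void *modules,
                           void *library)
{
    return snprintf(buf, len,
                    "cur[0] is null for %p while searching %u modules, "
                    "starting at %p for library: %p",
                    cur, num_modules, modules, library);
}
```

With four conversions and only three arguments, the original call would read a nonexistent argument, which is undefined behavior; compilers that check printf-style formats typically warn about this kind of mismatch.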
Comment 50 Brad Lassey [:blassey] (use needinfo?) 2012-10-01 18:06:51 PDT
Strangely, when I went to set the watchpoint, I got this:

Breakpoint 2, FT_Add_Default_Modules (library=0x61982b40) at /Volumes/source/mozilla-central/modules/freetype2/src/base/ftinit.c:229
229	      __android_log_print(ANDROID_LOG_INFO, "GeckoFT", "checking %d modules, starting at %p for library: %p", library->num_modules, library->modules, library);
(gdb) p cur[0]->clazz
$1 = (FT_Module_Class *) 0x60844f00
(gdb) p cur[1]->clazz
$2 = (FT_Module_Class *) 0x5e446340
(gdb) p cur[2]->clazz
Cannot access memory at address 0x0

which implies my logging is incorrect and clazz is null from the get go
Comment 51 Brad Lassey [:blassey] (use needinfo?) 2012-10-01 18:18:06 PDT
Please ignore that last comment; I was printing cur from the wrong scope.
Comment 52 Brad Lassey [:blassey] (use needinfo?) 2012-10-01 18:20:26 PDT
However, it looks like Android's gdb doesn't support watchpoints:

(gdb) c
Continuing.
Warning:
Could not insert hardware watchpoint 3.
Could not insert hardware breakpoints:
You may have requested too many hardware breakpoints/watchpoints.

(gdb) info breakpoints
Num     Type           Disp Enb Address    What
3       hw watchpoint  keep y              library->modules[2]->clazz
(gdb)
Comment 53 Brad Lassey [:blassey] (use needinfo?) 2012-10-01 18:55:24 PDT
Johnathan, is it expected that FT_Init_FreeType gets called twice?

Also, FWIW, the module that gets gummed up is "autofitter"
Comment 54 Jonathan Kew (:jfkthame) 2012-10-02 03:55:47 PDT
(In reply to Brad Lassey [:blassey] from comment #53)
> Johnathan, is it expected that FT_Init_FreeType gets called twice?

Hmm. I don't know if it's *necessarily* a problem - it should be valid to create multiple freetype library objects, I think. But it does seem surprising, as I'd expect us to create a single FT_Library and use it throughout. So where are the two calls coming from? Presumably one comes from gfxAndroidPlatform::gfxAndroidPlatform(); is the other call coming from somewhere in cairo's or skia's font code? Maybe we should be sharing that FT_Library rather than creating a separate one.
Comment 55 Jonathan Kew (:jfkthame) 2012-10-02 04:00:23 PDT
(In reply to Brad Lassey [:blassey] from comment #52)
> however, it looks like android's gdb doesn't support watchpoints:
> 
> (gdb) c
> Continuing.
> Warning:
> Could not insert hardware watchpoint 3.
> Could not insert hardware breakpoints:
> You may have requested too many hardware breakpoints/watchpoints.
> 
> (gdb) info breakpoints
> Num     Type           Disp Enb Address    What
> 3       hw watchpoint  keep y              library->modules[2]->clazz
> (gdb)

From the message there, it sounds like maybe it can only watch one or two locations. If you know which of the pointers is going to be damaged, maybe you can set a watch on *only* that one, not on every entry in the array.
Comment 56 Brad Lassey [:blassey] (use needinfo?) 2012-10-02 08:08:25 PDT
(In reply to Jonathan Kew (:jfkthame) from comment #54)
> (In reply to Brad Lassey [:blassey] from comment #53)
> > Johnathan, is it expected that FT_Init_FreeType gets called twice?
> 
> Hmm. I don't know if it's *necessarily* a problem - it should be valid to
> create multiple freetype library objects, I think. But it does seem
> surprising, as I'd expect us to create a single FT_Library and use it
> throughout. So where are the two calls coming from? Presumably one comes
> from gfxAndroidPlatform::gfxAndroidPlatform(); is the other call coming from
> somewhere in cairo's or skia's font code? Maybe we should be sharing that
> FT_Library rather than creating a separate one.

It is being called from here http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxAndroidPlatform.cpp#32 and here http://mxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-ft-font.c#273

Also, sorry for typo'ing your name. Johnath has me ruined.
Comment 57 Brad Lassey [:blassey] (use needinfo?) 2012-10-02 08:10:18 PDT
(In reply to Jonathan Kew (:jfkthame) from comment #55)
> (In reply to Brad Lassey [:blassey] from comment #52)
> > however, it looks like android's gdb doesn't support watchpoints:
> > 
> > (gdb) c
> > Continuing.
> > Warning:
> > Could not insert hardware watchpoint 3.
> > Could not insert hardware breakpoints:
> > You may have requested too many hardware breakpoints/watchpoints.
> > 
> > (gdb) info breakpoints
> > Num     Type           Disp Enb Address    What
> > 3       hw watchpoint  keep y              library->modules[2]->clazz
> > (gdb)
> 
> From the message there, it sounds like maybe it can only watch one or two
> locations. If you know which of the pointers is going to be damaged, maybe
> you can set a watch on *only* that one, not on every entry in the array.

There was only one watch point and no break points set. Previously I had set 2 breakpoints, but had since cleared them. I've also tried only setting the one watch point with the same result.
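
When a hardware watchpoint cannot be set, one possible fallback is to poll the suspect pointer at chosen checkpoints in the code. The sketch below is an assumption-laden illustration, not something that was tried in this bug: the `PtrWatch` type and both function names are hypothetical, and it only narrows the change down to the interval between two checkpoints rather than to the exact write.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct {
    void *volatile *slot;     /* address of the pointer being watched */
    void *expected;           /* value captured when the watch was armed */
    const char *changed_at;   /* first checkpoint that observed a change */
} PtrWatch;

/* Record the current value stored at slot. */
void ptr_watch_arm(PtrWatch *w, void *volatile *slot)
{
    w->slot = slot;
    w->expected = *slot;
    w->changed_at = NULL;
}

/* Call at points of interest. Logs the first checkpoint at which the
 * value differs, and returns 1 from then on. */
int ptr_watch_check(PtrWatch *w, const char *label)
{
    if (w->changed_at == NULL && *w->slot != w->expected) {
        w->changed_at = label;
        fprintf(stderr, "watched pointer changed from %p to %p before %s\n",
                w->expected, *w->slot, label);
    }
    return w->changed_at != NULL;
}
```

Arming the watch on the address of the affected clazz pointer after library initialization and adding checks around font loading would show roughly where the NULL appears.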
Comment 58 Brad Lassey [:blassey] (use needinfo?) 2012-10-02 09:39:40 PDT
Julian Seward is seeing a write-after-free in Valgrind, so I'm making this bug private so he can comment (apparently he doesn't have permission to mark bugs as private; we should fix that).
Comment 59 Julian Seward [:jseward] 2012-10-02 10:24:55 PDT
Created attachment 667044
Valgrind trace -- invalid writes

Here's a bunch of invalid writes observed when loading

http://edition.cnn.com/2012/10/01/travel/american-airlines-customers-complaining/index.html?hpt=us_t3

These were preceded by the complaints listed in bug 795310 and those
listed in bug 797002.  I wonder if they might be related, especially
797002.

Anyway, w.r.t. the invalid writes: after the page finished loading,
there were a bunch of invalid writes ending in segfault, due to
trashing the heap.  What's interesting about these is that they are to
blocks that have no relationship at all to the stack doing the
writing.  Eg we have some part of gfx/cairo/libpixman writing in the
middle of blocks freed by the JS jit.

My best-guess interpretation of this is that there is some wildly
out-of-range writing going on and trashing arbitrary bits of heap.
Comment 60 Joe Drew (not getting mail) 2012-10-02 10:30:48 PDT
If we're scribbling on the heap, that could definitely cause those invalid writes. There's 0 excuse for layers to be looking at stuff allocated by the jit, though.
Comment 61 Robert Kaiser 2012-10-02 10:35:35 PDT
This sounds to me like it could also explain bug 792825, since there we crash during GC. Interestingly, a specific JS change seems to trigger the issue over there. In any case, I guess it's worth monitoring whether there is a connection.
Comment 62 Jonathan Kew (:jfkthame) 2012-10-02 10:42:34 PDT
(In reply to Brad Lassey [:blassey] from comment #56)
> (In reply to Jonathan Kew (:jfkthame) from comment #54)
> > (In reply to Brad Lassey [:blassey] from comment #53)
> > > Johnathan, is it expected that FT_Init_FreeType gets called twice?
> > 
> > Hmm. I don't know if it's *necessarily* a problem - it should be valid to
> > create multiple freetype library objects, I think. But it does seem
> > surprising, as I'd expect us to create a single FT_Library and use it
> > throughout. So where are the two calls coming from? Presumably one comes
> > from gfxAndroidPlatform::gfxAndroidPlatform(); is the other call coming from
> > somewhere in cairo's or skia's font code? Maybe we should be sharing that
> > FT_Library rather than creating a separate one.
> 
> It is being called from here
> http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxAndroidPlatform.
> cpp#32 and here
> http://mxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-ft-
> font.c#273

To confirm, the one from gfxAndroidPlatform is first, I guess? What's the full callstack of the second one (from cairo), then? Could this mean we're doing something font-related with cairo but haven't created and passed it the proper freetype face, so it's trying to make its own?
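
For illustration only, a minimal sketch of the "share one library object" idea raised above. FakeLibrary, SharedLibrary, PlatformFontLibrary and BackendFontLibrary are hypothetical stand-ins, not FreeType, cairo or Gecko APIs; the point is just that both callers go through one accessor, so initialization runs once.

```cpp
#include <cassert>

// Hypothetical stand-in for a font library object; NOT the real FT_Library.
struct FakeLibrary {
    int id;
};

// Counts how many times the stand-in initialization has run.
inline int& InitCount() {
    static int count = 0;
    return count;
}

// Returns the single process-wide instance, creating it on first use.
inline FakeLibrary* SharedLibrary() {
    static FakeLibrary instance{++InitCount()};
    return &instance;
}

// Two hypothetical subsystems (platform code and a rendering backend).
// Each asks the shared accessor instead of running its own initialization.
inline FakeLibrary* PlatformFontLibrary() { return SharedLibrary(); }
inline FakeLibrary* BackendFontLibrary() { return SharedLibrary(); }

// Usage: both subsystems see the same object, and init ran exactly once.
inline void DemoSharedLibrary() {
    assert(PlatformFontLibrary() == BackendFontLibrary());
    assert(InitCount() == 1);
}
```

Whether sharing is actually the right fix here depends on why the second initialization happens, which is the open question in this comment.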
Comment 63 Brad Lassey [:blassey] (use needinfo?) 2012-10-02 11:27:27 PDT
Stack for second call to FT_Init_FreeType:

#0  FT_Init_FreeType (alibrary=0x61ceb804) at /Volumes/source/mozilla-central/modules/freetype2/src/base/ftinit.c:250
#1  0x636873a0 in _cairo_ft_unscaled_font_map_create () at /Volumes/source/mozilla-central/gfx/cairo/cairo/src/cairo-ft-font.c:273
#2  _cairo_ft_unscaled_font_map_lock () at /Volumes/source/mozilla-central/gfx/cairo/cairo/src/cairo-ft-font.c:336
#3  0x6368927c in _cairo_ft_unscaled_font_create_internal (out=<optimized out>, font_face=<optimized out>, id=<optimized out>, 
    filename=<optimized out>, from_face=<optimized out>) at /Volumes/source/mozilla-central/gfx/cairo/cairo/src/cairo-ft-font.c:491
#4  _cairo_ft_unscaled_font_create_from_face (out=<optimized out>, face=<optimized out>)
    at /Volumes/source/mozilla-central/gfx/cairo/cairo/src/cairo-ft-font.c:580
#5  _moz_cairo_ft_font_face_create_for_ft_face (face=0x61ceb804, load_flags=1577246784)
    at /Volumes/source/mozilla-central/gfx/cairo/cairo/src/cairo-ft-font.c:3098
#6  0x635d8354 in FT2FontEntry::CairoFontFace (this=0x6148de00) at /Volumes/source/mozilla-central/gfx/thebes/gfxFT2FontList.cpp:325
#7  0x635d85d2 in FT2FontEntry::CreateScaledFont (this=0x6148de00, aStyle=<optimized out>)
    at /Volumes/source/mozilla-central/gfx/thebes/gfxFT2FontList.cpp:130
#8  0x635d8602 in FT2FontEntry::CreateFontInstance (this=0x6148de00, aFontStyle=0x61ce1400, aNeedsBold=172)
    at /Volumes/source/mozilla-central/gfx/thebes/gfxFT2FontList.cpp:155
#9  0x635c556e in gfxFontEntry::FindOrMakeFont (this=0x6148de00, aStyle=0x61cac8e8, aNeedsBold=false)
    at /Volumes/source/mozilla-central/gfx/thebes/gfxFont.cpp:191
#10 0x635c566c in gfxFontGroup::FindPlatformFont (aName=..., aGenericName=<optimized out>, aUseFontSet=<optimized out>, aClosure=<optimized out>)
    at /Volumes/source/mozilla-central/gfx/thebes/gfxFont.cpp:3210
#11 0x635bf08a in gfxFontGroup::FontResolverProc (aName=..., aClosure=<optimized out>)
    at /Volumes/source/mozilla-central/gfx/thebes/gfxFont.cpp:3456
#12 0x635d5f1e in gfxAndroidPlatform::ResolveFontName (this=<optimized out>, aFontName=<optimized out>, 
    aCallback=0x635bf07d <gfxFontGroup::FontResolverProc(nsAString_internal const&, void*)>, aClosure=0x5f017c74, aAborted=@0x5f017cb7)
    at /Volumes/source/mozilla-central/gfx/thebes/gfxAndroidPlatform.cpp:99
#13 0x635c2fcc in gfxFontGroup::ForEachFontInternal (this=0x61cac8d0, aFamilies=<optimized out>, aLanguage=<optimized out>, 
    aResolveGeneric=<optimized out>, aResolveFontName=true, aUseFontSet=false, 
    fc=0x635c55f1 <gfxFontGroup::FindPlatformFont(nsAString_internal const&, nsACString_internal const&, bool, void*)>, closure=0x61cac8d0)
    at /Volumes/source/mozilla-central/gfx/thebes/gfxFont.cpp:3420
#14 0x635c3076 in gfxFontGroup::ForEachFontInternal (this=0x61cac8d0, aFamilies=<optimized out>, aLanguage=<optimized out>, 
    aResolveGeneric=<optimized out>, aResolveFontName=true, aUseFontSet=true, 
    fc=0x635c55f1 <gfxFontGroup::FindPlatformFont(nsAString_internal const&, nsACString_internal const&, bool, void*)>, closure=0x61cac8d0)
    at /Volumes/source/mozilla-central/gfx/thebes/gfxFont.cpp:3441
#15 0x635c3130 in gfxFontGroup::ForEachFont (this=<optimized out>, 
    fc=0x635c55f1 <gfxFontGroup::FindPlatformFont(nsAString_internal const&, nsACString_internal const&, bool, void*)>, closure=<optimized out>)
    at /Volumes/source/mozilla-central/gfx/thebes/gfxFont.cpp:3268
#16 0x635c56a4 in gfxFontGroup::BuildFontList (this=0x1) at /Volumes/source/mozilla-central/gfx/thebes/gfxFont.cpp:3106
#17 0x635c58f2 in gfxFontGroup::gfxFontGroup (this=0x61cac8d0, aFamilies=<optimized out>, aStyle=0x5f018580, aUserFontSet=0x0)
    at /Volumes/source/mozilla-central/gfx/thebes/gfxFont.cpp:3097
#18 0x635d5e8c in gfxAndroidPlatform::CreateFontGroup (this=<optimized out>, aFamilies=..., aStyle=0x5f018580, aUserFontSet=0x0)
    at /Volumes/source/mozilla-central/gfx/thebes/gfxAndroidPlatform.cpp:149
#19 0x62ddab50 in nsFontMetrics::Init (this=0x61c9a970, aFont=..., aLanguage=<optimized out>, aContext=<optimized out>, aUserFontSet=0x0)
    at /Volumes/source/mozilla-central/gfx/src/nsFontMetrics.cpp:109
#20 0x62dd8ffa in nsFontCache::GetMetricsFor (this=0x61cecd20, aFont=..., aLanguage=0x61ca5880, aUserFontSet=0x0, aMetrics=@0x5f01868c)
#21 0x62dd92f6 in nsDeviceContext::GetMetricsFor (this=0x61cfc680, aFont=..., aLanguage=0x61ca5880, aUserFontSet=0x0, aMetrics=@0x5f01868c)
    at /Volumes/source/mozilla-central/gfx/src/nsDeviceContext.cpp:254
#22 0x62e0c9ee in nsLayoutUtils::GetFontMetricsForStyleContext (aStyleContext=<optimized out>, aFontMetrics=0x5f01868c, aInflation=1)
    at /Volumes/source/mozilla-central/layout/base/nsLayoutUtils.cpp:2050
#23 0x62e5122e in ComputeLineHeight (aFontSizeInflation=<optimized out>, aBlockHeight=<optimized out>, aStyleContext=<optimized out>)
    at /Volumes/source/mozilla-central/layout/generic/nsHTMLReflowState.cpp:2344
#24 nsHTMLReflowState::CalcLineHeight (aStyleContext=0x61f6e908, aBlockHeight=<optimized out>, aFontSizeInflation=1)
    at /Volumes/source/mozilla-central/layout/generic/nsHTMLReflowState.cpp:2367
#25 0x62e51320 in nsHTMLReflowState::CalcLineHeight (this=<optimized out>)
    at /Volumes/source/mozilla-central/layout/generic/nsHTMLReflowState.cpp:2356
#26 0x62e378a6 in nsBlockReflowState::nsBlockReflowState (this=0x5f018754, aReflowState=..., aPresContext=0x61f61400, aFrame=<optimized out>, 
    aMetrics=..., aTopMarginRoot=true, aBottomMarginRoot=true, aBlockNeedsFloatManager=true)
    at /Volumes/source/mozilla-central/layout/generic/nsBlockReflowState.cpp:113
#27 0x62e36238 in nsBlockFrame::Reflow (this=0x61f6ec58, aPresContext=0x61f61400, aMetrics=..., aReflowState=..., aStatus=@0x5f018c6c)
    at /Volumes/source/mozilla-central/layout/generic/nsBlockFrame.cpp:1023
#28 0x62e3b092 in nsContainerFrame::ReflowChild (this=<optimized out>, aKidFrame=0x61f6ec58, aPresContext=0x61f61400, aDesiredSize=..., 
    aReflowState=..., aX=0, aY=0, aFlags=0, aStatus=@0x5f018c6c, aTracker=0x0)
    at /Volumes/source/mozilla-central/layout/generic/nsContainerFrame.cpp:947
#29 0x62e501f0 in nsCanvasFrame::Reflow (this=0x61f6ddd0, aPresContext=0x61f61400, aDesiredSize=..., aReflowState=..., aStatus=@0x5f018c6c)
    at /Volumes/source/mozilla-central/layout/generic/nsCanvasFrame.cpp:473
#30 0x62e3b092 in nsContainerFrame::ReflowChild (this=<optimized out>, aKidFrame=0x61f6ddd0, aPresContext=0x61f61400, aDesiredSize=..., 
    aReflowState=..., aX=0, aY=0, aFlags=3, aStatus=@0x5f018c6c, aTracker=0x0)
    at /Volumes/source/mozilla-central/layout/generic/nsContainerFrame.cpp:947
#31 0x62e4cbd4 in nsHTMLScrollFrame::ReflowScrolledFrame (this=0x61f6dfa0, aState=0x5f018d54, aAssumeHScroll=<optimized out>, 
    aAssumeVScroll=<optimized out>, aMetrics=0x5f018ca8, aFirstPass=true)
    at /Volumes/source/mozilla-central/layout/generic/nsGfxScrollFrame.cpp:526
#32 0x62e4ce8c in nsHTMLScrollFrame::ReflowContents (this=0x61f6dfa0, aState=0x5f018d54, aDesiredSize=<optimized out>)
    at /Volumes/source/mozilla-central/layout/generic/nsGfxScrollFrame.cpp:624
#33 0x62e4d1c2 in nsHTMLScrollFrame::Reflow (this=0x61f6dfa0, aPresContext=<optimized out>, aDesiredSize=..., aReflowState=..., 
    aStatus=@0x5f0192dc) at /Volumes/source/mozilla-central/layout/generic/nsGfxScrollFrame.cpp:865
#34 0x62e3b092 in nsContainerFrame::ReflowChild (this=<optimized out>, aKidFrame=0x61f6dfa0, aPresContext=0x61f61400, aDesiredSize=..., 
    aReflowState=..., aX=0, aY=0, aFlags=0, aStatus=@0x5f0192dc, aTracker=0x0)
    at /Volumes/source/mozilla-central/layout/generic/nsContainerFrame.cpp:947
#35 0x62e7462a in ViewportFrame::Reflow (this=0x61f6d7e8, aPresContext=0x61f61400, aDesiredSize=..., aReflowState=..., aStatus=@0x5f0192dc)
    at /Volumes/source/mozilla-central/layout/generic/nsViewportFrame.cpp:201
#36 0x62e15784 in PresShell::DoReflow (this=0x5e241e00, target=Cannot access memory at address 0x0
) at /Volumes/source/mozilla-central/layout/base/nsPresShell.cpp:7372
#37 0x62e1cdcc in PresShell::ProcessReflowCommands (this=0x5e241e00, aInterruptible=false)
    at /Volumes/source/mozilla-central/layout/base/nsPresShell.cpp:7518
#38 0x62e1d08c in PresShell::FlushPendingNotifications (this=0x5e241e00, aType=Flush_Layout)
    at /Volumes/source/mozilla-central/layout/base/nsPresShell.cpp:3848
#39 0x62e0670e in DocumentViewerImpl::LoadComplete (this=0x61ce74f0, aStatus=NS_OK)
    at /Volumes/source/mozilla-central/layout/base/nsDocumentViewer.cpp:975
#40 0x632e492e in nsDocShell::EndPageLoad (this=0x61794c00, aProgress=<optimized out>, aChannel=0x5e281cc0, aStatus=NS_OK)
    at /Volumes/source/mozilla-central/docshell/base/nsDocShell.cpp:6409
#41 0x632e1e24 in nsDocShell::OnStateChange (this=0x61794c00, aProgress=0x61794c14, aRequest=<optimized out>, aStateFlags=<optimized out>, 
    aStatus=NS_OK) at /Volumes/source/mozilla-central/docshell/base/nsDocShell.cpp:6240
#42 0x632ea2e4 in nsDocLoader::DoFireOnStateChange (this=0x61794c00, aProgress=0x61794c14, aRequest=<optimized out>, aStateFlags=@0x5f019784, 
    aStatus=NS_OK) at /Volumes/source/mozilla-central/uriloader/base/nsDocLoader.cpp:1351
#43 0x632ea972 in nsDocLoader::doStopDocumentLoad (this=<optimized out>, request=0x5e281cc0, aStatus=NS_OK)
    at /Volumes/source/mozilla-central/uriloader/base/nsDocLoader.cpp:931
#44 0x632eae00 in nsDocLoader::DocLoaderIsEmpty (this=0x61794c00, aFlushLayout=<optimized out>)
    at /Volumes/source/mozilla-central/uriloader/base/nsDocLoader.cpp:820
#45 0x632eaf72 in nsDocLoader::OnStopRequest (this=0x61794c00, aRequest=0x61bfd748, aCtxt=<optimized out>, aStatus=NS_OK)
    at /Volumes/source/mozilla-central/uriloader/base/nsDocLoader.cpp:704
#46 0x62d0ad0e in nsLoadGroup::RemoveRequest (this=0x6178f100, request=0x61bfd748, ctxt=0x0, aStatus=NS_OK)
    at /Volumes/source/mozilla-central/netwerk/base/src/nsLoadGroup.cpp:698
#47 0x62f25ac4 in nsDocument::DoUnblockOnload (this=0x61ce4400) at /Volumes/source/mozilla-central/content/base/src/nsDocument.cpp:7362
#48 0x62f25b16 in nsDocument::UnblockOnload (this=0x61ce4400, aFireSync=false)
    at /Volumes/source/mozilla-central/content/base/src/nsDocument.cpp:7304
#49 0x62f2bca6 in nsDocument::DispatchContentLoadedEvents (this=0x61ce4400) at /Volumes/source/mozilla-central/content/base/src/nsDocument.cpp:4380
#50 0x62f236e0 in nsRunnableMethodImpl<void (nsDocument::*)(), true>::Run (this=<optimized out>) at ../../../dist/include/nsThreadUtils.h:349
#51 0x63573782 in nsThread::ProcessNextEvent (this=0x5e2564c0, mayWait=<optimized out>, result=0x5f01994f)
    at /Volumes/source/mozilla-central/xpcom/threads/nsThread.cpp:612
#52 0x635518f2 in NS_ProcessNextEvent_P (thread=0x61ce4400, mayWait=false)
    at /Volumes/source/mozilla-central/objdir-droid/xpcom/build/nsThreadUtils.cpp:220
#53 0x634728ea in mozilla::ipc::MessagePump::Run (this=0x5e2532b0, aDelegate=0x5e27f0e0)
    at /Volumes/source/mozilla-central/ipc/glue/MessagePump.cpp:82
#54 0x63596280 in MessageLoop::RunInternal (this=0x63b372c4) at /Volumes/source/mozilla-central/ipc/chromium/src/base/message_loop.cc:208
#55 0x63596356 in RunHandler (this=<optimized out>) at /Volumes/source/mozilla-central/ipc/chromium/src/base/message_loop.cc:201
#56 MessageLoop::Run (this=0x5e27f0e0) at /Volumes/source/mozilla-central/ipc/chromium/src/base/message_loop.cc:175
#57 0x633f3160 in nsBaseAppShell::Run (this=0x5e2546b0) at /Volumes/source/mozilla-central/widget/xpwidgets/nsBaseAppShell.cpp:163
#58 0x63320a08 in nsAppStartup::Run (this=0x608c8820) at /Volumes/source/mozilla-central/toolkit/components/startup/nsAppStartup.cpp:290
#59 0x62cf80b6 in XREMain::XRE_mainRun (this=0x5f019af4) at /Volumes/source/mozilla-central/toolkit/xre/nsAppRunner.cpp:3782
#60 0x62cfa1bc in XREMain::XRE_main (this=0x5f019af4, argc=<optimized out>, argv=0x5e26b048, aAppData=<optimized out>)
    at /Volumes/source/mozilla-central/toolkit/xre/nsAppRunner.cpp:3848
#61 0x62cfa2f0 in XRE_main (argc=7, argv=0x5e26b048, aAppData=0x582df66c, aFlags=<optimized out>)
    at /Volumes/source/mozilla-central/toolkit/xre/nsAppRunner.cpp:3923
#62 0x62cfd63a in GeckoStart (data=0x5c679820, appData=0x582df66c) at /Volumes/source/mozilla-central/toolkit/xre/nsAndroidStartup.cpp:73
#63 0x582c7362 in Java_org_mozilla_gecko_GeckoAppShell_nativeRun (jenv=0x5c385358, jc=<optimized out>, jargs=0x20200005)
    at /Volumes/source/mozilla-central/mozglue/android/APKOpen.cpp:981
#64 0x4077b0b4 in dvmPlatformInvoke () from /Volumes/source/jimdb/moz-gdb/lib/HT24WW301665/system/lib/libdvm.so
#65 0x407aa266 in dvmCallJNIMethod(unsigned int const*, JValue*, Method const*, Thread*) ()
   from /Volumes/source/jimdb/moz-gdb/lib/HT24WW301665/system/lib/libdvm.so
Comment 64 Brad Lassey [:blassey] (use needinfo?) 2012-10-02 13:17:14 PDT
This is our #1 unpatched crasher. Can we take the patch we have as a bandaid and leave this open to investigate the core issue?
Comment 65 Jonathan Kew (:jfkthame) 2012-10-03 04:43:32 PDT
Are people seeing this crash with earlier Firefox releases on CM10 devices, or does it only start appearing with Aurora or Nightly builds? (If so, can we get some kind of regression range?)

I'm a bit reluctant to take a band-aid that just avoids the crash here but doesn't address the root cause, given that the evidence seems to be suggesting heap corruption of some kind. If rogue code is writing wildly to the heap, then crashing quickly is arguably the best outcome; if something besides the FT_Library modules list is being trashed, we could easily have an exploitable situation.
Comment 66 Daniel Lombraña González 2012-10-03 08:54:00 PDT
Hi everyone,

First of all, thanks to all of you for the effort you are putting into this issue. When I installed CM10, the Nightly version worked more or less well: you could open a web page, but when you opened a second one it crashed. Then new commits were introduced, and since then I cannot browse the web at all with the Nightly, because it crashes as soon as I open a web page (I cannot even paste crash reports here, as even that page crashes the browser).

I've also installed Firefox Beta from the Google Play store on my CM10 device, but it crashes too from time to time (sometimes in this browser I can even enable reader mode). In any case, if you need a new logcat or anything else, let me know. I'll be happy to help.

Best regards,

Daniel
Comment 67 Jonathan Kew (:jfkthame) 2012-10-03 09:18:34 PDT
If you're willing to spend some time testing, it might be helpful if you re-install older Nightly versions (mozilla-central-android builds) available from http://ftp.mozilla.org/pub/mozilla.org/mobile/nightly/ (they're in directories named according to date) to identify the specific build where the immediate-crash issue started happening. By going back a month or two (or however long it takes) to confirm when there was a working build, and then repeatedly bisecting the range since then, it shouldn't take more than a half-dozen or so tests to narrow down the date of the regression.
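
As a rough sketch of the bisection described above (the function and type names are made up for illustration, and it assumes that every build after the first crashing one also crashes): with dated builds numbered 0..N-1, a known-good index and a known-bad index, each test halves the remaining range.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Result of a bisection: index of the first crashing build and how many
// builds had to be installed and tested to find it.
struct BisectResult {
    std::size_t firstBad;
    int tests;
};

// Preconditions (assumed, not checked against real builds): the build at
// knownGood works, the build at knownBad crashes, and once a build crashes
// all later builds crash too. `crashes(i)` stands for "install build i and
// see whether it crashes".
inline BisectResult BisectBuilds(std::size_t knownGood, std::size_t knownBad,
                                 const std::function<bool(std::size_t)>& crashes) {
    assert(knownGood < knownBad);
    int tests = 0;
    while (knownBad - knownGood > 1) {
        std::size_t mid = knownGood + (knownBad - knownGood) / 2;
        ++tests;
        if (crashes(mid)) {
            knownBad = mid;   // regression is at mid or earlier
        } else {
            knownGood = mid;  // regression is after mid
        }
    }
    return {knownBad, tests};
}
```

For example, with about two months of nightlies (60 builds, indices 0 to 59), this needs at most six tests, which matches the "half-dozen or so" estimate.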
Comment 68 Brad Lassey [:blassey] (use needinfo?) 2012-10-03 10:14:35 PDT
Just to note, the STR for this bug is to enter reader mode. An immediate crash could be bug 792825, which was a regression from bug 788378, which was introduced on 9/18.
Comment 69 Julian Seward [:jseward] 2012-10-03 10:52:15 PDT
To clarify, then, the invalid writes I saw in comment #59 happen after
the page has finished loading, but before entering reader mode --
before I even pressed on the book-y icon to enter reader mode.

Brad, just to be unambiguous about this, can you nominate a specific
URL that causes the crash when entering reader mode?  So that we can
all be clear that we're talking about the same failure.
Comment 70 Brad Lassey [:blassey] (use needinfo?) 2012-10-03 12:38:44 PDT
At least some of the invalid writes here were bug 795234. Julian is going to re-run Valgrind with a build that has that fix.
Comment 71 Brad Lassey [:blassey] (use needinfo?) 2012-10-03 17:14:48 PDT
*** Bug 756140 has been marked as a duplicate of this bug. ***
Comment 72 Julian Seward [:jseward] 2012-10-03 17:44:23 PDT
I verified that the fix for bug 795234 gets rid of the invalid writes
noted in comment #59.  And with that fix in place, I can't get it to
crash, nor can I observe any memory corruption, when entering reader
mode in a cnn.com article, on ICS.  So maybe this is CM10 specific.
Comment 73 Daniel Lombraña González 2012-10-04 03:33:22 PDT
Hi,

Is the fix available in the latest nightly? I would test it right away :-)

Daniel
Comment 74 Brad Lassey [:blassey] (use needinfo?) 2012-10-04 08:35:56 PDT
The fix for bug 795234 is in the latest nightly, but the bandaid fix for this bug is not.
Comment 75 Brad Lassey [:blassey] (use needinfo?) 2012-10-04 09:45:35 PDT
This prevents the crash as well:

diff --git a/modules/freetype2/include/freetype/internal/ftobjs.h b/modules/free
--- a/modules/freetype2/include/freetype/internal/ftobjs.h
+++ b/modules/freetype2/include/freetype/internal/ftobjs.h
@@ -436,10 +436,11 @@ FT_BEGIN_HEADER
   /*                                                                       */
   /*    memory  :: A handle to the memory manager.                         */
   /*                                                                       */
   typedef struct  FT_ModuleRec_
   {
+    char              buffer[1024];
     FT_Module_Class*  clazz;
     FT_Library        library;
     FT_Memory         memory;
 
   } FT_ModuleRec;
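
One possible reading of why padding like this hides the crash (an interpretation, not something established in this bug): it changes the record's size and moves its pointer fields, so a stray write that used to hit one of them may now land in the padding or in some other allocation. The sketch below uses simplified stand-in structs, not the real FreeType types, and only measures the shift.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-ins for the three pointer fields of the record in the
// diff above; these are NOT the real FreeType declarations.
struct ModuleRecOriginal {
    void* clazz;
    void* library;
    void* memory;
};

struct ModuleRecPadded {
    char  buffer[1024];  // padding added at the front, as in the diff
    void* clazz;
    void* library;
    void* memory;
};

// How far the first pointer field moved relative to the start of the record.
constexpr std::size_t ClazzShift() {
    return offsetof(ModuleRecPadded, clazz) - offsetof(ModuleRecOriginal, clazz);
}

// Usage: the fields move by exactly the size of the padding.
inline void DemoPaddingShift() {
    assert(ClazzShift() == 1024);
    assert(sizeof(ModuleRecPadded) == sizeof(ModuleRecOriginal) + 1024);
}
```

If that reading is right, the padding only relocates the damage; it does not stop the bad write.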
Comment 76 Jonathan Kew (:jfkthame) 2012-10-04 10:05:01 PDT
So at this point, we know that the FT_Library created by gfxAndroidPlatform has non-NULL module pointers when it is initially created, but at some later time when we try to use it, one of those pointers has become NULL. What we don't know is who changed it to NULL, or why.

In the absence of data watchpoints, which would be the obvious tool, maybe you could put the module-pointer-validation code (with an abort as soon as the NULL is found) into a global function in gfxAndroidPlatform and then call it from strategic places - e.g. each time around the browser's event loop, for a start - to try and narrow down when the change is happening.
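
A minimal sketch of such a checkpoint, assuming a stand-in LibraryLike struct with a count and an array of module pointers. The struct, the function names and the 32-slot limit are hypothetical and chosen for illustration; the real check would walk the actual library object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for a library object that owns module pointers.
struct LibraryLike {
    std::size_t numModules;
    const void* modules[32];
};

// Returns the index of the first NULL module pointer, or -1 if all are set.
inline int FirstNullModule(const LibraryLike& lib) {
    assert(lib.numModules <= 32);
    for (std::size_t i = 0; i < lib.numModules; ++i) {
        if (lib.modules[i] == nullptr) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Call from strategic places (e.g. once per event-loop iteration).
// `where` labels the call site so the abort message shows which
// checkpoint first observed the corruption.
inline void CheckModulesOrAbort(const LibraryLike& lib, const char* where) {
    int bad = FirstNullModule(lib);
    if (bad >= 0) {
        std::fprintf(stderr, "module %d is NULL at checkpoint %s\n", bad, where);
        std::abort();
    }
}
```

The corruption then happened somewhere between the last checkpoint that passed and the first one that aborted; adding checkpoints inside that window narrows it further.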
Comment 77 Tyler Downer [:Tyler] 2012-10-05 10:26:04 PDT
As discussed in the Mobile QA meeting today, this CM10 crash is still popping up in Firefox Beta 16, so this looks like it will be on the release channel next week. 
https://input.mozilla.org/opinion/3242204
https://input.mozilla.org/opinion/3241521
Comment 78 Alex Keybl [:akeybl] 2012-10-05 10:41:53 PDT
(In reply to Tyler Downer [:Tyler] from comment #77)
> As discussed in the Mobile QA meeting today, this CM10 crash is still
> popping up in Firefox Beta 16, so this looks like it will be on the release
> channel next week. 
> https://input.mozilla.org/opinion/3242204
> https://input.mozilla.org/opinion/3241521

This bug is not marked as FF15/16 tracking/affected by those involved with stability. I believe the crashes you're seeing here are bug 756140 and were already present in FF15.

Does that match other people's expectations? Kevin - can you see if you can reproduce on FF15/16 with CM10?
Comment 79 James Willcox (:snorp) (jwillcox@mozilla.com) 2012-10-05 10:45:45 PDT
I was able to get valgrind going, here is the log from the run: http://people.mozilla.org/~jwillcox/valgrind-logcat-2.txt

libz seems to be overrunning. If I replace the libz on the system with the stock libz from jellybean the crash goes away. I'm doing a valgrind run with that right now to see what it looks like.
Comment 80 Brad Lassey [:blassey] (use needinfo?) 2012-10-05 11:09:54 PDT
Created attachment 668513 [details]
patch to force us to use in tree setlcdfilter

It looks like cairo is picking up the system setlcdfilter function, and that's what's stomping on our memory.
Comment 81 Brad Lassey [:blassey] (use needinfo?) 2012-10-05 11:19:00 PDT
Created attachment 668517 [details] [diff] [review]
patch to force us to use in tree setlcdfilter

Updated to include the patch and a comment in the README.
Comment 82 Tyler Downer [:Tyler] 2012-10-05 12:02:41 PDT
(In reply to Alex Keybl [:akeybl] from comment #78)
Those input reports don't indicate closing, however (which is a different set of input we are seeing). So I'm not sure.
Comment 83 Jonathan Kew (:jfkthame) 2012-10-05 12:12:40 PDT
Comment on attachment 668517 [details] [diff] [review]
patch to force us to use in tree setlcdfilter

Review of attachment 668517 [details] [diff] [review]:
-----------------------------------------------------------------

This looks correct to me - running with in-tree freetype but calling this function from system freetype sure sounds like a recipe for trouble!

(It also explains why the problem is system-dependent, as we might get away with the error if the system and in-tree versions happen to be compatible.)
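
To make the compatibility point concrete, here is a sketch with two invented layouts of "the same" record. InTreeRec and SystemRec are hypothetical and do not reflect the actual FreeType structures on any device. Code compiled against one layout, handed an object allocated with the other, writes at the wrong offset unless the two layouts happen to agree.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout used by the library version we allocate objects with.
struct InTreeRec {
    int refcount;
    int lcdFilter;
};

// Hypothetical layout assumed by a function from a different version.
struct SystemRec {
    int  refcount;
    char extra[64];  // field that exists only in this version
    int  lcdFilter;
};

// True if the two versions agree on where lcdFilter lives and on the size.
constexpr bool LayoutsAgree() {
    return offsetof(SystemRec, lcdFilter) == offsetof(InTreeRec, lcdFilter) &&
           sizeof(SystemRec) == sizeof(InTreeRec);
}

// True if a function compiled against SystemRec, given an InTreeRec,
// would store lcdFilter beyond the end of the object it was handed.
constexpr bool WriteWouldEscapeObject() {
    return offsetof(SystemRec, lcdFilter) + sizeof(int) > sizeof(InTreeRec);
}

// Usage: with these layouts the write lands outside the in-tree object.
inline void DemoLayoutMismatch() {
    assert(!LayoutsAgree());
    assert(WriteWouldEscapeObject());
}
```

The sketch only computes offsets and never performs the bad write. On a system where the layouts agree, the same call is harmless, which fits the system-dependent behaviour noted above.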
Comment 84 Brad Lassey [:blassey] (use needinfo?) 2012-10-05 14:56:48 PDT
https://hg.mozilla.org/mozilla-central/rev/acd25563db2f
Comment 85 Brad Lassey [:blassey] (use needinfo?) 2012-10-05 15:38:57 PDT
Comment on attachment 668517 [details] [diff] [review]
patch to force us to use in tree setlcdfilter

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: 
Testing completed (on m-c, etc.): 
Risk to taking this patch (and alternatives if risky): 
String or UUID changes made by this patch:
Comment 86 Alex Keybl [:akeybl] 2012-10-05 17:08:33 PDT
Comment on attachment 668517 [details] [diff] [review]
patch to force us to use in tree setlcdfilter

[Triage Comment]
We'll want to see crash volume drop significantly after this lands on FF17, before considering taking in a 16.0.1.
Comment 87 Alex Keybl [:akeybl] 2012-10-05 17:08:50 PDT
(please land asap to make Monday's merge)
Comment 88 Mark Finkle (:mfinkle) (use needinfo?) 2012-10-05 17:23:48 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/bf4ea2d4dd97
Comment 89 Scoobidiver (away) 2012-10-06 08:36:53 PDT
(In reply to Daniel Lombraña González from comment #73)
> Is the fix available in the latest nightly? I would test it right away :-)
Please, can you verify it's fixed for you?
Comment 90 Scoobidiver (away) 2012-10-07 01:11:26 PDT
The ft_module_get_service crashes in Trunk and XUL crashes in Aurora have stopped after builds from October 5.
Comment 91 Scoobidiver (away) 2012-10-07 01:13:47 PDT
*** Bug 795570 has been marked as a duplicate of this bug. ***
Comment 92 Daniel Lombraña González 2012-10-07 05:16:52 PDT
Hi everyone,

THANKS!!!!! It seems to be working right now, even the reader mode!!!! I'll be in the #mozfestival so if someone is around I would love to say thanks in person :-D 

Cheers,

Daniel
Comment 93 Aaron Train [:aaronmt] 2012-10-07 07:56:30 PDT
*** Bug 791781 has been marked as a duplicate of this bug. ***
Comment 94 Kevin Brosnan [:kbrosnan] 2012-10-08 15:00:53 PDT
*** Bug 756140 has been marked as a duplicate of this bug. ***
Comment 95 Alex Keybl [:akeybl] 2012-10-10 10:06:06 PDT
Comment on attachment 668517 [details] [diff] [review]
patch to force us to use in tree setlcdfilter

[Triage Comment]
Approving for inclusion in 16.0.1. This patch introduces almost zero risk, since affected users are already crashing constantly.
Comment 96 Aaron Train [:aaronmt] 2012-10-10 17:27:49 PDT
This landed as https://hg.mozilla.org/releases/mozilla-release/rev/10eed4e3cfd2

Verified fixed on mozilla-release (16.0.1) via my TF201/CM10 Nightly
