Last Comment Bug 719531 - FallibleTArray uses infallible malloc
: FallibleTArray uses infallible malloc
Status: RESOLVED FIXED
[qa-]
: regression
Product: Core
Classification: Components
Component: General (show other bugs)
: 8 Branch
: All All
: -- normal (vote)
: mozilla12
Assigned To: Justin Lebar (not reading bugmail)
:
Mentors:
Depends on: 723809
Blocks: 680556 702217
  Show dependency treegraph
 
Reported: 2012-01-19 11:43 PST by Gian-Carlo Pascutto [:gcp]
Modified: 2012-07-09 15:07 PDT (History)
10 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
fixed
-
fixed
14+
wontfix


Attachments
Part 1, v1: Add test that FallibleTArray returns false instead of crashing. (2.11 KB, patch)
2012-01-19 13:07 PST, Justin Lebar (not reading bugmail)
benjamin: review+
akeybl: approval‑mozilla‑beta+
Details | Diff | Splinter Review
Part 2, v2: Make FallibleTArray use moz_malloc. (1.48 KB, patch)
2012-01-19 13:07 PST, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Part 2, v3: Make FallibleTArray use moz_malloc or malloc, as appropriate. (2.13 KB, patch)
2012-01-20 12:25 PST, Justin Lebar (not reading bugmail)
benjamin: review+
bugzilla: approval‑mozilla‑beta+
Details | Diff | Splinter Review

Description Gian-Carlo Pascutto [:gcp] 2012-01-19 11:43:24 PST
From bug 702217:

https://crash-stats.mozilla.com/report/index/1d2909f4-3a8f-461e-9948-0c0772120118

mozalloc_abort 	     
mozalloc_handle_oom  
nsTArray_base<nsTArrayFallibleAllocator>::EnsureCapacity

SetCapacity is being called on a FallibleTArray, and it's causing an OOM abort instead of returning false.
Comment 1 Justin Lebar (not reading bugmail) 2012-01-19 13:07:16 PST
Created attachment 589962 [details] [diff] [review]
Part 1, v1: Add test that FallibleTArray returns false instead of crashing.
Comment 2 Justin Lebar (not reading bugmail) 2012-01-19 13:07:29 PST
Created attachment 589963 [details] [diff] [review]
Part 2, v2: Make FallibleTArray use moz_malloc.
Comment 3 Justin Lebar (not reading bugmail) 2012-01-19 13:09:32 PST
Hopefully failing test only: https://tbpl.mozilla.org/?tree=Try&rev=8d6754fb4812
Test plus fix: https://tbpl.mozilla.org/?tree=Try&rev=c6191becddf1
Comment 4 Justin Lebar (not reading bugmail) 2012-01-19 13:39:04 PST
Messed up the try syntax on the first push above.  Here's another attempt at failing test only: 

  https://tbpl.mozilla.org/?tree=Try&rev=181bd403fc6e
Comment 5 Justin Lebar (not reading bugmail) 2012-01-19 15:12:09 PST
Test (without the fix) fails with OOM, which is great.
Comment 6 Mozilla RelEng Bot 2012-01-19 15:45:44 PST
Try run for 181bd403fc6e is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=181bd403fc6e
Results (out of 29 total builds):
    success: 25
    warnings: 4
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-181bd403fc6e
Comment 7 Justin Lebar (not reading bugmail) 2012-01-19 17:58:25 PST
With the fix, the test doesn't fail, but we get link errors on Windows in what appears to be a C++ unit test.   :-/
Comment 8 Mozilla RelEng Bot 2012-01-19 19:01:00 PST
Try run for c6191becddf1 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=c6191becddf1
Results (out of 162 total builds):
    success: 139
    warnings: 20
    failure: 3
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-c6191becddf1
Comment 9 Justin Lebar (not reading bugmail) 2012-01-20 12:25:01 PST
Created attachment 590296 [details] [diff] [review]
Part 2, v3: Make FallibleTArray use moz_malloc or malloc, as appropriate.
Comment 10 Justin Lebar (not reading bugmail) 2012-01-20 14:14:00 PST
https://tbpl.mozilla.org/?tree=Try&rev=0cdc12275ed7
Comment 11 Mozilla RelEng Bot 2012-01-20 18:45:36 PST
Try run for 0cdc12275ed7 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=0cdc12275ed7
Results (out of 165 total builds):
    success: 132
    warnings: 28
    failure: 5
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-0cdc12275ed7
Comment 12 Mozilla RelEng Bot 2012-01-21 16:01:06 PST
Try run for b3b4409d61a3 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=b3b4409d61a3
Results (out of 95 total builds):
    success: 75
    warnings: 11
    failure: 9
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-b3b4409d61a3
 Timed out after 06 hours without completing.
Comment 13 Justin Lebar (not reading bugmail) 2012-01-22 03:25:10 PST
Hm, I could use a bit of help here:

nsTArray is sometimes built without access to mozmemory.  What the current patch tries to do is, if mozmemory is not defined, fall back to raw malloc/free.  But try run in comment 12 shows that compile burning on Linux/Mac, where malloc/free are not defined.

What should our fallback be when there's no mozmemory?  We used to use NS_Alloc, but that's infallible, and it looks like the fallback should be fallible.

Also, is it a problem to use a different allocator for code which doesn't include mozalloc?  One of our constraints is that fallible and infallible need to share a free(); does the set of allocators we use when there's no mozalloc need to use moz_free() or a synonym?
Comment 14 Benjamin Smedberg [:bsmedberg] 2012-01-22 08:04:26 PST
I think that malloc/free are the correct choice in the case the mozalloc is not available. I don't understand the bit about the compile errors, wouldn't you just need to #include <stdlib.h> ?
Comment 15 Justin Lebar (not reading bugmail) 2012-01-22 10:22:27 PST
> wouldn't you just need to #include <stdlib.h> ?

Oh, probably!

https://tbpl.mozilla.org/?tree=Try&rev=26843cff372c
Comment 16 Mozilla RelEng Bot 2012-01-22 14:15:30 PST
Try run for 26843cff372c is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=26843cff372c
Results (out of 206 total builds):
    success: 173
    warnings: 32
    failure: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-26843cff372c
Comment 19 Gian-Carlo Pascutto [:gcp] 2012-01-30 00:47:52 PST
Comment on attachment 590296 [details] [diff] [review]
Part 2, v3: Make FallibleTArray use moz_malloc or malloc, as appropriate.

[Approval Request Comment]
Can we nominate this for Aurora and get it along still? Otherwise some of the fixes in Bug 702217 won't work, and that bug is a topcrasher.
Jusin & Benjamin: can you comment on the risk of taking this (if any)?
Comment 20 Justin Lebar (not reading bugmail) 2012-01-30 06:40:26 PST
There's some risk: We don't know if other users of FallibleTArray are buggy in a way we'll see only with this patch.

I might still take this patch to fix the topcrasher, though.
Comment 21 Gian-Carlo Pascutto [:gcp] 2012-01-30 06:43:49 PST
>We don't know if other users of FallibleTArray are buggy in a way we'll see only 
>with this patch.

Such bugs would always start with: *instead* of crashing the browser..., right?
Comment 22 Justin Lebar (not reading bugmail) 2012-01-30 06:50:12 PST
(In reply to Gian-Carlo Pascutto (:gcp) from comment #21)
> >We don't know if other users of FallibleTArray are buggy in a way we'll see only 
> >with this patch.
> 
> Such bugs would always start with: *instead* of crashing the browser...,
> right?

Yes, indeed.  For example, "Instead of safely crashing the browser on OOM, we ignore an allocation error and introduce a security hole."  :-/

Of course, by not pushing to Aurora, we may simply be delaying the inevitable -- that is, if there are security holes or other bugs, we're not likely to find them in six weeks of nightly, regardless of whether we take this patch upstream.
Comment 23 Benjamin Smedberg [:bsmedberg] 2012-01-30 08:03:17 PST
Note that this is basically reverting to the prior intended behavior, and we added FallibleTArray explicitly for this purpose, so it's relatively unlikely to cause bad behavior. I think we should take this path on Aurora.
Comment 24 Johnathan Nightingale [:johnath] 2012-01-31 14:49:43 PST
Comment on attachment 590296 [details] [diff] [review]
Part 2, v3: Make FallibleTArray use moz_malloc or malloc, as appropriate.

[Triage Comment]
This is now on Aurora post-migration, so this is a beta approval request. We think it still makes sense to take if we can get it RIGHT NOW.
Comment 25 Justin Lebar (not reading bugmail) 2012-01-31 21:23:40 PST
Landed as the first checkin to mozilla-beta after the merge.  So this change should be in Firefox 11.

(Actually, I landed on a completely red tree, which was perhaps not ideal. :)

https://hg.mozilla.org/releases/mozilla-beta/rev/8bf4b03ed889
https://hg.mozilla.org/releases/mozilla-beta/rev/a532dbd01acf
Comment 26 Mark Banner (:standard8) 2012-02-01 00:26:48 PST
This landing broke mozilla-beta. Windows opt builds:

d:/mozilla-build/python25/python2.5.exe /e/builds/moz2_slave/m-beta-w32/build/config/pythonpath.py -I../../config /e/builds/moz2_slave/m-beta-w32/build/config/expandlibs_exec.py --uselist -- link -NOLOGO -OUT:firefox.exe -PDB:firefox.pdb -SUBSYSTEM:WINDOWS -ENTRY:wmainCRTStartup -LARGEADDRESSAWARE -NXCOMPAT -DYNAMICBASE -SAFESEH  -DEBUG -DEBUGTYPE:CV -DEBUG -OPT:REF -LTCG:PGINSTRUMENT   /HEAP:0x40000 -LIBPATH:../../dist/lib -NODEFAULTLIB:msvcrt -NODEFAULTLIB:msvcrtd -NODEFAULTLIB:msvcprt -NODEFAULTLIB:msvcprtd -DEFAULTLIB:mozcrt nsBrowserApp.obj ./module.res  e:/builds/moz2_slave/m-beta-w32/build/obj-firefox/dist/lib/xpcomglue.lib   kernel32.lib user32.lib gdi32.lib winmm.lib wsock32.lib advapi32.lib comctl32.lib comdlg32.lib uuid.lib shell32.lib ole32.lib oleaut32.lib version.lib winspool.lib usp10.lib msimg32.lib
nsBrowserApp.obj : error LNK2001: unresolved external symbol __imp__moz_xmalloc
nsBrowserApp.obj : error LNK2001: unresolved external symbol __imp__moz_free
xpcomglue.lib(nsStringAPI.obj) : error LNK2001: unresolved external symbol __imp__moz_xrealloc
firefox.exe : fatal error LNK1120: 3 unresolved externals

and Windows debug builds: 

d:/mozilla-build/python25/python2.5.exe /e/builds/moz2_slave/m-beta-w32-dbg/build/config/pythonpath.py -I../../../config /e/builds/moz2_slave/m-beta-w32-dbg/build/config/expandlibs_exec.py --uselist -- link -nologo -out:nsTestSample.exe -pdb:nsTestSample.pdb nsTestSample.obj  -LARGEADDRESSAWARE -NXCOMPAT -DYNAMICBASE -SAFESEH  -DEBUG -DEBUGTYPE:CV   e:/builds/moz2_slave/m-beta-w32-dbg/build/obj-firefox/dist/lib/mozglue.lib e:/builds/moz2_slave/m-beta-w32-dbg/build/obj-firefox/dist/lib/xpcomglue.lib   kernel32.lib user32.lib gdi32.lib winmm.lib wsock32.lib advapi32.lib
xpcomglue.lib(nsGlueLinkingWin.obj) : error LNK2019: unresolved external symbol __imp__moz_xmalloc referenced in function "void * __cdecl operator new(unsigned int)" (??2@YAPAXI@Z)
xpcomglue.lib(nsGlueLinkingWin.obj) : error LNK2019: unresolved external symbol __imp__moz_free referenced in function "void __cdecl operator delete(void *)" (??3@YAXPAX@Z)
nsTestSample.exe : fatal error LNK1120: 2 unresolved externals
Comment 27 Justin Lebar (not reading bugmail) 2012-02-01 06:52:02 PST
That's...odd.
Comment 28 Justin Lebar (not reading bugmail) 2012-02-01 08:29:54 PST
Fix pushed (works locally): https://hg.mozilla.org/releases/mozilla-beta/rev/12c84a8fdd87
Comment 29 Alex Keybl [:akeybl] 2012-02-02 07:13:34 PST
Comment on attachment 589962 [details] [diff] [review]
Part 1, v1: Add test that FallibleTArray returns false instead of crashing.

[Triage Comment]
Presumably this is needed to go along with the other patch on Aurora 12.
Comment 30 Alex Keybl [:akeybl] 2012-02-02 07:13:50 PST
*Beta 12
Comment 32 Scoobidiver (away) 2012-06-14 23:38:18 PDT
It's needed in ESR if the patch of bug 702217 lands also.
Comment 33 Justin Lebar (not reading bugmail) 2012-07-05 16:21:54 PDT
Comment on attachment 589962 [details] [diff] [review]
Part 1, v1: Add test that FallibleTArray returns false instead of crashing.

esr10? per bug 702217 comment 80.
Comment 34 Justin Lebar (not reading bugmail) 2012-07-06 06:12:34 PDT
Could you please confirm that the right thing to do is not land this unless bug 702217 is going to land on ESR?  It sounds like we're not going to land bug 702217.
Comment 35 Lukas Blakk [:lsblakk] use ?needinfo 2012-07-09 15:07:03 PDT
Wontfixing for ESR since this depends on bug 702217 which in turn relies on Telemetry code that doesn't exist in the esr10 branch.  We'll have to take this when it comes for free in the next ESR cut for 17.

Note You need to log in before you can comment on or make changes to this bug.