Closed Bug 719531 Opened 12 years ago Closed 12 years ago

FallibleTArray uses infallible malloc

Categories

(Core :: General, defect)

8 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla12
Tracking Status
firefox11 - fixed
firefox12 - fixed
firefox-esr10 14+ wontfix

People

(Reporter: gcp, Assigned: justin.lebar+bug)

References

Details

(Keywords: regression, Whiteboard: [qa-])

Attachments

(2 files, 1 obsolete file)

From bug 702217:

https://crash-stats.mozilla.com/report/index/1d2909f4-3a8f-461e-9948-0c0772120118

mozalloc_abort 	     
mozalloc_handle_oom  
nsTArray_base<nsTArrayFallibleAllocator>::EnsureCapacity

SetCapacity is being called on a FallibleTArray, and it's causing an OOM abort instead of returning false.
Assignee: nobody → justin.lebar+bug
OS: Windows Vista → All
Version: 8 Branch → Trunk
Assignee: justin.lebar+bug → nobody
Blocks: 680556
Keywords: regression
OS: All → Windows Vista
Version: Trunk → 8 Branch
Blocks: 702217
Messed up the try syntax on the first push above.  Here's another attempt at failing test only: 

  https://tbpl.mozilla.org/?tree=Try&rev=181bd403fc6e
Test (without the fix) fails with OOM, which is great.
Try run for 181bd403fc6e is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=181bd403fc6e
Results (out of 29 total builds):
    success: 25
    warnings: 4
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-181bd403fc6e
With the fix, the test doesn't fail, but we get link errors on Windows in what appears to be a C++ unit test.   :-/
Try run for c6191becddf1 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=c6191becddf1
Results (out of 162 total builds):
    success: 139
    warnings: 20
    failure: 3
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-c6191becddf1
OS: Windows Vista → All
Assignee: nobody → justin.lebar+bug
Try run for 0cdc12275ed7 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=0cdc12275ed7
Results (out of 165 total builds):
    success: 132
    warnings: 28
    failure: 5
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-0cdc12275ed7
Try run for b3b4409d61a3 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=b3b4409d61a3
Results (out of 95 total builds):
    success: 75
    warnings: 11
    failure: 9
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-b3b4409d61a3
 Timed out after 06 hours without completing.
Hm, I could use a bit of help here:

nsTArray is sometimes built without access to mozmemory.  What the current patch tries to do is, if mozmemory is not defined, fall back to raw malloc/free.  But try run in comment 12 shows that compile burning on Linux/Mac, where malloc/free are not defined.

What should our fallback be when there's no mozmemory?  We used to use NS_Alloc, but that's infallible, and it looks like the fallback should be fallible.

Also, is it a problem to use a different allocator for code which doesn't include mozalloc?  One of our constraints is that fallible and infallible need to share a free(); does the set of allocators we use when there's no mozalloc need to use moz_free() or a synonym?
Summary: FallibleTArray can cause OOM aborts → FallibleTArray uses infallible malloc
I think that malloc/free are the correct choice in the case the mozalloc is not available. I don't understand the bit about the compile errors, wouldn't you just need to #include <stdlib.h> ?
> wouldn't you just need to #include <stdlib.h> ?

Oh, probably!

https://tbpl.mozilla.org/?tree=Try&rev=26843cff372c
Attachment #589962 - Flags: review?(benjamin)
Attachment #590296 - Flags: review?(benjamin)
Try run for 26843cff372c is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=26843cff372c
Results (out of 206 total builds):
    success: 173
    warnings: 32
    failure: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-26843cff372c
Attachment #589962 - Flags: review?(benjamin) → review+
Attachment #590296 - Flags: review?(benjamin) → review+
https://hg.mozilla.org/mozilla-central/rev/e8c23cd5fc79
https://hg.mozilla.org/mozilla-central/rev/1d68a3277ada
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla12
Attachment #589962 - Flags: approval-mozilla-aurora?
Comment on attachment 590296 [details] [diff] [review]
Part 2, v3: Make FallibleTArray use moz_malloc or malloc, as appropriate.

[Approval Request Comment]
Can we nominate this for Aurora and get it along still? Otherwise some of the fixes in Bug 702217 won't work, and that bug is a topcrasher.
Jusin & Benjamin: can you comment on the risk of taking this (if any)?
Attachment #590296 - Flags: approval-mozilla-aurora?
There's some risk: We don't know if other users of FallibleTArray are buggy in a way we'll see only with this patch.

I might still take this patch to fix the topcrasher, though.
>We don't know if other users of FallibleTArray are buggy in a way we'll see only 
>with this patch.

Such bugs would always start with: *instead* of crashing the browser..., right?
(In reply to Gian-Carlo Pascutto (:gcp) from comment #21)
> >We don't know if other users of FallibleTArray are buggy in a way we'll see only 
> >with this patch.
> 
> Such bugs would always start with: *instead* of crashing the browser...,
> right?

Yes, indeed.  For example, "Instead of safely crashing the browser on OOM, we ignore an allocation error and introduce a security hole."  :-/

Of course, by not pushing to Aurora, we may simply be delaying the inevitable -- that is, if there are security holes or other bugs, we're not likely to find them in six weeks of nightly, regardless of whether we take this patch upstream.
Note that this is basically reverting to the prior intended behavior, and we added FallibleTArray explicitly for this purpose, so it's relatively unlikely to cause bad behavior. I think we should take this path on Aurora.
Comment on attachment 590296 [details] [diff] [review]
Part 2, v3: Make FallibleTArray use moz_malloc or malloc, as appropriate.

[Triage Comment]
This is now on Aurora post-migration, so this is a beta approval request. We think it still makes sense to take if we can get it RIGHT NOW.
Attachment #590296 - Flags: approval-mozilla-aurora? → approval-mozilla-beta+
Landed as the first checkin to mozilla-beta after the merge.  So this change should be in Firefox 11.

(Actually, I landed on a completely red tree, which was perhaps not ideal. :)

https://hg.mozilla.org/releases/mozilla-beta/rev/8bf4b03ed889
https://hg.mozilla.org/releases/mozilla-beta/rev/a532dbd01acf
This landing broke mozilla-beta. Windows opt builds:

d:/mozilla-build/python25/python2.5.exe /e/builds/moz2_slave/m-beta-w32/build/config/pythonpath.py -I../../config /e/builds/moz2_slave/m-beta-w32/build/config/expandlibs_exec.py --uselist -- link -NOLOGO -OUT:firefox.exe -PDB:firefox.pdb -SUBSYSTEM:WINDOWS -ENTRY:wmainCRTStartup -LARGEADDRESSAWARE -NXCOMPAT -DYNAMICBASE -SAFESEH  -DEBUG -DEBUGTYPE:CV -DEBUG -OPT:REF -LTCG:PGINSTRUMENT   /HEAP:0x40000 -LIBPATH:../../dist/lib -NODEFAULTLIB:msvcrt -NODEFAULTLIB:msvcrtd -NODEFAULTLIB:msvcprt -NODEFAULTLIB:msvcprtd -DEFAULTLIB:mozcrt nsBrowserApp.obj ./module.res  e:/builds/moz2_slave/m-beta-w32/build/obj-firefox/dist/lib/xpcomglue.lib   kernel32.lib user32.lib gdi32.lib winmm.lib wsock32.lib advapi32.lib comctl32.lib comdlg32.lib uuid.lib shell32.lib ole32.lib oleaut32.lib version.lib winspool.lib usp10.lib msimg32.lib
nsBrowserApp.obj : error LNK2001: unresolved external symbol __imp__moz_xmalloc
nsBrowserApp.obj : error LNK2001: unresolved external symbol __imp__moz_free
xpcomglue.lib(nsStringAPI.obj) : error LNK2001: unresolved external symbol __imp__moz_xrealloc
firefox.exe : fatal error LNK1120: 3 unresolved externals

and Windows debug builds: 

d:/mozilla-build/python25/python2.5.exe /e/builds/moz2_slave/m-beta-w32-dbg/build/config/pythonpath.py -I../../../config /e/builds/moz2_slave/m-beta-w32-dbg/build/config/expandlibs_exec.py --uselist -- link -nologo -out:nsTestSample.exe -pdb:nsTestSample.pdb nsTestSample.obj  -LARGEADDRESSAWARE -NXCOMPAT -DYNAMICBASE -SAFESEH  -DEBUG -DEBUGTYPE:CV   e:/builds/moz2_slave/m-beta-w32-dbg/build/obj-firefox/dist/lib/mozglue.lib e:/builds/moz2_slave/m-beta-w32-dbg/build/obj-firefox/dist/lib/xpcomglue.lib   kernel32.lib user32.lib gdi32.lib winmm.lib wsock32.lib advapi32.lib
xpcomglue.lib(nsGlueLinkingWin.obj) : error LNK2019: unresolved external symbol __imp__moz_xmalloc referenced in function "void * __cdecl operator new(unsigned int)" (??2@YAPAXI@Z)
xpcomglue.lib(nsGlueLinkingWin.obj) : error LNK2019: unresolved external symbol __imp__moz_free referenced in function "void __cdecl operator delete(void *)" (??3@YAXPAX@Z)
nsTestSample.exe : fatal error LNK1120: 2 unresolved externals
That's...odd.
Whiteboard: [qa-]
Comment on attachment 589962 [details] [diff] [review]
Part 1, v1: Add test that FallibleTArray returns false instead of crashing.

[Triage Comment]
Presumably this is needed to go along with the other patch on Aurora 12.
Attachment #589962 - Flags: approval-mozilla-aurora? → approval-mozilla-beta+
*Beta 12
Depends on: 723809
It's needed in ESR if the patch of bug 702217 lands also.
Comment on attachment 589962 [details] [diff] [review]
Part 1, v1: Add test that FallibleTArray returns false instead of crashing.

esr10? per bug 702217 comment 80.
Attachment #589962 - Flags: approval-mozilla-esr10?
Attachment #590296 - Flags: approval-mozilla-esr10?
Attachment #589962 - Flags: approval-mozilla-esr10? → approval-mozilla-esr10+
Attachment #590296 - Flags: approval-mozilla-esr10? → approval-mozilla-esr10+
Could you please confirm that the right thing to do is not land this unless bug 702217 is going to land on ESR?  It sounds like we're not going to land bug 702217.
Wontfixing for ESR since this depends on bug 702217 which in turn relies on Telemetry code that doesn't exist in the esr10 branch.  We'll have to take this when it comes for free in the next ESR cut for 17.
Attachment #589962 - Flags: approval-mozilla-esr10+
Attachment #590296 - Flags: approval-mozilla-esr10+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: