Last Comment Bug 460713 - [FIX]Memory exhaustion setting HTMLSelect.length() [GSEC-TZO-26-2009]
: [FIX]Memory exhaustion setting HTMLSelect.length() [GSEC-TZO-26-2009]
Status: VERIFIED FIXED
[sg:dos][hold disclosure, see comment...
: verified1.8.1.19, verified1.9.0.5
Product: Core
Classification: Components
Component: General (show other bugs)
: 1.9.0 Branch
: x86 Windows XP
: -- critical (vote)
: ---
Assigned To: Boris Zbarsky [:bz]
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-10-19 20:20 PDT by Thierry Zoller
Modified: 2009-07-21 15:56 PDT (History)
13 users (show)
bzbarsky: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
POC (1.34 KB, text/html)
2008-10-19 20:21 PDT, Thierry Zoller
no flags Details
replaced this with foobar (1.35 KB, text/html)
2008-10-20 06:56 PDT, Thierry Zoller
no flags Details
Video showing the POC, Memory load + exceptions (424.80 KB, application/octet-stream)
2008-10-21 16:54 PDT, Thierry Zoller
no flags Details
Talked to some Apple and Opera guys at the WHATWG social, and we decided this was a good number (2.05 KB, patch)
2008-10-28 22:14 PDT, Boris Zbarsky [:bz]
jst: review+
jst: superreview+
mbeltzner: approval1.9.1b2+
dveditz: approval1.8.1.19+
dveditz: approval1.9.0.5+
Details | Diff | Splinter Review
1.8.0 branch patch (1.55 KB, patch)
2008-12-08 06:41 PST, Martin Stránský
asac: approval1.8.0.next+
Details | Diff | Splinter Review

Description Thierry Zoller 2008-10-19 20:20:02 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3

Possible Integer miscaluclation in select.length()

Triggers huge memory allocations and then first and second chance access violations.

Reproducible: Always

Steps to Reproduce:
1. open attachement
2. click on POC
3.



First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000000 ebx=056ca0b0 ecx=00000000 edx=00290040 esi=069a7d8c edi=00000000
eip=605eea6b esp=0013e378 ebp=0013e5c8 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010246
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Program Files\Mozilla Firefox\xul.dll - 
xul!NS_NewLocalFile_P+0xddb:
605eea6b 8b7718          mov     esi,dword ptr [edi+18h] ds:0023:00000018=????????
0:000> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************

FAULTING_IP: 
xul!NS_NewLocalFile_P+ddb
605eea6b 8b7718          mov     esi,dword ptr [edi+18h]

EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 605eea6b (xul!NS_NewLocalFile_P+0x00000ddb)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000000
   Parameter[1]: 00000018
Attempt to read from address 00000018

FAULTING_THREAD:  00000998
DEFAULT_BUCKET_ID:  APPLICATION_FAULT
PROCESS_NAME:  firefox.exe
ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".
READ_ADDRESS:  00000018 
BUGCHECK_STR:  ACCESS_VIOLATION
NTGLOBALFLAG:  70

APPLICATION_VERIFIER_FLAGS:  0
LAST_CONTROL_TRANSFER:  from 00000000 to 605eea6b

STACK_TEXT:  
0013e5c8 00000000 00000000 00000000 7c809a69 xul!NS_NewLocalFile_P+0xddb

FOLLOWUP_IP: 
xul!NS_NewLocalFile_P+ddb
605eea6b 8b7718          mov     esi,dword ptr [edi+18h]

SYMBOL_STACK_INDEX:  0
FOLLOWUP_NAME:  MachineOwner
MODULE_NAME: xul
IMAGE_NAME:  xul.dll
DEBUG_FLR_IMAGE_TIMESTAMP:  48dae678
SYMBOL_NAME:  xul!NS_NewLocalFile_P+ddb
STACK_COMMAND:  ~0s ; kb
FAILURE_BUCKET_ID:  ACCESS_VIOLATION_xul!NS_NewLocalFile_P+ddb
BUCKET_ID:  ACCESS_VIOLATION_xul!NS_NewLocalFile_P+ddb

Followup: MachineOwner
--------
0:000> k
ChildEBP RetAddr  
WARNING: Stack unwind information not available. Following frames may be wrong.
0013e5c8 00000000 xul!NS_NewLocalFile_P+0xddb

0:000> g
(eb4.998): Access violation - code c0000005 (!!! second chance !!!)
eax=00000000 ebx=056ca0b0 ecx=00000000 edx=00290040 esi=069a7d8c edi=00000000
eip=605eea6b esp=0013e378 ebp=0013e5c8 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
xul!NS_NewLocalFile_P+0xddb:
605eea6b 8b7718          mov     esi,dword ptr [edi+18h] ds:0023:00000018=????????
Comment 1 Thierry Zoller 2008-10-19 20:21:30 PDT
Created attachment 343859 [details]
POC
Comment 2 Jesse Ruderman 2008-10-19 22:39:50 PDT
The POC just triggers a JavaScript syntax error for me.  You can't have a variable called "this".
Comment 3 Martijn Wargers [:mwargers] (not working for Mozilla) 2008-10-20 03:29:50 PDT
Do you have breakpad ids of the crash? Type about:crashes in the url bar.
http://kb.mozillazine.org/Breakpad
Comment 4 Thierry Zoller 2008-10-20 06:53:40 PDT
Dear Jesse,
Please replace all "this" occurences with "foobar", and restart the poc
Comment 5 Thierry Zoller 2008-10-20 06:54:50 PDT
Addednum: In order to reproduce the crash itself (not the memory hog only) you have to wait until memory fills up.
Comment 6 Thierry Zoller 2008-10-20 06:56:10 PDT
Created attachment 343895 [details]
replaced this with foobar
Comment 7 Daniel Veditz [:dveditz] 2008-10-21 07:53:47 PDT
I don't think the POC runs in my debug Firefox 3.0.4pre build. 

In a Firefox 2.0.0.18pre debug build I get an alert and then it locks up, eating lots of memory. Didn't die though, and when I woke up the build was responsive again (though my machine had pretty much no free memory) and I was able to exit cleanly.

Both were on Mac, both were debug. My machine has 4Gb memory so maybe it's an OOM problem that shows up with less memory. I'll try on a windows machine, and on an optimized build.
Comment 8 Thierry Zoller 2008-10-21 08:21:30 PDT
My test machine : 2GB RAM, Windows,NO DEP, NX bit disabled.

Ôn a 4bg machine the time required might be longer. 

@Daniel - Does run in 3.0.3, maybe wait longer until it east up enough memory.

Summary:
On my test machine, it locks up after eating around 1800mb-1900mb of total (VM) memory, then triggers access violation error in xul!NS_NewLocalFile_P

I have not investigated further as I hoped you will catch the error in the source (look for casting error as 0x7fffff is typical signed<->unsigned error)

Can somebody point to the code here ?
Comment 9 Jesse Ruderman 2008-10-21 15:35:27 PDT
It sounds like your testcase uses up as much memory as it can and then tests the OOM handling of everything it touches.  So it's hard to point to the code involved.

If you are using a Firefox release or nightly build, you can find the crashing code using Breakpad (see comment 3).  If you are using your own build, you can use your favorite debugger.  It's hard to debug release/nightly builds locally because they ship with most symbols stripped.

NS_NewLocalFile_P+0xddb sounds unlikely.  http://mxr.mozilla.org/mozilla/source/xpcom/glue/standalone/nsXPCOMGlue.cpp#195 is a small function.  Most likely, that's just the closest function that happened to have its symbol left in.
Comment 10 Thierry Zoller 2008-10-21 16:20:25 PDT
Thanks Jesse, please note that WinDBG reports the second chance exception at that function (relying on exports not symbols) 
605eea6b 8b7718          mov     esi,dword ptr [edi+18h]
ds:0023:00000018=????????

***************
It sounds like your testcase uses up as much memory as it can and then tests
the OOM handling of everything it touches.
***************
The testcase just calls select.lenght ONCE with the integer refered above, it's not the testcases that consumes memory, it's the function. To be more clear if I call select.lenght with 0xffffffff nothing happens, the bug only is triggered by the value 0x7ffffff, which makes me think of a casting error.

->It may not be the cause of the bug but the very last thing that breaks.
The cause should be the select.lenght implementation.

I will reproduce on another Workstation to make sure it always reproducable.
Comment 11 Jesse Ruderman 2008-10-21 16:42:44 PDT
nsHTMLSelectElement::SetLength perhaps?

http://mxr.mozilla.org/mozilla/source/content/html/content/src/nsHTMLSelectElement.cpp#724
Comment 12 Thierry Zoller 2008-10-21 16:51:03 PDT
Dear Jesse,
Just reproduced it on a clean machine, Software DEP (SEH), 3GB Ram
and made a video, the video is made using screen2exe, you can trust the exe it displays video.

Attach a debugger to catch the access violations, KiUserExceptionDispatcher is called and Software SEH catches in.
Comment 13 Thierry Zoller 2008-10-21 16:54:19 PDT
Created attachment 344201 [details]
Video showing the POC, Memory load + exceptions
Comment 14 Thierry Zoller 2008-10-21 17:13:34 PDT
Dear Jesse,
It's late at night here, so please take the below with a bit of salt :
Your hint to nsHTMLSelectElement::SetLength seems to be correct, the aLenght ist being defined as an unsigned value. The value we call 0x7fffff is the MAX value for an signed integer (2147483647). In the setlenght routine referes to aLength as a signed value
Comment 15 Jesse Ruderman 2008-10-21 17:30:05 PDT
bz or jst, can you look at nsHTMLSelectElement::SetLength and look for potential integer overflow spots?
Comment 16 Boris Zbarsky [:bz] 2008-10-21 17:38:16 PDT
Sure thing.  Right here:

761     for (i = curlen; i < (PRInt32)aLength; i++) {

This is why 0xffffffff doesn't work: the loop never runs.  But if you pass in PR_INT32_MAX (or anything smaller) we execute the loop that many times.

The thing is, technically setting HTMLSelectElement.length to an integer N is supposed (DOM0; sites rely on this) to make sure that the element has at least N options in it.  Which means you have to go and create all those HTMLOptionElements.  That's what this loop is doing.

I suppose we could clamp the allowed aLength values.  I'd be really curious to find out what IE does here.
Comment 17 Thierry Zoller 2008-10-21 17:57:47 PDT
I can comment on IE, however please don't open this bugreport for public view until they fixed the problem (if they plan too - i'll keep you updated).

IE6 goes in some sort of endless loop, but doesn't allocated x+n amount of memory
IE7 same
IE8 doesn't care
Comment 18 Boris Zbarsky [:bz] 2008-10-21 18:08:42 PDT
That doesn't really answer the question, though.  How does what IE does for N == 10, 100, 1000, 10000, etc behave?  That includes IE8.  I don't mean just the CPU/memory behavior, but what does the DOM and display look like?
Comment 19 Thierry Zoller 2008-10-21 18:17:40 PDT
I hope this doesn't sound arrogant, the answer is you'd have to find out, I can't tell myself, don't know.
Comment 20 Daniel Veditz [:dveditz] 2008-10-21 18:18:45 PDT
Forgot to update what I wrote in comment 7. On an optimized Windows 3.0.4pre build I get the alert and then a crash, but the crash reporter fails (the .dmp file is 0 bytes). I run with DEP turned on, but it didn't detect the problem: it was a regular firefox crash that triggered the crash reporter (regular except for not being able to save the crash data).

On a debug windows build it doesn't seem to crash.
Comment 21 Thierry Zoller 2008-10-21 18:20:57 PDT
Same here, wanted to submit the crash report but can't
Comment 22 Boris Zbarsky [:bz] 2008-10-21 19:38:33 PDT
I was mostly asking Jesse, to be honest.  Or Martijn.

In any case, we can certainly confirm this.  I'm not sure it's a security issue per se, other than as a simple way to cause OOM.
Comment 23 Thierry Zoller 2008-10-22 07:47:28 PDT
imho, it's more then OOM, the memory is corrupted (see access violation) they are reproducable on 2GB and 3GB machine and trigger at exactly the same function at exactly the same moment. On the 3GB machine it doesn't take 3gb to be filled but triggers exactly at the same moment. The absence of crash data also leads to think memory is corrupted.

I have not investigated whether you can control the flow in one way or another at all.
Comment 24 Thierry Zoller 2008-10-22 08:08:10 PDT
Addendum:
You are correct : The thread exits with the error not enough memory. Thats Access violation result in EDI being 0000000 and memory cannot be read.

PS.
Same for Chrome, Safari, IE5,6,7,8
Comment 25 Boris Zbarsky [:bz] 2008-10-22 08:26:06 PDT
Right; there's really only one place here where we're allocating memory in the loop, so the OOM will always happen at that spot with this testcase.  Specifically, the CloneNode() call here.
Comment 26 Thierry Zoller 2008-10-22 08:49:22 PDT
Still baffled at these Access Violations though, why are they happening and why these functions?

Anyways another 10char patch I guess
Comment 27 Boris Zbarsky [:bz] 2008-10-22 10:15:14 PDT
Nothing's going to get patched until we sort out the correct behavior here...

Which is particularly fun given the no-fail allocator stuff.  Ideal behavior here in some sense would be to go ahead and allocate nodes until we OOM, then back out, kill off the nodes we allocated, and return error to script.  That takes a long time, though, so can be used as a DOS attack.  And the no-fail allocator doesn't really allow this approach.

Which means we need to limit the range of allowed values for aLength, and then the question is what we can limit it to without breaking pages.  Here the ideal thing to do would be to raise the issue in whatwg, but we don't want to take this public....  <sigh>.
Comment 28 Daniel Veditz [:dveditz] 2008-10-22 10:41:16 PDT
Is it really that hard to come up with a cap value? A select is for users to choose things, right? How many choices can you have before the control is completely unusable? take that, multiply by 10 and you're still nowhere near 2G items.

32K? 64K? 100K?

If this is non-exploitable resource exhaustion it should be OK to talk about in whatwg. Does the official W3 HTML5 group have a private members-only mailing list like most W3 working groups?
Comment 29 Boris Zbarsky [:bz] 2008-10-22 10:50:11 PDT
> A select is for users to choose things, right?

You'd think...  But some sites use hidden selects to store info...

> How many choices can you have before the control is completely unusable?

Depends on whether it's a listbox that has display:none on some options (the ones filtered out by some filter the page has).  Sites _definitely_ do this.

> If this is non-exploitable resource exhaustion it should be OK to talk about
> in whatwg.

I _think_ it's non-exploitable in Gecko.  I can't make such a guarantee for IE and so forth...
Comment 30 Thierry Zoller 2008-10-22 13:37:53 PDT
You can contact secure@microsoft.com, security@google.com , product-security@apple.com and refer to the submissions of thierry@zoller.lu 
maybe you'll find a common value.

Afaik this is a Denial of service condition only, so if you keep it low key why not ask a general question at whatwg ?
Comment 31 Boris Zbarsky [:bz] 2008-10-22 13:45:20 PDT
For what it's worth, I did check with Ian, and he thinks this isn't something to be specced and that we should just pick an arbitrary value (assuming we don't want to do the "try to keep allocating till it fails" approach).
Comment 32 Thierry Zoller 2008-10-26 10:02:02 PDT
Any estimate on when a patch will be available ?
Comment 33 Boris Zbarsky [:bz] 2008-10-28 22:14:24 PDT
Created attachment 345238 [details] [diff] [review]
Talked to some Apple and Opera guys at the WHATWG social, and we decided this was a good number
Comment 34 Boris Zbarsky [:bz] 2008-11-06 13:12:35 PST
Comment on attachment 345238 [details] [diff] [review]
Talked to some Apple and Opera guys at the WHATWG social, and we decided this was a good number

Would be good to get beta shakeout of any possible compat issues (not that I expect many, but.....)
Comment 35 Mike Beltzner [:beltzner, not reading bugmail] 2008-11-10 16:54:23 PST
Comment on attachment 345238 [details] [diff] [review]
Talked to some Apple and Opera guys at the WHATWG social, and we decided this was a good number

a=beltzner, kinda feels like beta baking would be good here
Comment 36 Boris Zbarsky [:bz] 2008-11-11 21:22:01 PST
Pushed changeset ebe3fb89171.  No test checked in yet, because this isn't public.
Comment 37 Boris Zbarsky [:bz] 2008-11-11 21:22:23 PST
Comment on attachment 345238 [details] [diff] [review]
Talked to some Apple and Opera guys at the WHATWG social, and we decided this was a good number

Might be good to take this on branches too, once we get some trunk baking.
Comment 38 Daniel Veditz [:dveditz] 2008-11-13 10:21:50 PST
Comment on attachment 345238 [details] [diff] [review]
Talked to some Apple and Opera guys at the WHATWG social, and we decided this was a good number

approved for 1.8.1.19 and 1.9.0.5, a=dveditz for release-drivers
Comment 39 Boris Zbarsky [:bz] 2008-11-17 08:07:08 PST
Fixed on both branches.
Comment 40 Thierry Zoller 2008-11-17 08:22:24 PST
Dear all,

Please wait with disclosure (if planned) until others have patched :
Current status :
- MS patch planned for next update
- Opera do no respond (Bug Id attributed)
- Google chrome and Android (no reponse)
- Apple (working on it)

Might take a while, tired of fighting with vendors to get responses..
Comment 41 Thierry Zoller 2008-11-17 08:35:25 PST
Addendum , please wait to make this bug report visible
Comment 42 Al Billings [:abillings] 2008-11-25 15:31:30 PST
Verified with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.19pre) Gecko/2008112503 BonEcho/2.0.0.19pre for 1.8.1.19.

Verified for 1.9.1.5 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5pre) Gecko/2008112505 GranParadiso/3.0.5pre.
Comment 43 Thierry Zoller 2008-11-26 11:46:02 PST
Dear all,
To what degree was/are Tunderbird and others using the engine affected?
Comment 44 Boris Zbarsky [:bz] 2008-11-26 12:51:06 PST
If they have JS enabled (it's not in Thunderbird by default), then they have the same behavior.
Comment 45 Martin Stránský 2008-12-08 06:41:50 PST
Created attachment 351893 [details] [diff] [review]
1.8.0 branch patch
Comment 46 Alexander Sack 2008-12-16 01:02:20 PST
Comment on attachment 351893 [details] [diff] [review]
1.8.0 branch patch

a=asac for 1.8.0
Comment 47 Thierry Zoller 2009-01-13 10:39:15 PST
Hi all,
Going to diclose on the : 15/01/2009 after which you might also proceed.
Comment 48 Thierry Zoller 2009-02-02 08:46:38 PST
Hi had to delay disclosure due to request from certain vendors, in the meantime in order to include it in my advisory, I'd need to know :
- in what version was the bug fixed
- do you have an officialy advisory for this bug ?
Comment 49 Samuel Sidler (old account; do not CC) 2009-02-02 08:51:59 PST
This was fixed in Firefox 3.0.5 and Firefox 2.0.0.19.

I'm not sure if there was an official advisory issued. Brandon?
Comment 50 Brandon Sterne (:bsterne) 2009-02-02 10:21:36 PST
(In reply to comment #49)
No, we did not release an advisory for this bug.
Comment 51 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-07-16 15:12:04 PDT
This is now public:
http://www.g-sec.lu/one-bug-to-rule-them-all.html
Comment 52 Thierry Zoller 2009-07-21 15:38:00 PDT
Dear All,
I have an additional comment to make. According to the W3C specs, the select.length attribute is READ only.  If you think of it it makes sense. However every browser I know of implemented this read/write. I am wondering whether it makes sense to change that. 

What use has a write ? Why should you be able to at all? Is there a legitimate use ? The only use I see, is to set it to NULL. But couldn't you just use select.remove() for that ?

Would appreciate any comments
Comment 53 Thierry Zoller 2009-07-21 15:56:54 PDT
FYI, found this comment from 2001, explaining it would break a lot of sites.
http://markmail.org/message/j4hhf4s34kbg7x6z

Note You need to log in before you can comment on or make changes to this bug.