Closed Bug 460713 Opened 16 years ago Closed 16 years ago

[FIX]Memory exhaustion setting HTMLSelect.length() [GSEC-TZO-26-2009]

Categories

(Core :: General, defect)

1.9.0 Branch
x86
Windows XP
defect
Not set
critical

Tracking

()

VERIFIED FIXED

People

(Reporter: thierry, Assigned: bzbarsky)

Details

(Keywords: verified1.8.1.19, verified1.9.0.5, Whiteboard: [sg:dos][hold disclosure, see comment 40])

Attachments

(4 files, 1 obsolete file)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3 Possible Integer miscaluclation in select.length() Triggers huge memory allocations and then first and second chance access violations. Reproducible: Always Steps to Reproduce: 1. open attachement 2. click on POC 3. First chance exceptions are reported before any exception handling. This exception may be expected and handled. eax=00000000 ebx=056ca0b0 ecx=00000000 edx=00290040 esi=069a7d8c edi=00000000 eip=605eea6b esp=0013e378 ebp=0013e5c8 iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246 *** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\Program Files\Mozilla Firefox\xul.dll - xul!NS_NewLocalFile_P+0xddb: 605eea6b 8b7718 mov esi,dword ptr [edi+18h] ds:0023:00000018=???????? 0:000> !analyze -v ******************************************************************************* * * * Exception Analysis * * * ******************************************************************************* FAULTING_IP: xul!NS_NewLocalFile_P+ddb 605eea6b 8b7718 mov esi,dword ptr [edi+18h] EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff) ExceptionAddress: 605eea6b (xul!NS_NewLocalFile_P+0x00000ddb) ExceptionCode: c0000005 (Access violation) ExceptionFlags: 00000000 NumberParameters: 2 Parameter[0]: 00000000 Parameter[1]: 00000018 Attempt to read from address 00000018 FAULTING_THREAD: 00000998 DEFAULT_BUCKET_ID: APPLICATION_FAULT PROCESS_NAME: firefox.exe ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s". READ_ADDRESS: 00000018 BUGCHECK_STR: ACCESS_VIOLATION NTGLOBALFLAG: 70 APPLICATION_VERIFIER_FLAGS: 0 LAST_CONTROL_TRANSFER: from 00000000 to 605eea6b STACK_TEXT: 0013e5c8 00000000 00000000 00000000 7c809a69 xul!NS_NewLocalFile_P+0xddb FOLLOWUP_IP: xul!NS_NewLocalFile_P+ddb 605eea6b 8b7718 mov esi,dword ptr [edi+18h] SYMBOL_STACK_INDEX: 0 FOLLOWUP_NAME: MachineOwner MODULE_NAME: xul IMAGE_NAME: xul.dll DEBUG_FLR_IMAGE_TIMESTAMP: 48dae678 SYMBOL_NAME: xul!NS_NewLocalFile_P+ddb STACK_COMMAND: ~0s ; kb FAILURE_BUCKET_ID: ACCESS_VIOLATION_xul!NS_NewLocalFile_P+ddb BUCKET_ID: ACCESS_VIOLATION_xul!NS_NewLocalFile_P+ddb Followup: MachineOwner -------- 0:000> k ChildEBP RetAddr WARNING: Stack unwind information not available. Following frames may be wrong. 0013e5c8 00000000 xul!NS_NewLocalFile_P+0xddb 0:000> g (eb4.998): Access violation - code c0000005 (!!! second chance !!!) eax=00000000 ebx=056ca0b0 ecx=00000000 edx=00290040 esi=069a7d8c edi=00000000 eip=605eea6b esp=0013e378 ebp=0013e5c8 iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246 xul!NS_NewLocalFile_P+0xddb: 605eea6b 8b7718 mov esi,dword ptr [edi+18h] ds:0023:00000018=????????
Attached file POC (obsolete) —
The POC just triggers a JavaScript syntax error for me. You can't have a variable called "this".
Component: Security → General
Product: Firefox → Core
QA Contact: firefox → general
Version: unspecified → 1.9.0 Branch
Do you have breakpad ids of the crash? Type about:crashes in the url bar. http://kb.mozillazine.org/Breakpad
Dear Jesse, Please replace all "this" occurences with "foobar", and restart the poc
Addednum: In order to reproduce the crash itself (not the memory hog only) you have to wait until memory fills up.
Attachment #343859 - Attachment is obsolete: true
I don't think the POC runs in my debug Firefox 3.0.4pre build. In a Firefox 2.0.0.18pre debug build I get an alert and then it locks up, eating lots of memory. Didn't die though, and when I woke up the build was responsive again (though my machine had pretty much no free memory) and I was able to exit cleanly. Both were on Mac, both were debug. My machine has 4Gb memory so maybe it's an OOM problem that shows up with less memory. I'll try on a windows machine, and on an optimized build.
My test machine : 2GB RAM, Windows,NO DEP, NX bit disabled. Ôn a 4bg machine the time required might be longer. @Daniel - Does run in 3.0.3, maybe wait longer until it east up enough memory. Summary: On my test machine, it locks up after eating around 1800mb-1900mb of total (VM) memory, then triggers access violation error in xul!NS_NewLocalFile_P I have not investigated further as I hoped you will catch the error in the source (look for casting error as 0x7fffff is typical signed<->unsigned error) Can somebody point to the code here ?
It sounds like your testcase uses up as much memory as it can and then tests the OOM handling of everything it touches. So it's hard to point to the code involved. If you are using a Firefox release or nightly build, you can find the crashing code using Breakpad (see comment 3). If you are using your own build, you can use your favorite debugger. It's hard to debug release/nightly builds locally because they ship with most symbols stripped. NS_NewLocalFile_P+0xddb sounds unlikely. http://mxr.mozilla.org/mozilla/source/xpcom/glue/standalone/nsXPCOMGlue.cpp#195 is a small function. Most likely, that's just the closest function that happened to have its symbol left in.
Thanks Jesse, please note that WinDBG reports the second chance exception at that function (relying on exports not symbols) 605eea6b 8b7718 mov esi,dword ptr [edi+18h] ds:0023:00000018=???????? *************** It sounds like your testcase uses up as much memory as it can and then tests the OOM handling of everything it touches. *************** The testcase just calls select.lenght ONCE with the integer refered above, it's not the testcases that consumes memory, it's the function. To be more clear if I call select.lenght with 0xffffffff nothing happens, the bug only is triggered by the value 0x7ffffff, which makes me think of a casting error. ->It may not be the cause of the bug but the very last thing that breaks. The cause should be the select.lenght implementation. I will reproduce on another Workstation to make sure it always reproducable.
Dear Jesse, Just reproduced it on a clean machine, Software DEP (SEH), 3GB Ram and made a video, the video is made using screen2exe, you can trust the exe it displays video. Attach a debugger to catch the access violations, KiUserExceptionDispatcher is called and Software SEH catches in.
Dear Jesse, It's late at night here, so please take the below with a bit of salt : Your hint to nsHTMLSelectElement::SetLength seems to be correct, the aLenght ist being defined as an unsigned value. The value we call 0x7fffff is the MAX value for an signed integer (2147483647). In the setlenght routine referes to aLength as a signed value
bz or jst, can you look at nsHTMLSelectElement::SetLength and look for potential integer overflow spots?
Sure thing. Right here: 761 for (i = curlen; i < (PRInt32)aLength; i++) { This is why 0xffffffff doesn't work: the loop never runs. But if you pass in PR_INT32_MAX (or anything smaller) we execute the loop that many times. The thing is, technically setting HTMLSelectElement.length to an integer N is supposed (DOM0; sites rely on this) to make sure that the element has at least N options in it. Which means you have to go and create all those HTMLOptionElements. That's what this loop is doing. I suppose we could clamp the allowed aLength values. I'd be really curious to find out what IE does here.
I can comment on IE, however please don't open this bugreport for public view until they fixed the problem (if they plan too - i'll keep you updated). IE6 goes in some sort of endless loop, but doesn't allocated x+n amount of memory IE7 same IE8 doesn't care
That doesn't really answer the question, though. How does what IE does for N == 10, 100, 1000, 10000, etc behave? That includes IE8. I don't mean just the CPU/memory behavior, but what does the DOM and display look like?
I hope this doesn't sound arrogant, the answer is you'd have to find out, I can't tell myself, don't know.
Forgot to update what I wrote in comment 7. On an optimized Windows 3.0.4pre build I get the alert and then a crash, but the crash reporter fails (the .dmp file is 0 bytes). I run with DEP turned on, but it didn't detect the problem: it was a regular firefox crash that triggered the crash reporter (regular except for not being able to save the crash data). On a debug windows build it doesn't seem to crash.
Same here, wanted to submit the crash report but can't
Summary: Possible Integer overflow → Memory exhaustion (was Possible Integer overflow)
I was mostly asking Jesse, to be honest. Or Martijn. In any case, we can certainly confirm this. I'm not sure it's a security issue per se, other than as a simple way to cause OOM.
Status: UNCONFIRMED → NEW
Ever confirmed: true
imho, it's more then OOM, the memory is corrupted (see access violation) they are reproducable on 2GB and 3GB machine and trigger at exactly the same function at exactly the same moment. On the 3GB machine it doesn't take 3gb to be filled but triggers exactly at the same moment. The absence of crash data also leads to think memory is corrupted. I have not investigated whether you can control the flow in one way or another at all.
Addendum: You are correct : The thread exits with the error not enough memory. Thats Access violation result in EDI being 0000000 and memory cannot be read. PS. Same for Chrome, Safari, IE5,6,7,8
Right; there's really only one place here where we're allocating memory in the loop, so the OOM will always happen at that spot with this testcase. Specifically, the CloneNode() call here.
Still baffled at these Access Violations though, why are they happening and why these functions? Anyways another 10char patch I guess
Nothing's going to get patched until we sort out the correct behavior here... Which is particularly fun given the no-fail allocator stuff. Ideal behavior here in some sense would be to go ahead and allocate nodes until we OOM, then back out, kill off the nodes we allocated, and return error to script. That takes a long time, though, so can be used as a DOS attack. And the no-fail allocator doesn't really allow this approach. Which means we need to limit the range of allowed values for aLength, and then the question is what we can limit it to without breaking pages. Here the ideal thing to do would be to raise the issue in whatwg, but we don't want to take this public.... <sigh>.
Is it really that hard to come up with a cap value? A select is for users to choose things, right? How many choices can you have before the control is completely unusable? take that, multiply by 10 and you're still nowhere near 2G items. 32K? 64K? 100K? If this is non-exploitable resource exhaustion it should be OK to talk about in whatwg. Does the official W3 HTML5 group have a private members-only mailing list like most W3 working groups?
Whiteboard: [sg:dos]
> A select is for users to choose things, right? You'd think... But some sites use hidden selects to store info... > How many choices can you have before the control is completely unusable? Depends on whether it's a listbox that has display:none on some options (the ones filtered out by some filter the page has). Sites _definitely_ do this. > If this is non-exploitable resource exhaustion it should be OK to talk about > in whatwg. I _think_ it's non-exploitable in Gecko. I can't make such a guarantee for IE and so forth...
You can contact secure@microsoft.com, security@google.com , product-security@apple.com and refer to the submissions of thierry@zoller.lu maybe you'll find a common value. Afaik this is a Denial of service condition only, so if you keep it low key why not ask a general question at whatwg ?
For what it's worth, I did check with Ian, and he thinks this isn't something to be specced and that we should just pick an arbitrary value (assuming we don't want to do the "try to keep allocating till it fails" approach).
Any estimate on when a patch will be available ?
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
Attachment #345238 - Flags: superreview?(jst)
Attachment #345238 - Flags: review?(jst)
Summary: Memory exhaustion (was Possible Integer overflow) → [FIX]Memory exhaustion (was Possible Integer overflow)
Attachment #345238 - Flags: superreview?(jst)
Attachment #345238 - Flags: superreview+
Attachment #345238 - Flags: review?(jst)
Attachment #345238 - Flags: review+
Comment on attachment 345238 [details] [diff] [review] Talked to some Apple and Opera guys at the WHATWG social, and we decided this was a good number Would be good to get beta shakeout of any possible compat issues (not that I expect many, but.....)
Attachment #345238 - Flags: approval1.9.1b2?
Comment on attachment 345238 [details] [diff] [review] Talked to some Apple and Opera guys at the WHATWG social, and we decided this was a good number a=beltzner, kinda feels like beta baking would be good here
Attachment #345238 - Flags: approval1.9.1b2? → approval1.9.1b2+
Pushed changeset ebe3fb89171. No test checked in yet, because this isn't public.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Comment on attachment 345238 [details] [diff] [review] Talked to some Apple and Opera guys at the WHATWG social, and we decided this was a good number Might be good to take this on branches too, once we get some trunk baking.
Attachment #345238 - Flags: approval1.9.0.5?
Attachment #345238 - Flags: approval1.8.1.18?
Attachment #345238 - Flags: approval1.8.1.18? → approval1.8.1.19?
Attachment #345238 - Flags: approval1.9.0.5?
Attachment #345238 - Flags: approval1.9.0.5+
Attachment #345238 - Flags: approval1.8.1.19?
Attachment #345238 - Flags: approval1.8.1.19+
Comment on attachment 345238 [details] [diff] [review] Talked to some Apple and Opera guys at the WHATWG social, and we decided this was a good number approved for 1.8.1.19 and 1.9.0.5, a=dveditz for release-drivers
Fixed on both branches.
Dear all, Please wait with disclosure (if planned) until others have patched : Current status : - MS patch planned for next update - Opera do no respond (Bug Id attributed) - Google chrome and Android (no reponse) - Apple (working on it) Might take a while, tired of fighting with vendors to get responses..
Addendum , please wait to make this bug report visible
Whiteboard: [sg:dos] → [sg:dos][hold disclosure, see comment 40]
Verified with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.19pre) Gecko/2008112503 BonEcho/2.0.0.19pre for 1.8.1.19. Verified for 1.9.1.5 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5pre) Gecko/2008112505 GranParadiso/3.0.5pre.
Status: RESOLVED → VERIFIED
Dear all, To what degree was/are Tunderbird and others using the engine affected?
If they have JS enabled (it's not in Thunderbird by default), then they have the same behavior.
Comment on attachment 351893 [details] [diff] [review] 1.8.0 branch patch a=asac for 1.8.0
Attachment #351893 - Flags: approval1.8.0.next+
Summary: [FIX]Memory exhaustion (was Possible Integer overflow) → [FIX]Memory exhaustion setting HTMLSelect.length()
Hi all, Going to diclose on the : 15/01/2009 after which you might also proceed.
Hi had to delay disclosure due to request from certain vendors, in the meantime in order to include it in my advisory, I'd need to know : - in what version was the bug fixed - do you have an officialy advisory for this bug ?
This was fixed in Firefox 3.0.5 and Firefox 2.0.0.19. I'm not sure if there was an official advisory issued. Brandon?
(In reply to comment #49) No, we did not release an advisory for this bug.
Group: core-security
Summary: [FIX]Memory exhaustion setting HTMLSelect.length() → [FIX]Memory exhaustion setting HTMLSelect.length() [GSEC-TZO-26-2009]
Dear All, I have an additional comment to make. According to the W3C specs, the select.length attribute is READ only. If you think of it it makes sense. However every browser I know of implemented this read/write. I am wondering whether it makes sense to change that. What use has a write ? Why should you be able to at all? Is there a legitimate use ? The only use I see, is to set it to NULL. But couldn't you just use select.remove() for that ? Would appreciate any comments
FYI, found this comment from 2001, explaining it would break a lot of sites. http://markmail.org/message/j4hhf4s34kbg7x6z
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: