Closed Bug 656685 Opened 13 years ago Closed 2 years ago

DoS by flooding {local,session}Storage with data (no quota)

Categories

(Core :: Storage: localStorage & sessionStorage, defect)

x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: maciej.malecki, Unassigned)

Details

(Keywords: csectype-dos, testcase, Whiteboard: [sg:dos])

Attachments

(3 files)

User-Agent:       Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.30 Safari/534.30
Build Identifier: Firefox 4.0.1

It's possible to flood {local,session}Storage with data, which leads to a DoS. There seems to be no quota (however, my dom.storage.default_quota = 5120).
Memory usage goes up to 2 GB in approx. 3 seconds on both Windows 7 and Fedora 15.


Reproducible: Always

Steps to Reproduce:
localStorage["a"] = "1337";
while(1) localStorage["a"] += "1337";

Actual Results:  
Memory usage goes up to 2 GB in approx. 3 seconds on both Windows 7 and Fedora 15, Firefox process needs to be killed or system stops responding.


Expected Results:  
DOM exception when exceeding quota.

Tested on: Windows 7 SP1 and Fedora 15 Beta, both with FF 4.0.1
Honza?
Status: UNCONFIRMED → NEW
Ever confirmed: true
The quota is 5 MB.  The preference is in kilobytes.  With your test case it takes a very long time to reach it.

Anyway, when I get to usage of about 50kB with your test case, memory is at 2,5 GB.  This is strange, looks like we don't GC or release something we should or whatever...

I pushed the script to continue through the "Non-responsive script dialog" to see what happens:

on a first try (clean profile): when I got to 3,8 GB memory allocation, Firefox died completely unresponsive, 0% CPU activity.

on a second try (almost clean profile): I reached less memory, then Fx released it, but left unresponsive.

It deadlocks in (probably a different bug):
 	nspr4.dll!_PR_WaitCondVar(PRThread * thread=0x007facf0, PRCondVar * cvar=0x06c4ac68, PRLock * lock=0x06c4aae8, unsigned int timeout=4294967295)  Line 204 + 0x17 bytes	
 	nspr4.dll!PR_WaitCondVar(PRCondVar * cvar=0x06c4ac68, unsigned int timeout=4294967295)  Line 547 + 0x17 bytes	
 	mozjs.dll!AutoGCSession::AutoGCSession(JSContext * cx=0x07c175f8)  Line 2598 + 0x12 bytes	
 	mozjs.dll!GCCycle(JSContext * cx=0x07c175f8, JSCompartment * comp=0x00000000, JSGCInvocationKind gckind=GC_NORMAL)  Line 2661	
 	mozjs.dll!js_GC(JSContext * cx=0x07c175f8, JSCompartment * comp=0x00000000, JSGCInvocationKind gckind=GC_NORMAL)  Line 2753 + 0x11 bytes	
 	mozjs.dll!JS_GC(JSContext * cx=0x07c175f8)  Line 2601 + 0xd bytes	
 	xul.dll!nsXPConnect::Collect()  Line 406 + 0xa bytes	
 	xul.dll!nsXPConnect::GarbageCollect()  Line 415	
 	xul.dll!nsJSContext::GarbageCollectNow()  Line 3252	
 	xul.dll!nsMemoryPressureObserver::Observe(nsISupports * aSubject=0x07d5edb0, const char * aTopic=0x615689e4, const wchar_t * aData=0x61568a6c)  Line 202	
 	xul.dll!nsMemoryImpl::RunFlushers(const wchar_t * aReason=0x61568a6c)  Line 163	
 	xul.dll!nsMemoryImpl::FlushEvent::Run()  Line 180	
 	xul.dll!nsThread::ProcessNextEvent(int mayWait=1, int * result=0x0040d180)  Line 618 + 0x19 bytes	
 	xul.dll!NS_ProcessNextEvent_P(nsIThread * thread=0x01dd1330, int mayWait=1)  Line 250 + 0x16 bytes	
 	xul.dll!nsThread::Shutdown()  Line 481 + 0xb bytes	
 	xul.dll!NS_InvokeByIndex_P(nsISupports * that=0x07c07688, unsigned int methodIndex=6, unsigned int paramCount=0, nsXPTCVariant * params=0x00000000)  Line 103	
 	xul.dll!nsProxyObjectCallInfo::Run()  Line 182 + 0x2d bytes	
 	xul.dll!nsThread::ProcessNextEvent(int mayWait=0, int * result=0x0040d270)  Line 618 + 0x19 bytes	
 	xul.dll!NS_ProcessNextEvent_P(nsIThread * thread=0x01dd1330, int mayWait=0)  Line 250 + 0x16 bytes	
 	xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate=0x01dcdb30)  Line 110 + 0xe bytes	
 	xul.dll!MessageLoop::RunInternal()  Line 219	
 	xul.dll!MessageLoop::RunHandler()  Line 203	
 	xul.dll!MessageLoop::Run()  Line 177	
 	xul.dll!nsBaseAppShell::Run()  Line 191	
 	xul.dll!nsAppShell::Run()  Line 248 + 0x9 bytes	
 	xul.dll!nsAppStartup::Run()  Line 224 + 0x1c bytes	
 	xul.dll!XRE_main(int argc=3, char * * argv=0x01dc3ca0, const nsXREAppData * aAppData=0x01dc4480)  Line 3698 + 0x25 bytes	


I'll look deeper into this tomorrow.  This is a very interesting bug.
Assignee: nobody → honzab.moz
Hmm.  I would fully expect us to generate garbage for all those intermediate strings.  In particular, growing at 5 chars each time (10 bytes, I assume) to get up to 50KB requires 5,000 sets.

That means we should generate about 10,000 strings (each += generates a string on get and another string on concatenate+set) of average length 2,500 chars or 5000 bytes, for a total of 50,000,000 bytes.... which is a long shot away from a few gigabytes.  Would be interesting to see why we're using so much more memory than that.
I checked out nsStorage2SH::GetProperty and, based on the JS_NewUCStringCopyN, we will not be accumulating a giant rope.  My guess is that perhaps we have a leak and thus we are keeping alive all the old strings?
Even if we were, it seems to me that unless I miscounted in comment 3 the _total_ allocations should be only 50MB.
My guess in comment 4 was that, if every intermediate string (every result of a set) was accidentally being rooted but (perhaps due to the same bug) not counted in the quota, then we'd have quadratic total space growth which could produce the explosive numbers in comment 2.
Sure; I'm positing quadratic growth, but comment 2 says that we get to 2.5GB of heap when the string length is only 50KB...
setting localStorage is going to write to mozstorage. It's unlikely to be javascript string memory but leaked mozstorage or sqlite objects would have much more overhead and might be what we're seeing.
Keywords: testcase
Whiteboard: [sg:dos]
Attached file HTML test file
Here's the exact test file I used for a Massif run (results shortly).  When not running under Massif I got much worse than quadratic behaviour -- eg. doing 14000 iterations took a few seconds, but 25000 iterations took long enough that I killed Firefox.  Maybe this was due to paging, though.
Attached file Massif results
You can see from Massif's graph that the memory usage spikes enormously but
then drops again very quickly.  This matched what I saw just when I was
watching 'top':

    GB
4.303^                                                       #                
     |                                                       #::              
     |                                                      @#:               
     |                                                      @#:               
     |                                                    ::@#:               
     |                                                  @@: @#: :             
     |                                                  @ : @#: :             
     |                                                  @ : @#: :             
     |                                                 :@ : @#: :             
     |                                              @@::@ : @#: :             
     |                                              @ ::@ : @#: :             
     |                                              @ ::@ : @#: :             
     |                                             :@ ::@ : @#: :             
     |                                           @@:@ ::@ : @#: :             
     |                                           @ :@ ::@ : @#: :             
     |                                          @@ :@ ::@ : @#: :             
     |                                         @@@ :@ ::@ : @#: :        :::: 
     |                                         @@@ :@ ::@ : @#: :::::::::: :@ 
     |                 @::@:::@::::::::::@:::@@@@@ :@ ::@ : @#: :: :  : :: :@@
     |         @:::::@:@: @:: @: :: ::: :@: :@ @@@ :@ ::@ : @#: :: :  : :: :@@
   0 +----------------------------------------------------------------------->s
     0                                                                   92.40

The peak snapshot was #39 (search for "^ 39" in the file).  It's definitely
string-related.  The top-level culprits:

o1o> 55.14% (2,547,972,408B): nsStringBuffer::Alloc() (nsSubstring.cpp:209)
o1o> 18.41% (850,436,732B): JSRope::flatten() (jsutil.h:239)
o1o> 13.76% (635,982,888B): js_NewStringCopyN() (jsutil.h:239)

Look at the file for the detailed call stacks.  I'm not sure what to make of
them, but nsStorage2SH::SetProperty and nsStorage2SH::GetProperty look very
guilty.  Maybe putting some logging code into them will be instructive?
In nsDOMStorageEvent::InitStorageEvent the key string, old data string, and new data string all get assigned... does that make a copy of the strings?  And a similar question for nsDOMStorageItem::SetValue.

In my example there are 784,000,000 bytes of string values that get set.  If each one is copied a few times unnecessarily, that would get up to multiple GC quickly.
(In reply to comment #11)
> that would get up to multiple GC quickly.

I meant "multiple GB", of course.
Attachment #532115 - Attachment mime type: text/x-vhdl → text/plain
> does that make a copy of the strings? 

At least for the new data string, I believe it does, since the incoming string is a JSString in disguise.  See bug 585656.

It looks like under nsDOMStorage::SetItem we actually copy twice: once for the event and once for the actual set in the storage.  The latter I would think goes away pretty quickly, though (on the very next set).... except the next event will hold on to it.  So we are in fact creating two copies of the string there.  We can avoid that by just creating an nsString for the given nsAString up front in SetItem and then passing _that_ down to both BroadcastChangeNotification and SetValue.

There's not much we can do about the JSRope::flatten.

For the js_NewStringCopyN, nsStorage2SH::GetProperty should just be using our normal XPConnect string-to-JSString conversion stuff instead of copying.  Taht would create an external string.

All of this would reduce memory usage by 3x, which is nice and we should do it but doesn't solve the real problem.  The real problem is that there's an event for each change and they can't go away until that script finishes running.  Honza, does the spec really require us to queue up all those events?  Can we just start dropping them on the floor?
It looks like the spec in fact requires that all those events exist.  We need to get that fixed, imo.
(In reply to comment #13)
> 
> For the js_NewStringCopyN, nsStorage2SH::GetProperty should just be using
> our normal XPConnect string-to-JSString conversion stuff instead of copying.
> That would create an external string.

Might there be other places where the same mistake is made?  Is there a way to detect them?
I did some research using VC++ tracepoints in moz_alloc and moz_free, I have following stacks to actually prove boris' theory about the event not being released until the script exits:

>>> called 43 times, bytes total 3928566
	xul.dll!nsStringBuffer::Alloc() 
	xul.dll!nsAString_internal::MutatePrep() 
	xul.dll!nsAString_internal::ReplacePrepInternal() 
	xul.dll!nsAString_internal::ReplacePrep() 
	xul.dll!nsAString_internal::Assign() 
	xul.dll!nsAString_internal::Assign() 
	xul.dll!nsString::operator=() 
	xul.dll!nsDOMStorageItem::SetValueInternal() 
	xul.dll!DOMStorageImpl::SetValue() 
	xul.dll!nsDOMStorage::SetItem() 
	xul.dll!nsDOMStorage2::SetItem() 
	xul.dll!nsStorage2SH::SetProperty() 
	xul.dll!XPC_WN_Helper_SetProperty() 
	mozjs.dll!js::CallJSPropertyOpSetter() 

>>> called 86 times, bytes total 3929082 (looks like this can be split to 43 allocations of 12 bytes and large rest)
	xul.dll!nsStringBuffer::Alloc() 
	xul.dll!nsAString_internal::MutatePrep() 
	xul.dll!nsAString_internal::ReplacePrepInternal() 
	xul.dll!nsAString_internal::ReplacePrep() 
	xul.dll!nsAString_internal::Assign() 
	xul.dll!nsAString_internal::Assign() 
	xul.dll!nsString::operator=() 
	xul.dll!nsDOMStorageEvent::InitStorageEvent() 
	xul.dll!nsDOMStorage2::BroadcastChangeNotification() 
	xul.dll!nsDOMStorage::SetItem() 
	xul.dll!nsDOMStorage2::SetItem() 
	xul.dll!nsStorage2SH::SetProperty() 
	xul.dll!XPC_WN_Helper_SetProperty() 
	mozjs.dll!js::CallJSPropertyOpSetter() 

I will attach the complete list of stacks I captured during a short period.
> Might there be other places where the same mistake is made?

Yes, but only in places where people hand-write JS bindings (i.e. in nsIXPCScriptable callbacks like here or in places where jsval return values or out params are used).
(In reply to comment #13)
> We can avoid that by just creating an nsString for the given
> nsAString up front in SetItem and then passing _that_ down to both
> BroadcastChangeNotification and SetValue.

With this change I can reach 90kB of the storage usage while allocating 2,5 GB.  So a bit better, but not a solution.  I can see the new buffer for that string still unreleased.

Looks like we need to limit the number of events or coalesce them to a single one until it is posted, that event would say OLD="" NEW="the horribly long new string".  I can try to have a patch for this.
I don't believe the spec allows either of those behaviors.  We really need a spec change here.

I also still don't understand why your allocations are an order of magnitude more than my calculation.  And also, since the allocation amount should be _quadratic_ in storage usage, halving allocations should not have allowed us to get to twice the storage usage....
How does the fix for bug 656878 affect this?
It just shows what the code in nsStorage2SH::GetProperty should look like to avoid the js_NewStringCopyN.
Is there any reason to keep this hidden? This seems like a useful discussion to have public.
Keywords: csec-dos
Assignee: honzab.moz → nobody
Group: core-security → dom-core-security
Group: dom-core-security
Component: DOM: Core & HTML → Storage: localStorage & sessionStorage
QA Whiteboard: qa-not-actionable

Not very surprisingly, a DoS remains a DoS, so also today we get kind of stuck but I needed to raise the loop counter to be very noticeable.

If I let it run for a while, memory goes up to over 8GB but then it wiggles between 7 and 9 GB, so apparently that is handled well. And also the slow script doorhanger appears after a while and stopping the script frees all memory, too. The rest of the browser remained usable, closing the tab is no problem at all.

So while we could probably use less memory or be faster while looping - an (almost) endless loop remains an endless loop and can block a content process, but it does not seem to impact the rest of the browser anymore.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: