Closed Bug 725993 Opened 8 years ago Closed 7 years ago

Remove ability to tell cache to STORE_ON_DISK_AS_FILE

Categories

(Core :: Networking: Cache, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla20

People

(Reporter: tmptgr, Assigned: michal)

References

()

Details

(Keywords: perf)

Attachments

(5 files, 3 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0) Gecko/20100101 Firefox/10.0
Build ID: 20120129021758

Steps to reproduce:

Watch an embedded YouTube video like the one on 


Actual results:

24-26% CPU of my Intel Xeon 2.93 Ghz, making Firefox unresponsive but the video kept playing. Process Explorer 12.04 says the offending thread is firefox.exe!wmainCRTStartup with varying stack, most interesting:

mozutils.dll!malloc_usable_size+0x6
mozsqlite3.dll!sqlite3_step+0xc2
xul.dll!??1gfxFont@@UAE@XZ+0x564a
xul.dll!?CanDraw2D@gfx3DMatrix@@QBE_NPAUgfxMatrix@@@Z+0xbe4c
xul.dll!XRE_LockProfileDirectory+0x43882
xul.dll!XRE_LockProfileDirectory+0x4e5a2
xul.dll!XRE_LockProfileDirectory+0x4e742
xul.dll!?AsShadowLayer@Layer@layers@mozilla@@UAEPAVShadowLayer@23@XZ+0x543f
xul.dll!NS_CycleCollectorSuspect2_P+0x4186
mozjs.dll!JS_GetTypeInferenceMemoryStats+0x6d7b
xul.dll!NS_CycleCollectorForget2_P+0x66e1
mozjs.dll!JS_GetOptions+0xc
mozjs.dll!JS_InitReflect+0x7710
xul.dll!NS_CycleCollectorSuspect2_P+0x2592

I'm currently running Firefox via 32-bit WinDbg on Windows 7 Enterprise with Nvidia Quadro FX 1800. Physical usage 4.4/6.0 GB RAM.

Application: Firefox 10.0 (20120129021758)
Operating System: WINNT (x86-msvc)

- BarTab Lite 1.2
- Download Statusbar 0.9.10 (Disabled)
- Extension List Dumper 1.15.2
- Firebug 1.9.1
- Firefogg 2.9.19 (Disabled)
- FireFTP 2.0.1
- Flashblock 1.5.15.1
- FromWhereToWhere 0.25.0 (Disabled)
- Memory Fox 7.4 (Disabled)
- Mozilla Archive Format 2.0.4
- New Tab JumpStart 0.5a5.4.3
- NoScript 2.2.9
- Places Maintenance 1.3 (Disabled)
- Session History Tree 1.0 (Disabled)
- Session Manager 0.7.8.1
- Showcase 0.9.5.8 (Disabled)
- Tab History Menu 2.1.1 (Disabled)

I use this for web development, so installing a nightly etc. doesn't sound appealing.

Attached is the running WinDbg log. I downloaded the symbols with a 64-bit WinDbg but didn't get any notice about that so i assume that's no problem.
Possibly related info in bug 465648.


Expected results:

Stay responsive until you can blame it on the host OS swap.
> Steps to reproduce:
> 
> Watch an embedded YouTube video like the one on 
> 

Like the one on what?
OK, see you have added an URL now.
While scrolling down this page using the scrollbar grip, i also sometimes get many time Error: [Exception... "'JavaScript component does not have a method named: "onBeforeItemRemoved"' when calling method: [nsINavBookmarkObserver::onBeforeItemRemoved]"  nsresult: "0x80570030 (NS_ERROR_XPC_JSOBJECT_HAS_NO_FUNCTION_NAMED)"  location: "JS frame :: resource:///components/nsLivemarkService.js :: LM_runBatched :: line 659"  data: no]
Source File: resource:///components/nsLivemarkService.js
Line: 659
The flooding of the error console definitely causes the CPU spikes.

Prolly useless, but a Process Explorer stack while it was flooding:

xul.dll!?GetFirstChild@ContainerLayer@layers@mozilla@@UAEPAVLayer@23@XZ+0x1188
xul.dll!??0gfxSize@@QAE@XZ+0xcbfe
xul.dll!?TransformBounds@gfxMatrix@@QBE?AUgfxRect@@ABU2@@Z+0x4a26
xul.dll!NS_InvokeByIndex_P+0x17d
xul.dll!NS_InvokeByIndex_P+0x27
xul.dll!NS_StackWalk+0x35e
xul.dll!??0gfxSize@@QAE@XZ+0x1009e
xul.dll!?CanDraw2D@gfx3DMatrix@@QBE_NPAUgfxMatrix@@@Z+0xbf6e
xul.dll!?CreatePlatformFontList@gfxWindowsPlatform@@UAEPAVgfxPlatformFontList@@XZ+0x199c
Summary: Random up to a minute-long unresponsive GUI → Random up to a minute-long unresponsive GUI watching HTML5 YouTube
The random freeze is the flooding described in comment 4, and causes font entries in the stack.

The on-demand terrible performance happens while playing the video at the URL, and when not interrupted by the error console flood, steadily has this stack in the 23+% CPU firefox.exe+0x1c59 thread according to Process Explorer v15.12:

mozjs.dll!js_RemoveRoot+0x23c2
mozjs.dll!js_RemoveRoot+0x25f3
mozjs.dll!js_RemoveRoot+0xa53
mozjs.dll!js_RemoveRoot+0xbbb
mozjs.dll!JS_GC+0x17
xul.dll!?GetCMSRGBTransform@gfxPlatform@@SAPAU_qcms_transform@@XZ+0x950


Application: Firefox 10.0.1 (20120208060813)
Operating System: WINNT (x86-msvc)

- BarTab Lite 1.2 (Disabled)
- Download Statusbar 0.9.10 (Disabled)
- Extension List Dumper 1.15.2
- Firebug 1.9.1
- Firefogg 2.9.19 (Disabled)
- FireFTP 2.0.1
- Flashblock 1.5.15.1
- FromWhereToWhere 0.25.0 (Disabled)
- Memory Fox 7.4 (Disabled)
- Mozilla Archive Format 2.0.4
- New Tab JumpStart 0.5a5.4.3
- NoScript 2.3
- Places Maintenance 1.3 (Disabled)
- Session History Tree 1.0 (Disabled)
- Session Manager 0.7.8.1
- Showcase 0.9.5.8 (Disabled)
- Tab History Menu 2.1.1 (Disabled)

According to plugin check (because i can't copy from the add-on screen), these are my active plugins:

Shockwave Flash 11.1 r102 	11.1.102.55 	Up to Date
Google Update 	Unknown plugin
I can only reproduce this when hardware acceleration is turned off... Do you have hardware acceleration enabled? In you about:support, what's the value of your "GPU Accelerated Windows" field in the "Graphics" section.
And do you get the pauses when running with all extensions disabled?
(In reply to Chris Pearce (:cpearce) from comment #6)
> I can only reproduce this when hardware acceleration is turned off... Do you
> have hardware acceleration enabled? In you about:support, what's the value
> of your "GPU Accelerated Windows" field in the "Graphics" section.

Thanks, that must be it! Reformatted via Notepad++ it says:

Graphics

Adapter Description: NVIDIA Quadro FX 1800
Vendor ID: 10de
Device ID: 0638
Adapter RAM: 768
Adapter Drivers: nvd3dumx,nvwgf2umx,nvwgf2umx nvd3dum,nvwgf2um,nvwgf2um
Driver Version: 8.15.11.9038
Driver Date: 7-14-2009
Adapter RAM (GPU #2): Unknown
Adapter Drivers (GPU #2): Unknown
Direct2D Enabled: Blocked for your graphics driver version. Try updating your graphics driver to version 257.21 or newer.
DirectWrite Enabled: false (6.1.7600.16385)
ClearType Parameters: ClearType parameters not found
WebGL Renderer: Blocked for your graphics driver version. Try updating your graphics driver to version 257.21 or newer.
GPU Accelerated Windows: 0/28. Blocked for your graphics driver version. Try updating your graphics driver to version 257.21 or newer.
After 2 hours of updating due to none of the involved software simply mentioning that there was not enough space to complete the graphic driver installation ("80070070" wtf?), about:support's Graphics section now says:

Adapter Description: NVIDIA Quadro FX 1800
Vendor ID: 10de
Device ID: 0638
Adapter RAM: 768
Adapter Drivers: nvd3dumx,nvwgf2umx,nvwgf2umx nvd3dum,nvwgf2um,nvwgf2um
Driver Version: 8.17.12.7652
Driver Date: 2-1-2012
Adapter RAM (GPU #2): Unknown
Adapter Drivers (GPU #2): Unknown
Direct2D Enabled: false
DirectWrite Enabled: false (6.1.7600.16385)
ClearType Parameters: ClearType parameters not found
WebGL Renderer: Google Inc. -- ANGLE (NVIDIA Quadro FX 1800) -- OpenGL ES 2.0 (ANGLE 0.0.0.809)
GPU Accelerated Windows: 1/27 Direct3D 9

The video plays fine now. Unfortunately, the URL still makes Firefox unresponsive and use 24+% CPU, even when the video is paused. Possibly useless glimpse at the offending firefox.exe (so i guess it's an add-on, probably Flashblock..*waits for onBeforeItemRemoved...*) thread stack glimpse:

xul.dll!??0gfxSize@@QAE@XZ+0x51b5
xul.dll!NS_CycleCollectorForget2_P+0x9cdb
xul.dll!??_7LayerManagerOGL@layers@mozilla@@6B@+0x92d78
xul.dll!??0gfxSize@@QAE@XZ+0x51b5
xul.dll!??0gfxSize@@QAE@XZ+0x24b0
xul.dll!?GetTightGlyphExtentsAppUnits@gfxGlyphExtents@@QAE_NPAVgfxFont@@PAVgfxContext@@IPAUgfxRect@@@Z+0x181b
xul.dll!??0gfxSize@@QAE@XZ+0x6360
mozjs.dll!?jitDataSize@JSScript@@QAEIP6AIPAX@Z@Z+0x5169d
mozjs.dll!?jitDataSize@JSScript@@QAEIP6AIPAX@Z@Z+0x293
mozjs.dll!?jitDataSize@JSScript@@QAEIP6AIPAX@Z@Z+0x39c
mozjs.dll!JS_GetTypeInferenceMemoryStats+0x6c9b
mozjs.dll!JS_GetTypeInferenceMemoryStats+0x6e23

I will now try disabling add-ons until i find the culprit.
Video paused, Flashblock disabled.

mozjs.dll!js_RemoveRoot+0x2cff
mozjs.dll!JS_TraceChildren+0x6b
mozjs.dll!?setGCLastBytes@JSCompartment@@QAEXIW4JSGCInvocationKind@@@Z+0x1f9
mozjs.dll!?setGCLastBytes@JSCompartment@@QAEXIW4JSGCInvocationKind@@@Z+0x23c
mozjs.dll!js_RemoveRoot+0x2673
mozjs.dll!js_RemoveRoot+0x955
mozjs.dll!js_RemoveRoot+0xa53
mozjs.dll!js_RemoveRoot+0xbbb
mozjs.dll!JS_GC+0x17
xul.dll!?GetCMSRGBTransform@gfxPlatform@@SAPAU_qcms_transform@@XZ+0x950

Disabled NoScript as well, during startup 24+% CPU on quadcore:

xul.dll!??0gfxSize@@QAE@XZ+0x51b5
xul.dll!??0gfxSize@@QAE@XZ+0x24b0
xul.dll!?GetTightGlyphExtentsAppUnits@gfxGlyphExtents@@QAE_NPAVgfxFont@@PAVgfxContext@@IPAUgfxRect@@@Z+0x181b
xul.dll!??0gfxSize@@QAE@XZ+0x7bec
mozjs.dll!JS_GetTypeInferenceMemoryStats+0x6d7b
mozjs.dll!JS_GetTypeInferenceMemoryStats+0x6170
mozjs.dll!JS_GetTypeInferenceMemoryStats+0x620a

Possibly due to Flashblock being disabled, i notice about:support's Graphic section now shows "GPU Accelerated Windows: 26/26 Direct3D 10".

There's a little 24+% CPU spike while loading the URL.

Scrolling the page is no problem. I also notice there are no onBeforeItemRemoved errors in the Error Console.

The instant i click the play button, firefox.exe+0x1c59 starts using 24+% CPU again. The video still plays fine but the Firefox GUI gets nonresponsive for 7 seconds every other 7 seconds. :( Tested by dragging the window.
Pausing the video still provides no relief.

The threada stack appears to bounce between 17 and 25% CPU. A glimpse at the 17% stack shows only "mozjs.dll!JS_ObjectToOuterObject".
24.7+% shows the following:

xul.dll!?GetType@ContainerLayer@layers@mozilla@@UBE?AW4LayerType@Layer@23@XZ+0x327c
xul.dll!?GetType@ContainerLayer@layers@mozilla@@UBE?AW4LayerType@Layer@23@XZ+0x4f4a
xul.dll!?GetUserFontSet@gfxFontGroup@@QAEPAVgfxUserFontSet@@XZ+0x209d

I notice the Error Console is now flooded with Error: [Exception... "'JavaScript component does not have a method named: "onBeforeItemRemoved"' when calling method: [nsINavBookmarkObserver::onBeforeItemRemoved]"  nsresult: "0x80570030 (NS_ERROR_XPC_JSOBJECT_HAS_NO_FUNCTION_NAMED)"  location: "JS frame :: resource:///components/nsLivemarkService.js :: LM_runBatched :: line 659"  data: no]
Source File: resource:///components/nsLivemarkService.js
Line: 659 again.

Firefox CPU usage has returned to normal: < 1% !

The URL is still open and the video still paused.

Starting the video doesn't cause noticable CPU increase. :)

Only starting the video after reloading the page causes the CPU to go back to 24+%.

Hitting the tab's close button while the GUI is responding immediately closes the tab and returns CPU to normal.

Disabling Firebug...

Now everything works fine. Too bad i'm still developing sites with Flash elements.
I've updated to Firebug 1.10.0a3 and the Flash CPU problem doesn't occur anymore. If i still see the onBeforeItemRemoved flood (that disabled both ways of middle mouse button scrolling but not the scrollbar grip scrolling), i'll find or create a different bug for that.
Status: UNCONFIRMED → RESOLVED
Closed: 8 years ago
Resolution: --- → INVALID
I can reproduce longish UI freezes (multi second, sometimes tens of seconds) semi-reliably when my system is doing heavy disk IO (for example unziping a zip of my object dir, or `winrm -rf $objdir`).

I caught a callstack of a pause in a locally built opt build, and I get the following stack:

ntdll.dll!_NtWriteFile@36()  + 0x15 bytes	
ntdll.dll!_NtWriteFile@36()  + 0x15 bytes	
kernel32.dll!_WriteFileImplementation@20()  + 0x4a bytes	
nspr4.dll!FileWrite(PRFileDesc * fd, const void * buf, int amount)  Line 109 + 0x8 bytes	C
nspr4.dll!PR_Write(PRFileDesc * fd, const void * buf, int amount)  Line 146 + 0x13 bytes	C
xul.dll!nsDiskCacheStreamIO::FlushBufferToFile()  Line 791 + 0xf bytes	C++
xul.dll!nsDiskCacheStreamIO::Write(const char * buffer, unsigned int count, unsigned int * bytesWritten)  Line 639	C++
xul.dll!nsDiskCacheOutputStream::Write(const char * buf, unsigned int count, unsigned int * bytesWritten)  Line 294	C++
xul.dll!nsCacheEntryDescriptor::nsOutputStreamWrapper::Write(const char * buf, unsigned int count, unsigned int * result)  Line 845 + 0x12 bytes	C++
xul.dll!nsInputStreamTee::TeeSegment(const char * buf, unsigned int count)  Line 200	C++
xul.dll!nsInputStreamTee::WriteSegmentFun(nsIInputStream * in, void * closure, const char * fromSegment, unsigned int offset, unsigned int count, unsigned int * writeCount)  Line 230 + 0xb bytes	C++
xul.dll!nsPipeInputStream::ReadSegments(unsigned int (nsIInputStream *, void *, const char *, unsigned int, unsigned int, unsigned int *)* writer, void * closure, unsigned int count, unsigned int * readCount)  Line 799 + 0x13 bytes	C++
xul.dll!nsInputStreamTee::ReadSegments(unsigned int (nsIInputStream *, void *, const char *, unsigned int, unsigned int, unsigned int *)* writer, void * closure, unsigned int count, unsigned int * bytesRead)  Line 277	C++
xul.dll!mozilla::ChannelMediaResource::OnDataAvailable(nsIRequest * aRequest, nsIInputStream * aStream, unsigned int aCount)  Line 409	C++
xul.dll!mozilla::ChannelMediaResource::Listener::OnDataAvailable(nsIRequest * aRequest, nsISupports * aContext, nsIInputStream * aStream, unsigned int aOffset, unsigned int aCount)  Line 128	C++
xul.dll!nsHTMLMediaElement::MediaLoadListener::OnDataAvailable(nsIRequest * aRequest, nsISupports * aContext, nsIInputStream * aStream, unsigned int aOffset, unsigned int aCount)  Line 388	C++
xul.dll!nsStreamListenerTee::OnDataAvailable(nsIRequest * request, nsISupports * context, nsIInputStream * input, unsigned int offset, unsigned int count)  Line 122 + 0x18 bytes	C++
xul.dll!nsHttpChannel::OnDataAvailable(nsIRequest * request, nsISupports * ctxt, nsIInputStream * input, unsigned int offset, unsigned int count)  Line 4478 + 0x36 bytes	C++
xul.dll!nsInputStreamPump::OnStateTransfer()  Line 514 + 0x18 bytes	C++
xul.dll!nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream * stream)  Line 403	C++
xul.dll!nsOutputStreamReadyEvent::Run()  Line 115	C++
xul.dll!nsThread::ProcessNextEvent(bool mayWait, bool * result)  Line 657 + 0x9 bytes	C++
xul.dll!NS_ProcessNextEvent_P(nsIThread * thread, bool mayWait)  Line 245 + 0xd bytes	C++
xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate)  Line 110 + 0xa bytes	C++
xul.dll!MessageLoop::RunHandler()  Line 202	C++
xul.dll!MessageLoop::Run()  Line 176	C++


So am I reading that stack right, we're writing data that's been received to a disk cache in nsDiskCacheStreamIO, and that's flushing the cache to disk which is blocking, presumably waiting on a disk seek?

I wonder if users are seeing these freezes caused by this due to disk activity on their systems by virus scanners (some people even run multiple scanners!) or the OS's search indexing service (SearchIndexer.exe) misbehaving.

Josh/jduell: I understand are there no current plans to enable off-main-thread nsHttpChannel::OnDataAvailable() listeners? Is there a way we can specify that we don't want the nsMediaCache's nsHttpChannels cached by the network cache, since we're caching them ourselves in the nsMediaCache? Or some other way to work around this?

On a side note, the video's audio still keeps playing, which requires reading the uncompressed data from disk in the nsMediaCache, so I'd have expected it to block too. Maybe the audio's required segments in the media cache's file are being cached in memory by the OS.
Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: INVALID → ---
Michal/Nick,

So looking at the stack trace in comment 12, it seems that we're trying to read the inputstream during OnDataAvailable, but the inputstreamTee is somehow blocking that read while it writes stuff to disk.  That seems like it should never happen--all the data in the OnDataAvailable stream should be readable w/o blocking for the client.

On that basis, I'm classifying this for now as a disk cache bug.  Possible workaround in the meantime may be to bypass disk caching for loads that get stored in the media cache anyway.   (I'm not sure how best to make that happen--the INHIBIT_CACHING flag would work, but we'd need to know the media type at channel load time, right?  That may be doable.  Anyway, that belongs in a separate bug).
Assignee: nobody → michal.novotny
Component: Untriaged → Networking: Cache
Product: Firefox → Core
QA Contact: untriaged → networking.cache
I can't reproduce it even if I'm doing heavy disk IO. Anyway, if it is really caused by doing IO on the main thread then following builds shouldn't suffer from this issue. Could you please try it and check if the problem is fixed?

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mnovotny@mozilla.com-fc1d2e1954b7/
I extracted your patch from your try push and built that locally. I can confirm the stack trace in comment 12 no longer is the main cause of pauses on the main thread when the disk is doing heavy IO. There are a number of other IO operations on the main thread which are causing pauses, but the one in comment 12 seems fixed by that patch.
(In reply to Chris Pearce (:cpearce) from comment #15)
> I can confirm the stack trace in comment 12 no longer is the main cause of
> pauses on the main thread when the disk is doing heavy IO.

And was this IO the main culprit? I.e. is the UI responsiveness better now?


> There are a number of other IO operations on the main thread which are
> causing pauses, but the one in comment 12 seems fixed by that patch.

Are they cache related? Could you please post them to this bug?
(In reply to Michal Novotny (:michal) from comment #16)
> (In reply to Chris Pearce (:cpearce) from comment #15)
> > I can confirm the stack trace in comment 12 no longer is the main cause of
> > pauses on the main thread when the disk is doing heavy IO.
> 
> And was this IO the main culprit? I.e. is the UI responsiveness better now?

Yes, things are much better now. On my machine the pauses seem to be now less than ~2 seconds long, whereas they were often up to 5 seconds or even longer.

> > There are a number of other IO operations on the main thread which are
> > causing pauses, but the one in comment 12 seems fixed by that patch.
> 
> Are they cache related? Could you please post them to this bug?

The most common one is our media cache writing on the main thread (bug 736342).

The only pause (other than bug 736342) pause that I hit today when testing had the following main thread stack:

ntdll.dll!_ZwFlushBuffersFile@8()  + 0x12 bytes	
ntdll.dll!_ZwFlushBuffersFile@8()  + 0x12 bytes	
kernel32.dll!_FlushFileBuffersImplementation@4()  + 0x3a bytes	
mozsqlite3.dll!winSync(sqlite3_file * id, int flags)  Line 33800	C
xul.dll!`anonymous namespace'::xSync(sqlite3_file * pFile, int flags)  Line 180 + 0xf bytes	C++
mozsqlite3.dll!syncJournal(Pager * pPager, int newHdr)  Line 42240 + 0x1f bytes	C
mozsqlite3.dll!sqlite3PagerCommitPhaseOne(Pager * pPager, const char * zMaster, int noSync)  Line 44194 + 0x8 bytes	C
mozsqlite3.dll!sqlite3BtreeCommitPhaseOne(Btree * p, const char * zMaster)  Line 52348 + 0xe bytes	C
mozsqlite3.dll!vdbeCommit(sqlite3 * db, Vdbe * p)  Line 61160 + 0x7 bytes	C
mozsqlite3.dll!sqlite3VdbeHalt(Vdbe * p)  Line 61588 + 0x8 bytes	C
mozsqlite3.dll!sqlite3VdbeExec(Vdbe * p)  Line 2034 + 0x9 bytes	C
mozsqlite3.dll!sqlite3Step(Vdbe * p)  Line 63044	C
mozsqlite3.dll!sqlite3_step(sqlite3_stmt * pStmt)  Line 63118 + 0x7 bytes	C
mozsqlite3.dll!sqlite3_exec(sqlite3 * db, const char * zSql, int (void *, int, char * *, char * *)* xCallback, void * pArg, char * * pzErrMsg)  Line 25202 + 0xa bytes	C
xul.dll!mozilla::storage::Connection::ExecuteSimpleSQL(const nsACString_internal & aSQLStatement)  Line 1230 + 0x2b bytes	C++
85142444()
Based on this experience, I'd like to completely remove support for STORE_ON_DISK_AS_FILE from the cache. I wanted to do this a long time ago but I never had a real reason for this. There are 2 major problems with these entries:

1) They don't respect size limitations, so e.g. one extra large video can evict the whole cache.
2) These entries are written on the main thread.

Any code that sets nsICachingChannel::cacheAsFile MUST have a fallback solution in case of failure (disk cache not enabled, entry cannot be stored on persistent storage, ...). So removing support for STORE_ON_DISK_AS_FILE shouldn't break anything (see https://tbpl.mozilla.org/?tree=Try&rev=fc1d2e1954b7).

Any opinion on this?
Attachment #609523 - Flags: review?(jduell.mcbugs)
(In reply to Michal Novotny (:michal) from comment #18)
> Any opinion on this?

My understanding is that your patch doesn't actually reduce functionality (e.g. XHR and NPAPI will work as before). If so, it seems like a good idea. But, why is all this code using cacheAsFile in the first place? Is there any negative impact at all?
Blocks: 722034
(In reply to Brian Smith (:bsmith) from comment #20)
> My understanding is that your patch doesn't actually reduce functionality
> (e.g. XHR and NPAPI will work as before).

Right.


> If so, it seems like a good idea. But, why is all this code using cacheAsFile
> in the first place?

A long time ago, only cacheAsFile was used (see bug #209560). The current code is just a result of a decision in comment #10.


> Is there any negative impact at all?

The main difference is that cacheAsFile forces to cache file of any size. So removing this function means that e.g. 100MB HTML5 video won't be cached in the disk cache anymore. I personally consider this as a positive impact but others might have different opinion.
(In reply to Michal Novotny (:michal) from comment #21)
> (In reply to Brian Smith (:bsmith) from comment #20)
> > Is there any negative impact at all?
> 
> The main difference is that cacheAsFile forces to cache file of any size. So
> removing this function means that e.g. 100MB HTML5 video won't be cached in
> the disk cache anymore. I personally consider this as a positive impact but
> others might have different opinion.

<video>'s HTTP channels are currently created with the nsICachingChannel::LOAD_BYPASS_LOCAL_CACHE_IF_BUSY flag, would this change affect them? 

Currently the media cache isn't smart enough to persistently store data longer than the lifetime of all media elements playing that resource. We rely on the Necko cache to ensure that resources that have been loaded before, but which aren't currently stored in a live media element, are loaded quickly. This is important for low latency when creating media elements.
(In reply to Chris Pearce (:cpearce) from comment #22)
> <video>'s HTTP channels are currently created with the
> nsICachingChannel::LOAD_BYPASS_LOCAL_CACHE_IF_BUSY flag, would this change
> affect them? 

No.


> Currently the media cache isn't smart enough to persistently store data
> longer than the lifetime of all media elements playing that resource. We
> rely on the Necko cache to ensure that resources that have been loaded
> before, but which aren't currently stored in a live media element, are
> loaded quickly. This is important for low latency when creating media
> elements.

This patch will affect only large videos (currently bigger than 50MB). These files won't be cached anymore. Smaller files will be still cached but the cache file storage can't be used directly via nsICachingChannel::cacheFile.
OK, this bug has morphed a bit.   Here's the summary as I see it:

We were seeing long pauses for Youtube videos, but that's mostly been fixed by moving the cache writes to be async (comment 14).  Except that cache files stores with STORE_ON_DISK_AS_FILE are still written synchronously from the main thread, so Michal wants to remove the semantic entirely.  (These entries also don't respect cache size limits, so big (>50MB) files wind up evicting other resources).

I see two solutions to this: 1) remove STORE_ON_DISK_AS_FILE and make sure current API clients perform ok without it. (This is what you've done, but I'm not sure the client part is OK).  2) Change the code so STORE_ON_DISK_AS_FILE files don't get written to on the main thread, and/or put size limits on them.

Re #1:  So we have 4 clients of this API: MediaCache, XHR, nsPluginStreamListenerPeer.cpp, and nsDownLoader.  If we remove STORE_ON_DISK_AS_FILE, here's what they do AFAICT:

MediaCache: no longer caches files >50MB.  This is probably ok (though I think a better solution eventually is to have either a separate cache for large files (so small ones don't get squeezed out), and/or an algorithm that evicts large files earlier).  But we could achieve that w/o an STORE_ON_DISK_AS_FILE, so I think I'm ok with it.  It would be nice to have signoff from the media guys that they're OK with this: the media cache will handle same-session accesses, but we won't cache large videos across sessions.

XHR: will no longer back Blobs with a cache file, and will instead store all Blobs in RAM. This is almost definitely *not* what we want.  I'm not clear on whether there's a good reason for XHR to use the cache here (is it likely that Blobs will be re-requested? That's the only case I can think of for having Blobs in the cache), so the right answer here might be to have XHR handle storing Blob files.  But I don't think we can land this without that.

PluginStream/Downloader:  these classes store files in a temporary directory if we remove STORE_ON_DISK_AS_FILE.  That's OK in theory, but note that in many cases it will be a different disk partition than the cache directory (ex: on Unix it typically resolves to /tmp, so we'd be filling up either a potentially small /tmp, or the root partition, rather than the user's /home. On Windows we get whatever GetTempPathW() gives us, and on OSX we seem to get a per-user temp directory.  I was told on IRC by Yoric that the Java loader "emulates" TEMP on Android--I don't know details).  I especially would want to make sure that's OK on android and B2G.  I'm also not clear on how these file get cleaned up--nsPluginStreamListenerPeer seems to keep a list of recently accessed files, so it has it's own mini-cache, and the comment for nsPluginStreamListenerPeer::SetupPluginCacheFile says that "these files will be deleted when the host is destroyed" (I don't know what that means).  I haven't checked to see what nsDownLoader does for cleanup.  Do we know the status of how TEMP directories are cleaned up on the various OSes (I seem to remember Unix cleans /tmp at reboot).  It may not be enough to rely on the OS to handle /tmp if we're doing lots of downloads, etc.

Re: approach #2.  If it makes sense for us to store these files in the HTTP cache (i.e. we can get future cache hits for them *and* it's worth keeping them around even though they may be large), then we could try to keep the current API.  More questions:

Is there some reason why STORE_ON_DISK_AS_FILE requires main thread I/O?  The only reference in the patch I see to it is that you've changed the comment in nsIInputStreamTee.idl to say that STORE_ON_DISK_AS_FILE no longer causes sync I/O, but there are no changes to nsInputStreamTee.cpp (and I see no references to STORE_ON_DISK_AS_FILE in it).  What's doing main thread I/O, and how does your patch remove it?

Is there a way to deny SetCacheAsFile if a file is too large?  I'm guessing the issue is that we don't always know the size before we'd need to return success/fail, and once we've handed out the file descriptor we can't just take it back.  Correct?
Summary: Random up to a minute-long unresponsive GUI watching HTML5 YouTube → Remove ability to tell cache to STORE_ON_DISK_AS_FILE
cc-ing sicking/khuey in case they have comment on the XHR/Blob issue.  (I think jonas is on vacation though, and khuey is in transit).
Comment 24 is private: false
Comment on attachment 609523 [details] [diff] [review]
patch v1 - removes cacheAsFile functionality

Review of attachment 609523 [details] [diff] [review]:
-----------------------------------------------------------------

marking r- for now because I'm fairly confident we can't land this w/o resolving the XHR issue (and possibly the TEMP issue, esp on mobile).

::: dom/plugins/base/nsPluginStreamListenerPeer.cpp
@@ -1076,5 @@
>      nsCOMPtr<nsIFile> localFile;
>      if (mLocalCachedFileHolder)
>        localFile = mLocalCachedFileHolder->file();
>      else {
> -      nsCOMPtr<nsICachingChannel> cacheChannel = do_QueryInterface(request);

Also get rid of comment about "the most likely cause for this is either there is no disk cache available" (and https comment seems dated too) at

  http://mxr.mozilla.org/mozilla-central/source/dom/plugins/base/nsPluginStreamListenerPeer.cpp#439

::: netwerk/cache/nsICache.idl
@@ -149,5 @@
>      const nsCacheStoragePolicy STORE_ANYWHERE        = 0;
>      const nsCacheStoragePolicy STORE_IN_MEMORY       = 1;
>      const nsCacheStoragePolicy STORE_ON_DISK         = 2;
> -    const nsCacheStoragePolicy STORE_ON_DISK_AS_FILE = 3;
> -    const nsCacheStoragePolicy STORE_OFFLINE         = 4;

Do we store these constants anywhere in the cache, or could plugins/addons refer to them by using "3" etc?  Makes me nervous to change STORE_OFFLINE from 4 to 3--might be worth keeping at 4 and just leave a gap?
Attachment #609523 - Flags: review?(jduell.mcbugs) → review-
cc-ing mobile folks: if you can look starting at comment 24 and comment on the TEMP directory issue, that'd be great.
> XHR: will no longer back Blobs with a cache file, and will instead store all
> Blobs in RAM. This is almost definitely *not* what we want.  I'm not clear
> on whether there's a good reason for XHR to use the cache here (is it likely
> that Blobs will be re-requested? That's the only case I can think of for
> having Blobs in the cache), so the right answer here might be to have XHR
> handle storing Blob files.  But I don't think we can land this without that.

I believe we use the cache here because the cache will put the data in a file for us.  I think you are probably correct about the solution here.  CCing Masatoshi here who may want to comment.  (sicking is out for another week and a half)
Yes, I was too lazy to create disk-backed blobs' own cache manager.
We will have to add a cache manager for blobs in favor of STORE_ON_DISK_AS_FILE function of necko cache.
Based on my current understanding, I agree with Michal's change to XHR to remove the cacheAsFile code.

Why do Blobs need to be always disk-backed and what would motivate special caching of them? There is nothing inherently persistent about Blobs; in fact, AFAICT the File API defines Blobs so that we have an efficient way of shuffling around data that is NOT (necessarily) stored on disk. WebSockets and XHR allow the transfer of very large objects/strings without using the Blob type, and it seems like there should be consistency here.

Cacheable blobs in XHR (not websockets) will effectively be persisted to disk anyway, because they are will be in the Necko cache if they are cacheable, even without STORE_ON_DISK_AS_FILE, as long as the disk cache is enabled, right?

WebSockets also supports Blobs, and it seems (to somebody who admittedly hasn't worked on XHR or WebSockets) like the handling of XHR and WebSockets regarding Blobs should be consistent with each other and even share code.

XHR blob support should respect the HTTP no-store directive, and avoid storing no-store items on disk in the same manner the Necko disk cache does. (And, since WebSockets has nothing like no-store, I would say that it is probably a good idea to treat all blobs in WebSockets as though they were no-store.)
(In reply to Jason Duell (:jduell) from comment #24)
> MediaCache: no longer caches files >50MB.  This is probably ok (though I
> think a better solution eventually is to have either a separate cache for
> large files (so small ones don't get squeezed out), and/or an algorithm that
> evicts large files earlier).

Probably, for very large entries that we know are audio/video, we should MAYBE try to cache individual chunks--e.g. the first 1 minute, the most-recently-played minute, and the next 2 minutes instead of caching the whole thing. That would require higher-level smarts. I don't think anybody's expecting their giant YouTube videos to play offline, but it would be cool to get them to start/resume playing without having to hit the network.

> PluginStream/Downloader:  these classes store files in a temporary directory
> if we remove STORE_ON_DISK_AS_FILE.  That's OK in theory, but note that in
> many cases it will be a different disk partition than the cache directory
> (ex: on Unix it typically resolves to /tmp, so we'd be filling up either a
> potentially small /tmp, or the root partition, rather than the user's /home.

IMO, the temp directory is the best place for these files. Sizing /tmp too small or putting it on the root partition are "don't do that" issues. Definitely, these files do not belong in the cache, which is for caching, not for whatever random plugins want to temporarily store.

> Do we know the status of how TEMP directories are cleaned up on the various
> OSes (I seem to remember Unix cleans /tmp at reboot).  It may not be enough
> to rely on the OS to handle /tmp if we're doing lots of downloads, etc.

On platforms that support it, we should unlink the file and/or set the delete-on-close bit. But, I wouldn't block this on that.

> Is there some reason why STORE_ON_DISK_AS_FILE requires main thread I/O? 
> The only reference in the patch I see to it is that you've changed the
> comment in nsIInputStreamTee.idl to say that STORE_ON_DISK_AS_FILE no longer
> causes sync I/O, but there are no changes to nsInputStreamTee.cpp (and I see
> no references to STORE_ON_DISK_AS_FILE in it).  What's doing main thread
> I/O, and how does your patch remove it?

I agree with the idea of removing cacheAsFile. But, in the interim, can't we use the stream transport service to convert the main thread I/O to stream-transport-thread I/O like we already do for other I/O in nsHttpChannel and nsFtpState?
> Why do Blobs need to be always disk-backed...

I don't see language in the specs that *requires* them to be disk-backed.  One benefit is that keeping them disk-backed means we don't need to keep them in RAM for their entire lifecycle.  Might be a factor on mobile.   But I'd like to hear what sicking has to say on this.

> we should unlink [files in TEMP] and/or set the delete-on-close bit. But, 
> I wouldn't block this on that.

I haven't gone through all the clients to know whether they're currently doing some sort of deletion like this, or whether they're relying on the cache to clean up.  I don't think we want to clog up OS's tmp dirs unless we know for sure (do we?) that the OS handles that well on all our platforms of interest.

> for very large entries that we know are audio/video, we should MAYBE try to 
> cache individual chunks--e.g. the first 1 minute, the most-recently-played 
> minute, and the next 2 minutes instead of caching the whole thing.

Nice idea--open a bug for it and cc roc?
> We will have to add a cache manager for blobs

@emk:  if we do this, let's make the code modular so both XHR and websockets can use it.

Note that one downside of making this change is that we make the caches separate.  I.e. we no longer have a centralized disk cache with a single disk quota that can decide which things ought to get evicted.  Instead we have some separate (and possibly large) blob cache that does its own thing.  It's not clear to me that's a win.  OTOH if blobs really don't need to be cached (i.e. we don't expect them to be re-requested), and they just need to be refcounted until they're not in use, then we're better off keeping them separate from the HTTP cache.  We don't want them sitting around waiting to get evicted based on LRU when we could get rid of them sooner.
(In reply to Jason Duell (:jduell) from comment #32)
> > for very large entries that we know are audio/video, we should MAYBE try to 
> > cache individual chunks--e.g. the first 1 minute, the most-recently-played 
> > minute, and the next 2 minutes instead of caching the whole thing.
> 
> Nice idea--open a bug for it and cc roc?

nsMediaCache already does something similar. It caches a window of data around the current play point of each media resource. Its policy is to cache up to the cache size limit (currently 500MB on desktop), and to equalize the amount of cached time before and after the current play point of all media resources.

We can change that policy, of course.
> nsMediaCache already does something similar.

Right--the idea is to have the disk cache store this data too, so we'd get faster re-start of video when a user goes back to a video after restarting the browser.  Not sure if that needs to include the current play point, or just the start of the video. Either way, this is definitely a different bug, so we should discuss elsewhere.
(In reply to Jason Duell (:jduell) from comment #33)
> @emk:  if we do this, let's make the code modular so both XHR and websockets
> can use it.
I consider introducing nsDOMCacheFile based on the necko cache and consider making nsDOMMemoryFile::DataOwner possible to store the data to disk files.
> I consider introducing nsDOMCacheFile based on the necko cache

Do DOM files need to be kept around with a least-recently-used eviction policy?  If so we might as well keep them in the necko cache.  But I suspect the answer is "no", and in that case the code should be much simpler than the necko cache.
Not so easy if we share the storage for the same response. I do not want to create a separate copies for every XHR responses whose contents are the same. Of course we need to allocate a storage again when the response has been updated. That's exactly what the necko cache manages.
Furthermore, we need to adhere to something like "no-store" directives which also necko already takes care of.
emk: Why do you think we should store Blob responses on disk at all? I don't understand why the Blob type is any different than any other response type with respect to persistence.
(In reply to Brian Smith (:bsmith) from comment #40)
> emk: Why do you think we should store Blob responses on disk at all? I don't
> understand why the Blob type is any different than any other response type
> with respect to persistence.
Other response types always use memory simply because they *cannot* store responses on disk. I don't think it's optimal to store video files or iso images on memory. That's the reason the blob response type is invented which is potentially stored on the disk.
But, indeed, we aren't *required* to store all blob responses to the disk. So I changed my mind: it's ok to remove the cacheAsFile functionally even if we continue to use the necko cache (again, necko will decide which storage should be used for the response for us).
(In reply to Jason Duell (:jduell) from comment #24)
> Is there some reason why STORE_ON_DISK_AS_FILE requires main thread I/O? 

Yes, see the beginning of comment https://bugzilla.mozilla.org/show_bug.cgi?id=548406#c33


> Is there a way to deny SetCacheAsFile if a file is too large?  I'm guessing
> the issue is that we don't always know the size before we'd need to return
> success/fail, and once we've handed out the file descriptor we can't just
> take it back.  Correct?

Exactly. SetCacheAsFile() is called in OnStartRequest() and we might or might not know the predicted data size here. And if SetCacheAsFile() succeeds we must cache all content (see https://bugzilla.mozilla.org/show_bug.cgi?id=81640#c16). We already had a bug about it (bug #644431).


(In reply to Jason Duell (:jduell) from comment #26)
> ::: netwerk/cache/nsICache.idl
> @@ -149,5 @@
> >      const nsCacheStoragePolicy STORE_ANYWHERE        = 0;
> >      const nsCacheStoragePolicy STORE_IN_MEMORY       = 1;
> >      const nsCacheStoragePolicy STORE_ON_DISK         = 2;
> > -    const nsCacheStoragePolicy STORE_ON_DISK_AS_FILE = 3;
> > -    const nsCacheStoragePolicy STORE_OFFLINE         = 4;
> 
> Do we store these constants anywhere in the cache, or could plugins/addons
> refer to them by using "3" etc?  Makes me nervous to change STORE_OFFLINE
> from 4 to 3--might be worth keeping at 4 and just leave a gap?

We don't store policy anywhere. Do we care about addons that use numeric constants instead of names? Anyway, in case we decide to leave the gap, we should probably place a comment here explaining that 3 is missing and that this is NOT a bitflag.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #28)
> > XHR: will no longer back Blobs with a cache file, and will instead store all
> > Blobs in RAM. This is almost definitely *not* what we want.  I'm not clear
> > on whether there's a good reason for XHR to use the cache here (is it likely
> > that Blobs will be re-requested? That's the only case I can think of for
> > having Blobs in the cache), so the right answer here might be to have XHR
> > handle storing Blob files.  But I don't think we can land this without that.
> 
> I believe we use the cache here because the cache will put the data in a
> file for us.  I think you are probably correct about the solution here. 
> CCing Masatoshi here who may want to comment.  (sicking is out for another
> week and a half)

I can imagine applications that use XHR to download images or other predictable resources, but I have the feeling that it is not too common. We should check that with webdevs.
(In reply to Jason Duell (:jduell) from comment #24)
> I was told on IRC by Yoric that the Java
> loader "emulates" TEMP on Android--I don't know details.

Android does not have a system-wide notion of TEMP. This makes sense, as "/tmp" and "/var/tmp" are typically designed to be cleaned-up at reboot, while Android devices are expected to both have very limited storage capacity and never reboot in their lifetime.

What we do is create a directory owned by the application at the first startup of Fennec, at http://mxr.mozilla.org/mozilla-central/source/embedding/android/GeckoAppShell.java#358 , and set TMPDIR to point to this directory. This directory is readable and writable by any application, and afaict, it is never cleaned up (see bug 696084).
So, going forward:

We should figure out whether PluginStream/Downloader clean up the temp files they create when STORE_ON_DISK_AS_FILE doesn't happen.  I don't think we should litter tmp directories.  

I'd like to hear what sicking has to say on whether we need to back Blobs with files.  I can imagine some sort of scenario where JS grabs lots of images or something and relies on them not all being in RAM at once (esp on mobile).  But maybe it's not an issue.

If we do need to back Blobs with files, and we don't expect to get enough re-use of the same blobs to make caching worth it, we should remove STORE_ON_DISK_AS_FILE and have XHR/websockets share an implementation that removes files when the last ref to them goes away.

If we need to back Blobs with files AND we want HTTP caching re-use for them, we should modify STORE_ON_DISK_AS_FILE so that it no longer guarantees that the file is completely written to disk during OnStopRequest.  Blobs just wrap the file handle, and all I/O that JS does on them is async, so I think that might be ok.  Unless there's still a race between the JS reading and getting to EOF before the cache thread finishes writing (and/or we don't want JS to see a .size for the Blob that changes).  If that's an issue, then some kind of additional API that lets cacheAsFile clients wait for write to file to complete.
We definitely have the need to be able to create temporary files which are deleted when any Blobs referring to them go away.

The main reason that we use the cacheAsFile feature in XHR now though is to avoid having *two* copies of the file on disk. I.e. if the http-cache already is storing a resource on disk it seems silly to keep another copy around. Especially if that copy is in memory, but also if it is on disk.

We can certainly device other solutions that let us avoid having two copies around though. But we'd need to be notified when a resource is being downloaded into cache, as well as be notified when it's getting kicked out of cache and allow us to read the data out of cache before it's kicked out.
>  main reason that we use the cacheAsFile feature in XHR now though is to avoid 
> having *two* copies of the file on disk

Why don't we just load blobs with a flag that tells the HTTP channel not to cache the load, and then XHR can handle storing it in its own refcounted file storage system (which we'll share with websockets)?   That way we won't have two copies, we won't keep blobs files around any longer than needed, and we won't have to worry about the cache evicting the file before we're done with it (it's not clear to me if we guarantee this now, other than maybe the OS's guarantee that a deleted file stays around until file handles are closed, but maybe that's unix-only?)

I don't think we actually have the load flag we'd need for this right now:  LOAD_BYPASS_LOCAL_CACHE seems to only affect not checking the cache for a read, and  doesn't tell the channel not to write a new cache entry.  But it could easily be added--we obviously have the logic in place for handling HTTP "Cache-control: no-store" headers, etc.
Status: REOPENED → NEW
(In reply to Jonas Sicking (:sicking) from comment #46)
> We definitely have the need to be able to create temporary files which are
> deleted when any Blobs referring to them go away.

This is where the disconnect is: you take it for granted that these must be disk-backed, but it isn't clear to others (me, at least) *why* they must be disk-backed before the application asks them to be stored as a file using the File API. For practical security reasons, the size of these blobs must be limited to prevent someone from eating too much of the user's disk space, even temporarily, and. Also, we should support no-store blobs and respect no-store for them, so AFAICT we need to be able to handle memory-only blobs, and it isn't clear what the thresholds should be for memory-only vs. disk-backed vs. just too big.

To make things more concrete: assume the user only has 200MB of disk space on their mobile phone that is writable for applications. How much of that disk space is a blob allowed to take up? IMO, it isn't reasonable to allow the sum total of all blobs for a particular site/page to take up more than a few megabytes of the disk space. But, then, why not just use RAM for them, letting the OS VMM swap as needed?
(In reply to Jason Duell (:jduell) from comment #47)
> >  main reason that we use the cacheAsFile feature in XHR now though is to avoid 
> > having *two* copies of the file on disk
> 
> Why don't we just load blobs with a flag that tells the HTTP channel not to
> cache the load, and then XHR can handle storing it in its own refcounted
> file storage system (which we'll share with websockets)?

The downside is of course that it means that we loose the normal HTTP-caching semantics. I.e. if a page requests the same URI multiple times we'll have to hit the network every time.

Might not be a big deal though. I don't know how often XMLHttpRequests end up served out of the cache. We could get data on this maybe.
(In reply to Brian Smith (:bsmith) from comment #48)
> (In reply to Jonas Sicking (:sicking) from comment #46)
> > We definitely have the need to be able to create temporary files which are
> > deleted when any Blobs referring to them go away.
> 
> This is where the disconnect is: you take it for granted that these must be
> disk-backed, but it isn't clear to others (me, at least) *why* they must be
> disk-backed before the application asks them to be stored as a file using
> the File API.

Sorry, I didn't mean to take that for granted at all. The current code handles just fine the case when the channel says that the resource won't be cached. At that point we just create our own copy. We should definitely keep that fallback path.

> For practical security reasons, the size of these blobs must
> be limited to prevent someone from eating too much of the user's disk space,
> even temporarily, and.

The alternative to keeping the data on disk is to keep it in memory. There we are even more constrained in how much data we can use and how much data we'd like websites to use.

> To make things more concrete: assume the user only has 200MB of disk space
> on their mobile phone that is writable for applications. How much of that
> disk space is a blob allowed to take up? IMO, it isn't reasonable to allow
> the sum total of all blobs for a particular site/page to take up more than a
> few megabytes of the disk space. But, then, why not just use RAM for them,
> letting the OS VMM swap as needed?

I'm not sure things will be simply swapped to disk. It seems just as likely that the OS will simply kill us.

We can certainly have a policy for how much temporary blob data we swap out to disk though, and hope that keeping the rest in memory won't result in us getting killed.
(In reply to Brian Smith (:bsmith) from comment #48)
> But, then, why not just use RAM for them,
> letting the OS VMM swap as needed?
Then swapped memory must be read synchronously. If it is not a problem, why do you try to fix this bug (one of "synchronous io must die" bugs) at all?
Blobs support asynchronus reading because it assumes the content may be written to disk. Rather, Blob can't be read synchronously except on Workers.
I'd like to get moving on supporting file-backed blobs for websockets, so it'd be nice to resolve our design decision here.  There's definitely no use-case for web socket blobs to live in the cache, so I'll be implementing a refcounted blob storage system (with a BlobFileService--name ok?--that can enforce whatever storage limits we want to have).  XHR can use that service too, unless we decide that blobs get reused enough (seems unlikely to me?) that they're better off in the HTTP cache.

> Then swapped memory must be read synchronously.

I agree that it's probably not very jank-friendly to start having the OS swap out/in parts of our app: even if it picks the right data (the blobs) I think emk is right that this means we'll hand out memory addresses to the "async" JS handlers that will then hit swap-in pauses when they access the memory.
(In reply to Jason Duell (:jduell) from comment #52)
> I'd like to get moving on supporting file-backed blobs for websockets, so
> it'd be nice to resolve our design decision here.  There's definitely no
> use-case for web socket blobs to live in the cache, so I'll be implementing
> a refcounted blob storage system (with a BlobFileService--name ok?--that can
> enforce whatever storage limits we want to have).  XHR can use that service
> too, unless we decide that blobs get reused enough (seems unlikely to me?)
> that they're better off in the HTTP cache.

How would an application indicate that its WebSockets blobs should not be stored on disk, e.g. for security reasons?

> > Then swapped memory must be read synchronously.
> 
> I agree that it's probably not very jank-friendly to start having the OS
> swap out/in parts of our app: even if it picks the right data (the blobs) I
> think emk is right that this means we'll hand out memory addresses to the
> "async" JS handlers that will then hit swap-in pauses when they access the
> memory.

That's why swap is LRU. And, also, you can touch all the blob's memory to force it back into RAM when the application reads from it.

AFAICT, if a blob is too large to keep in memory, then it is also to large to store on disk without user approval, in general. Obviously, user prompting isn't appropriate for this interface. That seems to mean limiting the maximum size of blobs used by each page/site/domain so that they don't each up too much disk space. 

It would be strange to artificially limit the size of blobs because of disk usage concerns. And, it would be *very* strange to implement logic like "hey, this blob would eat up too much disk space, so let's keep it in memory."

See bug 648064 comment 13 and bug 648064 comment 14.
> How would an application indicate that its WebSockets blobs
> should not be stored on disk

Easy--it can specify that it wants to receive the message as an arrayBuffer or a Unicode text msg (both of which are stored in RAM only), rather than a blob.

> That's why swap is LRU.

But LRU can't be as smart as we are.  If we have a blob that takes up 20% of available RAM and causes us to go to 110% of RAM, so the OS swaps out 10% of firefox, it won't be the blob that's swapped out (it's recently used), and that could cause all kinds of jank when whatever is swapped is needed. (and it could also cause us to get OOM killed as sicking points out).  It's true that we could use a background thread to touch all of a blob's RAM (though that's a lot of overhead in the case when it's already in RAM), but even that won't save us when it's some other part of firefox that got paged out.  If instead we store the blob to disk, and the webpage that uses it never reads the entire thing into memory at once, there's a better chance we'll never need to use swap or hit OOM.
(In reply to Brian Smith (:bsmith) from comment #53)
> AFAICT, if a blob is too large to keep in memory, then it is also to large
> to store on disk
Then why the disk cache is present at all? If the cache data is too much to keep in memory, then it is also too much to store on disk. So you can just store all cache data on memory. You will not be bothered with this bug anymore because the disk cache is no longer present. The Operation System's excellent LRU algorithm must be smarter than our poor brain.
> without user approval,
If user approved? Blobs can be read from the appcache.

> It would be strange to artificially limit the size of blobs because of disk
> usage concerns. And, it would be *very* strange to implement logic like
> "hey, this blob would eat up too much disk space, so let's keep it in
> memory."
Who claimed such a strange logic? Rather, I'm saying we shouldn't always store the blob on memory. What are you trying to say?
(In reply to Masatoshi Kimura [:emk] from comment #55)
> (In reply to Brian Smith (:bsmith) from comment #53)
> > AFAICT, if a blob is too large to keep in memory, then it is also to large
> > to store on disk
> Then why the disk cache is present at all? If the cache data is too much to
> keep in memory, then it is also too much to store on disk.

The disk cache is not comparable to XHR/Websockets blobs. In particular, we store items in the disk cache because we expect those items to be used across browsing sessions. In addition, we already have limits in place for the disk cache--maximum item size and maximum total cache size, that are inappropriate for blobs (please see below). This makes sense for the disk cache because if an item is too big for the disk cache, we simply avoid storing it. That is, we implement the logic "if the item is too big to store on disk, then just keep it in memory until we've consumed it, and then throw it away."

> > It would be strange to artificially limit the size of blobs because of disk
> > usage concerns. And, it would be *very* strange to implement logic like
> > "hey, this blob would eat up too much disk space, so let's keep it in
> > memory."
> Who claimed such a strange logic? Rather, I'm saying we shouldn't always
> store the blob on memory. What are you trying to say?

Let me try from a different approach:

What is the largest amount of the user's disk space that a web app should be allowed to (silently) consume via the blob mechanism? What happens if the application tries to create a blob that is larger than that?

On Fennec, we've been told that we only have ~15MB or so of disk space available for our use, worst-case. A naive implementation of disk-backed blobs would consume that entire budget, completely blowing away the cache, and potentially blowing away all of the user's application-writable disk space. If we try to cap the size of the largest blob to the amount of disk space we allow to be silently consumed, then it will sometimes be the case that a blob will not fit under that cap but WOULD fit in RAM. What do we do in that case?

Something just clicked in my head. Now I understand why the XHR and Websockets blob interfaces are defined the way they are. I am pretty sure I understanding exactly what should happen: when the application tries to consume a very large value blob via XHR or via WebSockets, we should block the network I/O after reading a certain large amount (say 1MB) into RAM, and only continue transfering the data on the network as the application consumes it. It is up to the application to store it into a file (using the File API) if it wants it to be persistent.

This way, an application could realistically use XHR or Websockets to transfer even a Blue-ray movie in a reasonable way, without the risk that the app could silently and dangerously consume all the user's disk space without the user's consent. Without doing things this way, there's no way (AFAICT) that we would ever be able to support very large transfers via XHR or Websockets.

If we did this, then we would not have to deal with cache directives like no-store (beyond how the cache already implements them) or implement any backing store for XHR/Websockets blobs.
> If we try to cap the size of the largest blob to the amount of disk space
> we allow to be silently consumed, then it will sometimes be the case that
> a blob will not fit under that cap but WOULD fit in RAM

I think we should write the code so that fennec or other clients have control over whether (and when) blobs are written to disk vs put into RAM.  There's lots of strategies we could try out.  The issue for this bug is just whether we expect re-use (in which case we have some motivation to keep files in the cache), and if not, whether we think it's OK to fallback to RAM-only, or whether this bug needs to block on implementing another file-backed blob storage system (I'm inclined to say no and yes respectively).

> when the application tries to consume a very large value blob... we should block the > network I/O... and only continue transfering the data on the network 
> as the application consumes it.

I'm not sure the API permits that: onmessage/onloadend imply that the data has arrived, so if we hit a network error after that, there may be no way to throw an appropriate error.  In the XHR case we're also required to fire off an onprogress event "every 50ms or for every byte transmitted", and AFAICT those have to happen before onloadend.

> to transfer even a Blue-ray movie in a reasonable way

Well, if you are trying to display a movie in real-time, and not keep it around, you'd break it into lots of little blobs (or more likely arrayBuffers).  If you want to grab the whole thing, FTP-style, the blob API is in some sense just a convenience wrapper that lets you say "copy all this data over the network and store in this file and tell me when it's ready."   And that's about the only model where sending such a large message makes sense.  Presumably the app writer has some reason why they want such a large file.  (Maybe they're prefetching data they think they're likely to use at some point far enough in the future that they're fine waiting for the last byte to arrive before accessing any of the data).
(In reply to Jason Duell (:jduell) from comment #57)
> I think we should write the code so that fennec or other clients have
> control over whether (and when) blobs are written to disk vs put into RAM. 
> There's lots of strategies we could try out.  The issue for this bug is just
> whether we expect re-use (in which case we have some motivation to keep
> files in the cache), and if not, whether we think it's OK to fallback to
> RAM-only, or whether this bug needs to block on implementing another
> file-backed blob storage system (I'm inclined to say no and yes
> respectively).

I say "do not treat these responses specially; respect HTTP caching directives in the response" and "no, but ultimately up to the people implementing XHR/WebSockets," respectively.

Is it the case that now we allow a web app to use too much (unbounded?) disk space simply by using the blob type? If so, we should file a separate, higher priority, bug to discuss mitigation of that.

> I'm not sure the API permits that: onmessage/onloadend imply that the data
> has arrived, so if we hit a network error after that, there may be no way to
> throw an appropriate error.  In the XHR case we're also required to fire off
> an onprogress event "every 50ms or for every byte transmitted", and AFAICT
> those have to happen before onloadend.

From http://www.w3.org/TR/XMLHttpRequest/#the-response-attribute:

  If responseType is the empty string or "text"
    If the state is not LOADING or DONE, return the empty 
      string and terminate these steps.
      [...]
      Return the text response entity body.
    Otherwise
        If the state is not DONE, return null and terminate these steps.
        [...]
        If responseType is "blob"
            Return the blob response entity body.  

Does this actually make sense though? Why shouldn't blob work like text and support reading the data received so far in the LOADING state? AFAICT, that would allow applications to transfer large files in a sane manner without forcing them to implement their own chunking byte-range assembly logic.

I am less familiar with WebSockets but IMO it should also (be changed to) allow streaming blobs, unless it is unsafe to do so. (I vaguely remember that websockets returns whole data frames for some security reason, but I cannot remember why off the top of my head.)

I think if we were to standardize and implement such changes, that would be a strong case for not implementing disk-backed blobs at all, and would solve numerous problems that would otherwise afflict disk-backed blobs.
FWIW,

http://msdn.microsoft.com/en-us/library/ie/hh673569%28v=vs.85%29.aspx
http://lists.w3.org/Archives/Public/public-webapps/2010AprJun/0343.html

Microsoft already implements something like what I described, using responseType "ms-stream", and Jonas and others discussed similar things before.
Also, let's be pragmatic about why this bug is a high priority: frequent(?) I/O on the main thread caused by some part of the browser that **commonly** uses STORE_ON_DISK_AS_FILE API.

Blobs in XHR/WebSockets are not a **common** thing right now. We don't need to completely remove the STORE_ON_DISK_AS_FILE API right away. I suggest we file separate bugs for each of the components that use STORE_ON_DISK_AS_FILE, split the Michal's patch up into separate patches, and check them in in priority order as soon as we have agreement on each bug. That might be a while for XHR, but that's OK because we have higher-priority things to do in the meantime.

Now, that leads to the next question: what are the most common uses of STORE_ON_DISK_AS_FILE API, that are causing most of the end-user-visible jank?
(In reply to Brian Smith (:bsmith) from comment #60)
> Now, that leads to the next question: what are the most common uses of
> STORE_ON_DISK_AS_FILE API, that are causing most of the end-user-visible
> jank?

IMO this is the case of HTML5 video. BTW it's not entirely clear to me why ChannelMediaResource::OnStartRequest() calls SetCacheAsFile() because it never calls GetCacheFile(). I.e. the only effect of calling SetCacheAsFile() is that the video is cached regardless the file size, but the file itself is never used.
I have talked to Jonas about this a few times now and we agreed that it is OK to remove XHR's use of the cache-as-file functionality if that will significantly help us with our main-thread-IO work. 

I recommend splitting up the patch into multiple pieces, by component, so that the relevant module peers can review each piece individually (Jonas for XHR, roc for media, Josh for plugins, ??? for the downloader, a Necko peer for the removal of the cache-as-file functionality.)
Blocks: 717382
What's the implication of this change on appcache (if any?).  I'm trying to chase down the source of bug 791017, where audio is not properly playing from appcache.  That's another case where media files really *do* need to be stored on disk...
Depends on: 808997
(In reply to cscott from comment #63)
> What's the implication of this change on appcache (if any?).

This change doesn't affect appcache in any way.
Attachment #609523 - Attachment is obsolete: true
Attachment #678771 - Flags: review?(joshmoz)
Attachment #678772 - Flags: review?(roc)
Attached patch patch v2 - XHRSplinter Review
Attachment #678773 - Flags: review?(jonas)
Attached patch patch v2 - necko part (obsolete) — Splinter Review
Attachment #678777 - Flags: review?(bsmith)
Comment on attachment 678771 [details] [diff] [review]
patch v2 - plugin part

Review of attachment 678771 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine to me but I'd like to have bsmedberg check it over as well.
Attachment #678771 - Flags: review?(joshmoz)
Attachment #678771 - Flags: review?(benjamin)
Attachment #678771 - Flags: review+
Attachment #678771 - Flags: review?(benjamin) → review+
Comment on attachment 678777 [details] [diff] [review]
patch v2 - necko part

Review of attachment 678777 [details] [diff] [review]:
-----------------------------------------------------------------

I don't quite understand what these tests are testing. Is it really OK to just remove them completely, or should we just be updating them or replacing them with new tests?

::: netwerk/cache/nsICache.idl
@@ +110,5 @@
>       */
>      const nsCacheStoragePolicy STORE_ANYWHERE        = 0;
>      const nsCacheStoragePolicy STORE_IN_MEMORY       = 1;
>      const nsCacheStoragePolicy STORE_ON_DISK         = 2;
> +    const nsCacheStoragePolicy STORE_OFFLINE         = 3;

Do not change the value of the STORE_OFFLINE flag. Instead, leave a comment that says that value 3 is now unused, so we don't reuse it for something in the future.
Blocks: 815132
I've checked-in XHR part to fix bug #815132.

https://hg.mozilla.org/integration/mozilla-inbound/rev/4c02ec84a2c1


I'll land the rest after fixing bug #808997 and after making changes according to Brian's review in comment #70.
Whiteboard: [leave open]
Duplicate of this bug: 815132
Comment on attachment 678777 [details] [diff] [review]
patch v2 - necko part

Dropping r? since Michal said he's going to update the patch. Also, please answer my question: "Is it really OK to just remove them completely, or should we just be updating them or replacing them with new tests?" I didn't look at the tests carefully so I'm not sure either way.
Attachment #678777 - Flags: review?(bsmith)
Attached patch patch v3 - necko part (obsolete) — Splinter Review
(In reply to Brian Smith (:bsmith) from comment #70)
> I don't quite understand what these tests are testing. Is it really OK to
> just remove them completely, or should we just be updating them or replacing
> them with new tests?

I've updated the tests and I also verified that they can still detect the issues fixed by bugs #651100 and #654926.


> Do not change the value of the STORE_OFFLINE flag. Instead, leave a comment
> that says that value 3 is now unused, so we don't reuse it for something in
> the future.

fixed
Attachment #678777 - Attachment is obsolete: true
Attachment #691957 - Flags: review?(bsmith)
OS: Windows 7 → All
Hardware: x86_64 → All
Version: 10 Branch → Trunk
Comment on attachment 691957 [details] [diff] [review]
patch v3 - necko part

Review of attachment 691957 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me. I admit I did not read the test changes too carefully :(.

::: netwerk/base/src/nsDownloader.cpp
@@ +55,5 @@
> +
> +        rv = mLocation->CreateUnique(nsIFile::NORMAL_FILE_TYPE, 0600);
> +        if (NS_FAILED(rv)) return rv;
> +
> +        mLocationIsTemp = true;

The existing code code (before your change) was confusing to the point of being dangerous. In particular, when an error occurs that causes us to return early, mLocation will be set to an incorrect path.

Now that you've made this change, mLocationIsTemp will always be true in the success case. I suggest that you remove the need for the mLocationisTemp variable by writing the code like this:

nsCOMPtr<nsIFile> location;
 rv = NS_GetSpecialDirectory(NS_OS_TEMP_DIR, getter_AddRefs(location));

char buf[13];
NS_MakeRandomString(buf, 8);
memcpy(buf+8, ".tmp", 5);
rv = location->AppendNative(nsDependentCString(buf, 12));
if (NS_FAILED(rv)) return rv;

rv = location->CreateUnique(nsIFile::NORMAL_FILE_TYPE, 0600);
if (NS_FAILED(rv)) return rv;

location.forget(mLocation);

Then, we would know that whenever mLocation is set, it is set to a valid path, and that we've created that file.
Attachment #691957 - Flags: review?(bsmith) → review+
(In reply to Brian Smith (:bsmith) from comment #76)
> Now that you've made this change, mLocationIsTemp will always be true in the
> success case. I suggest that you remove the need for the mLocationisTemp
> variable by writing the code like this:

mLocationIsTemp will be always true when we set mLocation in nsDownloader::OnStartRequest() but it will be false if mLocation is set in nsDownloader::Init(). mLocationIsTemp is there to differentiate between these situations so we know whether we should keep or remove the file in destructor. Or am I missing something?
(In reply to Michal Novotny (:michal) from comment #77)
> (In reply to Brian Smith (:bsmith) from comment #76)
> > Now that you've made this change, mLocationIsTemp will always be true in the
> > success case. I suggest that you remove the need for the mLocationisTemp
> > variable by writing the code like this:
> 
> mLocationIsTemp will be always true when we set mLocation in
> nsDownloader::OnStartRequest() but it will be false if mLocation is set in
> nsDownloader::Init(). mLocationIsTemp is there to differentiate between
> these situations so we know whether we should keep or remove the file in
> destructor. Or am I missing something?

OK. thanks for pointing that out. Never mind the suggestion to remove mLocationisTemp. Still seems like a good idea to avoid setting mLocation in the case of an error though.
Blocks: 405407
Blocks: 814010
(In reply to Brian Smith (:bsmith) from comment #78)
> Still seems like a good idea to avoid setting mLocation in the case of an
> error though.

OK, it is fixed in the new patch. Carrying forward r+
Attachment #691957 - Attachment is obsolete: true
Attachment #692995 - Flags: review+
Whiteboard: [leave open]
https://hg.mozilla.org/mozilla-central/rev/cddc8be15e62
https://hg.mozilla.org/mozilla-central/rev/9c02f2a0bd9b
https://hg.mozilla.org/mozilla-central/rev/3553adfc7a6c
Status: NEW → RESOLVED
Closed: 8 years ago7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Blocks: 758296
No longer depends on: 758296
You need to log in before you can comment on or make changes to this bug.