Our media cache files are not always deleted

RESOLVED FIXED

Status

()

Core
Audio/Video
RESOLVED FIXED
7 years ago
5 years ago

People

(Reporter: roc, Assigned: cpearce)

Tracking

Trunk
x86
Windows XP
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking2.0 final+)

Details

Attachments

(3 attachments, 4 obsolete attachments)

See https://bugzilla.mozilla.org/show_bug.cgi?id=567085#c1

Looks like FILE_DELETE_ON_CLOSE doesn't always work :-(

This should block now that we've increased our cache size to something fairly large.
blocking2.0: --- → final+
Was this on Windows?  On Mac and Unix we seem to PR_Delete the file when we PR_Open it when the flag is set....
Yes, just Windows.
So I can't find a good way to do this on Windows, offhand.  Perhaps a separate watchdog process that will detect the main browser dying and delete the files?  Might not help with OS shutdown, but putting the files in some sort of temp dir might help with that....
Pretty sure they are already in the user's tmpdir
I think the thing to do is to name all our temp files with a consistent pattern and then on startup (well, some time after startup!) try to delete all files with that pattern. On Windows, trying to delete the open ones will fail.
(Assignee)

Comment 6

7 years ago
I'll take this. The plan is:
1. Figure out how this happens.
2. Name the temp files with some recognizable pattern.
3. Some time after startup (say 5 minutes) run task which deletes these temp files, except the one which is being used.
Assignee: nobody → chris
(In reply to comment #6)
> 3. Some time after startup (say 5 minutes) run task

nsIIdleService, perhaps?
(Assignee)

Updated

7 years ago
Status: NEW → ASSIGNED
(Assignee)

Comment 8

7 years ago
(In reply to comment #5)
> I think the thing to do is to name all our temp files with a consistent pattern
> and then on startup (well, some time after startup!) try to delete all files
> with that pattern. On Windows, trying to delete the open ones will fail.

So there are 26 places where we create temp files in mozilla-central, AFAICT: 
http://mxr.mozilla.org/mozilla-central/search?string=NS_OS_TEMP_DIR&find=&findi=&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central

Shall we can prepend the temp files' names with "mozilla_temp_"?

Where abouts on startup should I set the nsIIdleService observer?
(Assignee)

Comment 9

7 years ago
(In reply to comment #8)
> So there are 26 places where we create temp files in mozilla-central, AFAICT: 
> http://mxr.mozilla.org/mozilla-central/search?string=NS_OS_TEMP_DIR&find=&findi=&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central

Upon closer inspection, I think there are only 8 places that actually matter to us:

http://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsDownloader.cpp#115
http://mxr.mozilla.org/mozilla-central/source/widget/src/windows/nsDataObj.cpp#1427
http://mxr.mozilla.org/mozilla-central/source/widget/src/windows/nsDataObj.cpp#1534
http://mxr.mozilla.org/mozilla-central/source/widget/src/xpwidgets/nsTransferable.cpp#143
http://mxr.mozilla.org/mozilla-central/source/modules/plugin/base/src/nsNPAPIPluginStreamListener.cpp#59
http://mxr.mozilla.org/mozilla-central/source/modules/plugin/base/src/nsPluginHost.cpp#893
http://mxr.mozilla.org/mozilla-central/source/xpinstall/src/nsXPInstallManager.cpp#1055
http://mxr.mozilla.org/mozilla-central/source/content/media/nsMediaCache.cpp#504

There are some uses of temp files in nsExternalHelperAppService.cpp, but the comments surrounding their usage says that the tempfiles need to have non-predictable names for security purposes.
I think we can have the name start with moz_temp_* and still be unpredictable, if we add enough random chars after the prefix.

However, the idea in comment #5 is only safe for files that we know we keep open as long as we're using them. We can't use it for all temporary files.
(Assignee)

Comment 11

7 years ago
Created attachment 471706 [details] [diff] [review]
Patch v1

Remove temp files after 20s of idle time. At the moment, this will only try to remove the media cache temp file.
Attachment #471706 - Flags: review?(roc)
Instead of banging this into nsBaseAppShell, I think it should go into a separate file and be a startup observer. I don't know exactly how that stuff works but I know you can register stuff to happen on startup. Font loading does it for example.
(Assignee)

Comment 13

7 years ago
Created attachment 475910 [details] [diff] [review]
Patch: Add idle observer in xpcom startup

bsmedberg: Can you review this? This patch ensures that temp files with a specific prefix get deleted some time after startup (provided they're not in use). This is to clean up temp files (specifically the 500MB media cache) which aren't deleted when we crash. Even if a file is opened with nsILocalFile::DELETE_ON_CLOSE, it may not be deleted if we crash.

Initially I added an app-startup observer to the layout module, which adds itself as an idle observer, but code messing around with files doesn't really belong in layout, so I moved it into xpcom startup.
Attachment #471706 - Attachment is obsolete: true
Attachment #475910 - Flags: review?
Attachment #471706 - Flags: review?(roc)
Attachment #475910 - Flags: review? → review?(benjamin)
Comment on attachment 475910 [details] [diff] [review]
Patch: Add idle observer in xpcom startup

This cache is not per-profile, and you could well end up deleting files out from under other Firefox instances, or Thunderbird, or whatever else. I really think you need to be more specific in some way. Is there a reason we can't store these in the local-profile directory (which should be on local disk, even if the profile is on a network server)?
Attachment #475910 - Flags: review?(benjamin) → review-
(In reply to comment #14)
> This cache is not per-profile, and you could well end up deleting files out
> from under other Firefox instances, or Thunderbird, or whatever else.

No we can't, because files with this prefix should always be held open by the process as long as they're needed (i.e. this should only be used for files what you'd want to use FILE_DELETE_ON_CLOSE for).

> I really
> think you need to be more specific in some way. Is there a reason we can't
> store these in the local-profile directory (which should be on local disk, even
> if the profile is on a network server)?

How would that help with the problem of FILE_DELETE_ON_CLOSE files not being deleted on close?
(In reply to comment #15)
> (In reply to comment #14)
> > This cache is not per-profile, and you could well end up deleting files out
> > from under other Firefox instances, or Thunderbird, or whatever else.
> 
> No we can't, because files with this prefix should always be held open by the
> process as long as they're needed (i.e. this should only be used for files what
> you'd want to use FILE_DELETE_ON_CLOSE for).

I should elaborate: this code that attempts to delete stray temporary files should therefore fail to delete any file that is still needed, because you can't delete open files on Windows. (If you could, we'd just use that to implement FILE_DELETE_ON_CLOSE!)
Benjamin, see comments #16 and #17.
If it's a per-profile location, you can delete the files (as you do here) without worrying about the effects on other instances, and without magic filenames which might conflict with non-mozilla processes. It also solves a worry I had about the original patch of the cost of enumerating the entire temp directory, which sometimes can have huge numbers of files.
(In reply to comment #18)
> If it's a per-profile location, you can delete the files (as you do here)
> without worrying about the effects on other instances, and without magic
> filenames which might conflict with non-mozilla processes.

We can make the prefix arbitrarily unlikely to conflict with other processes.

> It also solves a
> worry I had about the original patch of the cost of enumerating the entire temp
> directory, which sometimes can have huge numbers of files.

That is a potential issue.

Is this "local-profile" directory NS_WIN_LOCAL_APPDATA_DIR?
http://mxr.mozilla.org/mozilla-central/source/xpcom/io/nsDirectoryServiceDefs.h?force=1#153
I guess that's Windows-only?

So we would create temporary files in NS_WIN_LOCAL_APPDATA_DIR on Windows and NS_OS_TEMP_DIR on other platforms, with FILE_DELETE_ON_CLOSE, and then on Windows we do this clean-up-on-idle thing to delete the temporary files? We'd still be relying on the current instance holding the files open to prevent them from being deleted. Or would you prefer us to just delete the temporary files early in startup to avoid that dependency?
No, it's NS_APP_USER_PROFILE_LOCAL_50_DIR. This is available on all platforms. It is not available if there is no profile at all (certain embedders, the profile manager, and content processes). You can/should have a subdirectory MediaCache or somesuch.

http://mxr.mozilla.org/mozilla-central/source/xpcom/io/nsAppDirectoryServiceDefs.h#103

relying on delete-on-close and the OS holding on seems reasonable.
(Assignee)

Comment 21

7 years ago
Created attachment 478172 [details] [diff] [review]
Patch v3

* On Windows only, the media cache temp file is now ${NS_APP_USER_PROFILE_50_DIR}/TempDeleteOnClose/media_cache. I chose not to put the media cache inside a MediaCache dir as suggested, as it seemed a bit overkill to add another directory to just store (in almost all cases) a single media cache file. I can easily put the media cache in a MediaCache dir inside TempDeleteOnClose if you prefer though.
* Change the nsTempFileRemover (from the previous patch) to nsWinTempFileRemover, and only have it instantiated on Windows. Have this try to recursively delete ${NS_APP_USER_PROFILE_50_DIR}/TempDeleteOnClose/ when it receives its idle notification.
* All other platforms have unchanged behaviour, i.e. they still put the media cache in ${NS_OS_TEMP_DIR}/media_cache, and we rely on delete-on-close to work on those non-windows platforms.
Attachment #475910 - Attachment is obsolete: true
Attachment #478172 - Flags: review?(benjamin)
Comment on attachment 478172 [details] [diff] [review]
Patch v3

1) Please note that it must be the LOCAL_ profile directory, which is always on a local/fast disk. The profile directory may be on a network share.

2) Why make the logic different for Windows? It still seems like extra complexity without benefit. Just have a profilelocaldir/MediaCache directory in all cases (and skip TempDeleteOnClose, which is superfluous since we know where the directory will be).
Attachment #478172 - Flags: review?(benjamin) → review-
We probably should generate a unique temp file name, since we can have multiple processes using a single profile now.
(Assignee)

Comment 24

7 years ago
(In reply to comment #22)
> Comment on attachment 478172 [details] [diff] [review]
> 2) Why make the logic different for Windows? It still seems like extra
> complexity without benefit.

What about platforms which don't have a profiledir? You said that "certain embedders, the profile manager, and content processes" don't have a profiledir. Will video never be used in these modes? I'm not familiar with these other modes, but video won't work without media cache.

Do you want to build nsWinTempFileRemover and call nsWinTempFileRemover_Startup() on all platforms? It's only needed on Windows...

> Just have a profilelocaldir/MediaCache directory in
> all cases (and skip TempDeleteOnClose, which is superfluous since we know where
> the directory will be).

The goal was to create a general place where we could put delete-on-close files where we'd be sure they were deleted later if we crashed, and then use this functionality for the media cache, and any other temp files we'd like to make this guarantee for in future.

That's why I put this in $profiledir/TempDeleteOnClose, since in future there could be other files in there.

If we don't want to add such general functionality, how about I rename nsWinTempFileRemover to nsWinMediaCacheFileRemover, so that it's name reflects its function?


(In reply to comment #23)
> We probably should generate a unique temp file name, since we can have multiple
> processes using a single profile now.

You already thought of this roc ;) The media cache file is opened with nsILocalFile::CreateUnique(), which will append an integer to the filename if a file with that name already exists:

http://mxr.mozilla.org/mozilla-central/source/xpcom/io/nsLocalFileCommon.cpp#172
(Assignee)

Comment 25

7 years ago
bsmedberg: Can you reply to my questions in comment 24 so I can update my patch?

I'm still unsure whether it's ok to put the media cache in the profiledir when we (eventually?) switch desktop Firefox over to using content-processes. Is it true that in future we'll switch desktop Firefox over to having one content process per tab? So all the JS for that tab runs in the content-process? The media cache is currently created on the thread which the network load listener runs on. Does this run in the content-process as well? Does this mean that when we migrate to using content-processes, we'll have one media cache per content-process/tab, and that since we create the media cache in the content-process, we won't be able to create the media cache in the profiledir, since we can't access that from content-processes?

Video won't work without the media cache.
1) I don't want general functionality, let's be specific for now.
2) I'm not sure about the network load listener: in e10s the actual networking is done by the chrome process, and it sends the data over to the content process. In which process should the media cache live?
3) In any case, I think we should always put it in the local-profile directory for now, and deal with e10s "next". File a followup, since presumably fennec wants <video> to work ;-)
(Assignee)

Comment 27

7 years ago
Created attachment 481410 [details] [diff] [review]
Patch v4

Specifically only try to clean up media cache's temp files, and store them in the local profile dir.
Attachment #478172 - Attachment is obsolete: true
Attachment #481410 - Flags: review?
(Assignee)

Updated

7 years ago
Attachment #481410 - Flags: review? → review?(benjamin)
(In reply to comment #26)
> 2) I'm not sure about the network load listener: in e10s the actual networking
> is done by the chrome process, and it sends the data over to the content
> process. In which process should the media cache live?

The content process. Making it live somewhere else (the chrome process, I guess) would require a ton of work to make media elements' interaction with the cache IPC. Including reads from video decoder threads.

> 3) In any case, I think we should always put it in the local-profile directory
> for now, and deal with e10s "next". File a followup, since presumably fennec
> wants <video> to work ;-)

Hang on, so using the local-profile directory is going to break Fennec? Let's not do that.
So, I'm not sure what to say next. I really don't want to be enumerating the base tempdir for perf reasons. I'd *prefer* to communicate the profile-local directory to the content process and have it always use that, but that's probably a bit complicated. The easier solution is to use tempdir/mozilla-media-cache, and rely on the "fact" that the files will be held open and so we can't delete them while they are being used.
(Assignee)

Comment 30

7 years ago
Created attachment 483319 [details] [diff] [review]
Patch v5
Attachment #481410 - Attachment is obsolete: true
Attachment #483319 - Flags: review?(benjamin)
Attachment #481410 - Flags: review?(benjamin)
Comment on attachment 483319 [details] [diff] [review]
Patch v5

The behavior of nsIFile.remove(true) varies across Windows/Linux. On Windows, if it fails to remove a file it continues to enumerate the directory and removes other files:
http://mxr.mozilla.org/mozilla-central/source/xpcom/io/nsLocalFileWin.cpp#1875

On Linux, failure to remove any file will stop the directory enumeration:
http://mxr.mozilla.org/mozilla-central/source/xpcom/io/nsLocalFileUnix.cpp#997

In this case I think that's ok: removing in-use files on Linux should work, and on Windows it won't stop enumeration, and we don't have subdirectories to worry about.
Attachment #483319 - Flags: review?(benjamin) → review+
(Assignee)

Comment 32

7 years ago
Thanks Benjamin. At the moment this code is only run on Windows, so we'll only have to look out for the difference in behaviour of nsIFile.remove(true) on Linux if we start running it on Linux as well.
(Assignee)

Comment 33

7 years ago
http://hg.mozilla.org/mozilla-central/rev/34c006122b71
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
(Assignee)

Comment 34

7 years ago
Backed out due to errors in caused by one of the changesets I pushed this with.

http://hg.mozilla.org/mozilla-central/rev/f99a1506d50a
http://hg.mozilla.org/mozilla-central/rev/a61ca0dfe56b

Some of the errors:

Rev3 Fedora 12 mozilla-central opt test mochitests-5/5
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1287262011.1287262859.26658.gz

19554 ERROR TEST-UNEXPECTED-FAIL | /tests/toolkit/content/tests/widgets/test_videocontrols.html | [SimpleTest/SimpleTest.js, window.onerror] An error occurred - [object Event] at undefined:undefined
Thread 0 (crashed)
0  libxul.so!nsMediaCacheStream::Close [nsMediaCache.cpp:f9f10c04dceb : 1891 + 0x6]
1  libxul.so!nsMediaChannelStream::Close [nsMediaStream.cpp:f9f10c04dceb : 498 + 0x8]
2  libxul.so!nsWaveDecoder::Stop [nsWaveDecoder.cpp:f9f10c04dceb : 1372 + 0x8]

19557 ERROR TEST-UNEXPECTED-FAIL | /tests/toolkit/content/tests/widgets/test_videocontrols_audio_direction.html | Test timed out.
19560 ERROR TEST-UNEXPECTED-FAIL | /tests/toolkit/content/tests/widgets/test_videocontrols_video_direction.html | Test timed out.

Rev3 Fedora 12 mozilla-central debug test reftest
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1287261600.1287263280.28360.gz
Lots of failures...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 35

7 years ago
Created attachment 484223 [details] [diff] [review]
Patch 2: fix temp file permissions on posix systems

The reason this failed on Linux & MacOS is because I create the $tempDir/mozilla-media-cache directory with permissions 600. It needs execute permissions in order to create files in it, so we were unable to create the media cache file, and hence the overwhelming test failures.

This patch is based on top of the previous, and ensures the mozilla-media-cache dir has perms 700, so that we can create the temp file. My editor also took the liberty of removing some trailing whitespace in nsMediaCache.cpp.
Attachment #484223 - Flags: review?(roc)
Attachment #484223 - Flags: review?(roc) → review+
(Assignee)

Comment 36

7 years ago
http://hg.mozilla.org/mozilla-central/rev/22b6b7465458
http://hg.mozilla.org/mozilla-central/rev/191cad517e7c
Status: REOPENED → RESOLVED
Last Resolved: 7 years ago7 years ago
Resolution: --- → FIXED
(Assignee)

Comment 37

7 years ago
Test failures on Win2003 (only) on `make check` and xpcshell tests.
make check:
TEST-UNEXPECTED-FAIL | Expected false, got true at e:/builds/moz2_slave/mozilla-central-win32-debug/build/storage/test/test_true_async.cpp:249!
TEST-UNEXPECTED-FAIL | Expected false, got true at e:/builds/moz2_slave/mozilla-central-win32-debug/build/storage/test/test_true_async.cpp:259!
TEST-UNEXPECTED-FAIL | Expected false, got true at e:/builds/moz2_slave/mozilla-
central-win32-debug/build/storage/test/test_true_async.cpp:277!

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1288056793.1288061749.27819.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1288062568.1288064674.8064.gz

xpcshell failures (lots of random ones):
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1288061137.1288064775.8597.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1288064065.1288066247.15745.gz

Backed out:

http://hg.mozilla.org/mozilla-central/rev/70f12cfa19d9
http://hg.mozilla.org/mozilla-central/rev/dbb756a35d6f
http://hg.mozilla.org/mozilla-central/rev/41d1e4327b9f
http://hg.mozilla.org/mozilla-central/rev/c9074c25fd5a

The idle service changed last week: http://hg.mozilla.org/mozilla-central/rev/6e4fbb231f1c perhaps that's a factor.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 38

7 years ago
mochitest-chrome failed as well:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1288061137.1288068581.24652.gz
(Assignee)

Comment 39

7 years ago
So the problem which caused the test failures is in the fake idle service added in bug 602871. xpcshell tests installs a fake idleservice which does nothing in its nsIIdleService methods. This is in order to prevent any idle tasks running and disrupting tests. But we're adding the nsMediaCacheRemover as an idle observer in xpcom startup, before the fake idle service is installed - so the nsMediaCacheRemover is still added as an idle observer and receiving idle notifications. Installing the fake idleservice doesn't prevent the original idleservice from dispatching idle events, and the nsMediaCacheRemover's observe method is trying to remove itself as an idle observer. However when the neMediaCacheRemover requests the idleservice to remove itself as an idle observer, it gets the fake idleservice, and that doesn't do anything in its nsIIdleService::RemoveIdleObserver() implementation. So the nsMediaCacheRemover can't remove itself as an idle observer, and idle notifications keep coming and somehow that causes problems and causes failures in some tests.

So possible solutions:
1. Just use an nsITimer to run the nsMediaCacheRemover, with say a 5 minute timeout. This is easiest.
2. Don't install the nsMediaCacheRemover as an idle observer when running inside xpcshell (how do we detect that we're running in xpcshell? Kick it off somewhere in browser/ ?).
3. Store a weak reference to the idleservice in nsMediaCacheRemover when its added as an idle observer, use that reference to the original idleservice to remove the nsMediaCacheRemover as an idle observer. This seems like  hacking around the real problem though.
4. Have the fake idleservice somehow "pause" the original idleservice notifications in xpcshell.
5. Install the nsMediaCacheRemover as an idle observer after the fake idle service has been installed. I'm not sure how we can do this.
It seems to me that the fake idle service should be intercepting calls from startup observers. The fact that it's not doing that is the real bug here.

I'm not sure how to fix that though.
(Assignee)

Comment 41

7 years ago
(In reply to comment #40)
> It seems to me that the fake idle service should be intercepting calls from
> startup observers. The fact that it's not doing that is the real bug here.

Our idle observer is created just after the startup observers are notified. The fake idle service hasn't been created yet, I think xpcshell/js interpreting hasn't started yet either, so wouldn't intercepting calls from startup observers be impossible?
(Assignee)

Comment 42

7 years ago
How about I set a timer at startup with an expiration of a couple of minutes which upon running creates the nsMediaCacheRemover and adds it as an idle observer? That way the fake idle service will be running by the time we add the nsMediaCacheRemover as an idle observer.
That would work, but it sounds slightly wasteful for non-test builds.

Option #3 sounds best to me.
(Assignee)

Comment 44

7 years ago
(In reply to comment #43)
> That would work, but it sounds slightly wasteful for non-test builds.
> 
> Option #3 sounds best to me.

I actually tried that and pushed it to try (http://hg.mozilla.org/try/rev/3757c1905a9c) but still got the failures, so we can't do that unfortunately. I'll test my fix in comment 42 on try. I pushed a similar timer-based work around to tryserver, and it was greenish.
(Assignee)

Comment 45

7 years ago
Created attachment 488320 [details] [diff] [review]
Patch: add nsMediaCacheRemover as an idle observer after 3 minute timeout

Works around the problem caused by the fake idle service in xpcshell-tests by adding the nsMediaCacheRemover as an idle observer after a 3 minute timeout. This means that in the case of xpcshell-tests, xpcshell will have started up and created the fake idle service which will properly intercept the addition of the nsMediaCacheRemover and ignore it as expected by xpcshell-tests.
Attachment #488320 - Flags: review?(benjamin)
(Assignee)

Comment 46

7 years ago
(In reply to comment #45)
> Created attachment 488320 [details] [diff] [review]
> Patch: add nsMediaCacheRemover as an idle observer after 3 minute timeout

This was greenish on TryServer as http://hg.mozilla.org/try/rev/4cd2ad09281d . The only failures were known randoms.
Comment on attachment 488320 [details] [diff] [review]
Patch: add nsMediaCacheRemover as an idle observer after 3 minute timeout

Bah, I hate the workaround, but I don't have a better suggestion at this point.
Attachment #488320 - Flags: review?(benjamin) → review+
(Assignee)

Comment 48

7 years ago
http://hg.mozilla.org/mozilla-central/rev/a20460fb08dd
http://hg.mozilla.org/mozilla-central/rev/8f0a6a230d48
http://hg.mozilla.org/mozilla-central/rev/a247e16e0c7f
Status: REOPENED → RESOLVED
Last Resolved: 7 years ago7 years ago
Resolution: --- → FIXED
Depends on: 612246
Depends on: 684368
(Assignee)

Comment 49

5 years ago
For future reference, one way to reproduce this "FILE_DELETE_ON_CLOSE files aren't deleted" bug on WinXP is to open a video in Firefox, check that the media cache file has been created (it's in $HOME_DIR\Local Settings\Application Data\Mozilla\Firefox\Profiles\$ProfileDir\mozilla-media-cache) and then hard-reset your computer (i.e. cut the power out, or equivalent in your VM without going through normal power down). On reboot the media cache file will still be there.
You need to log in before you can comment on or make changes to this bug.