Open Bug 1116846 Opened 9 years ago Updated 2 years ago

<meta> element far enough down in the file causing re-read of the file 'loses' the temp file in between the two reads if it is unlinked immediately

Categories

(Core :: Networking: File, defect, P5)

34 Branch
x86_64
Linux
defect

People

(Reporter: vincent-moz, Unassigned)

Details

(Keywords: reproducible, testcase, Whiteboard: [necko-would-take])

Attachments

(1 file)

Attached file file-not-found.pl
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:34.0) Gecko/20100101 Firefox/34.0
Build ID: 20141125180439

Steps to reproduce:

1. Start Firefox (must be called "firefox", see the contents of the attached Perl script).
2. Execute the attached Perl script. It creates a temporary file and asks Firefox to open it.


Actual results:

I get a "File not found" error.


Expected results:

The HTML file should have been opened.

The problem comes from the meta element that is far from the beginning of the file (with a closer meta element or no meta at all, the file is opened without an error). I suspect that here, Firefox closes the file and tries to reopen it, yielding an obvious error. See the contents of the script for more information.
Note: in practice, I got the error when opening an HTML mail from YouTube from a text-only MUA (I use a script that looks like the attached one for that).
When I run your script while Firefox is running, the file is also actually not there in /tmp/. It seems like your script removes the file when it finishes, and the firefox process will die after handing off the file to the already-running instance, thereby killing the tmp file.

So this seems like a race condition in the script rather than a Firefox issue; if you keep the file in /tmp/ around, I bet the issue goes away...

Can you confirm this analysis? :-)
Flags: needinfo?(vincent-moz)
Ah, so I see what you mean about the <meta> thing now.

I expect that the default charset on your OS for these files is not UTF-8. When the browser gets the <meta> tag, it will have to re-request the file from the operating system in order to re-read it with the correct charset (this can happen over HTTP as well, hence why sending the right headers and keeping the <meta> tag close to the top is important!). If this happens late enough, the file will have been removed by your script by that time.
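A simplified Python model of that re-request may help. It is only a sketch under stated assumptions, not Firefox's parser: it assumes a 1024-byte prescan window (the limit the HTML standard uses when prescanning for an encoding declaration), a `windows-1252` fallback, and a crude regular expression that only recognises `<meta charset=...>`. `load_html` and its `between_reads` hook are hypothetical names. A declaration found past the window forces a second open by path, which raises `FileNotFoundError` if the file was unlinked between the two reads.

```python
import re
from typing import Callable, Optional

PRESCAN_BYTES = 1024              # assumed prescan window, as in the HTML standard
DEFAULT_CHARSET = "windows-1252"  # assumed fallback when nothing is declared early
# Crude, hypothetical matcher: only handles <meta charset=...>.
META_RE = re.compile(rb"<meta\s+charset=[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE)


def load_html(path: str,
              between_reads: Optional[Callable[[], None]] = None) -> tuple[str, int]:
    """Return (charset used, number of times the file was opened by path)."""
    opens = 1
    with open(path, "rb") as f:
        head = f.read(PRESCAN_BYTES)  # examined before decoding starts
        rest = f.read()               # decoded with the guessed charset, then discarded
    early = META_RE.search(head)
    if early:
        return early.group(1).decode("ascii").lower(), opens
    if between_reads is not None:
        between_reads()               # e.g. the helper script unlinking the file
    late = META_RE.search(head + rest)
    if late and late.group(1).decode("ascii").lower() != DEFAULT_CHARSET:
        # Late declaration: start over by opening the same path again.
        opens += 1
        with open(path, "rb") as f:   # raises FileNotFoundError if unlinked
            f.read()
        return late.group(1).decode("ascii").lower(), opens
    return DEFAULT_CHARSET, opens
```

Passing `between_reads=lambda: os.unlink(path)` reproduces the reported failure in miniature: the first read succeeds, the restart does not.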
(In reply to :Gijs Kruitbosch from comment #2)
> When I run your script while Firefox is running, the file is also actually
> not there in /tmp/. It seems like your script removes the file when it
> finishes, and the firefox process will die after handing off the file to the
> already-running instance, thereby killing the tmp file.

No, the script just unlinks the file. But as long as Firefox keeps the file open, the file is still there (though not linked any more from the /tmp/... filename). Indeed the unlink(2) man page under Linux says:

"unlink() deletes a name from the filesystem. If that name was the last link to a file and no processes have the file open, the file is deleted and the space it was using is made available for reuse. If the name was the last link to a file but any processes still have the file open, the file will remain in existence until the last file descriptor referring to it is closed."

Note: In practice, the script unlinks the file to avoid keeping temporary files (which are only needed for rendering in Firefox) on the system. I'm not aware of any other good way to do automatic clean-up (e.g. via hooks when a tab is opened or closed); time-based clean-up is rather ugly.
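The unlink(2) behaviour quoted above, and why it does not help when the file is opened a second time, can be checked with a small Python sketch, assuming Linux or another POSIX system. `read_after_unlink` is a hypothetical helper: every byte is still readable through a descriptor opened before the unlink, while a fresh `open()` of the old name fails, which is what any re-read by path runs into.

```python
import os
import tempfile


def read_after_unlink(contents: bytes) -> tuple[bytes, bool]:
    """Write a temp file, unlink it, then try to read it two ways.

    Returns (bytes read through the still-open descriptor,
             whether reopening by the old name succeeded).
    """
    fd, path = tempfile.mkstemp(suffix=".html")
    try:
        os.write(fd, contents)
        os.unlink(path)               # drop the only name; the inode stays alive
        os.lseek(fd, 0, os.SEEK_SET)  # rewind the descriptor we still hold
        data = b""
        while chunk := os.read(fd, 4096):
            data += chunk
        try:
            with open(path, "rb"):    # a fresh open() needs the name
                reopened = True
        except FileNotFoundError:
            reopened = False
        return data, reopened
    finally:
        os.close(fd)                  # last reference gone: space is reclaimed


data, reopened = read_after_unlink(b"<html>hello</html>")
print(data, reopened)  # b'<html>hello</html>' False
```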

(In reply to :Gijs Kruitbosch from comment #3)
> I expect that the default charset on your OS for these files is not UTF-8.

It is UTF-8. But perhaps Firefox's default charset is not UTF-8. Anyway, whatever the default charset, the same problem could occur with a file that uses a different charset.
Flags: needinfo?(vincent-moz)
(In reply to Vincent Lefevre from comment #4)
> (In reply to :Gijs Kruitbosch from comment #2)
> > When I run your script while Firefox is running, the file is also actually
> > not there in /tmp/. It seems like your script removes the file when it
> > finishes, and the firefox process will die after handing off the file to the
> > already-running instance, thereby killing the tmp file.
> 
> No, the script just unlinks the file. But as long as Firefox keeps the file
> open, the file is still there (though not linked any more from the /tmp/...
> filename). Indeed the unlink(2) man page under Linux says:
> 
> "unlink() deletes a name from the filesystem. If that name was the last link
> to a file and no processes have the file open, the file is deleted and the
> space it was using is made available for reuse. If the name was the last
> link to a file but any processes still have the file open, the file will
> remain in existence until the last file descriptor referring to it is
> closed."

Sure, but I expect Firefox to let go of the file descriptor as soon as it has gotten all the data out... I guess we can keep a bug open for improving this situation in case the file we're opening is a temp file, but that doesn't seem likely to be very high priority.

> Note: In practice, the script unlinks the file to avoid keeping temporary
> files (which are only needed for rendering in Firefox) on the system. I'm
> not aware of any other good way to do automatic clean-up (e.g. via hooks
> when a tab is opened or closed); time-based clean-up is rather ugly.

Yes, rock and a hard place. :-(

Of course, an add-on can get this info, but it'll be hard to interface that with your Perl script.

I don't really know how to improve this, although I would hazard a guess that doing an unlink after 5-10 seconds (depending on the speed of your machine) would work in practice without any serious downsides.
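A minimal sketch of that delayed clean-up, written in Python rather than Perl. It assumes a `firefox` command on `PATH` that forwards the file to the running instance and returns, as described in the report; `run_firefox` and `open_then_delete` are hypothetical names, and the 10-second default simply follows the 5-10 second guess above.

```python
import os
import subprocess
import time
from typing import Callable


def run_firefox(path: str) -> None:
    """Hypothetical default opener: `firefox FILE` hands off and exits."""
    subprocess.run(["firefox", path], check=False)


def open_then_delete(path: str, delay: float = 10.0,
                     opener: Callable[[str], object] = run_firefox) -> None:
    """Hand `path` to the browser, wait `delay` seconds, then unlink it."""
    opener(path)
    time.sleep(delay)   # give the browser time for any internal re-read
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass            # already gone; nothing left to clean up
```

The `sleep` blocks the calling script, and the delay remains a heuristic: a slow or busy browser can still lose the same race.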
Component: Untriaged → Networking: File
Product: Firefox → Core
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: cannot find temporary HTML file when there is a far meta element → <meta> element far enough down in the file causing re-read of the file 'loses' the temp file in between the two reads if it is unlinked immediately
(In reply to :Gijs Kruitbosch from comment #5)
> Sure, but I expect Firefox to let go of the file descriptor as soon as it has
> gotten all the data out...

I would say: ... as soon as it no longer needs to read the file (which includes internal re-requests).
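That suggestion (hold the descriptor until no further read, including an internal re-request, can be needed) can be illustrated with a toy loader whose restart seeks back on the descriptor it already holds instead of opening the path again. This is a hypothetical Python sketch, assuming a 1024-byte prescan window, a `windows-1252` fallback and a simple `<meta charset=...>` regex; `load_html_keep_fd` is an invented name and this is not a description of how Necko handles `file:` loads.

```python
import re
from typing import Callable, Optional

PRESCAN_BYTES = 1024              # assumed prescan window, as in the HTML standard
DEFAULT_CHARSET = "windows-1252"  # assumed fallback charset
META_RE = re.compile(rb"<meta\s+charset=[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE)


def load_html_keep_fd(path: str,
                      between_reads: Optional[Callable[[], None]] = None) -> tuple[str, int]:
    """Return (charset used, number of passes over the data)."""
    # The path is opened exactly once; the descriptor stays open until every
    # read, including a restart, is done, and `with` closes it on every return.
    with open(path, "rb") as f:
        head = f.read(PRESCAN_BYTES)
        rest = f.read()
        early = META_RE.search(head)
        if early:
            return early.group(1).decode("ascii").lower(), 1
        if between_reads is not None:
            between_reads()           # e.g. the script unlinking the file
        late = META_RE.search(head + rest)
        if late and late.group(1).decode("ascii").lower() != DEFAULT_CHARSET:
            f.seek(0)                 # re-read through the descriptor still held
            f.read()
            return late.group(1).decode("ascii").lower(), 2
        return DEFAULT_CHARSET, 1
```

Because the second pass never touches the name, unlinking the file between the two passes no longer matters on a POSIX system.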

Alternatively, the messaging between the two Firefox processes (the one that is already running and the one started by the script) could be improved so that the latter quits only after the former has rendered the file (at least for "file:" URLs).
Whiteboard: [necko-would-take]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P5
Severity: normal → S3