Closed Bug 71071 Opened 24 years ago Closed 23 years ago

rewriting cookies.txt on each cookie

Categories

(Core :: Networking: Cookies, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla0.9

People

(Reporter: cathleennscp, Assigned: morse)

Details

(Keywords: topperf)

Attachments

(1 file)

Currently, when we need to add a new cookie, we blow away the entire cookie file 
in order to update it with the new cookie info.  This is costly on performance, 
especially with slow file acess platform.

We need to keep the old cookie file and add the new line for the new cookie.
adding to topperf
Keywords: topperf
Let me rephrase that by saying that we need to reduce the frequency of writing 
this out by not doing it on each setcookie notification. 
Keywords: topperfperf
Summary: don't blow away entire cookie file each time a new cookie is added → rewriting cookies.txt on each cookie
Keywords: perftopperf
My position on this is that it's a feature :-). We crash often enough that
people get upset when we don't write to disk after each cookie change. We got
enough pressure in 4.0 to add this functionality then, and we're getting
pressure now from embeddors for the same reason (data loss potential).

Do we have performance data on this being a problem? It's obviously a function
of urls that use cookies.
Perhaps there's a middle ground.  We could write out once for each page load, 
perhaps when we got the notification that the page has finished loading.  As it 
is now, we write out on each cookie set, so if a page sets seven cookies (as 
does netscape.com), we do seven writes.
If we can tell that the cookie is not in the file, we should append.  Unix write
is atomic to disk filesystems, and append mode guarantees that the write goes at
the end of the file.  Batching up on end load sounds good too.

/be
yea, someone else mentioned this as well. If we can batch up a
per-top-level-document entity and write when that's complete that's the best way
to go. Many sites' cookie intent would be invalid if only 3 out of their 7
cookies were written anyway. Each top-level-doc can be considered a
"transaction." So, yea, let's tie cookie writing to an OnEndDocumentLoad() or
when a docuemnts loadgroup goes away (I'm partial to tha latter). The
internal/private cookie interface should have a "flush()" or "persist()" method
on it. We don't want to expose this on the public iface because we already have
a profile wide "flush()" mechanism that can be used if an embeddor wants to snap
things to disk.
> pressure now from embeddors for the same reason (data loss potential).

When the file is rewritten is a backup done and a crash in the middle handled on
restart?  If not, writing more frequently would increase the chance of data loss
of possibly the whole file.
IMO, that argument doesn't hold water. the likelyhood of crashing due to power
failure, disk i/o failure, system failure, or some other random act of G__, is,
relative to Mozilla crashes and stability problems, negligible at best.

I'd still argue that we're chasing our tail here. no-one has yet produced any
perf numbers regarding the penalty we're paying for doing this. Obviously
hitting the disk is expensive, I just don't think it's happening as often as
people think.

We don't even have to do Quantify level profiling, at least on windows and
linux, the cookie library is separate and can be removed from the components dir
by hand. Some stopwatch sanity checks on page loads w/ heavy cookie usage, w/
and w/ out the cookie lib might provide some insight.

If this is a noticable issue, we should take a couple approaches:
1. Consider each top level URL load a "transaction" and only write the file once
per top-level load.
2. We may want to do this anyway, but IIRC, we set a "cookies have changed" flag
at the outermost level of the SetCookie() routine which indeed makes us write
for *every* set-cookie header that comes in. We could set this flag at a more
granular level, perhaps *only* when a cookie has indeed changed (this would
require some additional logic to determine if a pre-existing cookie was there,
and whether or not it was identical to the one trying to be set).
sorry, didn't answer your question. backups and recovering from a broken write 
are not handled.
It depends on a lot of things, for example the size of the cookie file.

I have had this sort of thing bite me before in applications, but it may well
not be an issue in Mozilla.  Personally I wouldn't be too upset if I lost my
cookie file anyway.

Hopefully everyone agrees changing this to append is a good idea whether
immediate or not.

blindly appending would lead to cookie file growth beyond anyone's interest.
random access on the cookie file would involve some in memory representation of
the file's state, and at this point I'd argue we're putting more time into this
problem than is justified. AFAIK, this still hasn't shown up on any perf metric
radar.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9
Sounds like OS/All to me. Change back to WinNT if I'm wrong. <nit>btw, is this
really topperf or 'merely' perf in keyword terms?</nit>
OS: Windows NT → All
I ran some performance numbers on a p3 450 (maybe 350?) running windows 2k with 
a scsi hard drive. I used jrgm@netscape.com's performance server.  In one run, I 
remove the file saving code from cookies (I added an early return so that the 
code would not ever be hit.)  On the next run, I remove this early return so 
that there were no changes in my tree.  In an opt build, there was no negative 
performance for writing the cookies out.  In fact opposite as what you would 
expect, the build that wrote to disk every time a cookie was set ran a few 
seconds faster!! I attributed this to possible concurrent usage of the server, 
but jrgm discredited this claim.  

I am guessing that the disk cache is caching this file, or the scsi performance 
makes this kind of write nearly invisible.  We will have to try this test on a 
lower end machine, maybe a mac.
Some operating systems (OpenVMS being one of them) will not return from a 
close() until all the data is on the disk. Its probably these systems which are 
really getting hit by the cookies file getting rewritten several times per page 
load.
I wrote a small test to get some idea of the cost of writing a cookie file. 
Using my own cookie file to estimate how much data to write, I decided that the 
program should write 250 lines each of 75 characters. I put the 
open/write-loop/close in a loop and let it run long enough to get some 
reasonable numbers. My two test systems were a dual-cpu Alpha running OpenVMS, 
and a 500MHz pentium running RedHat 6.1. Results are:

OpenVMS - 70 "cookie files" in 9 seconds - 0.9s for 7 cookies
Linux - 7000 "cookie files" in 4 seconds - 0.004s for 7 cookies

[The "7 cookies" is because earlier in this bug someone states that visiting 
netscape.com results in 7 cookies. All times are elapsed times, not cpu times.]

So, somewhat as expected. On an O/S with a disk write cache the cost of writing 
multiple cookie files in negligable. But on an O/S where it cares that bits 
actually make it out to the disk the cost is pretty high.

Note that this test does not take into account the reading of the cookie file.

I'll attach the little test program in case anyone wants to try on a different 
operating system.
anyone want to run colin's program on the mac and post the findings here?
Jud: no one proposed blindly appending, that's silly.  Append-mode write makes
sense if the client can know the cookie is not yet in the file (and only the
client writes the file).

We have seen performance results from jrgm showing where this hurts: NFS.  NFS
does synchronous writes (without server-side NVRAM accelerator hardware) per its
spec, so clients can be sure that data is on stable storage by the time they get
the RPC reply message for a given write.

Then there is OpenVMS.

I think this bug is worth fixing for those who care.  Maybe Colin could come up
with a patch?

/be
I looked at the cooked code a little. First off, its NOT possible to simply 
append a new cookie if its new because the contents of the cookie file are 
sorted (by path name length). Also I'm not sure how much gain that would really 
have given use anyway since a lot of the time we are updating existing cookies.

At the moment whenever we create or update a cookie we set a changed flag and 
then call cookie_Save. So the fix is to just set the changed flag, and then call 
cookie_Save from an OnEndDocumentLoad routine (if the flag is set)? Is that's 
all we need to do here? I could probably manage that!
Just a small note that synchronous behavior can occur on a Linux ext2
filesystem. Just mount the fs with the sync option.

You can see the difference with this test. Assume the test program is
a.out, then do

  rm cookie-dummy.txt
  chattr +S `pwd`
  ./a.out 200
  rm cookie-dummy.txt
  chattr -S `pwd`
  ./a.out 200
Colink, I had already thought about the sorting issue and and concluded it is 
not a problem.  The cookies need to be sorted by pathlength when in memory but 
the order on the hard disk is insignificant -- whenever the cookies are read in 
they get re-sorted anyway.

But I also agree with you that the appending is probably not worth doing.

As far as the write on load done, you are correct that the flag that is already 
being set makes this very simple.  I am presently testing out a change that does 
this.  Furthermore I'm in the midst of a major code reorganization for the 
cookie module (see bug 46783) so it's best that nobody else modify any of the 
cookie files at the moment.  I'll probably be checking in the fix to save-once 
as part of the checkin for bug 46783 and will close out this bug report at that 
time.
Steve: thanks, that matches my experience with cookies files -- they need not be
sorted on disk.

Everyone: how about some measurements on the synchronous systems before we give
up on append-mode?  If the cookies file is small, it may not matter.  But in
general I think it's better to avoid rewriting a file in place.  The other
benefit is crashproofing, which OS kernel and utility code always ensures by
rewriting a tmp file on the same filesystem, then using an atomic rename(2)
system call to replace the existing file.  At no point will a crash leave you
with no data, partial data, or mangled data.

Of course, cookies.txt may not be worth the bother.  Opinions?  The code change
to do a safe tmpfile write and rename is small.  I can help with the details.

/be
The change for writing the cookies when the document finishes loading has now 
been included in the patch for bug 46783.  So I'll close out this bug report 
when that patch gets checked in.
Brendan: what about symlinks? Should mozilla replace a symlink with a
regular file just because it didn't expect it? It would annoy the user
who had her reasons. Does anyone consider a symlink a security issue? I
don't; others may.

It occurs to me that a power user could make the cookie file a named
pipe and do all sorts of interesting things. Maybe mozilla should pay
more attention to special files.

Personally, I don't think this is worth all the attention it's getting.
Fixed as part of the checkin for bug 46783 that just occured.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
verified
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: