Closed Bug 243486 Opened 19 years ago Closed 17 years ago

Lots of small document.write spend a lot of time in cache code

Categories

(Core :: DOM: Core & HTML, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: bzbarsky, Unassigned)

Details

(Keywords: helpwanted, perf)

Attachments

(1 file)

This is only an issue for purely document.write-generated pages...

In any case, about 7% of the time in nsHTMLDocument::Write is spent writing data
to the cache on the testcases in bug 23187.  The profile output is:

170     3869 nsWyciwygChannel::WriteToCacheEntry(nsAString const&)
        3307 nsCacheEntryDescriptor::nsOutputStreamWrapper::Write
          92 nsDiskCacheOutputStream::Write(char const*, unsigned, unsigned*)

210     3412 nsCacheEntryDescriptor::nsOutputStreamWrapper::Write
        2495 nsCacheEntryDescriptor::nsOutputStreamWrapper::OnWrite(unsigned)
         639 nsDiskCacheOutputStream::Write(char const*, unsigned, unsigned*)

145      731 nsDiskCacheOutputStream::Write(char const*, unsigned, unsigned*)
         503 nsDiskCacheStreamIO::Write(char const*, unsigned, unsigned*)
          59 nsDiskCacheStreamIO::WriteToBuffer(char const*, unsigned)

42     2496 nsCacheEntryDescriptor::nsOutputStreamWrapper::OnWrite(unsigned)
       2200 nsCacheEntryDescriptor::RequestDataSizeChange(int)
        126 PR_Lock
         57 PR_Unlock
         47 nsCacheService::ServiceLock()

70     2231 nsCacheEntryDescriptor::RequestDataSizeChange(int)
        889 nsCacheService::OnDataSizeChange(nsCacheEntry*, int)
        726 nsCacheEntry::TouchData()
        104 nsDiskCacheDevice::OnDataSizeChange(nsCacheEntry*, int)
         84 PR_Unlock
         78 nsCacheService::ServiceLock()
         73 PR_Now
         72 PR_Lock
         25 pthread_self
         23 pthread_mutex_lock

And so forth.

The upshot is that adding data to a cache entry a few bytes at a time is
expensive....  It'd be nice if we could add it in chunks somehow, but without
having to copy too much (or at all, if possible).

Darin, any thoughts on good ways to do that?
hmm... we could probably also optimize the cache writes better, because they
should be going to a buffer in this case.  NOTE: your jprof data doesn't show
any disk writes.  WriteToBuffer does indeed put the data in a buffer (up to 16k
in size)!  maybe we need to use a better allocation scheme for that buffer?  or
maybe we should deCOMtaminate the cache a bit?  what is the source of this cost?

could it be the cost of acquiring the cache lock?
> what is the source of this cost?

Let's see....

The 528 hits in/under nsDiskCacheStreamIO::Write are split up as follows:

                178 nsDiskCacheStreamIO::WriteToBuffer(char const*, unsigned)
                 71 PR_Unlock
                 52 memcpy
                 51 PR_Lock
                 35 nsDiskCacheStreamIO::UpdateFileSize()
                 14 __i686.get_pc_thunk.bx
                 11 _init
                  9 nsCacheService::ServiceLock()
                  5 pthread_equal
                  4 pthread_self
                  4 pthread_mutex_unlock
                  2 nsDiskCacheMap::UpdateRecord(nsDiskCacheRecord*)

and 88 hits in the function itself.  The 237 hits in/under
nsDiskCacheStreamIO::WriteToBuffer are split as:

                 82 memcpy
                 38 nsDiskCacheStreamIO::FlushBufferToFile(int)

and 116 in the function itself.

The bigger piece of the cache pie is the 2496 hits under
nsCacheEntryDescriptor::nsOutputStreamWrapper::OnWrite.  This is split as:

               2200 nsCacheEntryDescriptor::RequestDataSizeChange(int)
                126 PR_Lock
                 57 PR_Unlock
                 47 nsCacheService::ServiceLock()
                 14 nsCacheService::OnDataSizeChange(nsCacheEntry*, int)

nsCacheEntryDescriptor::RequestDataSizeChange is split up as:

                889 nsCacheService::OnDataSizeChange(nsCacheEntry*, int)
                726 nsCacheEntry::TouchData()
                104 nsDiskCacheDevice::OnDataSizeChange(nsCacheEntry*, int)
                 84 PR_Unlock
                 78 nsCacheService::ServiceLock()
                 73 PR_Now
                 72 PR_Lock
                 25 pthread_self
                 23 pthread_mutex_lock
                 17 SecondsFromPRTime(long long)

nsDiskCacheDevice::OnDataSizeChange is split up as:

                355 GetCacheEntryBinding(nsCacheEntry*)
                110 nsCOMPtr_base::begin_assignment()
                 71 nsDiskCacheDevice::EvictDiskCacheEntries(int)
                 67 nsCacheEntry::GetData(nsISupports**)
                 32 nsCOMPtr_base::~nsCOMPtr_base()
                235 in the function itself

GetCacheEntryBinding is split up as:

                173 nsCOMPtr_base::~nsCOMPtr_base()
                108 nsCacheEntry::GetData(nsISupports**)
                 47 nsDiskCacheBinding::AddRef()
                 11 nsDiskCacheBinding::Release()
                 32 in the function itself

nsCacheService::OnDataSizeChange mostly just calls
nsDiskCacheDevice::OnDataSizeChange, which goes back to what we already looked at.

nsCacheEntry::TouchData is split up as:

                397 gettimeofday
                138 SecondsFromPRTime(long long)
                112 __divdi3
                 34 PR_Now

So the main cost is updating various metadata on every change to this buffer,
looks like.  A secondary cost is COM stuff.  Allocation seems to be a non-issue
here, compared to those.
Some findings of a quick analysis:
* in ::Write, UpdateFileSize is called for every call after the first 16K,
however only every 16K of data will result a real file write, so only than the
file size will really changes. 
So may be UpdateFileSize can be moved to FlushBufferToFile, so that this update
will only happen every 16K. (this UpdateFileSize is only required to maintain
the total disk cache size for purging...).  
Note that UpdateFileSize will only be called for Write when a disk file is
produced (so for items > 16K).
But as the testcase (test1) of bug 23187 produces in total about 48K of data,
UpdateFileSize will be called a lot (once every 5 bytes after the first 16K).

* WriteToBuffer is only used in Write itself, may be bring it into Write itself?

* nsDiskCacheStreamIO::FlushBufferToFile(PRBool  clearBuffer): Remove param
clearBuffer?
http://lxr.mozilla.org/seamonkey/source/netwerk/cache/src/nsDiskCacheStreams.h#96
 96     nsresult    FlushBufferToFile(PRBool  clearBuffer); // XXX clearBuffer
is always PR_TRUE

* The SecondsFromPRTime(PR_Now()) combination is costly: The extra conversions
on top of gettimeofday are redundant if we just want to have the current time in
seconds...

* nsOutputStreamWrapper::Write calling nsOutputStreamWrapper::OnWrite calling
nsCacheEntryDescriptor::RequestDataSize calling nsCacheService::OnDataSizeChange
and updating DataSize of the CacheEntry seems to be a round about way.
May be RequestDataSize/OnDataSizeChange needs to be called once every 1K
(because it only checks for maximum file size (with respect to total cache size)
and for space left in total cache rounded in 1K's)?

The first three items are easy to do in just one file, the other two require
NSPR stuff, resp. more thinking about the OnWrite/OnDataSizeChange flow/logic...
The last item seems to have the most performance impact potential though...
If nothing else, we could fix the easy stuff and then reprofile...
Keywords: helpwanted
Attachment #190528 - Flags: review?(darin)
Comment on attachment 190528 [details] [diff] [review]
V1: Reduce the number of 'UpdateFileSize': only do it when something is really written to the file

looks good, r=darin
Attachment #190528 - Flags: review?(darin) → review+
Comment on attachment 190528 [details] [diff] [review]
V1: Reduce the number of 'UpdateFileSize': only do it when something is really written to the file

Boris, can you sr this (if needed)?

Also, can we put this in 1.8? (b4?)
Attachment #190528 - Flags: superreview?(bzbarsky)
Attachment #190528 - Flags: approval1.8b4?
Comment on attachment 190528 [details] [diff] [review]
V1: Reduce the number of 'UpdateFileSize': only do it when something is really written to the file

sr=bzbarsky.  Let's get this in on the trunk, then see whether any issues
arise.	If nothing comes up in a day or two, this should be ok for branch, I
guess.

Alfred, do you want me to check this in?
Attachment #190528 - Flags: superreview?(bzbarsky) → superreview+
Indeed, can you check this in for me? 
I don't have cvs access, and I am not able to monitor the tinderboxes that
closely...

Thanks in advance, Alfred
Keywords: perf
Checked in on trunk.  Should we resolve this fixed and file a followup bug for
remaining issues, or leave this open?
Let mark this one as 'fixed', as the testing of the related bug 23187 confirmed
that document.write is now fast 'enough'...

Actually, some re-testing (on 22/8/2005) results:
             IE6.0        Deerpark 20050817     Deerpark20050821
Test1        0.38         0.37                  0.38
Test2        0.73         0.78                  0.72 (!)

So, we are very very close, or can we just say there now the same?
Notes:
* Deerpark20050821 has bug 243486 fixed (Lots of small document.write spend a
lot of time in cache code).
* IE6.0 was really: 6.0.2800.1106.xpsp2.05031-1526

As for the other reported 'inefficiencies':
* SecondsFromPRTime: See Bug 58310
* OnWrite/OnData/etc: too complex, too risky, not much reward, ...
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
As this is checking into the trunk and verified working, can I be so bold to
request to check it also into the branch?
Flags: blocking1.8b5?
Flags: blocking1.8b4?
Comment on attachment 190528 [details] [diff] [review]
V1: Reduce the number of 'UpdateFileSize': only do it when something is really written to the file

too late for 1.5.
Attachment #190528 - Flags: approval1.8b4? → approval1.8b4-
Not a blocker and not worth the risk for 1.5.
Flags: blocking1.8b5?
Flags: blocking1.8b5-
Flags: blocking1.8b4?
Flags: blocking1.8b4-
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.