Closed Bug 57730 Opened 24 years ago Closed 23 years ago

[imglib]Big Page Load Win->fix imgcache/neckocache loadattribute interactions

Categories

(Core :: Graphics: ImageLib, defect, P3)

x86
All
defect

Tracking


VERIFIED FIXED
mozilla0.9

People

(Reporter: pnunn, Assigned: pavlov)

References


Details

(Keywords: embed, perf, topperf)

Bugs #56599 and #57015 describe different aspects of the problem of 
applying Necko load attributes to image loading. The problem is acute 
when the VALIDATE_ALWAYS load attribute is selected as a pref. 

*********************************************************** 
Here's a quick description of the problem: 

Given a page with one image requested in 5 places, the image 
is requested from the server once (http 200), as expected. 
No verify (http 304) is needed. 

 html   http log 
 ------------------  -------- 
 <img src=blah.gif>           *200 
 <img src=blah.gif> 
 <img src=blah.gif> 
 <img src=blah.gif> 
 <img src=blah.gif> 
  

However, if the image is requested in different sizes in 3 of the 
5 requests, the image is requested from the server once (http 200) 
and verified twice (http 304). 

 html    http log 
 ------------------------  -------- 
 <img src=blah.gif>   *200 
 <img src=blah.gif w=100>  *304 
 <img src=blah.gif w=200>  *304 
 <img src=blah.gif w=300>  *304 
 <img src=blah.gif>   -none 
  

The imglib stores decompressed and sized images in the imgcache. When an 
image is needed at new dimensions, the imglib has to make another request 
to necko for data to decompress and size to the new dimensions...but it 
should get that data from the necko cache. There should be no reason for 
necko to re-verify the freshness of the data, since it just fetched it 
for use on the same page. 

The imglib issues a GetURL request to necko and uses the defchannel 
load attributes associated with the request. The imglib has no 
concept of a page; it does, however, know about load groups, which 
should correspond to a page. 
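
Here is a minimal sketch of that split, using hypothetical types and names 
rather than the real imglib/necko interfaces: the imgcache key includes 
the target dimensions, and a miss only means the compressed bytes have to 
be pulled again, ideally straight out of the necko cache via a request 
tied to the page's load group.

  // Hypothetical sketch only -- not the actual imglib/necko API.
  #include <map>
  #include <string>
  #include <utility>

  struct DecodedImage { int width; int height; /* pixels ... */ };

  // The imgcache holds decompressed, already-scaled images, so the key
  // must include the target dimensions, not just the URL.
  typedef std::pair<std::string, std::pair<int,int> > ImgKey;
  static std::map<ImgKey, DecodedImage> gImgCache;

  // Stand-in for a necko fetch issued as part of load group 'groupId'.
  // For a group whose data was already validated, this should be served
  // from the necko cache with no If-Modified-Since round trip.
  static std::string FetchViaNecko(const std::string& url, int groupId)
  {
      (void)groupId;
      return "compressed-bytes-for-" + url;   // stub for illustration
  }

  DecodedImage GetImage(const std::string& url, int w, int h, int groupId)
  {
      ImgKey key(url, std::make_pair(w, h));
      std::map<ImgKey, DecodedImage>::iterator it = gImgCache.find(key);
      if (it != gImgCache.end())
          return it->second;                  // already decoded at this size

      // A miss here only means "not decoded at this size yet"; the
      // compressed bytes should still be in the necko cache from the
      // first request on this page.
      std::string data = FetchViaNecko(url, groupId);
      (void)data;                             // decode + scale would happen here
      DecodedImage img; img.width = w; img.height = h;
      gImgCache[key] = img;
      return img;
  }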

The imglib makes the decision about using imgcache data too early in the 
process. The decision is made purely on policy, with no real information 
about what is fresh in the necko cache. The decision is simple: 
"If it's in the cache and it has the right dimensions, use it," or 
"Don't use what's in the imgcache." Our failsafe was that if the data in 
necko was fresh, we would decompress it from the necko cache rather than 
retrieve it from the server. 
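
Expressed as code, the decision being criticized looks roughly like the 
sketch below (hypothetical names again); note that nothing about necko's 
notion of freshness is available at the point where the choice is made.

  // Hypothetical sketch of the current, purely policy-based decision.
  enum ImgCacheDecision {
      USE_IMGCACHE,        // reuse the decoded, sized image as-is
      REQUEST_FROM_NECKO   // ignore the imgcache; ask necko for data
  };

  struct ImgCacheEntry { bool present; int width; int height; };

  ImgCacheDecision DecideEarly(const ImgCacheEntry& entry,
                               int wantedWidth, int wantedHeight)
  {
      // "If it's in the cache and it has the right dimensions, use it."
      if (entry.present &&
          entry.width == wantedWidth && entry.height == wantedHeight)
          return USE_IMGCACHE;

      // "Don't use what's in the imgcache."  The only failsafe is that
      // the subsequent necko request *may* be satisfied from the necko
      // cache rather than the server -- but that is out of our hands here.
      return REQUEST_FROM_NECKO;
  }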

Once the imglib decides it should not use the imgcache, it simply requests 
the data from necko, and necko sends "fresh" data from the cache or from 
the server. This is another place where we go wrong: even though we are 
asking for data for the same page (or load group), we re-verify the data's 
freshness for every request. 

In bug #56599, Hyatt recommended a patch in which the VALIDATE_ALWAYS load 
attribute is always ignored, while FORCE_RELOAD and FORCE_VALIDATION are 
still enforced. Ignoring VALIDATE_ALWAYS breaks view-image pages, where 
the image is the toplevel document. 
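
As a sketch (with made-up flag values standing in for the real necko load 
attributes), that patched behaviour amounts to masking the attributes for 
image loads like this; the point is only that explicit reload/validation 
requests are left alone.

  // Hypothetical flag values; the real necko load attributes differ.
  enum {
      VALIDATE_ALWAYS  = 1 << 0,
      FORCE_VALIDATION = 1 << 1,
      FORCE_RELOAD     = 1 << 2
  };

  // Hyatt's suggested behaviour: drop VALIDATE_ALWAYS for image loads,
  // but still honour explicit reload/validation requests.
  unsigned AdjustImageLoadAttributes(unsigned attrs)
  {
      if (attrs & (FORCE_RELOAD | FORCE_VALIDATION))
          return attrs;                 // explicitly requested; keep as-is
      return attrs & ~VALIDATE_ALWAYS;  // otherwise never revalidate images
  }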

It is incorrect, but it might not be such a terrible temporary solution for 
RTM. The worst that could happen is that the user would need to shift-reload 
to trigger a FORCE_RELOAD for a view-image document; still, this fix would 
not give us 4.x behaviour. 

I have tried to find ways to trap the condition of a view-image page, but I 
have been unable to detect it on the first pass, which is what is needed. 
I don't have access to the channel at that point, and two passes are needed 
due to the layout of the view-image 'synth document'. By the time I know the 
image is a toplevel document (because it is part of a synth document), it is 
too late to tell the request not to use what is in the imgcache. 

The imglib clones the netcontext to issue its requests to necko. This might 
be a way to pass information that modifies the enforcement of the load 
attribute. This method would also require the imglib to track why it ignored 
an entry in the imgcache (for example, a dimension difference). 
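
A sketch of that idea, with hypothetical types (the real netcontext is 
different): the clone carries the reason the imgcache entry was rejected, so 
the receiving side could relax validation when the refetch is only needed 
for a resize.

  // Hypothetical sketch of a cloned net context carrying extra state.
  enum ImgCacheBypassReason {
      BYPASS_NONE,
      BYPASS_DIMENSION_MISMATCH,   // same bits needed, different target size
      BYPASS_EXPIRED_OR_MISSING
  };

  struct NetContext {
      unsigned loadAttributes;
      ImgCacheBypassReason bypassReason;
  };

  static const unsigned kValidateAlways = 1u << 0;  // placeholder flag value

  NetContext CloneForImageRequest(const NetContext& original,
                                  ImgCacheBypassReason why)
  {
      NetContext clone = original;
      clone.bypassReason = why;
      // If the only reason we are back at necko is a dimension change, the
      // compressed data was already good enough for this page; validation
      // could be skipped for this particular request.
      if (why == BYPASS_DIMENSION_MISMATCH)
          clone.loadAttributes &= ~kValidateAlways;
      return clone;
  }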

If there were some way for necko to track whether a request belongs to a 
page that is fresh (a fresh load group), then the imglib could simply depend 
on necko for the latest and greatest image data. The cost of decoding image 
data is small compared to the cost of re-verifying the data with the server.
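
A sketch of what that could look like on the necko side (hypothetical, and 
ignoring expiry of the group itself): the first request for a URL within a 
load group does the normal validation; repeats within the same group are 
served from the cache without another If-Modified-Since.

  // Hypothetical sketch of per-load-group freshness tracking in necko.
  #include <map>
  #include <set>
  #include <string>

  // URLs already fetched or validated within each live load group.
  static std::map<int, std::set<std::string> > gFreshInGroup;

  enum FetchAction { SERVE_FROM_CACHE, VALIDATE_WITH_SERVER };

  FetchAction ActionFor(int loadGroupId, const std::string& url,
                        bool cachedCopyExists)
  {
      std::set<std::string>& fresh = gFreshInGroup[loadGroupId];
      if (cachedCopyExists && fresh.count(url))
          return SERVE_FROM_CACHE;      // already fresh for this page

      // First time this page asks for the URL: do the normal 200/304
      // dance once, then remember the group has fresh data for it.
      fresh.insert(url);
      return VALIDATE_WITH_SERVER;
  }

  // When the page goes away, necko would drop the group's record.
  void OnLoadGroupDestroyed(int loadGroupId)
  {
      gFreshInGroup.erase(loadGroupId);
  }

Dropping the record when the load group goes away means a later visit or an 
explicit reload would still validate normally; only repeats within one page 
load are exempted.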
Status: NEW → ASSIGNED
*** Bug 58318 has been marked as a duplicate of this bug. ***
*** Bug 57773 has been marked as a duplicate of this bug. ***
Blocks: 61532
For the set of "static", captured copies of web pages that are being used for
daily load-time and memory testing (curt and twalker), here are some
statistics for the number of GETs that are generated by each of the pages,
and the number of redundant, duplicate GETs for the same image on each page
(sorted by the number of unique GETs for the page).

[This is with an emptied cache, and "Once per session" set for cache
preference. win2k/20010105 build. Note: this is for any GET, whether IMG or
JS or CSS, although IMG, of course, dominates the numbers].

                               GETs    DUPL   % DUPL  Unique GETs
----------------------------------------------------------------------
www.time.com                    83      8       10%     75
www.tomshardware.com            72      3       4%      69
www.zdnet.com_Gamespot.com      78      14      18%     64
www.spinner.com                 82      33      40%     49
www.cnn.com                     51      5       10%     46
www.nytimes.com                 48      2       4%      46
www.moviefone.com               50      8       16%     42
www.voodooextreme.com           43      4       9%      39
www.amazon.com                  38      0       0%      38
www.zdnet.com                   64      26      41%     38
www.ebay.com                    57      20      35%     37
www.digitalcity.com             45      10      22%     35
www.aol.com                     39      5       13%     34
web.icq.com                     40      7       18%     33
www.apple.com                   35      2       6%      33
www.expedia.com                 43      11      26%     32
www.msnbc.com                   26      1       4%      25
home.netscape.com               42      19      45%     23
news.cnet.com                   31      8       26%     23
www.travelocity.com             28      5       18%     23
my.netscape.com                 26      5       19%     21
espn.go.com                     19      0       0%      19
www.nytimes.com_Table           19      0       0%      19
www.wired.com                   31      13      42%     18
www.mapquest.com                16      0       0%      16
hotwired.lycos.com              15      0       0%      15
slashdot.org                    18      3       17%     15
www.iplanet.com                 20      5       25%     15
www.microsoft.com               19      4       21%     15
www.excite.com                  14      0       0%      14
www.msn.com                     13      0       0%      13
www.altavista.com               12      0       0%      12
www.sun.com                     16      6       38%     10
www.compuserve.com              9       0       0%      9
www.quicken.com                 9       0       0%      9
www.w3.org_DOML2Core            3       0       0%      3
www.yahoo.com                   2       0       0%      2
bugzilla.mozilla.org            1       0       0%      1
lxr.mozilla.org                 1       0       0%      1
www.google.com                  1       0       0%      1
----------------------------------------------------------------------
                              1259     227      18%    1036


I modified one of the pages that had a level of redundant GETs that was
average for the set of pages (45 GETs, 10 duplicates, 22% duplicates). I ran
the page-load timing test against this modified page. The result was that
page load time was reduced by 18%.

Some pages would benefit even more, and some pages not at all, but it appears 
that eliminating these redundant GETs would be a huge performance win. [Yeah, 
you already knew that anyway, but "measure twice, cut once", so this is the
measure...]. This would be a kick-ass winner for the next release, so I'm
adding the nsbeta1 nomination (pretty please).

Let me know what other information I might be able to dig out of the test.

p.s., note that these measurements are over a fast network. The cost to send
out multiple, unnecessary 'If-modified-since' HTTP requests over a 56K modem 
would be much greater.
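
For what it's worth, numbers like those in the table above can be tallied 
from a plain list of requested URLs (one per line). The small stand-alone 
program below is only an illustration of the counting, not the tool that was 
actually used.

  // Count total, duplicate, and unique GETs from a list of URLs on stdin.
  #include <iostream>
  #include <map>
  #include <string>

  int main()
  {
      std::map<std::string, int> hits;
      std::string url;
      int total = 0;
      while (std::getline(std::cin, url)) {
          if (url.empty())
              continue;
          ++total;
          ++hits[url];
      }
      int unique = static_cast<int>(hits.size());
      int dupl = total - unique;
      std::cout << "GETs: " << total
                << "  DUPL: " << dupl
                << "  Unique GETs: " << unique << std::endl;
      return 0;
  }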
Keywords: nsbeta1
> I modified one of the pages that had a level of redundant GETs that was
> average for the set of pages 

I modified ... for the set of pages so that it no longer performed the 
redundant HTTP GETs. ...
Wow, lots of sites use more duplicate images than I would have guessed. Okay,
this is on my radar as something that needs to be fixed (was before too, but I'm
taking the bug now to keep it higher in mind).
Assignee: pnunn → saari
Status: ASSIGNED → NEW
Target Milestone: --- → mozilla0.9
Keywords: embed, perf
Blocks: 64833
Summary: fix imgcache/neckocache loadattribute interactions → Big Page Load Win->fix imgcache/neckocache loadattribute interactions
Status: NEW → ASSIGNED
Summary: Big Page Load Win->fix imgcache/neckocache loadattribute interactions → [imglib]Big Page Load Win->fix imgcache/neckocache loadattribute interactions
Depends on: 70938
Keywords: topperf
Pavlov, I think this is fixed now, but you'd know better. Close it if it is 
working.
Assignee: saari → pavlov
Status: ASSIGNED → NEW
yup, fixed when the new imagelib came online.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Per comments by pavlov, marking verified
Status: RESOLVED → VERIFIED
No longer blocks: 64833