Closed Bug 1325336 Opened 3 years ago Closed 2 years ago

Generate statistics for cache performance on the given HW

Categories

(Core :: Networking: Cache, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox53 --- affected
firefox56 --- fixed

People

(Reporter: michal, Assigned: michal)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-active])

Attachments

(1 file, 3 obsolete files)

We should have some performance data on how fast the cache is when the disk is not stressed by excessive IO. We can continuously collect average time of open, read, write operations when the cache IO queue is short. Then we can use this data when deciding whether to fetch the resource from the network or from the cache. Also bug 1325331 can compare current data with this statistics to detect delays.
Blocks: 1325341
Assignee: nobody → michal.novotny
Whiteboard: [necko-active]
Blocks: 1354407
No longer blocks: 1354407
Attached patch patch v1 (obsolete) — Splinter Review
CachePerfStats gathers performance data for single open, read and write operations as well as the whole cache entry opening. It maintains long term and short term average. The long term average filters out excessive values and it represents and average time for a given operation when the cache is not busy. The short term average represents the current cache speed. By comparing these two stats we know pretty quickly that the cache is getting slower and then we race the cache with network immediately without a delay. Otherwise the delay is based on the average cache entry open time.

Honza, please have a look at the change in CacheIOThread. If you don't like it for whatever reason, I can probably remove it. Filtering only in CachePerfStats::MMA::AddValue would probably work too.
Attachment #8879876 - Flags: review?(valentin.gosu)
Attachment #8879876 - Flags: review?(honzab.moz)
Comment on attachment 8879876 [details] [diff] [review]
patch v1

Review of attachment 8879876 [details] [diff] [review]:
-----------------------------------------------------------------

It looks good overall, apart from the names of methods that use microseconds and this one small question.

::: netwerk/cache2/CacheFileUtils.cpp
@@ +523,5 @@
> +{
> +}
> +
> +void
> +CachePerfStats::MMA::AddValue(uint32_t aValue)

I am worried that users of this API might not know the value is supposed to be microseconds. Apart from comments, I think it would be useful to change the name of the method (maybe AddValueMicroSeconds ?)

@@ +638,5 @@
> +    uint32_t avgLong = sData[i].GetAverage(true);
> +    uint32_t stddevLong = sData[i].GetStdDev(true);
> +
> +    if (avgShort > 3 * stddevLong + avgLong) {
> +      return true;

If GetAverage returns 0 when mCnt == 0, doesn't this mean IsCacheSlow() will return true for the first call? Are we ok with that?

::: netwerk/cache2/CacheFileUtils.h
@@ +157,5 @@
> +    ENTRY_OPEN = 3,
> +    LAST       = 4
> +  };
> +
> +  static void     AddValue(EDataType aType, uint32_t aValue, bool aShortOnly);

rename to AddValueMicroseconds?

@@ +158,5 @@
> +    LAST       = 4
> +  };
> +
> +  static void     AddValue(EDataType aType, uint32_t aValue, bool aShortOnly);
> +  static uint32_t GetAverage(EDataType aType, bool aFiltered);

Rename to GetAverageMicroseconds?

@@ +195,5 @@
> +  public:
> +    PerfData();
> +
> +    void     AddValue(uint32_t aValue, bool aShortOnly);
> +    uint32_t GetAverage(bool aFiltered);

Rename to AddValue/GetAverage+Microseconds?
Attachment #8879876 - Flags: review?(valentin.gosu) → review+
(In reply to Valentin Gosu [:valentin] from comment #2)
> > +void
> > +CachePerfStats::MMA::AddValue(uint32_t aValue)
> 
> I am worried that users of this API might not know the value is supposed to
> be microseconds. Apart from comments, I think it would be useful to change
> the name of the method (maybe AddValueMicroSeconds ?)

The code is resolution independent. In fact, it doesn't necessarily have to be time. Instead of renaming the methods, I'll add comments explaining why microseconds are used.

> > +    if (avgShort > 3 * stddevLong + avgLong) {
> > +      return true;
> 
> If GetAverage returns 0 when mCnt == 0, doesn't this mean IsCacheSlow() will
> return true for the first call? Are we ok with that?

Good point. I'll change it so it returns false in this case.
Comment on attachment 8879876 [details] [diff] [review]
patch v1

Review of attachment 8879876 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay here, r+ for the cacheiothread changes.

::: netwerk/cache2/CacheIOThread.h
@@ +68,5 @@
>    bool IsCurrentThread();
>  
>    uint32_t QueueSize(bool highPriority);
>  
> +  uint32_t EventCounter() { return mEventCounter; }

nit: make it uint32_t EventCounter() const please (if our Atomics are const on read).
Attachment #8879876 - Flags: review?(honzab.moz) → review+
Attached patch patch v2 (obsolete) — Splinter Review
Fixed according to reviewers' comments and fixed uint32_t overflow in CachePerfStats::MMA::AddValue and CachePerfStats::MMA::GetStdDev.
Attachment #8879876 - Attachment is obsolete: true
Attachment #8880383 - Flags: review+
Pushed by mnovotny@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/29c843d51c4b
Generate statistics for cache performance on the given HW, r=valentin, r=honzab
sorry had to back this out for bustage like https://treeherder.mozilla.org/logviewer.html#?job_id=109223829&repo=mozilla-inbound
Flags: needinfo?(michal.novotny)
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/107ad5dc5c01
Backed out changeset 29c843d51c4b for bustage
Attached patch patch v3 - fixed a build error (obsolete) — Splinter Review
Attachment #8880383 - Attachment is obsolete: true
Flags: needinfo?(michal.novotny)
Attachment #8880435 - Flags: review+
Pushed by mnovotny@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e44f1f746017
Generate statistics for cache performance on the given HW, r=valentin, r=honzab
Backed out for crashing e.g. in browser-chrome's browser_flashblock_on_with_ask_to_activate.js:

https://hg.mozilla.org/integration/mozilla-inbound/rev/768a1255610d1225ce412533da4379f664fe7c9c

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=e44f1f7460176d691d8bddf2795831537ec12e96&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=109264209&repo=mozilla-inbound

[task 2017-06-22T16:19:21.282029Z] 16:19:21     INFO - TEST-PASS | toolkit/components/url-classifier/tests/browser/browser_flashblock_on_with_ask_to_activate.js | Plugin should have the correct 'plugin running' state - 
[task 2017-06-22T16:19:21.284277Z] 16:19:21     INFO - TEST-PASS | toolkit/components/url-classifier/tests/browser/browser_flashblock_on_with_ask_to_activate.js | Plugin's existance in navigator.plugins should match expected - 
[task 2017-06-22T16:19:21.286332Z] 16:19:21     INFO - Buffered messages logged at 16:19:17
[task 2017-06-22T16:19:21.288327Z] 16:19:21     INFO - RUNNING TEST: Subdocument of Sub-document blocked domain (Ask to Activate, Flashblock on)
[task 2017-06-22T16:19:21.291062Z] 16:19:21     INFO - Console message: [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "http://example.com/browser/toolkit/components/url-classifier/tests/browser/flash_block_frame.html" line: 0}]
[task 2017-06-22T16:19:21.293016Z] 16:19:21     INFO - Buffered messages finished
[task 2017-06-22T16:19:21.295147Z] 16:19:21    ERROR - TEST-UNEXPECTED-FAIL | toolkit/components/url-classifier/tests/browser/browser_flashblock_on_with_ask_to_activate.js | application terminated with exit code 11
[task 2017-06-22T16:19:21.297468Z] 16:19:21     INFO - runtests.py | Application ran for: 0:01:56.480974
[task 2017-06-22T16:19:21.300898Z] 16:19:21     INFO - zombiecheck | Reading PID log: /tmp/tmpHcdNFfpidlog
[task 2017-06-22T16:19:21.303990Z] 16:19:21     INFO - ==> process 4289 launched child process 4309
[task 2017-06-22T16:19:21.306455Z] 16:19:21     INFO - ==> process 4289 launched child process 4331
[task 2017-06-22T16:19:21.310134Z] 16:19:21     INFO - ==> process 4289 launched child process 4359
[task 2017-06-22T16:19:21.312558Z] 16:19:21     INFO - ==> process 4289 launched child process 4398
[task 2017-06-22T16:19:21.314884Z] 16:19:21     INFO - ==> process 4289 launched child process 4421
[task 2017-06-22T16:19:21.320873Z] 16:19:21     INFO - ==> process 4289 launched child process 4451
[task 2017-06-22T16:19:21.323425Z] 16:19:21     INFO - ==> process 4289 launched child process 4460
[task 2017-06-22T16:19:21.325510Z] 16:19:21     INFO - zombiecheck | Checking for orphan process with PID: 4309
[task 2017-06-22T16:19:21.327482Z] 16:19:21     INFO - zombiecheck | Checking for orphan process with PID: 4331
[task 2017-06-22T16:19:21.329413Z] 16:19:21     INFO - zombiecheck | Checking for orphan process with PID: 4359
[task 2017-06-22T16:19:21.331424Z] 16:19:21     INFO - zombiecheck | Checking for orphan process with PID: 4398
[task 2017-06-22T16:19:21.333341Z] 16:19:21     INFO - zombiecheck | Checking for orphan process with PID: 4421
[task 2017-06-22T16:19:21.335261Z] 16:19:21     INFO - zombiecheck | Checking for orphan process with PID: 4451
[task 2017-06-22T16:19:21.337196Z] 16:19:21     INFO - zombiecheck | Checking for orphan process with PID: 4460
[task 2017-06-22T16:19:21.339383Z] 16:19:21     INFO - mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/KKpHTAwGRTuB1u6nkukM_g/artifacts/public/build/target.crashreporter-symbols.zip
[task 2017-06-22T16:19:45.125380Z] 16:19:45     INFO - mozcrash Copy/paste: /usr/local/bin/linux64-minidump_stackwalk /tmp/tmpb6Vryy.mozrunner/minidumps/6abe3e99-38cb-7942-6cd9-aa2875f7cf19.dmp /tmp/tmpzIfklq
[task 2017-06-22T16:20:10.084694Z] 16:20:10     INFO - mozcrash Saved minidump as /home/worker/workspace/build/blobber_upload_dir/6abe3e99-38cb-7942-6cd9-aa2875f7cf19.dmp
[task 2017-06-22T16:20:10.089911Z] 16:20:10     INFO - mozcrash Saved app info as /home/worker/workspace/build/blobber_upload_dir/6abe3e99-38cb-7942-6cd9-aa2875f7cf19.extra
[task 2017-06-22T16:20:10.466774Z] 16:20:10     INFO - PROCESS-CRASH | toolkit/components/url-classifier/tests/browser/browser_flashblock_on_with_ask_to_activate.js | application crashed [@ mozilla::net::CacheFileUtils::CachePerfStats::MMA::GetStdDev]
[task 2017-06-22T16:20:10.473171Z] 16:20:10     INFO - Crash dump filename: /tmp/tmpb6Vryy.mozrunner/minidumps/6abe3e99-38cb-7942-6cd9-aa2875f7cf19.dmp
[task 2017-06-22T16:20:10.475094Z] 16:20:10     INFO - Operating system: Linux
[task 2017-06-22T16:20:10.476951Z] 16:20:10     INFO -                   0.0.0 Linux 3.13.0-100-generic #147-Ubuntu SMP Tue Oct 18 16:48:51 UTC 2016 x86_64
[task 2017-06-22T16:20:10.478841Z] 16:20:10     INFO - CPU: x86
[task 2017-06-22T16:20:10.480651Z] 16:20:10     INFO -      GenuineIntel family 6 model 62 stepping 4
[task 2017-06-22T16:20:10.482680Z] 16:20:10     INFO -      1 CPU
[task 2017-06-22T16:20:10.488784Z] 16:20:10     INFO - 
[task 2017-06-22T16:20:10.493506Z] 16:20:10     INFO - GPU: UNKNOWN
[task 2017-06-22T16:20:10.496846Z] 16:20:10     INFO - 
[task 2017-06-22T16:20:10.501471Z] 16:20:10     INFO - Crash reason:  SIGSEGV
[task 2017-06-22T16:20:10.503450Z] 16:20:10     INFO - Crash address: 0x0
[task 2017-06-22T16:20:10.511383Z] 16:20:10     INFO - Process uptime: not available
[task 2017-06-22T16:20:10.513297Z] 16:20:10     INFO - 
[task 2017-06-22T16:20:10.515077Z] 16:20:10     INFO - Thread 13 (crashed)
[task 2017-06-22T16:20:10.517613Z] 16:20:10     INFO -  0  libxul.so!mozilla::net::CacheFileUtils::CachePerfStats::MMA::GetStdDev [CacheFileUtils.cpp:0298f22db7a5 : 583 + 0x2]
[task 2017-06-22T16:20:10.524329Z] 16:20:10     INFO -     eip = 0xf046a136   esp = 0xef2c90a0   ebp = 0xef2c90e8   ebx = 0xf55399a4
[task 2017-06-22T16:20:10.526404Z] 16:20:10     INFO -     esi = 0x00000032   edi = 0x00000000   eax = 0x00000000   ecx = 0x00000032
[task 2017-06-22T16:20:10.528937Z] 16:20:10     INFO -     edx = 0xf425ece0   efl = 0x00010246
[task 2017-06-22T16:20:10.532920Z] 16:20:10     INFO -     Found by: given as instruction pointer in context
[task 2017-06-22T16:20:10.535137Z] 16:20:10     INFO -  1  libxul.so!mozilla::net::CacheFileUtils::CachePerfStats::MMA::AddValue [CacheFileUtils.cpp:0298f22db7a5 : 532 + 0xb]
[task 2017-06-22T16:20:10.540092Z] 16:20:10     INFO -     eip = 0xf0470d07   esp = 0xef2c90f0   ebp = 0xef2c9138   ebx = 0xf55399a4
[task 2017-06-22T16:20:10.542339Z] 16:20:10     INFO -     esi = 0x00000032   edi = 0x00000000
[task 2017-06-22T16:20:10.545517Z] 16:20:10     INFO -     Found by: call frame info
[task 2017-06-22T16:20:10.547546Z] 16:20:10     INFO -  2  libxul.so!mozilla::net::CacheFileUtils::CachePerfStats::PerfData::AddValue [CacheFileUtils.cpp:0298f22db7a5 : 598 + 0xc]
[task 2017-06-22T16:20:10.552294Z] 16:20:10     INFO -     eip = 0xf0470e54   esp = 0xef2c9140   ebp = 0xef2c9158   ebx = 0xf55399a4
[task 2017-06-22T16:20:10.554177Z] 16:20:10     INFO -     esi = 0xf5570578   edi = 0x000000b6
[task 2017-06-22T16:20:10.555910Z] 16:20:10     INFO -     Found by: call frame info
[task 2017-06-22T16:20:10.557797Z] 16:20:10     INFO -  3  libxul.so!mozilla::net::CacheFileUtils::CachePerfStats::AddValue [CacheFileUtils.cpp:0298f22db7a5 : 620 + 0x2a]
[task 2017-06-22T16:20:10.559649Z] 16:20:10     INFO -     eip = 0xf0470ed6   esp = 0xef2c9160   ebp = 0xef2c91a8   ebx = 0xf55399a4
[task 2017-06-22T16:20:10.561499Z] 16:20:10     INFO -     esi = 0x00000038   edi = 0xcc77bb20
[task 2017-06-22T16:20:10.564697Z] 16:20:10     INFO -     Found by: call frame info
[task 2017-06-22T16:20:10.566671Z] 16:20:10     INFO -  4  libxul.so!mozilla::net::ReadEvent::Run [CacheFileIOManager.cpp:0298f22db7a5 : 628 + 0x16]
[task 2017-06-22T16:20:10.568557Z] 16:20:10     INFO -     eip = 0xf04a0aa1   esp = 0xef2c91b0   ebp = 0xef2c9218   ebx = 0xf55399a4
[task 2017-06-22T16:20:10.570534Z] 16:20:10     INFO -     esi = 0xcc77aa80   edi = 0x0002ca50
[task 2017-06-22T16:20:10.572778Z] 16:20:10     INFO -     Found by: call frame info
[task 2017-06-22T16:20:10.577003Z] 16:20:10     INFO -  5  libxul.so!mozilla::net::CacheIOThread::LoopOneLevel [CacheIOThread.cpp:0298f22db7a5 : 570 + 0x8]
[task 2017-06-22T16:20:10.578934Z] 16:20:10     INFO -     eip = 0xf047cb58   esp = 0xef2c9220   ebp = 0xef2c9298   ebx = 0xf55399a4
[task 2017-06-22T16:20:10.580812Z] 16:20:10     INFO -     esi = 0x00000000   edi = 0xe78c8480
[task 2017-06-22T16:20:10.582555Z] 16:20:10     INFO -     Found by: call frame info
[task 2017-06-22T16:20:10.584383Z] 16:20:10     INFO -  6  libxul.so!mozilla::net::CacheIOThread::ThreadFunc [CacheIOThread.cpp:0298f22db7a5 : 508 + 0xf]
[task 2017-06-22T16:20:10.588916Z] 16:20:10     INFO -     eip = 0xf047cece   esp = 0xef2c92a0   ebp = 0xef2c92f8   ebx = 0xf55399a4
[task 2017-06-22T16:20:10.590902Z] 16:20:10     INFO -     esi = 0x00000009   edi = 0xef2c92db
[task 2017-06-22T16:20:10.592726Z] 16:20:10     INFO -     Found by: call frame info
[task 2017-06-22T16:20:10.594872Z] 16:20:10     INFO -  7  libxul.so!mozilla::net::CacheIOThread::ThreadFunc [CacheIOThread.cpp:0298f22db7a5 : 450 + 0x8]
[task 2017-06-22T16:20:10.601036Z] 16:20:10     INFO -     eip = 0xf047cf4c   esp = 0xef2c9300   ebp = 0xef2c9318   ebx = 0xf55399a4
[task 2017-06-22T16:20:10.603159Z] 16:20:10     INFO -     esi = 0xe78c8480   edi = 0x000010d6
[task 2017-06-22T16:20:10.604946Z] 16:20:10     INFO -     Found by: call frame info
[task 2017-06-22T16:20:10.607029Z] 16:20:10     INFO -  8  libnspr4.so!_pt_root [ptthread.c:0298f22db7a5 : 216 + 0x9]
[task 2017-06-22T16:20:10.608945Z] 16:20:10     INFO -     eip = 0xf72dac44   esp = 0xef2c9320   ebp = 0xef2c9368   ebx = 0xf72eb6c4
[task 2017-06-22T16:20:10.610788Z] 16:20:10     INFO -     esi = 0xe78c8fc0   edi = 0x000010d6
[task 2017-06-22T16:20:10.612558Z] 16:20:10     INFO -     Found by: call frame info
[task 2017-06-22T16:20:10.614286Z] 16:20:10     INFO -  9  libpthread-2.23.so + 0x6295
[task 2017-06-22T16:20:10.616136Z] 16:20:10     INFO -     eip = 0xf76a5295   esp = 0xef2c9370   ebp = 0xef2c9428   ebx = 0x00000000
[task 2017-06-22T16:20:10.618069Z] 16:20:10     INFO -     esi = 0x00000000   edi = 0x003d0f00
[task 2017-06-22T16:20:10.620773Z] 16:20:10     INFO -     Found by: call frame info
[task 2017-06-22T16:20:10.622578Z] 16:20:10     INFO - 10  libc-2.23.so + 0xe6eee
[task 2017-06-22T16:20:10.624877Z] 16:20:10     INFO -     eip = 0xf73d7eee   esp = 0xef2c9430   ebp = 0x00000000
[task 2017-06-22T16:20:10.628024Z] 16:20:10     INFO -     Found by: previous frame's frame pointer
Flags: needinfo?(michal.novotny)
This also caused talos failures each time it landed, but that might just be because it crashed things before talos could finish running.
Attached patch patch v4Splinter Review
The assertion could be false due to rounding errors. Fixed in the new patch.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=406bd17f2962f8b14336c434e4bb6926702c4495&selectedJob=109467678
Attachment #8880435 - Attachment is obsolete: true
Flags: needinfo?(michal.novotny)
Attachment #8880735 - Flags: review+
Pushed by mnovotny@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4da7ff4c92bf
Generate statistics for cache performance on the given HW, r=valentin, r=honzab
https://hg.mozilla.org/mozilla-central/rev/4da7ff4c92bf
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
You need to log in before you can comment on or make changes to this bug.