Last Comment Bug 711297 - Report startup cache age via telemetry
: Report startup cache age via telemetry
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla13
Assigned To: Nathan Froyd [:froydnj]
:
Mentors:
Depends on: 730196
Blocks:
  Show dependency treegraph
 
Reported: 2011-12-15 17:20 PST by (dormant account)
Modified: 2012-03-11 20:24 PDT (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
part 1 - enhance nsZipArchive with comment functionality (1.70 KB, patch)
2011-12-19 13:43 PST, Nathan Froyd [:froydnj]
taras.mozilla: review-
Details | Diff | Review
part 2 - store/retrieve creation time and feed age to Telemetry (4.50 KB, patch)
2011-12-19 13:45 PST, Nathan Froyd [:froydnj]
taras.mozilla: review+
Details | Diff | Review
part 1 - enhance nsZipArchive with comment functionality, v2 (1.87 KB, patch)
2011-12-20 08:18 PST, Nathan Froyd [:froydnj]
taras.mozilla: review-
Details | Diff | Review
part 2 - store/retrieve creation time and feed age to Telemetry, v2 (4.59 KB, patch)
2011-12-20 08:20 PST, Nathan Froyd [:froydnj]
no flags Details | Diff | Review
part 1 - enhance nsZipArchive with comment functionality, v3 (1.86 KB, patch)
2011-12-21 08:32 PST, Nathan Froyd [:froydnj]
taras.mozilla: review+
Details | Diff | Review
part 2 - store/retrieve creation time and feed age to Telemetry, v3 (4.87 KB, patch)
2011-12-21 08:33 PST, Nathan Froyd [:froydnj]
taras.mozilla: review+
Details | Diff | Review
part 1 - enhance nsZipArchive with comment functionality, v4 (1.86 KB, patch)
2011-12-29 08:25 PST, Nathan Froyd [:froydnj]
nfroyd: review+
vladan.bugzilla: checkin+
Details | Diff | Review
part 2 - store/retrieve creation time and feed age to Telemetry, v4 (4.88 KB, patch)
2011-12-29 08:28 PST, Nathan Froyd [:froydnj]
nfroyd: review+
Details | Diff | Review
part 3 - add recordAgesAlways method to nsIStartupCache (3.12 KB, patch)
2011-12-29 08:31 PST, Nathan Froyd [:froydnj]
mwu.code: review+
Details | Diff | Review
part 4 - add test STARTUP_CACHE_AGE_HOURS histogram (3.64 KB, patch)
2011-12-29 08:39 PST, Nathan Froyd [:froydnj]
taras.mozilla: review+
Details | Diff | Review
part 3 - add recordAgesAlways method to nsIStartupCache, v2 (3.12 KB, patch)
2012-01-06 08:34 PST, Nathan Froyd [:froydnj]
nfroyd: review+
vladan.bugzilla: checkin+
Details | Diff | Review
part 4 - add test for STARTUP_CACHE_AGE_HOURS histogram, v2 (4.42 KB, patch)
2012-02-04 05:31 PST, Nathan Froyd [:froydnj]
nfroyd: review+
Details | Diff | Review
part 2 - store/retrieve creation time and feed age to Telemetry, v5 (7.44 KB, patch)
2012-02-20 16:45 PST, Nathan Froyd [:froydnj]
nfroyd: review+
vladan.bugzilla: checkin+
Details | Diff | Review
part 4 - add test for STARTUP_CACHE_AGE_HOURS histogram, v3 (5.83 KB, patch)
2012-02-20 16:48 PST, Nathan Froyd [:froydnj]
nfroyd: review+
vladan.bugzilla: checkin+
Details | Diff | Review

Description (dormant account) 2011-12-15 17:20:17 PST
It's not clear what the lifetime of entries in startup cache is for a typical user. We wipe it it on upgrades, when corrupted, addon install.
Comment 1 Nathan Froyd [:froydnj] 2011-12-16 09:03:45 PST
1) What is the startup cache?
2) How is this useful?
Comment 2 (dormant account) 2011-12-16 14:09:20 PST
Startup cache contains parsed chrome js and other system-specific caches.
It would be good to understand how long startup cache lasts before it is corrupted/blown away, especially once we put more stuff like bug 705594 into it.

It's in the "local" profile directory. See http://dxr.mozilla.org/mozilla/mozilla-central/startupcache/StartupCache.h.html#l124 for details.
Comment 3 Nathan Froyd [:froydnj] 2011-12-19 13:43:19 PST
Created attachment 582939 [details] [diff] [review]
part 1 - enhance nsZipArchive with comment functionality

nsIFile and friends don't expose a createdTime for files, mostly because not all platforms usefully support that.  Tracking cache age via lastModifiedTime doesn't seem very useful.

Instead, I opted to store the creation time directly in the zip file in the comment field of the...uh...end directory struct (not sure what the official zip name for it is).  Doing this avoids raciness problems with a separate file.

To do this, though, nsZipArchive needs to know about reading comments.  nsZipWriter already knows about writing comments, so there's no modifications to be made there.
Comment 4 Nathan Froyd [:froydnj] 2011-12-19 13:45:45 PST
Created attachment 582945 [details] [diff] [review]
part 2 - store/retrieve creation time and feed age to Telemetry

Now that we can read comments, it's a simple matter to compute the age of the startup cache when we open it.
Comment 5 (dormant account) 2011-12-19 17:53:54 PST
Comment on attachment 582939 [details] [diff] [review]
part 1 - enhance nsZipArchive with comment functionality

ZipArchive uses zero-copy strings elsewhere. The right way to do this is to store the comment offset and fetch it ondemand.
ie do stuff you did in BuildFileList within GetComment();
wrap the map-asseccessing part of GetComment in MOZ_WIN_MEM_TRY_*
Comment 6 (dormant account) 2011-12-19 18:00:36 PST
Comment on attachment 582945 [details] [diff] [review]
part 2 - store/retrieve creation time and feed age to Telemetry

I would skip atoll and use memcmp to copy PRTime as a binary. StartupCache is already machine-specific. This might make the string api upset, this is fine too.
Comment 7 Nathan Froyd [:froydnj] 2011-12-20 07:38:26 PST
If I'm going to be storing raw pointers and lengths anyway, I might as well ditch the nsCString bits and just have nsZipArchive clients work directly with the pointers themselves.  Or does that cause issues with whatever memory mapping magic is used by nsZipArchive?
Comment 8 Nathan Froyd [:froydnj] 2011-12-20 08:18:41 PST
Created attachment 583174 [details] [diff] [review]
part 1 - enhance nsZipArchive with comment functionality, v2

Now with on-demand copying.  I realized I was talking nonsense about raw pointers and whatnot.
Comment 9 Nathan Froyd [:froydnj] 2011-12-20 08:20:00 PST
Created attachment 583177 [details] [diff] [review]
part 2 - store/retrieve creation time and feed age to Telemetry, v2

Now with binary storage.  Requesting re-review because of bit fiddling and because I restructured LoadArchive to use return-early style to avoid massive indentation.
Comment 10 (dormant account) 2011-12-20 11:21:33 PST
Comment on attachment 583174 [details] [diff] [review]
part 1 - enhance nsZipArchive with comment functionality, v2

># HG changeset patch
># User Nathan Froyd <froydnj@gmail.com>
># Date 1324337115 18000
>Bug 711297 - add comment reading to nsZipArchive; r=taras
>
>
>diff --git a/modules/libjar/nsZipArchive.cpp b/modules/libjar/nsZipArchive.cpp
>index 399227d..63cf608 100644
>--- a/modules/libjar/nsZipArchive.cpp
>+++ b/modules/libjar/nsZipArchive.cpp
>@@ -666,6 +666,18 @@ MOZ_WIN_MEM_TRY_BEGIN
>   if (sig != ENDSIG)
>     return NS_ERROR_FILE_CORRUPTED;
> 
>+  // Make the comment available for consumers.
>+  if (endp - buf >= ZIPEND_SIZE) {
>+    ZipEnd *zipend = (ZipEnd *)buf;
>+
>+    buf += ZIPEND_SIZE;
>+    PRUint16 commentlen = xtoint(zipend->commentfield_len);
>+    if (endp - buf >= commentlen) {
>+      char *comment = (char *)buf;
>+      mComment.Assign(comment, commentlen);
>+    }
>+  }

This doesn't seem right. All of this stuff seems like it can occur in GetComment().

also irc comments.
Comment 11 Nathan Froyd [:froydnj] 2011-12-21 08:32:35 PST
Created attachment 583513 [details] [diff] [review]
part 1 - enhance nsZipArchive with comment functionality, v3

Thinkos fixed.

For posterity: the IRC discussion was about better ways to do this, maybe by looking at modtimes of individual entries or somesuch.  The nsZipArchive API isn't really conducive to such things, so hijacking the comment field stays.
Comment 12 Nathan Froyd [:froydnj] 2011-12-21 08:33:36 PST
Created attachment 583514 [details] [diff] [review]
part 2 - store/retrieve creation time and feed age to Telemetry, v3

Fix some thinkos, clean includes, etc.  Nothing big.
Comment 13 (dormant account) 2011-12-21 10:53:30 PST
Comment on attachment 583513 [details] [diff] [review]
part 1 - enhance nsZipArchive with comment functionality, v3

nsCString -> nsACString is our usual convention.
Comment 14 (dormant account) 2011-12-21 11:03:42 PST
Comment on attachment 583514 [details] [diff] [review]
part 2 - store/retrieve creation time and feed age to Telemetry, v3

>+  PRTime now = PR_Now();
>+  if (!mArchive) {
>+    nsCString comment;
>+    char buf[sizeof(now)];
>+    memcpy(buf, &now, sizeof(buf));
>+    comment.Assign(&buf[0], sizeof(buf));
>+    zipW->SetComment(comment);
>+  }
This reads like java code. 
Would comment.Assign((char*)&now, sizeof(now)) be less clear?
 
>+
>+  nsresult LoadArchive(enum TelemetrifyAge flag);
>   nsresult Init();
>   void WriteToDisk();
>   nsresult ResetStartupWriteTimer();
>diff --git a/toolkit/components/telemetry/TelemetryHistograms.h b/toolkit/components/telemetry/TelemetryHistograms.h
>index 3a0552f..eb04e79 100644
>--- a/toolkit/components/telemetry/TelemetryHistograms.h
>+++ b/toolkit/components/telemetry/TelemetryHistograms.h
>@@ -123,6 +123,7 @@ HISTOGRAM(MAC_INITFONTLIST_TOTAL, 1, 30000, 10, EXPONENTIAL, "gfxMacPlatformFont
> 
> HISTOGRAM(SYSTEM_FONT_FALLBACK, 1, 100000, 50, EXPONENTIAL, "System font fallback (us)")
> HISTOGRAM(SYSTEM_FONT_FALLBACK_FIRST, 1, 40000, 20, EXPONENTIAL, "System font fallback, first call (ms)")
>+HISTOGRAM(STARTUP_CACHE_AGE_HOURS, 1, 3000, 20, EXPONENTIAL, "Startup cache age (hours)")
> 
> HISTOGRAM_BOOLEAN(SHUTDOWN_OK, "Did the browser start after a successful shutdown")
> 

Before we land this bug, please add a 3rd patch with a testcase. Modify some startup cache test to make sure that the telemetry histogram is being filled.
Comment 15 Nathan Froyd [:froydnj] 2011-12-29 08:25:54 PST
Created attachment 584754 [details] [diff] [review]
part 1 - enhance nsZipArchive with comment functionality, v4

nsCString -> nsACString.  Carrying over r+.
Comment 16 Nathan Froyd [:froydnj] 2011-12-29 08:28:58 PST
Created attachment 584755 [details] [diff] [review]
part 2 - store/retrieve creation time and feed age to Telemetry, v4

Address Java-isms in the code.

This patch also changes the age computation to be ceil(# of hours) rather than floor(# of hours) as it was in previous versions.  This is required for tests to be reasonable, since histogram buckets must start at 1, rather than 0, and we'd rather not have a test that takes an hour to run.  I believe this change to be relatively non-controversial, so I'm going to carry over r+ on this new patch.
Comment 17 Nathan Froyd [:froydnj] 2011-12-29 08:31:06 PST
Created attachment 584757 [details] [diff] [review]
part 3 - add recordAgesAlways method to nsIStartupCache

Since the tests for nsIStartupCache all run in a single process and we don't normally record startup cache age during a single process's lifetime, we need some way of changing this for tests.  nsIStartupCache's methods are basically all for testing purposes anyway, so there doesn't seem to be much harm in adding one more.
Comment 18 Nathan Froyd [:froydnj] 2011-12-29 08:39:20 PST
Created attachment 584760 [details] [diff] [review]
part 4 - add test STARTUP_CACHE_AGE_HOURS histogram

This patch adds bits to TestStartupCache.cpp to ensure that we do record interesting information in the STARTUP_CACHE_AGE_HOURS histogram.

I will admit upfront that it is ugly.  However, given that:

1) nsITelemetry wasn't really designed to be used from C++, but rather from JavaScript;
1a) Making it more suitable means having a discussion about whether the ipc/chromium bits should be part of our SDK API, which probably isn't worth having for just one test;
2) Telemetry::GetHistogramById isn't exported from libxul;
2a) Making it so: see 1a;
3) Rewriting the whole test in JavaScript: ugh;

This solution is probably the least bad way to do things.  I think it would be less ugly if early returns from main() were used instead of the futzing with use_js, but I was trying to keep the execute-all-tests style of the current program.

I also don't make any claims about whether I used the JS api 100% correctly; it works well enough for the purposes herein.
Comment 19 Michael Wu [:mwu] 2011-12-29 09:20:56 PST
Comment on attachment 584757 [details] [diff] [review]
part 3 - add recordAgesAlways method to nsIStartupCache

Review of attachment 584757 [details] [diff] [review]:
-----------------------------------------------------------------

::: startupcache/StartupCache.cpp
@@ +154,4 @@
>  
>  StartupCache* StartupCache::gStartupCache;
>  bool StartupCache::gShutdownInitiated;
> +enum StartupCache::TelemetrifyAge StartupCache::postFlushAgeAction = StartupCache::IGNORE_AGE;

IGNORE_AGE is 1, and postFlushAgeAction is a global. I suggest changing the TelemetrifyAge enum to allow this to be initialized to zero.

::: startupcache/StartupCache.h
@@ +170,4 @@
>      RECORD_AGE = 0,
>      IGNORE_AGE = 1
>    };
> +  static enum TelemetrifyAge postFlushAgeAction;

Something like gPostFlushAgeAction would be more consistent with how static variables are named in this class.
Comment 20 Nathan Froyd [:froydnj] 2012-01-06 08:34:31 PST
Created attachment 586445 [details] [diff] [review]
part 3 - add recordAgesAlways method to nsIStartupCache, v2

Address review comments.  Carrying over r+.
Comment 21 Michael Wu [:mwu] 2012-01-10 08:50:28 PST
Comment on attachment 584760 [details] [diff] [review]
part 4 - add test STARTUP_CACHE_AGE_HOURS histogram

Well, I don't think there's anything less painful than this at the moment, since we don't have access to the right API to do this directly. Unfortunately I don't know this test code or telemetry particularly well so taras might be able to do a better review.
Comment 22 Nathan Froyd [:froydnj] 2012-02-03 01:02:40 PST
Pushing this on try showed test failures on debug builds, so removing checkin-needed while I debug.
Comment 23 Nathan Froyd [:froydnj] 2012-02-03 05:40:06 PST
Comment on attachment 584760 [details] [diff] [review]
part 4 - add test STARTUP_CACHE_AGE_HOURS histogram

I'm seeing failures for this test on Try debug builds.  The error is:

Assertion failure: cx->runtime->requestDepth || cx->runtime->gcRunning, at e:/builds/moz2_slave/try-w32-dbg/build/js/src/jscntxt.cpp:1288

It doesn't appear to happen all the time: I can run Linux debug builds locally just fine, and my Linux Try debug builds sometimes win, but it seems to fail consistently elsewhere.

Luke, can you take a look at my JSAPI usage and tell me what bits I might be missing?
Comment 24 Nathan Froyd [:froydnj] 2012-02-03 08:35:36 PST
Comment on attachment 584760 [details] [diff] [review]
part 4 - add test STARTUP_CACHE_AGE_HOURS histogram

jdm pointed out JSAutoRequest and that looks like the solution to my problems.
Comment 25 Nathan Froyd [:froydnj] 2012-02-04 05:31:06 PST
Created attachment 594434 [details] [diff] [review]
part 4 - add test for STARTUP_CACHE_AGE_HOURS histogram, v2

Fixed test so it will work on debug builds with JS checking.  Carrying over r+.
Comment 26 Mozilla RelEng Bot 2012-02-04 09:58:56 PST
Autoland Patchset:
	Patches: 584754, 584755, 586445, 594434
	Branch: mozilla-central => try
	Destination: http://hg.mozilla.org/try/rev/3b9b982beafb
Try run started, revision 3b9b982beafb. To cancel or monitor the job, see: https://tbpl.mozilla.org/?tree=Try&rev=3b9b982beafb
Comment 27 Mozilla RelEng Bot 2012-02-04 15:15:20 PST
Try run for 3b9b982beafb is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=3b9b982beafb
Results (out of 206 total builds):
    success: 186
    warnings: 20
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/autolanduser@mozilla.com-3b9b982beafb
Comment 28 Ed Morley [:emorley] 2012-02-05 10:22:36 PST
Unfortunately failed make check on Windows:
TEST-UNEXPECTED-FAIL | histogram didn't record samples
https://tbpl.mozilla.org/?tree=Try&rev=3b9b982beafb
Comment 29 Nathan Froyd [:froydnj] 2012-02-20 16:45:34 PST
Created attachment 599010 [details] [diff] [review]
part 2 - store/retrieve creation time and feed age to Telemetry, v5

There's no need to check for values > 0; Histogram will do that for us.  We also might encounter values == 0 depending on the resolution of PR_Now().  So ditch that check.
Comment 30 Nathan Froyd [:froydnj] 2012-02-20 16:48:25 PST
Created attachment 599011 [details] [diff] [review]
part 4 - add test for STARTUP_CACHE_AGE_HOURS histogram, v3

I am an idiot.  After much poking at try, I finally realized that PRTime might not have a very big resolution on Windows, resulting in an age of 0 (since the time of creation and the time of reading happen so quickly in tests).

But then I realized that I was using the wrong method to check if the histogram had changed!  histogram.sum is the sum of all the *samples* taken, not the sum of all the bucket counts.  So, naturally, if you are taking samples with a value of zero, the sum of all those is...zero.

Updated patch to do bucket-wise comparisons before and after.
Comment 31 Mozilla RelEng Bot 2012-02-21 07:03:48 PST
Autoland Patchset:
	Patches: 584754, 586445, 599010, 599011
	Branch: mozilla-central => try
Error applying patch 586445 to mozilla-central.
patching file startupcache/StartupCache.cpp
Hunk #2 FAILED at 496
Hunk #3 FAILED at 508
2 out of 5 hunks FAILED -- saving rejects to file startupcache/StartupCache.cpp.rej
patching file startupcache/StartupCache.h
Hunk #2 FAILED at 169
1 out of 2 hunks FAILED -- saving rejects to file startupcache/StartupCache.h.rej
abort: patch failed to apply

Could not apply and push patchset:
Comment 32 Nathan Froyd [:froydnj] 2012-02-21 07:12:26 PST
Whoops, need to tell the bot which order to land things in.
Comment 33 Mozilla RelEng Bot 2012-02-21 07:32:04 PST
Autoland Patchset:
	Patches: 584754, 586445, 599010, 599011
	Branch: mozilla-central => try
Error applying patch 586445 to mozilla-central.
patching file startupcache/StartupCache.cpp
Hunk #2 FAILED at 496
Hunk #3 FAILED at 508
2 out of 5 hunks FAILED -- saving rejects to file startupcache/StartupCache.cpp.rej
patching file startupcache/StartupCache.h
Hunk #2 FAILED at 169
1 out of 2 hunks FAILED -- saving rejects to file startupcache/StartupCache.h.rej
abort: patch failed to apply

Could not apply and push patchset:
Comment 34 Nathan Froyd [:froydnj] 2012-02-21 10:59:04 PST
Releng poked at this problem and declared it fixed.  Pushing to try tells all. :)  Third time's the charm!
Comment 35 Mozilla RelEng Bot 2012-02-21 11:02:33 PST
Autoland Patchset:
	Patches: 584754, 599010, 586445, 599011
	Branch: mozilla-central => try
	Destination: http://hg.mozilla.org/try/pushloghtml?changeset=6e815d021d62
Try run started, revision 6e815d021d62. To cancel or monitor the job, see: https://tbpl.mozilla.org/?tree=Try&rev=6e815d021d62
Comment 36 Mozilla RelEng Bot 2012-02-21 23:31:13 PST
Try run for 6e815d021d62 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=6e815d021d62
Results (out of 218 total builds):
    exception: 4
    success: 175
    warnings: 25
    failure: 14
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/autolanduser@mozilla.com-6e815d021d62
 Timed out after 12 hours without completing.
Comment 37 Vladan Djeric (:vladan) 2012-02-22 11:15:29 PST
Comment on attachment 584754 [details] [diff] [review]
part 1 - enhance nsZipArchive with comment functionality, v4

https://hg.mozilla.org/integration/mozilla-inbound/rev/60d47e224c40
Comment 38 Vladan Djeric (:vladan) 2012-02-22 11:16:05 PST
Comment on attachment 599010 [details] [diff] [review]
part 2 - store/retrieve creation time and feed age to Telemetry, v5

https://hg.mozilla.org/integration/mozilla-inbound/rev/579029d9950a
Comment 39 Vladan Djeric (:vladan) 2012-02-22 11:16:29 PST
Comment on attachment 586445 [details] [diff] [review]
part 3 - add recordAgesAlways method to nsIStartupCache, v2

https://hg.mozilla.org/integration/mozilla-inbound/rev/69b718a5a15e
Comment 40 Vladan Djeric (:vladan) 2012-02-22 11:16:53 PST
Comment on attachment 599011 [details] [diff] [review]
part 4 - add test for STARTUP_CACHE_AGE_HOURS histogram, v3

https://hg.mozilla.org/integration/mozilla-inbound/rev/b3773fd0e281

Note You need to log in before you can comment on or make changes to this bug.