Closed Bug 619561 Opened 14 years ago Closed 13 years ago

Add GC stats to Test Pilot

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: paul.biggar, Assigned: cdleary)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 4 obsolete files)

WIP: GC stats hooks for testpilot. 13 years ago Chris Leary [:cdleary] (not checking bugmail) 24.78 KB, patch		Details \| Diff \| Splinter Review
WIP: TestPilot study. 13 years ago Chris Leary [:cdleary] (not checking bugmail) 8.46 KB, text/plain		Details
JS engine hooks to support tespilot GC study. 13 years ago Chris Leary [:cdleary] (not checking bugmail) 18.97 KB, patch		Details \| Diff \| Splinter Review
JS GC TestPilot study. 13 years ago Chris Leary [:cdleary] (not checking bugmail) 9.23 KB, patch		Details \| Diff \| Splinter Review
JS engine hooks to support tespilot GC study. 13 years ago Chris Leary [:cdleary] (not checking bugmail) 24.26 KB, patch	billm : review+	Details \| Diff \| Splinter Review
JS GC TestPilot study. 13 years ago Chris Leary [:cdleary] (not checking bugmail) 10.38 KB, text/plain		Details

Paul Biggar

Reporter

Description

•

14 years ago

We want to track simple stuff via Test Pilot or whatever the metrics stuff is. We don't know how all this works, so we should figure it out.

We want to track at least:

- timestamp before GC
- timestamp after marking
- timestamp after GC

Paul Biggar

Reporter

Comment 1

•

14 years ago

I've been chatting to jono about this, and created a proposal with his help:

https://wiki.mozilla.org/Labs/Test_Pilot/Test_Proposals/GC

Comments welcome.

One particular need is to figure out exactly what data we want to collect. So far, I'm at:

 - the timestamp when GC starts
 - the timestamp when GC finishes its first phase (marking)
 - the timestamp when GC finishes its second phase (sweeping)
 - the timestamp when GC finishes completely
 - the size of the heap before
 - the size of the heap after

But I'd like others to suggest what is useful here. (Note that if we want to impact performance, we'll need to warn users, which will probably significantly decrease participation).

Paul Biggar

Reporter

Updated

•

14 years ago

Depends on: 622638

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 2

•

14 years ago

It would be nice to collect all the data that we do when MOZ_GCTIMER is set (see the bottom of jsgcstats.cpp).

It would also be nice to know why the GC was triggered:
- per-compartment GC? (after bug 605662 lands)
- full GC?
- cycle collection?
- triggered via MaybeGC?

None of this data should be difficult to collect.

David Mandelin [:dmandelin]

Comment 3

•

14 years ago

I think it would really help if we started with what questions we want to answer or what decisions we want to make based on the data. Off the top of my head, getting parameters that could feed into performance models would be really great, such as:

- Stats on conservative mark/conservative stack-scan time. The basic question is "How long does a typical stack scan take?" I don't know if we can do this easily, though.

- Allocation rates. E.g., how many strings do we allocate per second? More generally, we want to know the allocation count and the byte count for every allocation of interest since last GC (along with the time since last GC): objects, strings, functions, slot arrays, string character data, etc. That is probably too many numbers to do at once. I think "total headers" and "total arrays" would be good summary stats. This tells us how often we need to collect with various generation sizes.

- Total heap size, allocated heap size, mark time, and sweep time, most of which were previously mentioned, could also be useful. It would be interesting to correlate allocated heap size with mark time and total heap size with sweep time.

Paul Biggar

Reporter

Comment 4

•

14 years ago

I had a chat with Jono and Jinghua about this.

In terms of how much data we can collect, we're constrained by an 8MB limit (with is the limit of POST requests). So far, they haven't collected any data that is remotely that large.

We may or may not be able to collect URLs. Although the privacy policy states that URLs can be gathered, there was a promise that they wouldn't be during the FF4 beta period. However, if we really need URLs, we may be able to ask the user if they'd like to send the URLs too (or a selection, perhaps for the ones with really long GC time, or something). We can certainly get hashes of the URLs, and it may be possible to find out just the domains (Jono will get back to me).

Test Pilot currently has about a million users, but that may drop once because it's not in FF4. However, there'll still be plenty enough users for us (more than 100K definitely().

Paul Biggar

Reporter

Comment 5

•

13 years ago

Jono sent some resources for writing the test pilot study:

- Tutorial for developing an experiment: https://wiki.mozilla.org/Labs/Test_Pilot/Experiment_Tutorial

- API reference: https://wiki.mozilla.org/Labs/Test_Pilot/ExperimentAPI

- Source code for all existing studies can be found in hg: http://hg.mozilla.org/labs/testpilotweb/file/tip/testcases
e.g. the beta UI study code is at http://hg.mozilla.org/labs/testpilotweb/file/ef201f4a6572/testcases/beta/combined.js

FYI: I'll be on the type-inference side of things rather than GC, so I think this is as far as I'm going to go here. Whoever takes over please ask me anything that's unclear.

Component: JavaScript Engine → jemalloc

Target Milestone: --- → flash10

Phil Ringnalda (:philor)

Updated

•

13 years ago

Component: jemalloc → JavaScript Engine

Target Milestone: flash10 → ---

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 6

•

13 years ago

Taking assignment.

Assignee: general → cdleary

Status: NEW → ASSIGNED

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 7

•

13 years ago

I have an initial prototype MQ repo at http://hg.mozilla.org/users/cleary_mozilla.com/gc-testpilot/ -- just have to make it GC-safe now. When we drop info from the circular GC-stats buffer we can just mark those database entries as tainted so that we can ignore them in post-processing. Ideally the circular buffer will be sufficiently large (or our test pilot timeout interval will be sufficiently small) that we very rarely drop any info.

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 9

•

13 years ago

Attached patch WIP: GC stats hooks for testpilot. (obsolete) — Details — Splinter Review

Start off with some basic information for the first iteration of the study.

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 10

•

13 years ago

Attached file WIP: TestPilot study. (obsolete) — Details

Needs plotting functionality and testing on Windows.

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 11

•

13 years ago

Attached patch JS engine hooks to support tespilot GC study. (obsolete) — Details — Splinter Review

I'm not really proud of the fact I poked a "give me the JS runtime" hole in CTypes here but it seemed like the most straightforward route to getting something out there. The GC trace method changes are to avoid cross-compartment mismatches (and assertions) when collecting ctypes objects under debug.

Attachment #520837 - Attachment is obsolete: true

Attachment #525894 - Flags: review?(wmccloskey)

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 12

•

13 years ago

Attached patch JS GC TestPilot study. (obsolete) — Details — Splinter Review

No system metadata yet, will have to create an XPCOM component for that as a followup. We'll want to iterate on this study pretty rapidly (on the order of weeks) as we figure out what additional GC information would be useful.

Attachment #520838 - Attachment is obsolete: true

Attachment #525895 - Flags: review?

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Updated

•

13 years ago

Attachment #525895 - Flags: review? → review?(jdicarlo)

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Updated

•

13 years ago

Blocks: 585196

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 13

•

13 years ago

Comment on attachment 525894 [details] [diff] [review]
JS engine hooks to support tespilot GC study.

Great! Just a few things.

I think you need more #ifdef JSGC_TESTPILOTs (in JSRuntime::init). Maybe also try compiling with MOZ_GCTIMER enabled (with --enable-gctimer) and JSGC_TESTPILOT disabled?

It would be nice if GCTimer had a more normal constructor. As far as I can tell, the memset is unnecessary. The remaining fields could be initialized using the ": field(xxx)" form.

It seems like when JS_PopGCInfo gets called, since it immediately increments gcInfoStart, it's possible for the info to get overwritten before the TestPilot code is done with it. Maybe have two calls: one to get it and one to increment?

The GetRuntime thing is a little ugly. Rather than passing in a random object to get the runtime, can't you just return cx->runtime? Also, it would be cool to put the new JSRuntime fields into their own object. The circular buffer code could be encapsulated inside that.

Finally, as long as you're there, could you fix the bracing in jsgcstats.cpp?

Jono Xia

Comment 14

•

13 years ago

Comment on attachment 525895 [details] [diff] [review]
JS GC TestPilot study.

First of all, thanks a lot for writing this study!

Now on to the nit-picking:

In makeCTypeProps(), is var record supposed to be declared inside the loop?  that looks a little weird.  Are you trying to have an array of objects where each object only has a single property?  (And isn't var scoped to the function anyway, even if you use it inside a loop?)

Please change the name of ExampleStudyGlobalObserver to something more descriptive -- this is not the example study anymore, it's a real study now!

Dump() statements should be removed (or replaced with console.debug) before we push this to production, but keep them in for now as we test the code.

What operating systems is this intended to run on?  (It is possible to distribute a study only to certain OSes, if we ever want to do that.)  We need to make sure to test for crashes on all supported OSes.

Should GCObserver.uninstall() set alreadyInstalled back to false?

If createCTypes() fails, then this.setGCInfoEnabled  and this.getGCInfoEnabled will not be functions, which will cause errors why you try to call setGCInfoEnabled in GCObserver.uninstall().

I see you're trying to restore GCInfoEnabled to its original state when the study is done, which is good.  But if Firefox exits without GCObserver.uninstall() having been called (it can happen on a crash) then GCInfoEnabled won't be restored, and then what happens when you start back up again and GCObserver.install() calls getGCInfoEnabled again?  I don't know exactly how this works, but won't you lose the original setting in that case?  You might need to store the user's original setting in a pref the first time the study runs, just to make sure you don't lose it.

You should also write a doExperimentCleanup() method on your global observer object.  This gets called when the study is finally over (either it's finished or it's been canceled) - as opposed to uninstall() which gets called on Firefox shutdown even if the study is still running.  Your doExperimentCleanup() method might be the best place to restore the original value of GCInfoEnabled from the pref.

Why does your selfPingTimer callback call pingSelf() again?  You're passing a type code of 1, which is TYPE_REPEATING_SLACK according to https://developer.mozilla.org/en/nsITimer, to initWithCallback(), so that hould be enough to ensure that dumpGCInfos() is called approx. every 30 seconds; calling pingSelf() again should be unneccessary, and in fact might cause extra redundant timers to be created.

That's all I have to say from reading the code; I may have more comments after I start testing it.

Tom S [:evilpie]

Updated

•

13 years ago

Blocks: 505308

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 15

•

13 years ago

Attached patch JS engine hooks to support tespilot GC study. — Details — Splinter Review

Revised per review comments. Builds with all gctimer/jsgctestpilot configure flag combos.

Attachment #525894 - Attachment is obsolete: true

Attachment #533839 - Flags: review?(wmccloskey)

Attachment #525894 - Flags: review?(wmccloskey)

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 16

•

13 years ago

Attached file JS GC TestPilot study. — Details

Updated per review comments. (Some comments were added at the top address some of the questions!)

Attachment #525895 - Attachment is obsolete: true

Attachment #533841 - Flags: review?(jdicarlo)

Attachment #525895 - Flags: review?(jdicarlo)

Gregor Wagner [:gwagner]

Comment 17

•

13 years ago

Last time I checked GCTimer didn't compile on ARM. I don't know if this is fixed by now.

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 18

•

13 years ago

(In reply to comment #17)
> Last time I checked GCTimer didn't compile on ARM. I don't know if this is
> fixed by now.

Thanks for the heads up... I'll check it on the tegra 2 (and try) before I check it in.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 19

•

13 years ago

Comment on attachment 533839 [details] [diff] [review]
JS engine hooks to support tespilot GC study.

Review of attachment 533839 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.

::: js/src/ctypes/CTypes.cpp
@@ +5690,5 @@
> +
> +  jsval* argv = JS_ARGV(cx, vp);
> +  if (JSVAL_IS_PRIMITIVE(argv[0]) ||
> +      !CType::IsCType(cx, JSVAL_TO_OBJECT(argv[0]))) {
> +    JS_ReportError(cx, "second argument must be a CType");

I think you mean "first argument ..."

@@ +5697,5 @@
> +
> +  JSObject* targetType = JSVAL_TO_OBJECT(argv[0]);
> +  size_t targetSize;
> +  if (!CType::GetSafeSize(cx, targetType, &targetSize) ||
> +      targetSize > sizeof(void*)) {

Why isn't this a != test?

::: js/src/ctypes/CTypes.h
@@ +503,5 @@
>    bool IsCData(JSContext* cx, JSObject* obj);
>  
>    // Attached by JSAPI as the function 'ctypes.cast'
>    JSBool Cast(JSContext* cx, uintN argc, jsval* vp);
> +  // Attached by JSAPI as the function 'ctypes.runtime'

Isn't it getRuntime?

Attachment #533839 - Flags: review?(wmccloskey) → review+

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 20

•

13 years ago

Android and friends looked good on the try run, so I pushed to TM:

http://hg.mozilla.org/try/rev/e678548b2f5a
http://hg.mozilla.org/tracemonkey/rev/00ed83d09564

Marking as fixed-in-tracemonkey. I'll make a new bug to track the TestPilot study after Jono's comments on the new version of the study.

Whiteboard: fixed-in-tracemonkey

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 21

•

13 years ago

http://hg.mozilla.org/mozilla-central/rev/00ed83d09564
http://hg.mozilla.org/mozilla-central/rev/8273f083b461

Jono Xia

Comment 22

•

13 years ago

I've added the study, provisionally, to the index-dev channel, so that I
and the other internal users of this channel can try it out. It's in the
hg repo at:
http://hg.mozilla.org/labs/testpilotweb/file/c2448c803f6f/testcases/garbage-collection/gcstudy.js 

A couple more things, after reviewing the newest code.

experimentInfo should define a runOrNotFunc() that returns true if the study can be run (i.e. the instrumentation is available), false if not.  This is so the study doesn't show up in the UI for people who can't run it.

Also, before we can release, we need a thumbnail image (90x90px) to represent the study, and a URL for a landing page the user gets to when they click the appropriate link.

You have:

GCGlobalObserver.prototype.getStudyMetadata = function() {
    return [{droppedGCInfoCount: this.gcObserver.dropCount}];
};

but onExperimentShutdown will be called before getStudyMetadata, which means that this.gcObserver will already have been uninstalled when this is called.  You may have to work around this by storing dropCount in a preference or something.

(It's kind of a design flaw in the test pilot extension, to be honest.  Ed Lee discovered this problem when working on his study.  In a future version I'd like to make the extension call getStudyMetadata just before shutting down the experiment for the last time and cache that value, which will make the workaround unnecessary.)

Finally, debug() should switch over to using console.info() or console.warn() instead of dump().

Jono Xia

Comment 23

•

13 years ago

When the Javascript Performance study finishes for me, and I click "more info", to open the study detail view window...
I get a blank window.

This is on Aurora, 6.0a2, on Mac OS X 10.6.7.

The error message in the Javascript console is:
Error: [Exception... "Could not convert JavaScript argument arg 0 [nsIDOMWindow.getComputedStyle]"  nsresult: "0x80570009 (NS_ERROR_XPC_BAD_CONVERT_JS)"  location: "JS frame :: chrome://testpilot/content/flot/jquery.js :: <TOP_LEVEL> :: line 808"  data: no]
Source File: chrome://testpilot/content/flot/jquery.js
Line: 808

Looking manually at the data that was collected for me, the events array is empty.

There's also an exception that occurs in getStudyMetadata:

TypeError: this.gcObserver is undefined

It seems like 1. this study should not be running on Mac at all, if the gcObserver is designed to work on Windows;
and 2. the code for WebContent.onPageLoad() needs to correctly handle the case where the data is empty.

Nicholas Nethercote [inactive]

Comment 24

•

13 years ago

The patches for this bug have landed on m-c, should it be resolved fixed?  (Nb: I've read comment 22 and comment 23, maybe they should be spun off into another bug?)

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 25

•

13 years ago

I'll file new bugs under Test Pilot.

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Updated

•

13 years ago

Attachment #533841 - Flags: review?(jdicarlo)

You need to log in before you can comment on or make changes to this bug.