Closed Bug 1124752 Opened 9 years ago Closed 9 years ago

Intermittent Memory-takeCensus-02.js:23:0 Error: Assertion failed: got false, expected true

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox36 --- unaffected
firefox37 --- unaffected
firefox38 --- fixed
firefox-esr31 --- unaffected

People

(Reporter: RyanVM, Assigned: jimb)

Details

(Keywords: intermittent-failure)

Attachments

(2 files)

INFO stdout > census1["scripts"]["count"] = 38
INFO stdout > census1["strings"]
INFO stdout > census1["strings"]["count"] = 802
INFO stdout > census1["other"]
INFO stdout > census1["other"]["js::Shape"]
INFO stdout > census1["other"]["js::Shape"]["count"] = 672
INFO stdout > census1["other"]["js::BaseShape"]
INFO stdout > census1["other"]["js::BaseShape"]["count"] = 87
INFO stdout > census1["other"]["js::types::TypeObject"]
INFO stdout > census1["other"]["js::types::TypeObject"]["count"] = 35
INFO stdout > census1["other"]["JS::Symbol"]
INFO stdout > census1["other"]["JS::Symbol"]["count"] = 1
INFO stdout > census2
INFO stdout > census3
INFO stderr 2> /builds/slave/m-in_l64-d_sm-ggc-000000000000/src/js/src/jit-test/tests/debug/Memory-takeCensus-02.js:23:0 Error: Assertion failed: got false, expected true
Jim, can you take a look at this?
Flags: needinfo?(jimb)
Thanks for flagging me; yes, I'll take a look.
Flags: needinfo?(jimb)
Assignee: nobody → jimb
I can't reproduce this. The output of takeCensus is only supposed to reflect reachability, and not be affected by GC.

Let's add some debugging output to the test.
Attachment #8556206 - Flags: review?(sphink)
Comment on attachment 8556206 [details] [diff] [review]
Provide more information when js/src/jit-test/tests/debug/Memory-takeCensus-02.js fails.

Review of attachment 8556206 [details] [diff] [review]:
-----------------------------------------------------------------

This is fine, but might I suggest throwing in a variant of this in addition?:

+
+var upload_dir = os.getenv("MOZ_UPLOAD_DIR") || ".";
+redirect(upload_dir + "/census6.txt");
+print(JSON.stringify(census6, null, 4));

MOZ_UPLOAD_DIR is defined during a try push. Anything you write into that directory will be automatically uploaded, and the resulting URL displayed in the log. I think it even gets scraped out and displayed in the summary pane of treeherder/tbpl. So you can do the above, conditionally on your mysterious failure.

Sadly, the committed version of redirect() does not allow restoring the original stdout. I still need to update my nicer version with Jason's comments. But in this case, the test is about to bomb out anyway, so it doesn't matter.
Attachment #8556206 - Flags: review?(sphink) → review+
and note that it's fine to redirect() multiple times if you'd like to keep multiple censuses... censi... censa... censae? separate.
Okay, I've pushed the change with additional diagnostics. Waiting for a visit from the robot...
https://hg.mozilla.org/integration/mozilla-inbound/rev/b3fb84cd2df1
Whiteboard: [leave open]
Tracked down the bug. Since takeCensus piggybacks on the GC tracer, its notion of reachability is "what should be kept alive". The test seems to assume takeCensus only reports what's reachable via script.

This particular case is due to the JIT's use of template objects. If when taking a census a script has JIT code, it may have template objects stored to help speed up allocation. These template objects hold the right shape and type that are assigned to the newly allocated objects.

In the test, if at the point of taking census g1.add(f) happens to have a BaselineScript, that BaselineScript will have a template object for the object literal {}. If due to GC churn (which happens all the time with high zeal), the BaselineScript got collected the next time you take a census, the net count of objects may in fact go down.

tl;dr: Tracing doesn't imply a thing about script reachability. Either takeCensus needs to have this invariant relaxed or the edge traversal for takeCensus needs to do its own thing.
Flags: needinfo?(jimb)
Okay, it sounds like the test is wrong, then. Rats. I hated writing that test, because I was afraid something like this would happen.

In the older parts of the Debugger API, when the actual implementation diverges from the behavior prescribed by ECMAScript (say, by putting non-aliased variables in a stack frame), Debugger sides with ECMAScript, and tries to present the program's state as the language specification would have it.

The memory tools, however, must account for actual memory consumption: ECMAScript doesn't even try to specify how much memory a program may consume, so there's no portable notion of what the memory tools should say. Here, Debugger should side with SpiderMonkey, and report what's actually going on. And hence, the tests for memory tools will inevitably be sensitive to SpiderMonkey's implementation details.

The criterion of "script reachability" only helps us decide how to handle things contemplated in ECMAScript, but the memory tools must report shapes, JIT code, and so on. Is a BaselineScript 'script reachable'? I don't think there's a meaningful answer to that.

Pretty much anything that holds resources weakly (scripts holding BaselineScripts weakly, in this case) will cause similar problems here. Should takeCensus not traverse weak references? Should we inhibit all GC for that test?
Flags: needinfo?(jimb)
(In reply to Jim Blandy :jimb from comment #34)
> Okay, it sounds like the test is wrong, then. Rats. I hated writing that
> test, because I was afraid something like this would happen.
> 
> In the older parts of the Debugger API, when the actual implementation
> diverges from the behavior prescribed by ECMAScript (say, by putting
> non-aliased variables in a stack frame), Debugger sides with ECMAScript, and
> tries to present the program's state as the language specification would
> have it.
> 
> The memory tools, however, must account for actual memory consumption:
> ECMAScript doesn't even try to specify how much memory a program may
> consume, so there's no portable notion of what the memory tools should say.
> Here, Debugger should side with SpiderMonkey, and report what's actually
> going on. And hence, the tests for memory tools will inevitably be sensitive
> to SpiderMonkey's implementation details.

Yup.

> The criterion of "script reachability" only helps us decide how to handle
> things contemplated in ECMAScript, but the memory tools must report shapes,
> JIT code, and so on. Is a BaselineScript 'script reachable'? I don't think
> there's a meaningful answer to that.
> 
> Pretty much anything that holds resources weakly (scripts holding
> BaselineScripts weakly, in this case) will cause similar problems here.
> Should takeCensus not traverse weak references? Should we inhibit all GC for
> that test?

I don't think we should traverse weak refs here. We have other APIs for getting USS/RSS-like data for tabs (or the actual USS/RSS in b2g apps). We should focus on edges that affect the GC/CC behavior, because this is the only good API for understanding that sort of thing, currently.
(In reply to Nick Fitzgerald [:fitzgen] from comment #36)
> (In reply to Jim Blandy :jimb from comment #34)
> > Okay, it sounds like the test is wrong, then. Rats. I hated writing that
> > test, because I was afraid something like this would happen.
> > 
> > In the older parts of the Debugger API, when the actual implementation
> > diverges from the behavior prescribed by ECMAScript (say, by putting
> > non-aliased variables in a stack frame), Debugger sides with ECMAScript, and
> > tries to present the program's state as the language specification would
> > have it.
> > 
> > The memory tools, however, must account for actual memory consumption:
> > ECMAScript doesn't even try to specify how much memory a program may
> > consume, so there's no portable notion of what the memory tools should say.
> > Here, Debugger should side with SpiderMonkey, and report what's actually
> > going on. And hence, the tests for memory tools will inevitably be sensitive
> > to SpiderMonkey's implementation details.
> 
> Yup.
> 
> > The criterion of "script reachability" only helps us decide how to handle
> > things contemplated in ECMAScript, but the memory tools must report shapes,
> > JIT code, and so on. Is a BaselineScript 'script reachable'? I don't think
> > there's a meaningful answer to that.
> > 
> > Pretty much anything that holds resources weakly (scripts holding
> > BaselineScripts weakly, in this case) will cause similar problems here.
> > Should takeCensus not traverse weak references? Should we inhibit all GC for
> > that test?
> 
> I don't think we should traverse weak refs here. We have other APIs for
> getting USS/RSS-like data for tabs (or the actual USS/RSS in b2g apps). We
> should focus on edges that affect the GC/CC behavior, because this is the
> only good API for understanding that sort of thing, currently.

JIT code isn't held weakly by JSScript. The problem with this test was more than JIT code can be discarded, which affects the census.
more that*
Comment on attachment 8560957 [details] [diff] [review]
Use approximate object counts when testing takeCensus, to avoid being flummoxed by SpiderMonkey implementation details.

Review of attachment 8560957 [details] [diff] [review]:
-----------------------------------------------------------------

I don't feel great about the fudge factor, but then again I also don't foresee the engine going over the count of 50 for this test case anytime soon.

::: js/src/jit-test/tests/debug/Memory-takeCensus-02.js
@@ +58,4 @@
>  var census2 = dbg.memory.takeCensus();
> +Census.walkCensus(census2, "census2",
> +                  Census.assertAllWithin(
> +                    50,

Where did this fudge factor come from?
Attachment #8560957 - Flags: review?(shu) → review+
I took out the fudge factor.
https://hg.mozilla.org/integration/mozilla-inbound/rev/383609472929
Flags: in-testsuite+
OS: Linux → All
Hardware: x86_64 → All
Target Milestone: --- → mozilla38
Okay, it's been a week without any more visits from the robot. I'm calling this ready to be closed.
Status: NEW → ASSIGNED
Whiteboard: [leave open]
Attachment #8560957 - Flags: checkin+
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: