Execution tracing for differential testing of interpreter vs. JIT semantics


Status

Tamarin
Virtual Machine
P2
normal
ASSIGNED
7 years ago
3 years ago

People

(Reporter: William Maddox, Assigned: William Maddox)

Tracking

unspecified
Q2 12 - Cyril
Bug Flags:
flashplayer-qrb +
flashplayer-bug -

Details

Attachments

(1 attachment)

(Assignee)

Description

7 years ago
Created attachment 513683 [details] [diff] [review]
Execution tracing (v1)

This patch adds a new feature, AVMFEATURE_EXECTRACE, enabled with the
--enable-exectrace switch to configure.py.  With this feature configured,
both the interpreter and JIT-generated code are instrumented to log the values
of selected ABC opcode inputs and outputs to the console during execution.

Not all values are logged.  Values read from or written to ABC variables and
object properties and slots are recorded, providing coverage at least as good
as schemes that would simply log the variable state at safepoints.  Arguments
to method and constructor calls, as well as the result returned, are also
logged.
Beyond this, some additional values are also logged, based on a heuristic
assessment of their value in finding bugs -- this is a bit haphazard, and could
stand refinement.

In addition to operand/result logging, thrown exceptions are also logged,
even if they arise from within the runtime rather than from an explicit 'throw'
opcode.

Each log entry begins with a '@@' prefix to facilitate (heuristic) filtering
of the log entries from other console output the execution may produce. This
is followed by the bytecode location of the opcode, and a representation of
the value.  To avoid spurious differences from run-to-run depending on where
the ABC lands in memory, locations are relative to the start of the ABC, and
are prefixed with an ordinal assigned to each PoolObject as it is created.
Currently, there is no further identification -- you have to know the load
order to relate such locations to the original ABC.
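The entry layout described above can be sketched as follows.  This is illustrative only: the function name and the exact separator characters between the pool ordinal, offset, and value are assumptions, not the patch's actual identifiers or format.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Illustrative sketch of the log-entry layout: a "@@" prefix for
// heuristic filtering, then the bytecode location as
// "<PoolObject ordinal>:<offset relative to the start of the ABC>",
// then a printed representation of the value.  ABC-relative offsets
// keep entries stable across runs regardless of where the ABC lands
// in memory.
std::string format_trace_entry(uint32_t pool_ordinal,
                               uint32_t abc_offset,
                               const std::string& value_repr)
{
    std::ostringstream out;
    out << "@@" << pool_ordinal << ":" << abc_offset << " " << value_repr;
    return out.str();
}
```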

It is possible for an AS3 program to observe GC behavior via weak references,
but this possibility is ignored at present.  Note that the log output increases
in size with execution time, so running with GC off only hastens resource
exhaustion, but cannot avoid it.

A command-line option '-Dtracelimit N' is provided, which suspends tracing
after N events have been logged.  This may be used to bound logfile size for
longer-running test cases.  The default is currently set to ~4M events.
It is explicitly non-zero, as the execution tracing facility imposes
significant overheads whether log entries are being written or not, and
builds configured with execution tracing enabled should not be used for
other purposes.
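The tracelimit mechanism amounts to a countdown that suspends logging once N events have been emitted; a minimal sketch, with illustrative names (the real patch's identifiers may differ):

```cpp
#include <cstdint>

// Assumed default of ~4M events, matching the description above.  The
// limit is deliberately non-zero: the instrumentation overhead is paid
// whether or not entries are written, so a zero limit would buy nothing.
static uint64_t g_trace_budget = 4 * 1024 * 1024;

// Returns true if this event should still be logged, decrementing the
// remaining budget as a side effect.  Once the budget reaches zero,
// tracing is suspended for the rest of the run.
bool exectrace_should_log()
{
    if (g_trace_budget == 0)
        return false;
    --g_trace_budget;
    return true;
}
```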

The code has been tested to some extent on 32-bit platforms.  The abcdump
utility, disassembling abcdump.abc, produces identical log output for
-Dinterp and -Ojit.  Substantial portions of the acceptance test suite have
also run successfully, with some divergences detected that need further
investigation -- it is not clear whether these are issues with the tracer,
the VM itself, or the testing methodology.  It is necessary to avoid tests
that query the time, and the test infrastructure needs to have a few
inessential time queries removed (e.g., in shell.as).  My own testing was
done with the following additional patches included:

https://bugzilla.mozilla.org/attachment.cgi?id=513638&action=diff
https://bugzilla.mozilla.org/attachment.cgi?id=509956&action=diff

It is not clear that either were necessary to achieve the reported results.

The patch appears to be broken in 64-bit builds, asserting on a non-zero
RCObject (String) reference count.  It is unclear whether the issue is
64-bit-specific, or has simply only manifested there -- it is possible I have
violated a GC invariant somewhere.
(Assignee)

Updated

7 years ago
Attachment #513683 - Attachment is patch: true
Attachment #513683 - Flags: feedback?(rreitmai)
(Assignee)

Updated

7 years ago
Attachment #513683 - Flags: feedback?(edwsmith)

Comment 1

7 years ago
Comment on attachment 513683 [details] [diff] [review]
Execution tracing (v1)

Cool! 

If you run this on a version of the VM with known differences between interp/jit (i.e. one of the bugs we recently fixed or not-yet-fixed), can it detect the difference?

Does logging reads/writes of ABC variables (you mean to the stack, right?) over-constrain the jit?  Presumably it has to log reads/writes that would normally be optimized out -- thus stopping us from using the tool at high optimization levels?

It would be good if you could create a cosim module (or "ExecTracer"), and put everything (or as much as possible) in the module -- tracing I/O routines, tables for numbering PoolObjects, the trace limit, and so on.  At first glance, tracing functionality looks like it's pretty scattered all over the VM code.

Updated

7 years ago
Attachment #513683 - Flags: feedback?(edwsmith) → feedback+
(Assignee)

Comment 2

7 years ago
(In reply to comment #1)
> Comment on attachment 513683 [details] [diff] [review]
> Execution tracing (v1)

>If you run this on a version of the VM with known differences between
>interp/jit (i.e. one of the bugs we recently fixed or not-yet-fixed), can it
>detect the difference?

Yup. :)   I tried bug 620403 and bug 615544.  There's no reason to suspect it couldn't find the others.

>Does logging reads/writes of ABC variables (you mean to the stack, right?)
>over-constrain the jit?  Presumably it has to log reads/writes that would
>normally be optimized out -- thus stopping us from using the tool at high
>optimization levels?

What we are logging is ABC-level semantics, and what is being tested is the lowering to LIR and the related helper functions.  The instrumentation is intrusive and greatly alters what goes on in LIR-level optimization and assembly, so it's not reasonable to expect exectrace to find bugs systematically at that level.

A basic principle of the instrumentation is that only opcode arguments and results are logged, not intermediate values generated within the opcode's execution. Combined with the fact that we do not optimize at the ABC level, alignment of the interpreted and compiled instrumentation is assured with only minimal attention to issuing the log entries in a consistent order.  No matter how we choose to expand an ABC opcode into LIR, the arguments and results are always well-defined and available at entry and exit.
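This principle can be illustrated with a toy opcode (all names here are hypothetical, not from the patch): probes fire on the arguments at entry and the result at exit, so the trace is identical no matter how the opcode's body is expanded into LIR.

```cpp
#include <string>
#include <vector>

// Illustrative trace sink.
static std::vector<std::string> g_log;

static void probe(const std::string& tag, int v)
{
    g_log.push_back(tag + "=" + std::to_string(v));
}

// A toy "add" opcode.  Its inputs are probed at entry and its result at
// exit; whatever intermediate steps the interpreter or JIT uses to
// compute the sum are invisible to the trace, so interpreter and
// JIT-generated code produce identical log entries.
int op_add(int a, int b)
{
    probe("in", a);
    probe("in", b);
    int result = a + b;   // intermediates here would not be logged
    probe("out", result);
    return result;
}
```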

The instrumentation may inhibit optimization because it introduces observable
side-effects that would otherwise not be present.  For the most part, however, the optimizer will simply do the right thing -- high optimization levels per se do not preclude the use of the tool, though optimizations that exploit the absence of side effects (e.g., dead code) may be defeated.

With respect to "reads/writes that normally would be optimized out", in fact, we do not force such reads or writes to occur, and when sufficient registers are available, the logged values should be forced to memory only because the calling convention demands it, not because the register allocator or vartracker has been disabled or weakened.  The code isn't going to be tight, however, due to all the calls, argument shuffling, and (likely) spills.  Again, what is logged are not reads or writes per se, nor the values of actual stack slots, but opcode arguments and results.

Note that if we optimized at the ABC level, the situation would be quite different, as entire ABC opcodes might be removed.  In this case, all we could really count on would be the contents of ABC *variables* (not the transient entries at the top of the stack during expression evaluation) at safe points.  I needed the ability to log individual reads and writes in order to deal sanely with the heap, so it was expedient to use the same machinery for local variables as well.  It should be easy enough to accommodate a more aggressive ABC-level optimizer as long as it assumes the heap is volatile, i.e., that reads/writes to the heap may not be elided or reordered.

There are a couple of places where the "no ABC-level optimization" assumption breaks down. 1) Certain calls are elided when the callee has no visible effects, as determined by a simple and conservative triviality analysis.  The issue here is simply that the analysis does not take into account the side effects that logging introduces.  I fix this with the slightly conservative assumption that no functions are trivial in the presence of logging, just as is done when debugging is enabled. 2) Lazy creation of the rest argument vector is visible at the ABC level.  The rest argument optimization itself is not inhibited (i.e., the getproperty specialization), but the vector must be created
eagerly even if it is not used in order that the slot initialization be consistent with the interpreter.  This is a case in which what is observable to the instrumentation is slightly more than what is observable to the executing program, allowing a semantically harmless "divergence" to arise.
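The fix for case 1) amounts to a conservative guard on the call-elision decision; a minimal sketch with hypothetical names (the actual triviality analysis is more involved):

```cpp
// Sketch of case 1) above: a call may be elided only if the callee is
// judged trivial (no visible effects), but logging introduces side
// effects the triviality analysis does not model.  So, as when
// debugging is enabled, no callee is treated as trivial while
// exectrace logging is active -- slightly conservative, but safe.
bool can_elide_call(bool callee_is_trivial, bool exectrace_enabled)
{
    if (exectrace_enabled)
        return false;            // logging makes every call observable
    return callee_is_trivial;    // normal rule: elide effect-free callees
}
```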

An alternative to the execution tracing approach would be to compare states periodically. This rests on two assumptions, however: 1) A divergence, once it occurs, is likely to continue to propagate, and 2) it is practical to compare the entire (abstract) machine state. I question 1), given that many of the corner-case values (-0, NaN, undefined) are promiscuously promoted to more "reasonable" values, and thus their appearance may be masked.

While 2) is reasonable for the registers and memory of a hardware CPU or similar low-level virtual processor, a garbage-collected heap makes such comparison very difficult.  I'm not even sure it is possible in our conservative GC environment -- one needs unambiguous roots in order to trace out two heaps to determine that they are isomorphic.  Also, dumping out the entire heap is a problem in itself, and we'd need to use bisection to find the actual point of divergence.

I claim that an event logging mechanism is primary, and we'd do better to normalize the log somewhat as issues arise to accommodate more extensive optimization than to abandon the event-based approach and go with a state-based one.

>It would be good if you could create a cosim module (or "ExecTracer"), and put
>everything (or as much as possible) in the module -- tracing I/O routines,
>tables for numbering PoolObjects, the trace limit, and so on.  At first glance,
>tracing functionality looks like it's pretty scattered all over the VM code.

To some degree, this is unavoidable in a scheme that is pushing instrumented code through the code generator.  Given the architecture we have in place, Interpreter.cpp, Verifier.{h,cpp}, CodegenLIR.{h,cpp}, AvmCore.{h,cpp}, and jit-calls.h must all be involved.  The following changes appear feasible:

1) The helper function bodies added to jit-calls.h and to AvmCore.cpp could be moved to a new class, leaving just the FUNCTION macros in jit-calls.h.  The member functions added to AvmCore would now have to be static methods taking an AvmCore* argument. In order to access the tracelimit, AvmCore would need to hold a pointer to an instance of the new class as well.   Method findPool() would either have to remain in AvmCore, or some provision would have to be made for other classes to iterate over livePools, which is a private member.

2) It is possible that the mapping of ScriptObject to unique identifiers could be done with an auxiliary table.  This would greatly increase the size of the heap, and add a good deal of table lookup overhead and complexity.

Given the relatively small amount of existing code that would be moved to the new class, I'm not at all convinced that either of these would be an improvement, but I'm open to suggestions.

What I want to avoid doing is departing from the basic design, which can be thought of as an elaboration of -Dverbose style tracing, and is rather easy to get one's head around and extend.

Another comment on the design:

I resisted the temptation to refactor the interpreter or verifier code to accommodate the probes.  I want it to be obvious what is being probed, and that the probes have no effect on the code when configured out.  As a result, probe arguments may include expressions occurring in the surrounding code, with no attempt to do CSE at the source level.  The C++ optimizer is expected to deal with this.

In a few cases, I had to put a probe inside a conditional, and in some of those cases the conditional was newly introduced for this purpose.  In preference to adding an #ifdef VMCFG_EXECTRACE/#endif, I simply expected the optimizer to eliminate the resulting empty block.  Whether it does so or not, I don't like the intrusiveness, and intend to introduce guarded forms of the EXECTRACE_* macros where needed.
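The guarded forms could look something like the following sketch.  The macro names and signatures are illustrative (only the VMCFG_EXECTRACE feature define comes from the patch); the idea is that when the feature is configured out, both the probe and any condition introduced solely for its sake compile away, leaving no empty blocks for the optimizer to clean up.

```cpp
// Hypothetical probe, declared elsewhere when the feature is enabled.
void exectrace_log_value(int v);

#ifdef VMCFG_EXECTRACE
  // Feature enabled: probes expand to real logging calls, and guarded
  // probes evaluate their condition.
  #define EXECTRACE_VALUE(v)        exectrace_log_value(v)
  #define EXECTRACE_IF(cond, stmt)  do { if (cond) { stmt; } } while (0)
  constexpr bool kExecTraceCompiledIn = true;
#else
  // Feature configured out: both forms vanish entirely -- neither the
  // probe nor the guard condition is evaluated, so the surrounding
  // code is untouched.
  #define EXECTRACE_VALUE(v)        ((void)0)
  #define EXECTRACE_IF(cond, stmt)  ((void)0)
  constexpr bool kExecTraceCompiledIn = false;
#endif
```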

Updated

7 years ago
Assignee: nobody → wmaddox
Flags: flashplayer-qrb+
Flags: flashplayer-bug-
Priority: -- → P2
Target Milestone: --- → Q1 12 - Brannan

Updated

7 years ago
Status: NEW → ASSIGNED

Updated

7 years ago
Target Milestone: Q1 12 - Brannan → Q2 12 - Cyril
(Assignee)

Updated

6 years ago
Attachment #513683 - Flags: feedback?(rreitmai)