Closed Bug 534547 Opened 10 years ago Closed 9 years ago

TM: non-deterministic LIR generation

Categories

(Core :: JavaScript Engine, defect)

x86
Linux
defect
Not set

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: njn, Unassigned)

Details

Attachments

(4 files)

The LIR generated by Tracemonkey for crypto-aes.js, v8-crypto.js and v8-raytrace.js changes non-deterministically.

Eg. here's the result of piping TMFLAGS=readlir output through 'wc' 10 times:

 [ocean:~/moz/ws0/js/src] for i in `seq 1 10` ; do TMFLAGS=readlir js0d32 -j /home/njn/moz/SunSpider/tests/crypto-aes.js | strip_addrs > y$i ; wc y$i ; done
 21897 125196 744143 y1
 21897 125196 744143 y2
 19718 113227 675588 y3
 21897 125196 744143 y4
 21897 125196 744143 y5
 21897 125196 744143 y6
 21897 125196 744143 y7
 17102  99017 594237 y8
 21897 125196 744143 y9
 21897 125196 744143 y10

Most of the time I get 21897 bytes, but sometimes I get 19718 or 17102.

I discovered this when testing a patch that cleaned up the i386 backend but wasn't supposed to change the generated code.  AFAICT crypto-aes is the only SunSpider benchmark that features this non-determinism.

v8-crypto.js exhibits even more non-determinism:

 /home/njn/moz/SunSpider/tests/v8-crypto.js
 102636  616415 3975756
 112656  676163 4363617
 108532  650392 4194133
 110888  665181 4290891
 112386  674402 4351075
 112882  677413 4370792
 158539  953766 6173152
 108489  650195 4192971
 114667  688319 4441484
 111804  671200 4329003

And v8-raytrace.js has a small amount:

/home/njn/moz/SunSpider/tests/v8-raytrace.js
  17706  112018  765336
  15255   96055  656340
  15255   96055  656340
  17706  112018  765348
  17706  112018  765336
  15255   96055  656329
  15255   96055  656339
  15255   96055  656339
  15255   96055  656329
  15255   96055  656340

This behaviour seems undesirable to me.  For one, it makes the diff-based testing mentioned earlier more difficult.  More generally it just doesn't feel right.

I tested with Valgrind, but didn't find any problems.

I looked through crypto-aes.js but couldn't find any cause.

Is this a TM problem, or just a characteristic of these programs?  I'd be happy if anyone could explain.
Nb: this non-determinism also isn't going to help the stability of our SunSpider timings.
Attachment #417394 - Attachment description: crypto-aes.js output (21897 bytes) → crypto-aes.js output (744143 bytes)
Attachment #417394 - Attachment mime type: application/octet-stream → text/plain
Attachment #417395 - Attachment mime type: application/octet-stream → text/plain
Attachment #417397 - Attachment mime type: application/octet-stream → text/plain
I think this non-determinism is due, at least in part, to the non-determinism of the Oracle (bug 517535).  If you need a stable execution sequence for debug/timing, you could disable it by have Oracle::is*Undemotable return true.
We should really rewrite the oracle. Its a gross hack. It should properly map script hashes and pc numbers to results and cache those for a while (maybe a day or so? something like that).
(In reply to comment #4)
> I think this non-determinism is due, at least in part, to the non-determinism
> of the Oracle (bug 517535).

I tried changing isGlobalSlotUndemotable() to always return true, as that bug suggests.  I still see the same non-determinism.  So either I haven't disabled the Oracle properly or it's not the cause (or not the only cause).
which harness are you using? what is the command line?
isInstructionUndemotable and isStackSlotUndemotable need to return true as well.
(In reply to comment #7)
> which harness are you using? what is the command line?

I'm using the Webkit SunSpider.  Command line is in comment 0 above.

(In reply to comment #8)
> isInstructionUndemotable and isStackSlotUndemotable need to return true as
> well.

I tried that as well, I'm still seeing the non-determinism.
(In reply to comment #4)
> I think this non-determinism is due, at least in part, to the non-determinism
> of the Oracle (bug 517535).  If you need a stable execution sequence for

Luke, can you say precisely what form of nondeterminism the Oracle
introduces?  For a non-threaded app, how can it be nondeterministic?
I can think of two other ways: either it relies on details of object
(C++ ones as well as JS ones) addresses, and hence is affected by 
address space randomisation.  Or, it looks at cycle counts as obtained
by rdtsc et al, and so is deliberately timing dependent.

But I had the impression it did neither of these things.  I thought
the nondeterminism in these benchmarks is due to the use of 
Math.random, which AIUI is unlike libc random et al in that it 
isn't initialised to a fixed starting point at process start.
The Oracle does rely on somewhat random numbers, it hashes |fp->pc|, |fp->script|, the global object shape, maybe more.

But if the Oracle's disabled and we're still getting non-determinism, without Math.random or Date.something, that's a little sketchy.
(In reply to comment #11)
> The Oracle does rely on somewhat random numbers, it hashes |fp->pc|,
> |fp->script|, the global object shape, maybe more.

Why would any of those be non-repeatable from run to run, though?
crypto-aes has a call to Date.now() and uses the bits for data. When I fix that the results are consistent.
Don't you just love SunSpider?

There may be a bug on file already, checking bugs.webkit.org... didn't find one, feel free to file.

/be
(In reply to comment #13)
> crypto-aes has a call to Date.now() and uses the bits for data. When I fix that
> the results are consistent.

I didn't see Date.now() but I did see Date.getTime().  The attached patch makes crypto-aes.js deterministic for me even with the Oracle enabled, which is good.

I got v8-crypto much less non-deterministic by futzing with a Date.getTime() call and a Math.random() call -- it now bounces between two very similar results.  (And disabling the Oracle didn't help.)

v8-raytrace.js is still a mystery to me.  It doesn't have any date or random values that I can find.
> (In reply to comment #13)
> v8-raytrace.js is still a mystery to me.  It doesn't have any date or random
> values that I can find.

This looks kinda dodgy to me:

  registerCallback: function() {
    this.timer = setInterval(this.onTimerEvent.bind(this), this.frequency * 1000);
  },

Looks like it might use some kind of timer-based mechanism delivered
by signals?  (waves hands wildly)
(In reply to comment #14)
> Don't you just love SunSpider?
> 
> There may be a bug on file already, checking bugs.webkit.org... didn't find
> one, feel free to file.

https://bugs.webkit.org/show_bug.cgi?id=32804
Not much point having this bug open any more.  Bug 580532 has a patch that improves SunSpider and V8 a lot.
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.