Closed Bug 679427 Opened 13 years ago Closed 13 years ago

eventtracer: make responsiveness threshold a variable/parameter

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: anodelman, Assigned: ted)

References

Details

(Whiteboard: fixed-in-bs)

Attachments

(1 file, 1 obsolete file)

make event loop responsiveness threshold and interval configurable via environment variable 13 years ago (not currently active) Ted Mielczarek 4.74 KB, patch		Details \| Diff \| Splinter Review
make event loop responsiveness threshold and interval configurable via environment variable 13 years ago (not currently active) Ted Mielczarek 4.74 KB, patch	justin.lebar+bug : review+	Details \| Diff \| Splinter Review

alice nodelman [:alice] [:anode]

Reporter

Description

•

13 years ago

Currently, the EventTracer uses a 50ms responsiveness threshold.  We'd like the threshold to be a variable that can be set through a browser parameter to expand the testing options.

alice nodelman [:alice] [:anode]

Reporter

Updated

•

13 years ago

Blocks: 631571

alice nodelman [:alice] [:anode]

Reporter

Updated

•

13 years ago

Assignee: nobody → ted.mielczarek

(not currently active) Ted Mielczarek

Assignee

Comment 1

•

13 years ago

jlebar is out, but I have a question. Do we want to change everywhere I'm currently using 50ms, or just the threshold? (That is, should we be sending events every X ms, or just sending them every 50ms but printing data that's > x ms?)

Justin Lebar (not reading bugmail)

Comment 2

•

13 years ago

In an ideal world, you'd send a new event as soon as the previous one returns, right?  Otherwise, a 1ms pause followed by a 98ms pause would be recorded as a 1ms pause followed by a 49ms pause.  (Or maybe the 98ms pause would be completely ignored, since the sequence of events was [1ms] [98ms] [notice <50ms has passed, insert sentinel into event loop]?)

If sending one event right after the previous one finishes would spin the event loop too much, then what's the smallest delay we can have?  1ms?  5ms?

(not currently active) Ted Mielczarek

Assignee

Comment 3

•

13 years ago

This entire approach is just sampling. I made a judgement call to fire sampling events at the frequency we wanted to detect. I don't know if that's the right call or not.

Justin Lebar (not reading bugmail)

Comment 4

•

13 years ago

> This entire approach is just sampling.

Meaning that you insert a sentinel every X ms, regardless of when the last sentinel finished?  Does this mean that a 500ms delay is counted as 500ms, 450ms, 400ms, and so on?

(not currently active) Ted Mielczarek

Assignee

Comment 5

•

13 years ago

No, there's only ever one tracing event in play at one time. We'll fire an event, wait for it, and then sleep for the remainder of the interval if it took < 50ms. If it took >50ms we fire another event right away. You can see the code here, it's pretty simple:
http://mxr.mozilla.org/mozilla-central/source/toolkit/xre/EventTracer.cpp#118

Justin Lebar (not reading bugmail)

Comment 6

•

13 years ago

Okay, that's sensible.

I think that if possible, the sampling interval should be on the order of the slop you're willing to accept in the measurements, since sampling every Xms means that you may under-report a delay by up to Xms.  So I think the interval should be a lot smaller than 50ms, if that doesn't cause problems of its own.

(not currently active) Ted Mielczarek

Assignee

Comment 7

•

13 years ago

Justin and I discussed this on IRC. He thinks that if we want to detect 50ms periods of unresponsiveness, we probably want to be sampling at like 200Hz (so that the minimum sampling error is on the order of 5ms). My only worry is that sampling that heavily will slow down the application, so I proposed that we make both numbers configurable and we can run some tests.

(not currently active) Ted Mielczarek

Assignee

Comment 8

•

13 years ago

Attached patch make event loop responsiveness threshold and interval configurable via environment variable (obsolete) — Details — Splinter Review

This patch lets you set both the threshold (below which we won't report an unresponsive period) and the measurement interval (how often we send events) via environment variables. I pushed this patch to try, a link to the try builds should show up here when they're finished.

Mozilla RelEng Bot

Comment 9

•

13 years ago

Try run for 9b9d49395ee7 is complete.
Detailed breakdown of the results available here:
    http://tbpl.allizom.org/?tree=Try&usebuildbot=1&rev=9b9d49395ee7
Results (out of 4 total builds):
    success: 4
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/tmielczarek@mozilla.com-9b9d49395ee7

alice nodelman [:alice] [:anode]

Reporter

Comment 10

•

13 years ago

I'd like to see this checked in.  The defaults should be set to ping time 10ms, threshold 20 ms - as determined through test runs.

(not currently active) Ted Mielczarek

Assignee

Comment 11

•

13 years ago

Attached patch make event loop responsiveness threshold and interval configurable via environment variable — Details — Splinter Review

Okay, I've adjusted those constants.

Attachment #559440 - Flags: review?(justin.lebar+bug)

(not currently active) Ted Mielczarek

Assignee

Updated

•

13 years ago

Attachment #556629 - Attachment is obsolete: true

Justin Lebar (not reading bugmail)

Comment 12

•

13 years ago

Comment on attachment 559440 [details] [diff] [review]
make event loop responsiveness threshold and interval configurable via environment variable

> + * Set MOZ_INSTRUMENT_EVENT_LOOP_INTERVAL in the environment to an
> + * integer number of milliseconds to change the maximum sampling frequency.
> + * The default is at most every 10 milliseconds. 

Nit: Can you be more concrete about what this setting does?  Maybe "We will not sample twice within LOOP_INTERVAL milliseconds."

> + //TODO: only wait up to a maximum of interval, return
>   // early if that threshold is exceeded and dump a stack trace
>   // or do something else useful.

Nit: This is a comma splice.  Maybe s/,/;

Attachment #559440 - Flags: review?(justin.lebar+bug) → review+

(not currently active) Ted Mielczarek

Assignee

Comment 13

•

13 years ago

https://hg.mozilla.org/projects/build-system/rev/cf18cd153c5e

Whiteboard: fixed-in-bs

(not currently active) Ted Mielczarek

Assignee

Comment 14

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/cf18cd153c5e

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Justin Lebar (not reading bugmail)

Comment 15

•

13 years ago

I wonder if we can achieve something like this without inserting events into the native event loop.  Can we tell when we run for so long that we don't go check the native event loop for more than X ms?

The main problem with the approach in this bug is that we can't assign blame for a hiccup.  By the time we run the inserted native event, we've already forgotten about everything which has run since the last time.

Happy to move this discussion to a new bug, if it's a reasonable idea.

(not currently active) Ted Mielczarek

Assignee

Comment 16

•

13 years ago

BenWa's already got a proof of concept of this hooked up to his SPS profiler in bug 653703.

You need to log in before you can comment on or make changes to this bug.