Closed Bug 1814718 Opened 3 years ago Closed 3 years ago

Record number of timers fired per Timer Thread wake-up

Tracking

()

Status:

RESOLVED FIXED

Milestone:

112 Branch

Tracking Flags:

Tracking

Status

firefox112

---

fixed

People

(Reporter: jlink, Assigned: jlink)

References

Details

Attachments

(3 files)

Bug 1814718 - Added telemetry for number of timers fired per Timer Thread wake-up. r=smaug,florian 3 years ago Justin Link 48 bytes, text/x-phabricator-request		Details \| Review
1814718-data-collection-review-request.txt 3 years ago Justin Link 2.74 KB, text/plain	chutten : data-review+	Details
Bug 1814718 - Buffer "timers fired per-wake-up" stats before sending to glean. r=florian,smaug 3 years ago Justin Link 48 bytes, text/x-phabricator-request		Details \| Review

Justin Link

Assignee

Description

•

3 years ago

•

Edited

For the purpose of measuring the impact of https://bugzilla.mozilla.org/show_bug.cgi?id=1783405 it would useful to measure how many timers are fired each time the Timer Thread wakes up to do work. The intention of the above work is to cause a reduction in the number of times that the Timer Thread wakes up to accomplish a given amount of work by slightly increasing the amount of work that it does each time that it wakes up.

We need the telemetry to go in a few weeks before that change does in order to establish a baseline from real user behavior. The result that we expect to see is a small right-ward shift in the lower end of the distribution.

Justin Link

Assignee

Updated

•

3 years ago

Assignee: nobody → jlink

No longer depends on: 1769255

Justin Link

Assignee

Comment 1

•

3 years ago

Attached file Bug 1814718 - Added telemetry for number of timers fired per Timer Thread wake-up. r=smaug,florian — Details

Justin Link

Assignee

Comment 2

•

3 years ago

Attached file 1814718-data-collection-review-request.txt — Details

Attachment #9316057 - Flags: data-review?(chutten)

Chris H-C :chutten

Comment 3

•

3 years ago

Comment on attachment 9316057 [details]
1814718-data-collection-review-request.txt

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes.

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection can be controlled through the product's preferences.

If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, :jlink is responsible.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

Is the data collection request for default-on or default-off?

Default on for all channels.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does the data collection use a third-party collection tool?

No.

Result: datareview+

Attachment #9316057 - Flags: data-review?(chutten) → data-review+

Pulsebot

Comment 4

•

3 years ago

Pushed by fqueze@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/56cd09827bab Added telemetry for number of timers fired per Timer Thread wake-up. r=smaug,florian

Sandor Molnar[:smolnar]

Comment 5

•

3 years ago

Backed out for causing mochitest failures in editor/libeditor/tests/test_undo_after_spellchecker_replaces_word.html

Backout link: https://hg.mozilla.org/integration/autoland/rev/abfe9884ce821653277d51273689ef8b507cffd3

Push with failures

Failure log: editor/libeditor/tests/test_undo_after_spellchecker_replaces_word.html

Failure log: editor/spellchecker/tests/test_bug1205983.html

Flags: needinfo?(jlink)

Justin Link

Assignee

Comment 6

•

3 years ago

•

Edited

I'm baffled that this particular change here would cause a regression in those tests. A couple thoughts/questions came to mind:

@smolnar: In the push with my change which has those failures those tests are consistently taking longer to run than in the push with my changes removed in which those tests don't fail. Other tests in that same suite do not show this time discrepancy. The failures are also only occurring on Windows 10 x64 2004 WebRender Shippable and, when they are failing, those tests are taking even longer than the same tests in the Windows 10 x64 2004 WebRender debug. Is it possible that those particular tests are running on a particular machine that is older/slower or that has something else going on with it to make it slow?
@jrediger: There is not much going on in this change. The only "interesting" thing is really the call into glean to record the sample. Can you think of any conditions under which this could cause a performance problem? It seems to be only a couple tests and only on a highly-optimized configuration which is odd.
Does anybody know how to trigger these tests in a push-to-try? I did one and tried to find and select everything that seemed related to Windows Shippable and, unless I'm missing something, it didn't run the "M-spi-nw" tests for that config (but they did run for some other configs).

Flags: needinfo?(smolnar)

Flags: needinfo?(jrediger)

Flags: needinfo?(jlink)

Sandor Molnar[:smolnar]

Comment 7

•

3 years ago

•

Edited

@Justin, the difference between test taking longer, is that in the first link there was a backstop push, which means all the tests are running. in the backout link not all the tests are running, hence the time difference. You can see the difference in what tests are running, by clicking on a test and checking it in the "Test Groups" tab, next to "Performance" tab.
The failure seems to be win shippable specific.
I have added manually on your test try the missing jobs, hopefully the needed test will run

Flags: needinfo?(smolnar)

Jan-Erik Rediger [:janerik] (Away 2026-04-02 - 2026-04-10)

Comment 8

•

3 years ago

(In reply to Justin Link from comment #6)

@jrediger: There is not much going on in this change. The only "interesting" thing is really the call into glean to record the sample. Can you think of any conditions under which this could cause a performance problem? It seems to be only a couple tests and only on a highly-optimized configuration which is odd.

I looked at the patch. Do you have an idea how many samples would be accumulated there at once?
I realized that AccumulateSamples will allocate a (Rust) vec to hold those, and depending on how many that could get expensive easily.

Flags: needinfo?(jrediger)

Florian Quèze [:florian]

Comment 9

•

3 years ago

I grabbed a quick profile of editor/libeditor/tests/test_undo_after_spellchecker_replaces_word.html without Justin's patch. During the test, the Timer thread of the content process is awakened 192758 in about 3.2s. I think Justin's patch would record a sample in Glean every time.
When profiling the test with the patch applied, I can't reproduce the failure, but I see 2 "FOG IPC Payload getting full" runnables on the child process running the test. I think this confirms we are recording a lot of data.

This seems to be due to AsyncSpellCheckTestHelper.jsm adding a repeating 0ms timer: https://searchfox.org/mozilla-central/rev/5ba1bdb6f8fbf8e8c91720b1e584c0081022f1bc/editor/AsyncSpellCheckTestHelper.jsm#93-94

This helper is used by the two tests mentioned in comment 5 and more: https://searchfox.org/mozilla-central/search?q=maybeOnSpellCheck&path=

Justin Link

Assignee

Comment 10

•

3 years ago

I wake up to find a whole bunch of really useful feedback. I love it!

It seems like a potential easy "fix" here would be to have the TimerThread queue up its telemetry data and only flush it once enough has been captured. That should be able to mitigate most of the cost of adding this telemetry in this situation and won't take long to do.

Seeing timers used like this makes me kind of sad though. Am I the only one who thinks that repeatedly adding timers that need to be fired right away isn't a great way to use the Timer Thread? I'll need to think about how this type of usage is handled by the upcoming timer coalescing work (which is what this telemetry is intended to support).

Chris H-C :chutten

Comment 11

•

3 years ago

I confess to asking myself how much of this bug is "Oh, Glean is cheap but not free. Once we get a full idea of the shape of this issue we'll need to examine how best to support this case." and how much of this bug is "Oh, new instrumentation managed to catch a problem before it even reached users! And once it's fixed we can re-evaluate the perf."

Florian Quèze [:florian]

Comment 12

•

3 years ago

(In reply to Justin Link from comment #10)

Seeing timers used like this makes me kind of sad though. Am I the only one who thinks that repeatedly adding timers that need to be fired right away isn't a great way to use the Timer Thread?

Yeah, I refrained from suggesting immediately that we change the test helper to use ChromeUtils.idleDispatch instead of a 0ms timer and move on, mostly because I'm not sure if this would be hiding other similar cases. This might still be the way to move forward here.

Maybe this is just another reason to want to fix bug 1770658.

Justin Link

Assignee

Comment 13

•

3 years ago

Attached file Bug 1814718 - Buffer "timers fired per-wake-up" stats before sending to glean. r=florian,smaug — Details

Depends on D168915

Justin Link

Assignee

Comment 14

•

3 years ago

Here is push-to-try with my change to queue up glean submissions rather than send them every wake-up showing that the M-spi-nw tests are again passing on Windows 10 x64 2004 WebRender Shippable.

Pulsebot

Comment 15

•

3 years ago

Pushed by opettay@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/de43589862f3 Added telemetry for number of timers fired per Timer Thread wake-up. r=smaug,florian https://hg.mozilla.org/integration/autoland/rev/45ae4c850c9f Buffer "timers fired per-wake-up" stats before sending to glean. r=smaug

Sandor Molnar[:smolnar]

Comment 16

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/de43589862f3
https://hg.mozilla.org/mozilla-central/rev/45ae4c850c9f

Status: NEW → RESOLVED

Closed: 3 years ago

status-firefox112: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 112 Branch

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

3 years ago

Regressions: 1818758

Florian Quèze [:florian]

Updated

•

3 years ago