Investigate missing BHR stats from e10s child process

RESOLVED FIXED in Firefox 45

Status

()

Toolkit
Telemetry
P1
normal
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: vladan, Assigned: vladan)

Tracking

(Blocks: 3 bugs)

Trunk
mozilla46
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(e10s+, firefox45 fixed, firefox46 fixed)

Details

Attachments

(2 attachments)

(Assignee)

Description

2 years ago
I can't generate BHR child process hang stats in my local Nightly or Aurora and poiru is not seeing them server-side either https://bugzilla.mozilla.org/show_bug.cgi?id=1222972#c1
(Assignee)

Updated

2 years ago
Blocks: 1222972
(Assignee)

Updated

2 years ago
Blocks: 1182637
The e10s threadHangStats (under payload/childPayloads) looks like this:

  [{u'activity': {u'bucket_count': 33,
                  u'histogram_type': 0,
                  u'log_sum': 0,
                  u'log_sum_squares': 0,
                  u'range': [1, 0],
                  u'sum': 0,
                  u'values': {}},
    u'hangs': {},
    u'name': u''},
   {u'activity': {u'bucket_count': 33,
                  u'histogram_type': 0,
                  u'log_sum': 0,
                  u'log_sum_squares': 0,
                  u'range': [1, 0],
                  u'sum': 0,
                  u'values': {u'0': 0, u'1': 15, u'3': 0}},
    u'hangs': {},
    u'name': u'ImageBridgeChild'},
   {u'activity': {u'bucket_count': 33,
                  u'histogram_type': 0,
                  u'log_sum': 0,
                  u'log_sum_squares': 0,
                  u'range': [1, 0],
                  u'sum': 0,
                  u'values': {u'0': 0, u'1': 13, u'3': 0}},
    u'hangs': {},
    u'name': u'ProcessHangMonitor'}]

So some data is collected about the ProcessHangMonitor and ImageBridgeChild threads, but the main thread seems to be missing. There is also a mysterious unnamed thread.
(In reply to Birunthan Mohanathas [:poiru] from comment #1)
> The e10s threadHangStats (under payload/childPayloads) looks like this:
...
> So some data is collected about the ProcessHangMonitor and ImageBridgeChild
> threads, but the main thread seems to be missing. There is also a mysterious
> unnamed thread.

Is this valid for the majority of e10s pings?
The unnamed thread in comment 2 is actually the content process main thread. It's unnamed because this is never executed on the content process: https://dxr.mozilla.org/mozilla-central/rev/74c7941a9e22d50057800771ebae07f69deecc9f/xpcom/build/XPCOMInit.cpp#536-540

(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #2)
> Is this valid for the majority of e10s pings?

I checked the e10s-enabled Nightly pings that were submitted yesterday (with `fraction=0.1`) and none of them had hang stacks under childPayloads.
(Assignee)

Updated

2 years ago
Blocks: 1229104

Updated

2 years ago
tracking-e10s: --- → +
(Assignee)

Comment 4

a year ago
Created attachment 8701925 [details] [diff] [review]
Enable BHR and chromeHangs in content processes, draft

I tested that this works, but I feel like this approach might be overlooking hazards?
Attachment #8701925 - Flags: feedback?(nchen)
(Assignee)

Updated

a year ago
Assignee: nobody → vladan.bugzilla
Comment on attachment 8701925 [details] [diff] [review]
Enable BHR and chromeHangs in content processes, draft

Review of attachment 8701925 [details] [diff] [review]:
-----------------------------------------------------------------

I think this would be fine. I don't think UI activity tracking in HangMonitor is relevant to content processes, but it looks harmless. Otherwise, I didn't see anything that wouldn't work in content processes.
Attachment #8701925 - Flags: feedback?(nchen) → feedback+
(Assignee)

Comment 6

a year ago
Comment on attachment 8701925 [details] [diff] [review]
Enable BHR and chromeHangs in content processes, draft

Approval Request Comment
[Feature/regressing bug #]: Measurement missing since e10s became a thing
[User impact if declined]: We won't get this BHR responsiveness measurement from the e10s content process, preventing us from accurately comparing e10s vs non-e10s performance in our Beta 44 experiment, needed for making a recommendation for shipping or not shipping e10s by default in Release 45
[Describe test coverage new/current, TreeHerder]: small change to measurement code, manual local testing
[Risks and why]: Seems very low. The change is small + it just adds a measurement that is missing in the content process
[String/UUID change made/needed]: None
Attachment #8701925 - Flags: approval-mozilla-beta?
Attachment #8701925 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 7

a year ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/6daddfb64d72

Comment 8

a year ago
Backout:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3505bec8eac5
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/3505bec8eac5 for timing out in test_NuwaProcessCreation.html on B2G ICS emulator debug:

backout job: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=3505bec8eac5
first failing job: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=6daddfb64d72

Log: https://treeherder.mozilla.org/logviewer.html#?job_id=19022879&repo=mozilla-inbound

19:08:59     INFO -  99 INFO TEST-START | dom/ipc/tests/test_NuwaProcessCreation.html
19:09:49     INFO -  ###################################### forms.js loaded
19:09:49     INFO -  ############################### browserElementPanning.js loaded
19:09:49     INFO -  ###################################### BrowserElementCopyPaste.js loaded
19:09:50     INFO -  ######################## BrowserElementChildPreload.js loaded
19:16:16     INFO -  100 INFO Set up preferences for testing the Nuwa process.
19:16:16     INFO -  101 INFO Shut down processes by disabling process prelaunch
19:16:16     INFO -  102 INFO Launch the Nuwa process
19:16:16     INFO -  103 INFO TEST-UNEXPECTED-FAIL | dom/ipc/tests/test_NuwaProcessCreation.html | Test timed out.
19:16:16     INFO -      reportError@SimpleTest/TestRunner.js:114:7
19:16:16     INFO -      TestRunner._checkForHangs@SimpleTest/TestRunner.js:134:7
19:16:17     INFO -  MEMORY STAT | vsize 153MB | residentFast 70MB | heapAllocated 20MB
19:16:18     INFO -  104 INFO TEST-OK | dom/ipc/tests/test_NuwaProcessCreation.html | took 438607ms
BackgroundHangMonitor needs to add https://dxr.mozilla.org/mozilla-central/source/widget/gonk/GonkMemoryPressureMonitoring.cpp#26-28 and https://dxr.mozilla.org/mozilla-central/source/widget/gonk/GonkMemoryPressureMonitoring.cpp#122-126 for b2g.
Comment on attachment 8701925 [details] [diff] [review]
Enable BHR and chromeHangs in content processes, draft

Please renominate once the reasons for backout are addressed. Thanks!
Attachment #8701925 - Flags: approval-mozilla-beta?
Attachment #8701925 - Flags: approval-mozilla-aurora?
adding this to e10s m8 for tracking purposes. we are blocked on this waiting on good data for perf bug 1182637.
tracking-e10s: + → m8+
Blocks: 1063169
tracking-e10s: m8+ → +
Priority: -- → P1
Created attachment 8708235 [details] [diff] [review]
patch v2

Addressed failed Nuwa test cases. Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f4b4940a25f4

Comment 14

a year ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/28731460915c
(Assignee)

Comment 15

a year ago
Re-landed with Ting-Yu's fix. Thank you!
(Assignee)

Comment 16

a year ago
Comment on attachment 8708235 [details] [diff] [review]
patch v2

Approval Request Comment
[Feature/regressing bug #]: none
[User impact if declined]: we won't get good performance (BHR) data from the Beta 45 A/B experiment
[Describe test coverage new/current, TreeHerder]: no issues on TreeHerder, tested locally
[Risks and why]: not much, as far as I can tell. the overhead of content-process BHR monitoring is minimal and the equivalent monitoring is already done in non-e10s
[String/UUID change made/needed]: none
Attachment #8708235 - Flags: approval-mozilla-aurora?
(Assignee)

Updated

a year ago
Blocks: 1240648
status-firefox46: --- → affected

Comment 17

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/28731460915c
Status: NEW → RESOLVED
Last Resolved: a year ago
status-firefox46: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Whiteboard: [e10s-45-uplift]
(Assignee)

Updated

a year ago
Blocks: 1241336
Comment on attachment 8708235 [details] [diff] [review]
patch v2

Better metrics, taking it.
Attachment #8708235 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Assignee)

Comment 19

a year ago
https://hg.mozilla.org/releases/mozilla-aurora/rev/30d486237af3
status-firefox45: affected → fixed
Whiteboard: [e10s-45-uplift]
You need to log in before you can comment on or make changes to this bug.