Closed Bug 1228437 Opened 9 years ago Closed 8 years ago

Investigate missing BHR stats from e10s child process

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
e10s + ---
firefox45 --- fixed
firefox46 --- fixed

People

(Reporter: vladan, Assigned: vladan)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files)

I can't generate BHR child process hang stats in my local Nightly or Aurora and poiru is not seeing them server-side either https://bugzilla.mozilla.org/show_bug.cgi?id=1222972#c1
Blocks: 1222972
Blocks: 1182637
The e10s threadHangStats (under payload/childPayloads) looks like this:

  [{u'activity': {u'bucket_count': 33,
                  u'histogram_type': 0,
                  u'log_sum': 0,
                  u'log_sum_squares': 0,
                  u'range': [1, 0],
                  u'sum': 0,
                  u'values': {}},
    u'hangs': {},
    u'name': u''},
   {u'activity': {u'bucket_count': 33,
                  u'histogram_type': 0,
                  u'log_sum': 0,
                  u'log_sum_squares': 0,
                  u'range': [1, 0],
                  u'sum': 0,
                  u'values': {u'0': 0, u'1': 15, u'3': 0}},
    u'hangs': {},
    u'name': u'ImageBridgeChild'},
   {u'activity': {u'bucket_count': 33,
                  u'histogram_type': 0,
                  u'log_sum': 0,
                  u'log_sum_squares': 0,
                  u'range': [1, 0],
                  u'sum': 0,
                  u'values': {u'0': 0, u'1': 13, u'3': 0}},
    u'hangs': {},
    u'name': u'ProcessHangMonitor'}]

So some data is collected about the ProcessHangMonitor and ImageBridgeChild threads, but the main thread seems to be missing. There is also a mysterious unnamed thread.
(In reply to Birunthan Mohanathas [:poiru] from comment #1)
> The e10s threadHangStats (under payload/childPayloads) looks like this:
...
> So some data is collected about the ProcessHangMonitor and ImageBridgeChild
> threads, but the main thread seems to be missing. There is also a mysterious
> unnamed thread.

Is this valid for the majority of e10s pings?
The unnamed thread in comment 2 is actually the content process main thread. It's unnamed because this is never executed on the content process: https://dxr.mozilla.org/mozilla-central/rev/74c7941a9e22d50057800771ebae07f69deecc9f/xpcom/build/XPCOMInit.cpp#536-540

(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #2)
> Is this valid for the majority of e10s pings?

I checked the e10s-enabled Nightly pings that were submitted yesterday (with `fraction=0.1`) and none of them had hang stacks under childPayloads.
Blocks: 1229104
I tested that this works, but I feel like this approach might be overlooking hazards?
Attachment #8701925 - Flags: feedback?(nchen)
Assignee: nobody → vladan.bugzilla
Comment on attachment 8701925 [details] [diff] [review]
Enable BHR and chromeHangs in content processes, draft

Review of attachment 8701925 [details] [diff] [review]:
-----------------------------------------------------------------

I think this would be fine. I don't think UI activity tracking in HangMonitor is relevant to content processes, but it looks harmless. Otherwise, I didn't see anything that wouldn't work in content processes.
Attachment #8701925 - Flags: feedback?(nchen) → feedback+
Comment on attachment 8701925 [details] [diff] [review]
Enable BHR and chromeHangs in content processes, draft

Approval Request Comment
[Feature/regressing bug #]: Measurement missing since e10s became a thing
[User impact if declined]: We won't get this BHR responsiveness measurement from the e10s content process, preventing us from accurately comparing e10s vs non-e10s performance in our Beta 44 experiment, needed for making a recommendation for shipping or not shipping e10s by default in Release 45
[Describe test coverage new/current, TreeHerder]: small change to measurement code, manual local testing
[Risks and why]: Seems very low. The change is small + it just adds a measurement that is missing in the content process
[String/UUID change made/needed]: None
Attachment #8701925 - Flags: approval-mozilla-beta?
Attachment #8701925 - Flags: approval-mozilla-aurora?
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/3505bec8eac5 for timing out in test_NuwaProcessCreation.html on B2G ICS emulator debug:

backout job: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=3505bec8eac5
first failing job: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=6daddfb64d72

Log: https://treeherder.mozilla.org/logviewer.html#?job_id=19022879&repo=mozilla-inbound

19:08:59     INFO -  99 INFO TEST-START | dom/ipc/tests/test_NuwaProcessCreation.html
19:09:49     INFO -  ###################################### forms.js loaded
19:09:49     INFO -  ############################### browserElementPanning.js loaded
19:09:49     INFO -  ###################################### BrowserElementCopyPaste.js loaded
19:09:50     INFO -  ######################## BrowserElementChildPreload.js loaded
19:16:16     INFO -  100 INFO Set up preferences for testing the Nuwa process.
19:16:16     INFO -  101 INFO Shut down processes by disabling process prelaunch
19:16:16     INFO -  102 INFO Launch the Nuwa process
19:16:16     INFO -  103 INFO TEST-UNEXPECTED-FAIL | dom/ipc/tests/test_NuwaProcessCreation.html | Test timed out.
19:16:16     INFO -      reportError@SimpleTest/TestRunner.js:114:7
19:16:16     INFO -      TestRunner._checkForHangs@SimpleTest/TestRunner.js:134:7
19:16:17     INFO -  MEMORY STAT | vsize 153MB | residentFast 70MB | heapAllocated 20MB
19:16:18     INFO -  104 INFO TEST-OK | dom/ipc/tests/test_NuwaProcessCreation.html | took 438607ms
Comment on attachment 8701925 [details] [diff] [review]
Enable BHR and chromeHangs in content processes, draft

Please renominate once the reasons for backout are addressed. Thanks!
Attachment #8701925 - Flags: approval-mozilla-beta?
Attachment #8701925 - Flags: approval-mozilla-aurora?
adding this to e10s m8 for tracking purposes. we are blocked on this waiting on good data for perf bug 1182637.
Blocks: e10s-perf
Priority: -- → P1
Re-landed with Ting-Yu's fix. Thank you!
Comment on attachment 8708235 [details] [diff] [review]
patch v2

Approval Request Comment
[Feature/regressing bug #]: none
[User impact if declined]: we won't get good performance (BHR) data from the Beta 45 A/B experiment
[Describe test coverage new/current, TreeHerder]: no issues on TreeHerder, tested locally
[Risks and why]: not much, as far as I can tell. the overhead of content-process BHR monitoring is minimal and the equivalent monitoring is already done in non-e10s
[String/UUID change made/needed]: none
Attachment #8708235 - Flags: approval-mozilla-aurora?
Blocks: 1240648
https://hg.mozilla.org/mozilla-central/rev/28731460915c
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Whiteboard: [e10s-45-uplift]
Blocks: 1241336
Comment on attachment 8708235 [details] [diff] [review]
patch v2

Better metrics, taking it.
Attachment #8708235 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Whiteboard: [e10s-45-uplift]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: