PContent::Msg_GetGraphicsDeviceInitData can take a long time

RESOLVED FIXED in Firefox 55

Status

()

Core
Graphics
P3
normal
RESOLVED FIXED
3 months ago
2 days ago

People

(Reporter: Ehsan, Assigned: gerald)

Tracking

(Depends on: 1 bug, Blocks: 1 bug)

unspecified
mozilla55
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox55 fixed)

Details

(Whiteboard: [qf:p1][gfx-noted])

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(1 attachment)

Similar to bug 1337062:

<https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2017-02-05&keys=PContent%253A%253AMsg_GetGraphicsDeviceInitData!PContent%253A%253AMsg_RpcMessage!PContent%253A%253AMsg_GetGfxVars!PAPZCTreeManager%253A%253AMsg_ReceiveMouseInputEvent&max_channel_version=nightly%252F54&measure=IPC_SYNC_LATENCY_MS&min_channel_version=null&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2017-02-02&table=0&trim=1&use_submission_date=0>

Start	End	IPC_SYNC_LATENCY_MS Count
0	1	0 (0%)
1	3	47.75k (22.31%)
3	8	38.43k (17.96%)
8	20	35.9k (16.78%)
20	50	35.32k (16.51%)
50	126	23.94k (11.18%)
126	317	16.73k (7.82%)
317	796	10.09k (4.71%)
796	2k	4.29k (2%)
2k	Infinity	1.56k (0.73%)
Priority: -- → P3
See Also: → bug 1337062
Whiteboard: gfx-noted
This is the third worst sync IPC that we have after bug 1337062 and bug 1194751 with 95 percentile of 677.
I have the same question here as in bug 1337062 comment 13.
Flags: needinfo?(milan)

Updated

2 months ago
Whiteboard: gfx-noted → [qf:p1][gfx-noted]
Assignee: nobody → gsquelart
Flags: needinfo?(milan)
Doing bug 1337062 first.  Close to landing.
(Assignee)

Comment 4

19 days ago
That's right.

I've actually started working on it (~half done now), using the same awkward command-line strategy for now, so we can get its benefits quickly before we implement a more elegant solution...
I don't understand why we can't do a more elegant solution now? Why can't we put this data into the initial IPC pipe?
(Assignee)

Comment 6

19 days ago
David, could you please open a separate bug for that?

I'd like to focus here on PContent::Msg_GetGraphicsDeviceInitData, so we don't block processes because of it anymore, regardless of the method of transfer for the initial data.

I know almost nothing about how the IPC works, so I think someone who is better versed into it should tackle that (I'd be happy to learn & look into it, but it would take much more time); and then we can move all the existing work that uses the command line, to use the pipe instead (or whatever other solution is chosen*).

* My own uneducated thought: It would be great if we could somehow reuse the IPC calls, so that the code that de/serializes data is automatically generated from ipdls, instead of having to write it (again) by hand and having to maintain it when structures change.
(Assignee)

Comment 7

19 days ago
(In reply to David Anderson [:dvander] from comment #5)
> Why can't we put this data into the initial IPC pipe?

And just about that question, in case you missed it, it was answered there:
> Bill McCloskey (:billm) wrote in bug 1337062 comment #8:
> (In reply to David Anderson [:dvander] from bug 1337062 comment #7)
> > This is needed during startup, but the data is pretty simple. Out of
> > curiosity, we launch the subprocess with an initial pipe right? Is there any
> > reason we can't shove the data in there instead of the command line, which
> > is super gross? Or can we not write to a pipe until both ends are connected?
> There was some work in bug 1303096 to do that using PeekMessages, but Brad
> hit some test failures that he couldn't figure out. If you want to give it a
> go, that would be great. The command line thing is gross.

So I think we all want a better solution, but it's not that trivial! Help welcome. ;-)
My objection to hacking a solution via argv is that, as soon as we need to modify anything involving this message, someone is going to have to touch that code, and it seems like exactly the sort of thing that is hard to maintain. And unless we're planning to immediately uplift this to 53, it doesn't make sense that we need to rush a hack. If we file a follow-up bug it'll probably never get fixed, until the maintenance burden becomes too high for someone else.

I'll take a look at the failures in bug 1303096 to see what went wrong.
(In reply to David Anderson [:dvander] from comment #8)
> My objection to hacking a solution via argv is that, as soon as we need to
> modify anything involving this message, someone is going to have to touch
> that code, and it seems like exactly the sort of thing that is hard to
> maintain. And unless we're planning to immediately uplift this to 53, it
> doesn't make sense that we need to rush a hack. If we file a follow-up bug
> it'll probably never get fixed, until the maintenance burden becomes too
> high for someone else.

Fixing the issue is a very high priority and it is a good candidate for an uplift. The sooner the code lands the better position we will be in to make that determination. I see no reason hijack this bug with repaying technical debt from the other similar bugs.

I'm not concerned about incurring some technical debt here and neither should you. My team have a really good track record of repaying technical debt so any followup will get prioritised and fixed if it is important enough.

Can you please create a follow up bug with a clear statement of exactly what you think is wrong with the current implementation, separate from any proposed solutions? You'll need to needinfo? me so I don't miss it.
I objected in those bugs too - we have a system in place already for serializing structures to other processes, and adding a new one based on string parsing doesn't seem good. Every time we change these messages we have to consider the string parsing code.
Yuck. The IPC bug Brad encountered is proving to be very difficult to track down. Luckily, *this* bug shouldn't need either approach.

The content process does not request graphics data until it initializes XPCOM, which happens from an async message from the parent process [1]. We can send the ContentDeviceData as part of the SetXPCOMProcessAttributes message. gfxVars could have gone there as well.

Note we can't eliminate this sync message in general but we can on startup - a "don't fetch the first time" flag in gfxPlatform would be okay. If I can help please let me know.

[1] http://searchfox.org/mozilla-central/rev/c4fdb67bca7889e59af9dd99c604651a49c4aafa/dom/ipc/PContent.ipdl#455
[2] http://searchfox.org/mozilla-central/rev/c4fdb67bca7889e59af9dd99c604651a49c4aafa/dom/ipc/ContentChild.cpp#523
(Assignee)

Comment 12

15 days ago
(In reply to David Anderson [:dvander] from comment #11)
> The content process does not request graphics data until it initializes
> XPCOM, which happens from an async message from the parent process [1]. We
> can send the ContentDeviceData as part of the SetXPCOMProcessAttributes
> message. gfxVars could have gone there as well.

Thank you for the tip, this seems to work, and does look like the ideal place to do this.

Tries in progress:
- ContentDeviceData:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9dbd8a23e4b79157679da985051a9809e5b7e917
- And gfxVarUpdates:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=18826ac92a411dcadd4b7f9efa95156fb66afa48

Could you please have a quick look at the patches there? Feedback welcome before I push for official review.
(I've put MOZ_ASSERTs to catch unexpected races; but I'm keeping the old sync requests as a fallback otherwise, until we're confident it's all good.)
Flags: needinfo?(dvander)
These look great, thanks!
Flags: needinfo?(dvander)
Comment hidden (mozreview-request)
(In reply to Gerald Squelart [:gerald] from comment #12)
> ...
> (I've put MOZ_ASSERTs to catch unexpected races; but I'm keeping the old
> sync requests as a fallback otherwise, until we're confident it's all good.)

MOZ_CRASH or gfxDevCrash() instead of MOZ_ASSERT may be a better choice if we really want to find out quickly if this is a problem or not.  A lot more people run optimized build :)
(Assignee)

Comment 16

15 days ago
I didn't want to be too annoying to our users, in case our assumptions are wrong; and especially this close to a new release.

However, if you think it would be worth the risk, I'd be happy to switch to MOZ_CRASH.
Let's do a MOZ_CRASH (or, gfxDevCrash which is MOZ_CRASH in Nightly, and MOZ_ASSERT in Beta/Release.)  I'm assuming it would be unexpected if we hit it, and tell us we have a bug to fix - may as well know about that sooner.
Thanks for doing all this!

Comment 18

15 days ago
mozreview-review
Comment on attachment 8856817 [details]
Bug 1337063 - Transfer child's initial ContentDeviceData with SendSetXPCOMProcessAttributes -

https://reviewboard.mozilla.org/r/128738/#review131576

Thanks!
Attachment #8856817 - Flags: review?(dvander) → review+
Comment hidden (mozreview-request)

Comment 20

14 days ago
Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a6300a521f9c
Transfer child's initial ContentDeviceData with SendSetXPCOMProcessAttributes - r=dvander

Comment 21

14 days ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a6300a521f9c
Status: NEW → RESOLVED
Last Resolved: 14 days ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
(Assignee)

Updated

2 days ago
Depends on: 1356138
You need to log in before you can comment on or make changes to this bug.