Closed Bug 903949 Opened 6 years ago Closed 6 years ago

about:memory and the thumbnails process are combining to cause hangs and unexpected behaviour

Categories

(Toolkit :: about:memory, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla26
Tracking Status
firefox24 --- unaffected
firefox25 + fixed
firefox26 + fixed

People

(Reporter: njn, Assigned: njn)

References

Details

(Whiteboard: [MemShrink][qa-])

Attachments

(2 files)

Hughman reported some hangs involving memory reporting.  After some investigation, here are my steps to reproduce:

- Start Firefox.
- Open about:memory, hit "measure".  Keep it open.
- Delete the images in the thumbnails dir, within the profile temp dir.  On my machine it's $HOME?.cache/mozilla/firefox/<profile>/thumbnails.
- Open about:newtab.  This triggers a thumbnails process.
- Wait ~60 seconds for the thumbnails process to die.
- about:memory will auto-update, due to the existing inter-process memory reporting machinery, which is triggered by the thumbnails process's death.

This last event is very surprising for the user, and probably undesirable.  Hughman was also seeing 7--10 second hangs, much longer than the 2 second pauses he saw after hitting "measure" in about:memory.  He's on Windows, I wonder if the IPC is slow.

Another curious thing is that even though the thumbnails process is passing memory reports to the main process, I don't see anything other than "Main Process" in about:memory.  Turns out there's a bug in ChildMemoryReporter, and it's always returning the empty string from GetProcess(), instead of the child process name.  That's easy to fix.

But for the original problem... the IPC done by about:memory isn't pretty.  It basically request child processes to give it some reports, and then measures the main process and displays the results.  Then, when a child process reports back, it re-measures the main process and re-displays.

And when the child process dies, it sends a message to the main process saying "all those measurements you had from me are dead, re-measure", more or less.  Ugh.
This fixes the problem of the child process not showing up in about:memory.
Just a dumb defect introduced when I changed the reporters to use
MemoryReporterBase, and not noticed because we didn't have child processes at
the time.
Attachment #788801 - Flags: review?(justin.lebar+bug)
Here are measurements of the thumbnails process:

> Content (3744)
> Explicit Allocations
> 
> 79.96 MB (100.0%) -- explicit
> ├──45.31 MB (56.67%) -- window-objects/top(http://techcrunch.com/, id=1)
> │  ├──32.55 MB (40.71%) -- active
> │  │  ├──21.54 MB (26.94%) ++ window(http://techcrunch.com/)
> │  │  ├───6.88 MB (08.60%) ++ window(http://www.youtube.com/embed/Gs0JD5KITfM?version=3&rel=1&fs=1&showsearch=0&showinfo=1&iv_load_policy=1&wmode=transparent)
> │  │  ├───2.94 MB (03.68%) -- (8 tiny)
> │  │  │   ├──0.64 MB (00.81%) ++ window(http://static.ak.facebook.com/connect/xd_arbiter.php?version=26#channel=f9b30a982fa908&origin=http%3A%2F%2Ftechcrunch.com&channel_path=%2F%3Ffb_xd_fragment%23xd_sig%3Df59c43a4342efc%26)
> │  │  │   ├──0.64 MB (00.81%) ++ window(https://s-static.ak.facebook.com/connect/xd_arbiter.php?version=26#channel=f9b30a982fa908&origin=http%3A%2F%2Ftechcrunch.com&channel_path=%2F%3Ffb_xd_fragment%23xd_sig%3Df59c43a4342efc%26)
> │  │  │   ├──0.44 MB (00.55%) ++ window(http://ads.tw.adsonar.com/adserving/getAds.jsp?previousPlacementIds=&placementId=1530659&pid=2360767&zw=300&zh=250&ssl=false&url=http%3A//techcrunch.com/&v=5&dct=TechCrunch)
> │  │  │   ├──0.42 MB (00.53%) ++ window(http://techcrunch.com/wp-content/themes/vip/tctechcrunch2/_uac/isocket.html?phpAds_used=&MAX_used=&OA_used=)
> │  │  │   ├──0.42 MB (00.53%) ++ window(about:blank)
> │  │  │   ├──0.15 MB (00.19%) ++ window(http://cdn.tacoda.at.atwola.com/an/qseg.html)
> │  │  │   ├──0.11 MB (00.14%) ++ window(http://rma-api.gravity.com/v1/api/intelligence/w2?sg=fca4fa8af1286d8a77f26033fdeed202&pl=7&ug=&b=13&ad=&sourceUrl=http%3A%2F%2Ftechcrunch.com%2F&frameUrl=http%3A%2F%2Ftechcrunch.com%2F&clientTime=1376287926106&pageViewId%5BwidgetLoaderWindowUrl%5D=http%3A%2F%2Ftechcrunch.com%2F&pageViewId%5BtimeMillis%5D=1376287926105&pageViewId%5Brand%5D=20848183125038966)
> │  │  │   └──0.11 MB (00.14%) ++ window(http://ads.tw.adsonar.com/adserving/createASLId.jsp)
> │  │  └───1.19 MB (01.48%) ++ window(http://techcrunch.com/wp-content/themes/vip/tctechcrunch2/_uac/adpage.html)

So the thumbnails process actually loads the pages in question?  Huh.
SetChildMemoryReporters() triggers about:memory regeneration.  It's called from
two places:

- MemoryReportRequestParent::Recv__delete__(), which is triggered by the
  receipt of a "child-memory-reporter-request" event.

- ContentParent::ActorDestroy, which is triggered by the death of a child
  process.

This patch changes the latter... we don't want to auto-update about:memory in
that case, we just want to delete the previously-registered reports.
Attachment #788806 - Flags: review?(justin.lebar+bug)
BTW, I don't have any good ideas on how to test these changes.

Also, if/when we get to serious e10s usage, this hacky child process memory reporting situation is going to need some serious re-working.  (For example, if you do "Measure and save" in about:memory when the thumbnails process is running, the thumbnails process doesn't get measured, because the child process feedback only works with on-screen about:memory generation.  Ugh.)  But I'm not sure what the new design will look like, and these patches are enough to keep things limping along for the moment :/
> > Content (3744)

And it's a shame we don't get a more informative process name for the thumbnails process.  I'm trying to remember how B2G does a better job with this...
Attachment #788801 - Flags: review?(justin.lebar+bug) → review+
Attachment #788806 - Flags: review?(justin.lebar+bug) → review+
> BTW, I don't have any good ideas on how to test these changes.

You mean, how to write an automated test?

It would be good to have an automated test verifying that child-process memory reporters actually work.  That shouldn't be too hard; you can create a <xul:browser remote=true> or an <iframe mozbrowser>, pull the memory reports, and verify that you have two processes.

If you want to verify that forcibly killing the child process doesn't cause SetChildMemoryReporters to run, that's a bit more difficult, but we can kill child processes from script.  It might be sufficient simply to remove the browser/iframe from the DOM.  That doesn't forcibly kill it, but it's the same path here.

But I'm not sure that's worth testing, myself.

I totally agree with everything else you've said here, particularly the bit about these hacky memory reporters.
> It would be good to have an automated test verifying that child-process memory reporters 
> actually work. 

Oh right, bug 673323.  :)

> And it's a shame we don't get a more informative process name for the thumbnails process.

B2G gets the name from ContentChild::GetProcessName.  In the parent process we have ContentParent::FriendlyName, which is similar.
> If you want to verify that forcibly killing the child process doesn't cause
> SetChildMemoryReporters to run, that's a bit more difficult...
> But I'm not sure that's worth testing, myself.

I agree.  I just want to test multi-process reporting, i.e. patch 1.  I'll try mozbrowser.

Bug 673323 comment 4 has a suggested design for a better approach to memory reporting with multiple processes.
Blocks: 904321
https://hg.mozilla.org/integration/mozilla-inbound/rev/780836c77424
https://hg.mozilla.org/integration/mozilla-inbound/rev/911c2e86d6a4

I landed without a test because I was having trouble getting it working and jlebar just boarded a plane.  I filed bug 904321 as a follow-up.

I guess I should nominate these for Aurora, since the thumbnails child process is on Aurora.
Blocks: 870100
https://hg.mozilla.org/mozilla-central/rev/780836c77424
https://hg.mozilla.org/mozilla-central/rev/911c2e86d6a4
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
Ready for an uplift request with risk evaluation.
Comment on attachment 788801 [details] [diff] [review]
(part 1) - ChildMemoryReporter should return mProcess in GetProcess() rather than an empty string.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): some refactoring patch some time ago.

User impact if declined: Memory measurements from the thumbnails child process are incorrectly combined with those from the parent process in about:memory

Testing completed (on m-c, etc.): on m-c for a few days.

Risk to taking this patch (and alternatives if risky): none.  Code was obviously incorrect;  patch is trivial.

String or IDL/UUID changes made by this patch:  none.
Attachment #788801 - Flags: approval-mozilla-aurora?
(In reply to Nicholas Nethercote [:njn] from comment #12)
> Comment on attachment 788801 [details] [diff] [review]
> (part 1) - ChildMemoryReporter should return mProcess in GetProcess() rather
> than an empty string.
> 
> [Approval Request Comment]
> Bug caused by (feature/regressing bug #): some refactoring patch some time
> ago.

I should add:  the creation of the thumbnails child process (bug 870100) exposed this latent defect.
Comment on attachment 788806 [details] [diff] [review]
(part 2) - Don't trigger about:memory re-generation when a child process closes.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): has always been latently present, but exposed by bug 870100.

User impact if declined: when the thumbnails process runs, if you have about:memory open it can refresh itself unexpectedly.

Testing completed (on m-c, etc.): on m-c for a few days.

Risk to taking this patch (and alternatives if risky): low.  The patch is simple.

String or IDL/UUID changes made by this patch: none.
Attachment #788806 - Flags: approval-mozilla-aurora?
Attachment #788801 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #788806 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I have tested this issue on FF 25b2 following STR from Comment 0. 

I didn't encounter any hang while testing but I am not sure if the steps I had used are reliable. After I opened about:newtab I can see a memory increse by refreshing about:memory tab, in the explicit memory section which means the thumbnails process are mesured.

Anthony, this is what this patch is expected to do? If it does please mark it as being verified, if not please provide me a way to test this behaviour.
Flags: needinfo?(anthony.s.hughes)
Mihai, how did you reproduce this issue on the build with the bug? What makes the steps seem unreliable to you?
Flags: needinfo?(anthony.s.hughes) → needinfo?(mihai.morar)
In reply to Comment 0

> This last event is very surprising for the user, and probably undesirable.  Hughman was also seeing 
> 7--10 second hangs, much longer than the 2 second pauses he saw after hitting "measure" in 
> about:memory.  He's on Windows, I wonder if the IPC is slow.

I was able to reproduce a short hang about 1-2 seconds, but not as bad as mentioned in Comment 0.

In reply to Comment 4
> (For example, if you do "Measure and save" in about:memory when the thumbnails process is running, 
> the thumbnails process doesn't get measured, because the child process feedback only works with 
> on-screen about:memory generation.  Ugh.)

After I opened about:newtab I can see a memory increse by refreshing about:memory tab, in the explicit memory section which means the thumbnails process are mesured.

On 25b2 I was not able to reproduce the short hang (1-2sec) mentioned above.

Ioana what I wanted to say with "I am not sure if the steps I had used are reliable" it was that if Nicholas said in Comment 4 that "the thumbnails process doesn't get measured" and my memory usage got increased I don't think I am able to fully reproduce this issue on 24 builds following those STR from Comment 0, that's why I didn't mark this bug as being verified, even if I got no hangs on FF 25b2.
Flags: needinfo?(mihai.morar)
Given the lack of a reliable reproduction I'm going to mark this as [qa-]. Nicholas, if there is something you are specifically concerned about testing then please let us know.
Keywords: verifyme
Whiteboard: [MemShrink] → [MemShrink][qa-]
You need to log in before you can comment on or make changes to this bug.