Closed Bug 1292218 Opened 3 years ago Closed 3 years ago

Do collector work more aggressively in reftests

Categories

(Testing :: Reftest, defect)

defect
Not set

Tracking

(firefox51 fixed)

RESOLVED WONTFIX
mozilla51
Tracking Status
firefox51 --- fixed

People

(Reporter: mccr8, Assigned: mccr8)

References

Details

Attachments

(2 files, 1 obsolete file)

Reftests rapidly open and close windows, which our GC/CC heuristics are not set up to deal with. We also don't want to just force a sync GC and CC because this reduces the amount of testing we get for incremental GC and CC. To address this, I added a window utils method runNextCollectorTimer(), which does a bit of incremental collector work (or sometimes starts one if none is in progress). Mochitests already call this once after every test, but each reftest is so much faster that I think it makes sense to call it more times in a row.

To measure the effectiveness of this, I ran the following, and measured for various numbers of times in a row running runNextCollectorTimer() after each test:
  time ./mach reftest layout/reftests/reftest-sanity/ 2>&1 | grep "\+\+DOMWINDOW" | sort

For each test run, the format is (# of peak windows, time to run test).

no runNextCollectorTimer(): (290, 18.1), (290, 18.1)
3 calls in a row: (133, 18.5), (133, 18.5), (141, 18.4)
4 calls in a row: (93, 18.8), (109, 18.4)
5 calls in a row: (103 windows, 18.9), (99, 18.4), (99, 18.3)

Running it 3 times in a row drops the peak from 290 windows to 133. 4 drops it further to 100. It doesn't seem to substantially increase the running time. I'm going to do some try runs and see how it affects a full reftest run, both in terms of peak windows and time.

Having many windows open at once will make memory usage worse, which could cause OOM crashes on 32-bit tests. Additionally, if you end up collecting hundreds of windows at once (which I have seen when running locally), this can take a while and cause a test to spuriously time out.

Bug 1110928 seems to increase the number of windows a bit, by collecting less aggressively, so I would like to improve the current situation so I can land that patch.
With my patches in bug Bug 1110928 and 4 in a row: (120, 18.5), (120, 18.2), (120, 18.2), (119, 20.6), (112, 18)

So, a little worse but still better than it currently is.
I looked at a full run for Linux 64 debug R1. Without my patch, there was a peak of about 471 windows. With it, it was 295. So, still high, but not as bad.

I calculated that by using a little script that creates some ASCII art representing how many windows are opened: https://github.com/amccreight/mochitest-logs/blob/master/bloatfinder.py

Terrence wanted to discuss this with me, so I'll needinfo him for that, so neither of us forget.
Flags: needinfo?(terrence)
Attached patch implement_MOZ_CCTIMER-v0.diff (obsolete) — Splinter Review
We have MOZ_GCTIMER for the GC, this adds the same for the CC.
Attachment #8778507 - Flags: review?(continuation)
Comment on attachment 8778507 [details] [diff] [review]
implement_MOZ_CCTIMER-v0.diff

I'll assume the new bug replaces this.
Attachment #8778507 - Attachment is obsolete: true
Attachment #8778507 - Flags: review?(continuation)
(In reply to Andrew McCreight [:mccr8] from comment #5)
> Comment on attachment 8778507 [details] [diff] [review]
> implement_MOZ_CCTIMER-v0.diff
> 
> I'll assume the new bug replaces this.

Whoops! I forgot that I had put the patch here Friday. It warrants a new bug regardless.
See Also: → 1296358
See Also: 1296358
Duplicate of this bug: 1296358
I talked to Terrence about this. His main issues was just that this does not fix the issue he is seeing. But I think we should just go ahead and land this, particularly since mchang is also seeing issues with the reftest bloat.

Mason, does this patch fix the problem you are seeing with your other patches?
Flags: needinfo?(terrence) → needinfo?(mchang)
Comment on attachment 8782554 [details]
Bug 1292218 - Run collectors more in reftests.

https://reviewboard.mozilla.org/r/72678/#review70346

some pending questions, possibly I just need more context.

::: layout/tools/reftest/reftest-content.js:694
(Diff revision 1)
>  
> +    // Do some collector work so that windows don't build up too much.
> +    windowUtils().runNextCollectorTimer();
> +    windowUtils().runNextCollectorTimer();
> +    windowUtils().runNextCollectorTimer();
> +    windowUtils().runNextCollectorTimer();

why do we need to call this 4 times?  And this looks to be called for every test.

Does this do work in both content and chrome processes?
Attachment #8782554 - Flags: review?(jmaher) → review-
Sorry, I should have explained more what this patch is trying to do.

(In reply to Joel Maher ( :jmaher) from comment #10)
> why do we need to call this 4 times?  And this looks to be called for every
> test.

Well, because when I only call it once it doesn't do that much. ;) I just tried out different numbers of times we call it and saw how it helped.

Each call should not take very long, something like 5ms to 40ms. What bug 1296358 was doing was running a full GC every 10 tests (which is a few hundred ms). What this patch does is instead take that full GC and break it up into little pieces, and we run a few pieces at the end of every test.

> Does this do work in both content and chrome processes?

Yes, this script is run in both, which I agree is surprising. :P
The reason that incremental GCs are better than full GCs is that for actual Firefox users they are mostly getting incremental GCs. (And incremental GCs are much trickier to get right.) My patch also runs the cycle collector in addition to the garbage collector, so it hopefully collects more.
(In reply to Andrew McCreight [:mccr8] from comment #11)
> Sorry, I should have explained more what this patch is trying to do.
> 
> (In reply to Joel Maher ( :jmaher) from comment #10)
> > why do we need to call this 4 times?  And this looks to be called for every
> > test.
> 
> Well, because when I only call it once it doesn't do that much. ;) I just
> tried out different numbers of times we call it and saw how it helped.
> 
> Each call should not take very long, something like 5ms to 40ms. What bug
> 1296358 was doing was running a full GC every 10 tests (which is a few
> hundred ms). What this patch does is instead take that full GC and break it
> up into little pieces, and we run a few pieces at the end of every test.
>

Please put this in a comment above.
 
> > Does this do work in both content and chrome processes?
> 
> Yes, this script is run in both, which I agree is surprising. :P

Please put this in a comment as well. 

(In reply to Andrew McCreight [:mccr8] from comment #9)
> I talked to Terrence about this. His main issues was just that this does not
> fix the issue he is seeing. But I think we should just go ahead and land
> this, particularly since mchang is also seeing issues with the reftest bloat.
> 
> Mason, does this patch fix the problem you are seeing with your other
> patches?

I'll find out in three days, but we can land this first and if not keep trying other things. Sigh, try times are forever right now :(.
Flags: needinfo?(mchang)
(In reply to Mason Chang [:mchang] from comment #13)
> Please put this in a comment above.

That's what "Do some collector work so that windows don't build up too much." was supposed to mean. I can spell things out a little more.

> Please put this in a comment as well. 

Sure, I can add a comment to the top of the file.
how will we know to tune the number of incremental gc calls as the project evolves?  Right now 4 calls does a great job in reducing the number of oustanding domwindows, but what if we need 6 calls in the future, would that seem silly?  is there some indication in the job results that would help us determine if we are getting out of balance.
(In reply to Joel Maher ( :jmaher) from comment #15)
> how will we know to tune the number of incremental gc calls as the project
> evolves?  Right now 4 calls does a great job in reducing the number of
> oustanding domwindows, but what if we need 6 calls in the future, would that
> seem silly?

No, 4 is just something that seemed to be a reasonable tradeoff between reducing memory usage and not increasing test time too much. It is possible some other value would be better now or in the future.

> is there some indication in the job results that would help us
> determine if we are getting out of balance.

As described in bug 1296358, the main indication is that we hit OOM crashes. A more minor symptom would be getting timeouts, because if we end up running a huge GC that frees a lot of things during a test, this can take a very long time, and the test will time out.
I updated my patch with more comments. Let me know if that improves things or not.
Comment on attachment 8782554 [details]
Bug 1292218 - Run collectors more in reftests.

https://reviewboard.mozilla.org/r/72678/#review70368

thanks for the update.
Attachment #8782554 - Flags: review?(jmaher) → review+
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0c135a70a81a
Run collectors more in reftests. r=jmaher
https://hg.mozilla.org/mozilla-central/rev/0c135a70a81a
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Depends on: 1297919
Feel free to back this out. I thought I'd checked to make sure it didn't affect the time, but I guess it does. It also didn't fix the problem I was trying to fix.
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/0b638e24de7e - dunno if that makes this reopened, or wontfix.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Thanks, and sorry for the trouble. Terrence landed bug 1296484 which should address the same issue in a different way. Also if something else goes horribly awry with timeouts, maybe that will be to blame. ;)
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → WONTFIX
Just an FYI, we were seeing lots of graphics reftest failures due to OOM with Skia on Windows w/o this patch. If those come back, please let me know. Thanks!
Yep, they're back.
The original skia problem is back due to the collector not being run as aggressively in the reftest suite and bug 1296484 wasn't enough. In bug 1297919, it looks like tests are taking too long to run. Is the proper solution here to take Joel's suggestion and chunk more or is there more GC work to be done? Thanks!
Ah, right, I forgot that my patch actually did do something useful for somebody.
What do those OOM failures look like? The only thing that I see happening on Windows reftests which isn't an already filed bug is failures in writing-mode/abspos/*, which is supposed to be disabled on Win7 by https://hg.mozilla.org/integration/mozilla-inbound/file/tip/layout/reftests/writing-mode/reftest.list#l176 but which your Skia landing somehow caused to start running on Win7 (and to then start failing after this was backed out). If that's what we're talking about, figuring out why they started running and stopping them again seems like a cheaper thing to do than to add 77 minutes per push on WinXP plus the time that this adds, when we have 22 hours of WinXP backlog right now.
And indeed, every Windows reftest failure on mozilla-inbound since I backed this out has been either the longstanding e10s startup hang bug 1267106 or that return of bug 1167155 in tests which are supposed to have been skipped on Win7 since last December.
(In reply to Phil Ringnalda (:philor) from comment #30)
> What do those OOM failures look like? The only thing that I see happening on
> Windows reftests which isn't an already filed bug is failures in
> writing-mode/abspos/*, which is supposed to be disabled on Win7 by
> https://hg.mozilla.org/integration/mozilla-inbound/file/tip/layout/reftests/
> writing-mode/reftest.list#l176 but which your Skia landing somehow caused to
> start running on Win7 (and to then start failing after this was backed out).
> If that's what we're talking about, figuring out why they started running
> and stopping them again seems like a cheaper thing to do than to add 77
> minutes per push on WinXP plus the time that this adds, when we have 22
> hours of WinXP backlog right now.

In the logs, we usually see this -
04:04:02     INFO - REFTEST TEST-LOAD | file:///c:/slave/test/build/tests/reftest/tests/layout/reftests/writing-mode/abspos/s71-abs-pos-non-replaced-vrl-024-ref.xht | 13286 / 13783 (96%)
04:04:02     INFO - [GFX1]: Failed 2 buffer db=00000000 dw=00000000 for 0, 0, 800, 1000
04:04:02     INFO - [GFX1]: Failed 2 buffer db=00000000 dw=00000000 for 0, 0, 800, 1000
04:04:02     INFO - REFTEST TEST-PASS | file:///c:/slave/test/build/tests/reftest/tests/layout/reftests/writing-mode/abspos/s71-abs-pos-non-replaced-vrl-024.xht == file:///c:/slave/test/build/tests/reftest/tests/layout/reftests/writing-mode/abspos/s71-abs-pos-non-replaced-vrl-024-ref.xht | image comparison
04:04:02     INFO - REFTEST TEST-END | file:///c:/slave/test/build/tests/reftest/tests/layout/reftests/writing-mode/abspos/s71-abs-pos-non-replaced-vrl-024.xht == file:///c:/slave/test/build/tests/reftest/tests/layout/reftests/writing-mode/abspos/s71-abs-pos-non-replaced-vrl-024-ref.xht
04:04:02     INFO - [GFX1]: Failed 2 buffer db=00000000 dw=00000000 for 0, 0, 800, 1000

The GFX failed 2 buffer means we failed to allocate memory for the texture here. I'll try taking a look at seeing why those tests aren't being skipped.
The tests that were actually being skipped were here [1]. The other tests were matching because any other fuzzy-if or condition would override the skip-if. Hoorah reftest syntax! This explicitly disables those tests in the reftest.list itself.

http://searchfox.org/mozilla-central/source/layout/reftests/writing-mode/abspos/reftest.list#104
Attachment #8786501 - Flags: review?(philringnalda)
Comment on attachment 8786501 [details] [diff] [review]
Skip abs-pos-non-replaced tests on Windows 7

Ah. Apparently that's actually a bug (when fuzzy-if was added it both should have been documented in https://dxr.mozilla.org/mozilla-central/source/layout/tools/reftest/README.txt#54 and shouldn't have been implemented to beat out skip), but in the meantime this should work around it.

r=me with a commit message saying it's for bug 1167155 rather than this one.
Attachment #8786501 - Flags: review?(philringnalda) → review+
See Also: → 1300355
You need to log in before you can comment on or make changes to this bug.