[META] Turn on vsync-aligned refresh driver by default on b2g

RESOLVED FIXED in Firefox 39

Status

()

defect
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: jerry, Assigned: mchang)

Tracking

(Depends on 1 bug)

unspecified
mozilla39
ARM
Gonk (Firefox OS)
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox37 wontfix, firefox38 wontfix, firefox39 fixed, b2g-v2.2 fixed, b2g-master fixed)

Details

Attachments

(10 attachments, 1 obsolete attachment)

We would like to turn on vsync-aligned refresh driver by default on b2g.

Here are the prefs in b2g/app/b2g.js
pref("gfx.vsync.hw-vsync.enabled", true);
pref("gfx.vsync.refreshdriver", true);

track any blocking bugs here.
Component: Widget: Gonk → Graphics
Blocks: 1125030
No longer blocks: 1125030
Depends on: 1125030
Depends on: 1125032
Depends on: 1125999
Depends on: 1102200, 1094760
Will land in two weeks - Passing try - https://treeherder.mozilla.org/#/jobs?repo=try&revision=f63c2dd48c42.
Attachment #8557377 - Flags: review?(bugmail.mozilla)
Comment on attachment 8557377 [details] [diff] [review]
Enable vsync refresh driver on b2g

Review of attachment 8557377 [details] [diff] [review]:
-----------------------------------------------------------------

The patch is just duplicating prefs that are already there two lines up.
Attachment #8557377 - Flags: review?(bugmail.mozilla) → review-
Ahh fail :/. Try again.
Assignee: hshih → mchang
Attachment #8557377 - Attachment is obsolete: true
Attachment #8557494 - Flags: review?(bugmail.mozilla)
Nice reduction in composite times all around. Sometimes see ~3-5 ms delays from vsync -> refresh driving ticking.
Looking through various profiles, including the attached ones as well as browsing the web + edge swipe detection, nothing stands out. As expected, composite times are shorter in many cases, refresh driver ticks are about the same amount of time as expected. The smoothness impact of the refresh driver isn't as high as touch resampling / compositing, which is also expected.

No major bugs. I could browse the web, do edge swipes, scroll, add contacts, pull down the notification bar, watch a video, watch youtube etc. Looking good!
This is the last part of Silk. Will land this next week if there are no Silk related fires that come up from the compositor / touch resampling.
Flags: needinfo?(npark)
Since it didn't look like they are user prefs, I modified b2g.js and rebuilt gecko to try it out.  I did not find any particular issues with the prefs turned on.
Flags: needinfo?(npark)
(In reply to No-Jun Park [:njpark] from comment #12)
> Since it didn't look like they are user prefs, I modified b2g.js and rebuilt
> gecko to try it out.  I did not find any particular issues with the prefs
> turned on.

Thanks for testing! They are still user prefs, but they are hidden since they aren't enabled anywhere yet. If you just manually push the pref, it should work.
Depends on: 1078184
Blocks: 1130678
Depends on: 1130680
Depends on: 1130681
From https://treeherder.mozilla.org/#/jobs?repo=try&revision=beb856c5c40a 

The reftest failures seem to have calmed down a bit, not sure it will stick though.
Note to sheriffs, if you see dom/broadcastchannel/tests/test_broadcastchannel_sharedWorker.html intermittent, it is bug 1130678 with a fix on mozilla-inbound.
sorry had to back this out for test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=1316833&repo=b2g-inbound - somehow this patch made the r14 test failures a little worse :(
https://hg.mozilla.org/mozilla-central/rev/0e2e7780755b
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Depends on: 1133786
See Also: → 1076820
Backed out for causing bug 1133786 and bug 1133865.
https://hg.mozilla.org/mozilla-central/rev/8f0354b995ea
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla38 → ---
Retriggers also point to this as the reason for bug 1123762 spiking like crazy. B2G R4 and R15 are much greener post-backout.
(mis-pasted bug number -- RyanVM really meant that this seemed to cause *bug 1019840* to spike like crazy.)
Depends on: 1134459
Depends on: 1019840
New try push from parent 993eb76a8bd6

New push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4ab87483cc5a

Both 1019840, 1134459 were bad tests / problems in layout. Will try again if these pushes look good.
Gah... Mochitest 9 seems perma-red although on b2g-inbound right now it's failing quite often. B2g ics debug 8 looks almost perma-orange, but it looks like it already spiked due to something else - https://bugzilla.mozilla.org/show_bug.cgi?id=949971#c65
Try with an updated m-c based on 0189941a3fd5
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4fd8a177b897

Also a try with the patch in bug 949971 to see if it fixes ICS debug 8 errors
https://treeherder.mozilla.org/#/jobs?repo=try&revision=28c126e05fc1
Depends on: 949971
The patch from bug 949971 seems to fix the ICS debug 8 errors. I spoke with Ryan on irc about Mochitest 9, which is a known problem. The ICS Opt C1 looks new, but seems to have recently spiked and is pretty orange/red on b2g-inbound. Will try to land this once the trees open.
https://hg.mozilla.org/mozilla-central/rev/afd91b997c2e
Status: REOPENED → RESOLVED
Closed: 5 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Hi Mason, this may be causing another set of regression.  

We highly encourage before landing in inbound that you work with QA and let us help test before landing in inbound/MC for high risk patches.
Flags: needinfo?(mchang)
See Also: → 1140978
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #31)
> Hi Mason, this may be causing another set of regression.  
> 
> We highly encourage before landing in inbound that you work with QA and let
> us help test before landing in inbound/MC for high risk patches.

No-Jun helped verify this with https://bugzilla.mozilla.org/show_bug.cgi?id=1123762#c12
Flags: needinfo?(mchang)
I did run the graphic sanity manual test suite after building the custom gecko with it, and did not see any crashes at the time. I'll check out what I need to add to the test if it turns out the fix for this bug is the culprit for the smoketest failure.
Backed out for smoketest failures: https://hg.mozilla.org/integration/b2g-inbound/rev/a75d14a0de35

See bug 1140978
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
https://hg.mozilla.org/mozilla-central/rev/a75d14a0de35
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Jayme, please check if this has been resolved.
Flags: needinfo?(jmercado)
Keywords: qawanted
Using a build provided by Mason I was unable to reproduce the crash issues seen before manually.  I was unable to run automation, which reproduced more consistently because the phone and terminal would freeze up when attempted.
Flags: needinfo?(jmercado) → needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
I didn't crash; I ended up getting a blackscreen trying to run marionette as well as when  trying to run the crystal skull.  I think the crystal skull might be a separate issue.

Having said that, I think python-marionette breaking would be a major issue.
Flags: needinfo?(mchang)
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #39)
> I didn't crash; I ended up getting a blackscreen trying to run marionette as
> well as when  trying to run the crystal skull.  I think the crystal skull
> might be a separate issue.
> 
> Having said that, I think python-marionette breaking would be a major issue.

Hmm, that seems odd. I wonder how much of that is just because I'm not building it the same way as nightly. Can you try with a fresh nightly and just enable the pref like you would any other pref? Flip the pref gfx.vsync.refreshdriver to true.

Or maybe this this cedar build? https://treeherder.mozilla.org/#/jobs?repo=cedar&revision=75173caa99c2 Not sure I did that right. Thanks!
Flags: needinfo?(mchang) → needinfo?(nhirata.bugzilla)
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #39)
> I didn't crash; I ended up getting a blackscreen trying to run marionette as
> well as when  trying to run the crystal skull.  I think the crystal skull
> might be a separate issue.
> 
> Having said that, I think python-marionette breaking would be a major issue.

OOH, I also forgot. Do nightly builds already set the developer option in settings to adb and devtools? I think my build by default sets that to disabled. I could not run the clock_test and hit the black screen until I make reset-gaia. After I enable Debugging via USB in the settings app, the tests started running.
Jayme, can you please go into settings and enable ADB debugging in the settings app, then try running the tests again with that custom build? Thanks!
Flags: needinfo?(jmercado)
Testing locally on an updated to build to see if these still works.
Flags: needinfo?(nhirata.bugzilla)
Flags: needinfo?(jmercado)
So even without enabling the vsync refresh driver, I get this error with a local build while trying to run the clock marionette test. This is a fresh repo sync on today's gaia and gecko. If I just flash the device then run gaiatest, I get a black screen and can't get up and running again until I flash again or make reset-gaia.

If I use PVT builds, without enabling the vsync refresh driver, the tests work. I reflashed again, manually enabled the preference with edit-prefs.sh, then ran the tests, the tests worked again. I think the black screen that QA is experiencing must be something with my local environment / build that differs from nightly builds.
Interestingly, if I flash my build, then do a make reset-gaia, everything works as expected. The tests run, the phone comes up, and the tests pass. 

@nhirata - Can you try that? Use the custom build I sent you, do a make reset-gaia on top of that (preferably Gaia nightly from Monday, March 9th), then run the tests? 

Alternatively, we can wait until we fix bug 1139090 and then try a nightly?
Flags: needinfo?(nhirata.bugzilla)
I ran this adb forward tcp:2828 tcp:2828 && gaiatest --testvars=testvars.json --address=localhost:2828 --timeout=30000 --type=B2G ./tests/python/gaia-ui-tests/gaiatest/tests/functional/clock/test_clock_create_new_alarm.py --restart --repeat=50, which tests the python test 50 times with the refresh driver enabled + a temp fix from bug 1139090 and it passed without any crashes. This was after doing a full flash then a make reset gaia due to the python unit test issues in comment 44.
(In reply to Mason Chang [:mchang] from comment #46)
> I ran this adb forward tcp:2828 tcp:2828 && gaiatest
> --testvars=testvars.json --address=localhost:2828 --timeout=30000 --type=B2G
> ./tests/python/gaia-ui-tests/gaiatest/tests/functional/clock/
> test_clock_create_new_alarm.py --restart --repeat=50, which tests the python
> test 50 times with the refresh driver enabled + a temp fix from bug 1139090
> and it passed without any crashes. This was after doing a full flash then a
> make reset gaia due to the python unit test issues in comment 44.

I can't run the test with today's gecko and gaia even with clean build from repo(no vsync-refresh driver). When the test run for several steps, I got:

  File "/usr/local/lib/python2.7/site-packages/marionette/marionette_test.py", line 267, in run
    testMethod()
  File "/Users/bignose/work/mozilla/gaia/gaia_bignose/tests/python/gaia-ui-tests/gaiatest/tests/functional/clock/test_clock_create_new_alarm.py", line 63, in test_clock_create_new_alarm
    self.assertEqual(alarms[0].label, alarm_label_text)
  File "/usr/local/lib/python2.7/site-packages/gaiatest/apps/clock/app.py", line 70, in label
    return self.root_element.find_element(*self._label_locator).text
  File "/usr/local/lib/python2.7/site-packages/marionette/marionette.py", line 58, in find_element
    return self.marionette.find_element(method, target, self.id)
  File "/usr/local/lib/python2.7/site-packages/marionette/marionette.py", line 1313, in find_element
    response = self._send_message('findElement', 'value', **kwargs)
  File "/usr/local/lib/python2.7/site-packages/marionette/decorators.py", line 36, in _
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/marionette/marionette.py", line 630, in _send_message
    self._handle_error(response)
  File "/usr/local/lib/python2.7/site-packages/marionette/marionette.py", line 678, in _handle_error
    raise errors.JavascriptException(message=message, status=status, stacktrace=stacktrace)

I use gdb to attach the clock app during testing. The clock is closed by the SIGKILL signal, and it seems to be killed when the message occurred. I think that's sent by marionette. I haven't traced the marionette script yet.
Hrm, not sure.  I was able to run it off of Mason's build and a fresh gaia.  Mason, can you update everything and then create a build with the patch?  I'm not sure everything was quite up to date.  I don't want to try running on a franken build either as there might be some fall outs like what Jerry mentioned.
Flags: needinfo?(nhirata.bugzilla) → needinfo?(mchang)
Posted patch refresh.patchSplinter Review
Hey Naoki, if you could please try this patch locally and run your tests, I'd greatly appreciate it. Thanks!

This enables the refresh driver plus the proposed fix in bug 1142935
Flags: needinfo?(mchang) → needinfo?(nhirata.bugzilla)
Now that bug 1142935 has landed, waiting on QA approval before trying to land again.
Since bug 1142935 has landed and should be in nightly pvt builds, can you please set the preference "gfx.vsync.refreshdriver" to true and test this again? Thanks!
Flags: needinfo?(jmercado)
Adding qawanted to track this task and working on it now.
Flags: needinfo?(jmercado)
Keywords: qawanted
I ran the gaiatest/tests/functional/browser/test_browser_navigation.py test 30 times and did not crash.  I also tested manually for about an hour with no crash.  Naoki had mentioned to me about crashing a couple of times after a reboot of the phone while testing manually, but I didn't see that with restarting 8 times throughout the hour of manual testing.

Environmental Variables:
Device: Flame 3.0 KK (319MB) (Full Flash)
BuildID: 20150318055750
Gaia: b8051d370ddf4e5bd8e7d8a19fb9eeb5fd6ffb39
Gecko: 41a61514461e
Gonk: b83fc73de7b64594cd74b33e498bf08332b5d87b
Version: 39.0a1 (3.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Keywords: qawanted
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
https://hg.mozilla.org/mozilla-central/rev/b4ff5f067537
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Performed manual and automated testing on today's build of b2g v2.2 for Flame-KK after turning on the vsync-aligned refresh driver.

Expected Results
b2g should not crash during testing.

Actual Results
1. b2g is sluggish but stable and does not crash during testing.
2. No Crash Reports exist on the Flame device in the directory /data/b2g/mozilla/Crash\ Reports/.


Manual Testing
1. Executed all 9 test cases in the Graphics Sanity Test Suite for b2g v2.2
https://moztrap.mozilla.org/manage/cases/?filter-suite=669 
Actual Results: Screen transitions appeared more sluggish than normal, but without any visual artifacts.

2. Installed the Loop client app using WebIDE and executed the following tests for Loop twice: connect, chat and disconnect. 
Actual Results: Screen transitions appeared more sluggish than normal, but without any visual artifacts.

3. Played the Marketplace game 'Cut The Rope' till Level 2. 
https://marketplace.firefox.com/app/cut-the-rope
Actual Results: Verified that its landscape orientation was preserved when switching periodically to Card View and back. Screen transitions appeared more sluggish than normal, but without any visual artifacts.

Automated Testing 
Executed the Browser Navigation test script 30 times on the Flame device.
https://wiki.mozilla.org/B2G/QA/Smoke_Tests#How_to_run_test_automation_locally
adb forward tcp:2828 tcp:2828 && gaiatest --testvars=gaiatest/testvars.json --address=localhost:2828 --timeout=30000 gaiatest/tests/functional/browser/test_browser_navigation.py  --restart --repeat=30
Actual Results: No failures were reported in the automated test execution results. The execution time for roughly 2 seconds per iteration.

*~*~*~*~*~*~*

Test Environment
1. Flashed the base build v18D available on MDN:
https://developer.mozilla.org/en-US/Firefox_OS/Phone_guide/Flame/Updating_your_Flame#Up-to-date_%28use_these_unless_you_have_a_good_reason_not_to%29
http://cds.w5v8t3u9.hwcdn.net/v18D.zip

2. Full flashed the latest build using the Taiwan QA flashing tool: (Device: flame-kk | Branch: mozilla-b2g37_v2_2 | Build Type: Engineer | Gecko/Gaia/Full: images | Build ID: latest).
Gaia-Rev        44c62060581fde8de1e12e94cf55e9673b401a47
Gecko-Rev       https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/0f1dbb995e1a
Build-ID        20150320162503
Version         37.0
Device-Name     flame
FW-Release      4.4.2
FW-Incremental  eng.cltbld.20150320.195314
FW-Date         Fri Mar 20 19:53:28 EDT 2015
Bootloader      L1TC000118D0

3. Turned on the vsync-aligned refresh driver by default on b2g by setting the prefs in b2g/omni.ja/defaults/pref/b2g.js.
pref("gfx.vsync.refreshdriver", true);
(Thanks to a bash script written by Naoki Hirata [:nhirata], attached here as change_gfx_refresh_vsync.sh.)

*~*~*~*~*~*~*

Steps to install the Loop client app using WebIDE
https://developer.mozilla.org/en-US/docs/Tools/WebIDE
1. On your Mac / Linux machine, execute the following commands:
$ mkdir loop
$ cd loop
$ git clone https://github.com/mozilla-b2g/firefoxos-loop-client.git
$ make all
2. Connect your b2g phone to your Mac / Linux machine and enable remote debugging from Settings > Developer.
3. On your Mac / Linux machine, launch Firefox browser and go to Tools > Web Developer > WebIDE.
4. In WebIDE, go to Open Packaged App and navigate to the loop/firefoxos-loop-client/app folder. Click on Open.
5. In WebIDE, click on the Select Runtime dropdown and click on Firefox OS (flame). On the Flame device, tap on the OK button to allow incoming connection.
6. In WebIDE, click on the Play button to install Firefox Hello on your Flame and then launch it.

Steps to execute the following tests for Loop: connect, chat and disconnect. 
https://wiki.mozilla.org/Loop/Try_Loop
1. On the Flame device, swipe left through the introductory screens.
2. Tap on Use Firefox Accounts button.
3. Sign in to Firefox Accounts with your email address and password.
4. Allow Firefox Hello access to your contact list.
5. Tap on the button to create a room. Give it a name and save it. Note the room URL on the Room detail page. Click on the back button.
6. On your Mac / Linux machine, click on the Hello icon in the Firefox browser toolbar.
7. Click on the Get Started button.
8. Click on the gear icon and then click on the Sign In option. Login to Firefox Accounts with the same credentials as above.
9. The room created on the Flame device will appear. Click on its URL to connect.
10. On the Flame device, tap on the room name and select the camera you want to use for this room. Grant Firefox Hello permission to share the camera and microphone. The message 'You are alone in this room' will appear.
11. On your Mac / Linux machine, the Hello icon in the Firefox browser toolbar turns blue. Copy the room link.
12. On your Mac / Linux machine, open another Firefox window (not tab) and paste the room link in the address bar. Hit enter. In Firefox Hello, click on Join the conversation. Grant Firefox Hello permission to share the camera and microphone.
13. On the Flame device, the message 'You are alone in this room' disappears and a guest is shown. A timer starts counting the call time.
14. Move the Flame device so that its camera captures different images. The same image should show in your Firefox browser. Speak aloud so that the Flame device's microphone captures sound. The same should come out of the speakers on your Mac / Linux machine.
15. On the Flame device, tap on the red button to disconnect the call.
16. On your Mac / Linux machine, the message 'You're the first one here.' appears.
Smoke test passed today; we're letting it bake on MC before landing on 2.2
Clearing NI from me as QA finished the testing and giving this a green light to land.
Flags: needinfo?(nhirata.bugzilla)
NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Project Silk
User impact if declined: Jankier animations and performance.
Testing completed: Manual testing, mochitests, running on master on OSX + b2g, testing in comment 56, manual testing by nhirata and JMercado.
Risk to taking this patch (and alternatives if risky): High, this changes when we tick the refresh driver, which controls when we do requestAnimationFrames and paint. 
String or UUID changes made by this patch: None
Attachment #8582564 - Flags: review+
Attachment #8582564 - Flags: approval-mozilla-b2g37?
Attachment #8582564 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Depends on: 1053902
You need to log in before you can comment on or make changes to this bug.