Closed Bug 1525314 Opened 5 years ago Closed 5 years ago

Get reftests and crashtests running on geckoview-qr

Categories

(Core :: Graphics: WebRender, enhancement, P3)

Other Branch
enhancement

RESOLVED FIXED
mozilla69
Tracking Status
firefox69 --- fixed

People

(Reporter: kats, Assigned: kats)

References

(Depends on 6 open bugs, Blocks 1 open bug)

Details

(Whiteboard: [gfx-noted][wr-amvp][wr-q2])

Attachments

(6 files)

We should get reftests running in automation for GeckoView with WebRender enabled. This bug tracks that work (and will likely turn into a metabug).

Link to a recent try run of reftests on GeckoView by gbrown (for my future reference): https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=a50af786c7dfe77cb535e2ab698c4efde81f23e5

Priority: -- → P3
Assignee: nobody → kats

I tried adding geckoview QR jobs:

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=a9309dc71b6a228434f270eee3441888cd516f3a

Looks like the emulator/AVD that we're using doesn't support GL ES 3.0 so we either need to upgrade that or find some other solution.

Specifically in the logcat I see this output:

04-03 15:30:57.533  1046  1046 I SurfaceFlinger: OpenGL ES informations:
04-03 15:30:57.533  1046  1046 I SurfaceFlinger: vendor    : Google (Google Inc.)
04-03 15:30:57.533  1046  1046 I SurfaceFlinger: renderer  : Android Emulator OpenGL ES Translator (Google SwiftShader)
04-03 15:30:57.533  1046  1046 I SurfaceFlinger: version   : OpenGL ES 2.0 (OpenGL ES 3.0 SwiftShader 4.0.0.1)
04-03 15:30:57.533  1046  1046 I SurfaceFlinger: extensions: GL_EXT_debug_marker GL_OES_EGL_image GL_OES_EGL_image_external GL_OES_depth24 GL_OES_depth32 GL_OES_element_index_uint GL_OES_texture_float GL_OES_texture_float_linear GL_OES_compressed_ETC1_RGB8_texture GL_OES_depth_texture GL_OES_texture_half_float GL_OES_texture_half_float_linear GL_OES_packed_depth_stencil GL_OES_standard_derivatives GL_OES_texture_npot GL_OES_rgb8_rgba8 ANDROID_EMU_CHECKSUM_HELPER_v1 GL_OES_vertex_array_object ANDROID_EMU_gles_max_version_2 
04-03 15:30:57.533  1046  1046 I SurfaceFlinger: GL_MAX_TEXTURE_SIZE = 8192
04-03 15:30:57.533  1046  1046 I SurfaceFlinger: GL_MAX_VIEWPORT_DIMS = 8192

and then when we start Gecko we get this:

04-03 15:32:01.680  2451  2467 D EGL_emulation: eglCreateContext: 0xe1584440: maj 2 min 0 rcv 2
04-03 15:32:01.680  2451  2467 D EGL_emulation: eglMakeCurrent: 0xe1584440: ver 2 0 (tinfo 0xe1593d70)
04-03 15:32:01.690  2451  2500 I Gecko   : [GFX1-]: Failed to create EGLConfig!
04-03 15:32:01.690  2451  2500 I Gecko   : [GFX1-]: Failed GL context creation for WebRender: 0x0
04-03 15:32:01.690  2451  2500 I Gecko   : [GFX1-]: Failed to create EGLConfig!
04-03 15:32:01.690  2451  2500 I Gecko   : [GFX1-]: Failed GL context creation for WebRender: 0x0
04-03 15:32:01.890  2451  2467 I Gecko   : 1554301921890	Marionette	TRACE	Received observer notification command-line-startup
04-03 15:32:01.900  2451  2467 W ResourceType: Too many attribute references, stopped at: 0x01010099
04-03 15:32:01.910  2451  2500 D         : HostConnection::get() New Host Connection established 0xcafc66c0, tid 2500
04-03 15:32:01.920  2451  2500 E EGL_emulation: eglCreateContext: EGL_BAD_CONFIG: no ES 3 support
04-03 15:32:01.920  2451  2500 E EGL_emulation: tid 2500: eglCreateContext(1404): error 0x3005 (EGL_BAD_CONFIG)
04-03 15:32:01.920  2451  2500 I Gecko   : [GFX1-]: Failed to create EGLContext!: 0x3005
04-03 15:32:01.920  2451  2500 I Gecko   : [GFX1-]: Failed GL context creation for WebRender: 0x0
04-03 15:32:01.920  2451  2500 I Gecko   : [GFX1-]: Failed to get shared GL context
04-03 15:32:01.920  2451  2500 E EGL_emulation: eglCreateContext: EGL_BAD_CONFIG: no ES 3 support
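For future try runs, a rough way to spot this failure in a saved logcat is to scan for the eglCreateContext error lines. This is only a sketch: the function name and regular expression are my own, and they assume the logcat line format shown above.

```python
import re

# Matches lines such as:
#   "... E EGL_emulation: eglCreateContext: EGL_BAD_CONFIG: no ES 3 support"
# The pattern is an assumption based on the logcat excerpt in this bug.
EGL_ERROR = re.compile(r"E EGL_emulation: eglCreateContext: (EGL_\w+): (.+)$")


def egl_context_errors(logcat: str):
    """Return (error_code, message) pairs for failed eglCreateContext calls."""
    errors = []
    for line in logcat.splitlines():
        match = EGL_ERROR.search(line)
        if match:
            errors.append(match.groups())
    return errors
```

Any `EGL_BAD_CONFIG` entry with the message "no ES 3 support" indicates the emulator is still only exposing GL ES 2.0.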

:aerickson -- Can you investigate? I found, but did not verify, this tip: https://stackoverflow.com/questions/40797975/android-emulator-and-opengl-es3-egl-bad-config. I would think the existing emulator and avd would not need an upgrade for this, but who knows?

See Also: → geckoview_reftests
Flags: needinfo?(aerickson)

(In reply to Geoff Brown [:gbrown] from comment #4)

I would think the existing emulator and avd would not need an upgrade for this

Actually, our deployed emulator is version 27.3.10, and https://developer.android.com/studio/releases/emulator indicates there were EGL improvements in version 28.0.16, so maybe an emulator (SDK) update is the first thing to try.

Is this something I can try? It looks like we get the emulator and AVDs out of tooltool but I'm not sure how to go about testing an updated version.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #6)

Is this something I can try?

Without hacking to get around tooltool, you'd need to create an updated emulator/sdk archive, upload to tooltool, then update the manifest in a try push. Maybe best to leave it to Andrew?

Depends on: 1541955

Yes, I will investigate. I've created Bug 1541955 for tracking.

Flags: needinfo?(aerickson)

I tried using the patch on bug 1541955 and I get the same results. I wonder if there's something else we need to do (e.g. passing additional flags to the emulator) to enable ES 3. I'll try experimenting locally.

Even with emulator 28.0.23 installed locally, it looks like I still need to add GLESDynamicVersion = on to the ~/.android/advancedFeatures.ini file in order to get GL ES 3. According to this thread they are whitelisting host GPUs and so presumably my host GPU (and whatever we're using in automation) is not whitelisted.
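For anyone reproducing this locally, here is a small sketch of adding that setting idempotently. The helper name is made up for illustration, and it assumes the file is a flat list of `Key = value` lines, which is all I've seen in it.

```python
from pathlib import Path


def enable_gles_dynamic_version(ini_path: Path) -> bool:
    """Ensure 'GLESDynamicVersion = on' is present; return True if the file changed."""
    lines = ini_path.read_text().splitlines() if ini_path.exists() else []
    kept = []
    already_on = False
    for line in lines:
        key, sep, value = line.partition("=")
        if sep and key.strip() == "GLESDynamicVersion":
            # Keep the first "on" entry; drop duplicates and conflicting values.
            if value.strip() == "on" and not already_on:
                already_on = True
                kept.append(line)
            continue
        kept.append(line)
    if already_on and kept == lines:
        return False  # nothing to do
    if not already_on:
        kept.append("GLESDynamicVersion = on")
    ini_path.parent.mkdir(parents=True, exist_ok=True)
    ini_path.write_text("\n".join(kept) + "\n")
    return True
```

Calling it with `Path.home() / ".android" / "advancedFeatures.ini"` targets the file mentioned above; the emulator presumably needs a restart to pick up the change.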

With the GLESDynamicVersion setting added in automation via bug 1541955, the reftests are running, but a lot of them are failing, including some of the sanity tests. So there's some work to do to investigate why that's happening. A lot of stuff is rendering with a black background on all/most of the page, and I'm not entirely sure why.

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=34bc5f975936c9308a4819796db93c3e23036883

Whiteboard: [gfx-noted] → [gfx-noted][wr-amvp][wr-q2]

Quick update: I'm still seeing a lot of black stuff when running in the emulator. Getting a WR capture (using an x86_64 Android build, because of bug 1546516) didn't show the problem - the capture rendered fine on desktop. That's not totally surprising, but it eliminates the geckoview test app as a source of the problem. More likely the problem is in WR or the GLES implementation in the emulator.

To narrow this down I tried running reftests on a Pixel 2 device (I had a detour to root it). All the snapshots were coming out blank, for which I filed bug 1547097.

I got the reftests running on a Pixel 2 device, but was seeing intermittent failures from nondeterminism somewhere in the pipeline. I was just running the reftest-sanity suite, and getting ~15 failures, mostly to do with text rendering. I looked at one of the simpler ones and found that it would fail intermittently, even if I ran it as == div.html div.html, which should always render exactly the same. This obviously points to nondeterminism in the rendering, but I wasn't sure if it was in WR code or in the Pixel 2 graphics driver/stack.
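The self-comparison check boils down to rendering the same page repeatedly and counting how many distinct results come back. A minimal sketch of that idea, where `render` is a hypothetical callable returning raw snapshot bytes (not part of the actual reftest harness):

```python
import hashlib
from typing import Callable


def distinct_renderings(render: Callable[[], bytes], runs: int = 10) -> int:
    """Render `runs` times and return the number of distinct snapshots seen.

    A deterministic pipeline always yields 1; anything higher means the
    output varies between runs of the same input.
    """
    digests = {hashlib.sha256(render()).hexdigest() for _ in range(runs)}
    return len(digests)
```

This only says whether the output varies, not where in the pipeline the variation comes from.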

After that I tried to get the wrench reftests running on the emulator in the hopes that it would help narrow down problems. After some fiddling I got those working. I ran into bug 1547833 which is easy to work around for now, but am also running into a problem where tex_sub_image_3d_pbo is returning a GL error 0x502. This affects multiple tests in the wrench/reftests/image/ directory. I tried an emulator image based on Android 9 in the hopes that a newer implementation wouldn't have this problem but it still did.

Other than the tex_sub_image_3d_pbo problem I ran into bug 1548092, bug 1548099, bug 1548131, and another assertion failure due to a reftest being too wide for the Pixel 2 screen dimensions that I'm running with. There were also a bunch of test failures that I haven't yet looked at.

There were also a bunch of test failures that I haven't yet looked at.

I suspect a lot of these are because I'm not running in headless mode. Doing that involves cross-compiling osmesa for android which I'll probably have to do eventually but it was really painful with macOS so I'm procrastinating having to tackle that.

Depends on: 1549776
See Also: 1549776
No longer blocks: wr-android-mvp

Unwinding the stack here a bit...

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #15)

I suspect a lot of these are because I'm not running in headless mode. Doing that involves cross-compiling osmesa for android which I'll probably have to do eventually but it was really painful with macOS so I'm procrastinating having to tackle that.

We decided not to do this, and instead just annotate the failures. I have wrench reftests running in CI now on the emulator, and patches are up in bug 1555479 to run them on a Pixel 2 device in CI as well. So far I have NOT seen any evidence of nondeterminism in the results.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #13)

I got the reftests running on a Pixel 2 device, but was seeing intermittent failures from nondeterminism somewhere in the pipeline.

I'll try this again and see if I can still reproduce the nondeterminism. If there's no nondeterminism in the wrench reftests, but there is in the gecko reftests, then that's going to be tricky to deal with. I already verified that the display lists emitted by gecko for the intermittent reftests are deterministic, so the nondeterminism must be coming from some sort of complex interaction between different parts.

Another update: we now have wrench reftests running in CI on both emulator and device (Pixel 2 running Android 8). So far no evidence of nondeterminism there. I also just pushed the land button for gecko reftests on non-WR geckoview in the emulator (in bug 1501582) which will serve as a baseline for the corresponding WR-enabled reftests. I did find a bunch of nondeterminism and I don't know which component is producing that.

Anyway, I'll do another try run with gecko reftests on WR-enabled geckoview and see what it looks like now.

Depends on: 1559957
Depends on: 1559958

The nondeterminism when running on a Pixel 2 is quite troublesome. I've done a few rounds of annotations and try pushes and I'm still getting lots of fuzzy failures: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&selectedJob=252314928&revision=338fba58c190d5ae04d4c2c89b1aea9ffe80786d

I tried modifying the reftest harness for the webrender && geckoview case to just eat maxDifference values of 1, and that seems to work better. Instead of annotating a bazillion tests that are constantly shifting I just need to annotate a much smaller mostly-constant set.

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=f52cc2c0241aadfacf7aae7de150f80b62440a5d is looking much better. Still a few random intermittents but mostly now just hitting the crasher bugs that are marked as deps of this one.

Depends on: 1560367

Latest try push is at https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&tier=1%2C2%2C3&revision=d4bc947ea7f3291625cdb029c6c893c1537af929 and has the patches rebased on top of bug 1558598. I'm tempted to increase the autofuzz from 1 to 2 because I'm still getting a trickle of intermittents with maxDifference=2. Anyway I'll wait for some of the dependencies to land while I try and debug the crash in bug 1560367.

Depends on: 1563013
Summary: Get reftests running on geckoview-qr → Get reftests and crashtests running on geckoview-qr
Depends on: 1563020
Depends on: 1563214

Due to the sheer number of tests that exhibit a random fuzz with maxDifference=1
and maxDifference=2 with WR on Android, it's easier to just tweak the harness
to autofuzz these away. This adds machinery to do so, and also adds a new
annotation that can be used to disable the autofuzzing on specific tests.
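As a rough illustration of the idea (not the actual harness code, which is JavaScript), the comparison can be sketched as follows. It assumes maxDifference means the largest per-channel difference between corresponding pixels; the function names and the `autofuzz` flag, which stands in for the new opt-out annotation, are hypothetical.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int, int]  # RGBA, 0-255 per channel

# Threshold taken from the patch title "maxDifference<=2".
AUTOFUZZ_MAX_DIFFERENCE = 2


def compare_snapshots(a: Sequence[Pixel], b: Sequence[Pixel]) -> Tuple[int, int]:
    """Return (max_difference, differing_pixels) for two same-sized snapshots."""
    if len(a) != len(b):
        raise ValueError("snapshots must have the same number of pixels")
    max_diff = 0
    differing = 0
    for pixel_a, pixel_b in zip(a, b):
        diff = max(abs(ca - cb) for ca, cb in zip(pixel_a, pixel_b))
        if diff:
            differing += 1
            max_diff = max(max_diff, diff)
    return max_diff, differing


def snapshots_match(a: Sequence[Pixel], b: Sequence[Pixel], autofuzz: bool = True) -> bool:
    """Treat small differences as a match unless autofuzz is disabled for the test."""
    max_diff, _ = compare_snapshots(a, b)
    allowed = AUTOFUZZ_MAX_DIFFERENCE if autofuzz else 0
    return max_diff <= allowed
```

Tests that must match exactly would be run with the equivalent of `autofuzz=False`, which is what the new annotation is for.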

Depends on D36794

Only enabled on try/m-c as tier-2 for now, per email discussion, to minimize
load on bitbar Pixel 2 devices.

Depends on D36799

Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/28f52fd3934e
Disable tile markers on Android as they seem to crash a lot. r=gw
https://hg.mozilla.org/integration/autoland/rev/95790a07a93c
Auto-fuzz WR on Android with maxDifference<=2. r=jmaher
https://hg.mozilla.org/integration/autoland/rev/02399933ac4b
Skip assertion intermittently failing on Android. r=aosmond
https://hg.mozilla.org/integration/autoland/rev/ab21a3ff4ae4
Update reftest annotations for WebRender on GeckoView. r=gbrown
https://hg.mozilla.org/integration/autoland/rev/af72d1c4c107
Disable tests that crash. r=gbrown
https://hg.mozilla.org/integration/autoland/rev/0ed2509b7191
Run gecko reftests for WebRender on pixel 2. r=gbrown

I had a typo, webrender&&!webrender instead of webrender&&!geckoview. Whoops.
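To spell out why the typo mattered: the mistyped condition can never be true, so whatever it was guarding never applied. A quick sketch enumerating the four configurations (plain Python standing in for the manifest condition syntax; the function names are mine):

```python
from itertools import product


def typo_condition(webrender: bool, geckoview: bool) -> bool:
    # webrender&&!webrender: a contradiction, false for every configuration.
    return webrender and not webrender


def intended_condition(webrender: bool, geckoview: bool) -> bool:
    # webrender&&!geckoview: WebRender everywhere except GeckoView.
    return webrender and not geckoview


def matching_configs(condition):
    """List the (webrender, geckoview) combinations where the condition holds."""
    return [
        (wr, gv)
        for wr, gv in product([False, True], repeat=2)
        if condition(wr, gv)
    ]
```

Here `matching_configs(typo_condition)` is empty, while the intended condition matches only WebRender without GeckoView.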

Flags: needinfo?(kats)
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7032c413182e
Disable tile markers on Android as they seem to crash a lot. r=gw
https://hg.mozilla.org/integration/autoland/rev/4f43e8655fef
Auto-fuzz WR on Android with maxDifference<=2. r=jmaher
https://hg.mozilla.org/integration/autoland/rev/b9b49a1f5e97
Skip assertion intermittently failing on Android. r=aosmond
https://hg.mozilla.org/integration/autoland/rev/065c8eee9249
Update reftest annotations for WebRender on GeckoView. r=gbrown
https://hg.mozilla.org/integration/autoland/rev/4c912cace666
Disable tests that crash. r=gbrown
https://hg.mozilla.org/integration/autoland/rev/a1666a9348ce
Run gecko reftests for WebRender on pixel 2. r=gbrown
Depends on: 1563737
See Also: → 1590805