Closed Bug 1369706 Opened 4 years ago Closed 3 years ago

qr Crash [@ swrast_dri.so@0x6b143a]

Categories

(Core :: Graphics: WebRender, defect, P3)

55 Branch
x86_64
Linux
defect

Tracking

RESOLVED WORKSFORME
Tracking Status
firefox-esr52 --- unaffected
firefox56 --- disabled
firefox57 --- disabled
firefox58 --- disabled
firefox59 --- disabled
firefox60 --- ?

People

(Reporter: bc, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [investigation waiting for WebRender to be enabled])

Attachments

(3 files)

Attached file crash report
1. export MOZ_WEBRENDER=1
2. http://www.sccb.ac.uk/courses/business-finance
3. Crash
 debug

Thread 17 (crashed)
 0  swrast_dri.so + 0x6b143a
    rax = 0xe5e5e5e5e5e5e5e5   rdx = 0x00007f881271bc20
    rcx = 0x0000000000000e00   rbx = 0x0000000000000000
    rsi = 0x0000000000000001   rdi = 0x00007f8818679000
    rbp = 0x0000000000000000   rsp = 0x00007f882115ba20
     r8 = 0x000000000000000f    r9 = 0x000000000000002b
    r10 = 0x0000000000000011   r11 = 0x0000000000000400
    r12 = 0x00007f88186765e0   r13 = 0x00007f88186ebf80
    r14 = 0x00007f8818679000   r15 = 0x00007f88186ebfa8
    rip = 0x00007f8817f7d43a
    Found by: given as instruction pointer in context
 1  swrast_dri.so + 0x6b171b
    rsp = 0x00007f882115ba70   rip = 0x00007f8817f7d71b
    Found by: stack scanning
 2  swrast_dri.so + 0x6b317e
    rsp = 0x00007f882115baa0   rip = 0x00007f8817f7f17e
    Found by: stack scanning
 3  swrast_dri.so + 0x6a52e8
    rsp = 0x00007f882115bae0   rip = 0x00007f8817f712e8
    Found by: stack scanning
 4  swrast_dri.so + 0x1d574b
    rsp = 0x00007f882115bb00   rip = 0x00007f8817aa174b
    Found by: stack scanning
 5  libxul.so!webrender::device::GpuFrameProfile<webrender::profiler::GpuProfileTag>::add_marker_gl<webrender::profiler::GpuProfileTag> [device.rs:96b243f22677 : 556 + 0xb]
    rsp = 0x00007f882115bb20   rip = 0x00007f8834359c58
    Found by: stack scanning

opt

Thread 18 (crashed)
 0  swrast_dri.so + 0x6b143a
    rax = 0x0000000100000002   rdx = 0x00007fc812a20220
    rcx = 0x000000000000d400   rbx = 0x0000000000000000
    rsi = 0x0000000000000001   rdi = 0x00007fc813002000
    rbp = 0x0000000000000001   rsp = 0x00007fc82155bad0
     r8 = 0x000000000000000f    r9 = 0x00007fc83ff01b68
    r10 = 0x00007fc812a00360   r11 = 0x0000000000000400
    r12 = 0x00007fc819eb55e0   r13 = 0x00007fc804f2ff80
    r14 = 0x00007fc813002000   r15 = 0x00007fc8130737c0
    rip = 0x00007fc81829743a
    Found by: given as instruction pointer in context
 1  swrast_dri.so + 0x6b171b
    rsp = 0x00007fc82155bb20   rip = 0x00007fc81829771b
    Found by: stack scanning
 2  swrast_dri.so + 0x6b317e
    rsp = 0x00007fc82155bb50   rip = 0x00007fc81829917e
    Found by: stack scanning
 3  swrast_dri.so + 0x6a52e8
    rsp = 0x00007fc82155bb90   rip = 0x00007fc81828b2e8
    Found by: stack scanning
 4  swrast_dri.so + 0x1d574b
    rsp = 0x00007fc82155bbb0   rip = 0x00007fc817dbb74b
    Found by: stack scanning
 5  libxul.so!webrender::device::GpuProfiler<webrender::profiler::GpuProfileTag>::add_marker<webrender::profiler::GpuProfileTag> [device.rs:96b243f22677 : 556 + 0xb]
    rsp = 0x00007fc82155bbd0   rip = 0x00007fc8340d9e27
    Found by: stack scanning
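
As a sanity check on the two stacks above, subtracting each frame's module offset from its rip should give the load address of swrast_dri.so, and that address should be the same for every frame in a given process. The sketch below does that arithmetic with values copied from the stacks. The names `DEBUG_FRAMES`, `OPT_FRAMES` and `module_bases` are made up for illustration. Frames 1 to 4 were recovered by stack scanning, so agreement here only confirms the addresses fall inside the same mapping, not that the call chain is real.

```python
# (offset within swrast_dri.so, rip) pairs copied from the debug stack above.
DEBUG_FRAMES = [
    (0x6B143A, 0x00007F8817F7D43A),
    (0x6B171B, 0x00007F8817F7D71B),
    (0x6B317E, 0x00007F8817F7F17E),
    (0x6A52E8, 0x00007F8817F712E8),
    (0x1D574B, 0x00007F8817AA174B),
]

# Same five frames from the opt stack above.
OPT_FRAMES = [
    (0x6B143A, 0x00007FC81829743A),
    (0x6B171B, 0x00007FC81829771B),
    (0x6B317E, 0x00007FC81829917E),
    (0x6A52E8, 0x00007FC81828B2E8),
    (0x1D574B, 0x00007FC817DBB74B),
]


def module_bases(frames):
    """Return the set of load addresses implied by the frames (rip - offset)."""
    return {rip - offset for offset, rip in frames}


if __name__ == "__main__":
    for label, frames in (("debug", DEBUG_FRAMES), ("opt", OPT_FRAMES)):
        bases = sorted(hex(base) for base in module_bases(frames))
        print(label, bases)
```

Each build yields a single base address, and both builds crash at the same offset (0x6b143a), which is consistent with the same instruction in swrast_dri.so faulting in both cases.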

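Marked security-sensitive (ss) due to rax = 0xe5e5e5e5e5e5e5e5 in the debug build. That value looks like the poison pattern written over freed memory, which suggests a use-after-free.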
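
For triage, a register value can be checked mechanically against a poison pattern. This is a minimal sketch, assuming the allocator fills freed memory with the byte 0xe5 (the thread itself does not state this). The names `FREE_POISON_BYTE` and `looks_poisoned` are hypothetical.

```python
# Assumption: freed memory is overwritten with this byte by the allocator.
FREE_POISON_BYTE = 0xE5


def looks_poisoned(value, poison_byte=FREE_POISON_BYTE, width=8):
    """Return True if every byte of a width-byte register value is the poison byte."""
    pattern = int.from_bytes(bytes([poison_byte]) * width, "little")
    return value == pattern


if __name__ == "__main__":
    # rax values copied from the debug and opt crash reports above.
    print(looks_poisoned(0xE5E5E5E5E5E5E5E5))
    print(looks_poisoned(0x0000000100000002))
```

Only the debug build's rax matches the pattern. The opt build's rax (0x0000000100000002) does not, so the opt crash by itself would not have flagged freed memory.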

This is a top crash in bughunter, with over 170 urls in one day.

See also https://crash-stats.mozilla.com/search/?signature=~swrast_dr&date=%3E%3D2017-05-25T19%3A28%3A00.000Z&date=%3C2017-06-01T19%3A28%3A00.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

https://bugzilla.mozilla.org/show_bug.cgi?id=1186668
See Also: → 899802
For clarity, this was on Ubuntu, which is the only platform where I can test qr builds, and it was only reproducible on qr builds, not the normal builds.
Group: core-security → layout-core-security
Adding some cc love since you may not have been able to see this.
(In reply to Bob Clary [:bc:] from comment #2)
> This isn't *quite* 100% qr. The counts so far today are 242 urls and crash
> counts:
> 
> nightly debug:      Linux 4.4.0 x86 64/64 2   
> nightly debug-qr:   Linux 4.4.0 x86 64/64 267   
> nightly opt:        Linux 4.4.0 x86 64/64 1   
> nightly opt-qr:     Linux 4.4.0 x86 64/64 259   
> 
> so, overwhelmingly qr but there is a hint of regular builds in there as well.

Just to be clear, these are all running inside bughunter? i.e. the non-qr builds definitely don't have qr enabled? (as opposed to this being data from crash-stats where people might have turned on webrender and are running it in the wild).
Component: Canvas: WebGL → Graphics: WebRender
OS: Unspecified → Linux
Hardware: Unspecified → x86_64
These are all inside bughunter. Crap! I was trying to reproduce a crash on one of the vms and set MOZ_WEBRENDER in the terminal and forgot to either unset it or go back to the original terminal where it wasn't set. I've fixed that now.

So, ignore comment 2. Those were with webrender as well. Sorry.
I fixed the issue in the database and it no longer contains improperly marked qr tests so my mistake in comment 2 will not persist.
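
One way to avoid this kind of mix-up is to build the environment for each run explicitly instead of inheriting whatever the terminal has set. The following is a minimal sketch of that idea, not how bughunter actually launches builds. `build_env` is a hypothetical helper, and MOZ_WEBRENDER is the only variable taken from this bug.

```python
import os


def build_env(enable_webrender, base_env=None):
    """Return a copy of the environment with MOZ_WEBRENDER set only on request.

    Any inherited MOZ_WEBRENDER is dropped first, so a value left over in the
    calling shell cannot leak into a run that is meant to be non-qr.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.pop("MOZ_WEBRENDER", None)
    if enable_webrender:
        env["MOZ_WEBRENDER"] = "1"
    return env


if __name__ == "__main__":
    leaked = {"PATH": "/usr/bin", "MOZ_WEBRENDER": "1"}
    print("MOZ_WEBRENDER" in build_env(False, leaked))
    print(build_env(True, leaked)["MOZ_WEBRENDER"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting the browser, so each test records exactly which configuration it ran with.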
I wasn't able to reproduce by running an m-c build with MOZ_WEBRENDER=1 and loading the URL provided; the page loads fine for me. This was on Ubuntu 16.04.2 LTS, with WebRender reporting OpenGL version 3.3 (Core Profile) Mesa 12.0.6 as the GL stack. I used default options, so HW acceleration is disabled.
So far Bughunter has hit this 1202 times on 508 urls. I tried to reproduce manually with the top url but also failed. The last crash we saw was last night when the current set of urls was completed. I'll retest the urls with today's builds and see how reproducible it is now. Attaching the about:support for one of the ubuntu vms.

One thing that stands out in the urls is they all seem to have a very long query string with escaped bytes.
I don't think the URL ever makes it near the WebRender code, so if all the crashes have that as a common factor, it might point to a problem in the URL parsing code that's corrupting something somewhere. Regardless, getting a reproducible case, even if it's intermittent, would help.
The resubmitted urls reproduced in production but attempting to manually reproduce has not been successful so far. I went back and tried to get the original urls from Socorro to test manually and that has been a bust as well. I do escape urls before loading them and also have to truncate them to 1000 characters due to database limitations. I did notice during manual testing that the console showed url decoding errors. I'll try to revisit when I have more time later today/tonight.
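
Escaping a url and then cutting it at a fixed length can split a percent-escape in half, which is one possible source of decoding errors. Whether that is what happened here is not confirmed. The sketch below only illustrates the effect: the 1000 character limit comes from the comment above, and every function name is hypothetical.

```python
import re
from urllib.parse import quote

MAX_URL_LENGTH = 1000  # database limit mentioned in the comment above

# A '%' that is not followed by two hex digits is an incomplete escape.
_BROKEN_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def escape_and_truncate(url, limit=MAX_URL_LENGTH):
    """Escape the url, then cut it at the limit without regard to escapes."""
    return quote(url, safe=":/?&=")[:limit]


def has_broken_escape(escaped):
    """Return True if the escaped url contains an incomplete percent-escape."""
    return _BROKEN_ESCAPE.search(escaped) is not None


def truncate_safely(escaped, limit=MAX_URL_LENGTH):
    """Cut at the limit, then drop a trailing escape that did not fit."""
    cut = escaped[:limit]
    tail = cut[-2:]
    pos = tail.find("%")
    if pos != -1:
        # A '%' in the last two positions starts an escape that was cut off.
        cut = cut[: len(cut) - len(tail) + pos]
    return cut


if __name__ == "__main__":
    escaped = "a" * 998 + "%41"
    print(has_broken_escape(escaped[:MAX_URL_LENGTH]))
    print(has_broken_escape(truncate_safely(escaped)))
```

This only handles a single escape split at the cut. A multi-byte UTF-8 character spans several escapes, so a cut between two complete escapes can still leave an invalid byte sequence.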
Attached file command line
1. Install https://bclary.com/projects/spider/spider-0.1.0.5-an+fn+fx+sm+tb.xpi into profile firefox-nightly-qr-profile
2. export MOZ_WEBRENDER=1
3. Load Firefox/Spider from command line.
4. Crash.

Appears to require both Spider and export MOZ_WEBRENDER=1

Launching Spider without a url and then pasting the url into the inputs and running does not reproduce. This requires loading from the command line. The command line handler in Spider is located at https://hg.mozilla.org/automation/sisyphus/file/tip/spider/components/spider-cmdline.js
Group: layout-core-security → gfx-core-security
kats, can you take another look? Comment 11 has STR.
Flags: needinfo?(bugmail)
I'm still not able to reproduce. I used a local build, here's what I did:

1. ./mach build
2. ./mach run
3. In the running Firefox, install the addon from https://bclary.com/projects/spider/spider-0.1.0.5-an+fn+fx+sm+tb.xpi (it prompts to restart, just cancel)
4. Close firefox
5. MOZ_WEBRENDER=1 ./mach run http://www.sccb.ac.uk/courses/business-finance

The page loaded fine, no crash. Am I doing something wrong? Your STR in comment 11 weren't very clear with respect to if/when the page is loaded and how you're loading it, or if the crash happens just upon running firefox with the addon installed.
Flags: needinfo?(bugmail)
I tried that. I get a "Spider" window with a bunch of options. It seems to be running something because in the console I see some mixed content message warnings from the page. Eventually it finishes and shuts down. I don't see any crash.

We're both using the same version of Mesa (at least according to the about:support you posted) so I'm not really sure what the difference between our setups is :/
This has been idle for almost a month. Can you work with each other to get this reproducing?
Is this strictly necessary, kats, or could you start working with the backtrace instead?
Flags: needinfo?(bugmail)
At the moment this bug is not really a high priority because (a) webrender is not enabled by default on any channel, and is not something we are shipping to users, (b) the crash stack seems to be rooted in swrast_dri.so which is part of libGL, not part of firefox, so it's more likely to be a bug in libGL.

I guess the next step here would be to try and reproduce under valgrind and see if that points to possible causes. If it's a bug in libGL we should probably report it to them.
Flags: needinfo?(bugmail)
I don't think fixing this is necessary before enabling on nightly Windows. We should figure out what's going on before we ship though.
Blocks: stage-wr-trains
No longer blocks: stage-wr-nightly
Are 56 and 57 really "unaffected", meaning this got fixed somehow? Or did you mean "disabled"? (ESR-52 is definitely "unaffected" in any case)
Flags: needinfo?(milan)
I don't know what the difference between those two is :)  WebRender is off by default and not supported in 56 and 57 (and 58, for that matter.)  Is that unaffected or disabled?
Flags: needinfo?(milan)
Note also that webrender *cannot* be enabled on non-nightly because we conditionally compile the (rust) code for the nightly channel only. So to me "unaffected" seems more appropriate than "disabled" for anything that's not nightly.
(In reply to Milan Sreckovic [:milan] (away 10/19-10/20) from comment #20)
> I don't know what the difference between those two is :)  WebRender is off by default and
> not supported in 56 and 57 (and 58, for that matter.)  Is that unaffected or disabled?

The code and the bug are in the tree and would affect anyone who built that configuration, but it's not how we're currently shipping. That's "disabled". "unaffected" would mean the bug doesn't exist, but it was found in those versions so it does.
Hi Milan:

I have assigned these security bugs to you to reassign them to appropriate developers in your team to investigate and fix them.

Thanks!

Wennie
Assignee: nobody → milan
We're going to revisit this once we're close to enabling WebRender by default.
Blocks: stage-wr-nightly
No longer blocks: stage-wr-trains
Whiteboard: [investigation waiting for WebRender to be enabled]
Assignee: milaninbugzilla → nobody
Blocks: stage-wr-trains
No longer blocks: stage-wr-nightly
Is this still reproducible?
Flags: needinfo?(bob)
I can't tell. My local Fedora 28 laptop does not support WebRender and the current version of Ubuntu 18.04 on bughunter does not support it either. If someone has hardware that supports WebRender, they should easily be able to load the url and check.
Flags: needinfo?(bob)
What hardware do you have that doesn't support WebRender?
Flags: needinfo?(bob)
Thinkpad X1 Carbon 6th gen

lshw says
             description: VGA compatible controller
             product: UHD Graphics 620
             vendor: Intel Corporation
             physical id: 20000:00:02.0
             bus info: pci@0000:00:02.0
             version: 07
             width: 64 bits

Nightly troubleshooting info says
WebGL 1 Driver Renderer	Intel Open Source Technology Center -- Mesa DRI Intel(R) UHD Graphics 620 (Kabylake GT2) 
WebGL 1 Driver Version	3.0 Mesa 18.0.5
WebGL 2 Driver Renderer	Intel Open Source Technology Center -- Mesa DRI Intel(R) UHD Graphics 620 (Kabylake GT2) 
WebGL 2 Driver Version	4.5 (Core Profile) Mesa 18.0.5
HW_COMPOSITING	
blocked by default: Acceleration blocked by platform
OPENGL_COMPOSITING	
unavailable by default: Hardware compositing is disabled
WEBRENDER	
opt-in by default: WebRender is an opt-in feature
unavailable by runtime: Hardware compositing is disabled
WEBRENDER_QUALIFIED	
blocked by env: No qualified hardware
Flags: needinfo?(bob)
That hardware supports webrender. You should be able to turn it on by setting gfx.webrender.all to true.
WEBRENDER	
opt-in by default: WebRender is an opt-in feature
available by user: Force enabled by pref
WEBRENDER_QUALIFIED	
blocked by env: No qualified hardware
grepping stdout does show INFO 2018-08-30T13:36:59Z: webrender_bindings::bindings: WebRender - OpenGL version new 4.5 (Core Profile) Mesa 18.0.5 so I guess I do have it. I've done a quick check locally and couldn't reproduce the crash. I'll get a recent asan build and see if I can reproduce anything there.
I can no longer reproduce with new builds with WebRender enabled on my ThinkPad, so WFM.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
Group: gfx-core-security