Closed
Bug 1357734
Opened 7 years ago
Closed 7 years ago
Crash in mozilla::wr::DisplayListBuilder::PushBorder
Categories
(Core :: Graphics: WebRender, defect)
Tracking
()
RESOLVED
FIXED
mozilla56
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox54 | --- | unaffected |
firefox55 | --- | disabled |
firefox56 | --- | fixed |
People
(Reporter: marcia, Assigned: sotaro)
References
(Depends on 1 open bug)
Details
(Keywords: crash, Whiteboard: [gfx-noted])
Crash Data
Attachments
(5 files)
This bug was filed from the Socorro interface and is report bp-212d466d-cc37-405e-85ca-666070170419. ============================================================= Seen while looking at nightly crash data - these crashes started using 20170418030220 with 26 crashes/5 installations: http://bit.ly/2pBAYsU The App notes doesn't show the WR pref as being enabled on the reports I looked at.
Comment 1•7 years ago
|
||
(In reply to Marcia Knous [:marcia - use ni] from comment #0) > The App notes doesn't show the WR pref as being enabled on the reports I > looked at. The WR annotation in the app notes only shows up in parent process crashes, I think. It seems that each process has independent app notes, and we only run the gfxPlatform code that creates the annotation in the parent process [1]. Also if crashes showed up in the Apr 18 nightly it's likely from something the graphics -> m-c merge [2] that we did the previous day. [1] http://searchfox.org/mozilla-central/rev/214345204f1e7d97abb571b7992b6deedb5ff98f/gfx/thebes/gfxPlatform.cpp#714-716 [2] https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=2b6a66a98e25
Comment 2•7 years ago
|
||
Nothing that pushlog seems relevant to the crash stack, though :(
Updated•7 years ago
|
Depends on: wr-stability
Comment 3•7 years ago
|
||
I'm seeing this crash after enabling WebRender in Nightly when I click around Yahoo, specifically https://www.yahoo.com/news/politics/: https://crash-stats.mozilla.com/report/index/3c4a473f-179e-42cb-a81b-5a1820170419 https://crash-stats.mozilla.com/report/index/aa11e7a5-e7df-4edf-a1cb-89b9b0170419 https://crash-stats.mozilla.com/report/index/6d82c284-a965-4fa9-b4c1-435340170419 Since it's easy to repro, I was able to get a log from windbg.
Comment 4•7 years ago
|
||
(In reply to Brian Carpenter [:geeknik] from comment #3) > Created attachment 8859703 [details] > firefox-debug.log > > I'm seeing this crash after enabling WebRender in Nightly when I click > around Yahoo, specifically https://www.yahoo.com/news/politics/: > > https://crash-stats.mozilla.com/report/index/3c4a473f-179e-42cb-a81b- > 5a1820170419 > https://crash-stats.mozilla.com/report/index/aa11e7a5-e7df-4edf-a1cb- > 89b9b0170419 > https://crash-stats.mozilla.com/report/index/6d82c284-a965-4fa9-b4c1- > 435340170419 > > Since it's easy to repro, I was able to get a log from windbg. Sync with today's Nightly build, plus enabling WebRender, I can't reproduce this issue after surfing yahoo link many times. The platform I tried was on Windows 10. Looked into the crash report, a null pointer deference happens in [1]. [1]:https://searchfox.org/mozilla-central/rev/d8ac097a1de2544f0e51d186bc850662c5b7430a/gfx/webrender_bindings/WebRenderAPI.cpp#652 From mini-dump, it seems that wr_dp_push_border didn't point to a null address. Based on the current information, I don't have any idea to think what is the root cause.
Comment 5•7 years ago
|
||
Yeah the crash here doesn't make much sense. We probably need to inspect the disassembly to see where the 0xffffffffffffffff is that's trying to read from.
Comment 6•7 years ago
|
||
[Tracking Requested - why for this release]: This is the #4 Windows topcrash in Nightly 20170421030241.
tracking-firefox55:
--- → ?
Keywords: topcrash
Comment 7•7 years ago
|
||
Ethan, can you take a look to see if there's anything that jumps out at you here?
Flags: needinfo?(ethlin)
Updated•7 years ago
|
Whiteboard: [gfx-noted]
Comment 8•7 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #7) > Ethan, can you take a look to see if there's anything that jumps out at you > here? Sorry, I couldn't reproduce the problem on my Mac or Windows10 devices. I also reviewed the code around the function and checked the minidump file, but I have no idea right now.
Flags: needinfo?(ethlin)
Comment 10•7 years ago
|
||
I downloaded the build [1] for the crash report in comment 0 and ran dumpbin.exe on the xul.dll to get the disassembly. I found the disasm for the PushBorder function at offset 119ee18, just as predicted by the .sym file. I'm attaching the disassembly. The crash is supposedly at offset 0x52 into the function, so I marked that in the file. Assuming the ip is incremented before executing the instruction, the crash happens on this instruction: movaps xmm1,xmmword ptr [r8+30h] which is an aligned 128-bit read from the address at r8+0x30. The raw dump from the crash report says r8 is 0x000000ff5b5fd230 which means this should be reading from 0x000000ff5b5fd260. And the crash report indicates that the instruction tried to an invalid read from 0xffffffffffffffff. I don't really know how that's possible. [1] https://archive.mozilla.org/pub/firefox/nightly/2017/04/2017-04-18-03-02-20-mozilla-central/firefox-55.0a1.en-US.win64.zip
Comment 11•7 years ago
|
||
If we give user the nightly debug build, does it help?
Comment 12•7 years ago
|
||
Ah, the 0xffffffffffffffff is bogus, see bug 974420. Looking at the debug log that :geeknik attached we see this: 00007ffc`cc48555e 0f2800 movaps xmm0,xmmword ptr [rax] ds:0000000f`73ffd108=3f8000003f41c1c33f41c1c33f41c1c3 Firstly it looks like that's the instruction after the one I pegged - it's reading from [rax] instead of [r8+30h]. And secondly the debugger provided the actual value of what that resolves to, and it's not 0xffffffffffffffff. So it's just a regular invalid read, probably of free'd memory. Considering the PushBorder function just delegates to wr_dp_push_border [1], the most likely thing that's happening here is that the DisplayListBuilder object has already been freed, and we're trying to read the mWrState from a dead object. [1] http://searchfox.org/mozilla-central/rev/baf47b352e873d4516d7656186d6d7c7447d3873/gfx/webrender_bindings/WebRenderAPI.cpp#652
Comment 13•7 years ago
|
||
Looking at the call stack, the DisplayListBuilder is allocated on the stack just a few frames up, so there's no way it could have been freed. However maybe the stack is getting corrupted. Looking at the WR calls that happens just before, I see there are potentially a couple of calls to build clip regions at [1] which pass a rect and no mask. That trickles down to [2] where it passes nullptr for the complex clip region pointer and then to [3] which tries to make a slice out of (nullptr, 0). Guess what the slice::from_raw_parts documentation says is not allowed? "p must be non-null, even for zero-length slices." [4]. Bug! Not really sure that this is the root cause of the observed crash, but it's something that needs fixing at least. [1] http://searchfox.org/mozilla-central/rev/baf47b352e873d4516d7656186d6d7c7447d3873/layout/painting/nsCSSRenderingBorders.cpp#3563-3565 [2] http://searchfox.org/mozilla-central/rev/baf47b352e873d4516d7656186d6d7c7447d3873/gfx/webrender_bindings/WebRenderAPI.cpp#739 [3] http://searchfox.org/mozilla-central/rev/baf47b352e873d4516d7656186d6d7c7447d3873/gfx/webrender_bindings/src/bindings.rs#1119 [4] https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html
Comment 14•7 years ago
|
||
This code has now changed so that an empty slice is now properly created from a nullptr[1]. I'm not sure if this fixed the crash or not. [1] http://searchfox.org/mozilla-central/source/gfx/webrender_bindings/src/bindings.rs#81
Comment 15•7 years ago
|
||
Ah, I see that this was handled in the dependent bug 1359462. Sorry for the noise.
Comment 16•7 years ago
|
||
I don't see any crash reports matching this signature on a buildid more recent than 20170421030241. I'm going to assume it's fixed, although it was probably not fixed by bug 1359462 as that would only have showed up in the April 28 nightly. The crashes stopped a week before that.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Updated•7 years ago
|
Comment 17•7 years ago
|
||
This signature appeared again in the nightly of 20170601030206, although it's only a single installation.
Comment 18•7 years ago
|
||
I looked at the last 6 months of crash data for this signature and here are the build ids it shows up on: Rank BuildId Count Percent 3 20170418030220 47 21.08 % 1 20170419030223 87 39.01 % 2 20170421030241 67 30.04 % 5 20170524030204 7 3.14 % <-- reappears in may 24 6 20170529030204 6 2.69 % 4 20170601030206 9 4.04 % So it seems that after going away in april, it reappeared in 20170524030204. However looking at the "install time" of the may/june crash reports, I suspect all the may/june reports are from a single user. Which likely means it's something in their environment that's triggering this.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Updated•7 years ago
|
Comment 19•7 years ago
|
||
Kats, I see two different users with this crash in Nightly 20170609030207: https://crash-stats.mozilla.com/report/index/246fc936-4490-4a60-a0b0-069570170610 https://crash-stats.mozilla.com/report/index/1682e345-231c-4815-b1db-1619a0170609 Does that change your conclusion?
Flags: needinfo?(bugmail)
Comment 20•7 years ago
|
||
Yes, I agree - this is probably a bug in our code. Problem is, I have no idea what it could be.
Flags: needinfo?(bugmail)
Comment 21•7 years ago
|
||
Try enabling webrendest on top of webrender. I'm seeing some crashes that only show up with both enabled.
Comment 22•7 years ago
|
||
20170629030206 @ Windows 10 x64 with (only) webrender enabled The new newtab page has gotten some elements from the acticity stream project. I clicked on the top-right gear and wanted to switch from "blank page" to "most visited sites". Got browser crashes: Meldungs-ID Sendedatum bp-0c6fa79f-21bb-4934-888d-111310170629 29.06.2017 19:18 bp-d360b20a-39f4-4fc2-ba67-ed66d0170629 29.06.2017 19:18 bp-c05f6bf0-da97-4b29-b3dc-a812b0170629 29.06.2017 19:18
Comment 23•7 years ago
|
||
(In reply to Darkspirit from comment #22) With browser.newtabpage.enabled;false at least the newtabpage is fine. But if a click on a bug id inside a crash report and want to login into bugzilla, crash. Switch tabs. Crash. If you can't reproduce it with this build, I would give you my whole config or test whatever debug binary you'd gave me.
Comment 24•7 years ago
|
||
Nightly 56 Build 20170629030206 @ Windows 10 x64 (Radeon RX 480) STR 1. Edit Nightly's Desktop shortcut to "C:\Program Files\Nightly\firefox.exe" -P 2. create a fresh profile 3. about:config > set gfx.webrender.enabled;true 4. open a new tab. browser crash. bp-404c3a25-3d6f-4555-a72a-34eb30170629 29.06.2017 21:37 bp-479c5e59-34d4-4828-8278-4a8e70170629 29.06.2017 21:37 bp-13b22d13-c567-4218-9ecd-7b73e0170629 29.06.2017 21:36 bp-553d9b71-f970-41f1-81fd-e7a090170629 29.06.2017 21:36 bp-838cb60d-9434-469e-bcf2-60e610170629 29.06.2017 21:36 bp-2a0cf12b-a7b9-4061-9cb8-d4f3f0170629 29.06.2017 21:35
Assignee | ||
Comment 25•7 years ago
|
||
I also saw the crash with nightly, but I did not see the crash with my local build.
Comment 26•7 years ago
|
||
On this bugzilla page I only get a crash by mouseover those slim "Status", "People", "Tracking" etc. bars at the top. And by visiting http://www.dict.cc I get immediately a crash.
Comment 27•7 years ago
|
||
(In reply to Darkspirit from comment #26) > On this bugzilla page I only get a crash by mouseover those slim "Status", "People", "Tracking" etc. bars at the top. And by visiting http://www.dict.cc I get immediately a crash. While making the first video of this, I discovered a visible bug regarding borders (and when I mouseover them, I get a crash). 1. about:config, enabled webrender 2. Shift+F2 restart 3. switched to this tab. Recording of this video starts. 4. You see borders around the bug's meta fields ("Status", "People", "Tracking" etc). 5. At 3s I scroll down, the borders disappear immediately (bug). 6. I scroll up, mouseover on of those borders: crash (bug). 7. Crash on opening dict.cc Meldungs-ID Sendedatum bp-b5ec7870-f8fd-468c-aac3-870260170629 29.06.2017 23:17 bp-b1bf98cc-4834-467e-85b4-ce9110170629 29.06.2017 23:17 bp-a589df7e-d08e-43a9-87ff-8bb420170629 29.06.2017 23:17 bp-f29fecc5-ddbe-41d1-bf18-005f80170629 29.06.2017 23:17 bp-b3c12979-8d32-4a74-adc5-455590170629 29.06.2017 23:17
Assignee | ||
Comment 28•7 years ago
|
||
I debugged on nightly with :jrmuizel. At first, "Access violation reading location 0xFFFFFFFFFFFFFFFF" seemed to wrong suggestion. The exception happened at "movaps xmm0,xmmword ptr [rcx]" and rcx was 0x0000005cdf1fc928. Instead, the exception happened because rcx=0x0000005cdf1fc928 was not aligned to 16bit. movaps requests that operand is aligned to 16bit boundary. http://x86.renejeschke.de/html/file_module_x86_id_180.html
Assignee | ||
Comment 29•7 years ago
|
||
It is not clear why compiler uses movaps for a pointer. It might be caused by compiler's bug.
Assignee | ||
Comment 30•7 years ago
|
||
> The exception happened at "movaps xmm0,xmmword ptr [rcx]" and rcx was 0x0000005cdf1fc928.
My local build uses "movups xmm0,xmmword ptr [rcx] ". movups was used instead of movaps.
Comment 31•7 years ago
|
||
(In reply to Darkspirit from comment #24) > Nightly 56 Build 20170629030206 @ Windows 10 x64 (Radeon RX 480) Missed something: Enable webrendest, immediate crash. No escape. Had to edit prefs.js.
Comment 32•7 years ago
|
||
32 bit works. Win32 Build 20170629030206 downloaded from http://archive.mozilla.org/pub/firefox/nightly/latest-mozilla-central-l10n/firefox-56.0a1.de.win32.zip does not crash (even not with webrendest enabled), but shows us solid borders at the left and right side on bugzilla which is a bug.
Comment 33•7 years ago
|
||
(In reply to Darkspirit from comment #32) > but shows us solid borders at the left and right side on bugzilla which is a bug. I am too fast in submitting, sorry. Those borders are fine with webrendest disabled. It's only about 32 bit (works) vs. 64 bit (crashes).
Assignee | ||
Comment 34•7 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #29) > It is not clear why compiler uses movaps for a pointer. It might be caused > by compiler's bug. There was similar problem in Visual Studio 2013, but it seems already fixed. https://connect.microsoft.com/VisualStudio/feedback/details/956733/misaligned-memory-access-with-sse-optimization-resulting-in-an-application-crash-visual-c http://www.beta.microsoft.com/VisualStudio/feedbackdetail/view/766477/c-optimizer-generates-aligned-read-movaps-for-possibly-unaligned-pointer-to-member-function
Comment 35•7 years ago
|
||
Sorry. It seems to be clear for one longer moment, then I comment here, then it changes. (In reply to Darkspirit from comment #27) > 5. At 3s I scroll down, the borders disappear immediately (bug). Maybe I should've said something like box shadow. This visual bug behaves the same with only webrender enabled on Nightly win32 (no crash) and win64 (crash on mouseover).
Assignee | ||
Comment 36•7 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #30) > > The exception happened at "movaps xmm0,xmmword ptr [rcx]" and rcx was 0x0000005cdf1fc928. > > My local build uses "movups xmm0,xmmword ptr [rcx] ". movups was used > instead of movaps. My local build uses Visual Studio 2015 Update 3. Official nightly windows build seems also use Visual Studio 2015 Update 3. So, I am not sure what causes the difference.
Assignee | ||
Comment 37•7 years ago
|
||
> > (In reply to Darkspirit from comment #27) > > 5. At 3s I scroll down, the borders disappear immediately (bug). > Maybe I should've said something like box shadow. It seems like Bug 1372083.
Assignee | ||
Comment 38•7 years ago
|
||
The crash seems to happen only on PGO build. When I downloaded x64 opt and x64 pgo from the following, the crash happened only on x64 pgo. https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&selectedJob=110898139
Assignee | ||
Comment 39•7 years ago
|
||
visual studio update 2 seemed to fix "PGO build generates movaps to unaligned address". But seems not addressed actually:( https://blogs.msdn.microsoft.com/vcblog/2016/03/31/visual-c-2015-update-2-bug-fixes/
Assignee | ||
Comment 40•7 years ago
|
||
By disabling PGO under webrender_bindings, the crash seemed to be addressed. https://treeherder.mozilla.org/#/jobs?repo=try&revision=876da1cdf1f4dafecc644e03a6f0f44c4955786f
Assignee | ||
Comment 41•7 years ago
|
||
Default build for comparison. https://treeherder.mozilla.org/#/jobs?repo=try&revision=888f6b684935e491451dbf0bbdee67219f15b564 Default win x64 pgo build caused the crash.
Assignee | ||
Comment 42•7 years ago
|
||
Modified argument of DisplayListBuilder::PushBorder() also bypassed the problem. https://treeherder.mozilla.org/#/jobs?repo=try&revision=4e7efc4f46d0fa10afd488afe021e27f4178993e
Assignee | ||
Comment 43•7 years ago
|
||
Assignee: nobody → sotaro.ikeda.g
Assignee | ||
Updated•7 years ago
|
Attachment #8882656 -
Flags: review?(jmuizelaar)
Updated•7 years ago
|
Attachment #8882656 -
Flags: review?(jmuizelaar) → review+
Assignee | ||
Comment 44•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=91d43f48fe2bbc7e6ddb432073e2504b2883ef2c
Comment 45•7 years ago
|
||
Pushed by sikeda@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/5954289fc962 Change argument of DisplayListBuilder::PushBorder() to bypass compiler bug r=jrmuizel
Comment 46•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/5954289fc962
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
status-firefox56:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Comment 47•7 years ago
|
||
Last crash seen with 20170701030203. Does not crash anymore with STR of comment 27 with Nightly 56 x64 20170702030204 @ Win10 x64 (Radeon RX480).
Updated•7 years ago
|
Updated•7 years ago
|
status-firefox54:
--- → unaffected
status-firefox-esr52:
--- → unaffected
You need to log in
before you can comment on or make changes to this bug.
Description
•