Closed Bug 1374622 Opened 7 years ago Closed 7 years ago

qr Crash mozalloc_abort | webrender::frame_builder::FrameBuilder::build

Categories

(Core :: Graphics: WebRender, defect)

Unspecified
All
defect
Not set
major

Tracking

()

RESOLVED DUPLICATE of bug 1394695
mozilla57
Tracking Status
firefox56 --- unaffected
firefox57 --- unaffected

People

(Reporter: bc, Assigned: vliu)

References

(Blocks 2 open bugs)

Details

(Keywords: crash)

Crash Data

Attachments

(2 files)

Attached file crash report
1. https://qiblafinder.withgoogle.com/intl/en/desktop/finder/%28view:context%29

2. Crash opt/debug

 0  firefox!mozalloc_abort [mozalloc_abort.cpp:416c3c8c4b3d : 33 + 0x0]
 1  firefox!abort [mozalloc_abort.cpp:416c3c8c4b3d : 80 + 0x5]
 2  libxul.so!std::panicking::rust_panic [lib.rs : 61 + 0x5]
 3  libxul.so!std::panicking::rust_panic_with_hook [panicking.rs : 565 + 0x5]
 4  libxul.so!std::panicking::begin_panic<collections::string::String> [panicking.rs : 511 + 0x12]
 5  libxul.so!std::panicking::begin_panic_fmt [panicking.rs : 495 + 0x1c]
 6  libxul.so!core::panicking::panic_fmt [panicking.rs : 471 + 0x18]
 7  libxul.so!core::option::expect_failed [option.rs : 794 + 0x1a]
 8  libxul.so!webrender::frame_builder::FrameBuilder::build [option.rs : 297 + 0x5]
 9  libxul.so!webrender::render_backend::RenderBackend::render [frame.rs:416c3c8c4b3d : 1009 + 0x33]
10  libxul.so!webrender::render_backend::RenderBackend::run [render_backend.rs:416c3c8c4b3d : 420 + 0x5]
(Bob Clary [:bc:] from comment #0)
> https://qiblafinder.withgoogle.com/intl/en/desktop/finder/%28view:context%29

Clicked > Hang/Tab crash.

Nightly 57 x64 20170815100349 @ Windows 10

Report ID 	Date submitted
bp-874b0681-fe50-438f-9b42-e13d10170816	16.08.2017	04:50
[@ webrender::gpu_cache::Texture::push_data ] = bug 1376213
bp-5a646370-aba9-4fb6-82f6-079ec0170816	16.08.2017	04:50
[@ core::option::expect_failed | webrender::frame_builder::FrameBuilder::build ] = this bug
bp-b3e6ee17-44ae-4973-b02e-6f14f0170816	16.08.2017	04:50
[@ core::option::expect_failed | webrender::frame_builder::FrameBuilder::build ] = this bug
bp-6de06632-c6ee-4685-88ef-707c10170816	16.08.2017	04:50
[@ core::option::expect_failed | webrender::frame_builder::FrameBuilder::build ] = this bug
bp-94c359bc-f9b8-4455-a751-064d20170816	16.08.2017	04:50
[@ core::option::expect_failed | webrender::frame_builder::FrameBuilder::build ] = this bug
bp-654e14ca-f3a9-49a0-a05b-a117d0170816	16.08.2017	04:50
[@ core::option::expect_failed | webrender::frame_builder::FrameBuilder::build ] = this bug
bp-cd2f184a-0e2c-4de0-a48e-131c70170816	16.08.2017	04:50
[@ webrender::gpu_cache::Texture::push_data ] = bug 1376213
Severity: normal → major
Crash Signature: [@ core::option::expect_failed | webrender::frame_builder::FrameBuilder::build ]
Has STR: --- → yes
OS: Unspecified → All
(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #1)
> (Bob Clary [:bc:] from comment #0)
> > https://qiblafinder.withgoogle.com/intl/en/desktop/finder/%28view:context%29
> 
> Clicked > Hang/Tab crash.

But I can't reproduce it in Nightly 57 x64 20170815100349 @ Debian Testing, even though comment 0 seems to be on Linux. Hm.
Nightly 57 x64 20170819100442 @ Debian Testing 
stylo + webrender + webrendest + blob-images + layers-free + layers force accel (APZ untouched)

Finally.
bp-4c3830ea-0361-4b6e-858a-b42080170819
Clicked on WhatsApp's app tab. Not reproducible. :/

(+ Off topic: Some texts of Nightly's internal pages are partially invisible today when webrendest=true. You might have noticed that already.)
Crash Signature: [@ core::option::expect_failed | webrender::frame_builder::FrameBuilder::build ] → [@ mozalloc_abort | abort | core::option::expect_failed | webrender::frame_builder::FrameBuilder::build ] [@ core::option::expect_failed | webrender::frame_builder::FrameBuilder::build ]
Nightly 57 x64 20170819100442 @ Debian Testing 
stylo + webrender + webrendest + blob-images + layers-free + layers force accel + now also: layers apz disabled
(In reply to comment #3)
> Clicked on Whatsapp' app tab. Not reproducible. :/
Correction: I could reproduce it (but not 100% reliably) around web.whatsapp.com (tried enlarging photos, then clicked to switch tabs).
I can't reproduce the crashes from comment 0 or comment 3. I hope we can get another STR.
Can't reproduce the crash from comment 4 anymore, but WhatsApp is exceptionally slow and I can get a desktop freeze (bug 1377120) by scrolling up and enlarging an image. Nightly 57 x64 20170824100243 @ Debian Testing (KDE/Xorg/Radeon RX480).
For Stylo it was helpful to run a debug build and to redirect all console output into a log file. Could I do the same for WebRender or is there some other logging mechanism to log events and stats, so we can see what happens in the last second?
Flags: needinfo?(bob)
I couldn't reproduce with either of

https://webapplog.com/software-engineering-future/
https://qiblafinder.withgoogle.com/intl/en/desktop/finder/%28view:context%29

with a current build. I tried with a debug build from 2017-06-20 but could only get a hang with the qiblafinder url and no problem at all with the webapplog url.
Flags: needinfo?(bob)
Looks like this issue can be reproduced on try with layers-free enabled.
Check Linux x64 QuantumRender opt R4.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa4791c97c18002b53b49f3ef2ef7de49b5459d3&selectedJob=126276492


[task 2017-08-27T16:43:14.036162Z] 16:43:14    ERROR - thread 'RenderBackend' panicked at 'no entry found for key', /checkout/src/libcore/option.rs:823
[task 2017-08-27T16:43:14.038198Z] 16:43:14     INFO - stack backtrace:
[task 2017-08-27T16:43:14.039759Z] 16:43:14     INFO - REFTEST INFO | drawWindow flags = DRAWWINDOW_DRAW_CARET | DRAWWINDOW_DRAW_VIEW | DRAWWINDOW_USE_WIDGET_LAYERS; window size = 800,1000; test browser size = 800,1000
[task 2017-08-27T16:43:14.210095Z] 16:43:14     INFO -    0:     0x7f309b038143 - std::sys::imp::backtrace::tracing::imp::unwind_backtrace::hcab99e0793da62c7
[task 2017-08-27T16:43:14.211352Z] 16:43:14     INFO -    1:     0x7f309b035109 - std::panicking::default_hook::{{closure}}::h9ba2c6973907a2be
[task 2017-08-27T16:43:14.213199Z] 16:43:14     INFO -    2:     0x7f309b034530 - std::panicking::default_hook::he4d55e2dd21c3cca
[task 2017-08-27T16:43:14.214799Z] 16:43:14     INFO -    3:     0x7f309b034094 - std::panicking::rust_panic_with_hook::ha138c05cd33ad44d
[task 2017-08-27T16:43:14.216321Z] 16:43:14     INFO -    4:     0x7f309b033f6f - std::panicking::begin_panic::hcdbfa35c94142fa2
[task 2017-08-27T16:43:14.218105Z] 16:43:14     INFO -    5:     0x7f309b033ed9 - std::panicking::begin_panic_fmt::hc09fe500d9b7be81
[task 2017-08-27T16:43:14.221032Z] 16:43:14     INFO -    6:     0x7f309b041d06 - core::panicking::panic_fmt::h883a028e9f4b4457
[task 2017-08-27T16:43:14.222956Z] 16:43:14     INFO -    7:     0x7f309b047697 - core::option::expect_failed::h1ff823102004902d
[task 2017-08-27T16:43:14.224195Z] 16:43:14     INFO -    8:     0x7f309afc8dd3 - webrender::frame_builder::FrameBuilder::build::h8d50cf91eae5296f
[task 2017-08-27T16:43:14.226850Z] 16:43:14     INFO -    9:     0x7f309afba934 - webrender::render_backend::Document::render::hf13d0d68e9527616
[task 2017-08-27T16:43:14.228440Z] 16:43:14     INFO -   10:     0x7f309afb0c6c - webrender::render_backend::RenderBackend::process_document::hb8545a5aaba1fa85
[task 2017-08-27T16:43:14.230548Z] 16:43:14     INFO -   11:     0x7f309afa4ccb - webrender::render_backend::RenderBackend::run::h30fc31c1d67231ef
[task 2017-08-27T16:43:14.232409Z] 16:43:14     INFO -   12:     0x7f309afa3381 - std::sys_common::backtrace::__rust_begin_short_backtrace::hb726974a403e0b61
[task 2017-08-27T16:43:14.233650Z] 16:43:14     INFO -   13:     0x7f309afa2a46 - <F as alloc::boxed::FnBox<A>>::call_box::h87762c226d70f830
[task 2017-08-27T16:43:14.234578Z] 16:43:14     INFO -   14:     0x7f309b03f543 - std::sys::imp::thread::Thread::new::thread_start::h227b2afaa9316a8d
[task 2017-08-27T16:43:14.235664Z] 16:43:14     INFO -   15:     0x7f30a87f66b9 - start_thread
[task 2017-08-27T16:43:14.236907Z] 16:43:14     INFO -   16:     0x7f30a787f3dc - clone
[task 2017-08-27T16:43:14.238060Z] 16:43:14     INFO -   17:                0x0 - <unknown>
[task 2017-08-27T16:43:14.239301Z] 16:43:14     INFO - Redirecting call to abort() to mozalloc_abort
[task 2017-08-27T16:43:14.240515Z] 16:43:14     INFO - ExceptionHandler::GenerateDump cloned child 1283
[task 2017-08-27T16:43:14.241711Z] 16:43:14     INFO - ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2017-08-27T16:43:14.242878Z] 16:43:14     INFO - ExceptionHandler::WaitForContinueSignal waiting for continue signal...
Blocks: 1389000
QA Contact: vliu
Assignee: nobody → vliu
QA Contact: vliu
The try run in comment 8 showed that the crash hits both opt and debug builds on Linux x64 QuantumRender. [1-opt] and [2-debug] are the try logs for each.

From what I have looked into, the two builds produce different backtraces for the crash. Even so, they all crash in the same reftest file, which is reftest/tests/layout/reftests/bugs/593243-1.html.

[1-opt]: https://public-artifacts.taskcluster.net/c08wkcmRSMa30z6iyelQ-w/0/public/logs/live_backing.log
[2-debug]: https://public-artifacts.taskcluster.net/RZDBFP43Rh2MY85gocNT4g/0/public/logs/live_backing.log

Currently I can reproduce them locally with the preference settings below.

pref("layers.async-pan-zoom.enabled", false);
pref("gfx.webrender.layers-free", true);
pref("layers.acceleration.force-enabled", true);

To reproduce it, just run 593243-1.html.

I can't tell yet whether they all point to the same root cause. I will look into them to find out.
For the crash on debug builds, I have filed bug 1396471 to investigate. The following comments in this bug focus on the opt build.
So far, my investigation shows that the crash happens at [1] on opt builds.

[1]: https://searchfox.org/mozilla-central/rev/f2a1911ad310bf8651f342d719e4f4ca0a7b9bfb/gfx/webrender/src/frame_builder.rs#1801

When [1] was called, group.scroll_node_id contained ClipExternalId(5, PipelineId(1, 5)). No entry in nodes matches this id, so the lookup fails and the process crashes.

The attached file contains the contents of this clip_scroll_tree for reference.
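
The failed lookup can be sketched in isolation. The panic text 'no entry found for key' in the try log is the message that indexing a Rust standard-library HashMap (`map[&key]`) produces when the key is absent. The snippet below is a minimal sketch and not WebRender code: `ClipId`, `PipelineId`, `ScrollNode`, `find_node` and `sample_nodes` are hypothetical stand-ins that only borrow names from this bug, and the real types are richer.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the identifiers mentioned in this bug.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct PipelineId(u32, u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum ClipId {
    ClipExternalId(u64, PipelineId),
}

#[derive(Debug, PartialEq)]
struct ScrollNode {
    label: &'static str,
}

/// Checked lookup: returns None instead of panicking when the id is unknown.
fn find_node<'a>(nodes: &'a HashMap<ClipId, ScrollNode>, id: &ClipId) -> Option<&'a ScrollNode> {
    nodes.get(id)
}

/// Builds a tiny map that only contains node 1 (made-up data).
fn sample_nodes() -> HashMap<ClipId, ScrollNode> {
    let mut nodes = HashMap::new();
    nodes.insert(
        ClipId::ClipExternalId(1, PipelineId(1, 5)),
        ScrollNode { label: "root scroll" },
    );
    nodes
}

fn main() {
    let nodes = sample_nodes();
    let missing = ClipId::ClipExternalId(5, PipelineId(1, 5));

    // `nodes[&missing]` would panic here with "no entry found for key",
    // because HashMap indexing expects the key to be present.
    match find_node(&nodes, &missing) {
        Some(node) => println!("found node: {}", node.label),
        None => println!("no node for {:?}", missing),
    }
}
```

A checked lookup like this only turns the abort into a recoverable case. It does not explain why the display list referenced an id that the tree does not contain.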
I tried rebasing the code and found that the crash on opt builds has been fixed. I tried to find the responsible commit in the WebRender GitHub history, but it is hard to pin down the root cause. I also retested the crash on debug builds (see bug 1396471), but unfortunately that issue still exists.
Status: NEW → ASSIGNED
Priority: P3 → P1
Whiteboard: [wr-mvp]
Target Milestone: --- → mozilla57
Oh if this is the clip-scroll-tree being corrupted it might have been https://bugzilla.mozilla.org/show_bug.cgi?id=1398324, which is now fixed?
(In reply to Alexis Beingessner [:Gankro] from comment #14)
> Oh if this is the clip-scroll-tree being corrupted it might have been
> https://bugzilla.mozilla.org/show_bug.cgi?id=1398324, which is now fixed?

I am afraid the above patch didn't fix the problem. Furthermore, I can still reproduce this problem on Mac after rebasing. Based on this, please disregard the statements in comment 12.
Hi kats,

After more study on this bug, I found the crash is related to Push/Pop ClipAndScrollInfo in DisplayListBuilder.

If I leave out [2], the crash disappears.

From the above, it seems that the WebRender backend can't deal with this ClipAndScrollInfo. Besides, I also found that bug 1394695 seems to have the same root cause.

In bug 1394695, the crash happens because we don't skip scrolling clips in [3] with layers-free enabled and APZ disabled.

[4] skips scrolling clips and pushes the layer's local clip (maybe we can call it the non-scrolling clip). It does this by getting the layer's local clip from mLayer (WebRenderLayer).
To fix this problem under layers-free, we need to do the same thing in [3]. Two questions come to mind.

1. An nsDisplayItem is passed in [3]. How do I get the non-scrolling clips from an nsDisplayItem, equivalent to the layer's local clip? We still need to push these clips. I am not sure about the scope of this part, so maybe you have hints that would help me understand more.
2. This problem may be minor. layers.async-pan-zoom.enabled was acquired via mLayer->WrManager()->AsyncPanZoomEnabled(). How do I do that in layers-free?
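
To make the suspected failure mode concrete, here is a toy model of a display list that pushes a clip-and-scroll id which was never defined. This is only a sketch under assumptions: `ToyDisplayListBuilder`, `BuildError`, `build_list` and all of their methods are hypothetical names that echo the discussion above, not the real DisplayListBuilder API, and the scenario is one plausible reading of the crash rather than a confirmed root cause.

```rust
use std::collections::HashSet;

// Toy model only; not WebRender's API.
#[derive(Debug, PartialEq)]
enum BuildError {
    UndefinedClip(u64),
    UnbalancedPop,
}

#[derive(Default)]
struct ToyDisplayListBuilder {
    defined: HashSet<u64>,             // clip/scroll ids that were defined
    stack: Vec<u64>,                   // currently pushed clip-and-scroll ids
    items: Vec<(String, Option<u64>)>, // (item name, id active when pushed)
}

impl ToyDisplayListBuilder {
    fn define_clip(&mut self, id: u64) {
        self.defined.insert(id);
    }

    fn push_clip_and_scroll_info(&mut self, id: u64) {
        self.stack.push(id);
    }

    fn pop_clip_and_scroll_info(&mut self) -> Result<(), BuildError> {
        self.stack.pop().map(|_| ()).ok_or(BuildError::UnbalancedPop)
    }

    fn push_item(&mut self, name: &str) {
        // Each item records whichever id is on top of the stack, if any.
        self.items.push((name.to_string(), self.stack.last().copied()));
    }

    /// Reports the first item whose recorded id was never defined.
    fn validate(&self) -> Result<(), BuildError> {
        for (_, clip) in &self.items {
            if let Some(id) = clip {
                if !self.defined.contains(id) {
                    return Err(BuildError::UndefinedClip(*id));
                }
            }
        }
        Ok(())
    }
}

/// Builds the same list twice over: with or without defining id 5 first.
fn build_list(define_clip_5: bool) -> Result<(), BuildError> {
    let mut builder = ToyDisplayListBuilder::default();
    builder.define_clip(1);
    if define_clip_5 {
        builder.define_clip(5);
    }
    builder.push_clip_and_scroll_info(5);
    builder.push_item("text");
    builder.pop_clip_and_scroll_info()?;
    builder.validate()
}

fn main() {
    println!("{:?}", build_list(true));  // Ok(())
    println!("{:?}", build_list(false)); // Err(UndefinedClip(5))
}
```

In this model the error surfaces when the list is validated, after the push and pop calls themselves have succeeded. That is similar in spirit to a panic appearing in the frame build rather than at display list construction.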


[2]: http://searchfox.org/mozilla-central/rev/51eae084534f522a502e4b808484cde8591cb502/gfx/layers/wr/ScrollingLayersHelper.cpp#131-133
[3]: http://searchfox.org/mozilla-central/rev/51eae084534f522a502e4b808484cde8591cb502/gfx/layers/wr/ScrollingLayersHelper.cpp#89
[4]: http://searchfox.org/mozilla-central/rev/51eae084534f522a502e4b808484cde8591cb502/gfx/layers/wr/ScrollingLayersHelper.cpp#26-31


Thanks.
Flags: needinfo?(bugmail)
See Also: → 1394695
I put a patch on bug 1394695 that should fix the issue you're describing in comment 16.
Flags: needinfo?(bugmail)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #17)
> I put a patch on bug 1394695 that should fix the issue you're describing in
> comment 16.

With the patch in bug 1394695, the crash no longer happens in my local build or in Linux x64 QuantumRender opt (R4) on try [1]. I will mark this bug as a duplicate of bug 1394695.


[1]:https://treeherder.mozilla.org/#/jobs?repo=try&revision=2c0881cbc25ed28989ac2b84f91b28e3fa1f4dd5&selectedJob=130542354
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Priority: P1 → --
Whiteboard: [wr-mvp]
See Also: → 1404558