Closed Bug 1729908 Opened 3 years ago Closed 8 months ago

valgrind: uninitialized memory access in mozilla::layers::WebRenderScrollDataWrapper::GetTransform() const (WebRenderScrollDataWrapper.h:224)

Categories

(Core :: Graphics: WebRender, defect)

x86_64
Linux
defect

Tracking

()

RESOLVED INACTIVE
Tracking Status
firefox94 --- affected
firefox95 --- affected

People

(Reporter: ishikawa, Unassigned)

References

Details

Attachments

(2 files)

I am testing thunderbird locally compiled from M-C/C-C source updated a few days ago.

When I ran the binary under valgrind, I saw valgrind warnigs.

I am quoting only the first stacktrace in the following.
Another uninitialized access occurs in tandem in the same place but with a slightly different stack (I show the few frames from the second access warning.)
I am attaching the full trace as attachment.

I think this uninitialized access is the cause of strange port size reported during TB's mochitest. I get 3335 such warnings.

3335: Main Thread] WARNING: Scrolled rect smaller than scrollport?: file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/layout/generic/nsGfxScrollFrame.cpp:7076

Please note the following lines in the log after the uninitialized memory report from valgrind.

result.width = 0 < mScrollPort.width = 77880
[Parent 121148, Main Thread] WARNING: Scrolled rect smaller than scrollport?: file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/layout/generic/nsGfxScrollFrame.cpp:7069
result.height = 480 < mScrollPort.height = 32800
[Parent 121148, Main Thread] WARNING: Scrolled rect smaller than scrollport?: file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/layout/generic/nsGfxScrollFrame.cpp:7076

  1. Full stack from the first uninitialized access.
==121148== Thread 73 WRScene~ilder#2:
==121148== Conditional jump or move depends on uninitialised value(s)
==121148==    at 0xA18C124: mozilla::layers::WebRenderScrollDataWrapper::GetTransform() const (WebRenderScrollDataWrapper.h:224)
==121148==    by 0xA19E740: void mozilla::layers::SetHitTestData<mozilla::layers::WebRenderScrollDataWrapper>(mozilla::layers::HitTestingTreeNode*, mozilla::layers::WebRenderScrollDataWrapper const&, mozilla::Maybe<mozilla::gfx::IntRegionTyped<mozilla::ParentLayerPixel> > const&, mozilla::layers::EventRegionsOverride const&) (WebRenderScrollDataWrapper.h:241)
==121148==    by 0xA1B1D95: mozilla::layers::HitTestingTreeNode* mozilla::layers::APZCTreeManager::PrepareNodeForLayer<mozilla::layers::WebRenderScrollDataWrapper>(mozilla::RecursiveMutexAutoLock const&, mozilla::layers::WebRenderScrollDataWrapper const&, mozilla::layers::FrameMetrics const&, mozilla::layers::LayersId, mozilla::Maybe<mozilla::layers::ZoomConstraints> const&, mozilla::layers::AncestorTransform const&, mozilla::layers::HitTestingTreeNode*, mozilla::layers::HitTestingTreeNode*, mozilla::layers::APZCTreeManager::TreeBuildingState&) (APZCTreeManager.cpp:1310)
==121148==    by 0xA1B3066: void mozilla::layers::APZCTreeManager::UpdateHitTestingTreeImpl<mozilla::layers::WebRenderScrollDataWrapper>(mozilla::layers::WebRenderScrollDataWrapper const&, bool, mozilla::layers::LayersId, unsigned int)::{lambda(mozilla::layers::WebRenderScrollDataWrapper)#2}::operator()(mozilla::layers::WebRenderScrollDataWrapper) const (APZCTreeManager.cpp:479)
==121148==    by 0xA1B3873: std::enable_if<(is_same_v<decltype ({parm#2}({parm#1})), void>)&&(is_same_v<decltype ({parm#3}({parm#1})), void>), void>::type mozilla::layers::ForEachNode<mozilla::layers::ReverseIterator, mozilla::layers::WebRenderScrollDataWrapper, void mozilla::layers::APZCTreeManager::UpdateHitTestingTreeImpl<mozilla::layers::WebRenderScrollDataWrapper>(mozilla::layers::WebRenderScrollDataWrapper const&, bool, mozilla::layers::LayersId, unsigned int)::{lambda(mozilla::layers::WebRenderScrollDataWrapper)#2}, void mozilla::layers::APZCTreeManager::UpdateHitTestingTreeImpl<mozilla::layers::WebRenderScrollDataWrapper>(mozilla::layers::WebRenderScrollDataWrapper const&, bool, mozilla::layers::LayersId, unsigned int)::{lambda(mozilla::layers::WebRenderScrollDataWrapper)#3}>(mozilla::layers::WebRenderScrollDataWrapper, void mozilla::layers::APZCTreeManager::UpdateHitTestingTreeImpl<mozilla::layers::WebRenderScrollDataWrapper>(mozilla::layers::WebRenderScrollDataWrapper const&, bool, mozilla::layers::LayersId, unsigned int)::{lambda(mozilla::layers::WebRenderScrollDataWrapper)#2} const&, void mozilla::layers::APZCTreeManager::UpdateHitTestingTreeImpl<mozilla::layers::WebRenderScrollDataWrapper>(mozilla::layers::WebRenderScrollDataWrapper const&, bool, mozilla::layers::LayersId, unsigned int)::{lambda(mozilla::layers::WebRenderScrollDataWrapper)#3} const&) (TreeTraversal.h:139)
==121148==    by 0xA1C1442: void mozilla::layers::APZCTreeManager::UpdateHitTestingTreeImpl<mozilla::layers::WebRenderScrollDataWrapper>(mozilla::layers::WebRenderScrollDataWrapper const&, bool, mozilla::layers::LayersId, unsigned int) (APZCTreeManager.cpp:438)
==121148==    by 0xA1C23C8: mozilla::detail::RunnableFunction<mozilla::layers::APZUpdater::UpdateScrollDataAndTreeState(mozilla::layers::LayersId, mozilla::layers::LayersId, mozilla::wr::Epoch const&, mozilla::layers::WebRenderScrollData&&)::{lambda()#2}>::Run() (APZCTreeManager.cpp:720)
==121148==    by 0xA1A8889: mozilla::layers::APZUpdater::ProcessQueue() (APZUpdater.cpp:466)
==121148==    by 0xA1BE9BB: mozilla::layers::APZUpdater::CompleteSceneSwap(mozilla::wr::WrWindowId const&, mozilla::wr::WrPipelineInfo const&) (APZUpdater.cpp:120)
==121148==    by 0xA1BEB24: apz_post_scene_swap (APZUpdater.cpp:534)
==121148==    by 0xF0F3EC6: <webrender_bindings::bindings::APZCallbacks as webrender::renderer::SceneBuilderHooks>::post_scene_swap (bindings.rs:976)
==121148==    by 0xF2D71EA: forward_built_transactions (scene_builder_thread.rs:699)
==121148==    by 0xF2D71EA: webrender::scene_builder_thread::SceneBuilderThread::run (scene_builder_thread.rs:313)
==121148==    by 0xF13350F: {{closure}} (mod.rs:1230)
==121148==    by 0xF13350F: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:125)
==121148==    by 0xF14B29F: {{closure}}<closure-2,()> (mod.rs:481)
==121148==    by 0xF14B29F: call_once<(),closure-0> (panic.rs:344)
==121148==    by 0xF14B29F: do_call<std::panic::AssertUnwindSafe<closure-0>,()> (panicking.rs:379)
==121148==    by 0xF14B29F: try<(),std::panic::AssertUnwindSafe<closure-0>> (panicking.rs:343)
==121148==    by 0xF14B29F: catch_unwind<std::panic::AssertUnwindSafe<closure-0>,()> (panic.rs:431)
==121148==    by 0xF14B29F: {{closure}}<closure-2,()> (mod.rs:480)
==121148==    by 0xF14B29F: core::ops::function::FnOnce::call_once{{vtable-shim}} (function.rs:227)
==121148==    by 0xFF21309: call_once<(),FnOnce<()>,alloc::alloc::Global> (boxed.rs:1546)
==121148==    by 0xFF21309: call_once<(),alloc::boxed::Box<FnOnce<()>, alloc::alloc::Global>,alloc::alloc::Global> (boxed.rs:1546)
==121148==    by 0xFF21309: std::sys::unix::thread::Thread::new::thread_start (thread.rs:71)
==121148==    by 0x4870EA6: start_thread (pthread_create.c:477)
==121148==    by 0x4BADDEE: clone (clone.S:95)
==121148==  Uninitialised value was created by a heap allocation
==121148==    at 0x48397B5: malloc (vg_replace_malloc.c:380)
==121148==    by 0x1203A8: moz_xmalloc (mozalloc.cpp:52)
==121148==    by 0xA2863E4: void std::vector<mozilla::layers::WebRenderLayerScrollData, std::allocator<mozilla::layers::WebRenderLayerScrollData> >::_M_realloc_insert<>(__gnu_cxx::__normal_iterator<mozilla::layers::WebRenderLayerScrollData*, std::vector<mozilla::layers::WebRenderLayerScrollData, std::allocator<mozilla::layers::WebRenderLayerScrollData> > >) (cxxalloc.h:33)
==121148==    by 0xA2A094A: mozilla::layers::WebRenderCommandBuilder::BuildWebRenderCommands(mozilla::wr::DisplayListBuilder&, mozilla::wr::IpcResourceUpdateQueue&, mozilla::nsDisplayList*, mozilla::nsDisplayListBuilder*, mozilla::layers::WebRenderScrollData&, WrFiltersHolder&&) (vector.tcc:121)
==121148==    by 0xA2A0B66: mozilla::layers::WebRenderLayerManager::EndTransactionWithoutLayer(mozilla::nsDisplayList*, mozilla::nsDisplayListBuilder*, WrFiltersHolder&&, mozilla::layers::WebRenderBackgroundData*, double) (WebRenderLayerManager.cpp:361)
==121148==    by 0xCBA4BD4: mozilla::nsDisplayList::PaintRoot(mozilla::nsDisplayListBuilder*, gfxContext*, unsigned int, mozilla::Maybe<double>) (nsDisplayList.cpp:2385)
==121148==    by 0xC81F96F: nsLayoutUtils::PaintFrame(gfxContext*, nsIFrame*, nsRegion const&, unsigned int, mozilla::nsDisplayListBuilderMode, nsLayoutUtils::PaintFrameFlags) (nsLayoutUtils.cpp:3443)
==121148==    by 0xC78108F: mozilla::PresShell::Paint(nsView*, nsRegion const&, mozilla::PaintFlags) (PresShell.cpp:6363)
==121148==    by 0xC44E903: nsViewManager::ProcessPendingUpdatesPaint(nsIWidget*) (nsViewManager.cpp:467)
==121148==    by 0xC44ED32: nsViewManager::ProcessPendingUpdatesForView(nsView*, bool) (nsViewManager.cpp:402)
==121148==    by 0xC44F6D5: nsViewManager::ProcessPendingUpdates() (nsViewManager.cpp:980)
==121148==    by 0xC74DEC9: nsRefreshDriver::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsRefreshDriver::IsExtraTick) (nsRefreshDriver.cpp:2546)
==121148==    by 0xC74F9F2: mozilla::RefreshDriverTimer::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) (nsRefreshDriver.cpp:353)
==121148==    by 0xC74FC07: mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) (nsRefreshDriver.cpp:782)
==121148==    by 0xC7501E2: mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::NotifyParentProcessVsync() (nsRefreshDriver.cpp:622)
==121148==    by 0xC7505F0: mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::ParentProcessVsyncNotifier::Run() (nsRefreshDriver.cpp:512)
==121148==    by 0x92AC9D5: mozilla::RunnableTask::Run() (TaskController.cpp:502)
==121148==    by 0x92AB78B: mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) (TaskController.cpp:805)
==121148==    by 0x92ABF6A: mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) (TaskController.cpp:641)
==121148==    by 0x92AC27B: mozilla::TaskController::ProcessPendingMTTask(bool) (TaskController.cpp:425)
==121148==    by 0x92AC34A: mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::{lambda()#1}>::Run() (TaskController.cpp:135)
==121148==    by 0x92AD003: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:1148)
==121148==    by 0x928C939: NS_ProcessNextEvent(nsIThread*, bool) (nsThreadUtils.cpp:466)
==121148==    by 0x99BEC29: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (MessagePump.cpp:85)
==121148==    by 0x9958338: MessageLoop::Run() (message_loop.cc:324)
==121148==    by 0xC48D498: nsBaseAppShell::Run() (nsBaseAppShell.cpp:137)
==121148==    by 0xDA2DFF9: nsAppStartup::Run() (nsAppStartup.cpp:284)
==121148==    by 0xDB2ED1D: XREMain::XRE_mainRun() (nsAppRunner.cpp:5291)
==121148==    by 0xDB30329: XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) (nsAppRunner.cpp:5476)
==121148==    by 0xDB30D58: XRE_main(int, char**, mozilla::BootstrapConfig const&) (nsAppRunner.cpp:5535)
==121148==    by 0x11DEE3: do_main(int, char**, char**) (nsMailApp.cpp:229)
==121148==    by 0x11D152: main (nsMailApp.cpp:368)
==121148==

Following the first such use of uninitialized access, I see another one reported.
Only the first several frames from the 2nd stacktrace:

==121148== Conditional jump or move depends on uninitialised value(s)
==121148==    at 0xA18C124: mozilla::layers::WebRenderScrollDataWrapper::GetTransform() const (WebRenderScrollDataWrapper.h:224)
==121148==    by 0xA1B314F: void mozilla::layers::APZCTreeManager::UpdateHitTestingTreeImpl<mozilla::layers::WebRenderScrollDataWrapper>(mozilla::layers::WebRenderScrollDataWrapper const&, bool, mozilla::layers::LayersId, unsigned int)::{lambda(mozilla::layers::WebRenderScrollDataWrapper)#2}::operator()(mozilla::layers::WebRenderScrollDataWrapper) const (APZCTreeManager.cpp:528)
==121148==    by 0xA1B3873: std::enable_if<(is_same_v<decltype ({parm#2}({parm#1})), void>)&&(is_same_v<decltype ({parm#3}({parm#1})), void>), void>::type mozilla::layers::ForEachNode<mozilla::layers::ReverseIterator, mozilla::layers::WebRenderScrollDataWrapper, void mozilla::layers::APZCTreeManager::UpdateHitTestingTreeImpl<mozilla::layers::WebRenderScrollDataWrapper>(mozilla::layers::WebRenderScrollDataWrapper const&, bool, mozilla::layers::LayersId, unsigned int)::{lambda(mozilla::layers::WebRenderScrollDataWrapper)#2}, void mozilla::layers::APZCTreeManager::UpdateHitTestingTreeImpl<mozilla::layers::WebRenderScrollDataWrapper>(mozilla::layers::WebRenderScrollDataWrapper const&, bool, mozilla::layers::LayersId, unsigned int)::{lambda(mozilla::layers::WebRenderScrollDataWrapper)#3}>(mozilla::layers::WebRenderScrollDataWrapper, void mozilla::layers::APZCTreeManager::UpdateHitTestingTreeImpl<mozilla::layers::WebRenderScrollDataWrapper>(mozilla::layers::WebRenderScrollDataWrapper const&, bool, mozilla::layers::LayersId, unsigned int)::{lambda(mozilla::layers::WebRenderScrollDataWrapper)#2} const&, void mozilla::layers::APZCTreeManager::UpdateHitTestingTreeImpl<mozilla::layers::WebRenderScrollDataWrapper>(mozilla::layers::WebRenderScrollDataWrapper const&, bool, mozilla::layers::LayersId, unsigned int)::{lambda(mozilla::layers::WebRenderScrollDataWrapper)#3} const&) (TreeTraversal.h:139)
==121148==    by 0xA1C1442: void mozilla::layers::APZCTreeManager::UpdateHitTestingTreeImpl<mozilla::layers::WebRenderScrollDataWrapper>(mozilla::layers::WebRenderScrollDataWrapper const&, bool, mozilla::layers::LayersId, unsigned int) (APZCTreeManager.cpp:438)

There are a couple of other uninitialized memory accesses.
The first one related to

==121322== Syscall param sendmsg(msg.msg_iov[7]) points to uninitialised byte(s)
==121322==    at 0x487BE4D: __libc_sendmsg (sendmsg.c:28)
==121322==    by 0x487BE4D: sendmsg (sendmsg.c:25)
==121322==    by 0x997060F: IPC::Channel::ChannelImpl::ProcessOutgoingMessages() [clone .part.0] (ipc_channel_posix.cc:139)

is caused by the issue already reported in bug Bug 1725959
valgrind: Syscall param sendmsg(msg.msg_iov[4]) points to uninitialised byte(s).

The other uninitialized access at

fun:drop<(webrender::internal_types::CacheTextureId, alloc::vec::Vec<webrender::internal_types::TextureCacheUpdate, alloc::alloc::Global>),alloc::alloc::Global>
   fun:drop_in_place<hashbrown::raw::RawIntoIter<(webrender::internal_types::CacheTextureId, alloc::vec::Vec<webrender::internal_types::TextureCacheUpdate, alloc::alloc::Global>), alloc::alloc::Global>>
   fun:drop_in_place<hashbrown::map::IntoIter<webrender::internal_types::CacheTextureId, alloc::vec::Vec<webrender::internal_types::TextureCacheUpdate, alloc::alloc::Global>, alloc::alloc::Global>>

might be another new bug. But fixing the earlier uninitialized issues may solve this one.

Blocks: gfx-triage
See Also: → 1725959

Hey Tim, can you triage this? (Looks apz and scroll data related. )

Flags: needinfo?(tnikkel)

This does seem important, this is a pretty basic data structure for async scrolling so we don't want to be able to hit cases like this.

Do you think this is related to changes to m-c or thunderbird? Do you get the same problem running Firefox?

Flags: needinfo?(ishikawa)

(In reply to Timothy Nikkel (:tnikkel) from comment #2)

This does seem important, this is a pretty basic data structure for async scrolling so we don't want to be able to hit cases like this.

Do you think this is related to changes to m-c or thunderbird? Do you get the same problem running Firefox?

Thank you for your comment.

I am glad that someone finally took notice.

  1. Related changes to m-c or thunderbird?

Not sure.
However, the problem regarding the strange rect size was noticed since October 2020. The check seems to have been in the code for quite sometime, and so something changed and triggered the warnings.
I filed a bugzilla mentioning this problem.
Bug 1695139
WARNING: Scrolled rect smaller than scrollport during debug version TB mochitest
In it, another Bug 810274 was mentioned. That bug seems to have downgraded the check code from assertions to warnings.

It is possible that the uninitialized variable was behind the warning to this strange rect size.
(I could not run TB under valgrind due to the issue mentioned in Bug 1629433 since the switch from |make mozmill| testing framework to the one that uses mochitest that took place at the end of 2019. If I had been able to run TB under valgrind during mochitest, I would have noticed this issue and possible connection to this strange rect size much sooner.)

  1. Do you get the same problem running Firefox?
    Good question.
    I thought there were many more FF developers than TB developer/patch contributors,
    and if the same problem is with FF, it would have been noticed there much sooner.

To test whether FF experiences this strange rect size obviously needs a debug version of FF.
NS_WARNING() does not print anything in non-DEBUG version, correct?
(I debug TB locally and create a debug version of TB binary with my own patches, that is why I could notice the warnings of strange rect size and reported it in bug 1695139 )

Well, come to think of it, maybe I can poke around other people's FF submission jobs on treeherder and see if the warning about strange rect size shows up?

Testing FF under valgrind may take a day or two (Just bringing up the initial main window of TB under valgrind takes maybe a couple of dozen minutes these days! And it uses up memory and so I don't want to do it often.)

If someone is interested in testing FF under valgrind, the following is the valgrind command I used to run TB.
Maybe someone can try running FF under valgrind with similar parameters.
You have to use your own suppression files (--suppressions).
Some memory parameters may be quite large for smaller PC. I allocate 16GB to the linux instance inside virtual box that runs under windows10 with 32 GB real memory.

valgrind --trace-children=yes --fair-sched=yes --smc-check=all-non-file --gen-suppressions=all -v --trace-signals=yes --vex-iropt-register-updates=allregs-at-mem-access --track-origins=yes --child-silent-after-fork=yes --trace-children-skip=/usr/bin/lsb_release,/usr/bin/hg,/bin/rm,*/bin/certutil,*/bin/pk12util,*/bin/ssltunnel,*/bin/uname,*/bin/which,*/bin/ps,*/bin/grep,*/bin/java,*/fix-stacks,*/firefox/firefox,*/bin/firefox-esr,*/bin/python,*/bin/python2,*/bin/python3,*/bin/python2.7,*/bin/bash,*/bin/nodejs,*/bin/node,*/bin/xpcshell,python3 --max-threads=5000  --max-stackframe=16000000 --num-transtab-sectors=24 --tool=memcheck --freelist-vol=500000000 --redzone-size=128 --px-default=allregs-at-mem-access --px-file-backed=unwindregs-at-mem-access --malloc-fill=0xA5 --free-fill=0xC3 --num-callers=50 --suppressions=/home/ishikawa/Dropbox/myown.sup --show-mismatched-frees=no --show-possibly-lost=no --read-inline-info=yes  /KERNEL-SRC/moz-obj-dir/objdir-tb3/dist/bin/thunderbird-bin 

Flags: needinfo?(ishikawa)

I'm not too worried about the rect warning, I see that a lot too and I don't think it will lead to anything bad, I don't think it's related to the valgrind uninit problem. The valgrind uninit is what I'm more interested in.

(In reply to Timothy Nikkel (:tnikkel) from comment #4)

I'm not too worried about the rect warning, I see that a lot too and I don't think it will lead to anything bad, I don't think it's related to the valgrind uninit problem. The valgrind uninit is what I'm more interested in.

Understood.

Oh well, I tried running FF under valgrind and surprisingly, the main window showed up quite fast.

HOWEVER, obviously I need to visit a few web sites to see if the uninitialized memory access occurs, and just typing characters to URL bar takes time with valgrind.
Please don't hold your breath. :-)

Also, I just realized that firefox-esr that I have under Debian GNU/Linux does not have symbols (or rather lacks the symbol files for the run-time libraries?), thus the stacktrace would be useless. :-(

In the meantime, I get the valgrind warning with TB all the time (well the last few times I tried).

FF locally compiled from M-C also shows this uninitialized memory access warning from valgrind.

I am attaching the excerpt from the log.
I include the a somewhat large number of lines preceding the relevant warnings.

Please note there are other possible (false positive?) warnings, especially the one with parse_attribute_selector , but after the window frame is shown and
the inside is still white (nothing is drawn), I see the same WebRenderer GetTransForm() warnings twice
followed by the CacheTextureID warning. The same sequence that was observed in TB. (These are toward the end of the excerpted log.)

I may not rule out the possibility that the other uninitialized memory issue
is spreading the issue here.

Flags: needinfo?(tnikkel)
Flags: needinfo?(tnikkel)

I now realize that the Bug 1725959 is for real and
it may not be the simple end of data transfer null filling failure.

We may have uninitialized data being passed for real.

And I suspect that rust code DOES get bogus uninitialized data via the following function (and from C/C++ side)
or create an uninitialized data itself when it is invoked.
https://searchfox.org/mozilla-central/source/gfx/webrender_bindings/src/bindings.rs#3802

#[no_mangle]
pub unsafe extern "C" fn wr_api_end_builder(
    state: &mut WrState,
    dl_descriptor: &mut BuiltDisplayListDescriptor,
    dl_items_data: &mut WrVecU8,
    dl_cache_data: &mut WrVecU8,
    dl_spatial_tree: &mut WrVecU8,
) {
    let (_, dl) = state.frame_builder.dl_builder.end();
    let (payload, descriptor) = dl.into_data();
    *dl_items_data = WrVecU8::from_vec(payload.items_data);
    *dl_cache_data = WrVecU8::from_vec(payload.cache_data);
    *dl_spatial_tree = WrVecU8::from_vec(payload.spatial_tree);
    *dl_descriptor = descriptor;
}

This observation comes from the fact that valgrind reported the following
warning 9 times druing TB startup and
when I looked at the code, I realize it imports C/C++ data into rust domain.
We can get uninitialized data from outside rust.

==361883==  Uninitialised value was created by a stack allocation
==361883==    at 0xF0EC99D: wr_api_end_builder (bindings.rs:3802)

This may not be all the reason why we see the valgrind warnings in Bug 1725959.
However, I think that bug is quite relevant for this bugzilla entry
when I realized this wr_api_end_builder seems to be related to WebRender API command buffer list(?), and
that the bogus data eventually is passed to sendmsg() (Bug 1725959 ) as in the following valgrind backtrace.

I think we need a couple of people to try finding where the data comes from. I am not at all certain how to insert
valgrind check on rust side, but maybe if we rename that rust function and then creates wr_api_end_builder function in C/C++ that calls the original rust function with the proper argument, we may be able to check whether data passed to it is initialized or not using valgrind-supplied functions on the C/C++ side.
Yeah, just a kludgy idea.

The backtrace:

==361722== 23 errors in context 2 of 2:
==361722== Syscall param sendmsg(msg.msg_iov[7]) points to uninitialised byte(s)
==361722==    at 0x487BE4D: __libc_sendmsg (sendmsg.c:28)
==361722==    by 0x487BE4D: sendmsg (sendmsg.c:25)
==361722==    by 0x99E440F: IPC::Channel::ChannelImpl::ProcessOutgoingMessages() [clone .part.0] (ipc_channel_posix.cc:139)
==361722==    by 0x99E4CF4: IPC::Channel::Send(mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message> >) (ipc_channel_posix.cc:949)
==361722==    by 0x9A40960: mozilla::ipc::NodeChannel::DoSendMessage(mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message> >) (NodeChannel.cpp:205)
==361722==    by 0x9A2A773: mozilla::detail::RunnableMethodImpl<mozilla::ipc::NodeChannel*, void (mozilla::ipc::NodeChannel::*)(mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message> >), true, (mozilla::RunnableKind)0, StoreCopyPassByRRef<mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message> > > >::Run() (nsThreadUtils.h:1147)
==361722==    by 0x99D89A1: MessageLoop::RunTask(already_AddRefed<nsIRunnable>) (message_loop.cc:454)
==361722==    by 0x99D8C6D: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&) (message_loop.cc:463)
==361722==    by 0x99D8DC5: MessageLoop::DoWork() [clone .part.0] (message_loop.cc:538)
==361722==    by 0x99D8F4B: base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) (message_loop.cc:518)
==361722==    by 0x99CD5B8: MessageLoop::Run() (message_loop.cc:324)
==361722==    by 0x99E5164: base::Thread::ThreadMain() (thread.cc:187)
==361722==    by 0x99C7F09: ThreadFunc(void*) (platform_thread_posix.cc:40)
==361722==    by 0x4870EA6: start_thread (pthread_create.c:477)
==361722==    by 0x4BAFDEE: clone (clone.S:95)
==361722==  Address 0x18071c96 is 6 bytes inside a block of size 4,096 alloc'd
==361722==    at 0x48397B5: malloc (vg_replace_malloc.c:380)
==361722==    by 0x1203A8: moz_xmalloc (mozalloc.cpp:52)
==361722==    by 0x99D5AF7: mozilla::BufferList<InfallibleAllocPolicy>::WriteBytes(char const*, unsigned long) (mozalloc.h:149)
==361722==    by 0x99D5E99: Pickle::EndWrite(unsigned int, unsigned int) (pickle.cc:528)
==361722==    by 0x99D5F4C: Pickle::WriteBytesZeroCopy(void*, unsigned int, unsigned int) (pickle.cc:633)
==361722==    by 0x9AE4C0A: void mozilla::ipc::WriteIPDLParam<mozilla::ipc::ByteBuf>(IPC::Message*, mozilla::ipc::IProtocol*, mozilla::ipc::ByteBuf&&) [clone .constprop.0] (ByteBufUtils.h:29)
==361722==    by 0xA2D8DDC: mozilla::ipc::IPDLParamTraits<mozilla::layers::DisplayListData>::Write(IPC::Message*, mozilla::ipc::IProtocol*, mozilla::layers::DisplayListData&&) (RenderRootTypes.cpp:21)
==361722==    by 0x9D3BB12: mozilla::layers::PWebRenderBridgeChild::SendSetDisplayList(mozilla::layers::DisplayListData&&, nsTArray<mozilla::layers::OpDestroy> const&, unsigned long const&, mozilla::layers::BaseTransactionId<mozilla::layers::TransactionIdType> const&, bool const&, mozilla::layers::BaseTransactionId<mozilla::VsyncIdType> const&, mozilla::TimeStamp const&, mozilla::TimeStamp const&, mozilla::TimeStamp const&, nsTString<char> const&, mozilla::TimeStamp const&, nsTArray<mozilla::layers::CompositionPayload> const&) (PWebRenderBridgeChild.cpp:228)
==361722==    by 0xA2DD0CF: mozilla::layers::WebRenderBridgeChild::EndTransaction(mozilla::layers::DisplayListData&&, mozilla::layers::BaseTransactionId<mozilla::layers::TransactionIdType>, bool, mozilla::layers::BaseTransactionId<mozilla::VsyncIdType> const&, mozilla::TimeStamp const&, mozilla::TimeStamp const&, mozilla::TimeStamp const&, nsTString<char> const&) (WebRenderBridgeChild.cpp:127)
==361722==    by 0xA2FAC22: mozilla::layers::WebRenderLayerManager::EndTransactionWithoutLayer(mozilla::nsDisplayList*, mozilla::nsDisplayListBuilder*, WrFiltersHolder&&, mozilla::layers::WebRenderBackgroundData*, double) (WebRenderLayerManager.cpp:456)
==361722==    by 0xCBA081D: mozilla::nsDisplayList::PaintRoot(mozilla::nsDisplayListBuilder*, gfxContext*, unsigned int, mozilla::Maybe<double>) (nsDisplayList.cpp:2385)
==361722==    by 0xC82986E: nsLayoutUtils::PaintFrame(gfxContext*, nsIFrame*, nsRegion const&, unsigned int, mozilla::nsDisplayListBuilderMode, nsLayoutUtils::PaintFrameFlags) (nsLayoutUtils.cpp:3443)
==361722==    by 0xC785AB1: mozilla::PresShell::Paint(nsView*, nsRegion const&, mozilla::PaintFlags) (PresShell.cpp:6363)
==361722==    by 0xC45D0D3: nsViewManager::ProcessPendingUpdatesPaint(nsIWidget*) (nsViewManager.cpp:467)
==361722==    by 0xC45D502: nsViewManager::ProcessPendingUpdatesForView(nsView*, bool) (nsViewManager.cpp:402)
==361722==    by 0xC45DEA5: nsViewManager::ProcessPendingUpdates() (nsViewManager.cpp:980)
==361722==    by 0xC753ED7: nsRefreshDriver::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsRefreshDriver::IsExtraTick) (nsRefreshDriver.cpp:2546)
==361722==    by 0xC755361: mozilla::detail::RunnableFunction<nsRefreshDriver::EnsureTimerStarted(nsRefreshDriver::EnsureTimerStartedFlags)::{lambda()#2}>::Run() (nsRefreshDriver.cpp:1598)
==361722==    by 0x932E465: mozilla::RunnableTask::Run() (TaskController.cpp:502)
==361722==    by 0x932D208: mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) (TaskController.cpp:805)
==361722==    by 0x932D9FA: mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) (TaskController.cpp:641)
==361722==    by 0x932DD0B: mozilla::TaskController::ProcessPendingMTTask(bool) (TaskController.cpp:425)
==361722==    by 0x932DDDA: mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::{lambda()#1}>::Run() (TaskController.cpp:135)
==361722==    by 0x932EA21: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:1148)
==361722==    by 0x930F919: NS_ProcessNextEvent(nsIThread*, bool) (nsThreadUtils.cpp:466)
==361722==    by 0x9A292C9: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (MessagePump.cpp:85)
==361722==    by 0x99CD5B8: MessageLoop::Run() (message_loop.cc:324)
==361722==    by 0xC49BF98: nsBaseAppShell::Run() (nsBaseAppShell.cpp:137)
==361722==    by 0xDB1D6FB: XRE_RunAppShell() (nsEmbedFunctions.cpp:917)
==361722==    by 0x9A29476: mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) (MessagePump.cpp:235)
==361722==    by 0x99CD5B8: MessageLoop::Run() (message_loop.cc:324)
==361722==    by 0xDB1DC65: XRE_InitChildProcess(int, char**, XREChildData const*) (nsEmbedFunctions.cpp:749)
==361722==    by 0x11DA2B: content_process_main(mozilla::Bootstrap*, int, char**) (plugin-container.cpp:57)
==361722==    by 0x11D1E3: main (nsMailApp.cpp:329)
==361722==  Uninitialised value was created by a stack allocation
==361722==    at 0xF0EC99D: wr_api_end_builder (bindings.rs:3802)
==361722==

It may still be a simple bug in which we copied two octets (two low octets of 32 bit integer) but the receiver picked up four octets (thinking the full data was passed) kind of a bug.
But I believe the function discussed in Bug 1725959 ought to take care of THAT simple scenario.
And that led me to believe we DO pass bogus data from C/C++ domain. But that probably is not end of the story.

Oh, I forgot. The backtrace shown is after I applied a patch of my own to the |EndWrite| in pickle.cc file.
(I am about to report the temporary patch to bug Bug 1725959.)
After I realize these warnings still persist after the patch, I investigated a bit and realized maybe we
really have real bogus uninitialized databeing passed around.

I admit there will be bugs forever. But I prefer deterministic bugs to non-deterministic ones for various reasons. :-)

No longer blocks: gfx-triage

Seems like the issue here is not in the specific code (that I'm familiar with) but in something more low level so clearing needinfo on me.

Flags: needinfo?(tnikkel)

(In reply to Timothy Nikkel (:tnikkel) from comment #9)

Seems like the issue here is not in the specific code (that I'm familiar with) but in something more low level so clearing needinfo on me.

Thank you in advance for your attention on this matter.

I am making a VERY SLOW PROGRESS running full TB mochitest and xpcshell test under valgrind.
When something shows up in my log that is related to this issue, I will report it in due course.

Stay tuned, but don't hold your breath.

Severity: -- → S3
Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → INACTIVE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: