Some more serialization experiments: At `opt-level = 2` the following code is generated for
Bug 1550640 Comment 20 Edit History
From VTune on Windows, testing with Miko's version of the dl-mutate test, I consistently saw `Content Process` as the number-one thread, with the following kind of breakdown:

> Content Process (TID: 10528)
> + bincode::internal::serialize_into<mut webrender_api::display_list::UnsafeVecWriter*, webrender_api::display_item::DisplayItem*, bincode::config::WithOtherEndian<bincode::config::WithOtherLimit<bincode::config::DefaultOptions, bincode::internal::Infinite>, byteorder::LittleEndian,>> 1.716s
> + bincode::ser::{{impl}}::serialize_field<mut webrender_api::display_list::UnsafeVecWriter*, bincode::config::WithOtherEndian<bincode::config::WithOtherLimit<bincode::config::DefaultOptions, bincode::internal::Infinite>, byteorder::LittleEndian>, webrender_api::display_item::CommonItemProperties> 1.694s
> + MergeState::ProcessOldNode 1.451s
> + RetainedDisplayListBuilder::PreProcessDisplayList 1.413s

The time spent in the serialization functions was consistently about the same as the time spent in `ProcessOldNode`/`PreProcessDisplayList`.
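The profile above shows bincode's generic `serialize_into` writing a `DisplayItem` through an `UnsafeVecWriter`. As a rough, std-only illustration of that shape (an assumption, not WebRender's actual code), the sketch below reserves space up front and then encodes fields one at a time through the generic `std::io::Write` interface, with every write returning an `io::Result`. `CommonProps`, `ENCODED_SIZE` and `push_item_via_writer` are hypothetical, and the writer only assumes what the `UnsafeVecWriter` name suggests: unchecked copies into pre-reserved `Vec` capacity.

```rust
use std::io::{self, Write};
use std::ptr;

/// Hypothetical stand-in for some display-item properties.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CommonProps {
    clip_id: u64,
    spatial_id: u64,
    flags: u8,
}

/// Encoded size of `CommonProps` with fixed-width little-endian fields.
const ENCODED_SIZE: usize = 8 + 8 + 1;

/// Writer that copies bytes to a raw cursor with no bounds checks.
/// Assumption: the caller has already reserved enough space.
struct UnsafeVecWriter(*mut u8);

impl Write for UnsafeVecWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        unsafe {
            ptr::copy_nonoverlapping(buf.as_ptr(), self.0, buf.len());
            self.0 = self.0.add(buf.len());
        }
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Generic, field-by-field encoding through `io::Write`; every step returns a Result.
fn serialize_into<W: Write>(w: &mut W, p: &CommonProps) -> io::Result<()> {
    w.write_all(&p.clip_id.to_le_bytes())?;
    w.write_all(&p.spatial_id.to_le_bytes())?;
    w.write_all(&[p.flags])?;
    Ok(())
}

/// Reserve up front, encode through the writer, then commit the new length.
fn push_item_via_writer(data: &mut Vec<u8>, p: &CommonProps) {
    data.reserve(ENCODED_SIZE);
    let old_len = data.len();
    unsafe {
        let mut w = UnsafeVecWriter(data.as_mut_ptr().add(old_len));
        serialize_into(&mut w, p).expect("writing to reserved memory does not fail");
        // All ENCODED_SIZE bytes past old_len were initialized by the writer.
        data.set_len(old_len + ENCODED_SIZE);
    }
}
```

Whether the optimizer flattens this chain of generic writer calls is up to the compiler; it is one plausible reason such frames could show up separately in a profile, not a measured explanation.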
After integrating `peek_poke`, I see the following in VTune:

> WRRenderBackend#2 (TID: 8464)
> ContentProcess (TID: 16040)
> + RetainedDisplayListBuilder::PreProcessDisplayList 1.982s
> + MergeState::ProcessOldNode 1.238s
> + nsIFrame::ClearInvalidationStateBits 1.226s
> + nsDisplayText::CreateWebRenderCommands 1.076s
> + nsIFrame::BuildDisplayListForChild 1.048s
> + MergeState::ResolveNodeIndexesOldToMerged 1.015s
> + nsDisplayBackgroundColor::RestoreState 0.954s
> + nsTextFrame::PaintText 0.861s
> + MergeState::AddNewNode 0.845s
> + gfxTextRun::Draw 0.730s
> + floorf 0.708s
> + mozilla::layers::ClipManager::SwitchItem 0.696s
> + gfxFont::Draw 0.601s
> + _security_check_cookie 0.601s
> + mozilla::layers::WebRenderCommandBuilder::CreateWebRenderCommandsFromDisplayList 0.571s
> + nsDisplayText::RestoreState 0.559s
> + webrender_api::display_list::DisplayListBuilder::push_item 0.525s

Here, the code that serializes `DisplayItem` ended up inlined into `DisplayListBuilder::push_item`. The two top memory-intensive serialization functions are now combined into that single entry, which falls from #1 to #17 in the list.
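The collapse of serialization into `push_item` is easier to picture with a sketch of a poke-style encoder: each type declares a compile-time upper bound on its encoded size and writes itself through a raw pointer, with no per-field `Result` to propagate. The `Poke` trait below is hypothetical and only modeled on the general idea behind `peek_poke`; it is not that crate's actual API, and `CommonProps` is a stand-in type.

```rust
use std::ptr;

/// Hypothetical poke-style trait (not peek_poke's real API).
trait Poke {
    /// Compile-time upper bound on the encoded size in bytes.
    const MAX_SIZE: usize;
    /// Writes `self` at `dst` and returns the pointer just past the written bytes.
    ///
    /// # Safety
    /// `dst` must be valid for writes of at least `Self::MAX_SIZE` bytes.
    unsafe fn poke_into(&self, dst: *mut u8) -> *mut u8;
}

impl Poke for u64 {
    const MAX_SIZE: usize = 8;
    unsafe fn poke_into(&self, dst: *mut u8) -> *mut u8 {
        let bytes = self.to_le_bytes();
        unsafe {
            ptr::copy_nonoverlapping(bytes.as_ptr(), dst, bytes.len());
            dst.add(bytes.len())
        }
    }
}

impl Poke for u8 {
    const MAX_SIZE: usize = 1;
    unsafe fn poke_into(&self, dst: *mut u8) -> *mut u8 {
        unsafe {
            dst.write(*self);
            dst.add(1)
        }
    }
}

/// Hypothetical stand-in for some display-item properties.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CommonProps {
    clip_id: u64,
    spatial_id: u64,
    flags: u8,
}

impl Poke for CommonProps {
    const MAX_SIZE: usize = 2 * <u64 as Poke>::MAX_SIZE + <u8 as Poke>::MAX_SIZE;
    unsafe fn poke_into(&self, dst: *mut u8) -> *mut u8 {
        // Each field advances the cursor; there is no Result to propagate.
        unsafe {
            let dst = self.clip_id.poke_into(dst);
            let dst = self.spatial_id.poke_into(dst);
            self.flags.poke_into(dst)
        }
    }
}

/// Reserve the worst case once, poke the item, then commit the bytes written.
fn push_item<T: Poke>(data: &mut Vec<u8>, item: &T) {
    data.reserve(T::MAX_SIZE);
    let old_len = data.len();
    unsafe {
        let start = data.as_mut_ptr().add(old_len);
        let end = item.poke_into(start);
        let written = end.offset_from(start) as usize;
        // All `written` bytes past `old_len` were initialized by `poke_into`.
        data.set_len(old_len + written);
    }
}
```

Because `push_item` reserves `MAX_SIZE` once and each field write is a plain copy, the whole chain is small and monomorphic, which makes it an easy inlining target; that is consistent with the profile above, though this sketch does not prove it.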