Bug 1387342
Opened 8 years ago
Closed 3 years ago
tabhoarding: Firefox is slow as heck and I'm not going to take it anymore
Categories
(Core :: DOM: Content Processes, defect, P5)
RESOLVED
INCOMPLETE
People
(Reporter: nevion, Unassigned)
Details
User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.78 Safari/537.36
Steps to reproduce:
Restart firefox, every day or two for the last decade.
I have 200-500 tabs open. That's how I do.
I usually run tab tree view or tab tree. I used tab tree basically from its release until a year ago, when I switched to tab tree view. My usual addons are Session Manager, NoScript, uBlock Origin, Tab Suspender, and a couple of others. about:plugins lists only OpenH264 and Widevine. Distro stuff is always disabled, and I use the official Firefox binaries.
I've created a backup profile where I'm testing with safe mode. I've demonstrated the problem with all addons disabled except tab tree view, not in safe mode.
Of course I want the problem solved, but I'd also like to learn how to profile or trace this issue through Firefox internals. I want to get my hands dirty. I've been trying to figure these issues out, off and on, for 5 years now, and have thrown VTune at it as well as the Gecko Profiler, each time not really finding anything relevant, despite the fact that the main thread of Firefox can easily get pegged at 100% with the children not doing much. The Gecko Profiler seemed to see only things that were too high level, and VTune only things that were too low level, to figure out what was going on. As a starter, how were bug 906076 or other tab-related performance issues studied?
Actual results:
Driven to madness, and using chrome where I'd otherwise prefer to be using firefox. Often complete stalls or delayed processing on the order of a minute.
I've failed to demonstrate issues in safe mode because it is impractical to wield many tabs or even make many of them load since bug 906076 starts all tabs unloaded and to me I can't really load firefox down without tab tree views, through normal usage. Perhaps some sort of "tabdo load" is a next step to reproducing in current and future builds.
Expected results:
Fast interactive usage that scales with the number of tabs until tens of thousands are reached. In my mind tabs are lightweight processes, and if I don't actively use them they shouldn't carry weight. That methodology really helps in research.
I want to highlight that any machine, no matter how powerful or what graphics drivers it uses, has these same problems. I've worked on dozens of machines with the issue, most recently machines with 40 cores, 128 GiB of memory, and plenty of disk.
To clarify, I just need some help loading down an instance started in safe mode, through usage with large numbers of tabs, to get something to... profile. And even then, I'm not sure the Gecko Profiler or VTune can really catch an issue. Is there any other instrumentation that might help track this issue down?
Comment 3•8 years ago
Hi Ehsan, can you point nevion in the right direction for narrowing down the cause of this performance issue?
Flags: needinfo?(ehsan)
Comment 4•8 years ago
Go to https://perf-html.io/, install the profiler extension from there, and press Start from the toolbar icon to make sure the profiler is running. When the performance issues you mentioned happen, press Capture; in the page that opens, press Share, wait for the upload to finish, and paste the links here please. Thanks!
Flags: needinfo?(ehsan)
(In reply to nevion from comment #0)
> I've failed to demonstrate issues in safe mode because it is impractical to
> wield many tabs or even make many of them load since bug 906076 starts all
> tabs unloaded and to me I can't really load firefox down without tab tree
> views, through normal usage. Perhaps some sort of "tabdo load" is a next
> step to reproducing in current and future builds.
>
I am not sure I understood the issue correctly, but you can make all tabs load in safe mode (since most of them started unloaded) by right clicking over a tab and selecting "Reload all tabs" from the pop-up menu.
I managed to create a web extension to reload all tabs before seeing nivtwig's comment. I will have some time to come back to this soon.
@ehsan - does the profile capture or report urls or content from my opened browsers? For my work computer I would want to make sure nothing proprietary is captured. Trivial googling didn't get me an answer.
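For readers wondering what such a "reload everything" extension might look like, here is a minimal sketch, not the reporter's actual code. It assumes the standard WebExtension calls `tabs.query` and `tabs.reload` with `bypassCache`; the `TabsApi` interface, the `batches` helper, and the batch size of 20 are hypothetical names and values introduced only for illustration. The tabs API is passed in as a parameter so the logic can be exercised outside a browser.

```typescript
// Minimal shape of the parts of the WebExtension tabs API used here.
// Assumption: inside an extension, the global `browser.tabs` object would be passed in.
interface TabLike { id?: number; }
interface TabsApi {
  query(queryInfo: object): Promise<TabLike[]>;
  reload(tabId: number, props: { bypassCache: boolean }): Promise<void>;
}

// Split a list into consecutive batches of at most `size` items.
function batches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be >= 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Request a cache-bypassing reload of every tab, `batchSize` tabs at a time.
// Returns the number of tabs for which a reload was requested.
async function reloadAllTabs(tabs: TabsApi, batchSize = 20): Promise<number> {
  const all = await tabs.query({}); // an empty query matches all tabs
  const ids = all
    .map(t => t.id)
    .filter((id): id is number => id !== undefined);
  for (const group of batches(ids, batchSize)) {
    // Wait for one batch of reload requests before issuing the next.
    await Promise.all(group.map(id => tabs.reload(id, { bypassCache: true })));
  }
  return ids.length;
}
```

Batching is a design choice rather than a requirement: with several hundred tabs, issuing the requests in groups keeps the load ramp somewhat more controlled, which may make a captured profile easier to read.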
Hi ehsan, first an intermediate profile. Unfortunately I had to restart the browser again and I'm out of time tonight, but I did catch the 10 minute start time I have for all the tabs to load. I'd love to figure out what's stalling them from finishing loading for so many minutes. This definitely ties into the overall issue and exacerbates it. However, the captures look like they only cover 17 seconds... is that enough to assess what's blocking what? I looked through the profile and didn't see much stick out except for the load ramping up at 11.5 seconds. The Gecko Profiler's profiling of the C++ side has improved since the last time I used it, but I'm still worried about being able to start the profiler while the slogging is happening, and about this ultimately limiting the sampling capability of a sampling-based profiler if it interleaves with the main thread. Poll masking waits and GCs continue to obfuscate the profiles :(.
https://perfht.ml/2wCL7JG
I will have to run again when it is slogging hard to open a new tab, after it's been running for a while.
Comment 8•8 years ago
(Apologies for the long delay in getting back to you...)
(In reply to nevion from comment #6)
> @ehsan - does the profile capture or report urls or content from my opened
> browsers? For my work computer I would want to make sure nothing
> proprietary is captured. Trivial googling didn't get me an answer.
Yes, it does: the URLs of the pages you have open, and of the scripts on those pages, will be uploaded when you upload a profile. This is explained in the dialog we show before uploading a profile <https://screenshots.firefox.com/TynRqfioPUMR9tTb/perf-html.io>; sorry if that wasn't clear.
The profile in comment 7 is somewhat helpful in explaining the issues you have been seeing. The biggest issue you are hitting is that you seem to have a single content process. Can you please open a new tab, go to about:support in it, click "Copy text to clipboard" and paste that here?
Application Basics
------------------
Name: Firefox
Version: 55.0
Build ID: 20170803173024
Update Channel: beta
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0
OS: Linux 4.4.0-67-generic
Multiprocess Windows: 2/2 (Enabled by default)
Web Content Processes: 1/1
Google Key: Found
Mozilla Location Service Key: Found
Safe Mode: false
Crash Reports for the Last 3 Days
---------------------------------
All Crash Reports (including 1 pending crash in the given time range)
Firefox Features
----------------
Name: Application Update Service Helper
Version: 2.0
ID: aushelper@mozilla.org
Name: Click-to-Play staged rollout
Version: 1.2
ID: clicktoplay-rollout@mozilla.org
Name: Firefox Screenshots
Version: 10.10.0
ID: screenshots@mozilla.org
Name: Follow-on Search Telemetry
Version: 0.9.1
ID: followonsearch@mozilla.com
Name: Multi-process staged rollout
Version: 2.0
ID: e10srollout@mozilla.org
Name: Pocket
Version: 1.0.5
ID: firefox@getpocket.com
Name: Shield Recipe Client
Version: 55.1
ID: shield-recipe-client@mozilla.org
Name: TLS 1.3 Compatibility Testing of Middleboxes
Version: 1.0.0
ID: tls13-middlebox@mozilla.org
Name: Web Compat
Version: 1.1
ID: webcompat@mozilla.org
Extensions
----------
Name: Gecko Profiler
Version: 0.17
Enabled: true
ID: geckoprofiler@mozilla.com
Name: Sample Commands Extension
Version: 1.0
Enabled: true
ID: e7dbe99907f41a74a4a7b8762debb5446f18d6f5@temporary-addon #I wrote this and used it to force reloading of everything without cache
Name: Tab Tree
Version: 1.5.0
Enabled: false
ID: TabsTree@traxium
Name: Brief
Version: 2.3.0
Enabled: false
ID: brief@mozdev.org
Name: Google Redirects Fixer
Version: 2.1.0
Enabled: false
ID: jid1-zUrvDCat3xoDSQ@jetpack
Name: Session Manager
Version: 0.8.1.13
Enabled: false
ID: {1280606b-2510-4fe0-97ef-9b5a22eafe30}
Name: Tab Suspender (memory saver)
Version: 0.1.8
Enabled: false
ID: {e225ac78-5e83-484b-a16b-b6ed0924212f}
Name: uBlock Origin
Version: 1.13.8
Enabled: false
ID: uBlock0@raymondhill.net
Name: Ubuntu Modifications
Version: 3.2
Enabled: false
ID: ubufox@ubuntu.com
Name: Zotero Connector
Version: 5.0.11
Enabled: false
ID: zotero@chnm.gmu.edu
Graphics
--------
Features
Compositing: Basic
Asynchronous Pan/Zoom: wheel input enabled; scrollbar drag enabled
WebGL 1 Driver WSI Info: GLX 1.4 GLX_VENDOR(client): NVIDIA Corporation GLX_VENDOR(server): NVIDIA Corporation Extensions: GLX_EXT_visual_info GLX_EXT_visual_rating GLX_EXT_import_context GLX_SGIX_fbconfig GLX_SGIX_pbuffer GLX_SGI_video_sync GLX_SGI_swap_control GLX_EXT_swap_control GLX_EXT_swap_control_tear GLX_EXT_texture_from_pixmap GLX_EXT_buffer_age GLX_ARB_create_context GLX_ARB_create_context_profile GLX_EXT_create_context_es_profile GLX_EXT_create_context_es2_profile GLX_ARB_create_context_robustness GLX_NV_delay_before_swap GLX_EXT_stereo_tree GLX_ARB_context_flush_control GLX_NV_robustness_video_memory_purge GLX_ARB_multisample GLX_NV_float_buffer GLX_ARB_fbconfig_float GLX_EXT_framebuffer_sRGB GLX_NV_copy_image GLX_ARB_get_proc_address
WebGL 1 Driver Renderer: NVIDIA Corporation -- TITAN Xp/PCIe/SSE2
WebGL 1 Driver Version: 4.5.0 NVIDIA 375.66
WebGL 1 Driver Extensions: GL_AMD_multi_draw_indirect GL_AMD_seamless_cubemap_per_texture GL_AMD_vertex_shader_viewport_index GL_AMD_vertex_shader_layer GL_ARB_arrays_of_arrays GL_ARB_base_instance GL_ARB_bindless_texture GL_ARB_blend_func_extended GL_ARB_buffer_storage GL_ARB_clear_buffer_object GL_ARB_clear_texture GL_ARB_clip_control GL_ARB_color_buffer_float GL_ARB_compatibility GL_ARB_compressed_texture_pixel_storage GL_ARB_conservative_depth GL_ARB_compute_shader GL_ARB_compute_variable_group_size GL_ARB_conditional_render_inverted GL_ARB_copy_buffer GL_ARB_copy_image GL_ARB_cull_distance GL_ARB_debug_output GL_ARB_depth_buffer_float GL_ARB_depth_clamp GL_ARB_depth_texture GL_ARB_derivative_control GL_ARB_direct_state_access GL_ARB_draw_buffers GL_ARB_draw_buffers_blend GL_ARB_draw_indirect GL_ARB_draw_elements_base_vertex GL_ARB_draw_instanced GL_ARB_enhanced_layouts GL_ARB_ES2_compatibility GL_ARB_ES3_compatibility GL_ARB_ES3_1_compatibility GL_ARB_ES3_2_compatibility GL_ARB_explicit_attrib_location GL_ARB_explicit_uniform_location GL_ARB_fragment_coord_conventions GL_ARB_fragment_layer_viewport GL_ARB_fragment_program GL_ARB_fragment_program_shadow GL_ARB_fragment_shader GL_ARB_fragment_shader_interlock GL_ARB_framebuffer_no_attachments GL_ARB_framebuffer_object GL_ARB_framebuffer_sRGB GL_ARB_geometry_shader4 GL_ARB_get_program_binary GL_ARB_get_texture_sub_image GL_ARB_gl_spirv GL_ARB_gpu_shader5 GL_ARB_gpu_shader_fp64 GL_ARB_gpu_shader_int64 GL_ARB_half_float_pixel GL_ARB_half_float_vertex GL_ARB_imaging GL_ARB_indirect_parameters GL_ARB_instanced_arrays GL_ARB_internalformat_query GL_ARB_internalformat_query2 GL_ARB_invalidate_subdata GL_ARB_map_buffer_alignment GL_ARB_map_buffer_range GL_ARB_multi_bind GL_ARB_multi_draw_indirect GL_ARB_multisample GL_ARB_multitexture GL_ARB_occlusion_query GL_ARB_occlusion_query2 GL_ARB_parallel_shader_compile GL_ARB_pipeline_statistics_query GL_ARB_pixel_buffer_object GL_ARB_point_parameters GL_ARB_point_sprite 
GL_ARB_post_depth_coverage GL_ARB_program_interface_query GL_ARB_provoking_vertex GL_ARB_query_buffer_object GL_ARB_robust_buffer_access_behavior GL_ARB_robustness GL_ARB_sample_locations GL_ARB_sample_shading GL_ARB_sampler_objects GL_ARB_seamless_cube_map GL_ARB_seamless_cubemap_per_texture GL_ARB_separate_shader_objects GL_ARB_shader_atomic_counter_ops GL_ARB_shader_atomic_counters GL_ARB_shader_ballot GL_ARB_shader_bit_encoding GL_ARB_shader_clock GL_ARB_shader_draw_parameters GL_ARB_shader_group_vote GL_ARB_shader_image_load_store GL_ARB_shader_image_size GL_ARB_shader_objects GL_ARB_shader_precision GL_ARB_shader_storage_buffer_object GL_ARB_shader_subroutine GL_ARB_shader_texture_image_samples GL_ARB_shader_texture_lod GL_ARB_shading_language_100 GL_ARB_shader_viewport_layer_array GL_ARB_shading_language_420pack GL_ARB_shading_language_include GL_ARB_shading_language_packing GL_ARB_shadow GL_ARB_sparse_buffer GL_ARB_sparse_texture GL_ARB_sparse_texture2 GL_ARB_sparse_texture_clamp GL_ARB_stencil_texturing GL_ARB_sync GL_ARB_tessellation_shader GL_ARB_texture_barrier GL_ARB_texture_border_clamp GL_ARB_texture_buffer_object GL_ARB_texture_buffer_object_rgb32 GL_ARB_texture_buffer_range GL_ARB_texture_compression GL_ARB_texture_compression_bptc GL_ARB_texture_compression_rgtc GL_ARB_texture_cube_map GL_ARB_texture_cube_map_array GL_ARB_texture_env_add GL_ARB_texture_env_combine GL_ARB_texture_env_crossbar GL_ARB_texture_env_dot3 GL_ARB_texture_filter_minmax GL_ARB_texture_float GL_ARB_texture_gather GL_ARB_texture_mirror_clamp_to_edge GL_ARB_texture_mirrored_repeat GL_ARB_texture_multisample GL_ARB_texture_non_power_of_two GL_ARB_texture_query_levels GL_ARB_texture_query_lod GL_ARB_texture_rectangle GL_ARB_texture_rg GL_ARB_texture_rgb10_a2ui GL_ARB_texture_stencil8 GL_ARB_texture_storage GL_ARB_texture_storage_multisample GL_ARB_texture_swizzle GL_ARB_texture_view GL_ARB_timer_query GL_ARB_transform_feedback2 GL_ARB_transform_feedback3 
GL_ARB_transform_feedback_instanced GL_ARB_transform_feedback_overflow_query GL_ARB_transpose_matrix GL_ARB_uniform_buffer_object GL_ARB_vertex_array_bgra GL_ARB_vertex_array_object GL_ARB_vertex_attrib_64bit GL_ARB_vertex_attrib_binding GL_ARB_vertex_buffer_object GL_ARB_vertex_program GL_ARB_vertex_shader GL_ARB_vertex_type_10f_11f_11f_rev GL_ARB_vertex_type_2_10_10_10_rev GL_ARB_viewport_array GL_ARB_window_pos GL_ATI_draw_buffers GL_ATI_texture_float GL_ATI_texture_mirror_once GL_S3_s3tc GL_EXT_texture_env_add GL_EXT_abgr GL_EXT_bgra GL_EXT_bindable_uniform GL_EXT_blend_color GL_EXT_blend_equation_separate GL_EXT_blend_func_separate GL_EXT_blend_minmax GL_EXT_blend_subtract GL_EXT_compiled_vertex_array GL_EXT_Cg_shader GL_EXT_depth_bounds_test GL_EXT_direct_state_access GL_EXT_draw_buffers2 GL_EXT_draw_instanced GL_EXT_draw_range_elements GL_EXT_fog_coord GL_EXT_framebuffer_blit GL_EXT_framebuffer_multisample GL_EXTX_framebuffer_mixed_formats GL_EXT_framebuffer_multisample_blit_scaled GL_EXT_framebuffer_object GL_EXT_framebuffer_sRGB GL_EXT_geometry_shader4 GL_EXT_gpu_program_parameters GL_EXT_gpu_shader4 GL_EXT_multi_draw_arrays GL_EXT_packed_depth_stencil GL_EXT_packed_float GL_EXT_packed_pixels GL_EXT_pixel_buffer_object GL_EXT_point_parameters GL_EXT_polygon_offset_clamp GL_EXT_post_depth_coverage GL_EXT_provoking_vertex GL_EXT_raster_multisample GL_EXT_rescale_normal GL_EXT_secondary_color GL_EXT_separate_shader_objects GL_EXT_separate_specular_color GL_EXT_shader_image_load_formatted GL_EXT_shader_image_load_store GL_EXT_shader_integer_mix GL_EXT_shadow_funcs GL_EXT_sparse_texture2 GL_EXT_stencil_two_side GL_EXT_stencil_wrap GL_EXT_texture3D GL_EXT_texture_array GL_EXT_texture_buffer_object GL_EXT_texture_compression_dxt1 GL_EXT_texture_compression_latc GL_EXT_texture_compression_rgtc GL_EXT_texture_compression_s3tc GL_EXT_texture_cube_map GL_EXT_texture_edge_clamp GL_EXT_texture_env_combine GL_EXT_texture_env_dot3 GL_EXT_texture_filter_anisotropic 
GL_EXT_texture_filter_minmax GL_EXT_texture_integer GL_EXT_texture_lod GL_EXT_texture_lod_bias GL_EXT_texture_mirror_clamp GL_EXT_texture_object GL_EXT_texture_shared_exponent GL_EXT_texture_sRGB GL_EXT_texture_sRGB_decode GL_EXT_texture_storage GL_EXT_texture_swizzle GL_EXT_timer_query GL_EXT_transform_feedback2 GL_EXT_vertex_array GL_EXT_vertex_array_bgra GL_EXT_vertex_attrib_64bit GL_EXT_window_rectangles GL_EXT_x11_sync_object GL_EXT_import_sync_object GL_NV_robustness_video_memory_purge GL_IBM_rasterpos_clip GL_IBM_texture_mirrored_repeat GL_KHR_context_flush_control GL_KHR_debug GL_KHR_no_error GL_KHR_robust_buffer_access_behavior GL_KHR_robustness GL_KTX_buffer_region GL_NV_alpha_to_coverage_dither_control GL_NV_bindless_multi_draw_indirect GL_NV_bindless_multi_draw_indirect_count GL_NV_bindless_texture GL_NV_blend_equation_advanced GL_NV_blend_equation_advanced_coherent GL_NVX_blend_equation_advanced_multi_draw_buffers GL_NV_blend_square GL_NV_clip_space_w_scaling GL_NV_command_list GL_NV_compute_program5 GL_NV_conditional_render GL_NV_conservative_raster GL_NV_conservative_raster_dilate GL_NV_conservative_raster_pre_snap_triangles GL_NV_copy_depth_to_color GL_NV_copy_image GL_NV_depth_buffer_float GL_NV_depth_clamp GL_NV_draw_texture GL_NV_draw_vulkan_image GL_NV_ES1_1_compatibility GL_NV_ES3_1_compatibility GL_NV_explicit_multisample GL_NV_fence GL_NV_fill_rectangle GL_NV_float_buffer GL_NV_fog_distance GL_NV_fragment_coverage_to_color GL_NV_fragment_program GL_NV_fragment_program_option GL_NV_fragment_program2 GL_NV_fragment_shader_interlock GL_NV_framebuffer_mixed_samples GL_NV_framebuffer_multisample_coverage GL_NV_geometry_shader4 GL_NV_geometry_shader_passthrough GL_NV_gpu_program4 GL_NV_internalformat_sample_query GL_NV_gpu_program4_1 GL_NV_gpu_program5 GL_NV_gpu_program5_mem_extended GL_NV_gpu_program_fp64 GL_NV_gpu_shader5 GL_NV_half_float GL_NV_light_max_exponent GL_NV_multisample_coverage GL_NV_multisample_filter_hint GL_NV_occlusion_query 
GL_NV_packed_depth_stencil GL_NV_parameter_buffer_object GL_NV_parameter_buffer_object2 GL_NV_path_rendering GL_NV_path_rendering_shared_edge GL_NV_pixel_data_range GL_NV_point_sprite GL_NV_primitive_restart GL_NV_register_combiners GL_NV_register_combiners2 GL_NV_sample_locations GL_NV_sample_mask_override_coverage GL_NV_shader_atomic_counters GL_NV_shader_atomic_float GL_NV_shader_atomic_float64 GL_NV_shader_atomic_fp16_vector GL_NV_shader_atomic_int64 GL_NV_shader_buffer_load GL_NV_shader_storage_buffer_object GL_NV_stereo_view_rendering GL_NV_texgen_reflection GL_NV_texture_barrier GL_NV_texture_compression_vtc GL_NV_texture_env_combine4 GL_NV_texture_multisample GL_NV_texture_rectangle GL_NV_texture_shader GL_NV_texture_shader2 GL_NV_texture_shader3 GL_NV_transform_feedback GL_NV_transform_feedback2 GL_NV_uniform_buffer_unified_memory GL_NV_vdpau_interop GL_NV_vertex_array_range GL_NV_vertex_array_range2 GL_NV_vertex_attrib_integer_64bit GL_NV_vertex_buffer_unified_memory GL_NV_vertex_program GL_NV_vertex_program1_1 GL_NV_vertex_program2 GL_NV_vertex_program2_option GL_NV_vertex_program3 GL_NV_viewport_array2 GL_NV_viewport_swizzle GL_NVX_conditional_render GL_NVX_gpu_memory_info GL_NVX_nvenc_interop GL_NV_shader_thread_group GL_NV_shader_thread_shuffle GL_KHR_blend_equation_advanced GL_KHR_blend_equation_advanced_coherent GL_SGIS_generate_mipmap GL_SGIS_texture_lod GL_SGIX_depth_texture GL_SGIX_shadow GL_SUN_slice_accum
WebGL 1 Extensions: ANGLE_instanced_arrays EXT_blend_minmax EXT_color_buffer_half_float EXT_frag_depth EXT_sRGB EXT_shader_texture_lod EXT_texture_filter_anisotropic EXT_disjoint_timer_query MOZ_debug OES_element_index_uint OES_standard_derivatives OES_texture_float OES_texture_float_linear OES_texture_half_float OES_texture_half_float_linear OES_vertex_array_object WEBGL_color_buffer_float WEBGL_compressed_texture_etc WEBGL_compressed_texture_s3tc WEBGL_compressed_texture_s3tc_srgb WEBGL_debug_renderer_info WEBGL_debug_shaders WEBGL_depth_texture WEBGL_draw_buffers WEBGL_lose_context MOZ_WEBGL_lose_context MOZ_WEBGL_compressed_texture_s3tc MOZ_WEBGL_depth_texture
WebGL 2 Driver WSI Info: GLX 1.4 GLX_VENDOR(client): NVIDIA Corporation GLX_VENDOR(server): NVIDIA Corporation Extensions: GLX_EXT_visual_info GLX_EXT_visual_rating GLX_EXT_import_context GLX_SGIX_fbconfig GLX_SGIX_pbuffer GLX_SGI_video_sync GLX_SGI_swap_control GLX_EXT_swap_control GLX_EXT_swap_control_tear GLX_EXT_texture_from_pixmap GLX_EXT_buffer_age GLX_ARB_create_context GLX_ARB_create_context_profile GLX_EXT_create_context_es_profile GLX_EXT_create_context_es2_profile GLX_ARB_create_context_robustness GLX_NV_delay_before_swap GLX_EXT_stereo_tree GLX_ARB_context_flush_control GLX_NV_robustness_video_memory_purge GLX_ARB_multisample GLX_NV_float_buffer GLX_ARB_fbconfig_float GLX_EXT_framebuffer_sRGB GLX_NV_copy_image GLX_ARB_get_proc_address
WebGL 2 Driver Renderer: NVIDIA Corporation -- TITAN Xp/PCIe/SSE2
WebGL 2 Driver Version: 3.2.0 NVIDIA 375.66
WebGL 2 Driver Extensions: GL_AMD_multi_draw_indirect GL_AMD_seamless_cubemap_per_texture GL_AMD_vertex_shader_viewport_index GL_AMD_vertex_shader_layer GL_ARB_arrays_of_arrays GL_ARB_base_instance GL_ARB_bindless_texture GL_ARB_blend_func_extended GL_ARB_buffer_storage GL_ARB_clear_buffer_object GL_ARB_clear_texture GL_ARB_clip_control GL_ARB_color_buffer_float GL_ARB_compressed_texture_pixel_storage GL_ARB_conservative_depth GL_ARB_compute_shader GL_ARB_compute_variable_group_size GL_ARB_conditional_render_inverted GL_ARB_copy_buffer GL_ARB_copy_image GL_ARB_cull_distance GL_ARB_debug_output GL_ARB_depth_buffer_float GL_ARB_depth_clamp GL_ARB_depth_texture GL_ARB_derivative_control GL_ARB_direct_state_access GL_ARB_draw_buffers GL_ARB_draw_buffers_blend GL_ARB_draw_indirect GL_ARB_draw_elements_base_vertex GL_ARB_draw_instanced GL_ARB_enhanced_layouts GL_ARB_ES2_compatibility GL_ARB_ES3_compatibility GL_ARB_ES3_1_compatibility GL_ARB_ES3_2_compatibility GL_ARB_explicit_attrib_location GL_ARB_explicit_uniform_location GL_ARB_fragment_coord_conventions GL_ARB_fragment_layer_viewport GL_ARB_fragment_program GL_ARB_fragment_program_shadow GL_ARB_fragment_shader GL_ARB_fragment_shader_interlock GL_ARB_framebuffer_no_attachments GL_ARB_framebuffer_object GL_ARB_framebuffer_sRGB GL_ARB_geometry_shader4 GL_ARB_get_program_binary GL_ARB_get_texture_sub_image GL_ARB_gl_spirv GL_ARB_gpu_shader5 GL_ARB_gpu_shader_fp64 GL_ARB_gpu_shader_int64 GL_ARB_half_float_pixel GL_ARB_half_float_vertex GL_ARB_imaging GL_ARB_indirect_parameters GL_ARB_instanced_arrays GL_ARB_internalformat_query GL_ARB_internalformat_query2 GL_ARB_invalidate_subdata GL_ARB_map_buffer_alignment GL_ARB_map_buffer_range GL_ARB_multi_bind GL_ARB_multi_draw_indirect GL_ARB_multisample GL_ARB_multitexture GL_ARB_occlusion_query GL_ARB_occlusion_query2 GL_ARB_parallel_shader_compile GL_ARB_pipeline_statistics_query GL_ARB_pixel_buffer_object GL_ARB_point_parameters GL_ARB_point_sprite GL_ARB_post_depth_coverage 
GL_ARB_program_interface_query GL_ARB_provoking_vertex GL_ARB_query_buffer_object GL_ARB_robust_buffer_access_behavior GL_ARB_robustness GL_ARB_sample_locations GL_ARB_sample_shading GL_ARB_sampler_objects GL_ARB_seamless_cube_map GL_ARB_seamless_cubemap_per_texture GL_ARB_separate_shader_objects GL_ARB_shader_atomic_counter_ops GL_ARB_shader_atomic_counters GL_ARB_shader_ballot GL_ARB_shader_bit_encoding GL_ARB_shader_clock GL_ARB_shader_draw_parameters GL_ARB_shader_group_vote GL_ARB_shader_image_load_store GL_ARB_shader_image_size GL_ARB_shader_objects GL_ARB_shader_precision GL_ARB_shader_storage_buffer_object GL_ARB_shader_subroutine GL_ARB_shader_texture_image_samples GL_ARB_shader_texture_lod GL_ARB_shading_language_100 GL_ARB_shader_viewport_layer_array GL_ARB_shading_language_420pack GL_ARB_shading_language_include GL_ARB_shading_language_packing GL_ARB_shadow GL_ARB_sparse_buffer GL_ARB_sparse_texture GL_ARB_sparse_texture2 GL_ARB_sparse_texture_clamp GL_ARB_stencil_texturing GL_ARB_sync GL_ARB_tessellation_shader GL_ARB_texture_barrier GL_ARB_texture_border_clamp GL_ARB_texture_buffer_object GL_ARB_texture_buffer_object_rgb32 GL_ARB_texture_buffer_range GL_ARB_texture_compression GL_ARB_texture_compression_bptc GL_ARB_texture_compression_rgtc GL_ARB_texture_cube_map GL_ARB_texture_cube_map_array GL_ARB_texture_env_add GL_ARB_texture_env_combine GL_ARB_texture_env_crossbar GL_ARB_texture_env_dot3 GL_ARB_texture_filter_minmax GL_ARB_texture_float GL_ARB_texture_gather GL_ARB_texture_mirror_clamp_to_edge GL_ARB_texture_mirrored_repeat GL_ARB_texture_multisample GL_ARB_texture_non_power_of_two GL_ARB_texture_query_levels GL_ARB_texture_query_lod GL_ARB_texture_rectangle GL_ARB_texture_rg GL_ARB_texture_rgb10_a2ui GL_ARB_texture_stencil8 GL_ARB_texture_storage GL_ARB_texture_storage_multisample GL_ARB_texture_swizzle GL_ARB_texture_view GL_ARB_timer_query GL_ARB_transform_feedback2 GL_ARB_transform_feedback3 GL_ARB_transform_feedback_instanced 
GL_ARB_transform_feedback_overflow_query GL_ARB_transpose_matrix GL_ARB_uniform_buffer_object GL_ARB_vertex_array_bgra GL_ARB_vertex_array_object GL_ARB_vertex_attrib_64bit GL_ARB_vertex_attrib_binding GL_ARB_vertex_buffer_object GL_ARB_vertex_program GL_ARB_vertex_shader GL_ARB_vertex_type_10f_11f_11f_rev GL_ARB_vertex_type_2_10_10_10_rev GL_ARB_viewport_array GL_ARB_window_pos GL_ATI_draw_buffers GL_ATI_texture_float GL_ATI_texture_mirror_once GL_S3_s3tc GL_EXT_texture_env_add GL_EXT_abgr GL_EXT_bgra GL_EXT_bindable_uniform GL_EXT_blend_color GL_EXT_blend_equation_separate GL_EXT_blend_func_separate GL_EXT_blend_minmax GL_EXT_blend_subtract GL_EXT_compiled_vertex_array GL_EXT_Cg_shader GL_EXT_depth_bounds_test GL_EXT_direct_state_access GL_EXT_draw_buffers2 GL_EXT_draw_instanced GL_EXT_draw_range_elements GL_EXT_fog_coord GL_EXT_framebuffer_blit GL_EXT_framebuffer_multisample GL_EXTX_framebuffer_mixed_formats GL_EXT_framebuffer_multisample_blit_scaled GL_EXT_framebuffer_object GL_EXT_framebuffer_sRGB GL_EXT_geometry_shader4 GL_EXT_gpu_program_parameters GL_EXT_gpu_shader4 GL_EXT_multi_draw_arrays GL_EXT_packed_depth_stencil GL_EXT_packed_float GL_EXT_packed_pixels GL_EXT_pixel_buffer_object GL_EXT_point_parameters GL_EXT_polygon_offset_clamp GL_EXT_post_depth_coverage GL_EXT_provoking_vertex GL_EXT_raster_multisample GL_EXT_rescale_normal GL_EXT_secondary_color GL_EXT_separate_shader_objects GL_EXT_separate_specular_color GL_EXT_shader_image_load_formatted GL_EXT_shader_image_load_store GL_EXT_shader_integer_mix GL_EXT_shadow_funcs GL_EXT_sparse_texture2 GL_EXT_stencil_two_side GL_EXT_stencil_wrap GL_EXT_texture3D GL_EXT_texture_array GL_EXT_texture_buffer_object GL_EXT_texture_compression_dxt1 GL_EXT_texture_compression_latc GL_EXT_texture_compression_rgtc GL_EXT_texture_compression_s3tc GL_EXT_texture_cube_map GL_EXT_texture_edge_clamp GL_EXT_texture_env_combine GL_EXT_texture_env_dot3 GL_EXT_texture_filter_anisotropic GL_EXT_texture_filter_minmax 
GL_EXT_texture_integer GL_EXT_texture_lod GL_EXT_texture_lod_bias GL_EXT_texture_mirror_clamp GL_EXT_texture_object GL_EXT_texture_shared_exponent GL_EXT_texture_sRGB GL_EXT_texture_sRGB_decode GL_EXT_texture_storage GL_EXT_texture_swizzle GL_EXT_timer_query GL_EXT_transform_feedback2 GL_EXT_vertex_array GL_EXT_vertex_array_bgra GL_EXT_vertex_attrib_64bit GL_EXT_window_rectangles GL_EXT_x11_sync_object GL_EXT_import_sync_object GL_NV_robustness_video_memory_purge GL_IBM_rasterpos_clip GL_IBM_texture_mirrored_repeat GL_KHR_context_flush_control GL_KHR_debug GL_KHR_no_error GL_KHR_robust_buffer_access_behavior GL_KHR_robustness GL_KTX_buffer_region GL_NV_alpha_to_coverage_dither_control GL_NV_bindless_multi_draw_indirect GL_NV_bindless_multi_draw_indirect_count GL_NV_bindless_texture GL_NV_blend_equation_advanced GL_NV_blend_equation_advanced_coherent GL_NVX_blend_equation_advanced_multi_draw_buffers GL_NV_blend_square GL_NV_clip_space_w_scaling GL_NV_command_list GL_NV_compute_program5 GL_NV_conditional_render GL_NV_conservative_raster GL_NV_conservative_raster_dilate GL_NV_conservative_raster_pre_snap_triangles GL_NV_copy_depth_to_color GL_NV_copy_image GL_NV_depth_buffer_float GL_NV_depth_clamp GL_NV_draw_texture GL_NV_draw_vulkan_image GL_NV_ES1_1_compatibility GL_NV_ES3_1_compatibility GL_NV_explicit_multisample GL_NV_fence GL_NV_fill_rectangle GL_NV_float_buffer GL_NV_fog_distance GL_NV_fragment_coverage_to_color GL_NV_fragment_program GL_NV_fragment_program_option GL_NV_fragment_program2 GL_NV_fragment_shader_interlock GL_NV_framebuffer_mixed_samples GL_NV_framebuffer_multisample_coverage GL_NV_geometry_shader4 GL_NV_geometry_shader_passthrough GL_NV_gpu_program4 GL_NV_internalformat_sample_query GL_NV_gpu_program4_1 GL_NV_gpu_program5 GL_NV_gpu_program5_mem_extended GL_NV_gpu_program_fp64 GL_NV_gpu_shader5 GL_NV_half_float GL_NV_light_max_exponent GL_NV_multisample_coverage GL_NV_multisample_filter_hint GL_NV_occlusion_query GL_NV_packed_depth_stencil 
GL_NV_parameter_buffer_object GL_NV_parameter_buffer_object2 GL_NV_path_rendering GL_NV_path_rendering_shared_edge GL_NV_pixel_data_range GL_NV_point_sprite GL_NV_primitive_restart GL_NV_register_combiners GL_NV_register_combiners2 GL_NV_sample_locations GL_NV_sample_mask_override_coverage GL_NV_shader_atomic_counters GL_NV_shader_atomic_float GL_NV_shader_atomic_float64 GL_NV_shader_atomic_fp16_vector GL_NV_shader_atomic_int64 GL_NV_shader_buffer_load GL_NV_shader_storage_buffer_object GL_NV_stereo_view_rendering GL_NV_texgen_reflection GL_NV_texture_barrier GL_NV_texture_compression_vtc GL_NV_texture_env_combine4 GL_NV_texture_multisample GL_NV_texture_rectangle GL_NV_texture_shader GL_NV_texture_shader2 GL_NV_texture_shader3 GL_NV_transform_feedback GL_NV_transform_feedback2 GL_NV_uniform_buffer_unified_memory GL_NV_vdpau_interop GL_NV_vertex_array_range GL_NV_vertex_array_range2 GL_NV_vertex_attrib_integer_64bit GL_NV_vertex_buffer_unified_memory GL_NV_vertex_program GL_NV_vertex_program1_1 GL_NV_vertex_program2 GL_NV_vertex_program2_option GL_NV_vertex_program3 GL_NV_viewport_array2 GL_NV_viewport_swizzle GL_NVX_conditional_render GL_NVX_gpu_memory_info GL_NVX_nvenc_interop GL_NV_shader_thread_group GL_NV_shader_thread_shuffle GL_KHR_blend_equation_advanced GL_KHR_blend_equation_advanced_coherent GL_SGIS_generate_mipmap GL_SGIS_texture_lod GL_SGIX_depth_texture GL_SGIX_shadow GL_SUN_slice_accum
WebGL 2 Extensions: EXT_color_buffer_float EXT_texture_filter_anisotropic EXT_disjoint_timer_query MOZ_debug OES_texture_float_linear WEBGL_compressed_texture_etc WEBGL_compressed_texture_s3tc WEBGL_compressed_texture_s3tc_srgb WEBGL_debug_renderer_info WEBGL_debug_shaders WEBGL_lose_context MOZ_WEBGL_lose_context MOZ_WEBGL_compressed_texture_s3tc
Audio Backend: pulse
GPU #1
Active: Yes
Description: NVIDIA Corporation -- TITAN Xp/PCIe/SSE2
Vendor ID: NVIDIA Corporation
Device ID: TITAN Xp/PCIe/SSE2
Driver Version: 4.5.0 NVIDIA 375.66
Diagnostics
AzureCanvasAccelerated: 0
AzureCanvasBackend: skia
AzureContentBackend: skia
AzureFallbackCanvasBackend: none
CairoUseXRender: 0
Decision Log
HW_COMPOSITING:
blocked by default: Acceleration blocked by platform
OPENGL_COMPOSITING:
unavailable by default: Hardware compositing is disabled
WEBRENDER:
opt-in by default: WebRender is an opt-in feature
unavailable by runtime: Build doesn't include WebRender
Important Modified Preferences
------------------------------
accessibility.typeaheadfind.flashBar: 0
browser.cache.disk.capacity: 358400
browser.cache.disk.filesystem_reported: 1
browser.cache.disk.hashstats_reported: 1
browser.cache.disk.smart_size.first_run: false
browser.cache.disk.smart_size.use_old_max: false
browser.cache.frecency_experiment: 4
browser.places.smartBookmarksVersion: 8
browser.search.update: false
browser.sessionstore.restore_on_demand: false
browser.sessionstore.upgradeBackup.latestBuildID: 20170803173024
browser.startup.homepage_override.buildID: 20170803173024
browser.startup.homepage_override.mstone: 55.0
browser.tabs.remote.autostart.2: true
browser.urlbar.daysBeforeHidingSuggestionsPrompt: 0
browser.urlbar.lastSuggestionsPromptDate: 20170721
browser.urlbar.timesBeforeHidingSuggestionsHint: 0
dom.push.userAgentID: db713ab39ad84104b68aa404855e0e4d
extensions.lastAppVersion: 55.0
font.internaluseonly.changed: false
media.autoplay.enabled: false
media.eme.enabled: true
media.gmp-gmpopenh264.abi: x86_64-gcc3
media.gmp-gmpopenh264.lastUpdate: 1498508411
media.gmp-gmpopenh264.version: 1.6
media.gmp-manager.buildID: 20170803173024
media.gmp-manager.lastCheck: 1502746120
media.gmp-widevinecdm.abi: x86_64-gcc3
media.gmp-widevinecdm.lastUpdate: 1500354313
media.gmp-widevinecdm.version: 1.4.8.903
media.gmp.storage.version.observed: 1
media.webrtc.debug.log_file: /tmp/WebRTC.log
network.cookie.prefsMigrated: true
network.dns.disablePrefetch: true
network.http.speculative-parallel-limit: 0
network.predictor.cleaned-up: true
network.prefetch-next: false
places.database.lastMaintenance: 1502273964
places.history.expiration.transient_current_max_pages: 90395
plugin.disable_full_page_plugin_for_types: application/pdf
plugins.ctprollout.cohort: control
plugins.ctprollout.cohortSample: 0.714769
print.print_bgcolor: false
print.print_bgimages: false
print.print_duplex: 0
print.print_evenpages: true
print.print_in_color: true
print.print_margin_bottom: 0.5
print.print_margin_left: 0.5
print.print_margin_right: 0.5
print.print_margin_top: 0.5
print.print_oddpages: true
print.print_orientation: 1
print.print_page_delay: 50
print.print_paper_data: 0
print.print_paper_height: 11.00
print.print_paper_name: na_letter
print.print_paper_size_unit: 0
print.print_paper_width: 8.50
print.print_scaling: 1.00
print.print_shrink_to_fit: true
print.print_to_file: false
print.print_unwriteable_margin_bottom: 56
print.print_unwriteable_margin_left: 25
print.print_unwriteable_margin_right: 25
print.print_unwriteable_margin_top: 25
services.sync.declinedEngines:
storage.vacuum.last.brief.sqlite: 1499027621
storage.vacuum.last.index: 1
storage.vacuum.last.places.sqlite: 1501393325
Important Locked Preferences
----------------------------
Places Database
---------------
JavaScript
----------
Incremental GC: true
Accessibility
-------------
Activated: false
Prevent Accessibility: 0
Library Versions
----------------
NSPR
Expected minimum version: 4.15
Version in use: 4.15
NSS
Expected minimum version: 3.31
Version in use: 3.31
NSSSMIME
Expected minimum version: 3.31
Version in use: 3.31
NSSSSL
Expected minimum version: 3.31
Version in use: 3.31
NSSUTIL
Expected minimum version: 3.31
Version in use: 3.31
Experimental Features
---------------------
Sandbox
-------
Seccomp-BPF (System Call Filtering): true
Seccomp Thread Synchronization: true
User Namespaces: true
Content Process Sandboxing: true
Media Plugin Sandboxing: true
Content Process Sandbox Level: 2
Effective Content Process Sandbox Level: 2
Rejected System Calls
---------------------
Comment 10•8 years ago
Hmm, Blake, shouldn't users on 55 get multiple content processes?
Flags: needinfo?(mrbkap)
Reporter | ||
Comment 11•8 years ago
Ehsan - if Content processes are bottlenecks, are they bottlenecks based on CPU usage solely? In the case of the attached profile, the CPU is pegged to 100%. Looking at a fresh reload like that originally I see CPU is going from 100-200 percent, hanging around 100-120% CPU after the first maybe 20 seconds. Not sure the profile captures this amount of CPU usage. The UI was alive through my torturing just now but it wasn't really interactive or usable....
Here's a fresh one I did in the middle of writing this comment: https://perfht.ml/2vzeeic The interesting thing here is you'll see the content is at zero for the first 6 seconds while compositor is fairly busy along with Main thread. What in the world is main thread doing as a function of time? ForwardTransaction DisplayList (linked list...?) LayerBuilding, Rasterize, RefreshDriverTick dominate the lifetime from what I'm reading.
I'm trying to understand why Content processes wouldn't saturate to just get the work done or why they're otherwise underutilizing the machine and taking minutes to come to a halt. I somewhat figure this means it's waiting on IO - but then the connections I'm on are first rate and it again becomes a self imposed starvation firefox is causing to itself... also concerned of some priority inversion type issue happening or it's switching so many contexts that the amount of useful work is very little per time spent (low efficiency). This kind of thing can happen and is very difficult to detect in message passing systems where all the relative time is spent waiting on poll/pthread_cond_wait. Need to figure out how to trace what's not sending it messages and then find why that's the case.
Comment 12•8 years ago
Hi,
If I understand correctly all your reports are against version 55.
Did you check if it was improved or fixed in Firefox Nightly ?
Reporter | ||
Comment 13•8 years ago
@nivtwig In due time, especially if we hit a wall or a patch/feature of demonstrable value is known or required for better traceability. However the probability any recent version fixes these issues is astronomically low due to the length of time these issues have already survived. As an important side exercise, first it would be extremely good to determine how to find what's bogging the browser down, then we can start varying versions and heading towards more recent builds.
Comment 14•8 years ago
(In reply to nevion from comment #11)
> Ehsan - if Content processes are bottlenecks, are they bottlenecks based on
> CPU usage solely?
The first and foremost issue you are having is that you have one content process. In Firefox 55, as far as I'm aware, we have enabled the usage of 4 content processes by default for all users. This means that you should be getting 4 "Content" rows in the profiles for example. The effect of that would be that the work of running the hundreds of tabs you have open will be divided in between 4 main threads (since each content process has 1 main thread) and assuming that your CPU has enough cores, we can run some of this in parallel, effectively making the browsing experience faster. The question I asked in comment 10 is about whether there is an underlying bug somewhere that prevents you from getting the default 4 content processes.
(Note that if you have more memory on your machine and are a heavy tab user, you may want to consider changing the default settings for even better performance, by going to the Preferences dialog, unchecking the "Use recommended performance settings" and changing the number of content processes to something higher. That being said, I'd appreciate if you wait for a while before Blake or someone else gets back to us in case they ask for more information from you to help diagnose the issue of not getting the default of 4 content processes, since I'm a bit afraid that changing that setting may make the bug go away and make it more difficult to diagnose what was going wrong...)
> In the case of the attached profile, the CPU is pegged to
> 100%. Looking at a fresh reload like that originally I see CPU is going
> from 100-200 percent, hanging around 100-120% CPU after the first maybe 20
> seconds. Not sure the profile captures this amount of CPU usage. The ui
> was alive through my torturing just now but it wasn't really interactive or
> usable....
FWIW, contrary to what many people think, the CPU usage is mostly irrelevant to the performance of software usually, it really matters _what_ the CPU is doing when it's pegged to 100% usage. If it is doing useful work then it's a great thing for it to be pegged at 100% since it means the software is running as fast as possible, and otherwise if the code was written in a way that would introduce a lot of artificial delays, you'd see the CPU usage drop down a bit, but the overall throughput would also drop down and you would have to wait a longer amount of time overall for the work (for example, loading a page) to finish.
Looking at the profiles you have submitted here, I haven't seen any specific things that are wasting CPU time, the browser is really just doing *a lot* of work. There really isn't any way for us to provide a great browsing performance when running this much code on a single thread unfortunately. :-/ This is why I suggested that the biggest issue to focus on for now is why you're only getting a single content process. Once that is solved, you should see a *huge* improvement (I'm hoping you're not on a very old machine with a very small number of CPU cores, since in that case there really isn't too much that software can do to help I'm afraid...)
> Here's a fresh one I did in the middle of writing this comment:
> https://perfht.ml/2vzeeic The interesting thing here is you'll see the
> content is at zero for the first 6 seconds while compositor is fairly busy
> along with Main thread. What in the world is main thread doing as a
> function of time? ForwardTransaction DisplayList (linked list...?)
> LayerBuilding, Rasterize, RefreshDriverTick dominate the lifetime from what
> I'm reading.
That is just an implication of the way that our profiler records this information. We have a fixed-size circular buffer (https://en.wikipedia.org/wiki/Circular_buffer) which we allocate up front and then we start to fill it up with the profile data. The profile data is variable size for each sample we take since it includes things like the URLs for the JS scripts if the browser was running some JS code at the time the sample was taken. Due to the variable length of each sample, the number of samples that would fit in each one of these fixed-size buffers is variable. Each row in the profiler UI (for example, "Main Thread", "Content" and "Compositor") represents data from one of these circular buffers, but the number of elements in each one isn't the same. It just happens that we managed to record fewer samples for the Content circular buffer, which makes that one appear shorter in the UI, which confuses everyone. :-) If you want, you can increase the size of this buffer by clicking on the toolbar icon and moving the Buffer Size slider, for example to something larger than the 9MB default. Note that larger buffers will make capturing the profile take longer, and capturing the profile janks the entire browser (we haven't optimized that since that problem mostly affects us developers and not our users!)
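To make the buffer effect concrete, here is a minimal sketch in TypeScript. It is not Gecko's profiler code: the `SampleRing` class, the byte sizes and the one-sample-per-millisecond rate are all made-up assumptions. It only shows that two buffers of equal capacity retain different time spans when their samples differ in size.

```typescript
// Hypothetical model of a fixed-capacity profiler buffer (not Gecko code).
interface Sample {
  timeMs: number; // when the sample was taken
  bytes: number;  // encoded size; larger when the stack carries JS script URLs
}

class SampleRing {
  // A plain array keeps the sketch short; a real profiler would preallocate.
  private samples: Sample[] = [];
  private usedBytes = 0;
  private readonly capacityBytes: number;

  constructor(capacityBytes: number) {
    this.capacityBytes = capacityBytes;
  }

  // Append a sample, evicting the oldest ones until the new one fits.
  push(sample: Sample): void {
    if (sample.bytes > this.capacityBytes) return; // could never fit
    while (this.usedBytes + sample.bytes > this.capacityBytes) {
      const evicted = this.samples.shift();
      if (evicted === undefined) break;
      this.usedBytes -= evicted.bytes;
    }
    this.samples.push(sample);
    this.usedBytes += sample.bytes;
  }

  count(): number {
    return this.samples.length;
  }

  // Time of the oldest retained sample, or undefined when the buffer is empty.
  oldestTimeMs(): number | undefined {
    const first = this.samples[0];
    return first === undefined ? undefined : first.timeMs;
  }
}

// Take one sample per millisecond, each of the same (assumed) size.
function recordSamples(capacityBytes: number, bytesPerSample: number, durationMs: number): SampleRing {
  const ring = new SampleRing(capacityBytes);
  for (let t = 0; t < durationMs; t++) {
    ring.push({ timeMs: t, bytes: bytesPerSample });
  }
  return ring;
}

// Same capacity, same 500 ms of recording, different sample sizes.
const mainThreadRing = recordSamples(1000, 10, 500); // small samples
const contentRing = recordSamples(1000, 50, 500);    // large samples
```

With these made-up numbers `mainThreadRing` still holds the last 100 ms of samples while `contentRing` holds only the last 20 ms, so the second row would be drawn shorter even though both threads were sampled for the whole 500 ms.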
> I'm trying to understand why Content processes wouldn't saturate to just get
> the work done or why they're otherwise underutilizing the machine and
> taking minutes to come to a halt. I somewhat figure this means it's waiting
> on IO - but then the connections I'm on are first rate and it again becomes
> a self imposed starvation firefox is causing to itself... also concerned of
> some priority inversion type issue happening or it's switching so many
> contexts that the amount of useful work is very little per time spent (low
> efficiency). This kind of thing can happen and is very difficult to detect
> in message passing systems where all the relative time is spent waiting on
> poll/pthread_cond_wait. Need to figure out how to trace what's not sending
> it messages and then find why that's the case.
I think the buffer size issue I described above may have given you the wrong impression about what was going on, please try increasing the buffer size and capture a few more profiles and see for yourself how the profile changes. If you're interested to look at the idle times, it's actually quite easy, all you have to do is to search in the profile for the OS wait function (on Linux it's __poll(); pthread_cond_wait() is used to wait on condition variables, but we don't use condvars to wait on our event loop in Gecko). For example the content process in the profile in comment 0 is only spending ~55ms waiting idle on the event queue <https://perfht.ml/2fINtCG>, but even that is oversimplifying it, since most of those are just single samples (you can see that if you zoom in over them), and with a sampling profiler you can't say that one sample taken which has landed in a __poll() call actually shows that a wait was happening. In reality the main thread here has been so busy that the event queue was almost never empty, so we spent almost 0 time idle waiting for events.
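The point about single samples can be sketched as follows. This is a hypothetical post-processing helper in TypeScript, not a feature of the Gecko profiler: it assumes you have already extracted the leaf function name of each consecutive sample for one thread, and the example data is invented.

```typescript
// Group consecutive samples whose leaf frame is the wait function into runs
// and return the length of each run, in order of appearance.
function waitRuns(leafFrames: string[], waitFn: string): number[] {
  const runs: number[] = [];
  let current = 0;
  for (const frame of leafFrames) {
    if (frame === waitFn) {
      current++;
    } else if (current > 0) {
      runs.push(current);
      current = 0;
    }
  }
  if (current > 0) runs.push(current);
  return runs;
}

// Count only samples that belong to runs of at least `minRun` samples.
// Isolated hits are ignored because one sample cannot prove a real wait.
function sustainedWaitSamples(runs: number[], minRun: number): number {
  let total = 0;
  for (const run of runs) {
    if (run >= minRun) total += run;
  }
  return total;
}

// Invented leaf frames for seven consecutive samples of one thread.
const exampleLeaves = ["js::Run", "__poll", "Paint", "__poll", "__poll", "js::Run", "__poll"];
const exampleRuns = waitRuns(exampleLeaves, "__poll");
const exampleSustained = sustainedWaitSamples(exampleRuns, 2);
```

Here four samples land in `__poll`, but only two of them are adjacent. Treating isolated hits as noise is a judgment call; the `minRun` threshold is an assumption you would tune against the sampling interval.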
Component: Tabbed Browser → DOM: Content Processes
Product: Firefox → Core
Comment 15•8 years ago
I don't think we have rolled out 4 processes to 100% yet. You could check about:config and look for:
e10s.rollout.cohort
e10s.rollout.cohortSample.multi
extensions.e10sMultiBlockedByAddons
If it's blocked by addons, well that is a problem. If you are not in "multiBucket4" for the cohort then you can try changing the cohortSample to 1.0 and restarting. That may force you into the 4 process cohort.
Comment 16•8 years ago
Beta 56 users who are eligible should receive multi. From the about:support above, it appears that nevion is on the old beta (Beta 55) which was at a 50/50 split for multi. Looking at the pref e10s.rollout.cohort should tell us that he's in the multiBucket1 group.
Flags: needinfo?(mrbkap)
Reporter | ||
Comment 17•8 years ago
e10s.rollout.cohort: temp-qualified-devtools
e10s.rollout.cohortSample: .417645 #I am changing this to 1 per Ben's comment verbatim, and if that doesn't work, I will set cohortSample.multi.
e10s.rollout.cohortSample.multi: .795750
e10s.e10sMultiBlockedByAddons: false #tab tree view affected this
Reporter | ||
Comment 18•8 years ago
cohortSample worked, now have cohort webextensions-multiBucket4
Reporter | ||
Comment 19•8 years ago
ehsan per your response a few more questions. I'm glad about the depth of your explanations btw, especially with the nuances of the gecko profiler. I find it pretty bizarre that the sampler for the Content process didn't produce output (starved sampler?) - we're talking 11.5 seconds of no data - is that reasonable for either the profiler or firefox? I've increased the buffer size to 90 MiB and will do some testing with that and e10s soon, but are you theorizing it's just due to the size of the buffer?
First about the slow loading / lingering loading of tabs - I posit that I can stop all tabs loading, then load one at a time until all tabs are loaded and the amount of time to do this is much less. There's something going on here dragging out how long it takes to actually render each tab. The sites I'm on shouldn't take 5 minutes to render if the CPU is doing useful work the entire time.
Btw there's still 833ms spent (aggregate) on pthread_cond_wait, and 635ms coming from PCookieService::Msg_GetCookieString. Now I don't think that's my problem but that's still a ton of overhead. Is there any way to trace the call counts, even if the impact on performance is higher?
Reporter | ||
Comment 20•8 years ago
Hm, thinking about it a bit more, I understand why the content buffers could drop on circular buffers from the start of the program given variable/larger than other sampling entries. Still seems like something that could be accommodated better.
Also I understand pthread_cond_wait and poll are not necessarily wasting time but that they're waiting may suggest that they are being starved for input.
https://perfht.ml/2vBT6bt
Reporter | ||
Comment 21•8 years ago
I created a simple WebExtensions-based extension to stop all tabs and to reload tabs either in a broadcast-storm style or sequentially. It doesn't quite work right on sequential after the first window and stopping all tabs also seems to not have the desired effect (although it seems like it has an effect), and moving through each tab to make it active before calling stop is unsavory (in contrast to reload). Some timing info should be extracted/collected when all loading is complete (will improve upon after some advancement).
You can find it here https://github.com/nevion/tabstressin - please help identify improvements and advise or find better ways to implement the commands. The documentation and ecosystem for extensions is really confusing right now and I hit a wall on it already.
The purpose here is to get some metrics to see how we're scaling with the number of content processes and also verify the claim that serial loading is faster than the broadcast-storm loading, and potentially identify a bit more what's going on with tabs having a lingering load cascading into minutes.
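As a rough illustration of the two reload strategies being compared, here is a sketch in TypeScript. It is not the code from the tabstressin repository. `ReloadAndWait` is a hypothetical helper type: in a real extension it might wrap `browser.tabs.reload(tabId)` and resolve when a `browser.tabs.onUpdated` listener reports `status === "complete"` for that tab, with the tab ids coming from `browser.tabs.query({})`. That wiring is assumed and not shown.

```typescript
// Result of one reload experiment.
interface ReloadTiming {
  mode: "storm" | "sequential";
  tabs: number;
  elapsedMs: number;
}

// Hypothetical helper: reloads one tab and resolves once it has finished loading.
type ReloadAndWait = (tabId: number) => Promise<void>;

// Broadcast-storm: kick off every reload at once, then wait for all of them.
async function reloadStorm(
  tabIds: number[],
  reloadAndWait: ReloadAndWait,
  now: () => number = () => Date.now()
): Promise<ReloadTiming> {
  const start = now();
  await Promise.all(tabIds.map((id) => reloadAndWait(id)));
  return { mode: "storm", tabs: tabIds.length, elapsedMs: now() - start };
}

// Sequential: start the next reload only after the previous tab has finished.
async function reloadSequential(
  tabIds: number[],
  reloadAndWait: ReloadAndWait,
  now: () => number = () => Date.now()
): Promise<ReloadTiming> {
  const start = now();
  for (const id of tabIds) {
    await reloadAndWait(id);
  }
  return { mode: "sequential", tabs: tabIds.length, elapsedMs: now() - start };
}
```

Keeping the tabs API behind a function parameter means the same timing logic can be exercised with a fake before it is pointed at hundreds of real tabs. Comparing `elapsedMs` from the two modes on the same tab set would give the metric described above.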
Comment 22•8 years ago
(In reply to nevion from comment #19)
> Btw there's still 833ms spent (aggregate) on pthread_cond_wait, and 635ms
> coming from PCookieService::Msg_GetCookieString. Now I don't think that's
> my problem but that's still a ton of overhead. Is there any way to trace the
> call counts, even if the impact on performance is higher?
GetCookieString was one of the last sync messages we had that in some cases had a massive performance cost. Since bug 1331680 is complete, you should have a much better experience on the latest nightly.
Reporter | ||
Comment 23•8 years ago
@gabor thanks for letting me know, that is indeed an interesting and relevant change.. and the presence of an issue like that I think builds a bridge along my line of thinking that at a mid-level firefox is spinning wheels and bringing down efficiency. As usual when I identify something, somebody just did it :-).
Will test a nightly with it soon, and perhaps 55 patched to compare more fairly if that's easy to patch in, but really want to get something going for doing a more automated test that captures data inhand per tabstressin. If I can study the performance issues under context better, I think we'll have better luck scratching beneath the surface here.
Comment 24•8 years ago
(In reply to nevion from comment #19)
> ehsan per your response a few more questions. I'm glad about the depth of
> your explanations btw, especially with the nuances of the gecko profiler. I
> find it pretty bizarre that the sampler for the Content process didn't
> produce output (starved sampler?) - we're talking 11.5 seconds of no data -
> is that reasonable for either the profiler or firefox? I've increased the
> buffer size to 90 MiB and will do some testing with that and e10s soon, but
> are you theorizing it's just due to the size of the buffer?
I am not theorizing, I am explaining. :-) Let's say you have two timelines, one is 20 seconds long, one is 10 seconds long, and you want to display both horizontally on top of each other. One is bound to appear narrower than the other. The way we display this data is we right align it visually to make sure the late samples align together vertically, which creates a visible gap in the beginning. No starvation, just merely one timeline being much shorter than the other ones.
> First about the slow loading / lingering loading of tabs - I posit that I
> can stop all tabs loading, then load one at a time until all tabs are loaded
> and the amount of time to do this is much less. There's something going on
> here dragging out how long it takes to actually render each tab. The sites
> I'm on shouldn't take 5 minutes to render if the CPU is doing useful work
> the entire time.
If you stop all tabs and load them one at a time, each will take much less wall clock time to load, since when the page load is happening the browser isn't also busy running the code from all of the other pages at the same time. When all of the pages are loading in the background, the browser is busy dividing up the time in between all of those pages, and the division of time is essentially random, because of the semantics of JavaScript (https://developer.mozilla.org/en-US/docs/Web/JavaScript/EventLoop#Run-to-completion) which means once we start running some JavaScript code from page 1 (be it a foreground or background page) we have to finish running that code on that main thread before we can do *anything else*, including resuming loading the page that the user is looking at.
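The difference in per-page wall clock time can be seen in a toy model, sketched here in TypeScript. Everything in it is an assumption made for illustration: every page needs the same number of equally sized tasks, the interleaving is plain round-robin (the real division of time is essentially random, as said above), and switching between pages costs nothing.

```typescript
interface LoadStats {
  perPageWallClockMs: number[]; // time from a page starting to load until it is done
  totalMs: number;              // time until every page is done
}

// Load pages strictly one after another on a single main thread.
function loadOneAtATime(pages: number, tasksPerPage: number, taskMs: number): LoadStats {
  const perPage: number[] = [];
  let clock = 0;
  for (let p = 0; p < pages; p++) {
    const startedAt = clock;
    clock += tasksPerPage * taskMs;
    perPage.push(clock - startedAt);
  }
  return { perPageWallClockMs: perPage, totalMs: clock };
}

// Start all pages at time 0 and interleave their tasks round-robin.
// Each task still runs to completion before the next one starts.
function loadAllAtOnce(pages: number, tasksPerPage: number, taskMs: number): LoadStats {
  const perPage: number[] = [];
  let clock = 0;
  for (let round = 0; round < tasksPerPage; round++) {
    for (let p = 0; p < pages; p++) {
      clock += taskMs;
      // A page is done when its last task has run; it started at time 0.
      if (round === tasksPerPage - 1) perPage.push(clock);
    }
  }
  return { perPageWallClockMs: perPage, totalMs: clock };
}

const oneAtATime = loadOneAtATime(3, 4, 10);
const allAtOnce = loadAllAtOnce(3, 4, 10);
```

With three pages of four 10 ms tasks each, both strategies finish all work at 120 ms, but each page takes 40 ms when loaded alone versus 100 to 120 ms when interleaved. The model deliberately leaves out any switching overhead, so it cannot by itself confirm or refute the claim that the storm is slower in total.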
And interestingly, we know that this is a terrible user experience, and we are actively working on a project (https://wiki.mozilla.org/Quantum/DOM) to switch our browser engine to a cooperative scheduling model which would allow us to interrupt running JS (essentially break the run to completion model in a way without web pages being able to tell we're doing that!) so that we can start to service tasks belonging to foreground pages. Once that is completed, for scenarios such as loading hundreds of tabs in the background on the same main thread, in cases where we're running JavaScript for those tabs we can interrupt and start running a task belonging to the foreground page so that the foreground page loads faster, and once there's nothing to do for the foreground page we'd go back to finishing the tasks belonging to background pages. Your CPU would still be pegged at 100% when all of this is happening but you'd see the pages you're looking at load and become interactive much faster and you wouldn't be able to tell we're doing a lot of work in the background. Of course, there are practical limits to how well this can work, because the browser should really be thought of like an operating system running applications these days, and the amount of work that we have to do totally depends on what the web page requests us to do, so you *still* would want to divide up this work across a few content processes. But once Quantum DOM scheduling (bug 1350432) gets to a working state, heavy users such as yourself should hopefully experience a massive improvement. :-)
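The effect of being able to interrupt background work can be sketched with a small single-thread simulation in TypeScript. This is not how Quantum DOM is implemented; the `Job` shape, the slice length and the "always prefer the foreground job" rule are simplifying assumptions chosen only to show the idea.

```typescript
interface Job {
  page: string;
  foreground: boolean;
  arrivesAtMs: number; // when the job becomes runnable
  remainingMs: number; // CPU time the job still needs
}

// Simulate one main thread. A job runs for at most `sliceMs` before the
// scheduler picks again, preferring a runnable foreground job if there is one.
// Passing Infinity as `sliceMs` models run-to-completion.
function simulate(jobs: Job[], sliceMs: number): Record<string, number> {
  const pending: Job[] = jobs.map((j) => ({ ...j })); // do not mutate the input
  const finishedAt: Record<string, number> = {};
  let clock = 0;
  for (;;) {
    const unfinished = pending.filter((j) => j.remainingMs > 0);
    if (unfinished.length === 0) break;
    const runnable = unfinished.filter((j) => j.arrivesAtMs <= clock);
    if (runnable.length === 0) {
      // Nothing to run yet: idle until the next job arrives.
      clock = Math.min(...unfinished.map((j) => j.arrivesAtMs));
      continue;
    }
    let next: Job | undefined;
    for (const j of runnable) {
      if (next === undefined || (j.foreground && !next.foreground)) next = j;
    }
    if (next === undefined) break;
    const slice = Math.min(sliceMs, next.remainingMs);
    clock += slice;
    next.remainingMs -= slice;
    if (next.remainingMs === 0) finishedAt[next.page] = clock;
  }
  return finishedAt;
}

const demoJobs: Job[] = [
  { page: "background-1", foreground: false, arrivesAtMs: 0, remainingMs: 300 },
  { page: "background-2", foreground: false, arrivesAtMs: 0, remainingMs: 500 },
  { page: "foreground", foreground: true, arrivesAtMs: 100, remainingMs: 20 },
];
const cooperative = simulate(demoJobs, 10);
const runToCompletion = simulate(demoJobs, Infinity);
```

In this invented scenario the foreground job finishes at 120 ms with 10 ms slices and at 320 ms with run-to-completion, while the last background job finishes at 820 ms either way. That matches the description above: the CPU is just as busy, but the page being looked at is serviced sooner.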
> Btw there's still 833ms spent (aggregate) on pthread_cond_wait, and 635ms
> coming from PCookieService::Msg_GetCookieString. Now I don't think that's
> my problem but that's still a ton of overhead. Is there any way to trace the
> call counts, even if the impact on performance is higher?
As Gabor mentioned, this was bug 1331680.
(In reply to nevion from comment #20)
> Hm, thinking about it a bit more, I understand why the content buffers could
> drop on circular buffers from the start of the program given variable/larger
> than other sampling entries. Still seems like something that could be
> accommodated better.
This design has been the result of a lot of careful deliberation! What has informed it has been the desire to never allocate memory while capturing a profile, since memory allocation can be expensive and its performance is unpredictable, and it's extremely important for the profiler sampler thread to be *super fast* during recording of a sample, so doing expensive operations such as memory allocations isn't allowed there. Also please note that the profiler hasn't been designed with user friendliness as its first goal, for better or worse, and performance and developer friendliness have been higher priority goals for its design, since its users are more often developers than users. These goals can sometimes dictate contradictory design choices, and while usually in Firefox we opt for user friendliness, the profiler has grown from the needs of the developers working on the performance of Firefox over the years and it still has some confusing rough edges such as the one you've seen. :-) That being said, compared to a few years ago, it has come so far in terms of being easy to use and understand, but sometimes it's really hard to have our cake and eat it too, and this may be one of those cases!
> Also I understand pthread_cond_wait and poll are not necessarily wasting
> time but that they're waiting may suggest that they are being starved for
> input.
They generally mean the thread is blocked on another thread waiting on "something". What the wait is happening on depends on the callers of these functions, and the explanation I provided before was based on that. Without knowing how these functions are used in a codebase you can't really draw conclusions about what it means to see them being called.
For example, we have a large number of background worker threads that spend almost all of their time waiting for work, until a thread posts some work to them when they wake up, do the work, post the result back and go back to sleep.
Reporter | ||
Comment 25•8 years ago
Well aware of scheduling issues here, kind of surprised work wasn't started on prioritizing (in an LRU order) the scheduling in the past. For the last few years the active tab has seemed to get a little special attention but is still subject to execution starvation. Maybe that happened as a byproduct of the low level event handling loop on the UI or is just my imagination.
I'm used to the idea that a consumer (main in this case) processes a message, one at a time, in order, completely before moving on to the next - that's the usual way to do it. However the context switch overhead that is making broadcast-storm loading slower than the sum of the time of sequential is something I want to understand better, and measure and see what its bottlenecks are. Do you have any suggestions on how I might write the code for that (the idea behind tabstressin) in a devel time productive way or someone to refer to on how to hack it with webextensions apis? Not really sure of a good PoC there or if I'm choosing an appropriate way to get the browser to do these things. I believe things start getting challenging for a browser running 30+ tabs simply opening 5-10 tabs over the course of 10-20 seconds, again where the sum of the individual loads, if done serially, would be significantly less than all of them loading simultaneously. I have several cases I want to write and experiment with and start tracking browser metrics with.
Cooperative scheduling is interesting, I'm surprised real threads via a threadpool is the basis of the current work though, would've expected an approach more like greenthreads. I've done quite some work with that on networking / message IPC and its benefits both in terms of code simplification and latency are very good and it saves you from the expenses of heavyweight synchronization - it meshes very well around the poll system call, and you can collect them into different real threads to scale in CPU along with IO - one nice thing about it was that you get very fast "ipc" where execution also inherently becomes mutually exclusive. FYI I used Boost.Context and Boost.Coroutines to do those things in a crossplatform way.
Re inferring the pthread_cond_wait, it's difficult to draw conclusions any way you slice it and you end up with a lot of false positives - but I've uncovered many issues in codebases starting from there, it's a useful starting point and does have common incidence with perf issues. For instance you might take that and look at its dependency condition notifications to find some low hanging fruit.
Reporter | ||
Comment 26•8 years ago
Will be profiling again soon, working through a backlog atm.
Reporter | ||
Comment 27•8 years ago
I've been using 56 beta series now for a few weeks and despite a couple of bugs, this is indeed probably the fastest I've seen the browser in 5+ years. Much less deadlock in the UI in startup or heavy load on moderate high tab counts. I do notice a sort of queuing of actions/completions of UI interaction still and it can take a very long time or linger near forever to load (> 1 hr to load in some cases, manual intervention required to cancel these).
Still interested in "tabstressin" to help me profile the browser better and in a more controlled fashion - any pointers on where to read the implementation of web extensions and the C++ bridge that exposes the functionality to JS?
Profiling exercises coming sooner or later depending on my workload.
Updated•8 years ago
Priority: -- → P5
Comment 28•3 years ago
Hi nevion, I'd assume that recent releases behave quite differently and hopefully better for you. Feel free to re-open or, probably better, file a new bug if needed.
Status: UNCONFIRMED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(nevion)
Resolution: --- → INCOMPLETE
Updated•4 months ago
Flags: needinfo?(nevion)