Open Bug 1966616 Opened 3 months ago Updated 1 month ago

C-C TB, mochitest comm/mail/base/test/browser/browser_markAsRead.js TSAN issue on local linux PC

Categories

(Core :: Graphics: WebRender, defect, P3)

People

(Reporter: ishikawa, Unassigned)

References

(Blocks 1 open bug)

Attachments

(2 files)

Attached file TSAN warning

During the local testing of TSAN version of C-C TB, I encountered the attached TSAN warning.

Note that this TSAN issue has become apparent only after previous tsan issues have been eliminated.

But given that this seems to be in Rust code, I have no idea how much I can trust TSAN's check. Does TSAN work well with Rust code?

The TSAN issue seems to be reported inside the WebRender code.
I have yet to see this error/warning on the tree server, so it may have something to do with the particular graphics driver being used.

This particular TSAN warning bothers me very much because the race is reported with |pthread_mutex_destroy| at the top of the stack of thread T41.
What does it mean? Is it possible that the mutex was being destroyed while there were still other threads waiting for it?

Maybe someone in the know can figure out the issue rather effortlessly.
I am not familiar with the code in the stack at all.

I now realize that there is a warning line that may shed light on the condition that caused the TSAN issue.

00:22.56 GECKO(2267869) {debug} SetSpec succeeded. : aSpec=about:blank
00:23.09 GECKO(2267869) [WARN  webrender::device::gl] Missing optimized shader source for gpu_cache_update <--- * I wonder if this is important?*
00:23.78 GECKO(2267869) ==================
00:23.78 GECKO(2267869) WARNING: ThreadSanitizer: data race (pid=2267869)
00:23.78 GECKO(2267869)   Write of size 1 at 0x7220000d2790 by thread T41 (mutexes: write M0):

I now realize that a very similar TSAN warning is printed after Marionette is invoked for the first time, BEFORE any test is attempted (!?)
There are some dumps from my local modifications.
However, basically, this is the excerpt from a local log file when mochitest of TSAN version of C-C TB is attempted.
Before any test file is read (TEST_START), marionette printed the attached TSAN warning.
Something is wrong.
Again, |pthread_mutex_destroy| is visible at the time the problem was noticed.

Not sure what to do with this. The mentioned tsan issues were in mailnews code and the warnings happen in commcentral.

00:22.45 GECKO(2265665) WARNING: ThreadSanitizer: data race (pid=2265665)
00:22.45 GECKO(2265665) Write of size 1 at 0x7220000ce790 by thread T41 (mutexes: write M0):
00:22.45 GECKO(2265665) #0 pthread_mutex_destroy /builds/worker/fetches/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1344:3 (thunderbird+0xd4d4e) (BuildId: a87c77d91aec7167b0524a3ffae54463)
00:22.45 GECKO(2265665) #1 <null> <null> (libgallium-25.0.5-1.so+0x86f0d7) (BuildId: 17b621a998972f5828194cce760f15e26f24192a)
00:22.45 GECKO(2265665) #2 _$LT$gleam..gl..ErrorReactingGl$LT$F$GT$$u20$as$u20$gleam..gl..Gl$GT$::flush::h7985001b9d2c5892 /NEW-SSD/NREF-COMM-CENTRAL/mozilla/comm/third_party/rust/gleam/src/gl.rs:98:26 (libxul.so+0x13cb27fd) (BuildId: 2707e276020f4d34b144dd3fb4cbf1b5)
00:22.45 GECKO(2265665) #3 webrender::renderer::Renderer::draw_frame::h8a648a8d7ae96b05 /NEW-SSD/NREF-COMM-CENTRAL/mozilla/gfx/wr/webrender/src/renderer/mod.rs:5054:17 (libxul.so+0x13f0f9e2) (BuildId: 2707e276020f4d34b144dd3fb4cbf1b5)
00:22.45 GECKO(2265665) #4 webrender::renderer::Renderer::render_impl::h115acb2008de7966 /NEW-SSD/NREF-COMM-CENTRAL/mozilla/gfx/wr/webrender/src/renderer/mod.rs:1599:17 (libxul.so+0x13f0f9e2)
00:22.45 GECKO(2265665) #5 webrender::renderer::Renderer::render::h1fa507f08a3af9ea /NEW-SSD/NREF-COMM-CENTRAL/mozilla/gfx/wr/webrender/src/renderer/mod.rs:1283:30 (libxul.so+0x13f0ae07) (BuildId: 2707e276020f4d34b144dd3fb4cbf1b5)
00:22.45 GECKO(2265665) #6 wr_renderer_render /NEW-SSD/NREF-COMM-CENTRAL/mozilla/gfx/webrender_bindings/src/bindings.rs:649:11 (libxul.so+0x1390d0a8) (BuildId: 2707e276020f4d34b144dd3fb4cbf1b5)

Blocks: gfx-triage
Severity: -- → S3
Priority: -- → P3

(In reply to Jim Mathies [:jimm] from comment #3)

Not sure what to do with this. The mentioned tsan issues were in mailnews code and the warnings happen in commcentral.

Thank you for the comment.
Are you referring to the code in "/comm/third_party/rust/gleam/src/gl.rs"? (Marked with *** below)

00:22.45 GECKO(2265665) WARNING: ThreadSanitizer: data race (pid=2265665)
00:22.45 GECKO(2265665) Write of size 1 at 0x7220000ce790 by thread T41 (mutexes: write M0):
00:22.45 GECKO(2265665) #0 pthread_mutex_destroy /builds/worker/fetches/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1344:3 (thunderbird+0xd4d4e) (BuildId: a87c77d91aec7167b0524a3ffae54463)
00:22.45 GECKO(2265665) #1 <null> <null> (libgallium-25.0.5-1.so+0x86f0d7) (BuildId: 17b621a998972f5828194cce760f15e26f24192a)
*** > 00:22.45 GECKO(2265665) #2 _$LT$gleam..gl..ErrorReactingGl$LT$F$GT$$u20$as$u20$gleam..gl..Gl$GT$::flush::h7985001b9d2c5892 /NEW-SSD/NREF-COMM-CENTRAL/mozilla/comm/third_party/rust/gleam/src/gl.rs:98:26 (libxul.so+0x13cb27fd) (BuildId: 2707e276020f4d34b144dd3fb4cbf1b5) *** <---- This code in ./comm subdirectory but see comment below.
00:22.45 GECKO(2265665) #3 webrender::renderer::Renderer::draw_frame::h8a648a8d7ae96b05 /NEW-SSD/NREF-COMM-CENTRAL/mozilla/gfx/wr/webrender/src/renderer/mod.rs:5054:17 (libxul.so+0x13f0f9e2) (BuildId: 2707e276020f4d34b144dd3fb4cbf1b5)
00:22.45 GECKO(2265665) #4 webrender::renderer::Renderer::render_impl::h115acb2008de7966 /NEW-SSD/NREF-COMM-CENTRAL/mozilla/gfx/wr/webrender/src/renderer/mod.rs:1599:17 (libxul.so+0x13f0f9e2)
00:22.45 GECKO(2265665) #5 webrender::renderer::Renderer::render::h1fa507f08a3af9ea /NEW-SSD/NREF-COMM-CENTRAL/mozilla/gfx/wr/webrender/src/renderer/mod.rs:1283:30 (libxul.so+0x13f0ae07) (BuildId: 2707e276020f4d34b144dd3fb4cbf1b5)
00:22.45 GECKO(2265665) #6 wr_renderer_render /NEW-SSD/NREF-COMM-CENTRAL/mozilla/gfx/webrender_bindings/src/bindings.rs:649:11 (libxul.so+0x1390d0a8) (BuildId: 2707e276020f4d34b144dd3fb4cbf1b5)

Actually, there is "mozilla/third_party/rust/gleam/src/gl.rs" (in M-C), which is an exact replica of
"mozilla/comm/third_party/rust/gleam/src/gl.rs" (in C-C).
I have no idea why the comm subdirectory has this duplicate file, but
for all practical purposes, treating the two files as the same, the error occurs in a series of function calls that live in the M-C portion of the tree.
Line 98 of gl.rs (https://searchfox.org/mozilla-central/source/third_party/rust/gleam/src/gl.rs#98) is:

        impl<F: Fn(&dyn Gl, &str, GLenum)> Gl for ErrorReactingGl<F> {
            $($(unsafe $($garbo)*)* fn $name(&self $(, $arg:$t)*) $(-> $retty)* {
                let rv = self.gl.$name($($arg,)*);   <---- This is it.  
                let error = self.gl.get_error();
                if error != 0 {
                    (self.callback)(&*self.gl, stringify!($name), error);
                }
                rv
            })+
        }

I am a bit puzzled (I am not THAT familiar with rust code), and wonder why |$LT$gleam..gl..ErrorReactingGl$LT$F$GT$$u20$as$u20$gleam..gl..Gl$GT$::flush::h7985001b9d2c5892| ends up calling pthread_mutex_destroy.
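
To make the macro quoted above easier to read, here is a minimal hand-expanded sketch of what it would generate for a single method, `flush`. The two-method `Gl` trait, the `FakeDriver` stub, and the `run_flush` helper are hypothetical stand-ins for illustration, not gleam's or Mesa's real code. As far as I can tell from the macro, the wrapper forwards every call to the inner GL implementation first and only then checks `get_error`, so an `ErrorReactingGl` frame in a stack does not by itself mean an error path was taken. Under that reading, frame #1 (libgallium) is simply the driver's own implementation of the call, and any `pthread_mutex_*` or `calloc` use would originate there.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

type GLenum = u32;

/// Cut-down stand-in for gleam's `Gl` trait (assumption: only two methods).
trait Gl {
    fn flush(&self);
    fn get_error(&self) -> GLenum;
}

/// Hypothetical driver stub standing in for the real GL implementation
/// (libgallium in the report). It records calls and returns a preset error.
struct FakeDriver {
    calls: RefCell<Vec<&'static str>>,
    pending_error: Cell<GLenum>,
}

impl Gl for FakeDriver {
    fn flush(&self) {
        self.calls.borrow_mut().push("flush");
    }
    fn get_error(&self) -> GLenum {
        self.calls.borrow_mut().push("get_error");
        // Return the pending error once, then reset it to "no error".
        self.pending_error.replace(0)
    }
}

/// Wrapper that invokes `callback` whenever a GL call leaves an error behind.
struct ErrorReactingGl<F> {
    gl: Rc<dyn Gl>,
    callback: F,
}

impl<F: Fn(&dyn Gl, &str, GLenum)> ErrorReactingGl<F> {
    fn new(gl: Rc<dyn Gl>, callback: F) -> Self {
        ErrorReactingGl { gl, callback }
    }
}

impl<F: Fn(&dyn Gl, &str, GLenum)> Gl for ErrorReactingGl<F> {
    fn flush(&self) {
        // Always forwarded to the driver; this is the call at gl.rs:98.
        self.gl.flush();
        let error = self.gl.get_error();
        if error != 0 {
            // Only this branch is the error path.
            (self.callback)(&*self.gl, "flush", error);
        }
    }
    fn get_error(&self) -> GLenum {
        self.gl.get_error()
    }
}

/// Calls `flush` through the wrapper and returns the calls the driver saw
/// plus the number of times the error callback fired.
fn run_flush(pending: GLenum) -> (Vec<&'static str>, usize) {
    let driver = Rc::new(FakeDriver {
        calls: RefCell::new(Vec::new()),
        pending_error: Cell::new(pending),
    });
    let hits = Rc::new(Cell::new(0usize));
    let hits_in_cb = Rc::clone(&hits);
    let gl: Rc<dyn Gl> = driver.clone();
    let wrapped = ErrorReactingGl::new(gl, move |_gl, _name, _err| {
        hits_in_cb.set(hits_in_cb.get() + 1);
    });
    wrapped.flush();
    let calls = driver.calls.borrow().clone();
    (calls, hits.get())
}
```

With no pending error, `run_flush(0)` still reaches the driver's `flush` and `get_error` and the callback never fires; with a nonzero code such as `0x0502` the driver sees the same two calls and the callback fires once.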

For that matter, the read by T59 was ATOMIC: "Previous atomic read of size 1 at 0x7220000d2790 by thread T59".
On the other hand, "Write of size 1 at 0x7220000ce790 by thread T41" suggests the write was NOT ATOMIC.
Could this mean there is a problem with a non-atomic read/write in a TSAN runtime routine?

00:23.51 GECKO(2412576)	  Previous atomic read of size 1 at 0x7220000d2790 by thread T59:
00:23.51 GECKO(2412576)	    #0 pthread_mutex_lock /builds/worker/fetches/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1353:3 (thunderbird+0xd4ece) (BuildId: 15c86b29d330b67d637d0a56f2f4cf77)
00:23.51 GECKO(2412576)	    #1 <null> <null> (libgallium-25.0.5-1.so+0x5afcac) (BuildId: 17b621a998972f5828194cce760f15e26f24192a)
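
On the atomic-versus-plain question: as far as I understand the TSAN runtime, the 1-byte "atomic read" and "write" in the report are not real loads and stores in application code. The interceptors for `pthread_mutex_lock` and `pthread_mutex_destroy` record synthetic accesses to the mutex address, so that a destroy that is not ordered after the last use shows up as a race. Because libgallium is not instrumented, whatever synchronization it uses internally to order the two calls may be invisible to the tool. The sketch below is a deliberately simplified vector-clock model of that reasoning, not TSAN's actual algorithm; the types, the function names, and the `sync_observed` switch are all made up for illustration.

```rust
use std::collections::HashMap;

/// Vector clock: thread id -> number of ticks of that thread seen so far.
type VectorClock = HashMap<u32, u64>;

/// One recorded memory access, with a snapshot of the accessing thread's clock.
struct Access {
    thread: u32,
    clock: VectorClock,
    is_write: bool,
    is_atomic: bool,
}

/// Advance a thread's own component of its clock.
fn tick(clock: &mut VectorClock, thread: u32) {
    *clock.entry(thread).or_insert(0) += 1;
}

/// Merge `other` into `clock` (component-wise max). Models a synchronization
/// edge the tool observed, e.g. a thread join or a release/acquire pair.
fn join(clock: &mut VectorClock, other: &VectorClock) {
    for (&t, &v) in other {
        let e = clock.entry(t).or_insert(0);
        *e = (*e).max(v);
    }
}

/// `a` happens-before `b` if `b`'s snapshot already contains `a`'s own tick.
/// (Every access in this model is preceded by a tick, so the tick is >= 1.)
fn happens_before(a: &Access, b: &Access) -> bool {
    let a_tick = a.clock.get(&a.thread).copied().unwrap_or(0);
    let seen_by_b = b.clock.get(&a.thread).copied().unwrap_or(0);
    a_tick <= seen_by_b
}

/// Two accesses race if they come from different threads, at least one is a
/// write, they are not both atomic, and neither is ordered before the other.
fn is_race(a: &Access, b: &Access) -> bool {
    let different_threads = a.thread != b.thread;
    let at_least_one_write = a.is_write || b.is_write;
    let both_atomic = a.is_atomic && b.is_atomic;
    let unordered = !happens_before(a, b) && !happens_before(b, a);
    different_threads && at_least_one_write && !both_atomic && unordered
}

/// Models the report: T59 locks the mutex (atomic read), then T41 destroys
/// it (plain write). `sync_observed` says whether the tool saw an ordering
/// edge from T59 to T41 in between.
fn destroy_after_lock(sync_observed: bool) -> bool {
    const T59: u32 = 59;
    const T41: u32 = 41;
    let mut c59 = VectorClock::new();
    let mut c41 = VectorClock::new();

    tick(&mut c59, T59);
    let lock = Access { thread: T59, clock: c59.clone(), is_write: false, is_atomic: true };

    if sync_observed {
        join(&mut c41, &c59);
    }
    tick(&mut c41, T41);
    let destroy = Access { thread: T41, clock: c41.clone(), is_write: true, is_atomic: false };

    is_race(&lock, &destroy)
}
```

In this model `destroy_after_lock(false)` reports a race and `destroy_after_lock(true)` does not. That is one way to read the report: either the destroy really is unordered with the earlier lock, or the ordering exists but was established by means the tool could not see.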

The version of clang I use is:
clang --version
clang version 19.1.7 (taskcluster-VFltXbHqQ6qL_c7O4lIS6w)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/ishikawa/.mozbuild/clang/bin

One other thing I am curious about: this race happens in error-related code, doesn't it?
Somehow, |webrender::device::gl::Device::draw_nonindexed_points::hbc73175e7578da72| ended up calling calloc() after it called
|$LT$gleam..gl..ErrorReactingGl$LT$F$GT$$u20$as$u20$gleam..gl..Gl$GT$::draw_arrays::h34f1aee7b2c5de70 /NEW-SSD/NREF-COMM-CENTRAL/mozilla/comm/third_party/rust/gleam/src/gl.rs:98:26|, whose name string "...$gleam..gl..ErrorReactingGl$..." seems to suggest that it is related to error handling.
I am not entirely sure where the pthread_mutex_* functions get into the picture.

Location is heap block of size 120 at 0x7220000d2780 allocated by thread T41:
00:23.51 GECKO(2412576)	    #0 calloc /builds/worker/fetches/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:686:5 (thunderbird+0xd18db) (BuildId: 15c86b29d330b67d637d0a56f2f4cf77)
00:23.51 GECKO(2412576)	    #1 <null> <null> (libgallium-25.0.5-1.so+0x86f06a) (BuildId: 17b621a998972f5828194cce760f15e26f24192a)
00:23.51 GECKO(2412576)	    #2 _$LT$gleam..gl..ErrorReactingGl$LT$F$GT$$u20$as$u20$gleam..gl..Gl$GT$::draw_arrays::h34f1aee7b2c5de70 /NEW-SSD/NREF-COMM-CENTRAL/mozilla/comm/third_party/rust/gleam/src/gl.rs:98:26 (libxul.so+0x13caa18c) (BuildId: 073afc2d07cf4f57875bf8dbb561ca05)
00:23.51 GECKO(2412576)	    #3 webrender::device::gl::Device::draw_nonindexed_points::hbc73175e7578da72 /NEW-SSD/NREF-COMM-CENTRAL/mozilla/gfx/wr/webrender/src/device/gl.rs:3691:9 (libxul.so+0x13e35069) (BuildId: 073afc2d07cf4f57875bf8dbb561ca05)
00:23.51 GECKO(2412576)	    #4 webrender::renderer::gpu_cache::GpuCacheTexture::flush::ha5671287860d8d81 /NEW-SSD/NREF-COMM-CENTRAL/mozilla/gfx/wr/webrender/src/renderer/gpu_cache.rs:369:17 (libxul.so+0x13e44049) (BuildId: 073afc2d07cf4f57875bf8dbb561ca05)

Starting from the following line,
https://searchfox.org/mozilla-central/source/third_party/rust/gleam/src/gl.rs#781
ErrorReactingGl<F> and friends are defined.

/// A wrapper around GL context that calls a specified callback on each GL error.
pub struct ErrorReactingGl<F> {

Maybe the wrapper creation/deletion (deletion after an appropriate message is printed?) somehow implements atomic operations using pthread_mutex_* but misses proper locking somewhere? Or does the clang TSAN runtime itself have an issue?

OK, now I realize libgallium is the Mesa 3D graphics library. It may implement its own multi-threading protection. Hmm...

Of course, the issue may boil down to why C-C TB experiences this issue while, presumably, M-C FF does not experience this (?).

Now I realize that maybe libgallium needs to be compiled with TSAN enabled to obtain precise TSAN warnings.
Not sure if that is a viable solution or not.
Maybe I simply need to ignore this error by whitelisting it in the TSAN runtime, although I am not sure if the resulting TSAN warnings/errors can be trusted.

However, I wonder if M-C FF does not see this warning in similar locations.

I have tried to whitelist the issue, but failed. (Maybe I have not created a proper whitelist text file.)
I am sort of stumped about how to proceed right now.
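
For reference, a minimal sketch of a ThreadSanitizer suppressions file for this report. The file name `tsan.supp` is just an example. `called_from_lib` tells the runtime to ignore interceptor calls (such as `pthread_mutex_destroy` or `calloc`) that come from a matching library, and `race` entries are matched against function, source file, and module names in the report's stacks. Since the libgallium frames are unsymbolized (`<null>`), only a module-name match seems plausible here. Whether either line actually silences this particular report is untested, and one of them may be enough.

```
# tsan.supp (example name): suppress reports that involve the
# uninstrumented Mesa library seen in the stacks.
called_from_lib:libgallium
race:libgallium
```

The file would then be passed to the runtime with `TSAN_OPTIONS="suppressions=/path/to/tsan.supp"` in the environment of the test run.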

BTW, the mochitest is executed inside a Linux image in VMware Workstation.
So the graphics driver in my Debian GNU/Linux image is the software driver.
Here is the log (excerpted from the latest error; I added my own logging to monitor flaky valgrind execution, etc.):

Marionette(): marionette_args=
 {'symbols_path': '/NEW-SSD/ASAN-OBJ-DIR/objdir-tb3/dist/crashreporter-symbols', 'socket_timeout': 450000, 'startup_timeout': None}
00:03.29 GECKO(73949) Calling mallopt for malloc/free debug. with 0xc3 value.
00:03.29 GECKO(73949) ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to /COMM-CENTRAL/TMP-DIR/tmpauxownpx.mozrunner/runtests_leaks.log
00:07.04 GECKO(73949) Initializing context 721c00046310 surface 0 on display 727800030000
00:07.04 GECKO(73949) GL_VENDOR: Mesa
00:07.04 GECKO(73949) mVendor: Unknown
00:07.04 GECKO(73949) GL_RENDERER: llvmpipe (LLVM 19.1.7, 256 bits)
00:07.04 GECKO(73949) mRenderer: Unknown
00:07.04 GECKO(73949) mIsMesa: 1
00:07.05 GECKO(73949) [Parent 73949, Renderer] WARNING: robust_buffer_access_behavior marked as unsupported: file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/gfx/gl/GLContextFeatures.cpp:644
00:07.83 GECKO(73949) 1747719209250	Marionette	INFO	Marionette enabled
00:07.87 GECKO(73949) 1747719209276	Marionette	TRACE	Received observer notification final-ui-startup
00:08.57 GECKO(73949) Warning: unrecognized command line flag -foreground
00:08.81 GECKO(73949) 1747719210223	Marionette	INFO	Listening on port 2828
00:08.82 GECKO(73949) 1747719210232	Marionette	DEBUG	Marionette is listening
00:08.86 GECKO(73949) NS_NewBufferedOutputStream: outputStream (= std::move(aOutputputStream)) =0x721c00058808
00:09.66 GECKO(73949) 1747719211079	Marionette	DEBUG	Accepted connection 0 from 127.0.0.1:55206
00:09.68 GECKO(73949) [Parent 73949, Main Thread] WARNING: Failed to register host application for portals
00:09.68 GECKO(73949) : file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/widget/gtk/WidgetUtilsGtk.cpp:201
00:10.26 GECKO(73949) 1747719211679	Marionette	DEBUG	Closed connection 0
00:10.42 GECKO(73949) 1747719211837	Marionette	DEBUG	Accepted connection 1 from 127.0.0.1:46722
00:11.69 GECKO(73949) 1747719213102	Marionette	DEBUG	Closed connection 1
00:11.84 GECKO(73949) 1747719213251	Marionette	DEBUG	Accepted connection 2 from 127.0.0.1:46736
00:12.35 GECKO(73949) 1747719213768	Marionette	DEBUG	Closed connection 2
00:12.74 GECKO(73949) 1747719214157	Marionette	DEBUG	Accepted connection 3 from 127.0.0.1:46748
00:12.91 GECKO(73949) NS_NewBufferedOutputStream: outputStream (= std::move(aOutputputStream)) =0x721c00018998
00:14.38 GECKO(73949) 1747719215796	Marionette	DEBUG	Accepted connection 4 from 127.0.0.1:46754
00:14.38 GECKO(73949) 1747719215800	Marionette	DEBUG	Closed connection 3
00:15.19 GECKO(73949) 1747719216603	Marionette	DEBUG	4 -> [0,1,"WebDriver:NewSession",{"strictFileInteractability":true}]
00:15.26 GECKO(73949) 1747719216673	Marionette	DEBUG	Waiting for initial application window
00:15.70 GECKO(73949) [Parent 73949, Main Thread] WARNING: 'NS_FAILED(rv)', file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/toolkit/components/resistfingerprinting/nsRFPService.cpp:2201
00:15.72 GECKO(73949) [WARN  rkv::backend::impl_safe::environment] `load_ratio()` is irrelevant for this storage backend.
00:15.90 GECKO(73949) NS_NewBufferedOutputStream: outputStream (= std::move(aOutputputStream)) =0x721c000ad838
00:16.91 GECKO(73949) [Parent 73949, GMPThread] WARNING: Failed to delete GMP storage directory: file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/dom/media/gmp/GMPServiceParent.cpp:1858
00:17.41 GECKO(73949) NS_NewBufferedOutputStream: outputStream (= std::move(aOutputputStream)) =0x721c00018998
00:20.71 GECKO(73949) [Parent 73949, Main Thread] WARNING: 'NS_FAILED(rv)', file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/netwerk/base/DefaultURI.cpp:189
00:24.75 GECKO(73949) [WARN  webrender::device::gl] Missing optimized shader source for gpu_cache_update
00:25.42 GECKO(73949) ==================
00:25.42 GECKO(73949) WARNING: ThreadSanitizer: data race (pid=73949)
00:25.42 GECKO(73949)   Write of size 1 at 0x7220000ce790 by thread T40 (mutexes: write M0):
00:25.42 GECKO(73949)     #0 pthread_mutex_destroy /builds/worker/fetches/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1344:3 (thunderbird+0xd4d4e) (BuildId: 3f396730511c937f4345675104235ab7)
00:25.42 GECKO(73949)     #1 <null> <null> (libgallium-25.0.5-1.so+0x86f0d7) (BuildId: 17b621a998972f5828194cce760f15e26f24192a) 
   
    [... omitted ...]

I have a feeling that this is in an error path triggered by " [WARN webrender::device::gl] Missing optimized shader source for gpu_cache_update", but this is a pure guess.

OTOH, the following excerpt from a non-TSAN run of mochitest (DEBUG version of C-C TB) suggests there may be
an improper setting of the initial window (?) during the startup phase of Marionette.

Marionette(): marionette_args=
 {'symbols_path': '/NEW-SSD/moz-obj-dir/objdir-tb3/dist/crashreporter-symbols', 'socket_timeout': 450000, 'startup_timeout': None}
00:02.84 GECKO(1291930) ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to /mnt/1gtmpfs/tmpggwczt5n.mozrunner/runtests_leaks.log
00:07.55 GECKO(1291930) Initializing context 7f601008b750 surface 0 on display 7f6010070940
00:07.55 GECKO(1291930) GL_VENDOR: Mesa
00:07.55 GECKO(1291930) mVendor: Unknown
00:07.55 GECKO(1291930) GL_RENDERER: llvmpipe (LLVM 19.1.7, 256 bits)
00:07.55 GECKO(1291930) mRenderer: Unknown
00:07.55 GECKO(1291930) mIsMesa: 1
00:07.55 GECKO(1291930) [Parent 1291930, Renderer] WARNING: robust_buffer_access_behavior marked as unsupported: file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/gfx/gl/GLContextFeatures.cpp:642
00:07.73 GECKO(1291930) 1743483630790	Marionette	INFO	Marionette enabled
00:07.73 GECKO(1291930) 1743483630794	Marionette	TRACE	Received observer notification final-ui-startup
00:07.83 GECKO(1291930) Warning: unrecognized command line flag -foreground
00:07.85 GECKO(1291930) 2025-04-01 05:00:30.908187 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOCSHELL 559078d517d0 == 1 [pid = 1291930] [id = 0]
00:07.85 GECKO(1291930) 2025-04-01 05:00:30.908280 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 1 (55907863d440) [pid = 1291930] [serial = 1] [outer = 0]
00:07.85 GECKO(1291930) 2025-04-01 05:00:30.909195 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 2 (559078d95af0) [pid = 1291930] [serial = 2] [outer = 55907863d440]
00:07.93 GECKO(1291930) 1743483630989	Marionette	INFO	Listening on port 2828
00:07.93 GECKO(1291930) 1743483630990	Marionette	DEBUG	Marionette is listening
00:08.01 GECKO(1291930) NS_NewBufferedOutputStream: outputStream (= std::move(aOutputputStream)) =0x7f5fe400d948
00:08.07 GECKO(1291930) 1743483631131	Marionette	DEBUG	Accepted connection 0 from 127.0.0.1:48888
00:08.07 GECKO(1291930) [Parent 1291930, Main Thread] WARNING: Failed to register host application for portals
00:08.07 GECKO(1291930) : file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/widget/gtk/nsAppShell.cpp:372
00:08.10 GECKO(1291930) 1743483631163	Marionette	DEBUG	Closed connection 0
00:08.10 GECKO(1291930) 1743483631164	Marionette	DEBUG	Accepted connection 1 from 127.0.0.1:48894
00:08.27 GECKO(1291930) 1743483631332	Marionette	DEBUG	1 -> [0,1,"WebDriver:NewSession",{"strictFileInteractability":true}]
00:08.28 GECKO(1291930) 1743483631340	Marionette	DEBUG	Waiting for initial application window
00:08.66 GECKO(1291930) [WARN  rkv::backend::impl_safe::environment] `load_ratio()` is irrelevant for this storage backend.
00:08.70 GECKO(1291930) [Parent 1291930, Main Thread] WARNING: 'NS_FAILED(rv)', file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/toolkit/components/resistfingerprinting/nsRFPService.cpp:2095
00:08.85 GECKO(1291930) NS_NewBufferedOutputStream: outputStream (= std::move(aOutputputStream)) =0x7f5fe4059378
00:09.07 GECKO(1291930) 2025-04-01 05:00:32.134526 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOCSHELL 55907aacde30 == 2 [pid = 1291930] [id = 1]
00:09.07 GECKO(1291930) 2025-04-01 05:00:32.134579 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 3 (55907acea7a0) [pid = 1291930] [serial = 3] [outer = 0]
00:09.07 GECKO(1291930) 2025-04-01 05:00:32.134979 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 4 (55907aa8fc10) [pid = 1291930] [serial = 4] [outer = 55907acea7a0]
00:09.15 GECKO(1291930) 2025-04-01 05:00:32.209523 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOCSHELL 55907ad79360 == 3 [pid = 1291930] [id = 2]
00:09.15 GECKO(1291930) 2025-04-01 05:00:32.209564 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 5 (559079ca0db0) [pid = 1291930] [serial = 5] [outer = 0]
00:09.15 GECKO(1291930) 2025-04-01 05:00:32.209589 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOCSHELL 55907ad98190 == 4 [pid = 1291930] [id = 3]
00:09.15 GECKO(1291930) 2025-04-01 05:00:32.209597 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 6 (559079ca1270) [pid = 1291930] [serial = 6] [outer = 0]
00:09.15 GECKO(1291930) 2025-04-01 05:00:32.209839 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOCSHELL 55907ad84b20 == 5 [pid = 1291930] [id = 4]
00:09.15 GECKO(1291930) 2025-04-01 05:00:32.209857 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 7 (55907af42e80) [pid = 1291930] [serial = 7] [outer = 0]
00:09.16 GECKO(1291930) 2025-04-01 05:00:32.225527 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOCSHELL 55907b138700 == 6 [pid = 1291930] [id = 5]
00:09.16 GECKO(1291930) 2025-04-01 05:00:32.225554 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 8 (55907b0c5a10) [pid = 1291930] [serial = 8] [outer = 0]
00:09.17 GECKO(1291930) 2025-04-01 05:00:32.225994 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 9 (55907af14ee0) [pid = 1291930] [serial = 9] [outer = 55907b0c5a10]
00:09.17 GECKO(1291930) 2025-04-01 05:00:32.230894 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 10 (55907ad6cee0) [pid = 1291930] [serial = 10] [outer = 559079ca0db0]
00:09.17 GECKO(1291930) 2025-04-01 05:00:32.234675 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOCSHELL 55907b32d150 == 7 [pid = 1291930] [id = 6]
00:09.17 GECKO(1291930) 2025-04-01 05:00:32.234749 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 11 (55907b0ae7b0) [pid = 1291930] [serial = 11] [outer = 0]
00:09.18 GECKO(1291930) 2025-04-01 05:00:32.235151 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 12 (55907af10b80) [pid = 1291930] [serial = 12] [outer = 55907b0ae7b0]
00:09.18 GECKO(1291930) 2025-04-01 05:00:32.240088 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 13 (55907ad452d0) [pid = 1291930] [serial = 13] [outer = 559079ca1270]
00:09.18 GECKO(1291930) 2025-04-01 05:00:32.245161 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 14 (55907ad422d0) [pid = 1291930] [serial = 14] [outer = 55907af42e80]
00:09.19 GECKO(1291930) 2025-04-01 05:00:32.250014 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOCSHELL 55907b43c020 == 8 [pid = 1291930] [id = 7]
00:09.19 GECKO(1291930) 2025-04-01 05:00:32.250041 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 15 (55907b43aaa0) [pid = 1291930] [serial = 15] [outer = 0]
00:09.19 GECKO(1291930) 2025-04-01 05:00:32.250331 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 16 (55907a3efc00) [pid = 1291930] [serial = 16] [outer = 55907b43aaa0]
00:09.29 GECKO(1291930) [Parent 1291930, Main Thread] WARNING: 'NS_FAILED(rv)', file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/netwerk/base/DefaultURI.cpp:189
00:09.56 GECKO(1291930) 2025-04-01 05:00:32.618084 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 17 (55907bd839e0) [pid = 1291930] [serial = 17] [outer = 55907acea7a0]
00:09.56 GECKO(1291930) 2025-04-01 05:00:32.623717 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 18 (55907bd4a750) [pid = 1291930] [serial = 18] [outer = 55907b0c5a10]
00:09.57 GECKO(1291930) 2025-04-01 05:00:32.630281 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 19 (55907bd3d6e0) [pid = 1291930] [serial = 19] [outer = 55907b0ae7b0]
00:09.57 GECKO(1291930) 2025-04-01 05:00:32.636201 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 20 (55907b9718d0) [pid = 1291930] [serial = 20] [outer = 55907af42e80]
00:09.62 GECKO(1291930) 2025-04-01 05:00:32.679003 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOCSHELL 55907bbbab10 == 9 [pid = 1291930] [id = 8]
00:09.62 GECKO(1291930) 2025-04-01 05:00:32.679033 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 21 (55907bf151a0) [pid = 1291930] [serial = 21] [outer = 0]
00:09.62 GECKO(1291930) 2025-04-01 05:00:32.679452 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 22 (55907be7da90) [pid = 1291930] [serial = 22] [outer = 55907bf151a0]
00:09.69 GECKO(1291930) [WARN  webrender::device::gl] Missing optimized shader source for gpu_cache_update
00:10.69 GECKO(1291930) NS_NewBufferedOutputStream: outputStream (= std::move(aOutputputStream)) =0x7f600000f5f8
00:10.83 GECKO(1291930) [Parent 1291930, Main Thread] WARNING: NS_ENSURE_TRUE(frame) failed: file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/comm/mailnews/base/src/nsMsgWindow.cpp:67
00:10.83 GECKO(1291930) [Parent 1291930, Main Thread] WARNING: NS_ENSURE_TRUE(frame) failed: file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/comm/mailnews/base/src/nsMsgWindow.cpp:67
00:10.83 GECKO(1291930) [Parent 1291930, Main Thread] WARNING: NS_ENSURE_TRUE(frame) failed: file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/comm/mailnews/base/src/nsMsgWindow.cpp:67
00:10.92 GECKO(1291930) NS_NewBufferedOutputStream: outputStream (= std::move(aOutputputStream)) =0x7f6000005b58
00:11.33 GECKO(1291930) 2025-04-01 05:00:34.388275 UTC - [Parent 1291930: Main Thread]: I/DocShellAndDOMWindowLeak ++DOCSHELL 55907ce53830 == 10 [pid = 1291930] [id = 9]
         ... omitted ...
 

I am really puzzled that I don't see a similar trace in treeherder jobs. Either

  • the timing is very different and the problem is not seen on tree server,
  • the problems may happen but properly whitelisted (but how?) on treeherder, or
  • my local binary tools and those on the treeherder compiler farms are different and these do not happen on treeherder, etc.

My CPU is AMD Ryzen 7 5700X 8-Core Processor (family: 0x19, model: 0x21, stepping: 0x2) from the output of dmesg.

Joel, could this be related to the LLVM llvmpipe version running on CI?

Flags: needinfo?(jmaher)

I don't know much about asan/tsan, I do know that llvm has a lot of custom patches in-tree. It seems that when we upgrade or have issues running either all tests of one type or all tasks in general, it leans towards llvm needing some adjustment.

For example, upgrading our docker image in CI from 18.04 -> 24.04, we found that web-platform-test-reftests had a lot of failures across the board (see bug 1973320). I am awaiting some responses (after PTO) from someone more familiar with LLVM.

The message I am seeing is different from your message.

Do thunderbird/comm-central tests run with tsan in CI? If so, is this local environment the same OS? (I assume kernel + library versions are most important.)

Flags: needinfo?(jmaher)