1636945 - make timeout of mochitest much longer for running mochitest under valgrind successfully.

ISHIKAWA, Chiaki

Assignee

Description

•

5 years ago

•

Edited

Slightly modified quote from https://bugzilla.mozilla.org/show_bug.cgi?id=1631197#c4

I could finally make mochitest run under valgrind after increasing valgrind's thread limit (--max-threads) to 5000.
(I did this outside of |./mach valgrind-test| by creating a wrapper program that runs TB under valgrind, as noted in Bug 1629433,
"Running thunderbird binary under valgrind while trying to run mochitest requires way TOO MANY threads (> 1500) !?".)

But there was a catch.
As one expects, mochitest slows down significantly under valgrind.
Although I did detect a few memory issues that occur very early in the execution of tests (e.g. Bug 1591177),
I got the following timeout from the mochitest framework. Unfortunately, this means I don't get any serious memory test coverage of
mochitest under valgrind, because the run is terminated prematurely by the timeout. (My setup worked fine with |make mozmill| AFTER I extended the timeouts in various mozmill test files.
Either I have to do this manually again for the mochitest files, or I have to find a global way to extend the timeout to a large value. I prefer the latter.)

2041:39.38 GECKO(116476) {debug} SetSpec failed : aSpec=messenger/otr/chat.ftl
2042:48.84 GECKO(116476) --116480-- memcheck GC: 78624 nodes, 23241 survivors (29.6%)
2042:48.85 GECKO(116476) --116480-- memcheck GC: 79803 new table size (driftup)
2048:50.88 INFO Buffered messages finished
2048:50.88 INFO TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 370 seconds with no output
2048:50.88 ERROR Force-terminating active process(es).
2048:50.88 INFO Determining child pids from psutil...
2048:50.89 INFO [116477]
2048:50.89 INFO ==> process 116480 launched child process 116520

...
grep TEST-UNEXPECTED-TIMEOUT log1208-mochitest-memcheck.txt |wc 
   48     720    5419

I think this basically means that all the tests under subdirectories of ./comm fail due to timeout when valgrind tests run.
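The `grep ... | wc` count above can also be broken down by the component that reported the timeout. The following is only a minimal sketch; the regular expression assumes the `TEST-UNEXPECTED-TIMEOUT | source | message` layout seen in the log excerpt, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the log excerpt in this bug:
#   ... TEST-UNEXPECTED-TIMEOUT | <source> | <message>
TIMEOUT_RE = re.compile(r"TEST-UNEXPECTED-TIMEOUT \| (?P<source>[^|]+?) \| (?P<message>.*)")


def summarize_timeouts(lines):
    """Count TEST-UNEXPECTED-TIMEOUT lines, grouped by the reporting source."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("source").strip()] += 1
    return counts


# Two lines copied from the log above, used as a small demonstration.
sample_log = [
    "2048:50.88 INFO TEST-UNEXPECTED-TIMEOUT | automation.py | "
    "application timed out after 370 seconds with no output",
    "2048:50.88 ERROR Force-terminating active process(es).",
]
for source, count in summarize_timeouts(sample_log).most_common():
    print(f"{count:5d}  {source}")
```

To run it on a real log, pass an open file object (for example the mochitest log file) instead of `sample_log`.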

There may be a bloat factor that is automatically applied to the timeout value when |./mach valgrind-test| is issued, but this value will obviously require fine-tuning by each local developer, since everyone uses different hardware. Thus we need a manual override for such a longer timeout value.

I searched the help message from |mach| but could not find any mention of such an override. So, unless a manual timeout override already exists for valgrind-test, I would like one to be implemented.
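To make the request concrete, here is a rough sketch of what such a manual override could look like. It is not an existing mach or mochitest feature: the environment variable name `MOCHITEST_TIMEOUT_FACTOR` and the function name are hypothetical, and the 370-second base is simply the value reported in the log above.

```python
import os

# 370 seconds is the "no output" timeout reported in the log above.
DEFAULT_TIMEOUT = 370.0


def effective_timeout(base=DEFAULT_TIMEOUT, env=None):
    """Scale the base timeout by a user-supplied factor.

    MOCHITEST_TIMEOUT_FACTOR is a hypothetical variable name used only for
    this illustration; it is not read by the real harness.
    """
    env = os.environ if env is None else env
    raw = env.get("MOCHITEST_TIMEOUT_FACTOR", "1")
    try:
        factor = float(raw)
    except ValueError:
        raise ValueError(f"timeout factor must be a number, got {raw!r}")
    if factor < 1:
        raise ValueError("timeout factor must be at least 1")
    return base * factor


# A developer on slow hardware might pick a large factor for valgrind runs.
print(effective_timeout(env={"MOCHITEST_TIMEOUT_FACTOR": "8"}))
```

An explicit command-line option would serve the same purpose; the point is only that the factor is chosen by the person running the test, not hard-coded.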

ISHIKAWA, Chiaki

Assignee

Comment 1

•

5 years ago

BTW, a longer timeout is also necessary for running xpcshell-tests under valgrind.

For that, I manually edited the relevant xpcshell.ini files and added requesttimeoutfactor for tests that timed out,
e.g. |requesttimeoutfactor = 25|, and even 40 or 50 for some tests.

I think we should have a requestvalgrindtimeoutfactor for fine-tuning the
timeout value of various tests. requesttimeoutfactor is applied to tests under normal operation, so large values may not be desirable in the community-wide tree. A valgrind timeout factor, on the other hand, would be meaningful only to someone who runs valgrind, and thus could be left in the community-wide tree without introducing unduly long timeouts for tests that fail due to timeout.
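A small sketch of how the proposed key could be resolved. Note the assumptions: `requestvalgrindtimeoutfactor` is only the proposal made in this comment and does not exist in the harness, the test name `test_example.js` is made up, and Python's standard `configparser` is used here as a stand-in for the real manifest parser.

```python
import configparser

# Hypothetical manifest entry: the second key is the proposal from this
# comment, not something the real test harness understands.
MANIFEST = """
[test_example.js]
requesttimeoutfactor = 2
requestvalgrindtimeoutfactor = 40
"""


def timeout_factor(section, under_valgrind):
    """Pick the factor for one test entry.

    Under valgrind the valgrind-specific key wins; if it is absent, fall
    back to the ordinary requesttimeoutfactor (default 1.0).
    """
    normal = section.getfloat("requesttimeoutfactor", fallback=1.0)
    if under_valgrind:
        return section.getfloat("requestvalgrindtimeoutfactor", fallback=normal)
    return normal


parser = configparser.ConfigParser()
parser.read_string(MANIFEST)
entry = parser["test_example.js"]
print(timeout_factor(entry, under_valgrind=False))
print(timeout_factor(entry, under_valgrind=True))
```

With this split, the large factor never affects ordinary runs, which is the reason for proposing a separate key.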

Just a thought.

ISHIKAWA, Chiaki

Assignee

Comment 2

•

5 years ago

Attached file Some observation of strace -p marionette process (TB) — Details

I can't get past even the first subtest of mochitest before a timeout happens, even after I lengthened the timeout value
in runtests.py:

            else:
                # We generally want the JS harness or marionette to handle
                # timeouts if they can.
                # The default JS harness timeout is currently 300 seconds.
                # The default Marionette socket timeout is currently 360 seconds.
                # Wait a little (10 seconds) more before timing out here.
                # See bug 479518 and bug 1414063.
                timeout = 370.0*5   #   <=== was 370

            # Detect shutdown leaks for m-bc runs if

Now, while mochitest runs under valgrind, I found that one of the processes is busily spinning on futex calls, etc.

Attachment is the observation of system calls executed by a pair of relevant processes.

First the list of the processes while the test is running before the timeout happens.
I am using the wrapper named "thunderbird" that runs the original "thunderbird-bin" under valgrind.
pid 199522 is the mochitest invocation.
pid 199523 is the invocation of valgrind to run thunderbird-bin.
pid 199526 and 199582 are the pair of processes that were created during the mochitest run.
pid 199526 is the marionette test framework program, I think: |KERNEL-SRC/moz-obj-dir/objdir-tb3/dist/bin/thunderbird-bin -marionette -foreground -profile /COMM-CENTRAL/TMP-DIR/tmpH8ngbJ.mozrunner |
pid 199582 is the so-called |contentproc|: |/NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin/thunderbird-bin -contentproc -parentBuildID 20200510181245 -prefsLen 1 -prefMapSize 231381 -appdir /NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin 199526 socket|
I wonder what that is, but here is the |ps axg | grep thunderbird| output.

ps axg | grep thunderbird
 199522 pts/7    S      0:00 /NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin/thunderbird -marionette -foreground -profile /COMM-CENTRAL/TMP-DIR/tmpH8ngbJ.mozrunner
 199523 pts/7    S      0:00 sh -c valgrind --trace-children=yes --fair-sched=yes --smc-check=all-non-file --gen-suppressions=all -v --trace-signals=yes --vex-iropt-register-updates=allregs-at-mem-access --track-origins=yes --child-silent-after-fork=yes --trace-children-skip=/usr/bin/lsb_release,/usr/bin/hg,/bin/rm,*/bin/certutil,*/bin/pk12util,*/bin/ssltunnel,*/bin/uname,*/bin/which,*/bin/ps,*/bin/grep,*/bin/java,*/fix-stacks,*/firefox/firefox,*/bin/firefox-esr,*/bin/python,*/bin/python2,*/bin/python2.7,*/bin/bash  --max-threads=5000  --max-stackframe=16000000 --num-transtab-sectors=24 --tool=memcheck --freelist-vol=500000000 --redzone-size=128 --px-default=allregs-at-mem-access --px-file-backed=unwindregs-at-mem-access --malloc-fill=0xA5 --free-fill=0xC3 --num-callers=50 --suppressions=/home/ishikawa/Dropbox/myown.sup --show-mismatched-frees=no --show-possibly-lost=no --read-inline-info=yes  /KERNEL-SRC/moz-obj-dir/objdir-tb3/dist/bin/thunderbird-bin -marionette -foreground -profile /COMM-CENTRAL/TMP-DIR/tmpH8ngbJ.mozrunner  
 199526 pts/7    Rl    16:07 valgrind --trace-children=yes --fair-sched=yes --smc-check=all-non-file --gen-suppressions=all -v --trace-signals=yes --vex-iropt-register-updates=allregs-at-mem-access --track-origins=yes --child-silent-after-fork=yes --trace-children-skip=/usr/bin/lsb_release,/usr/bin/hg,/bin/rm,*/bin/certutil,*/bin/pk12util,*/bin/ssltunnel,*/bin/uname,*/bin/which,*/bin/ps,*/bin/grep,*/bin/java,*/fix-stacks,*/firefox/firefox,*/bin/firefox-esr,*/bin/python,*/bin/python2,*/bin/python2.7,*/bin/bash --max-threads=5000 --max-stackframe=16000000 --num-transtab-sectors=24 --tool=memcheck --freelist-vol=500000000 --redzone-size=128 --px-default=allregs-at-mem-access --px-file-backed=unwindregs-at-mem-access --malloc-fill=0xA5 --free-fill=0xC3 --num-callers=50 --suppressions=/home/ishikawa/Dropbox/myown.sup --show-mismatched-frees=no --show-possibly-lost=no --read-inline-info=yes /KERNEL-SRC/moz-obj-dir/objdir-tb3/dist/bin/thunderbird-bin -marionette -foreground -profile /COMM-CENTRAL/TMP-DIR/tmpH8ngbJ.mozrunner
 199582 pts/7    Sl     0:15 valgrind --trace-children=yes --fair-sched=yes --smc-check=all-non-file --gen-suppressions=all -v --trace-signals=yes --vex-iropt-register-updates=allregs-at-mem-access --track-origins=yes --child-silent-after-fork=yes --trace-children-skip=/usr/bin/lsb_release,/usr/bin/hg,/bin/rm,*/bin/certutil,*/bin/pk12util,*/bin/ssltunnel,*/bin/uname,*/bin/which,*/bin/ps,*/bin/grep,*/bin/java,*/fix-stacks,*/firefox/firefox,*/bin/firefox-esr,*/bin/python,*/bin/python2,*/bin/python2.7,*/bin/bash --max-threads=5000 --max-stackframe=16000000 --num-transtab-sectors=24 --tool=memcheck --freelist-vol=500000000 --redzone-size=128 --px-default=allregs-at-mem-access --px-file-backed=unwindregs-at-mem-access --malloc-fill=0xA5 --free-fill=0xC3 --num-callers=50 --suppressions=/home/ishikawa/Dropbox/myown.sup --show-mismatched-frees=no --show-possibly-lost=no --read-inline-info=yes /NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin/thunderbird-bin -contentproc -parentBuildID 20200510181245 -prefsLen 1 -prefMapSize 231381 -appdir /NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin 199526 socket
 202049 pts/6    S+     0:00 grep thunderbird
root@ip030:/home/ishikawa#

Now, from the strace output, 199582 was stuck in the following call and did not seem to be busily consuming CPU (or was it?). Nothing happened until I hit control-C to exit from strace.

 strace -p 199582
strace: Process 199582 attached
restart_syscall(<... resuming interrupted read ...>^Cstrace: Process 199582 detached
 <detached ...>

PID 199526 is another story entirely. As soon as I attach strace to it, it spews out lines continuously.
I excerpt only a few dozen lines from strace below.
You can see that there seems to be some type of busy loop happening (!).
This is probably the cause of the 100% CPU usage (LOAD is 1.1). I am not sure what the mmap() is doing here.
mmap() pops up now and then, and its address seemed to increase by 0x4000 each time it was called.

strace -p 199526
strace: Process 199526 attached
futex(0x100be72928, FUTEX_WAKE_PRIVATE, 2147483647) = 1
futex(0x100be7292c, FUTEX_WAIT_PRIVATE, 10316, NULL) = 0
gettid()                                = 199526
gettid()                                = 199526
rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
rt_sigtimedwait(~[], 0x100907de30, {tv_sec=0, tv_nsec=0}, 8) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 199526
futex(0x100be72930, FUTEX_WAKE_PRIVATE, 2147483647) = 1
futex(0x100be72934, FUTEX_WAIT_PRIVATE, 10324, NULL) = 0
gettid()                                = 199526
gettid()                                = 199526
rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
rt_sigtimedwait(~[], 0x100907de30, {tv_sec=0, tv_nsec=0}, 8) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 199526
futex(0x100be72938, FUTEX_WAKE_PRIVATE, 2147483647) = 1
futex(0x100be7293c, FUTEX_WAIT_PRIVATE, 10342, NULL) = 0
gettid()                                = 199526
gettid()                                = 199526
rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
rt_sigtimedwait(~[], 0x100907de30, {tv_sec=0, tv_nsec=0}, 8) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 199526
futex(0x100be72940, FUTEX_WAKE_PRIVATE, 2147483647) = 1
futex(0x100be72944, FUTEX_WAIT_PRIVATE, 10301, NULL) = 0
gettid()                                = 199526
gettid()                                = 199526
rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
rt_sigtimedwait(~[], 0x100907de30, {tv_sec=0, tv_nsec=0}, 8) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 199526
futex(0x100be72908, FUTEX_WAKE_PRIVATE, 2147483647) = 1
futex(0x100be7290c, FUTEX_WAIT_PRIVATE, 10294, NULL) = 0
gettid()                                = 199526
gettid()                                = 199526
rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
rt_sigtimedwait(~[], 0x100907de30, {tv_sec=0, tv_nsec=0}, 8) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 199526
mmap(0x1134bfa000, 16384, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0, 0) = 0x1134bfa000
futex(0x100be72910, FUTEX_WAKE_PRIVATE, 2147483647) = 1
futex(0x100be72914, FUTEX_WAIT_PRIVATE, 10327, NULL) = 0
gettid()                                = 199526
gettid()                                = 199526

The address from mmap() seems to be increasing every time mmap() is called.

So I think there is a busy loop that consumes too much CPU time. This hurts the valgrind execution. (I am not sure if the busy loop above is
in the TB code or the valgrind program itself. I am assuming it is in the TB binary.)

This is what I found after I increased the timeout to close to 2000 seconds and was still unable to finish a single mochitest subtest.

ISHIKAWA, Chiaki

Assignee

Comment 3

•

5 years ago

•

Edited

WAIT. I thought PID 199526 was spinning on the same futex, but I noticed that the address (the first argument) passed to futex in the above excerpt is bumped by 4 each time it is called (!!!). Is this related to the issue of valgrind needing to support more than 1500 threads to run mochitest under it (!?)
Bug 1629433
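The address pattern can be checked mechanically rather than by eye. The sketch below parses a few strace lines copied from the excerpt in comment 2 and prints the difference between successive futex addresses; the function names are made up for this illustration.

```python
import re

# Matches the first (address) argument of a futex() line in strace output.
FUTEX_RE = re.compile(r"^futex\((0x[0-9a-f]+),")


def futex_addresses(lines):
    """Return the futex address of every futex() call, in order."""
    return [int(m.group(1), 16) for m in map(FUTEX_RE.match, lines) if m]


def deltas(addresses):
    """Differences between each address and the one before it."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


# Lines copied from the strace excerpt for PID 199526.
SAMPLE = [
    "futex(0x100be72928, FUTEX_WAKE_PRIVATE, 2147483647) = 1",
    "futex(0x100be7292c, FUTEX_WAIT_PRIVATE, 10316, NULL) = 0",
    "gettid()                                = 199526",
    "futex(0x100be72930, FUTEX_WAKE_PRIVATE, 2147483647) = 1",
    "futex(0x100be72934, FUTEX_WAIT_PRIVATE, 10324, NULL) = 0",
]
print(deltas(futex_addresses(SAMPLE)))  # -> [4, 4, 4]
```

In the longer excerpt the address later drops back from 0x100be72944 to 0x100be72908, so the increase is not unbounded; running the same functions over the full attached strace log would show how many distinct futex words are involved.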

There seems to be a problem of

1. too many threads being created for TB mochitest,
2. these threads seeming to wait on MANY futexes,
3. the waits on different futexes seeming to be done in a busy-loop manner, at least under a bad condition which I observe locally, thus resulting in high CPU load, and
4. way too slow valgrind execution.

These all seem to be related.

Something is terribly wrong under the hood of mochitest.

ISHIKAWA, Chiaki

Assignee

Comment 4

•

5 years ago

All I want is to run thunderbird under valgrind to find memory-related issues.
With |make mozmill|, this wrapper approach worked fine AFTER I figured out all the relevant timeout values.
Hmm... maybe I need to do something similar.

ISHIKAWA, Chiaki

Assignee

Comment 5

•

5 years ago

•

Edited

Attached file A mochitest run under valgrind until the timeout occurs (then manual ^C killed it). — Details

Updated log after a minimal change to run mochitest under valgrind is
applied.

With the patch in bug 1631197,
I can run mochitest with the thunderbird binary running under valgrind without
the local hack of creating a wrapper named |thunderbird| to run the
original thunderbird binary |thunderbird-bin| under valgrind.

So other people can now execute mochitest+valgrind on their
Linux PC more comfortably.

But as noted in bug 1629433, I need to pass "--max-threads=5000" even to
make it start.
Then it times out.

So with the patch of bug AAAA and additional valgrind options,
you can check mochitest+valgrind under linux on your local PC until
the fatal timeout occurs.

I extended a few timeout values as well. (I will post another attachment.)

The command I ran to obtain the attached log (partial, only up to the
first timeout and the error processing related to it) is as follows.

First, before issuing the |mach| command, I changed my directory to the comm/mail/test directory.
I thought I would skip the calendar-related tests (which are invoked
if I issue |mach| at the top of the C-C tree), but this made no difference
as far as the timeout issue goes.

The command uses the absolute path name to mach on my PC.

cd /NREF-COMM-CENTRAL/mozilla/comm/mail/test

/NREF-COMM-CENTRAL/mozilla/mach --log-no-times --log-no-times mochitest --valgrind /usr/local/bin/valgrind --valgrind-args=--trace-children=yes,--max-threads=5000,--max-stackframe=16000000,--num-transtab-sectors=24,--tool=memcheck,--freelist-vol=500000000,--redzone-size=128,--px-default=allregs-at-mem-access,--px-file-backed=unwindregs-at-mem-access,--malloc-fill=0xA5,--free-fill=0xC3,--num-callers=50,--suppressions=/home/ishikawa/Dropbox/myown.sup,--show-mismatched-frees=no,--show-possibly-lost=no,--read-inline-info=yes,--fair-sched=yes,--smc-check=all-non-file,--gen-suppressions=all,-v,--trace-signals=yes,--vex-iropt-register-updates=allregs-at-mem-access,--track-origins=yes,--child-silent-after-fork=yes,--suppressions=/home/ishikawa/Dropbox/myown.sup

You need to remove or replace
|--suppressions=/home/ishikawa/Dropbox/myown.sup|.
myown.sup contains valgrind suppressions I have collected over the years.
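Since the option list is long and the suppressions entry appears twice in the command above, it may be easier to assemble the `--valgrind-args=` value from a list. This is only a sketch: the helper name is made up, the options shown are a subset of those in the command above, and it assumes, as the command suggests, that the value is a comma-separated list.

```python
def build_valgrind_args(options, drop_prefixes=()):
    """Join valgrind options into a single --valgrind-args=... string.

    Duplicate options are emitted once, and any option starting with one
    of drop_prefixes (e.g. a personal suppressions file) is left out.
    """
    seen = set()
    kept = []
    for opt in options:
        if opt in seen or opt.startswith(tuple(drop_prefixes)):
            continue
        seen.add(opt)
        kept.append(opt)
    return "--valgrind-args=" + ",".join(kept)


# A subset of the options used in the command above.
OPTIONS = [
    "--trace-children=yes",
    "--max-threads=5000",
    "--tool=memcheck",
    "--suppressions=/home/ishikawa/Dropbox/myown.sup",
    "--track-origins=yes",
    "--suppressions=/home/ishikawa/Dropbox/myown.sup",
]
print(build_valgrind_args(OPTIONS, drop_prefixes=("--suppressions=",)))
```

Because the options are joined with commas, an option whose own value contains commas would presumably not survive this format unchanged.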

In many attempts so far, TB+valgrind seems to do something (100% CPU according to the xosview display), but nothing much happens after the "Waiting for browser..." message, and then it times out.
"Browser" here means TB, if I am not mistaken.

15:03.46 INFO runtests.py | Waiting for browser...
20:03.38 INFO Buffered messages finished
20:03.38 INFO TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 2960 seconds with no output
20:03.38 ERROR Force-terminating active process(es).
20:03.38 INFO Determining child pids from psutil...
20:03.39 INFO []
20:03.39 INFO ==> process 335704 launched child process 335745
20:03.39 INFO Found child pids: set([335745])
20:03.39 INFO Failed to get child procs
20:03.39 INFO Killing process: 335745
20:03.39 INFO Failed to retrieve MOZ_UPLOAD_DIR env var
20:03.39 INFO Can't trigger Breakpad, just killing process
20:03.39 INFO Error: Failed to kill process 335745: psutil.NoSuchProcess no process found with pid 335745
20:03.39 INFO Killing process: 335704
20:03.39 INFO Not taking screenshot here: see the one that was previously logged
20:03.39 INFO Can't trigger Breakpad, just killing process
20:33.41 INFO failed to kill pid 335704 after 30s
21:03.42 WARNING failed to kill pid 335704 after 30s

I wonder if other people also experience the timeout on their Linux PCs.

YMMV.

BTW, at one point, when I issued the above |mach| command without the flurry of options, I got a mysterious error from valgrind. The Thunderbird binary is large.

mmap(0x58000000, 2756608) failed in UME with error 22 (Invalid argument). valgrind: this can be caused by executables with very large text, data or bss segments.

It was noted in
https://bugs.kde.org/show_bug.cgi?id=290061
https://bugs.kde.org/show_bug.cgi?id=138424

I wonder if this potential issue of large segments has something to do with my failure
to run thunderbird under valgrind as an ordinary user under Debian GNU/Linux.
But I CAN as superuser.
It has been like this since Debian's kernel 3.x series. There was a time, with version 3.9 or 11 [I can't recall off the top of my head], when TB+valgrind worked as an ordinary user, but I have not been able to run TB+valgrind as an ordinary user for about three years now. Late last year I accidentally found that I can run it as superuser.
Running valgrind+TB as superuser against the X session of an ordinary user may be the source of a problem. BUT |make mozmill|+valgrind ran fine as superuser and displayed to the X session of an ordinary user without any issues.

I THINK there is something in mochitest that is NOT friendly to TB at all at this stage when the valgrind test kicks in. The number of threads (> 1500) is just the tip of the iceberg.

TIA

ISHIKAWA, Chiaki

Assignee

Comment 6

•

5 years ago

Attached patch Lengthening timeout value (but to no avail). — Details — Splinter Review

I changed some timeout values I could find.
The 370 that comes from runtests.py is the one that kicked in.
But I think there IS something wrong with the mochitest+TB+valgrind combination today.
370*8 did not eliminate the timeout.
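For reference, the multipliers tried in this bug work out as follows: 370*5 is the "close to 2000 seconds" from comment 2, and 370*8 matches the "timed out after 2960 seconds" message in the log in comment 5. The helper name below is made up for the illustration.

```python
BASE_TIMEOUT = 370  # seconds, the harness value discussed in this bug


def scaled_timeout(factor):
    """Timeout in seconds after multiplying the base value by factor."""
    return BASE_TIMEOUT * factor


for factor in (1, 5, 8):
    seconds = scaled_timeout(factor)
    print(f"370 x {factor} = {seconds} seconds (about {seconds / 60:.1f} minutes)")
```

So even a wait of roughly 49 minutes with no output was not enough, which points to something other than plain slowness.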

Assignee: nobody → ishikawa

Wayne Mery (:wsmwk)

Comment 7

•

4 years ago

do you need info from someone?

Flags: needinfo?(ishikawa)

ISHIKAWA, Chiaki

Assignee

Comment 8

•

4 years ago

(In reply to Wayne Mery (:wsmwk) from comment #7)

do you need info from someone?

If someone can give a clue of what is going on, I would very much appreciate it.
|make mozmill| worked flawlessly with valgrind.
I can only surmise that the mochitest framework does something very unfriendly to valgrind in the case of TB.
I am not sure how FF people manage to run FF mochitest under valgrind at all.

Flags: needinfo?(ishikawa)

ISHIKAWA, Chiaki

Assignee

Comment 9

•

3 years ago

In bug 1629433,
I tried to track down why there were so many threads running inside TB running mochitest.
While investigating, I somehow stumbled upon a timeout message coming from marionette.py.

After I increased that timeout value, some subtests of mochitest ran under valgrind.
I will summarize my findings there, and then, when all the timeout issues are clarified, I will also report the result here.

ISHIKAWA, Chiaki

Assignee

Comment 10

•

3 years ago

•

Edited

During TB's mochitest, I observe several test programs running in parallel.

Correct me if I am wrong.

```
 mach mochitest --valgrind valgrind ...
 xpcshell .../mochitest/server.js
 python3 ...mochitest
 ssltunnel .. (for whatever communication among the test modules)
*thunderbird -marionette -foreground ...
*thunderbird -contentproc -childID 1 -isForBrowser ...
*thunderbird -contentproc -childID 2 -isForBrowser ...
```

1. |mach| invokes mochitest. It is the overall control program.
2. xpcshell runs a JavaScript file called server.js. I am not sure what it is, but it must handle some commands (as a server); I am not sure from which process it receives the commands (the next process, 3?).
3. python3 runs a script called mochitest. So presumably this is the heart of mochitest, which looks for various test files and fires off these tests.
4. ssltunnel. I am not sure what it is for, but it seems to handle the encrypted communication between some modules (between which modules exactly?).
5. (*) The thunderbird binaries run with different options. The first one runs with -marionette, so it must be controlled by marionette commands issued by some module (which one?).
6. (*) The thunderbird binary runs with -contentproc -childID 1 -isForBrowser ... This seems to be for displaying the thunderbird pane, etc. (Not sure.)
7. (*) Ditto. Occasionally I see a second invocation of thunderbird for contentproc, but with a different childID, 2.

Now I want to understand the relationship between these modules and
wonder, to the extent that thunderbird is tested with xpcshell-test (without head, i.e. no display),
if we can eliminate the valgrind tracing of, say, the thunderbird binary executed with "-marionette" option.
Right now, the processes marked with "*", i.e., 5, 6, and 7 are traced under my local valgrind test.

This is to reduce the elapsed time of the valgrind run.
Actually, right now I have instructed valgrind to skip tracing of process 2 above, "xpcshell .../mochitest/server.js".
Now I am not sure if that was the right thing to do.

I wonder if someone in the know can give a quick overview of which processes/programs comprise the thunderbird mochitest framework,
which processes need to be traced for valgrind testing, and which processes do not need to be traced by valgrind.
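While waiting for an authoritative answer, the roles guessed above can at least be told apart automatically from the command lines. The following is a heuristic sketch only: the role names are my own labels, the classification rests on the flags observed in this bug (`-marionette`, `-contentproc`), and the sample command lines are the shortened forms listed above.

```python
import os


def classify(cmdline):
    """Guess the role of a process from its command line (heuristic only)."""
    args = cmdline.split()
    names = {os.path.basename(arg) for arg in args}
    # Check the thunderbird flags first: under valgrind the command line
    # starts with "valgrind ..." and the real binary comes much later.
    if "-contentproc" in args:
        return "content process"
    if "-marionette" in args:
        return "main process (marionette)"
    if "server.js" in names:
        return "xpcshell test server"
    if "ssltunnel" in names:
        return "ssltunnel"
    return "other"


for cmd in (
    "thunderbird -marionette -foreground",
    "thunderbird -contentproc -childID 1 -isForBrowser",
    "xpcshell .../mochitest/server.js",
    "ssltunnel ..",
):
    print(f"{classify(cmd):28s} {cmd}")
```

Feeding it the command column of `ps` output would show at a glance which roles are currently running under valgrind.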

I am asking Wayne first to see if he can pass this to someone in the know.
I really need to cut down on the valgrind test elapsed time.

Flags: needinfo?(vseerror)

Wayne Mery (:wsmwk)

Comment 11

•

3 years ago

(In reply to ISHIKAWA, Chiaki from comment #10)

...
I wonder if someone in the know can give a quick overview of what processes/programs comprise the thunderbird mochitest framework, and
what processes need to be traced for valgrind testing and what processes do not need to traced by valgrind.

I am asking Wayne first to see if he can pass this to someone in the know.
I really need to cut down on the valgrind test elapsed time.

Maybe Ben or Joshua. Failing that, perhaps glandium or Magnus

N.B. bug 1629433

Flags: needinfo?(vseerror)

Flags: needinfo?(benc)

Flags: needinfo?(Pidgeot18)

ISHIKAWA, Chiaki

Assignee

Comment 12

•

3 years ago

Thank you, Wayne.
I have read the following document, but it is short on the details.
http://dgt.gob.gt/PdfView/test/mozcentral/file_pdfjs_test.pdf

Google does not turn up useful blogs when I searched for mochitest. Grr...

ISHIKAWA, Chiaki

Assignee

Comment 13

•

3 years ago

•

Edited

I think possibly we can break down the valgrind mochitest execution into two overlapping parts.
(a) One mode where the marionette process (thunderbird -marionette -foreground ...) is traced under valgrind, but
the content process (thunderbird -contentproc ...) is not.
(b) The other mode where the marionette process is NOT traced and the content process is.

This will remove the valgrind slowdown from at least one process, and may let the tests execute with a reasonable timeout setting instead of the extreme timeout that may be required when both processes (marionette and contentproc) are traced.

HOW: I think we can somehow tweak the execution of one of the processes, say, the marionette process, by invoking |thunderbird-bin| instead of |thunderbird|, and then tell valgrind NOT to trace one of them.

For example, if we tell valgrind not to trace |thunderbird-bin|, then marionette process is NOT traced, but content process is traced. (Case (b) above).
Case (a) can be implemented by not tracing |thunderbird|.

Just a thought.
But the significant slow down caused by the two parallel processes both traced under valgrind at the same time may need to be handled this way.
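As a sketch of the idea, the two modes could be expressed through valgrind's `--trace-children-skip` option, which already appears in the wrapper command line in comment 2. Everything else here is an untested assumption taken from the description above: that the marionette process can be made to run as |thunderbird-bin| while the content process runs as |thunderbird|, and that the glob patterns shown select only the intended binary. The function and dictionary names are made up.

```python
# Pattern to skip in each mode, following the description above:
#   (a) trace the marionette process only -> do not trace `thunderbird`
#   (b) trace the content process only    -> do not trace `thunderbird-bin`
SKIP_PATTERNS = {
    "a": "*/thunderbird",
    "b": "*/thunderbird-bin",
}


def valgrind_args_for_mode(mode, base_skip=()):
    """Return valgrind options that trace only one of the two binaries."""
    patterns = list(base_skip) + [SKIP_PATTERNS[mode]]
    return [
        "--trace-children=yes",
        "--trace-children-skip=" + ",".join(patterns),
    ]


# Keep a pattern that is already skipped today, then add the mode-specific one.
print(valgrind_args_for_mode("b", base_skip=("*/bin/ssltunnel",)))
```

Running the test suite once per mode would then cover both processes, with only one of them slowed down by valgrind in each run.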

I wonder how |make mozmill| was executed. It was relatively easy to run it under valgrind (although it actually took me a few months to figure out the right timeout values).

ISHIKAWA, Chiaki

Assignee

Updated

•

3 years ago

Comment 14

•

2 years ago

Chiaki, is this still slowing down your development work?

Flags: needinfo?(benc) → needinfo?(ishikawa)

ISHIKAWA, Chiaki

Assignee

Comment 15

•

2 years ago

Yes, we need better valgrind support for finding memory issues.

Flags: needinfo?(ishikawa)

ISHIKAWA, Chiaki

Assignee

Comment 16

•

2 years ago

(In reply to ISHIKAWA, Chiaki from comment #15)

Yes, we need better support valgrind for memory issues.

Not exactly slowing down.
But it is a hit or miss situation.
Some tests do run under valgrind.
Some do not.

Worse, in the last couple of months, something changed and made it impossible for me to run TB under valgrind.
To be exact, it seems to get stuck during execution.
I am not sure if it is valgrind that gets stuck, but it is more likely that TB is deadlocked due to the slowdown of some operations.
I suspect something that should proceed quickly does not and blocks other threads, and this leads to a deadlock. (I don't see much CPU usage while I wait for TB to proceed.)
Tough nut to crack.
Factors that changed:
- OS version (I use Debian GNU/Linux)
- Libraries linked with TB
- TB
- valgrind itself

I take it that nobody runs TB under valgrind on tryserver. Or does someone?

ISHIKAWA, Chiaki

Assignee

Comment 17

•

2 years ago

A valgrind test gives us a certain level of assurance that memory-related problems do not exist.
I have found probably half a dozen or so memory-related issues using valgrind (one of them was deemed worth a security bounty),
and thus it certainly would be nice to be able to run TB under valgrind during mochitest.

Some observation of strace -p marionette process (TB): 5 years ago, ISHIKAWA, Chiaki, 135.01 KB, text/plain
A mochitest run under valgrind until the timeout occurs (then manual ^C killed it).: 5 years ago, ISHIKAWA, Chiaki, 114.05 KB, text/plain
Lengthening timeout value (but to no avail).: 5 years ago, ISHIKAWA, Chiaki, 3.87 KB, patch

Categories

(Thunderbird :: Build Config, defect)

Tracking

(Not tracked)

People

(Reporter: ishikawa, Assigned: ishikawa, NeedInfo)

Security

(public)
