Leak stacks don't work on Taskcluster test runs

RESOLVED FIXED in mozilla48

Status

defect
RESOLVED FIXED
3 years ago
5 months ago

People

(Reporter: RyanVM, Assigned: armenzg)

Tracking

unspecified
mozilla48
Dependency tree / graph

Details

Attachments

(1 attachment)

Reporter

Description

3 years ago
https://treeherder.mozilla.org/logviewer.html#?job_id=17319342&repo=try

 20:20:43     INFO -  allocation stack:

 20:20:43     INFO -  #00: ??? (/home/worker/workspace/build/application/firefox/libxul.so)

 20:20:43     INFO -  #01: ??? (/home/worker/workspace/build/application/firefox/libxul.so)

 20:20:43     INFO -  #02: ??? (/home/worker/workspace/build/application/firefox/libxul.so)

 20:20:43     INFO -  #03: ??? (/home/worker/workspace/build/application/firefox/libxul.so)

 20:20:43     INFO -  #04: ??? (/home/worker/workspace/build/application/firefox/libxul.so)

 20:20:43     INFO -  #05: ??? (/home/worker/workspace/build/application/firefox/libxul.so)

 20:20:43     INFO -  #06: ??? (/home/worker/workspace/build/application/firefox/libxul.so)

 20:20:43     INFO -  #07: ??? (/home/worker/workspace/build/application/firefox/libxul.so)

 20:20:43     INFO -  #08: ??? (/home/worker/workspace/build/application/firefox/libxul.so)

 20:20:43     INFO -  #09: ??? (/home/worker/workspace/build/application/firefox/libxul.so)

 20:20:43     INFO -  #10: start_thread (/build/eglibc-rrybNj/eglibc-2.15/nptl/pthread_create.c:308)

 20:20:43     INFO -  #11: clone (/build/eglibc-rrybNj/eglibc-2.15/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:114)

20:20:43 INFO - 14 @0x7f2e0bc67670 (0 references; 0 from COMPtrs)
Reporter

Comment 1

3 years ago
To be clear, these stacks are typically generated when adding |"--setenv=XPCOM_MEM_LOG_CLASSES=<the leaking class you want a stack for>"| to the appropriate mozharness test config in the tree.

An example of this in action - https://hg.mozilla.org/try/rev/4689074c7b22 - which gives a log like https://treeherder.mozilla.org/logviewer.html#?job_id=17324551&repo=try#L74993 when working correctly.
Reporter

Updated

3 years ago
Blocks: 1243024
No longer blocks: tc-linux-debug-tier1
Assignee

Comment 2

3 years ago
Where can I learn about how these stacks work? or make them work?
Flags: needinfo?(ted)
I don't think the leak log bit is super relevant (despite the way the bug is filed). I think the bug is that the stack hasn't been symbolized, which is supposed to happen inside the Mochitest harness. It should be using the same code we use to get symbols for assertion stacks:
https://dxr.mozilla.org/mozilla-central/rev/5e0140b6d11821e0c2a2de25bc5431783f03380a/testing/mochitest/runtests.py#2517

The actual implementation of that is in mozrunner:
https://dxr.mozilla.org/mozilla-central/rev/5e0140b6d11821e0c2a2de25bc5431783f03380a/testing/mozbase/mozrunner/mozrunner/utils.py#246
Flags: needinfo?(ted)
This should be simpler to test by making a test cause an assertion. You can do that by calling a method on the nsIDebug interface, like:
https://dxr.mozilla.org/mozilla-central/rev/5e0140b6d11821e0c2a2de25bc5431783f03380a/testing/xpcshell/selftest.py#539
Moving to TaskCluster component...
Component: TaskCluster → Integration
Product: Testing → Taskcluster
Version: Version 3 → unspecified
:armenzg, this one issue seemed to have fallen through the cracks, can you pick this up and help get this solved?
Flags: needinfo?(armenzg)
Assignee

Comment 7

3 years ago
Sure!
Assignee: nobody → armenzg
Flags: needinfo?(armenzg)
Assignee

Comment 8

3 years ago
Let me try to rephrase this, in this push [1] we can see logs of jobs that should have a meaningful allocation stacks but don't [2]
In one type of jobs we download the symbols on deman while on the Buildbot ones we don't.
I see that get_stack_fixer() takes a symbolsPath. This could be related [3]

Should we move away from using ondemand for downloading symbols?
stackFixer() is called in the __init__ of the OutputHandler.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=96b972ba996b

[2]
https://public-artifacts.taskcluster.net/HYqHzLTjR3yOkSR4NVWqkA/0/public/logs/live_backing.log
19:20:51     INFO -  allocation stack:
19:20:52     INFO -  #00: ??? (/home/worker/workspace/build/application/firefox/libxul.so)
19:20:52     INFO -  #01: ??? (/home/worker/workspace/build/application/firefox/libxul.so)

[3]
https://dxr.mozilla.org/mozilla-central/rev/5e0140b6d11821e0c2a2de25bc5431783f03380a/testing/mozbase/mozrunner/mozrunner/utils.py#266
I believe that buildbot always downloads symbols for debug tests ahead of time to handle cases like this.
(In reply to Chris AtLee [:catlee] from comment #9)
> I believe that buildbot always downloads symbols for debug tests ahead of
> time to handle cases like this.

Yes, this is correct. For non-debug builds having the symbol path be a URL works great because mozcrash will download the symbols on-demand to produce a stack for a crash. For debug builds, we will almost always hit an assertion which will print a stack like this and needs symbols to make it useful, so we should download them before running tests, unpack them, and set the symbol path to the local directory.
No longer blocks: 1243024
Assignee

Comment 11

3 years ago
Do we need the change for every debug job? or only for mochitests?
Every debug test job.
Assignee

Comment 14

3 years ago
Comment on attachment 8741378 [details]
MozReview Request: Bug 1251893 - Change TC Linux64 debug jobs to always download symbols. r=ted

Not ready for review. Sorry.
Attachment #8741378 - Flags: review?(ted)
Assignee

Comment 15

3 years ago
Comment on attachment 8741378 [details]
MozReview Request: Bug 1251893 - Change TC Linux64 debug jobs to always download symbols. r=ted

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/46439/diff/1-2/
Attachment #8741378 - Flags: review?(ted)
Assignee

Comment 16

3 years ago
hrmm my next review push raised the flag back.

Anyways, here's the push that shows that nothing breaks [1]
Here's a command showing --download-symbols=true instead of ondemand [2]

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=637e5197199933ea5ad2f931fcf13ba4e54199d3&filter-searchStr=tc&group_state=expanded&selectedJob=19497053
[2] /home/worker/workspace/mozharness/scripts/desktop_unittest.py --config-file mozharness/configs/unittests/linux_unittest.py --config-file mozharness/configs/remove_executables.py --no-read-buildbot-config --installer-url=https://queue.taskcluster.net/v1/task/M_n8wx4wR9aaBC_mEwj-Cw/artifacts/public/build/target.tar.bz2 --test-packages-url=https://queue.taskcluster.net/v1/task/M_n8wx4wR9aaBC_mEwj-Cw/artifacts/public/build/target.test_packages.json --download-symbols=true --gtest-suite=gtest
Comment on attachment 8741378 [details]
MozReview Request: Bug 1251893 - Change TC Linux64 debug jobs to always download symbols. r=ted

https://reviewboard.mozilla.org/r/46439/#review43875

I don't fully understand the current state of the in-tree task definitions, but it looks like these same tasks would be used for opt tests, is that right? If so then changing these to always download symbols isn't what we want. We want opt tests to continue to download ondemand.
Assignee

Comment 18

3 years ago
At the moment there are no opt tests.
gbrown is likely to tackle this and we will need to split the yml files into opt/debug.
We can have opt jobs at that point use ondemand.

Can we get this landed and ask gbrown to make sure to use ondemand for opt jobs?
Assignee

Comment 19

3 years ago
wrt to comment 18
Flags: needinfo?(ted)
(In reply to Armen Zambrano [:armenzg] - Engineering productivity from comment #18)
> At the moment there are no opt tests.
> gbrown is likely to tackle this and we will need to split the yml files into
> opt/debug.
> We can have opt jobs at that point use ondemand.
> 
> Can we get this landed and ask gbrown to make sure to use ondemand for opt
> jobs?

OK. I'm not wild about leaving footguns in the tree for the next person to deal with, but if this is something we've already scoped, maybe you can just put a comment in the task definitions to note that those should be `ondemand` for opt jobs?
Flags: needinfo?(ted)
Comment on attachment 8741378 [details]
MozReview Request: Bug 1251893 - Change TC Linux64 debug jobs to always download symbols. r=ted

https://reviewboard.mozilla.org/r/46439/#review44265

::: testing/taskcluster/tasks/tests/fx_desktop_unittest.yml:12
(Diff revision 2)
>        - bash
>        - /home/worker/bin/test.sh
>        - --no-read-buildbot-config
>        - --installer-url={{build_url}}
>        - --test-packages-url={{test_packages_url}}
> -      - --download-symbols=ondemand
> +      - --download-symbols=true

As I said, let's put a comment here to let gbrown (or whoever winds up doing the work) know that opt tests need to use `ondemand`, so that we don't wind up with the reverse of this problem in the near future.
Assignee

Comment 22

3 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #21)
> > -      - --download-symbols=ondemand
> > +      - --download-symbols=true
> 
> As I said, let's put a comment here to let gbrown (or whoever winds up doing
> the work) know that opt tests need to use `ondemand`, so that we don't wind
> up with the reverse of this problem in the near future.

This is scoped in bug 1265773 by gbrown.
I will land since it is already going to be worked on.
Assignee

Comment 23

3 years ago
I will add comments and land it tomorrow.
There are a few debug test jobs that still use ondemand symbols after this change:

  fx_linux64_firefox_ui_functional.yml
  fx_linux64_firefox_ui_functional_e10s.yml

and some that don't specify:

  fx_linux64_luciddream.yml
  fx_linux64_mochitest_vg.yml

Is that intentional?
Assignee

Comment 25

3 years ago
(In reply to Geoff Brown [:gbrown] from comment #24)
> There are a few debug test jobs that still use ondemand symbols after this
> change:
> 
>   fx_linux64_firefox_ui_functional.yml
>   fx_linux64_firefox_ui_functional_e10s.yml
> 
These are whimboo's jobs and these don't run on developer's pushes but on nightly jobs.
They're not debug jobs.

> and some that don't specify:
> 
>   fx_linux64_luciddream.yml
>   fx_linux64_mochitest_vg.yml
> 
> Is that intentional?

Luciddream (not running) and valgrind are only for opt builds.
Assignee

Comment 27

3 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/cc6534601ad80c04c260493112e9a8d621c304ad
Bug 1251893 - Change TC Linux64 debug jobs to always download symbols. r=ted

Comment 28

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/cc6534601ad8
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Component: Integration → Services
You need to log in before you can comment on or make changes to this bug.