Closed Bug 1360650 Opened 3 years ago Closed 2 years ago

Taskcluster Windows ASan builds missing symbols

Categories

(Firefox Build System :: General, defect)


Tracking

(firefox59 fixed)

RESOLVED FIXED
mozilla59

People

(Reporter: tsmith, Assigned: ting)

Attachments

(3 files, 1 obsolete file)

Attached file log.txt
I am unable to symbolize ASan backtraces that I get from builds found on Taskcluster [1].

ASAN_SYMBOLIZER_PATH is set correctly and llvm-symbolizer.exe exists. It seems that the required symbol information is not included in the build.

I ran with ASAN_OPTIONS=verbosity=2. Here is part of the output:
==1144==SetCurrentThread: 0x017190650000 for thread 0x0000000007f4
==1144==T0: stack [0x00402a200000,0x00402a800000) size 0x600000; local=0x00402a7fe46c
==1144==Using llvm-symbolizer at user-specified path: C:\Users\user\Desktop\firefox\llvm-symbolizer.exe
==1144==AddressSanitizer Init done
...
See log.txt for full log

[1] https://tools.taskcluster.net/index/artifacts/#gecko.v2.mozilla-central.latest.firefox/gecko.v2.mozilla-central.latest.firefox.win64-asan-opt
Assignee: nobody → janus926
Blocks: 1361185
No longer blocks: 1361185
It seems that we need MOZ_CRASHREPORTER defined in order to produce the artifact *.crashreporter-symbols-full.zip.
Blocks: 1359097
Verified locally that ASan still reports errors like [1] when the binary is built without "ac_add_options --disable-crashreporter".

[1] https://chromium.googlesource.com/chromium/chromium/+/master/base/tools_sanity_unittest.cc#133
Blocks: 1360120
No longer blocks: 1360120
Blocks: 1360120
Duplicate of this bug: 1383918
Address Sanitizer builds intentionally --disable-crashreporter because Breakpad interferes with ASan's own exception handling. The side effect here is that we don't produce symbols.zip files. For Linux/Mac builds this hasn't been a problem because we can just not strip the builds so the binaries contain function symbols. For Windows the symbols are in separate PDB files so that doesn't work right.

Making the ASan builds actually work with --enable-crashreporter is probably a lot of work, and really all you want out of this is the PDB files for them. Maybe we could instead just do something in the packager code to copy PDB files into the package for every binary we are packaging? If we put xul.pdb next to xul.dll in the zip file things ought to work fine.
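The idea of shipping each PDB next to its binary could be sketched roughly as follows. This is only an illustration, not the packager code: find_pdb_companions is a hypothetical helper, and it assumes the PDB sits in the same directory as the .dll or .exe and shares its base name (xul.dll and xul.pdb).

```python
import os

# Extensions of the binaries that have separate PDB files on Windows.
BINARY_EXTENSIONS = (".dll", ".exe")


def find_pdb_companions(binary_paths):
    """Return (binary, pdb) pairs for binaries whose PDB exists on disk.

    Hypothetical helper: assumes the PDB lives beside the binary and
    uses the same base name, which may not hold for every build layout.
    """
    pairs = []
    for path in binary_paths:
        base, ext = os.path.splitext(path)
        if ext.lower() not in BINARY_EXTENSIONS:
            continue  # not a binary we expect a PDB for
        pdb = base + ".pdb"
        if os.path.exists(pdb):
            pairs.append((path, pdb))
    return pairs
```

A packaging step could then add every returned PDB to the archive under the same directory as its binary, so llvm-symbolizer finds it in the first location it checks.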
Another idea is to always archive symbols instead of controlling it with MOZ_CRASHREPORTER [1]. With this approach, every time we run ASan we need to download the symbols and put them in a symbol path that dbghelp expects. Comment 4 seems easier, but it makes the ASan package different from the normal ones (symbols are in different zips).

[1] http://searchfox.org/mozilla-central/rev/8a61c71153a79cda2e1ae7d477564347c607cc5f/Makefile.in#290
I'd like to fix this in the way described in comment 5, so the same symbolication can apply to both automation and local runs.
Status: NEW → ASSIGNED
Removing the following MOZ_CRASHREPORTER ifdef will generate a symbols archive, but without any content...

  ifdef MOZ_CRASHREPORTER
  buildsymbols: symbolsfullarchive symbolsarchive
  else
  buildsymbols:
  endif
Right, because we do the actual symbol dumping during the build now, mostly controlled by this bit of rules.mk:
https://dxr.mozilla.org/mozilla-central/rev/db7f19e26e571ae1dd309f5d2f387b06ba670c30/config/rules.mk#834

However, even if you changed that you'd find that it doesn't work because it's going to try to run dump_syms on the binaries, which we don't build in the --disable-crashreporter configuration:
https://dxr.mozilla.org/mozilla-central/rev/db7f19e26e571ae1dd309f5d2f387b06ba670c30/python/mozbuild/mozbuild/action/dumpsymbols.py#30
It runs out of memory when downloading crashreporter-symbols-full.zip on TC:

06:47:54     INFO - retry: Calling fetch_url_into_memory with args: (), kwargs: {'url': 'https://queue.taskcluster.net/v1/task/J73E1BWMR3eBjn7j1o-M0Q/artifacts/public/build/target.crashreporter-symbols-full.zip'}, attempt #1
06:47:54     INFO - Fetch https://queue.taskcluster.net/v1/task/J73E1BWMR3eBjn7j1o-M0Q/artifacts/public/build/target.crashreporter-symbols-full.zip into memory
06:48:01     INFO - Running post-action listener: _resource_record_post_action
06:48:01     INFO - Running post-action listener: find_tests_for_verification
06:48:01     INFO - Running post-action listener: set_extra_try_arguments
06:48:01     INFO - [mozharness: 2017-09-04 06:48:01.599000Z] Finished download-and-extract step (failed)
06:48:01    FATAL - Uncaught exception: Traceback (most recent call last):
06:48:01    FATAL -   File "Z:\task_1504504617\mozharness\mozharness\base\script.py", line 2058, in run
06:48:01    FATAL -     self.run_action(action)
06:48:01    FATAL -   File "Z:\task_1504504617\mozharness\mozharness\base\script.py", line 1997, in run_action
06:48:01    FATAL -     self._possibly_run_method(method_name, error_if_missing=True)
06:48:01    FATAL -   File "Z:\task_1504504617\mozharness\mozharness\base\script.py", line 1937, in _possibly_run_method
06:48:01    FATAL -     return getattr(self, method_name)()
06:48:01    FATAL -   File "mozharness\scripts\desktop_unittest.py", line 576, in download_and_extract
06:48:01    FATAL -     suite_categories=target_categories)
06:48:01    FATAL -   File "Z:\task_1504504617\mozharness\mozharness\mozilla\testing\testbase.py", line 596, in download_and_extract
06:48:01    FATAL -     self._download_and_extract_symbols()
06:48:01    FATAL -   File "Z:\task_1504504617\mozharness\mozharness\mozilla\testing\testbase.py", line 557, in _download_and_extract_symbols
06:48:01    FATAL -     self.download_unpack(self.symbols_url, self.symbols_path)
06:48:01    FATAL -   File "Z:\task_1504504617\mozharness\mozharness\base\script.py", line 705, in download_unpack
06:48:01    FATAL -     **retry_args
06:48:01    FATAL -   File "Z:\task_1504504617\mozharness\mozharness\base\script.py", line 1133, in retry
06:48:01    FATAL -     status = action(*args, **kwargs)
06:48:01    FATAL -   File "Z:\task_1504504617\mozharness\mozharness\base\script.py", line 401, in fetch_url_into_memory
06:48:01    FATAL -     response_body = response.read()
06:48:01    FATAL -   File "c:\mozilla-build\python\lib\socket.py", line 362, in read
06:48:01    FATAL -     buf.write(data)
06:48:01    FATAL - MemoryError: out of memory
06:48:01    FATAL - Running post_fatal callback...
06:48:01    FATAL - Exiting -1
I'd like to use requests to stream the archive download and avoid the OOM in comment 9, but I'm not sure how to install the package for the Python in mozilla-build, or whether that is a good idea. What do you think?
Flags: needinfo?(aki)
Or maybe download_unpack should fall back to download_file when fetch_url_into_memory runs out of memory.
(In reply to Ting-Yu Chou [:ting] from comment #11)
> Or maybe download_unpack should fallback to download_file when
> fetch_url_into_memory OOM.

I think download_file rather than fetch_url_into_memory is probably a good idea in general.
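To illustrate why writing to a file avoids the MemoryError, here is a minimal sketch of a chunked download. It is not the mozharness download_file implementation: copy_stream and download_to_file are hypothetical names, and the sketch uses the Python 3 standard library, while the mozharness in the log above ran on Python 2.

```python
import urllib.request

# Read in 1 MiB pieces so memory use stays bounded no matter how
# large the symbols archive is.
CHUNK_SIZE = 1024 * 1024


def copy_stream(src, dst, chunk_size=CHUNK_SIZE):
    """Copy a file-like object to another in chunks; return bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break  # end of stream
        dst.write(chunk)
        total += len(chunk)
    return total


def download_to_file(url, dest_path):
    """Stream a URL to disk instead of holding the whole body in memory."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        return copy_stream(response, out)
```

Only one chunk is held in memory at a time, so the peak footprint is roughly CHUNK_SIZE rather than the size of the zip.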
Flags: needinfo?(aki)
In this Try run [1], crashreporter-symbols-full.zip was generated, downloaded and extracted; _NT_SYMBOL_PATH was also set. But ASan still reported the error without symbols. I'll see if I can reproduce it locally.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=24d0814e98d08d4f58720c71e57e728e96811cf0&filter-tier=1&filter-tier=2&filter-tier=3
(In reply to Ting-Yu Chou [:ting] from comment #13)
> In this Try run [1], crashreporter-symbols-full.zip was generated,
> downloaded and extracted; also _NT_SYMBOL_PATH was set. But still ASan
> reported error without symbols. I'll see if I can reproduce it locally.

It's because 1) LLVM_ENABLE_DIA_SDK is not enabled, and 2) the *.pd_ files in crashreporter-symbols-full.zip are compressed files.
llvm-symbolizer.exe seems to search for the PDB file in the same directories as dumpbin. When I have _NT_SYMBOL_PATH=cache*w:\fx\sandbox\bug-1360650\build\symbols, dumpbin searches the following folders but not the symbol cache:

  C:\Users\ting>"c:\Program Files (x86)\Microsoft Visual Studio 14.0\vc\bin\amd64\dumpbin.exe" /PDBPATH:verbose w:\fx\sandbox\bug-1360650\build\application\firefox\xul.dll
  Microsoft (R) COFF/PE Dumper Version 14.00.24215.1
  Copyright (C) Microsoft Corporation.  All rights reserved.


  Dump of file w:\fx\sandbox\bug-1360650\build\application\firefox\xul.dll

  File Type: DLL
    PDB file 'w:\fx\sandbox\bug-1360650\build\application\firefox\xul.pdb' checked.  (File not found)
    PDB file 'z:\build\build\src\obj-firefox\toolkit\library\xul.pdb' checked.  (File not found)
    PDB file 'C:\WINDOWS\symbols\dll\xul.pdb' checked.  (File not found)
    PDB file 'C:\WINDOWS\dll\xul.pdb' checked.  (File not found)
    PDB file 'C:\WINDOWS\xul.pdb' checked.  (File not found)

    Summary

        BFC000 .data
          1000 .gfids
          1000 .orpc
        1D5000 .pdata
       2A68000 .rdata
        25A000 .reloc
        A76000 .rodata
          3000 .rsrc
       F58B000 .text
          1000 .tls

But symchk.exe searches the symbol cache and is able to find the PDB:

  C:\Users\ting>"c:\Program Files (x86)\Windows Kits\10\Debuggers\x64\symchk.exe" w:\fx\sandbox\bug-1360650\build\application\firefox\xul.dll /v
  [SYMCHK] Searching for symbols to w:\fx\sandbox\bug-1360650\build\application\firefox\xul.dll in path cache*w:\fx\sandbox\bug-1360650\build\symbols
  DBGHELP: Symbol Search Path: cache*w:\fx\sandbox\bug-1360650\build\symbols
  [SYMCHK] Using search path "cache*w:\fx\sandbox\bug-1360650\build\symbols"
  DBGHELP: No header for w:\fx\sandbox\bug-1360650\build\application\firefox\xul.dll.  Searching for image on disk
  DBGHELP: w:\fx\sandbox\bug-1360650\build\application\firefox\xul.dll - OK
  DBGHELP: w:\fx\sandbox\bug-1360650\build\application\firefox\xul.dll cached to w:\fx\sandbox\bug-1360650\build\symbols\xul.dll\59AF76D013a9b000\xul.dll
  SYMSRV:  BYINDEX: 0x1
           w:\fx\sandbox\bug-1360650\build\symbols
           xul.pdb
           2E471BE5E2854139BEDED792C023F8B61
  SYMSRV:  PATH: w:\fx\sandbox\bug-1360650\build\symbols\xul.pdb\2E471BE5E2854139BEDED792C023F8B61\xul.pdb
  SYMSRV:  RESULT: 0x00000000
  DBGHELP: xul - private symbols & lines
          w:\fx\sandbox\bug-1360650\build\symbols\xul.pdb\2E471BE5E2854139BEDED792C023F8B61\xul.pdb
  [SYMCHK] MODULE64 Info ----------------------
  [SYMCHK] Struct size: 1680 bytes
  [SYMCHK] Base: 0x0000000180000000
  [SYMCHK] Image size: 329887744 bytes
  [SYMCHK] Date: 0x59af76d0
  [SYMCHK] Checksum: 0x139274a0
  [SYMCHK] NumSyms: 0
  [SYMCHK] SymType: SymPDB
  [SYMCHK] ModName: xul
  [SYMCHK] ImageName: w:\fx\sandbox\bug-1360650\build\application\firefox\xul.dll
  [SYMCHK] LoadedImage: w:\fx\sandbox\bug-1360650\build\symbols\xul.dll\59AF76D013a9b000\xul.dll
  [SYMCHK] PDB: "w:\fx\sandbox\bug-1360650\build\symbols\xul.pdb\2E471BE5E2854139BEDED792C023F8B61\xul.pdb"
  [SYMCHK] CV: RSDS
  [SYMCHK] CV DWORD: 0x53445352
  [SYMCHK] CV Data:  z:\build\build\src\obj-firefox\toolkit\library\xul.pdb
  [SYMCHK] PDB Sig:  0
  [SYMCHK] PDB7 Sig: {2E471BE5-E285-4139-BEDE-D792C023F8B6}
  [SYMCHK] Age: 1
  [SYMCHK] PDB Matched:  TRUE
  [SYMCHK] DBG Matched:  TRUE
  [SYMCHK] Line nubmers: TRUE
  [SYMCHK] Global syms:  TRUE
  [SYMCHK] Type Info:    TRUE
  [SYMCHK] ------------------------------------
  SymbolCheckVersion  0x00000002
  Result              0x001f0001
  DbgFilename
  DbgTimeDateStamp    0x59af76d0
  DbgSizeOfImage      0x13a9b000
  DbgChecksum         0x139274a0
  PdbFilename         w:\fx\sandbox\bug-1360650\build\symbols\xul.pdb\2E471BE5E2854139BEDED792C023F8B61\xul.pdb
  PdbSignature        {2E471BE5-E285-4139-BEDE-D792C023F8B6}
  PdbDbiAge           0x00000001
  [SYMCHK] [ 0x00000000 - 0x001f0001 ] Checked "w:\fx\sandbox\bug-1360650\build\application\firefox\xul.dll"

  SYMCHK: FAILED files = 0
  SYMCHK: PASSED + IGNORED files = 1
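The cache path that symchk resolved can be reproduced from the PDB7 signature and age in the output above. The sketch below is inferred from that output only (GUID without braces or dashes, followed by the age in hex); symstore_relative_path is a hypothetical helper, not part of any tool mentioned here.

```python
import ntpath
import uuid


def symstore_relative_path(pdb_name, guid, age):
    """Build the <pdb>\\<GUID+age>\\<pdb> path used inside a symbol cache.

    Hypothetical helper: the layout is inferred from the SYMSRV PATH
    line in the symchk output, not from a specification.
    """
    # uuid.UUID accepts the braces and dashes; .hex strips them.
    index = uuid.UUID(guid).hex.upper() + format(age, "X")
    return ntpath.join(pdb_name, index, pdb_name)


print(symstore_relative_path(
    "xul.pdb", "{2E471BE5-E285-4139-BEDE-D792C023F8B6}", 1))
# prints xul.pdb\2E471BE5E2854139BEDED792C023F8B61\xul.pdb
```

dumpbin's search list contains none of these nested directories, which is consistent with it reporting "File not found" while symchk succeeds.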
Setting DBGHELP_DBGOUT=1 before running llvm-symbolizer doesn't produce any dbghelp logging.
So llvm-symbolizer.exe uses DIA and loadDataForExe does not search the symbol cache directory if I set

  _NT_SYMBOL_PATH=cache*W:\fx\sandbox\bug-1360650\build\symbols

but it finds the PDB if I set

  _NT_SYMBOL_PATH=W:\fx\sandbox\bug-1360650\build\symbols

This is a bit of a hassle. I'll try Ted's suggestion in comment 4.
Attached patch wip v1 (obsolete) — Splinter Review
With this WIP, the PDB files are archived together with the .dll and .exe in target.zip. However, the win64-asan build now fails when running build check.
Comment on attachment 8909706 [details]
Bug 1360650 part 2 - Export VSINSTALLDIR so LLVM_ENABLE_DIA_SDK will be set.

https://reviewboard.mozilla.org/r/180858/#review186554
Attachment #8909706 - Flags: review?(ehsan) → review+
Attachment #8909170 - Attachment is obsolete: true
Attachment #8909705 - Flags: review?(ted) → review?(core-build-config-reviews)
Component: General → Build Config
Comment on attachment 8909705 [details]
Bug 1360650 part 1 - Archive pdb files of the binaries for llvm-symbolizer.

https://reviewboard.mozilla.org/r/180856/#review202548

::: config/rules.mk:824
(Diff revision 1)
>  endif
>  
>  define syms_template
>  syms:: $(2)
>  $(2): $(1)
> +ifdef MOZ_CRASHREPORTER

Do you need these changes now given that you're just putting the PDB files in the package?

If not, just remove this part of the patch.

::: toolkit/mozapps/installer/packager.py:318
(Diff revision 1)
>                      copier.add(libbase + '.chk',
>                                 LibSignFile(os.path.join(args.destination,
>                                                          libname)))
>  
> +    # Include pdb files for llvm-symbolizer to resolve symbols.
> +    if buildconfig.substs.get('LLVM_SYMBOLIZER') and mozinfo.isWin:

I wish we had a more general "some sanitizer is enabled" subst, but this is probably the closest thing we have.
Attachment #8909705 - Flags: review+
Attachment #8909705 - Flags: review?(core-build-config-reviews)
This failed the debug build on try with:

17:29:54     INFO -  z:/build/build/src/obj-firefox/_virtualenv/Scripts/python.exe -m mozbuild.action.test_archive  gtest 'z:/build/build/src/obj-firefox/dist//target.gtest.tests.zip'
17:29:54     INFO -  Traceback (most recent call last):
17:29:54     INFO -    File "c:\mozilla-build\python\Lib\runpy.py", line 162, in _run_module_as_main
17:29:54     INFO -      "__main__", fname, loader, pkg_name)
17:29:54     INFO -    File "c:\mozilla-build\python\Lib\runpy.py", line 72, in _run_code
17:29:54     INFO -      exec code in run_globals
17:29:54     INFO -    File "z:\build\build\src\python\mozbuild\mozbuild\action\test_archive.py", line 654, in <module>
17:29:54     INFO -      sys.exit(main(sys.argv[1:]))
17:29:54     INFO -    File "z:\build\build\src\python\mozbuild\mozbuild\action\test_archive.py", line 641, in main
17:29:54     INFO -      skip_duplicates=True)
17:29:54     INFO -    File "z:\build\build\src\python\mozbuild\mozpack\mozjar.py", line 626, in add
17:29:54     INFO -      deflater.write(data)
17:29:54     INFO -    File "z:\build\build\src\python\mozbuild\mozpack\mozjar.py", line 710, in write
17:29:54     INFO -      self._data.write(data)
17:29:54     INFO -  MemoryError
17:29:54     INFO -  z:/build/build/src/testing/testsuite-targets.mk:158: recipe for target 'package-tests-gtest' failed
17:29:54     INFO -  mozmake.EXE[3]: *** [package-tests-gtest] Error 1
Also this doesn't appear to include the .pdb for firefox.exe (which is currently needed to investigate bug 1361185)
Blocks: 1361185
Pasting this just so it doesn't get lost -- here are some rebased versions of Ting's patches:
https://hg.mozilla.org/try/rev/68ddc3364df488194f789f89d2c30858ae5c9fe1
https://hg.mozilla.org/try/rev/be9f44dc1a7f8790a61412b657d738f131763c5d
The problems in comment 23 seem to have gone away: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e5515e5beffe4d814877341f339087937f69bd38&filter-tier=1&filter-tier=2&filter-tier=3

I'll try to land these on behalf of Ting next week.
Flags: needinfo?(dmajor)
Flags: needinfo?(dmajor)
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8e4a0ea53ed3
part 1 - Archive pdb files of the binaries for llvm-symbolizer. r=ted
https://hg.mozilla.org/integration/mozilla-inbound/rev/f37d962e0381
part 2 - Export VSINSTALLDIR so LLVM_ENABLE_DIA_SDK will be set. r=ehsan
https://hg.mozilla.org/mozilla-central/rev/8e4a0ea53ed3
https://hg.mozilla.org/mozilla-central/rev/f37d962e0381
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
dmajor, thanks for pushing this to the end. :)
Comment on attachment 8909705 [details]
Bug 1360650 part 1 - Archive pdb files of the binaries for llvm-symbolizer.

https://reviewboard.mozilla.org/r/180856/#review218448

::: config/rules.mk:824
(Diff revision 1)
>  endif
>  
>  define syms_template
>  syms:: $(2)
>  $(2): $(1)
> +ifdef MOZ_CRASHREPORTER

I did so because we only need to dump symbols when crashreporter is enabled.
(In reply to dmajor (offline) from comment #24)
> Also this doesn't appear to include the .pdb for firefox.exe (which is
> currently needed to investigate bug 1361185)

I saw firefox.pdb in the target.zip from bug 1361185 comment 13, is this still an issue?
> I saw firefox.pdb in the target.zip from bug 1361185 comment 13, is this still an issue?

The file has aged out so I can't tell, but if it works, great!
Product: Core → Firefox Build System
See Also: → 1457523
> > I saw firefox.pdb in the target.zip from bug 1361185 comment 13, is this still an issue?
> 
> The file has aged out so I can't tell, but if it works, great!

It turns out that firefox.pdb is indeed still missing. Filed bug 1460340. (fixed typo)