Closed Bug 1805915 Opened 2 years ago Closed 2 years ago

profiles of local builds linked with --enable-linker=mold don't get symbolicated

Categories

(Core :: Gecko Profiler, defect, P1)

RESOLVED FIXED
110 Branch

People

(Reporter: emilio, Assigned: mstange)

Attachments

(1 file)

https://crisal.io/tmp/mold-opt-libxul.so is the libxul.so for this.

I think the bug might be in shared-libraries-linux.cc.

The addresses shown in the profiler for libxul.so frames are too big. For example 0x7fa3c7d97987 and 0x7fa3c7dc73be don't fit into 32 bits. These addresses are supposed to be "relative" addresses, i.e. they should be relative to the library's load address / "base address".

Emilio, can you run the following in the Firefox-linked-with-mold build?

  • On the browser console: Services.profiler.sharedLibraries.find(l => l.name == 'libxul.so')
  • In the terminal: cat /proc/<firefox-parent-process-pid>/maps

and then paste both here? sharedLibrary.start should be equal to the address of the first libxul.so mapping.

Flags: needinfo?(emilio)

(In reply to Markus Stange [:mstange] from comment #2)

  • On the browser console: Services.profiler.sharedLibraries.find(l => l.name == 'libxul.so')
{
  "start": 140689736204288,
  "end": 140689932096320,
  "offset": 0,
  "name": "libxul.so",
  "path": "/home/emilio/src/moz/gecko/obj-opt/dist/bin/libxul.so",
  "debugName": "libxul.so",
  "debugPath": "/home/emilio/src/moz/gecko/obj-opt/dist/bin/libxul.so",
  "breakpadId": "7A1C91BBA8ACD92CB580C1077E2631EF0",
  "arch": ""
}
  • In the terminal: cat /proc/<firefox-parent-process-pid>/maps

Will attach because not doing it makes me go over the comment character limit :)

sharedLibrary.start should be equal to the address of the first libxul.so mapping

So... 0x7ff4e1c00000 == 140689736204288, which means the start is right?

I took a profile on this same run just in case it helps: https://share.firefox.dev/3WorOyF

The addresses from libxul shown in the profiler are from before the first libxul.so mapping, so they seem off somehow? If you tell me where to look, I'm happy to try to debug this.

Flags: needinfo?(emilio) → needinfo?(mstange.moz)
Attached file Memory maps

Can you attach a profile that you capture with await Services.profiler.dumpProfileToFileAsync("/home/emilio/Desktop/raw-profile.json")? (Start the profiler the usual way first.)

Flags: needinfo?(mstange.moz) → needinfo?(emilio)

Ah, the end of the library seems off though: 140689932096320 is 0x7ff4ed6d1340, while the last libxul mapping is 7ff4ed59a000-7ff4ed5e8000, which ends before that. Not sure if that is a problem?
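For anyone repeating this comparison, here is a rough sketch of how the start and end values could be checked against /proc/<pid>/maps. The sample maps text is a made-up excerpt: only the first mapping's start address and the last mapping's range come from this bug; the permissions, offsets, device and inode fields, the end of the first mapping, the third line and the paths are placeholders. The helper name library_mappings is hypothetical.

```python
def library_mappings(maps_text, library_name):
    """Return (start, end) address pairs for every mapping of library_name."""
    ranges = []
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        if not fields[5].strip().endswith("/" + library_name):
            continue
        start_hex, end_hex = fields[0].split("-")
        ranges.append((int(start_hex, 16), int(end_hex, 16)))
    return ranges


# Hypothetical excerpt; see the note above about which values are real.
SAMPLE_MAPS = """\
7ff4e1c00000-7ff4e1c01000 r--p 00000000 00:00 0    /path/to/libxul.so
7ff4ed59a000-7ff4ed5e8000 rw-p 00000000 00:00 0    /path/to/libxul.so
7ff4ee000000-7ff4ee001000 r--p 00000000 00:00 0    /path/to/libother.so
"""

# Values reported by Services.profiler.sharedLibraries in this bug.
reported_start = 140689736204288
reported_end = 140689932096320

mappings = library_mappings(SAMPLE_MAPS, "libxul.so")
first_start = mappings[0][0]
last_end = mappings[-1][1]

print("start:", hex(reported_start), "matches first mapping:", reported_start == first_start)
print("end:", hex(reported_end), "bytes past last mapping:", reported_end - last_end)
```

This only compares numbers; it does not say whether an end value past the last mapping is actually a problem.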

Flags: needinfo?(emilio)

False alarm about the mappings - the ranges look fine. I forgot that unsymbolicated functions in the profiler still display the absolute addresses, not the relative addresses. Their internal relative address values look reasonable. For example, one of the hottest addresses is 0xadc73be, which symbolicates fine:

query_api % cargo run -- ~/Downloads/ /symbolicate/v5 '{"jobs":[{"memoryMap":[["mold-opt-libxul.so","36D208789F8DF932F8E647CB2BE1FAE60"]],"stacks":[[[0,182219710]]]}]}' | jq  
{
  "results": [
    {
      "stacks": [
        [
          {
            "frame": 0,
            "module_offset": "0xadc73be",
            "module": "mold-opt-libxul.so",
            "function": "style::dom_apis::query_selector",
            "function_offset": "0xebe",
            "function_size": "0x171a",
            "file": "/home/emilio/src/moz/gecko/servo/components/style/dom_apis.rs",
            "line": 642,
            "inlines": [
              {
                "function": "<style::gecko::wrapper::GeckoNode as style::dom::TNode>::as_element",
                "file": "/home/emilio/src/moz/gecko/servo/components/style/gecko/wrapper.rs",
                "line": 479
              },
              {
                "function": "style::dom_apis::collect_all_elements",
                "file": "/home/emilio/src/moz/gecko/servo/components/style/dom_apis.rs",
                "line": 230
              },
              {
                "function": "style::dom_apis::query_selector_single_query",
                "file": "/home/emilio/src/moz/gecko/servo/components/style/dom_apis.rs",
                "line": 397
              },
              {
                "function": "style::dom_apis::query_selector_fast",
                "file": "/home/emilio/src/moz/gecko/servo/components/style/dom_apis.rs",
                "line": 448
              }
            ]
          }
        ]
      ],
      "found_modules": {
        "mold-opt-libxul.so/36D208789F8DF932F8E647CB2BE1FAE60": true
      }
    }
  ]
}
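As an aside, the request body passed on the command line above can be built programmatically. This is a minimal sketch: the helper name symbolication_request is hypothetical, and the shape of the body is simply copied from the request shown above (one job, one module in memoryMap, one stack whose frames are [module index, relative address] pairs).

```python
import json


def symbolication_request(debug_name, breakpad_id, relative_addresses):
    """Build a one-job request body for a single module and a single stack."""
    # Each frame is [index into memoryMap, address relative to the module base].
    frames = [[0, address] for address in relative_addresses]
    job = {"memoryMap": [[debug_name, breakpad_id]], "stacks": [frames]}
    return json.dumps({"jobs": [job]}, separators=(",", ":"))


body = symbolication_request(
    "mold-opt-libxul.so",
    "36D208789F8DF932F8E647CB2BE1FAE60",
    [0xADC73BE],  # the hot address from this bug; 182219710 in decimal
)
print(body)
```

The address goes into the request as a plain decimal number, which is why 0xadc73be shows up as 182219710 in the command.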

The actual problem is a debug ID mismatch. The profile contains the value "B95396F750B7D947AAF1DC62F1771AF90", which was computed here, whereas the symbolication code computes a debug ID of "36D208789F8DF932F8E647CB2BE1FAE60".

One of the reasons for that mismatch is the fact that the library doesn't contain an ELF build ID: llvm-readelf --notes mold-opt-libxul.so doesn't show any output.

I'm a bit surprised by this. The mold documentation specifically describes how to speed up build ID computation, in the Details section.
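For context, this is roughly what "contains an ELF build ID" means at the byte level. The sketch below scans the raw contents of a note section for a GNU build ID note. It assumes a little-endian file and 4-byte note alignment, and the function name find_gnu_build_id is made up for illustration. It is not the code used by the profiler or by llvm-readelf.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used for GNU build IDs


def find_gnu_build_id(note_bytes):
    """Return the build ID as a hex string, or None if no such note exists."""
    offset = 0
    while offset + 12 <= len(note_bytes):
        # Note header: name size, descriptor size, type (three 32-bit words).
        namesz, descsz, note_type = struct.unpack_from("<III", note_bytes, offset)
        offset += 12
        name = note_bytes[offset:offset + namesz]
        offset += (namesz + 3) & ~3  # name is padded to a 4-byte boundary
        desc = note_bytes[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # descriptor is padded as well
        if note_type == NT_GNU_BUILD_ID and name == b"GNU\0":
            return desc.hex()
    return None


# Synthetic note with a placeholder 20-byte ID (not from any real binary).
fake_id = bytes(range(20))
note = struct.pack("<III", 4, len(fake_id), NT_GNU_BUILD_ID) + b"GNU\0" + fake_id

print(find_gnu_build_id(note))
print(find_gnu_build_id(b""))
```

A library without such a note, like mold-opt-libxul.so here, would take the None path, which is where the fallback ID computation comes in.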

Anyway, we have fallback code to compute a debug ID even when no ELF build ID is present, by hashing the first 4096 bytes of the .text section. But the two implementations of this fallback behave differently: the one in mozilla-central specifically looks for the .text section, whereas the one in the profiler symbolication code takes the first section of "kind" "text", which, in mold-opt-libxul.so, happens to be the .plt section.
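To illustrate why the two section lookups disagree, here is a toy model. The Section tuple, the section contents and the helper names are all invented for this sketch, and SHA-1 is only a stand-in: the real fallback hash in mozilla-central and in the symbolication code may be a different function.

```python
import hashlib
from collections import namedtuple

# Toy section model: name, coarse "kind", and raw bytes.
Section = namedtuple("Section", "name kind data")


def text_section_by_name(sections):
    """Lookup by exact name, as described for mozilla-central."""
    return next((s for s in sections if s.name == ".text"), None)


def first_section_of_kind_text(sections):
    """Lookup by kind, as described for the symbolication code before the fix."""
    return next((s for s in sections if s.kind == "text"), None)


def fallback_id(section):
    """Stand-in hash over the first 4096 bytes of the chosen section."""
    return hashlib.sha1(section.data[:4096]).hexdigest()[:32].upper()


# Invented layout where .plt (also executable code) comes before .text.
sections = [
    Section(".plt", "text", b"\xff\x25" * 64),
    Section(".text", "text", b"\x55\x48\x89\xe5" * 2048),
    Section(".rodata", "data", b"\x00" * 128),
]

by_name = text_section_by_name(sections)
by_kind = first_section_of_kind_text(sections)
print(by_name.name, fallback_id(by_name))
print(by_kind.name, fallback_id(by_kind))
```

With .plt ahead of .text, the two lookups hash different bytes, so the resulting IDs cannot match no matter which hash function is used.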

It looks like I introduced this bug in May 2020.

Assignee: nobody → mstange.moz
Status: NEW → ASSIGNED

Fixed in https://github.com/mstange/samply/commit/b600f99e398aaa953abcaa2357068cf508a4a9a9 .

I'll leave this bug open until the wasm blob in Firefox is updated.

(In reply to Markus Stange [:mstange] from comment #9)

One of the reasons for that mismatch is the fact that the library doesn't contain an ELF build ID: llvm-readelf --notes mold-opt-libxul.so doesn't show any output.

I'm a bit surprised by this. The mold documentation specifically describes how to speed up build ID computation, in the Details section.

I've filed https://github.com/rui314/mold/issues/919 on this.

Here is the list of command line options given to mold for building libxul.so. As you can see, no --build-id option is passed to the linker. I believe somewhere in your build system, -Wl,--build-id is appended to the linker's command line, and that code isn't executed if the linker is mold. So please check your build system.

--sysroot=/home/ruiu/.mozbuild/sysroot-x86_64-linux-gnu
-z relro
--hash-style=gnu
--eh-frame-hdr
-m elf_x86_64
-shared
-o libxul.so
/home/ruiu/.mozbuild/sysroot-x86_64-linux-gnu/usr/lib/x86_64-linux-gnu/crti.o
/home/ruiu/.mozbuild/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/crtbeginS.o
-L/home/ruiu/.mozbuild/sysroot-x86_64-linux-gnu/usr/lib/x86_64-linux-gnu
-L/home/ruiu/.mozbuild/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0
-L/home/ruiu/.mozbuild/sysroot-x86_64-linux-gnu/lib/x86_64-linux-gnu
-L/home/ruiu/.mozbuild/sysroot-x86_64-linux-gnu/usr/lib/x86_64-linux-gnu
-L/home/ruiu/.mozbuild/sysroot-x86_64-linux-gnu/lib
-L/home/ruiu/.mozbuild/sysroot-x86_64-linux-gnu/usr/lib
-z defs
-h libxul.so
/home/ruiu/mozilla-unified/obj-x86_64-pc-linux-gnu/toolkit/library/build/libxul_so.list
-lpthread
-rpath-link
/home/ruiu/.mozbuild/sysroot-x86_64-linux-gnu/lib/x86_64-linux-gnu
-rpath-link
/home/ruiu/.mozbuild/sysroot-x86_64-linux-gnu/usr/lib/x86_64-linux-gnu
-z noexecstack
-z text
-z relro
-z nocopyreloc
-Bsymbolic-functions
-rpath-link
/home/ruiu/mozilla-unified/obj-x86_64-pc-linux-gnu/dist/bin
-rpath-link
/usr/local/lib
../../../security/nss/lib/crmf/crmf_crmf/libcrmf.a
../../../js/src/build/libjs_static.a
/home/ruiu/mozilla-unified/obj-x86_64-pc-linux-gnu/x86_64-unknown-linux-gnu/release/libgkrust.a
../../../security/sandbox/linux/libmozsandbox.so
../../../config/external/nspr/pr/libnspr4.so
../../../config/external/nspr/libc/libplc4.so
../../../config/external/nspr/ds/libplds4.so
../../../config/external/lgpllibs/liblgpllibs.so
../../../security/nss/lib/nss/nss_nss3/libnss3.so
../../../security/nss/lib/util/util_nssutil3/libnssutil3.so
../../../security/nss/lib/smime/smime_smime3/libsmime3.so
../../../config/external/sqlite/libmozsqlite3.so
../../../security/nss/lib/ssl/ssl_ssl3/libssl3.so
../../../widget/gtk/mozgtk/libmozgtk.so
../../../widget/gtk/mozwayland/libmozwayland.so
--version-script
symverscript
-ldl
-lasound
-lrt
-lm
-ldl
-lX11
-lXcomposite
-lXdamage
-lXext
-lXfixes
-lXrandr
-lXrender
-lXtst
-lpthread
-lc
-lfreetype
-lfontconfig
-lgtk-3
-lgdk-3
-lpangocairo-1.0
-lpango-1.0
-latk-1.0
-lcairo-gobject
-lcairo
-lgdk_pixbuf-2.0
-lgio-2.0
-lgobject-2.0
-lglib-2.0
-ldbus-glib-1
-ldbus-1
-lxcb-shm
-lX11-xcb
-lxcb
-lXcursor
-lXi
-lstdc++
-lm
-lgcc_s
-lgcc
-lpthread
-lc
-lgcc_s
-lgcc
/home/ruiu/.mozbuild/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/crtendS.o
/home/ruiu/.mozbuild/sysroot-x86_64-linux-gnu/usr/lib/x86_64-linux-gnu/crtn.o
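A quick way to check a captured argument list like the one above for a build ID option is sketched below. The function name is hypothetical and the argument list is a shortened excerpt of the options shown above. Treating --build-id=none as "no build ID" is an assumption based on how GNU-style linkers use that value.

```python
def has_build_id_option(linker_args):
    """Return True if the arguments ask the linker to emit a build ID."""
    enabled = False
    for arg in linker_args:
        if arg == "--build-id":
            enabled = True
        elif arg.startswith("--build-id="):
            # A later option overrides an earlier one; "none" turns it off.
            enabled = arg != "--build-id=none"
    return enabled


# Shortened excerpt of the mold command line from this bug.
captured_args = [
    "-z", "relro",
    "--hash-style=gnu",
    "--eh-frame-hdr",
    "-m", "elf_x86_64",
    "-shared",
    "-o", "libxul.so",
    "-z", "defs",
    "-h", "libxul.so",
    "-Bsymbolic-functions",
    "--version-script", "symverscript",
]

print(has_build_id_option(captured_args))
print(has_build_id_option(captured_args + ["--build-id=sha1"]))
```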

It seems that since bug 1796518, --build-id=sha1 is no longer added for local builds.

See Also: → 1796518

--build-id=sha1 isn't that slow with mold, so I recommend always appending that option if the linker is mold.
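The recommendation could look something like this in a build script. This is only an illustration of the idea: the function name and the way the linker is identified are assumptions, and it is not how the Firefox build system is actually structured.

```python
def with_build_id_for_mold(linker, link_flags):
    """Return link flags, adding a SHA-1 build ID when linking with mold."""
    flags = list(link_flags)  # do not mutate the caller's list
    already_set = any(flag.startswith("-Wl,--build-id") for flag in flags)
    if linker == "mold" and not already_set:
        flags.append("-Wl,--build-id=sha1")
    return flags


print(with_build_id_for_mold("mold", ["-shared", "-Wl,-z,relro"]))
print(with_build_id_for_mold("bfd", ["-shared", "-Wl,-z,relro"]))
```

Flags for other linkers are left untouched here, since the suggestion in this comment is specific to mold.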

Thanks for checking! I've filed bug 1806470 on this.

Severity: -- → S3
Priority: -- → P1
Depends on: 1808982

Hey Markus, this bug is fixed after Bug 1808982, right? I guess we can close this bug as fixed as well.

Flags: needinfo?(mstange.moz)

Oh, yes, thanks. It's even double-fixed, with bug 1806470 being fixed as well.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(mstange.moz)
Resolution: --- → FIXED
Target Milestone: --- → 110 Branch