Investigate whether alternate dump_syms is faster on Windows

RESOLVED WONTFIX

Status

()

Core
Build Config
RESOLVED WONTFIX
10 months ago
5 months ago

People

(Reporter: ted, Unassigned)

Tracking

Trunk
Points:
---

Firefox Tracking Flags

(firefox53 affected)

Details

(Reporter)

Description

10 months ago
I've got this alternate implementation of dump_syms for Windows lying around:
https://github.com/luser/dump_syms

I didn't write it originally, but the original author BSD licensed it and I adopted it. I had some plans for it but I haven't really done anything with it. We should see if it's faster than the stock Breakpad dump_syms, since we spend a lot of time running dump_syms in the buildsymbols step.

The Breakpad version just calls into Microsoft's DIA SDK, so the underpinnings of the PDB parsing are opaque. At least here it's all visible and we could potentially make it faster if needed.
(Reporter)

Comment 1

10 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=58a2ce3f435b
(Reporter)

Comment 2

10 months ago
So in a local Win64 build, comparing the time to dump symbols from xul.pdb with the existing Breakpad dump_syms vs. the alternate implementation I get:

$ time ./breakpad/src/src/tools/windows/dump_syms/Release/dump_syms.exe ./debug-x64-mozilla-central/toolkit/library/xul.pdb > /tmp/stock-xul.sym

real    0m18.405s
user    0m0.031s
sys     0m0.000s


$ time ./debug-x64-mozilla-central/dist/host/bin/dump_syms.exe ./debug-x64-mozilla-central/toolkit/library/xul.pdb > /tmp/new-xul.sym

real    0m3.316s
user    0m0.000s
sys     0m0.015s

that is...faster than I expected. I re-ran the original to make sure it wasn't just filesystem caching and the second run was ~18s as well.
(Reporter)

Comment 3

10 months ago
I looked at a few random inbound builds and we spend 300+ seconds processing xul.pdb for our Windows builds. That includes running dump_syms and compressing the pdb file with makecab, so I don't know what the actual breakdown is. Assuming that try push works we can see what impact this has on the buildsymbols timing.

The new dump_syms implementation is missing at least one thing that we'd need, it doesn't dump the unwind info from 64-bit PE headers, but I wrote that code for Breakpad so I could port it without much trouble. I doubt that accounts for much of the time difference.

Comment 4

10 months ago
Did you compare the resulting symbol files?
(Reporter)

Comment 5

9 months ago
(In reply to Mike Hommey [:glandium] from comment #4)
> Did you compare the resulting symbol files?

I did, there are some differences I know about:
1) The function name for C-linkage symbols doesn't match, because the DIA API gets function names from somewhere in the PDB file that I never quite figured out, so the new PDB parser just uses the bare name and generates the function arguments from the type information, so it adds argument lists to C functions that aren't present in the current dump_syms output. This is not really the worst thing, honestly, although it does bug me.
2) The STACK WIN info is missing from win64 symbol files, as I mentioned in comment 3.

Anyway, comparing the buildsymbols times from the try push in comment 1 it's not that impressive of a win in automation:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=5bc831655dec&newProject=try&newRevision=58a2ce3f435b&originalSignature=4b0e4dc8185db3ad05cbd5f8e78c2a82e71b2dac&newSignature=4b0e4dc8185db3ad05cbd5f8e78c2a82e71b2dac&framework=2
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=5bc831655dec&newProject=try&newRevision=58a2ce3f435b&originalSignature=58dd798456c0c59350fb5d65dc18fee63b89dd2a&newSignature=58dd798456c0c59350fb5d65dc18fee63b89dd2a&framework=2
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=5bc831655dec&newProject=try&newRevision=58a2ce3f435b&originalSignature=e261c8d0935410f20362706e4fcfd3fc7cc2de24&newSignature=e261c8d0935410f20362706e4fcfd3fc7cc2de24&framework=2
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=5bc831655dec&newProject=try&newRevision=58a2ce3f435b&originalSignature=14c1be0797a43a19e08666617834e7d763ad2a44&newSignature=14c1be0797a43a19e08666617834e7d763ad2a44&framework=2
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=5bc831655dec&newProject=try&newRevision=58a2ce3f435b&originalSignature=4b0e4dc8185db3ad05cbd5f8e78c2a82e71b2dac&newSignature=4b0e4dc8185db3ad05cbd5f8e78c2a82e71b2dac&framework=2
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=5bc831655dec&newProject=try&newRevision=58a2ce3f435b&originalSignature=58dd798456c0c59350fb5d65dc18fee63b89dd2a&newSignature=58dd798456c0c59350fb5d65dc18fee63b89dd2a&framework=2

It looks like it's only 15-20s faster.

However, while discussing this with gps on IRC the other week, he reminded me that the timings printed in the build logs are for more than just running dump_syms, they're for the entire processing of each binary's symbols, which on Windows includes running makecab to compress it. makecab is pretty slow, so I spent a couple of days and rewrote a simple version that implements just what we use in Rust: https://github.com/luser/rust-makecab

It's about 30% faster at compressing xul.pdb from a local build on my machine with the output of both versions piped to /dev/null. I didn't write the timings down but it was ~1m40s for the original compared to ~1m5s for the rewrite. This is on my local Windows machine which is a Core i7-4790, 32GB ram, SSD. I cat'ed xul.pdb before testing to make sure it was in the filesystem cache, and ran each program 4 times. The timings were very consistent.

I pushed a patch to try here to use the Rust version: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ce539ea7b696fdb17495a8da28d837f7cdf9e9ec

The initial results are promising, but I retriggered some more builds to get better data. For example, the buildbot win32 build shows:
03:55:02     INFO -  1940: Worker finished processing ('.\\toolkit\\library\\xul.pdb',) in 149.04s

Compared to the 300+s I was seeing in logs as I mentioned in comment 15. We'll see what that does to the overall buildsymbols time, but I'm optimistic.
(Reporter)

Comment 6

9 months ago
For comparison, one of the buildbot win32 builds from the try push in comment 1 had:
20:01:52     INFO -  1816: Worker finished processing ('.\\toolkit\\library\\xul.pdb',) in 331.70s

So that's a pretty big drop!
(Reporter)

Comment 7

9 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7cc495f79da0
(Reporter)

Comment 8

9 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=08e3e715a939
(Reporter)

Comment 9

9 months ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #7)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=7cc495f79da0

I pushed both patches together here just to see if there's any additional benefit. If not, the rust-makecab change seems to be worthwhile on its own.
(Reporter)

Comment 10

9 months ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #8)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=08e3e715a939

Er, that comment was intended for this try push.
Great work, Ted!

Having not looked at the code, have you incorporated the optimization where we turn off or decrease the compression ratio of the files that don't compress very well? We're currently burning a lot of CPU cycles attempting to compress files only to get e.g. 99% of their original size.

Regarding time savings, in my mind I'm imagining C++ compile times to be 5 minutes (with distributed caching and possibly compilation). In that world, 20s is still 3.3%, which is worth paying attention to. Even 10 minute times and a 1.66% savings is worthwhile. What I'm trying to say is I think 20s or even 10s is worth pursuing in automation. (But we should prioritize by the longest pole, obviously.)
(Reporter)

Comment 12

9 months ago
The build time wins for the makecab patch alone are pretty great:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=autoland&originalRevision=6f7b660eb31b&newProject=try&newRevision=7cc495f79da0&framework=2&filter=windows&showOnlyImportant=0

Here's a compare of the two try pushes, with the "old" here being "just the makecab patch" and the "new" being "makecab + dump_syms":
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=7cc495f79da0&newProject=try&newRevision=2a666ba77681&framework=2&filter=windows&showOnlyImportant=0

It looks like a small but real additional win.
(Reporter)

Comment 13

9 months ago
I'll move the makecab change to its own bug: bug 1329320.
(Reporter)

Comment 14

5 months ago
Given the timings in comment 15 I don't think this is worth the effort to fix right now. It's good I looked into this though, since the build time wins in bug 1329320 were significant!
Status: NEW → RESOLVED
Last Resolved: 5 months ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.