Closed Bug 201622 Opened 22 years ago Closed 22 years ago

codesighs gives incorrect symbol sizes

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: bryner, Assigned: bryner)

References

Details

Attachments

(1 file, 4 obsolete files)

convert to using readelf 22 years ago Brian Ryner (not reading) 20.64 KB, patch		Details \| Diff \| Splinter Review
all-perl version 22 years ago Brian Ryner (not reading) 7.93 KB, patch		Details \| Diff \| Splinter Review
cleaned up a bit 22 years ago Brian Ryner (not reading) 7.97 KB, patch	sfraser_bugs : review+	Details \| Diff \| Splinter Review
patch with faulty object file guessing removed 22 years ago Brian Ryner (not reading) 7.56 KB, patch	brendan : superreview+	Details \| Diff \| Splinter Review
revised patch 22 years ago Brian Ryner (not reading) 9.29 KB, patch	brendan : superreview+	Details \| Diff \| Splinter Review

Brian Ryner (not reading)

Assignee

Description

•

22 years ago

(filing under Build Config because I couldn't find anything more appropriate) Codesighs can, in certain cases, report incorrect sizes for symbols on Linux. This leads to wrong attributed blame for code size increases. One particular case I found today is that string literals are completely anonymous (no symbol table entry), so based on the method nm uses to calculate symbol sizes, the size of all string literals up to the next symbol are added into the size of the previous symbol. This is just nm being lame, but there's not much we can do about that. Instead, we can convert codesighs to use readelf on linux, which knows how to look at the size field of the symbol table entry to give the correct size for the symbol. This will leave string literals unaccounted for, but I'd be in favor of fixing that later if we care.

Brian Ryner (not reading)

Assignee

Comment 1

•

22 years ago

Attached patch convert to using readelf (obsolete) — Details — Splinter Review

This seems to work. I'm not entirely happy about using both a perl wrapper and a separate output processor, but it turns out some things are just easier to do in perl -- the sorting (which seems to not be necessary for codesighs, but is nice to have when examining the file by hand) and buffering up the lines until we [possibly] hit a .symtab table. Comparing the TSV files generated from readelf and nm, the readelf one uses non-rounded-up sizes for everything, so there will likely be an artificial decrease in code size if we switch to this. I think the improved accuracy makes it worth the hit of a one-time jump (down) in the results.

Brian Ryner (not reading)

Assignee

Comment 2

•

22 years ago

... maybe I'll lose elf2tsv.c and just write the whole thing in perl.

Brendan Eich [:brendan]

Comment 3

•

22 years ago

perl is better at all that sort of stuff. /be

Brian Ryner (not reading)

Assignee

Comment 4

•

22 years ago

Attached patch all-perl version (obsolete) — Details — Splinter Review

Ok, this is much simpler. I'm converting the readelf output directly into codesighs TSV output. I'm also now getting real values for section names and object files within the library for each symbol.

Brian Ryner (not reading)

Assignee

Updated

•

22 years ago

Attachment #120302 - Attachment is obsolete: true

Brian Ryner (not reading)

Assignee

Comment 5

•

22 years ago

Attached patch cleaned up a bit (obsolete) — Details — Splinter Review

Attachment #120361 - Attachment is obsolete: true

Brian Ryner (not reading)

Assignee

Updated

•

22 years ago

Attachment #120366 - Flags: superreview?(brendan)

Attachment #120366 - Flags: review?(sfraser)

Brian Ryner (not reading)

Assignee

Comment 6

•

22 years ago

Anyone have any thoughts on whether we should try to account for symbol-less data, like string literals? One possibility is that if there's a gap between the end of a symbol and the beginning of the next one, we could generate a fake symbol name to fill that space, like __data_N, where N counts up from 0 for each library (or maybe __string_N since it's most likely a string).

Simon Fraser [no longer active]

Comment 7

•

22 years ago

Comment on attachment 120366 [details] [diff] [review] cleaned up a bit >Index: readelf_wrap.pl >=================================================================== >+# First read in the list of sections >+open(READELF_OUTPUT, "readelf -SW $ARGV[0] |") or die "readelf failed to run on $ARGV[0]\n"; ... >+open(READELF_OUTPUT, "readelf -sW $ARGV[0] | c++filt |") or die "readelf failed to run on $ARGV[0]\n"; How expensive is it to run readelf twice? Maybe run it once and store the output? >+ # Filter out types of symbols that we don't care about: >+ # - anything that's not of type OBJECT, FUNC, or FILE What other types are there? Should you include other types with non-zero sizes? Otherwise it looks good.

Attachment #120366 - Flags: review?(sfraser) → review+

Brian Ryner (not reading)

Assignee

Comment 8

•

22 years ago

> How expensive is it to run readelf twice? Maybe run it once and store the > output? Good point. I can run 'readelf -sSW' and get both the section headers and the symbol table from the same run. > What other types are there? Should you include other types with non-zero sizes? The other types are: NOTYPE (symbols like gcc2_compiled, which are also 0 size) SECTION (section symbols; also 0 size) COMMON (I can't find these anywhere in real files. I suppose they could be non 0 size though.) TLS (thread local storage - I can't find any of these symbols either) So maybe I should just include all symbol types, and call anything DATA that's not of type FUNC.

Brendan Eich [:brendan]

Comment 9

•

22 years ago

We could try treating anything not FUNC as signal, and see how that goes. If we want to be strict about code size regressions, we'll need to avoid false positives like this string foozle, though. COMMON shouldn't show up unless someone is using Fortran, I bet. ;-) /be

Brian Ryner (not reading)

Assignee

Comment 10

•

22 years ago

Turns out we can't do what I wanted to do to figure out the object file a symbol comes from. For example, the patch above will mis-attribute all global functions and data (oops). We really need debugging info (either stabs or dwarf) to get this right. I'd like to get this patch in without the object file parts and come back and do a later revision to fix that. As for the debugging symbols, finding the right object file is done roughly the same way for both stabs and dwarf (find the SO before the symbol for stabs or the DW_TAG_compile_unit for dwarf). These can be dumped with 'objdump -G' and 'readelf -wi' respectively.

Brian Ryner (not reading)

Assignee

Updated

•

22 years ago

Attachment #120366 - Flags: superreview?(brendan)

Brian Ryner (not reading)

Assignee

Comment 11

•

22 years ago

Attached patch patch with faulty object file guessing removed (obsolete) — Details — Splinter Review

Attachment #120366 - Attachment is obsolete: true

Brian Ryner (not reading)

Assignee

Updated

•

22 years ago

Attachment #120681 - Flags: superreview?(brendan)

Brendan Eich [:brendan]

Comment 12

•

22 years ago

Comment on attachment 120681 [details] [diff] [review] patch with faulty object file guessing removed Extra newline after comment? + # module name + + print "$module\t"; Looks great -- just curious, under what conditions do we not find a .symtab that trumps a .dynsym section? /be

Attachment #120681 - Flags: superreview?(brendan) → superreview+

Michiel van Leeuwen (email: mvl+moz@)

Comment 13

•

22 years ago

I tried it, and tried to compare with an old-style .tsv file. They are not compatible, so codesighs was runnig very long to compare everything. Is this intentional? It now uses names like "rodata" instead of "R". I don't know the old names exactly. Anyway, the new ones look clearer. At least, I can understand them :) It also seems te be slower. Is that the perlscript, or is it readelf being slow?

Ulrich Drepper

Comment 14

•

22 years ago

The old readelf definitely is no speed daemon. The best you can do is use the readelf from elfutils, installed as eu-readelf on RHL9. It should be possible to compile the package oon RHL7.3 but I never tried.

Brian Ryner (not reading)

Assignee

Updated

•

22 years ago

Blocks: 202164

Brian Ryner (not reading)

Assignee

Comment 15

•

22 years ago

Unfortunately, the output of readelf and eu-readelf isn't identical, so I'll probably have to rewrite the parsing to use eu-readelf.

Brian Ryner (not reading)

Assignee

Comment 16

•

22 years ago

Attached patch revised patch — Details — Splinter Review

Changes from the last patch: - Work with readelf or eu-readelf (prefer the latter, for speed) - Report space used in a section but not associated with a symbol (this will make sure we take into account string constants, etc). The extra space is reported as ".nosyms.<section>" I ran this with both readelf and eu-readelf and verified that the results were identical.

Brian Ryner (not reading)

Assignee

Updated

•

22 years ago

Attachment #120681 - Attachment is obsolete: true

Brian Ryner (not reading)

Assignee

Updated

•

22 years ago

Attachment #132393 - Flags: superreview?(brendan)

Brendan Eich [:brendan]

Comment 17

•

22 years ago

Comment on attachment 132393 [details] [diff] [review] revised patch >+while (<READELF_OUTPUT>) { >+ >+ if (!$in_symbols) { >+ # capture the 'null' section which has no name, so that the Wahhh, 2- vs. 4-space indentation unit alert! Other than that, sr=brendan@mozilla.org. /be

Attachment #132393 - Flags: superreview?(brendan) → superreview+

Brian Ryner (not reading)

Assignee

Comment 18

•

22 years ago

checked in and reporting good numbers on comet and luna (the numbers are up a bit from the nm-based reporting - this is expected)

Status: NEW → RESOLVED

Closed: 22 years ago

Resolution: --- → FIXED

Myk Melez [:myk] [@mykmelez]

Updated

•

21 years ago

Product: Browser → Seamonkey

You need to log in before you can comment on or make changes to this bug.