codesighs gives incorrect symbol sizes

RESOLVED FIXED

Status

SeaMonkey
Build Config
RESOLVED FIXED
16 years ago
14 years ago

People

(Reporter: Brian Ryner (not reading), Assigned: Brian Ryner (not reading))

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 4 obsolete attachments)

(Assignee)

Description

16 years ago
(filing under Build Config because I couldn't find anything more appropriate)

Codesighs can, in certain cases, report incorrect sizes for symbols on Linux. 
This leads to wrong attributed blame for code size increases.

One particular case I found today is that string literals are completely
anonymous (no symbol table entry), so based on the method nm uses to calculate
symbol sizes, the size of all string literals up to the next symbol are added
into the size of the previous symbol.  This is just nm being lame, but there's
not much we can do about that.

Instead, we can convert codesighs to use readelf on linux, which knows how to
look at the size field of the symbol table entry to give the correct size for
the symbol.  This will leave string literals unaccounted for, but I'd be in
favor of fixing that later if we care.
(Assignee)

Comment 1

16 years ago
Created attachment 120302 [details] [diff] [review]
convert to using readelf

This seems to work.  I'm not entirely happy about using both a perl wrapper and
a separate output processor, but it turns out some things are just easier to do
in perl -- the sorting (which seems to not be necessary for codesighs, but is
nice to have when examining the file by hand) and buffering up the lines until
we [possibly] hit a .symtab table.

Comparing the TSV files generated from readelf and nm, the readelf one uses
non-rounded-up sizes for everything, so there will likely be an artificial
decrease in code size if we switch to this.  I think the improved accuracy
makes it worth the hit of a one-time jump (down) in the results.
(Assignee)

Comment 2

16 years ago
... maybe I'll lose elf2tsv.c and just write the whole thing in perl.
perl is better at all that sort of stuff.

/be
(Assignee)

Comment 4

16 years ago
Created attachment 120361 [details] [diff] [review]
all-perl version

Ok, this is much simpler.  I'm converting the readelf output directly into
codesighs TSV output.  I'm also now getting real values for section names and
object files within the library for each symbol.
(Assignee)

Updated

16 years ago
Attachment #120302 - Attachment is obsolete: true
(Assignee)

Comment 5

16 years ago
Created attachment 120366 [details] [diff] [review]
cleaned up a bit
Attachment #120361 - Attachment is obsolete: true
(Assignee)

Updated

16 years ago
Attachment #120366 - Flags: superreview?(brendan)
Attachment #120366 - Flags: review?(sfraser)
(Assignee)

Comment 6

16 years ago
Anyone have any thoughts on whether we should try to account for symbol-less
data, like string literals?  One possibility is that if there's a gap between
the end of a symbol and the beginning of the next one, we could generate a fake
symbol name to fill that space, like __data_N, where N counts up from 0 for each
library (or maybe __string_N since it's most likely a string).

Comment 7

16 years ago
Comment on attachment 120366 [details] [diff] [review]
cleaned up a bit

>Index: readelf_wrap.pl
>===================================================================

>+# First read in the list of sections
>+open(READELF_OUTPUT, "readelf -SW $ARGV[0] |") or die "readelf failed to run on $ARGV[0]\n";
...
>+open(READELF_OUTPUT, "readelf -sW $ARGV[0] | c++filt |") or die "readelf failed to run on $ARGV[0]\n";

How expensive is it to run readelf twice? Maybe run it once and store the
output?

>+	# Filter out types of symbols that we don't care about:
>+	#  - anything that's not of type OBJECT, FUNC, or FILE

What other types are there? Should you include other types with non-zero sizes?

Otherwise it looks good.
Attachment #120366 - Flags: review?(sfraser) → review+
(Assignee)

Comment 8

16 years ago
> How expensive is it to run readelf twice? Maybe run it once and store the
> output?

Good point.  I can run 'readelf -sSW' and get both the section headers and the
symbol table from the same run.

> What other types are there? Should you include other types with non-zero sizes?

The other types are:

NOTYPE  (symbols like gcc2_compiled, which are also 0 size)
SECTION (section symbols; also 0 size)
COMMON  (I can't find these anywhere in real files.  I suppose they could be non
0 size though.)
TLS     (thread local storage - I can't find any of these symbols either)

So maybe I should just include all symbol types, and call anything DATA that's
not of type FUNC.
We could try treating anything not FUNC as signal, and see how that goes.  If we
want to be strict about code size regressions, we'll need to avoid false
positives like this string foozle, though.

COMMON shouldn't show up unless someone is using Fortran, I bet. ;-)

/be
(Assignee)

Comment 10

16 years ago
Turns out we can't do what I wanted to do to figure out the object file a symbol
comes from.  For example, the patch above will mis-attribute all global
functions and data (oops).  We really need debugging info (either stabs or
dwarf) to get this right.  I'd like to get this patch in without the object file
parts and come back and do a later revision to fix that.

As for the debugging symbols, finding the right object file is done roughly the
same way for both stabs and dwarf (find the SO before the symbol for stabs or
the DW_TAG_compile_unit for dwarf).  These can be dumped with 'objdump -G' and
'readelf -wi' respectively.
(Assignee)

Updated

16 years ago
Attachment #120366 - Flags: superreview?(brendan)
(Assignee)

Comment 11

16 years ago
Created attachment 120681 [details] [diff] [review]
patch with faulty object file guessing removed
Attachment #120366 - Attachment is obsolete: true
(Assignee)

Updated

16 years ago
Attachment #120681 - Flags: superreview?(brendan)
Comment on attachment 120681 [details] [diff] [review]
patch with faulty object file guessing removed

Extra newline after comment?

+    # module name
+
+    print "$module\t";

Looks great -- just curious, under what conditions do we not find a .symtab
that trumps a .dynsym section?

/be
Attachment #120681 - Flags: superreview?(brendan) → superreview+
I tried it, and tried to compare with an old-style .tsv file. They are not
compatible, so codesighs was runnig very long to compare everything. Is this
intentional? It now uses names like "rodata" instead of "R". I don't know the
old names exactly. Anyway, the new ones look clearer. At least, I can understand
them :)

It also seems te be slower. Is that the perlscript, or is it readelf being slow?

Comment 14

16 years ago
The old readelf definitely is no speed daemon.  The best you can do is use the
readelf from elfutils, installed as eu-readelf on RHL9.  It should be possible
to compile the package oon RHL7.3 but I never tried.
(Assignee)

Updated

16 years ago
Blocks: 202164
(Assignee)

Comment 15

15 years ago
Unfortunately, the output of readelf and eu-readelf isn't identical, so I'll
probably have to rewrite the parsing to use eu-readelf.
(Assignee)

Comment 16

15 years ago
Created attachment 132393 [details] [diff] [review]
revised patch

Changes from the last patch:

- Work with readelf or eu-readelf (prefer the latter, for speed)
- Report space used in a section but not associated with a symbol (this will
make sure we take into account string constants, etc).	The extra space is
reported as ".nosyms.<section>"

I ran this with both readelf and eu-readelf and verified that the results were
identical.
(Assignee)

Updated

15 years ago
Attachment #120681 - Attachment is obsolete: true
(Assignee)

Updated

15 years ago
Attachment #132393 - Flags: superreview?(brendan)
Comment on attachment 132393 [details] [diff] [review]
revised patch

>+while (<READELF_OUTPUT>) {
>+
>+    if (!$in_symbols) {
>+	# capture the 'null' section which has no name, so that the

Wahhh, 2- vs. 4-space indentation unit alert!

Other than that, sr=brendan@mozilla.org.

/be
Attachment #132393 - Flags: superreview?(brendan) → superreview+
(Assignee)

Comment 18

15 years ago
checked in and reporting good numbers on comet and luna (the numbers are up a
bit from the nm-based reporting - this is expected)
Status: NEW → RESOLVED
Last Resolved: 15 years ago
Resolution: --- → FIXED
Product: Browser → Seamonkey
You need to log in before you can comment on or make changes to this bug.