Closed Bug 201622 Opened 21 years ago Closed 21 years ago

codesighs gives incorrect symbol sizes

Categories

(SeaMonkey :: Build Config, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bryner, Assigned: bryner)

References

Details

Attachments

(1 file, 4 obsolete files)

(filing under Build Config because I couldn't find anything more appropriate)

Codesighs can, in certain cases, report incorrect sizes for symbols on Linux. 
This leads to wrong attributed blame for code size increases.

One particular case I found today is that string literals are completely
anonymous (no symbol table entry), so based on the method nm uses to calculate
symbol sizes, the size of all string literals up to the next symbol are added
into the size of the previous symbol.  This is just nm being lame, but there's
not much we can do about that.

Instead, we can convert codesighs to use readelf on linux, which knows how to
look at the size field of the symbol table entry to give the correct size for
the symbol.  This will leave string literals unaccounted for, but I'd be in
favor of fixing that later if we care.
Attached patch convert to using readelf (obsolete) — Splinter Review
This seems to work.  I'm not entirely happy about using both a perl wrapper and
a separate output processor, but it turns out some things are just easier to do
in perl -- the sorting (which seems to not be necessary for codesighs, but is
nice to have when examining the file by hand) and buffering up the lines until
we [possibly] hit a .symtab table.

Comparing the TSV files generated from readelf and nm, the readelf one uses
non-rounded-up sizes for everything, so there will likely be an artificial
decrease in code size if we switch to this.  I think the improved accuracy
makes it worth the hit of a one-time jump (down) in the results.
... maybe I'll lose elf2tsv.c and just write the whole thing in perl.
perl is better at all that sort of stuff.

/be
Attached patch all-perl version (obsolete) — Splinter Review
Ok, this is much simpler.  I'm converting the readelf output directly into
codesighs TSV output.  I'm also now getting real values for section names and
object files within the library for each symbol.
Attachment #120302 - Attachment is obsolete: true
Attached patch cleaned up a bit (obsolete) — Splinter Review
Attachment #120361 - Attachment is obsolete: true
Attachment #120366 - Flags: superreview?(brendan)
Attachment #120366 - Flags: review?(sfraser)
Anyone have any thoughts on whether we should try to account for symbol-less
data, like string literals?  One possibility is that if there's a gap between
the end of a symbol and the beginning of the next one, we could generate a fake
symbol name to fill that space, like __data_N, where N counts up from 0 for each
library (or maybe __string_N since it's most likely a string).
Comment on attachment 120366 [details] [diff] [review]
cleaned up a bit

>Index: readelf_wrap.pl
>===================================================================

>+# First read in the list of sections
>+open(READELF_OUTPUT, "readelf -SW $ARGV[0] |") or die "readelf failed to run on $ARGV[0]\n";
...
>+open(READELF_OUTPUT, "readelf -sW $ARGV[0] | c++filt |") or die "readelf failed to run on $ARGV[0]\n";

How expensive is it to run readelf twice? Maybe run it once and store the
output?

>+	# Filter out types of symbols that we don't care about:
>+	#  - anything that's not of type OBJECT, FUNC, or FILE

What other types are there? Should you include other types with non-zero sizes?

Otherwise it looks good.
Attachment #120366 - Flags: review?(sfraser) → review+
> How expensive is it to run readelf twice? Maybe run it once and store the
> output?

Good point.  I can run 'readelf -sSW' and get both the section headers and the
symbol table from the same run.

> What other types are there? Should you include other types with non-zero sizes?

The other types are:

NOTYPE  (symbols like gcc2_compiled, which are also 0 size)
SECTION (section symbols; also 0 size)
COMMON  (I can't find these anywhere in real files.  I suppose they could be non
0 size though.)
TLS     (thread local storage - I can't find any of these symbols either)

So maybe I should just include all symbol types, and call anything DATA that's
not of type FUNC.
We could try treating anything not FUNC as signal, and see how that goes.  If we
want to be strict about code size regressions, we'll need to avoid false
positives like this string foozle, though.

COMMON shouldn't show up unless someone is using Fortran, I bet. ;-)

/be
Turns out we can't do what I wanted to do to figure out the object file a symbol
comes from.  For example, the patch above will mis-attribute all global
functions and data (oops).  We really need debugging info (either stabs or
dwarf) to get this right.  I'd like to get this patch in without the object file
parts and come back and do a later revision to fix that.

As for the debugging symbols, finding the right object file is done roughly the
same way for both stabs and dwarf (find the SO before the symbol for stabs or
the DW_TAG_compile_unit for dwarf).  These can be dumped with 'objdump -G' and
'readelf -wi' respectively.
Attachment #120366 - Flags: superreview?(brendan)
Attachment #120366 - Attachment is obsolete: true
Attachment #120681 - Flags: superreview?(brendan)
Comment on attachment 120681 [details] [diff] [review]
patch with faulty object file guessing removed

Extra newline after comment?

+    # module name
+
+    print "$module\t";

Looks great -- just curious, under what conditions do we not find a .symtab
that trumps a .dynsym section?

/be
Attachment #120681 - Flags: superreview?(brendan) → superreview+
I tried it, and tried to compare with an old-style .tsv file. They are not
compatible, so codesighs was runnig very long to compare everything. Is this
intentional? It now uses names like "rodata" instead of "R". I don't know the
old names exactly. Anyway, the new ones look clearer. At least, I can understand
them :)

It also seems te be slower. Is that the perlscript, or is it readelf being slow?
The old readelf definitely is no speed daemon.  The best you can do is use the
readelf from elfutils, installed as eu-readelf on RHL9.  It should be possible
to compile the package oon RHL7.3 but I never tried.
Blocks: 202164
Unfortunately, the output of readelf and eu-readelf isn't identical, so I'll
probably have to rewrite the parsing to use eu-readelf.
Attached patch revised patchSplinter Review
Changes from the last patch:

- Work with readelf or eu-readelf (prefer the latter, for speed)
- Report space used in a section but not associated with a symbol (this will
make sure we take into account string constants, etc).	The extra space is
reported as ".nosyms.<section>"

I ran this with both readelf and eu-readelf and verified that the results were
identical.
Attachment #120681 - Attachment is obsolete: true
Attachment #132393 - Flags: superreview?(brendan)
Comment on attachment 132393 [details] [diff] [review]
revised patch

>+while (<READELF_OUTPUT>) {
>+
>+    if (!$in_symbols) {
>+	# capture the 'null' section which has no name, so that the

Wahhh, 2- vs. 4-space indentation unit alert!

Other than that, sr=brendan@mozilla.org.

/be
Attachment #132393 - Flags: superreview?(brendan) → superreview+
checked in and reporting good numbers on comet and luna (the numbers are up a
bit from the nm-based reporting - this is expected)
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Product: Browser → Seamonkey
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: