Open Bug 889822 Opened 11 years ago Updated 2 years ago

msvc symbols are not being recognized/used in some cases

Categories

(Toolkit :: Crash Reporting, defect)

x86_64
Windows 7
defect

Tracking

()

People

(Reporter: benjamin, Unassigned)

Details

(Whiteboard: [lang=c++][crashkill:P2])

      No description provided.
See e.g. https://crash-stats.mozilla.com/report/index/0013ff42-587e-4d68-98a2-29c792130629

The top frame is crashing at 0x68d4af06 which is msvcr100.dll + 0x8af06

Visual Studio correctly reports this crash as happening in _purecall.

The .sym file located on the server has matching information:

FUNC 8aef4 2a 0 _purecall
8aef4 0 43 40794
8aef4 c 44 40794
8af00 4 45 40794
8af04 2 47 40794
8af06 7 54 40794
8af0d c 56 40794
8af19 5 57 40794

But for some reason breakpad isn't picking this up.
So here's what's going on:

FUNC 8aeec b 14 _invalid_parameter(unsigned short const *,unsigned short const *,unsigned short const *,unsigned int,unsigned int)
8aeec 5 321 42951
8aef1 1 329 42951
8aef2 5 328 42951
FUNC 8aef4 2a 0 _purecall
8aef4 0 43 40794
8aef4 c 44 40794
8af00 4 45 40794
8af04 2 47 40794
8af06 7 54 40794
8af0d c 56 40794
8af19 5 57 40794

The _invalid_parameter function is in fact 8 bytes long. But the FUNC header in the .sym file says that it's 11 bytes (0xb). This overlaps the next function which starts at the 9th byte. This causes breakpad to throw away the range for the next function with the warning:

2013-07-03 07:39:43: range_map-inl.h:77: INFO: StoreRange failed, an existing range is contained by or extends lower than the new range: new 0x8aef4+0x2a, existing 0x8aeec+0xb

The reason that it's listing the function as 11 bytes long is because the final instruction of the function is a 2-byte jump instruction up to _invalid_parameter (the C function). I believe that when the compiler generated this instruction, it was originally a 5-byte long-jump instruction. When the linker finished placing the functions, it discovered that it could rewrite the jump to a 2-byte relative nearjump, but it didn't update the debuginfo.

I ran a quick analysis on just this .sym file, and of the 2886 functions, 359 of them are affected by this bug, including some of the most important. I will attach a list, but in includes

* operator delete
* operator new
* malloc/free
* memcpy/memmov
* fclose/fread/_close/_read

I'm still looking at whether all of the overlaps are the same cause. If it's specifically that, we may be able to just hack in a small overlap check. Not sure yet.
Assignee: nobody → benjamin
This is not limited to jumps at the end of functions:

FUNC 461e4 27 0 Concurrency::Context::IsCurrentTaskCollectionCanceling()
461e4 0 92 404
461e4 5 93 404
461e9 4 94 404
461ed 3 100 404
461f0 4 101 404
461f4 a 103 404
0 5 104 404
0 5 106 404
46207 2 109 404
46209 1 110 404

In this case the jump(s) that are being rewritten to nearjumps are in the middle of the function. In this case the total function size is off by only 1 byte, but I think that's because of some kind of padding (it should be 6 bytes, if everything were tightly packed).

I ran my analysis script over xul.sym for Firefox 22, and it does not appear that xul.sym or nss3.sym are affected by this.

Given this, I think that we should consider automatically rewriting ranges "smaller" when another range is found that sits on the end of them.
Fixing this at symbol-dumping time would be doable, but require a bit of rework of the symbol dumping code, because it currently doesn't store functions in memory, it just dumps them all out as it iterates them:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/windows/pdb_source_line_writer.cc#280

Fixing this at symbol load time would mean fixing this:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/basic_source_line_resolver.cc#146

Where the implementation for StoreRange is here:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/range_map-inl.h#50
The former *feels* better, since we shouldn't be writing out malformed symbol files in the first place, even if it's more of a PITA.
We could also rewrite the .sym files in python, which would be a total hack but would be relatively simple.

Do we get any guarantees from the iterator that they are handed back in order? It would be nice to rewrite only the last symbol, instead of doing arbitrary searches for overlapping symbols.
It's not 100% clear to me, but the DIA APIs seem to claim that we're enumerating in address order:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/windows/pdb_source_line_writer.cc#280


http://msdn.microsoft.com/en-us/library/971a5e3e.aspx
"IDiaEnumSymbolsByAddr::Next


Retrieves the next symbols in order by address."
Assignee: benjamin → nobody
Whiteboard: [mentor=benjamin@smedbergs.us][lang=c++][crashkill:P2]
There's also a third-party implementation of dump_syms for Windows now:
https://github.com/luser/dump_syms

It would be interesting to see if it suffers from the same bug.
See also https://breakpad.appspot.com/1574002/ which may be changing the general pattern of retrieving symbols.
Mentor: benjamin
Whiteboard: [mentor=benjamin@smedbergs.us][lang=c++][crashkill:P2] → [lang=c++][crashkill:P2]
Just some poking here since I ran into this in another bug. I grabbed the dump for https://crash-stats.mozilla.com/report/index/a9e740e2-ad90-4f34-b151-8e8292140918 and was poking at the symbols. I dumped symbols for msvcr100.dll with the current in-tree dump_syms and they look right:
FUNC 8aeec 8 14 _invalid_parameter(unsigned short const *,unsigned short const *
,unsigned short const *,unsigned int,unsigned int)
FUNC 8aef4 2a 0 _purecall

compared with what we have on the symbol server:
FUNC 8aeec b 14 _invalid_parameter(unsigned short const *,unsigned short const *,unsigned short const *,unsigned int,unsigned int)
FUNC 8aef4 2a 0 _purecall

I guess that upstream Breakpad change fixes this bug. I'll have to update the binaries in use for the fetch-win32-symbols script to make sure we're getting good symbols.

bsmedberg: do you have a list anywhere of affected symbols? We should be able to replace them with files dumped using a newer dump_syms.
Flags: needinfo?(benjamin)
No I don't.
Flags: needinfo?(benjamin)
Mentor: benjamin
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.