Closed
Bug 1062709
Opened 10 years ago
Closed 10 years ago
Clean up stack printing and fixing
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
FIXED
mozilla35
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
Details
Attachments
(4 files, 5 obsolete files)
Our printing and "fixing" (i.e. post-processing) of stacks is a mess. PRINTING -------- Two separate functions format individual stack entries: - NS_FormatCodeAddressDetails(), which is used by nsSigHandlers.cpp and nsTraceRefcnt.cpp. - CodeAddressService::GetLocation(), which is used by DMD.cpp, nsTraceRefcnt.cpp, and BlockingResourceBase.cpp. They are similar, but CodeAddressService provides a cache, which means it's more work to set up but is much faster if you're printing lots of stacks with repeated entries. The exact formatting of the stack entries depends on which of these functions you call, and which platform you are on. What follows is my best attempt to capture all the variations. The inputs are <pc>, <fn>, <foffset>, <file>, <line>, <lib>, <loffset>. The conditions are shown in the comments on the RHS. CodeAddressService::GetLocation(), all platforms: > "??? <pc>" # if: no <fn>, no <lib>, no <loffset> > "<fn|???> (<file>:<line>) <pc>" # else if: <file> > "<fn|???>[<lib> +<loffset>] <pc>" # else NS_FormatCodeAddressDetails(), Windows: > "<fn>+<foffset> [<lib> +<loffset>]" # if: fn > "UNKNOWN [<lib> +<loffset>]" # else if: lib > "UNKNOWN <pc>" # else then, if <file>, append: > " (<file>, line <line>)" # then append: > "\n" NS_FormatCodeAddressDetails(), Sun: > "<pc> <lib|??>:<fn|??>+<foffset>\n" NS_FormatCodeAddressDetails(), other Unixes: > "UNKNOWN <pc>\n" # if: no lib > "UNKNOWN [<lib> +<loffset>]\n" # if: no fn > "<fn>+<foffset> [<lib> +<loffset>]\n" # else NS_FormatCodeAddressDetails(), other: > (error) This is madness. I plan to unify these five variants. FIXING ------ Printed stacks are often "fixed" to add extra information that's not obtainable in the above functions. Specifically: - fix_stacks_using_bpsyms.py is used on tinderbox machines; all mochitest output goes through it. - fix_{linux,macosx}_stacks.py are used on local machines for mochitest, and also for DMD output. - fix_b2g_stacks.py is used for DMD output on B2G. These scripts expect the output to have a particular form, like this: > <stuff1>[<lib> +<offset>]<stuff2> The contents of <stuff1> is restricted, but exactly what's allowed is hard to determine, because it depends on two complex regexps in each of these scripts. This form isn't particularly easy to parse, and it's causing problems for bug 1044709 where I need to put stack entries in JSON strings. So I want to change it to an easier-to-parse form. The above-mentioned unification of the formatting functions will make this easier.
Assignee | ||
Comment 1•10 years ago
|
||
This patch unifies all formatting functions into the single following variant.
> "<fn|???> (<file>:<line>) <pc>" # if: <file>
> "FIX[fn=<fn|???> lib=<lib> offset=<loffset>] <pc>" # else if: <lib>
> "??? <pc>" # else
Things to note:
- Case 1 is the best case: we don't need to do any post-processing.
- Case 2 indicates that post-processing is needed. It's in a form that's much
more obvious and easier to parse.
- Case 3 is the "no useful information" case.
- The <pc> is always on the end now. It wasn't on most of the
NS_FormatCodeAddressDetails() cases previously. It seems useful.
- <foffset> is never printed. It doesn't seem useful.
- "(<file>:<line>) is always used for source locations, rather than "(<file>,
line <line>).
- "???" is used consistently for unknown things. No more "??" and "UNKNOWN".
- There's no newline. Some callers don't want it, and callers who need it can
easily add it.
The patch modifies all callers of the formatting functions accordingly.
The patch also modifies the fix scripts accordingly. fix_b2g_stacks.py is in
the B2G repo and will need to be modified separately, and I'll need to
co-ordinate the landings to avoid breaking DMD on B2G, alas.
Attachment #8483951 -
Flags: review?(dbaron)
Comment 2•10 years ago
|
||
Nice! I made the Windows stack printing match other platforms more closely a while ago, it's all a mess in there. Just for reference, Breakpad's minidump_stackwalk has a similar stack printing routine: https://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/minidump_stackwalk.cc#149 (This is what gets used when we print stack frames from minidumps on tinderbox.)
Assignee | ||
Comment 3•10 years ago
|
||
> Just for reference, Breakpad's minidump_stackwalk has a similar stack
> printing routine:
> https://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/
> minidump_stackwalk.cc#149
>
> (This is what gets used when we print stack frames from minidumps on
> tinderbox.)
Those stack frames don't require post-processing by the fix*.py scripts, right? AFAICT, even prior to this change they aren't in a form that the fix*.py scripts would match.
Assignee | ||
Comment 4•10 years ago
|
||
Attachment #8484769 -
Flags: review?(dbaron)
Assignee | ||
Comment 5•10 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #4) > Created attachment 8484769 [details] [review] > Link to GitHub PR: https://github.com/mozilla-b2g/B2G/pull/384 I created this attachment manually; not sure if there's an automated way to do this kind of review request.
Comment 6•10 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #3) > Those stack frames don't require post-processing by the fix*.py scripts, > right? AFAICT, even prior to this change they aren't in a form that the > fix*.py scripts would match. Right, that was just for reference as "here's another way we print stack frames that wind up in test logs".
Assignee | ||
Comment 7•10 years ago
|
||
> "FIX[fn=<fn|???> lib=<lib> offset=<loffset>] <pc>" # else if: <lib> erahm has said he finds this ugly. On Linux, you usually don't have <fn> so any line in this form is practically useless -- "oh, it's somewhere in libxul" -- and so post-processing is practically mandatory. On Mac, however, having a <fn> is common, so post-processing isn't always necessary to get something useful. Here's a Mac example: > nsAString_internal::SetCapacity(unsigned int, mozilla::fallible_t const&)[/Users/njn/moz/mi1/d64dmd/dist/NightlyDebug.app/Contents/MacOS/XUL +0x33277] 0x102d0d277 The constraints on the final form are: 1. Need to allow almost arbitrary stuff beforehand, in a way that is easy to parse. 2. <lib> and <loffset> are highly constrained, but <fn> can be C++ type signatures, and so can contain chars like ' ', ':', '*', '&', '(, ')'. So whatever separates <fn> and <lib> must involve chars that can't appear in <fn>. The FIX form has six chars at the front ("FIX[fn") that satisfy (1), and the "lib=" satisfies (2). If we go with a form similar to the existing forms: > "<fn|???> [<lib> +<loffset>]" then the dividing line between any preceding stuff and <fn> (and possibly even <lib>) is unclear.
Comment 8•10 years ago
|
||
Is there a reason you don't want to put the function name at the end of the line? lib+offset fn [file+line]
Assignee | ||
Comment 9•10 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #8) > Is there a reason you don't want to put the function name at the end of the > line? > > lib+offset fn [file+line] That'll make the unfixed Mac case harder to read, because it puts the interesting part (<fn>) nearer to the end. Also, "<lib>+<offset>" doesn't distinguish itself clearly from "arbitrary preceding stuff".
Comment 10•10 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #7) > If we go with a form similar to the existing forms: > > > "<fn|???> [<lib> +<loffset>]" > > then the dividing line between any preceding stuff and <fn> (and possibly > even <lib>) is unclear. Perhaps just a very explicit prefix would suffice. Something like: frame 0: [Foo::Bar(int baz) lib=libxul+0xABCD1234 pc=0x12300000] frame 1: [nsBlah::Foo() /path/to/file/nsBlah.cpp+345 pc=0x12300002] ... frame 23: [??? lib=libxul+0x1234ABCD pc=0x12300004] This provides for my proposed constraint 3: Format should be consistent across frames, function name always first So now we can determine if a line has a stack frame that could use fixing by using basic regex like "frame\s+[:digit:]+:\s\[([^\]])*\] I guess if we always have a pc= at the end then you can get away with not having the brackets: frame 0: Foo::Bar(int baz) lib=libxul+0xABCD1234 pc=0x12300000 frame 1: nsBlah::Foo() /path/to/file/nsBlah.cpp+345 pc=0x12300002 ... frame 23: ??? lib=libxul+0x1234ABCD pc=0x12300004 is_stack_frame_regex = "^.*frame\s.*pc=-0x[:hex:]+.*$"
Assignee | ||
Comment 11•10 years ago
|
||
Frame number prefixes are a good idea. We could even do it like GDB and use #0, #1, etc. That would allow the following:
> "#<n>: <fn|???> (<file>:<line>) <pc>" # if: <file>
> "#<n>: <fn|???> [<lib> +<loffset>] <pc>" # else if: <lib>
> "#<n>: ??? (???:???) <pc>" # else
Which has the following properties:
- Highly readable even in the non-fixed form.
- More similar to what we currently produce.
- Still pretty easy to parse with a regex. (What comes before cannot match /#\d+:/, which is sufficiently unlikely that I'm happy with it.)
- The 1st and 3rd cases have the same form, which is nice.
Assignee | ||
Comment 12•10 years ago
|
||
I'm going to do it like comment 11, except with the <pc>, because erahm and I decided it doesn't add much.
Assignee | ||
Comment 13•10 years ago
|
||
Attachment #8487686 -
Flags: review?(dbaron)
Assignee | ||
Comment 14•10 years ago
|
||
Attachment #8487687 -
Flags: review?(dbaron)
Assignee | ||
Updated•10 years ago
|
Attachment #8483951 -
Attachment is obsolete: true
Attachment #8483951 -
Flags: review?(dbaron)
Assignee | ||
Comment 15•10 years ago
|
||
dbaron, I updated the GitHub patch so it's also ready for review. Thanks.
Assignee | ||
Comment 16•10 years ago
|
||
Here's some example output:
> #00: replace_malloc[cd64dmd/dist/lib/libdmd.so +0x30ee]
> #01: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x2c632a6]
> #02: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x11e27df]
> #03: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x11e2c1f]
> #04: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x11e7701]
> #05: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x11e69b8]
> #06: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x11e6f50]
> #07: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x11e73b2]
> #08: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x1efb146]
> #09: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x1efadb3]
> #10: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x1f398d5]
> #11: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x16aa877]
> #12: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x1adfe67]
I'm ambivalent about the 0-padding of single digit numbers -- it keeps everything at the same indent level, which is nice, but the extra zeroes are ugly. Not sure what's best.
Comment 17•10 years ago
|
||
That looks good to me, I guess you could just use spaces. I'm reasonably ambivalent as well, so whatever is easiest is probably fine. FWIW lldb doesn't pad, not sure about gdb. > #0: replace_malloc[cd64dmd/dist/lib/libdmd.so +0x30ee] > #9: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x1efadb3] > #10: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x1f398d5] > #11: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x16aa877] > #12: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x1adfe67] or > #0: replace_malloc[cd64dmd/dist/lib/libdmd.so +0x30ee] > #9: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x1efadb3] > #10: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x1f398d5] > #11: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x16aa877] > #12: ???[/home/njn/moz/mi7/cd64dmd/dist/bin/libxul.so +0x1adfe67]
Assignee | ||
Comment 18•10 years ago
|
||
BTW, the frame numbers are quite useful in mochitest output, where stack frames are printed among lots of other output -- it makes it much easier to see where the stack traces start and end.
Assignee | ||
Comment 19•10 years ago
|
||
dbaron: 12 day review ping.
Assignee | ||
Comment 20•10 years ago
|
||
Updated patch that fixes some small problems found on the try server.
Attachment #8491106 -
Flags: review?(dbaron)
Assignee | ||
Updated•10 years ago
|
Attachment #8487686 -
Attachment is obsolete: true
Attachment #8487686 -
Flags: review?(dbaron)
Assignee | ||
Comment 21•10 years ago
|
||
Updated patch that fixes some small problems found on the try server.
Attachment #8491108 -
Flags: review?(dbaron)
Assignee | ||
Updated•10 years ago
|
Attachment #8487687 -
Attachment is obsolete: true
Attachment #8487687 -
Flags: review?(dbaron)
Comment on attachment 8491106 [details] [diff] [review] (part 1) - Add a frame number argument to NS_WalkStackCallback TableTicker.cpp: >+ StackWalkCallback(/* frameNumber */ 0, aSample->pc, aSample->sp, &nativeStack); It almost seems like this should be -1 given that the code following will send 0, I think. Although I suppose the argument is a uint32_t. (Should it be?) nsStackWalk.h: >+// aFrameNumber is the frame number. aSP will be the best approximation >+// possible of what the stack pointer will be pointing to when the execution >+// returns to executing that at that PC. If no approximation can be made it >+// will be nullptr. Perhaps better to make the description of each param described be its own paragraph (i.e., don't rewrap the old stuff)? r=dbaron
Attachment #8491106 -
Flags: review?(dbaron) → review+
So I'm a bit confused by some of the comments in patch 2? Is this changing what the fix_*_stack scripts fix up (i.e., both poor function names and adding file:line info)? And is there such a thing as a "FIX[...]" entry, or is that now "???[...]"? Does the inclusion of frame numbers break tools that want to invert stacks? In particular, make-tree.pl has the ability to build stack trees rooted either at main or at AddRef/Release, and both are useful. (It's the default case, rooted at main, that would be at risk here, rather than --reverse, I think.) I think make-tree.pl is still a pretty important tool for debugging reference count leaks. Sorry for taking so long to get to this; it was one of the large items in the pile of reviews when I got back from vacation.
Flags: needinfo?(n.nethercote)
Assignee | ||
Comment 24•10 years ago
|
||
> TableTicker.cpp:
>
> >+ StackWalkCallback(/* frameNumber */ 0, aSample->pc, aSample->sp, &nativeStack);
>
> It almost seems like this should be -1 given that the code following
> will send 0, I think. Although I suppose the argument is a uint32_t.
> (Should it be?)
That is definitely a weird one. The good news is that StackWalkCallback() ignores the frame number argument, so it doesn't really matter what value we pass in. I'll add a comment about this.
Flags: needinfo?(n.nethercote)
Assignee | ||
Comment 25•10 years ago
|
||
> So I'm a bit confused by some of the comments in patch 2? Is this changing > what the fix_*_stack scripts fix up (i.e., both poor function names and > adding file:line info)? And is there such a thing as a "FIX[...]" entry, or > is that now "???[...]"? In short, it: - Makes the output of our C++ stack-walking code more consistent (e.g. no variation across platforms), and easier to parse. - Changes the fix*.py scripts to handle the new output form. Furthermore, there are three variants in the output form. If we have a filename, it'll be like this: > "#01: foo (Foo.cpp:123) Otherwise, if we have a library name it'll be like this: > "#01: foo[lib +0x1234] (In my original patch this variant used the "FIX" form, but no longer because it was ugly.) Otherwise, it'll be like this: > "#01: ??? (???:???) which is basically the same as the first case. The fix*.py scripts only have to deal with the second case. > Does the inclusion of frame numbers break tools that want to invert stacks? > In particular, make-tree.pl has the ability to build stack trees rooted > either at main or at AddRef/Release, and both are useful. (It's the default > case, rooted at main, that would be at risk here, rather than --reverse, I > think.) I think make-tree.pl is still a pretty important tool for debugging > reference count leaks. I don't know anything about make-tree.pl. I'll give it a run and see if it's affected.
Assignee | ||
Comment 26•10 years ago
|
||
Bleh, the frame number does break make-tree.pl. Here's sample output with the frame numbers present: > n/a .root > 1 #09: ??? (???:???) > | 1 #08: _start (cd64r/dist/bin/firefox) > | 1 #07: __libc_start_main (/build/buildd/eglibc-2.19/csu/libc-start.c:321) > | 1 #06: main (/home/njn/moz/mi7/cd64r/browser/app/../../../browser/app/nsBrowserApp.cpp:612) > | 1 #05: InitXPCOMGlue(char const*, nsIFile**) (/home/njn/moz/mi7/cd64r/browser/app/../../../browser/app/nsBrowserApp.cpp:571) > | 1 #04: NS_NewNativeLocalFile (/home/njn/moz/mi8/cd64r/xpcom/glue/standalone/../../../../xpcom/glue/standalone/nsXPCOMGlue.cpp:575) > | 1 #03: NS_NewNativeLocalFile (/home/njn/moz/mi8/cd64r/xpcom/io/../../../xpcom/io/nsLocalFileUnix.cpp:2081) > | 1 #02: nsRefPtr (/home/njn/moz/mi8/cd64r/xpcom/io/../../dist/include/nsAutoPtr.h:887) > | 1 #01: nsRefPtr (/home/njn/moz/mi8/cd64r/xpcom/io/../../dist/include/nsAutoPtr.h:885) > | 1 #00: nsLocalFile::AddRef() (/home/njn/moz/mi8/cd64r/xpcom/io/../../../xpcom/io/nsLocalFileUnix.cpp:244) > 1 #10: ??? (???:???) > | 1 #09: _start (cd64r/dist/bin/firefox) > | 1 #08: __libc_start_main (/build/buildd/eglibc-2.19/csu/libc-start.c:321) > | 1 #07: main (/home/njn/moz/mi7/cd64r/browser/app/../../../browser/app/nsBrowserApp.cpp:643) > | 1 #06: do_main(int, char**, nsIFile*) (/home/njn/moz/mi7/cd64r/browser/app/../../../browser/app/nsBrowserApp.cpp:282) > | 1 #05: XRE_main (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsAppRunner.cpp:4439) > | 1 #04: XREMain::XRE_main(int, char**, nsXREAppData const*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsAppRunner.cpp:4183) > | 1 #03: ScopedAppData (/home/njn/moz/mi7/xpcom/glue/AppData.cpp:56) > | 1 #02: void mozilla::SetStrongPtr<nsIFile>(nsIFile*&, nsIFile*) (/home/njn/moz/mi7/cd64r/xpcom/build/../../dist/include/mozilla/AppData.h:59) > | 1 #01: void ns_if_addref<nsIFile*>(nsIFile*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsISupportsUtils.h:46) > | 1 #00: nsLocalFile::AddRef() (/home/njn/moz/mi8/cd64r/xpcom/io/../../../xpcom/io/nsLocalFileUnix.cpp:244) > 1 #11: ??? (???:???) > | 1 #10: _start (cd64r/dist/bin/firefox) > | 1 #09: __libc_start_main (/build/buildd/eglibc-2.19/csu/libc-start.c:321) > | 1 #08: main (/home/njn/moz/mi7/cd64r/browser/app/../../../browser/app/nsBrowserApp.cpp:643) > | 1 #07: do_main(int, char**, nsIFile*) (/home/njn/moz/mi7/cd64r/browser/app/../../../browser/app/nsBrowserApp.cpp:282) > | 1 #06: XRE_main (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsAppRunner.cpp:4439) > | 1 #05: XREMain::XRE_main(int, char**, nsXREAppData const*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsAppRunner.cpp:4205) > | 1 #04: XREMain::XRE_mainInit(bool*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsAppRunner.cpp:3137) > | 1 #03: nsXREDirProvider::Initialize(nsIFile*, nsIFile*, nsIDirectoryServiceProvider*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsXREDirProvider.cpp:107) > | 1 #02: nsCOMPtr<nsIFile>::operator=(nsIFile*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:678) > | 1 #01: nsCOMPtr<nsIFile>::assign_with_AddRef(nsISupports*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:1159) > | 1 #00: nsLocalFile::AddRef() (/home/njn/moz/mi8/cd64r/xpcom/io/../../../xpcom/io/nsLocalFileUnix.cpp:244) > -1 #13: ??? (???:???) > | -1 #12: _start (cd64r/dist/bin/firefox) > | -1 #11: __libc_start_main (/build/buildd/eglibc-2.19/csu/libc-start.c:321) > | -1 #10: main (/home/njn/moz/mi7/cd64r/browser/app/../../../browser/app/nsBrowserApp.cpp:643) > | -1 #09: do_main(int, char**, nsIFile*) (/home/njn/moz/mi7/cd64r/browser/app/../../../browser/app/nsBrowserApp.cpp:282) > | -1 #08: XRE_main (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsAppRunner.cpp:4439) > | -1 #07: XREMain::XRE_main(int, char**, nsXREAppData const*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsAppRunner.cpp:4205) > | -1 #06: XREMain::XRE_mainInit(bool*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsAppRunner.cpp:3137) > | -1 #05: nsXREDirProvider::Initialize(nsIFile*, nsIFile*, nsIDirectoryServiceProvider*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsXREDirProvider.cpp:107) > | -1 #04: nsCOMPtr<nsIFile>::operator=(nsIFile*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:679) > | -1 #03: nsCOMPtr<nsIFile>::Assert_NoQueryNeeded() (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:534) > | -1 #02: ~nsCOMPtr (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:524) > | -1 #01: ~nsCOMPtr (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:522) > | -1 #00: nsLocalFile::Release() (/home/njn/moz/mi8/cd64r/xpcom/io/../../../xpcom/io/nsLocalFileUnix.cpp:244) > 1 #17: ??? (???:???) > 1 #16: _start (cd64r/dist/bin/firefox) > 1 #15: __libc_start_main (/build/buildd/eglibc-2.19/csu/libc-start.c:321) > 1 #14: main (/home/njn/moz/mi7/cd64r/browser/app/../../../browser/app/nsBrowserApp.cpp:643) > 1 #13: do_main(int, char**, nsIFile*) (/home/njn/moz/mi7/cd64r/browser/app/../../../browser/app/nsBrowserApp.cpp:282) > 1 #12: XRE_main (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsAppRunner.cpp:4439) > 1 #11: XREMain::XRE_main(int, char**, nsXREAppData const*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsAppRunner.cpp:4205) > 1 #10: XREMain::XRE_mainInit(bool*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsAppRunner.cpp:3137) > 1 #09: nsXREDirProvider::Initialize(nsIFile*, nsIFile*, nsIDirectoryServiceProvider*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsXREDirProvider.cpp:107) > 1 #08: nsCOMPtr<nsIFile>::operator=(nsIFile*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:679) > 1 #07: nsCOMPtr<nsIFile>::Assert_NoQueryNeeded() (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:531) > 1 #06: nsCOMPtr (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:614) > 1 #05: nsCOMPtr (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:614) > 1 #04: nsCOMPtr<nsIFile>::assign_from_qi(nsQueryInterface, nsID const&) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:1169) > 1 #03: nsQueryInterface::operator()(nsID const&, void**) const (/home/njn/moz/mi7/xpcom/glue/nsCOMPtr.cpp:14) > 1 #02: nsLocalFile::QueryInterface(nsID const&, void**) (/home/njn/moz/mi8/cd64r/xpcom/io/../../../xpcom/io/nsLocalFileUnix.cpp:244) > 1 #01: NS_TableDrivenQI(void*, nsID const&, void**, QITableEntry const*) (/home/njn/moz/mi7/xpcom/glue/nsISupportsImpl.cpp:18) > 1 #00: nsLocalFile::AddRef() (/home/njn/moz/mi8/cd64r/xpcom/io/../../../xpcom/io/nsLocalFileUnix.cpp:244) And the equivalent output with all the frame numbers converted to "#00": > n/a .root > 3 #00: ??? (???:???) > 3 #00: _start (cd64r/dist/bin/firefox) > 3 #00: __libc_start_main (/build/buildd/eglibc-2.19/csu/libc-start.c:321) > 1 #00: main (/home/njn/moz/mi7/cd64r/browser/app/../../../browser/app/nsBrowserApp.cpp:612) > | 1 #00: InitXPCOMGlue(char const*, nsIFile**) (/home/njn/moz/mi7/cd64r/browser/app/../../../browser/app/nsBrowserApp.cpp:571) > | 1 #00: NS_NewNativeLocalFile (/home/njn/moz/mi8/cd64r/xpcom/glue/standalone/../../../../xpcom/glue/standalone/nsXPCOMGlue.cpp:575) > | 1 #00: NS_NewNativeLocalFile (/home/njn/moz/mi8/cd64r/xpcom/io/../../../xpcom/io/nsLocalFileUnix.cpp:2081) > | 1 #00: nsRefPtr (/home/njn/moz/mi8/cd64r/xpcom/io/../../dist/include/nsAutoPtr.h:887) > | 1 #00: nsRefPtr (/home/njn/moz/mi8/cd64r/xpcom/io/../../dist/include/nsAutoPtr.h:885) > | 1 #00: nsLocalFile::AddRef() (/home/njn/moz/mi8/cd64r/xpcom/io/../../../xpcom/io/nsLocalFileUnix.cpp:244) > 2 #00: main (/home/njn/moz/mi7/cd64r/browser/app/../../../browser/app/nsBrowserApp.cpp:643) > 2 #00: do_main(int, char**, nsIFile*) (/home/njn/moz/mi7/cd64r/browser/app/../../../browser/app/nsBrowserApp.cpp:282) > 2 #00: XRE_main (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsAppRunner.cpp:4439) > 1 #00: XREMain::XRE_main(int, char**, nsXREAppData const*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsAppRunner.cpp:4183) > | 1 #00: ScopedAppData (/home/njn/moz/mi7/xpcom/glue/AppData.cpp:56) > | 1 #00: void mozilla::SetStrongPtr<nsIFile>(nsIFile*&, nsIFile*) (/home/njn/moz/mi7/cd64r/xpcom/build/../../dist/include/mozilla/AppData.h:59) > | 1 #00: void ns_if_addref<nsIFile*>(nsIFile*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsISupportsUtils.h:46) > | 1 #00: nsLocalFile::AddRef() (/home/njn/moz/mi8/cd64r/xpcom/io/../../../xpcom/io/nsLocalFileUnix.cpp:244) > 1 #00: XREMain::XRE_main(int, char**, nsXREAppData const*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsAppRunner.cpp:4205) > 1 #00: XREMain::XRE_mainInit(bool*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsAppRunner.cpp:3137) > 1 #00: nsXREDirProvider::Initialize(nsIFile*, nsIFile*, nsIDirectoryServiceProvider*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../../toolkit/xre/nsXREDirProvider.cpp:107) > 1 #00: nsCOMPtr<nsIFile>::operator=(nsIFile*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:678) > | 1 #00: nsCOMPtr<nsIFile>::assign_with_AddRef(nsISupports*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:1159) > | 1 #00: nsLocalFile::AddRef() (/home/njn/moz/mi8/cd64r/xpcom/io/../../../xpcom/io/nsLocalFileUnix.cpp:244) > 0 #00: nsCOMPtr<nsIFile>::operator=(nsIFile*) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:679) > 1 #00: nsCOMPtr<nsIFile>::Assert_NoQueryNeeded() (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:531) > | 1 #00: nsCOMPtr (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:614) > | 1 #00: nsCOMPtr (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:614) > | 1 #00: nsCOMPtr<nsIFile>::assign_from_qi(nsQueryInterface, nsID const&) (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:1169) > | 1 #00: nsQueryInterface::operator()(nsID const&, void**) const (/home/njn/moz/mi7/xpcom/glue/nsCOMPtr.cpp:14) > | 1 #00: nsLocalFile::QueryInterface(nsID const&, void**) (/home/njn/moz/mi8/cd64r/xpcom/io/../../../xpcom/io/nsLocalFileUnix.cpp:244) > | 1 #00: NS_TableDrivenQI(void*, nsID const&, void**, QITableEntry const*) (/home/njn/moz/mi7/xpcom/glue/nsISupportsImpl.cpp:18) > | 1 #00: nsLocalFile::AddRef() (/home/njn/moz/mi8/cd64r/xpcom/io/../../../xpcom/io/nsLocalFileUnix.cpp:244) > -1 #00: nsCOMPtr<nsIFile>::Assert_NoQueryNeeded() (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:534) > -1 #00: ~nsCOMPtr (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:524) > -1 #00: ~nsCOMPtr (/home/njn/moz/mi7/cd64r/toolkit/xre/../../dist/include/nsCOMPtr.h:522) > -1 #00: nsLocalFile::Release() (/home/njn/moz/mi8/cd64r/xpcom/io/../../../xpcom/io/nsLocalFileUnix.cpp:244) Bah. I guess I'll have to remove the frame numbers and find another way to proceed.
(In reply to Nicholas Nethercote [:njn] from comment #26) > Bah. I guess I'll have to remove the frame numbers and find another way to > proceed. An alternative would be to fix make-tree.pl to remove the frame number
(In reply to David Baron [:dbaron] (UTC-7) (needinfo? for questions) from comment #27) > An alternative would be to fix make-tree.pl to remove the frame number Although if you do that, you still need to make the fix_*_stack scripts handle fixup of stack frames that have the frame number removed.
Comment on attachment 8491108 [details] [diff] [review] (part 2) - Clean up stack printing and fixing Other comments on the patch in addition to the 2 previous comments: security/sandbox/win/src/warnonlysandbox/wosCallbacks.h: need to add a "\n" tools/rb/fix_linux_stack.py: I think the comment at the top was useful; why not just replace the first sentence with your replacement, and leave the rest (but replacing UNKNOWN with ???)? You either need to patch fix-linux-stack.pl or get automation converted to using fix_linux_stack.py before landing this. Or did that happen while I wasn't looking? tools/rb/fix_macosx_stack.py: Your comment refers to the "FIX[...]" entries that are no longer present. Same comment as fix_linux_stack.py about the comment as well, although I'm a bit less confident that it's correct for mac. tools/rb/fix_stack_using_bpsyms.py: again, out of date "FIX[...]" comment review- because I'd like to see what you decide about the make-tree issue
Attachment #8491108 -
Flags: review?(dbaron) → review-
Assignee | ||
Comment 30•10 years ago
|
||
> You either need to patch fix-linux-stack.pl or get automation converted > to using fix_linux_stack.py before landing this. Or did that happen while I > wasn't looking? Already done! :) Bug 1050601. > review- because I'd like to see what you decide about the make-tree issue I just talked to erahm about this. I will try to keep the frame numbers in and modify make-tree.pl. One possibility is to just modify the numbers to all be "#00" or something; clunky, but it would work. Another possibility is to require that fixing occur before running make-tree.pl. I know that the make-tree.pl output is a lot smaller, but that's less of an issue now that the fix*.py scripts have caches in them that make them much faster on files with repeated stack frames. Anyway, not sure if that'll be necessary. Hopefully there aren't any other tools in the tree that silently rely on stack frames having a particular form...
(In reply to Nicholas Nethercote [:njn] from comment #30) > I just talked to erahm about this. I will try to keep the frame numbers in > and modify make-tree.pl. One possibility is to just modify the numbers to > all be "#00" or something; clunky, but it would work. Yeah, that sounds fine (done in make-tree.pl, I presume, and with output that would then be parseable by fix_*_stack).
Assignee | ||
Comment 32•10 years ago
|
||
This fixes make-tree.pl to replace stack frame numbers with 0. I wonder if I should make the stack traces start at one instead of zero. It would allow tools that need to strip out the frame number (which includes make-tree.pl and DMD) to use #00 to indicate "no frame number". We could also use 0 for that weird case in TableTicker.cpp. Plus, counting from 1 is just nicer for this sort of thing, IMO. dbaron, what do you think?
Attachment #8494910 -
Flags: review?(dbaron)
Assignee | ||
Updated•10 years ago
|
Attachment #8491106 -
Attachment is obsolete: true
Assignee | ||
Comment 33•10 years ago
|
||
Updated version with comments addressed. The previous patch fixed the make-tree.pl issue, so this version is only slightly different to the previous one. I tweaked the comments at the top of the fix*.py scripts a little. But the old comment in fix_linux_stack.py was overly specific and thus misleading. Because the output really comes from NS_FormatCodeAddress(). That function is called by NS_StackWalk(), which is called by nsTraceRefcnt::WalkTheStack() but also several other functions.
Attachment #8494913 -
Flags: review?(dbaron)
Assignee | ||
Updated•10 years ago
|
Attachment #8491108 -
Attachment is obsolete: true
Comment on attachment 8494910 [details] [diff] [review] (part 1) - Add a frame number argument to NS_WalkStackCallback >diff --git a/tools/rb/make-tree.pl b/tools/rb/make-tree.pl >+ $_ =~ s/#\d\d: /#00: /; # replace frame number with 0 This really belongs in patch #2. You should allow more than 2 digits, since frame numbers could go over 100. Probably just "\d+". r=dbaron with that. (There really wasn't any need to re-request review on patch 1, particularly if the make-tree.pl change were in patch 2.)
Attachment #8494910 -
Flags: review?(dbaron) → review+
(In reply to Nicholas Nethercote [:njn] from comment #32) > I wonder if I should make the stack traces start at one instead of zero. It > would allow tools that need to strip out the frame number (which includes > make-tree.pl and DMD) to use #00 to indicate "no frame number". We could also > use 0 for that weird case in TableTicker.cpp. Plus, counting from 1 is just > nicer for this sort of thing, IMO. dbaron, what do you think? I don't have a strong feeling, though this sounds kinda nice.
(In reply to Nicholas Nethercote [:njn] from comment #33) > I tweaked the comments at the top of the fix*.py scripts a little. But the > old > comment in fix_linux_stack.py was overly specific and thus misleading. > Because > the output really comes from NS_FormatCodeAddress(). That function is called > by > NS_StackWalk(), which is called by nsTraceRefcnt::WalkTheStack() but also > several other functions. FWIW, the code actually lived in nsTraceRefcnt::WalkTheStack prior to bug 189221 (after which it was moved again in bug 374689).
Comment on attachment 8494913 [details] [diff] [review] (part 2) - Clean up stack printing and fixing r=dbaron, though perhaps the comment for fix_stack_using_bpsyms.py should mention that the symbol is often bad on Linux
Attachment #8494913 -
Flags: review?(dbaron) → review+
Comment on attachment 8484769 [details] [review] Link to GitHub PR: https://github.com/mozilla-b2g/B2G/pull/384 r=dbaron, although the xpcom/base/nsStackWalk.h comment in patch 2 should probably also mention this file. (One thing I'm having second thoughts about is whether the fix_*_stack.py scripts should remove the #0, #1; I was initially thinking they did, but now I notice they don't. I guess it's probably ok to keep it, though.)
Attachment #8484769 -
Flags: review?(dbaron) → review+
Assignee | ||
Comment 39•10 years ago
|
||
Comment on attachment 8484769 [details] [review] Link to GitHub PR: https://github.com/mozilla-b2g/B2G/pull/384 I updated fix_b2g_stack.py to handle both the old and the new formats, per erahm's request.
Attachment #8484769 -
Flags: review?(erahm)
Assignee | ||
Comment 40•10 years ago
|
||
> I updated fix_b2g_stack.py to handle both the old and the new formats, per
> erahm's request.
And I've tested locally that it works with both formats.
Assignee | ||
Comment 41•10 years ago
|
||
Try looks good: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=7d597785ffc5
Updated•10 years ago
|
Attachment #8484769 -
Flags: review?(erahm) → review+
Assignee | ||
Comment 42•10 years ago
|
||
https://hg.mozilla.org/integration/b2g-inbound/rev/a0b82c954206 https://hg.mozilla.org/integration/b2g-inbound/rev/7a1b7d7eba12
Assignee | ||
Comment 43•10 years ago
|
||
https://github.com/mozilla-b2g/B2G/commit/ab86d8c76e59a9edb43dad72c7c996b293843b83
Comment 44•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/a0b82c954206 https://hg.mozilla.org/mozilla-central/rev/7a1b7d7eba12
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Comment 45•10 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #44) > https://hg.mozilla.org/mozilla-central/rev/a0b82c954206 > https://hg.mozilla.org/mozilla-central/rev/7a1b7d7eba12 Unfortunately I've had to revert these two for causing failures on debug Windows mochitest-5: https://treeherder.mozilla.org/ui/logviewer.html#?job_id=443886&repo=mozilla-central https://treeherder.mozilla.org/ui/logviewer.html#?job_id=443936&repo=mozilla-central https://treeherder.mozilla.org/ui/logviewer.html#?job_id=443937&repo=mozilla-central remote: https://hg.mozilla.org/mozilla-central/rev/c0f5f5f80e66 remote: https://hg.mozilla.org/mozilla-central/rev/388e101e75c8
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 46•10 years ago
|
||
> ValueError: invalid literal for int() with base 16: '0x--DOMWINDOW == 36 (10B40A00) [pid = 3756] [serial = 4583] [outer = 0A028E00] [url = about:blank' For that to happen, there must be some logging lines being spliced together, something like: > #91: nsTimerEvent::Run() [libxul.so 0x--DOMWINDOW == 36 (10B40A00) [pid = 3756] [serial = 4583] [outer = 0A028E00] [url = about:blank' That's really weird, but I can tighten up the regexp to avoid franken-frames like this from matching.
Comment 47•10 years ago
|
||
This is kind of scope creep, but you could also try doing what is done for --DOMWINDOW: http://mxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#1291 I think the printf_stderr somehow prints without getting spliced up like that? https://bugzilla.mozilla.org/show_bug.cgi?id=934091#c14 Though I see that was based on something you originally suggested. ;)
Assignee | ||
Comment 48•10 years ago
|
||
Hmm, PrintStackFrame() in toolkit/xre/nsSigHandlers.cpp prints its stacks to stdout. I bet that's the problem.
Assignee | ||
Comment 49•10 years ago
|
||
The diagnosis in comment 46 and comment 48 are correct -- some output lines are being spliced together. I did 72 runs of Mochi-5 with the patches applied and got 10 failures. That's a failure rate of ~14%. The problem is that I was lazy and just used `0x.+` in the regexp matching library addresses. I tried changing it to `0x[0-9A-Fa-f]+` and this fixed the problem because the spliced lines no longer match. I did 45 runs of Mochi-5 with this change and they all passed. Here's an example of the splicing: > 12:38:58 INFO - #90: nsTimerImpl::Fire() [xpcom/threads/nsTimerImpl.cpp:618] > 12:38:58 INFO - #91: nsTimerEvent::Run() [xpcom/threads/nsTimerImpl.cpp:716] > 12:38:59 INFO - #92: XRE_AddJarManifestLocation[xul +0x--DOCSHELL 0FDB2800 == 15 [pid = 1948] [id = 548] > 12:38:59 INFO - --DOCSHELL 0FBAD000 == 14 [pid = 1948] [id = 552] > 12:38:59 INFO - --DOCSHELL 0F2DD400 == 13 [pid = 1948] [id = 551] > 12:38:59 INFO - --DOCSHELL 0E603000 == 12 [pid = 1948] [id = 550] > 12:38:59 INFO - --DOCSHELL 0C034800 == 11 [pid = 1948] [id = 562] > 12:38:59 INFO - --DOCSHELL 0FD48800 == 10 [pid = 1948] [id = 565] > 12:38:59 INFO - --DOCSHELL 10641000 == 9 [pid = 1948] [id = 564] > 12:38:59 INFO - --DOCSHELL 0F46A800 == 8 [pid = 1948] [id = 563] > 12:38:59 INFO - --DOCSHELL 106CEC00 == 7 [pid = 1948] [id = 560] > 12:38:59 INFO - --DOCSHELL 0C037800 == 6 [pid = 1948] [id = 559] > 12:38:59 INFO - 96307] > 12:38:59 INFO - #93: NS_ProcessNextEvent(nsIThread *,bool) [xpcom/glue/nsThreadUtils.cpp:265] > 12:38:59 INFO - #94: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) [ipc/glue/MessagePump.cpp:99] So that's good, but it would be nice to fix the root cause, which is the splicing. So I also tried adding flush() calls to all the places that print stack traces. I did another 45 runs of Mochi-5 with this in place (but without the regexp fix) and all of them passed. So both changes in isolation avoid the problem. But for thoroughness, I will land both of them.
Comment 50•10 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #49) > So that's good, but it would be nice to fix the root cause, which is the > splicing. So I also tried adding flush() calls to all the places that print > stack traces. I did another 45 runs of Mochi-5 with this in place (but > without the regexp fix) and all of them passed. Thank you for fixing the root cause too! :-)
Assignee | ||
Comment 51•10 years ago
|
||
Re-landed with the regexp fixes and additional flush calls: https://hg.mozilla.org/integration/mozilla-inbound/rev/4ca3b16e24bd https://hg.mozilla.org/integration/mozilla-inbound/rev/0b61bbe1dda7
Assignee | ||
Comment 52•10 years ago
|
||
Assignee | ||
Updated•10 years ago
|
Attachment #8500188 -
Attachment description: Link to: https://github.com/mozilla-b2g/B2G/pull/393 → Link to GitHub PR: https://github.com/mozilla-b2g/B2G/pull/393
Assignee | ||
Updated•10 years ago
|
Attachment #8500188 -
Flags: review?(erahm)
Comment 53•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/4ca3b16e24bd https://hg.mozilla.org/mozilla-central/rev/0b61bbe1dda7
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Comment 54•10 years ago
|
||
FYI the root cause of the log-mingling is known, bug 798300. It's just hard to fix because we have to work around Windows' incredibly stupid buffering behavior.
Comment 55•10 years ago
|
||
Comment on attachment 8500188 [details] [review] Link to GitHub PR: https://github.com/mozilla-b2g/B2G/pull/393 https://github.com/mozilla-b2g/B2G/commit/42f23cac4c24e2b4694245759a735d4ddb1137ec
Attachment #8500188 -
Flags: review?(erahm) → review+
Assignee | ||
Comment 56•10 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #54) > FYI the root cause of the log-mingling is known, bug 798300. It's just hard > to fix because we have to work around Windows' incredibly stupid buffering > behavior. It's easy to fix by adding explicit flushes in the cases that might write to stdout. I did exactly that in this patch and it fixed at least some of the mingling.
You need to log in
before you can comment on or make changes to this bug.
Description
•