Closed
Bug 323853
Opened 19 years ago
Closed 19 years ago
JB_BP not exported by newer glibc (2.4)
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: wolfiR, Assigned: dbaron)
References
Details
(4 keywords, Whiteboard: [nvn-dl])
Attachments
(5 files, 5 obsolete files)
4.78 KB,
patch
|
Details | Diff | Splinter Review | |
5.29 KB,
patch
|
caillon
:
review+
shaver
:
superreview+
shaver
:
approval-branch-1.8.1+
dveditz
:
approval1.8.0.2+
|
Details | Diff | Splinter Review |
3.89 KB,
patch
|
dveditz
:
approval-aviary1.0.8+
dveditz
:
approval1.7.13+
|
Details | Diff | Splinter Review |
4.58 KB,
patch
|
brendan
:
review+
brendan
:
approval-branch-1.8.1+
|
Details | Diff | Splinter Review |
3.29 KB,
patch
|
bryner
:
review+
bryner
:
superreview+
bryner
:
approval-branch-1.8.1+
|
Details | Diff | Splinter Review |
JB_BP is not exported anymore by glibc 2.4pre. So Linux i386 and PPC are affected. I'm testing a patch to use backtrace() glibc functions right now and will attach it if seems to work.
Reporter | ||
Comment 1•19 years ago
|
||
glibc documentation about backtrace() stuff: http://www.delorie.com/gnu/docs/glibc/libc_665.html
Reporter | ||
Comment 2•19 years ago
|
||
This patch uses the backtrace() glibc functions if glibc 2.3 or newer is used. It's mandatory only for glibc 2.4. But using the backtrace functions enables more platforms to be able to dump a stacktrace. (This has to be activated at caller side, too. Therefore this is "part 1")
Reporter | ||
Comment 3•19 years ago
|
||
This patch adds a dummy function for Linux platforms where stacktrace isn't implemented and changes the callers accordingly. To make the logic for the callers simpler.
Attachment #208843 -
Attachment is obsolete: true
Reporter | ||
Updated•19 years ago
|
Attachment #208878 -
Flags: review?(benjamin)
Updated•19 years ago
|
Attachment #208878 -
Flags: review?(benjamin) → review+
Reporter | ||
Updated•19 years ago
|
Attachment #208878 -
Flags: superreview?(shaver)
Comment on attachment 208878 [details] [diff] [review] patch (complete) > static char _progname[1024] = "huh?"; > >-#if defined(LINUX) && defined(DEBUG) && (defined(__i386) || defined(PPC)) >+#if defined(LINUX) && defined(DEBUG) > #define CRAWL_STACK_ON_SIGSEGV > #endif Doesn't this #if also need the glibc version check? Otherwise older-glibc Linuxes that aren't on PPC or x86 will define CRAWL_STACK_ON_SIGSEGV but have no way to do it. Likely a build-killer there, and I don't know what glibc version is seen on various ARM Linux setups, etc.
Attachment #208878 -
Flags: superreview?(shaver) → superreview-
Reporter | ||
Comment 5•19 years ago
|
||
(In reply to comment #4) > (From update of attachment 208878 [details] [diff] [review] [edit]) > > static char _progname[1024] = "huh?"; > > > >-#if defined(LINUX) && defined(DEBUG) && (defined(__i386) || defined(PPC)) > >+#if defined(LINUX) && defined(DEBUG) > > #define CRAWL_STACK_ON_SIGSEGV > > #endif > > Doesn't this #if also need the glibc version check? Otherwise older-glibc > Linuxes that aren't on PPC or x86 will define CRAWL_STACK_ON_SIGSEGV but have > no way to do it. Likely a build-killer there, and I don't know what glibc > version is seen on various ARM Linux setups, etc. hmm, basically right, but there was no glibc check here before. So it should just output "not implemented" instead of the stack. That's one of the reasons for the not implemented case in nsStackFrameUnix.cpp
Comment on attachment 208878 [details] [diff] [review] patch (complete) Ah, OK. So people on those platforms will get a "write me, dammit" output when they crash, rather than a stack. I can live with that, though it's likely to be a little confusing (and make people wonder at first if their crash is due to a call of an unimplemented methods, etc). I can live with that, though. Thanks for the clarification.
Attachment #208878 -
Flags: superreview- → superreview+
Reporter | ||
Comment 7•19 years ago
|
||
This patch addresses the comments with minor tweaks: - we could have called DumpStackToFile with non-glibc Linux systems (are there any?) without implementing the function on those systems - changed wording of the not implemented case to "Info: Stacktrace not implemented for this Linux platform" - define CRAWL_STACK_ON_SIGSEGV only if there is a better chance to have a useful stacktrace if it's a Linux glibc system
Reporter | ||
Comment 8•19 years ago
|
||
Checking in xpcom/base/nsStackFrameUnix.cpp; /cvsroot/mozilla/xpcom/base/nsStackFrameUnix.cpp,v <-- nsStackFrameUnix.cpp new revision: 1.13; previous revision: 1.12 done Checking in xpcom/base/nsTraceRefcntImpl.cpp; /cvsroot/mozilla/xpcom/base/nsTraceRefcntImpl.cpp,v <-- nsTraceRefcntImpl.cpp new revision: 1.97; previous revision: 1.96 done Checking in xpfe/bootstrap/nsSigHandlers.cpp; /cvsroot/mozilla/xpfe/bootstrap/nsSigHandlers.cpp,v <-- nsSigHandlers.cpp new revision: 1.43; previous revision: 1.42 done
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Comment 9•19 years ago
|
||
So... this changed the format of the data output by this code, no? What happened to updating the tools that are used to post-process said data? As things stand, refcount logging, for example, became nearly useless when this landed because the tree-balancer and so forth can't handle the data anymore. I've backed this out of my tree locally so that I can continue to debug leaks, but we should have a better solution than that, since I presume sometimes this patch really is needed. Would you like a separate bug on fixing things to work again, or should I reopen this one?
Comment 10•19 years ago
|
||
Also, doesn't that patch limit the stack to 20 deep? Why was that done? Why 20? If nothing else, if I understand what backtrace() does, 20 is way too low to work -- we have stacks with far more frames on them quite routinely.
Reporter | ||
Comment 11•19 years ago
|
||
(In reply to comment #9) > So... this changed the format of the data output by this code, no? What > happened to updating the tools that are used to post-process said data? As > things stand, refcount logging, for example, became nearly useless when this > landed because the tree-balancer and so forth can't handle the data anymore. Sorry, I wasn't aware of tools which analyze that data. And I'm not sure if there is a way to get the old format back. A workaround would be to change the glibc version check to only use the new code if glibc is >= 2.4. (This would still break some platforms as current devel glibc release seems to be 2.3.90 and JB_BP is already removed. (In reply to comment #10) > Also, doesn't that patch limit the stack to 20 deep? Why was that done? Why > 20? If nothing else, if I understand what backtrace() does, 20 is way too low > to work -- we have stacks with far more frames on them quite routinely. I agree that 20 might be too low. But AFAIK we have to choose a fixed number. What would you propose? I think it's OK to reopen this bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Sounds like we need a configure test for JP_BP, so that only those cases which wouldn't work with the current (previous) code anyway get the new backtrace() behaviour. (Or, better, someone could write another way to walk the stack without JP_BP so that we could preserve the output format and tool utility.) If backtrace() makes us pick a maximum number of frames, it's definitely broken for our purposes and we should just revert and start over.
Comment 13•19 years ago
|
||
> Sorry, I wasn't aware of tools which analyze that data. They're at least in tools/rb; I don't think we have other tools, but I could be wrong... > And I'm not sure if there is a way to get the old format back. That might be fine if we can fix the scripts to deal with both formats. > I agree that 20 might be too low. But AFAIK we have to choose a fixed number. > What would you propose? The only way I can see of making backtrace() work for us is to pick a large fixed number and then do something like: retry: entries = backtrace(array, size); if (entries == size) { // allocate new array with bigger size goto retry; } // call backtrace_symbols_fd (except with better flow control than using goto, hopefully). Any choice of fixed number without something like the above is going to probably cause breakage in some cases, unless we pick it big enough to guarantee that that many frames would case stack overflow...
I don't think we can allocate memory while we're doing backtraces, e.g., in the segfault handler. OTOH I don't see why a limit of say 1000 would be any real constraint.
Comment 15•19 years ago
|
||
Comment 16•19 years ago
|
||
Comment on attachment 208878 [details] [diff] [review] patch (complete) We'll need this on the various branches. My builds are failing with this error trying to deploy 1.5.0.1. Patching my RPMs for now, but would be good to get this in the next set of releases.
Attachment #208878 -
Flags: branch-1.8.1?
Attachment #208878 -
Flags: approval1.8.0.2?
Attachment #208878 -
Flags: approval1.7.13?
Attachment #208878 -
Flags: approval-aviary1.0.8?
Comment 17•19 years ago
|
||
Comment on attachment 208878 [details] [diff] [review] patch (complete) Well, except this doesn't apply to the branches. :-) I'll post a backport in a few.
Attachment #208878 -
Flags: branch-1.8.1?
Attachment #208878 -
Flags: approval1.8.0.2?
Attachment #208878 -
Flags: approval1.7.13?
Attachment #208878 -
Flags: approval-aviary1.0.8?
Comment 18•19 years ago
|
||
Actually, bz raised a good point that this shouldn't even be being built for releases. Filed bug 325537
No, I don't think we want that patch on the branches, because it breaks our memory-leak detection tools. What we want on the branch is either a patch to use some alternate means of walking the stack so that we can preserve the format (best), or a configure test to only use backtrace() when JB_BP is not available and the build would therefore fail. We should back this patch out of the trunk shortly, if nobody is going to be able to commit to providing a compatibility-restoring fix, say by Saturday?
Assignee | ||
Comment 20•19 years ago
|
||
I agree that this should be backed out and should not be taken on branches.
Reporter | ||
Comment 21•19 years ago
|
||
I also agree for now. But anyway we need a solution soon, while I don't have one at the moment :-( The upcoming distributions will use something like glibc 2.3.90 which doesn't export JB_BP.
Comment 22•19 years ago
|
||
timeless backed this out on trunk.
Assignee | ||
Comment 23•19 years ago
|
||
The "trivial" solution (much less invasive, anyway) to getting the frame pointer should be to use inline __asm__ rather than setjmp.
Assignee | ||
Comment 24•19 years ago
|
||
I tested that this works on x86; the bits of ppc and sparc (for fun!) assembly are completely untested.
Attachment #210815 -
Flags: review?(mozilla)
Reporter | ||
Comment 25•19 years ago
|
||
Comment on attachment 210815 [details] [diff] [review] patch to use inline assembler The patch looks OK for certain circumstances. At least it should keep the functionality (when -fomit-frame-pointer is not used as compiler flag) I'm about checking out if there are other options which keep the output but are more portable. Please stay tuned.
Reporter | ||
Updated•19 years ago
|
Attachment #210815 -
Flags: review?(mozilla) → review+
Reporter | ||
Comment 26•19 years ago
|
||
This patch uses backtrace() again but should keep the output format as it was. There is still the problem that we have a fixed number (here 1024) of stack depth. I'm fine with dbaron's patch as it fixes the compile error but looking at the other architectures it might be interesting to get this one ready.
Attachment #208878 -
Attachment is obsolete: true
Attachment #209335 -
Attachment is obsolete: true
Attachment #210230 -
Attachment is obsolete: true
Assignee | ||
Updated•19 years ago
|
Attachment #210815 -
Flags: superreview?(shaver)
Comment 27•19 years ago
|
||
So, I had a talk with our toolchain folks who suggest using __builtin_frame_address (0); which has been around in gcc since the 2.x days. We should also check to make sure though that we are not building with -fomit-frame-pointer because that will make walking the stack pretty difficult :-) To get better portability, we should use glibc backtrace() since that will work on platforms where it is otherwise impossible to get to the unwind information, such as s390(x), x86-64 and ia64. We should probably combine the two. Use glibc backtrace() if available, else fall back to what we have now, using __builtin_frame_address () instead on the platforms with easy stack walking code (i386 and ppc).
Comment 28•19 years ago
|
||
Jakub Jelinek does not have a bugzilla account here, and asked me add this comment: "If you are worrying about fixed buffer size, you can use some sane default. Say 100 or something like that, and if backtrace fills the whole array (returns 100), then retry with twice as big buffer."
Comment 29•19 years ago
|
||
Kai, see comment 13 and comment 14.
Assignee | ||
Comment 30•19 years ago
|
||
I think __builtin_frame_address is better than inline assembler. This patch also cleans up a few includes for stuff that's no longer in that file (these were copied to nsStackFrameUnix.cpp but never removed from the source) and fixes some warnings.
Attachment #210815 -
Attachment is obsolete: true
Attachment #210915 -
Flags: superreview?(shaver)
Attachment #210915 -
Flags: review?(mozilla)
Attachment #210815 -
Flags: superreview?(shaver)
Comment on attachment 210915 [details] [diff] [review] use __builtin_frame_address sr=shaver, but is __builtin_frame_address linux-only, or can we do this with gcc on any platforms? (Should we be configure-testing for __builtin_frame_address instead?)
Attachment #210915 -
Flags: superreview?(shaver) → superreview+
Comment 32•19 years ago
|
||
__builtin_frame_address is not platform specific. Why should it? Just think about it. The compiler always has to know about the current location of the frame to access parameters, local variables, etc. gcc and all compilers which claim to be gcc-compatible have this built-in.
I did "think about it", and I realized that I didn't know enough about whether all gcc-supported computation models had a compatible notion of a frame. Sorry.
Comment 34•19 years ago
|
||
Comment on attachment 210915 [details] [diff] [review] use __builtin_frame_address I'll go ahead and mark r=caillon on this. It gets the job done for me here, plus it seems to be the cleanest solution to the specific issue with JB_BP. Ideally, we can get this working with backtrace() so other platforms, especially x86-64, can work with this code, but that might deserve a different bug anyway.
Attachment #210915 -
Flags: review?(mozilla) → review+
Assignee | ||
Updated•19 years ago
|
Assignee: dougt → dbaron
Status: REOPENED → NEW
Assignee | ||
Comment 35•19 years ago
|
||
__builtin_frame_address isn't Linux-specific or architecture specific, but some of the other stuff that the ifdef-ed code does (especially the stack frame layout, having the saved PC one word mathematically above the frame pointer) could be. I'd rather expand the ifdefs based on what's known to work.
Assignee | ||
Comment 36•19 years ago
|
||
Fix checked in to trunk. Filed bug 326594 on using backtrace().
Status: NEW → RESOLVED
Closed: 19 years ago → 19 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•19 years ago
|
Attachment #210915 -
Flags: branch-1.8.1?(shaver)
Comment on attachment 210915 [details] [diff] [review] use __builtin_frame_address b181=shaver
Attachment #210915 -
Flags: approval-branch-1.8.1?(shaver) → approval-branch-1.8.1+
Comment 38•19 years ago
|
||
Merge conflicts, simple backport. We should get this on the older branches as well.
Attachment #211873 -
Flags: approval1.7.13?
Attachment #211873 -
Flags: approval-aviary1.0.8?
Assignee | ||
Comment 40•18 years ago
|
||
I'm curious why anyone cares about updating any branch other than the latest release branch to build on new systems. Any new distributions should be using the latest release branch, not using a previous release branch, right? I'd expect the maintainance of old release branches to be for existing systems, none of which have this new a version of glibc.
Comment 41•18 years ago
|
||
Fedora Core 5 will ship with Mozilla 1.7.12
Comment 42•18 years ago
|
||
It would also be helpful to have our official binaries run on these newer glibcs.
Assignee | ||
Comment 43•18 years ago
|
||
OK, yikes. (I'm not sure there's any runtime issue here, though, but I suppose there could be.) So I think we should approve this.
Comment 44•18 years ago
|
||
Comment on attachment 211873 [details] [diff] [review] 1.7.x branch patch approving for old branches assuming you're now on the hook to help us sanity check these releases on linux, a=dveditz for drivers
Attachment #211873 -
Flags: approval1.7.13?
Attachment #211873 -
Flags: approval1.7.13+
Attachment #211873 -
Flags: approval-aviary1.0.8?
Attachment #211873 -
Flags: approval-aviary1.0.8+
Assignee | ||
Updated•18 years ago
|
Attachment #210915 -
Flags: approval1.8.0.2?
Assignee | ||
Comment 45•18 years ago
|
||
Checked caillon's 1.7.x branch patch into MOZILLA_1_7_BRANCH and AVIARY_1_0_1_20050124_BRANCH .
Keywords: fixed-aviary1.0.8,
fixed1.7.13
Comment 46•18 years ago
|
||
*** Bug 324786 has been marked as a duplicate of this bug. ***
Updated•18 years ago
|
Flags: blocking1.8.0.2+
Comment 47•18 years ago
|
||
Comment on attachment 210915 [details] [diff] [review] use __builtin_frame_address approved for 1.8.0 branch, a=dveditz
Attachment #210915 -
Flags: approval1.8.0.2? → approval1.8.0.2+
Updated•18 years ago
|
Whiteboard: [nvn-dl]
Assignee | ||
Comment 49•18 years ago
|
||
We have other debugging code with the same problem...
Attachment #215509 -
Flags: review?(brendan)
Attachment #215509 -
Flags: approval-branch-1.8.1?(brendan)
Updated•18 years ago
|
Attachment #215509 -
Flags: review?(brendan)
Attachment #215509 -
Flags: review+
Attachment #215509 -
Flags: approval-branch-1.8.1?(brendan)
Attachment #215509 -
Flags: approval-branch-1.8.1+
Assignee | ||
Comment 50•18 years ago
|
||
Comment on attachment 215509 [details] [diff] [review] and apply the same fix elsewhere Checked in to trunk.
Assignee | ||
Comment 51•18 years ago
|
||
I'm rebuilding with -fno-omit-frame-pointer (which I think is now needed if you want frame pointers reliably) to test that this really works...
Attachment #215705 -
Flags: superreview?(bryner)
Attachment #215705 -
Flags: review?(bryner)
Updated•18 years ago
|
Attachment #215705 -
Flags: superreview?(bryner)
Attachment #215705 -
Flags: superreview+
Attachment #215705 -
Flags: review?(bryner)
Attachment #215705 -
Flags: review+
Assignee | ||
Comment 52•18 years ago
|
||
So, FWIW, none of the stack trace tools work for me on FC5, even with -fno-omit-frame-pointer. But I don't *think* this should break anything...
Assignee | ||
Updated•18 years ago
|
Attachment #215705 -
Flags: approval-branch-1.8.1?(bryner)
Updated•18 years ago
|
Attachment #215705 -
Flags: approval-branch-1.8.1?(bryner) → approval-branch-1.8.1+
Assignee | ||
Comment 53•18 years ago
|
||
And I tested on an RH 7.x system that the two give the same result (i.e., not a frame off, which would matter here more than elsewhere).
Assignee | ||
Comment 54•18 years ago
|
||
Comment on attachment 215705 [details] [diff] [review] and fix jprof too Checked in to trunk and MOZILLA_1_8_BRANCH.
Assignee | ||
Comment 55•18 years ago
|
||
Comment on attachment 215509 [details] [diff] [review] and apply the same fix elsewhere Checked in to MOZILLA_1_8_BRANCH.
Assignee | ||
Comment 56•18 years ago
|
||
So it turns out __builtin_frame_address(0) stopped working on Fedora Core 5, so none of this new code actually works anymore. The problem is described in more detail in bug 331436. Unless somebody comes up with a decent solution, I'll probably scrap it and replace it with inline assembler.
caillon: any new ideas from the toolchain people? :)
Comment 58•18 years ago
|
||
ok, it looks like the 7th patch in that bug makes it compile, if you go in and fix the rejects by hand
Comment 59•18 years ago
|
||
(In reply to comment #58) > ok, it looks like the 7th patch in that bug makes it compile, if you go in and > fix the rejects by hand > Just a user comment here, Some IBM client software (Workplace WMC) wants you to build Mozilla 1.4.3 if you use a non-supported Linux version. It needs to be configured to use the the gtk2 library and the Xft font engine. This bug keeps it from building on many versions. Is there a fix for 1.4.3? thanks, fred
You need to log in
before you can comment on or make changes to this bug.
Description
•