Last Comment Bug 323853 - JB_BP not exported by newer glibc (2.4)
: JB_BP not exported by newer glibc (2.4)
Status: RESOLVED FIXED
[nvn-dl]
: fixed-aviary1.0.8, fixed1.7.13, fixed1.8.0.2, fixed1.8.1
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: Trunk
: x86 Linux
: -- major (vote)
: ---
Assigned To: David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
:
Mentors:
: 324786 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2006-01-18 00:27 PST by Wolfgang Rosenauer [:wolfiR]
Modified: 2006-04-17 10:33 PDT (History)
9 users (show)
dveditz: blocking1.8.0.2+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch (part 1) (1.27 KB, patch)
2006-01-18 02:53 PST, Wolfgang Rosenauer [:wolfiR]
no flags Details | Diff | Splinter Review
patch (complete) (3.73 KB, patch)
2006-01-18 11:09 PST, Wolfgang Rosenauer [:wolfiR]
benjamin: review+
shaver: superreview+
Details | Diff | Splinter Review
patch (slightly changed) (3.76 KB, patch)
2006-01-23 01:30 PST, Wolfgang Rosenauer [:wolfiR]
no flags Details | Diff | Splinter Review
Patch for Seamonkey 1.0 (2.91 KB, patch)
2006-01-31 05:59 PST, Bjoern Voigt
no flags Details | Diff | Splinter Review
patch to use inline assembler (1.44 KB, patch)
2006-02-05 18:09 PST, David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
mozilla: review+
Details | Diff | Splinter Review
patch using backtrace() and keep demangling (4.78 KB, patch)
2006-02-05 22:55 PST, Wolfgang Rosenauer [:wolfiR]
no flags Details | Diff | Splinter Review
use __builtin_frame_address (5.29 KB, patch)
2006-02-06 12:35 PST, David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
caillon: review+
shaver: superreview+
shaver: approval‑branch‑1.8.1+
dveditz: approval1.8.0.2+
Details | Diff | Splinter Review
1.7.x branch patch (3.89 KB, patch)
2006-02-14 10:24 PST, Christopher Aillon (sabbatical, not receiving bugmail)
dveditz: approval‑aviary1.0.8+
dveditz: approval1.7.13+
Details | Diff | Splinter Review
and apply the same fix elsewhere (4.58 KB, patch)
2006-03-18 09:59 PST, David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
brendan: review+
brendan: approval‑branch‑1.8.1+
Details | Diff | Splinter Review
and fix jprof too (3.29 KB, patch)
2006-03-20 14:56 PST, David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
bryner: review+
bryner: superreview+
bryner: approval‑branch‑1.8.1+
Details | Diff | Splinter Review

Description Wolfgang Rosenauer [:wolfiR] 2006-01-18 00:27:03 PST
JB_BP is not exported anymore by glibc 2.4pre.
So Linux i386 and PPC are affected.
I'm testing a patch to use backtrace() glibc functions right now and will attach it if seems to work.
Comment 1 Wolfgang Rosenauer [:wolfiR] 2006-01-18 00:44:18 PST
glibc documentation about backtrace() stuff:
http://www.delorie.com/gnu/docs/glibc/libc_665.html
Comment 2 Wolfgang Rosenauer [:wolfiR] 2006-01-18 02:53:03 PST
Created attachment 208843 [details] [diff] [review]
patch (part 1)

This patch uses the backtrace() glibc functions if glibc 2.3 or newer is used.
It's mandatory only for glibc 2.4. But using the backtrace functions enables more platforms to be able to dump a stacktrace.
(This has to be activated at caller side, too. Therefore this is "part 1")
Comment 3 Wolfgang Rosenauer [:wolfiR] 2006-01-18 11:09:00 PST
Created attachment 208878 [details] [diff] [review]
patch (complete)

This patch adds a dummy function for Linux platforms where stacktrace isn't implemented and changes the callers accordingly. To make the logic for the callers simpler.
Comment 4 Mike Shaver (:shaver -- probably not reading bugmail closely) 2006-01-22 11:44:55 PST
Comment on attachment 208878 [details] [diff] [review]
patch (complete)

> static char _progname[1024] = "huh?";
> 
>-#if defined(LINUX) && defined(DEBUG) && (defined(__i386) || defined(PPC))
>+#if defined(LINUX) && defined(DEBUG)
> #define CRAWL_STACK_ON_SIGSEGV
> #endif

Doesn't this #if also need the glibc version check?  Otherwise older-glibc Linuxes that aren't on PPC or x86 will define CRAWL_STACK_ON_SIGSEGV but have no way to do it.  Likely a build-killer there, and I don't know what glibc version is seen on various ARM Linux setups, etc.
Comment 5 Wolfgang Rosenauer [:wolfiR] 2006-01-22 11:58:16 PST
(In reply to comment #4)
> (From update of attachment 208878 [details] [diff] [review] [edit])
> > static char _progname[1024] = "huh?";
> > 
> >-#if defined(LINUX) && defined(DEBUG) && (defined(__i386) || defined(PPC))
> >+#if defined(LINUX) && defined(DEBUG)
> > #define CRAWL_STACK_ON_SIGSEGV
> > #endif
> 
> Doesn't this #if also need the glibc version check?  Otherwise older-glibc
> Linuxes that aren't on PPC or x86 will define CRAWL_STACK_ON_SIGSEGV but have
> no way to do it.  Likely a build-killer there, and I don't know what glibc
> version is seen on various ARM Linux setups, etc.

hmm, basically right, but there was no glibc check here before.
So it should just output "not implemented" instead of the stack.
That's one of the reasons for the not implemented case in nsStackFrameUnix.cpp

Comment 6 Mike Shaver (:shaver -- probably not reading bugmail closely) 2006-01-22 22:28:05 PST
Comment on attachment 208878 [details] [diff] [review]
patch (complete)

Ah, OK.  So people on those platforms will get a "write me, dammit" output when they crash, rather than a stack.  I can live with that, though it's likely to be a little confusing (and make people wonder at first if their crash is due to a call of an unimplemented methods, etc).

I can live with that, though.  Thanks for the clarification.
Comment 7 Wolfgang Rosenauer [:wolfiR] 2006-01-23 01:30:26 PST
Created attachment 209335 [details] [diff] [review]
patch (slightly changed)

This patch addresses the comments with minor tweaks:
- we could have called DumpStackToFile with non-glibc Linux systems (are there any?) without implementing the function on those systems
- changed wording of the not implemented case to
  "Info: Stacktrace not implemented for this Linux platform"
- define CRAWL_STACK_ON_SIGSEGV only if there is a better chance to have a useful
  stacktrace if it's a Linux glibc system
Comment 8 Wolfgang Rosenauer [:wolfiR] 2006-01-24 21:47:02 PST
Checking in xpcom/base/nsStackFrameUnix.cpp;
/cvsroot/mozilla/xpcom/base/nsStackFrameUnix.cpp,v  <--  nsStackFrameUnix.cpp
new revision: 1.13; previous revision: 1.12
done
Checking in xpcom/base/nsTraceRefcntImpl.cpp;
/cvsroot/mozilla/xpcom/base/nsTraceRefcntImpl.cpp,v  <--  nsTraceRefcntImpl.cpp
new revision: 1.97; previous revision: 1.96
done
Checking in xpfe/bootstrap/nsSigHandlers.cpp;
/cvsroot/mozilla/xpfe/bootstrap/nsSigHandlers.cpp,v  <--  nsSigHandlers.cpp
new revision: 1.43; previous revision: 1.42
done
Comment 9 Boris Zbarsky [:bz] 2006-01-27 15:14:08 PST
So... this changed the format of the data output by this code, no?  What happened to updating the tools that are used to post-process said data?  As things stand, refcount logging, for example, became nearly useless when this landed because the tree-balancer and so forth can't handle the data anymore.

I've backed this out of my tree locally so that I can continue to debug leaks, but we should have a better solution than that, since I presume sometimes this patch really is needed.

Would you like a separate bug on fixing things to work again, or should I reopen this one?
Comment 10 Boris Zbarsky [:bz] 2006-01-27 15:19:11 PST
Also, doesn't that patch limit the stack to 20 deep?  Why was that done?  Why 20?  If nothing else, if I understand what backtrace() does, 20 is way too low to work -- we have stacks with far more frames on them quite routinely.
Comment 11 Wolfgang Rosenauer [:wolfiR] 2006-01-28 01:06:33 PST
(In reply to comment #9)
> So... this changed the format of the data output by this code, no?  What
> happened to updating the tools that are used to post-process said data?  As
> things stand, refcount logging, for example, became nearly useless when this
> landed because the tree-balancer and so forth can't handle the data anymore.

Sorry, I wasn't aware of tools which analyze that data.
And I'm not sure if there is a way to get the old format back.
A workaround would be to change the glibc version check to only use the new
code if glibc is >= 2.4. (This would still break some platforms as current devel glibc release seems to be 2.3.90 and JB_BP is already removed.

(In reply to comment #10)
> Also, doesn't that patch limit the stack to 20 deep?  Why was that done?  Why
> 20?  If nothing else, if I understand what backtrace() does, 20 is way too low
> to work -- we have stacks with far more frames on them quite routinely.

I agree that 20 might be too low. But AFAIK we have to choose a fixed number.
What would you propose?
I think it's OK to reopen this bug.
Comment 12 Mike Shaver (:shaver -- probably not reading bugmail closely) 2006-01-28 14:30:02 PST
Sounds like we need a configure test for JP_BP, so that only those cases which wouldn't work with the current (previous) code anyway get the new backtrace() behaviour.  (Or, better, someone could write another way to walk the stack without JP_BP so that we could preserve the output format and tool utility.)

If backtrace() makes us pick a maximum number of frames, it's definitely broken for our purposes and we should just revert and start over.
Comment 13 Boris Zbarsky [:bz] 2006-01-29 13:01:08 PST
> Sorry, I wasn't aware of tools which analyze that data.

They're at least in tools/rb; I don't think we have other tools, but I could be wrong...

> And I'm not sure if there is a way to get the old format back.

That might be fine if we can fix the scripts to deal with both formats.

> I agree that 20 might be too low. But AFAIK we have to choose a fixed number.
> What would you propose?

The only way I can see of making backtrace() work for us is to pick a large fixed number and then do something like:

retry:
  entries = backtrace(array, size);
  if (entries == size) {
    // allocate new array with bigger size
    goto retry;
  }
  // call backtrace_symbols_fd

(except with better flow control than using goto, hopefully).

Any choice of fixed number without something like the above is going to probably cause breakage in some cases, unless we pick it big enough to guarantee that that many frames would case stack overflow...
Comment 14 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2006-01-30 00:38:16 PST
I don't think we can allocate memory while we're doing backtraces, e.g., in the segfault handler.

OTOH I don't see why a limit of say 1000 would be any real constraint.
Comment 15 Bjoern Voigt 2006-01-31 05:59:07 PST
Created attachment 210230 [details] [diff] [review]
Patch for Seamonkey 1.0
Comment 16 Christopher Aillon (sabbatical, not receiving bugmail) 2006-02-01 20:51:30 PST
Comment on attachment 208878 [details] [diff] [review]
patch (complete)

We'll need this on the various branches.  My builds are failing with this error trying to deploy 1.5.0.1.  Patching my RPMs for now, but would be good to get this in the next set of releases.
Comment 17 Christopher Aillon (sabbatical, not receiving bugmail) 2006-02-01 21:02:41 PST
Comment on attachment 208878 [details] [diff] [review]
patch (complete)

Well, except this doesn't apply to the branches.  :-)

I'll post a backport in a few.
Comment 18 Christopher Aillon (sabbatical, not receiving bugmail) 2006-02-01 21:17:10 PST
Actually, bz raised a good point that this shouldn't even be being built for releases.  Filed bug 325537
Comment 19 Mike Shaver (:shaver -- probably not reading bugmail closely) 2006-02-02 06:24:37 PST
No, I don't think we want that patch on the branches, because it breaks our memory-leak detection tools.  What we want on the branch is either a patch to use some alternate means of walking the stack so that we can preserve the format (best), or a configure test to only use backtrace() when JB_BP is not available and the build would therefore fail.

We should back this patch out of the trunk shortly, if nobody is going to be able to commit to providing a compatibility-restoring fix, say by Saturday?
Comment 20 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-02-02 16:07:14 PST
I agree that this should be backed out and should not be taken on branches.
Comment 21 Wolfgang Rosenauer [:wolfiR] 2006-02-02 21:32:07 PST
I also agree for now. But anyway we need a solution soon, while I don't have one at the moment :-(
The upcoming distributions will use something like glibc 2.3.90 which doesn't export JB_BP.
Comment 22 Boris Zbarsky [:bz] 2006-02-03 22:29:15 PST
timeless backed this out on trunk.
Comment 23 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-02-05 17:22:43 PST
The "trivial" solution (much less invasive, anyway) to getting the frame pointer should be to use inline __asm__ rather than setjmp.
Comment 24 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-02-05 18:09:33 PST
Created attachment 210815 [details] [diff] [review]
patch to use inline assembler

I tested that this works on x86; the bits of ppc and sparc (for fun!) assembly are completely untested.
Comment 25 Wolfgang Rosenauer [:wolfiR] 2006-02-05 21:57:08 PST
Comment on attachment 210815 [details] [diff] [review]
patch to use inline assembler

The patch looks OK for certain circumstances. At least it should keep the functionality (when -fomit-frame-pointer is not used as compiler flag)

I'm about checking out if there are other options which keep the output but are more portable.
Please stay tuned.
Comment 26 Wolfgang Rosenauer [:wolfiR] 2006-02-05 22:55:00 PST
Created attachment 210837 [details] [diff] [review]
patch using backtrace() and keep demangling

This patch uses backtrace() again but should keep the output format as it was.
There is still the problem that we have a fixed number (here 1024) of stack depth.
I'm fine with dbaron's patch as it fixes the compile error but looking at the other architectures it might be interesting to get this one ready.
Comment 27 Christopher Aillon (sabbatical, not receiving bugmail) 2006-02-06 01:14:13 PST
So, I had a talk with our toolchain folks who suggest using

__builtin_frame_address (0);

which has been around in gcc since the 2.x days.  We should also check to make sure though that we are not building with -fomit-frame-pointer because that will make walking the stack pretty difficult :-)

To get better portability, we should use glibc backtrace() since that will work on platforms where it is otherwise impossible to get to the unwind information, such as s390(x), x86-64 and ia64.

We should probably combine the two.  Use glibc backtrace() if available, else fall back to what we have now, using __builtin_frame_address () instead on the platforms with easy stack walking code (i386 and ppc).
Comment 28 Kai Engert (:kaie) 2006-02-06 09:30:14 PST
Jakub Jelinek does not have a bugzilla account here, and asked me add this comment:

"If you are worrying about fixed buffer size, you can use some sane default. Say 100 or something like that, and if backtrace fills the whole array (returns 100), then retry with twice as big buffer."
Comment 29 Boris Zbarsky [:bz] 2006-02-06 10:32:06 PST
Kai, see comment 13 and comment 14.
Comment 30 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-02-06 12:35:49 PST
Created attachment 210915 [details] [diff] [review]
use __builtin_frame_address

I think __builtin_frame_address is better than inline assembler.  This patch also cleans up a few includes for stuff that's no longer in that file (these were copied to nsStackFrameUnix.cpp but never removed from the source) and fixes some warnings.
Comment 31 Mike Shaver (:shaver -- probably not reading bugmail closely) 2006-02-06 12:43:51 PST
Comment on attachment 210915 [details] [diff] [review]
use __builtin_frame_address

sr=shaver, but is __builtin_frame_address linux-only, or can we do this with gcc on any platforms?  (Should we be configure-testing for __builtin_frame_address instead?)
Comment 32 Ulrich Drepper 2006-02-06 12:52:51 PST
__builtin_frame_address is not platform specific.  Why should it?  Just think about it.  The compiler always has to know about the current location of the frame to access parameters, local variables, etc.  gcc and all compilers which claim to be gcc-compatible have this built-in.
Comment 33 Mike Shaver (:shaver -- probably not reading bugmail closely) 2006-02-06 12:57:51 PST
I did "think about it", and I realized that I didn't know enough about whether all gcc-supported computation models had a compatible notion of a frame.  Sorry.
Comment 34 Christopher Aillon (sabbatical, not receiving bugmail) 2006-02-09 14:22:14 PST
Comment on attachment 210915 [details] [diff] [review]
use __builtin_frame_address

I'll go ahead and mark r=caillon on this.  It gets the job done for me here, plus it seems to be the cleanest solution to the specific issue with JB_BP.  Ideally, we can get this working with backtrace() so other platforms, especially x86-64, can work with this code, but that might deserve a different bug anyway.
Comment 35 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-02-09 15:08:47 PST
__builtin_frame_address isn't Linux-specific or architecture specific, but some of the other stuff that the ifdef-ed code does (especially the stack frame layout, having the saved PC one word mathematically above the frame pointer) could be.  I'd rather expand the ifdefs based on what's known to work.
Comment 36 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-02-09 15:11:34 PST
Fix checked in to trunk.

Filed bug 326594 on using backtrace().
Comment 37 Mike Shaver (:shaver -- probably not reading bugmail closely) 2006-02-13 14:28:13 PST
Comment on attachment 210915 [details] [diff] [review]
use __builtin_frame_address

b181=shaver
Comment 38 Christopher Aillon (sabbatical, not receiving bugmail) 2006-02-14 10:24:54 PST
Created attachment 211873 [details] [diff] [review]
1.7.x branch patch

Merge conflicts, simple backport.  We should get this on the older branches as well.
Comment 39 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-02-14 11:19:42 PST
Fix checked in to MOZILLA_1_8_BRANCH.
Comment 40 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-02-15 11:09:50 PST
I'm curious why anyone cares about updating any branch other than the latest release branch to build on new systems.  Any new distributions should be using the latest release branch, not using a previous release branch, right?  I'd expect the maintainance of old release branches to be for existing systems, none of which have this new a version of glibc.
Comment 41 Christopher Aillon (sabbatical, not receiving bugmail) 2006-02-15 13:21:55 PST
Fedora Core 5 will ship with Mozilla 1.7.12
Comment 42 Benjamin Smedberg [:bsmedberg] 2006-02-15 13:29:46 PST
It would also be helpful to have our official binaries run on these newer glibcs.
Comment 43 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-02-15 13:37:31 PST
OK, yikes.  (I'm not sure there's any runtime issue here, though, but I suppose there could be.)

So I think we should approve this.
Comment 44 Daniel Veditz [:dveditz] 2006-02-16 12:01:46 PST
Comment on attachment 211873 [details] [diff] [review]
1.7.x branch patch

approving for old branches assuming you're now on the hook to help us sanity check these releases on linux, a=dveditz for drivers
Comment 45 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-02-17 17:15:33 PST
Checked caillon's 1.7.x branch patch into MOZILLA_1_7_BRANCH and AVIARY_1_0_1_20050124_BRANCH .
Comment 46 Kai Engert (:kaie) 2006-02-20 04:49:26 PST
*** Bug 324786 has been marked as a duplicate of this bug. ***
Comment 47 Daniel Veditz [:dveditz] 2006-02-22 00:57:19 PST
Comment on attachment 210915 [details] [diff] [review]
use __builtin_frame_address

approved for 1.8.0 branch, a=dveditz
Comment 48 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-02-23 21:47:23 PST
Fix checked in to MOZILLA_1_8_0_BRANCH.
Comment 49 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-03-18 09:59:10 PST
Created attachment 215509 [details] [diff] [review]
and apply the same fix elsewhere

We have other debugging code with the same problem...
Comment 50 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-03-18 17:27:35 PST
Comment on attachment 215509 [details] [diff] [review]
and apply the same fix elsewhere

Checked in to trunk.
Comment 51 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-03-20 14:56:03 PST
Created attachment 215705 [details] [diff] [review]
and fix jprof too

I'm rebuilding with -fno-omit-frame-pointer (which I think is now needed if you want frame pointers reliably) to test that this really works...
Comment 52 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-03-20 17:01:46 PST
So, FWIW, none of the stack trace tools work for me on FC5, even with -fno-omit-frame-pointer.  But I don't *think* this should break anything...
Comment 53 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-03-21 23:29:27 PST
And I tested on an RH 7.x system that the two give the same result (i.e., not a frame off, which would matter here more than elsewhere).
Comment 54 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-03-21 23:30:44 PST
Comment on attachment 215705 [details] [diff] [review]
and fix jprof too

Checked in to trunk and MOZILLA_1_8_BRANCH.
Comment 55 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-03-21 23:34:42 PST
Comment on attachment 215509 [details] [diff] [review]
and apply the same fix elsewhere

Checked in to MOZILLA_1_8_BRANCH.
Comment 56 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-03-23 00:13:04 PST
So it turns out __builtin_frame_address(0) stopped working on Fedora Core 5, so none of this new code actually works anymore.  The problem is described in more detail in bug 331436.  Unless somebody comes up with a decent solution, I'll probably scrap it and replace it with inline assembler.
Comment 57 Mike Shaver (:shaver -- probably not reading bugmail closely) 2006-03-24 17:17:22 PST
caillon: any new ideas from the toolchain people? :)
Comment 58 Daniel Brooks [:db48x] 2006-04-13 16:54:52 PDT
ok, it looks like the 7th patch in that bug makes it compile, if you go in and fix the rejects by hand
Comment 59 clewisf 2006-04-17 10:33:25 PDT
(In reply to comment #58)
> ok, it looks like the 7th patch in that bug makes it compile, if you go in and
> fix the rejects by hand
> 
Just a user comment here, Some IBM client software (Workplace WMC) wants you to build Mozilla 1.4.3 if you use a non-supported Linux version. It needs to be configured to use the the gtk2 library and the Xft font engine.  This bug keeps it from building on many versions.  Is there a fix for 1.4.3?
thanks,
fred

Note You need to log in before you can comment on or make changes to this bug.