JB_BP not exported by newer glibc (2.4)

RESOLVED FIXED

Status

()

Core
XPCOM
--
major
RESOLVED FIXED
12 years ago
11 years ago

People

(Reporter: wolfiR, Assigned: dbaron)

Tracking

(4 keywords)

Trunk
x86
Linux
fixed-aviary1.0.8, fixed1.7.13, fixed1.8.0.2, fixed1.8.1
Points:
---
Bug Flags:
blocking1.8.0.2 +

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [nvn-dl])

Attachments

(5 attachments, 5 obsolete attachments)

4.78 KB, patch
Details | Diff | Splinter Review
5.29 KB, patch
Christopher Aillon (sabbatical, not receiving bugmail)
: review+
shaver
: superreview+
Details | Diff | Splinter Review
3.89 KB, patch
Details | Diff | Splinter Review
4.58 KB, patch
brendan
: review+
Details | Diff | Splinter Review
3.29 KB, patch
Brian Ryner (not reading)
: review+
Brian Ryner (not reading)
: superreview+
Brian Ryner (not reading)
: approval-branch-1.8.1+
Details | Diff | Splinter Review
(Reporter)

Description

12 years ago
JB_BP is not exported anymore by glibc 2.4pre.
So Linux i386 and PPC are affected.
I'm testing a patch to use backtrace() glibc functions right now and will attach it if seems to work.
(Reporter)

Comment 1

12 years ago
glibc documentation about backtrace() stuff:
http://www.delorie.com/gnu/docs/glibc/libc_665.html
(Reporter)

Comment 2

12 years ago
Created attachment 208843 [details] [diff] [review]
patch (part 1)

This patch uses the backtrace() glibc functions if glibc 2.3 or newer is used.
It's mandatory only for glibc 2.4. But using the backtrace functions enables more platforms to be able to dump a stacktrace.
(This has to be activated at caller side, too. Therefore this is "part 1")
(Reporter)

Comment 3

12 years ago
Created attachment 208878 [details] [diff] [review]
patch (complete)

This patch adds a dummy function for Linux platforms where stacktrace isn't implemented and changes the callers accordingly. To make the logic for the callers simpler.
Attachment #208843 - Attachment is obsolete: true
(Reporter)

Updated

12 years ago
Attachment #208878 - Flags: review?(benjamin)
Attachment #208878 - Flags: review?(benjamin) → review+
(Reporter)

Updated

12 years ago
Attachment #208878 - Flags: superreview?(shaver)
Comment on attachment 208878 [details] [diff] [review]
patch (complete)

> static char _progname[1024] = "huh?";
> 
>-#if defined(LINUX) && defined(DEBUG) && (defined(__i386) || defined(PPC))
>+#if defined(LINUX) && defined(DEBUG)
> #define CRAWL_STACK_ON_SIGSEGV
> #endif

Doesn't this #if also need the glibc version check?  Otherwise older-glibc Linuxes that aren't on PPC or x86 will define CRAWL_STACK_ON_SIGSEGV but have no way to do it.  Likely a build-killer there, and I don't know what glibc version is seen on various ARM Linux setups, etc.
Attachment #208878 - Flags: superreview?(shaver) → superreview-
(Reporter)

Comment 5

12 years ago
(In reply to comment #4)
> (From update of attachment 208878 [details] [diff] [review] [edit])
> > static char _progname[1024] = "huh?";
> > 
> >-#if defined(LINUX) && defined(DEBUG) && (defined(__i386) || defined(PPC))
> >+#if defined(LINUX) && defined(DEBUG)
> > #define CRAWL_STACK_ON_SIGSEGV
> > #endif
> 
> Doesn't this #if also need the glibc version check?  Otherwise older-glibc
> Linuxes that aren't on PPC or x86 will define CRAWL_STACK_ON_SIGSEGV but have
> no way to do it.  Likely a build-killer there, and I don't know what glibc
> version is seen on various ARM Linux setups, etc.

hmm, basically right, but there was no glibc check here before.
So it should just output "not implemented" instead of the stack.
That's one of the reasons for the not implemented case in nsStackFrameUnix.cpp

Comment on attachment 208878 [details] [diff] [review]
patch (complete)

Ah, OK.  So people on those platforms will get a "write me, dammit" output when they crash, rather than a stack.  I can live with that, though it's likely to be a little confusing (and make people wonder at first if their crash is due to a call of an unimplemented methods, etc).

I can live with that, though.  Thanks for the clarification.
Attachment #208878 - Flags: superreview- → superreview+
(Reporter)

Comment 7

12 years ago
Created attachment 209335 [details] [diff] [review]
patch (slightly changed)

This patch addresses the comments with minor tweaks:
- we could have called DumpStackToFile with non-glibc Linux systems (are there any?) without implementing the function on those systems
- changed wording of the not implemented case to
  "Info: Stacktrace not implemented for this Linux platform"
- define CRAWL_STACK_ON_SIGSEGV only if there is a better chance to have a useful
  stacktrace if it's a Linux glibc system
(Reporter)

Comment 8

12 years ago
Checking in xpcom/base/nsStackFrameUnix.cpp;
/cvsroot/mozilla/xpcom/base/nsStackFrameUnix.cpp,v  <--  nsStackFrameUnix.cpp
new revision: 1.13; previous revision: 1.12
done
Checking in xpcom/base/nsTraceRefcntImpl.cpp;
/cvsroot/mozilla/xpcom/base/nsTraceRefcntImpl.cpp,v  <--  nsTraceRefcntImpl.cpp
new revision: 1.97; previous revision: 1.96
done
Checking in xpfe/bootstrap/nsSigHandlers.cpp;
/cvsroot/mozilla/xpfe/bootstrap/nsSigHandlers.cpp,v  <--  nsSigHandlers.cpp
new revision: 1.43; previous revision: 1.42
done
Status: NEW → RESOLVED
Last Resolved: 12 years ago
Resolution: --- → FIXED
So... this changed the format of the data output by this code, no?  What happened to updating the tools that are used to post-process said data?  As things stand, refcount logging, for example, became nearly useless when this landed because the tree-balancer and so forth can't handle the data anymore.

I've backed this out of my tree locally so that I can continue to debug leaks, but we should have a better solution than that, since I presume sometimes this patch really is needed.

Would you like a separate bug on fixing things to work again, or should I reopen this one?
Also, doesn't that patch limit the stack to 20 deep?  Why was that done?  Why 20?  If nothing else, if I understand what backtrace() does, 20 is way too low to work -- we have stacks with far more frames on them quite routinely.
(Reporter)

Comment 11

12 years ago
(In reply to comment #9)
> So... this changed the format of the data output by this code, no?  What
> happened to updating the tools that are used to post-process said data?  As
> things stand, refcount logging, for example, became nearly useless when this
> landed because the tree-balancer and so forth can't handle the data anymore.

Sorry, I wasn't aware of tools which analyze that data.
And I'm not sure if there is a way to get the old format back.
A workaround would be to change the glibc version check to only use the new
code if glibc is >= 2.4. (This would still break some platforms as current devel glibc release seems to be 2.3.90 and JB_BP is already removed.

(In reply to comment #10)
> Also, doesn't that patch limit the stack to 20 deep?  Why was that done?  Why
> 20?  If nothing else, if I understand what backtrace() does, 20 is way too low
> to work -- we have stacks with far more frames on them quite routinely.

I agree that 20 might be too low. But AFAIK we have to choose a fixed number.
What would you propose?
I think it's OK to reopen this bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Sounds like we need a configure test for JP_BP, so that only those cases which wouldn't work with the current (previous) code anyway get the new backtrace() behaviour.  (Or, better, someone could write another way to walk the stack without JP_BP so that we could preserve the output format and tool utility.)

If backtrace() makes us pick a maximum number of frames, it's definitely broken for our purposes and we should just revert and start over.
> Sorry, I wasn't aware of tools which analyze that data.

They're at least in tools/rb; I don't think we have other tools, but I could be wrong...

> And I'm not sure if there is a way to get the old format back.

That might be fine if we can fix the scripts to deal with both formats.

> I agree that 20 might be too low. But AFAIK we have to choose a fixed number.
> What would you propose?

The only way I can see of making backtrace() work for us is to pick a large fixed number and then do something like:

retry:
  entries = backtrace(array, size);
  if (entries == size) {
    // allocate new array with bigger size
    goto retry;
  }
  // call backtrace_symbols_fd

(except with better flow control than using goto, hopefully).

Any choice of fixed number without something like the above is going to probably cause breakage in some cases, unless we pick it big enough to guarantee that that many frames would case stack overflow...
I don't think we can allocate memory while we're doing backtraces, e.g., in the segfault handler.

OTOH I don't see why a limit of say 1000 would be any real constraint.

Comment 15

12 years ago
Created attachment 210230 [details] [diff] [review]
Patch for Seamonkey 1.0
Comment on attachment 208878 [details] [diff] [review]
patch (complete)

We'll need this on the various branches.  My builds are failing with this error trying to deploy 1.5.0.1.  Patching my RPMs for now, but would be good to get this in the next set of releases.
Attachment #208878 - Flags: branch-1.8.1?
Attachment #208878 - Flags: approval1.8.0.2?
Attachment #208878 - Flags: approval1.7.13?
Attachment #208878 - Flags: approval-aviary1.0.8?
Comment on attachment 208878 [details] [diff] [review]
patch (complete)

Well, except this doesn't apply to the branches.  :-)

I'll post a backport in a few.
Attachment #208878 - Flags: branch-1.8.1?
Attachment #208878 - Flags: approval1.8.0.2?
Attachment #208878 - Flags: approval1.7.13?
Attachment #208878 - Flags: approval-aviary1.0.8?
Actually, bz raised a good point that this shouldn't even be being built for releases.  Filed bug 325537
No, I don't think we want that patch on the branches, because it breaks our memory-leak detection tools.  What we want on the branch is either a patch to use some alternate means of walking the stack so that we can preserve the format (best), or a configure test to only use backtrace() when JB_BP is not available and the build would therefore fail.

We should back this patch out of the trunk shortly, if nobody is going to be able to commit to providing a compatibility-restoring fix, say by Saturday?
(Assignee)

Comment 20

12 years ago
I agree that this should be backed out and should not be taken on branches.
(Reporter)

Comment 21

12 years ago
I also agree for now. But anyway we need a solution soon, while I don't have one at the moment :-(
The upcoming distributions will use something like glibc 2.3.90 which doesn't export JB_BP.
timeless backed this out on trunk.
(Assignee)

Comment 23

12 years ago
The "trivial" solution (much less invasive, anyway) to getting the frame pointer should be to use inline __asm__ rather than setjmp.
(Assignee)

Comment 24

12 years ago
Created attachment 210815 [details] [diff] [review]
patch to use inline assembler

I tested that this works on x86; the bits of ppc and sparc (for fun!) assembly are completely untested.
Attachment #210815 - Flags: review?(mozilla)
(Reporter)

Comment 25

12 years ago
Comment on attachment 210815 [details] [diff] [review]
patch to use inline assembler

The patch looks OK for certain circumstances. At least it should keep the functionality (when -fomit-frame-pointer is not used as compiler flag)

I'm about checking out if there are other options which keep the output but are more portable.
Please stay tuned.
(Reporter)

Updated

12 years ago
Attachment #210815 - Flags: review?(mozilla) → review+
(Reporter)

Comment 26

12 years ago
Created attachment 210837 [details] [diff] [review]
patch using backtrace() and keep demangling

This patch uses backtrace() again but should keep the output format as it was.
There is still the problem that we have a fixed number (here 1024) of stack depth.
I'm fine with dbaron's patch as it fixes the compile error but looking at the other architectures it might be interesting to get this one ready.
Attachment #208878 - Attachment is obsolete: true
Attachment #209335 - Attachment is obsolete: true
Attachment #210230 - Attachment is obsolete: true
(Assignee)

Updated

12 years ago
Attachment #210815 - Flags: superreview?(shaver)
So, I had a talk with our toolchain folks who suggest using

__builtin_frame_address (0);

which has been around in gcc since the 2.x days.  We should also check to make sure though that we are not building with -fomit-frame-pointer because that will make walking the stack pretty difficult :-)

To get better portability, we should use glibc backtrace() since that will work on platforms where it is otherwise impossible to get to the unwind information, such as s390(x), x86-64 and ia64.

We should probably combine the two.  Use glibc backtrace() if available, else fall back to what we have now, using __builtin_frame_address () instead on the platforms with easy stack walking code (i386 and ppc).

Comment 28

12 years ago
Jakub Jelinek does not have a bugzilla account here, and asked me add this comment:

"If you are worrying about fixed buffer size, you can use some sane default. Say 100 or something like that, and if backtrace fills the whole array (returns 100), then retry with twice as big buffer."
Kai, see comment 13 and comment 14.
(Assignee)

Comment 30

12 years ago
Created attachment 210915 [details] [diff] [review]
use __builtin_frame_address

I think __builtin_frame_address is better than inline assembler.  This patch also cleans up a few includes for stuff that's no longer in that file (these were copied to nsStackFrameUnix.cpp but never removed from the source) and fixes some warnings.
Attachment #210815 - Attachment is obsolete: true
Attachment #210915 - Flags: superreview?(shaver)
Attachment #210915 - Flags: review?(mozilla)
Attachment #210815 - Flags: superreview?(shaver)
Comment on attachment 210915 [details] [diff] [review]
use __builtin_frame_address

sr=shaver, but is __builtin_frame_address linux-only, or can we do this with gcc on any platforms?  (Should we be configure-testing for __builtin_frame_address instead?)
Attachment #210915 - Flags: superreview?(shaver) → superreview+

Comment 32

12 years ago
__builtin_frame_address is not platform specific.  Why should it?  Just think about it.  The compiler always has to know about the current location of the frame to access parameters, local variables, etc.  gcc and all compilers which claim to be gcc-compatible have this built-in.
I did "think about it", and I realized that I didn't know enough about whether all gcc-supported computation models had a compatible notion of a frame.  Sorry.
Comment on attachment 210915 [details] [diff] [review]
use __builtin_frame_address

I'll go ahead and mark r=caillon on this.  It gets the job done for me here, plus it seems to be the cleanest solution to the specific issue with JB_BP.  Ideally, we can get this working with backtrace() so other platforms, especially x86-64, can work with this code, but that might deserve a different bug anyway.
Attachment #210915 - Flags: review?(mozilla) → review+
(Assignee)

Updated

12 years ago
Assignee: dougt → dbaron
Status: REOPENED → NEW
(Assignee)

Comment 35

12 years ago
__builtin_frame_address isn't Linux-specific or architecture specific, but some of the other stuff that the ifdef-ed code does (especially the stack frame layout, having the saved PC one word mathematically above the frame pointer) could be.  I'd rather expand the ifdefs based on what's known to work.
(Assignee)

Comment 36

12 years ago
Fix checked in to trunk.

Filed bug 326594 on using backtrace().
Status: NEW → RESOLVED
Last Resolved: 12 years ago12 years ago
Resolution: --- → FIXED
(Assignee)

Updated

12 years ago
Attachment #210915 - Flags: branch-1.8.1?(shaver)
Comment on attachment 210915 [details] [diff] [review]
use __builtin_frame_address

b181=shaver
Attachment #210915 - Flags: approval-branch-1.8.1?(shaver) → approval-branch-1.8.1+
Created attachment 211873 [details] [diff] [review]
1.7.x branch patch

Merge conflicts, simple backport.  We should get this on the older branches as well.
Attachment #211873 - Flags: approval1.7.13?
Attachment #211873 - Flags: approval-aviary1.0.8?
(Assignee)

Comment 39

12 years ago
Fix checked in to MOZILLA_1_8_BRANCH.
Keywords: fixed1.8.1
(Assignee)

Comment 40

12 years ago
I'm curious why anyone cares about updating any branch other than the latest release branch to build on new systems.  Any new distributions should be using the latest release branch, not using a previous release branch, right?  I'd expect the maintainance of old release branches to be for existing systems, none of which have this new a version of glibc.
Fedora Core 5 will ship with Mozilla 1.7.12
It would also be helpful to have our official binaries run on these newer glibcs.
(Assignee)

Comment 43

12 years ago
OK, yikes.  (I'm not sure there's any runtime issue here, though, but I suppose there could be.)

So I think we should approve this.
Comment on attachment 211873 [details] [diff] [review]
1.7.x branch patch

approving for old branches assuming you're now on the hook to help us sanity check these releases on linux, a=dveditz for drivers
Attachment #211873 - Flags: approval1.7.13?
Attachment #211873 - Flags: approval1.7.13+
Attachment #211873 - Flags: approval-aviary1.0.8?
Attachment #211873 - Flags: approval-aviary1.0.8+
(Assignee)

Updated

12 years ago
Attachment #210915 - Flags: approval1.8.0.2?
(Assignee)

Comment 45

12 years ago
Checked caillon's 1.7.x branch patch into MOZILLA_1_7_BRANCH and AVIARY_1_0_1_20050124_BRANCH .
Keywords: fixed-aviary1.0.8, fixed1.7.13

Comment 46

12 years ago
*** Bug 324786 has been marked as a duplicate of this bug. ***
Flags: blocking1.8.0.2+
Comment on attachment 210915 [details] [diff] [review]
use __builtin_frame_address

approved for 1.8.0 branch, a=dveditz
Attachment #210915 - Flags: approval1.8.0.2? → approval1.8.0.2+
(Assignee)

Comment 48

12 years ago
Fix checked in to MOZILLA_1_8_0_BRANCH.
Keywords: fixed1.8.0.2

Updated

12 years ago
Whiteboard: [nvn-dl]
(Assignee)

Comment 49

11 years ago
Created attachment 215509 [details] [diff] [review]
and apply the same fix elsewhere

We have other debugging code with the same problem...
Attachment #215509 - Flags: review?(brendan)
Attachment #215509 - Flags: approval-branch-1.8.1?(brendan)

Updated

11 years ago
Attachment #215509 - Flags: review?(brendan)
Attachment #215509 - Flags: review+
Attachment #215509 - Flags: approval-branch-1.8.1?(brendan)
Attachment #215509 - Flags: approval-branch-1.8.1+
(Assignee)

Comment 50

11 years ago
Comment on attachment 215509 [details] [diff] [review]
and apply the same fix elsewhere

Checked in to trunk.
(Assignee)

Comment 51

11 years ago
Created attachment 215705 [details] [diff] [review]
and fix jprof too

I'm rebuilding with -fno-omit-frame-pointer (which I think is now needed if you want frame pointers reliably) to test that this really works...
Attachment #215705 - Flags: superreview?(bryner)
Attachment #215705 - Flags: review?(bryner)
Attachment #215705 - Flags: superreview?(bryner)
Attachment #215705 - Flags: superreview+
Attachment #215705 - Flags: review?(bryner)
Attachment #215705 - Flags: review+
(Assignee)

Comment 52

11 years ago
So, FWIW, none of the stack trace tools work for me on FC5, even with -fno-omit-frame-pointer.  But I don't *think* this should break anything...
(Assignee)

Updated

11 years ago
Attachment #215705 - Flags: approval-branch-1.8.1?(bryner)
Attachment #215705 - Flags: approval-branch-1.8.1?(bryner) → approval-branch-1.8.1+
(Assignee)

Comment 53

11 years ago
And I tested on an RH 7.x system that the two give the same result (i.e., not a frame off, which would matter here more than elsewhere).
(Assignee)

Comment 54

11 years ago
Comment on attachment 215705 [details] [diff] [review]
and fix jprof too

Checked in to trunk and MOZILLA_1_8_BRANCH.
(Assignee)

Comment 55

11 years ago
Comment on attachment 215509 [details] [diff] [review]
and apply the same fix elsewhere

Checked in to MOZILLA_1_8_BRANCH.
(Assignee)

Comment 56

11 years ago
So it turns out __builtin_frame_address(0) stopped working on Fedora Core 5, so none of this new code actually works anymore.  The problem is described in more detail in bug 331436.  Unless somebody comes up with a decent solution, I'll probably scrap it and replace it with inline assembler.
caillon: any new ideas from the toolchain people? :)
ok, it looks like the 7th patch in that bug makes it compile, if you go in and fix the rejects by hand

Comment 59

11 years ago
(In reply to comment #58)
> ok, it looks like the 7th patch in that bug makes it compile, if you go in and
> fix the rejects by hand
> 
Just a user comment here, Some IBM client software (Workplace WMC) wants you to build Mozilla 1.4.3 if you use a non-supported Linux version. It needs to be configured to use the the gtk2 library and the Xft font engine.  This bug keeps it from building on many versions.  Is there a fix for 1.4.3?
thanks,
fred
You need to log in before you can comment on or make changes to this bug.