Closed Bug 343283 Opened 17 years ago Closed 17 years ago

SeaMonkey segfaults on FreeBSD when using PrefBar since version 3.3 (wrong handling of malloc?)

Categories

(Core :: DOM: Core & HTML, defect, P2)

x86
FreeBSD
defect

Tracking

()

VERIFIED FIXED
mozilla1.9alpha1

People

(Reporter: Manuel.Spam, Assigned: mrbkap)

References

()

Details

(Keywords: fixed1.8.1, verified1.8.0.7)

Attachments

(3 files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; de-AT; rv:1.8.0.2) Gecko/20060409 SeaMonkey/1.0.1
Build Identifier: 

This bug had been originally filed for PrefBar (see URL above). Thank you very much to j. schade who did really great work to find the reason of this bug. His unchanged comments follow:



------- Comment  #18 From j. schade  2006-06-26 15:02:23 -------

Well, since it was raining yesterday here in the Caribbean I finally
found time to look at this issue. Here is what I found out so far:

Using newer perfbars (>=3.3) gives me a sig11 on FreeBSD-5.4. It
happens in libgklayout.so:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1 (LWP 100116)]
0x29a9599b in nsGlobalWindow::RunTimeout (this=0x8393500, aTimeout=0x8935000)
at nsGlobalWindow.cpp:6378
6378          timeout->mArgv[timeout->mArgc] =
Current language:  auto; currently c++
(gdb) p timeout->mArgc
$1 = 0
(gdb) p timeout->mArgv
$2 = (jsval *) 0x800
(gdb) p timeout->mArgv[timeout->mArgc]
Error accessing memory address 0x800: Bad address.


The corresponding code in nsGlobalWindow.cpp:

      ...
      PRTime lateness = now - timeout->mWhen;

      timeout->mArgv[timeout->mArgc] =
        INT_TO_JSVAL((jsint)(lateness / PR_USEC_PER_MSEC));

      jsval dummy;
      scx->CallEventHandler(mJSObject, timeout->mFunObj, timeout->mArgc + 1,
                            timeout->mArgv, &dummy);
      ...


Now, where does the funny 0x800 come from? Throwing in some debugging
printf's revealed that when timeout->mArgv had "some reasonable" values
(like 0x83655c0, 0x88ef8e0, 0x843d070, ...) nothing bad happens. The
special value 0x800 was created when the following code (in the same file)
was called with an argc of 1:

  ...
  } else if (funobj) {
    /* Leave an extra slot for a secret final argument that
       indicates to the called function how "late" the timeout is. */
    timeout->mArgv = (jsval *) PR_MALLOC((argc - 1) * sizeof(jsval));

    if (!timeout->mArgv) {
      timeout->Release(scx);
  ...


This gives us a malloc() with 0 as arg. On FreeBSD, a malloc(0)
returns some constant value (which is 0x800 on my architecture)
which, in fact, does not actually point to a usable memory region.

While one can argue what result SHOULD be returned in case of doing
a malloc(0), firefox seems to behave properly if it receives NULL
as a result (since the timeout gets released immediately in this case).

The behaviour of FreeBSD can be changed according to the the malloc(3)
manpage by setting some malloc flags:

...
  V    Attempting to allocate zero bytes will return a NULL pointer
       instead of a valid pointer.  (The default behavior is to make a
       minimal allocation and return a pointer to it.)  This option is
       provided for System V compatibility.  This option is incompatible
       with the ``X'' option.
...

So I gave it a try by running

MALLOC_OPTIONS=V firefox

and firefox didn't crash anymore and prefbar was running.

Therefore I come to the point that:

1. It isn't prefbar's fault, it only reveals the problem.
2. It is an interaction of firefox and FreeBSD's funky malloc()
   implementation.


------- Comment  #20 From j. schade  2006-06-28 13:23:25 -------

> If I understand you right, then the problem is solved immediately if you use
> "MALLOC_OPTIONS=V firefox"?

Yes. I have added it to /usr/X11R6/lib/firefox/run-mozilla.sh.
Other options are the use of /etc/malloc.conf or _malloc_options (see the
malloc manpage).

> If so, then this is in fact a core bug. I'm unable to fix it with PrefBar code
> so I could finally mark this heavy bug fixed.

I would say "yes". However, I would like to see others confirm
that it works on their systems as well.

> Do you allow me to paste your whole comment as new core-bug to
> bugzilla.mozilla.org?

Of course. However, I don't know if we should consider it a firefox bug.
IF malloc(0) returns NULL (as it does with MALLOC_OPTIONS=V) firefox
behaves properly. If it is wise to return something Non-NULL (as it is done
in FreeBSD) is a different question.

I will send a message to the -hackers list and we will see.




------- Comment  #21 From j. schade  2006-06-29 08:33:38 -------

OK, I have just learned from the freebsd-hackers list (thanks to
the people who answered there) that:

The C standard explicitly allows both behaviours, and requires the
implementation to choose one of them.  If it returns a non-NULL
pointer, though, that pointer can *only* be used for passing back to   
free().  It may *not* be dereferenced.  So firefox is wrong, and
actually broken.

So we can safely consider it as a bug in firefox. A patch could be:

--- mozilla/dom/src/base/nsGlobalWindow.cpp.ORI Fri Apr 21 23:36:30 2006
+++ mozilla/dom/src/base/nsGlobalWindow.cpp     Thu Jun 29 14:29:12 2006
@@ -6079,7 +6079,7 @@
        indicates to the called function how "late" the timeout is. */
     timeout->mArgv = (jsval *) PR_MALLOC((argc - 1) * sizeof(jsval));

-    if (!timeout->mArgv) {
+    if (!timeout->mArgv || argc==1) {
       timeout->Release(scx);

       return NS_ERROR_OUT_OF_MEMORY;

However, I have no idea if the whole (argc - 1) construct could
be wrong since I don't know what this function is exactly doing :-)


Reproducible: Didn't try

Steps to Reproduce:
To reproduce this bug you may need to install PrefBar Version 3.3 or higher.
Actual Results:  
SeaMonkey segfaults on FreeBSD if you try to setup PrefBar

Expected Results:  
No crash ;-)
OS: Other → FreeBSD
Assignee: nobody → general
Component: General → DOM
QA Contact: general → ian
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P2
Target Milestone: --- → mozilla1.9alpha
Attached patch Fix for trunkSplinter Review
This seems worth fixing on all branches. Writing to memory returned from malloc(0) is bad, and that it doesn't crash on all platforms is really lucky.
Assignee: general → mrbkap
Status: NEW → ASSIGNED
Attachment #227858 - Flags: superreview?(jst)
Attachment #227858 - Flags: review?(jst)
It seems that fixing this improves overall stability as well.
Earlier, I had randomly appearing segfaults from time to time,
especially when clicking around with the mouse like crazy.
I'm still using my MALLOC_OPTIONS=V fix here on FreeBSD and
now it seems that these problems are gone as well.
(In reply to comment #2)
> It seems that fixing this improves overall stability as well.

When you say "this", do you mean this specific problem, or using MALLOC_OPTIONS=V on FreeBSD? If you mean the latter, then I hope that you'll file bugs on the remaining crashes, since they are real bugs, and that we don't crash is IMO an unlucky artifact of malloc implementations. If you mean the former, then we really should fix this bug on all branches.

> When you say "this", do you mean this specific problem, or using
> MALLOC_OPTIONS=V on FreeBSD? If you mean the latter, then I hope that you'll

I am using only MALLOC_OPTIONS=V, not the patch. I am currently not able to
recompile firefox to test the patch (holiday's over :-().

When the patch has been incorporated and I have found the time to recompile
the whole thing I will happily remove MALLOC_OPTIONS=V and see what happens.

I only wanted to point out that MALLOC_OPTIONS=V seems to improve overall
stability. Since the code in question seems to have to do with timeout
handling, there is IMHO a good possibility that dereferencing the return
value of THIS malloc(0) could have been the cause for other sporadic crashes
(not only the crash of the prefbar extension). However, I really don't know how
many other similar malloc(0) constructs elsewhere are being "fixed" when using
MALLOC_OPTIONS=V. I was already happy being able to track down this one :-).
(In reply to comment #4)
> When the patch has been incorporated and I have found the time to recompile
> the whole thing I will happily remove MALLOC_OPTIONS=V and see what happens.

Great! Thanks for tracking this down.
Comment on attachment 227858 [details] [diff] [review]
Fix for trunk

r+sr=jst
Attachment #227858 - Flags: superreview?(jst)
Attachment #227858 - Flags: superreview+
Attachment #227858 - Flags: review?(jst)
Attachment #227858 - Flags: review+
This bug fixes a use of malloc(0)'d memory, which is bad.
Flags: blocking1.8.1?
Flags: blocking1.8.0.6?
Flags: blocking1.8.1? → blocking1.8.1+
Fix checked into trunk.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Attached patch Fix for branchSplinter Review
Attachment #228870 - Flags: superreview?(jst)
Attachment #228870 - Flags: review?(jst)
Attachment #228870 - Flags: approval1.8.1?
Comment on attachment 228870 [details] [diff] [review]
Fix for branch

r+sr=jst
Attachment #228870 - Flags: superreview?(jst)
Attachment #228870 - Flags: superreview+
Attachment #228870 - Flags: review?(jst)
Attachment #228870 - Flags: review+
Attachment #228870 - Flags: approval1.8.1? → approval1.8.1+
Checked into the 1.8 branch.
Keywords: fixed1.8.1
Comment on attachment 228870 [details] [diff] [review]
Fix for branch

This falls into the 'stability' catagory of fixes. I'm not sure if it's severe enough for 1.8.0, but I thought I'd try anyway.
Attachment #228870 - Flags: approval1.8.0.6?
Since the patch didn't show up in recent firefox sources yet
(talking about 1.5.0.6) I patched nsGlobalWindow.cpp manually
and removed the MALLOC_OPTION statement. Everything seems to
work well so far (prefbar can be installed and no crashes).

BTW, what has to be done to import this bugfix into the
firefox sources officially?
(In reply to comment #13)
> BTW, what has to be done to import this bugfix into the
> firefox sources officially?

The patch has been landed on the trunk (for Firefox 3) and the 1.8 branch (for Firefox 2). It's waiting for approval to be landed on the 1.8.0 branch for Firefox 1.5.0.7 (the approval1.8.0.7? flag).
Flags: blocking1.8.0.7? → blocking1.8.0.7+
Comment on attachment 228870 [details] [diff] [review]
Fix for branch

approved for 1.8.0 branch, a=dveditz for drivers
Attachment #228870 - Flags: approval1.8.0.7? → approval1.8.0.7+
Fixed on the 1.8.0 branch.
Keywords: fixed1.8.0.7
two questions:

1) Are you running BSD builds or using the ELF binaries?

2) Does this apply to Firefox as well as SeaMonkey?
(In reply to comment #17)
> two questions:
> 
> 1) Are you running BSD builds or using the ELF binaries?

Who is "you"? In case it's me: I run my own self-compiled
binaries from the FreeBSD ports system.

> 
> 2) Does this apply to Firefox as well as SeaMonkey?

I debugged the bug with firefox. Don't know about SeaMonkey.
I now run a patched version of firefox and the bug is gone.
Hi js,

> (In reply to comment #17)
> > 1) Are you running BSD builds or using the ELF binaries?
> 
> Who is "you"? In case it's me: I run my own self-compiled
> binaries from the FreeBSD ports system.

well, by "you" I meant the Reporter. Sorry if that sounded rude. :)

> > 2) Does this apply to Firefox as well as SeaMonkey?
> 
> I debugged the bug with firefox. Don't know about SeaMonkey.
> I now run a patched version of firefox and the bug is gone.

I guess the reporter was running SeaMonkey in an ELF binary, according to the description, so I guess that answers that question.

thanks!
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #17)
> 2) Does this apply to Firefox as well as SeaMonkey?

This patch applies equally to SeaMonkey, Firefox or any other Gecko embedding. It is in core code that is shared by everybody.

I'm guessing that you didn't mean to REOPEN this bug. I'm re-marking this as fixed, or is there a reason it shouldn't be?
Status: REOPENED → RESOLVED
Closed: 17 years ago17 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.