Closed
Bug 154267
Opened 23 years ago
Closed 23 years ago
Reference Counting needs to be atomic throughout entire system
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: beos, Assigned: beos)
References
Details
Refcounting is only made atomic when objects are expected to be shared between
threads. But since BeOS has a thread per window, this isn't safe and the XPCOM
base class tends to delete objects twice, causing crashes. This can mainly be
seen on SMP machines.
There is no particular class this crashes in, and is not completely reproducable.
*** Bug 154268 has been marked as a duplicate of this bug. ***
Daniel, do you have any other information that may be helpful for this?
Comment 3•23 years ago
|
||
Paul, your description is right on. Typically, the symptom is a crash on exit or when closing a window. Since the refcounting is not atomic, objects tend to get deleted twice. I'm sure they are also leaking on occasion for the same reason.Mathias and I tried an experiment a number of months back to make all reference counting atomic. This would have been simple if everything was in a base class, but turned out to be quite hard since everything is done with macros. Unfortunately, we found a number of places which were not using a full 32 bit integer as the count value. Typically, they were using 31 bits for the count and one bit for a boolean in the same class - uggh. The compiler threw warnings about using atomic_add() on 31 bit ints, which is how we found it. Unfortunately I no longer have a list of files that had this problem, but it's quite easy to duplicate.The solution seems to be two parts:1. Clean up any classes which do not use a full 32 bit integer for the reference count variable.2. Add to the build system the capability, either explictly for BeOS or better yet for any platform which chooses to turn it on, to make all reference counting atomic in the entire app.This poses a serious stability problem as Paul mentioned. A single CPU machine will crash now and again, but a dual CPU often crashes within five or ten minutes of solid activity.
Comment 4•23 years ago
|
||
Let me post that again with correct word-wrapping (sorry):
Paul, your description is right on. Typically, the symptom is a crash on exit or
when closing a window. Since the refcounting is not atomic, objects tend to get
deleted twice. I'm sure they are also leaking on occasion for the same reason.
Mathias and I tried an experiment a number of months back to make all reference
counting atomic. This would have been simple if everything was in a base class,
but turned out to be quite hard since everything is done with macros.
Unfortunately, we found a number of places which were not using a full 32 bit
integer as the count value. Typically, they were using 31 bits for the count and
one bit for a boolean in the same class - uggh. The compiler threw warnings
about using atomic_add() on 31 bit ints, which is how we found it. Unfortunately
I no longer have a list of files that had this problem, but it's quite easy to
duplicate.
The solution seems to be two parts:
1. Clean up any classes which do not use a full 32 bit integer for the reference
count variable.
2. Add to the build system the capability, either explictly for BeOS or better
yet for any platform which chooses to turn it on, to make all reference counting
atomic in the entire app.
This poses a serious stability problem as Paul mentioned. A single CPU machine
will crash now and again, but a dual CPU often crashes within five or ten
minutes of solid activity.
I am setting a dependency to bug#92793, since these are related. That bug also
gives a little more of an explanation for how things work under BeOS, for people
that don't know.
Also, Makoto made a comment about the number of threads per window being 2,
which is wrong. The way it works is each window receives its own thread, and
the app has a thread. So, in a single window app, there are at least two threads.
Daniel, how significant of a change did you and Mathias make? I see in
nsISupportsImpl.h is where the macros are defined for adding/removing references
atomically. Did you have to change all files manually, or did you change a
global macro, so that it would always all the threadsafe macros? I would like
to try to at least get to a point where you had been, and then continue from there.
Depends on: 92793
Comment 6•23 years ago
|
||
Right, we changed it at a top level, so that no matter which macro an ISupports derivate
used, it always got the threadsafe version. Then just rebuild the world from the top, and
you'll run across build breakage where people aren't using the nsrefcnt typedef to get a
full 32 bit integer.
The question is whether you want to add another flavor of NS_MT_SUPPORTED for example, such
as NS_MT_REQUIRED_AT_ALL_TIMES_DAMMIT and then turn that on for BeOS in the makefiles. It
might take some correspondence with the XPCOM maintainers to see how they'd like to do
this, since it is kind of their holiest of holy base classes.
Comment 7•23 years ago
|
||
hmm. I do not see why this bug exists at all. rpotts gave a better solution to
BeOS tread-per-window woes in 92793.
Do you really want every nsISupports implementation to atomically modify the
refcount?
what solution? he said we need a "proxy", which we have. he then adds, that we need to
worry about our reference counting. it is my understanding that there is a lot of non-beos
specific code that calls addref/release, so, how can we be sure they are doing everything
on the correct thread? especially when, as rpotts stated, the thread per window goes
against the mozilla design, so, the non-beos code would "think" it is safe, when actually
it may not be. then again, I could be wrong, but, it makes sense to me, and I don't see
any other solution offerred.
Comment 9•23 years ago
|
||
Hi.
I thought that you could apartment the gecko code off onto one thread and proxy
to this thread to do any work that needs to be threadsafe.
I am assigning back to you. I do not think that making all refcounting atomic
would benefit all platforms. Feel free to write up a patch which makes ref
counting atomic on beos.
If you would like to talk about how to make mozilla better on BeOS, send me some
email. Maybe you, me, rpotts could talk about what this soon?
Assignee: dougt → arougthopher
Comment 10•23 years ago
|
||
The fact that some reference counting is NOT atomic is simply a reflection that
the rest of the code for the given component is NOT threadsafe either !
Making all XPCOM reference counts atomic will NOT solve the underlying problem
that non-threadsafe code is being accessed by multiple threads. It will merely
remove the 'annoying' ASSERTs that tell you so :-)
The only reason that the crashes occur in XPCOM is because it is doing 'more'
non-threadsafe operations... making the reference counting atomic will merely
move the crash to the next place (outside of XPCOM) where data will be
manipulated in a non-threadsafe manner.
In essence, this bug is saying 'make ALL of mozilla threadsafe'. I don't
believe that this is a realistic goal.
-- rick
Comment 11•23 years ago
|
||
what rick said. RefCounting isn't the problem. the problem is that Mozilla
isn't threadsafe
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → WONTFIX
Comment 12•23 years ago
|
||
Thanks for the explanation Rick. It's unforunate that Mozilla is not threadsafe across the board, as this obviously limits its portability and usefulness. I understand that realistically this may never happen. However, the ref counting issue (in informal testing) does greatly reduce the number of crashes on BeOS, even if it doesn't solve the entire problem. I would still vote to make all reference counting atomic under BeOS as it has no drawbacks, and would give developers like me a reason to keep working on BeZilla. Paul, any thoughts?
Assignee | ||
Comment 13•23 years ago
|
||
Daniel, I actually have some ideas about this. Tonight, I am going to be working on
actuall documentation about the BeOS port, so that other developers will have an easier
job than I did, working on mozilla. Much of the documentation at mozilla.org does not
seem to have been updated in a while :( So, when I have that done, I will post it at
bezilla.org for review by other developers, and I'll send you a link.
also, when using Net+ for posting comments, remember to hit return at the end of the line,
or bugzilla won't wrap the comment string.
Comment 14•23 years ago
|
||
adding myself to CC.
Comment 15•23 years ago
|
||
Usability note for developers with SMP.
It seems that main crashers on SMP machines are tooltips.
People who disabled it reported MUCH longer work without crashes (if at all).
So, at least for development purposes, it is recommeded to disable tooltips on
Preferences->Appearance
You need to log in
before you can comment on or make changes to this bug.
Description
•