Closed Bug 154267 Opened 23 years ago Closed 23 years ago

Reference Counting needs to be atomic throughout entire system

Categories

(Core :: XPCOM, defect)

x86
BeOS
defect
Not set
critical

Tracking

()

RESOLVED WONTFIX

People

(Reporter: beos, Assigned: beos)

References

Details

Refcounting is only made atomic when objects are expected to be shared between threads. But since BeOS has a thread per window, this isn't safe and the XPCOM base class tends to delete objects twice, causing crashes. This can mainly be seen on SMP machines. There is no particular class this crashes in, and is not completely reproducable.
*** Bug 154268 has been marked as a duplicate of this bug. ***
Daniel, do you have any other information that may be helpful for this?
Paul, your description is right on. Typically, the symptom is a crash on exit or when closing a window. Since the refcounting is not atomic, objects tend to get deleted twice. I'm sure they are also leaking on occasion for the same reason.Mathias and I tried an experiment a number of months back to make all reference counting atomic. This would have been simple if everything was in a base class, but turned out to be quite hard since everything is done with macros. Unfortunately, we found a number of places which were not using a full 32 bit integer as the count value. Typically, they were using 31 bits for the count and one bit for a boolean in the same class - uggh. The compiler threw warnings about using atomic_add() on 31 bit ints, which is how we found it. Unfortunately I no longer have a list of files that had this problem, but it's quite easy to duplicate.The solution seems to be two parts:1. Clean up any classes which do not use a full 32 bit integer for the reference count variable.2. Add to the build system the capability, either explictly for BeOS or better yet for any platform which chooses to turn it on, to make all reference counting atomic in the entire app.This poses a serious stability problem as Paul mentioned. A single CPU machine will crash now and again, but a dual CPU often crashes within five or ten minutes of solid activity.
Let me post that again with correct word-wrapping (sorry): Paul, your description is right on. Typically, the symptom is a crash on exit or when closing a window. Since the refcounting is not atomic, objects tend to get deleted twice. I'm sure they are also leaking on occasion for the same reason. Mathias and I tried an experiment a number of months back to make all reference counting atomic. This would have been simple if everything was in a base class, but turned out to be quite hard since everything is done with macros. Unfortunately, we found a number of places which were not using a full 32 bit integer as the count value. Typically, they were using 31 bits for the count and one bit for a boolean in the same class - uggh. The compiler threw warnings about using atomic_add() on 31 bit ints, which is how we found it. Unfortunately I no longer have a list of files that had this problem, but it's quite easy to duplicate. The solution seems to be two parts: 1. Clean up any classes which do not use a full 32 bit integer for the reference count variable. 2. Add to the build system the capability, either explictly for BeOS or better yet for any platform which chooses to turn it on, to make all reference counting atomic in the entire app. This poses a serious stability problem as Paul mentioned. A single CPU machine will crash now and again, but a dual CPU often crashes within five or ten minutes of solid activity.
I am setting a dependency to bug#92793, since these are related. That bug also gives a little more of an explanation for how things work under BeOS, for people that don't know. Also, Makoto made a comment about the number of threads per window being 2, which is wrong. The way it works is each window receives its own thread, and the app has a thread. So, in a single window app, there are at least two threads. Daniel, how significant of a change did you and Mathias make? I see in nsISupportsImpl.h is where the macros are defined for adding/removing references atomically. Did you have to change all files manually, or did you change a global macro, so that it would always all the threadsafe macros? I would like to try to at least get to a point where you had been, and then continue from there.
Depends on: 92793
Blocks: 125156
Right, we changed it at a top level, so that no matter which macro an ISupports derivate used, it always got the threadsafe version. Then just rebuild the world from the top, and you'll run across build breakage where people aren't using the nsrefcnt typedef to get a full 32 bit integer. The question is whether you want to add another flavor of NS_MT_SUPPORTED for example, such as NS_MT_REQUIRED_AT_ALL_TIMES_DAMMIT and then turn that on for BeOS in the makefiles. It might take some correspondence with the XPCOM maintainers to see how they'd like to do this, since it is kind of their holiest of holy base classes.
hmm. I do not see why this bug exists at all. rpotts gave a better solution to BeOS tread-per-window woes in 92793. Do you really want every nsISupports implementation to atomically modify the refcount?
what solution? he said we need a "proxy", which we have. he then adds, that we need to worry about our reference counting. it is my understanding that there is a lot of non-beos specific code that calls addref/release, so, how can we be sure they are doing everything on the correct thread? especially when, as rpotts stated, the thread per window goes against the mozilla design, so, the non-beos code would "think" it is safe, when actually it may not be. then again, I could be wrong, but, it makes sense to me, and I don't see any other solution offerred.
Hi. I thought that you could apartment the gecko code off onto one thread and proxy to this thread to do any work that needs to be threadsafe. I am assigning back to you. I do not think that making all refcounting atomic would benefit all platforms. Feel free to write up a patch which makes ref counting atomic on beos. If you would like to talk about how to make mozilla better on BeOS, send me some email. Maybe you, me, rpotts could talk about what this soon?
Assignee: dougt → arougthopher
The fact that some reference counting is NOT atomic is simply a reflection that the rest of the code for the given component is NOT threadsafe either ! Making all XPCOM reference counts atomic will NOT solve the underlying problem that non-threadsafe code is being accessed by multiple threads. It will merely remove the 'annoying' ASSERTs that tell you so :-) The only reason that the crashes occur in XPCOM is because it is doing 'more' non-threadsafe operations... making the reference counting atomic will merely move the crash to the next place (outside of XPCOM) where data will be manipulated in a non-threadsafe manner. In essence, this bug is saying 'make ALL of mozilla threadsafe'. I don't believe that this is a realistic goal. -- rick
what rick said. RefCounting isn't the problem. the problem is that Mozilla isn't threadsafe
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → WONTFIX
Thanks for the explanation Rick. It's unforunate that Mozilla is not threadsafe across the board, as this obviously limits its portability and usefulness. I understand that realistically this may never happen. However, the ref counting issue (in informal testing) does greatly reduce the number of crashes on BeOS, even if it doesn't solve the entire problem. I would still vote to make all reference counting atomic under BeOS as it has no drawbacks, and would give developers like me a reason to keep working on BeZilla. Paul, any thoughts?
Daniel, I actually have some ideas about this. Tonight, I am going to be working on actuall documentation about the BeOS port, so that other developers will have an easier job than I did, working on mozilla. Much of the documentation at mozilla.org does not seem to have been updated in a while :( So, when I have that done, I will post it at bezilla.org for review by other developers, and I'll send you a link. also, when using Net+ for posting comments, remember to hit return at the end of the line, or bugzilla won't wrap the comment string.
adding myself to CC.
Usability note for developers with SMP. It seems that main crashers on SMP machines are tooltips. People who disabled it reported MUCH longer work without crashes (if at all). So, at least for development purposes, it is recommeded to disable tooltips on Preferences->Appearance
You need to log in before you can comment on or make changes to this bug.