Closed Bug 215085 Opened 21 years ago Closed 12 years ago

Cleanup EventQueue(Service)

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: timeless, Assigned: timeless)

References

Details

(Keywords: crash)

testcase (for xpcshell):

eqs=Components.classes["@mozilla.org/event-queue-service;1"].getService(Components.interfaces.nsIEventQueueService)
try {
for (a=0; a<10000; print(++a)) {
eqs.popThreadEventQueue(eqs.pushThreadEventQueue()); gc(); }
} finally { quit(); }

I'd like to start w/ danm's review of the patch in attachment 129042 [details] [diff] [review].

Note that this test case should *not* accumulate event queues since each
iteration pushes, pops, and ideally garbage collects a single event queue.

This flavor is a crasher in various ways (out of file handles, stack depth
exceeded).
There we go. Time to get this review off my hard disk. I think the patch is
mostly good guards and checks that are currently missing. But I do feel the need
to pick at it piecemeal:

Index: nsEventQueue.cpp
===================================================================
RCS file: /cvsroot/mozilla/xpcom/threads/nsEventQueue.cpp,v
retrieving revision 3.38
diff -up6 -r3.38 nsEventQueue.cpp
--- nsEventQueue.cpp
+++ nsEventQueue.cpp
@@ -60,12 +60,16 @@ static PRThread *gEventQueueLogThread = 
@@ -110,17 +114,12 @@ nsEventQueueImpl::nsEventQueueImpl()
@@ -141,12 +140,16 @@ nsEventQueueImpl::Init(PRBool aNative)
@@ -169,12 +172,16 @@ nsEventQueueImpl::InitFromPRThread(PRThr

all fine-'n-good

@@ -493,33 +500,31 @@ nsEventQueueImpl::Create(nsISupports *aO
...
-  nsCOMPtr<nsPIEventQueueChain> queueChain(do_QueryInterface(aQueue));
+  nsCOMPtr<nsPIEventQueueChain> queueChain(do_QueryInterface(aQueue, &rv));
+  if (NS_FAILED(rv))
+    return rv;
...
-  nsCOMPtr<nsPIEventQueueChain> endChain(do_QueryInterface(end));
+  nsCOMPtr<nsPIEventQueueChain> endChain(do_QueryInterface(end, &rv));

But I do have to grouse about this one. What, am I the only guy who doesn't
trust return codes? Really, there are accessors in the code which return an
NS_OK error code but a null interface. Couldn't name one right now of course but
I've been bitten by that one before.

In my apparently minority opinion, what I really care about is whether I got the
interface I wanted, not what the stinking result code was. The pattern

  nsCOMPtr<nsIThing> thing(do_QueryInterface(protoThing,&rv))
  if (NS_SUCCEEDED(rv))
    thing->Method()

makes me totally nervous. In this case it's just a QI so |returned value| and
|returned interface| are equivalent but still I prefer this section the way it
was. But since it seems like I'm in the minority I'm prepared to cave. Brendan?

Index: nsEventQueue.h
===================================================================
RCS file: /cvsroot/mozilla/xpcom/threads/nsEventQueue.h,v
retrieving revision 3.19
diff -up6 -r3.19 nsEventQueue.h
--- nsEventQueue.h
+++ nsEventQueue.h
@@ -77,12 +77,13 @@ private:
...
+        Unlink();
...

This Unlink() is premature. I hope this isn't the crux of your patch because it
can't be that way.

There are clients of this code (necko) which continue to post events to cached
event queues even after they've been deactivated. This is legal; we deal with it
by punting the tardy event up to an active elder queue. But this Unlink cuts the
queue out of the chain so it loses track of the elder queue(s) before its last
owner (and potential poster) has released it. If you run with this change and
exercise networking a bit I promise in time you'll hit the "event dropped
because event chain is dead" (nsEventQueue.cpp;3.38#251) assertion.

It's not a bad idea though, pulling the queue out of the chain at this point.
I've been toying with it myself. It's just a little trickier than that.

Index: nsEventQueueService.cpp
===================================================================
RCS file: /cvsroot/mozilla/xpcom/threads/nsEventQueueService.cpp,v
retrieving revision 1.39
diff -up6 -r1.39 nsEventQueueService.cpp
--- nsEventQueueService.cpp
+++ nsEventQueueService.cpp
@@ -182,13 +182,14 @@ nsEventQueueServiceImpl::CreateEventQueu
@@ -259,13 +260,12 @@ NS_IMETHODIMP
@@ -277,64 +277,66 @@ nsEventQueueServiceImpl::PushThreadEvent

again fine-'n-good, except for that curious reliance on return values :). Note
in this case the value-of-interest returned from MakeNewQueue (aQueue) can
actually be non-null though the value-of-no-interest returned (rv) indicates
failure, which breaks my own minority world-view. However that's another error:
the queue is broken and shouldn't be returned and in fact will leak because of
queue's unusual ownership model but that's another issue...

And in the case of the third change (@@ -277...) if the incompletely initialized
queue mistakenly returned by MakeNewQueue didn't leak then with this change from
the patch:

   nsCOMPtr<nsIEventQueue> newQueue;
+  rv = MakeNewQueue((PRThread*)key.GetValue()...
+  *aNewQueue = newQueue;
+  if (NS_SUCCEEDED(rv)) {
...
+    NS_ADDREF(*aNewQueue);

aNewQueue would return a dangling pointer. It's important to return null
pointers! The int return code; not so much. I know. I stand alone there.

So mostly good. I'd like some changes and I have in mind two additions:

MakeNewQueue I think should also be fixed to not return an incompletely
initialized object (and to fix the leak this will require a hack to artificially
decrement the refcount because Queues, and I admit this is an error of mine, ref
themselves in their constructor (that could also be fixed but it wants care)).

Also, a DEBUG-only check to warn us when the event queue stack gets too deep.
That'll be more important now. I've just checked that in, though.
Shouldn't the logging in the constructor be moved to one of the Init functions?
Isn't it just logging the uninitialized mEventQueue, well now initialized to
null? And while your in there, the Init functions share some code that could
potentially be put in an inline helper function and shared. Something like
below. Have Init and InitFromPRThread call this helper, and then put the logging
in InitFromPLQueue. That should get the initial logging in a central place.
Timeless, up to you if you want to fix the logging or not.

inline
nsresult nsEventQueueImpl::InitImpl(PRThread* thread, PRBool aNative)
{
  PLEventQueue* queue;
  if (aNative)
    queue = PL_CreateNativeEventQueue("Thread event queue...", thread);
  else
    queue = PL_CreateMonitoredEventQueue("Thread event queue...", thread);
  if (queue)
    return InitFromPLQueue(queue);
  else
    return NS_ERROR_OUT_OF_MEMORY;
}
It's just logging. It was exactly where I wanted it when I put it in there, (I
imagine!). Seemed worthwhile checking in at the time but I doubt anyone has used
it since. Seems senseless to move it unless you want to use it and find it's in
an inconvenient place for the problem at hand.
Danm may have a point: if a getter can return NS_OK with a null out param, you
have to test both. If you care more about null than the rv, test the out param
pointer after the call (make sure it's null before -- use an nsCOMPtr) and bail
with rv if it's null.

I don't know about the particular interface methods called in the patch; someone
could look. But in many cases a one-size-fits-all rule is best: null check and
fail with the rv, or don't even bother capturing the callee's rv, make your own
up (don't hide errors such as NS_ERROR_OUT_OF_MEMORY, tho).

/be
Yay! Just checked in; an example of the subject of comments 4 and (mostly)
comment 1, where the returned interface was null but the error code was NS_OK.
Part of the patch checked in (ignore the patch posted in the bug which used a
more clumsy NS_ENSURE_TRUE(sectionNode) form) went:

   nsCOMPtr<nsIDOMNode> sectionNode;
   nsresult rv = GetParentNode(getter_AddRefs(sectionNode));
-  NS_ENSURE_SUCCESS(rv, rv);
+  if (!sectionNode) {
+    return rv;
+  }

So there's an illustration of my point. The patched version
(nsHTMLTableRowElement.cpp 1.78) was a return to how the code originally read.
In rev 1.69 it was changed to the "trust rv" version. The result was crash bug
211357.

Now of course QueryInterface isn't the same thing as an accessor method. If a QI
returned a null interface you'd also expect an error return. So in the case of
the patch for this bug, both versions are pretty much equivalent. Still, I find
the "if (!sectionNode)" code more ad rem and therefore more readable, more
elegant and more trustworthy. So I'd hate to see perfectly good interface
checking code lose any more ground to that misbegotten "trust rv" stuff.

There. All done ranting. For a few minutes.
QA Contact: scc → xpcom
This bug has no information on current versions and no clear lead as to what the crashes that it tracks are. This should either be updated with current info and reopened or new bugs be filed on concrete actions/crashes on current code.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.