Closed
Bug 206947
Opened 22 years ago
Closed 21 years ago
Synchronous XMLHttpRequest crashes the browser on approx. 500th call on Mac and Linux [@ nsEventQueueImpl::GetYoungestActive]
Categories
(Core :: XML, defect, P1)
Core
XML
Tracking
()
RESOLVED
FIXED
mozilla1.6alpha
People
(Reporter: arysin, Assigned: keeda)
References
Details
(Keywords: crash, testcase, Whiteboard: patch, need review)
Crash Data
Attachments
(10 files, 3 obsolete files)
4.54 KB,
text/plain
|
Details | |
647 bytes,
text/html
|
Details | |
43.47 KB,
text/plain
|
Details | |
835 bytes,
text/html
|
Details | |
8.44 KB,
patch
|
Details | Diff | Splinter Review | |
496 bytes,
patch
|
timeless
:
review-
|
Details | Diff | Splinter Review |
701 bytes,
text/html
|
Details | |
1.19 KB,
patch
|
hjtoi-bugzilla
:
review+
jst
:
superreview+
|
Details | Diff | Splinter Review |
9.42 KB,
patch
|
Details | Diff | Splinter Review | |
6.90 KB,
patch
|
keeda
:
review+
jst
:
superreview+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030521
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030521
XMLHttpRequest crashes the browser on approx 500th call, this number is
cumulative, i.e. if call made 300 times then the window closed and the page is
opened in another the browser will crash after ~200 calls.
The same effect in Mozilla on Linux and Windows. On Windows it also says there's
a problem in XPCOM library.
Reproducible: Always
Steps to Reproduce:
1.open the page test.html
2.wait until counter gets to ~500
Actual Results:
browser crashes
Expected Results:
browser should keep working
this bug does not depend on setTimeout() - with for(;;) loop it gives the same
crash, and this bug does not depend on setting innerHTML of the element.
<html><head>
</head><body onload="changeSize();">
<script>
var count = 0;
var xml = null;
if(document.all)
xml = new ActiveXObject('Microsoft.XMLHTTP');
else
xml = new XMLHttpRequest();
function changeSize(){
xml.open("GET", "about:blank", false);
xml.send(null);
document.getElementById("count").innerHTML = eval(++count);
setTimeout("changeSize()", 50);
}
</script>
<br>
Count: <span id="count">0</span>
</body>
</html>
Comment 1•22 years ago
|
||
Confirmed using FizzillaMach/2003-05-20-08-trunk. Setting All/All.
Crash occurred at iteration 575. Also, TalkBack did not execute following crash.
Blocks: 201894
Comment 6•22 years ago
|
||
I'm also seeing this on RH Linux 9.0 / Mozilla 1.3.1
Heikki -
This is critical for us to get fixed. Is there a way to get some luvin' on this
bug??
Thanks!!!
- Bill Edney
- Technical Pursuit Inc.
I landed some memory leak fixes on 5/20, and I don't see any crashes on Win2k
debug build even with 3500 loads of the testcase, and the memory fluctuated
between 40 and 60 MB. Please try this with a recent nightly.
Still crashing using FizzillaMach/2003-05-25-03-trunk, again at iteration 575.
While my Linux and Mac builds are cranking, can you try some variations to see
if it still crashes? Specifically:
1. Make sure that you don't have more than one XMLHttpRequest connection active
at any time
2. Reuse the same XMLHttpRequest object instead of creating new objects
Reporter | ||
Comment 10•22 years ago
|
||
Well, I've been trying different variations and all of them crash. As to your 2
points I think they're covered in the "HTML tescase from comment 0".
There only one XMLHttpRequest is created and no new call until previous is
completed.
I made the testcase use asynchronous loading and it does not seem to crash
anymore (at least on Mac OS X). Please try this as a workaround while I am
looking at the issue with sync loading and Mac/Linux.
Updated•22 years ago
|
Summary: XMLHttpRequest crashes the browser on approx. 500th call [@ nsEventQueueImpl::GetYoungestActive] → Synchronous XMLHttpRequest crashes the browser on approx. 500th call on Mac and Linux [@ nsEventQueueImpl::GetYoungestActive]
Reporter | ||
Comment 12•22 years ago
|
||
yes, with asynchronous calls it does not crash neither on linux nor on Windows
thanks, now at least we can do some workaround
Comment 13•22 years ago
|
||
bug 208127 is a dupe of this bug.
Looking at this issue, heres what I came up w/ith:
In nsXMLHttpReqeust, the call in ::Send -> PushThreadEventQueue is the cause of
the problem.
If PushThreadEventQueue is called exactly 501 times, we crash. This appears to
exercize the limit to the EventQueue and appears PushThreadEventQueue is not
properly dequeue'ing. I tested explicitly calling PopThreadEventQueue, but that
didn't help.
I have a backtrace i'll post here.
Comment 14•22 years ago
|
||
PR_EXTERN(PRBool)
PL_IsQueueNative(PLEventQueue *queue)
{
return queue->type == EventQueueIsNative ? PR_TRUE : PR_FALSE;
}
********** CREATE (0x8c92c30) **********
out: modalEventQueue(0x8c8fe90)
Document http://linuxbox/leak/page1.html loaded successfully
********** DESTROY (0x8c92c30) **********
Document http://linuxbox/leak/page2.html loaded successfully
********** CREATE (0x8c8b918) **********
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1080897920 (LWP 7575)]
0x40252679 in PL_IsQueueNative (queue=0x0) at plevent.c:1279
/backups/MOZILLA_1_2_1_RELEASE/mozilla/xpcom/threads/plevent.c:1279:33657:beg:0x40252679
Current language: auto; currently c
(gdb) where
#0 0x40252679 in PL_IsQueueNative (queue=0x0) at plevent.c:1279
#1 0x402542f5 in nsEventQueueImpl::IsQueueNative(int*) (this=0x87e4a70,
aResult=0xbfffcea8) at nsEventQueue.cpp:350
#2 0x41464f03 in nsAppShellService::Observe(nsISupports*, char const*, unsigned
short const*) (this=0x81274e8, aSubject=0x87e4a70, aTopic=0x402c35d8
"nsIEventQueueActivated") at nsAppShellService.cpp:1067
#3 0x401e31bf in nsObserverService::NotifyObservers(nsISupports*, char const*,
unsigned short const*) (this=0x80de618, aSubject=0x87e4a70, aTopic=0x402c35d8
"nsIEventQueueActivated", someData=0x0) at nsObserverService.cpp:211
#4 0x40253e6d in nsEventQueueImpl::NotifyObservers(char const*)
(this=0x87e4a70, aTopic=0x402c35d8 "nsIEventQueueActivated") at nsEventQueue.cpp:224
#5 0x40253a81 in nsEventQueueImpl::InitFromPRThread(PRThread*, int)
(this=0x87e4a70, thread=0x809dfd8, aNative=1) at nsEventQueue.cpp:176
#6 0x402560a3 in nsEventQueueServiceImpl::MakeNewQueue(PRThread*, int,
nsIEventQueue**) (this=0x80a4320, thread=0x809dfd8, aNative=1,
aQueue=0xbfffd13c) at nsEventQueueService.cpp:166
#7 0x4025665a in nsEventQueueServiceImpl::PushThreadEventQueue(nsIEventQueue**)
(this=0x80a4320, aNewQueue=0xbfffd48c) at nsEventQueueService.cpp:285
#8 0x44f452d7 in nsXMLHttpRequest::Send(nsIVariant*) (this=0x8c8b918,
aBody=0x8c35910) at nsXMLHttpRequest.cpp:1261
#9 0x40282467 in XPTC_InvokeByIndex () at xptcinvoke_gcc_x86_unix.cpp:66
#10 0x4098eb6e in XPCWrappedNative::CallMethod(XPCCallContext&,
XPCWrappedNative::CallMode) (ccx=@0xbfffd7d0, mode=CALL_METHOD) at
xpcwrappednative.cpp:2011
#11 0x40998d99 in XPC_WN_CallMethod(JSContext*, JSObject*, unsigned, long*,
long*) (cx=0x86589e0, obj=0x83be218, argc=1, argv=0x8ca208c, vp=0xbfffd970) at
xpcwrappednativejsops.cpp:1281
#12 0x400cbc2d in js_Invoke (cx=0x86589e0, argc=1, flags=0) at jsinterp.c:839
#13 0x400d9d6c in js_Interpret (cx=0x86589e0, result=0xbfffdf8c) at jsinterp.c:2803
#14 0x400cc3dd in js_Execute (cx=0x86589e0, chain=0x85bd210, script=0x8c3f0a8,
down=0x0, special=0, result=0xbfffdf8c) at jsinterp.c:1020
#15 0x4009c77c in JS_EvaluateUCScriptForPrincipals (cx=0x86589e0, obj=0x85bd210,
principals=0x8c7e988, chars=0x8c921b8, length=448, filename=0xbfffe0c0
"http://linuxbox/leak/page1.html", lineno=7, rval=0xbfffdf8c) at jsapi.c:3382
#16 0x42b12d5c in nsJSContext::EvaluateString(nsAString const&, void*,
nsIPrincipal*, char const*, unsigned, char const*, nsAString&, int*)
(this=0x8658970, aScript=@0xbfffe240, aScopeObject=0x85bd210,
aPrincipal=0x8c7e984, aURL=0xbfffe0c0 "http://linuxbox/leak/page1.html",
aLineNo=7, aVersion=0x40120cc8 "default", aRetValue=@0xbfffe100,
aIsUndefined=0xbfffe0ac) at nsJSEnvironment.cpp:694
#17 0x4124b7de in nsScriptLoader::EvaluateScript(nsScriptLoadRequest*,
nsAFlatString const&) (this=0x8c862b8, aRequest=0x8c0edd0, aScript=@0xbfffe240)
at nsScriptLoader.cpp:582
#18 0x4124b177 in nsScriptLoader::ProcessRequest(nsScriptLoadRequest*)
(this=0x8c862b8, aRequest=0x8c0edd0) at nsScriptLoader.cpp:493
#19 0x4124ae91 in nsScriptLoader::ProcessScriptElement(nsIDOMHTMLScriptElement*,
nsIScriptLoaderObserver*) (this=0x8c862b8, aElement=0x8a41224,
aObserver=0x8a41228) at nsScriptLoader.cpp:436
#20 0x40f7bac8 in nsHTMLScriptElement::MaybeProcessScript() (this=0x8a41200) at
nsHTMLScriptElement.cpp:422
#21 0x40f7b053 in nsHTMLScriptElement::SetDocument(nsIDocument*, int, int)
(this=0x8a41200, aDocument=0x8c0e3f8, aDeep=0, aCompileEventHandlers=1) at
nsHTMLScriptElement.cpp:195
#22 0x40efdce3 in nsGenericHTMLContainerElement::AppendChildTo(nsIContent*, int,
int) (this=0x8c30ad0, aKid=0x8a41200, aNotify=0, aDeepSetDocument=0) at
nsGenericHTMLElement.cpp:4093
#23 0x40fbd424 in HTMLContentSink::ProcessSCRIPTTag(nsIParserNode const&)
(this=0x89d5940, aNode=@0x8c8f090) at nsHTMLContentSink.cpp:5690
#24 0x40fb65df in HTMLContentSink::AddLeaf(nsIParserNode const&)
(this=0x89d5940, aNode=@0x8c8f090) at nsHTMLContentSink.cpp:3706
#25 0x41ea823e in CNavDTD::AddLeaf(nsIParserNode const*) (this=0x8c8b7e8,
aNode=0x8c8f090) at CNavDTD.cpp:3799
#26 0x41ea8452 in CNavDTD::AddHeadLeaf(nsIParserNode*) (this=0x8c8b7e8,
aNode=0x8c8f090) at CNavDTD.cpp:3862
#27 0x41ea4a06 in CNavDTD::HandleStartToken(CToken*) (this=0x8c8b7e8,
aToken=0x8ca01b0) at CNavDTD.cpp:1757
#28 0x41ea2d84 in CNavDTD::HandleToken(CToken*, nsIParser*) (this=0x8c8b7e8,
aToken=0x0, aParser=0x8c9f5d0) at CNavDTD.cpp:910
#29 0x41ea1f9f in CNavDTD::BuildModel(nsIParser*, nsITokenizer*,
nsITokenObserver*, nsIContentSink*) (this=0x8c8b7e8, aParser=0x8c9f5d0,
aTokenizer=0x8c2d6c0, anObserver=0x0, aSink=0x89d5940) at CNavDTD.cpp:521
#30 0x41ebcdb9 in nsParser::BuildModel() (this=0x8c9f5d0) at nsParser.cpp:1885
#31 0x41ebc9f2 in nsParser::ResumeParse(int, int, int) (this=0x8c9f5d0,
allowIteration=1, aIsFinalChunk=0, aCanInterrupt=1) at nsParser.cpp:1752
#32 0x41ebe4f6 in nsParser::OnDataAvailable(nsIRequest*, nsISupports*,
nsIInputStream*, unsigned, unsigned) (this=0x8c9f5d0, request=0x8b96bd8,
aContext=0x0, pIStream=0x8bdccb0, sourceOffset=0, aLength=928) at nsParser.cpp:2386
#33 0x4295a700 in nsDocumentOpenInfo::OnDataAvailable(nsIRequest*, nsISupports*,
nsIInputStream*, unsigned, unsigned) (this=0x8a19a38, request=0x8b96bd8,
aCtxt=0x0, inStr=0x8bdccb0, sourceOffset=0, count=928) at nsURILoader.cpp:244
#34 0x40bdc841 in nsHttpChannel::OnDataAvailable(nsIRequest*, nsISupports*,
nsIInputStream*, unsigned, unsigned) (this=0x8b96bd8, request=0x8c4636c,
ctxt=0x0, input=0x8bdccb0, offset=0, count=928) at nsHttpChannel.cpp:3096
#35 0x40b84e6b in nsOnDataAvailableEvent::HandleEvent() (this=0x444370d8) at
nsStreamListenerProxy.cpp:193
#36 0x40b64ab5 in nsARequestObserverEvent::HandlePLEvent(PLEvent*)
(plev=0x444370dc) at nsRequestObserverProxy.cpp:115
#37 0x402520ad in PL_HandleEvent (self=0x444370dc) at plevent.c:644
#38 0x40251edf in PL_ProcessPendingEvents (self=0x80de6b8) at plevent.c:574
#39 0x4025446e in nsEventQueueImpl::ProcessPendingEvents() (this=0x80a4278) at
nsEventQueue.cpp:388
#40 0x414b449a in event_processor_callback (data=0x80a4278, source=6,
condition=GDK_INPUT_READ) at nsAppShell.cpp:184
#41 0x414b3ed5 in our_gdk_io_invoke (source=0x82600e0, condition=G_IO_IN,
data=0x825ff38) at nsAppShell.cpp:76
#42 0x404ea0a6 in g_io_add_watch () from /usr/lib/libglib-1.2.so.0
#43 0x404eb9ae in g_get_current_time () from /usr/lib/libglib-1.2.so.0
#44 0x404ebe89 in g_get_current_time () from /usr/lib/libglib-1.2.so.0
#45 0x404ec124 in g_main_run () from /usr/lib/libglib-1.2.so.0
#46 0x403f727f in gtk_main () from /usr/lib/libgtk-1.2.so.0
#47 0x414b49bc in nsAppShell::Run() (this=0x8141530) at nsAppShell.cpp:332
#48 0x41463657 in nsAppShellService::Run() (this=0x81274e8) at
nsAppShellService.cpp:471
#49 0x0805dfd4 in main1 (argc=3, argv=0xbffff3a4, nativeApp=0x809f838) at
nsAppRunner.cpp:1541
#50 0x0805ec48 in main (argc=3, argv=0xbffff3a4) at nsAppRunner.cpp:1902
#51 0x420156a4 in __libc_start_main () from /lib/tls/libc.so.6
(gdb)
Comment 15•22 years ago
|
||
What happens is on the 501'st call, PL_IsQueueNative param "queue" is nil.
Starting at PushThreadEventQueue and ending at PL_IsQueueNative.
Comment 16•22 years ago
|
||
I can confirm this still crashes (linux) on a new trunk build from this morning.
Doug, do you know anything about limitations on the number of calls to
PushThreadEventQueue()? It seems like popping the queue does not help, and we
crash on 501st call.
Comment 18•22 years ago
|
||
This seems to fix the crash. My test running is at 3,000 and climbing. So the
crash at 501 is fixed by this patch.
--pete
Comment 19•22 years ago
|
||
Man this is leaking like a siv.
Using, my test case posted in bug (208127), after about 3,031 iterations.
Start memory aprox 22Megs
End Memory aprox 50Megs
Serious leakage here.
Assignee: heikki → petejc
Whiteboard: patch, need review
Target Milestone: --- → mozilla1.5beta
Updated•22 years ago
|
Status: NEW → ASSIGNED
Priority: -- → P1
Comment 20•22 years ago
|
||
Leaked 30 Megs in about 4 minutes.
Comment 21•22 years ago
|
||
*** Bug 208127 has been marked as a duplicate of this bug. ***
Comment 22•22 years ago
|
||
Anyone around to review this one line patch?
CC'ing Brendan for some review love.
Updated•22 years ago
|
Attachment #127820 -
Flags: superreview?(brendan)
Attachment #127820 -
Flags: review?(dougt)
Comment 23•22 years ago
|
||
Why is adding NS_RELEASE a crash fix? Wouldn't that cure a memory leak?
/be
Comment 24•22 years ago
|
||
It fixes the crash. Remove it and we crash at 50st1 call to PushThreadEventQueue.
The memory leak here is actually a different issue. This bug is about crashing
on the 501st call. My guess is perhaps there is a limit to how many ADDREFS can
be made on a single class?
I really don't know this code,. just that the explicit RELEASE in the patch
fixes this crasher.
Comment 25•22 years ago
|
||
There's nothing magic about 500.
Dougt, lend us a clue.
/be
Comment 26•22 years ago
|
||
Brendan, if Dougt is too busy who else knows this code well enough to review?
It's a one liner, I've been running w/ it for some time and haven't seen any
related problems.
Can you help me get some traction on this nasty crasher fix so I can check it in?
Thanks
Comment 27•22 years ago
|
||
ok, here's the testcase:
const eqs=Components.classes["@mozilla.org/event-queue-service;1"].getService(Components.interfaces.nsIEventQueueService)
for (a=0; a<10000; print(++a)) eqs.popThreadEventQueue(eqs.pushThreadEventQueue());
On raistlin-beos [cvs from now] (only tested once because it took too long)
$ ./run-mozilla.sh ./xpcshell /tmp/206947
...
5821
Kill Thread
I don't know if this was killed for resource reasons or what.
On cobra-darwin [cvs from this week] (only tested once because it took too long)
...
3029
./run-mozilla.sh: line 72: 13142 Segmentation fault "$prog" ${1+"$@"}
On boffo-linux [cvs from last week + some minor changes] (tested a few times)
...
509
./run-mozilla.sh: line 72: 16248 Segmentation fault "$prog" ${1+"$@"}
On deathstarii-solaris [cvs from a long time ago + various changes that i can't remember]
...
125
Segmentation Fault - core dumped
From this we can conclude that 500 is definitely not a magic number. 512 might be on linux (128 on solaris?), but i'm not sure how reasonable that is.
Anyway. I see a whole bunch of things i don't like in the eqs/eq so i'm going to poke this stuff today and maybe sunday.
For the time being i'm going to side w/ brendan
Comment 28•22 years ago
|
||
> For the time being i'm going to side w/ brendan
Meaning?
There is nothing special about 500?
Anyway, did you try your test w/ the release to aQueue?
NS_RELEASE(aQueue);
Comment 29•22 years ago
|
||
BTW: Timless's test case doesn't crash on FreeBSD Moz 1.3b.
Comment 30•22 years ago
|
||
Well, put it simply, i claim there are a few problems.
1. nsEventQueue::*Init don't check the nspr create event method for failure
^ I think i had a bug or at least plans to change this a long time ago, but i can't find the tree with those changes and am not going to search for a bug about it (this bug will be hijackable to suit my needs)
2. Pop doesn't trigger Unlink
^ the documentation in the two classes talks a lot about Unlink and lists a certain place where it'd be called. except well ... it isn't called.
Running on cobra-darwin w/ a patch to make the init methods return failures.
...
4027
Program received signal EXC_BAD_ACCESS, Could not access memory.
0x00f2aa40 in nsCOMPtr_base::~nsCOMPtr_base() (this=0xbff80070, __in_chrg=0) at /Users/timeless/mozilla/xpcom/glue/nsCOMPtr.cpp:62
#0 0x00f2aa40 in nsCOMPtr_base::~nsCOMPtr_base() (this=0xbff80070, __in_chrg=0) at /Users/timeless/mozilla/xpcom/glue/nsCOMPtr.cpp:62
#1 0x00ef8db0 in nsEventQueueImpl::GetYoungestActive(nsIEventQueue**) (this=0x8fde10, aQueue=0xbff800f0) at ../../dist/include/xpcom/nsCOMPtr.h:1063
...
#4027 0x00ef8db0 in nsEventQueueImpl::GetYoungestActive(nsIEventQueue**) (this=0x1b5a20, aQueue=0xbfffde70) at ../../dist/include/xpcom/nsCOMPtr.h:1063
#4028 0x00f4b19c in nsEventQueueServiceImpl::GetYoungestEventQueue(nsIEventQueue*, nsIEventQueue**) (this=0xbff80070, queue=0xf57980, aResult=0xbfffdf30) at ../../dist/include/xpcom/nsCOMPtr.h:1063
Remember according to the obvious interpretation of my testcase event queues should not linger in the chain (and Unlink is the function which would make that happen). From the crash we can see that they're lingering. and from the code we can see there isn't a call to Unlink anywhere near this.
Comment 31•22 years ago
|
||
Well, it appears on Linux,
nsEventQueueImpl::InitFromPRThread ->
mEventQueue = PL_CreateNativeEventQueue("Thread event queue...", thread);
after the 509th asssignment, PL_CreateNativeEventQueue returns (nil)
So, mEventQueue is assigned a null value and the chain of events leading to the
crash starts:
Specifically, CheckForDeactivation(). Crash, Boom.
Comment 32•22 years ago
|
||
So it appears, that after 508 successful calls to PL_CreateNativeEventQueue all
on the same thread things break.
I notice Unlik is called in the nsEventQueueImpl destructor, and this works if
you set the test to 508 times. All 508 objects allocated, are properly released
when the service is realeased.
Comment 33•22 years ago
|
||
Running Timeless's test case, using PL_CreateMonitoredEventQueue instead of
PL_CreateNativeEventQueue, it all seems to work peachy. I reach 1000 no probes
and the queue is properly destroyed when the service is unloaded.
Any reason we need to use PL_CreateNativeEventQueue?
Comment 34•22 years ago
|
||
Here's an alternative testcase:
const
eqs=Components.classes["@mozilla.org/event-queue-service;1"].getService(Components.interfaces.nsIEventQueueService)
try {
for (a=0; a<10000; print(++a)) {
eqs.popThreadEventQueue(eqs.pushThreadEventQueue()); gc(); }
} finally { quit(); }
with this testcase and the patch i'm attaching things work up to 4099 (and
growing, but very slowly. gc at each iteration is expensive).
Comment 35•22 years ago
|
||
Timeless, this patch just does error checking and throws out an
NS_ERROR_OUT_OF_MEMORY
On Linux it still dies at 509
509
uncaught exception: [Exception... "Component returned failure code: 0x8007000e
(NS_ERROR_OUT_OF_MEMORY) [nsIEventQueueService.pushThreadEventQueue]" nsresult:
"0x8007000e (NS_ERROR_OUT_OF_MEMORY)" location: "JS frame :: test.js ::
<TOP_LEVEL> :: line 4" data: no]
I think the root of the problem is this:
On some systems, PL_CreateNativeEventQueue after "n" calls on the same thread,
will assign nil to mEventQueue.
Right here:
http://lxr.mozilla.org/seamonkey/source/xpcom/threads/nsEventQueue.cpp#170
Slap a printf right after this assignment and you'll see something like this (on
Linux):
504
mEventQueue(0x813a120)
505
mEventQueue(0x813a358)
506
mEventQueue(0x813a590)
507
mEventQueue(0x813a7c8)
508
mEventQueue(0x813aa00)
509
mEventQueue((nil))
Still seems like some kind of limit is reached here.
Error checking indeed needs to be there, but the root of the problemstill needs
a fix'n.
Comment 36•22 years ago
|
||
Traced it a little further to here:
http://lxr.mozilla.org/seamonkey/source/xpcom/threads/plevent.c#852
Looks like a system limitation, when filedes[1] == 1024
pipe(self->eventPipe);
When self->eventPipe[1] == 1024
pipe() returns -1
$ ulimit -a
open files (-n) 1024
It looks like the system allows only 1024 file descs . . .
So, it appears that threadEventQueues that are poped are not really fully poped
until the service is released and the nsEventQueue destructors are then called.
Since the service is only released at shutdown, poped queues are not destroyed,
still holding on to their file descriptors.
So this would explain the disparity of resutls between the various systems.
Everything is being properly addrefed and released, but like I said, only
destroyed when the service is unloaded.
Comment 37•22 years ago
|
||
This two liner correctly resolves the problem I believe. Using Timeless's test
case, I ran it up to 10,000.
9992
9993
9994
9995
9996
9997
9998
9999
10000
Timeless, can you verify this patch on some other systems. If good, then "r="
it and maybe Brendan can give me some "sr=" love.
Gotta close those open files folks.
Attachment #127820 -
Attachment is obsolete: true
Attachment #129029 -
Attachment is obsolete: true
Comment 38•22 years ago
|
||
Well patch #129066 fixes the test case but breaks other things.
So, still looking for the right fix here . . .
Comment 39•22 years ago
|
||
Timeless, your GC solution may fix your specific test case, but from what I can
tell, it would have no impact on the first test case posted here. Is that test
case still valid?
So which test case are we currently trying to fix?
Comment 40•22 years ago
|
||
Comment on attachment 129066 [details] [diff] [review]
FIX
this is wrong and breaks darwin(macosx)
Attachment #129066 -
Flags: review-
Comment 41•22 years ago
|
||
cc:ing a few folks who might be more clueful than I about why
nsXMLContentSink::DidBuildModel is never called. (Did I get your attention?)
You guys have already noticed, but, closing the event pipes after each
ProcessPendingEvents would not be a good thing (tm).
But I think you are right, though, that the problem is leaking event queues
holding on to their pipe file descriptors until the system chokes itself to
death. However this is not caused by the event queue service, which does not
hold a reference to event queues.
I see this: each iteration of nsXMLHttpRequest::Send in timeless' test case
pushes a new event queue (line 1283). Though this queue is properly popped and
cleaned up it is never actually destroyed because of a persistent reference held
in nsParser::mEventQueue, set in nsParser's ctor. This ref would be dropped in
nsParser's dtor but nsParser itself is leaking. That's because of a circular
reference between nsParser::mSink, an nsXMLContentSink, and that
nsXMLContentSink::mParser. That circular reference would be broken in
nsXMLContentSink::DidBuildModel, but DidBuildModel is never called, ever.
That's as far as I've traced it. This sucks. Old event queues are building up in
a never-ending stack and when they get 1,000 deep the app goes down. Can't blame
it. I feel pretty confident this crash will go away if the leaking nsParser can
be plugged.
Comment 42•22 years ago
|
||
From what I can tell, nsParser is holding onto them. It's getting constructed,
grabs one of the event queues for the thread. I'm unsure why the parser isn't
getting destructed. I see the destructor hit, but not for all instances.
Comment 43•22 years ago
|
||
the timeless patch attachment.cgi?id=129042 applied to a cvs pulled at around
20030802 20:00 UTC+02:00 gives the following results:
Mac OS X 10.2.6 with all the whistes and bells to compile the beasts
1) the counter stops at 65535 = 2^16 - 1
2) memory use :
PID COMMAND %CPU TIME #TH #PRTS #MREGS RPRVT RSHRD RSIZE VSIZE
7404 MozillaFir 0.0% 41:58.71 16 254 2701 517M 29.0M 334M 809M
which is way about normal:
17968 mozilla-bi 0.0% 0:34.56 16 254 512 21.3M 24.5M 29.9M 280M
5543 mozilla-bi 0.0% 25:16.92 16 260 1051 46.1M 30.5M 67.9M 441M
766 mozilla-bi 0.0% 2:43:59 16 266 1677 61.5M 32.5M 85.6M 480M
3) the test-browser is unresponsive to loading pages in new tabs, javascript
console won't start; but opening and closing tabs works, resizing of window
works. Closing the window doesn't help things - I get an arrow with a
busy-indicator ; the window menu indicates that all the windows I've requested
are active, but no windows get drawn
Attachment #127820 -
Flags: superreview?(brendan)
Attachment #127820 -
Flags: review?(dougt)
Comment 44•22 years ago
|
||
I see the ref counts go up on both the parser and the content sink when
SetContentSink is called in nsXMLDocument::StartDocumentLoad.
Comment 45•22 years ago
|
||
Comment on attachment 129042 [details] [diff] [review]
Error check and Unlink
danm: i know that this doesn't fix all of the problems, but it fixes the
testcase in comment 34, and some other crashers.
I know we still want some fix for parser, and I still plan to write some code
so that xpconnect won't hold a strong reference to eventqueues, but would you
please review these changes?
Attachment #129042 -
Flags: review?(danm-moz)
Comment 46•22 years ago
|
||
Restating comment 41 because it seems I wasn't clear: we have rampant leaks and
it looks to me that several of them, including the important nsEventQueue leak,
will be plugged if the parser (?) would only call
nsXMLContentSink::DidBuildModel. That ain't happening, and it's a problem.
timeless: I'd like to see the attachment 129042 [details] [diff] [review] patch in a separate (perhaps
dependent) bug because I consider it a distraction from the core, leak problem.
The tight push/pop loop in the comment 34 testcase unfortunately pushes N
simultaneous queues. It looks to me that there's no long-term leak: they are all
properly unwound though only after the loop finishes.
I imagine this is a problem like the comment 0 testcase for similar reasons. So
obviously the two cases are related but the fixes will be different and the
cc:list on this bug is growing, and I'd like to compartmentalize the two issues
and avoid spamming potential help on this leak. If you'll put that patch in a
new bug (sorry!) I have a 100-line comment ready for it :).
Comment 47•22 years ago
|
||
I don't believe nsXMLContentSink is being used at all in test case #124114. I
put printf's in the class ctor and dtor and don't see them showing up.
The nsParser is certainly used and appears to be properly destroyed.
nsXMLHTTPRequest is properly being created and destroyed as well.
Comment 48•22 years ago
|
||
Actually, I am seeing the parser leak now.
It appears to be 2 to 1.
nsParser CREATE (0x43b92aa8)
nsParser CREATE (0x43bc68d8)
nsParser DESTROY (0x43b92aa8)
Document http://linuxbox/leak/page1.html loaded successfully
nsParser CREATE (0x43b0c950)
nsParser CREATE (0x43b9f1c0)
nsParser DESTROY (0x43b0c950)
Document http://linuxbox/leak/page1.html loaded successfully
nsParser CREATE (0x43bb81c0)
nsParser CREATE (0x43bcedd0)
nsParser DESTROY (0x43bb81c0)
Document http://linuxbox/leak/page1.html loaded successfully
nsParser CREATE (0x43bb79b8)
nsParser CREATE (0x43bde3d0)
nsParser DESTROY (0x43bb79b8)
Document http://linuxbox/leak/page1.html loaded successfully
Where the second create is never released.
The XML sink will not be created because XMLHttpRequest will only try to parse
things if the document mime type is XML, or if you explicitly override the mime
type. It should not even create the parser in this testcase. Have a look at
XMLHttpRequest::OnStartRequest:
http://lxr.mozilla.org/seamonkey/source/extensions/xmlextras/base/src/nsXMLHttpRequest.cpp#952
Comment 50•22 years ago
|
||
Right. that's what I thought. I'm using text/plain in my test case.
I do see the parser leak. So obviously it's not being caused by nsXMLContentSink.
There is only one reference from nsXMLHTTPRequest to nsIParser
here:
http://lxr.mozilla.org/seamonkey/source/extensions/xmlextras/base/src/nsXMLHttpRequest.cpp#987
Comment 51•22 years ago
|
||
Nope they *are* being created. I did a compile from the tip and now I see the
printfs.
nsXMLDocument and nsXMLContentSink are indeed being created.
Doh!
Try this. Now we call overrideMimeType() to try and force the creation and use
for the XML sink, which should result in proper cleanup.
I also realized we do have a parser even in the case where there is no XML
data, and we do not force the mime type. If this workaround actually works
(haven't tried), then we should find a way to clean things up without requiring
the sink in cases like this where it's pointless trying to parse the data as
XML.
Comment 53•22 years ago
|
||
The XML Content sink object is being created w/ a null docshell.
http://lxr.mozilla.org/seamonkey/source/content/xml/document/src/nsXMLDocument.cpp#671
Maybe that is setting off a chain of events that leads the the leak in
nsXMLContentSink.
Comment 54•22 years ago
|
||
W/ this patch, there are no leaks, no crash and everything seems to work
peachy.
--pete
Comment 55•22 years ago
|
||
I can verify that both overrideMimeType (comment 52) and not setting the content
sink when there's no docshell (comment 54) prevent event queues from leaking in
the comment 0 testcase. (Note I just checked in a patch to nsEventQueue.cpp that
makes verification easy.) pete's patch looks interesting but the right reviewer,
that guy isn't me. (jst? heikki?)
Updated•22 years ago
|
Attachment #129191 -
Flags: superreview?(jst)
Attachment #129191 -
Flags: review?(hjtoi-bugzilla)
Comment 56•22 years ago
|
||
Changing target to 1.6 alpha.
Target Milestone: mozilla1.5beta → mozilla1.6alpha
I think we should try to avoid creating the objects we don't really need. In
this case it seems like we shouldn't need the XML content sink. Please try to
change the code so we won't create the XML sink in this case. If you can get
that to work I'd rather approve that kind of change. (If it won't work I can
re-evaluate this patch as well.)
Comment 58•22 years ago
|
||
> Please try to change the code so we won't create the XML sink in this case.
> If you can get that to work I'd rather approve that kind of change.
No it doesn't work, that's the first thing i tried and other things started to
break. There are lots of instances where there is no docshell, but the new
content sink object is required. That's why the conditionals in this patch are
where they are.
Comment on attachment 129191 [details] [diff] [review]
Proposed FIX
Okay, r=heikki
Attachment #129191 -
Flags: review?(hjtoi-bugzilla) → review+
Comment 60•22 years ago
|
||
Thanks Heikki.
jst or ping any super reviewers cc'd on this bug. I haven't heard from jst in a
while.
Comment 61•22 years ago
|
||
Comment on attachment 129191 [details] [diff] [review]
Proposed FIX
It seems to me that there's another problem lurking here that should IMO be
fixed in stead of simply not handing the sink to the parser when there's no
docshell around. If there's a leak in that case, we should fix the leak, not
hack around it for this single case.
If this is needed for 1.5, or whatever the target is, I'll sr, but please do
open a new bug about backing out this change and fixing the actual leak.
Attachment #129191 -
Flags: superreview?(jst) → superreview+
Comment 62•22 years ago
|
||
jst, thanks for the quick review.
I'll make sure I clearly comment this and refer to this bug number before
checking in. Then whoever has time can track this down even further.
Comment 63•22 years ago
|
||
Checked in.
Comment 64•22 years ago
|
||
this caused camino to hang at startup parsing its bookmarks. backing it out
fixed it. what should i do?
Comment 65•22 years ago
|
||
Ok, i guess we need to probe deeper to find the root source of the leak like jst
suggested.
Assignee | ||
Comment 66•22 years ago
|
||
The core problem here seems to be in the parser API's. Once you start
an async parse by calling Parse(), the parser assumes that you
will send it the appropriate nsIStreamListener events. There
seems to be no way of getting it to shutdown properly otherwise.
I tried calling nsIParser::Terminate(), but it doesn't do anything
useful in this case. I even tried to send it just a OnStopRequest()
without sending it any data, but it doesnt like that very much either.
Some scary looking assertions fired inside the parser.
In this particular case, it looks like the Parse() call is set off
by the call to nsXMLDocument::StartDocumentLoad() in Send().
However, as Heikki pointed out we may subsequently decide,
in OnStartRequest(), to not forward the nsIStreamListner events
into the parser for non-xml mimetypes etc. (BTW, that was bug 129607).
I don't know enough about the code here, but it seems to me that
one resonable way to avoid this situation would be delay notifying
the document that a load has started till such a time that
nsXMLHttpRequest has made a decision about whether it wants to try
building an xml document. This would mean that Parse() is not called,
and the contentsink is not created unless nsXMLHttpRequest is sure
that it intends to send forward nsIStreamListener events into the
parser.
This patch is my stab at taking this approach of lazily calling
StartDocumentLoad(). It appears to work OK. The testcase in comment 0 doesnt
leak/crash anymore. I tested this a bit (basically the online tests up at
http://www.mozilla.org/xmlextras/tests.html) Nothing appears to be
broken.
Assignee | ||
Comment 67•22 years ago
|
||
This is something that I started with and abandonded since it was getting into
areas that I really dont understand. I can finish/test it if the approach
sounds better than the earlier one. I was trying to delay the construction of
the document itself till it is absolutely needed. It also fixed the leak, but
is untested beyond that. It turns out the document is used to dispatch events
when the request completes and I was not sure that my changes in
RequestCompleted() made sense, or how to test them.
Assignee | ||
Updated•22 years ago
|
Attachment #130523 -
Flags: superreview?(jst)
Attachment #130523 -
Flags: review?(hjtoi-bugzilla)
Comment on attachment 130523 [details] [diff] [review]
Another proposed fix
Seems like a good idea!
>Index: nsXMLHttpRequest.cpp
>===================================================================
>+ if (!mDocument) {
>+ // We were probably aborted. So no point in attempting to parse
>+ // incoming data.
>+ mState &= ~XML_HTTP_REQUEST_PARSEBODY;
>+ }
Did you actually hit this condition? How is it possible to be aborted before
starting data transfer?
>+ } else if (!parser || (parser && parser->IsParserEnabled())) {
>+ // If we do have a parser, then it needs to be enabled for us to
>+ // safely call RequestCompleted(). If the parser is not enabled,
>+ // it means it was blocked, by xml-stylesheet PI for example, and
>+ // is still building the document. RequestCompleted() must be
> // called later, when we get the load event from the document.
This comment seems lacking. Why do we need to treat not having parser the same
as having enabled parser?
>- rv = document->StartDocumentLoad(kLoadAsData, mChannel,
>- loadGroup, nsnull,
>- getter_AddRefs(listener),
>- PR_TRUE);
> mChannel->SetNotificationCallbacks(this);
> rv = mChannel->AsyncOpen(this, nsnull);
I am most wary of this change. Is it really ok to open a channel without
telling the document first to start loading?
![]() |
||
Comment 69•22 years ago
|
||
> Is it really ok to open a channel without telling the document first to start
> loading?
Should be -- that's how normal content area loads work. In fact, telling the
document to start loading before the channel calls OnStartRequest is very much
wrong -- StartDocumentLoad asks the channel for things like the content type,
etc. Since the channel's listener is 'this', you just want to call
StartDocumentLoad from the OnStartRequest of 'this' and then forward the other
notifications to the document's listener (if any). That's essentially what the
URILoader does for normal content area loads.
Assignee | ||
Comment 70•21 years ago
|
||
> Did you actually hit this condition? How is it possible to be aborted
> before starting data transfer?
Yes I did. The reason I put that in was that I saw this condition fairly
regularly on the following two tests.
http://www.mozilla.org/xmlextras/xgetabort.html
http://www.mozilla.org/xmlextras/xgetopen.html
I presumed that it would be quite natural for that to happen when async requests
are aborted. I'm fairly certain that the current code was also ending up in
OnStartRequest() after the Abort() call, before I touched it at all. Was that
not supposed to happen?
> This comment seems lacking. Why do we need to treat not having parser the
> same as having enabled parser?
Indeed it is lacking. Is the following better?
// If we don't have a parser, we never attempted to parse the
// incoming data, and we can proceed to call RequestCompleted().
// Alternatively, if we do have a parser, its possible that we
// have given it some data and thus caused it to block e.g. by a
// by a xml-stylesheet PI. In this case, we will have to wait till
// it gets enabled again and RequestCompleted() must be called
// later, when we get the load event from the document. If the
// parser is enabled, it is not blocked and we can still go ahead
// and call RequestCompleted() and expect everything to get
// cleaned up immediately.
I am open to suggestions about improving that further.
And I guess bz adressed your last point, it does appear to be safe to open the
channel without starting the load. Thats kinda the whole point of the patch :-)
Comment on attachment 130523 [details] [diff] [review]
Another proposed fix
With the new comment, r=
Attachment #130523 -
Flags: review?(hjtoi-bugzilla) → review+
Assignee | ||
Comment 72•21 years ago
|
||
Assignee | ||
Updated•21 years ago
|
Attachment #130523 -
Attachment is obsolete: true
Assignee | ||
Comment 73•21 years ago
|
||
Comment on attachment 131388 [details] [diff] [review]
Same patch. Updated comment foo.
Marking heikki's review.
Tranferring sr request to updated patch. jst, can you please have a look at
this.
Attachment #131388 -
Flags: superreview?(jst)
Attachment #131388 -
Flags: review+
Assignee | ||
Updated•21 years ago
|
Attachment #130523 -
Flags: superreview?(jst)
Comment 74•21 years ago
|
||
Comment on attachment 131388 [details] [diff] [review]
Same patch. Updated comment foo.
- In nsXMLHttpRequest::OnStartRequest():
+ return mXMLParserStreamListener->OnStartRequest(request,ctxt);
While you're at it, please add a space after ','.
- In nsXMLHttpRequest::OnStopRequest():
+ } else if (!parser || (parser && parser->IsParserEnabled())) {
Unnecessary parser check, simply:
+ } else if (!parser || parser->IsParserEnabled()) {
would do (the rhs of an || is only executed if the lhs expression evaluates to
false, and in that case, we know parse is non-null here).
sr=jst
Attachment #131388 -
Flags: superreview?(jst) → superreview+
Assignee | ||
Comment 75•21 years ago
|
||
Taking bug since I have the lastest patch, and I don't want to forget about
checking this in.
jst, thanks for the sr. I can't believe I wrote that redundant parser check :-)
Will update and check this in soon.
Assignee: petejc → keeda
Status: ASSIGNED → NEW
Assignee | ||
Comment 76•21 years ago
|
||
Patch checked in. Hopefully, this time, this really is fixed :-)
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Updated•21 years ago
|
Attachment #129042 -
Flags: review?(danm-moz)
Updated•14 years ago
|
Crash Signature: [@ nsEventQueueImpl::GetYoungestActive]
You need to log in
before you can comment on or make changes to this bug.
Description
•