Closed Bug 206947 Opened 22 years ago Closed 21 years ago

Synchronous XMLHttpRequest crashes the browser on approx. 500th call on Mac and Linux [@ nsEventQueueImpl::GetYoungestActive]

Categories

(Core :: XML, defect, P1)

defect

Tracking


RESOLVED FIXED
mozilla1.6alpha

People

(Reporter: arysin, Assigned: keeda)

References

Details

(Keywords: crash, testcase, Whiteboard: patch, need review)

Crash Data

Attachments

(10 files, 3 obsolete files)

4.54 KB, text/plain | Details
647 bytes, text/html | Details
43.47 KB, text/plain | Details
835 bytes, text/html | Details
8.44 KB, patch | Details | Diff | Splinter Review
496 bytes, patch | timeless: review- | Details | Diff | Splinter Review
701 bytes, text/html | Details
1.19 KB, patch | hjtoi-bugzilla: review+ | Details | Diff | Splinter Review
9.42 KB, patch | Details | Diff | Splinter Review
6.90 KB, patch | keeda: review+ | Details | Diff | Splinter Review
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030521
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030521

XMLHttpRequest crashes the browser on approximately the 500th call. This number is cumulative: if the call is made 300 times, the window is closed, and the page is opened in another window, the browser will crash after roughly 200 more calls. The same effect occurs in Mozilla on Linux and Windows; on Windows it also reports a problem in the XPCOM library.

Reproducible: Always

Steps to Reproduce:
1. Open the page test.html.
2. Wait until the counter gets to ~500.

Actual Results: The browser crashes.

Expected Results: The browser should keep working.

This bug does not depend on setTimeout() (a plain for(;;) loop gives the same crash), and it does not depend on setting innerHTML of the element.

<html><head>
</head><body onload="changeSize();">
<script>
var count = 0;
var xml = null;
if (document.all)
  xml = new ActiveXObject('Microsoft.XMLHTTP');
else
  xml = new XMLHttpRequest();

function changeSize() {
  xml.open("GET", "about:blank", false);
  xml.send(null);
  document.getElementById("count").innerHTML = eval(++count);
  setTimeout("changeSize()", 50);
}
</script>
<br>
Count: <span id="count">0</span>
</body>
</html>
Attached file stacktrace
Confirmed using FizzillaMach/2003-05-20-08-trunk. Setting All/All.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: crash, testcase
OS: Linux → All
Hardware: PC → All
Summary: XMLHttpRequest crashes the browser on approx. 500th call → XMLHttpRequest crashes the browser on approx. 500th call [@ nsEventQueueImpl::GetYoungestActive]
Crash occurred at iteration 575. Also, TalkBack did not run following the crash.
Blocks: 201894
I'm also seeing this on RH Linux 9.0 / Mozilla 1.3.1.

Heikki - This is critical for us to get fixed. Is there a way to get some luvin' on this bug?? Thanks!!!

- Bill Edney - Technical Pursuit Inc.
I landed some memory leak fixes on 5/20, and I don't see any crashes on Win2k debug build even with 3500 loads of the testcase, and the memory fluctuated between 40 and 60 MB. Please try this with a recent nightly.
Still crashing using FizzillaMach/2003-05-25-03-trunk, again at iteration 575.
While my Linux and Mac builds are cranking, can you try some variations to see if it still crashes? Specifically:

1. Make sure that you don't have more than one XMLHttpRequest connection active at any time.
2. Reuse the same XMLHttpRequest object instead of creating new objects.
Well, I've been trying different variations and all of them crash. As to your two points, I think they're covered in the HTML testcase from comment 0: only one XMLHttpRequest is created there, and no new call is made until the previous one has completed.
I made the testcase use asynchronous loading and it does not seem to crash anymore (at least on Mac OS X). Please try this as a workaround while I am looking at the issue with sync loading and Mac/Linux.
Summary: XMLHttpRequest crashes the browser on approx. 500th call [@ nsEventQueueImpl::GetYoungestActive] → Synchronous XMLHttpRequest crashes the browser on approx. 500th call on Mac and Linux [@ nsEventQueueImpl::GetYoungestActive]
Yes, with asynchronous calls it does not crash on either Linux or Windows. Thanks, now at least we can do some workaround.
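For reference, here is a minimal sketch of the asynchronous variant (an illustration under the same assumptions as the comment 0 testcase, not the actual attached file): passing true to open() makes send() return immediately, so no nested thread event queue is pushed for the duration of the request.

// Sketch only: async variant of the comment 0 loop; assumes the same
// <span id="count"> element as the original testcase.
var count = 0;
var xml = new XMLHttpRequest();

function changeSize() {
  xml.open("GET", "about:blank", true);   // true = asynchronous
  xml.onreadystatechange = function () {
    if (xml.readyState == 4) {            // request completed
      document.getElementById("count").innerHTML = ++count;
      setTimeout(changeSize, 50);
    }
  };
  xml.send(null);
}

changeSize();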
Bug 208127 is a dupe of this bug. Looking at this issue, here's what I came up with: in nsXMLHttpRequest, the call to PushThreadEventQueue in ::Send is the cause of the problem. If PushThreadEventQueue is called exactly 501 times, we crash. This appears to hit a limit in the event queue, and it appears PushThreadEventQueue is not being properly unwound. I tested explicitly calling PopThreadEventQueue, but that didn't help. I have a backtrace I'll post here.
PR_EXTERN(PRBool)
PL_IsQueueNative(PLEventQueue *queue)
{
    return queue->type == EventQueueIsNative ? PR_TRUE : PR_FALSE;
}

********** CREATE (0x8c92c30) **********
out: modalEventQueue(0x8c8fe90)
Document http://linuxbox/leak/page1.html loaded successfully
********** DESTROY (0x8c92c30) **********
Document http://linuxbox/leak/page2.html loaded successfully
********** CREATE (0x8c8b918) **********

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1080897920 (LWP 7575)]
0x40252679 in PL_IsQueueNative (queue=0x0) at plevent.c:1279
/backups/MOZILLA_1_2_1_RELEASE/mozilla/xpcom/threads/plevent.c:1279:33657:beg:0x40252679
Current language: auto; currently c
(gdb) where
#0 0x40252679 in PL_IsQueueNative (queue=0x0) at plevent.c:1279
#1 0x402542f5 in nsEventQueueImpl::IsQueueNative(int*) (this=0x87e4a70, aResult=0xbfffcea8) at nsEventQueue.cpp:350
#2 0x41464f03 in nsAppShellService::Observe(nsISupports*, char const*, unsigned short const*) (this=0x81274e8, aSubject=0x87e4a70, aTopic=0x402c35d8 "nsIEventQueueActivated") at nsAppShellService.cpp:1067
#3 0x401e31bf in nsObserverService::NotifyObservers(nsISupports*, char const*, unsigned short const*) (this=0x80de618, aSubject=0x87e4a70, aTopic=0x402c35d8 "nsIEventQueueActivated", someData=0x0) at nsObserverService.cpp:211
#4 0x40253e6d in nsEventQueueImpl::NotifyObservers(char const*) (this=0x87e4a70, aTopic=0x402c35d8 "nsIEventQueueActivated") at nsEventQueue.cpp:224
#5 0x40253a81 in nsEventQueueImpl::InitFromPRThread(PRThread*, int) (this=0x87e4a70, thread=0x809dfd8, aNative=1) at nsEventQueue.cpp:176
#6 0x402560a3 in nsEventQueueServiceImpl::MakeNewQueue(PRThread*, int, nsIEventQueue**) (this=0x80a4320, thread=0x809dfd8, aNative=1, aQueue=0xbfffd13c) at nsEventQueueService.cpp:166
#7 0x4025665a in nsEventQueueServiceImpl::PushThreadEventQueue(nsIEventQueue**) (this=0x80a4320, aNewQueue=0xbfffd48c) at nsEventQueueService.cpp:285
#8 0x44f452d7 in nsXMLHttpRequest::Send(nsIVariant*) (this=0x8c8b918, aBody=0x8c35910) at nsXMLHttpRequest.cpp:1261
#9 0x40282467 in XPTC_InvokeByIndex () at xptcinvoke_gcc_x86_unix.cpp:66
#10 0x4098eb6e in XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) (ccx=@0xbfffd7d0, mode=CALL_METHOD) at xpcwrappednative.cpp:2011
#11 0x40998d99 in XPC_WN_CallMethod(JSContext*, JSObject*, unsigned, long*, long*) (cx=0x86589e0, obj=0x83be218, argc=1, argv=0x8ca208c, vp=0xbfffd970) at xpcwrappednativejsops.cpp:1281
#12 0x400cbc2d in js_Invoke (cx=0x86589e0, argc=1, flags=0) at jsinterp.c:839
#13 0x400d9d6c in js_Interpret (cx=0x86589e0, result=0xbfffdf8c) at jsinterp.c:2803
#14 0x400cc3dd in js_Execute (cx=0x86589e0, chain=0x85bd210, script=0x8c3f0a8, down=0x0, special=0, result=0xbfffdf8c) at jsinterp.c:1020
#15 0x4009c77c in JS_EvaluateUCScriptForPrincipals (cx=0x86589e0, obj=0x85bd210, principals=0x8c7e988, chars=0x8c921b8, length=448, filename=0xbfffe0c0 "http://linuxbox/leak/page1.html", lineno=7, rval=0xbfffdf8c) at jsapi.c:3382
#16 0x42b12d5c in nsJSContext::EvaluateString(nsAString const&, void*, nsIPrincipal*, char const*, unsigned, char const*, nsAString&, int*) (this=0x8658970, aScript=@0xbfffe240, aScopeObject=0x85bd210, aPrincipal=0x8c7e984, aURL=0xbfffe0c0 "http://linuxbox/leak/page1.html", aLineNo=7, aVersion=0x40120cc8 "default", aRetValue=@0xbfffe100, aIsUndefined=0xbfffe0ac) at nsJSEnvironment.cpp:694
#17 0x4124b7de in nsScriptLoader::EvaluateScript(nsScriptLoadRequest*, nsAFlatString const&) (this=0x8c862b8, aRequest=0x8c0edd0, aScript=@0xbfffe240) at nsScriptLoader.cpp:582
#18 0x4124b177 in nsScriptLoader::ProcessRequest(nsScriptLoadRequest*) (this=0x8c862b8, aRequest=0x8c0edd0) at nsScriptLoader.cpp:493
#19 0x4124ae91 in nsScriptLoader::ProcessScriptElement(nsIDOMHTMLScriptElement*, nsIScriptLoaderObserver*) (this=0x8c862b8, aElement=0x8a41224, aObserver=0x8a41228) at nsScriptLoader.cpp:436
#20 0x40f7bac8 in nsHTMLScriptElement::MaybeProcessScript() (this=0x8a41200) at nsHTMLScriptElement.cpp:422
#21 0x40f7b053 in nsHTMLScriptElement::SetDocument(nsIDocument*, int, int) (this=0x8a41200, aDocument=0x8c0e3f8, aDeep=0, aCompileEventHandlers=1) at nsHTMLScriptElement.cpp:195
#22 0x40efdce3 in nsGenericHTMLContainerElement::AppendChildTo(nsIContent*, int, int) (this=0x8c30ad0, aKid=0x8a41200, aNotify=0, aDeepSetDocument=0) at nsGenericHTMLElement.cpp:4093
#23 0x40fbd424 in HTMLContentSink::ProcessSCRIPTTag(nsIParserNode const&) (this=0x89d5940, aNode=@0x8c8f090) at nsHTMLContentSink.cpp:5690
#24 0x40fb65df in HTMLContentSink::AddLeaf(nsIParserNode const&) (this=0x89d5940, aNode=@0x8c8f090) at nsHTMLContentSink.cpp:3706
#25 0x41ea823e in CNavDTD::AddLeaf(nsIParserNode const*) (this=0x8c8b7e8, aNode=0x8c8f090) at CNavDTD.cpp:3799
#26 0x41ea8452 in CNavDTD::AddHeadLeaf(nsIParserNode*) (this=0x8c8b7e8, aNode=0x8c8f090) at CNavDTD.cpp:3862
#27 0x41ea4a06 in CNavDTD::HandleStartToken(CToken*) (this=0x8c8b7e8, aToken=0x8ca01b0) at CNavDTD.cpp:1757
#28 0x41ea2d84 in CNavDTD::HandleToken(CToken*, nsIParser*) (this=0x8c8b7e8, aToken=0x0, aParser=0x8c9f5d0) at CNavDTD.cpp:910
#29 0x41ea1f9f in CNavDTD::BuildModel(nsIParser*, nsITokenizer*, nsITokenObserver*, nsIContentSink*) (this=0x8c8b7e8, aParser=0x8c9f5d0, aTokenizer=0x8c2d6c0, anObserver=0x0, aSink=0x89d5940) at CNavDTD.cpp:521
#30 0x41ebcdb9 in nsParser::BuildModel() (this=0x8c9f5d0) at nsParser.cpp:1885
#31 0x41ebc9f2 in nsParser::ResumeParse(int, int, int) (this=0x8c9f5d0, allowIteration=1, aIsFinalChunk=0, aCanInterrupt=1) at nsParser.cpp:1752
#32 0x41ebe4f6 in nsParser::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned, unsigned) (this=0x8c9f5d0, request=0x8b96bd8, aContext=0x0, pIStream=0x8bdccb0, sourceOffset=0, aLength=928) at nsParser.cpp:2386
#33 0x4295a700 in nsDocumentOpenInfo::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned, unsigned) (this=0x8a19a38, request=0x8b96bd8, aCtxt=0x0, inStr=0x8bdccb0, sourceOffset=0, count=928) at nsURILoader.cpp:244
#34 0x40bdc841 in nsHttpChannel::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned, unsigned) (this=0x8b96bd8, request=0x8c4636c, ctxt=0x0, input=0x8bdccb0, offset=0, count=928) at nsHttpChannel.cpp:3096
#35 0x40b84e6b in nsOnDataAvailableEvent::HandleEvent() (this=0x444370d8) at nsStreamListenerProxy.cpp:193
#36 0x40b64ab5 in nsARequestObserverEvent::HandlePLEvent(PLEvent*) (plev=0x444370dc) at nsRequestObserverProxy.cpp:115
#37 0x402520ad in PL_HandleEvent (self=0x444370dc) at plevent.c:644
#38 0x40251edf in PL_ProcessPendingEvents (self=0x80de6b8) at plevent.c:574
#39 0x4025446e in nsEventQueueImpl::ProcessPendingEvents() (this=0x80a4278) at nsEventQueue.cpp:388
#40 0x414b449a in event_processor_callback (data=0x80a4278, source=6, condition=GDK_INPUT_READ) at nsAppShell.cpp:184
#41 0x414b3ed5 in our_gdk_io_invoke (source=0x82600e0, condition=G_IO_IN, data=0x825ff38) at nsAppShell.cpp:76
#42 0x404ea0a6 in g_io_add_watch () from /usr/lib/libglib-1.2.so.0
#43 0x404eb9ae in g_get_current_time () from /usr/lib/libglib-1.2.so.0
#44 0x404ebe89 in g_get_current_time () from /usr/lib/libglib-1.2.so.0
#45 0x404ec124 in g_main_run () from /usr/lib/libglib-1.2.so.0
#46 0x403f727f in gtk_main () from /usr/lib/libgtk-1.2.so.0
#47 0x414b49bc in nsAppShell::Run() (this=0x8141530) at nsAppShell.cpp:332
#48 0x41463657 in nsAppShellService::Run() (this=0x81274e8) at nsAppShellService.cpp:471
#49 0x0805dfd4 in main1 (argc=3, argv=0xbffff3a4, nativeApp=0x809f838) at nsAppRunner.cpp:1541
#50 0x0805ec48 in main (argc=3, argv=0xbffff3a4) at nsAppRunner.cpp:1902
#51 0x420156a4 in __libc_start_main () from /lib/tls/libc.so.6
(gdb)
What happens is that on the 501st call, the PL_IsQueueNative parameter "queue" is nil. The stack runs from PushThreadEventQueue down to PL_IsQueueNative.
I can confirm this still crashes (linux) on a new trunk build from this morning.
Doug, do you know anything about limitations on the number of calls to PushThreadEventQueue()? It seems like popping the queue does not help, and we crash on the 501st call.
Attached patch FIX (obsolete) — Splinter Review
This seems to fix the crash. My test run is at 3,000 iterations and climbing, so the crash at 501 is fixed by this patch. --pete
Man, this is leaking like a sieve. Using my test case posted in bug 208127, after about 3,031 iterations: start memory approx. 22 MB, end memory approx. 50 MB. Serious leakage here.
Assignee: heikki → petejc
Whiteboard: patch, need review
Target Milestone: --- → mozilla1.5beta
Status: NEW → ASSIGNED
Priority: -- → P1
Leaked 30 Megs in about 4 minutes.
*** Bug 208127 has been marked as a duplicate of this bug. ***
Anyone around to review this one line patch? CC'ing Brendan for some review love.
Attachment #127820 - Flags: superreview?(brendan)
Attachment #127820 - Flags: review?(dougt)
Why is adding NS_RELEASE a crash fix? Wouldn't that cure a memory leak? /be
It fixes the crash. Remove it and we crash on the 501st call to PushThreadEventQueue. The memory leak here is actually a different issue; this bug is about crashing on the 501st call. My guess is that perhaps there is a limit to how many ADDREFs can be made on a single class? I really don't know this code, just that the explicit RELEASE in the patch fixes this crasher.
There's nothing magic about 500. Dougt, lend us a clue. /be
Brendan, if Dougt is too busy, who else knows this code well enough to review? It's a one-liner; I've been running with it for some time and haven't seen any related problems. Can you help me get some traction on this nasty crasher fix so I can check it in? Thanks.
OK, here's the testcase:

const eqs = Components.classes["@mozilla.org/event-queue-service;1"].getService(Components.interfaces.nsIEventQueueService)
for (a = 0; a < 10000; print(++a))
  eqs.popThreadEventQueue(eqs.pushThreadEventQueue());

On raistlin-beos [cvs from now] (only tested once because it took too long):

$ ./run-mozilla.sh ./xpcshell /tmp/206947
...
5821
Kill Thread

I don't know if this was killed for resource reasons or what.

On cobra-darwin [cvs from this week] (only tested once because it took too long):

...
3029
./run-mozilla.sh: line 72: 13142 Segmentation fault "$prog" ${1+"$@"}

On boffo-linux [cvs from last week + some minor changes] (tested a few times):

...
509
./run-mozilla.sh: line 72: 16248 Segmentation fault "$prog" ${1+"$@"}

On deathstarii-solaris [cvs from a long time ago + various changes that I can't remember]:

...
125
Segmentation Fault - core dumped

From this we can conclude that 500 is definitely not a magic number. 512 might be the magic number on Linux (128 on Solaris?), but I'm not sure how reasonable that is. Anyway, I see a whole bunch of things I don't like in the event queue service / event queue code, so I'm going to poke at this stuff today and maybe Sunday. For the time being I'm going to side with Brendan.
> For the time being i'm going to side w/ brendan

Meaning? That there is nothing special about 500? Anyway, did you try your test with the release of aQueue, i.e.

NS_RELEASE(aQueue);
BTW: Timeless's test case doesn't crash on FreeBSD with Mozilla 1.3b.
Well, to put it simply, I claim there are a few problems:

1. nsEventQueue::*Init don't check the NSPR create-event-queue calls for failure.
   (I think I had a bug, or at least plans to change this, a long time ago, but I can't find the tree with those changes and am not going to search for a bug about it; this bug will be hijackable to suit my needs.)

2. Pop doesn't trigger Unlink.
   (The documentation in the two classes talks a lot about Unlink and lists a certain place where it would be called. Except, well... it isn't called.)

Running on cobra-darwin with a patch to make the init methods return failures:

...
4027
Program received signal EXC_BAD_ACCESS, Could not access memory.
0x00f2aa40 in nsCOMPtr_base::~nsCOMPtr_base() (this=0xbff80070, __in_chrg=0) at /Users/timeless/mozilla/xpcom/glue/nsCOMPtr.cpp:62
#0 0x00f2aa40 in nsCOMPtr_base::~nsCOMPtr_base() (this=0xbff80070, __in_chrg=0) at /Users/timeless/mozilla/xpcom/glue/nsCOMPtr.cpp:62
#1 0x00ef8db0 in nsEventQueueImpl::GetYoungestActive(nsIEventQueue**) (this=0x8fde10, aQueue=0xbff800f0) at ../../dist/include/xpcom/nsCOMPtr.h:1063
...
#4027 0x00ef8db0 in nsEventQueueImpl::GetYoungestActive(nsIEventQueue**) (this=0x1b5a20, aQueue=0xbfffde70) at ../../dist/include/xpcom/nsCOMPtr.h:1063
#4028 0x00f4b19c in nsEventQueueServiceImpl::GetYoungestEventQueue(nsIEventQueue*, nsIEventQueue**) (this=0xbff80070, queue=0xf57980, aResult=0xbfffdf30) at ../../dist/include/xpcom/nsCOMPtr.h:1063

Remember, according to the obvious interpretation of my testcase, event queues should not linger in the chain (and Unlink is the function which would make that happen). From the crash we can see that they are lingering, and from the code we can see there isn't a call to Unlink anywhere near this.
Well, it appears that on Linux, in nsEventQueueImpl::InitFromPRThread,

mEventQueue = PL_CreateNativeEventQueue("Thread event queue...", thread);

after the 509th assignment, PL_CreateNativeEventQueue returns (nil). So mEventQueue is assigned a null value and the chain of events leading to the crash starts, specifically in CheckForDeactivation(). Crash, boom.
So it appears that after 508 successful calls to PL_CreateNativeEventQueue, all on the same thread, things break. I notice Unlink is called in the nsEventQueueImpl destructor, and this works if you set the test to 508 iterations: all 508 objects allocated are properly released when the service is released.
Running Timeless's test case using PL_CreateMonitoredEventQueue instead of PL_CreateNativeEventQueue, it all seems to work peachy. I reach 1000 with no problems and the queue is properly destroyed when the service is unloaded. Any reason we need to use PL_CreateNativeEventQueue?
Here's an alternative testcase:

const eqs = Components.classes["@mozilla.org/event-queue-service;1"].getService(Components.interfaces.nsIEventQueueService)
try {
  for (a = 0; a < 10000; print(++a)) {
    eqs.popThreadEventQueue(eqs.pushThreadEventQueue());
    gc();
  }
} finally {
  quit();
}

With this testcase and the patch I'm attaching, things work up to 4099 (and growing, but very slowly; gc at each iteration is expensive).
Timeless, this patch just does error checking and throws NS_ERROR_OUT_OF_MEMORY. On Linux it still dies at 509:

509
uncaught exception: [Exception... "Component returned failure code: 0x8007000e (NS_ERROR_OUT_OF_MEMORY) [nsIEventQueueService.pushThreadEventQueue]" nsresult: "0x8007000e (NS_ERROR_OUT_OF_MEMORY)" location: "JS frame :: test.js :: <TOP_LEVEL> :: line 4" data: no]

I think the root of the problem is this: on some systems, after "n" calls on the same thread, PL_CreateNativeEventQueue will assign nil to mEventQueue. Right here:

http://lxr.mozilla.org/seamonkey/source/xpcom/threads/nsEventQueue.cpp#170

Slap a printf right after this assignment and you'll see something like this (on Linux):

504 mEventQueue(0x813a120)
505 mEventQueue(0x813a358)
506 mEventQueue(0x813a590)
507 mEventQueue(0x813a7c8)
508 mEventQueue(0x813aa00)
509 mEventQueue((nil))

Still seems like some kind of limit is being reached here. Error checking indeed needs to be there, but the root of the problem still needs fixing.
Traced it a little further, to here:

http://lxr.mozilla.org/seamonkey/source/xpcom/threads/plevent.c#852

pipe(self->eventPipe);

Looks like a system limitation: when self->eventPipe[1] reaches 1024, pipe() returns -1.

$ ulimit -a
open files (-n) 1024

It looks like the system allows only 1024 file descriptors. So it appears that thread event queues that are popped are not really fully popped until the service is released and the nsEventQueue destructors are called. Since the service is only released at shutdown, popped queues are not destroyed and keep holding on to their file descriptors. This would explain the disparity of results between the various systems: everything is being properly addrefed and released, but, like I said, only destroyed when the service is unloaded.
Attached patch FIXSplinter Review
This two-liner correctly resolves the problem, I believe. Using Timeless's test case, I ran it up to 10,000:

9992
9993
9994
9995
9996
9997
9998
9999
10000

Timeless, can you verify this patch on some other systems? If it's good, then "r=" it and maybe Brendan can give me some "sr=" love. Gotta close those open files, folks.
Attachment #127820 - Attachment is obsolete: true
Attachment #129029 - Attachment is obsolete: true
Well, patch #129066 fixes the test case but breaks other things. So, still looking for the right fix here...
Timeless, your GC solution may fix your specific test case, but from what I can tell, it would have no impact on the first test case posted here. Is that test case still valid? So which test case are we currently trying to fix?
Comment on attachment 129066 [details] [diff] [review] FIX

This is wrong and breaks Darwin (Mac OS X).
Attachment #129066 - Flags: review-
cc:ing a few folks who might be more clueful than I about why nsXMLContentSink::DidBuildModel is never called. (Did I get your attention?)

You guys have already noticed, but closing the event pipes after each ProcessPendingEvents would not be a good thing (tm). I do think you are right, though, that the problem is leaking event queues holding on to their pipe file descriptors until the system chokes itself to death. However, this is not caused by the event queue service, which does not hold a reference to event queues.

I see this: each iteration of nsXMLHttpRequest::Send in timeless' test case pushes a new event queue (line 1283). Though this queue is properly popped and cleaned up, it is never actually destroyed because of a persistent reference held in nsParser::mEventQueue, set in nsParser's ctor. This ref would be dropped in nsParser's dtor, but nsParser itself is leaking. That's because of a circular reference between nsParser::mSink, an nsXMLContentSink, and that nsXMLContentSink's mParser. That circular reference would be broken in nsXMLContentSink::DidBuildModel, but DidBuildModel is never called, ever. That's as far as I've traced it.

This sucks. Old event queues are building up in a never-ending stack, and when they get 1,000 deep the app goes down. Can't blame it. I feel pretty confident this crash will go away if the leaking nsParser can be plugged.
From what I can tell, nsParser is holding onto them. It gets constructed and grabs one of the event queues for the thread. I'm unsure why the parser isn't getting destructed; I see the destructor hit, but not for all instances.
The timeless patch (attachment.cgi?id=129042) applied to a CVS tree pulled at around 2003-08-02 20:00 UTC+02:00 gives the following results on Mac OS X 10.2.6 (with all the whistles and bells needed to compile the beasts):

1) The counter stops at 65535 = 2^16 - 1.

2) Memory use:

PID   COMMAND    %CPU  TIME      #TH  #PRTS  #MREGS  RPRVT  RSHRD  RSIZE  VSIZE
7404  MozillaFir 0.0%  41:58.71  16   254    2701    517M   29.0M  334M   809M

which is way above normal:

17968 mozilla-bi 0.0%  0:34.56   16   254    512     21.3M  24.5M  29.9M  280M
5543  mozilla-bi 0.0%  25:16.92  16   260    1051    46.1M  30.5M  67.9M  441M
766   mozilla-bi 0.0%  2:43:59   16   266    1677    61.5M  32.5M  85.6M  480M

3) The test browser is unresponsive to loading pages in new tabs, and the JavaScript console won't start; but opening and closing tabs works, and resizing the window works. Closing the window doesn't help: I get an arrow with a busy indicator, and the Window menu indicates that all the windows I've requested are active, but no windows get drawn.
Attachment #127820 - Flags: superreview?(brendan)
Attachment #127820 - Flags: review?(dougt)
I see the ref counts go up on both the parser and the content sink when SetContentSink is called in nsXMLDocument::StartDocumentLoad.
Comment on attachment 129042 [details] [diff] [review]
Error check and Unlink

danm: I know that this doesn't fix all of the problems, but it fixes the testcase in comment 34 and some other crashers. I know we still want some fix for the parser, and I still plan to write some code so that XPConnect won't hold a strong reference to event queues, but would you please review these changes?
Attachment #129042 - Flags: review?(danm-moz)
Restating comment 41 because it seems I wasn't clear: we have rampant leaks, and it looks to me that several of them, including the important nsEventQueue leak, will be plugged if the parser (?) would only call nsXMLContentSink::DidBuildModel. That ain't happening, and it's a problem.

timeless: I'd like to see the attachment 129042 [details] [diff] [review] patch in a separate (perhaps dependent) bug because I consider it a distraction from the core leak problem. The tight push/pop loop in the comment 34 testcase unfortunately pushes N simultaneous queues. It looks to me that there's no long-term leak there: they are all properly unwound, though only after the loop finishes. I imagine this is a problem like the comment 0 testcase for similar reasons. So obviously the two cases are related, but the fixes will be different, the cc list on this bug is growing, and I'd like to compartmentalize the two issues and avoid spamming potential help on this leak. If you'll put that patch in a new bug (sorry!) I have a 100-line comment ready for it :).
I don't believe nsXMLContentSink is being used at all in test case #124114. I put printf's in the class ctor and dtor and don't see them showing up. The nsParser is certainly used and appears to be properly destroyed. nsXMLHTTPRequest is properly being created and destroyed as well.
Actually, I am seeing the parser leak now. It appears to be 2 to 1:

nsParser CREATE (0x43b92aa8)
nsParser CREATE (0x43bc68d8)
nsParser DESTROY (0x43b92aa8)
Document http://linuxbox/leak/page1.html loaded successfully
nsParser CREATE (0x43b0c950)
nsParser CREATE (0x43b9f1c0)
nsParser DESTROY (0x43b0c950)
Document http://linuxbox/leak/page1.html loaded successfully
nsParser CREATE (0x43bb81c0)
nsParser CREATE (0x43bcedd0)
nsParser DESTROY (0x43bb81c0)
Document http://linuxbox/leak/page1.html loaded successfully
nsParser CREATE (0x43bb79b8)
nsParser CREATE (0x43bde3d0)
nsParser DESTROY (0x43bb79b8)
Document http://linuxbox/leak/page1.html loaded successfully

Where the second create is never released.
The XML sink will not be created because XMLHttpRequest will only try to parse things if the document mime type is XML, or if you explicitly override the mime type. It should not even create the parser in this testcase. Have a look at XMLHttpRequest::OnStartRequest: http://lxr.mozilla.org/seamonkey/source/extensions/xmlextras/base/src/nsXMLHttpRequest.cpp#952
Right, that's what I thought. I'm using text/plain in my test case, and I do see the parser leak, so obviously it's not being caused by nsXMLContentSink. There is only one reference from nsXMLHTTPRequest to nsIParser, here: http://lxr.mozilla.org/seamonkey/source/extensions/xmlextras/base/src/nsXMLHttpRequest.cpp#987
Nope, they *are* being created. I did a compile from the tip and now I see the printfs. nsXMLDocument and nsXMLContentSink are indeed being created. Doh!
Try this. Now we call overrideMimeType() to try to force the creation and use of the XML sink, which should result in proper cleanup. I also realized we do have a parser even in the case where there is no XML data and we do not force the mime type. If this workaround actually works (I haven't tried), then we should find a way to clean things up without requiring the sink in cases like this, where it's pointless trying to parse the data as XML.
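As an illustration of that workaround (a sketch only, not the attached testcase; the URL and structure here are assumed), the idea is to force the response to be treated as XML so the XML content sink is created and the parser/sink pair gets torn down through the normal document-load path:

// Sketch only: force XML parsing so the content sink is created and the
// normal cleanup path runs (assumed shape; not the attachment itself).
var xml = new XMLHttpRequest();
xml.open("GET", "about:blank", false);
xml.overrideMimeType("text/xml");
xml.send(null);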
The XML content sink object is being created with a null docshell:

http://lxr.mozilla.org/seamonkey/source/content/xml/document/src/nsXMLDocument.cpp#671

Maybe that is setting off a chain of events that leads to the leak in nsXMLContentSink.
Attached patch Proposed FIXSplinter Review
With this patch there are no leaks, no crash, and everything seems to work peachy. --pete
I can verify that both overrideMimeType (comment 52) and not setting the content sink when there's no docshell (comment 54) prevent event queues from leaking in the comment 0 testcase. (Note I just checked in a patch to nsEventQueue.cpp that makes verification easy.) pete's patch looks interesting but the right reviewer, that guy isn't me. (jst? heikki?)
Attachment #129191 - Flags: superreview?(jst)
Attachment #129191 - Flags: review?(hjtoi-bugzilla)
Changing target to 1.6 alpha.
Target Milestone: mozilla1.5beta → mozilla1.6alpha
I think we should try to avoid creating the objects we don't really need. In this case it seems like we shouldn't need the XML content sink. Please try to change the code so we won't create the XML sink in this case. If you can get that to work I'd rather approve that kind of change. (If it won't work I can re-evaluate this patch as well.)
> Please try to change the code so we won't create the XML sink in this case.
> If you can get that to work I'd rather approve that kind of change.

No, it doesn't work; that's the first thing I tried, and other things started to break. There are lots of instances where there is no docshell but the new content sink object is required. That's why the conditionals in this patch are where they are.
Comment on attachment 129191 [details] [diff] [review] Proposed FIX Okay, r=heikki
Attachment #129191 - Flags: review?(hjtoi-bugzilla) → review+
Thanks, Heikki. jst, or any other super-reviewers cc'd on this bug: ping. I haven't heard from jst in a while.
Comment on attachment 129191 [details] [diff] [review]
Proposed FIX

It seems to me that there's another problem lurking here that should IMO be fixed instead of simply not handing the sink to the parser when there's no docshell around. If there's a leak in that case, we should fix the leak, not hack around it for this single case. If this is needed for 1.5, or whatever the target is, I'll sr, but please do open a new bug about backing out this change and fixing the actual leak.
Attachment #129191 - Flags: superreview?(jst) → superreview+
jst, thanks for the quick review. I'll make sure I clearly comment this and refer to this bug number before checking in. Then whoever has time can track this down even further.
Checked in.
This caused Camino to hang at startup while parsing its bookmarks. Backing it out fixed it. What should I do?
OK, I guess we need to probe deeper to find the root source of the leak, like jst suggested.
Attached patch Another proposed fix (obsolete) — Splinter Review
The core problem here seems to be in the parser APIs. Once you start an async parse by calling Parse(), the parser assumes that you will send it the appropriate nsIStreamListener events. There seems to be no way of getting it to shut down properly otherwise. I tried calling nsIParser::Terminate(), but it doesn't do anything useful in this case. I even tried to send it just an OnStopRequest() without sending it any data, but it doesn't like that very much either; some scary-looking assertions fired inside the parser.

In this particular case, it looks like the Parse() call is set off by the call to nsXMLDocument::StartDocumentLoad() in Send(). However, as Heikki pointed out, we may subsequently decide, in OnStartRequest(), to not forward the nsIStreamListener events into the parser for non-XML mime types etc. (BTW, that was bug 129607.)

I don't know enough about the code here, but it seems to me that one reasonable way to avoid this situation would be to delay notifying the document that a load has started until such a time that nsXMLHttpRequest has made a decision about whether it wants to try building an XML document. This would mean that Parse() is not called and the content sink is not created unless nsXMLHttpRequest is sure that it intends to forward nsIStreamListener events into the parser.

This patch is my stab at taking this approach of lazily calling StartDocumentLoad(). It appears to work OK: the testcase in comment 0 doesn't leak or crash anymore. I tested this a bit (basically the online tests up at http://www.mozilla.org/xmlextras/tests.html) and nothing appears to be broken.
This is something that I started with and abandoned since it was getting into areas that I really don't understand. I can finish/test it if the approach sounds better than the earlier one. I was trying to delay the construction of the document itself until it is absolutely needed. It also fixed the leak, but is untested beyond that. It turns out the document is used to dispatch events when the request completes, and I was not sure that my changes in RequestCompleted() made sense, or how to test them.
Attachment #130523 - Flags: superreview?(jst)
Attachment #130523 - Flags: review?(hjtoi-bugzilla)
Comment on attachment 130523 [details] [diff] [review]
Another proposed fix

Seems like a good idea!

>Index: nsXMLHttpRequest.cpp
>===================================================================
>+  if (!mDocument) {
>+    // We were probably aborted. So no point in attempting to parse
>+    // incoming data.
>+    mState &= ~XML_HTTP_REQUEST_PARSEBODY;
>+  }

Did you actually hit this condition? How is it possible to be aborted before starting data transfer?

>+  } else if (!parser || (parser && parser->IsParserEnabled())) {
>+    // If we do have a parser, then it needs to be enabled for us to
>+    // safely call RequestCompleted(). If the parser is not enabled,
>+    // it means it was blocked, by xml-stylesheet PI for example, and
>+    // is still building the document. RequestCompleted() must be
>     // called later, when we get the load event from the document.

This comment seems lacking. Why do we need to treat not having a parser the same as having an enabled parser?

>-    rv = document->StartDocumentLoad(kLoadAsData, mChannel,
>-                                     loadGroup, nsnull,
>-                                     getter_AddRefs(listener),
>-                                     PR_TRUE);
>     mChannel->SetNotificationCallbacks(this);
>     rv = mChannel->AsyncOpen(this, nsnull);

I am most wary of this change. Is it really OK to open a channel without telling the document first to start loading?
> Is it really ok to open a channel without telling the document first to start
> loading?

Should be -- that's how normal content area loads work. In fact, telling the document to start loading before the channel calls OnStartRequest is very much wrong -- StartDocumentLoad asks the channel for things like the content type, etc. Since the channel's listener is 'this', you just want to call StartDocumentLoad from the OnStartRequest of 'this' and then forward the other notifications to the document's listener (if any). That's essentially what the URILoader does for normal content area loads.
> Did you actually hit this condition? How is it possible to be aborted
> before starting data transfer?

Yes, I did. The reason I put that in was that I saw this condition fairly regularly on the following two tests:

http://www.mozilla.org/xmlextras/xgetabort.html
http://www.mozilla.org/xmlextras/xgetopen.html

I presumed that it would be quite natural for that to happen when async requests are aborted. I'm fairly certain that the current code was also ending up in OnStartRequest() after the Abort() call, before I touched it at all. Was that not supposed to happen?

> This comment seems lacking. Why do we need to treat not having parser the
> same as having enabled parser?

Indeed it is lacking. Is the following better?

// If we don't have a parser, we never attempted to parse the
// incoming data, and we can proceed to call RequestCompleted().
// Alternatively, if we do have a parser, it's possible that we
// have given it some data and thus caused it to block, e.g. by an
// xml-stylesheet PI. In this case, we will have to wait till
// it gets enabled again and RequestCompleted() must be called
// later, when we get the load event from the document. If the
// parser is enabled, it is not blocked and we can still go ahead
// and call RequestCompleted() and expect everything to get
// cleaned up immediately.

I am open to suggestions about improving that further. And I guess bz addressed your last point: it does appear to be safe to open the channel without starting the load. That's kinda the whole point of the patch :-)
Comment on attachment 130523 [details] [diff] [review] Another proposed fix With the new comment, r=
Attachment #130523 - Flags: review?(hjtoi-bugzilla) → review+
Attachment #130523 - Attachment is obsolete: true
Comment on attachment 131388 [details] [diff] [review]
Same patch. Updated comment foo.

Marking heikki's review. Transferring sr request to the updated patch. jst, can you please have a look at this?
Attachment #131388 - Flags: superreview?(jst)
Attachment #131388 - Flags: review+
Attachment #130523 - Flags: superreview?(jst)
Comment on attachment 131388 [details] [diff] [review]
Same patch. Updated comment foo.

- In nsXMLHttpRequest::OnStartRequest():

+    return mXMLParserStreamListener->OnStartRequest(request,ctxt);

While you're at it, please add a space after ','.

- In nsXMLHttpRequest::OnStopRequest():

+  } else if (!parser || (parser && parser->IsParserEnabled())) {

Unnecessary parser check; simply:

+  } else if (!parser || parser->IsParserEnabled()) {

would do (the rhs of an || is only executed if the lhs expression evaluates to false, and in that case we know parser is non-null here).

sr=jst
Attachment #131388 - Flags: superreview?(jst) → superreview+
Taking the bug since I have the latest patch, and I don't want to forget about checking this in. jst, thanks for the sr. I can't believe I wrote that redundant parser check :-) Will update and check this in soon.
Assignee: petejc → keeda
Status: ASSIGNED → NEW
Patch checked in. Hopefully, this time, this really is fixed :-)
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Attachment #129042 - Flags: review?(danm-moz)
Crash Signature: [@ nsEventQueueImpl::GetYoungestActive]