Closed
Bug 957666
Opened 10 years ago
Closed 10 years ago
Frequent "System JS : ERROR (null):0 - uncaught exception: 2147746065" spammed during desktop mochitest runs, coming from httpd.js (nsIHttpResponse.write)
Categories
(Core :: Networking, defect)
Core
Networking
Tracking
()
People
(Reporter: RyanVM, Assigned: smaug)
References
Details
(Whiteboard: [qa-])
Attachments
(1 file)
648 bytes,
patch
|
ehsan.akhgari
:
review+
|
Details | Diff | Splinter Review |
I know this has been around for awhile now, but it's annoying enough that it needs investigation if for no other reason than it being a significant source of log spam (thousands of instances per log in the ones I'm looking at). It has also served as a point of confusion more than once about whether it's contributing to intermittent failures or not. Obviously, the error message doesn't give much to go on, but there are a few things that might help track it down. 1. It happens on all desktop platforms when running debug mochitests. AFAICT, it doesn't occur in any other test suites. --> On Linux mochitest-1, it always starts after content/base/test/test_bug338583.html. --> OSX and Windows seem to be a bit more variable. Presumably due to differences in what tests are run on each platform? --> On B2G desktop, it starts during dom/imptests/editing/conformancetest/test_event.html. 2. It happens on B2G desktop mochitests, which are opt. 3. It does *not* happen on Android/B2G mochitests, even on Cedar where we run them on debug as well. Example logs: https://tbpl.mozilla.org/php/getParsedLog.php?id=32699553&tree=Mozilla-Central&full=1 https://tbpl.mozilla.org/php/getParsedLog.php?id=32699563&tree=Mozilla-Central&full=1 https://tbpl.mozilla.org/php/getParsedLog.php?id=32697267&tree=Mozilla-Central&full=1 https://tbpl.mozilla.org/php/getParsedLog.php?id=32699267&tree=Mozilla-Central&full=1 https://tbpl.mozilla.org/php/getParsedLog.php?id=32699354&tree=Mozilla-Central&full=1 https://tbpl.mozilla.org/php/getParsedLog.php?id=32697430&tree=Mozilla-Central&full=1 https://tbpl.mozilla.org/php/getParsedLog.php?id=32697163&tree=Mozilla-Central&full=1 Given the reproducibility, I'm hoping someone can look at this. It's a very frustrating error for the reason given above.
Comment 1•10 years ago
|
||
This is where this error comes from: http://mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/nsXPConnect.cpp#249
Comment 2•10 years ago
|
||
I did not repro on Mac. On Linux, I keep hitting bug 957729. Ryan, can you please get somebody to fix the js engine there?
Depends on: 957729
Comment 3•10 years ago
|
||
Given the workaround in bug 957729 comment 5, I managed to run all of content/base tests under the debugger without hitting this error path. Then I started a full mochitest run and let it run over night and I hit the crash in bug 957729 comment 8. Ryan, do you have any other ideas/hunches/guesses on how I can reproduce this locally?
Comment 5•10 years ago
|
||
OK, no luck in repro'ing on Mac either. I'll file a bug to get a loaner.
Comment 6•10 years ago
|
||
2147746065 == NS_ERROR_NOT_AVAILABLE fwiw
Comment 7•10 years ago
|
||
Status update: I have been trying really hard to get my slave to run tests in a way that I can debug them. I gave up doing that earlier today and am doing my own build on the slave at this point. The build is 90 minutes in and still running. I'm going home now! More next week.
Comment 8•10 years ago
|
||
I finally figured out what's happening here, and why I could never catch this under the debugger. I pushed this patch to try <https://tbpl.mozilla.org/?tree=Try&rev=cabfea40e5d3> which rewrote a bunch of NS_ERROR_NOT_AVAILABLE error codes thrown in httpd.js to dummy error values to see what the error message in the log <https://tbpl.mozilla.org/php/getParsedLog.php?id=34331844&tree=Try&full=1> would look like, and lo and behold: System JS : ERROR (null):0 - uncaught exception: 2152136708 So, this means that these are coming from: 1.44 // 1.45 // see nsIHttpResponse.write 1.46 // 1.47 write: function(data) 1.48 { 1.49 if (this._finished) 1.50 - throw Cr.NS_ERROR_NOT_AVAILABLE; 1.51 + throw Cr.NS_BASE_STREAM_ILLEGAL_ARGS; 1.52 1.53 var dataAsString = String(data); 1.54 this.bodyOutputStream.write(dataAsString, dataAsString.length); 1.55 }, Which means that they are caused by something calling nsIHttpResponse.write after the response is finished. Someone who knows more about that code should investigate. Filed bug 969760 on the XPConnect side of this puzzle.
Component: General → Networking
Summary: Frequent "System JS : ERROR (null):0 - uncaught exception: 2147746065" spammed during desktop mochitest runs → Frequent "System JS : ERROR (null):0 - uncaught exception: 2147746065" spammed during desktop mochitest runs, coming from httpd.js (nsIHttpResponse.write)
Comment 9•10 years ago
|
||
The credit here should go to Karl for comment 6, BTW, which made me think about this idea!
Reporter | ||
Comment 10•10 years ago
|
||
Awesome detective work! Thanks for sticking with this, Ehsan :)
Comment 11•10 years ago
|
||
(In reply to comment #10) > Awesome detective work! Thanks for sticking with this, Ehsan :) No worries! I'm assuming that you will hunt down somebody else to finish the investigation here, BTW. ;-)
Reporter | ||
Comment 12•10 years ago
|
||
Jason, any suggestions for who might be able to help investigate here? :)
Flags: needinfo?(jduell.mcbugs)
Comment 13•10 years ago
|
||
httpd.js is sort of unowned at the moment. Waldo used to do a lot of work here, and might have an idea of what we should do in this case. It looks like some server-side handler code is writing too late in the game, but I'm not sure what could be causing what are either 1) late write() calls or 2) early reply finish.
Flags: needinfo?(jduell.mcbugs) → needinfo?(jwalden+bmo)
Reporter | ||
Comment 14•10 years ago
|
||
Jason, this could be another good bug for the workweek if you're still looking for ideas.
Comment 15•10 years ago
|
||
So I've hunted down the problematic code here, somewhat. What's happening is that we run content/base/test/test_bug338583.html (wish I'd seen comment 0 before I started debugging!) which as part of doTest7 invokes new EventSource("delayedServerEvents.sjs"). delayedServerEvents.sjs sets a 500ms repeating timer to do this: 12 if (strings_to_send.length == 0) { // clean up stuff here 18 } 19 20 resp.write(strings_to_send[0]); 21 strings_to_send = strings_to_send.splice(1, strings_to_send.length - 1); I added some logging to this sjs, and I see us invoking it 3 times while running this test. The first two times we send all the strings and empty out the array. The third time, we send 8 of the 10 strings, and then the write() calls start throwing, which means we never splice out the thing we tried to write, which means that we keep logging this exception message every 500ms. I assume the reason this happens is that the server closes the connection after sending the data. But eventsource then goes and reconnects! So we invoke the SJS again. And again. However many times we manage to in the 8s*stress_factor timer the test uses before it checks only the first three returned packets and calls close() on the EventSource. Is there a way to tell EventSource to not reconnect? If not, we should fix the sjs to either cancel the timer if write fails or set some sort of persistent cookie after the first time it's called to not try sending any more data or something...
Flags: needinfo?(bugs)
Assignee | ||
Comment 16•10 years ago
|
||
I guess server could send back huge retry reconnection time.
Flags: needinfo?(bugs)
Assignee | ||
Comment 17•10 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=c5b30960b443 default reconnection timeout is 5000ms, so this increases that quite a bit.
Comment 18•10 years ago
|
||
Try run says that fix works.
Assignee | ||
Updated•10 years ago
|
Attachment #8407174 -
Flags: review?(ehsan)
Updated•10 years ago
|
Attachment #8407174 -
Flags: review?(ehsan) → review+
Assignee | ||
Updated•10 years ago
|
Keywords: checkin-needed
Reporter | ||
Comment 19•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/f0d5c27b7257
Keywords: checkin-needed
Comment 20•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/f0d5c27b7257
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
Reporter | ||
Comment 21•10 years ago
|
||
Please nominate this for Aurora uplift when you get a chance.
Assignee: nobody → bugs
status-firefox29:
--- → wontfix
status-firefox30:
--- → fixed
status-firefox31:
--- → fixed
status-firefox-esr24:
--- → wontfix
Flags: needinfo?(jwalden+bmo) → needinfo?(bugs)
Assignee | ||
Comment 22•10 years ago
|
||
Comment on attachment 8407174 [details] [diff] [review] possible fix [Approval Request Comment] Bug caused by (feature/regressing bug #): old stuff User impact if declined: this is testing only Testing completed (on m-c, etc.): just landed Risk to taking this patch (and alternatives if risky): testing only String or IDL/UUID changes made by this patch: NA
Attachment #8407174 -
Flags: approval-mozilla-aurora?
Flags: needinfo?(bugs)
Comment 23•10 years ago
|
||
This is NPOTB, why does it require an approval?
Reporter | ||
Comment 24•10 years ago
|
||
Comment on attachment 8407174 [details] [diff] [review] possible fix Because I wasn't paying attention. Sorry for the wasted time, smaug.
Attachment #8407174 -
Flags: approval-mozilla-aurora?
Reporter | ||
Updated•10 years ago
|
Whiteboard: [checkin-needed-aurora]
Reporter | ||
Comment 25•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/6583f34d6abe https://hg.mozilla.org/releases/mozilla-beta/rev/3051963e5ec3 https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/378ac35ff66b https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/6be991728f84 https://hg.mozilla.org/releases/mozilla-esr24/rev/e6a410ed1c2e
status-b2g-v1.2:
--- → fixed
status-b2g-v1.3:
--- → fixed
status-b2g-v1.4:
--- → fixed
status-b2g-v2.0:
--- → fixed
Whiteboard: [checkin-needed-aurora]
Reporter | ||
Updated•10 years ago
|
status-b2g-v1.3T:
--- → fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•