Closed
Bug 654201
Opened 13 years ago
Closed 13 years ago
ASSERTION: Invalid start index: 'count == 0 || start < Length()' browser_bug581076.js browser_searching.js browser_bug618502.js browser_uninstalling.js browser_globalinformations.js
Categories
(Core :: Networking: HTTP, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: philikon, Assigned: mcmanus)
References
(Depends on 1 open bug)
Details
(Keywords: intermittent-failure, Whiteboard: [test which aborts the suite])
Attachments
(1 file, 5 obsolete files)
8.75 KB,
patch
|
mayhemer
:
review+
Biesinger
:
superreview+
|
Details | Diff | Splinter Review |
Comment 1•13 years ago
|
||
Based on the assertions I suspect this is a http issue: ###!!! ASSERTION: Invalid start index: 'count == 0 || start < Length()', file ../../../dist/include/nsTArray.h, line 832 nsTArray<nsHttpTransaction*,nsTArrayDefaultAllocator>::RemoveElementsAt [nsTArray.h:833] nsTArray<nsHttpTransaction*,nsTArrayDefaultAllocator>::RemoveElementAt [nsTArray.h:840] nsHttpConnectionMgr::ProcessPendingQForEntry [netwerk/protocol/http/nsHttpConnectionMgr.cpp:588] nsHttpConnectionMgr::OnMsgProcessPendingQ [netwerk/protocol/http/nsHttpConnectionMgr.cpp:1024] nsHttpConnectionMgr::OnMsgReclaimConnection [netwerk/protocol/http/nsHttpConnectionMgr.cpp:1110] nsHttpConnectionMgr::nsHalfOpenSocket::OnOutputStreamReady [netwerk/protocol/http/nsHttpConnectionMgr.cpp:1505] nsSocketOutputStream::OnSocketReady [netwerk/base/src/nsSocketTransport2.cpp:515] nsSocketTransport::OnSocketReady [netwerk/base/src/nsSocketTransport2.cpp:1530] nsSocketTransportService::DoPollIteration [netwerk/base/src/nsSocketTransportService2.cpp:740] nsSocketTransportService::OnProcessNextEvent [netwerk/base/src/nsSocketTransportService2.cpp:606] nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:584] NS_ProcessNextEvent_P [obj-firefox/xpcom/build/nsThreadUtils.cpp:250] nsSocketTransportService::Run [netwerk/base/src/nsSocketTransportService2.cpp:637] nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:618] NS_ProcessNextEvent_P [obj-firefox/xpcom/build/nsThreadUtils.cpp:250] nsThread::ThreadFunc [xpcom/threads/nsThread.cpp:272] _pt_root [nsprpub/pr/src/pthreads/ptthread.c:190] libSystem.B.dylib + 0x32155 libSystem.B.dylib + 0x32012 ###!!! ASSERTION: Invalid length: 'start + count <= Length()', file ../../../dist/include/nsTArray.h, line 833 nsTArray<nsHttpTransaction*,nsTArrayDefaultAllocator>::RemoveElementsAt [nsTArray.h:834] nsTArray<nsHttpTransaction*,nsTArrayDefaultAllocator>::RemoveElementAt [nsTArray.h:840] nsHttpConnectionMgr::ProcessPendingQForEntry [netwerk/protocol/http/nsHttpConnectionMgr.cpp:588] nsHttpConnectionMgr::OnMsgProcessPendingQ [netwerk/protocol/http/nsHttpConnectionMgr.cpp:1024] nsHttpConnectionMgr::OnMsgReclaimConnection [netwerk/protocol/http/nsHttpConnectionMgr.cpp:1110] nsHttpConnectionMgr::nsHalfOpenSocket::OnOutputStreamReady [netwerk/protocol/http/nsHttpConnectionMgr.cpp:1505] nsSocketOutputStream::OnSocketReady [netwerk/base/src/nsSocketTransport2.cpp:515] nsSocketTransport::OnSocketReady [netwerk/base/src/nsSocketTransport2.cpp:1530] nsSocketTransportService::DoPollIteration [netwerk/base/src/nsSocketTransportService2.cpp:740] nsSocketTransportService::OnProcessNextEvent [netwerk/base/src/nsSocketTransportService2.cpp:606] nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:584] NS_ProcessNextEvent_P [obj-firefox/xpcom/build/nsThreadUtils.cpp:250] nsSocketTransportService::Run [netwerk/base/src/nsSocketTransportService2.cpp:637] nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:618] NS_ProcessNextEvent_P [obj-firefox/xpcom/build/nsThreadUtils.cpp:250] nsThread::ThreadFunc [xpcom/threads/nsThread.cpp:272] _pt_root [nsprpub/pr/src/pthreads/ptthread.c:190] libSystem.B.dylib + 0x32155 libSystem.B.dylib + 0x32012 ###!!! ASSERTION: Release()ing something that hasn't been Acquire()ed: 'chainFront && CallStack::kNone != mDDEntry->mAcquisitionContext', file /builds/slave/services-cen-osx-dbg/build/obj-firefox/xpcom/build/BlockingResourceBase.cpp, line 193 mozilla::BlockingResourceBase::Release [obj-firefox/xpcom/build/BlockingResourceBase.cpp:195] mozilla::Mutex::Unlock [obj-firefox/xpcom/build/BlockingResourceBase.cpp:269] mozilla::Monitor::Unlock [Monitor.h:76] mozilla::MonitorAutoUnlock::MonitorAutoUnlock [Monitor.h:172] TimerThread::Run [xpcom/threads/TimerThread.cpp:290] nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:618] NS_ProcessNextEvent_P [obj-firefox/xpcom/build/nsThreadUtils.cpp:250] nsThread::ThreadFunc [xpcom/threads/nsThread.cpp:272] _pt_root [nsprpub/pr/src/pthreads/ptthread.c:190] libSystem.B.dylib + 0x32155 libSystem.B.dylib + 0x32012 WARNING: Resource acquired at calling context : file /builds/slave/services-cen-osx-dbg/build/obj-firefox/xpcom/build/BlockingResourceBase.cpp, line 200 mozilla::Mutex::Lock [obj-firefox/xpcom/build/BlockingResourceBase.cpp:258] mozilla::Monitor::Lock [Monitor.h:71] mozilla::MonitorAutoUnlock::~MonitorAutoUnlock [Monitor.h:177] TimerThread::Run [xpcom/threads/TimerThread.cpp:320] nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:618] NS_ProcessNextEvent_P [obj-firefox/xpcom/build/nsThreadUtils.cpp:250] nsThread::ThreadFunc [xpcom/threads/nsThread.cpp:272] _pt_root [nsprpub/pr/src/pthreads/ptthread.c:190] libSystem.B.dylib + 0x32155 libSystem.B.dylib + 0x32012 WARNING: is being released in non-LIFO order; why?: file /builds/slave/services-cen-osx-dbg/build/obj-firefox/xpcom/build/BlockingResourceBase.cpp, line 202
Comment 2•13 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1304369910.1304373507.6518.gz&fulltext=1 Rev3 MacOSX Leopard 10.5.8 tracemonkey debug test mochitest-other on 2011/05/02 13:58:30 TEST-PASS | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_searching.js | Should be displaying the correct UI TEST-INFO | before wait for focus -- loaded: uninitialized active window: ([object ChromeWindow @ 0x5bf3c90 (native @ 0x5bf0838)]) chrome://browser/content/browser.xul focused window: ([object Window @ 0x465acbb0 (native @ 0x469a6038)]) about:addons desired window: ([object Window @ 0x465acbb0 (native @ 0x469a6038)]) about:addons child window: ([object Window @ 0x465acbb0 (native @ 0x469a6038)]) about:addons docshell visible: true TEST-INFO | must wait for load TEST-INFO | already focused TEST-INFO | maybe run tests <load:false, focus:true> -- loaded: uninitialized active window: ([object ChromeWindow @ 0x5bf3c90 (native @ 0x5bf0838)]) chrome://browser/content/browser.xul focused window: ([object Window @ 0x465acbb0 (native @ 0x469a6038)]) about:addons desired window: ([object Window @ 0x465acbb0 (native @ 0x469a6038)]) about:addons child window: ([object Window @ 0x465acbb0 (native @ 0x469a6038)]) about:addons docshell visible: true WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/tm-osx-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1886 ++DOMWINDOW == 404 (0x46942888) [serial = 6793] [outer = 0x466bf2e0] TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_searching.js | Console message: [JavaScript Warning: "CSP: Directive "script-src https://services.addons.mozilla.org https://static.addons.mozilla.net https://www.google.com https://www.paypalobjects.com" violated by https://addons.mozilla.org/en-US/jsi18n//build:7d7e85b"] TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_searching.js | Console message: [JavaScript Warning: "CSP: Directive "script-src https://services.addons.mozilla.org https://static.addons.mozilla.net https://www.google.com https://www.paypalobjects.com" violated by https://addons.mozilla.org/en-US/firefox/addons/buttons.js/build:7d7e85b"] TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_searching.js | Console message: [JavaScript Warning: "CSP: Directive "script-src https://services.addons.mozilla.org https://static.addons.mozilla.net https://www.google.com https://www.paypalobjects.com" violated by https://addons.mozilla.org/en-US/jsi18n//build:7d7e85b"] TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_searching.js | Console message: [JavaScript Warning: "Error in parsing value for 'max-width'. Declaration dropped." {file: "https://static.addons.mozilla.net/media/css/zamboni/discovery-pane-min.css?build=7d7e85b" line: 1}] TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_searching.js | Console message: [JavaScript Warning: "Unknown property 'zoom'. Declaration dropped." {file: "https://static.addons.mozilla.net/media/css/zamboni/discovery-pane-min.css?build=7d7e85b" line: 1}] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80530008: file /builds/slave/tm-osx-dbg/build/dom/src/storage/nsDOMStorage.cpp, line 1280 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80530008: file /builds/slave/tm-osx-dbg/build/dom/src/storage/nsDOMStorage.cpp, line 1280 TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_searching.js | Console message: [JavaScript Warning: "CSP: Directive "script-src https://services.addons.mozilla.org https://static.addons.mozilla.net https://www.google.com https://www.paypalobjects.com" violated by https://addons.mozilla.org/en-US/firefox/addons/buttons.js/build:7d7e85b"] ###!!! ASSERTION: Invalid start index: 'count == 0 || start < Length()', file ../../../dist/include/nsTArray.h, line 832 nsTArray<nsHttpTransaction*,nsTArrayDefaultAllocator>::RemoveElementsAt [nsTArray.h:833] nsTArray<nsHttpTransaction*,nsTArrayDefaultAllocator>::RemoveElementAt [nsTArray.h:840] nsHttpConnectionMgr::ProcessPendingQForEntry [netwerk/protocol/http/nsHttpConnectionMgr.cpp:588] nsHttpConnectionMgr::OnMsgProcessPendingQ [netwerk/protocol/http/nsHttpConnectionMgr.cpp:1024] nsHttpConnectionMgr::OnMsgReclaimConnection [netwerk/protocol/http/nsHttpConnectionMgr.cpp:1110] nsHttpConnectionMgr::nsHalfOpenSocket::OnOutputStreamReady [netwerk/protocol/http/nsHttpConnectionMgr.cpp:1505] nsSocketOutputStream::OnSocketReady [netwerk/base/src/nsSocketTransport2.cpp:515] nsSocketTransport::OnSocketReady [netwerk/base/src/nsSocketTransport2.cpp:1530] nsSocketTransportService::DoPollIteration [netwerk/base/src/nsSocketTransportService2.cpp:740] nsSocketTransportService::OnProcessNextEvent [netwerk/base/src/nsSocketTransportService2.cpp:606] nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:584] NS_ProcessNextEvent_P [obj-firefox/xpcom/build/nsThreadUtils.cpp:250] nsSocketTransportService::Run [netwerk/base/src/nsSocketTransportService2.cpp:637] nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:618] NS_ProcessNextEvent_P [obj-firefox/xpcom/build/nsThreadUtils.cpp:250] nsThread::ThreadFunc [xpcom/threads/nsThread.cpp:272] _pt_root [nsprpub/pr/src/pthreads/ptthread.c:190] libSystem.B.dylib + 0x32155 libSystem.B.dylib + 0x32012 ###!!! ASSERTION: Invalid length: 'start + count <= Length()', file ../../../dist/include/nsTArray.h, line 833 nsTArray<nsHttpTransaction*,nsTArrayDefaultAllocator>::RemoveElementsAt [nsTArray.h:834] nsTArray<nsHttpTransaction*,nsTArrayDefaultAllocator>::RemoveElementAt [nsTArray.h:840] nsHttpConnectionMgr::ProcessPendingQForEntry [netwerk/protocol/http/nsHttpConnectionMgr.cpp:588] nsHttpConnectionMgr::OnMsgProcessPendingQ [netwerk/protocol/http/nsHttpConnectionMgr.cpp:1024] nsHttpConnectionMgr::OnMsgReclaimConnection [netwerk/protocol/http/nsHttpConnectionMgr.cpp:1110] nsHttpConnectionMgr::nsHalfOpenSocket::OnOutputStreamReady [netwerk/protocol/http/nsHttpConnectionMgr.cpp:1505] nsSocketOutputStream::OnSocketReady [netwerk/base/src/nsSocketTransport2.cpp:515] nsSocketTransport::OnSocketReady [netwerk/base/src/nsSocketTransport2.cpp:1530] nsSocketTransportService::DoPollIteration [netwerk/base/src/nsSocketTransportService2.cpp:740] nsSocketTransportService::OnProcessNextEvent [netwerk/base/src/nsSocketTransportService2.cpp:606] nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:584] NS_ProcessNextEvent_P [obj-firefox/xpcom/build/nsThreadUtils.cpp:250] nsSocketTransportService::Run [netwerk/base/src/nsSocketTransportService2.cpp:637] nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:618] NS_ProcessNextEvent_P [obj-firefox/xpcom/build/nsThreadUtils.cpp:250] nsThread::ThreadFunc [xpcom/threads/nsThread.cpp:272] _pt_root [nsprpub/pr/src/pthreads/ptthread.c:190] libSystem.B.dylib + 0x32155 libSystem.B.dylib + 0x32012 TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_searching.js | Exited with code 1 during test run INFO | automation.py | Application ran for: 0:40:05.018135 INFO | automation.py | Reading PID log: /var/folders/Xr/Xr--yJnSEY0U11ET5NZuMU+++TM/-Tmp-/tmpZbm47Bpidlog PROCESS-CRASH | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_searching.js | application crashed (minidump found) Operating system: Mac OS X 10.5.8 9L31a CPU: x86 GenuineIntel family 6 model 23 stepping 10 2 CPUs Crash reason: EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE Crash address: 0x2b Thread 0 (crashed) 0 XUL!imgFrame::UnlockImageData [imgFrame.cpp:b67b035200ef : 691 + 0x3] eip = 0x02ac0e34 esp = 0xbfffc1a0 ebp = 0xbfffc1d8 ebx = 0x02ac0e30 esi = 0x00000000 edi = 0x00000000 eax = 0xffffffff ecx = 0x00000000 edx = 0x049da4d8 efl = 0x00210286 Found by: given as instruction pointer in context 1 XUL!mozilla::imagelib::RasterImage::InternalAddFrame [RasterImage.cpp:b67b035200ef : 826 + 0xa] eip = 0x02ab4605 esp = 0xbfffc1e0 ebp = 0xbfffc258 ebx = 0x02ab438f esi = 0x00000000 edi = 0x00000000 Found by: call frame info 2 XUL!mozilla::imagelib::RasterImage::AppendFrame [RasterImage.cpp:b67b035200ef : 880 + 0x67] eip = 0x02ab4dcb esp = 0xbfffc260 ebp = 0xbfffc2a8 ebx = 0x02ab4cb6 esi = 0x4679dd48 edi = 0x00000000 Found by: call frame info 3 XUL!mozilla::imagelib::nsPNGDecoder::CreateFrame [nsPNGDecoder.cpp:b67b035200ef : 120 + 0x51] eip = 0x02adeed5 esp = 0xbfffc2b0 ebp = 0xbfffc308 ebx = 0x02adee82 esi = 0x4679dd48 edi = 0x00000000 Found by: call frame info 4 XUL!mozilla::imagelib::nsPNGDecoder::info_callback [nsPNGDecoder.cpp:b67b035200ef : 633 + 0x34] eip = 0x02adf50d esp = 0xbfffc310 ebp = 0xbfffc3c8 ebx = 0x02adf08f esi = 0x4679e000 edi = 0x0004a252 Found by: call frame info 5 XUL!MOZ_PNG_push_have_info [pngpread.c:b67b035200ef : 1900 + 0x17] eip = 0x040fe23c esp = 0xbfffc3d0 ebp = 0xbfffc3e8 ebx = 0x040fc0d7 esi = 0x4679e000 edi = 0x0004a252 Found by: call frame info 6 XUL!MOZ_PNG_push_read_chunk [pngpread.c:b67b035200ef : 435 + 0x11] eip = 0x040fc7a7 esp = 0xbfffc3f0 ebp = 0xbfffc428 ebx = 0x040fc0d7 esi = 0x4679e000 edi = 0x0004a252 Found by: call frame info 7 XUL!MOZ_PNG_proc_some_data [pngpread.c:b67b035200ef : 64 + 0x11] eip = 0x040fbfac esp = 0xbfffc430 ebp = 0xbfffc458 ebx = 0x02ade761 esi = 0x4679e000 edi = 0x0004a252 Found by: call frame info 8 XUL!MOZ_PNG_process_data [pngpread.c:b67b035200ef : 41 + 0x11] eip = 0x040fbf3d esp = 0xbfffc460 ebp = 0xbfffc478 ebx = 0x02ade761 esi = 0x4679e000 edi = 0x0004a252 Found by: call frame info 9 XUL!mozilla::imagelib::nsPNGDecoder::WriteInternal [nsPNGDecoder.cpp:b67b035200ef : 354 + 0x25] eip = 0x02ade9e1 esp = 0xbfffc480 ebp = 0xbfffc4b8 ebx = 0x02ade761 esi = 0x4679e000 edi = 0x0004a252 Found by: call frame info 10 XUL!mozilla::imagelib::Decoder::Write [Decoder.cpp:b67b035200ef : 103 + 0x1f] eip = 0x02aada33 esp = 0xbfffc4c0 ebp = 0xbfffc4e8 ebx = 0x02aad9c2 esi = 0x00000000 edi = 0x0004a252 Found by: call frame info 11 XUL!mozilla::imagelib::RasterImage::WriteToDecoder [RasterImage.cpp:b67b035200ef : 2264 + 0x25] eip = 0x02ab0a1c esp = 0xbfffc4f0 ebp = 0xbfffc538 ebx = 0x02ab0952 esi = 0x00000000 edi = 0x0004a252 Found by: call frame info 12 XUL!mozilla::imagelib::RasterImage::DecodeSomeData [RasterImage.cpp:b67b035200ef : 2567 + 0x30] eip = 0x02ab0c16 esp = 0xbfffc540 ebp = 0xbfffc588 ebx = 0x02ab0b1d esi = 0x00000000 edi = 0x0004a252 Found by: call frame info 13 XUL!mozilla::imagelib::imgDecodeWorker::Run [RasterImage.cpp:b67b035200ef : 2681 + 0x11] eip = 0x02ab0dcf esp = 0xbfffc590 ebp = 0xbfffc618 ebx = 0x02ab0c38 esi = 0x163db16e edi = 0x0004a252 Found by: call frame info 14 XUL!mozilla::imagelib::RasterImage::AddSourceData [RasterImage.cpp:b67b035200ef : 1259 + 0x1d] eip = 0x02ab15dc esp = 0xbfffc620 ebp = 0xbfffc698 ebx = 0x02ab12fe esi = 0x3d0d9004 edi = 0x469efbac Found by: call frame info 15 XUL!mozilla::imagelib::RasterImage::WriteToRasterImage [RasterImage.cpp:b67b035200ef : 2758 + 0x18] eip = 0x02ab175b esp = 0xbfffc6a0 ebp = 0xbfffc6c8 ebx = 0x03ee6810 esi = 0x3d0d9004 edi = 0x469efbac Found by: call frame info 16 XUL!nsPipeInputStream::ReadSegments [nsPipe3.cpp:b67b035200ef : 798 + 0x2f] eip = 0x03ee6951 esp = 0xbfffc6d0 ebp = 0xbfffc728 ebx = 0x03ee6810 esi = 0x3d0d9004 edi = 0x469efbac Found by: call frame info 17 XUL!imgRequest::OnDataAvailable [imgRequest.cpp:b67b035200ef : 1152 + 0x42] eip = 0x02ad5424 esp = 0xbfffc730 ebp = 0xbfffca68 ebx = 0x02ad45db esi = 0x03ee6802 edi = 0x02999f56 Found by: call frame info 18 XUL!ProxyListener::OnDataAvailable [imgLoader.cpp:b67b035200ef : 2020 + 0x3e] eip = 0x02ac3b85 esp = 0xbfffca70 ebp = 0xbfffcaa8 ebx = 0x0299ed9a esi = 0x00000000 edi = 0x469c45d0 Found by: call frame info 19 XUL!nsHttpChannel::OnDataAvailable [nsHttpChannel.cpp:b67b035200ef : 4198 + 0x60] eip = 0x0299f0bb esp = 0xbfffcab0 ebp = 0xbfffcb28 ebx = 0x0299ed9a esi = 0x00000000 edi = 0x469c45d0 Found by: call frame info
Component: Add-ons Manager → Networking: HTTP
Product: Toolkit → Core
QA Contact: add-ons.manager → networking.http
Summary: Test failure in chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser-window/browser_bug581076.js | application timed out → ASSERTION: Invalid start index: 'count == 0 || start < Length()' browser_bug581076.js browser_searching.js
Whiteboard: [orange] → [orange][test which aborts the suite]
Version: unspecified → Trunk
Assignee | ||
Comment 3•13 years ago
|
||
I'm not familiar with either the services-central or tracemonkey branches, but from the stack I suspect this is a dup of 653530 .. did these branches contain changeset 9b12c34434f4 ? (the fix for that bug)
Comment 4•13 years ago
|
||
The services-central instance was on a merge from mozilla-central up to http://hg.mozilla.org/mozilla-central/rev/4c616579cdfa, and the tracemonkey instance was on a merge from mozilla-central up to http://hg.mozilla.org/mozilla-central/rev/bf1db7f29686, so m-c as of 04:33 and 10:44 this morning, both well after 9b12c34434f4.
Assignee | ||
Comment 5•13 years ago
|
||
i see what's wrong. grrr.
Assignee | ||
Comment 6•13 years ago
|
||
(In reply to comment #5) > i see what's wrong. nope. It was me that was wrong.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 8•13 years ago
|
||
I added some debug and started running mochitest-other on osx-32-debug on try and eventually came up with a stack trace that shows the culprit. (below). Basically the processpendingQ code is testing the idle persistent connections to make sure they are still alive, looking for one to dispatch the pending entry on. It appears that touching an SSL idle persistent connection with IsAlive() is enough to make it synchronously start the socket thread event loop again (with IsAlive() on the stack). Any code that isn't re-entrant with respect to IsAlive() is going to have a problem. In this case mPendingQ is being changed (possibly the same transaction dispatched twice, possibly a bounds exception).. I can also see a problem with the idle connection list - it is being iterated in processPendingQ, but the second event loop could certainly modify that as this stack shows. IsAlive() basically does a non-blocking network read to see if there is an EOF marker there. 0 libmozalloc.dylib!TouchBadMemory [mozalloc_abort.cpp:273c0fc2f4be : 64 + 0x5] eip = 0x00015f43 esp = 0xb02875d0 ebp = 0xb02875d8 ebx = 0x00015f66 esi = 0x00419fb0 edi = 0x42c9a570 eax = 0x00000000 ecx = 0x00015f3d edx = 0x00000000 efl = 0x00010282 Found by: given as instruction pointer in context 1 libmozalloc.dylib!mozalloc_abort [mozalloc_abort.cpp:273c0fc2f4be : 85 + 0x4] eip = 0x00015f9f esp = 0xb02875e0 ebp = 0xb02875f8 ebx = 0x00015f66 esi = 0x00419fb0 edi = 0x42c9a570 Found by: call frame info 2 XUL!Abort [nsDebugImpl.cpp:273c0fc2f4be : 388 + 0xa] eip = 0x03f27c15 esp = 0xb0287600 ebp = 0xb0287618 ebx = 0x03f27f17 esi = 0x00419fb0 edi = 0x42c9a570 Found by: call frame info 3 XUL!NS_DebugBreak_P [nsDebugImpl.cpp:273c0fc2f4be : 345 + 0xd] eip = 0x03f281be esp = 0xb0287620 ebp = 0xb0287a48 ebx = 0x03f27f17 esi = 0x00419fb0 edi = 0x42c9a570 Found by: call frame info 4 XUL!assertPending [nsHttpConnectionMgr.cpp:273c0fc2f4be : 341 + 0x3d] eip = 0x0297bd5d esp = 0xb0287a50 ebp = 0xb0287a78 ebx = 0x0297bd1e esi = 0x00419fb0 edi = 0x42c9a570 Found by: call frame info 5 XUL!nsHttpConnectionMgr::nsHalfOpenSocket::OnOutputStreamReady [nsHttpConnectionMgr.cpp:273c0fc2f4be : 1525 + 0x4] eip = 0x02980ce8 esp = 0xb0287a80 ebp = 0xb0287b38 ebx = 0x02980713 esi = 0x00419fb0 edi = 0x42c9a570 Found by: call frame info 6 XUL!nsSocketOutputStream::OnSocketReady [nsSocketTransport2.cpp:273c0fc2f4be : 515 + 0x1f] eip = 0x028de8bb esp = 0xb0287b40 ebp = 0xb0287b78 ebx = 0x028de754 esi = 0x028dec50 edi = 0x03f177f6 Found by: call frame info 7 XUL!nsSocketTransport::OnSocketReady [nsSocketTransport2.cpp:273c0fc2f4be : 1527 + 0x17] eip = 0x028ded20 esp = 0xb0287b80 ebp = 0xb0287bc8 ebx = 0x028dec5c esi = 0x028dec50 edi = 0x03f177f6 Found by: call frame info 8 XUL!nsSocketTransportService::DoPollIteration [nsSocketTransportService2.cpp:273c0fc2f4be : 742 + 0x2e] eip = 0x028e66d8 esp = 0xb0287bd0 ebp = 0xb0287c38 ebx = 0x028e62f6 esi = 0x028dec50 edi = 0x03f177f6 Found by: call frame info 9 XUL!nsSocketTransportService::OnProcessNextEvent [nsSocketTransportService2.cpp:273c0fc2f4be : 603 + 0x12] eip = 0x028e6934 esp = 0xb0287c40 ebp = 0xb0287c78 ebx = 0x03f167a8 esi = 0x0043ef48 edi = 0x03f177f6 Found by: call frame info 10 XUL!nsThread::ProcessNextEvent [nsThread.cpp:273c0fc2f4be : 582 + 0x61] eip = 0x03f1693b esp = 0xb0287c80 ebp = 0xb0287d28 ebx = 0x03f167a8 esi = 0x0043ef48 edi = 0x03f177f6 Found by: call frame info 11 XUL!NS_ProcessNextEvent_P [nsThreadUtils.cpp:273c0fc2f4be : 250 + 0x1f] eip = 0x03e9e50a esp = 0xb0287d30 ebp = 0xb0287d78 ebx = 0x03e9e494 esi = 0x00419fb0 edi = 0x03f177f6 Found by: call frame info 12 XUL!nsProxyEventObject::CallMethod [nsProxyEventObject.cpp:273c0fc2f4be : 260 + 0x12] eip = 0x03f24266 esp = 0xb0287d80 ebp = 0xb0287e08 ebx = 0x03f23ec4 esi = 0x00419fb0 edi = 0x03f177f6 Found by: call frame info 13 XUL!PrepareAndDispatch [xptcstubs_unixish_x86.cpp:273c0fc2f4be : 93 + 0x31] eip = 0x03f3e2a8 esp = 0xb0287e10 ebp = 0xb0287f18 ebx = 0x03f3e03f esi = 0x00000010 edi = 0x03f23eb6 Found by: call frame info 14 XUL!nsXPTCStubBase::Stub16 [xptcstubsdef.inc : 14 + 0x25] eip = 0x03f39215 esp = 0xb0287f20 ebp = 0xb0287f68 ebx = 0x03f391ee esi = 0x3f617430 edi = 0xb0287f30 Found by: call frame info 15 XUL!nsNSSSocketInfo::EnsureDocShellDependentStuffKnown [nsNSSIOLayer.cpp:273c0fc2f4be : 398 + 0x3c] eip = 0x039febdb esp = 0xb0287f70 ebp = 0xb0288018 ebx = 0x039fe9e9 esi = 0x3f617430 edi = 0x03f391e0 Found by: call frame info 16 XUL!nsNSSSocketInfo::GetPreviousCert [nsNSSIOLayer.cpp:273c0fc2f4be : 847 + 0xa] eip = 0x039fef6a esp = 0xb0288020 ebp = 0xb0288068 ebx = 0x039fef14 esi = 0x00000000 edi = 0x00000001 Found by: call frame info 17 XUL!HandshakeCallback [nsNSSCallbacks.cpp:273c0fc2f4be : 919 + 0x31] eip = 0x039e99b5 esp = 0xb0288070 ebp = 0xb02883b8 ebx = 0x039e93e9 esi = 0x00000000 edi = 0x00000001 Found by: call frame info 18 libssl3.dylib!ssl3_HandleFinished [ssl3con.c : 8503 + 0x1f] eip = 0x001f1b72 esp = 0xb02883c0 ebp = 0xb0288418 ebx = 0x001f139c esi = 0x00000000 edi = 0x00000001 Found by: call frame info 19 libssl3.dylib!ssl3_HandleHandshakeMessage [ssl3con.c : 8659 + 0x1f] eip = 0x001f22e2 esp = 0xb0288420 ebp = 0xb0288498 ebx = 0x001f1b91 esi = 0x00000301 edi = 0x00000001 Found by: call frame info 20 libssl3.dylib!ssl3_HandleHandshake [ssl3con.c : 8727 + 0x20] eip = 0x001f250e esp = 0xb02884a0 ebp = 0xb02884d8 ebx = 0x001f2332 esi = 0x00000301 edi = 0x00000001 Found by: call frame info 21 libssl3.dylib!ssl3_HandleRecord [ssl3con.c : 9066 + 0x11] eip = 0x001f3075 esp = 0xb02884e0 ebp = 0xb02885d8 ebx = 0x001f2784 esi = 0x00000301 edi = 0x00000001 Found by: call frame info 22 libssl3.dylib!ssl3_GatherCompleteHandshake [ssl3gthr.c : 209 + 0x1d] eip = 0x001f41ca esp = 0xb02885e0 ebp = 0xb0288628 ebx = 0x001f40ed esi = 0x42cfa9f0 edi = 0x42c9b6c0 Found by: call frame info 23 libssl3.dylib!ssl_GatherRecord1stHandshake [sslcon.c : 1258 + 0x12] eip = 0x001f6e71 esp = 0xb0288630 ebp = 0xb0288678 ebx = 0x001f6dec esi = 0x42cfa9f0 edi = 0x42c9b6c0 Found by: call frame info 24 libssl3.dylib!ssl_Do1stHandshake [sslsecur.c : 151 + 0x10] eip = 0x0020247c esp = 0xb0288680 ebp = 0xb02886a8 ebx = 0x00202237 esi = 0x42cfa9f0 edi = 0x42c9b6c0 Found by: call frame info 25 libssl3.dylib!ssl_SecureRecv [sslsecur.c : 1141 + 0xa] eip = 0x0020436f esp = 0xb02886b0 ebp = 0xb0288708 ebx = 0x002041ee esi = 0x42cfa9f0 edi = 0x42c9b6c0 Found by: call frame info 26 libssl3.dylib!ssl_Recv [sslsock.c : 1591 + 0x25] eip = 0x0020bd8b esp = 0xb0288710 ebp = 0xb0288748 ebx = 0x0020bcd4 esi = 0x42cfa9f0 edi = 0x42c9b6c0 Found by: call frame info 27 XUL!nsSSLThread::requestRecvMsgPeek [nsSSLThread.cpp:273c0fc2f4be : 208 + 0x2b] eip = 0x039e1597 esp = 0xb0288750 ebp = 0xb02887a8 ebx = 0x039e133a esi = 0x42cfa9f0 edi = 0x42c9b6c0 Found by: call frame info 28 XUL!PSMRecv [nsNSSIOLayer.cpp:273c0fc2f4be : 2180 + 0x26] eip = 0x039fcb94 esp = 0xb02887b0 ebp = 0xb02887f8 ebx = 0x039fcaea esi = 0x42cfa9f0 edi = 0x42c9b6c0 Found by: call frame info 29 libnspr4.dylib!PR_Recv [priometh.c:273c0fc2f4be : 220 + 0x2b] eip = 0x000484ac esp = 0xb0288800 ebp = 0xb0288828 ebx = 0x02979d0d esi = 0x42cfa9f0 edi = 0x42c9b6c0 Found by: call frame info 30 XUL!nsSocketTransport::IsAlive [nsSocketTransport2.cpp:273c0fc2f4be : 1852 + 0x29] eip = 0x028df1ae esp = 0xb0288830 ebp = 0xb0288898 ebx = 0x02979d0d esi = 0x42cfa9f0 edi = 0x42c9b6c0 Found by: call frame info 31 XUL!nsHttpConnection::IsAlive [nsHttpConnection.cpp:273c0fc2f4be : 255 + 0x22] eip = 0x02979049 esp = 0xb02888a0 ebp = 0xb02888d8 ebx = 0x02979d0d esi = 0x42cfa9f0 edi = 0x42c9b6c0 Found by: call frame info 32 XUL!nsHttpConnection::CanReuse [nsHttpConnection.cpp:273c0fc2f4be : 220 + 0x3b] eip = 0x02979d4a esp = 0xb02888e0 ebp = 0xb0288918 ebx = 0x02979d0d esi = 0x42cfa9f0 edi = 0x42c9b6c0 Found by: call frame info 33 XUL!nsHttpConnectionMgr::GetConnection [nsHttpConnectionMgr.cpp:273c0fc2f4be : 716 + 0xa] eip = 0x0297f377 esp = 0xb0288920 ebp = 0xb0288958 ebx = 0x0297f2d7 esi = 0x42cfa9f0 edi = 0x42c9b6c0 Found by: call frame info 34 XUL!nsHttpConnectionMgr::ProcessPendingQForEntry [nsHttpConnectionMgr.cpp:273c0fc2f4be : 604 + 0x26] eip = 0x0297fb68 esp = 0xb0288960 ebp = 0xb02889b8 ebx = 0x0297fa2b esi = 0x42cfa9f0 edi = 0x42c9b6c0 Found by: call frame info 35 XUL!nsHttpConnectionMgr::OnMsgProcessPendingQ [nsHttpConnectionMgr.cpp:273c0fc2f4be : 1057 + 0x17] eip = 0x02980396 esp = 0xb02889c0 ebp = 0xb0288a08 ebx = 0x0298030e esi = 0x42cfa9f0 edi = 0x42c9b6c0 Found by: call frame info 36 XUL!nsHttpConnectionMgr::OnMsgReclaimConnection [nsHttpConnectionMgr.cpp:273c0fc2f4be : 1142 + 0x19] eip = 0x029806d4 esp = 0xb0288a10 ebp = 0xb0288a98 ebx = 0x029803fd esi = 0x42cfa9f0 edi = 0x42c9b6c0 Found by: call frame info 37 XUL!nsHttpConnectionMgr::nsHalfOpenSocket::OnOutputStreamReady [nsHttpConnectionMgr.cpp:273c0fc2f4be : 1552 + 0x32] eip = 0x02980ecd esp = 0xb0288aa0 ebp = 0xb0288b58 ebx = 0x02980713 esi = 0x42cfa9f0 edi = 0x42c9b6c0 Found by: call frame info 38 XUL!nsSocketOutputStream::OnSocketReady [nsSocketTransport2.cpp:273c0fc2f4be : 515 + 0x1f] eip = 0x028de8bb esp = 0xb0288b60 ebp = 0xb0288b98 ebx = 0x028de754 esi = 0x028dec50 edi = 0x00000000 Found by: call frame info 39 XUL!nsSocketTransport::OnSocketReady [nsSocketTransport2.cpp:273c0fc2f4be : 1527 + 0x17] eip = 0x028ded20 esp = 0xb0288ba0 ebp = 0xb0288be8 ebx = 0x028dec5c esi = 0x028dec50 edi = 0x00000000 Found by: call frame info 40 XUL!nsSocketTransportService::DoPollIteration [nsSocketTransportService2.cpp:273c0fc2f4be : 742 + 0x2e] eip = 0x028e66d8 esp = 0xb0288bf0 ebp = 0xb0288c58 ebx = 0x028e62f6 esi = 0x028dec50 edi = 0x00000000 Found by: call frame info 41 XUL!nsSocketTransportService::OnProcessNextEvent [nsSocketTransportService2.cpp:273c0fc2f4be : 607 + 0x12] eip = 0x028e6949 esp = 0xb0288c60 ebp = 0xb0288c98 ebx = 0x03f167a8 esi = 0x0043ef48 edi = 0x00000000 Found by: call frame info 42 XUL!nsThread::ProcessNextEvent [nsThread.cpp:273c0fc2f4be : 582 + 0x61] eip = 0x03f1693b esp = 0xb0288ca0 ebp = 0xb0288d48 ebx = 0x03f167a8 esi = 0x0043ef48 edi = 0x00000000 Found by: call frame info 43 XUL!NS_ProcessNextEvent_P [nsThreadUtils.cpp:273c0fc2f4be : 250 + 0x1f] eip = 0x03e9e50a esp = 0xb0288d50 ebp = 0xb0288d98 ebx = 0x03e9e494 esi = 0x00000000 edi = 0x00000000 Found by: call frame info 44 XUL!nsSocketTransportService::Run [nsSocketTransportService2.cpp:273c0fc2f4be : 649 + 0x12] eip = 0x028e6138 esp = 0xb0288da0 ebp = 0xb0288de8 ebx = 0x028e6019 esi = 0x00000000 edi = 0x00000000 Found by: call frame info 45 XUL!nsThread::ProcessNextEvent [nsThread.cpp:273c0fc2f4be : 618 + 0x18] eip = 0x03f16b26 esp = 0xb0288df0 ebp = 0xb0288e98 ebx = 0x03f167a8 esi = 0x00000000 edi = 0x00000000 Found by: call frame info 46 XUL!NS_ProcessNextEvent_P [nsThreadUtils.cpp:273c0fc2f4be : 250 + 0x1f] eip = 0x03e9e50a esp = 0xb0288ea0 ebp = 0xb0288ee8 ebx = 0x03e9e494 esi = 0x00000000 edi = 0x00000000 Found by: call frame info 47 XUL!nsThread::ThreadFunc [nsThread.cpp:273c0fc2f4be : 273 + 0x12] eip = 0x03f174f9 esp = 0xb0288ef0 ebp = 0xb0288f48 ebx = 0x03f173ff esi = 0x00000000 edi = 0x00000000 Found by: call frame info 48 libnspr4.dylib!_pt_root [ptthread.c:273c0fc2f4be : 187 + 0x10] eip = 0x0006cca9 esp = 0xb0288f50 ebp = 0xb0288f78 ebx = 0x0006cb93 esi = 0xb0289000 edi = 0x00000000 Found by: call frame info 49 libSystem.B.dylib + 0x32154 eip = 0x9296b155 esp = 0xb0288f80 ebp = 0xb0288fc8 ebx = 0x9296b028 esi = 0xb0289000 edi = 0x00000000 Found by: call frame info 50 libSystem.B.dylib + 0x32011 eip = 0x9296b012 esp = 0xb0288fd0 ebp = 0xb0288fec Found by: previous frame's frame pointer
Assignee: nobody → mcmanus
Status: NEW → UNCONFIRMED
Ever confirmed: false
Assignee | ||
Comment 9•13 years ago
|
||
adding brian and honza to the cc as they have more ssl understanding than I do.. do either of you have thoughts on comment 8?
Comment 10•13 years ago
|
||
Yeah, this is a regression I was a bit worried about when you started to use PR_Available form the http code. I'll try to figure out what we can do about this.
Assignee: mcmanus → honzab.moz
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee | ||
Comment 11•13 years ago
|
||
I guess the event loop is running because the ensuredocshell() routine uses synchronous proxies to the main thread. whoa. AFAIK IsAlive() would have triggered this for a long time although maybe somewhat less painfully in other code rearrangements.. including firefox 4. e.g. look at the mozilla-2.0 code which calls CanReuse()->IsAlive() out of nsHttpConnectionMgr::GetConnection() .. that holds an index reference to mIdleConns across the call which might certainly be invalid after the event loop runs resulting in a similar bounds exception problem or the deletion of the wrong object. The same goes for the pruneDeadConnections code..
Assignee | ||
Comment 12•13 years ago
|
||
honza, Do you think the "ssl-recv can spin the event loop" rises to the level of a separate bug? It certainly seems very likely to cause other bugs. meanwhile, something as simple as this probably gets us back to the state of things as they are in 4.0.. what do you think of that as a stop gap? (untested) diff --git a/netwerk/protocol/http/nsHttpConnectionMgr.cpp b/netwerk/protocol/http/nsHttpConnectionMgr.cpp --- a/netwerk/protocol/http/nsHttpConnectionMgr.cpp +++ b/netwerk/protocol/http/nsHttpConnectionMgr.cpp @@ -579,7 +579,14 @@ nsHttpConnectionMgr::ProcessPendingQForE if (conn) break; } - if (conn) { + + // It is possible that the GetConnection() call invoked CanReuse() + // on an idle SSL persistent connection. That combination can result + // in the socket thread event loop being run synchronously on the stack, + // which in turn means ent->mPending may have changed. To fix that + // enure the index value we are using. + i = ent->mPendingQ.IndexOf(trans); + if (conn && (i >= 0)) { LOG((" dispatching pending transaction...\n")); // remove pending transaction
Comment 13•13 years ago
|
||
I misslooked the stack in comment 8. I thought this was a regression from calling PR_Available. But it's not the reason. Before patch for HTTP acceleration, ProcessPendingQForEntry was called (indirectly) from ~nsConnectionHandle dtor or in vary rare case from nsHttpConnection::OnSocketReadable. Both places being called on the socket thread, any events involved in the chain also posted to the socket thread only. So, this problem is not a regression. Why we see this now (happening just more often?) is that we store all new transactions to a pending queue now and establish half-opens for them. Those soon success and dispatches transactions for real. Before the patch, transaction were dispatched immediately, so this bug happened much more rarely, if even so. What can we do: 1) Change nsNSSSocketInfo::EnsureDocShellDependentStuffKnown to not proxy and say that PR_Read must not spin the event loop (out of scope of this bug, but I'd like to do it later anyway) 2) Patch from comment 12 is not enough from following reasons: - it must keep ref to the object held in |trans| to prevent it from being released during call to GetConnection - the iteration must use direct mPendingQ.Count() check instead of caching it in the |count| variable - GetConnection calls CreateTransport that might create a new half-open for a transaction that doesn't necessarily have to be in the pendingQ anymore (personally I play with a thought to break the relation transaction/half-open soon anyway) - other possible problems I cannot anticipate now 3) [I vote for this one] Change the code to first search for an idle connection if there is a positive number of pending transactions that are not already referred by their respective half-opens ; the goal is to first call CanReuse before we iterate mPendingQ. So, my idea for the design is: - let's split GetConnection method as follows - have a new FindConnection to only look for idle connections, but do not create new ones (half opens) - have a new method CreateConnection that creates a new half open and bound it with a transaction - GetConnection will call these two new methods - in ProcessPendingQForEntry: first call FindConnection, i.e. gather an idle conn and remove it from the list, regardless of trans->Caps() & NS_HTTP_ALLOW_KEEPALIVE - if found, search for a transaction that trans->Caps() & NS_HTTP_ALLOW_KEEPALIVE and is not referred by a half-open already and dispatch it with the connection found by FindConnection; we are done - (if such a transaction has not been found, return the connection back to the idle list) - otherwise, take any transaction that is not referred by a half-open already and create a new half-open (call CreateConnection) - FindConnection also have to check that the connection on which it just called CanReuse is still present in the idle list; if not, ignore that connection since it has already been dispatched to a transaction This seems to me clearest, Patrick, please check the idea and let me know what you think.
Comment 14•13 years ago
|
||
>Change the code to first search for an idle connection
>if there is a positive number of pending transactions that are not already
>referred by their respective half-opens ; the goal is to first call CanReuse
>before we iterate mPendingQ
This should stand as:
Change the code to first search for an idle connection ; the goal is to first call CanReuse before we iterate mPendingQ
Assignee | ||
Comment 15•13 years ago
|
||
(In reply to comment #14) > >Change the code to first search for an idle connection Hypothesis A : any use of IsAlive() can run the event loop. Assuming this is true, I don't understand how we can safely search for an idle connection at all assuming the search uses CanReuse()->IsAlive() for (int i=0; i< idleConns.Count(); i++) if (idleConns[i].CanReuse()) { dispatchOn(idleConns[i]); break; } That's not any safer if isAlive() can run the event loop - the event loop can do anything, including modifying idleConns. Changing mPending is the effect we are seeing, but it's not the only possibility. Indeed, the pseudo code above could theoretically return TRUE for canreuse of some conn and at the same time have removed it from the idle list (and have put some other transaction on it). It can probably even deadlock the ssl connection if isalive() runs the event loop which calls isalive() on the same ssl conn again maybe due to something like a transaction cancel. If Hypothesis A is true, I don't see any real fix other than option 1. Hypothesis B: the only uses of IsAlive() that really cause the event loop to run in practice are checks on idle SSL connections that have not yet completed the initial SSL handshake. In practice these are idle SSL connections that have not yet had an HTTP transaction run on them. That's what is really different about the syn retry code - because we have separated the concept of matching connections and transactions from the process of starting connections these connections are in the idle pool looking for a pending transaction, when there never was a "used 0 times" connection in the idle pool before. Comment 8 is clearly showing IsAlive() being tested on such a connection causing the event loop to run when stuff is needed from the main thread to complete the handshake. If that is the case, the direct thing to do is just work around the problem in IsAlive().. identify connections that meet that criteria (very easy to do) and just skip the mSocketTransport->IsAlive() test for them, making the assumption that they are alive. The test is basically an optimization anyhow - Connections that have not been used before are timed out quite quickly anyhow (5 seconds) and if they should happen to be dead when we use them then they will generate errors that we already cope with as a "ships passing in the night" race detection anyhow (i.e. the FIN could have been sent by the server but not read by us at the time we run IsAlive()).. landing bug 604796 would further help (it would reduce the cases of the isalive() check at dispatch time identifying a closed connection to be a very rare event). I like this better than the long series of steps of option 3. I think this is especially true because the way the code is structured now is doing the right thing assuming isalive() doesn't run the event loop. And there are quite a few other pre-existing cases in netwerk that make that assumption. I suspect Hypothesis B gets us back to the danger level of Firefox-4 (i.e. the danger appears theoretical.) and therefore is a realistic workaround while we work on option 1.
Assignee | ||
Comment 16•13 years ago
|
||
> 1) Change nsNSSSocketInfo::EnsureDocShellDependentStuffKnown to not proxy and
> say that PR_Read must not spin the event loop (out of scope of this bug, but
> I'd like to do it later anyway)
>
I think this is a "must do", even if we do something else faster. Honza, shall I open the bug or would you like to? :)
Comment 17•13 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1304564530.1304568333.21811.gz&fulltext=1
Summary: ASSERTION: Invalid start index: 'count == 0 || start < Length()' browser_bug581076.js browser_searching.js → ASSERTION: Invalid start index: 'count == 0 || start < Length()' browser_bug581076.js browser_searching.js browser_bug618502.js
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 19•13 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1304603564.1304607399.25958.gz&fulltext=1#err0
Summary: ASSERTION: Invalid start index: 'count == 0 || start < Length()' browser_bug581076.js browser_searching.js browser_bug618502.js → ASSERTION: Invalid start index: 'count == 0 || start < Length()' browser_bug581076.js browser_searching.js browser_bug618502.js browser_uninstalling.js
Comment 20•13 years ago
|
||
(In reply to comment #15) I agree with your hypothesis B. As I understand you suggest to remove the IsAlive check for connections that has never been used? If so, please go forward and create a patch. I'll review it quickly. (In reply to comment #16) > > 1) Change nsNSSSocketInfo::EnsureDocShellDependentStuffKnown to not proxy > I think this is a "must do", even if we do something else faster. Honza, shall > I open the bug or would you like to? :) Let me think of this first a bit. Then I'll submit the bug my self. Thanks.
Assignee | ||
Comment 21•13 years ago
|
||
(In reply to comment #20) > (In reply to comment #15) > > I agree with your hypothesis B. As I understand you suggest to remove the > IsAlive check for connections that has never been used? Yes - but just for SSL. > If so, please go > forward and create a patch. I'll review it quickly. I will do it tonight. Thanks!
Assignee | ||
Comment 22•13 years ago
|
||
patch as discussed from comment 20. I have run this patch successfully, along with the assertions added to reproduce the trace in comment 8, on an OS X debug build for 7 iterations on try with mochitest-other. Without the fix that combination was generating a failure for me about 1 out of 3 times. I have also sent the patch off for a full try run - results are pending.
Attachment #530486 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 23•13 years ago
|
||
note the new stack in comment 18 is a different stack, but is consistent with hypothesis B. In that stack processPending isn't being used at all - the issue is with calling canreuse->isalive() on an idle conn during the prune idle connections callback which again leads to the proxying in ensuredocshell(). So this makes me fairly confident in the approach of the patch in comment 22.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Updated•13 years ago
|
Attachment #530486 -
Flags: review?(honzab.moz) → review-
Assignee | ||
Comment 25•13 years ago
|
||
(In reply to comment #22) > Created attachment 530486 [details] [diff] [review] [review] > skip isalive for unused ssl connections v1 > > I have also sent the patch off for a full try run - results are pending. That patch makes /tests/security/ssl/stricttransportsecurity/test_sts_privatebrowsing.html fail fairly often in mochitest-5.. I suspect it is probably because I used mIsReused which only gets set on transaction completion instead of on dispatch.. I guess I need to bite the bullet and add another flag. I'll experiment with try and update the patch.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 27•13 years ago
|
||
(In reply to comment #26) > mfinkle%mozilla.com > http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Aurora/1304767144. > 1304770073.24827.gz > Rev3 MacOSX Leopard 10.5.8 mozilla-aurora debug test mochitest-other on > 2011/05/07 04:19:04 > I don't think that this incident is due to this bug. The test case is the same, but nothing about the stacks is similar.
Assignee | ||
Comment 28•13 years ago
|
||
honza, I'm sorry about the delay here - but I was caught up in some other failures I saw when running this test case again and again and again :) I'll post the fixes as two patches.
Assignee | ||
Comment 29•13 years ago
|
||
This is basically the approach agreed in comment 21. However, totally skipping IsAlive() for non-dispatched SSL connections wasn't quite sufficient. That regressed security/ssl/stricttransportsecurity/test_sts_privatebrowsing.html in mochitest-5 largely for the same root causes as found in bug 480619 (see comment 45 in that bug) This can be pretty easily accommodated by instead of skipping isalive() changing the call to msg_rcv(PEEK) inside it to simply pr_poll() for the non-dispatched ssl connection. That will give us the necessary checks on the SSL status without actually worrying about calling recv and running the event loop.
Attachment #530486 -
Attachment is obsolete: true
Attachment #531677 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 30•13 years ago
|
||
While running the tests involved in this bug I sometimes would get a failure in browser/components/privatebrowsing/test/browser/browser_privatebrowsing_certexceptionsui.js from browser-chrome. The root cause of that was out of date security callbacks that were not reset when an extra idle persistent connection was used for a different transaction than the one that initiated it. The fix is simple.
Attachment #531680 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 31•13 years ago
|
||
the try server gently reminded me that I need to proxy release the callback reference when it changes just as a I do in the dtor. That's the only difference. Try is still running with the update, but things look good so far.
Attachment #531680 -
Attachment is obsolete: true
Attachment #531680 -
Flags: review?(honzab.moz)
Attachment #531793 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 32•13 years ago
|
||
(In reply to comment #31) > Try is still running with the update, but things look good so > far. Everything is green so we should expedite both these fixes so they get at least a little nightly testing before aurora-6. There is a whitespace change to one line in the second patch - I'll fix that before committing anything.
Comment 33•13 years ago
|
||
I just tacked to biesi: - we should use PR_Poll always instead of PR_Read - this can also tell us there are still pending data to read (PR_POLL_READ on output) - it is OK to change the IsAlive signature So, we could also fix bug 631801 with it! We definitely need good tests for changes like these to be sure we do not break the transaction restart logic...
Comment 34•13 years ago
|
||
Comment on attachment 531677 [details] [diff] [review] workaround-isalive-runs-eventloop.2 Review of attachment 531677 [details] [diff] [review]: ----------------------------------------------------------------- Good idea with PR_Poll. ::: netwerk/base/public/nsISocketTransport.idl @@ +161,5 @@ > /** > + * This flag indicates that IsAlive() method should not > + * read, even non-destructively, from the network. Some protocol > + * implementations may deadlock if called without this. > + */ I don't see a reason for a new flag here. You set it only for the time nsSocketTransport::IsAlive is called and I don't see another use to it. The interface is not frozen and I got confirmed from biesi we can modify it. Also, this patch is a workaround for very specific issue with a new SSL socket PR_Read event loop re-spin that has and needs to be fixed in near future. We would then remove this flag again. Let's just pass this flag as an argument to IsAlive(). ::: netwerk/base/src/nsSocketTransport2.cpp @@ +1847,5 @@ > } > > // XXX do some idle-time based checks?? > > + if (mConnectionFlags & nsISocketTransport::PASSIVE_ISALIVE) { As discussed above, let's turn this to a method argument. @@ +1853,4 @@ > > + PRPollDesc desc; > + desc.fd = mFD; > + desc.in_flags = PR_POLL_EXCEPT | PR_POLL_READ; Is it really needed to poll also for READ when you want only EXCEPT and ERROR on out? @@ +1856,5 @@ > + desc.in_flags = PR_POLL_EXCEPT | PR_POLL_READ; > + desc.out_flags = 0; > + if ((PR_Poll(&desc, 1, 0) == 1) && > + (desc.out_flags & (PR_POLL_EXCEPT | PR_POLL_ERR))) { > + *result = PR_FALSE; You probably also want to check for PR_POLL_NVAL. Not sure about PR_POLL_HUP but probably as well. Is anywhere defined what happens when you poll on a socket that is already being polled (with a timeout) on a different thread? This might happen after we land your patch for bug 604796 and this code gets invoked on a thread different then socket transport thread. Here is a good place to add ABORT_IF_FALSE with check for the socket thread.
Attachment #531677 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 35•13 years ago
|
||
(In reply to comment #34) > Comment on attachment 531677 [details] [diff] [review] [review] > workaround-isalive-runs-eventloop.2 > Is it really needed to poll also for READ when you want only EXCEPT and > ERROR on out? > yes, READ is required. I started without it but the poll implementation optimizes the relevant ssl health checks away if there are no fd's queried for read or write. > Is anywhere defined what happens when you poll on a socket that is already > being polled (with a timeout) on a different thread? poll, being level triggered, is going to be fine with that. The implementation generally just looks at the recv queue in the kernel and sets the poll bit based on that. I've done things like that in the past - as long as everything is non blocking we'll be ok.
Assignee | ||
Comment 36•13 years ago
|
||
Updated based on review comments - this one now changes the IsAlive() interface, and comes with an sr?
Attachment #532054 -
Flags: superreview?(cbiesinger)
Attachment #532054 -
Flags: review?(honzab.moz)
Assignee | ||
Updated•13 years ago
|
Attachment #531677 -
Attachment is obsolete: true
Comment 37•13 years ago
|
||
Comment on attachment 532054 [details] [diff] [review] workaround-isalive-runs-eventloop.3 Review of attachment 532054 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/base/public/nsISocketTransport.idl @@ +102,2 @@ > */ > + boolean isAlive(in boolean aPassive); Please don't add this argument. Just always use PR_Poll instead. ::: netwerk/base/src/nsSocketTransport2.cpp @@ +1848,5 @@ > > // XXX do some idle-time based checks?? > > + if (aPassive) { > + *result = PR_TRUE; /* presume true */ I'm not sure this should return true if PR_Poll returns -1 @@ +1853,4 @@ > > + PRPollDesc desc; > + desc.fd = mFD; > + desc.in_flags = PR_POLL_READ | PR_POLL_EXCEPT | Why pass PR_POLL_READ? You're not using that flag. @@ +1853,5 @@ > > + PRPollDesc desc; > + desc.fd = mFD; > + desc.in_flags = PR_POLL_READ | PR_POLL_EXCEPT | > + PR_POLL_HUP | PR_POLL_NVAL; NVAL is not used in in_flags according to https://developer.mozilla.org/en/PR_Poll Same for HUP: http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/include/prio.h#1939
Attachment #532054 -
Flags: superreview?(cbiesinger) → superreview-
Assignee | ||
Comment 38•13 years ago
|
||
(In reply to comment #37) > Comment on attachment 532054 [details] [diff] [review] [review] > workaround-isalive-runs-eventloop.3 > > Please don't add this argument. Just always use PR_Poll instead. Doing that is a regression for the non-ssl case. recv is testing for EOF - a condition that poll() cannot detect. > > Why pass PR_POLL_READ? You're not using that flag. as noted in comment 35, it is necessary in order to get the necessary ssl checks that were added as part of bug 480619 (see comment 45 in that bug). And in a sense I really am using that flag - I am logically testing the read side of that socket for exceptions and errors. I just don't want to actually do a read.
Assignee | ||
Comment 39•13 years ago
|
||
> > + if (aPassive) {
> > + *result = PR_TRUE; /* presume true */
>
> I'm not sure this should return true if PR_Poll returns -1
>
my reasoning is that a failure of the PR_Poll call doesn't tell me anything about the status of a particular FD - so it seems unwise to asusme the FD is dead because of it. Any FD specific failure is returned in the PRFileDesc.
Assignee | ||
Comment 40•13 years ago
|
||
update patch based on sr- in comment 37, modulo my explanations in comment 38 and comment 39
Attachment #532054 -
Attachment is obsolete: true
Attachment #532054 -
Flags: review?(honzab.moz)
Attachment #532082 -
Flags: superreview?(cbiesinger)
Attachment #532082 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 41•13 years ago
|
||
try server is green with both of these patches applied.
Comment 42•13 years ago
|
||
Comment on attachment 532082 [details] [diff] [review] workaround-isalive-runs-eventloop.4 Review of attachment 532082 [details] [diff] [review]: ----------------------------------------------------------------- I agree with Patrick's arguments here. r=honzab with a small change: ::: netwerk/base/src/nsSocketTransport2.cpp @@ +1858,2 @@ > > + if ((PR_Poll(&desc, 1, 0) == 1) && So, it really looks like on some platforms poll (better say select() on windows) is not guaranteed to be accessible concurrently from different treads. [1] Please add NS_ABORT_IF_FALSE with check for the socket thread before we call PR_Poll. [1] http://tangentsoft.net/wskfaq/intermediate.html, section 3.12
Attachment #532082 -
Flags: review?(honzab.moz) → review+
Comment 43•13 years ago
|
||
Comment on attachment 531793 [details] [diff] [review] out of date security callbacks v2 Review of attachment 531793 [details] [diff] [review]: ----------------------------------------------------------------- This is an important change and needs to be moved to a separate bug and have at least some STR on the record. r=honzab ::: netwerk/protocol/http/nsHttpConnection.cpp @@ +166,4 @@ > // take ownership of the transaction > mTransaction = trans; > mIsActivated = PR_TRUE; > + Just remove this whitespace change, but you have already mentioned that, I think.
Attachment #531793 -
Flags: review?(honzab.moz) → review+
Assignee | ||
Updated•13 years ago
|
Attachment #531793 -
Attachment is obsolete: true
Comment 44•13 years ago
|
||
I just pushed a fix on behalf of Patrick over on bug 656989 comment 3, but then noticed that the commit message had the wrong bug number (this bug). So, anyone looking for the bug associated with this cset... http://hg.mozilla.org/projects/cedar/rev/6d0e02e42e34 ...and arriving at this bug will want to hop over to bug 656989 comment 3.
Comment 46•13 years ago
|
||
(In reply to comment #38) > > Why pass PR_POLL_READ? You're not using that flag. > > as noted in comment 35, it is necessary in order to get the necessary ssl > checks that were added as part of bug 480619 (see comment 45 in that bug). Shouldn't the SSL poll implementation be fixed, then? > And in a sense I really am using that flag - I am logically testing the read > side of that socket for exceptions and errors. I just don't want to actually > do a read. That's not how poll() works... the read flag means "is data available for reading". it does not mean what you're written.
Comment 47•13 years ago
|
||
Comment on attachment 532082 [details] [diff] [review] workaround-isalive-runs-eventloop.4 So in the interest of getting a topcrasher fixed I'll mark sr+ but I am not happy with having the read flag here. It is very confusing to readers. At least: - File a bug to fix the poll implementation - Add a comment that it is needed because our SSL poll implementation is bad, referencing that bug number.
Attachment #532082 -
Flags: superreview?(cbiesinger) → superreview+
Assignee | ||
Comment 48•13 years ago
|
||
(In reply to comment #47) > - File a bug to fix the poll implementation bug 658138
Comment 49•13 years ago
|
||
Rev3 MacOSX Leopard 10.5.8 mozilla-central debug test mochitest-other on 2011/05/19 06:49:10 http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1305812950.1305816753.13101.gz
Summary: ASSERTION: Invalid start index: 'count == 0 || start < Length()' browser_bug581076.js browser_searching.js browser_bug618502.js browser_uninstalling.js → ASSERTION: Invalid start index: 'count == 0 || start < Length()' browser_bug581076.js browser_searching.js browser_bug618502.js browser_uninstalling.js browser_globalinformations.js
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 51•13 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/f1d79c22fd71
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 52•13 years ago
|
||
This seems to have caused a Tp4 regression on Win XP and 7. Regression :( Tp4 (%CPU) increase 8.32% on XP Firefox ----------------------------------------------------- Previous: avg 52.138 stddev 0.987 of 30 runs up to revision f594c196fac7 New : avg 56.477 stddev 0.446 of 5 runs since revision f1d79c22fd71 Change : +4.339 (8.32% / z=4.398) Graph : http://mzl.la/ljrnVR Changeset range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f594c196fac7&tochange=f1d79c22fd71 Regression :( Tp4 (%CPU) increase 8.27% on Win7 Firefox ------------------------------------------------------- Previous: avg 60.215 stddev 1.474 of 30 runs up to revision f594c196fac7 New : avg 65.194 stddev 0.390 of 5 runs since revision f1d79c22fd71 Change : +4.979 (8.27% / z=3.378) Graph : http://mzl.la/lduVoO Changeset range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f594c196fac7&tochange=f1d79c22fd71
Assignee: honzab.moz → mcmanus
Comment 53•13 years ago
|
||
This probably broke mailnews which also uses IsAlive
Assignee | ||
Comment 54•13 years ago
|
||
(In reply to comment #52) > This seems to have caused a Tp4 regression on Win XP and 7. > it is possible that without this patch the code was doing some speculative SSL work which was speeding things up - but it is exactly that work that was causing the deadlock. And that work wasn't being done in FF 4. IN some ways it is the flip side to: http://hg.mozilla.org/mozilla-central/rev/9b12c34434f4 which caused an improvement in the same area. It isn't clear to me at all why win tp4 is so sensitive to connection rates (there is another bug that shows that too) which should be very low latency in the test harness, but I think its pretty clear that something in there was hitting these bugs (and this definitely is a bug that can cause timeouts, early errors, or hangs - this code is definitely needed).
Comment 55•13 years ago
|
||
(In reply to comment #54) > (In reply to comment #52) > > This seems to have caused a Tp4 regression on Win XP and 7. > > > > it is possible that without this patch the code was doing some speculative > SSL work which was speeding things up - but it is exactly that work that was > causing the deadlock. And that work wasn't being done in FF 4. If I understand the report correctly (it's quite possible that I don't), the regression isn't about time but CPU load.
Comment 56•13 years ago
|
||
Indeed. The Tp4 regression report shows that on Windows only CPU usage went up 8% or so (while overall time remained unchanged).
Assignee | ||
Comment 57•13 years ago
|
||
* that was my mistake confusing the cpu and time metrics. The other changeset I referenced resulted in a 4% improvement in time. (it is also part of the feature) * the point of the feature, for which this patch is a incremental fix, is to trade a bit more work for improvements in time. Its only in SSL that this will turn out to be significant cpu, and SSL also has the most to gain (because of higher latency handshakes). Our tests, being localhost based, are going to undermeasure the improvements in time because those will scale up with rtt. So in the grand scheme of things this is consistent - though it is weird that it shows up in this patch. * something about tp4 on windows only seems extremely sensitive to connection changes - that bears investigation. We want to separate those effects out of the normal tp4 and get them into the network perf test nick hurley is working on (which should also be able to quantify the performance improvements of changes like this over various types of networks). * the only real change this bugfix makes is that for some early states of an SSL connect isalive() is tested with pr_poll instead of going right to a recv(MSG_PEEK). I would speculate that poll isn't caching some bit of computation that a subsequent recv() (and perhaps other poll() calls) redoes. I think the right thing to do is file a follow-on bug to investigate that and maybe I can get help from brian smith or another ssl guru in digging into it.
Comment 58•13 years ago
|
||
Filing a followup or two sounds good. Please do!
Comment 59•13 years ago
|
||
Note that mozilla-central will be merged to mozilla-aurora next week for Firefox 6. So as long as you don't fully understood what's going on, you might want to consider a backout too.
Comment 60•13 years ago
|
||
This changes a public interface that is being used by at least a couple of add-ons. Adding dev-doc-needed. It should be included in Firefox 6 for Developers.
Keywords: dev-doc-needed
Comment 61•13 years ago
|
||
(In reply to comment #60) > This changes a public interface that is being used by at least a couple of > add-ons. Adding dev-doc-needed. It should be included in Firefox 6 for > Developers. This might soon be backed out. See bug 658580.
Comment 62•13 years ago
|
||
Making the new parameter optional would mean this had no impact on add-on compatibility.
Comment 63•13 years ago
|
||
I see that this change has been backed out (#658580) in FF7 but will it be removed from the FF6 beta or will I have to write a workaround for my add-on to achieve FF6 compatibility?
Assignee | ||
Comment 64•13 years ago
|
||
FF6 has been restored to the original api. (everything matches now).
Comment 65•13 years ago
|
||
Is there any word on this change going back in at some point? It's on my doc radar but as far as I can tell is not actually in the tree right now.
Assignee | ||
Comment 66•13 years ago
|
||
(In reply to Eric Shepherd [:sheppy] from comment #65) > Is there any word on this change going back in at some point? no, it was fixed a different way.
Comment 67•13 years ago
|
||
So does that mean this no longer impacts developer documentation, or is there something I need to know?
Assignee | ||
Comment 68•13 years ago
|
||
(In reply to Eric Shepherd [:sheppy] from comment #67) > So does that mean this no longer impacts developer documentation, or is > there something I need to know? From the pov of actual releases there has been no interface change. So as long as you didn't make an update for the beta stream (which did have a change that got reverted) there is nothing to do.
Updated•13 years ago
|
Keywords: dev-doc-needed
Updated•12 years ago
|
Keywords: intermittent-failure
Updated•12 years ago
|
Whiteboard: [orange][test which aborts the suite] → [test which aborts the suite]
You need to log in
before you can comment on or make changes to this bug.
Description
•