Closed Bug 522416 Opened 15 years ago Closed 15 years ago

Tab Previews must not do sync http requests

Categories

(Firefox :: Shell Integration, defect, P1)

x86
Windows 7
defect

Tracking

()

RESOLVED FIXED
Firefox 3.7a1
Tracking Status
status1.9.2 --- beta1-fixed

People

(Reporter: vlad, Assigned: sdwilsh)

References

(Blocks 1 open bug)

Details

(Keywords: dev-doc-complete)

Attachments

(6 files, 4 obsolete files)

This made the problem in bug 521668 much much easier to hit, but that's a separate thing.  Regardless, tab preview code must never do sync network requests;  WindowsPreviewPerTab.jsm's _imageFromURI needs to become async.
Flags: blocking-firefox3.6+
Use nsIFaviconService?
That would probably be better. For some reason I thought that I was only getting favicon:// urls which were resolved to some local location.
Component: Tabbed Browser → Shell Integration
QA Contact: tabbed.browser → shell.integration
i think using moz-anno:favicon:uri should be the way to go
Hm, it's not a matter of just changing the URI, right?  The code has to be async, regardless of what URI is being used.
Who can take this bug - it's blocking the beta.
Component: Shell Integration → Tabbed Browser
moz-anno:favicon uses an async channel
I think Vlad meant that it needs to be read asynchronously, so the required fix is a little more than just pulling the information from a different channel - though that will certainly help!
Shawn said he'd take this
Assignee: nobody → sdwilsh
Attached patch xpcom and netwerk changes v1.0 (obsolete) — Splinter Review
This was a lot harder than expected.  Step one - lets make a convenience method for JavaScript to easily open channels asynchronously.  Required a change in nsStorageStream.
Attachment #406568 - Flags: superreview?(benjamin)
Attachment #406568 - Flags: review?(bzbarsky)
Attached patch Taskbar changes v1.0 (obsolete) — Splinter Review
And the jsm changes needed with the new API.
Attachment #406570 - Flags: review?(vladimir)
Attached patch Taskbar changes v1.1 (obsolete) — Splinter Review
per comments from robarnold on irc
Attachment #406570 - Attachment is obsolete: true
Attachment #406583 - Flags: review?(vladimir)
Attachment #406570 - Flags: review?(vladimir)
So this is still reading from the network (albeit async now)? Seems like that's worth a followup bug to move this to use the favicon service (or Places' moz-anno stuff), to avoid the network requests entirely? [Along with the followup you mentioned for favicons often not loading at all.]
(In reply to comment #12)
> So this is still reading from the network (albeit async now)? Seems like that's
> worth a followup bug to move this to use the favicon service (or Places'
> moz-anno stuff), to avoid the network requests entirely? [Along with the
> followup you mentioned for favicons often not loading at all.]
The favicon service has no asynchronous way to read data from the database with the exception of using the moz-anno: protocol for favicons, which is what this does.  The moz-anno: protocol has to go through necko as a result, but it's not actually going over the network.  With these patches, this bit of code no longer does any synchronous IO on the main thread.
Whiteboard: [needs review vlad][needs review bz][needs sr bsmedberg]
(In reply to comment #13)
> With these patches, this bit of code no
> longer does any synchronous IO on the main thread.

Sure, but the point is a DB lookup should complete faster than an HTTP request, sync/async has nothing to do with it. We can change the favicon service to offer an async method if that's a win.
Attached patch Taskbar changes v1.2 (obsolete) — Splinter Review
Fix default icon race and a comment nit from robarnold.
Attachment #406583 - Attachment is obsolete: true
Attachment #406588 - Flags: review?(vladimir)
Attachment #406583 - Flags: review?(vladimir)
Attachment #406568 - Flags: superreview?(benjamin) → superreview?(vladimir)
Whiteboard: [needs review vlad][needs review bz][needs sr bsmedberg] → [needs review vlad][needs review bz][needs sr vlad]
(In reply to comment #14)
> We can change the favicon service to
> offer an async method if that's a win.

no reason, if you want to get an async channel from favicon service, as i said, and as i implemented apart, you can open an async channel to a moz-anno:favicon: address (Well now you could even use the netUtil method Shwan added). i tested that working, but Shawn's solution looks more polite than this http://mozilla.pastebin.com/m365fe0b4.
(In reply to comment #16)
> no reason, if you want to get an async channel from favicon service, as i said,
> and as i implemented apart, you can open an async channel to a
> moz-anno:favicon: address (Well now you could even use the netUtil method Shwan
> added). i tested that working, but Shawn's solution looks more polite than this
> http://mozilla.pastebin.com/m365fe0b4.
we also have to worry about a delayed load with the favicon service, right?
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsFaviconService.cpp#503
A few quick questions:

1)  Is there a reason to use a storage stream here instead of a pipe?  The latter would allow data to be discarded as it's read, which might be desirable.

2)  If you use nsISimpleStreamListener you don't have to reimplement it in js (or rather just need to implement an nsIRequestObserver with an OnStopRequest that calles the callback), right?  What are the drawbacks?
Ran the regression tests on Shawn's patch. Found a crash-near-startup bug (see widget fixes; test included to catch a regression on it) and all the browser tests for AeroPeek were broken in bug 520724 so those are fixed too.
Attachment #406615 - Flags: review?(vladimir)
Status: NEW → ASSIGNED
(In reply to comment #18)
> 1)  Is there a reason to use a storage stream here instead of a pipe?  The
> latter would allow data to be discarded as it's read, which might be desirable.
biesi and I were talking on irc and thought that this would be simpler than using a pipe.  I don't fully recall the reasoning (we looked at three or four options all together).  For a more general case, a pipe might be better.

> 2)  If you use nsISimpleStreamListener you don't have to reimplement it in js
> (or rather just need to implement an nsIRequestObserver with an OnStopRequest
> that calles the callback), right?  What are the drawbacks?
Do I just pass the nsISimpleStreamListener to asyncOpen on the channel, and then OnStopRequest, flush init it with the either the nsIStorageStream's output stream or a pipe's output stream?
(In reply to comment #17)
> (In reply to comment #16)
> we also have to worry about a delayed load with the favicon service, right?
> http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsFaviconService.cpp#503
that's why i added a second icon refresh after 5s. but yeah that's the main problem of the favicon service approach.
Component: Tabbed Browser → Shell Integration
OK.  No strong opinions on storage stream vs pipe, I guess, other than pipe already supporting writefrom and such.

> Do I just pass the nsISimpleStreamListener to asyncOpen on the channel, and
> then OnStopRequest, flush init it with the either the nsIStorageStream's output
> stream or a pipe's output stream?

In your asyncOpen, you create the nsISimpleStreamListener, init it with the output end of a pipe or storage stream and an observer of your choice that watches for onStopRequest, and pass the nsISimpleStreamListener to the channel's asyncOpen.
Comment on attachment 406568 [details] [diff] [review]
xpcom and netwerk changes v1.0

>+    return Write(data.get(), aCount, _bytesWritten);

s/aCount/data.Length()/, right?
Attachment #406615 - Flags: review?(vladimir) → review+
(In reply to comment #23)
> s/aCount/data.Length()/, right?
Those should be the same, right?  I could add an assertion there I suppose...
Whiteboard: [needs review vlad][needs review bz][needs sr vlad] → [needs review bz][needs sr vlad]
Attachment #406588 - Flags: review?(vladimir) → review+
Maybe call asyncOpen something like asyncFetch?  "asyncOpen" implies just opening to me, whereas the important part of the API (and this particular usage of it) is that it will open and fully read from the stream, ensuring that reading from the input stream will never block.
> Those should be the same, right? 

Not if the stream hit EOF before aCount bytes got read from it.  Which can happen, no problem.
s/asyncOpen/asyncFetch/ per comments by vlad on irc
Use nsIPipe and nsISimpleStreamListener
Attachment #406568 - Attachment is obsolete: true
Attachment #406744 - Flags: superreview?(vladimir)
Attachment #406744 - Flags: review?(bzbarsky)
Attachment #406568 - Flags: superreview?(vladimir)
Attachment #406568 - Flags: review?(bzbarsky)
Updated to use the new API name.  Ready to land.
Attachment #406588 - Attachment is obsolete: true
(In reply to comment #26)
> Not if the stream hit EOF before aCount bytes got read from it.  Which can
> happen, no problem.
Ah.  That code has all been removed from the latest patch anyway since I no longer needed to make that change.
Attachment #406744 - Flags: superreview?(vladimir) → superreview+
Whiteboard: [needs review bz][needs sr vlad] → [needs review bz]
Comment on attachment 406744 [details] [diff] [review]
netwerk changes v2.0

Looks good.
Attachment #406744 - Flags: review?(bzbarsky) → review+
Whiteboard: [needs review bz]
http://hg.mozilla.org/mozilla-central/rev/d3ce844996ee
http://hg.mozilla.org/mozilla-central/rev/a9e961601e20
http://hg.mozilla.org/mozilla-central/rev/3124e42df5a6
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Keywords: dev-doc-needed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 3.7a1
This didn't apply perfectly on branch - one minor conflict in the test file.  This does.
Shawn, do you think makes sense to file a bug to use the favicon service when we fail fetching through http? (offline mode or network errors)
I think it makes sense to always use the favicon service, and we should file a bug for that.
release branch!  I had to take the changesets for bug 521216 since it would require non-trivial changes for this work to apply to just the release branch.  That bug had already landed on 1.9.2, so I figure it's probably just fine.

http://hg.mozilla.org/releases/mozilla-1.9.2/rev/c307c9951835
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/5d6d66d2e92d
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/28f2f49b61fe
Blocks: 522855
i have filed Bug 522855 to implement this through the favicon service.
Awesome, Shawn. Thanks so much for getting this taken care of on 1.9.2 and the releasebranch.
Ugh... with today's nightly (Minefield, haven't tried Namaroka), I see deadlocks that look like this:

ntdll_77330000!NtWaitForSingleObject
KERNELBASE!WaitForSingleObjectEx
KERNEL32!WaitForSingleObjectExImplementation
KERNEL32!WaitForSingleObject
nspr4!_PR_MD_WAIT_CV
nspr4!_PR_WaitCondVar
nspr4!PR_Wait
xul!nsAutoMonitor::Wait
xul!nsPipeInputStream::Wait
xul!nsPipeInputStream::ReadSegments
xul!NS_InputStreamIsBuffered
xul!imgTools::DecodeImageData
xul!NS_InvokeByIndex_P
xul!XPCWrappedNative::CallMethod
xul!XPCWrappedNative::GetLock
xul!XPC_WN_OnlyIWrite_PropertyStub
MOZCRT19!arena_malloc_small
MOZCRT19!malloc
mozjs!JS_HashTableRawAdd
mozjs!JS_HashTableAdd

There are only two users of DecodeImageData, and the other one is in C++ code; so it has to be the one here.   This stack is bogus after the XPCWrappedNative bits due to xpconnect, but the top part is telling me that the PipeInputStream is blocking on DecodeImageData -- I'm guessing that DecodeImageData consumed all the data, but the pipe somehow didn't get notified that the base stream got closed.  So, it's getting a wouldblock and is calling Wait(), leading to badness.

Is a simple fix here to set the pipe to non-blocking via SetNonBlocking?  Or is there some deeper problem?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Is it possible that these deadlocks are behind the random 'not responding' issues I am having with the 17 October build of Minefield?
I just hit this again despite having the taskbar previews pref set to false -- does that pref actually work?  It does seem to turn off the previews, but this code still seems to be executed.

(In reply to comment #40)
> Is it possible that these deadlocks are behind the random 'not responding'
> issues I am having with the 17 October build of Minefield?

If you're running Windows 7, then quite possibly, yes.
I can confirm that the build with cset:
http://hg.mozilla.org/mozilla-central/rev/8976e1704153 is OK, no hangs on Acid3 test

The first build with cset: 
http://hg.mozilla.org/mozilla-central/rev/3124e42df5a6 which contains this patch locks hard, and have to kill the process. 

Win7RC build 7100 
AMD Athlon CPU Phenom II Quad X4, 8 gig RAM HD3200 onboard video 512meg
Blocks: 522892
(In reply to comment #41)
> I just hit this again despite having the taskbar previews pref set to false --
> does that pref actually work?  It does seem to turn off the previews, but this
> code still seems to be executed.

Yes, the previews' visibility is just set to false. That's the only difference.
Apparently, pages with empty favicons trigger the hang:

data:text/html,<link rel="icon" href="data:image/x-icon,">
(In reply to comment #39)
> Is a simple fix here to set the pipe to non-blocking via SetNonBlocking?  Or is
> there some deeper problem?
hmm.  bz - did we want this line:
pipe.init(false, false, 0, PR_UINT32_MAX, null);
to be this instead:
pipe.init(true, true, 0, PR_UINT32_MAX, null);
Er... we want non-blocking streams, so yes.  I double-checked that when I read the patch originally, and clearly got confused by all the negatives...

That said, would just calling close() on the output end of the pipe in the onStopRequest there before calling the callback address the issue?
Ah, so the stack in comment 39 makes sense: we're trying to read 1 byte from a blocking stream.

I think we might want to both close() and make the streams non-blocking to be safe, though the former should be enough (changes the return from GetReadSegment from NS_BASE_STREAM_WOULD_BLOCK to NS_BASE_STREAM_CLOSED when the input end of the pipe is empty).
Vlad, I'm resetting this as status1.9.2:--- on the assumption that you meant to have this continue to block the beta. (If that's not right, we should close this bug again and file a follow up.)
(In reply to comment #48)
> Vlad, I'm resetting this as status1.9.2:--- on the assumption that you meant to
> have this continue to block the beta. (If that's not right, we should close
> this bug again and file a follow up.)
It needs to continue to block the beta.  I need to head out to dinner, but I'll make a patch tonight for bz to review.  I'll try to land it tomorrow.

bz - ideas for a test that would fail and won't in the "correct" situation?  Preferably xpcshell.
Attached patch hang fix v1.0Splinter Review
I'll make a test for this on Monday when I get back to the office.  I'm having issues with building on this machine.
Attachment #406902 - Flags: review?(bzbarsky)
Whiteboard: [needs review bz]
Comment on attachment 406902 [details] [diff] [review]
hang fix v1.0

r=bzbarsky

If we're talking unit test for the fetch method here, you could fetch "data:text/plain," then in the caller try to read one byte from the input stream returned.  If it hangs, the test failed.  ;)
Attachment #406902 - Flags: review?(bzbarsky) → review+
Oh, and presumably do the read in a try/catch, since it'll throw BASE_STREAM_CLOSED when passing.
This morning I foolishly updated to today's nightly and immediately encountered this hang on startup. I can now verify that this patch fixes the hang and the favicons still do show up as expected in the tab previews.
Whiteboard: [needs review bz]
It would seem that a temporary workaround would be to toggle browser.chrome.favicons to false.  After I did this, I was able to get the Acid3 test to run w/o hanging.
If somebody has the ability to land this, please do.  I'm not going to be able to do so for several hours.

As stated in comment 50, I'll write the test on Monday.  Comment 53 indicates that the patch fixes the hang.
Keywords: checkin-needed
(In reply to comment #35)
> http://hg.mozilla.org/releases/mozilla-1.9.2/rev/d6cce120cff7
> http://hg.mozilla.org/releases/mozilla-1.9.2/rev/56c0fa3e56ba
> http://hg.mozilla.org/releases/mozilla-1.9.2/rev/429e7f7ba34d

xpcshell tests have been hanging mozilla-1.9.2 since this set of changes landed there, and the hang fix has not changed that. Eg http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1255907641.1255910818.28959.gz
It passes xpcshell/tests/test_intl_strres/unit/test_bug397093.js and then hangs. xpcshell/tests/test_necko/unit/test_NetUtil.js is the next test in logs when it there isn't a hang.

The regression range is only the three changes above, which points the finger at d6cce120cff7. Could someone please check asap if this is a app regression or a test regression.
I'm not sure how that's passing on mozilla-central either...

There should be a run_next_test() call here I think.  
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/d6cce120cff7#l2.76

I'm in the process of rebuilding to test this fix, but I suspect it's it.
Keywords: checkin-needed
I just downloaded the latest hourly with this patch in it and when I toggle browser.chrome.favicons back to true, I don't get a hang on the Acid3 site or other sites that were hanging.
Pushed the run_next_test() call to all 3 branches.
(In reply to comment #62)
> Pushed the run_next_test() call to all 3 branches.
I presume that worked then?  My build just finally finished...
Apparently it didn't: http://tinderbox.mozilla.org/showlog.cgi?tree=Firefox3.6-Unittest&errorparser=unittest&logfile=1255917030.1255919856.28064.gz&buildtime=1255917030&buildname=OS%20X%2010.5.2%20mozilla-1.9.2%20test%20everythingelse&fulltext=1
is http://hg.mozilla.org/releases/mozilla-1.9.2/rev/35804d6d3262

TEST-PASS | /builds/slave/mozilla-1.9.2-macosx-unittest-everythingelse/build/xpcshell/tests/test_intl_strres/unit/test_bug397093.js | test passed

command timed out: 1200 seconds without output, killing pid 372
Silly.  NetUtil.ioService doesn't exist on 1.9.2, so the test fails there.  We never report the error because it's outside of the normal run_test() function (asynchronous testing FTL I guess).

Fixed in:
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/8217722e6af4
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/5879c8980e69
So is this now resolved? Looks like the patch fixes the issue (comment 60) and the test failures were resolved (comment 65).
Marking FIXED and status1.9.2beta1-fixed.
Status: REOPENED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
(In reply to comment #66)
> So is this now resolved? Looks like the patch fixes the issue (comment 60) and
> the test failures were resolved (comment 65).
Left it open because I still need to write a test as stated in comment 50.  If we close it, I'm more likely to lose track of it.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #59)
> I'm not sure how that's passing on mozilla-central either...
> 
> There should be a run_next_test() call here I think.  
> http://hg.mozilla.org/releases/mozilla-1.9.2/rev/d6cce120cff7#l2.76
Ugh - this was actually wrong.  The call that stops the server takes a callback, where we pass that method in anyway.  My new patch with the test will fix this.
This is a P1 blocker, so if you think we should resolve the test issue before we ship the beta, let's leave it open. If not, though, I'd suggest filing a follow-up for the tests, mark it blocking? and I'll + it as a P2 blocker.
Status: REOPENED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
The tests, as promised in comment 50
Attachment #407080 - Flags: review?(bzbarsky)
Attachment #407080 - Flags: review?(bzbarsky)
I'll land on branch in a bit.  Patch doesn't apply completely cleanly, so being extra careful this time.

http://hg.mozilla.org/mozilla-central/rev/3314dc670ff4
Comment on attachment 407080 [details] [diff] [review]
tests for hang vix v1.0

r+, fwiw.
Attachment #407080 - Flags: review+
And documentation:
https://developer.mozilla.org/en/JavaScript_code_modules/NetUtil.jsm#asyncFetch

Leaving dev-doc-needed so sheppy can look it over.
Looks good - nice job!
Shawn: What is the easiest way for QA verify this bug? Am seeing some weirdness while testing the candidate builds.  Thanks.
(In reply to comment #77)
> Shawn: What is the easiest way for QA verify this bug? Am seeing some weirdness
> while testing the candidate builds.  Thanks.
That's probably best identified by vlad since he noticed the issue in the first place.  I can't think of an easy way to verify it.
"Doesn't crash during browsing" is really the only way I'm verifying it.  I was crashing trivially every 15 min or so with more than a handful of tabs open, especially upon session restore.  Bugzilla tabs in particular seemed highly likely to trigger the problem.
FWIW, I've been browsing and reviewing bugs with yesterday's nightly all day yesterday and today with no crashes; I'd be comfortable calling this VERIFIED based on that.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: