Last Comment Bug 418470 - Go to line in View Source broken (NS_ERROR_UNEXPECTED at nsISelectionController.scrollSelectionIntoView)
: Go to line in View Source broken (NS_ERROR_UNEXPECTED at nsISelectionControll...
Status: RESOLVED FIXED
: regression
Product: Core
Classification: Components
Component: Layout (show other bugs)
: Trunk
: All All
: P2 normal (vote)
: mozilla1.9beta4
Assigned To: Masayuki Nakano [:masayuki] (Mozilla Japan)
:
Mentors:
Depends on:
Blocks: 392785 419636
  Show dependency treegraph
 
Reported: 2008-02-19 12:27 PST by Steffen Wilberg
Modified: 2008-02-28 07:29 PST (History)
10 users (show)
roc: blocking1.9+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch v1.0 (5.84 KB, patch)
2008-02-20 12:49 PST, Masayuki Nakano [:masayuki] (Mozilla Japan)
no flags Details | Diff | Splinter Review
Patch v2.0 (7.37 KB, patch)
2008-02-22 10:29 PST, Masayuki Nakano [:masayuki] (Mozilla Japan)
roc: review+
roc: superreview+
Details | Diff | Splinter Review
Patch v3.0 (9.79 KB, patch)
2008-02-26 14:03 PST, Masayuki Nakano [:masayuki] (Mozilla Japan)
no flags Details | Diff | Splinter Review
Patch v4.0 (38.85 KB, patch)
2008-02-27 02:34 PST, Masayuki Nakano [:masayuki] (Mozilla Japan)
no flags Details | Diff | Splinter Review
Patch v4.1 (36.54 KB, patch)
2008-02-27 03:06 PST, Masayuki Nakano [:masayuki] (Mozilla Japan)
roc: review+
roc: superreview+
mbeltzner: approval1.9b4+
Details | Diff | Splinter Review

Description Steffen Wilberg 2008-02-19 12:27:11 PST
When clicking on errors in the Error Console, View Source displays the file, but doesn't scroll down to the line of the error. If you scroll down manually, you'll find the line selected though. Affects Firebug as well.

When using View Source -> menu Edit -> Go to Line (Ctrl+L), this error is displayed in the Error Console:

Error: uncaught exception: [Exception... "Component returned failure code: 0x8000ffff (NS_ERROR_UNEXPECTED) [nsISelectionController.scrollSelectionIntoView]"  nsresult: "0x8000ffff (NS_ERROR_UNEXPECTED)"  location: "JS frame :: chrome://global/content/viewSource.js :: goToLine :: line 372"  data: no]

This is with today's Minefield nightly on Windows.
Works with Firefox 2.
Comment 1 Brian Polidoro 2008-02-19 15:37:16 PST
Regression range 

02-15 2302 ok
02-16 nightly broken

http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&date=explicit&mindate=1203145320&maxdate=1203176399

Comment 2 Brian Polidoro 2008-02-19 15:58:40 PST
From the patch in bug 392785 it looks like the last parameter needs to be changed to false in viewSource.js:
http://lxr.mozilla.org/mozilla/source/toolkit/components/viewsource/content/viewSource.js#406

There may be other places to change as well e.g.:
http://lxr.mozilla.org/mozilla/source/toolkit/components/viewsource/content/viewPartialSource.js#309

Comment 3 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-19 20:46:00 PST
(In reply to comment #2)
> From the patch in bug 392785 it looks like the last parameter needs to be
> changed to false in viewSource.js:
> http://lxr.mozilla.org/mozilla/source/toolkit/components/viewsource/content/viewSource.js#406
> 
> There may be other places to change as well e.g.:
> http://lxr.mozilla.org/mozilla/source/toolkit/components/viewsource/content/viewPartialSource.js#309

Exactly.
Comment 5 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-20 08:30:39 PST
the patch coming soon.
Comment 6 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-20 12:49:13 PST
Created attachment 304575 [details] [diff] [review]
Patch v1.0

I think that the cause of this bug is behavior changing of layout, so, this should be reviewed by layout reviewer.
Comment 7 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-02-20 15:58:51 PST
Actually, shouldn't synchronous ScrollSelectionIntoView be flushing layout?
Comment 8 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-22 10:29:25 PST
Created attachment 305038 [details] [diff] [review]
Patch v2.0

how about this?

this patch also includes backing out patch of the additional patch in bug 392785.
Comment 9 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-02-22 13:11:26 PST
Comment on attachment 305038 [details] [diff] [review]
Patch v2.0

Looks good, but have you checked that all ScrollSelectionIntoView callers are safe for flushing? They should not be holding nsIFrame* pointers, for example.
Comment 10 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-22 16:27:07 PST
(In reply to comment #9)
> (From update of attachment 305038 [details] [diff] [review])
> Looks good, but have you checked that all ScrollSelectionIntoView callers are
> safe for flushing? They should not be holding nsIFrame* pointers, for example.

I'm not sure. Is there a good way for checking it?
Comment 11 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-22 16:37:20 PST
If I change

   return mDomSelections[index]->ScrollIntoView(aRegion, aIsSynchronous,
-                                               PR_FALSE);
+                                               PR_TRUE);

to

+                                               aIsSynchronous);

The users can choose the flush safe or not. And the risk is lower. How about this?
Comment 12 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-02-23 01:18:41 PST
That's safer. You still should have a look at the callers who are passing true for aIsSynchronous. They should not be holding frame pointers across the ScrollIntoView call, and they should not be holding nsPresContext* or nsPresShell* pointers either.
Comment 13 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-23 04:19:52 PST
It sounds like my latest patch is very risky. Because I cannot check all callers of the ScrollSelectionIntoView callers. I think that for API compatibility, we should flush layout if aIsSynchronus is true, however, should I change the param of all callers in Gecko changing to PR_FALSE? There are 13 points in .cpp file. (I think that the risk is too low in .js files (5 points) for most cases.)
Comment 14 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-02-24 14:38:54 PST
See bug 375436 BTW.

> That's safer.

Actually it makes no difference; if aIsSynchronous is false then aDoFlush is ignored.

Changing all callers to async isn't actually safe since they might be relying on the scroll being synchronous, changing that could break their later operations. So we need to check the call sites no matter what.

I checked nsTypeaheadFind (both versions) and nsMsgCompose and those callers are safe. All JS callers must be safe already. You should be able to audit the rest.

Boris, in bug 375436 you made ScrollSelectionIntoView not flush for synchronous scrolls. Do you remember any specific caller/situation that could not handle a flush there?
Comment 15 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-25 06:07:43 PST
(In reply to comment #14)
> I checked nsTypeaheadFind (both versions) and nsMsgCompose and those callers
> are safe. All JS callers must be safe already. You should be able to audit the
> rest.

Note that the backing out positions are not all. I didn't change in some points PR_TRUE to PR_FALSE if they don't change selections before calling. (But we might be able to think that the current caller must not have any pending reflows excepting the new reflows of bug 392785.)
Comment 16 Boris Zbarsky [:bz] 2008-02-25 09:24:24 PST
> Do you remember any specific caller/situation that could not handle a
> flush there?

At least text control frame, iirc.  But basically, I seem to recall a bunch of callers, with a very branchy callstack so that I couldn't audit them in a reasonable amount of time...
Comment 17 Boris Zbarsky [:bz] 2008-02-25 09:24:52 PST
And to be honest, I'm not quite sure what the use cases for a sync scroll here are...
Comment 18 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-02-25 13:13:35 PST
Thanks. Yeah, the scrolls probably don't need to be synchronous, but code might depend on them being synchronous.

Okay then, the JS callers should stay synchronous and flushing if they want to be. nsTypeheadFind (both versions) and nsMsgCompose look safe to be synchronous. The others we need to audit and either verify that they're safe or convert to asynchronous.
Comment 19 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-25 13:37:36 PST
ok, I'm checking nsWebBrowserFind.cpp and nsHyperTextAccessible.cpp.
Comment 20 Boris Zbarsky [:bz] 2008-02-25 14:02:02 PST
To be honest, in my ideal world the scroll would always be async but a flush would flush out pending scrolls or something...
Comment 21 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-02-25 14:25:09 PST
With compositor, the frame tree will update synchronously but the actual scrolling will be deferred until the new window update.
Comment 22 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-25 14:33:14 PST
roc:

You said "nsPresShell pointer should not be held." ScrollSelectionIntoView is called from nsPresShell with synchronous, is it ok? (strong reference saves the presShell?)


safe:
nsWebBrowserFind::SetSelectionAndScroll
nsEditor::ScrollSelectionIntoView

unsafe:
nsEventListenerManager::PrepareToUseCaretPosition

unknown:
nsHyperTextAccessible::SetSelectionRange
Comment 23 Boris Zbarsky [:bz] 2008-02-25 14:45:09 PST
> strong reference saves the presShell?

I'd be somewhat doubtful.

I assume for the "safe" cases you checked all possible callstacks to make sure that none of the possible callers are holding any presshell or frame references?
Comment 24 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-25 14:55:32 PST
(In reply to comment #23)
> I assume for the "safe" cases you checked all possible callstacks to make sure
> that none of the possible callers are holding any presshell or frame
> references?

Yes. But I abort the checking when the stack item is js.

I'm worried that ScrollSelectionIntoView breaks the caller presShell by the flushing.
Comment 25 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-02-25 15:28:05 PST
PresShell::PageMove and PresShell::CompleteMoveInner look safe. The ultimate callers are nsEditorCommands and nsGlobalWindowCommands, which look safe, and each call along the path is at the end of the caller's method. We'd need some comments along those paths explaining that a flush might have occurred and the presshell dead (although not deleted, since the selection caller holds a strong ref to nsISelectionController which is the presshell). So nsPresShell should be OK.
Comment 26 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-02-25 15:28:50 PST
We need need need Taras-annotations and analysis for this stuff.
Comment 27 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-26 14:03:04 PST
Created attachment 305840 [details] [diff] [review]
Patch v3.0

Do I need to add the notice comments to all callers? (and also to all ancestor callers??)
Comment 28 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-02-26 14:59:25 PST
+    // This SecrollSelectionIntoView should not call as synchronous scrolling,
+    // because it might flush the layout reflow.
+    // It might be dangerous at aPresShell.
     rv = selCon->ScrollSelectionIntoView(nsISelectionController::SELECTION_NORMAL,
-        nsISelectionController::SELECTION_FOCUS_REGION, PR_TRUE);
+        nsISelectionController::SELECTION_FOCUS_REGION, PR_FALSE);

This is probably not a good idea. After this, the code is messing around with caret coordinates and those might be wrong if the scroll has not happened yet. We should make this path safe for synchronous scrolling. This function actually looks OK, and its caller nsEventListenerManager::FixContextMenuEvent also looks OK, and its caller nsEventListenerManager::HandleEvent should be OK since event handling can do pretty much anything already. (But these functions should be documented as potentially flushing.)
Comment 29 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-02-27 00:26:38 PST
It would be good if all functions that can flush said so in their documentation, yes.
Comment 30 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-27 02:34:57 PST
Created attachment 305982 [details] [diff] [review]
Patch v4.0

Thank you roc. The ELM part is using synchronous now.
Comment 31 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-02-27 02:41:01 PST
+  // After FixContextMenuEvent(), the pending reflow might be flushed and

"pending notifications"

+  // PresShell/PresContext/Frames will be dead.

"may be dead"

Actually it would be more concise at all these sites to just say
  // Beware! This may flush notifications via synchronous ScrollSelectionIntoView.

Sorry! But if you fix that we should be good.
Comment 32 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-27 03:06:43 PST
Created attachment 305984 [details] [diff] [review]
Patch v4.1

updating the comments.
Comment 33 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-27 04:03:17 PST
Comment on attachment 305984 [details] [diff] [review]
Patch v4.1

This patch is pretty risky. But this patch changes the API (ScrollSelectionIntoView) behavior. So, we should land this to b4 and many testers (including addon developers) should test this.
Comment 34 Mike Beltzner [:beltzner, not reading bugmail] 2008-02-27 14:11:19 PST
Moving bugs that aren't beta 4 blockers to target final release.
Comment 35 Mike Beltzner [:beltzner, not reading bugmail] 2008-02-27 21:37:50 PST
Comment on attachment 305984 [details] [diff] [review]
Patch v4.1

a1.9b4=beltzner
Comment 36 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-28 03:02:54 PST
checked-in.
Comment 37 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-28 06:00:23 PST
backed out.

*** 11859 ERROR FAIL | Should not be able to navigate popup's popup by submitting form. | got 1, expected 2 | /tests/docshell/test/navigation/test_not-opener.html

*** 11860 ERROR FAIL | Should not be able to navigate popup's popup by targeted hyperlink. | got 1, expected 2 | /tests/docshell/test/navigation/test_not-opener.html
Comment 38 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-28 06:12:57 PST
mmmm....
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1204204620.1204207474.9549.gz
# find "/tests/docshell/test/navigation/test_not-opener.html"

> *** 11857 INFO Running /tests/docshell/test/navigation/test_not-opener.html...
> *** 11858 INFO PASS | Should not be able to navigate popup's popup by calling window.open.
> *** 11859 INFO PASS | Should not be able to navigate popup's popup by submitting form.
> *** 11860 INFO PASS | Should not be able to navigate popup's popup by targeted hyperlink.

After the failure, the next test time didn't orange. Isn't it a my bug??
Comment 39 Masayuki Nakano [:masayuki] (Mozilla Japan) 2008-02-28 07:29:51 PST
I cannot reproduce the failure and tinderbox already succeeded the test before backed out. Therefore, I rechecked-in the patch.

Note You need to log in before you can comment on or make changes to this bug.