Last Comment Bug 659633 - Pseudo-hang when a XUL tree element is instantiated
: Pseudo-hang when a XUL tree element is instantiated
Status: VERIFIED FIXED
: perf, regression, verified-aurora
Product: Core
Classification: Components
Component: Disability Access APIs (show other bugs)
: 2.0 Branch
: x86 Windows 7
: -- normal (vote)
: mozilla7
Assigned To: alexander :surkov
:
:
Mentors:
Depends on: 673757
Blocks: a11yperf
  Show dependency treegraph
 
Reported: 2011-05-25 07:31 PDT by Wladimir Palant
Modified: 2011-09-07 07:18 PDT (History)
5 users (show)
surkov.alexander: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed


Attachments
Testcase (2.04 KB, application/vnd.mozilla.xul+xml)
2011-05-26 01:10 PDT, Wladimir Palant
no flags Details
patch (4.66 KB, patch)
2011-05-26 23:39 PDT, alexander :surkov
tbsaunde+mozbugs: review+
mzehe: feedback+
Details | Diff | Splinter Review
Test extension (1.37 KB, application/x-xpinstall)
2011-05-27 03:05 PDT, Wladimir Palant
no flags Details
patch against aurora (715 bytes, patch)
2011-06-08 15:33 PDT, Trevor Saunders (:tbsaunde)
asa: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Wladimir Palant 2011-05-25 07:31:55 PDT
I noticed that after I use Firefox 4 for some time opening Adblock Plus preferences hangs - the window displays and freezes for several seconds then. This only happens after a lengthy browsing session which is why I don't know what is so special about this window and whether the problem happens on trunk as well, experiments involving a browser restart take way too much time. JavaScript Deobfuscator shows that the problem is connected to the tree widget, after the initial paint nsITreeView::GetTreeCell() is being called for all cells of all rows (not only the visible ones) several times which is what causes the issue (row count of something around ten thousand combined with a custom tree view implementation).

I attached WinDbg to the process. This way I could determine that paint isn't the culprit here, it is fast as always. The call stack for the offending call:

> 0052d298 7051b1c2 xul!nsXULTreeGridCellAccessible::Init(void) [e:\builds\moz2_slave\rel-2.0-w32-bld\build\accessible\src\xul\nsxultreegridaccessible.cpp @ 1100]
> 0052d2a4 705423b8 xul!nsDocAccessible::BindToDocument(class nsAccessible * aAccessible = 0x081ce580, struct nsRoleMapEntry * aRoleMapEntry = 0x351b63e0)+0x6f [e:\builds\moz2_slave\rel-2.0-w32-bld\build\accessible\src\base\nsdocaccessible.cpp @ 1370]
> 0052d2c8 703dc7b8 xul!nsXULTreeGridRowAccessible::GetCellAccessible(class nsITreeColumn * aColumn = 0x081ce580)+0x8d [e:\builds\moz2_slave\rel-2.0-w32-bld\build\accessible\src\xul\nsxultreegridaccessible.cpp @ 742]
> 0052d2dc 70541c10 xul!nsXULTreeGridRowAccessible::GetChildAt(unsigned int aIndex = 0x81ce580)+0x1f [e:\builds\moz2_slave\rel-2.0-w32-bld\build\accessible\src\xul\nsxultreegridaccessible.cpp @ 712]
> 0052d2f8 70541c26 xul!nsDocAccessible::CacheChildrenInSubtree(class nsAccessible * aRoot = 0x174859b4)+0x2b [e:\builds\moz2_slave\rel-2.0-w32-bld\build\accessible\src\base\nsdocaccessible.cpp @ 1952]
> 0052d314 70541c26 xul!nsDocAccessible::CacheChildrenInSubtree(class nsAccessible * aRoot = 0x079461cc)+0x41 [e:\builds\moz2_slave\rel-2.0-w32-bld\build\accessible\src\base\nsdocaccessible.cpp @ 1955]
> 0052d330 70541c6c xul!nsDocAccessible::CacheChildrenInSubtree(class nsAccessible * aRoot = 0x079461cc)+0x41 [e:\builds\moz2_slave\rel-2.0-w32-bld\build\accessible\src\base\nsdocaccessible.cpp @ 1955]
> 0052d340 7054a85f xul!nsDocAccessible::NotifyOfInitialUpdate(void)+0x39 [e:\builds\moz2_slave\rel-2.0-w32-bld\build\accessible\src\base\nsdocaccessible.cpp @ 1514]
> 0052d358 6fef0ee6 xul!nsDocAccessibleWrap::NotifyOfInitialUpdate(void)+0xf [e:\builds\moz2_slave\rel-2.0-w32-bld\build\accessible\src\msaa\nsdocaccessiblewrap.cpp @ 288]
> 0052d360 6fef0a23 xul!nsCOMPtr<nsIProxyInfo>::operator=(class nsIProxyInfo * rhs = 0x083bf740)+0x18 [e:\builds\moz2_slave\rel-2.0-w32-bld\build\obj-firefox\dist\include\nscomptr.h @ 665]
> 0052d368 6fd22b59 xul!nsIViewManager::UpdateViewBatch::EndUpdateViewBatch(unsigned int aUpdateFlags = 0x47ea6e3d)+0x1b [e:\builds\moz2_slave\rel-2.0-w32-bld\build\obj-firefox\dist\include\nsiviewmanager.h @ 336]
> 0052d370 6fcce5ea xul!nsRefPtr<nsIDOMEventListener>::~nsRefPtr<nsIDOMEventListener>(void)+0xc [e:\builds\moz2_slave\rel-2.0-w32-bld\build\obj-firefox\dist\include\nsautoptr.h @ 970]
> 0052d420 6fc39b04 xul!PresShell::FlushPendingNotifications(mozFlushType aType = Flush_Content (1))+0x32a [e:\builds\moz2_slave\rel-2.0-w32-bld\build\layout\base\nspresshell.cpp @ 4953]
> 0052d43c 6fc44cf2 xul!TimerThread::UpdateFilter(unsigned int aDelay = 0x5a840fc0, class mozilla::TimeStamp aTimeout = class mozilla::TimeStamp, class mozilla::TimeStamp aNow = class mozilla::TimeStamp, double smoothSlack = 1.9716673574817238e-236)+0x254 [e:\builds\moz2_slave\rel-2.0-w32-bld\build\xpcom\threads\timerthread.cpp @ 236]
> 0052d474 6fc44a5c xul!nsTimerImpl::Fire(void)+0x92 [e:\builds\moz2_slave\rel-2.0-w32-bld\build\xpcom\threads\nstimerimpl.cpp @ 409]
> 0052d47c 6fcbf194 xul!nsTimerEvent::Run(void)+0x1c [e:\builds\moz2_slave\rel-2.0-w32-bld\build\xpcom\threads\nstimerimpl.cpp @ 519]
> 0052d4b0 6fcf317a xul!nsThread::ProcessNextEvent(int mayWait = 0, int * result = 0x00000000)+0x324 [e:\builds\moz2_slave\rel-2.0-w32-bld\build\xpcom\threads\nsthread.cpp @ 633]
> 0052d4ec 6fe0f5ba xul!mozilla::ipc::MessagePump::Run(class base::MessagePump::Delegate * aDelegate = 0x706f2554)+0x6a [e:\builds\moz2_slave\rel-2.0-w32-bld\build\ipc\glue\messagepump.cpp @ 110]
> 0052d4f8 6fe0f5a3 xul!MessageLoop::RunInternal(void)+0x11 [e:\builds\moz2_slave\rel-2.0-w32-bld\build\ipc\chromium\src\base\message_loop.cc @ 219]
> 0052d534 71398b90 xul!MessageLoop::RunHandler(void)+0x1d [e:\builds\moz2_slave\rel-2.0-w32-bld\build\ipc\chromium\src\base\message_loop.cc @ 203]

That's where I am stuck right now, I don't know the accessibility code well enough to determine what is going wrong here. Anybody have a hint on what I should check?

Note that I've seen similar pseudo-hangs before when closing a tab. Attaching the debugger showed tons of accessibility objects being destroyed.

Firefox version: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Comment 1 alexander :surkov 2011-05-25 08:00:46 PDT
(In reply to comment #0)

> That's where I am stuck right now, I don't know the accessibility code well
> enough to determine what is going wrong here. Anybody have a hint on what I
> should check?

this is accessible tree creation, it looks like it's just take too long to create the tree for XUL trees. On the another hand for XUL trees I think we could create tree lazily, this should get rid the problem.

Other question how does it happen that a11y is enabled for you?
Comment 2 Wladimir Palant 2011-05-26 01:04:10 PDT
(In reply to comment #1)
> this is accessible tree creation, it looks like it's just take too long to
> create the tree for XUL trees.

Checked properly, I have a tree with 31k lines and four visible columns, I see 127k GetCellText calls - so there is apparently only one call per tree cell after all (plus some calls from regular paint operations). Which still isn't fun, a tree is supposed to work efficiently with large amounts of data, it should only consider visible rows rather than instantiating objects for each single cell.

Apparently, this is happening for all tree widgets, not just the one in Adblock Plus. I've created a minimal testcase - it is also reporting that GetCellText() is being called for all tree cells, at least in the affected browser instance. A second browser instance running with an identical copy of the profile reports only calls for the visible rows.

> Other question how does it happen that a11y is enabled for you?

Good question. It seems to enable itself after an lengthy browsing session and I have no idea why that happens. As I said, right now I have two Firefox instances running on identical profiles - one is affected (has been running for a week), the other (started yesterday) isn't. The only preferences related to accessibility are:

user_pref("accessibility.typeaheadfind.autostart", false);
user_pref("accessibility.typeaheadfind.flashBar", 0);
user_pref("accessibility.win32.force_disabled", true);

The last one was added after I noticed hangs when closing browser tabs. Doesn't help obviously.
Comment 3 Wladimir Palant 2011-05-26 01:10:58 PDT
Created attachment 535278 [details]
Testcase

This testcase needs to be copied somewhere where it is accessible via chrome://, then it can simply be opened in a browser tab. It contains a tree widget and will register a tree view for it that does almost nothing but counts GetCellText() calls. It also claims to have 20000 rows. The page will then display the number of calls.

Normally it displays something like 84 - calls only for the visible rows. In my "broken" browser instance it hangs for a few seconds and displays 60175 then.
Comment 4 Wladimir Palant 2011-05-26 01:12:46 PDT
(In reply to comment #1)
> Other question how does it happen that a11y is enabled for you?

Is there a single flag controlling whether a11y is enabled? Maybe I can attach a breakpoint to it to see when this flag is flipped.
Comment 5 alexander :surkov 2011-05-26 05:04:39 PDT
a11y gets enabled either when extension starts to use nsIAccessibiltityRetrieval or some application sends WM_GETOBJECT to Firefox. I don't think there's another way to start a11y on Windows (and your two-browsers experience looks quite strange). After a11y is started then there's no way to stop it until you close Firefox.

Thanks, Wladimir, for the testcase, I think I can come with patch to fix this issue.
Comment 6 Wladimir Palant 2011-05-26 06:01:19 PDT
(In reply to comment #5)
> a11y gets enabled either when extension starts to use
> nsIAccessibiltityRetrieval

Ok, found it without debugging - a search in the extensions directory was enough. It's DOM Inspector, it triggers nsIAccessibilityRetrieval when the inspector window is opened. I will file an additional DOMi bug, paying such a high price by default simply because of using DOMi sounds unreasonable.
Comment 7 Wladimir Palant 2011-05-26 06:04:09 PDT
Btw, the issue isn't reproducible in the current 7.0a1 nightly (which of course doesn't mean that it is really fixed).
Comment 8 alexander :surkov 2011-05-26 06:05:10 PDT
Hm, DOMi shouldn't start a11y just because you run it (of course if View->Show Accessible Nodes isn't turned on). Please cc me on that bug.
Comment 9 alexander :surkov 2011-05-26 06:05:50 PDT
(In reply to comment #7)
> Btw, the issue isn't reproducible in the current 7.0a1 nightly (which of
> course doesn't mean that it is really fixed).

I guess it should be because a11y simply isn't started there.
Comment 10 Wladimir Palant 2011-05-26 06:11:08 PDT
My bad, it is reproducible in the nightly as well - but "Show accessible nodes" preference (inspector.dom.showAccessibleNodes) needs to be checked. And apparently it isn't checked by default, I must have switched it on at some point. So no DOM Inspector bug here.
Comment 11 alexander :surkov 2011-05-26 23:39:20 PDT
Created attachment 535562 [details] [diff] [review]
patch
Comment 13 Marco Zehe (:MarcoZ) 2011-05-27 02:35:31 PDT
(In reply to comment #3)
> Created attachment 535278 [details]
> Testcase
> 
> This testcase needs to be copied somewhere where it is accessible via
> chrome://, then it can simply be opened in a browser tab. It contains a tree

I have tried, but don't get *any* reaction. Either I copy the file to a wrong place and it doesn't get loaded without giving me an error 404, or the file doesn't work on my system at all for some reason.
Can you give me some more specific details where I could copy this to make sure it works?

Thanks!
Comment 14 Wladimir Palant 2011-05-27 03:05:28 PDT
Created attachment 535596 [details]
Test extension

Here the same testcase as an extension. Install it, then open chrome://testextension/content/test.xul in the browser. Normally it shows you an alert message saying something like "Number of calls: 87". If the number of calls is beyond 60000 then something is wrong.
Comment 15 Marco Zehe (:MarcoZ) 2011-05-27 03:07:08 PDT
Comment on attachment 535562 [details] [diff] [review]
patch

f=me. After I found a different way of loading the file properly, I found that this indeed speeds up matters. Also I ran the whole test suite, and all tests still pass.
Comment 16 Marco Zehe (:MarcoZ) 2011-05-27 03:49:10 PDT
The extension definitely shows the bug fixed: I get a "number of calls" of 163. And all 20000 rows are there.
Comment 17 Trevor Saunders (:tbsaunde) 2011-06-03 20:37:13 PDT
Comment on attachment 535562 [details] [diff] [review]
patch

>diff --git a/accessible/src/base/nsAccessible.h b/accessible/src/base/nsAccessible.h
>--- a/accessible/src/base/nsAccessible.h
>+++ b/accessible/src/base/nsAccessible.h
>@@ -322,20 +322,34 @@ public:
>    */
>   nsAccessible* GetEmbeddedChildAt(PRUint32 aIndex);
> 
>   /**
>    * Return index of the given embedded accessible child.
>    */
>   PRInt32 GetIndexOfEmbeddedChild(nsAccessible* aChild);
> 
>-  PRUint32 GetCachedChildCount() const { return mChildren.Length(); }
>-  nsAccessible* GetCachedChildAt(PRUint32 aIndex) const { return mChildren.ElementAt(aIndex); }
>+  /**
>+   * Return number of content children/content child at index. The content
>+   * child is created from markup in contrast to it's never constructed by its
>+   * parent accessible (like treeitem accessibles for XUL trees).
>+   */
>+  PRUint32 ContentChildCount() const { return mChildren.Length(); }
>+  nsAccessible* ContentChildAt(PRUint32 aIndex) const
>+    { return mChildren.ElementAt(aIndex); }
>+

next time can the rename be a seperate patch from the real change?thanks :)

Otherwise I think the patch does what we want.  I'm a little worried about this causing other problems in xul trees so lets watch out for that.
Comment 18 alexander :surkov 2011-06-06 19:36:21 PDT
(In reply to comment #17)

> next time can the rename be a seperate patch from the real change?thanks :)

it makes sense for big patches, but here I'm not sure :)
Comment 19 alexander :surkov 2011-06-06 19:37:30 PDT
landed - http://hg.mozilla.org/mozilla-central/rev/85420161e336
Comment 20 alexander :surkov 2011-06-06 19:42:11 PDT
Marco, let's get it running on trunk for few days and then ask approval for Firefox 6. This can result into quite serious performance regression.
Comment 21 David Bolter [:davidb] 2011-06-06 20:04:27 PDT
We'll need a note to drivers here in the bug explaining the patch, and the risk.
Comment 22 alexander :surkov 2011-06-06 20:23:17 PDT
(In reply to comment #21)
> We'll need a note to drivers here in the bug explaining the patch, and the
> risk.

I don't see anything risky in the patch, changes may be not trivial but shouldn't regress us. Therefore I'd take couple days to make sure we don't spot regressions.
Comment 23 Marco Zehe (:MarcoZ) 2011-06-06 21:18:45 PDT
(In reply to comment #22)
> (In reply to comment #21)
> > We'll need a note to drivers here in the bug explaining the patch, and the
> > risk.
> I don't see anything risky in the patch, changes may be not trivial but
> shouldn't regress us. Therefore I'd take couple days to make sure we don't
> spot regressions.

Unfortunately, this patch needs bug 660149 and bug 660153 to apply cleanly. Since Aurora is in pure bug fix mode from regressions introduced recently, I do currently not see how we can get this one into Firefox 6, especially when it depends on two other patches.
Comment 24 alexander :surkov 2011-06-06 21:23:56 PDT
Marco, the fix is in nsDocAccessible::CacheChildrenInSubtree(nsAccessible* aRoot), anything else (including two other patches) is cleaning up.
Comment 25 Marco Zehe (:MarcoZ) 2011-06-06 23:05:09 PDT
OK, can you then please provide an Aurora-specific patch that contains the fix for this bug without any additional cleanup that we can then request approval on just in case? And also, can you clearly point out when this regressed? Is this an FX4 regression?
Comment 26 alexander :surkov 2011-06-06 23:12:36 PDT
(In reply to comment #25)
> OK, can you then please provide an Aurora-specific patch that contains the
> fix for this bug without any additional cleanup that we can then request
> approval on just in case?

Sure, it's trivial. Trevor, can you take this? to make sure the reviewing is close enough to patch works ;)

> And also, can you clearly point out when this
> regressed? Is this an FX4 regression?

yes.
Comment 27 Trevor Saunders (:tbsaunde) 2011-06-08 15:33:30 PDT
Created attachment 538128 [details] [diff] [review]
patch against aurora

apply only the fix  against aurora.  tests pass, and I've confirmed with the extension the number of calls was ~150
Comment 28 Marco Zehe (:MarcoZ) 2011-06-09 04:28:40 PDT
Additions to the approval request: This is a regression introduced in Firefox 4 which impairs all extensions doing lots of tree table manipulation when accessibility is turned on. It's a safe fix and improves performance by a big magnitude (see comments above for value comparisons).
Comment 29 Marco Zehe (:MarcoZ) 2011-06-10 07:41:34 PDT
Pushed to Aurora on Surkov's behalf: http://hg.mozilla.org/releases/mozilla-aurora/rev/cef3a3b83bdb
Comment 30 Marco Zehe (:MarcoZ) 2011-06-16 11:23:28 PDT
Verified fixed in Aurora build: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a2) Gecko/20110616 Firefox/6.0a2
Comment 31 Simona B [:simonab ] 2011-09-07 07:18:10 PDT
Mozilla/5.0 (Windows NT 6.1; rv:7.0) Gecko/20100101 Firefox/7.0

Verified issue on Windows 7 using the test case attached in Comment 14 - the alert message shows up with a "Number of calls: 120".

Setting resolution to VERIFIED FIXED.

Note You need to log in before you can comment on or make changes to this bug.