Closed Bug 490794 Opened 12 years ago Closed 8 years ago

Kaspersky antivirus hooks into our process and enables the accessibility module

Categories

(Plugins Graveyard :: Kaspersky AV, defect)

x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: d829719, Unassigned)

Details

(Keywords: access, Whiteboard: [crashkill-third-party])

Attachments

(3 files, 3 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10 (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10 (.NET CLR 3.5.30729)

I have a page with 200-300 forms stacked on top of each other (so they look like one big table, but they aren't).  Form style is set to display:inline, but they contain only block elements (plus some hidden fields).  I have some AJAX javascript that adds and removes forms dynamically.  In IE this happens very quickly; in Firefox it's *unbelievably* slow.  I traced it to the single removeChild() DOM operation, and created a test page to illustrate the problem (attached).

In IE clicking on any of the "Delete" buttons on the right instantly pops up an alert dialog informing me that the delete operation took 0 milliseconds (always).

In Firefox clicking the same button causes Firefox to hang for about 40-50 seconds, and causes 100% CPU usage for all that time.  When the alert message pops up, the time it reports is about 7000 ms, though Firefox is unresponsive for much longer than that.

I've figured out that if I change the css so that the form style is display:block, the problem is largely solved - Firefox responds quickly, and reports a delay of 3-5 milliseconds.  It's still slower than IE, but not absurdly so.  However, the fact that there's a workaround in this case doesn't mean there isn't a bug that should be fixed!

I'm not sure if it's the same issue or not, but the attached page is also far, far slower to load in Firefox than in IE.  I'm not certain, but I think that when removeChild() executes, Firefox might be re-flowing the entire page.  That might account for the long delay.


Reproducible: Always

Steps to Reproduce:
1. View the attached test page
2. Scroll to the right of the page
3. Click one of the "Delete" buttons
4. Observe the time it takes for the alert box to pop up, and the time reported in that box.  Also check CPU utilization during this time.
5. Now try the same thing in IE.
6. Also compare load times for the page in the two browsers.
Actual Results:  
Unbelievably slow (~40 seconds) in Firefox, instant in IE

Expected Results:  
Should be (nearly) instantaneous.

To avoid being spammed, the email address I've registered under is no longer valid.
Don't bother trying to contact me using this email.
I'll check the bug report periodically however, and answer any questions added there.
Attached file Initial Test Page (obsolete) —
Tested on Windows XP, Minefield and Firefox 3 need about 11.451 ms for the task, but Firefox 2 only 9.023 ms. So there is a certain decrease in performance over time. Don't know if this deserves the Regression keyword.
Indeed the other browsers have a much lower score.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Product: Firefox → Core
QA Contact: general → general
Version: unspecified → Trunk
Chrome also reports "0 ms" immediately.
On my machine, FF 3.0.10 reports ~2600ms reproducibly, so it obviously is a matter of CPU use by a very active process.
Attached file Updated Test Page (obsolete) —
The time reported by the initial test page was much shorter than the actual time it took Firefox to complete the operation and become responsive again.
I have fixed this, so that the time reported now accurately reflects the complete time taken for the removeChild() operation and all its effects to complete.
Attachment #375158 - Attachment is obsolete: true
[The problem in the initial page was that the alert() function blocks, waiting for all other pending operations to complete - and this waiting time was not being counted.  By calling alert() from within a setTimeout() callback, all pending operations complete fully before the timeout is executed.]
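A minimal sketch of the measurement idea described above, not the code of the attached test page. The helper name `timeWithDeferredWork` and its `now`/`defer` parameters are hypothetical, added only so the clock and scheduler can be swapped out; it also assumes, as the comment above does, that the pending work is queued ahead of the zero-delay timeout.

```javascript
// Hypothetical helper: times an operation and the work left pending after it.
// `now` and `defer` are injectable so the sketch can run outside a browser.
function timeWithDeferredWork(
  operation,
  report,
  now = Date.now,
  defer = (fn) => setTimeout(fn, 0)
) {
  const tStart = now();
  operation();               // e.g. () => form.parentNode.removeChild(form)
  const tReturned = now();   // the call has returned; other work may still be pending
  defer(() => {
    // Assumption: this callback runs only after the earlier pending work is done.
    const tIdle = now();
    report({
      immediate: tReturned - tStart, // time spent inside the call itself
      deferred: tIdle - tReturned,   // time until the deferred callback got to run
      total: tIdle - tStart,
    });
  });
}
```

In a page, `report` could call alert() with the three numbers; reading the clock inside the deferred callback, before alert(), is what keeps the waiting time in the measurement.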
Attached file Updated Test Page (obsolete) —
Now reports page load time as well.
Attachment #375456 - Attachment is obsolete: true
On the PC I am currently using (Windows XP, IE8 and FF 3.0.10) the results are as follows:

Page load time:  IE is consistently just under 2.4 seconds; FF varies widely between 5 and 10 seconds.

Pressing a "Delete" button: IE varies from 110ms to 172ms; FF varies from ~35000 to ~39000 ms - MORE THAN 200 TIMES SLOWER THAN IE!!!

(Clearly a bug, and not just a case of one browser being a little slower than another).
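For reference, the arithmetic behind the "more than 200 times" figure, using only the ranges quoted above (the variable names are illustrative):

```javascript
// Timings quoted in this comment, in milliseconds.
const ieMs = { min: 110, max: 172 };
const firefoxMs = { min: 35000, max: 39000 };

// Most favourable case for Firefox: its fastest run against IE's slowest.
const lowestRatio = firefoxMs.min / ieMs.max;   // about 203
// Least favourable case: its slowest run against IE's fastest.
const highestRatio = firefoxMs.max / ieMs.min;  // about 355

console.log(lowestRatio.toFixed(0) + "x to " + highestRatio.toFixed(0) + "x slower");
```

Even the most favourable pairing of the two ranges stays above 200x.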

---------

r.e. Comment #2 above (Ria Klaassen):  These results are very different from what both I and Peter Gruendler (Comment #3) are getting.
Can you clarify how you got those figures? Are they averages? (The script reports only integer values).
Which version of Firefox 3 were you using?  Was it the release version (3.0.10) or the 3.5 beta version?

Can someone confirm if this bug still occurs in the latest 3.5 beta or not?
(In reply to comment #7)
> r.e. Comment #2 above (Ria Klaassen):  These results are very different from
> what both I and Peter Gruendler (Comment #3) are getting.
> Can you clarify how you got those figures? Are they averages? (The script
> reports only integer values).

As I wrote - different numbers most probably reflect the CPU type - faster CPU, smaller number. Indicates a very CPU consuming process.
I think it is the way my processor handles it. On a faster Windows system I get an average of 2550 ms after clicking on the Delete button and I see no difference anymore in performance between browser versions.
Well, for reference the results I posted in Comment #7 were obtained on a 2.8GHz Pentium D with 2GB RAM - i.e. a fairly decent machine.

Ria, either your PC really is 20 times faster than mine, or something else is affecting the speed (perhaps CPU L2 cache size, or a software or graphics card driver difference?).

(Also I don't understand how you could have gotten 11ms before and 2550ms now).

Even with your fast results though, I assume you'd agree that >2.5 seconds for doing a single removeChild() operation is far too long, and indicates a bug somewhere, right?
My Intel Duo Core T5500 (1.66 GHz, 667 MHz FSB, 2MB L2 cache) does it 4.5x as fast as my good old AMD Athlon XP 2000+ but indeed still terribly slow compared to Opera, Google Chrome and IE.
Component: General → DOM: Core & HTML
Flags: blocking1.9.2?
Keywords: perf, qawanted
QA Contact: general → general
Attached file Shark profile
Component: DOM: Core & HTML → Layout
QA Contact: general → layout
Seems like most of the time is spent in layout and editor.
Olli, if you remove "display:inline;" from the following:

#AJAXdiv form {display:inline;margin:0ex;padding:0ex}

on line number 20 of the test file, then the problem does not occur - in fact Firefox is significantly faster than IE8 in that case.  The page looks identical either way.  So maybe if you profile that case as well, the difference will give a clue as to what's causing the problem?

Also, can the profiler tell you how many times each method is called?  Because I have a hunch it must be re-drawing every form on the page, not just the deleted one.  (Though, that still doesn't account for the time it takes, since loading the page initially takes far less time).
Dean, thanks for the clear steps to reproduce and excellent testcase!

Here's what I can tell you offhand given this page and the profile:

1)  The fact that the form has display:inline means that when it's removed from
    the DOM we end up recreating the layout objects for its containing block (in
    this case the <div id="AJAXdiv">).  You can see that in the profile: it's the
    nsCSSFrameConstructor::ReframeContainingBlock call under ContentRemoved.
2)  Just recreating the layout objects immediately takes 25% of the time the
    profile shows.  The other 75% is then the editor initialization for all
    those new text control layout objects (see bug 221820 for doing editor init
    lazily).  The profile doesn't include time spent doing layout, which I
    think the testcase tries to measure using the setTimeout hack; there's
    obviously more time spent there.

So I would fully expect each removal to take about as long as the initial pageload.  That's what I see here: both numbers are consistently in the 2-3 second range, with Firefox 2, Firefox 3.0.10, and current trunk (pretty close to Firefox 3.5).  These are all on a 9-month-old Macbook Pro.  As Ria says, this should all scale pretty linearly with processor speed.  It's not great, but not that much involving handling of block-in-inline really is, performance-wise.  The lazy editor init will help; rewriting frame removal to deal with {ib} splits would help too.  I'm pretty sure we have a bug on that, but don't have the number offhand.  roc, it might be a good idea to consider adding that to our "make frame removal suck less" task list.

If a workaround for the problem is needed, then not setting the forms to be inlines should be it.

All that said, what I don't understand, and can't reproduce, is the effect Dean is seeing where the removal is so much slower than the initial pageload...  Looking at comment 0 again, it's sounding like the 7000ms was the removal I described above and the rest was presumably layout.  One could test this as follows:

      // f: the element being removed
      var tStart = new Date();
      f.parentNode.removeChild(f);
      var tMid = new Date();
      document.body.offsetWidth;  // reading offsetWidth forces pending layout to run now
      var tEnd = new Date();
      alert("Removal took " + (tMid-tStart) + " milliseconds; layout took " +
            (tEnd-tMid) + " milliseconds");

Over here I pretty consistently get numbers around 2100 and 300 for the two numbers in that alert, respectively.  Dean, what do you get if you try doing that?

Also, just to be sure, are you testing in safe mode?  If not, does doing that affect the numbers?
Attached file Updated Test Page
Implemented Boris' suggestions in Comment #15
Attachment #375457 - Attachment is obsolete: true
> Dean, thanks for the clear steps to reproduce and excellent testcase!

You're welcome!  :-)


> If a workaround for the problem is needed, then not setting the forms to be
> inlines should be it.

Umm... yeah.  I mentioned this already in my initial post.


>  All that said, what I don't understand, and can't reproduce, is the effect
>  Dean is seeing where the removal is so much slower than the initial
>  pageload... 

Boris, try the new test page.  On my laptop (A ~2 year old ASUS G2, AMD Turion 64 X2 2.2GHz with 4Gb RAM, running Vista Home Premium SP1) I'm getting typical results like this:

Removal took 3337 milliseconds.
Layout took 387 milliseconds.
Additional 'lazy' tasks took 33434 milliseconds.
Total time:  37158 milliseconds

Page loads vary enormously anywhere from around 5sec to around 20sec.

For comparison, IE8 is giving me these typical results (same machine):

Removal took 1 milliseconds.
Layout took 33 milliseconds.
Additional 'lazy' tasks took 99 milliseconds.
Total time:  133 milliseconds

Page load time was 1653 milliseconds

Now some comments are in order:

(1) Safe mode makes no difference - I checked.

(2) IE is a lot faster on page load by anywhere from 3 to 12 times.  That might be worth a bug report of its own!

(3) On two not-too-old machines with completely different hardware and OS, pressing a "Delete" button locks up Firefox for around 40 seconds.  In comparison, IE8 takes 0.13 seconds... around 300 times faster.

(4) 90% of the time Firefox takes is occupied by extra "lazy" tasks.  By these I mean tasks that are queued to execute after the javascript onclick event completes.  You need the "setTimeout hack" to measure that, because setTimeout works by adding an event to the queue when the timer expires - in this case, immediately; but at that point the "lazy" tasks are already inserted in front of it in the queue, so it executes after they do. (I'm assuming here:  I don't know anything about Firefox's internals, so it's possible there's more than one message queue or some other complication; but that could only increase the actual time taken vs reported, not decrease it).

(5) The first two numbers added together are consistent with a good chunk of the page load time (although why that varies so much I couldn't say).  But whatever the 'lazy' stuff it's doing later is, it's taking much longer than the page load.

(6) So there are possibly two separate performance issues here:  (i) speeding up page load and initial processing of removeChild(), and (ii) fixing whatever the heck it is that's queued to execute afterwards and takes 90% of the time.

(7) Obviously all these numbers scale with processor speed;  So when Peter in Comment #3 got 2600ms whereas I got 7000ms, he obviously just has a better machine.  However the results reported in Comment #2 and Comment #9 suggest that either Ria has some kind of supercomputer, or else that those results don't include the complete task - either because for some reason Firefox doesn't behave the same way on Ria's machine, or else because a profiler was used - in a manner that (as you pointed out) doesn't measure 'lazy' tasks queued to execute AFTER the initial DOM function returns.  That's the reason I asked Ria what version of Firefox was used, and why I reported the specs of the machines I used to get my results - because it seems to me in this case to be an issue of more than just processor speed.
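The queue ordering assumed in point (4) can be illustrated with a toy model. This is only a sketch of a single first-in, first-out queue; `runQueue` and the task names are hypothetical, and, as noted above, a real browser may use more than one queue.

```javascript
// Toy model of one FIFO event queue. Each task may enqueue further tasks.
function runQueue(initialTask) {
  const queue = [initialTask];
  const log = [];
  const api = {
    enqueue: (name, fn = () => {}) => queue.push({ name, fn }),
  };
  while (queue.length > 0) {
    const task = queue.shift(); // always take the oldest task first
    log.push(task.name);
    task.fn(api);
  }
  return log;
}

const order = runQueue({
  name: "onclick handler",
  fn: (q) => {
    // Assumption: removeChild() queues follow-up work before it returns...
    q.enqueue("pending work queued by removeChild()");
    // ...so a zero-delay timeout registered afterwards lands behind that work.
    q.enqueue("setTimeout callback: read clock, then alert()");
  },
});

console.log(order.join("\n"));
```

Under this model the timeout callback always runs last, which is why the time it reads includes the pending work.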
> Umm... yeah.  I mentioned this already in my initial post.

Right; it just wasn't clear to me whether that was seen as a viable option on your end.  ;)

> Page loads vary enormously anywhere from around 5sec to around 20sec.

That's odd in itself, actually.  I wouldn't expect a 4x variation here...

> Additional 'lazy' tasks took 33434 milliseconds.

_That_ is what makes no sense to me.  The only lazy task that should be left after the offsetWidth get is painting...  And painting this page just shouldn't take 30s.  I really wonder where the time here is spent on your system.  Over here, the "additional lazy tasks" time is on the order of 10-60ms, for the most part, which is believable for painting, gc, whatever.  My other numbers in that alert are about 2x lower than yours, which is maybe consistent with the relative speed of the processors in use.  My pageload time is about 2x faster than your lower bound, and pretty consistent.

I would bet money that the variability in load time you see is related to the time taken in "lazy tasks" here on removal...

> (1) Safe mode makes no difference - I checked.

OK.  Thanks for checking that!

> That might be worth a bug report of its own!

Yes, though I suspect that making editor init lazy would go a long ways here.  Adding a dependency on that bug, so we can retest once that lands.

> (3) On two not-too-old machines with completely different hardware and OS,

OK.  I'd been wondering about a possible graphics driver bug, but this makes it much less likely...

> (6) So there's possibly two separate performance issues here:

Yeah; let's stay focused on that "lazy tasks" thing that takes up most of the time for you and seems completely wrong to me.  We can file a separate bug for the pageload as needed.

I'll try to dig out my Windows VM and see whether I can reproduce the lazy tasks issue, but it'll take a few hours (in the middle of something else right now).  In the meantime, Ria or Martijn, can you let me know what you see on that most recent testcase (for both the pageload number and the other three numbers).
Depends on: 221820
As I mentioned above, I don't know anything about how Firefox works internally
- I'm currently working as a web/DB developer, and I'm just dutifully
reporting a bug I discovered in a browser I was testing on - I've never looked
at Firefox source code.

Having said that, it seems to me that Firefox is doing a lot more work than it
should be, even if all it was doing is re-laying out the whole page.  (And
mysteriously it seems to be doing even more than that).

After all, if you have a long series of inline elements, and you delete one of
them, you'd expect the prior elements to be unaffected.  So you'd think that if
you deleted the last row on the page, there'd be no further layout to do after
removing that row.  (In fact the time taken does not seem to depend on which
"Delete" button you click; it's re-laying out everything).

Also, while in general inline elements FOLLOWING the deleted one will need to
be moved around and thus change their layout, starting from the first
block-level element encountered following the deleted element (even if it's
inside another inline element), the only change to that and following elements
will be to move up in the document by a fixed amount.  So no layout
calculations should need to be done there either, just a vertical position
adjustment applied to the elements from that point forward.

To illustrate, consider the following text:

Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves, And
the mome raths outgrabe.

Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!


...if you deleted "All mimsy were the borogoves, ", then (a) Nothing happens to
the two lines prior to that, (b) "And the mome raths outgrabe." would need to
be laid out again, because not only does it move, but word wrapping occurs at
a different point.  But, (c) From the start of the next paragraph on, all that
happens is that all the text below moves up a line.

Now, I guess that's probably an over-simplistic analysis, because things like
floats and different kinds of CSS positioning complicate matters.  But surely
the browser should at least be checking to see if it's *possible* to behave
that way in any given case, and doing so (as an optimization) wherever it can?
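The (a)/(b)/(c) argument above can be sketched as a toy model of plain text only. It ignores floats, CSS positioning and blocks nested inside inlines, and it says nothing about how Firefox's layout code is actually structured; the function names and the column width of 34 are arbitrary choices made for this illustration.

```javascript
// Greedy word wrap: returns the lines of one paragraph at a given column width.
function wrap(text, width) {
  const lines = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? current + " " + word : word;
    if (candidate.length > width && current) {
      lines.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) lines.push(current);
  return lines;
}

// Full layout: stacks paragraphs top to bottom, one blank line between them.
function layout(paragraphs, width) {
  let top = 0;
  return paragraphs.map((text) => {
    const box = { top, lines: wrap(text, width) };
    top += box.lines.length + 1;
    return box;
  });
}

// Incremental update: re-wrap only the edited paragraph, shift the later ones.
function editParagraph(boxes, index, newText, width) {
  const newLines = wrap(newText, width);
  const delta = newLines.length - boxes[index].lines.length;
  return boxes.map((box, i) => {
    if (i < index) return box;                                 // (a) untouched
    if (i === index) return { top: box.top, lines: newLines }; // (b) re-wrapped
    return { top: box.top + delta, lines: box.lines };         // (c) moved only
  });
}

const WIDTH = 34;
const stanza1 = "Twas brillig, and the slithy toves Did gyre and gimble in the wabe: " +
  "All mimsy were the borogoves, And the mome raths outgrabe.";
const stanza2 = "Beware the Jabberwock, my son! The jaws that bite, the claws that catch! " +
  "Beware the Jubjub bird, and shun The frumious Bandersnatch!";

const before = layout([stanza1, stanza2], WIDTH);
const edited = stanza1.replace("All mimsy were the borogoves, ", "");
const after = editParagraph(before, 0, edited, WIDTH);

console.log("second stanza starts at line", before[1].top, "before and", after[1].top, "after");
```

In this model the second stanza's lines are reused as they are and only its starting line changes, and the result matches a full re-layout of the edited text.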

That's my 2 cents worth, and from this point forward I think I'll leave it in the capable hands of you Firefox developers.
> (3) On two not-too-old machines with completely different hardware and OS

... actually three, because I wasn't counting the old PC at work that I was using when I first discovered the problem.  It's slow, it runs XP, and offhand I don't remember anything else about it, but I'm sure it's got different hardware again to the two computers above.
Dean, in general inlines would behave exactly as you describe.  The issues come with inlines that contain a block inside them, as in this testcase.  So in this testcase Firefox is in fact blowing away the entire box model and rebuilding it on every removal.  But the real issue is still the "lazy time" you're seeing...
OK.  In my Windows VM, with Firefox 3.0.10, I get the following times on that last testcase:

  Pageload: 3508ms
  Removal: 1673ms
  Layout: 300ms
  Lazy tasks: 105ms

This is a Vista VM, though I tend to doubt that this would be an XP-vs-Vista issue.  Martijn, Ria, I'd love to hear what numbers you get.  Hopefully one of you can reproduce this problem....
I have an Intel Dual Core 1.83Ghz, 2.5GB RAM and I get the following times:

FF3.5RC3
Load time: 4050ms
Removal: 2253ms
Layout: 425ms
Additional 'lazy' tasks: 35ms
Total Time: 2713ms

IE8
Load time: 1797ms
Removal: 16ms
Layout: 31ms
Additional 'lazy' tasks: 47ms
Total time: 94ms

Pressing the 'Delete' button in IE is instantaneous whereas on FF you can notice the app not responding for a good 2.5s as the button stays in its pressed state. Was going through bugs, just thought I should help.
Thanks Raza Abbas.

I'm not sure why, but the really long times for the 'lazy' tasks (i.e. ~10 to 20 times as long as 'Removal' time) seem to occur on some computers and not on others.  When this phenomenon occurs, it occurs reliably (i.e. it occurs on every trial).  I haven't found a common denominator to the PCs on which it occurs, except that they're all somewhat older, slower machines.  They're also so far all PCs which I set up or use; someone suggested to me that perhaps some software I've installed causes the problem, but I can't think of anything that's likely to interfere with FF's internal processing.  Unless it's something to do with C++ dll libraries and MS Visual Studio or something?  I've no idea if that's plausible or not.

If someone here can try the test page on some older PCs, that might help clarify the situation.

The other thing I should add is that the window repaint occurs long before the 'lazy' tasks complete.  For example, on a trial I just ran, I got the following times:

FF 3.5, safe mode
Page Load: 3824 ms
Removal: 3186 ms
Layout: 620 ms
Additional 'lazy' tasks: 59826 ms
Total time: 63632 ms

The row being deleted VISIBLY DISAPPEARS about 3.5 to 4 seconds after pressing the "Delete" button; however the Firefox application does not become responsive again until the result message alert box pops up, around a minute later.  What it's doing in the intervening time, I've no idea!
Flags: wanted1.9.2-
Flags: blocking1.9.2?
Flags: blocking1.9.2-
So I've been trying to reproduce that extreme slowness and failing...

I really wonder what's going on with that.  If I could somehow reproduce it, profiling and figuring out where the time is spent would be a piece of cake.  :(
Attached file 500 radio buttons
Here is a simple test case. I noticed several other bugs that deal with slow forms -- but they usually put the forms in tables. I apologize if this is the wrong bug report to post this to. 

This test case simply has 500 radio buttons on a page. 

On my machine, the browser hangs for 15 seconds after the attached test case loads with the firefox process at 100% utilization. If I double the number of radio buttons to 1000, the hang time approximately quadruples to 64s -- maybe an O(n^2) problem?
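A rough check of the O(n^2) guess from these two data points, assuming the hang time grows as a power of the number of buttons (time proportional to n^k). The function name is illustrative, and two measurements are far too few to establish the exponent.

```javascript
// Estimate k in "time is proportional to n^k" from two (n, seconds) measurements.
function scalingExponent(a, b) {
  return Math.log(b.seconds / a.seconds) / Math.log(b.n / a.n);
}

const k = scalingExponent({ n: 500, seconds: 15 }, { n: 1000, seconds: 64 });
console.log(k.toFixed(2)); // prints "2.09"
```

An exponent close to 2 is consistent with quadratic behaviour; linear scaling would give an exponent near 1.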

The delay is also noticeable with smaller numbers of buttons.

Also, exactly every 10s after the browser becomes responsive again, it re-hangs for another 15s. I have tested this in safe mode to rule out plugins.
 
RHEL5, Pentium 4, 2.8Ghz 
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2
Interesting, the test case I just uploaded does not hang when opened by the link in this bug report, but if I save the test case file to disk and open it from there, it still breaks. 

The directory is local (not NTFS) and I don't see any other abnormal system behavior that would make me think that there is a problem with the disk.
I checked to see if saving Dean's test case to disk first made a difference for me. It did not.


Initial page load: 6374

Removal took 5361 milliseconds.
Layout took 718 milliseconds.
Additional 'lazy' tasks took 199 milliseconds.

Total time:  6278 milliseconds
> I apologize if this is the wrong bug report to post this to. 

It is, yes.  Your testcase is almost certainly showing bug 477564 (especially given that 10s thing); it should perform much better in Firefox 3.5.3.  You might want to read that bug for the explanation of why you see no problem with the uploaded testcase.

Sounds like you're also not seeing Dean's problem (which is lazy tasks taking a very long time) on Dean's testcase...
Once bug 501847 lands we should perhaps revisit removal of {ib} splits and see about making it faster.
Depends on: 501847
I found the cause of the very slow "lazy tasks"!!!    :)   :)   :)

I was reinstalling Kaspersky Antivirus 2010 (due to an IE problem it seemed to be causing), and I thought to try running my test case with no Kaspersky AV installed...

To my surprise, the "lazy tasks" took only 100-300ms, instead of around 1 minute!
(They were quite a bit slower than this the first time around; these times are for the 2nd and subsequent "Delete" button presses on the same page).

After Kaspersky was reinstalled, the problem reappeared.
Disabling Kaspersky (i.e. clicking "Pause Protection") did NOT fix the problem (I'd already tried that a long time ago).

So, something about Kaspersky AV being installed (irrespective of whether it's active or not) is causing the problem.

Now I guess all you guys have to do is figure out how and why Kaspersky AV is affecting Firefox in this particular manner on this particular test page (since I can testify that it hasn't slowed down web browsing in general for me - at least not noticeably).

Note:  when I first posted this bug, I was running an earlier version of Kaspersky AV (v7 I think); and it also occurs on a PC that's protected via Kaspersky Enterprise (i.e. the server edition of the product) -- so the problem ISN'T related to a specific Kaspersky AV version.
Aha!  Progress!

What happens if you set the browser.sessionstore.privacy_level preference to 2 in about:config?  Does that affect things in any way?
No.  No effect.
Huh.  I really wonder what the interaction with Kaspersky is here, esp. when it's claiming to be disabled....
Googling for Kaspersky and Firefox, I found this:

http://www.raymond.cc/blog/archives/2008/06/29/fix-firefox-3-crashing-problem-caused-by-kaspersky-2009-products/

His problem seems very unlikely to be related to this one, and his solution has no effect (I checked); however, he does confirm that (a) Kaspersky interacts with Firefox even when disabled, and (b) it (possibly) has something to do with the dlls that Kaspersky loads at startup (he lists 3 of them, but implies there are more).

I'm not sure just how these dlls would interact with the firefox process.  If firefox were writing to disk, then file AV scanning would intercept that (when enabled, anyway) - but DOM manipulation via javascript should presumably be handled entirely in-memory and in-process... I assume?

If so, there must be some other "hook" connecting FF to Kaspersky - either some OS call that FF is making that is captured by or redirected to Kaspersky, or else Kaspersky is Firefox-aware and is registering itself directly with the browser, presumably via some Firefox API that exists for that purpose?

That's all I can think of, anyway.
Boris (or anyone else interested):

Since no-one's reported being able to reproduce this bug, I'd be happy to run any tests required to track down its source on my own PC, and post the results back.  Just let me know what to do (bearing in mind that I'm not a FF developer).

(If you need to send me some files via email, ask me for my email address and I'll post it to you privately - the email address listed above is a dummy, so I don't get spammed).
> but DOM manipulation via javascript should presumably be handled entirely
> in-memory and in-process

Not quite, but you say that turning off session restore didn't affect things.  And I assume you see the problem on the testcase attached to this bug too, right?

We have no API for anti-virus software to register itself, but that doesn't mean they don't just randomly hook into things.  :(

ccing some folks who might know more about Windows stuff.
(In reply to comment #37)
> 
> We have no API for anti-virus software to register itself, but that doesn't
> mean they don't just randomly hook into things.  :(


That looks to be what's happening here. They probably wrap some system dlls to intercept calls, and for some reason this operation is triggering that, just a guess anyway. I'm not prepared to install Kaspersky on my desktop to test, but I can try and find some time to get a profiler running in an image this afternoon.
Prelim info, I can reproduce:

ie -

load: 1016
removal: 0
layout: 16
lazy: 15

fx 3.5.3 -

load: 1962
removal: 1338
layout: 244
lazy: 31

w/Kaspersky

ie -

load: 812
removal: 16
layout: 15
lazy: 78


fx 3.5.3 -

load: 2562
removal: 1613
layout: 242
lazy: 17130 

I don't have a profiler running yet, but I do see a few things - a lot of assertions in accessibility, particularly NS_ConstructAccessibilityService, on startup.

and whenever I break the thread out during the long "lazy" period during a remove, I land in places within accessibility.dll - 

accessibility.dll!nsDocAccessible::FireDelayedAccessibleEvent()  Line 1633	C++
accessibility.dll!nsDocAccessible::FireValueChangeForTextFields()  Line 1468	C++
accessibility.dll!nsDocAccessible::InvalidateCacheSubtree()  Line 2208	C++
accessibility.dll!nsAccessibilityService::InvalidateSubtreeFor()  Line 2055	C++
gklayout.dll!nsGenericElement::doRemoveChildAt()  Line 3336	C++
gklayout.dll!nsGenericElement::RemoveChildAt()  Line 3306	C++
gklayout.dll!nsGenericElement::doRemoveChild()  Line 4024	C++
gklayout.dll!nsGenericElement::RemoveChild()  Line 3543	C++
gklayout.dll!nsHTMLDivElement::RemoveChild()  Line 56	C++
gklayout.dll!DeleteElementTxn::DoTransaction()  Line 141	C++

Not sure if that's normal or not. I'm going to try and figure out where we spend most of our time next.
Oho!  Accessibility has all sorts of performance issues.  Is Kaspersky enabling it somehow?
Of particular interest, what's the stack to nsAccessibilityService::nsAccessibilityService ?
(In reply to comment #41)
> Of particular interest, what's the stack to
> nsAccessibilityService::nsAccessibilityService ?

accessibility.dll!nsAccessibilityService::nsAccessibilityService()  Line 135	C++
accessibility.dll!nsAccessibilityService::GetAccessibilityService()  Line 2068	C++
accessibility.dll!NS_GetAccessibilityService()  Line 2089	C++
accessibility.dll!NS_ConstructAccessibilityService()  Line 54	C++
xpcom_core.dll!nsGenericFactory::CreateInstance()  Line 80	C++
xpcom_core.dll!nsComponentManagerImpl::CreateInstance()  Line 1596	C++
xpcom_core.dll!nsComponentManagerImpl::GetService()  Line 1900	C++

Is there some sort of pref I could check?
A quick test on Windows with a11y active showed that a delete from the Updated
Test Page took about 6 times longer (with my debug non-optimized build 20 secs
vs 120 secs).

(I wonder: does Kaspersky activate our a11y code?)
Jim, what's higher up on the stack?  That is, who's getting the service and why?
(In reply to comment #44)
> Jim, what's higher up on the stack?  That is, who's getting the service and
> why?

Just poking around, if you want something more specific just ask - this looks suspect for sure, it's one of the original asserts we get on startup:

xpcom_core.dll!NS_DebugBreak_P()  Line 354
accessibility.dll!NS_ConstructAccessibilityService()  Line 60
xpcom_core.dll!nsGenericFactory::CreateInstance()  Line 80
xpcom_core.dll!nsComponentManagerImpl::CreateInstance()  Line 1596
xpcom_core.dll!nsComponentManagerImpl::GetService()  Line 1900
*mzvkbd3.dll!6d832c9a() 	
*mzvkbd3.dll!6d8312b0() 	
gkwidget.dll!nsBaseAppShell::Release()  Line 50
user32.dll!7e41f84a() 	
user32.dll!7e41f7f6() 	
user32.dll!7e41f94b() 	
ntdll.dll!7c90eae3() 	
user32.dll!7e4193e9() 	
user32.dll!7e4193a8() 	
user32.dll!7e419402() 	
gkwidget.dll!nsAppShell::ProcessNextNativeEvent()  Line 158

(below this is our normal event processing stack)

Here is the break out on the constructor a little bit later:

accessibility.dll!nsAccessibilityService::nsAccessibilityService()  Line 135
accessibility.dll!nsAccessibilityService::GetAccessibilityService()  Line 2068
accessibility.dll!NS_GetAccessibilityService()  Line 2089
accessibility.dll!NS_ConstructAccessibilityService()  Line 54
xpcom_core.dll!nsGenericFactory::CreateInstance()  Line 80
xpcom_core.dll!nsComponentManagerImpl::CreateInstance()  Line 1596
xpcom_core.dll!nsComponentManagerImpl::GetService()  Line 1900
*mzvkbd3.dll!6d832c9a() 	
[Frames below may be incorrect and/or missing, no symbols loaded for *mzvkbd3.dll]	
*mzvkbd3.dll!6d8312b0() 	
user32.dll!7e41f84a() 	
user32.dll!7e41f7f6() 	
user32.dll!7e41f94b() 	
ntdll.dll!7c90eae3() 	
user32.dll!7e4193e9() 	
user32.dll!7e4193a8() 	
user32.dll!7e419402() 	
gkwidget.dll!nsAppShell::ProcessNextNativeEvent()  Line 158

dll info:

mzvkbd3.dll	C:\Program Files\Kaspersky Lab\Kaspersky Anti-Virus
Description: Mozilla 3 Virtual Keyboard
File Version: 9.0.0.464
The Windows event that's getting processed is odd: this stack drops into a Windows message processing API (PeekMessageW), pops up in this Kaspersky DLL, and then calls into our code.
The DLL is linked to XPCOM and uses the following entry points:

NS_StringGetData
NS_StringContainerInit
NS_StringContainerInit2
NS_StringContainerFinish
NS_GetServiceManager
It's a COM server, so a "power user" solution to the problem is to unregister it. Drop a copy of mzvkbd3.dll in the Firefox directory, cd there in a command prompt, and unregister the object:

regsvr32 /u mzvkbd3.dll

Then delete the offending DLL.
Jim, thanks for getting this sorted out!

Sounds like we need two things here:

1)  A bug on the fact that accessibility being enabled makes this testcase
    really slow.  I've filed bug 518992.
2)  An evangelism bug on contacting Kaspersky and asking them to not toss random
    junk dlls into our address space and enable the accessibility module (which
    they seem to be doing quite purposefully, given that GetService stack).
    Reassigning this bug, I guess.

I really wish we had Google Chrome style dll blacklisting...

Dean, thanks a ton for your help here and for all the time you've spent on this!
Assignee: nobody → english-us
Component: Layout → English US
Flags: wanted1.9.2-
Flags: blocking1.9.2-
Keywords: perf, qawanted
Product: Core → Tech Evangelism
QA Contact: layout → english-us
Summary: *Unbelievably* slow when using removeChild() to delete a form on a page with 100s of forms that are display:inline. → Kaspersky antivirus hooks into our process and enables the accessibility module
Version: Trunk → unspecified
You're very welcome!

I'm glad to have helped.
(Also rather relieved that someone else has finally been able to reproduce the problem!)

  I don't know if it's particularly important to you at this point, but I *can* tell you what that dll (mzvkbd3.dll) is for.  It's an optional feature that is (or can be) installed with Kaspersky - a "Virtual Keyboard" for IE and Firefox (and maybe other browsers as well - I don't know).

  It puts a little button on the Navigation toolbar, that when clicked pops up a graphical keyboard on which you can "type" using the mouse.  The idea is that you should type in passwords that way for extra security, so that if there happens to be a keylogger infecting your system, what you type on the "Virtual Keyboard" won't be captured by it.

  It's a relatively new feature in recent versions of Kaspersky, and apparently people have reported lots of problems with it.  I don't know exactly what FF's "accessibility module" does, but as the virtual keyboard is providing an alternate input method, it certainly sounds like something that might require "Accessibility" functions - which might be why Kaspersky is quite purposefully enabling it.

  What is odd though, is that when I first discovered this problem I was running an earlier version of Kaspersky, and I'm *fairly sure*, from memory, that there was no virtual keyboard installed or included in that version.  (Hmm... at least I'm fairly sure there was no *visible* toolbar button for it... I suppose it might have been there but only accessible via a menu or keyboard shortcut - I'm not sure).
  You might want to just check and make sure that mzvkbd3.dll is the ONLY thing in Kaspersky enabling accessibility, and that there isn't anything else that does it as well.


Best Regards,

- Dean
The accessibility module is basically the code that exposes the web page state to screen readers, as I understand.  David might know more about whether it's related to alternate input methods...
I do not understand why such a "virtual keyboard" would require using our accessibility interfaces. You should be able to implement it just fine by posting keyboard messages to the focused HWND. Or you could implement it by creating a real IME. If you really wanted to use accessibility, you could also do it without DLL injection by using the actual Microsoft accessibility API instead of instantiating the XPCOM service directly. If you need UI inside Firefox, you can use an extension in combination with those things.

Definitely we have to get them to stop using accessibility here.
I can imagine that the virtual keyboard *might* want to use some of our a11y API, but I can also imagine that Kaspersky might otherwise be querying the desktop platform accessibility API (which would also trigger/instantiate our a11y engine) for other reasons. So Dean, your memory might be correct.

Is anybody here a Kaspersky dev, or in touch with one?

I agree with Roc, it should be possible to figure out a solution without using our a11y service. That said, we'll of course still have to tackle the spin-off bug.
Just contacted Kaspersky Lab via phone:
(905) 415-4594
[] = buttons pressed

[4]
x[1820]

Kaspersky Labs consumer product support

Hours 8am-9pm EST Monday-Friday
[2]

A friendly bunch. "Alex" from tech support took my email address and a 'higher up' will contact me. Sounds promising.
Nov 18, 2009: Email from Tim Armstrong at kaspersky.com.
Nov 19, 2009: I pointed him to this bug and further explained the issue, requesting he comment here or email me back.
I've had no further contact.
So at what point do we start considering blacklisting?
We should probably have a grace period where we say "we will have to blacklist you in X weeks on version FF Y.z"... not sure.
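For discussion purposes, here is a minimal sketch of what a name-plus-version blocklist check could look like. This is not Firefox's or Chrome's actual implementation; the names DllVersion, BlocklistEntry, kBlocklist and ShouldBlockDll are hypothetical, and the single entry just reuses the file name and file version from the dll info earlier in this bug as an illustration. How the loader would be intercepted to call such a check is left out entirely.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <cstdint>
#include <string>

// Four-part Windows file version, e.g. 9.0.0.464.
// std::array compares lexicographically, which matches version ordering.
using DllVersion = std::array<std::uint16_t, 4>;

// Hypothetical blocklist entry: a file name plus the newest version to block.
struct BlocklistEntry {
    const char* name;       // lowercase file name, no path
    DllVersion maxVersion;  // block this version and anything older
};

// Illustrative data only: the name and version come from the dll info
// quoted in this bug, not from any real blocklist.
static const std::array<BlocklistEntry, 1> kBlocklist = {{
    {"mzvkbd3.dll", {9, 0, 0, 464}},
}};

// Windows file names are case-insensitive, so compare in lowercase.
std::string ToLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Returns true if a DLL with this file name and version should be refused.
bool ShouldBlockDll(const std::string& fileName, const DllVersion& version) {
    const std::string lowered = ToLower(fileName);
    for (const auto& entry : kBlocklist) {
        if (lowered == entry.name && version <= entry.maxVersion) {
            return true;
        }
    }
    return false;
}
```

Keying the entry on a maximum version rather than on the name alone means a later release that no longer touches accessibility would still load, which fits the idea of announcing a cutoff to the vendor ahead of time.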
Update:
My contact, Tim, has asked his HQ for an update and is waiting to hear back.
Whiteboard: [third-party]
Whiteboard: [third-party] → [crashkill-third-party]
Ping-emailed Tim again.
Received email today from Tim that Kaspersky engineers are in contact with our developers. Not sure where this is happening.
I'd love to know who they're in contact with, yes.
Update: bz and I are in the loop now.
I have confirmation that the new release (KIS/KAV) will not use the accessibility API, which is great. I am not optimistic that current versions will be patched/updated. What is the user impact of blacklisting current versions?
We're now tracking such bugs. This doesn't mean it's something we can fix, merely something we hope to be able to point vendors to so they can investigate. This is an automated message.
Assignee: english-us → nobody
Component: English US → Kaspersky AV
Product: Tech Evangelism → Plugins
QA Contact: english-us → kaspersky-antivirus
Is this fixed now? If not: AFAIK, Firefox now has DLL blacklisting. So revisit this?
This is old and dead.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Product: Plugins → Plugins Graveyard