Bug 730737 (TheGreatAddonTest)

Test the 100 most popular add-ons for memory leaks (esp. zombie compartments)

Categories: Core :: General, defect
Severity: normal
Status: RESOLVED WONTFIX
People: Reporter: n.nethercote; Unassigned
Keywords: qawanted
Whiteboard: [MemShrink:P1]
Attachments: 2 files

Attached file top add-ons list
+++ This bug was initially created as a clone of Bug #704656 +++

Many add-ons leak (bug 700547).  A lot of those that leak do so by creating zombie compartments (bug 668871).  The presence of a zombie compartment will prevent an AMO add-on from achieving full review (https://wiki.mozilla.org/AMO:Editors/EditorGuide/AddonReviews#Policies_and_actions_4).

The goal of this bug is to test the 100 most popular add-ons.  This list includes add-ons from outside AMO, which account for ~75% of all installed add-ons.

I've created the following etherpad:

  https://etherpad.mozilla.org/hSOVNzKZen

It has the list of 110 of the most popular add-ons, details on how this list was constructed, and instructions on how to test them.  If you are comfortable enough with Firefox to do things like install and uninstall add-ons and create new profiles, please feel free to help with the testing.  Please add test results to the etherpad.

I've also added an attachment showing just the list of the top 110.  In both that list and the etherpad the add-ons are listed in alphabetical order to obscure the popularity ranking, because this is considered sensitive information.
No longer depends on: 704656
Depends on: 730796
Blocks: 730796
No longer depends on: 730796
Alias: thegreataddontest
Whiteboard: [MemShrink] → [MemShrink:P1]
Blocks: 731801
Alias: thegreataddontest → TheGreatAddonTest
According to Jorge, this list includes disabled add-ons, which likely distorts it. Limiting it to Firefox 8 and newer might be a good idea as well.
orly?  Jorge, do we have any way to get the data with just enabled add-ons?
fligtar might be able to get that data, but he's away at MWC at the moment.
Quoting the etherpad:

  "I removed ones marked "AMO: Admin Disabled" and "AMO: Incomplete"."  

This left only ones marked "AMO: Fully reviewed", "AMO: Not hosted", and "AMO: Unknown".  Are some of the add-ons in these three categories disabled?
(In reply to Nicholas Nethercote [:njn] from comment #4)
> Quoting the etherpad:
> 
>   "I removed ones marked "AMO: Admin Disabled" and "AMO: Incomplete"."  
> 
> This left only ones marked "AMO: Fully reviewed", "AMO: Not hosted", and
> "AMO: Unknown".  Are some of the add-ons in these three categories disabled?

As I understand it, yes. All these states are server-side (including "Admin Disabled"), but we're talking about add-ons disabled in local profiles.
Yes, I don't think it's a good idea to remove AMO-disabled add-ons, since they are probably still being distributed outside of AMO (given their high install numbers).

Whether an add-on is locally disabled or not, or to what extent, is not discernible from the dashboard this list was taken from.
I just added this note to the top of the etherpad:

[Update:  the Panorama dashboard doesn't indicate which add-ons are
disabled in a Firefox installation, so this list does not accurately
reflect real-world usage of add-ons.  I suspect that AMO add-ons are
less likely to be disabled than non-AMO add-ons, and so I suggest that
we focus on testing the AMO add-ons in this list until we can get better
data.]
The DivX HiQ extension, as listed, is a pain in that it resets the user's toolbars when you toggle it enabled/disabled and restart.
I remember that in the past, even having mzvkbd3.dll loaded in the Firefox process would cause high memory usage. It will be interesting to see if Kaspersky finally fixed it.
(In reply to Danial Horton from comment #8)
> The DivX HiQ extension, as listed, is a pain in that it resets the user's
> toolbars when you toggle it enabled/disabled and restart.

Is there a bug on file for that?
My apologies, I mixed it up with the DivX Plus Webplayer HTML5 extension.

The HiQ extension has actually been discontinued and hasn't shipped with the DivX package for some time now. As for the bug I reported:

https://bugzilla.mozilla.org/show_bug.cgi?id=667627
Depends on: 733516
https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/usage/status/?last=30

This example (Adblock Plus) shows that the status of an installed extension *is* discernible on the AMO side.

Is that not true for non-AMO extensions?
> Is that not true for non-AMO extensions?

Yes, that's what I took away from my last conversation with Unfocused.
Correct. More specifically, we can only tell the enabled/disabled state for addons that use AMO for updates.
Blair, how hard would it be to change the daily ping (or whatever it's called) to indicate if each installed add-on is enabled?
(In reply to Nicholas Nethercote [:njn] from comment #15)
> Blair, how hard would it be to change the daily ping (or whatever it's
> called) to indicate if each installed add-on is enabled?

I can't comment on how AMO would handle it, but from a client-side perspective it wouldn't be too difficult to do. Although, the metrics data ping should end up providing that data also - so I'm a bit hesitant to add that to the Add-ons Manager too.
> Although, the metrics data ping should end up providing that data also - so I'm a bit hesitant to 
> add that to the Add-ons Manager too.

I'm hesitant to wait on MDP, given that the privacy concerns haven't been resolved.  I hear it's being reworked, but I'm not holding my breath.  If and when MDP happens, we can remove the enabled/disabled data from the AMO ping.
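
To make the proposal concrete, here is a minimal sketch of what the client-side payload could look like if an enabled flag were added to the daily request. It is purely illustrative: the field names, IDs, and JSON shape are my assumptions, not the actual ping format.

  # Hypothetical sketch only: the real request is assembled in
  # AddonManager.jsm, and every field name here is invented purely to
  # illustrate "add an enabled flag to the daily ping".
  import json

  def build_ping(addons):
      """addons: list of (addon_id, version, enabled) tuples known client-side."""
      return {
          "addons": [
              {"id": addon_id, "version": version, "enabled": enabled}
              for addon_id, version, enabled in addons
          ]
      }

  # Example: one enabled add-on and one locally disabled non-AMO add-on.
  # With data like this, the server could aggregate enabled/disabled
  # counts for non-AMO add-ons too, which is what the top-100 list lacks.
  print(json.dumps(build_ping([
      ("adblockplus@example", "2.0", True),
      ("some-non-amo-addon@example", "1.3", False),
  ]), indent=2))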
(In reply to Justin Lebar [:jlebar] from comment #17)
> If
> and when MDP happens, we can remove the enabled/disabled data from the AMO
> ping.

Can someone clarify which ping is which, preferably by pointing at some code?

Presumably there are 2 different things going on. One thing is the add-on background update check. I think this happens daily and includes status information (such as enabled/disabled) which is the source of the statistics here: https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/usage/status/?last=30. It seems that this only happens for add-ons that use AMO for updates (comment 14).

Then there must be some other thing / check / ping which is the source of the top 100 list in this bug. It includes non-AMO add-ons, but not whether they are enabled or not. Is that correct? What is this particular thing?

As far as I understand, it doesn't come from any of: application update check, blocklist ping, crash data, Telemetry or metrics ping. So where?

(Aside: is there a complete list on the public wiki somewhere of all the checks/pings and what information each one includes?)
I think it would be good to include the Google Toolbar in these tests as well. Even though Google has discontinued it, there are many users still forcing it installed, myself being one of them, as the supposed alternatives are just not up to par.

Google has removed the install page, but the file is still on their servers; you can install it from places like Softpedia. I also have an accounts.google.com modified version which has the hash file removed, to get rid of that annoying corrupt XPI message.
(In reply to Daniel Cater from comment #18)
> (In reply to Justin Lebar [:jlebar] from comment #17)
> > If
> > and when MDP happens, we can remove the enabled/disabled data from the AMO
> > ping.
> 
> Can someone clarify which ping is which, preferably by pointing at some code?
> 
> Presumably there are 2 different things going on. One thing is the add-on
> background update check. I think this happens daily and includes status
> information (such as enabled/disabled) which is the source of the statistics
> here:
> https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/usage/
> status/?last=30. It seems that this only happens for add-ons that use AMO
> for updates (comment 14).

This happens here: http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensions/AddonManager.jsm#872

> Then there must be some other thing / check / ping which is the source of
> the top 100 list in this bug. It includes non-AMO add-ons, but not whether
> they are enabled or not. Is that correct? What is this particular thing?

This is a request for updated metadata (name, description, icon, screenshots, ratings) from AMO for the installed add-ons. At one point AMO supported listing non-hosted add-ons, so the plan was to allow them to update their metadata in this way too, which is why all add-ons are included. It happens here: http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensions/AddonManager.jsm#860

> As far as I understand, it doesn't come from any of: application update
> check, blocklist ping, crash data, Telemetry or metrics ping. So where?

It happens at the same time as the add-on update pings, depending on preference settings.

> (Aside: is there a complete list on the public wiki somewhere of all the
> checks / pings and what information each one includes)?

No.
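
To summarize the two request types described above, here is a schematic sketch. It is illustrative only: the parameter names are invented, and the real implementation is the AddonManager.jsm code linked above.

  # Schematic sketch of the two daily requests described above.  The
  # structure is illustrative; the real logic lives in
  # toolkit/mozapps/extensions/AddonManager.jsm.
  from dataclasses import dataclass

  @dataclass
  class Addon:
      id: str
      enabled: bool
      uses_amo_for_updates: bool

  def daily_requests(addon):
      requests = []
      # 1. Background update check: only sent to AMO for add-ons that
      #    use AMO for updates, and it carries enabled/disabled status.
      #    This feeds the per-add-on usage-status statistics pages.
      if addon.uses_amo_for_updates:
          requests.append(("update-check", {"id": addon.id,
                                            "enabled": addon.enabled}))
      # 2. Metadata request: sent for ALL installed add-ons, hosted on
      #    AMO or not, to refresh name/description/ratings (and, as
      #    noted below, compatibility overrides).  It carries no
      #    enabled/disabled state, which is why the top-100 list can't
      #    exclude disabled add-ons.
      requests.append(("metadata", {"id": addon.id}))
      return requests

  print(daily_requests(Addon("example@addon", enabled=False,
                             uses_amo_for_updates=False)))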
(In reply to Dave Townsend (:Mossop) from comment #20)
> > Then there must be some other thing / check / ping which is the source of
> > the top 100 list in this bug. It includes non-AMO add-ons, but not whether
> > they are enabled or not. Is that correct? What is this particular thing?
> 
> This is a request for updated metadata (name, description, icon,
> screenshots, ratings) from AMO for the installed add-ons. At one point AMO
> supported listing non-hosted add-ons, so the plan was to allow them to update
> their metadata in this way too, which is why all add-ons are included.

It also now includes compatibility overrides - for both hosted and non-hosted add-ons.
> [Update:  the Panorama dashboard doesn't indicate which add-ons are
> disabled in a Firefox installation, so this list does not accurately
> reflect real-world usage of add-ons.  I suspect that AMO add-ons are
> less likely to be disabled than non-AMO add-ons, and so I suggest that
> we focus on testing the AMO add-ons in this list until we can get better
> data.]

In hindsight, this is probably overly pessimistic.  For example, I'd be surprised if the top 100 installed add-ons didn't cover the top 50 enabled add-ons.  So please continue to test if you wish to!
Ah, something I would like to bring to testers' attention: some zombies caused by add-ons do not necessarily occur right off the bat. I can reload an affected session and use a bunch of add-ons that I know I had used previously, and every compartment closes as normal from a fresh start; but leave the session open a while and a handful of different compartments are left open.

Not surprisingly, many of them are social notworking plugins from Facebook or Google.
> Not surprisingly, many of them are social notworking plugins from Facebook or Google.

To be clear, these persist even when you close all tabs other than about:compartments?
It's not surprising. A lot depends on triggering the right code path, and most of our testing has only minimal code coverage. There's really not much that can be done about it unless add-ons provide code coverage tests, which is not going to happen for the bulk of add-ons any time soon.
"To be clear, these persist even when you close all tabs other than about:compartments?"

They wouldn't be zombie compartments otherwise!

@Kris, of course. And it doesn't help that these social plugins (well, in Facebook's case) aren't exactly written A-grade.
Blocks: 735931
You know what is so crazy it just might work: doing this test with Mechanical Turk.
Marcia Knous, Anthony Hughes and Juan Becerra are working on a plan for this.
(In reply to Jet Villegas (:jet) from comment #28)
> Marcia Knous, Anthony Hughes and Juan Becerra are working on a plan for this.

Great!  Marcia, Anthony, Juan:  do you have anything to add?
At this point I would say 'no'. We are in the final stages of defining our Desktop QA goals for this quarter. Once that's done, we will have to prioritize our work and see where this fits in.

If you'd like to speak to me 1:1 about this, feel free to send me an email. I'm not sure it's prudent to conduct the consultation/planning phase on this bug. Perhaps it would be better to work on this offline and report back to this bug when we have something to action.
> Perhaps it would be better to work on this offline and report back to this bug when we 
> have something to action.

That's basically been the course of action we've taken since we first tried to get QA to assist with this bug, soon after we filed it almost two months ago.  From my point of view, that did not work.

My understanding is that we were promised that QA would schedule test days and actively work on this, but then nothing happened.  But maybe I've misunderstood and that's not actually what happened.  The important part is that there's no public record, so we can't check.

If there's anything more you need to know from us to inform your planning, I would very much appreciate if we could conduct ourselves here, in a publicly accountable way.
> My understanding is that we were promised that QA would schedule test days
> and actively work on this, but then nothing happened.  But maybe I've
> misunderstood and that's not actually what happened. 

You might be thinking of bug 704656, which was less ambitious -- it was about testing the top 10 add-ons, but was WONTFIXed in favour of this bug.  That bug is where test days were mentioned multiple times but nothing happened.
(In reply to Justin Lebar [:jlebar] from comment #31)
> If there's anything more you need to know from us to inform your planning, I
> would very much appreciate if we could conduct ourselves here, in a publicly
> accountable way.

Very well then. You want us to be publicly accountable for something? I can promise that QA will prioritize this project with our other work and get back to you soon with an expectation of when we could realistically deliver on this project. But, quite frankly, there's a lot of other competing work.

For Q2 we will be focusing pretty heavily on Desktop integration with Apps, WebRTC, Persona, etc. All while trying to keep up with rapid releases. This will be our priority and any other work will have to be secondary. 

Speaking of testdays: every testday we've run in the past for MemShrink has been completely unsuccessful, because our community just finds it too hard and complicated. We need to find a way to make the process of finding and reporting issues simpler before I'm willing to spend any more energy there.

We could probably rig up some automation to do this type of testing (as we did for Default Compatible Add-ons) but I'm not willing to promise work until we've prioritized. 

I'm sorry if this is not the answer you wanted but it's the best I can do for you right now. I'm not going to promise you something we can't deliver on. I'd rather be completely forthcoming and realistic in terms of expectations and have you disappointed in those expectations than to fail to deliver on a promise.

NOTE: If you want testdays organized, I'm the one you want to talk to -- I'm the module owner. But keep in mind that very few of our attendees have any coding experience whatsoever. Documentation, screencasts, and developer engagement with testers are just some ways that make or break a testday.
FWIW, https://developer.mozilla.org/en/Zombie_Compartments#Proactive_checking_of_add-ons is the current documentation.  Cutting and pasting the instructions for FF13 and later:

- Create a new profile.  Install and enable the add-on.

- Restart the browser.

- Browse one or more pages and do something that exercises the add-on.  For example, for an add-on that remembers passwords, visit a site that requires a password;  for an add-on that performs an operation involving a specific page element such as an image or a chunk of text, perform that operation.  The more you exercise the add-on, the more thorough the testing will be, but in practice a lot of problems show up quickly.

- Open about:compartments and close all other pages.

- Return to about:compartments and reload it, multiple times if necessary.

- If only system compartments are present, congratulations!  That's a good sign that the add-on doesn't cause zombie compartments.  Otherwise, please file a bug (using the "Tech Evangelism" product and the "Add-ons" component), add "[MemShrink]" to its whiteboard, and mark it as blocking bug 668871 and bug 700547.


Perhaps this is too difficult for the people who participate in test days.  I'd be happy to hear suggestions for improving this documentation.
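
For anyone who wants to script the final check, here is a minimal sketch. It assumes you have a memory report saved from about:memory as JSON (an object with a "reports" array whose entries carry a "path" string), and the heuristic below for what counts as a system compartment is my assumption, not an official definition:

  # Minimal sketch: list compartments in a saved about:memory report
  # that are not obviously system compartments.  Assumes the report is
  # JSON with a "reports" array of {"path": ...} entries; the "system"
  # heuristic is an assumption, not an official definition.
  import json
  import sys

  def suspect_compartments(report_file):
      with open(report_file) as f:
          report = json.load(f)
      suspects = set()
      for entry in report.get("reports", []):
          path = entry.get("path", "")
          if "compartment(" not in path:
              continue
          # Pull out the compartment label, e.g. a page URL.
          label = path.split("compartment(", 1)[1].rsplit(")", 1)[0]
          # Treat chrome/internal compartments as system ones.
          if label.startswith(("[System Principal]", "chrome://",
                               "resource://", "about:", "moz-nullprincipal:")):
              continue
          suspects.add(label)
      return sorted(suspects)

  if __name__ == "__main__":
      for label in suspect_compartments(sys.argv[1]):
          print(label)

If this prints anything after you've closed every page except about:compartments, those labels are zombie-compartment candidates worth filing.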
> Very well then. You want us to be publicly accountable for something? I can promise that 
> QA will prioritize this project with our other work and get back to you soon with an 
> expectation of when we could realistically deliver on this project. But, quite frankly, 
> there's a lot of other competing work.

"We don't have time" is a perfectly acceptable response, from where I'm sitting.  We're very happy to fund a contractor to do this work if you guys can't fit it in.
Thanks for that clarification, Justin. So yeah, if you need something turned around rather quickly, I would recommend going outside of QA to do that. I don't see us having any spare cycles until at least 2 weeks from now.

If you decide this is something which can be delivered over the next few months, I think we can assist in the following ways:
 * we could throw the contractors we use for release testing at it
 * we could work with the A-Team to automate the testing (like our DTC tests)
 * we could organize a testday (assumes docs are simplified, screencast is provided, and a developer is available to assist)
 * we could organize a meetup where we teach people about MemShrink and get them to help test
 * we could get a few community members to vet the documentation and ask questions

Let me know if any of this would be useful to you.
I propose we WONTFIX this bug, for two reasons.

- It has stalled;  the etherpad hasn't been modified since May 30.

- Bug 695480 and its follow-ups fixed most add-on memory leaks.

Any objections?  If there aren't any, I'll attach a snapshot of the etherpad as it currently stands before closing the bug, just for posterity.
I think testing of add-ons is important, even if memory leaks specifically will be uncommon.  An add-on is a patch to Firefox.  We would never release a build of Firefox with no QA, so I still think it's nuts that we don't do any systematic QA of Firefox plus add-ons.  Just because QA hasn't stepped up to this task doesn't mean we should give up on this bug, I think.

We can wontfix this bug and open a new "test add-ons" bug if you want, but I'm not convinced that would be useful.
I'd agree with closing this bug.
The bug was to test popular addons that might not have been checked during a review because it preceded the testing that is now done as part of the review process on AMO. Some of the popular addons were tested from this bug, and all new updates and addons are tested anyway on AMO.

Opening a new 'test add-ons' bug seems redundant unless there is a particular target for it (and some idea who would be doing the testing).
> The bug was to test popular addons that might not have been checked during a review because it 
> preceded the testing that is now done as part of the review process on AMO.

A major point of this bug was to test add-ons which aren't on AMO, so never get reviewed by the add-ons team.
(In reply to Justin Lebar [:jlebar] from comment #40)
> > The bug was to test popular addons that might not have been checked during a review because it 
> > preceded the testing that is now done as part of the review process on AMO.
> 
> A major point of this bug was to test add-ons which aren't on AMO, so never
> get reviewed by the add-ons team.

Then how about a new bug just for testing the top 25 or 50 non-AMO addons that weren't tested elsewhere yet?
> Then how about a new bug just for testing the top 25 or 50 non-AMO addons that weren't tested 
> elsewhere yet?

That would be great.
If we're going to pick an arbitrary point, I'd say test all non-AMO add-ons with more than 3 million users, which would be the top 26 non-hosted add-ons.
Hi guys, just chiming in as the voice for QA. We have a system for detecting memory usage while performing basic tasks (tabbed browsing, playing video, etc.) with add-ons installed: our Mozmill endurance tests.

Here is a sample report:
http://mozmill-ci.blargon7.com/#/endurance/report/e2b510e5d4ce4614a48bf3e6fb039cce

In fact, we reused this technology when we were testing Default Addon Compatibility. What we did there was run Mozmill with 5 baseline add-ons + 5 random top-100 add-ons and execute a functional+endurance test run, update Firefox, then re-run the functional+endurance test run. As long as add-on functionality doesn't need to be tested, we could probably reuse this technology.
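
For reference, the selection step of that DTC-style run can be sketched as follows. This is a hypothetical reconstruction: the add-on IDs are placeholders, and the real Mozmill harness is not shown.

  # Hypothetical sketch of the add-on selection for a DTC-style run:
  # a fixed baseline set plus a random sample from the top-100 list.
  # All add-on IDs here are placeholders, not the real lists.
  import random

  BASELINE_ADDONS = [
      "baseline-1@example", "baseline-2@example", "baseline-3@example",
      "baseline-4@example", "baseline-5@example",
  ]

  def pick_test_set(top_100, sample_size=5, seed=None):
      """Return the baseline add-ons plus a random sample of the top 100."""
      rng = random.Random(seed)
      return BASELINE_ADDONS + rng.sample(top_100, sample_size)

  # Each run: install the set, do a functional+endurance test run,
  # update Firefox, then repeat the run and compare the memory numbers.
  top_100 = ["top-addon-%d@example" % i for i in range(100)]  # placeholder
  print(pick_test_set(top_100, seed=42))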

However, our top-3 priorities right now are b2g, basecamp, and fennec native. It's been a challenge finding time for any project which is not directly in service to those priorities. 

Basically, what it all comes down to is:
1) Does this testing support the company's priorities? If so, we'll need to do some QA priority shuffling.
2) Do you need memory leaks detected with add-ons installed? If so, the A-team can be brought in to reuse the DTC automation (assuming they have the resources).
3) Do you need memory leaks detected with add-ons being used? If so, we'll need to do some crowdsourced manual testing.


My offer to you:
A) I can start a discussion with the A-team about reusing the Default Addon Compatibility automation (though expect it to be low priority)
B) I can work with Justin to develop a fairly straightforward test plan so we can crowd-source the testing (either through a test day or a community-sourced sprint)
FWIW, I don't think the endurance tests are the right tool to use.  With an add-on, you often have to /use/ it before it starts behaving badly.  Every add-on is different, of course, so you can't automate this.  Moreover, we'd be looking not only for memory/perf problems, but also for add-on policy violations (see the recent thread in dev.governance).  Again, that kind of thing requires manual testing.

I of course can't speak to the relative priority of this task.
Automated testing is unfortunately not very useful unless we have unit tests to run for the particular add-on. Most of these issues only show up when the add-on's functionality is tested, so we're basically limited to manual testing.
Justin, I'd be perfectly willing to work with you on organizing a QA testday for this during this quarter. It sounds like that might fit the bill. Let me know if that's agreeable.
What I would /like/ is a commitment from someone to get this testing done, one way or another.  I've been disappointed by the testdays in the past, because we hold them, don't get sufficient testing, and then say "well, we tried."  I would like that not to happen here.

I understand that QA may not be able to commit to doing this, due to other priorities.  So if you think a test day would get us somewhere, that would of course be better than doing nothing.  But if your question is whether I'd be satisfied simply with a commitment to hold a test day, the answer is "probably not."  :)

It seems to me -- and I'm not a QA engineer, so maybe I'm missing something here -- that testing 30 add-ons would take someone at most two days.  (That's a rate of 30 minutes per add-on.)  So while I understand that we have lots of priorities, I feel like the magnitude of this project may be misconceived.
(In reply to Justin Lebar [:jlebar] from comment #48)
> I understand that QA may not be able to commit to doing this, due to other
> priorities.  So if you think a test day would get us somewhere, that would
> of course be better than doing nothing.  

I don't see this testing getting done without crowdsourcing it, and a testday is QA's best tool for doing that. Running a testday /might/ get us the results we need; not running a testday is /guaranteed/ to get you nothing.


> It seems to me -- and I'm not a QA engineer, so maybe I'm missing something
> here -- that testing 30 add-ons would take someone at most two days. 
> (That's a rate of 30 minutes per add-on.)

In theory, yes. In my experience, QA work ends up taking double any time estimate. That said, we don't have the resources to pull someone off of a project for a few days. I still think crowdsourced testing is your best route to results.

I appreciate your disappointment, but I'm doing the best I can.
I think that a test day is a good start if we can agree on the extent of the testing and the testing procedures. If it goes well, I think the add-ons team should be able to handle the ones that go untested, since the top 30 add-ons affect tens of millions of users and I agree that testing them is important.
I filed bug 771748 for testing the top 30 non-AMO add-ons.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Here's a snapshot of the etherpad file at the time when the bug was closed.
> Here's a snapshot of the etherpad file at the time when the bug was closed.

Looks like someone added a whole lot of random Chinese at the end.  You gotta love public etherpads...