Last Comment Bug 358591 - (amo-mxr) Ability to view and search source code of all extensions on addons.mozilla.org (MXR for add-ons)
(amo-mxr)
: Ability to view and search source code of all extensions on addons.mozilla.or...
Status: RESOLVED FIXED
[sg:want P2]
:
Product: mozilla.org Graveyard
Classification: Graveyard
Component: Server Operations (show other bugs)
: other
: All All
: P4 normal (vote)
: ---
Assigned To: Dave Miller [:justdave] (justdave@bugzilla.org)
: matthew zeier [:mrz]
Mentors:
: 388391 402566 (view as bug list)
Depends on: 522480
Blocks: 458794 519438 527843 528645
  Show dependency treegraph
 
Reported: 2006-10-29 02:25 PST by Jesse Ruderman
Modified: 2015-03-12 08:17 PDT (History)
27 users (show)
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Repo Creation Tool (4.10 KB, application/octet-stream)
2008-09-28 01:02 PDT, Basil Hashem [:baz]
no flags Details
list o' add-ons (88.96 KB, application/octet-stream)
2009-12-24 09:53 PST, Wil Clouser [:clouserw]
no flags Details

Description Jesse Ruderman 2006-10-29 02:25:36 PST
It would be nice to be able to search the JavaScript source code of all extensions on addons.mozilla.org.  I might use this to:

* Search for things that look like security holes, and notify authors about them.

* Search for explicit use of unsafe (non-wrapped) access to a page's DOM, so I can figure out if the API needs to be better to support specific patterns.  This could also improve the security of extensions.

* Get an idea of how many extensions will break when bug 47903 is fixed, and notify the authors of affected extensions.
Comment 1 Jesse Ruderman 2006-10-29 04:53:29 PST
This would be great for some quick memory-leak-pattern auditing, too.
Comment 2 Cameron 2006-10-29 08:30:12 PST
Sync files from releases.mozilla.org, unpack, index with a tool like lxr? Might want to have some fancy stuff where it tells us what versions are the latest for that listing, or if an extension has been deleted from the listings or renamed or whatever.

Some extensions have binary components, so you're not going to be able to look inside those.

I don't think this is going to be integrated into AMO, it'd probably be a separate webtool. 
Comment 3 Mike Shaver (:shaver -- probably not reading bugmail closely) 2006-11-01 09:13:10 PST
Yeah, this is a WONTFIX as an AMO bug.  If you're going to reopen, please pick another product as target!

(Can Google Code Search help us here?)
Comment 4 Justin Scott [:fligtar] 2006-11-01 10:07:40 PST
Just a note: I did write a page in Remora that allows for viewing the contents of an individual extension for the review queue, but in its current state it's not suitable for all extensions.

I would be willing to use that code in a new webtool for all extensions, but that would be after Remora launch.
Comment 5 Justin Scott [:fligtar] 2007-07-16 21:45:18 PDT
*** Bug 388391 has been marked as a duplicate of this bug. ***
Comment 6 Boris Zbarsky [:bz] 2007-07-16 22:01:06 PDT
So... I filed bug 388391 based on shaver telling Beltzner to do so.  Was there some miscommunication here somewhere?  ;)
Comment 7 Daniel Veditz [:dveditz] 2007-07-21 13:25:17 PDT
We need this tool, this type of question comes up at least once or twice every security update as we try to guess at the real-world impact of various regressions or intentional behavior changes introduced by security restrictions.

Please do not WONTFIX based on the plans or workload of the AMO team. Although that seems like a natural fit since AMO knows when each extension is updated, it can certainly be built elsewhere nearly as easily (as noted in comment 2).

Comment 8 Mike Shaver (:shaver -- probably not reading bugmail closely) 2007-07-22 08:57:51 PDT
It wasn't WONTFIXed based on the workload of the AMO team, it was WONTFIXed because coupling it to AMO adds risk that is very undesirable.

Bug 388391 was to be a bug for running the grep, I believe, at least as I intended to instruct beltzner.  (I thought it would be an IT bug, in fact, but that's cool.)  It was WONTFIX-dup'd as a request for a new tool to be added to AMO.
Comment 9 Justin Scott [:fligtar] 2007-11-05 10:21:40 PST
*** Bug 402566 has been marked as a duplicate of this bug. ***
Comment 10 Reed Loden [:reed] (use needinfo?) 2008-09-26 13:01:02 PDT
As per recent discussions on release-drivers@, drivers want a high-powered VM or real machine running a mxr-like instance that processes all add-ons. Site should be behind LDAP authentication.
Comment 11 Daniel Veditz [:dveditz] 2008-09-26 13:28:46 PDT
public would be really nice, but we can figure that out later -- please let's just get this working!
Comment 12 matthew zeier [:mrz] 2008-09-26 15:20:06 PDT
From internal discussions, preferable to use the same hardware that mxr is already using (or upgrade that host to be able to support both) instead of making a new host.

justdave - can you comment on which path is best and perhaps what additional resources the current host would need to support this?
Comment 13 Dave Miller [:justdave] (justdave@bugzilla.org) 2008-09-26 15:31:57 PDT
The CPU is almost maxed out on that box right now, it'll definitely need about twice the CPU horsepower that it has now.  It's a DL360 G4.  64-bit may prove efficient since it's a lot of data manipulation, which tends to benefit from wider data paths.

Currently has about 130 GB hard drive, will need about twice that, too.
Comment 14 Basil Hashem [:baz] 2008-09-28 01:02:02 PDT
Created attachment 340775 [details]
Repo Creation Tool

I actually wrote up a little proposal for how the public version of this might work, see https://wiki.mozilla.org/User:Bhashem/ViewAddonSources. Also, created bug 457474 to track progress against the public version of this tool.


I'd like to make a couple of notes for whomever plans to do this since I've looked at this already. If you are starting with the full archive of the add-ons such as the one off releases (e.g. http://releases.mozilla.org/pub/mozilla.org/addons/), then this repo includes ALL versions of every add-on. This includes disabled, old, patched, sanboxed experiments along with the "latest".

For a typical grep, you want to just include the "latest" version of an add-on since we can assume that the majority of users have upgraded to the latest (not always true but generally an OK assumption). In order to determine which one is the latest version of an add-on, you can:

1) Attempt to parse the file name - not reliable at all since version numbers are arbitrary and assigned by the author.

2) Query the latest AMO production database (perhaps using the AMO API - but that only covers public add-ons). E.g. using the releases repo and an out-of-date copy of the AMO DB (such as the one on machine khan) will result in xpi files that do not include the latest info. Another approach is to offer a specialize query API to the production DB.

3) Perform a two-step process, export a list of the most current version of an add-on from the production db that is then used to select items from the repo or create a new repo that just has the files of interest.

4) Attempt to use the archive as the only source. Generally works by opening up each directory and xpi file and attempt to select the most up to date XPI based on the individual file dates. (You cannot rely on .XPI file dates since those are based on the date copied to the repo).

Once the "filtered" repo is created, you need to "explode" the files from .xpi or .jar -> directory structure. In many cases, those xpi's contain .jar's themselves, so you'll want to explode those recursively.

Finally, you have an exploded repo that you can now do a deep search for particular string patterns. 

OK, enough rambling - I'm attaching some scripts that I've used to implement Option 4 suggested above. But if I were to do it again, I would use Option 3 despite the two step process.
Comment 15 Dave Miller [:justdave] (justdave@bugzilla.org) 2008-09-28 09:16:22 PDT
(In reply to comment #14)
> If you are starting with the full archive of the
> add-ons such as the one off releases (e.g.
> http://releases.mozilla.org/pub/mozilla.org/addons/), then this repo includes
> ALL versions of every add-on. This includes disabled, old, patched, sanboxed
> experiments along with the "latest".

That's a problem and needs to be fixed.  We've had mirrors get DMCA takedown notices more than once for addons that were copyright violations and had already been removed from the addons site for that reason, and never disappeared out of the addons directory.  They need to go away off the download server if they get disabled on the site.
Comment 16 matthew zeier [:mrz] 2008-10-02 23:44:45 PDT
(In reply to comment #13)
> Currently has about 130 GB hard drive, will need about twice that, too.

So, 260GB?  Going to do this with iscsi.
Comment 17 Reed Loden [:reed] (use needinfo?) 2008-10-02 23:50:42 PDT
(In reply to comment #16)
> (In reply to comment #13)
> > Currently has about 130 GB hard drive, will need about twice that, too.
> 
> So, 260GB?  Going to do this with iscsi.

How difficult is it to expand that in the future if needed? Can it be grown?
Comment 18 matthew zeier [:mrz] 2008-10-07 11:41:46 PDT
For justdave, dm-webtools04 up @ 10.2.74.220 waiting for work.
Comment 19 timeless 2008-10-08 03:36:42 PDT
fwiw, i created http://mxr-test.konigsberg.mozilla.org/mozdev/ for dveditz a couple weeks back.

is it possible to reach khan from konigsberg?
Comment 20 Reed Loden [:reed] (use needinfo?) 2008-10-08 11:28:35 PDT
(In reply to comment #19)
> is it possible to reach khan from konigsberg?

No. What do you need from khan?
Comment 21 Reed Loden [:reed] (use needinfo?) 2008-10-18 02:26:36 PDT
dm-webtools04 has been successfully set up, and mxr has been moved over to it.
Comment 22 Reed Loden [:reed] (use needinfo?) 2008-12-16 16:37:14 PST
Just an update... timeless and I are working together to get this running. Should have it up this week, hopefully.
Comment 23 matthew zeier [:mrz] 2009-01-20 13:42:55 PST
Moving to projects, gated on timeless (right?)
Comment 24 Samuel Sidler (old account; do not CC) 2009-06-01 19:15:41 PDT
This really would've been useful today as we were working on the Firefox 3.0.11 regression found during that cycle and trying to determine how many extensions this affects. I'm not sure where the hold up is now, but we did search using the setup in comment 19, but that's a bit out-of-date now.

Any chance we can get this moved up in priority? We don't *need* it too often, but when we do we really do *need* it. (And I'm sure there's lots of "want" times as well.)
Comment 25 Mark Finkle (:mfinkle) (use needinfo?) 2009-06-09 23:30:51 PDT
Would have been helpful today for tracking the affect of bug 497098
Comment 26 Justin Fitzhugh 2009-09-09 15:24:18 PDT
can we get this moving again?  Moving out of projects...
Comment 27 Justin Scott [:fligtar] 2009-09-09 15:34:41 PDT
AMO is planning a taggable search code repository for Q1 as part of the Add-on Developer Hub's second phase. If this is set up by then, we will happily contribute to and/or re-use this bug's creation. Otherwise we will be starting from scratch on this bug's scope + more sometime around then.
Comment 28 Justin Fitzhugh 2009-09-09 15:45:39 PDT
seems like we should use similar tools and infra as the rest of the project unless there are some features missing.  any reason why we can't use mxr for this too?
Comment 29 Jeremy Orem [:oremj] 2009-09-09 16:01:07 PDT
Is this an addons bug? It seems like the addons devs could better integrate a code search with the preexisting developer panels.
Comment 30 Justin Scott [:fligtar] 2009-09-09 16:12:40 PDT
@justin We haven't started evaluating tools yet, but we do plan on having more than mxr currently supports, most notably the ability to select certain lines of code and tag them as "opens a new tab" or "first run page" so that developers can see and search real, working code snippets.

I have no idea what mxr is written in or if we could easily build our additional functionality into it. I was just saying that we're planning on something similar for Q1 but haven't made any specific implementation plans yet, so if this bug is finished before then, that would certainly factor in to our plans.
Comment 31 Justin Fitzhugh 2009-09-09 16:17:42 PDT
how hard is this really?  If it's something that can be done in a few hours, seems like it would give a lot of people a tool they have been asking for for years.  The AMO part of Developer Hub sounds like a good idea, but an mxr would be a good stop gap in the mean time if it's trivial (or semi-trivial) to setup (in which case it would be an infra bug).
Comment 32 matthew zeier [:mrz] 2009-09-16 15:16:17 PDT
Resource constrained right now - it's on the q4 schedule.
Comment 33 Justin Fitzhugh 2009-10-02 08:03:43 PDT
given we are now in q4 - what's the eta?
Comment 34 matthew zeier [:mrz] 2009-10-02 09:08:31 PDT
Will have updated ETA on Monday.
Comment 35 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-10-02 10:19:51 PDT
So the main hangup with this is disk space.

There's currently 20 GB of addons stored in AMO's sandbox.  They're in xpi wrappers, which are pkzip-compressed internally.  pkzip has an average compression ratio between 2:1 and 3:1, which means on the upper bound we're looking at 60 GB of disk space just to unpack them all.  MXR's indexing usually eats up at least as much space as the source itself, and often more.  So I'd take a wild guess that we'd probably need about 150 GB of disk space to support the addons that are present currently, and that's not allowing for any growth.

There's currently 160 GB free on dm-webtools04 (the machine that currently hosts MXR).

Do we want each addon indexed individually, or all of them as a whole? (i.e. when you search for a variable name, do you want to search within all addons or only that specific one?).  Indexing them individually seems to make more sense, but greatly adds to the complication of setup because that means we need to automate adding/removing indexing jobs from MXR.

We would also need a script written to unpack all of the addons into a directory tree on a regular basis (probably daily).

dm-webtools04 already has a fairly-constrained CPU, with about 40 GB of total code to index on a multiple-times-daily basis.  It's possible, to avoid killing the machine, that we'd need to set up a separate instance of MXR on a different machine to run the indexing jobs then rsync the results back when it's done (which would also double the amount of required disk space in addition to adding more hardware).

To make a long story short: no, it's not trivial to hook this into MXR.
Comment 36 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-10-02 10:24:14 PDT
I see scripts that'll be usable attached to comment 14, so the only major hangup at this point would be CPU power.  Could just try it and see what happens.
Comment 37 Boris Zbarsky [:bz] 2009-10-02 10:27:46 PDT
> Do we want each addon indexed individually, or all of them as a whole? 

The main use case we have so far, I think, is "find all extensions that use a given interface or code pattern".  That argues for indexing as a whole, right?
Comment 38 Justin Scott [:fligtar] 2009-10-02 11:19:27 PDT
I think we only need to consider the files associated with the latest version of an add-on. That should reduce the size considerably, but may require a script to read AMO's db and copy the latest files to another location for MXR.
Comment 39 Jesse Ruderman 2009-10-02 18:04:32 PDT
If you want to search within a specific extension you can use file-path matching in addition to contents matching.
Comment 40 matthew zeier [:mrz] 2009-10-04 18:32:07 PDT
(In reply to comment #36)
> I see scripts that'll be usable attached to comment 14, so the only major
> hangup at this point would be CPU power.  Could just try it and see what
> happens.

Do you still have sufficient space?
Comment 41 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-10-04 19:13:04 PDT
(In reply to comment #40)
> Do you still have sufficient space?

If we do as suggested in comment 38, yes.  That'll require additional code to do the "checkout" part though (which might be best written by someone familiar with the innards of AMO rather than IT).
Comment 42 matthew zeier [:mrz] 2009-10-13 10:25:10 PDT
(In reply to comment #38)
> That should reduce the size considerably, but may require a
> script to read AMO's db and copy the latest files to another location for MXR.

Is that a "may" or "will" and is that on you guys?  ETA?
Comment 43 Justin Scott [:fligtar] 2009-10-13 10:47:07 PDT
It's a "will". Wil, can we write a script that copies the files of the latest version of all viewsource=1 add-ons from REPO_PATH to a specified directory for source viewing?
Comment 44 matthew zeier [:mrz] 2009-10-13 11:12:49 PDT
Great.  Let me know some sort of ETA.
Comment 45 Wil Clouser [:clouserw] 2009-10-15 08:15:13 PDT
I don't like the idea of latest version only.  That means our URLs won't be reliably bookmarkable.  If this is just a temporary solution until webdev comes up with something taggable it's not a huge deal though.

Additionally, by only copying the add-ons who wish to share their source (viewsource=1) we're not really fulfilling the goals of comment 24, comment 25, etc. since some (most?) add-ons don't have that flag set.

I've filed bug 522480 for webdev to write a script and we can start on it next week and have it done in a few weeks.  If my above comments change the requirements, let me know asap.
Comment 46 Mike Shaver (:shaver -- probably not reading bugmail closely) 2009-10-15 08:18:40 PDT
Yes, I think we should do all add-ons regardless of viewsource=1 and make it only available to a subset of people (admins, editors, and some security people if we don't want to just make them editors or admins).
Comment 47 Justin Scott [:fligtar] 2009-10-15 10:15:08 PDT
I assumed this was for a publicly accessible site. It's certainly fine to have a second site that is not public and has all files, but add-on developers have explicitly expressed concern about honoring the viewsource setting when we've talked about a public source code search tool before.

As for copying only the latest version, I don't think bookmarkable URLs to a particular add-on is too important for the main use case. And if it is really important, we could just rename the file to something consistent when it's copied.
Comment 48 Justin Scott [:fligtar] 2009-10-20 14:27:48 PDT
We need whoever the stakeholder of this bug is to decide which method we're going with so we can work on bug 522480 in our current AMO release.

The options are:
1. publicly-accessible repository with only add-ons that have enabled view source
2. private repository with all add-ons
3. both (may have space issues, judging from comment #35)

If this is mainly for the security team, it seems like it would be option 2 or 3.
Comment 49 Wil Clouser [:clouserw] 2009-10-22 14:02:44 PDT
(In reply to comment #48)
> We need whoever the stakeholder of this bug is to decide which method we're
> going with so we can work on bug 522480 in our current AMO release.

The current AMO milestone has begun and that bug is unassigned because we don't have a decision here.  If it misses this milestone the next one code freezes on Nov 24.

I talked to fligtar about this and it sounds like it doesn't fulfill the needs of the AMO tool-in-planning so it will only be for whoever needs it now.
Comment 50 Samuel Sidler (old account; do not CC) 2009-10-22 15:33:08 PDT
I'll make a decision! Branch drivers (and developers working on branch releases) really just need #2. If someone else is interested in #1 or #3, then that's fine, but from a security/release perspective, we need #2.
Comment 51 Asa Dotzler [:asa] 2009-10-26 13:06:52 PDT
Post def-con security talk makes this need very clear. jetpack also wants this for API analysis. brendan, chofmann, dveditz agree that this needs to move up in priority. Who can help? AMO team? Server Ops?
Comment 52 Justin Scott [:fligtar] 2009-10-26 14:41:05 PDT
Please read the bug comments... this is already being worked on in the current AMO release.
Comment 53 Wil Clouser [:clouserw] 2009-10-29 13:41:58 PDT
Script is done thanks to davedash.  http://svn.mozilla.org/addons/trunk/bin/latest_addon_extractor.py
Comment 54 Nick Nguyen [:osunick] 2009-11-05 13:41:03 PST
Pinging again- Shawn from the platform team would find this hugely helpful for clearing a gecko blocker.
Comment 55 chris hofmann 2009-11-05 14:57:43 PST
I wish we could stop using terms like "stakeholders."   there are just people that are trying to get their jobs done and are blocked by not having this.

we need this for trying to construct a comprehensive list of jetpack apis that are going to cover the highest pct of use cases.

we need this for quick decisions about what 3.6 end-game changes might potencially break addons, and/or what changes already made in 3.6 might affect which set of addons in the effort to get compatibility levels up for 3.6, like shawn mentioned in the particular bug he is looking at. sam needs this for maintainence releases as well.

we need this for the security checks we need to look at for things asa mentioned in comment 51.

has this been disconnected from the AMO milestone?  can it be, so progress might move faster?

even a one time snap shot of where we are today would be extremely valuable. 

then we could get it updating with the latest data daily.

then we could see if its feasible to provide history.
Comment 56 Wil Clouser [:clouserw] 2009-11-05 15:00:04 PST
> has this been disconnected from the AMO milestone?  can it be, so progress
> might move faster?

The AMO team's contributions were finished with comment 53.  This is an IT bug.
Comment 57 matthew zeier [:mrz] 2009-11-05 15:02:54 PST
What Wil said and this won't be done until after Bugzilla 3.4 is pushed out.  That's justdave's only priority right now.
Comment 58 chris hofmann 2009-11-05 15:06:39 PST
rough guess on when that will be?
Comment 59 chris hofmann 2009-11-09 13:40:25 PST
looks like Bug 519438 -  Frequent crashes in nsAppShell::ProcessNextNativeEvent after mozilla::storage::Connection::Close()  -- a fx 3.6 blocker depends on this bug.
Comment 60 Justin Scott [:fligtar] 2009-11-09 14:51:31 PST
It doesn't depend on this bug -- we manually grep for patterns upon request when people ask us to, usually by filing a bug in addons.mozilla.org::Administration.
Comment 61 chris hofmann 2009-11-10 15:25:53 PST
I think many would disagree with that if 

a) few or none know about the need to file administration bugs and 

b) its too slow and inefficient to do many searches  

or c) do them interactively when the results from one search inspire other searches that need to be run to drill into a problem, or how widespread the problem might be.
Comment 62 matthew zeier [:mrz] 2009-11-10 15:36:10 PST
Should be done with Bugzilla 3.4 this weekend and then start on this bug.
Comment 63 Mike Beltzner [:beltzner, not reading bugmail] 2009-11-19 14:27:34 PST
That's great to hear. We might want to figure out if we can do this every 2-4 months or so, actually, or if it's just as easy to do one-off requests, create a nice front end that lets us request that and get an email of the results?

Can't stress enough how much this matters as we come up to a release, minor or major.
Comment 64 Daniel Veditz [:dveditz] 2009-11-20 10:22:36 PST
Do what every 2-4 months? This bug is about setting up a regularly-updated mxr-like service. If the indexing mechanism is a full-AMO sweep then we'd want it updated at least weekly if not daily. If it's triggered by individual addon updates that's cool, but not if it complicates the code such that it delays the availability of a usable mxr-like service.

Actually let me take that in two parts
 1. mxr-like service for today's addons
 2. regularly updated.

I would be overjoyed to get #1 ASAP with a manually triggered one-time index while we work out #2.
Comment 65 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-11-20 14:38:13 PST
I was planning to give it the usual MXR treatment...  if the pull script is sufficient, we'll get it going daily.  I'll be playing with it in the next couple days (probably have an initial deployment tonight, whether it updates depends on how good the pulls script works. :)
Comment 66 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-11-20 14:56:00 PST
ok, the script wants a source repo, but doesn't explain much about it...  I'm assuming that's the "files" directory?
Comment 67 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-11-20 15:00:18 PST
oops, didn't mean to reassign there.
Comment 68 Dave Dash [:davedash, :dd] (assign all bugs to mbrandt) 2009-11-20 16:53:56 PST
> ok, the script wants a source repo, but doesn't explain much about it...  I'm
assuming that's the "files" directory?(In reply to comment #66)

All directories are "files" directories ;)

But yes, it's the root path of where the xpi file live.

Thanks,

-d
Comment 69 matthew zeier [:mrz] 2009-11-24 19:30:56 PST
What's the ETA to wrap this up?
Comment 70 Dave Dash [:davedash, :dd] (assign all bugs to mbrandt) 2009-11-25 09:44:25 PST
justdave,

Let me know if you need anything from me you can ping me on IRC.
Comment 71 Justin Fitzhugh 2009-11-29 21:02:01 PST
can we please get a hard date to expect this done by?
Comment 72 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-11-30 14:03:26 PST
I have no idea how long it's going to take because it involves basically experimenting with it until I find something that works.  That said, I did work on it a bunch last Tuesday, couldn't work on it Wednesday because of too many production issues I had to firefight, and had Thanksgiving weekend off.  Based on what I've figured out so far, I'm thinking the range is "in the next couple days" though, not weeks away or anything.
Comment 73 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-12-03 15:51:46 PST
This is all set up and running now, except that the extraction script isn't unpacking the source.  It's otherwise automated and hooked into MXR now however.

I've reopened bug 522480 to get the script updated, should be a matter of svn up to fix it once that's done.
Comment 74 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-12-03 15:52:42 PST
Oh, it's at http://mxr.mozilla.org/addons/source/ fwiw.
Comment 75 Samuel Sidler (old account; do not CC) 2009-12-03 16:00:05 PST
So I assume this is #1 instead of #2?

> 1. publicly-accessible repository with only add-ons that have enabled view
> source
> 2. private repository with all add-ons
Comment 76 Justin Scott [:fligtar] 2009-12-03 16:01:00 PST
This shouldn't be publicly accessible, for the reasons in comment #48 and comment #50 -- the script copies all add-ons regardless of whether they have set to not have their source publicly viewable on the web.
Comment 77 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-12-03 17:01:34 PST
(In reply to comment #76)
> This shouldn't be publicly accessible, for the reasons in comment #48 and
> comment #50 -- the script copies all add-ons regardless of whether they have
> set to not have their source publicly viewable on the web.

OK, MXR is now available over SSL as well, and the /addons directory will force you over to SSL and ask for LDAP authentication.  It's locked down so only folks who can also access the Intranet wiki can access it.  If there's a better group to lock it down to, or if we should create a new one for it, let me know.
Comment 78 Samuel Sidler (old account; do not CC) 2009-12-03 18:27:58 PST
I'd say the security group is a better group. Not everyone at Mozilla should have access to this site for the same reasons we don't give everyone admin access to crash-stats.
Comment 79 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-12-03 18:52:37 PST
ok, done.
Comment 80 dwitte@gmail.com 2009-12-03 19:53:30 PST
(In reply to comment #78)
> I'd say the security group is a better group.

Wait, why? What's sensitive in this information? I, for one, want to use this for regular development purposes, and I don't see why source code indexing for addons should be tied solely to security.
Comment 81 Dave Townsend [:mossop] 2009-12-03 19:56:22 PST
Certainly one of the reasons for having this index is to help us make decisions on how API changes and bugs will impact extensions, it would seem to me that only allowing the security group is a little too restrictive here though I'm not sure if there is a better one.
Comment 82 Reed Loden [:reed] (use needinfo?) 2009-12-03 20:04:50 PST
(In reply to comment #81)
> Certainly one of the reasons for having this index is to help us make decisions
> on how API changes and bugs will impact extensions, it would seem to me that
> only allowing the security group is a little too restrictive here though I'm
> not sure if there is a better one.

How about making it tied to mozilla-central commit access? Does that cover most people?
Comment 83 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-12-03 20:48:18 PST
Maybe we need two copies of this, one with the "view-source disabled" extensions not included.  From my understanding, that's the main reason this got secured, because it includes extensions whose authors have asked that their source not be shared.
Comment 84 Dave Dash [:davedash, :dd] (assign all bugs to mbrandt) 2009-12-03 20:59:03 PST
(In reply to comment #83)
> Maybe we need two copies of this, one with the "view-source disabled"
> extensions not included.  From my understanding, that's the main reason this
> got secured, because it includes extensions whose authors have asked that their
> source not be shared.

I'd rather just have one.  We're really trying to audit how the extensions framework is (mis)used and it has nothing to do with view source... perhaps a new ldap group is warranted for this?
Comment 85 Wil Clouser [:clouserw] 2009-12-03 21:00:51 PST
Maybe we should remove the 'view source' option from AMO altogether since it's only cosmetic and not preventing anyone from viewing the source manually.  Perhaps it's even giving authors false assurance.
Comment 86 :Gavin Sharp [email: gavin@gavinsharp.com] 2009-12-03 21:09:19 PST
I think offering the "publically viewable source" checkbox for addon authors is providing a false sense of security, given that all of these addons are available on our FTP server (and on user's computers once installed)... Would it be contentious to remove the option and just make all addons' source publically viewable (on AMO and this MXR)?

I understand that this may make it easier for people to inadvertently violate the license terms of non-open-sourced addons (we allow those, right?), but I'm not sure that's actually a problem that we need to worry about...
Comment 87 chris hofmann 2009-12-03 21:13:36 PST
Any decisions about access should be reached after a review of any agreements that we have with addon developers.   The "committer" and "security-group" both could be viewed as having "public" and "private" components.  Thats how the mozilla project rolls...

This agreement on the addon site
https://addons.mozilla.org/en-US/developers/docs/policies/agreement

with this clause...

"...you grant to Mozilla and its Affiliates a non-exclusive, worldwide, royalty-free, sublicensable license to distribute, transmit, reproduce, publish, publicly and privately perform and display and otherwise use the AMO Contribution, "

would seem to indicate that we have the ability to display or use the contributions.

Are there other agreements that come into play?

When I heard that we needed to protect some addons, I was thinking about obfuscated .js and source for binaries that comes out of band to jorge, and editors.    None of that is in this set up, correct?

We need to examine exactly what it is that we are protecting, and need to protect.

If these are the same .xpi files that we distribute of amo then just taking those and unzipping them and putting the same source that could be view by any users under mxr seems like we have not really exposed anything.
Comment 88 Wil Clouser [:clouserw] 2009-12-03 21:47:05 PST
(In reply to comment #87)
> When I heard that we needed to protect some addons, I was thinking about
> obfuscated .js and source for binaries that comes out of band to jorge, and
> editors.    None of that is in this set up, correct?

Correct.

> If these are the same .xpi files that we distribute of amo then just taking
> those and unzipping them and putting the same source that could be view by any
> users under mxr seems like we have not really exposed anything.

They are the same files.
Comment 89 timeless 2009-12-04 01:48:41 PST
so, if necessary, it would be possible for the mxr search/ident/find and source scripts to be modified so that they read a file to decide whether certain paths are avaiable for an http://mxr.mozilla.org/addons/ view vs. an https://mxr.mozilla.org/addons/ view.

Roughly a file in the mxr index directory could contain a list of paths which are private and when search/ident/find prepare to display a path, they could check that list and if the url is http: instead of https:, they could bump a counter and skip displaying. At the bottom of the page, they could say:

X files skipped, they a <a href="https://$thispath">available to a restricted group</a>

For /source/ the rule would basically generate an error page which similar content.

-----
Personally, I think that restricting mxr is silly and that telling developers that by publishing their applications they can keep their source confidential is disingenuous. iirc when I setup an mxr for addons on konigsberg, i didn't didn't use private information, it was just an exercise in screen scraping, the files are as noted in comment 88 just files, and it seems that our licensing terms according to comment 87 don't preclude us from doing this.

that said, i understand there are political concerns to enabling for everyone to see the source because some vendors seem to want their code to be treated as a locked box.
Comment 90 Justin Scott [:fligtar] 2009-12-04 10:57:04 PST
This bug is about giving our application developers and the security team the tools they need to search the source of the add-ons we have available. If someone wants to file a separate bug in AMO about changing the way we handle the view source options in general, please do so so that discussion can be had there. Changes to that aren't required to meet the needs of this bug.

For this bug, I'd like anyone with mozilla-central commit access or within MoCo to be able to access the site. It should not be open to the whole world or crawlers, but anyone from the community who commits code should be able to use this tool.

As for the extraction piece, I don't think we realized the script needed to extract the add-ons too. My bash skills aren't good, but this is what I'd been using to extract add-ons copied from the script for manual greps on khan: http://fligtar.pastebin.mozilla.org/688344 Can we just use something like that, or do we really need AMO copy script updated?
Comment 91 Dave Dash [:davedash, :dd] (assign all bugs to mbrandt) 2009-12-04 10:58:57 PST
fligtar,

I almost have a patch ready for the extraction script.  I'll update the bug and :justdave when it's finished.
Comment 92 chris hofmann 2009-12-08 16:50:17 PST
what is the latest status on this. 

looks like things are being unziped, but search for something simple like Adblock
with https://mxr.mozilla.org/addons/search?string=Adblock  doesn't turn any results.
Comment 93 Dave Dash [:davedash, :dd] (assign all bugs to mbrandt) 2009-12-08 16:58:30 PST
There's been a few hangups on my end regarding the unzipping, I'll take care of it hopefully by tomorrow.
Comment 94 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-12-08 17:06:12 PST
I just manually indexed the little bit of content that did successfully unpack before the unzipper crashed, so you've got partial stuff anyway.
Comment 95 chris hofmann 2009-12-11 12:40:04 PST
any updates on this?  looks like another run was attempted a few days ago, but no progress made since dec 7.

also, is there a way to drop a status file that indicates the intention of the addon author to show the source publicly?   I notice that for some addon's the license info is dropped in the root directory and that will help anyone looking at the source to make decisions about disclosing/not disclosing information found in the search in bugs or other places that the research made lead to postings.  

https://mxr.mozilla.org/addons/source/12/license.txt is an example.

having the "public source status" will add to this and also help in the cases where there is no license.
Comment 96 chris hofmann 2009-12-21 18:31:40 PST
any update on this?   does the fact that bug 522480 has been fixed mean an update of the script need to be pushed?
Comment 97 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-12-21 20:11:35 PST
(In reply to comment #96)
> any update on this?   does the fact that bug 522480 has been fixed mean an
> update of the script need to be pushed?

That's quite likely...

/me pulls an svn update and gives it a try...
Comment 98 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-12-21 20:25:44 PST
Well, it crashed before finishing, so I reopened bug 522480.

But it got quite a way before it crashed...  it looks like it's unpacking the .xpis.  browsing around a bit I suspect we probably want to also unpack the .jar files that are inside the .xpis, right? (It's not doing that yet)
Comment 99 timeless 2009-12-22 00:14:07 PST
you do...
Comment 100 Daniel Veditz [:dveditz] 2009-12-22 11:43:23 PST
> I suspect we probably want to also unpack the .jar files that are inside the
> .xpis, right?

We do, yes. Thought that was explicit in this bug but maybe it was in a mail thread (fligtar pasted the script he used, if that rings any bells).
Comment 101 Dave Dash [:davedash, :dd] (assign all bugs to mbrandt) 2009-12-22 14:43:02 PST
justdave, please try this again, I rewrote the code so it unzips any jars in the xpi... if we need to do something more let me know, I'd like to have this out of my hands soon, since I'm out of town starting Thu.
Comment 102 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-12-22 17:59:29 PST
started this running about an hour ago... it hasn't errored out yet, which is promising.  I'll post again when it either crashes or finishes.
Comment 103 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-12-22 19:24:03 PST
It's still running, but we have visible results on the website already.

Looks mighty pretty :)

I do notice this though: https://mxr.mozilla.org/addons/source/10009/

Looks like we need to recursively unpack xpi, too.
Comment 104 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-12-22 19:24:53 PST
(and if ddash is already gone I can probably figure that part out, because I imagine it's probably as simple as adding the xpi extension to wherever the existing comparison is being done looking for jar).
Comment 105 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-12-22 19:28:30 PST
Actually, those are in the top-level directory, maybe I should sit and wait a bit.  It may be the script hasn't gotten to them yet after the initial copy.  I should probably wait until it's completely done before deciding if the results are accurate.
Comment 106 Dave Dash [:davedash, :dd] (assign all bugs to mbrandt) 2009-12-22 19:48:19 PST
added embedded xpis to the list of things to unzip.

svn up
Comment 107 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-12-22 21:50:10 PST
the run from before hasn't completed yet.  no output, wonder what it's doing. :)
Aborted and restarted with the new version.
Comment 108 chris hofmann 2009-12-23 13:14:48 PST
dvedtiz spotted that we are grabbing personas and that is inflating the top level directory counts.  If there is an easy way to put personas under a different directory tree like https://mxr.mozilla.org/personas or just ignore them for now that will help them from clogging up the addon list and keep them from slowing down searches and navigation to from the top level directory.
Comment 109 Dave Dash [:davedash, :dd] (assign all bugs to mbrandt) 2009-12-23 13:27:54 PST
(In reply to comment #108)
> dvedtiz spotted that we are grabbing personas and that is inflating the top
> level directory counts.  If there is an easy way to put personas under a
> different directory tree like https://mxr.mozilla.org/personas or just ignore
> them for now that will help them from clogging up the addon list and keep them
> from slowing down searches and navigation to from the top level directory.

I'm not seeing where we are grabbing personas.  I am specifically looking for addons that are "extensions" and not personas/themes/etc.

Is there a URL showing this, and I can drill down from there.
Comment 110 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-12-24 02:50:19 PST
just grabbed the output from the last trial run of the script:

Updating addons...

real    16m38.033s
user    1m29.655s
sys     0m43.592s

real    609m15.564s
user    514m59.929s
sys     0m39.256s

real    134m33.566s
user    3m20.163s
sys     0m36.210s

real    0m1.439s
user    0m0.646s
sys     0m0.084s

The first time counts are from ddash's script unpacking everything, the other three are the three phases of generating indexes.  So it takes about 15 minutes to unpack it all and then 12 hours to index it.
Comment 111 chris hofmann 2009-12-24 09:04:46 PST
haven't seen any personas yet in poking around.

I think I get https://mxr.mozilla.org/addons/source/15052/ as the highest addon.  Does that sound right?

I might be good to break up into directories of 1000 that would allow navigation.  having 15000 top level directories just draws a blank page for me.
Comment 112 Wil Clouser [:clouserw] 2009-12-24 09:53:23 PST
Created attachment 419131 [details]
list o' add-ons


> I think I get https://mxr.mozilla.org/addons/source/15052/ as the highest
> addon.  Does that sound right?

No.  We had ~15000 add-ons or so, and then imported getpersonas which bumped us up to ~45000.  That was a couple months ago, so after that point there are a bunch more add-ons.  I'm attaching the output of "select id,addontype_id from addons where addontype_id in (1,2)".  1=extension, 2=theme.
Comment 113 matthew zeier [:mrz] 2009-12-30 11:49:19 PST
What's outstanding on this bug?  Are we done yet (can I close it?)?
Comment 114 chris hofmann 2009-12-30 13:43:25 PST
I think the original request has been met, but we probably need follow up bugs to make search work across so many addons like the suggestion in commment 111, some kind of alpha order break up, breaking up by ranking, or supported version (3.5.* etc.) might be better ways to organize the files so the searches could be more reliable and catch the most important results.

right now any interesting search for things like eval( or XMLHttpRequest to find possible abuse gets "Too many hits, displaying the first 1000"

We also need to figure out a way to get these sources public as suggested in comment 90 and other areas of this bug so we can apply open source development practices that have served us well with the core mozilla code and start to build a larger set of eyeballs on addon source.

If access is restricted to security group, one additional thing we might do in this bug is to get access for all the current AMO editors.  That would be a good way to start to build more eyeballs looking a the source across addons.
Comment 115 Dave Miller [:justdave] (justdave@bugzilla.org) 2010-01-03 00:23:09 PST
(In reply to comment #114)
> I think the original request has been met, but we probably need follow up bugs
> to make search work across so many addons like the suggestion in commment 111

As of now, this has been added to the daily cron job, so it will auto-update daily now.  Prior to tonight I'd been kicking it off manually every so often.  So with that done, we'll consider this resolved and you can file followup bugs for the individual issues that still need dealing with.

Note You need to log in before you can comment on or make changes to this bug.