848969 - Make a virtual machine image of BMO available

Liz Henry (:lizzard) (relman/hg->git project)

Reporter

Description

•

11 years ago

I would love to have an image of bugzilla.mozilla.org with the latest public data dump, available for community contributors and for researchers, that we could run in VirtualBox or VMware.

It would be helpful to lower the barrier to entry for potential b.m.o. extension developers and people who might contribute patches.

David Lawrence [:dkl]

Comment 1

•

11 years ago

As soon as I can get VirtualBox to not crash my system, I will create a CentOS 6 VM with anonymous BMO code already checked out and running on a SQLite database. Then I will create a Vagrant box that can be used to get people up and running quickly.

http://www.vagrantup.com/

OS: Mac OS X → All

Hardware: x86 → All

Liz Henry (:lizzard) (relman/hg->git project)

Reporter

Comment 2

•

11 years ago

justdave had the really good point that we should sanitize the email data in some way in case someone reconfigures it to send live mail instead of writing it to disk.

David Lawrence [:dkl]

Comment 3

•

11 years ago

(In reply to Liz Henry :lizzard from comment #2)
> justdave had the really good point that we should sanitize the email data in
> some way in case someone reconfigures it to send live mail instead of
> writing it to disk.

This makes me think we should be sanitizing the email addresses as part of the sanitize script as well, for all dumps. Maybe using some random hash string in place of the example.com part of the login. This way the address can't be reused, and it also protects against the duplicate logins we would get from using only the first part of the email.
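The hashed-domain idea could look roughly like the sketch below. It is an illustration only, not the actual sanitize script: the function name `sanitize_login`, the per-dump random salt, the 16-character digest length, and the reserved `.invalid` suffix are all assumptions made for this example.

```python
import hashlib
import secrets


def sanitize_login(login: str, salt: bytes) -> str:
    """Keep the local part, replace the domain with a salted hash.

    Hypothetical helper: the salt is meant to be generated once per dump
    and thrown away, so the mapping cannot be recomputed from the dump.
    """
    local, sep, domain = login.rpartition("@")
    if not sep:
        raise ValueError(f"not an email address: {login!r}")
    digest = hashlib.sha256(salt + domain.lower().encode("utf-8")).hexdigest()[:16]
    # ".invalid" is a reserved TLD, so the result is never deliverable.
    return f"{local}@{digest}.invalid"


if __name__ == "__main__":
    salt = secrets.token_bytes(16)  # one random salt for the whole dump
    for login in ("dkl@mozilla.com", "dkl@example.org"):
        print(login, "->", sanitize_login(login, salt))
```

Because the domain is hashed rather than dropped, two users who share a local part on different domains still end up with distinct logins.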

dkl

David Lawrence [:dkl]

Comment 4

•

11 years ago

(In reply to David Lawrence [:dkl] from comment #3) 
> This makes me think we should be sanitizing the email addresses as part of
> the sanitize script as well for all dumps. Maybe using some random hash
> string in replacement of the example.com part of the login. This way it
> can't be reused and also it protects against duplicate logins from using on
> the first part of the email.

Or simply setting the 'disable_mail' column to true in the profiles table, but that could still be turned back on per user by the person with the db dump. The hashed email addresses would still be safer IMO.

dkl

Mike Hoye [:mhoye]

Comment 5

•

11 years ago

We had this debate when we were discussing dropping the researchers' agreement policy for the dump we've already made public as well. Consensus was that these email addresses are already public-facing, so don't bother.

Anyone who wants them can scrape them from the site, so obscuring them is basically like adding a DRM layer to the data. A crappy inconvenience for honest researchers, no barrier at all for people who wish to abuse their access.

Mike Hoye [:mhoye]

Comment 6

•

11 years ago

Perhaps more precisely - I've already got legal, security and privacy's OK to include them as-is.

David Lawrence [:dkl]

Comment 7

•

11 years ago

(In reply to Mike Hoye [:mhoye] from comment #5)
> We had this debate when were discussing dropping the researchers' agreement
> policy for the dump we've already made public as well. Consensus was that
> these email addresses are already public-facing, so don't bother; obscuring
> the
> 
> Anyone who wants them can scrape them from the site, so obscuring them is
> basically like adding DRM layer to the data. A crappy inconvenience for
> honest researchers, no  barrier at all for people who wish to abuse their
> access.

Yeah, I am not against having the email addresses there; I was just thinking of a way to lessen the risk of those users being sent lots of email by mistake. So maybe just setting all users to have email disabled in the profiles table will be good enough for now.

dkl

Mike Hoye [:mhoye]

Comment 8

•

11 years ago

Can you elaborate on what risk to our users you perceive, there? I don't follow.

Liz Henry (:lizzard) (relman/hg->git project)

Reporter

Comment 9

•

11 years ago

Exactly -- I'm not concerned with the privacy aspect of it, or against including the emails. I'm worried about the "ease of accidentally turning on live bugmail in the configuration" issue.   

I can't see someone doing it on purpose, but it might be turned on by accident, which could generate a lot of bugmail for people from test instances. For example, if I were developing an extension that sent mail to people (as I in fact want to do!) inviting them to triage a random bug, I might turn on mail thinking my test bug would email only me. But it would email the giant list of people who get cc'ed on every bug in that component.

David Lawrence [:dkl]

Comment 10

•

11 years ago

(In reply to Mike Hoye [:mhoye] from comment #8)
> Can you elaborate on what risk to our users you perceive, there? I don't
> follow.

Mostly a technical issue, but if for some reason someone imports the DB into a working Bugzilla code checkout and accidentally enables email delivery, a lot of users could be sent emails. They might think it was from BMO, but I am sure they would be wondering why they have email from a source they are not aware of. It only takes a one-line change in data/params to enable email delivery.

dkl

Liz Henry (:lizzard) (relman/hg->git project)

Reporter

Comment 11

•

11 years ago

How about replacing the @ in the existing email addresses in the sanitized db with some other character or string, so that they aren't valid email addresses?  

That way, it isn't like deleting half the email address, but it would not send out accidental bugmail even if mail were turned back on in the profiles table (for testing, by the person who had downloaded the image and db).

Liz Henry (:lizzard) (relman/hg->git project)

Reporter

Comment 12

•

11 years ago

That way:
- the data would still be there (which I think is your concern, mhoye) 
- it could be easily restored
- the problem we're worried about would not happen
- Actual emails could still be put into the system by the user, for email testing.

David Lawrence [:dkl]

Comment 13

•

11 years ago

(In reply to Liz Henry :lizzard from comment #12)
> That way:
> - the data would still be there (which I think is your concern, mhoye) 
> - it could be easily restored
> - the problem we're worried about would not happen
> - Actual emails could still be put into the system by the user, for email
> testing.

That works for me. 

dkl@mozilla.com => dkl_at_mozilla_dot_com

dkl

Mike Hoye [:mhoye]

Comment 14

•

11 years ago

Works for me as well.

Thanks!

Gervase Markham [:gerv]

Comment 15

•

11 years ago

We should use a reversible transform, for the sake of the researchers. _ is a valid char in email localparts; so the following email addresses have clashing mappings:

gerv.markham@gerv.net
gerv_dot_markham@gerv.net

Can we simply replace the @ with _at_? That achieves the goal, and is reversible by looking for the first occurrence of "_at_" starting at the right hand end of the string, because it's guaranteed to be present but guaranteed not to be present in the domain name, as _ is not valid in Internet domain names.
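As a rough illustration of the point above, the sketch below shows the clash produced by rewriting both "@" and ".", and the reversible variant that rewrites only the "@". The function names are made up for this example; nothing here is taken from the real sanitize script.

```python
def naive_obfuscate(address: str) -> str:
    """Rewrite both '@' and '.', as in dkl@mozilla.com => dkl_at_mozilla_dot_com."""
    return address.replace("@", "_at_").replace(".", "_dot_")


def obfuscate(address: str) -> str:
    """Replace only the '@' between local part and domain with '_at_'."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}_at_{domain}"


def restore(obfuscated: str) -> str:
    """Undo obfuscate() by splitting on the right-most '_at_'.

    This relies on '_' never appearing in the domain, so the last
    '_at_' in the string is always the one that replaced the '@'.
    """
    local, sep, domain = obfuscated.rpartition("_at_")
    if not sep:
        raise ValueError(f"no '_at_' marker in {obfuscated!r}")
    return f"{local}@{domain}"


if __name__ == "__main__":
    a, b = "gerv.markham@gerv.net", "gerv_dot_markham@gerv.net"
    print("naive mapping clashes:", naive_obfuscate(a) == naive_obfuscate(b))
    for address in (a, b, "a_at_b@example.org"):
        print(address, "->", obfuscate(address), "->", restore(obfuscate(address)))
```

Searching from the right matters because the local part may itself contain "_at_", while the domain cannot.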

Gerv

:glob ✱

Comment 16

•

11 years ago

(In reply to Gervase Markham [:gerv] from comment #15)
> We should use a reversible transform, for the sake of the researchers.

if a reversible transform is used, i don't see how that would prevent spammers from applying the same transformation to get the correct addresses.

> Can we simply replace the @ with _at_?

no, that would result in the addresses being invalid according to bugzilla's emailregexp setting.

Gervase Markham [:gerv]

Comment 17

•

11 years ago

(In reply to Byron Jones ‹:glob› from comment #16)
> (In reply to Gervase Markham [:gerv] from comment #15)
> > We should use a reversible transform, for the sake of the researchers.
> 
> if a reversible transform is used, i don't see how that would prevent
> spammers from applying the same transformation to get the correct addresses.

This is not about anti-spam - see comments 5 and 6. It's about making sure people don't accidentally get email from test instances.

> > Can we simply replace the @ with _at_?
> 
> no, that would result in the addresses being invalid according to bugzilla's
> emailregexp setting.

But I am merely suggesting a reduced form of what's suggested and agreed in comment 13 and comment 14. How could that solution be OK but mine not?

Gerv

David Lawrence [:dkl]

Comment 18

•

11 years ago

(In reply to Byron Jones ‹:glob› from comment #16)
> if a reversible transform is used, i don't see how that would prevent
> spammers from applying the same transformation to get the correct addresses.

I guess that is the point. Anything we do to change the email addresses can easily be reversed by the spammer so there is no point in doing it. 

Or if we are worried about spammers, we replace the domain portion with a randomly generated key of some kind.
 
> > Can we simply replace the @ with _at_?
> 
> no, that would result in the addresses being invalid according to bugzilla's
> emailregexp setting.

That can be fixed by adjusting the emailregexp value in the data/params file installed on the VM.
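To make the regex point concrete, here is a small sketch. Both patterns are illustrative assumptions written for this example; neither is claimed to be BMO's actual emailregexp value, and Python's `re` stands in for the Perl regex engine Bugzilla uses.

```python
import re

# Hypothetical "strict" pattern that insists on a literal '@'.
STRICT = re.compile(r"^[\w.+\-=']+@[\w.\-]+\.[\w\-]+$")

# Hypothetical relaxed pattern for the VM: accepts '@' or the '_at_' marker,
# so sanitized logins validate and real test addresses can still be added.
RELAXED = re.compile(r"^[\w.+\-=']+(?:@|_at_)[\w.\-]+\.[\w\-]+$")


def is_valid(pattern: re.Pattern, login: str) -> bool:
    """Return True if the login matches the given emailregexp-style pattern."""
    return pattern.match(login) is not None


if __name__ == "__main__":
    for login in ("gerv.markham@gerv.net", "gerv.markham_at_gerv.net"):
        print(login, "strict:", is_valid(STRICT, login), "relaxed:", is_valid(RELAXED, login))
```

Accepting both forms in the relaxed pattern is a design choice for this sketch: it keeps the sanitized logins usable while still letting a developer enter a real address for email testing.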

(In reply to Gervase Markham [:gerv] from comment #17)
> This is not about anti-spam - see comments 5 and 6. It's about making sure
> people don't accidentally get email from test instances.
> 

LpSolit would disagree with this point. See bug 840043. If just altering the email addresses to be unusable is not that difficult, why not just do it? Do the researchers really need valid emails for their metrics to come out right?

dkl

:glob ✱

Comment 19

•

11 years ago

(In reply to Gervase Markham [:gerv] from comment #17)
> This is not about anti-spam - see comments 5 and 6. It's about making sure
> people don't accidentally get email from test instances.

oh, sorry gerv.  there's another very similar discussion happening about this regarding anti-spam.

the sanitise script has already been updated in bug 855846 to address that issue -- bugmail is simply disabled for all accounts, and flag-cc's removed.
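A minimal sketch of the "disable bugmail for all accounts" step might look like the following. It uses an in-memory SQLite table as a stand-in; only the `profiles` table and `disable_mail` column names come from this bug, while the other columns, the sample rows, and the helper names are assumptions. It is not the code from bug 855846, and it does not attempt the flag-cc removal.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for the profiles table (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE profiles ("
        " userid INTEGER PRIMARY KEY,"
        " login_name TEXT NOT NULL,"
        " disable_mail INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO profiles (login_name, disable_mail) VALUES (?, ?)",
        [("alice@example.com", 0), ("bob@example.org", 1), ("carol@example.net", 0)],
    )
    conn.commit()
    return conn


def disable_all_bugmail(conn: sqlite3.Connection) -> None:
    """Set disable_mail for every account in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE profiles SET disable_mail = 1")


def mail_enabled_count(conn: sqlite3.Connection) -> int:
    """Count accounts that would still receive bugmail."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM profiles WHERE disable_mail = 0"
    ).fetchone()
    return count


if __name__ == "__main__":
    db = make_demo_db()
    print("mail-enabled accounts before:", mail_enabled_count(db))
    disable_all_bugmail(db)
    print("mail-enabled accounts after:", mail_enabled_count(db))
```

As noted in comment 4, whoever holds the dump can flip the column back per user, so this guards against accidents rather than deliberate misuse.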

Liz Henry (:lizzard) (relman/hg->git project)

Reporter

Updated

•

6 years ago

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → INCOMPLETE

Dylan Hardison [:dylan] (he/him)

Comment 20

•

6 years ago

Apparently I resolved this ~2 years ago with vagrant. :-)

Resolution: INCOMPLETE → FIXED

Categories

(bugzilla.mozilla.org :: General, enhancement)

People

(Reporter: lizzard, Unassigned)

Security

(public)
