Closed Bug 1146897 Opened 9 years ago Closed 8 years ago

Export public json files produced by ship-it

Categories

(Release Engineering :: Applications: Shipit, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Sylvestre, Assigned: ericz)

References

Details

Attachments

(1 file)

ship-it is only reachable over VPN and requires specific permissions.

Now that ship-it is generating (almost) all the same things as the product details (Bug 1083718), we need to figure out a clean way to publish these json files to the public channel.

I see several solutions:

* Set up a new instance of ship-it just for the json export. This instance would use the same database as production (it only needs SELECT permissions), with the ship-it interface disabled, but would live outside of the vpn.

* Hack ship-it to push the json files regularly (or on each change in the db)

* Set up a cron job on a server (with vpn access) to publish the files somewhere

* Move ship-it outside of the VPN (probably the worst idea) :)

Any other ideas?
Ben, Rail, any suggestions?
Flags: needinfo?(rail)
Flags: needinfo?(bhearsum)
I like pushing the data to some public URL on change and/or by cron. Cron would probably be easier (no need to think about long-running operations triggered from the browser).

Publishing to S3 would be the easiest option, I think.
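For example, the publish side could be a one-liner along these lines (a sketch only; the bucket name and local directory are placeholders, and it assumes the AWS CLI is available):

# hypothetical: sync the exported json files to a public S3 prefix
aws s3 sync ./json-exports/ s3://example-shipit-public/json/ --acl public-read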
Flags: needinfo?(rail)
(In reply to Rail Aliiev [:rail] from comment #2)
> I like the pushing the data to some public URL on change or/and by cron.
> Probably cron would be easier (no need to think about long-time operations
> triggered in browser).
> 
> Publishing to S3 would be the easiest option to go, I think.

We can't deploy cron tasks to the webheads. I think we probably need to create a read-only account for the Ship It db and run a separate script somewhere else that grabs the data and pushes it (similar to what we do for the Buildbot json).
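For illustration, creating such a read-only account might look roughly like this (a sketch, assuming the Ship It db is MySQL; the database name, account name and password are placeholders):

mysql -u root -p <<'SQL'
CREATE USER 'shipit_ro'@'%' IDENTIFIED BY 'CHANGEME';
GRANT SELECT ON shipit.* TO 'shipit_ro'@'%';
SQL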
Flags: needinfo?(bhearsum)
Ben, you are suggesting a mix of my first & second proposals, right?

What do we do for the Buildbot json? :)
(In reply to Sylvestre Ledru [:sylvestre] from comment #4)
> Ben, you are suggesting a mix of my first & second proposals, right?

I'm not sure if we're talking about your original proposals from e-mail, or those from comment #0.

What I'm suggesting is that there will be a process (probably a cronjob) that takes data from Vertica and publishes a public JSON file based on it. This will NOT be a Ship It instance, just a script.

Once that's in place, the Ship It frontend can consume that instead of contacting Vertica directly.

> What do we do for Buildbot json ? :)

We have a script that runs on a machine. It talks to the Buildbot database, and publishes JSON files to http://builddata.pub.build.mozilla.org/builddata/buildjson/.

Don't worry about the deployment part - once you've got a script that can translate Vertica data to JSON (which you sort of already have!), we (RelEng) can figure out where to publish the output and deploy it.
Sylvestre just pointed out that my last comment was confusing this bug and bug 1146863.

With that in mind, it still seems like an external process run via cron would make sense. It would need read-only access to the Ship It database, and it could publish a JSON file publicly.

I'd suggest this over implementing it as a second webapp because it's much simpler and lower maintenance. A second webapp would require WebOps to set up and deploy, and they're pretty busy these days. A cronjob could be managed entirely by RelEng.
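As a rough sketch, the cron side could be a single crontab entry (the script path and the 10-minute schedule below are placeholders):

# hypothetical entry: regenerate/publish the public json every 10 minutes
*/10 * * * * /usr/local/bin/export_shipit_json.sh >> /var/log/export_shipit_json.log 2>&1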
(In reply to Ben Hearsum [:bhearsum] from comment #3)
> (In reply to Rail Aliiev [:rail] from comment #2)
> > I like the pushing the data to some public URL on change or/and by cron.
> > Probably cron would be easier (no need to think about long-time operations
> > triggered in browser).
> > 
> > Publishing to S3 would be the easiest option to go, I think.
> 
> We can't deploy cron tasks to the web heads. I think we probably need to
> create a ro account to the Ship It db and run a separate script somewhere
> else that grabs the data and pushes it (similar to what we do for Buildbot
> json).

Ben/Rail/Sylvestre :

I'm wondering if we can have a quick meeting about this? That assumption is incorrect: we can run cronjobs on the webheads, and we do so for a lot of our current projects. I would love to understand what we're trying to accomplish and see if there's a better way to do this.

Thoughts?
(In reply to Shyam Mani [:fox2mike] from comment #7)
> (In reply to Ben Hearsum [:bhearsum] from comment #3)
> > (In reply to Rail Aliiev [:rail] from comment #2)
> > > I like the pushing the data to some public URL on change or/and by cron.
> > > Probably cron would be easier (no need to think about long-time operations
> > > triggered in browser).
> > > 
> > > Publishing to S3 would be the easiest option to go, I think.
> > 
> > We can't deploy cron tasks to the web heads. I think we probably need to
> > create a ro account to the Ship It db and run a separate script somewhere
> > else that grabs the data and pushes it (similar to what we do for Buildbot
> > json).
> 
> Ben/Rail/Sylvestre :
> 
> I'm wondering if we can have a quick meeting about this? This assumption is
> incorrect, we can run cronjobs on the webheads and do this for a lot of our
> current projects. I would love to understand what we're trying to accomplish
> and see if there's a better way to do this. 
> 
> Thoughts?

Sure...
tl;dr from the meeting (please correct me if I'm wrong).

1) We're going to run the cronjob on genericadm.private.phx1.mozilla.com and push the files out to the ship-it webheads
2) No need for a VM
3) No need for an SSL certificate
4) No need for Zeus configs 

We'll work separately on the product-details stuff. 

I'll spin off a bug to track step 1 above and close out the other bugs (steps 2, 3 & 4)
Depends on: 1155935
(In reply to Shyam Mani [:fox2mike] from comment #9)

> I'll spin off a bug to track step 1 above and close out the other bugs
> (steps 2, 3 & 4)

Spun off Bug 1155935 and closed out Bug 1152754 and Bug 1153770
The cron is now in place. I believe this bug is complete?
No longer depends on: 1155935
The cron was for the partial generation.
This bug will also require a cron: basically, retrieve the files listed in bug 1083718#c10 and push/commit them somewhere public (like mozilla.com), as we discussed.
Shyam, we are now ready to work on this (finally).

On this page, available only to vpn users with the correct permissions, we export all the json files:
https://ship-it.mozilla.org/json_exports.html
We will need to export all these files (at least one more is coming on top of that list) to a public website.
Can you help us to:
* set up a cron job from this system (every 5 or 10 minutes would be great).
* publish them somewhere public (mozilla.org is fine)

We don't need history on these files.
Flags: needinfo?(smani)
Summary: Figure out how to export public json files produced by ship-it → Export public json files produced by ship-it
Blocks: 1201289
Blocks: 1201286
(In reply to Sylvestre Ledru [:sylvestre] from comment #13)
> Shyam, we are now ready to work on this (finally).
> 
> On this page, available only for vpn users with the correct permissions, we
> will export all the json files:
> https://ship-it.mozilla.org/json_exports.html
> We will need to export all these files (at least one more is coming on top
> of that list) on a public website.
> Can you help us to:
> * set up a cron job from this system (every 5 or 10 minutes would be great).
> * publish it somewhere public (mozilla.org is fine)

Sure. So, where are the files today? Or where is the script that generates the /json_exports.html file?

I'm thinking of making a ship-it-public.mozilla.org and putting the files there.
Flags: needinfo?(smani)
Files are available here:
https://ship-it.mozilla.org/json_exports.html
json_exports.html is static.
From this page, we link to the other json files; those linked files are the ones that should be exported.
FYI, this is the case here: https://www.mozilla.org/includes/product-details/json/firefox_versions.json (the other files are also exported).

"ship-it-public.mozilla.org" I would prefer product-details.mozilla.org to avoid confusion with the tool and to keep the same wording as the previous implementation.
+1 for product-details.m.o
Flags: needinfo?(smani)
Ah, ok. This makes more sense now. Thanks!

I'm happy to use product-details.mozilla.org

However, it seems like https://ship-it.mozilla.org/json_exports.html is generated programmatically, and I can't find those files on the filesystem. Is this correct? If so, would it be possible to have the code that generates them dump them to a known spot on the filesystem?

Happy to discuss with Sylvestre or pmac when you guys are online. Thanks!
Flags: needinfo?(smani)
Attached file Add the list of urls
I wrote this PR to add two new URLs

When the change is live, the list of proposed files will be available here:
https://ship-it.mozilla.org/json/json_exports.json
https://ship-it.mozilla.org/json_exports.txt
Rail published them. Thanks Rail :)

Shyam, here is a small script that should do the job and will scale if new files are added.

#!/bin/bash
# Mirror the ship-it json exports locally: fetch the list of exported
# files, then download every file it references.
LOGIN="sledru@mozilla.com"
# Note: with no password after --user, curl prompts for it interactively.
CURL_ARGS=(-O -s -f --user "$LOGIN")
curl "${CURL_ARGS[@]}" https://ship-it.mozilla.org/json_exports.txt
while read -r f; do
  curl "${CURL_ARGS[@]}" "https://ship-it.mozilla.org/$f"
done < json_exports.txt
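If it helps, the publish half could then be as simple as copying the mirrored files into whatever public docroot webops picks (the destination path below is only a guess):

# hypothetical docroot; adjust to whatever webops sets up
cp -- *.json json_exports.txt /var/www/product-details/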
Flags: needinfo?(smani)
Rail/Ben,

I see we already have an automation account set up for ship-it (that I can reuse for the above purpose). Any idea about the auth details for that? Thanks!
Flags: needinfo?(smani)
Flags: needinfo?(rail)
Flags: needinfo?(bhearsum)
Maybe instead of using credentials, it'd be better to whitelist the URLs? In other words, disable auth for those URLs.
Flags: needinfo?(rail)
(In reply to Rail Aliiev [:rail] from comment #21)
> Maybe instead of using credentials, it'd be better to whitelist the URLs? In
> other words, disable auth for those URLs.

I think this would be better too, if it's possible. There are no ACLs within Ship It, so any person or system with those credentials could trigger releases. Since there's nothing protected about these endpoints, and someone hitting them can't make changes using them (though we should double-check this), whitelisting them seems like a good solution.
Flags: needinfo?(bhearsum)
Depends on: 1210710
I tried to implement that in bug 1210710, but I failed.
AuthBasicHandler ( http://pythonpaste.org/modules/auth.basic.html ) seems limited and does not allow per-page permissions.

We could change the auth system to something more clever, but is it worth it? Rail, what's your opinion?
Flags: needinfo?(rail)
AFAIK, ship-it's auth is done by Apache. If we add some custom rules, we can serve those pages without authentication.
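Something along these lines, for example (a sketch only; I don't have the real vhost config in front of me, so the conf.d path and the Apache 2.4-style directives are assumptions):

cat > /etc/httpd/conf.d/shipit_public_json.conf <<'EOF'
# allow unauthenticated access to the json export endpoints only
<LocationMatch "^/(json/|json_exports\.txt$)">
    Require all granted
</LocationMatch>
EOF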
Flags: needinfo?(rail)
Shyam, as it is server-side, I assigned you bug 1210710.
Could you take care of these two bugs soon? Thanks!
Flags: needinfo?(smani)
So, to summarize the work required here, since it's a very long, confusing bug:

Webops is being asked to create a new site from scratch, "product-details.mozilla.org". This site will consist solely of a homepage linking to JSON files, and then the JSON files mirrored by the script in comment 19 or its equivalent.

Sylvestre, needinfo? you for two questions:

Without going into the technical details of how we'll implement each of these steps, is the above a complete and accurate representation of your new site request?

Do you wish to provide input on the appearance of the homepage? If so, please provide an example HTML page (and a ZIP file of any image/css/js resources it depends on) that contains at least two example JSON links, and we will integrate it into the script unmodified. If not, we'll provide a basic file listing with no particular styling.
Flags: needinfo?(sledru)
Assignee: nobody → eziegenhorn
Flags: needinfo?(smani)
(In reply to Richard Soderberg [:atoll] from comment #26)
> Webops is being asked to create a new site from scratch,
> "product-details.mozilla.org". This site will consist solely of a homepage
> linking to JSON files, and then the JSON files mirrored by the script in
> comment 19 or its equivalent.
exactly that.

> 
> Sylvestre, needinfo? you for two questions:
> 
> Without going into the technical details of how we'll implement each of
> these steps, is the above a complete and accurate representation of your new
> site request?
yes

> Do you wish to provide input on the appearance of the homepage? If so,
> please provide an example HTML page (and a ZIP file of any image/css/js
> resources it depends on) that contains at least two example JSON links, and
> we will integrate it into the script unmodified. If not, we'll provide a
> basic file listing with no particular styling.
For now, a basic file listing is fine.
I will provide a proper html page later; the basic json files are enough for now.

Thanks for the help!
Flags: needinfo?(sledru)
For reference, this is what we're using now:

http://svn.mozilla.org/libs/product-details/json/

Our scripts[0] scrape that page (and the one for the regions folder below it) for links to json files, download said files, and store them locally on the server for use.

[0] https://github.com/mozilla/django-product-details
:pmac, is this the script you want us to run to generate the directory of files?
I checked the repository and, uh, I can't determine what the django app is or what the use case is for the 'scripts' you referenced. If you can provide us with usage instructions, we can use that instead of the bash script above or something homegrown.
Flags: needinfo?(pmac)
Not at all; that's what Sylvestre's team does. "django-product-details" consumes the output of the script for other software to use (e.g. bedrock and basket). I just linked it here as an example of a consumer of this new product-details.m.o site.
I'm just trying to say that it need not be fancy, just a page like the one the svn server created, or like the one Apache generates for a directory listing.
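e.g. a plain Apache auto-index of the mirror directory would do; a sketch, with the docroot path assumed:

cat > /etc/httpd/conf.d/product_details_index.conf <<'EOF'
# auto-generated "Index of /" listing for the mirrored json files
<Directory "/var/www/product-details">
    Options +Indexes
    Require all granted
</Directory>
EOF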
Flags: needinfo?(pmac)
I talked with :pmac and we agree that the plan in comment 26 remains sound (the bash script is correct; an Apache "Index of /" listing was our plan).
The URL retrieved by the script in comment 19 still requires authorization. Is that the right URL, and is the plan still to whitelist it so it doesn't require auth?
We should either create our own credential or an auth bypass, your call. Do we admin the ship-it server in question?
About the auth issue, see comments #20, #21 & #22.
:sylvestre, may I remove auth on all of the /json path, including everything below it? json_exports.txt alone is not enough, as we then need to fetch everything it references, which as I understand it can change. Currently they are all under /json though, so if that is to stay the same and there isn't anything else valuable in there, I will just remove auth for /json.
Yes, don't hesitate! Thanks
Eric, have you been able to make progress on this? Thanks
Flags: needinfo?(eziegenhorn)
I have been pulled away by other things.  I removed auth in Apache for the json_exports.txt file but was still getting an error saying auth was required -- I have suspicions as to why but haven't conclusively nailed it down yet.  I hope to get back to this tomorrow.
Flags: needinfo?(eziegenhorn)
I've successfully removed auth on the dev site now, but when I do so, the app returns no information for the dynamically generated /json_exports.txt and /json/*.  Might the app somehow depend on login/user information?
The dev instance doesn't have proper information about releases. The only page that gives some output is https://ship-it-dev.allizom.org/json/json_exports.json, but it requires AUTH.
Thanks.  I'll figure this out in production then, but it needs to wait until Nov 12th, after the IT change freeze is over.  Thanks for your patience.
Blocks: 1225107
I've firmed up my suspicions from the 27th -- I believe the app is somehow requiring or depending on auth. I've turned off auth in Apache for the relevant dynamically generated json files and a test file. The test file gets served up fine without auth. The dynamically generated files come back with "HTTP/1.1 401 UNAUTHORIZED", which I believe is coming from the app itself or some interaction with it. I'm available on IRC all day today if a dev wants to look at it with me, but I believe the Apache configuration is good at this point.
Yep, that's it. Setting REMOTE_USER in Apache makes it work! I'll run with that, and if you want to special-case it in your code, go right ahead.
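For the record, roughly the kind of rule involved (a sketch only: the conf.d path is an assumption, and it relies on mod_wsgi passing SetEnv values through to the app's environ):

cat > /etc/httpd/conf.d/shipit_remote_user.conf <<'EOF'
# hand the app a dummy REMOTE_USER on the unauthenticated json endpoints
<LocationMatch "^/(json/|json_exports\.txt$)">
    SetEnv REMOTE_USER anonymous
</LocationMatch>
EOF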
Ok, with that gnarly auth issue out of the way, I have http://product-details.mozilla.org/ up and running with the mirrored json files, which will be updated every 10 minutes. Please verify that it looks good.
Needs a cert and a redirect from http -> https. Without TLS, it leaves the services consuming the JSON open to easy MITM.
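The redirect half is the easy part; roughly the usual vhost stanza (a sketch; the cert setup and vhost layout are assumed):

cat > /etc/httpd/conf.d/product_details_redirect.conf <<'EOF'
# send all plain-http traffic to the TLS site
<VirtualHost *:80>
    ServerName product-details.mozilla.org
    Redirect permanent / https://product-details.mozilla.org/
</VirtualHost>
EOF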
Without arguing against the need for HTTPS, what is an example attack vector that could be exploited by MitM'ing the JSON?
I can think of a few:

1. There could be a vulnerability in a client that uses the files and doesn't expect malicious data.
2. It could advertise an old, known-vulnerable version of Fx as the latest, possibly driving downloads of it.
3. It could inject messages or break a site in a way that would be embarrassing for us.

We only just got this fixed (by default) in django-mozilla-product-details when loading from svn.m.o; I'd mostly just like to avoid a regression.
#1 - what clients currently exist for this data?

#2 - to what service would the "old vulnerable version" be provided?

#3 - break what site? who will be using this?
(In reply to Richard Soderberg [:atoll] from comment #51)
> #1 - what clients currently exist for this data?

The main one is django-mozilla-product-details[0], which I've mentioned several times.

> #2 - to what service would the "old vulnerable version" be provided?

www.mozilla.org (and thus the download buttons), support.m.o, developer.m.o, etc.

> #3 - break what site? who will be using this?

www.mozilla.org, support.m.o, developer.m.o, and sites like those, not to mention the many community sites that rely on this data as well.

If you don't want to install the cert, that's fine, but I'll have to file a separate bug and escalate. I cannot switch the default in django-mozilla-product-details or bedrock to this service until it is accessible via HTTPS.

[0] https://pypi.python.org/pypi/django-mozilla-product-details
Don't worry, it'll get HTTPS. I'm just trying to scope the unexpected security risk here :)
(In reply to Eric Ziegenhorn :ericz from comment #47)
> Ok, with that gnarly auth issue out of the way I have
> http://product-details.mozilla.org/ up and running with the mirrored json
> files which will be updated every 10 minutes.  Please verify that looks good.

Is it expected that all versions on there are old ones and not the current versions?
Can you point me to a specific example of old content?
HTTPS has been enabled for https://product-details.mozilla.org, and HTTP has been disabled.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
First of all: HOORAY for TLS! Thanks Eric.

But the files do contain old info:

Compare the version numbers in this file:

https://product-details.mozilla.org/firefox_versions.json

With the ones in this one:

http://svn.mozilla.org/libs/product-details/json/firefox_versions.json

e.g. LATEST_FIREFOX_VERSION should be 42.0, not 38.2.1. Most of the other values are behind by a couple of major versions as well. I'm not sure whether this is expected (something on the ship-it side still needs to be enabled) or a real problem. Any ideas, Sylvestre?
Flags: needinfo?(sledru)
https://ship-it.mozilla.org/json/firefox_versions.json shows LATEST_FIREFOX_VERSION "38.2.1", so yes, that needs to be pursued on the ship-it side.
Yes, we (releng & relman) have to agree on the way we mark a release as shipped... Sorry :/
I will update the values by hand for now.
Flags: needinfo?(sledru)
https://product-details.mozilla.org/ still seems to be a little outdated. What's the ETA for it being production-ready?
Depends on: 1234832
When bug 1234832 is fixed (probably in the next few weeks)
Depends on: 1266790
I think we accidentally resolved this, treating it as a Webops bug, but based on the past couple months of comments, this is still WIP. Reopening on that basis.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Sylvestre, is there something else we should track here, as per comment 62?
Flags: needinfo?(sledru)
See Also: → 1272390
I think we are good now.
We just need to implement bug 1272390 for the regions.
Status: REOPENED → RESOLVED
Closed: 9 years ago8 years ago
Flags: needinfo?(sledru)
Resolution: --- → FIXED
Michal, I believe you can switch now! :)
Flags: needinfo?(mstanke)
Thank you, Sylvestre. I have updated Firefox and Mobile on the site, but it still seems that not all versions are up to date. For Firefox and Fennec I see 48.0b2 instead of 48.0b3, and Thunderbird is stuck at 38.7.1 instead of 45.1.1 as the latest, so I still rely on the old source for now.
Flags: needinfo?(mstanke) → needinfo?(sledru)
Yes, I pinged releng about that. It is because we pushed late last Friday.
Flags: needinfo?(sledru)
Component: Applications: ShipIt (backend) → Applications: ShipIt