Closed Bug 770995 Opened 12 years ago Closed 9 years ago

Partial update generation service

Categories

(Release Engineering :: General, defect, P5)

x86
All
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: rail, Mentored)

References

Details

Attachments

(2 files)

Bug 768576 asked us to create partial updates from the N-2 release to N, which was a fairly painful manual process. Having a service that generates partials on request would make this a lot simpler.

Eg, we provide
 * url to starting complete mar, and expected hash
 * url to ending complete mar, and expected hash
 * version and channel_id to apply to partial
 * (maybe) the cert to sign the new partial with
which returns either the partial mar and checksum information, or directly uploads to a provided location on stage.

We'd still need to create snippets and handle them etc.

This isn't quite the same as 'partials on demand' that catlee promotes, since that's something responsive to the number of requests for completes where a partial would help, but a partial generator would be a necessary component of that.
Product: mozilla.org → Release Engineering
I was talking to Rob about bug 941949 the other day and we chatted about this briefly too. One thing that he mentioned is that it's really important to him to use the in-tree tools to do this, to make sure we don't accidentally serve something to someone that can't use it (eg, a MAR with an unsupported manifest type in it). This is probably something that requires additional discussion, but one way we could do this is to grab the repo/revision of the starting MAR from its application.ini and build the tools from that (and cache them). This could get pretty expensive for Nightly, even with the caching though.
Whiteboard: [mentor=bhearsum]
We've brainstormed around this before, and one idea we had was to explicitly version the MAR format/contents. Any time the in-tree tools change, we would bump the version. The partial generation service would need to be told explicitly which version of partial to generate.
If I want to go look at MAR-related code, where do I go? The MAR wiki page seems to indicate that it's implemented in C; are there Python wrappers? Where does this stuff live? I'm eager to do some exploration and could post ideas here so that they'll be useful even if I don't have time to implement.
We have a python implementation here:
http://hg.mozilla.org/build/tools/file/tip/buildfarm/utils/mar.py

NB - I'm not sure if it implements the new(ish) mar signatures correctly at this point.

The C code lives in http://mxr.mozilla.org/mozilla-central/source/modules/libmar/src/
When someone is ready to work on this we'll need to figure out some more details, but the basic implementation will probably be along these lines:
* Simple HTTP API interface - probably no UI. I'm not too fussy about the details, but we shouldn't require uploading of both MARs every time we make a request (we'd generate a lot of unnecessary network traffic if we did). A couple of ideas to work around this:
** Send links to files instead of actual files. Eg, POST /partial w/ data mar1=http://... and mar2=http://...
** Require file uploads before requesting a partial. Flow could be like this (a rough client-side sketch follows this list):
GET /mar/<checksum>
if 404, POST /mar/checksum w/ full mar in body
repeat for 2nd mar
POST /partial w/ data mar1=<checksum> and mar2=<checksum>
* Once we have the request for a partial + the two complete MARs, a partial mar will be generated. As noted in comments #1 and #2, we'll need to support multiple MAR formats. I agree that this should be passed explicitly. Partial mar generation depends on "mar" (or mar.py like catlee mentioned) and "mbsdiff" binaries as well as the make_incremental_update.sh script: https://mxr.mozilla.org/mozilla-central/source/tools/update-packaging/make_incremental_update.sh. Note all of the extra update.manifest stuff going on there.
* Actual generation of MARs must be able to scale across multiple cores _and_ machines. Could be implemented by multiple independent backend nodes running the same app, fronted by a load balancer (this would be similar to our signing servers). Could also be implemented by offloading work to various nodes through Celery or similar. I'm leaning a little more towards the latter because it seems to be the direction we're heading.
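
For illustration, here is a rough client-side sketch of that upload-then-request cycle. The endpoint paths, the choice of sha512 as the identifier, and the use of the requests library are assumptions for the sketch, not a settled API:

    # Sketch of the check/upload/request flow described above. Endpoint names,
    # sha512 as the identifier, and the requests usage are all assumptions.
    import hashlib
    import requests

    SERVICE = "http://localhost:5000"

    def ensure_uploaded(path):
        # Hash the complete MAR in chunks, then upload it only if the service
        # doesn't already have it.
        digest = hashlib.sha512()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                digest.update(chunk)
        checksum = digest.hexdigest()
        if requests.get("%s/mar/%s" % (SERVICE, checksum)).status_code == 404:
            with open(path, "rb") as f:
                requests.post("%s/mar/%s" % (SERVICE, checksum), data=f)
        return checksum

    def request_partial(old_mar_path, new_mar_path):
        mar1 = ensure_uploaded(old_mar_path)
        mar2 = ensure_uploaded(new_mar_path)
        return requests.post("%s/partial" % SERVICE,
                             data={"mar1": mar1, "mar2": mar2})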


(In reply to Chris AtLee [:catlee] from comment #2)
> We've brainstormed around this before, and one idea we had was to explicitly
> version the MAR format/contents. Any time the in-tree tools change, we would
> bump the version. The partial generation service would need to be told
> explicitly which version of partial to generate.

Does this address all of the update.manifest handling that the in-tree tools do, too?
> (In reply to Chris AtLee [:catlee] from comment #2)
> > We've brainstormed around this before, and one idea we had was to explicitly
> > version the MAR format/contents. Any time the in-tree tools change, we would
> > bump the version. The partial generation service would need to be told
> > explicitly which version of partial to generate.
> 
> Does this address all of the update.manifest that the in-tree tools do, too?

Yeah, that's the idea. If we change how the in-mar manifests are treated (by adding another manifest, or changing an existing one), then we should bump this version #.
(In reply to Chris AtLee [:catlee] from comment #6)
> > (In reply to Chris AtLee [:catlee] from comment #2)
> > > We've brainstormed around this before, and one idea we had was to explicitly
> > > version the MAR format/contents. Any time the in-tree tools change, we would
> > > bump the version. The partial generation service would need to be told
> > > explicitly which version of partial to generate.
> > 
> > Does this address all of the update.manifest that the in-tree tools do, too?
> 
> Yeah, that's the idea. If we change how the in-mar manifests are treated (by
> adding another manifest, or changing an existing one), then we should bump
> this version #.

Rob, what do you think of this? This would be much, much easier to implement if we didn't rely on in-tree tools. Generally, it seems like a good idea to bump the MAR format version when changes are made, too.
Flags: needinfo?(robert.bugzilla)
I believe all previous changes I have made to the mar generation scripts have been backwards compatible, especially since we have to maintain backward compatibility to serve older clients the latest mar files. So, I'm not sure what versioning would buy us.

We have had cases where the mar generation scripts (sh and py) have generated different mar files for the same update type, and that needs to be prevented whether or not versioning is implemented for this bug. I don't have a problem with separate code generating mar files as long as there is something in place that prevents them from generating different mar files. I think the simplest method to accomplish this would be to finish implementing the mar generation in python (complete mar generation has not been implemented in python yet) and just use that for both. That way only one script would need to be updated and maintained, both for this bug and for the in-tree mar generation tools. That would also remove the need to update the partial and complete shell mar generation scripts. It should be fairly trivial to just drive the python code for this bug.
Flags: needinfo?(robert.bugzilla)
"Finish implementing the mar generation in Python": this is the aforementioned make_incremental_update.sh script? Is there a bug for this already? Sounds it like it should be a blocker for this bug?
Always using the latest partial generation code would be a change from how we currently do things, even if they are supposed to be backwards compatible.

Currently all partials are generated from the in-tree tools. For example, if mozilla-central and mozilla-beta have differences in generating partials, right now we serve mozilla-central style partials only to nightly users, and mozilla-beta style partials to beta users.

If we decide to use just the latest implementation for the service, then beta users will be given mozilla-central style partials.

As Rob mentioned, the modifications to these tools are all supposed to be backwards compatible. I would just like to highlight this particular risk if we decide to not version the partial generation service.
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #8)
> I believe all previous changes I have made to the mar generation scripts
> have been backwards compatible especially since we have to maintain backward
> compatibility to serve older clients the latest mar files. So, I'm not sure
> what versioning would buy us.

Do we have any fallback plan if we screw that up at some point?

> We have had cases where the mar generation scripts (sh and py) have
> generated different mar files for the same update type and that needs to be
> prevented whether or not versioning is implemented for this bug.

Just to avoid confusion, which files are you talking about exactly? When you say this I'm thinking of:
https://mxr.mozilla.org/mozilla-central/source/tools/update-packaging/common.sh
https://mxr.mozilla.org/mozilla-central/source/tools/update-packaging/make_full_update.sh
https://mxr.mozilla.org/mozilla-central/source/tools/update-packaging/make_incremental_update.sh
https://mxr.mozilla.org/mozilla-central/source/tools/update-packaging/unwrap_full_update.pl

> I don't
> have a problem with separate code generating mar files as long as there is
> something in place that prevents them from generating different mar files. I
> think the simplest method to accomplish this would be to finish implementing
> the mar generation in python (complete mar generation has not been
> implemented in python yet) and just using that for both.

And here, I assume you're referencing https://mxr.mozilla.org/mozilla-central/source/tools/update-packaging/make_incremental_updates.py?

> That way only one
> script would need to be updated and maintained for both this bug and for the
> in tree mar generation tools. That would also remove the need to update the
> partial and complete shell mar generation script. It should be fairly
> trivial to just drive the python code for this bug.

There are a couple of tricky things about this.
* "In tree" means we're talking about at least 5 different versions at any given time (central, aurora, beta, release, esr).
* The service we write will likely have its own copy, because we'll want explicit updates rather than implicitly pulling from a tree.

The way we dealt with this for signing was to put the signing tools outside of the tree, and add hooks into the build system that call them (eg, MOZ_SIGN_CMD is a variable that we point at a script which is outside the tree). Even doing this, we still use the in-tree mar binary/mbsdiff. I think this would ultimately mean replacing most or all of tools/update-packaging with out-of-tree stuff. I'm not terribly happy with that, to be honest. I think I have a better idea below:

This is where versioning may help though. If we insist on a new MAR version format anytime we change anything in it, things will break at build time until the partial update generation service supports it. This could help with the sync issue too. Eg, we could create a new build target that packages up all of the update tooling (mar, mbsdiff, whatever scripts we need) that we could deploy to the partial generation machines.

Rob, Chris - any thoughts?

(In reply to Dirkjan Ochtman (:djc) from comment #9)
> "Finish implementing the mar generation in Python": this is the
> aforementioned make_incremental_update.sh script? Is there a bug for this
> already? Sounds it like it should be a blocker for this bug?

It seems likely that we'll be doing that...there's certainly no bug for it yet though. Once we finalize the plan I'll file the necessary bug chain.
Flags: needinfo?(robert.bugzilla)
Flags: needinfo?(catlee)
(In reply to Ben Hearsum [:bhearsum] from comment #11)
> (In reply to Robert Strong [:rstrong] (use needinfo to contact me) from
> comment #8)
> > I believe all previous changes I have made to the mar generation scripts
> > have been backwards compatible especially since we have to maintain backward
> > compatibility to serve older clients the latest mar files. So, I'm not sure
> > what versioning would buy us.
> 
> Do we have any fallback plan if we screw that up at some point?
Reverting to the previous code has been the plan for previous changes.

> 
> > We have had cases where the mar generation scripts (sh and py) have
> > generated different mar files for the same update type and that needs to be
> > prevented whether or not versioning is implemented for this bug.
> 
> Just to avoid confusion, which files are you talking about exactly? When you
> say this I'm thinking of:
> https://mxr.mozilla.org/mozilla-central/source/tools/update-packaging/common.
> sh
> https://mxr.mozilla.org/mozilla-central/source/tools/update-packaging/
> make_full_update.sh
> https://mxr.mozilla.org/mozilla-central/source/tools/update-packaging/
> make_incremental_update.sh
> https://mxr.mozilla.org/mozilla-central/source/tools/update-packaging/
> unwrap_full_update.pl
Yes

> > I don't
> > have a problem with separate code generating mar files as long as there is
> > something in place that prevents them from generating different mar files. I
> > think the simplest method to accomplish this would be to finish implementing
> > the mar generation in python (complete mar generation has not been
> > implemented in python yet) and just using that for both.
> 
> And here, I assume you're referencing
> https://mxr.mozilla.org/mozilla-central/source/tools/update-packaging/
> make_incremental_updates.py?
Yes again

> > That way only one
> > script would need to be updated and maintained for both this bug and for the
> > in tree mar generation tools. That would also remove the need to update the
> > partial and complete shell mar generation script. It should be fairly
> > trivial to just drive the python code for this bug.
> 
> There's a couple of tricky things about this.
> * "In tree" means we're talking about at least 5 different versions at any
> given time (central, aurora, beta, release, esr).
> * The service we write will likely have it's own copy, because we'll want
> explicit updates rather than implicitly pulling from a tree.
> 
> The way we dealt with this for signing was to put the signing tools outside
> of the tree, and add hooks into the build system that call them (eg,
> MOZ_SIGN_CMD is a variable that we point at a script which is outside the
> tree). Even doing this, we use the in-tree mar binary/mbsdiff still. I think
> this would ultimately mean replacing most or all of tools/update-packaging
> with out of tree stuff. I'm not terribly happy with that, to be honest. I
> think I have a better idea below:
> 
> This is where versioning may help though. If we insist on a new MAR version
> format anytime we change anything in it, things will break at build time
> until the partial update generation service supports it. This could help
> with the sync issue too. Eg, we could create a new build target that
> packages up all of the update tooling (mar, mbsdiff, whatever scripts we
> need) that we could deploy to the partial generation machines.
> 
> Rob, Chris - any thoughts?
I'm fine with this
Flags: needinfo?(robert.bugzilla)
+1
Flags: needinfo?(catlee)
Depends on: 950684
Depends on: 950689
(In reply to Ben Hearsum [:bhearsum] from comment #11)
> This is where versioning may help though. If we insist on a new MAR version
> format anytime we change anything in it, things will break at build time
> until the partial update generation service supports it. This could help
> with the sync issue too. Eg, we could create a new build target that
> packages up all of the update tooling (mar, mbsdiff, whatever scripts we
> need) that we could deploy to the partial generation machines.

This is bug 950684

> (In reply to Dirkjan Ochtman (:djc) from comment #9)
> > "Finish implementing the mar generation in Python": this is the
> > aforementioned make_incremental_update.sh script? Is there a bug for this
> > already? Sounds it like it should be a blocker for this bug?
> 
> It seems likely that we'll be doing that...there's certainly no bug for it
> yet though. Once we finalize the plan I'll file the necessary bug chain.

This is bug 950689.

With those two pieces moved out, I think this bug can focus on implementing the actual back-end service, which needs a bit more design still.
Assignee: nobody → ffledgling
Script being used to test the mar service at the moment; just wanted feedback to make sure I'm on the right track.
Attachment #8438290 - Flags: feedback?(hwine)
I'll be using this bug to get feedback/reviews for various Senbonzakura code chunks because github isn't ideal for a feedback review cycle without pull requests.
Will also be using this as a tracking bug.
Hope that's alright.
Depends on: 1023624
Attachment #8438290 - Attachment mime type: application/x-sh → text/plain
Comment on attachment 8438290 [details]
Bash script used for testing service

Looks good - only a few of the comments I'm making are functional. The rest are more stylistic.

>#!/usr/bin/env bash
More common to use /bin/bash directly. Indirecting via /usr/bin/env is mostly a python idiom.

If you're using alternate versions of bash to test your test, GREAT! And that's worthy of a comment.

Also, for test (and most all scripts), I like to explicitly turn on "fail on error" and "undefined variables are error":
  set -eu

That helps catch test harness errors, so you don't spend time debugging the wrong program.

># General variables
>DEFAULT_URL='http://127.0.0.1:5000'
>DEFAULT_CACHE='/perma/cache'

This reads as if it's an absolute path on my local machine. In which case, I'd need root access to create that path. If DEFAULT_CACHE is just a name used within the caching system, DEFAULT_CACHE_NAME would remove confusion. Reading your test code is often the first thing a brand new contributor will do, so they may not yet be familiar with the terminology of the app classes. If it's a path relative to my current dir, having it start with '/' is misleading.

>DEFAULT_DB='./test.db'

A comment here to say what kind of db would help. Is it mysql, sqlite3, adhoc?

># Nightly related variables
>DEFAULT_SRC_MAR='http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-05-12-03-02-02-mozilla-central/firefox-32.0a1.en-US.mac.complete.mar'
>DEFAULT_DST_MAR='http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-05-13-03-02-01-mozilla-central/firefox-32.0a1.en-US.mac.complete.mar'

These variables aren't so much "about nightly" as "Currently using nightly builds as the default test case". So the lead in comment should be changed. And why nightly builds?

># Release related variables
>FF29="http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/29.0/update/mac/en-US/firefox-29.0.complete.mar"
>FF29_HASH="cd757aa3cfed6f0a117fa16f33250f74"
>FF28="http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/28.0/update/mac/en-US/firefox-28.0.complete.mar"
>FF28_HASH="d7f65cf5002a1dfda88a33b7a31b65eb"
>RELEASE_IDENTIFIER="$FF28_HASH-$FF29_HASH"

General note: both default test cases are for mac platform. Any reason for that? The vast majority of FF users are on Windows, so it would be good to have at least one built in case be for that platform.

># Coloured output

Fancy! Especially for bash.
>error(){
>    # Red
>    str=$1
Any reason to not have this be: str="$@"? Being surprised by truncated error messages is no fun.

>    echo -e "\e[0;31m"$str"\e[0m" >&2

Personally, I'd leave out the internal double quotes, and use ${str} if you want to set off the boundary.

>usage(){

Nice to have "usage"!

>count=0
>while getopts ":u:hd" option

I'm not very familiar with getopts, but I'd be disinclined to disable the internal error messages. Unless your replacement error handling is perfect, there is a risk of a silent test harness failure being interpreted as a failure of the system under test.
>do
>    case $option in
>        d)
>            DEBUG=true;

"DEBUG" is neither initialized to "false" anywhere, nor used, so I can't comment on whether or not this code is correct. If it's part of your "boiler plate", then I'd recommend an initialization above. Otherwise, YAGNI and remove for now.

>
>        u)
>            echo "OPTARG" $OPTARG
>            #DEFAULT_URL=
>            count=$(expr $count + 2);

So, 2 comments here - not sure you should be manually counting -- that's the one useful feature of 'getopts' aiui. :)  Also, no issues if the feature isn't implemented yet, but let the operator know using an 'error' message.

>        h)
>            usage;
>            exit 0;

By setting an exit code here, you're implying it can be relied upon. That's great! And what code is being returned by the main loop?

>shift $count

I think you want: shift $(($OPTIND - 1))
>
>trigger_partial_build(){
>
>    debug "CMD: curl -X POST" "$DEFAULT_URL/partial" -d "mar_from=$SRC_MAR&mar_to=$DST_MAR&mar_from_hash=$SRC_HASH&mar_to_hash=$DST_HASH"
>    curl -X POST "$DEFAULT_URL/partial" -d "mar_from=$SRC_MAR&mar_to=$DST_MAR&mar_from_hash=$SRC_HASH&mar_to_hash=$DST_HASH" && echo

The above is a common pattern, but difficult to maintain. Consider:
    cmd='echo "hi there"'
    debug "CMD: $cmd"
    $cmd
or refactoring to a function.

>
>clobber(){
>    # Wipe ALL the things
>    CACHE=$1
>    DB=$2
>
>    rm $1/*

Two things:
 a) since you made the assignment CACHE=$1, it would be clearer to use "$CACHE"
 b) danger danger -- if you call without a value, you try to delete the root dir. Worth a check to ensure you avoid that.

>    "trigger") 

fwiw, you don't need the quotes

>        trigger_partial_build $DEFAULT_SRC_MAR $DEFAULT_DST_MAR $DEFAULT_SRC_HASH $DEFAULT_DST_HASH
>        debug "Called trigger_partial_build"

I prefer the log message before the action:
  a) I always know what the program thinks it's doing
  b) I have the context in advance for any error messages that appear on the screen
Attachment #8438290 - Flags: feedback?(hwine) → feedback+
Mentor: bhearsum
Whiteboard: [mentor=bhearsum]
Patch for the first draft of the service. It still has a bunch of comments, prints, and such, but other than that, all input is welcome.
Attachment #8442844 - Flags: feedback?(bhearsum)
Comment on attachment 8442844 [details] [diff] [review]
senbonzakura.patch

Review of attachment 8442844 [details] [diff] [review]:
-----------------------------------------------------------------

A couple of overall comments:
* You should reorganize these files into a more proper package. SlaveAPI might be the best example we have these days: http://git.mozilla.org/?p=build/slaveapi.git;a=tree - Balrog is also a good one, though: https://github.com/mozilla/balrog/. Basically, you want all your code and tests in an actual module. Things like requirements files and shell scripts are fine to live at the root level. I think it would also be good to clearly delineate between code that runs in different places. Eg, worker-only code should go in one place, webapp-only code in another, and any shared code should go somewhere that is obviously shared.
* I'd like a walkthrough of code at some point, but that will probably have to wait until I'm back from PTO.

I know there's a lot of feedback below, but it looks like things are coming along nicely. Once you're at a more stable point functionality wise, I think things will look a lot better with a round of clean-up/organization.

::: api.py
@@ +26,5 @@
> +
> +# Turn off werkzeug logging
> +
> +log = logging.getLogger('werkzeug')
> +log.setLevel(logging.ERROR)

This should probably move to a .wsgi and/or standalone server when you have them - it'll be easier to change this log level from there if necessary eg, for debugging weird issues.

@@ +53,5 @@
> +    # TODO: Validate params and values in form Ideally
> +    mar_from = flask.request.form['mar_from']
> +    mar_to = flask.request.form['mar_to']
> +    mar_from_hash = flask.request.form['mar_from_hash']
> +    mar_to_hash = flask.request.form['mar_to_hash']

You may want to look at using Flask-WTF to help validation. Certainly not a blocker, but it can clean up this sort of thing, and reduce duplication if you have the same arguments in multiple endpoints: https://flask-wtf.readthedocs.org/en/latest/. You can find examples of it being used in Balrog and Ship It.
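
For example, a minimal Flask-WTF form for these four parameters might look roughly like this. The field names mirror the patch; the form class and validators are my assumptions about how it would be wired up:

    # Hypothetical sketch of validating the /partial POST data with Flask-WTF.
    from flask_wtf import Form
    from wtforms import StringField
    from wtforms.validators import DataRequired, URL

    class PartialRequestForm(Form):
        mar_from = StringField('mar_from', validators=[DataRequired(), URL()])
        mar_to = StringField('mar_to', validators=[DataRequired(), URL()])
        mar_from_hash = StringField('mar_from_hash', validators=[DataRequired()])
        mar_to_hash = StringField('mar_to_hash', validators=[DataRequired()])

    # In the view: form = PartialRequestForm(); use form.validate_on_submit()
    # to reject bad requests with a 400 instead of reading flask.request.form directly.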

@@ +55,5 @@
> +    mar_to = flask.request.form['mar_to']
> +    mar_from_hash = flask.request.form['mar_from_hash']
> +    mar_to_hash = flask.request.form['mar_to_hash']
> +    # TODO: Verify hashes and URLs are valid before returning the URL with a 201
> +    #       or return the concat anyway and just return a 202?

I'm not sure this is worthwhile. It might be OK to do a HEAD request to make sure the files exist, but you'd have to download them to verify the hash, and then throw them away. I think it's okay for the worker to just return an error for that case.

@@ +66,5 @@
> +        # error testing and parameter validation, maybe do this up close to checking
> +        # existence
> +
> +        dbo.insert(identifier=identifier, status=db.status_code['IN_PROGRESS'], start_timestamp=time.time())
> +    except db.IntegrityError, e:

I'd recommend against using "except" for flow control. You also need to catch real exceptions somewhere so that you can either return a 400 error or a 500 that doesn't leak server information. A better flow for this could be:
try:
    partial = dbo.lookup(...)
    if partial:
        # return 201
    else:
        dbo.insert(...)
        # return 202
except:
    # return 500
finally:
    dbo.close()
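
Expanded with concrete responses, that flow might look roughly like this inside the view. dbo/db and identifier are the objects already used in the patch; the JSON bodies and status codes are placeholders, not a drop-in implementation:

    # Sketch expanding the pseudocode above; assumes the patch's dbo/db objects.
    try:
        if dbo.lookup(identifier=identifier):
            resp = flask.Response("{'result': 'exists'}", status=201)
        else:
            dbo.insert(identifier=identifier,
                       status=db.status_code['IN_PROGRESS'],
                       start_timestamp=time.time())
            resp = flask.Response("{'result': 'queued'}", status=202)
    except Exception:
        logging.exception("Failed to look up or queue partial %s", identifier)
        resp = flask.Response("{'result': 'internal error'}", status=500)
    finally:
        dbo.close()
    return resp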

@@ +104,5 @@
> +@app.route('/partial/<identifier>', methods=['GET'])
> +def get_partial(identifier):
> +
> +    app.config['count']+=1
> +    print "Count -- %s" % app.config['count']

I assume this is just testing code? If not, you should move "count" somewhere else. app.config should be considered static after the app starts.

@@ +107,5 @@
> +    app.config['count']+=1
> +    print "Count -- %s" % app.config['count']
> +
> +    cacheo = cache.Cache(app.config['CACHE_URI'])
> +    dbo = db.Database(app.config['DB_URI'])

These should probably be global singletons. You're going to end up with objects per request here, which will end up being particularly problematic for the database connections.
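
A sketch of one way to do that; the helper functions are hypothetical, while Cache/Database are the classes from the patch:

    # Sketch: lazily-created module-level singletons instead of per-request objects.
    _dbo = None
    _cacheo = None

    def get_db(uri):
        global _dbo
        if _dbo is None:
            _dbo = db.Database(uri)
        return _dbo

    def get_cache(uri):
        global _cacheo
        if _cacheo is None:
            _cacheo = cache.Cache(uri)
        return _cacheo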

@@ +130,5 @@
> +        if status == db.status_code['COMPLETED']:
> +            # Lookup DB and return blob
> +            # Call relevant functions from the core section.
> +            # We'll want to stream the data to the client eventually, right now,
> +            # we can just throw it at the client just like that.

Streaming might be easier than you're making it out to be. It'll be a little different with Flask, but we do something similar when returning files from the signing servers:
https://github.com/mozilla/build-tools/blob/master/lib/python/signing/server.py#L522
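
A rough sketch of what that could look like with Flask; the cache path lookup helper is hypothetical:

    # Sketch: stream the cached partial back in chunks rather than reading it
    # into memory. retrieve_path() is a hypothetical cache helper.
    def stream_file(path, chunk_size=1024 * 1024):
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                yield chunk

    # In the view, once the partial is COMPLETED:
    #   path = cacheo.retrieve_path(identifier, 'partial')
    #   return flask.Response(stream_file(path), mimetype='application/octet-stream')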

@@ +161,5 @@
> +
> +        elif status == db.status_code['ABORTED']:
> +            # Something went wrong, what do we do?
> +            resp = flask.Response("{'result': '%s'}" %
> +                        "Something went wrong while generating this partial",

When we hit one of these, any idea how we'll debug it? Eg, are there logs on the workers to look at?

@@ +177,5 @@
> +        else:
> +            # This should not happen
> +            resp = flask.Response("{'result':'%s'}" % 
> +                                  "Status of this partial is unknown",
> +                                  status=400)

You probably want a 500 here - it's not the client's fault that the server did something wrong!

::: cache.py
@@ +1,1 @@
> +# Stub, structure only

Have you thought about how aging old items out of the cache will work yet?

@@ +27,5 @@
> +        self.cache_dir = cache_uri
> +        self.cache_complete_dir = os.path.join(self.cache_dir, 'complete') # For complete Mars/files
> +        self.cache_diffs_dir = os.path.join(self.cache_dir, 'diff') # For diffs
> +        self.cache_partials_dir = os.path.join(self.cache_dir, 'partial') # for partials
> +

You'll want to move instantiation to a helper method in order to make this a global singleton - same thing for the database class.

@@ +42,5 @@
> +    def _generate_identifier(self, key):
> +        return '-'.join(csum.hexto64(x) if len(x)==128 else x for x in key.split('-'))
> +
> +
> +    def save(self, string, key, category, isfile=False):

"string" is a bad name because it's also the name of a python builtin module. string_ would be fine, but something like "data" or "value" might be better.

@@ +48,5 @@
> +        # being written to cache, but since it exists is returned as is (most
> +        # likely corrupted).
> +
> +        # Possible solution - Write file as id.tmp and then rename to id when
> +        # done

You should probably do this even for keys that don't already exist. Otherwise you've got a race condition where you're writing to the file while a GET is happening - and the client ends up getting a partial file.
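
One way to make both cases safe is to always write to a temporary name in the same directory and rename into place once the write finishes, since the rename is atomic on a single filesystem. A sketch:

    # Sketch: atomic cache writes via write-to-temp-then-rename.
    import os
    import tempfile

    def atomic_write(dest_path, chunks):
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(dest_path),
                                        suffix='.tmp')
        try:
            with os.fdopen(fd, 'wb') as f:
                for chunk in chunks:
                    f.write(chunk)
            os.rename(tmp_path, dest_path)  # atomic within one filesystem
        except Exception:
            os.unlink(tmp_path)
            raise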

@@ +63,5 @@
> +
> +        if isfile:
> +            try:
> +                with open(string, 'rb') as f:
> +                     data = f.read()

You're going to consume loads of memory if you do this. I'd recommend dropping this block altogether...

@@ +70,5 @@
> +                logging.warning('Could not read file source %s ' %string +
> +                                'to insert into cache')
> +                raise oddity.CacheError('Error reading input %s' % string)
> +        else:
> +            data = string

...and putting this in a StringIO instead. Then when you write out, you can do it in chunks.
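
For example (a sketch; _path_for is a hypothetical helper that maps key/category to a cache path), the cache could accept either raw bytes or a file-like object and copy it out in fixed-size chunks:

    # Sketch of a Cache.save() method: never hold a whole MAR in memory.
    import shutil
    from StringIO import StringIO  # Python 2, matching the rest of the patch

    def save(self, source, key, category):
        if isinstance(source, str):
            source = StringIO(source)  # small in-memory values still work
        dest = self._path_for(key, category)  # hypothetical path helper
        with open(dest, 'wb') as f:
            shutil.copyfileobj(source, f, length=1024 * 1024)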

@@ +72,5 @@
> +                raise oddity.CacheError('Error reading input %s' % string)
> +        else:
> +            data = string
> +        
> +        identifier = self._generate_identifier(key)

I'm not quite sure I understand the purpose behind this identifier. Is the key by itself not unique enough?

@@ +96,5 @@
> +                raise oddity.CacheError('Could not insert in Cache')
> +
> +
> +        if self.find(key, category):
> +            raise oddity.CacheCollisionError('identifier %s collision' % identifier)

Other than a general hash collision, is this guarding against anything else?

@@ +100,5 @@
> +            raise oddity.CacheCollisionError('identifier %s collision' % identifier)
> +
> +        try:
> +            with open(file_cache_path, 'wb') as f:
> +                f.write(data)

Here's an example of iterating over a file:
https://github.com/mozilla/build-tools/blob/master/lib/python/signing/server.py#L523

@@ +113,5 @@
> +        """ Checks if file with specified key is in cache
> +            returns True or False depending on whether the file exists
> +        """
> +        identifier = self._generate_identifier(key)
> +        id_path = os.path.join(*[d for d in identifier[:5]] + [identifier[5:]])

If you do need the identifier, this should probably be factored out.

@@ +122,5 @@
> +            file_cache_path = os.path.join(self.cache_complete_dir, id_path)
> +        elif category == 'diff':
> +            file_cache_path = os.path.join(self.cache_diffs_dir, id_path)
> +        elif category == 'partial':
> +            file_cache_path = os.path.join(self.cache_partials_dir, id_path)

This too.

@@ +132,5 @@
> +    def retrieve(self, key, category, output_file=None):
> +        """ retrieve file with the given key
> +            writes the file to the path specified by output_file if present
> +            otherwise returns the file as a binary string/file object
> +        """

I'm curious what the use case for handling both of these things is? My gut says that an object whose responsibility is managing a cache shouldn't be writing to anything outside of the cache directory.

::: core.py
@@ +1,1 @@
> +# What should go into core and what shouldn't?

I think you'll either find the answer to this question or not need to answer it at all if you move some things to less generic places. Eg, maybe things relating to actual mar processing could go in mar.py or something similarly named. It's been my experience that names like "core" (or "util", "misc", etc.) end up as huge files with a bunch of unrelated things in them.

@@ +72,5 @@
> +# Some global config here
> +
> +# FIXME: Need to move the DBO stuff into celery logic
> +    dbo = db.Database(DB_URI)
> +    cacheo = cache.Cache(CACHE_URI)

Similar thing here - singleton objects should be used!

@@ +86,5 @@
> +    TMP_MAR_STORAGE = tempfile.mkdtemp(prefix='cmar_storage_')
> +    TMP_WORKING_DIR = tempfile.mkdtemp(prefix='working_dir_')
> +
> +    new_cmar_path = os.path.join(TMP_MAR_STORAGE, 'new.mar')
> +    old_cmar_path = os.path.join(TMP_MAR_STORAGE, 'old.mar')

Is there a reason these need to be saved to tmp and to the cache? It seems like they should just go straight into the cache and be used from there. When you unpack them, they'll obviously go into tempdirs, but the two completes are read-only, so there's probably no reason to have more than one copy.

@@ +161,5 @@
> +def generate_partial_mar(cmar_new, cmar_old, difftools_path, working_dir=None):
> +    """ cmar_new is the path of the newer complete .mar file
> +        cmar_old is the path of the older complete .mar file
> +        difftools_path specifies the path of the directory in which
> +        the difftools, including mar,mbsdiff exist

The length of this function fails the smell test, are there things you can factor out? Eg, unwrapping a MAR.

@@ +180,5 @@
> +
> +    # Add MAR and MBSDIFF to env we pass down to subprocesses
> +    my_env = os.environ.copy()
> +    my_env['MAR'] = MAR
> +    my_env['MBSDIFF'] = MBSDIFF

If we end up using our own version of make_incremental_updates.py, I think it would be better to give it a Python interface rather than call out to a subprocess. Subprocesses suck! Obviously you're stuck calling out to shell and perl scripts, though.

::: csum.py
@@ +1,1 @@
> +import hashlib

Nit: checksum.py would be a nicer name - no need to shorten it.

@@ +20,5 @@
> +                if not chunk:
> +                    break
> +                m.update(chunk)
> +    else:
> +        m.update(string)

Like in other places, I don't think you should support operations on huge files in memory.
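
The usual pattern is to feed hashlib fixed-size chunks no matter how big the input is, e.g. (a sketch):

    # Sketch: hash a file in chunks so large MARs never have to fit in memory.
    import hashlib

    def file_digest(path, cipher='sha512', chunk_size=1024 * 1024):
        m = hashlib.new(cipher)
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(chunk_size), b''):
                m.update(chunk)
        return m.hexdigest()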

@@ +95,5 @@
> +        csum = getsha256(string, isfile=isfile)
> +    elif cipher == 'sha512':
> +        csum = getsha512(string, isfile=isfile)
> +    elif cipher == 'sha512b64':
> +        csum = getsha512b64(string, isfile=isfile)

Do you need all of these different hash types?

::: db_classes.py
@@ +9,5 @@
> +               'IN_PROGRESS': 2,
> +               'INVALID': 3, # Do we need this? We can use this to invalidate Cache entries
> +              }
> +
> +class Partial(Base):

SQLAlchemy ORM is slooooow. Your database usage is minimal, so this probably won't make a big difference - but if you start seeing bad performance at scale, it may be something to look at.
http://stackoverflow.com/questions/11769366/why-is-sqlalchemy-insert-with-sqlite-25-times-slower-than-using-sqlite3-directly

@@ +50,5 @@
> +
> +    # Define Columns
> +    id = Column(Integer, primary_key=True)
> +    start_timestamp = Column(BigInteger, default=-1, nullable=False)
> +    finish_timestamp = Column(BigInteger, default=-1)

If you want to support sqlite, you may hit the same issues we did in Balrog: https://github.com/mozilla/balrog/blob/bee765ad4548ba3d7167431decdcb2269ef58824/auslib/db.py#L441

::: fetch.py
@@ +1,3 @@
> +import logging
> +import hashlib
> +import redo

This is being imported but not used. You definitely should be retrying downloads, though.
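
Something along these lines would do it; this is a sketch, and I'm assuming redo.retry's args/kwargs interface as used elsewhere in the build tools, with downloadmar being the existing function in this module:

    # Sketch: wrap the existing download function in retries with backoff.
    # Assumes redo.retry(action, attempts=..., sleeptime=..., args=..., kwargs=...).
    import redo

    def download_with_retries(url, output_file):
        redo.retry(downloadmar, attempts=5, sleeptime=10,
                   args=(url,), kwargs={'output_file': output_file})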

@@ +58,5 @@
> +        The file is written to the location specified by output file,
> +        if not specified, the downloaded file is returned.
> +
> +        Chunking version
> +    """

Is there a reason you need two of these? It seems like they could be one method, or at the very least the code could be factored. Either way, "downloadmar2" fails the smell test as a name!

The more I think about this, the more I think that you shouldn't allow anything other than chunked reading directly into another file. If files can be fully loaded into memory, it will be extremely easy to get into OOM situations.

@@ +84,5 @@
> +
> +        return # Only works with an output_file
> +
> +if __name__ == '__main__':
> +    TEST_URL = "http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-05-12-03-02-02-mozilla-central/firefox-32.0a1.en-US.mac.complete.mar"

Is this an actual test? If so, it should go with the other tests. I'd also recommend mocking out the network operations; otherwise you'll feel nothing but pain when they fail randomly.

::: flasktask.py
@@ +1,1 @@
> +#from flask import Flask, request, Response

I think this is a file I'll need a walkthrough of to understand its purpose/where it's run...

::: make_incremental_updates2.py
@@ +1,1 @@
> +# This Source Code Form is subject to the terms of the Mozilla Public

If this is a modified version of the in tree copy, you should describe the changes somewhere. And as I mentioned further up, you should probably provide a Python interface to avoid calling it through a subprocess.

::: oddity.py
@@ +1,1 @@
> +#Calling this oddity because we don't want to override the built-in expcetions module

Once you have a proper library, this won't be an issue. Eg, this could be in:
senbonzakura/exceptions.py

And then you could do things like:
from senbonzakura.exceptions import Foo, Bar, Blah

::: test_csum.py
@@ +29,5 @@
> +        self.assertTrue(csum.verify(self.test_string, self.correct_sha512,
> +            cipher='sha512'))
> +        self.assertTrue(csum.verify(self.test_string, self.correct_sha512b64,
> +            cipher='sha512b64'))
> +

Two things here:
1) You can store the correct hashes on the object rather than using setUpClass. Eg:
class TestChecksums:
   correct_md5 = ...
   ...

2) You should split out the tests for the different hashes into different methods. As it is right now, if the md5 hashes are wrong, you won't know whether the other ones are also wrong until the next time you run the tests. Generally, if you have this many asserts in one test you should consider splitting them out.
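
Concretely, something along these lines (a sketch; the known-good hash values would need to be filled in, and the import adjusted to wherever csum ends up in the package):

    # Sketch: class attributes for fixtures, one test method per hash type.
    import unittest

    from senbonzakura import csum  # adjust import to the final package layout

    class TestChecksums(unittest.TestCase):
        test_string = 'some fixed test data'
        correct_md5 = '<known-good md5 here>'
        correct_sha512 = '<known-good sha512 here>'

        def test_md5(self):
            self.assertTrue(csum.verify(self.test_string, self.correct_md5,
                                        cipher='md5'))

        def test_sha512(self):
            self.assertTrue(csum.verify(self.test_string, self.correct_sha512,
                                        cipher='sha512'))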

::: tools.py
@@ +59,5 @@
> +        elif OS in ['Windows', 'Microsoft']:
> +            platform_version = 'win32'
> +        else:
> +            raise oddity.ToolError('Could not determine platform')
> +        mar_tools_url = "http://ftp.mozilla.org/pub/mozilla.org/firefox/%s/latest-mozilla-central/mar-tools/%s/" % (self.channel, platform_version)

I don't think you ever want to download from this directory. If we're going the direction of one version of the tools, you should probably assume they're available on the machine and have the config file point at them. If this is the case, I don't think you need this class at all...

If we're going to deal with different versions of tools, we'll need to download them from specific directories, not latest.
Attachment #8442844 - Flags: feedback?(bhearsum) → feedback+
Depends on: 1029870
Depends on: 1027871
Status: NEW → ASSIGNED
So in the process of writing this service, I stumbled into an issue.
Partials generated in production contain files with permissions 0664.
Partials generated elsewhere have different permissions for the same files, 0644 or 0666, depending on where they're generated.
The contents of the unwrapped partial MARs are identical (diff -rq'd), but the hashes differ because of the permission differences.

The custom partial with the different permissions applies cleanly, and there are no differences between file permissions before and after the mar is applied.

Upon further investigation based on bhearsum's suggestion, we found that a new file with permissions 0666 in the partial.mar still ended up with 0644 (assuming that's correct) after update/application of the .mar on the client side.

I haven't been able to figure out what causes the difference in permissions though, because the umask in all three tried locations (production, local machine, docker container) seems to be the same at `0022`.

The most obvious problem caused by this is that I'm unable to verify that a partial generated by different sources is in fact correct by looking at the hash.

The two questions I'd like to figure out are:
i) Does it matter what permissions the patch (and other) files in the generated partial.mar have, as long as the partial applies correctly? If so, what are the possible repercussions?
ii) What's causing this behaviour of different permissions across platforms? What's a possible fix?

 ($MAR -T official.mar and ls -alR unwrapped_official/ were used to inspect the MARs; 0644 for partials generated on the local machine and 0666 for those generated in a docker container.)
Flags: needinfo?(robert.strong.bugs)
Flags: needinfo?(bhearsum)
Not sure what is causing the difference between those, but here is some background: when a file is patched, it will keep the permissions of the existing file, while files that are added (rather than patched) will get the permissions set on the source file.
Flags: needinfo?(robert.strong.bugs)
BTW: the updater tests also test the above behavior for Mac and Linux.
Complete
mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/tests/unit_aus_update/head_update.js#437

Partial
http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/tests/unit_aus_update/head_update.js#437
Nick, do you have any info/insight and/or comments regarding the above?
Flags: needinfo?(nthomas)
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #21)
> Not sure what is causing the difference between those but here is some
> background. When a file is patched it will have the permissions of the
> existing file while files that are added vs. patched will have the
> permissions set on the source file.

Hi Rob, that's not the behaviour I seem to have found. I added a new file into the MAR with permissions 0666 and then applied the MAR; upon successful update, I checked the permissions on the new file and they were set to 0644. I used the "add" directive in the update manifests (v2, v3). I'm not sure what's causing this.

Also, do you have any comments on whether this is actually an important issue?
Is there a possibility that this might cause problems when Firefox runs after the update has been applied, and/or might this result in some kind of security issue?
Regarding whether it is important: that code has been in the mar packaging for quite some time, so I don't think there is an issue.
Hi Rob,
So to confirm, the permissions that the files are packaged with inside the MAR don't/shouldn't matter?
They do matter. This is why the mar packaging code has the following that calls chmod to prevent permissions from breaking the application.
http://mxr.mozilla.org/mozilla-central/source/tools/update-packaging/test/common.sh#30

If you are including the chmod calls as they currently exist then there should be no problem.
ffledgling and I spoke about various things today, including the permissions issue here. 

The summary was
* per rstrong, .patch file permissions are not important, as the patched file retains its existing permissions
* updatev*.manifests are 0666, but are only used by the updater (doubt they ever hit any disk)
* everything else, which is the files for ADD instructions, have the same permissions in ffledgling's mar file and the production (a mix of 0755 and 0644, aka copy_perm() in common.sh is working fine)
Flags: needinfo?(nthomas)
(In reply to Nick Thomas [:nthomas] from comment #29)
> ffledgling and I spoke about various things today, including the permissions
> issue here. 
> 
> The summary was
> * per rstrong, .patch file permissions are not important, as the patched
> file retains its existing permissions
> * updatev*.manifests are 0666, but are only used by the updater (doubt they
> ever hit any disk)
> * everything else, which is the files for ADD instructions, have the same
> permissions in ffledgling's mar file and the production (a mix of 0755 and
> 0644, aka copy_perm() in common.sh is working fine)

Hmm, okay. For some reason I thought ffledging's test showed that files with an ADD instruction and 0666 permissions somehow ended up as 0644 - that's probably a misread on my part though. Thanks for helping sort through this folks.
Flags: needinfo?(bhearsum)
(In reply to Ben Hearsum [:bhearsum] from comment #30)
> (In reply to Nick Thomas [:nthomas] from comment #29)
> > ffledgling and I spoke about various things today, including the permissions
> > issue here. 
> > 
> > The summary was
> > * per rstrong, .patch file permissions are not important, as the patched
> > file retains its existing permissions
> > * updatev*.manifests are 0666, but are only used by the updater (doubt they
> > ever hit any disk)
> > * everything else, which is the files for ADD instructions, have the same
> > permissions in ffledgling's mar file and the production (a mix of 0755 and
> > 0644, aka copy_perm() in common.sh is working fine)
> 
> Hmm, okay. For some reason I thought ffledging's test showed that files with
> an ADD instruction and 0666 permissions somehow ended up as 0644 - that's
> probably a misread on my part though. Thanks for helping sort through this
> folks.

NOT a misread, they *do* show up as 0644.

Let me clarify. The existing files with "add" instructions, for example the firefox binary, already have the correct permissions (0755 in the case of the binary) and thus apply correctly as well.
However, when I went in and manually added a new file with 0666 at my end, the resulting permissions of that file were changed to 0644 when the MAR was applied.
Assigning to myself to figure out how it fits into the Funsize world.
Assignee: ffledgling → rail
No longer depends on: 950689
Bug 1149142 is the new tracking bug
Blocks: funsize
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Component: General Automation → General