Closed Bug 938915 Opened 11 years ago Closed 11 years ago

Create a one-off dump function for recommendation data

Categories

(Marketplace Graveyard :: API, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED
2013-12-03

People

(Reporter: clouserw, Assigned: robhudson)

References

Details

(Whiteboard: [qa-])

(This bug is actually close to bug 938887 so maybe read that one over too?)

Linas, from Telefonica Labs, is looking for some data so he can start crunching some numbers to give us rockin' recommendations. Per his email, he is looking for JSON output with rows of:

> one-way hashed user id (due to our strict privacy policy)
> id of installed app
> slug of installed app
> timestamp of install
> user's region
> user's language

A daily .tar.gz behind OAuth would be ideal (and could probably be mostly copied from that other bug). If the distribution is time-consuming we could figure something else out since this is a one-off, but this dump shouldn't be publicly accessible.
See Also: → 938887
Should this factor in "heart" data from bug 937278 when it lands?
I'd like to get this out the door, so I'll say no, but we could certainly add more to it in the future, especially if we leave the code in to make dumps easy.
We will use this JSON format for the user dumps:

  {
    "user": "<md5 hash>",
    "region": "br",
    "lang": "pt-BR",
    "installed_apps": [
      {
        "id": <app id>,
        "slug": "<app slug>",
        "installed": "<installed date/time ISO 8601 format in UTC>"
      },
      {
        "id": 123,
        "slug": "twitter",
        "installed": "2013-11-18T08:50:24"
      },
      ...
    ]
  }
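For illustration, a record in this shape could be assembled roughly as follows. This is a minimal sketch, not the code from the pull request: `hash_user_id`, `build_user_record`, and the optional `salt` parameter are hypothetical names, and how the real dump derives the MD5 hash is not stated in this bug.

```python
import hashlib
import json
from datetime import datetime, timezone


def hash_user_id(user_id, salt=""):
    # One-way MD5 hash of the user id. The salt is an assumption made for
    # this sketch; the bug only says the id is an MD5 hash.
    return hashlib.md5((salt + str(user_id)).encode("utf-8")).hexdigest()


def build_user_record(user_id, region, lang, installs):
    # installs: iterable of (app_id, slug, installed) tuples, where
    # installed is a timezone-aware datetime.
    return {
        "user": hash_user_id(user_id),
        "region": region,
        "lang": lang,
        "installed_apps": [
            {
                "id": app_id,
                "slug": slug,
                # ISO 8601 in UTC, seconds precision, no offset suffix,
                # matching the example timestamp in the format above.
                "installed": installed.astimezone(timezone.utc).strftime(
                    "%Y-%m-%dT%H:%M:%S"
                ),
            }
            for app_id, slug, installed in installs
        ],
    }


record = build_user_record(
    42,
    "br",
    "pt-BR",
    [(123, "twitter", datetime(2013, 11, 18, 8, 50, 24, tzinfo=timezone.utc))],
)
print(json.dumps(record, indent=2))
```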
Assignee: nobody → robhudson.mozbugs
Github pull request ready for review:
https://github.com/mozilla/zamboni/pull/1403
https://github.com/mozilla/zamboni/commit/f7a4745
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [qa-]
Target Milestone: --- → 2013-12-03
How does this work?
./manage.py cron dump_user_installs_cron

I was going to check on it tomorrow to see how it ran, but we could run it manually. The location of the dumps should be:
DUMPED_USERS_PATH = NETAPP_STORAGE + '/dumped-users'
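If the files under that path need to be bundled into the .tar.gz mentioned in the description, something along these lines might work. This is only a sketch: `archive_dumps` is a hypothetical helper, and it assumes the dumps are written as individual `.json` files in a single directory, which this bug does not confirm.

```python
import os
import tarfile


def archive_dumps(dump_dir, archive_path):
    # Bundle every .json file directly under dump_dir into a gzip-compressed
    # tarball. Assumes one flat directory of JSON dumps (not confirmed here).
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(dump_dir)):
            if name.endswith(".json"):
                # arcname keeps the server's absolute path out of the archive.
                tar.add(os.path.join(dump_dir, name), arcname=name)
    return archive_path
```

Writing the archive outside the dump directory keeps it out of the next run's listing.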
How do people get the dump? Is there OAuth? URLs?
(In reply to Wil Clouser [:clouserw] from comment #8)
> How do people get the dump? Is there OAuth? URLs?

Nothing yet. What's the best approach? I thought we might not want URLs since these are somewhat sensitive. There is a VPN machine that was set up which could grab these via SSH/SCP. Or we can push them somewhere?
Looks like this didn't make it to prod, so we'll have to wait another week for real data. I wouldn't mind giving them a dump of -dev though, so they could start parsing. I filed bug 940722 to follow up.