Create a one-off dump function for recommendation data

RESOLVED FIXED in 2013-12-03

Status

P1
normal
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: clouserw, Assigned: robhudson)

Tracking

2013-12-03
Points:
---

Details

(Whiteboard: [qa-])

(Reporter)

Description

5 years ago
(This bug is actually close to bug 938887 so maybe read that one over too?)

Linas, from Telefonica Labs, is looking for some data so he can start crunching some numbers to give us rockin' recommendations. Per his email, he is looking for JSON output with rows of:

> one-way hashed user id (due to our strict privacy policy)
> id of installed app
> slug of installed app
> timestamp of install
> user's region
> user's language

A daily .tar.gz behind oauth would be ideal (and could probably be mostly copied from that other bug). If the distribution is time-consuming we could figure something else out since this is a one-off, but this dump shouldn't be publicly accessible.
(Reporter)

Updated

5 years ago
See Also: → bug 938887
Should this factor in "heart" data from bug 937278 when it lands?
(Reporter)

Comment 2

5 years ago
I'd like to get this out the door, so I'll say no, but we could certainly add more to it in the future.  Especially if we leave the code in to make dumps easy.
(Assignee)

Comment 3

5 years ago
We will use this JSON format for the user dumps:

  {
    "user": "<md5 hash>",
    "region": "br",
    "lang": "pt-BR",
    "installed_apps": [
      {
        "id": <app id>,
        "slug": "<app slug>",
        "installed": "<installed date/time, ISO 8601 format, in UTC>"
      },
      {
        "id": 123,
        "slug": "twitter",
        "installed": "2013-11-18T08:50:24"
      },
      ...
    ]
  }
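
For illustration, one record in the format above could be assembled roughly like this. This is a minimal sketch, not the code from the pull request: hash_user_id and build_user_record are hypothetical names, hashing the bare user id with MD5 is an assumption based on the "<md5 hash>" placeholder, and the install datetimes are assumed to be timezone-aware.

```python
import hashlib
import json
from datetime import datetime, timezone


def hash_user_id(user_id):
    """One-way hash of a user id (hypothetical helper; MD5 per the format above)."""
    return hashlib.md5(str(user_id).encode("utf-8")).hexdigest()


def build_user_record(user_id, region, lang, installs):
    """Build one dump record (hypothetical helper).

    `installs` is an iterable of (app_id, slug, installed) tuples, where
    `installed` is assumed to be a timezone-aware datetime.
    """
    return {
        "user": hash_user_id(user_id),
        "region": region,
        "lang": lang,
        "installed_apps": [
            {
                "id": app_id,
                "slug": slug,
                # Convert to UTC, then drop the offset so the string matches
                # the "2013-11-18T08:50:24" style shown in the format.
                "installed": installed.astimezone(timezone.utc)
                .replace(tzinfo=None)
                .isoformat(timespec="seconds"),
            }
            for app_id, slug, installed in installs
        ],
    }


example = build_user_record(
    42,
    "br",
    "pt-BR",
    [(123, "twitter", datetime(2013, 11, 18, 8, 50, 24, tzinfo=timezone.utc))],
)
print(json.dumps(example, indent=2))
```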
(Assignee)

Updated

5 years ago
Assignee: nobody → robhudson.mozbugs
(Assignee)

Comment 4

5 years ago
GitHub pull request ready for review:
https://github.com/mozilla/zamboni/pull/1403
(Assignee)

Comment 5

5 years ago
https://github.com/mozilla/zamboni/commit/f7a4745
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Whiteboard: [qa-]
Target Milestone: --- → 2013-12-03
(Reporter)

Comment 6

5 years ago
How does this work?
(Assignee)

Comment 7

5 years ago
./manage.py cron dump_user_installs_cron

I was going to check on it tomorrow to see how it ran, but we could run it manually. The location of the dumps should be:
DUMPED_USERS_PATH = NETAPP_STORAGE + '/dumped-users'
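
As a rough sketch of how records could end up as a .tar.gz under a directory such as DUMPED_USERS_PATH: the function name write_dump_archive, the archive name, and the one-JSON-file-per-user layout are all assumptions for illustration and may not match what the cron job actually does.

```python
import io
import json
import os
import tarfile


def write_dump_archive(records, dump_dir, archive_name="users.tar.gz"):
    """Write each record as users/<hash>.json inside a gzipped tarball.

    Hypothetical helper: `records` is an iterable of dicts in the dump
    format, each with a "user" key holding the hashed user id.
    """
    os.makedirs(dump_dir, exist_ok=True)
    archive_path = os.path.join(dump_dir, archive_name)
    with tarfile.open(archive_path, "w:gz") as tar:
        for record in records:
            data = json.dumps(record).encode("utf-8")
            info = tarfile.TarInfo(name="users/%s.json" % record["user"])
            info.size = len(data)  # size must be set before adding the member
            tar.addfile(info, io.BytesIO(data))
    return archive_path
```

Called as write_dump_archive(records, DUMPED_USERS_PATH), this would leave a single archive in the dump directory that could then be copied off over SSH/SCP.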
(Reporter)

Comment 8

5 years ago
How do people get the dump?  Is there oauth?  URLs?
(Assignee)

Comment 9

5 years ago
(In reply to Wil Clouser [:clouserw] from comment #8)
> How do people get the dump?  Is there oauth?  URLs?

Nothing yet. What's the best approach? I thought we might not want URLs since these are somewhat sensitive. There is a VPN machine that was set up which could grab these via SSH/SCP? Or we can push them somewhere?
(Reporter)

Comment 10

5 years ago
Looks like this didn't make it to prod, so we'll have to wait another week for real data. I wouldn't mind giving them a dump of -dev though so they could start parsing. I filed bug 940722 to follow up.