Closed
Bug 986226
(Glow:JSONSpec)
Opened 12 years ago
Closed 11 years ago
Spec Json Files
Categories
(Websites :: glow.mozilla.org, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bensternthal, Assigned: aalmossawi)
Ali:
Pretty early on we would like some documentation on the various JSON files and their format for the visualizations. Knowing what you need to consume informs a lot on the backend.
You will need to work with Pmac on this. If we complete this next week we will be in good shape.
Reporter
Updated•12 years ago
Blocks: Glow:JsonToFolder
Assignee
Comment 1•12 years ago
Sure, I sent pmac some prose a few days ago that covered the breadth of data categories that I expect we will need. I plan to send him something more concrete, i.e. a json spec, on Monday.
Assignee
Comment 2•11 years ago
This will have to wait until tomorrow as the stats page hasn't been finalized yet.
Comment 3•11 years ago
Here is my start at a data model spec. The JSON will inform this pretty much completely, so let's use this for both things:
https://etherpad.mozilla.org/glow-backend-data-model
Assignee
Comment 4•11 years ago
pmac,
Things appear to be simpler than anticipated, given what the near-final version of the stats page looks like, which no longer includes time-series data. Here are the data components, as I see them:
1. Map data. A list of coordinates for which there are downloads at this tick. We would also need to maintain a count of total downloads for incrementing the top-most counter as well as continent-level data split by priority. So for instance:
(A)
{
"downloads_total": 12476232,
"downloads_geo": [
{
"long": -122.419416,
"lat": 37.774929,
"count": 254
},
{
"long": -122.419416,
"lat": 37.774929,
"count": 254
},
{
"long": -122.419416,
"lat": 37.774929,
"count": 254
}
]
}
(B)
{
"privacy": [
{
"continent": "AFRICA",
"count": 0.10
},
{
"continent": "ASIA",
"count": 0.30
},
{
"continent": "AUSTRALIA",
"count": 0.10
},
{
"continent": "EUROPE",
"count": 0.25
},
{
"continent": "NOAM",
"count": 0.20
},
{
"continent": "SOAM",
"count": 0.05
}
],
"freedom": [
{
"continent": "AFRICA",
"count": 0.10
},
...
]
}
All count values are percentages. Here is a screen from Sean's mockup for reference: http://mozilla.seanmartell.com/glow/index.php?directory=.&currentPic=5
2. Stats data. If it is cheap to do operations on your end, it would be ideal to have data in the following two formats, i.e. split by country (including global) and split by priority. Values would all be percentages and we're only showing country-level data here.
(A)
{
"GLOBAL": {
"privacy": 0.28,
"opportunity": 0.24,
"accessibility": 0.06,
"freedom": 0.15,
"education": 0.15,
"communication": 0.12
},
"US": {
"privacy": 0.32,
"opportunity": 0.2,
"accessibility": 0.08,
"freedom": 0.15,
"education": 0.15,
"communication": 0.1
}
}
(B)
{
"privacy": [
{
"country": "US",
"count": 0.32
},
{
"country": "DE",
"count": 0.28
},
{
"country": "CA",
"count": 0.26
}
],
"freedom": [
{
"country": "AL",
"count": 0.67
},
{
"country": "GB",
"count": 0.64
},
{
"country": "ES",
"count": 0.61
}
]
}
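For reference, format (B) can be derived from format (A) with a small pivot. The following is only a sketch in Python: the function name `split_by_priority`, the `top_n` cutoff, and the choice to leave out the GLOBAL row are assumptions for illustration, not something agreed in this bug. The samples above look sorted by count, descending, so the sketch does the same.

```python
def split_by_priority(by_country, top_n=3):
    """Pivot {country: {issue: share}} into {issue: [{"country", "count"}, ...]}."""
    by_issue = {}
    for country, shares in by_country.items():
        if country == "GLOBAL":  # assumption: the global row is not ranked
            continue
        for issue, share in shares.items():
            by_issue.setdefault(issue, []).append({"country": country, "count": share})
    # Keep the top_n countries per issue, highest share first (top_n is hypothetical).
    return {
        issue: sorted(rows, key=lambda r: r["count"], reverse=True)[:top_n]
        for issue, rows in by_issue.items()
    }


sample = {
    "GLOBAL": {"privacy": 0.28, "freedom": 0.15},
    "US": {"privacy": 0.32, "freedom": 0.15},
    "DE": {"privacy": 0.28, "freedom": 0.20},
}
by_issue = split_by_priority(sample)
```

Here `by_issue["privacy"]` lists US before DE, and `by_issue["freedom"]` lists DE before US.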
Here is Sean's latest mockup for reference: http://cl.ly/image/2V3E1V1c0O47
I'd be interested in hearing your thoughts.
Comment 5•11 years ago
Ali,
This all looks pretty great. We can do any splitting you need on the server side I think.
For the downloads geo coordinates data, I think we need another layer, right? Each file will be for a minute of data, but we'll need the animation to move faster than that. So won't you need it split into smaller buckets? Like every 1 or 5 seconds? Also, are we going to get as granular as LAT, LON, or are we going to just use the "city" value? That might be too coarse, but the LAT, LON may well be too fine a grain. It's possible we could do some geo maths to bundle up pings that are within X miles of each other, but that seems a bit more complex and error-prone than we likely want to get with this.
Thoughts?
Flags: needinfo?(aalmossawi)
Comment 6•11 years ago
I don't think we need intermediate points between minutes in the data itself. It should be fairly trivial to interpolate for animation purposes, and that would give much better performance than putting 1-5 second resolution data in the JSON. The way I interpreted the lat and long was that they would be the lat and long of the city that the downloads were identified with, to make it easy to visualize on the map. Does that sound right?
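As a rough illustration of the idea, one simple way to animate a per-minute count is to split it evenly across smaller ticks on the consuming side. This is a sketch only: the helper name `spread_over_minute` and the 60-tick default are assumptions, and an even split is just one possible form of interpolation.

```python
def spread_over_minute(count, ticks=60):
    """Split one minute's count into `ticks` integer buckets that sum to count."""
    base, extra = divmod(count, ticks)
    # The first `extra` ticks carry one additional event each.
    return [base + 1 if i < extra else base for i in range(ticks)]


per_second = spread_over_minute(254)
```

The buckets always add back up to the original count, so the total shown on screen stays consistent with the per-minute file.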
Assignee
Comment 7•11 years ago
Indeed, to both your points, the thought was as Josh said. Since there's no need to identify individual cities on the map, we'd simply be scaling the lat/long coordinates to our map and showing animated circles at those locations.
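To make the scaling step concrete, here is a minimal sketch in Python that assumes the map image uses an equirectangular projection. The thread does not say which projection the map uses, so both the projection and the name `project` are assumptions, and the actual front-end code may differ.

```python
def project(lat, lon, width, height):
    """Map latitude/longitude in degrees to pixel x, y on an equirectangular map."""
    x = (lon + 180.0) / 360.0 * width
    y = (90.0 - lat) / 180.0 * height  # y grows downward on screen
    return x, y


# Sample coordinate from the spec above, on a hypothetical 960x480 map.
x, y = project(37.774929, -122.419416, width=960, height=480)
```

With this mapping, latitude 0 and longitude 0 land at the centre of the image, and the top-left corner corresponds to latitude 90, longitude -180.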
Flags: needinfo?(aalmossawi)
Comment 8•11 years ago
You both may be right. I'll take a 2nd look at the maxmind db docs to make sure of what the lat, lon really indicates. And I'm all for the simplest and least voluminous data possible. Doing just counts for the minute works just fine for me :)
Comment 9•11 years ago
Will we also need download aggregates per continent or trending downloads? Or is download data purely for the dots on the map and the total count?
Flags: needinfo?(aalmossawi)
Assignee
Comment 10•11 years ago
It's purely for the dots on the map and the total count, per the latest mockup that I've seen, which I believe is the final one.
Flags: needinfo?(aalmossawi)
Comment 11•11 years ago
Since the "downloads_total" and "downloads_geo" actually mean "interactions", I suggest changing those keys to "map_total" and "map_geo".
Also, all of the main keys in the spec are in the same file right? I'm only planning on producing a single file per minute.
Flags: needinfo?(aalmossawi)
Assignee
Comment 12•11 years ago
Sure, that sounds reasonable. And, yes, they could all be in the same file. Thanks.
Flags: needinfo?(aalmossawi)
Assignee
Comment 13•11 years ago
pmac,
The sample json that you sent me works great! That ought to allow us to have a finished Stats page by the end of today. Here are a few minor comments:
1. Could you please change "accessibility" to "access" and "education" to "learning".
2. Could we have the values for country_issues.GLOBAL be different just for demo purposes.
3. I realize that it's all dummy data, but just a comment to confirm that the values for an individual country's choices add up to 100%, since I noticed that the total is off by one or so for some countries.
For the glows on the map, what are your thoughts on consolidating coordinates that are geographically close to each other. We would ultimately need to have a smaller set to work with so I was thinking we either 1) consolidate the set of coordinates in the json, or 2) do something fancy in the front-end. I'd opt for the former if at all possible.
Flags: needinfo?(pmac)
Comment 14•11 years ago
(In reply to Ali Almossawi from comment #13)
> 2. Could we have the values for country_issues.GLOBAL be different just for
> demo purposes.
I think so... the problem is that apparently the random.choice function in Python is too egalitarian with its choices :)
> 3. I realize that it's all dummy data, but just a comment to confirm that
> the values for an individual country's choices add up to 100%, since I
> noticed that the total is off by one or so for some countries.
That's likely a rounding error... hmm.... Perhaps allowing for 4 digits instead of 2 would clear it up?
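Extra digits make the drift smaller but may not remove it entirely. If an exact total matters, one alternative is largest-remainder rounding, sketched below in Python. This is a different technique from simply rounding to 4 digits, and the function name `round_shares` is hypothetical. It assumes non-negative counts with a non-zero total.

```python
def round_shares(counts, digits=4):
    """Turn raw counts into shares rounded to `digits` places that sum to 1."""
    scale = 10 ** digits
    total = sum(counts.values())  # assumption: total is greater than zero
    exact = {k: v * scale / total for k, v in counts.items()}
    units = {k: int(e) for k, e in exact.items()}  # floor, since values are >= 0
    leftover = scale - sum(units.values())
    # Hand the remaining units to the largest fractional parts first.
    ranked = sorted(exact, key=lambda k: exact[k] - units[k], reverse=True)
    for k in ranked[:leftover]:
        units[k] += 1
    return {k: u / scale for k, u in units.items()}


shares = round_shares({"privacy": 1, "opportunity": 1, "access": 1})
```

The rounding is done in whole units of 0.0001, so the units always add up to exactly 10000 before being converted back to fractions.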
> For the glows on the map, what are your thoughts on consolidating
> coordinates that are geographically close to each other. We would ultimately
> need to have a smaller set to work with so I was thinking we either 1)
> consolidate the set of coordinates in the json, or 2) do something fancy in
> the front-end. I'd opt for the former if at all possible.
Hmm... I thought they were supposed to already be a bit consolidated according to the Maxmind docs. It's possible we could do some geo math to further consolidate. How much further do we need?
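For a sense of what simple server-side consolidation could look like, here is a sketch in Python that snaps coordinates to a grid and sums the counts per cell. The helper name `consolidate` and the one-degree cell size are assumptions, and grid snapping is a cruder stand-in for true distance-based bundling. The `lat`, `long`, and `count` keys follow the spec from comment 4.

```python
from collections import defaultdict


def consolidate(points, cell_deg=1.0):
    """Merge points that fall in the same cell_deg x cell_deg grid cell."""
    cells = defaultdict(int)
    for p in points:
        # Snap each coordinate to the south-west corner of its grid cell.
        key = (p["lat"] // cell_deg * cell_deg, p["long"] // cell_deg * cell_deg)
        cells[key] += p["count"]
    return [{"lat": lat, "long": lon, "count": n} for (lat, lon), n in cells.items()]


merged = consolidate([
    {"lat": 37.77, "long": -122.42, "count": 254},
    {"lat": 37.80, "long": -122.27, "count": 100},
    {"lat": 52.52, "long": 13.40, "count": 80},
])
```

The first two points share a cell and are merged into one entry, while the third stays separate. A larger `cell_deg` gives a smaller set at the cost of positional accuracy.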
Flags: needinfo?(pmac) → needinfo?(aalmossawi)
Comment 15•11 years ago
Commit pushed to master at https://github.com/mozilla/smithers
https://github.com/mozilla/smithers/commit/1574f16c0808e51161bc30e5d80d4fef30cf2be9
Bug 986226: Change some data formatting things from the bug.
1. Change issue names.
2. Change debug data randomness to try to weight some fields.
3. Round percentages to 4 digits to hopefully avoid percentages
not adding to 100.
Comment 16•11 years ago
Commit pushed to master at https://github.com/mozilla/smithers
https://github.com/mozilla/smithers/commit/583428eef5638f37c7d3c2bc257f1b09149fa6eb
Bug 986226: Weight random issues by position.
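As an illustration of what weighting by position can mean for dummy data, here is a sketch in Python using the standard library's `random.choices`. It is not the code from the commit above: the weights, the helper name `pick_issue`, and the issue list (taken from comment 17) are all assumptions.

```python
import random

ISSUES = ["privacy", "opportunity", "access", "freedom", "learning", "control"]


def pick_issue(rng=random):
    """Pick one issue, favouring names that appear earlier in ISSUES."""
    # Hypothetical weighting: the first issue gets weight 6, the last gets weight 1.
    weights = list(range(len(ISSUES), 0, -1))
    return rng.choices(ISSUES, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so the demo data is repeatable
picks = [pick_issue(rng) for _ in range(6000)]
```

Unlike `random.choice`, which picks uniformly, this produces visibly uneven totals, which is what the demo needs.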
Assignee
Comment 17•11 years ago
One more thing, could the choices be reordered so that they appear as they do on the website, i.e.
privacy
opportunity
access
freedom
learning
control
Flags: needinfo?(aalmossawi)
Comment 18•11 years ago
(In reply to Ali Almossawi from comment #17)
> One more thing, could the choices be reordered so that they appear as they
> do on the website, i.e.
Hmm... For most of the data these keys appear in objects, and JSON objects are represented in python by a dict, which has no predictable key order. I think it'd be significant work to make it output those keys in a specific order. If it's important I can certainly try though.
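For what it's worth, one possible way to control key order on the Python side is `collections.OrderedDict`, since `json.dumps` writes keys in the mapping's iteration order. The sketch below assumes the order from comment 17. The names `ORDER` and `ordered_issues` are hypothetical, and this is not what the backend currently does.

```python
import json
from collections import OrderedDict

ORDER = ["privacy", "opportunity", "access", "freedom", "learning", "control"]


def ordered_issues(shares):
    """Return the shares as an OrderedDict following the site's issue order."""
    return OrderedDict((k, shares[k]) for k in ORDER if k in shares)


text = json.dumps(ordered_issues({"freedom": 0.15, "privacy": 0.32, "access": 0.08}))
# text == '{"privacy": 0.32, "access": 0.08, "freedom": 0.15}'
```

JSON objects are formally unordered, so a consumer that needs a fixed order is safer applying its own ordering than relying on the order in the file.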
Assignee
Comment 19•11 years ago
Not a problem, I'll make it work.
Assignee
Comment 20•11 years ago
Two more questions for you, please:
1. Some countries have missing entries, e.g. Yemen (YE) doesn't have a value for privacy
2. Would it be possible to sort map_geo on count, descending? If not, I can do it; it's just that I'd prefer to do minimal operations on the data if at all possible.
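Both points can also be handled on the consuming side. The following Python sketch shows the idea: treating a missing entry as 0.0 is an assumption (the thread does not say what a missing value means), and the helper names `fill_missing` and `sort_by_count` are hypothetical.

```python
ISSUES = ["privacy", "opportunity", "access", "freedom", "learning", "control"]


def fill_missing(country_issues, default=0.0):
    """Give every country a value for every issue (assumption: missing means 0.0)."""
    return {
        country: {issue: shares.get(issue, default) for issue in ISSUES}
        for country, shares in country_issues.items()
    }


def sort_by_count(map_geo):
    """Return the map_geo entries ordered by count, largest first."""
    return sorted(map_geo, key=lambda p: p["count"], reverse=True)


filled = fill_missing({"YE": {"freedom": 0.5, "learning": 0.5}})
ordered = sort_by_count([{"count": 3}, {"count": 254}, {"count": 17}])
```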
Assignee
Comment 21•11 years ago
Actually, never mind the second point.
Reporter
Comment 22•11 years ago
I think this one is closed. If it turns out we need to tweak the JSON files or format please re-open.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED