Closed Bug 986226 (Glow:JSONSpec) Opened 12 years ago Closed 11 years ago

Spec Json Files

Categories

(Websites :: glow.mozilla.org, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bensternthal, Assigned: aalmossawi)

Ali: Pretty early on we would like some documentation on the various JSON files and their format for the visualizations. Knowing what you need to consume informs a lot on the backend. You will need to work with Pmac on this. If we complete this next week we will be in good shape.
Sure, I sent pmac some prose a few days ago that covered the breadth of data categories that I expect we will need. I plan to send him something more concrete, i.e. a json spec, on Monday.
This will have to wait until tomorrow as the stats page hasn't been finalized yet.
Here is my start at a data model spec. The JSON will inform this pretty much completely, so let's use it for both things: https://etherpad.mozilla.org/glow-backend-data-model
pmac,

Things appear to be simpler than anticipated, given what the near-final version of the stats page looks like, which no longer includes time-series data. Here are the data components, as I see them:

1. Map data. A list of coordinates for which there are downloads at this tick. We would also need to maintain a count of total downloads for incrementing the top-most counter, as well as continent-level data split by priority. So for instance:

(A)
    {
      "downloads_total": 12476232,
      "downloads_geo": [
        { "long": -122.419416, "lat": 37.774929, "count": 254 },
        { "long": -122.419416, "lat": 37.774929, "count": 254 },
        { "long": -122.419416, "lat": 37.774929, "count": 254 }
      ]
    }

(B)
    {
      "privacy": [
        { "continent": "AFRICA", "count": 0.10 },
        { "continent": "ASIA", "count": 0.30 },
        { "continent": "AUSTRALIA", "count": 0.10 },
        { "continent": "EUROPE", "count": 0.25 },
        { "continent": "NOAM", "count": 0.20 },
        { "continent": "SOAM", "count": 0.05 }
      ],
      "freedom": [
        { "continent": "AFRICA", "count": 0.10 },
        ...
      ]
    }

All count values are percentages. Here is a screen from Sean's mockup for reference: http://mozilla.seanmartell.com/glow/index.php?directory=.&currentPic=5

2. Stats data. If it is cheap to do operations on your end, it would be ideal to have data in the following two formats, i.e. split by country (including global) and split by priority. Values would all be percentages and we're only showing country-level data here.

(A)
    {
      "GLOBAL": {
        "privacy": 0.28,
        "opportunity": 0.24,
        "accessibility": 0.06,
        "freedom": 0.15,
        "education": 0.15,
        "communication": 0.12
      },
      "US": {
        "privacy": 0.32,
        "opportunity": 0.2,
        "accessibility": 0.08,
        "freedom": 0.15,
        "education": 0.15,
        "communication": 0.1
      }
    }

(B)
    {
      "privacy": [
        { "country": "US", "count": 0.32 },
        { "country": "DE", "count": 0.28 },
        { "country": "CA", "count": 0.26 }
      ],
      "freedom": [
        { "country": "AL", "count": 0.67 },
        { "country": "GB", "count": 0.64 },
        { "country": "ES", "count": 0.61 }
      ]
    }

Here is Sean's latest mockup for reference: http://cl.ly/image/2V3E1V1c0O47

I'd be interested in hearing your thoughts.
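For illustration, stats format (B) can be derived from format (A) with a small pivot. This is only a sketch: the function name `pivot_by_issue`, the choice to skip the `GLOBAL` entry, and the top-3 cutoff are assumptions made for the example, not something settled in this bug.

```python
def pivot_by_issue(by_country, top_n=3, skip=("GLOBAL",)):
    """Turn {country: {issue: share}} into {issue: [{"country", "count"}, ...]}.

    Hypothetical helper: top_n and the skipped keys are assumptions.
    """
    by_issue = {}
    for country, shares in by_country.items():
        if country in skip:
            continue  # assumed: the global row is not ranked against countries
        for issue, share in shares.items():
            by_issue.setdefault(issue, []).append(
                {"country": country, "count": share}
            )
    # Keep only the highest-share countries for each issue.
    return {
        issue: sorted(rows, key=lambda row: row["count"], reverse=True)[:top_n]
        for issue, rows in by_issue.items()
    }


sample = {
    "GLOBAL": {"privacy": 0.28, "freedom": 0.15},
    "US": {"privacy": 0.32, "freedom": 0.15},
    "DE": {"privacy": 0.28, "freedom": 0.20},
}
result = pivot_by_issue(sample)
```

Producing both shapes on the server keeps the front-end free of data manipulation, which is the preference stated in the comment above.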
Ali, This all looks pretty great. We can do any splitting you need on the server side, I think. For the downloads geo coordinates data, I think we need another layer, right? Each file will be for a minute of data, but we'll need the animation to move faster than that. So won't you need it split into smaller buckets? Like every 1 or 5 seconds? Also, are we going to get as granular as LAT, LON, or are we going to just use the "city" value? That might be too coarse, but the LAT, LON may well be too fine a grain. It's possible we could do some geo maths to bundle up pings that are within X miles of each other, but that seems a bit more complex and error-prone than we likely want to get with this. Thoughts?
Flags: needinfo?(aalmossawi)
I don't think we need intermediate points between minutes in the data itself. It should be fairly trivial to interpolate for animation purposes, and that would give much better performance than putting 1-5 second resolution data in the json. The way I interpreted the lat and long was that they would be the lat and long of the city that the downloads were identified with, to make it easy to visualize on the map. Does that sound right?
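One possible reading of "interpolate for animation purposes" is to spread each per-minute count evenly over shorter animation ticks on the client. A minimal sketch of that arithmetic, assuming 5-second ticks; the name `spread_count` is hypothetical and the real front-end would not be written in Python.

```python
def spread_count(count, ticks):
    """Split an integer per-minute count into `ticks` near-equal parts.

    The first `count % ticks` parts get one extra unit, so the parts
    always add back up to the original count.
    """
    base, extra = divmod(count, ticks)
    return [base + (1 if i < extra else 0) for i in range(ticks)]


# One minute of data animated in 5-second steps gives 12 ticks (assumed).
per_tick = spread_count(254, 12)
```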
Indeed, to both your points, the thought was as Josh said. Since there's no need to identify individual cities on the map, we'd simply be scaling the lat/long coordinates to our map and showing animated circles at those locations.
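Scaling lat/long to a map can be sketched as below, assuming the map image uses a plain equirectangular projection covering the whole globe. That projection is an assumption; the actual Glow map may use a different one, in which case the formula changes. The function name `project` is hypothetical.

```python
def project(lon, lat, width, height):
    """Map longitude/latitude in degrees to pixel coordinates.

    Assumes an equirectangular map: x grows eastward from -180 degrees,
    y grows downward from +90 degrees.
    """
    x = (lon + 180.0) / 360.0 * width
    y = (90.0 - lat) / 180.0 * height
    return x, y


center = project(0.0, 0.0, 360, 180)
top_left = project(-180.0, 90.0, 360, 180)
```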
Flags: needinfo?(aalmossawi)
You both may be right. I'll take a 2nd look at the maxmind db docs to make sure of what the lat, lon really indicates. And I'm all for the simplest and least voluminous data possible. Doing just counts for the minute works just fine for me :)
Will we also need download aggregates per continent or trending downloads? Or is download data purely for the dots on the map and the total count?
Flags: needinfo?(aalmossawi)
It's purely for the dots on the map and the total count, per the latest mockup that I've seen, which I believe is the final one.
Flags: needinfo?(aalmossawi)
Since the "downloads_total" and "downloads_geo" actually mean "interactions", I suggest changing those keys to "map_total" and "map_geo". Also, all of the main keys in the spec are in the same file, right? I'm only planning on producing a single file per minute.
Flags: needinfo?(aalmossawi)
Sure, that sounds reasonable. And, yes, they could all be in the same file. Thanks.
Flags: needinfo?(aalmossawi)
pmac, The sample json that you sent me works great! That ought to allow us to have a finished Stats page by the end of today. Here are a few minor comments:

1. Could you please change "accessibility" to "access" and "education" to "learning"?
2. Could we have the values for country_issues.GLOBAL be different, just for demo purposes?
3. I realize that it's all dummy data, but just a comment to confirm that the values for an individual country's choices add up to 100%, since I noticed that the total is off by one or so for some countries.

For the glows on the map, what are your thoughts on consolidating coordinates that are geographically close to each other? We would ultimately need to have a smaller set to work with, so I was thinking we either 1) consolidate the set of coordinates in the json, or 2) do something fancy in the front-end. I'd opt for the former if at all possible.
Flags: needinfo?(pmac)
(In reply to Ali Almossawi from comment #13)
> 2. Could we have the values for country_issues.GLOBAL be different just for
> demo purposes.

I think so... the problem is that apparently the random.choice function in Python is too egalitarian with its choices :)

> 3. I realize that it's all dummy data, but just a comment to confirm that
> the values for an individual country's choices add up to 100%, since I
> noticed that the total is off by one or so for some countries.

That's likely a rounding error... hmm.... Perhaps allowing for 4 digits instead of 2 would clear it up?

> For the glows on the map, what are your thoughts on consolidating
> coordinates that are geographically close to each other. We would ultimately
> need to have a smaller set to work with so I was thinking we either 1)
> consolidate the set of coordinates in the json, or 2) do something fancy in
> the front-end. I'd opt for the former if at all possible.

Hmm... I thought they were supposed to already be a bit consolidated according to the Maxmind docs. It's possible we could do some geo math to further consolidate. How much further do we need?
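A simple form of the "geo math" mentioned here is to snap each coordinate to a grid cell and sum the counts per cell. This is a sketch under assumptions: the 1-degree cell size and the name `consolidate` are made up for illustration, and nothing in this bug says that smithers does it this way.

```python
from collections import defaultdict


def consolidate(points, cell_deg=1.0):
    """Merge points that fall into the same lat/long grid cell.

    Each point is a dict with "lat", "long" and "count" keys, as in the
    map_geo samples. The merged point sits at the cell's grid coordinate.
    """
    buckets = defaultdict(int)
    for point in points:
        key = (round(point["lat"] / cell_deg), round(point["long"] / cell_deg))
        buckets[key] += point["count"]
    return [
        {"lat": lat_i * cell_deg, "long": lon_i * cell_deg, "count": total}
        for (lat_i, lon_i), total in buckets.items()
    ]


points = [
    {"lat": 37.774929, "long": -122.419416, "count": 254},
    {"lat": 37.80, "long": -122.27, "count": 10},
    {"lat": 52.52, "long": 13.405, "count": 5},
]
merged = consolidate(points)
```

A larger `cell_deg` yields fewer, heavier points, so the cell size is the knob for "how much further" to consolidate.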
Flags: needinfo?(pmac) → needinfo?(aalmossawi)
Commit pushed to master at https://github.com/mozilla/smithers

https://github.com/mozilla/smithers/commit/1574f16c0808e51161bc30e5d80d4fef30cf2be9
Bug 986226: Change some data formatting things from the bug.

1. Change issue names.
2. Change debug data randomness to try to weight some fields.
3. Round percentages to 4 digits to hopefully avoid percentages not adding to 100.
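The commit rounds to 4 digits, which shrinks the error but does not remove it. A different technique, largest-remainder rounding, guarantees that whole-number percentages add up to exactly 100. The sketch below is not what the commit does; the function name `percentages` and the input shape (raw counts per issue) are assumptions.

```python
import math


def percentages(counts):
    """Convert raw counts to integer percentages that sum to exactly 100.

    Largest-remainder method: floor every share, then hand the leftover
    points to the entries with the largest fractional parts.
    """
    total = sum(counts.values())
    if total == 0:
        return {key: 0 for key in counts}
    exact = {key: value * 100.0 / total for key, value in counts.items()}
    rounded = {key: math.floor(share) for key, share in exact.items()}
    leftover = 100 - sum(rounded.values())
    by_remainder = sorted(
        exact, key=lambda key: exact[key] - rounded[key], reverse=True
    )
    for key in by_remainder[:leftover]:
        rounded[key] += 1
    return rounded


shares = percentages({"privacy": 1, "opportunity": 1, "access": 1})
```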
One more thing, could the choices be reordered so that they appear as they do on the website, i.e. privacy, opportunity, access, freedom, learning, control?
Flags: needinfo?(aalmossawi)
(In reply to Ali Almossawi from comment #17)
> One more thing, could the choices be reordered so that they appear as they
> do on the website, i.e.

Hmm... For most of the data these keys appear in objects, and JSON objects are represented in Python by a dict, which has no predictable key order. I think it'd be significant work to make it output those keys in a specific order. If it's important I can certainly try though.
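If ordered output were wanted, one way is to build a `collections.OrderedDict` in the desired order before serializing, since `json.dumps` writes keys in the dict's iteration order. This is a sketch, not the smithers code; `ISSUE_ORDER` and `ordered_issues` are hypothetical names, with the order taken from the request above. On Python 3.7 and later a plain dict keeps insertion order as well.

```python
import json
from collections import OrderedDict

# Order requested in the bug; the constant name is an assumption.
ISSUE_ORDER = ["privacy", "opportunity", "access", "freedom", "learning", "control"]


def ordered_issues(shares):
    """Return the shares with keys arranged in ISSUE_ORDER.

    Keys that are not listed in ISSUE_ORDER are dropped by this sketch.
    """
    return OrderedDict(
        (issue, shares[issue]) for issue in ISSUE_ORDER if issue in shares
    )


shares = {"freedom": 0.15, "privacy": 0.28, "access": 0.06}
text = json.dumps(ordered_issues(shares))
```

The JSON format itself treats objects as unordered, so a consumer that depends on order is safer with a list of pairs or its own ordering constant on the front-end.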
Not a problem, I'll make it work.
Two more questions for you, please:

1. Some countries have missing entries, e.g. Yemen (YE) doesn't have a value for privacy.
2. Would it be possible to sort map_geo on count, descending? If not, I can do it; it's just that I'd prefer to do minimal operations on the data if at all possible.
Actually, never mind the second point.
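For the missing entries in the first point, one option is to fill absent issues with a default before the file is written. A sketch, assuming that a missing issue means a share of zero, which this bug does not confirm; `fill_missing` and `ISSUES` are hypothetical names.

```python
# Issue names as listed earlier in this bug; the constant is an assumption.
ISSUES = ("privacy", "opportunity", "access", "freedom", "learning", "control")


def fill_missing(country_issues, issues=ISSUES, default=0.0):
    """Return a copy in which every country has a value for every issue."""
    return {
        country: {issue: shares.get(issue, default) for issue in issues}
        for country, shares in country_issues.items()
    }


filled = fill_missing({"YE": {"freedom": 0.6, "learning": 0.4}})
```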
I think this one is closed. If it turns out we need to tweak the JSON files or format please re-open.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED