[l10nstats] show percentile on tree status?

RESOLVED INCOMPLETE

Status

Webtools
Elmo
8 months ago
7 months ago

People

(Reporter: Pike, Assigned: Pike)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Assignee)

Description

8 months ago
I've been pondering whether there's anything in our data that shows us how good and how active a tree is.

I'm looking into showing the Xth percentile of missing strings; for example, 20 means:

The top 20% of our locales have at most this many missing strings.
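Written as a formula, in our own notation (n locales, with missing-string counts sorted ascending as m_(1) ≤ m_(2) ≤ … ≤ m_(n)). Rounding k up is an assumption, since the exact rounding isn't stated in this bug:

$$P_X = m_{(k)}, \qquad k = \left\lceil \frac{X}{100}\, n \right\rceil$$

For example, with n = 50 locales and X = 20, k = 10, so P_20 is the missing-string count of the 10th-best locale.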

I'll put something up on stage and leave it to the PM team to decide whether it's useful.
(Assignee)

Comment 1

8 months ago
https://l10n.allizom.org/dashboard/tree-status/fx_central?starttime=2017-4-19&hideBad=true&bound=70&percentile=20 is up

You can play with the number in the percentile text box; its up and down arrows change it in increments of 10.
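To compare several percentiles or trees side by side, the link above can be generated from its query parameters. This is a sketch: the parameter names and values are copied from the URL in this bug, `tree_status_url` is a hypothetical helper, and since the bug doesn't explain what `bound` and `hideBad` control, they are simply passed through unchanged.

```python
from urllib.parse import urlencode

BASE = "https://l10n.allizom.org/dashboard/tree-status/"


def tree_status_url(tree, percentile, starttime="2017-4-19", bound=70):
    """Build a tree-status dashboard URL (hypothetical helper).

    Parameter names mirror the link in this bug; `bound` and `hideBad`
    are passed through as-is because their meaning isn't documented here.
    """
    query = {
        "starttime": starttime,
        "hideBad": "true",
        "bound": bound,
        "percentile": percentile,
    }
    return BASE + tree + "?" + urlencode(query)


# One URL per step of the percentile text box (increments of 10).
urls = [tree_status_url("fx_central", p) for p in range(10, 100, 10)]
```

The same helper covers other trees by changing the first argument, e.g. `tree_status_url("fennec_central", 30)`.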

If you want to understand more, hover over the graph to get a tooltip. It lists the locales that changed to good, OK-ish, or bad around that time (the highlighted box). Beneath that is the percentile of missing strings, along with the locale that currently has that many missing strings. That links to the per-locale tree graph, so you can see how the locale itself evolved.

Jumps in the graph usually happen when the percentile switches from one locale to another; you can click the per-locale links to see what exactly happened.
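Since jumps tend to coincide with a different locale becoming the one at the percentile, candidate jump points can be found by watching for changes in that locale over time. A minimal sketch, assuming the time-ordered (timestamp, locale-at-percentile) pairs are already available; the input format, `percentile_switches`, and the sample series are all hypothetical, not part of the dashboard.

```python
def percentile_switches(series):
    """Given time-ordered (timestamp, locale_at_percentile) pairs, return
    (timestamp, previous_locale, new_locale) wherever the locale changes."""
    switches = []
    # Walk consecutive pairs of snapshots.
    for (_, prev_locale), (ts, locale) in zip(series, series[1:]):
        if locale != prev_locale:
            switches.append((ts, prev_locale, locale))
    return switches


# Made-up sample series for illustration only.
series = [
    ("04-20", "it"),
    ("04-21", "it"),
    ("04-22", "pl"),
    ("04-23", "pl"),
    ("04-24", "it"),
]
print(percentile_switches(series))
# [('04-22', 'it', 'pl'), ('04-24', 'pl', 'it')]
```

Each reported switch is a place worth checking against the per-locale graphs of both locales involved.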
Assignee: nobody → l10n
Flags: needinfo?(lebedel.delphine)
Flags: needinfo?(francesco.lodolo)
Comment 2

Top locales = locales with fewer missing strings?
Shaded area = gray area?

One thing that I miss is a data in the tooltip box.

I pushed strings to mozilla-central this morning because I was tired of waiting, so there should be some more interesting data in the next few days.
Comment 3

(In reply to Francesco Lodolo [:flod] from comment #2)
> One thing that I miss is a data in the tooltip box.

Data -> date (Italian slipped in)
(Assignee)

Comment 4

8 months ago
(In reply to Francesco Lodolo [:flod] from comment #2)
> Top locales = locales with less missing strings?

Yes and no.

The number of locales is constant over time. We sort them by missing strings, take the top 20% (those with the fewest missing strings), and return the missing-string count of the worst locale in that group.
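The steps above can be sketched like this. The dashboard's own implementation isn't shown in this bug, so `percentile_missing`, the dict input, rounding the group size up, and breaking ties by locale code are all assumptions:

```python
import math


def percentile_missing(missing_by_locale, percentile):
    """Return (missing_count, locale) for the worst locale inside the top
    `percentile` percent, where "top" means fewest missing strings.

    Hypothetical helper: rounding and tie-breaking are assumptions.
    """
    if not missing_by_locale:
        raise ValueError("no locales")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    # Sort by missing strings (ascending); ties broken by locale code.
    ranked = sorted(missing_by_locale.items(), key=lambda kv: (kv[1], kv[0]))
    # Size of the top group, rounded up so it always holds at least one locale.
    k = math.ceil(len(ranked) * percentile / 100)
    locale, missing = ranked[k - 1]
    return missing, locale


# Illustrative sample data, not real dashboard numbers.
sample = {"de": 0, "fr": 3, "it": 5, "ja": 12, "pl": 40}
print(percentile_missing(sample, 20))  # (0, 'de')
print(percentile_missing(sample, 40))  # (3, 'fr')
```

Because the result is a single locale's count, the value jumps whenever a different locale lands at the cutoff, which fits the jumps described in comment 1.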

I have no idea how to explain that without an essay. Stas? I stole that from you; can you give a good legend string?

> Shaded area = gray area?

Yes

> One thing that I miss is a data in the tooltip box.

s/data/date/, yeah, I can add that. There's a ton of polish and nit-fixing that could be done on this.

I'm also unhappy about the histogram these days.
Comment 5

(In reply to Axel Hecht [:Pike] from comment #4)

> I have no idea how to explain that without an essay. Stas? I stole that from
> you; can you give a good legend string?

I have two suggestions:

  a) missing strings of the top X percent of locales
  b) missing strings of the locales in the (100-X)th percentile
(Assignee)

Comment 6

8 months ago
Created attachment 8871974 [details]
something colorful

I've doodled around a bit today, and created this colorful thing.

Basically, each color corresponds to a constant percentile, with the colors rotating through the scale four times. The timeline runs horizontally, from April 20 to May 20.
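One way to get "each color is a constant percentile, rotating through the scale four times" is to map the percentile onto a hue wheel and wrap around. This is a sketch under our own assumption that the scale is the HSV hue circle; the attachment doesn't say which color scale was actually used, and `percentile_color` is hypothetical.

```python
import colorsys


def percentile_color(percentile, rotations=4):
    """Map a percentile in [0, 100] to an RGB hex color, cycling through
    the full hue circle `rotations` times over the 0..100 range."""
    # Fraction of the way around the hue circle, wrapped into [0, 1).
    hue = (percentile / 100 * rotations) % 1.0
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


print(percentile_color(0))     # #ff0000
print(percentile_color(12.5))  # #00ffff
print(percentile_color(25))    # #ff0000 again, one full rotation later
```

Rotating several times makes neighboring percentiles easier to tell apart, at the cost of each color repeating every 25 percentiles.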

There's a nice little gradient showing that some locales make good progress compared to their status quo, and some locales keep up to date.

But this graph also shows that we have some locales that just live with a lot of missing strings. That might be mostly devtools, but it's also not a sharp line, I think.

I'm attaching this for two reasons. First, it might help to understand the percentile stuff: the line I showed on the tree history is basically just a line of constant color. Second, to see whether showing more than one line would be more valuable than showing just one.
Comment 7

In the meantime, clarification on the non-psychedelic graph:

Left axis: number of locales for both gray and green areas?
Right axis: number of missing strings?

How are the groups at the bottom (7-31, 47-138) defined? I assume they are groups of missing strings.
(Assignee)

Comment 8

8 months ago
(In reply to Francesco Lodolo [:flod] from comment #7)
> In the meantime, clarification on the non psychedelic graph.
> 
> Left axis: number of locales for both gray and green areas?
> Right axis: number of missing strings?

Yes

> How are the groups at the bottom (7-31, 47-138) defined? I assume they are
> groups of missing strings.

Yes, they're an attempt to show a histogram over missing strings. Because that's unwieldy on a single axis, Stas and I looked up a clustering algorithm back then, and we show a linear axis per cluster.
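The idea of "a linear axis per cluster" can be sketched as follows. The clustering algorithm used on the dashboard isn't named in this bug, so this swaps in a deliberately simple technique: cut the sorted values at the widest gaps. `gap_clusters`, `axis_position`, and the sample values are hypothetical.

```python
def gap_clusters(values, n_clusters):
    """Split sorted distinct values into up to n_clusters groups by cutting
    at the (n_clusters - 1) widest gaps between neighbours.

    A simple stand-in, not the dashboard's actual clustering algorithm.
    """
    xs = sorted(set(values))
    if not xs:
        return []
    n_cuts = max(0, min(n_clusters, len(xs)) - 1)
    # (gap width, index i) for each gap xs[i] -> xs[i + 1], widest first.
    gaps = sorted(((xs[i + 1] - xs[i], i) for i in range(len(xs) - 1)), reverse=True)
    cut_after = sorted(i for _, i in gaps[:n_cuts])
    clusters, start = [], 0
    for i in cut_after:
        clusters.append(xs[start:i + 1])
        start = i + 1
    clusters.append(xs[start:])
    return clusters


def axis_position(value, clusters, segment_width=1.0):
    """Place a value on a piecewise axis: each cluster gets an equal-width
    segment, and positions inside a segment are linear in the value."""
    for idx, cluster in enumerate(clusters):
        lo, hi = cluster[0], cluster[-1]
        if lo <= value <= hi:
            frac = 0.0 if hi == lo else (value - lo) / (hi - lo)
            return idx * segment_width + frac * segment_width
    raise ValueError(f"{value} is not inside any cluster")


# Made-up missing-string counts for illustration.
clusters = gap_clusters([0, 1, 2, 3, 40, 45, 50, 300, 320], 3)
print(clusters)                     # [[0, 1, 2, 3], [40, 45, 50], [300, 320]]
print(axis_position(45, clusters))  # 1.5
```

Each segment has its own linear scale, so wide ranges like 300-320 don't squash the small counts; the trade-off is that distances are only comparable within a segment.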

That used to be OK-ish, but it's breaking down more and more as locales spread out all over the place.

It's really hard to get decent visualizations out of bad data.
Comment 9

So I've tried playing around with this over the past few weeks, and unless I'm completely mistaken (which is entirely possible, since I don't think I fully grasp the data presented here), this is useful for spotting which locales are falling behind, which locales have been inactive for X amount of time, and which locales we should consider dropping... I'm sure I'm missing relevant things, though, and that frustrates me.
BTW, is there a way to split this to see just Fennec vs. Desktop locales?
Flags: needinfo?(lebedel.delphine)
(Assignee)

Comment 10

8 months ago
The graph above covers desktop only. For Fennec, the URL would be https://l10n.allizom.org/dashboard/tree-status/fennec_central?starttime=2017-4-19&hideBad=true&bound=70&percentile=30, for example.

The graph looks rather different: the 20th percentile quite often touches 0, and even the 30th percentile is down to a single string.
Comment 11

I'm looking at the graph and see a nicely growing green area on the right, which is good. Still, how is "good" determined? I might have missed it, but I have no idea.

Sadly, the blue line is not really useful in a world with devtools in Firefox.
Flags: needinfo?(francesco.lodolo)
(Assignee)

Comment 12

7 months ago
Created attachment 8877510 [details] [diff] [review]
archiving patch
(Assignee)

Comment 13

7 months ago
This turned out to be more confusing than helpful; resolving.

If we get to a point where our stats make more sense overall, we might look into this again, but I'm not sure it'll be needed if the stats make sense ;-)
Status: NEW → RESOLVED
Last Resolved: 7 months ago
Resolution: --- → INCOMPLETE