Closed Bug 1687529 Opened 3 years ago Closed 3 years ago

Update Unreviewed suggestion lifespan section in the Community health dashboard (spec)

Categories

(Webtools Graveyard :: Pontoon, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mathjazz, Assigned: flod)

Attachments

(2 files)

The Unreviewed suggestion lifespan chart in the Community health dashboard shows "the average age of the unreviewed suggestions in a particular month" (spec).

The original idea was different and is summarized by the (now bogus) content in the information tooltip of the chart: "How much time it takes on average to review a suggestion." We also have a script that retrieves such data.

This bug tracks updates to the spec, which will define exactly what data we need to collect. In a follow-up step, we'll then implement these changes in the codebase.

The goal is to get a sense of how long contributors need to wait before someone reviews their suggestions, and how that waiting time evolves over time. The real challenge is deciding which data to use to get this information, in particular data that can be plotted over time on a monthly basis.

Potential data points

1 - Suggestions added in month

We could look at suggestions added in a specific month, and display on average how long it took for them to be reviewed. For example, in January:

  • Look at all suggestions added in January.
  • If they get reviewed, how long did it take?

Problems:

  • This doesn't tell us anything about suggestions submitted before January.
  • What if a suggestion is added in January but not reviewed? Should we consider the days till the end of the month?
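A minimal sketch of data point 1, assuming a hypothetical `Suggestion` record with `created` and `reviewed` dates (these names are illustrative and not Pontoon's schema). It skips unreviewed suggestions, which is one possible answer to the second problem above, not a settled one:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Suggestion:
    # Hypothetical record; field names are illustrative, not Pontoon's schema.
    created: date
    reviewed: Optional[date] = None  # None means still unreviewed


def avg_days_to_review_for_month(suggestions, year, month):
    """Average days to review for suggestions added in the given month.

    Unreviewed suggestions are skipped. Returns None when nothing added
    in that month has been reviewed.
    """
    waits = [
        (s.reviewed - s.created).days
        for s in suggestions
        if (s.created.year, s.created.month) == (year, month)
        and s.reviewed is not None
    ]
    return sum(waits) / len(waits) if waits else None


sample = [
    Suggestion(date(2021, 1, 5), date(2021, 1, 8)),    # reviewed after 3 days
    Suggestion(date(2021, 1, 10), date(2021, 2, 9)),   # reviewed after 30 days
    Suggestion(date(2021, 1, 20)),                     # never reviewed, skipped
    Suggestion(date(2020, 12, 1), date(2021, 1, 15)),  # added before January, ignored
]
january_average = avg_days_to_review_for_month(sample, 2021, 1)
print(january_average)
```

The December suggestion is invisible to the January figure even though it was reviewed in January, which is the first problem listed above.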

2 - Suggestions reviewed in month

Alternative approach: look at reviews happening in a specific month, e.g. January. How old on average were those translations?

This approach has similar limitations to the previous one:

  • What if suggestions never get reviewed? The graph would be completely flat, and somehow look "healthy".
  • It doesn't really tell us how things are evolving month over month, especially if a locale has accumulated a lot of old suggestions.

3 - Average lifetime of reviewed suggestions over last X months

What if we look at the average age of reviewed suggestions for the X months before?

For example:

  • In January 2021 we look at translations submitted during the previous 12 months (2020-01-31 -> 2021-01-31). We exclude those that are self-approved.
  • For the remaining translations: if they're approved or rejected (but not self-rejected), look at how long it took. If they're not, consider 365 days (i.e. the full period).
  • Calculate the average and display it over time.

This has the benefit of looking beyond a single month (12 months is arbitrary; it could be 6). It still has the limitation of not providing much information beyond 24 months (12 + 12 look-behind).
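The steps above could be sketched roughly as follows. The dictionary keys are hypothetical, the window is approximated as 365 days, and the sketch assumes that self-rejected suggestions are dropped in the same way as self-approved ones, which the description leaves open:

```python
from datetime import date, timedelta

WINDOW_DAYS = 365  # approximation of the 12-month look-behind


def avg_review_days(suggestions, as_of):
    """Data point 3 sketch.

    Each suggestion is a dict with hypothetical keys: 'created' (date),
    'reviewed' (date or None), 'author' and 'reviewer'.
    """
    start = as_of - timedelta(days=WINDOW_DAYS)
    total, count = 0, 0
    for s in suggestions:
        if not (start <= s["created"] <= as_of):
            continue  # submitted outside the window
        reviewed = s["reviewed"]
        if reviewed is not None and reviewed > as_of:
            reviewed = None  # not yet reviewed on the as_of date
        if reviewed is None:
            total += WINDOW_DAYS  # unreviewed: count the full period
        elif s["reviewer"] == s["author"]:
            continue  # assumption: all self-reviews are excluded
        else:
            total += (reviewed - s["created"]).days
        count += 1
    return total / count if count else None


sample = [
    {"created": date(2020, 6, 1), "reviewed": date(2020, 6, 11),
     "author": "ann", "reviewer": "bob"},   # 10 days
    {"created": date(2020, 7, 1), "reviewed": date(2020, 7, 2),
     "author": "ann", "reviewer": "ann"},   # self-approved, excluded
    {"created": date(2020, 8, 1), "reviewed": None,
     "author": "cy", "reviewer": None},     # unreviewed, counted as 365
    {"created": date(2019, 1, 1), "reviewed": date(2020, 6, 1),
     "author": "cy", "reviewer": "bob"},    # submitted before the window
]
result = avg_review_days(sample, date(2021, 1, 31))
print(result)
```

With these made-up records, a single unreviewed suggestion moves the average from 10 days to 187.5, so the 365-day placeholder dominates the result.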

4 - Average lifetime of pending unreviewed suggestions

That's the data we're currently displaying. It could be used in combination with the previous data point to give us some additional info.

For example, for Italian it would show that there are projects that are not actively maintained. Unfortunately, I don't think we can plot them on the same graph: data point 3 has a worst-case value of 365 days, while we have locales where the Y axis for this graph goes beyond 1,000 days :-|.
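As a rough illustration of why the scales differ, here is a sketch of data point 4 with made-up ages. Unlike data point 3, nothing caps the value, so one very old pending suggestion is enough to push the average past the 365-day ceiling:

```python
from datetime import date, timedelta


def avg_age_unreviewed(created_dates, today):
    """Average age in days of suggestions still unreviewed on `today`."""
    ages = [(today - d).days for d in created_dates]
    return sum(ages) / len(ages) if ages else 0.0


today = date(2021, 1, 31)
# Made-up creation dates: two recent suggestions and one very old one.
pending = [today - timedelta(days=n) for n in (10, 30, 3000)]
average_age = avg_age_unreviewed(pending, today)
print(round(average_age, 1))
```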

Proposal

My proposal would be to have two separate graphs.

a) Keep the current graph, rename the title to "Lifetime of currently unreviewed suggestions" and its infobox to "How old are on average the suggestions currently unreviewed".

b) Add a new graph using data point 3 (with 12 months). Title "Review turnaround", infobox "How much time it took on average to review the suggestions submitted over the previous 12 months".

(In reply to Francesco Lodolo [:flod] from comment #1)

> Proposal
>
> My proposal would be to have two separate graphs.
>
> a) Keep the current graph, rename the title to "Lifetime of currently unreviewed suggestions" and its infobox to "How old are on average the suggestions currently unreviewed".
>
> b) Add a new graph using data point 3 (with 12 months). Title "Review turnaround", infobox "How much time it took on average to review the suggestions submitted over the previous 12 months".

I generally agree with the proposal, but have nits on how we name and describe the graphs. For the current graph, I suggest we rename it "Average age of currently unreviewed suggestions". I prefer using "age" because it denotes the issue at hand: these suggestions are old. "Lifetime" implies a more complex chronology with an anticipated "death" event.

For the new graph, I suggest titling it "Time to review suggestions" with the infobox saying, "The average number of days (based on data from the last 12 months) a contributor can expect to wait for their translation suggestions to be reviewed by managers and translators." I feel like this makes the behavior we want from the community clearer, and a contributor who glances at the Insights tab can walk away knowing how long they'll have to wait for their contributions to be reviewed.

Calculating "How much time it takes on average to review a suggestion" is hard, because suggestions might never get reviewed, which means the average time to review any set of suggestions that includes unreviewed suggestions is infinity.

That's why I think unreviewed suggestions should be excluded from the calculations for the b) chart and not assumed to be reviewed in the length of the interval as proposed in point 3 (365 days), which would be mathematically incorrect.

For example: Say we're looking at the [2020-01-31, 2021-01-31] interval, and find an unreviewed suggestion submitted on 2021-01-31. Why should we assume it will take 365 days to review it? The only thing we can assume is that it will take the average amount of time to review it, which means it should be excluded from the calculation of the average (how meta!).
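A small worked example of that argument, using made-up waiting times. Assigning the unreviewed suggestion the average wait leaves the average unchanged, so it is equivalent to excluding it, whereas assigning 365 days distorts the result:

```python
reviewed_waits = [2, 4, 6]  # days to review; made-up values for illustration

# Option A: exclude the unreviewed suggestion entirely.
mean_excluding = sum(reviewed_waits) / len(reviewed_waits)

# Option B: count the unreviewed suggestion as 365 days.
mean_with_365 = (sum(reviewed_waits) + 365) / (len(reviewed_waits) + 1)

# Option C: assume the unreviewed suggestion will take the average time.
mean_with_mean = (sum(reviewed_waits) + mean_excluding) / (len(reviewed_waits) + 1)

print(mean_excluding, mean_with_365, mean_with_mean)
```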

--

If we exclude unreviewed suggestions, point 3 effectively starts to sound like a rolling average of point 2. I wonder if we should plot both: 2 will be an early indicator of changes, and 3 will highlight longer-term trends and predict the future.

--

On a more general note, a dashboard as a whole should be used as a signal of community health, individual charts not so much. If your time to review is low, other charts will tell you if that's because you review fast or because you don't review at all.

I disagree on the last point. For example, the Review Activity tells me if a locale is reviewing, but waiting 1 day vs 30 days provides a completely different experience, and that wouldn't show up on the graphs.

I'm OK with your suggestion to ignore unreviewed suggestions in point 3, since it felt hacky to start with. As for adding point 2, I think it makes sense. I expect the axes to be potentially on very different scales, so it might make sense to use histogram bars for point 2.

So, an updated proposal could look like this.

Graph 1

Maintain the current graph, but update title and infobox.

Title
Average age of currently unreviewed suggestions

Info
How old are on average the suggestions currently unreviewed

Graph 2

Add a new graph.

Title
Time to review suggestions

Info
The average number of days a contributor can expect to wait for their translation suggestions to be reviewed by managers and translators.

  • Average 12 months: average age of suggestions reviewed over the previous 12 months
  • Monthly reviews: average age of suggestions reviewed during the specific month

Average 12 months: look at translations rejected or approved in the previous 12 months (but not self-approved or self-rejected), and display the average age as a line over time.
Monthly reviews: look at reviews happening during the specific month, and display the average age as a histogram, one bar per month.
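The two series for Graph 2 could be computed along these lines. This is only a sketch: the tuple shape is hypothetical, and it assumes "previous 12 months" means the 12 calendar months ending with the month being plotted:

```python
from collections import defaultdict
from datetime import date, timedelta


def month_index(d):
    """Map a date to a running month number so months can be compared."""
    return d.year * 12 + d.month - 1


def mean(values):
    return sum(values) / len(values) if values else None


def review_series(reviews, year, month):
    """Return (monthly_avg, twelve_month_avg) in days for the given month.

    `reviews` holds (created, reviewed, author, reviewer) tuples, a
    hypothetical shape chosen for this sketch.
    """
    target = year * 12 + month - 1
    ages_by_month = defaultdict(list)
    for created, reviewed, author, reviewer in reviews:
        if author == reviewer:
            continue  # self-approved or self-rejected: ignored
        ages_by_month[month_index(reviewed)].append((reviewed - created).days)
    monthly = ages_by_month.get(target, [])
    # Assumption: the window is the 12 calendar months ending with `target`.
    window = [
        age
        for m in range(target - 11, target + 1)
        for age in ages_by_month.get(m, [])
    ]
    return mean(monthly), mean(window)


def reviewed_after(days, reviewed, author, reviewer):
    """Build a sample review whose suggestion waited `days` days."""
    return (reviewed - timedelta(days=days), reviewed, author, reviewer)


sample = [
    reviewed_after(40, date(2021, 1, 11), "ann", "bob"),
    reviewed_after(20, date(2021, 1, 20), "ann", "bob"),
    reviewed_after(90, date(2020, 6, 15), "cy", "bob"),
    reviewed_after(1, date(2021, 1, 5), "bob", "bob"),     # self-review, skipped
    reviewed_after(500, date(2019, 12, 1), "cy", "bob"),   # outside the window
]
series = review_series(sample, 2021, 1)
print(series)
```

Note that the 12-month figure here pools individual reviews instead of averaging the monthly averages, so months with more reviews weigh more. Whether that is the desired behavior is a design choice the spec would need to state.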

How does this sound?

I like the sound of that.

Would it be helpful to generate the charts for a handful of representative locales (which?) and see how helpful the charts turn out to be?

(In reply to Matjaz Horvat [:mathjazz] from comment #5)

> Would it be helpful to generate the charts for a handful of representative locales (which?) and see how helpful the charts turn out to be?

Yeah, it might be helpful. Three good candidates could be it, de, he.

The new proposal sounds great! The only thing I would suggest changing is the title of the 2nd graph: Average time of suggestions reviewed

(In reply to Peiying Mo [:CocoMo] from comment #7)

> The new proposal sounds great! The only thing I would suggest changing is the title of the 2nd graph: Average time of suggestions reviewed

See comment 2 from Jeff for the current title. I don't think "Time of suggestions" works here, because it's either how old a suggestion is (and that would be "age", covered by the first graph), or how much time the review took (associated with the action of reviewing, not the suggestion itself).

Sorry for responding a bit late to this.
Some notes:

  • I agree with the new proposal
  • I agree with Jeff's proposal to use "Time to review suggestions" in order to better reflect what we want in the new graph
  • It would indeed be helpful to generate charts

@Matjaz
Did you have a chance to check the graphs?

Flags: needinfo?(m)

This is data for Jan 2020 - Dec 2020 for the monthly reviews histogram (average age [in days] of suggestions reviewed during the specific month):

  • it: 11, 4, 6, 17, 12, 4, 81, 4, 19, 5, 22, 12
  • de: 195, 199, 72, 24, 35, 19, 34, 273, 444, 408, 146, 3
  • he: 53, 425, 18, 80, 60, 533, 349, 44, 1035, 56, 256, 277

We only started collecting ActionLog data on Jan 1, 2020, so we can't use it to calculate the 12-month rolling average (the line chart). I should be able to retrieve past data from the Translations table. Let me know if you'd like to preview that data, too.

Flags: needinfo?(m)
Attached file Histograms

Note that the labels on the x-axis should be Jan - Dec.

(In reply to Matjaz Horvat [:mathjazz] from comment #11)

> I should be able to retrieve past data from the Translations table. Let me know if you'd like to preview that data, too.

If it's not too much work, I think it would be very helpful. Maybe just look at Hebrew and Italian, since they have wildly different scales on the Y-axis. I somehow expect the line trend to have a scale that will flatten the graph (or the other way around), and I'm trying to understand if we need two Y-axes with different scales.

12-month average:

  • it: 23, 23, 25, 28, 29, 27, 26, 15, 16, 15, 15, 15
  • de: 34, 51, 57, 58, 47, 47, 48, 48, 80, 95, 103, 106
  • he: 330, 345, 362, 357, 332, 324, 359, 369, 371, 528, 511, 505

Sadly the data coming from the Translation table produces different results for the monthly histogram. I haven't been able to identify the root cause of the discrepancy yet.
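To get a first impression of whether the two series can share a Y-axis, the numbers posted in this thread can be compared directly. The sketch below only compares the peak of each series per locale. Because the two data sets come from different sources with a known discrepancy, the ratios are a rough indication at best:

```python
# Monthly histogram values (ActionLog data), as posted in this thread.
monthly = {
    "it": [11, 4, 6, 17, 12, 4, 81, 4, 19, 5, 22, 12],
    "de": [195, 199, 72, 24, 35, 19, 34, 273, 444, 408, 146, 3],
    "he": [53, 425, 18, 80, 60, 533, 349, 44, 1035, 56, 256, 277],
}
# 12-month averages (Translations table data), as posted in this thread.
rolling = {
    "it": [23, 23, 25, 28, 29, 27, 26, 15, 16, 15, 15, 15],
    "de": [34, 51, 57, 58, 47, 47, 48, 48, 80, 95, 103, 106],
    "he": [330, 345, 362, 357, 332, 324, 359, 369, 371, 528, 511, 505],
}

peak_ratio = {}
for locale in monthly:
    peak_monthly = max(monthly[locale])
    peak_rolling = max(rolling[locale])
    # How many times taller the tallest bar is than the highest line point.
    peak_ratio[locale] = round(peak_monthly / peak_rolling, 1)
    print(locale, peak_monthly, peak_rolling, peak_ratio[locale])
```

For these three locales the tallest bar is roughly two to four times higher than the highest point of the line, so on a shared axis the line would occupy the lower part of the chart.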

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Summary: Update Unreviewed suggestion lifespan section in the Community health dashboard → Update Unreviewed suggestion lifespan section in the Community health dashboard (spec)
Blocks: 1691770
Product: Webtools → Webtools Graveyard