Extract data to determine impact of COVID-19 on contributions
Categories
(Webtools Graveyard :: Pontoon, task, P2)
Tracking
(Not tracked)
People
(Reporter: flod, Assigned: mathjazz)
Details
It would be interesting to look at the data in Pontoon to see how COVID-19 has impacted our communities.
Possible data points, looking at a comparison for the period Mar 15-Apr 15 between 2019 and 2020:
- Number of new accounts
- Number of active contributors
- Number of translations submitted
There's probably more. It would also be interesting to have this data organized per region, to see how different areas of the world are impacted.
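A minimal sketch of the year-over-year comparison described above, assuming the relevant dates (account creation, translation submission) have already been exported as plain `date` values. The helper names and the sample data are hypothetical and are not part of Pontoon.

```python
from datetime import date


def count_in_window(days, year, start=(3, 15), end=(4, 15)):
    """Count dates that fall between start and end (inclusive) in the given year."""
    lo = date(year, *start)
    hi = date(year, *end)
    return sum(1 for d in days if lo <= d <= hi)


def year_over_year(days, base_year=2019, compare_year=2020):
    """Return (base count, compared count, percent change or None)."""
    base = count_in_window(days, base_year)
    current = count_in_window(days, compare_year)
    # Percent change is undefined when the base period has no events.
    change = None if base == 0 else (current - base) / base * 100
    return base, current, change


# Hypothetical sample data, for illustration only.
sample = [
    date(2019, 3, 20),
    date(2019, 4, 16),  # outside the window
    date(2020, 3, 15),
    date(2020, 4, 1),
    date(2020, 4, 15),
]
print(year_over_year(sample))
```

The same function could be applied separately to account creation dates and translation dates to cover the first and third data points.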
Updated (Assignee)•5 years ago
Comment 1 (Assignee)•5 years ago
I've collected the requested numbers in the spreadsheet:
https://docs.google.com/spreadsheets/d/1kJzAS6cLXUSdaPm7a_oAnG-8zSBHQUbMpsil4rLqIJs/edit?usp=sharing
I used the scripts below.
Let me know if you have more data requests.
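import datetime
from collections import defaultdict  # used by the "Active Users" section
from django.db.models import Count
from django.db.models.functions import TruncMonth
from pontoon.base.models import *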
# New User Registrations
users = User.objects.all() \
    .annotate(period=TruncMonth("date_joined")) \
    .values("period") \
    .annotate(count=Count("id")) \
    .order_by("period")
for x in users:
    date = x["period"]
    count = x["count"]
    print("{month}/{year},{count}".format(month=date.month, year=date.year, count=count))
# New Translation Submissions
# (user__isnull=False excludes translations without an author)
translations = Translation.objects.filter(user__isnull=False) \
    .annotate(period=TruncMonth("date")) \
    .values("period") \
    .annotate(count=Count("id")) \
    .order_by("period")
for x in translations:
    date = x["period"]
    count = x["count"]
    print("{month}/{year},{count}".format(month=date.month, year=date.year, count=count))
# New Entity Creations
entities = Entity.objects \
    .annotate(period=TruncMonth("date_created")) \
    .values("period") \
    .annotate(count=Count("id")) \
    .order_by("period")
for x in entities:
    date = x["period"]
    count = x["count"]
    print("{month}/{year},{count}".format(month=date.month, year=date.year, count=count))
# Active Users
# One row per (month, user); the number of rows per month is the
# number of users who submitted at least one translation that month.
translations = Translation.objects.filter(user__isnull=False) \
    .annotate(period=TruncMonth("date")) \
    .values("period", "user_id") \
    .annotate(count=Count("user_id")) \
    .values("period", "count") \
    .order_by("period")
translations_dict = defaultdict(list)
for x in translations:
    translations_dict[x["period"]].append(x["count"])
for date, counts in translations_dict.items():
    count = len(counts)
    print("{month}/{year},{count}".format(month=date.month, year=date.year, count=count))
# New Translation Submissions per Locale
locales = Locale.objects.available().order_by("code")
# Prepare one empty row per month from January 2017 up to the current month
data = {}
for year in range(2017, datetime.datetime.now().year + 1):
    for month in range(1, 13):
        if year == datetime.datetime.now().year and month > datetime.datetime.now().month:
            continue
        data["{}/{}".format(month, year)] = {}
# Fill in the monthly counts for each locale
for locale in locales:
    translations = Translation.objects.filter(
        date__gte=datetime.datetime(2017, 1, 1),
        locale=locale,
        user__isnull=False,
    ) \
        .annotate(period=TruncMonth("date")) \
        .values("period") \
        .annotate(count=Count("id")) \
        .order_by("period")
    for x in translations:
        date = x["period"]
        date = "{}/{}".format(date.month, date.year)
        count = x["count"]
        data[date][locale.code] = count
# Print a CSV: header row of locale codes, then one row per month
print("," + ",".join(locales.values_list("code", flat=True)))
for date, values in data.items():
    line = date
    for locale in locales:
        line = line + "," + str(values.get(locale.code, 0))
    print(line)
Comment 2•5 years ago
How do migrations factor into this data, aka the system user?
Comment 3 (Assignee)•5 years ago
I've excluded those:
Translation.objects.filter(user__isnull=False)
Comment 4 (Reporter)•5 years ago
While we can't do anything about region, could we have this data split by locale?
Updated (Reporter)•5 years ago
Comment 5 (Assignee)•5 years ago
We can gather per-locale data for New Translation Submissions and Active Users.
Should we look into all active locales (198) for each month since January 2014 (77 months)?
Comment 6 (Reporter)•5 years ago
I think we can limit this to the last couple of years, maybe three? 2017-2020 seems like a good amount of data already.
Comment 7 (Assignee)•5 years ago
I've added a "New Translation Submissions per Locale" sheet:
https://docs.google.com/spreadsheets/d/1kJzAS6cLXUSdaPm7a_oAnG-8zSBHQUbMpsil4rLqIJs/edit#gid=1086324160
Comment 8•5 years ago
Something's weird with the per-locale data. Notably, ms, cak, en-CA, ia, and hye have peaks that I wouldn't expect for any of those languages.
Comment 9 (Reporter)•5 years ago
ms has been abandoned for a long time, ever since the main contributors left. That seems realistic to me.
hye just started, and it's backed by a school.
ia constantly has existing translations updated, and that's one person doing the work (also active in Italian).
I can't tell much about en-CA (possibly a new project was enabled) or cak (activity is not really consistent there).
Comment 10•5 years ago
These locales have the most contributions per month across all months and locales. Malay, for example, clocks in at 14283 for September 2017. That's the top-most spike in the chart I left in the Google doc.
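A rough sketch of how the largest month/locale cells could be pulled out of the CSV printed by the per-locale script, assuming the output has a header row that starts with an empty cell followed by locale codes. The function name `top_cells` is hypothetical, and apart from the 14283 figure for ms in September 2017 the sample values are made up.

```python
import heapq


def top_cells(csv_text, n=5):
    """Return the n largest (count, period, locale) cells from the CSV text."""
    lines = csv_text.strip().splitlines()
    # The header row starts with an empty cell, followed by locale codes.
    codes = lines[0].split(",")[1:]
    cells = []
    for line in lines[1:]:
        parts = line.split(",")
        period = parts[0]
        for code, value in zip(codes, parts[1:]):
            cells.append((int(value), period, code))
    return heapq.nlargest(n, cells)


# Hypothetical sample; only the ms value for 9/2017 comes from this bug.
sample_csv = """,cak,ms
8/2017,120,300
9/2017,45,14283
"""
for count, period, code in top_cells(sample_csv, n=2):
    print(period, code, count)
```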
Comment 11 (Reporter)•5 years ago
(In reply to Axel Hecht [:Pike] from comment #10)
> These locales have the most contributions per month across all months and locales. Malay, for example, clocks in at 14283 for September 2017. That's the top-most spike in the chart I left in the Google doc.
As the person who does all the sign-offs, I'm not surprised by that at all. There was a period when the translation was constantly being rewritten.
Comment 12 (Reporter)•5 years ago
One more data point: can we get the number of new strings added per month? I realized that the number of submitted translations alone is not that helpful. There's no need to split that data by locale.
| Comment hidden (off-topic) |
| Comment hidden (off-topic) |
Comment 15 (Assignee)•5 years ago
I've updated the sheet and script with the "New Entity Creations" section.
Note that we've only been collecting Entity.date_created for the last 3 years.
Comment 16 (Reporter)•5 years ago
I've started playing with another spreadsheet and adding graphs:
https://docs.google.com/spreadsheets/d/16ycqQG2NHTlW9PtSRdarK-0MfyTL_DkGyZiiv7hLXBQ/edit?usp=sharing
Not exactly sure how to use the number of added strings to weight the number of translations submitted.
Please make a copy if you want to experiment on your own.
I've also added the scripts to my repository:
https://github.com/flodolo/scripts/pull/6
Let's close this bug and open a new one if we need more data. Thanks Matjaz!
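One possible way to weight submissions by added strings, as raised in comment 16, is to divide the monthly translation count by the number of new strings in the same month. This is only a sketch under that assumption, not the method used in either spreadsheet. It reads the `month/year,count` lines printed by the scripts above; the function names and sample numbers are hypothetical.

```python
def parse_counts(text):
    """Parse 'month/year,count' lines into a {period: count} dict."""
    counts = {}
    for line in text.strip().splitlines():
        period, count = line.split(",")
        counts[period] = int(count)
    return counts


def translations_per_new_string(translations, entities):
    """Return {period: submissions / new strings}, or None where no strings were added."""
    ratios = {}
    for period, submitted in translations.items():
        added = entities.get(period, 0)
        ratios[period] = submitted / added if added else None
    return ratios


# Hypothetical sample output, for illustration only.
translations = parse_counts("3/2019,1000\n4/2019,1500")
entities = parse_counts("3/2019,50\n4/2019,0")
print(translations_per_new_string(translations, entities))
```

A month with no new strings yields `None` rather than a division error, so such months can be skipped or handled separately when charting.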
Comment 17•4 years ago
It's been about a year since this bug was closed. Could you run this again and incorporate the data from the last 11 months into https://docs.google.com/spreadsheets/d/1kJzAS6cLXUSdaPm7a_oAnG-8zSBHQUbMpsil4rLqIJs/edit#gid=0 please?
Comment 18 (Assignee)•4 years ago
Updated the sheet.
Updated•4 years ago