Closed Bug 1629968 Opened 5 years ago Closed 4 years ago

Extract data to determine impact of COVID-19 on contributions

Categories

(Webtools Graveyard :: Pontoon, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: flod, Assigned: mathjazz)

Details

It would be interesting to look at the data in Pontoon to see how COVID-19 has impacted our communities.

Possible data points, comparing the period Mar 15-Apr 15 in 2019 and 2020:

  • Number of new accounts
  • Number of active contributors
  • Number of translations submitted

There's probably more. It would also be interesting to have this data organized per region, to see how different areas of the world are impacted.
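As a rough illustration of the comparison described above, here is a minimal sketch in plain Python. It assumes the relevant timestamps (for example, account creation dates) have already been exported from Pontoon as a list of datetimes. The function names `count_in_window` and `year_over_year` and the sample data are hypothetical and not part of Pontoon.

```python
from datetime import date, datetime


def count_in_window(timestamps, year, start=(3, 15), end=(4, 15)):
    """Count timestamps between start and end (month, day), inclusive, in `year`."""
    lo = date(year, *start)
    hi = date(year, *end)
    return sum(1 for ts in timestamps if lo <= ts.date() <= hi)


def year_over_year(timestamps, base_year=2019, target_year=2020):
    """Return (base count, target count, percent change or None if base is 0)."""
    base = count_in_window(timestamps, base_year)
    target = count_in_window(timestamps, target_year)
    change = None if base == 0 else (target - base) / base * 100
    return base, target, change


# Hypothetical sample data, not real Pontoon numbers.
joined = [
    datetime(2019, 3, 20), datetime(2019, 4, 1),   # inside the 2019 window
    datetime(2019, 5, 2),                          # outside the window
    datetime(2020, 3, 15), datetime(2020, 3, 30),  # inside the 2020 window
    datetime(2020, 4, 15),                         # inside (end is inclusive)
    datetime(2020, 4, 16),                         # outside the window
]

print(year_over_year(joined))  # (2, 3, 50.0)
```

The same helper could be applied to translation submission dates, or to the first contribution date of each active user, assuming those can be exported in the same shape.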

Assignee: nobody → m
Status: NEW → ASSIGNED
Priority: -- → P2

I've collected the requested numbers in the spreadsheet:
https://docs.google.com/spreadsheets/d/1kJzAS6cLXUSdaPm7a_oAnG-8zSBHQUbMpsil4rLqIJs/edit?usp=sharing

I used the scripts below.

Let me know if you have more data requests.

import datetime
from collections import defaultdict
from pontoon.base.models import *
from django.db.models import Count
from django.db.models.functions import TruncMonth

# New User Registrations

users = User.objects.all() \
    .annotate(period=TruncMonth("date_joined")) \
    .values("period") \
    .annotate(count=Count("id")) \
    .order_by("period")

for x in users:
    date = x["period"]
    count = x["count"]
    print("{month}/{year},{count}".format(month=date.month, year=date.year, count=count))


# New Translation Submissions

translations = Translation.objects.filter(user__isnull=False) \
    .annotate(period=TruncMonth("date")) \
    .values("period") \
    .annotate(count=Count("id")) \
    .order_by("period")

for x in translations:
    date = x["period"]
    count = x["count"]
    print("{month}/{year},{count}".format(month=date.month, year=date.year, count=count))


# New Entity Creations

entities = Entity.objects \
    .annotate(period=TruncMonth("date_created")) \
    .values("period") \
    .annotate(count=Count("id")) \
    .order_by("period")

for x in entities:
    date = x["period"]
    count = x["count"]
    print("{month}/{year},{count}".format(month=date.month, year=date.year, count=count))


# Active Users

translations = Translation.objects.filter(user__isnull=False) \
    .annotate(period=TruncMonth("date")) \
    .values("period", "user_id") \
    .annotate(count=Count("user_id")) \
    .values('period', 'count') \
    .order_by("period")

translations_dict = defaultdict(list)
for x in translations:
    translations_dict[x["period"]].append(x["count"])

for y in translations_dict.items():
    date = y[0]
    count = len(y[1])
    print("{month}/{year},{count}".format(month=date.month, year=date.year, count=count))


# New Translation Submissions per Locale

locales = Locale.objects.available().order_by("code")
data = {}
for year in range(2017, datetime.datetime.now().year + 1):
    for month in range(1, 13):
        if year == datetime.datetime.now().year and month > datetime.datetime.now().month:
            continue
        data["{}/{}".format(month, year)] = {}

for locale in locales:
    translations = Translation.objects.filter(
        date__gte=datetime.datetime(2017,1,1),
        locale=locale,
        user__isnull=False,
    ) \
        .annotate(period=TruncMonth("date")) \
        .values("period") \
        .annotate(count=Count("id")) \
        .order_by("period")
    for x in translations:
        date = x["period"]
        date = "{}/{}".format(date.month, date.year)
        count = x["count"]
        data[date][locale.code] = count

print("," + ",".join(locales.values_list("code", flat=True)))
for date, values in data.items():
    line = date
    for l in locales:
        line = line + "," + str(values.get(l.code, 0))
    print(line)

How do migrations factor into this data, aka the system user?

I've excluded those:

Translation.objects.filter(user__isnull=False)

While we can't do anything about region, could we have this data split by locale?

Flags: needinfo?(m)

We can gather per-locale data for New Translation Submissions and Active Users.

Should we look into all active locales (198) for each month since January 2014 (77)?

Flags: needinfo?(m)

I think we can limit this to the last couple of years, maybe three? 2017-2020 seems like a good amount of data already.
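For the per-locale Active Users numbers mentioned above, the counting logic could look roughly like the sketch below. It is plain Python rather than a Django query and assumes the translations have been exported as `(date, locale_code, user_id)` tuples, with `user_id` set to `None` for imported translations that have no user. The function name `active_users_per_locale` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def active_users_per_locale(rows, since_year=2017):
    """Count distinct users per (year, month, locale).

    rows: iterable of (date, locale_code, user_id) tuples.
    Rows without a user (user_id is None) and rows before since_year are skipped.
    """
    users = defaultdict(set)
    for when, locale, user_id in rows:
        if user_id is None or when.year < since_year:
            continue
        users[(when.year, when.month, locale)].add(user_id)
    return {key: len(ids) for key, ids in sorted(users.items())}


# Hypothetical sample rows, not real Pontoon data.
rows = [
    (datetime(2020, 3, 16), "sl", 1),
    (datetime(2020, 3, 17), "sl", 1),     # same user, counted once
    (datetime(2020, 3, 18), "sl", 2),
    (datetime(2020, 3, 18), "it", 3),
    (datetime(2020, 4, 2), "it", None),   # no user, skipped
    (datetime(2016, 12, 31), "it", 3),    # before 2017, skipped
]

print(active_users_per_locale(rows))
# {(2020, 3, 'it'): 1, (2020, 3, 'sl'): 2}
```

Using a set per (month, locale) key means a contributor who submits many translations in one month is still counted only once for that locale.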

Something's weird with the per-locale data. Notably, ms, cak, en-CA, ia, and hye have peaks that I wouldn't expect for any of those languages.

ms has been abandoned for a long time, ever since the main contributors left. That seems realistic to me.
hye just started, and it's backed by a school.
ia is constantly updating existing translations, and that's one person doing the work (also active in Italian).
I can't tell much about en-CA (possibly a new project was enabled) or cak (activity is not really consistent there).

Across all months and locales, these locales have the most contributions per month. Malay, for example, clocks in at 14283 for September 2017. That's the topmost spike in the chart I left in the Google doc.

(In reply to Axel Hecht [:Pike] from comment #10)

> Across all months and locales, these locales have the most contributions per month. Malay, for example, clocks in at 14283 for September 2017. That's the topmost spike in the chart I left in the Google doc.

As the person who does all the sign-offs, I'm not surprised at all. There was a period where the translation was constantly rewritten.

One more data point: can we get the number of new strings added per month? I realized that the number of submitted translations alone is not that helpful. No need for that data to be split by locale.

I've updated the sheet and script with the "New Entity Creations" section.

Note that we've only been collecting Entity.date_created for the last 3 years.

I've started playing with another spreadsheet and adding graphs:
https://docs.google.com/spreadsheets/d/16ycqQG2NHTlW9PtSRdarK-0MfyTL_DkGyZiiv7hLXBQ/edit?usp=sharing

I'm not exactly sure how to use the number of added strings to weight the number of translations submitted.

Please make a copy if you want to run tests of your own.
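One possible way to approach the weighting, offered only as a sketch and not as something settled in this bug, is to divide the translations submitted in a month by the number of strings added in that month. The function name `translations_per_new_string` and the numbers below are hypothetical. The keys follow the `month/year` format printed by the scripts above.

```python
def translations_per_new_string(translations, new_strings):
    """Return translations submitted per newly added string, by period.

    Both arguments map a "month/year" key to a count. Periods with no
    recorded new strings yield None, since the ratio is undefined there.
    """
    ratios = {}
    for period, submitted in translations.items():
        added = new_strings.get(period, 0)
        ratios[period] = submitted / added if added else None
    return ratios


# Hypothetical counts, not taken from the spreadsheet.
translations = {"3/2019": 1200, "3/2020": 1500, "4/2020": 800}
new_strings = {"3/2019": 400, "3/2020": 300}

print(translations_per_new_string(translations, new_strings))
# {'3/2019': 3.0, '3/2020': 5.0, '4/2020': None}
```

This ratio has known limits: strings are often translated in a later month than the one in which they were added, and each string can be translated into many locales, so it is best read as a rough indicator rather than a precise measure.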

Also added scripts to my repository
https://github.com/flodolo/scripts/pull/6

Let's close this bug and open a new one if we need more data. Thanks Matjaz!

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

It's been about a year since this bug was closed, could you run this again and incorporate the data from the last 11 months into https://docs.google.com/spreadsheets/d/1kJzAS6cLXUSdaPm7a_oAnG-8zSBHQUbMpsil4rLqIJs/edit#gid=0 please?

Status: RESOLVED → REOPENED
Flags: needinfo?(m)
Resolution: FIXED → ---

Updated the sheet.

Status: REOPENED → RESOLVED
Closed: 5 years ago → 4 years ago
Flags: needinfo?(m)
Resolution: --- → FIXED
Product: Webtools → Webtools Graveyard