Need to get stats on how many users have updated the new MDN profiles

VERIFIED FIXED

Status

Data & BI Services Team
DB: MySQL
--
blocker
VERIFIED FIXED
6 years ago
3 years ago

People

(Reporter: jay, Assigned: mpressman)

Tracking

Details

(URL)

(Reporter)

Description

6 years ago
As part of our Q3 goals reporting, I need to get some stats on how many users have updated their profile to take advantage of the new MDN profiles (e.g. https://developer.mozilla.org/en-US/profiles/Jay/)

We might need to do a raw SQL query to see if certain fields have been populated (that did not exist before this change).

Les / Luke:  Any details you can provide to help IT dig into the DB for this data?

IT:  It will be great to get the count of users that have updated their profiles, but if possible any additional information you can pull out would be nice as well while you're in there:

Counts for specified URL values for:
    Website
    Twitter
    GitHub
    StackOverflow
    LinkedIn

As well as how many users defined any set of topics/keywords they are interested or experts in (tags).

I need this info by tomorrow, so marking critical.  Sorry in advance for any annoying paging that happens as a result. ;-)

Thanks!
(In reply to Jay Patel [:jay] from comment #0)
> As part of our Q3 goals reporting, I need to get some stats on how many
> users have updated their profile to take advantage of the new MDN profiles
> (e.g. https://developer.mozilla.org/en-US/profiles/Jay/)

This might be difficult, since the feature wasn't built with collecting these metrics in mind.

> We might need to do a raw SQL query to see if certain fields have been
> populated (that did not exist before this change).
> 
> Les / Luke:  Any details you can provide to help IT dig into the DB for this
> data?
>
> IT:  It will be great to get the count of users that have updated their
> profiles, but if possible any additional information you can pull out would
> be nice as well while you're in there:

The table where the new profiles are created is named "user_profiles". However, records in this table get auto-created whenever someone logs in, if they don't already have one. So, we can't just do a plain row count.

The new columns, which default to blank text, include:

  location, title, fullname, organization, bio, misc, irc_nickname

If any of those are non-blank, that row could be counted as updated by the user. So, something like this SQL might get the answer:

  SELECT "populated_user_profiles", COUNT(*) 
  FROM user_profiles 
  WHERE location <> '' OR title <> '' OR fullname <> '' OR 
    organization <> '' OR bio <> '' OR misc <> '' OR irc_nickname <> '';

> Counts for specified URL values for:
>     Website
>     Twitter
>     GitHub
>     StackOverflow
>     LinkedIn

These fields are encoded in a JSON blob in the misc column, and so not easily counted in SQL. It might be possible to do some SQL to count rows based on the presence of a JSON keyword in the misc column (eg. website, twitter, github, stackoverflow, linkedin). So, something like this might work:

  SELECT 'website', COUNT(*) FROM user_profiles 
    WHERE misc LIKE '%"website"%';

  SELECT 'twitter', COUNT(*) FROM user_profiles 
    WHERE misc LIKE '%"twitter"%';

  SELECT 'github', COUNT(*) FROM user_profiles 
    WHERE misc LIKE '%"github"%';

  SELECT 'stackoverflow', COUNT(*) FROM user_profiles 
    WHERE misc LIKE '%"stackoverflow"%';

  SELECT 'linkedin', COUNT(*) FROM user_profiles 
    WHERE misc LIKE '%"linkedin"%';

A better way might be to build a maintenance command in Django. But, that won't be ready by tomorrow

> As well as how many users defined any set of topics/keywords they are
> interested or experts in (tags).

Something like this might work in SQL:

  SELECT "tagged_users", COUNT(DISTINCT(taggit_taggeditem.object_id)) 
  FROM taggit_taggeditem 
  LEFT JOIN django_content_type ON 
    taggit_taggeditem.content_type_id = django_content_type.id 
  WHERE django_content_type.app_label="devmo" AND
    django_content_type.model="userprofile";

This might need some tweaking, though, since I don't have a good data set to try it on here.

> I need this info by tomorrow, so marking critical.  Sorry in advance for any
> annoying paging that happens as a result. ;-)

FWIW, this is something we should think about in advance when planning features. That way, we can make sure things are logged or structured in a measurable way and possibly even run through our usual metrics systems.
(In reply to l.m.orchard [:lorchard] from comment #1)

> > Counts for specified URL values for:
> >     Website
> >     Twitter
> >     GitHub
> >     StackOverflow
> >     LinkedIn
> 
> These fields are encoded in a JSON blob in the misc column, and so not
> easily counted in SQL. 
...
> A better way might be to build a maintenance command in Django. But, that
> won't be ready by tomorrow

Oh, and an even better way - at least with respect to counting in SQL - would be to rewrite this feature to use a 2nd table and joins instead of a JSON blob column. That's definitely not happening by tomorrow, though.
(Reporter)

Comment 3

6 years ago
Thanks for the SQL love Les.  I agree that we need to plan these out better.  I didn't want to worry about it for a little while, but Webgagement send out the Q3 report email last Friday, so just trying to gather as much data as we can to make mgmt happy. :-)

Ideally, we should develop any new feature that involve user input and engagement with metrics in mind.   The engagement team is pushing hard for more data driven goal setting and community building, so we can discuss this in more detail at our MDN meeting on Wed.

IT: Anything you can grab with the queries Les provided is fine (counts only will be ok for tomorrow).   Thanks!
(In reply to Jay Patel [:jay] from comment #3)
> Ideally, we should develop any new feature that involve user input and
> engagement with metrics in mind.   The engagement team is pushing hard for
> more data driven goal setting and community building, so we can discuss this
> in more detail at our MDN meeting on Wed.

Yeah, I'm just smacking myself because I thought "Hmm, we probably won't need to *count* these things" when I built that JSON links thing in the profile. *facepalm*
Assignee: server-ops-database → mpressman
(Reporter)

Comment 5

6 years ago
It's been a week and I really need those numbers... so marking this blocker and hoping to get some stats about MDN profiles today.   Thanks!
Severity: critical → blocker
(Assignee)

Comment 6

6 years ago
+-----------------------------+--------+
| Description                     | Count |
+-----------------------------+--------+
| populated_user_profiles |  2750  | 
| website                           |  146    | 
| twitter                             |  115    | 
| github                             |  71      | 
| stackoverflow                 |  32      | 
| linkedin                          |  47      | 
+-------------------------+-----------+
(Assignee)

Comment 7

6 years ago
+----------------+---------------------------------------------------------------+
| tagged_users  | COUNT(DISTINCT(taggit_taggeditem.object_id))  |
+----------------+---------------------------------------------------------------+
| tagged_users  |                                                                            77 |   |
+----------------+---------------------------------------------------------------+
(Assignee)

Updated

6 years ago
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(Reporter)

Comment 8

6 years ago
Thanks Matt!

v.fixed.
Status: RESOLVED → VERIFIED
Product: mozilla.org → Data & BI Services Team
You need to log in before you can comment on or make changes to this bug.