Bug 714804 - templates: record template usage stats during migration
Status: RESOLVED FIXED
Whiteboard: u=developer c=wiki p=1
Product: Mozilla Developer Network
Classification: Other
Component: Wiki pages
Version: unspecified
Hardware: x86 Mac OS X
Importance: -- normal
Target Milestone: 2.1
Assigned To: Les Orchard [:lorchard]
Mentors:
Depends on:
Blocks: 710728 715253
Reported: 2012-01-03 07:57 PST by Luke Crouch [:groovecoder]
Modified: 2012-09-18 23:13 PDT
CC List: 4 users
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
First count of template usage in non-namespaced MindTouch pages (15.36 KB, text/plain)
2012-01-11 12:05 PST, Les Orchard [:lorchard]
Second count of template use (595 bytes, text/plain)
2012-01-12 11:03 PST, Les Orchard [:lorchard]
Second count of template usage (60.78 KB, text/plain)
2012-01-12 11:04 PST, Les Orchard [:lorchard]
Attempt to map template calls to Template: pages (48.51 KB, text/plain)
2012-01-12 11:31 PST, Les Orchard [:lorchard]
Tab-delimited list of pages and template calls (1.50 MB, application/x-gzip)
2012-01-25 10:56 PST, Les Orchard [:lorchard]

Description Luke Crouch [:groovecoder] 2012-01-03 07:57:18 PST
To help with bug 710728.
Comment 1 Les Orchard [:lorchard] 2012-01-11 11:55:15 PST
Quick & dirty template call parsing & extraction in the migration script:
https://github.com/lmorchard/kuma/commit/bfc31365672f70f4b211ce2b9cd470d791404eb6
Comment 2 Les Orchard [:lorchard] 2012-01-11 12:05:27 PST
Created attachment 587777 [details]
First count of template usage in non-namespaced MindTouch pages

Built a quick page parsing and template call extraction option into the migration script. It runs through all default-namespace pages (pages without a colon prefix such as Talk:, User:, Special:, etc.) and looks for DekiScript calls.
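A minimal sketch of that pass, for illustration only; the real code is in the commit linked above and may differ. It assumes pages arrive as `(title, content)` pairs, that a namespaced title starts with a word followed by a colon, and that DekiScript is embedded in `{{ ... }}` blocks. `extract_template_calls` and the regexes are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumption: a namespaced title starts with a word and a colon,
# e.g. "Talk:Foo" or "User:bar"; default-namespace titles do not.
NAMESPACE_PREFIX = re.compile(r"^[A-Za-z_]+:")

# Assumption: DekiScript is embedded as {{ ... }} blocks.
DEKISCRIPT_BLOCK = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)

# The leading dotted name of an expression, e.g. "template.Cssxref".
CALL_NAME = re.compile(r"^\s*([A-Za-z_][\w.]*)")


def extract_template_calls(pages: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Yield the name of each DekiScript call found in default-namespace pages."""
    for title, content in pages:
        if NAMESPACE_PREFIX.match(title):
            continue  # skip Talk:, User:, Special:, ... pages
        for block in DEKISCRIPT_BLOCK.finditer(content):
            m = CALL_NAME.match(block.group(1))
            if m:
                yield m.group(1)
```

For example, `list(extract_template_calls([("en/CSS/color", 'See {{ cssxref("color") }} and {{ template.Bug(123) }}.')]))` yields `["cssxref", "template.Bug"]`. Printing one name per line and piping through `sort | uniq -c | sort -rn` gives counts of the shape shown in the session that follows.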

I ran the template extraction on the MindTouch import in my VM:

$ ./manage.py migrate_to_kuma_wiki --all --template-metrics > templates.txt
$ sort templates.txt | uniq -c | sort -rn | head -25
  11906 wiki.languages
  10515 wiki.template
   4975 template.Source
   4703 template.XULElem
   4409 template.PrefAnch
   4318 mediawiki.external
   4257 HTMLElement
   4039 template.XULAttr
   3423 template.Cssxref
   3256 domxref
   2808 Interface
   2680 interface
   2652 CompatUnknown
   1970 template.XULAttrInc
   1861 Cssxref
   1835 gecko_minversion_inline
   1684 template.XULPropInc
   1668 template.Bug
   1633 CompatNo
   1574 SVGElement
   1541 cssxref
   1323 template.DomRef
   1239 template.PreviousNext
   1200 template.Interface
   1177 template.XULRefAttr

Attached is the full dump, without the `head -25`.
Comment 3 Les Orchard [:lorchard] 2012-01-11 12:07:31 PST
Hopefully this list gives a good priority order for which template scripts need a closer look and support in Kuma.

The list could definitely use some eyeballs and a sanity check, in case it's missing any templates that someone more familiar with MDN content knows for sure should be there.
Comment 4 Luke Crouch [:groovecoder] 2012-01-11 13:44:45 PST
sheppy, jms, teoli: can you review this list to help with bug 715253, and let us know which data resources are most prevalent in these scripts?
Comment 5 Les Orchard [:lorchard] 2012-01-12 08:34:33 PST
Self-assigning. I could probably call this closed, but I want to get some eyes on the list first.
Comment 6 Les Orchard [:lorchard] 2012-01-12 11:03:35 PST
Created attachment 588107 [details]
Second count of template use

Understanding a little more after poking into these results. It looks like the wiki.template() calls need to be broken down further, since they are an alternative calling style for invoking templates: the template being invoked is named in the call's parameters rather than in the call itself.
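A minimal sketch of that breakdown, assuming each `wiki.template()` call passes the template name as a quoted first argument (other argument forms are not handled). `resolve_call_name` and the regex are hypothetical illustrations, not the code from the kuma commit.

```python
import re

# Assumption: wiki.template("Name", ...) or wiki.template('Name', ...),
# with the template name as a quoted first argument.
WIKI_TEMPLATE_CALL = re.compile(r"""\s*wiki\.template\(\s*(["'])(?P<name>.+?)\1""")

# Fallback: the leading dotted name of any other call expression.
LEADING_NAME = re.compile(r"\s*([A-Za-z_][\w.]*)")


def resolve_call_name(call: str) -> str:
    """Return the template a DekiScript call expression actually invokes.

    wiki.template("X", ...) resolves to "X"; any other call resolves to
    its leading dotted name, e.g. 'template.Cssxref("color")' gives
    "template.Cssxref".
    """
    m = WIKI_TEMPLATE_CALL.match(call)
    if m:
        return m.group("name")
    head = LEADING_NAME.match(call)
    return head.group(1) if head else call.strip()
```

Counting `resolve_call_name()` results instead of raw call names would split the single large `wiki.template` bucket into the templates it actually calls.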
Comment 7 Les Orchard [:lorchard] 2012-01-12 11:04:25 PST
Created attachment 588110 [details]
Second count of template usage
Comment 8 Les Orchard [:lorchard] 2012-01-12 11:31:12 PST
Created attachment 588125 [details]
Attempt to map template calls to Template: pages

Did some more normalization and collation, trying to convert in-page calls to Template: page names.
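The normalization rules used here are not spelled out. The following sketch assumes two of them: strip a leading `template.` or `Template:` prefix, and compare names case-insensitively, so that `template.Cssxref`, `Cssxref`, and `cssxref` from the first count fall into one bucket. `to_template_page` and `collate` are hypothetical names. Treating MindTouch titles as case-insensitive is also an assumption. Built-in calls such as `wiki.languages` are not templates and would need to be filtered out beforehand; that step is not shown.

```python
from collections import Counter
from typing import Iterable


def to_template_page(name: str) -> str:
    """Map an in-page call name to a normalized Template: page key.

    Hypothetical rules: drop one leading "template." or "Template:"
    prefix, then lowercase so differently-cased calls collate together.
    """
    for prefix in ("template.", "Template:"):
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    return "Template:" + name.lower()


def collate(call_names: Iterable[str]) -> Counter:
    """Count calls per normalized Template: page key."""
    return Counter(to_template_page(n) for n in call_names)
```

With these rules, `collate(["template.Cssxref", "Cssxref", "cssxref", "domxref"])` counts 3 for `Template:cssxref` and 1 for `Template:domxref`.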
Comment 9 [github robot] 2012-01-17 14:05:32 PST
Commit pushed to https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/9839f125c1590bf826456874883acd2ff966e74f
deki-migration: bug 714804, more normalization for better collation of template metrics
Comment 10 Les Orchard [:lorchard] 2012-01-23 09:43:37 PST
Going to call this closed, since a count of templates was produced. The next steps are to keep refining those counts and to spend more time reading the templates to get a sense of the capabilities a new template system will need.
Comment 11 Les Orchard [:lorchard] 2012-01-25 10:56:55 PST
Created attachment 591545 [details]
Tab-delimited list of pages and template calls

For good measure, here's another dump from the migration. It should be useful for reports like this one, listing the pages that use the most templates:

$ gzip -dc tmpl-use-raw.txt.gz | cut -f1 | sort | uniq -c | sort -rn | head -25
   2110 ja/reftest_opportunities_files
   2073 en/reftest_opportunities_files
   1648 en/Interfaces
   1580 trevorh/Interface_documentation_status
    518 en/Gecko_DOM_Reference
    473 ja/CSS/CSS_Reference/Mozilla_Extensions
    393 en/SVG/Attribute
    353 en/Interfaces_moved_in_Firefox_3.6
    339 en/CSS/CSS_Reference/Mozilla_Extensions
    284 en/HTML/Element/Input
    268 ja/Firefox_4_for_developers
    265 en/DOM/element
    253 pt/Firefox_4_para_desenvolvedores
    251 en/HTML/Attributes
    251 en/Firefox_4_for_developers
    238 en/CSS/CSS_Reference
    230 en/HTML/Content_categories
    227 es/Firefox_4_para_desarrolladores
    222 fr/Référence_CSS
    221 ja/CSS/CSS_Reference
    216 en/HTML/Element
    215 es/Referencia_CSS
    214 zh_tw/Firefox_2_佈景主題之更動
    214 ja/Theme_changes_in_Firefox_2
    213 pl/Zmiany_w_motywie_graficznym_w_Firefoksie_2
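The same per-page report in Python, as a sketch. It assumes the dump holds one template call per line, with the page path in the first tab-separated field, and that the file is UTF-8 encoded (some slugs are non-ASCII). `count_pages` and `top_pages` are hypothetical helpers.

```python
import gzip
from collections import Counter
from typing import Iterable, List, Tuple


def count_pages(lines: Iterable[str], n: int = 25) -> List[Tuple[str, int]]:
    """Count lines per page path (first tab-separated field), most frequent first.

    Mirrors `cut -f1 | sort | uniq -c | sort -rn | head`: a line with no tab
    counts as a page path in its entirety, as with `cut -f1`.
    """
    counts: Counter = Counter()
    for line in lines:
        page = line.rstrip("\n").split("\t", 1)[0]
        counts[page] += 1
    return counts.most_common(n)


def top_pages(path: str, n: int = 25) -> List[Tuple[str, int]]:
    """Read a gzipped, tab-delimited dump and return the top-n pages."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return count_pages(fh, n)
```

Usage: `top_pages("tmpl-use-raw.txt.gz")`. Pages with equal counts may be listed in a different order than `sort -rn` produces.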

A report like the top templates will need some more munging (e.g. parsing the template name out of the params).
