Closed Bug 1155539 Opened 9 years ago Closed 9 years ago

Remove obsolete encoding decoder telemetry probes

Categories

(Core :: Internationalization, defect)

defect
Not set
trivial

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox40 --- fixed

People

(Reporter: hsivonen, Assigned: hsivonen)

Details

Attachments

(1 file, 1 obsolete file)

We have DECODER_INSTANTIATED_* stuff we no longer need.
The results for gfx-only Mac encodings are mixed (Firefox 37 data on 2015-04-17):

Used
MacArabic hundreds (vs. 19.23M)
MacCE hundreds
Greek hundreds
Cyrillic thousands (also exposed to Web content)
Gurmukhi a dozen
Hebrew hundreds of thousands
Romanian half a dozen
Turkish dozens

Unused (literally zero)
Croatian
Devanagari
Farsi
Gujarati
Icelandic

Except for Hebrew and Cyrillic, we might get away with removing these. I don't have the energy to pursue at the moment. Maybe we should keep around these probes until a decision is made.
Release 31, Thunderbird, saved_session, data queried on 2015-04-17

Total sessions: 206.76M

Counting sessions where decoder instantiated at least once.

ARMSCII8: 2.03k
HZ: 12.12k
IBM850: 19.79k
IBM852: 45.24k
IBM857: 360
IBM862: 132
IBM866: 57.39k
ISO-2022-CN: 8.54k
ISO-2022-KR: 31.2k
ISO-IR-111: 552
JOHAB: 12
MacArabic: 3.18k
MacCE: 17.95k
MacCroatian: 5.05k
MacCyrillic: 39.15k
MacDevanagari: 181
MacFarsi: 8
MacGreek: 3.3k
MacGujarati: 0
MacGurmukhi: 80
MacHebrew: 5.96M
MacIcelandic: 37
MacRomanian: 134
MacTurkish: 1.36k
T.61: 51
VIETTCVN5712: 50
VIETVPS: 5.23k
VISCII: 2.74k

For reference:
ISO-8859-5: 1.25M
ISO-2022-JP: 37.22M

When comparing with the Firefox numbers in the previous comment and scaling by the total sessions, it seems that all Mac numbers are explained by gfx usage for font names, which continues to work without the encodings being exposed to either Web content or email content.

Needinfoing jcranmer and mkmelin as a way to bring this to their attention in case they want to use this data to resurrect some of these in c-c.
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(Pidgeot18)
Attached patch Edit Histograms.json (obsolete) — Splinter Review
Let's keep around the Cyrillic probes to understand the impact of future Cyrillic detection changes and let's keep around the ISO-2022-JP probe to gauge usage on the Web (usage in email is obvious).
Attachment #8593789 - Flags: review?(VYV03354)
Comment on attachment 8593789 [details] [diff] [review]
Edit Histograms.json

Review of attachment 8593789 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/Histograms.json
@@ +4191,5 @@
>      "kind": "flag",
>      "description": "Whether the decoder for IBM866 has been instantiated in this session."
>    },
>    "DECODER_INSTANTIATED_MACGREEK": {
> +    "expires_in_version": "never",

Why all Mac encodings are kept?
(In reply to Masatoshi Kimura [:emk] from comment #4)
> Why all Mac encodings are kept?

As noted in comment 1, I left them in until we decide to keep the Mac encoding for font names or to pursue their removal.
Comment on attachment 8593789 [details] [diff] [review]
Edit Histograms.json

Review of attachment 8593789 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/Histograms.json
@@ +4251,4 @@
>      "kind": "flag",
>      "description": "Whether the decoder for MACGUJARATI has been instantiated in this session."
>    },
>    "DECODER_INSTANTIATED_MACGURMUKHI": {

This should be DECODER_INSTANTIATED_MACGURMUKHI.
Attachment #8593789 - Flags: review?(VYV03354) → review+
(In reply to Henri Sivonen (:hsivonen) from comment #2)
> Needinfoing jcranmer and mkmelin as a way to bring this to their attention
> in case they want to use this data to resurrect some of these in c-c.

Thank you for notifying us. The line of threshold I'd be tempted to draw would be on the order of 10s of k, which puts MacCE, IBM850, IBM852, ISO-2022-KR (eek!), and Hebrew in this list. I suspect the IBM charsets may be mostly ASCII in practice--I might have suggested making those decoders enabled only if they had non-ASCII content, but that's not something that can be usefully changed right now, so meh.

At this point, my plan of action for most of these charsets are to go along with their removal from mozilla-central (whenever that happens), and then add support back if people file bugs. Although it seems that Eastern/Central Europe are the most affected in terms of borderline charsets, so it may be worth reaching out to our localization communities there for more information.
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(Pidgeot18)
(In reply to Masatoshi Kimura [:emk] from comment #6)
> Comment on attachment 8593789 [details] [diff] [review]
> Edit Histograms.json
> 
> Review of attachment 8593789 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/components/telemetry/Histograms.json
> @@ +4251,4 @@
> >      "kind": "flag",
> >      "description": "Whether the decoder for MACGUJARATI has been instantiated in this session."
> >    },
> >    "DECODER_INSTANTIATED_MACGURMUKHI": {
> 
> This should be DECODER_INSTANTIATED_MACGURMUKHI.

I don't understand this review comment. What do you mean?
Flags: needinfo?(VYV03354)
(In reply to Joshua Cranmer [:jcranmer] from comment #7)
> (In reply to Henri Sivonen (:hsivonen) from comment #2)
> > Needinfoing jcranmer and mkmelin as a way to bring this to their attention
> > in case they want to use this data to resurrect some of these in c-c.
> 
> Thank you for notifying us. The line of threshold I'd be tempted to draw
> would be on the order of 10s of k, which puts MacCE, IBM850, IBM852,
> ISO-2022-KR (eek!), and Hebrew in this list.

As noted in comment 2, MacHebrew and MacCE may be explained by font names, so you probably shouldn't re-expose them to email content without direct requests from users.

The IBM850, IBM852, ISO-2022-CN and ISO-2022-KR findings are more troubling.

> At this point, my plan of action for most of these charsets are to go along
> with their removal from mozilla-central (whenever that happens),

Already happened.

> and then
> add support back if people file bugs.

OK.

> Although it seems that Eastern/Central
> Europe are the most affected in terms of borderline charsets, so it may be
> worth reaching out to our localization communities there for more
> information.

You might get "let's add stuff back just in case" that way, though.
(In reply to Henri Sivonen (:hsivonen) from comment #8)
> I don't understand this review comment. What do you mean?

Perhaps I meant to say "Please remove DECODER_INSTANTIATED_MACGUJARATI" (because it is dead).
Flags: needinfo?(VYV03354)
(In reply to Masatoshi Kimura [:emk] from comment #10)
> (In reply to Henri Sivonen (:hsivonen) from comment #8)
> > I don't understand this review comment. What do you mean?
> 
> Perhaps I meant to say "Please remove DECODER_INSTANTIATED_MACGUJARATI"
> (because it is dead).

Whoa. Rather, I think I should do the opposite and actually put the probe in place. No wonder the probe gets zero hits when it doesn't exist.
Attachment #8593789 - Attachment is obsolete: true
Attachment #8599349 - Flags: review?(VYV03354)
Attachment #8599349 - Flags: review?(VYV03354) → review+
Keywords: checkin-needed
Please post a link to a green Try run for this patch.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/a5d0c44ba370
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
You need to log in before you can comment on or make changes to this bug.