Gather telemetry about HZ and other encodings we might try to remove

RESOLVED FIXED in mozilla28

Status

()

defect
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: hsivonen, Assigned: hsivonen)

Tracking

unspecified
mozilla28
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [qa-])

Attachments

(1 attachment)

(Assignee)

Description

6 years ago
Of the encodings that we do support, I think HZ is by far the scariest
one, because it's escape delimiters are printable characters in ASCII
and because it's fairly easy to construct attack byte sequences that
round-trip from HZ to Unicode and back.

I think we should investigate if we could get away with removing
support for HZ and mapping its labels to the replacement encoding
without breaking the Web too much.
(Assignee)

Updated

6 years ago
Depends on: 932281
(Assignee)

Comment 1

6 years ago
Morphing bug scope to save trouble creating a large number of bugs and patches.
Summary: Gather telemetry about HZ usage → Gather telemetry about HZ and other encodings we might try to remove
(Assignee)

Comment 2

6 years ago
Attachment #829214 - Flags: review?(VYV03354)
(Assignee)

Comment 3

6 years ago
Comment on attachment 829214 [details] [diff] [review]
Add telemetry probes

Requesting policy compliance review from sstamm. This patch adds telemetry for recording if certain character encoding decoders have been instantiated during the session. I believe this falls under measuring "feature usage".
Attachment #829214 - Flags: review?(sstamm)
Comment on attachment 829214 [details] [diff] [review]
Add telemetry probes

Note that some encodings are instantiated solely for internal usage.
For example, Thebes uses Mac encodings and x-johab.
https://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFontUtils.cpp?rev=c51e46899a05#1149
https://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFontUtils.cpp?rev=c51e46899a05#1187
x-johab is also used by the OS/2 port.
https://mxr.mozilla.org/mozilla-central/source/widget/os2/nsOS2Uni.cpp?rev=fd7a0ace6b0e#24

>+        Telemetry::Accumulate(Telemetry::DECODER_INSTANTIATED_ISO2022JP, true);

I regularly encounter Web pages encoded by ISO-2022-JP. Mailnews will encounter it much more frequently. This will only degrade performance.
Also, ISO-2022-JP is defined a non-replacement encoding in the Encoding spec. I don't understand why do you try to remove ISO-2022-JP so eagerly and it is totally unacceptable for Japanese users (at least for me).
Attachment #829214 - Flags: review?(VYV03354) → review-
(Assignee)

Comment 5

6 years ago
Comment on attachment 829214 [details] [diff] [review]
Add telemetry probes

(In reply to Masatoshi Kimura [:emk] from comment #4)
> Comment on attachment 829214 [details] [diff] [review]
> Add telemetry probes
> 
> Note that some encodings are instantiated solely for internal usage.
> For example, Thebes uses Mac encodings and x-johab.
> https://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFontUtils.
> cpp?rev=c51e46899a05#1149
> https://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFontUtils.
> cpp?rev=c51e46899a05#1187

I'm trying to learn if that code actually ends up being used. No one replied when I asked in dev-platform. Probably because no one knows withou telemetry.

> x-johab is also used by the OS/2 port.
> https://mxr.mozilla.org/mozilla-central/source/widget/os2/nsOS2Uni.
> cpp?rev=fd7a0ace6b0e#24

OK. Doesn't hurt to see if we can not compile it for other platforms.

> >+        Telemetry::Accumulate(Telemetry::DECODER_INSTANTIATED_ISO2022JP, true);
> 
> I regularly encounter Web pages encoded by ISO-2022-JP. Mailnews will
> encounter it much more frequently. This will only degrade performance.
> Also, ISO-2022-JP is defined a non-replacement encoding in the Encoding
> spec. I don't understand why do you try to remove ISO-2022-JP so eagerly and
> it is totally unacceptable for Japanese users (at least for me).

I'm not trying to remove that one. I'm trying to understand it's level of usage to contrast with HZ. (Marking it as .notForBrowser in the other patch was an accident.)
Attachment #829214 - Flags: review- → review?(VYV03354)
(In reply to Henri Sivonen (:hsivonen) from comment #5)
> I'm not trying to remove that one. I'm trying to understand it's level of
> usage to contrast with HZ. (Marking it as .notForBrowser in the other patch
> was an accident.)

I thought because the subject says "other encodings we might try to remove" :)
I'm not yet convinced of the usefulness of counting ISO-2022-JP usage here. ISO-2022-JP is the default encoding of Japanese Thunderbird. I'm fine with other probably-rarely-used-encodings.
Comment on attachment 829214 [details] [diff] [review]
Add telemetry probes

Review of attachment 829214 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Henri Sivonen (:hsivonen) from comment #3)
> Requesting policy compliance review from sstamm. This patch adds telemetry
> for recording if certain character encoding decoders have been instantiated
> during the session. I believe this falls under measuring "feature usage".

I don't so much do policy compliance review anymore, but I agree this is more or less "feature usage."  If I understand correctly, it's feature usage triggered by the sites a user visits, so it's different than changing the default font or something.  This means we learn a little bit about what sites a user browses, though it doesn't seem to be a whole lot.  If the aim is to identify rarely used decoders and remove them, I think it's reasonable to collect this data.
Attachment #829214 - Flags: review?(sstamm) → review+
CC'ing Curtis and Alina as an FYI.
(Assignee)

Comment 9

6 years ago
(In reply to Masatoshi Kimura [:emk] from comment #6)
> I'm not yet convinced of the usefulness of counting ISO-2022-JP usage here.

It's one of the XSS-dangerous encodings. It seems weird to avoid having data about its usage. Are you sure that its usage on the Web (not email) is so high that it's useless to compare the relative usage of ISO-2022-JP in the Japanese Firefox localization with the relative usage of HZ in the Simplified Chinese localization as a way of understanding if HZ is used too much for HZ to be suitable for removal?

Anyway, I want to land. Do I get r=emk if I remove ISO-2022-JP counting?

Comment 10

6 years ago
I think we should measure it. It seems good to have statistics on it and not rely on assumptions.
Comment on attachment 829214 [details] [diff] [review]
Add telemetry probes

OK, go for it.
Attachment #829214 - Flags: review?(VYV03354) → review+
https://hg.mozilla.org/mozilla-central/rev/8c2d1c1b2eaf
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28

Updated

5 years ago
Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.