Open Bug 1261154 Opened 5 years ago Updated 3 years ago

Use "formatAbbreviatedBytes" utility from tree map in sidebar

Categories

(DevTools :: Memory, defect, P2)

Tracking

(Not tracked)

People

(Reporter: gregtatum, Unassigned)

References

Details

Attachments

(1 file, 1 obsolete file)

https://bugzilla.mozilla.org/show_bug.cgi?id=1238695

Includes a new abbreviation utility which can be used in the sidebar as well. This is to be done after the tree map lands.

::: devtools/client/memory/test/unit/test_utils.js
@@ +54,5 @@
> +
> +  equal(utils.formatAbbreviatedBytes(12), "12B", "Formats bytes");
> +  equal(utils.formatAbbreviatedBytes(12345), "12KiB", "Formats kilobytes");
> +  equal(utils.formatAbbreviatedBytes(12345678), "11MiB", "Formats megabytes");
> +  equal(utils.formatAbbreviatedBytes(12345678912), "11GiB", "Formats gigabytes");
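For readers without the patch from bug 1238695 at hand, here is a minimal sketch of a function that satisfies the four assertions quoted above. It is not the code that landed; the constant names are made up for illustration. It assumes the value is truncated rather than rounded, which is what the third assertion suggests (12345678 bytes is about 11.77 MiB, yet the expected string is "11MiB").

```js
// Hypothetical sketch, not the implementation from bug 1238695.
// Binary (IEC) unit sizes in bytes.
const KIB = 1024;
const MIB = KIB * 1024;
const GIB = MIB * 1024;

function formatAbbreviatedBytes(n) {
  if (n < KIB) {
    return n + "B";
  }
  if (n < MIB) {
    // Truncate instead of rounding so the output has no decimal places.
    return Math.floor(n / KIB) + "KiB";
  }
  if (n < GIB) {
    return Math.floor(n / MIB) + "MiB";
  }
  return Math.floor(n / GIB) + "GiB";
}

console.log(formatAbbreviatedBytes(12345)); // "12KiB"
```

Note that the unit strings here are hard-coded, which is exactly the point the rest of this bug debates.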
Assignee: nobody → gtatum
Depends on: 1238695
Priority: -- → P2
I'm un-assigning myself from this for right now since we are focusing on track 3 of the devtools.html project. Once we move beyond that I can look at this again.
Assignee: gtatum → nobody
Whiteboard: [good first bug]
Attached patch bug-1261154.patch (obsolete) — Splinter Review
Hi Greg, here's a quick patch that applies formatAbbreviatedBytes in the snapshot sidebar. I'm not sure whether there are other places that need to be replaced as well. BTW, unlike the original string, this new string won't have a localized size unit. Shouldn't we localize the unit inside formatAbbreviatedBytes?
Attachment #8775497 - Flags: feedback?(gtatum)
Sorry for the delay in responding. I asked that localization question when I originally wrote formatAbbreviatedBytes, but during the discussion the other folks deemed it unnecessary (see bug 1238695). Since then I have seen byte units like that localized in other parts of the devtools code.

I think we should pull in someone from localization to get their opinion. Otherwise this looks like the correct path forward.
Attachment #8775497 - Flags: feedback?(gtatum)
Hi Francesco, do you think localizing the size unit is necessary? Previously the memory size used aggregate.mb. Now we might switch to a new function, formatAbbreviatedBytes, but it doesn't localize the size unit in the size string.
Flags: needinfo?(francesco.lodolo)
I can't seem to find any discussion about l10n in bug 1238695 (but it's a long bug).

I think it is important to have them localizable. For one, I don't want to use KB as English incorrectly does.

We do localize them for Downloads
https://hg.mozilla.org/releases/mozilla-aurora/file/default/toolkit/locales/en-US/chrome/mozapps/downloads/downloads.properties#l65

And you can get an idea of the current localizations of KB
https://transvision.mozfr.org/string/?entity=toolkit/chrome/mozapps/downloads/downloads.properties:kilobyte&repo=aurora

I wonder if there's anything useful in the Intl API to leverage to get those (CCing Gandalf)
Flags: needinfo?(francesco.lodolo)
We're working in bug 1291408 on Intl.UnitFormat that will be designed exactly for that. 

See the spec proposal for ECMA - https://rawgit.com/zbraniecki/proposal-intl-unit-format/master/index.html

If you need it now, I'd recommend writing it in a way that is most compatible so that you can later easily switch to Intl.UnitFormat once we land it (I expect it to land this year).
Hi Greg, here's a WIP in case we want to leverage the current L10n solution for the byte format. I'm not sure whether it's fine for the rest of the tree heap part, and the test would need to be rewritten as well.
Attachment #8775497 - Attachment is obsolete: true
Attachment #8780371 - Flags: feedback?(gtatum)
Can you guys use the same unit abbreviations that CLDR is using[0]?

It'll make it more compatible with CLDR once we start switching Firefox to it (coming with L20n).

[0] http://www.unicode.org/cldr/charts/29/summary/en.html#6220
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #8)
> Can you guys use the same unit abbreviations that CLDR is using[0]?
> 
> It'll make it more compatible with CLDR once we start switching Firefox to
> it (coming with L20n).
> 
> [0] http://www.unicode.org/cldr/charts/29/summary/en.html#6220

That's the question I also want to raise: should we treat kilobyte and kibibyte [1] as different units?
Since the calculation is binary, the size unit should be binary too, so I kept the original KiB/MiB/GiB units in this patch. I'm not sure if we need to keep these, because I didn't see these units in the Firefox code base.

[1] https://en.wikipedia.org/wiki/Kibibyte
Flags: needinfo?(gandalf)
In that case, we will not be able to replace this with CLDR because CLDR does not handle Kibibytes.

My only question is if from the UX standpoint introducing a unit that users are likely not familiar with is worth the technical precision. I don't have an answer, but if you ever decide to switch to something that CLDR handles, I hope we'll have the UnitFormatter for you by that time! :)
Flags: needinfo?(gandalf)
I do like Zibi's reasoning on the UX standpoint and like where your patch is going Steve. This originally came out of the discussion in bug 1238695 which involved Jim Blandy and Nick Fitzgerald. I'm going to NI them to get their thoughts.
Flags: needinfo?(nfitzgerald)
Flags: needinfo?(jimb)
Mebibytes/kibibytes/etc are more correct, we should use them. CLDR should add them if it doesn't have them, but that seems like an aside to me.
Flags: needinfo?(nfitzgerald)
I don't think most people know the difference between kb (kilobits), kB (kilobytes), kib (kibibits) and kiB (kibibytes). However, I do think people who don't know the difference will easily read them correctly, if they're in a context where a memory size would make sense.

In other words, I don't think it's confusing to do the right thing. So we should use B, kiB, MiB, GiB.

It is not important at all to localize the abbreviations of these units. Using the IEC units is fine in all locales; that's the point of having an International Electrotechnical Commission.

OTOH, getting commas versus dots right is something I have a lot more sympathy for.
Flags: needinfo?(jimb)
Ok, the user impact does seem pretty small for confusion on the units, especially considering that the memory tool is a more technical tool. It would make sense to use the more technically correct units to remove an ambiguity of what you are looking at.

So it looks like the consensus is we'll use formatAbbreviatedBytes as it stands as there are no decimal places, and just get it into the sidebar.
(In reply to Jim Blandy :jimb from comment #13)
> It is not important at all to localize the abbreviations of these units.

Yeah, except that not the whole world happens to be using left-to-right western arabic alphabet[0]. Do you have sympathy for that?
Also, in some locales there is a difference between singular form and plural form even for narrow variants.
 
> Using the IEC units is fine in all locales; that's the point of having an
> International Electrotechnical Commission.

I'm terribly sorry the complexity of the world does not fit into this picture.


> OTOH, getting commas versus dots right is something I have a lot more
> sympathy for.

Intl.NumberFormat deals with this.


[0]
http://st.unicode.org/cldr-apps/v#/ar_DZ/Digital/
http://st.unicode.org/cldr-apps/v#/fa/Digital/
http://st.unicode.org/cldr-apps/v#/he/Digital/
http://st.unicode.org/cldr-apps/v#/ur/Digital/
http://st.unicode.org/cldr-apps/v#/ru/Digital/
 etc.
Hi Gandalf, is it possible to still apply the binary units defined by the IEC, and use decimal units for other locales that lack an IEC abbreviation? We can add more comments about it in the localization files.
Flags: needinfo?(gandalf)
If you want to stick to KiB, MiB, and GiB, I'd stick to what you are doing, which is to localize the unit formatting string. It's not optimal (until we switch to l20n, localizers won't be able to pluralize those strings), but it's a good start and allows localizers to adapt the string.

You should format the number using Intl.NumberFormat (in L20n it will be done automatically).
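As an illustration of that last suggestion, here is a small sketch that formats only the numeric part with Intl.NumberFormat and leaves the unit to a separate localized string. The helper name formatCount is hypothetical, and the Math.floor call is an assumption carried over from the no-decimal-places behaviour of formatAbbreviatedBytes.

```js
// Hypothetical helper: formats only the number. The unit would come from
// a separately localized string.
function formatCount(value, locale) {
  // Truncate first, because Intl.NumberFormat rounds rather than truncates.
  return new Intl.NumberFormat(locale).format(Math.floor(value));
}

console.log(formatCount(12345, "en-US")); // "12,345"
// Grouping separators and digits for other locales depend on the locale
// data available to the JavaScript engine.
console.log(formatCount(12345, "de-DE"));
console.log(formatCount(12345, "ar-EG"));
```

This covers the commas-versus-dots concern raised earlier without touching the unit abbreviation itself.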
Flags: needinfo?(gandalf)
[good first bug] whiteboard -> keyword mass change
Keywords: good-first-bug
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #15)
> (In reply to Jim Blandy :jimb from comment #13)
> > It is not important at all to localize the abbreviations of these units.
> 
> Yeah, except that not the whole world happens to be using left-to-right
> western arabic alphabet[0]. Do you have sympathy for that?
> Also, in some locales there is a difference between singular form and plural
> form even for narrow variants.

We should be using Arabic numerals in all cases for this tool. We should not be using singular or plural forms at all.
So I think the answer is no, I don't think it's important in this context.
That was a little more abrupt than I'd intended --- but comment 15 was as well. Let's take a step back.

First, just to be clear what we're discussing here: this is a display of the amount of memory occupied by various categories of data, presented to web developers as part of an expert-level developer tool. This is not user-facing UI; it is developer-facing only.

I think it is important for us to use the correct units here. Those units are "kiB". "KB" may be a common way to refer to 1024 bytes in popular use, but this is a technically specialized context, and we should use the industry-standard units.

If there is a simple way for us to get the right units and otherwise properly localize the number, then let's do it. If it is not presently possible to use the correct units, then I think the way to maximize the quality of the tool for the largest audience is, unfortunately, to use the non-localized form.
(In reply to Jim Blandy :jimb from comment #19)
> We should be using Arabic numerals in all cases for this tool. 

Western arabic numerals (0-9) are not used by many languages at all.

> We should not be using singular or plural forms at all.

I don't understand why you believe so.

If you don't want to make this UI localizable, and you only want to display it in en-US, then you are correct. In every other scenario, I believe you are wrong.

> I think it is important for us to use the correct units here. Those units are "kiB".

I understand that. I suggested using "KB" because a) it's already localized and b) it may be easier to understand for users. The argument that it's a technical tool and it's valuable to preserve the accuracy convinces me. I'm ok with "kiB" :)

> If it is not presently possible to use the correct units, then I think the way to maximize the quality of the tool for the largest audience is, unfortunately, to use the non-localized form.

It's perfectly possible, we just need to add l10n strings that allow localizers to format the unit. I gave you UnitFormat example so that you can shape the string templates and API after it.
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #21)
> (In reply to Jim Blandy :jimb from comment #19)
> > We should be using Arabic numerals in all cases for this tool. 
> 
> Western arabic numerals (0-9) are not used by many languages at all.

I'm personally familiar with software development in Japan. Japanese users of this tool would much rather see "213kiB" than "二百十三kiB". The latter would be ridiculous and irritating to a Japanese programmer.

In these other locales, what notation would engineering and computer science students use in their textbooks and classes? I believe that if you're doing software, you're working with Arabic numerals and metric units.

Here is an example of the sort of display whose numbers we're formatting:
http://tatumcreative.github.io/memory-treemap/

The labels "Array", "Object", etc. are JavaScript terms; it is not correct to localize them, as one cannot use 行列 in JavaScript as a synonym for "Array"; the developer would be left guessing what JavaScript term is meant.
You are right that Japanese is not a good example of a language that requires unit localization. They don't, and as you can see above, I didn't include it in the list of examples I gave you straight from CLDR.

But assuming you believe that CLDR doesn't know what it's doing, and the list of five examples of non left-to-right, non-western-arabic numerals, using a different alphabet is not enough, fortunately, you can look at Wikipedia.

Here's an example list of articles that don't use the combination of numbers, alphabet and directionality that would allow for en-US expression "10 kb" to work for them with transliteration of digits and translation of digits (and directions, and separators):

* https://uk.wikipedia.org/wiki/%D0%9C%D0%B5%D0%B3%D0%B0%D0%B1%D0%B0%D0%B9%D1%82
* https://ar.wikipedia.org/wiki/%D9%85%D9%8A%D9%82%D8%A7%D8%A8%D8%A7%D9%8A%D8%AA
* https://fa.wikipedia.org/wiki/%D9%85%DA%AF%D8%A7%D8%A8%D8%A7%DB%8C%D8%AA
* https://ru.wikipedia.org/wiki/%D0%9C%D0%B5%D0%B3%D0%B0%D0%B1%D0%B0%D0%B9%D1%82
* https://mk.wikipedia.org/wiki/%D0%9C%D0%B5%D0%B3%D0%B0%D0%B1%D0%B0%D1%98%D1%82
* https://bn.wikipedia.org/wiki/%E0%A6%AE%E0%A7%87%E0%A6%97%E0%A6%BE%E0%A6%AC%E0%A6%BE%E0%A6%87%E0%A6%9F
* https://ko.wikipedia.org/wiki/%EB%A9%94%EA%B0%80%EB%B0%94%EC%9D%B4%ED%8A%B8
* https://hy.wikipedia.org/wiki/%D5%84%D5%A5%D5%A3%D5%A1%D5%A2%D5%A1%D5%B5%D5%A9
* https://ka.wikipedia.org/wiki/%E1%83%9B%E1%83%94%E1%83%92%E1%83%90%E1%83%91%E1%83%90%E1%83%98%E1%83%A2%E1%83%98
* https://mr.wikipedia.org/wiki/%E0%A4%AE%E0%A5%87%E0%A4%97%E0%A4%BE%E0%A4%AC%E0%A4%BE%E0%A4%88%E0%A4%9F
* https://ta.wikipedia.org/wiki/%E0%AE%AE%E0%AF%86%E0%AE%95%E0%AE%BE%E0%AE%AA%E0%AF%88%E0%AE%9F%E0%AF%8D%E0%AE%9F%E0%AF%81
* https://th.wikipedia.org/wiki/%E0%B9%80%E0%B8%A1%E0%B8%81%E0%B8%B0%E0%B9%84%E0%B8%9A%E0%B8%95%E0%B9%8C
Looking through that list, it seems that every locale other than Macedonian actually admits the use of KiB, MiB, and GiB. Score one for the IEC! (It makes me wonder whether the Macedonian page is correct...)

Zibi brings up the issue of LTR/RTL differences, and non-Arabic numerals, and pluralization, but nothing in this bug up to this point has really considered those broader questions of numeric formatting --- the original question at hand was just about the unit names. I don't know if we want to block the simpler, immediate question on the broader question with the more involved answer.
Sorry, in this context especially, I should use the word "locale" advisedly. What I meant was, it seems that these Wikipedia pages translated into different languages all (except for Macedonian) say that the units Greg's original code was using were actually acceptable.
> Looking through that list, it seems that every locale other than Macedonian actually admits the use of KiB, MiB, and GiB. Score one for the IEC

I'm not sure if we see the same results, so let me be more explicit:

In Ukrainian, the unit is "кБ", which matches data from CLDR: http://www.unicode.org/cldr/charts/29/summary/uk.html#6834

In Arabic (and Farsi), the unit is "كيلوبايت", which matches data from CLDR: http://www.unicode.org/cldr/charts/29/summary/ar.html#8021
Yes, that means that in Arabic, we will not be using the short form, because it's not recommended by CLDR/ICU/Unicode.

In Armenian, the unit is "կԲ" which matches data from CLDR: http://www.unicode.org/cldr/charts/29/summary/hy.html#5448

And so on.

> Zibi brings up the issue of LTR/RTL differences, and non-Arabic numerals, and pluralization, but nothing in this bug up to this point has really considered those broader questions of numeric formatting --- the original question at hand was just about the unit names.

Correct, but I don't believe you can answer one without the other. If you want to display Latin characters and the unit name ("KiB") without localization, you will have to match it with western-arabic numerals and left-to-right.

> I should use the word "locale" advisedly. What I meant was, it seems that these Wikipedia pages translated into different languages all (except for Macedonian) say that the units Greg's original code was using were actually acceptable.

Can you point out which fragments of those articles indicate what you are claiming they do?
Just to be clear - the presence of IEC units in tables is a reference, not the unit used by that language.
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #26)
> Can you point out which fragments of those articles indicate what you are
> claiming they do?
> Just to be clear - the presence of IEC units in tables is a reference, not
> the unit used by that language.

Well, maybe I'm misreading them. I was assuming that if the Wikipedia editors in those languages were citing the IEC units, then that implied that a technically-oriented audience in that language would be familiar with them.

Either way, you've persuaded me that using the CLDR is the right thing in the long term. You mentioned in email that CLDR 30 is frozen, and it will take another 6-7 months to get the binary base prefixes into the next version, and suggested that we use ordinary localized string templates whose forms imitate those used by the extant decimal base prefixes in the CLDR (e.g. "{0} KiB", imitating "{0} KB"). That seems reasonable to me.
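A rough sketch of what such string templates could look like in code. Everything named here is an assumption for illustration: the template keys, the STEPS table, and the formatBytesWithTemplate helper are not from the attached patch, and in the tree the templates would live in a localizable .properties file rather than in a JavaScript object.

```js
// Stand-in for a localized string bundle. Keys and values are illustrative;
// the "{0} KiB" form imitates the CLDR-style "{0} KB" pattern.
const EN_US_TEMPLATES = {
  byte: "{0} B",
  kibibyte: "{0} KiB",
  mebibyte: "{0} MiB",
  gibibyte: "{0} GiB",
};

// Binary unit sizes, largest first, so the first match is the best fit.
const STEPS = [
  ["gibibyte", 1024 ** 3],
  ["mebibyte", 1024 ** 2],
  ["kibibyte", 1024],
  ["byte", 1],
];

// formatNumber is injectable so a locale-aware formatter can be passed in.
function formatBytesWithTemplate(n, templates, formatNumber = String) {
  const [unit, size] =
    STEPS.find(([, s]) => n >= s) || STEPS[STEPS.length - 1];
  const value = Math.floor(n / size);
  return templates[unit].replace("{0}", formatNumber(value));
}

console.log(formatBytesWithTemplate(12345678, EN_US_TEMPLATES)); // "11 MiB"
```

Because the placeholder position is part of the string, a localizer can move the number relative to the unit or substitute a different unit name without any code change, and the templates could be swapped for CLDR data once it carries binary prefixes.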
Attachment #8780371 - Flags: feedback?(gtatum) → feedback+
Keywords: good-first-bug
Whiteboard: [good first bug]
Product: Firefox → DevTools