Closed Bug 1106380 Opened 6 years ago Closed 5 years ago
Make average word length and reading speed localizable
Bug 889351 implemented an approximate reading time for items on the reading list. The average word length and the reading speed are hardcoded but should be localizable:

    private static final int AVERAGE_READING_SPEED = 250;

    // Length of average word.
    private static final float AVERAGE_WORD_LENGTH = 5.1f;

Reading speed can also vary strongly between people (e.g. one person may read twice as fast as another). Input from CJK and other languages whose more complex characters each express more content would also be helpful.
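To make the request concrete, here is a minimal sketch of how the two constants could become per-language parameters. It is not the code from bug 889351: the class `ReadingTimeEstimator`, the `Params` type, the `overrides` map, and the rounding behaviour are all assumptions for illustration. Only the fallback values 250 and 5.1 come from the patch quoted above.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not the actual reading list implementation.
final class ReadingTimeEstimator {

    /** Reading parameters for one language. */
    static final class Params {
        final int wordsPerMinute;
        final float averageWordLength;

        Params(int wordsPerMinute, float averageWordLength) {
            if (wordsPerMinute <= 0 || averageWordLength <= 0) {
                throw new IllegalArgumentException("parameters must be positive");
            }
            this.wordsPerMinute = wordsPerMinute;
            this.averageWordLength = averageWordLength;
        }
    }

    // Fallback: the two values currently hardcoded in the patch.
    static final Params DEFAULT = new Params(250, 5.1f);

    /**
     * Looks up parameters by language code (e.g. "de"). The overrides map
     * stands in for whatever localized resource would supply the values.
     */
    static Params paramsFor(Locale locale, Map<String, Params> overrides) {
        return overrides.getOrDefault(locale.getLanguage(), DEFAULT);
    }

    /**
     * characters / average word length / words per minute, rounded to the
     * nearest minute. Non-empty text reports at least 1 minute (an assumption).
     */
    static int estimateMinutes(int charCount, Params params) {
        if (charCount <= 0) {
            return 0;
        }
        double words = charCount / (double) params.averageWordLength;
        double minutes = words / params.wordsPerMinute;
        return (int) Math.max(1, Math.round(minutes));
    }
}
```

With the fallback values, `estimateMinutes(12750, ReadingTimeEstimator.DEFAULT)` works out to roughly 2500 words at 250 words per minute. Localizers would only need to supply the two numbers per language.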
As I noted in https://github.com/mozilla-services/readinglist/issues/3, there are two separate uses for something like a word count: accurately calculating and tracking scroll/read position (where the only thing that matters is consistency), and calculating an estimated reading time.

We'll definitely need the former. That is probably best addressed by using display-oriented concepts: either a character count or a word count (in the 'split on spaces and punctuation' sense).

For the latter, this bug is currently on these rails:

    characters / avg_char_per_word / words_per_minute

but for ideographic languages this gets difficult. So switch to characters / characters_per_minute? That gets hairy when you have compound texts: what do you do for a blog post that's part hiragana, part kanji, and part English quotes?

Take a look at, e.g., http://www.mozilla.jp/blog/entry/10439/ or http://googlejapan.blogspot.jp/2014/10/1-game-week-with-google-play.html, which both contain a good mix of Japanese scripts and English: three different densities. Your effective reading speed will be shifted by the percentage of the text that each script makes up.

Kindle figures out your reading speed as you read. It has the luxury of doing so because you're typically reading book-length works. It'll switch from "15%" to "20% -- 1 hour remaining" as it learns.

So a simpler approach for us might be: make a guess at layout and use a 'traffic light' model instead of 'X minutes': one screen (blog post), a few screens (news piece), lots of screens (essay).
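To illustrate why compound texts are awkward for characters / characters_per_minute, here is a rough sketch that counts characters per script class and sums a separate time for each. It relies on the standard `Character.UnicodeScript` API. The class name, the three-way `Density` split, and the per-class rates are assumptions made for the example, and the rates must be supplied by the caller because this thread does not establish any real values.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch, not part of any existing reading list code.
final class MixedScriptEstimator {

    /** Coarse density classes, roughly matching the three mentioned above. */
    enum Density { IDEOGRAPHIC, KANA, ALPHABETIC }

    /** Maps a code point to a density class via its Unicode script. */
    static Density classify(int codePoint) {
        switch (Character.UnicodeScript.of(codePoint)) {
            case HAN:
                return Density.IDEOGRAPHIC;
            case HIRAGANA:
            case KATAKANA:
                return Density.KANA;
            default:
                // Everything else is treated as alphabetic (a simplification).
                return Density.ALPHABETIC;
        }
    }

    /** Counts letters and digits per density class; whitespace and punctuation are skipped. */
    static Map<Density, Integer> countByDensity(String text) {
        Map<Density, Integer> counts = new EnumMap<>(Density.class);
        text.codePoints()
            .filter(Character::isLetterOrDigit)
            .forEach(cp -> counts.merge(classify(cp), 1, Integer::sum));
        return counts;
    }

    /** Sums count / charactersPerMinute over every class present in the text. */
    static double estimateMinutes(String text, Map<Density, Double> charsPerMinute) {
        double minutes = 0;
        for (Map.Entry<Density, Integer> entry : countByDensity(text).entrySet()) {
            Double rate = charsPerMinute.get(entry.getKey());
            if (rate == null || rate <= 0) {
                throw new IllegalArgumentException("no rate for " + entry.getKey());
            }
            minutes += entry.getValue() / rate;
        }
        return minutes;
    }
}
```

Even this small version needs three tunable rates per locale, which is part of the argument for a coarser 'traffic light' display instead of a minute count.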
Oh, and there are interesting parallels here with translation, which typically bills by the word -- so what's a word?

http://www.proz.com/forum/localization/229393-word_count_when_source_language_is_korean_chinese_japanese_arabic.html
http://www.proz.com/forum/business_issues/220408-how_to_count_source_words_in_an_asian_text_into_eng.html
Many thanks for the pointers!
We're not continuing to invest in the reading list, so let's not fix this.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX