Closed Bug 1756305 Opened 3 years ago Closed 3 years ago

Reader View doesn't apply appropriate `lang` attribute to the title

Categories

(Toolkit :: Reader Mode, defect, P3)

Tracking

RESOLVED FIXED
Tracking Status
firefox100 --- fixed

People

(Reporter: jfkthame, Assigned: jfkthame)

Attachments

(2 obsolete files)

(Spun off from bug 1756278, where this was one of the scenarios mentioned.)

The Wikipedia page https://ja.wikipedia.org/wiki/%E6%96%87%E5%8C%96 is clearly tagged as being Japanese:

<!DOCTYPE html>
<html class="client-nojs" lang="ja" dir="ltr">
<head>
<meta charset="UTF-8"/>
<title>文化 - Wikipedia</title>

This should mean that we use the Japanese font prefs to resolve the default fonts used. This works fine within the page content.

However, in Reader Mode, the title of the generated page, which the Inspector shows as the element

<h1 class="reader-title">文化</h1>

is not tagged as Japanese, and therefore defaults to using a Chinese font.

I think the lang of the original page (if specified) should be applied to the Reader Mode view as well.

Note that in this example, the reader-title is actually coming from JSON metadata found at the end of the source:

<script type="application/ld+json">{"@context":"https:\/\/schema.org","@type":"Article","name":"\u6587\u5316","url":"https:\/\/ja.wikipedia.org\/wiki\/%E6%96%87%E5%8C%96","sameAs":"http:\/\/www.wikidata.org\/entity\/Q11042","mainEntity":"http:\/\/www.wikidata.org\/entity\/Q11042","author":{"@type":"Organization","name":"Contributors to Wikimedia projects"},"publisher":{"@type":"Organization","name":"\u30a6\u30a3\u30ad\u30e1\u30c7\u30a3\u30a2\u8ca1\u56e3","logo":{"@type":"ImageObject","url":"https:\/\/www.wikimedia.org\/static\/images\/wmf-hor-googpub.png"}},"datePublished":"2003-03-07T13:14:32Z","dateModified":"2022-02-09T10:21:11Z","headline":"\u4eba\u9593\u304c\u793e\u4f1a\u306e\u69cb\u6210\u54e1\u3068\u3057\u3066\u7372\u5f97\u3059\u308b\u591a\u6570\u306e\u632f\u308b\u821e\u3044\u306e\u5168\u4f53"}</script>

Specifically, "name":"\u6587\u5316" here provides the text "文化" that Reader Mode uses as the title. If this is removed, Reader Mode falls back to the content of the <title> tag from the page, but the title element is still left without any lang attribute and therefore gets the wrong font prefs.
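As a rough illustration of the title selection described above (not the actual Reader Mode code), here is a minimal Python sketch that prefers the "name" field of a JSON-LD block and falls back to the <title> text. The names TitleSourceParser and pick_reader_title are hypothetical, and the sketch does none of the further title cleanup that the real extraction logic may perform.

```python
import json
from html.parser import HTMLParser


class TitleSourceParser(HTMLParser):
    """Collects the <title> text and any JSON-LD script bodies (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._capture = None       # "title", "jsonld", or None
        self.title_text = ""
        self.jsonld_blocks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._capture = "title"
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._capture = "jsonld"
            self.jsonld_blocks.append("")

    def handle_endtag(self, tag):
        if tag in ("title", "script"):
            self._capture = None

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate rather than assign.
        if self._capture == "title":
            self.title_text += data
        elif self._capture == "jsonld":
            self.jsonld_blocks[-1] += data


def pick_reader_title(html):
    """Assumed order: JSON-LD "name" first, then the raw <title> content."""
    parser = TitleSourceParser()
    parser.feed(html)
    parser.close()
    for block in parser.jsonld_blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # ignore malformed metadata
        if isinstance(data, dict):
            name = data.get("name")
            if isinstance(name, str) and name.strip():
                return name.strip()
    return parser.title_text.strip()
```

Note that neither source carries language information by itself, which is why the title ends up untagged whichever path is taken.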

So the simplest thing to do here is to give the Reader Mode title element (the <h1 class="reader-title">) the same lang attribute that is applied to the main article content. This fixes the display of examples like the Japanese Wikipedia page.

To go a step further, when parsing the document we could also check for an existing lang attribute on the <title> element and, if present, use it (in the relatively unlikely case that it differs from the article language).
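The two heuristics above could be sketched roughly as follows. This is only an illustration in Python, not the attached patches: LangParser, reader_title_lang and render_reader_title are hypothetical names, and the sketch assumes the article language is simply the lang attribute on the root <html> element.

```python
from html import escape
from html.parser import HTMLParser


class LangParser(HTMLParser):
    """Records the lang attribute of <html> and <title> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.root_lang = None
        self.title_lang = None

    def handle_starttag(self, tag, attrs):
        # Treat a missing or empty lang attribute as "not specified".
        lang = (dict(attrs).get("lang") or "").strip() or None
        if tag == "html" and self.root_lang is None:
            self.root_lang = lang
        elif tag == "title" and self.title_lang is None:
            self.title_lang = lang


def reader_title_lang(html):
    """Prefer a lang on <title>; otherwise reuse the article (root) language."""
    parser = LangParser()
    parser.feed(html)
    parser.close()
    return parser.title_lang or parser.root_lang


def render_reader_title(title, lang):
    """Builds the reader-title heading, tagging it only when a lang is known."""
    lang_attr = f' lang="{escape(lang, quote=True)}"' if lang else ""
    return f'<h1 class="reader-title"{lang_attr}>{escape(title)}</h1>'
```

For the Japanese Wikipedia example, reader_title_lang would return "ja", so the heading would be emitted with lang="ja" and pick up the Japanese font prefs.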

There will no doubt still be examples that don't look right, because language tagging on the web is often rather haphazard, but these heuristics should at least improve the odds of getting it right in the Reader Mode view, particularly for CJK users where using the right language is important for getting the correct glyph shapes.

Assignee: nobody → jfkthame
Status: NEW → ASSIGNED
Depends on: 1474565
Severity: -- → S3
Priority: -- → P3

Hi Jonathan, I believe this is fixed by Bug 1474565. I can see that the title in your example is now using the expected font (Yu Gothic Bold) instead of Microsoft YaHei Bold. I'll mark this as resolved but feel free to re-open if there's anything else we missed!

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(jfkthame)
Resolution: --- → FIXED

Thanks for following up, Micah - yes, this looks fine now. I'll mark the patches here as obsolete to get rid of the clutter.

Flags: needinfo?(jfkthame)
Attachment #9264796 - Attachment is obsolete: true
Attachment #9264797 - Attachment is obsolete: true