Closed Bug 1640408 Opened 4 years ago Closed 4 years ago

macOS: Cannot lookup words in the dictionary when they are enclosed in typographic quotation marks

Categories

(Core :: Internationalization, defect)

76 Branch
defect

Tracking

VERIFIED FIXED
mozilla79
Tracking Status
firefox79 --- verified

People

(Reporter: ansgarwohl, Assigned: jfkthame)

Details

Attachments

(2 files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:76.0) Gecko/20100101 Firefox/76.0

Steps to reproduce:

  1. Go to a web page that uses typographic quotation marks, e.g.:

https://www.nytimes.com/2020/05/23/us/coronavirus-government-trust.html

or try it with this bug report where I'm going to include typographic quotation marks as well.

  2. Three-finger-click or force-click (or use whatever gesture you have configured for dictionary lookup) on a word that has a typographic quotation mark directly before or after it. On the example page, the first such word is "Every". You can probably also try it here on this page: “Every time”.

Actual results:

A dictionary lookup for »“Every« (including the “) is performed and, of course, it returns no results.

Expected results:

The quotation mark should have been stripped from the word and a search for "Every" (without the quotation marks) should have been performed. This does work for other punctuation marks like standard non-typographic quotes.

When fixing this, please test it for different languages as well, e.g.

„German typographic quotation marks“
»alternative German typographic quotation marks«
«French typographic quotation marks»

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → Widget: Cocoa
Product: Firefox → Core
Severity: -- → S3
Priority: -- → P3

This patch adds the punctuation marks from the Unicode blocks "Latin-1" and "General Punctuation" to the word breaker.
I have tested the patch and it works, but I don't have time to find out how to submit it properly. I hope it is still useful.

Flags: needinfo?(spohl.mozilla.bugs)

Masayuki-san, could you take a look or redirect as you see appropriate? Thank you!

Flags: needinfo?(spohl.mozilla.bugs) → needinfo?(masayuki)

Well, Jonathan Kew or Makoto-san might be a better fit for the word breaker.

(If nobody else takes this, I'd like to look into it, but I probably won't have spare time for a couple of months.)

Flags: needinfo?(masayuki)
Flags: needinfo?(m_kato)
Flags: needinfo?(jfkthame)

This is a legitimate issue, caused by the intl/lwbrk/WordBreaker code treating lots of Unicode punctuation characters as if they were alphabetic. I think a better fix than just hard-coding a couple of ranges, though, would be to check the General Category property; this will also catch various punctuation characters in other blocks, without needing an exhaustive list or constant maintenance. I'll put up a patch.
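
To illustrate the General Category idea, here is a minimal sketch in Python rather than the actual intl/lwbrk/WordBreaker C++ code. It assumes a simplified model in which a "word" is a run of characters that are neither whitespace nor punctuation; the helper names `is_punctuation` and `lookup_word` are hypothetical and do not come from the patch.

```python
import unicodedata


def is_punctuation(ch: str) -> bool:
    """True if the character's Unicode General Category is a P* class
    (Pc, Pd, Ps, Pe, Pi, Pf, Po), regardless of which block it lives in."""
    return unicodedata.category(ch).startswith("P")


def lookup_word(text: str, index: int) -> str:
    """Return the word around text[index], treating whitespace and any
    punctuation character as a boundary (simplified, hypothetical model)."""

    def is_word_char(ch: str) -> bool:
        return not ch.isspace() and not is_punctuation(ch)

    if not is_word_char(text[index]):
        return ""
    # Extend left and right until a boundary character is hit.
    start = index
    while start > 0 and is_word_char(text[start - 1]):
        start -= 1
    end = index + 1
    while end < len(text) and is_word_char(text[end]):
        end += 1
    return text[start:end]


# Test phrases taken from the bug report.
print(lookup_word("“Every time”", 1))                         # Every
print(lookup_word("„German typographic quotation marks“", 1))  # German
print(lookup_word("»alternative German«", 1))                  # alternative
print(lookup_word("«French typographic quotation marks»", 1))  # French
```

Because the check relies on the category rather than a list of code points, the same function handles “ ” (Pi/Pf), „ (Ps), and « » (Pi/Pf) without any per-block ranges. Note that this simplified model also splits at apostrophes and hyphens, which a real word breaker would need to treat more carefully.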

Assignee: nobody → jfkthame
Status: UNCONFIRMED → NEW
Component: Widget: Cocoa → Internationalization
Ever confirmed: true
Flags: needinfo?(jfkthame)

The component has been changed since the backlog priority was decided, so we're resetting it.
For more information, please visit auto_nag documentation.

Priority: P3 → --
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/430ea4182cef
Check Unicode general category to identify punctuation marks in word-breaker. r=m_kato
Flags: needinfo?(m_kato)
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79
Flags: qe-verify+

Confirmed issues with 76.0.
Verified with 79.0b4 on macOS 10.15.5 & 10.13.6.

Status: RESOLVED → VERIFIED
Flags: qe-verify+