Closed
Bug 1358347
Opened 7 years ago
Closed 5 years ago
[jsplugins] Implement back-end part of find whole-word function
Categories
(Core Graveyard :: Plug-ins, enhancement, P3)
Core Graveyard
Plug-ins
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: lochang, Assigned: lochang)
References
Details
Attachments
(1 file, 1 obsolete file)
2.49 KB,
patch
|
jj.evelyn
:
feedback-
|
Details | Diff | Splinter Review |
This bug implements the back-end part of find whole-word function. Either just bypass the string that is not whole-word or find the similar algorithm of finding whole-word in Gecko code base, e.g. pdf.js.
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → lochang
Updated•7 years ago
|
Group: mozilla-employee-confidential
Assignee | ||
Comment 1•7 years ago
|
||
I am still investigating the implementation of finding entire word in gecko. It is a bit more complicated than I think originally, since there are many different languages and side cases that we need to consider. However, I have a WIP patch of finding entire word by using regexp. Maybe we can use it if we do not have any concern with regexp.
Comment 2•7 years ago
|
||
I think the problem here is about performance and, as you mentioned, multiple language support. How does the WIP work? Do you feel significant delay on searching? Will it work on non-English text? On the other hand, could we use ICU library to help on string matching?
Assignee | ||
Comment 3•7 years ago
|
||
(In reply to Evelyn Hung [:evelyn] from comment #2) > I think the problem here is about performance and, as you mentioned, > multiple language support. How does the WIP work? Do you feel significant > delay on searching? Will it work on non-English text? > > On the other hand, could we use ICU library to help on string matching? I have tested find string (non-entire word) on optimized, non-debug build. And the performance with or without regex seems no significant difference [1]. So I will first keep working on WIP patch (regex) to make it supports finding Chinese. About ICU library, I do find that twitter has a open source ICU library [2] that we might be able to use. But I am not sure if we could just check them into m-c and use it, or how it will improve our string matching. [1] Both take 78 sec to find "for" on 300+ pages pdf: https://doc.lagout.org/programmation/C/Addison.Wesley.Effective.CPP.3rd.Edition.May.2005.pdf [2] https://github.com/twitter/twitter-cldr-js
Assignee | ||
Comment 4•7 years ago
|
||
It seems current Firefox doesn't support finding entire word in non-English language [1]. I think we should first clarify the coverage we want to support here. [1] Find for entire word "委" in pdf http://www.ia.nctu.edu.tw//ezfiles/0/1000/attach/94/pta_921_9439026_07440.pdf
Assignee | ||
Comment 5•7 years ago
|
||
Hi Evelyn, Could you feedback the patch for me? Thanks. Since currently FireFox does not support finding entire word in Chinese. I guess maybe the patch is good enough for now? I simply check the character before and after the search string to see if they are ASCII digit or ASCII alphabet when finding entire word. BTW do we have any spec for finding entire word, such as which punctuation should we escape?
Attachment #8865331 -
Attachment is obsolete: true
Attachment #8868058 -
Flags: feedback?(ehung)
Assignee | ||
Comment 6•7 years ago
|
||
(In reply to Louis Chang [:louis] from comment #5) > Since currently FireFox does not support finding entire word in Chinese. I > guess maybe the patch is good enough for now? The patch can support finding entire word in English.
Comment 7•7 years ago
|
||
Comment on attachment 8868058 [details] [diff] [review] [WIP] entire_word.patch If I understand your logic correctly, this approach is wrong. It's possible that the user try to find a string that completely non-alphabet non-digit. Try to use the text in your patch and find ' == ', then you will see a match in |if (n == 0) {| expression, which won't be detected by your way. Compare to regular expression, this approach is less efficient. You can use \b in regular expression for detecting word boundary without making your own heuristic(alphabet/digit).
Attachment #8868058 -
Flags: feedback?(ehung) → feedback-
Comment 8•7 years ago
|
||
Per our offline discussion, it's worth studying how icu library can be leveraged to solve this problem for multi-languages, and how we do word match for web pages in Firefox. If it's useful, we should consider exposing it to chrome's scripts. I'm fine to firstly address just English, but English-only doesn't meet our shipping criteria.
Comment 9•5 years ago
|
||
I'm closing this bug as WONTFIX per:
"The Mortar experiment has concluded. Mozilla does not consider the PDF use case justifies the burden of implementing and maintaining PDFium and a Pepper API implementation in Gecko."
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Updated•2 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•