Closed Bug 1358347 Opened 7 years ago Closed 5 years ago

[jsplugins] Implement back-end part of find whole-word function

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

People

(Reporter: lochang, Assigned: lochang)

References

Details

Attachments

(1 file, 1 obsolete file)

[WIP] entire_word.patch 7 years ago Louis Chang [:louis] 2.47 KB, patch		Details \| Diff \| Splinter Review
[WIP] entire_word.patch 7 years ago Louis Chang [:louis] 2.49 KB, patch	jj.evelyn : feedback-	Details \| Diff \| Splinter Review

Louis Chang [:louis]

Assignee

Description

•

7 years ago

This bug implements the back-end part of find whole-word function. Either just bypass the string that is not whole-word or find the similar algorithm of finding whole-word in Gecko code base, e.g. pdf.js.

Louis Chang [:louis]

Assignee

Updated

•

7 years ago

Assignee: nobody → lochang

Evelyn Hung

Updated

•

7 years ago

Group: mozilla-employee-confidential

Louis Chang [:louis]

Assignee

Comment 1

•

7 years ago

Attached patch [WIP] entire_word.patch (obsolete) — Details — Splinter Review

I am still investigating the implementation of finding entire word in gecko. It is a bit more complicated than I think originally, since there are many different languages and side cases that we need to consider. However, I have a WIP patch of finding entire word by using regexp. Maybe we can use it if we do not have any concern with regexp.

Evelyn Hung

Comment 2

•

7 years ago

I think the problem here is about performance and, as you mentioned, multiple language support. How does the WIP work? Do you feel significant delay on searching? Will it work on non-English text?

On the other hand, could we use ICU library to help on string matching?

Louis Chang [:louis]

Assignee

Comment 3

•

7 years ago

(In reply to Evelyn Hung [:evelyn] from comment #2)
> I think the problem here is about performance and, as you mentioned,
> multiple language support. How does the WIP work? Do you feel significant
> delay on searching? Will it work on non-English text?
> 
> On the other hand, could we use ICU library to help on string matching?

I have tested find string (non-entire word) on optimized, non-debug build. And the performance with or without regex seems no significant difference [1]. So I will first keep working on WIP patch (regex) to make it supports finding Chinese.

About ICU library, I do find that twitter has a open source ICU library [2] that we might be able to use. But I am not sure if we could just check them into m-c and use it, or how it will improve our string matching.

[1] Both take 78 sec to find "for" on 300+ pages pdf:
https://doc.lagout.org/programmation/C/Addison.Wesley.Effective.CPP.3rd.Edition.May.2005.pdf
[2] https://github.com/twitter/twitter-cldr-js

Louis Chang [:louis]

Assignee

Comment 4

•

7 years ago

It seems current Firefox doesn't support finding entire word in non-English language [1]. I think we should first clarify the coverage we want to support here.


[1] Find for entire word "委" in pdf http://www.ia.nctu.edu.tw//ezfiles/0/1000/attach/94/pta_921_9439026_07440.pdf

Louis Chang [:louis]

Assignee

Comment 5

•

7 years ago

Attached patch [WIP] entire_word.patch — Details — Splinter Review

Hi Evelyn,

Could you feedback the patch for me? Thanks.

Since currently FireFox does not support finding entire word in Chinese. I guess maybe the patch is good enough for now? I simply check the character before and after the search string to see if they are ASCII digit or ASCII alphabet when finding entire word.

BTW do we have any spec for finding entire word, such as which punctuation should we escape?

Attachment #8865331 - Attachment is obsolete: true

Attachment #8868058 - Flags: feedback?(ehung)

Louis Chang [:louis]

Assignee

Comment 6

•

7 years ago

(In reply to Louis Chang [:louis] from comment #5)
> Since currently FireFox does not support finding entire word in Chinese. I
> guess maybe the patch is good enough for now? 

The patch can support finding entire word in English.

Evelyn Hung

Comment 7

•

7 years ago

Comment on attachment 8868058 [details] [diff] [review]
[WIP] entire_word.patch

If I understand your logic correctly, this approach is wrong. It's possible that the user try to find a string that completely non-alphabet non-digit. Try to use the text in your patch and find ' == ', then you will see a match in |if (n == 0) {| expression, which won't be detected by your way.

Compare to regular expression, this approach is less efficient. You can use \b in regular expression for detecting word boundary without making your own heuristic(alphabet/digit).

Attachment #8868058 - Flags: feedback?(ehung) → feedback-

Evelyn Hung

Comment 8

•

7 years ago

Per our offline discussion, it's worth studying how icu library can be leveraged to solve this problem for multi-languages, and how we do word match for web pages in Firefox. If it's useful, we should consider exposing it to chrome's scripts. I'm fine to firstly address just English, but English-only doesn't meet our shipping criteria.

Virtual_ManPL [:Virtual] 🇵🇱 - (please needinfo? me - so I will see your comment/reply/question/etc.)

Comment 9

•

5 years ago

I'm closing this bug as WONTFIX per:

"The Mortar experiment has concluded. Mozilla does not consider the PDF use case justifies the burden of implementing and maintaining PDFium and a Pepper API implementation in Gecko."

Source: https://wiki.mozilla.org/Mortar_Project

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → WONTFIX

BMO Automation

Updated

•

2 years ago

Product: Core → Core Graveyard

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

[jsplugins] Implement back-end part of find whole-word function

Categories

(Core Graveyard :: Plug-ins, enhancement, P3)

Tracking

(Not tracked)

People

(Reporter: lochang, Assigned: lochang)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file, 1 obsolete file)

Description

Updated

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Updated

Attachment

General

Description

File Name

Content Type