Bug 1120148
Opened 11 years ago
Closed 11 years ago
Find should search across element boundaries and ignore breaking hyphens
Categories
(Toolkit :: Find Toolbar, enhancement)
RESOLVED
WONTFIX
People
(Reporter: roess, Unassigned)
Details
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0
Build ID: 20141126041045
Steps to reproduce:
I converted PDFs into HTML, opened them in Firefox, and performed a search with Quick Find. Note: tools that convert PDF into HTML frequently (or always) put Paragraph Marks at the end of each line for formatting purposes.
Actual results:
Quick Find treats a Paragraph Mark as a character. Therefore words or terms that are separated by a Paragraph Mark are not found. Example: "Firefox is [Paragraph Mark] awesome" cannot be found by Quick Find because Quick Find treats the Paragraph Mark as a character.
Expected results:
The term "Firefox is [Paragraph Mark] awesome" should be found as "Firefox is awesome". This problem applies to any file, whether a PDF (i.e. one shown in Firefox's PDF.js viewer) or HTML, that has Paragraph Marks at the end of a line. Search engines like YAHOO and desktop search applications ignore Paragraph Marks and correctly find such words or terms.
Note: the handling of hyphens causes a similar problem. If a word can be written as "predefined", "pre-defined", or "pre defined", Quick Find only finds the version that was typed; it should find all three.
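To illustrate the hyphen request, here is a minimal sketch of one way a search could treat "predefined", "pre-defined", and "pre defined" as the same term. It is not how Firefox's find-in-page works. The names `SEPARATOR`, `loose_pattern`, and `find_all` are hypothetical, and the approach (allowing an optional hyphen, soft hyphen, or whitespace character between any two letters of the query) is an assumption chosen for brevity.

```python
import re

# Assumption: a hyphen, a soft hyphen (U+00AD), or one whitespace character
# may optionally appear between any two letters of the search term.
SEPARATOR = r"[-\u00ad\s]?"


def loose_pattern(query: str) -> re.Pattern:
    """Build a case-insensitive regex that ignores hyphens and spaces in `query`."""
    # Strip separators from the query itself so every spelling of the
    # term produces the same pattern.
    letters = [ch for ch in query if not re.fullmatch(r"[-\u00ad\s]", ch)]
    return re.compile(SEPARATOR.join(re.escape(ch) for ch in letters), re.IGNORECASE)


def find_all(query: str, text: str) -> list[str]:
    """Return every substring of `text` that matches `query` loosely."""
    return [m.group(0) for m in loose_pattern(query).finditer(text)]


text = "A predefined value, a pre-defined value, and a pre defined value."
print(find_all("pre-defined", text))
```

A pattern this permissive also over-matches (for example, a search for "resign" would hit "re-sign"), so a real implementation would probably need to make it an opt-in mode.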
Related SuMo thread: https://support.mozilla.org/questions/1040472
The problem is not literally with a paragraph mark, but with the PDF converter's method of handling lines: the reporter's converter is creating an absolutely positioned div of each line of text from a single paragraph, fragmenting the text node. Example: https://support.mozilla.org/questions/1040472#answer-675768
Even in an ordinary web page, a find cannot span the break between certain elements, such as div, p, and h1.
An option to search across these element boundaries would work around that problem.
(I think the behavior is the same between Quick Find and the regular Find bar.)
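The effect described above can be reproduced outside the browser. The following is a rough sketch, not Firefox's find-in-page algorithm: it flattens a page to plain text, treating each block-element boundary as whitespace, and then searches the flattened text. The names `BLOCK_TAGS`, `TextCollector`, `flatten`, and `found` are hypothetical, the tag list is an illustrative assumption rather than a complete list of block elements, and the sample markup only approximates the converter output described in the SuMo thread.

```python
import re
from html.parser import HTMLParser

# Assumption: an illustrative, incomplete set of tags whose boundaries
# should behave like whitespace when searching.
BLOCK_TAGS = {"div", "p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br"}


class TextCollector(HTMLParser):
    """Collect text nodes, inserting a space at each block boundary."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        self.chunks.append(data)


def flatten(html: str) -> str:
    """Return the page text with whitespace runs collapsed to single spaces."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return re.sub(r"\s+", " ", "".join(collector.chunks)).strip()


def found(query: str, html: str) -> bool:
    """Case-insensitive search that can span block-element boundaries."""
    return query.lower() in flatten(html).lower()


# One absolutely positioned div per line of text, as the converter produces.
page = (
    '<div style="position:absolute; top:10px">Firefox is</div>'
    '<div style="position:absolute; top:30px">awesome</div>'
)
print(found("Firefox is awesome", page))
```

Inline elements such as `<b>` add no space, so a word split across inline markup still matches as one word. Mapping a match back to a highlightable range in the original DOM is the harder part and is not attempted here.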
Updated•11 years ago
Severity: normal → enhancement
Status: UNCONFIRMED → NEW
Component: Untriaged → Find Toolbar
Ever confirmed: true
OS: Windows 7 → All
Product: Firefox → Toolkit
Hardware: x86_64 → All
Summary: quick find search recognizes Paragraph Mark as character → Find should search across element boundaries and ignore breaking hyphens
Comment 2•11 years ago
(In reply to Jefferson from comment #1)
> Related SuMo thread: https://support.mozilla.org/questions/1040472
>
> The problem is not literally with a paragraph mark, but with the PDF
> converter's method of handling lines: the reporter's converter is creating
> an absolutely positioned div of each line of text from a single paragraph,
> fragmenting the text node. Example:
> https://support.mozilla.org/questions/1040472#answer-675768
This is a problem, or even a bug, in the PDF converter that is used. Basically it produces invalid markup, because block elements should not be used to break up words that semantically belong together. Styling should be separated from markup, which is why CSS was invented and should be used here instead.
If we were to change the find-in-page implementation to support this use case, we'd have to make substantial changes to the algorithm and introduce more complexity, which is high-risk and costly. The cost would be higher than the return.
I'm marking this bug as WONTFIX and strongly recommend that the author(s) of this PDF converter library fix their code. Possible pointers would be to read http://en.wikipedia.org/wiki/Semantic_HTML and WCAG (http://www.w3.org/WAI/WCAG20/quickref/Overview.php#content-structure-separation).
Please feel free to reopen this bug if you think I misunderstood something and we should reconsider adapting our find-in-page algorithm.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX