Open Bug 240932 Opened 20 years ago Updated 3 years ago

Find matches across blocks and form controls

Categories

(SeaMonkey :: Find In Page, defect)

defect
Not set
normal

Tracking

(Not tracked)

People

(Reporter: rbs, Unassigned)

Details

Attachments

(3 files)

Doing for a find for, say 'x1' will match:

x<p></p>123<p>
x<input type="text" size=70>123

Will attach a testcase.
It also matches across other form controls such as 'x<input type=radio>1'.
changing title to reflect that.
OS: Windows 2000 → All
Hardware: PC → All
Summary: Find matches across blocks and textboxes → Find matches across blocks and form controls
Need help getting a fix for 1.7.

/be
This behavior first appeared between 2002-02-16-21 and 2002-02-18-21, strongly
implicating the initial landing of the current Find code (see bug 97157, bug
126232).

Before trying for a "fix", why don't we decide what we're trying to fix, first?
   Without that, it's all just flailing.  Of the following cases, which should
"x1" match (I think these cover most of the things we'd run into.....)

1)  x1 
2)  x<span></span>1 
3)  x<p></p>1
4)  x<span style="display: block"></span>1
5)  x<br>1
6)  x<span style="float: left"></span>1
7)  x<img src="exists.gif alt="">1
8)  x<img src="404.gif" alt="">1
9)  As 8, but with a linebreak before/after the image (should we allow such?)
10) As 8, but using <input type="image">
11) As 9, but using <input type="image">
12) x<input type="text">1
13) x<input type="text" style="width: 0; padding: 0; border: 0; margin:0">1
14) x<p style="display: inline"></p>1
15) x&shy;1 (with no linebreak)
16) As 11, but with linebreak at soft hyphen (I know we don't do it yet; not
    relevant to the question).
17) x<img src="404.gif" alt="1">
18) As 13, with linebreak (again, if we should allow such)
19) x<object src="404">1</object>
20) x<button>1</button>
21) x&zwnj;1
22) x&zwj;1
23) x&#xFEFF;1
bz: "fix" for 1.7 is "status quo ante", but you make a good point -- I hasten to
add that I was not encouraging flailing, just fixing.  First step in fixing is
specifying, if there's no spec for what should happen.

Your list raises some hard cases, but hard cases make bad law.  The easy cases
where we want /x1 not to match are mentioned in the summary. Fix those and we'll
be in better shape, even if we have some post-1.7 followup bugs to fix.

/be
"Status quo ante" being "what we had in 0.9.7" in this case?  I'm not sure
that's exactly feasible given the scope of the rewrite that happened.....

Note that some of my questions _are_ about form controls and are still hard.

If we can find a quick fix that's safe enough for 1.7 that fixes the common
cases, fine.  But I suspect that the only way to do that will be to
systematically examine what find considers to be "space".
IMHO find should work based on what is displayed. So what markup is inbetween
the 'x' and the '1' doesn't matter, what matters is how it's displayed, i.e.
blocks, hr-frames, "linebreaking-frames", form-frames should all be considered
delimiters.

Also, if it makes it easier to implement, i would also buy the argument that
when there is markup that logically breaks a word then that could be considered
a no-match to. Like in the case of 13 and 14. In thoses cases I think we can
'blame' the webauthor.

17 is technically very hard. At least if you make the testcase:
x<img src="404.gif" alt="1 more text">
since the range code can't deal with selecting part of an attribute.
OK, so as easy cases go, I would say that 1-7 and 12 are easy.  I thought 14 was
easy, but clearly I was wrong.  ;)

Trying to make sure 1-7 and 12 work for 1.7 seems reasonable.
The current code includes a notion of 'block' node (or 'treat node as block' to
be precise, @see nsFind::IsBlockNode, though it is essentially based on what the
HTML parser thinks, rather than what the corresponding frame's styledata says).
The Find code uses its notion of block to match <p><b>x</b>1</p>, but not
<p>x</p><p>1</p> (i.e., prevent continutation across different block parents).

However, the crux of this bug is really that the current code doesn't include a
provision to deal with crossing pass blocks that don't add text to the match.
There is no attempt at doing something, even aimed at just some of the easy
cases in bz's list. It will match weird cases such as 'x<hr>1' that bz didn't
include, or other cases such as x<br>1 or x<p><br><br></p>1.

I gather from a comment in the code that akkana knew that something was fishy in
this respect, but hasn't pursued it since her original rewrite.
Note: using view-selection-source on the find matches is good to see any fixups
that the parser may have done.
Attached patch patchSplinter Review
This does the trick for me. How comprehensive the patch becomes depends on how
comprehensive the |CanContinueMatch| function can be made. Right now the
function simply uses the same criterion as 'IBlockNode', with some simple
additions there such as 'br' and text controls.
patch also seems okay with the regression tests (browser version) at:
http://www.mozilla.org/quality/browser/front-end/testcases/xpapps-gui/find-on-page/find-in-page.html
Sorry, I was out of town during the discussion.

The patch looks reasonable.

Adding br to IsBlockNode: I'm concerned that this mean that br will also not
match spaces.  For example, in the "Where's mozilla?" section of the regression
tests, if you search for "la m", it would match across lines before, and now it
won't.  I suspect that in real pages, most of the time, matching across br is
the more useful behavior.  Consider, for one common example, people who send
html mail but hit return at the end of every line for no good reason.

(The same objection might be raised to having img in that list, though it was
there before the patch.)

Then that raises the thorny issue of matching spaces even across things that
look like br, such as p, though I can understand why we'd want to prevent
matching across block tags in general.

Additional request: I would like to see an update to the official Find
regression test page (the one rbs mentioned) as part of this fix, including
clear instructions as to which ones should and should not match, since the whole
point of this bug is "Now we're clear on what should and should not match". 
This should probably replace or modify the current "Where's mozilla?" section,
since tests the same sort of thing (but "Where's mozilla?" is deliberately vague
because we weren't sure ourselves).
> Adding br to IsBlockNode: I'm concerned that this mean that br will also not
> match spaces.
...
> Then that raises the thorny issue of matching spaces even across things that
> look like br, such as p, though I can understand why we'd want to prevent
> matching across block tags in general.

Interesting points. In general, it wouldn't seem that bad to me to match across
"empty" blocks when there is a space in the search pattern. For example, the
search pattern 'x 1' can match 'x<br>1' (or 'x<p><br></p>') in the document. In
other words, a special treatment for separate words. Changing the behavior this
way could be more intuitive and consistent (and address your concerns about
real-life cases), compared to having 'x1' matching 'x<br>1' (or the other forms)
which is somewhat counter-intuitive. I suspect that 'x1' matching 'x<br>1' is
uncommon, and that users resort to the inter-space 'x 1' when they envision
multi-word matches.

But the current patch lacks that support anyway.
> compared to having 'x1' matching 'x<br>1' (or the other forms)
> which is somewhat counter-intuitive

Agreed.  I don't advocate that, and neither the current code nor your proposed
patch does it.  My concern is only whether br (and lookalikes) should match an
explicit space in the pattern string.  I think they should.
Actually, the current code matches 'x1' to 'x<br>1) (see the attached testcase,
#5), and it (and the others) triggered the interest to the bug.

Did you put some thoughts on how to implement the inter-space behavior that we
are agreeing on? It doesn't look like a simple fix that would be safe enough for
1.7f. Does it?
Product: Core → Mozilla Application Suite
Since this affects Firefox as well, should the Product be changed to Core? Or is the code separate?
Filter "spam" on "guifeatures-nobody-20080610".
Assignee: guifeatures → nobody
QA Contact: guifeatures
Component: XP Apps: GUI Features → UI Design
Component: UI Design → Find In Page
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: