Improve Regex docs for capturing groups

NEW
Unassigned

Status

defect
4 years ago
2 years ago

People

(Reporter: ugajin, Unassigned)

:: Developer Documentation Request

      Request Type: Correction
     Gecko Version: unspecified
 Technical Contact: 

:: Details

I am struggling to understand this:
The '(foo)' and '(bar)' in the pattern /(foo) (bar) \1 \2/ match and remember the first two words in the string "foo bar foo bar". The \1 and \2 in the pattern match the string's last two words.

However I get the next part, which swaps bar foo with foo bar:
Note that \1, \2, \n are used in the matching part of the regex. In the replacement part of a regex the syntax $1, $2, $n must be used, e.g.: 'bar foo'.replace( /(...) (...)/, '$2 $1' ).
Perhaps it is just me, but I wonder if it could not be better explained?
Those two examples are not related.
It may be better to use different words and split them into two paragraphs, with more examples and their output.

> The '(foo)' and '(bar)' in the pattern /(foo) (bar) \1 \2/ match and remember
> the first two words in the string "foo bar foo bar".
> The \1 and \2 in the pattern match the string's last two words.

The things this explains are:
  * capturing parentheses can remember the matched substring
  * the remembered substring can be referred to inside the pattern, with \1
  * there can be multiple capturing parentheses, and the second one can be
    referred to with \2 (and so on for the 3rd and later ones)
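
These three points can be checked directly in a console. This is only a minimal sketch, and the variable names are made up for illustration:

```javascript
// Two capturing groups, each referred to again later in the same pattern.
const pattern = /(foo) (bar) \1 \2/;

// \1 must repeat what group 1 remembered ("foo"), \2 what group 2 remembered ("bar").
const matchesRepeat = pattern.test("foo bar foo bar"); // true
const matchesOther = pattern.test("foo bar bar foo");  // false: last two words are in the wrong order

// exec() shows what each group remembered.
const result = pattern.exec("foo bar foo bar");

console.log(matchesRepeat, matchesOther); // true false
console.log(result[1], result[2]);        // foo bar
```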

> Note that \1, \2, \n are used in the matching part of the regex.
> In the replacement part of a regex the syntax $1, $2, $n must be used,
> e.g.: 'bar foo'.replace( /(...) (...)/, '$2 $1' ).

The things this explains are:
  * capturing parentheses can also be referred to inside the replacement
    string, with $1
  * the prefix differs between the two: \ inside the pattern, $ inside the
    replacement string
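
A short sketch of both points, with illustrative variable names. It also shows what happens if $1 is written inside the pattern by mistake:

```javascript
// $1 and $2 in the replacement string stand for what groups 1 and 2 captured.
const swappedWords = "bar foo".replace(/(...) (...)/, "$2 $1");
console.log(swappedWords); // foo bar

// Inside the pattern itself, group 1 is written \1.
const withBackslash = /(...) \1/.test("foo foo");
console.log(withBackslash); // true

// $1 inside a pattern is NOT a backreference: $ is the end-of-input
// assertion, followed by a literal "1", so this cannot match here.
const withDollar = /(...) $1/.test("foo foo");
console.log(withDollar); // false
```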

I'm not sure it's so important to note the different prefix here. It might be better to just show some examples with \1 and $1.

There's a "Using parenthesized substring matches" section that explains capturing parentheses and replacement, so it would be better to link to it. We should also add a section on backreferences (the current text never actually mentions that name) and link to that section from there.


(Also, using "\n" as a backreference in the explanation might be confusing, since \n is also the line feed escape...)
How about the following?
Is there anything still unclear? (Or do you have a better example?)

====

Matches 'x' and remembers the match. The parentheses are called capturing parentheses.

The substring remembered by capturing parentheses can be referred to with \1, \2, ... inside the pattern. This is called a backreference.
/(.+)cat is \1/ matches "longcat is long" but not "longcat is looong": the capturing parentheses remember "long", so \1 matches only "long".
See also [**** link to the new section for backreference ****].

The substring remembered by capturing parentheses can also be referred to with $1, $2, ... inside the replacement string for String.prototype.replace.
"fox and cat".replace(/(.+) and (.+)/, "$2 and $1") returns "cat and fox", as the 1st pair of capturing parentheses matches and remembers "fox" and the 2nd pair does the same for "cat".
See also [Using parenthesized substring matches].

====
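
The backreference example in the proposed text can be tried as follows. This is a small sketch, not part of the proposed wording, and the name `sameWordTwice` is made up:

```javascript
// (.+) remembers whatever comes before "cat is ", and \1 must repeat it.
const sameWordTwice = /(.+)cat is \1/;

const okLong = sameWordTwice.test("longcat is long");     // true
const okLooong = sameWordTwice.test("longcat is looong"); // false
const remembered = sameWordTwice.exec("longcat is long")[1];

console.log(okLong, okLooong); // true false
console.log(remembered);       // long
```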
Flags: needinfo?(ugajin)
Thanks

The two examples as written appear as if related.

I am not sure the example /(.+)cat helps much, as one now needs to be comfortable and familiar with back-referencing. I had previously looked at x(?:y), which I note is called a lookahead, which seems to be a similar thing.
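
For reference, (?:y) is a non-capturing group, while the lookahead form is written x(?=y). A rough sketch of how the three forms differ, using throwaway variable names:

```javascript
// Capturing group: "y" is part of the match and is remembered as group 1.
const capturing = /x(y)/.exec("xy");
console.log(capturing[0], capturing[1]); // xy y

// Non-capturing group (?:y): "y" is part of the match but is not remembered.
const nonCapturing = /x(?:y)/.exec("xy");
console.log(nonCapturing[0], nonCapturing.length); // xy 1

// Lookahead x(?=y): "x" matches only when followed by "y",
// and "y" is neither part of the match nor remembered.
const lookahead = /x(?=y)/.exec("xy");
console.log(lookahead[0], lookahead.length); // x 1
```

Only the first form gives \1 or $1 something to refer to.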

If both (.+)cat and \1 in /(.+)cat is \1/ match in 'longcat is long' but not in 'longcat is loooong', I am guessing that \1 matches, or references, the content of the capturing parentheses. I think seeing a practical example showing its use and purpose would help more.

For example, with the second part we can clearly see that $1 is a reference to 'foo':

      var fb = /(foo) (bar) \1 \2/;
      var bf = 'foo bar'.replace( /(...) (...)/, '$2 $1' );
      console.log(bf); // bar foo


What I would like to see is the purpose of, and a use for, '\1 \2'.

Thanks again.

-u
Flags: needinfo?(ugajin)
(In reply to ugajin from comment #3)
> The two examples as written appear as if related.

Which examples do you mean?


> I am not sure the example: /(.+)cat helps much, as one now needs to be
> comfortable, and familiar with back-referencing.

There is already an example of a backreference in the current document: /(foo) (bar) \1 \2/.
Do you mean that backreferences should not be mentioned there?


> If both (.+)cat and \1 in /(.+)cat \1/ matches in 'longcat is long' but not
> in 'longcat is loooong'

Maybe my explanation was not clear.
It means that if (.+) matches "long", \1 also matches only "long".


> I am guessing that \1 matches to, or references the
> content of the capturing parenthesis.

\1 matches the same string that the 1st capturing parentheses matched.


> For example with the second part we can clearly see $1 is reference to 'foo':
> 
>       var fb = /(foo) (bar) \1 \2/;
>       var bf = 'foo bar'.replace( /(...) (...)/, '$2 $1' );
>       console.log(bf); // bar foo

Declaring fb there doesn't make sense, as it's not used.
$1 is replaced with "foo" because the first (...) matches "foo", not because of the (foo) in fb.
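
To illustrate: (...) matches any three characters, so what $1 stands for depends only on the input string. A minimal sketch with made-up inputs:

```javascript
// The same pattern and replacement string, applied to two different inputs.
const swapPattern = /(...) (...)/;

const fromFooBar = "foo bar".replace(swapPattern, "$2 $1");
const fromAbcXyz = "abc xyz".replace(swapPattern, "$2 $1");

console.log(fromFooBar); // bar foo
console.log(fromAbcXyz); // xyz abc
```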


> What I would like to see is the purpose, and use for '\1 \2'.

The purpose is to match the same string that appeared earlier.
I think /(.+)cat is \1/ is an actual use.
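
One possible practical use, offered only as an illustration and not taken from the current document, is finding a word that was typed twice in a row. The sample text and names below are made up:

```javascript
// \b(\w+) remembers a whole word; " \1\b" requires the same word again right after it.
const repeatedWord = /\b(\w+) \1\b/;
const text = "This is is a sentence with a typo.";

const found = repeatedWord.exec(text);
console.log(found[0]); // is is
console.log(found[1]); // is

// The same group can then be used as $1 in the replacement to drop the duplicate.
const fixed = text.replace(/\b(\w+) \1\b/g, "$1");
console.log(fixed); // This is a sentence with a typo.
```

Here \1 does the matching work (the pattern cannot be written without knowing the word in advance otherwise), and $1 reuses the same captured word in the result.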
(In reply to Tooru Fujisawa [:arai] from comment #4)
> (In reply to ugajin from comment #3)
> > The two examples as written appear as if related.
> 
> which examples do you mean?

This would be the same two that you refer to in comment #1 and that are provided under 'meaning'.

> 
> 
> > I am not sure the example: /(.+)cat helps much, as one now needs to be
> > comfortable, and familiar with back-referencing.
> 
> there's already example of backreference in current document, it's /(foo)
> (bar) \1 \2/.
> do you mean that backreference should not be mentioned there?
> 
Possibly. As it is (I think) intended to explain (x) capturing parentheses, why complicate it with a back-reference?
> 
> > If both (.+)cat and \1 in /(.+)cat \1/ matches in 'longcat is long' but not
> > in 'longcat is loooong'
> 
> maybe my explanation was not clear.
> it means, if (.+) matches to "long", \1 also matches to "long".
> 
> 
> > I am guessing that \1 matches to, or references the
> > content of the capturing parenthesis.
> 
> \1 matches to the same string as 1st capturing parenthesis matches.
> 
We seem to have got that.
> 
> > For example with the second part we can clearly see $1 is reference to 'foo':
> > 
> >       var fb = /(foo) (bar) \1 \2/;
> >       var bf = 'foo bar'.replace( /(...) (...)/, '$2 $1' );
> >       console.log(bf); // bar foo
> 
> declaring fb there doesn't make sense, as it's not used.
This is true: fb is redundant. It is there because it appears in the source explanation.
First explain capturing parentheses, then explain back-references elsewhere, in a separate context.
> $1 is replaced with "foo" because first (...) matches to "foo", not because
> of the (foo) in fb.
> 
Yes, we got that.
> 
> > What I would like to see is the purpose, and use for '\1 \2'.
> 
> the purpose is to match same string that appears before.
> I think /(.+)cat is \1/ is the actual use.

This is not a use, unless you do something with \1 and/or \2

Thanks
(In reply to ugajin from comment #5)
> > do you mean that backreference should not be mentioned there?
> > 
> Possibly, as it is (I think) intended to explain (x) capturing parentheses,
> why complicate it with a back-reference?

Because it's one of the uses of capturing parentheses,
and capturing parentheses cannot be explained without referring to the captured text in some way.
Of course, we could explain it only with $1 in the replacement string
(or with the match object returned by RegExp.prototype.exec, or the arguments passed to a replacement function given to String.prototype.replace).
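
The last two alternatives could look like this. It is only a sketch; the pattern and the names `pair`, `match` and `reordered` are chosen for illustration:

```javascript
const pair = /(\w+) and (\w+)/;

// 1. The match object returned by RegExp.prototype.exec:
//    index 0 is the whole match, index 1 and 2 are the captured groups.
const match = pair.exec("fox and cat");
console.log(match[0]); // fox and cat
console.log(match[1]); // fox
console.log(match[2]); // cat

// 2. A replacement function passed to String.prototype.replace:
//    it receives the whole match first, then each captured group.
const reordered = "fox and cat".replace(
  pair,
  (whole, first, second) => `${second} and ${first}`
);
console.log(reordered); // cat and fox
```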

> > the purpose is to match same string that appears before.
> > I think /(.+)cat is \1/ is the actual use.
> 
> This is not a use, unless you do something with \1 and/or \2

I'm not sure what you mean.
(In reply to Tooru Fujisawa [:arai] from comment #6)
> (In reply to ugajin from comment #5)
> > > do you mean that backreference should not be mentioned there?
> > > 
> > Possibly, as it is (I think) intended to explain (x) capturing parentheses,
> > why complicate it with a back-reference?
> 
> because it's one of the usage of the capturing parentheses,
> and capturing parentheses cannot be explained without referring it in some
> way.
> of course we could explain it only with $1 in replacement string tho.
> (or, match object returned by RegExp.prototype.exec, or arguments passed to
> replacement function to String.prototype.replace)
> 
> > > the purpose is to match same string that appears before.
> > > I think /(.+)cat is \1/ is the actual use.
> > 
> > This is not a use, unless you do something with \1 and/or \2
> 
> I'm not sure what you mean.

I hope you can now improve this:

The '(foo)' and '(bar)' in the pattern /(foo) (bar) \1 \2/ match and remember the first two words in the string "foo bar foo bar". The \1 and \2 in the pattern match the string's last two words. Note that \1, \2, \n are used in the matching part of the regex. In the replacement part of a regex the syntax $1, $2, $n must be used, e.g.: 'bar foo'.replace( /(...) (...)/, '$2 $1' ).

It is a fine example of how to make a simple thing into something quite baffling, and tortured.

Thank you.
Status: UNCONFIRMED → NEW
Component: General → JavaScript
Ever confirmed: true
Priority: P5 → --
Summary: syntax query → Improve Regex docs for capturing groups
Hiya,

I am currently looking for bugs to fix as part of my Open Source Development module at Coventry University and I am interested in developing this bug.

Please could you assign this task to me and give me more information.

This is my first bug fix and any help would be appreciated.

Thank you.