Closed Bug 995609 Opened 10 years ago Closed 8 years ago

TextSelection regex thinks 'p', 'w', '#' are dialable phone numbers

Categories

(Firefox for Android Graveyard :: Text Selection, defect, P5)

All
Android
defect

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: capella, Unassigned, Mentored)

References

Details

(Whiteboard: [lang=js][good first bug])

Attachments

(3 files, 5 obsolete files)

Note for context: [pw#] are all valid in a dial string; they represent pauses and pound.

The fix is to ensure that there's at least one numeral first!
Hardware: ARM → All
Whiteboard: [mentor=rnewman][lang=js][good first bug]
Mentor: rnewman
Whiteboard: [mentor=rnewman][lang=js][good first bug] → [lang=js][good first bug]
/^[\s]*[+]?[\d\s,-.\(\)*#]{1,}[pw\d\s,-.\(\)*#]*[\s]*$/

This conforms to these rules:

0) May begin with any no. of whitespaces
1) May begin with a single '+' sign
2) More than one digit should begin the dial string/appear immediately after single '+' sign
3) 'p', 'w' can occur only after the string satisfies the previous rules
4) Anywhere in the dial string, whitespaces, dashes, dots, parantheses, asterisks and pounds can appear
5) May end with any no. of whitespaces

Cases:
------
+1-(333)-(333)-(3333)            --           Typical phone number
*#06#                            --           Request for IMEI number
*#92702689#                      --           Nokia's warranty information

Concerns:
---------
1) Can there be a minimum and maximum lengths for the dial string?
2) Should we handle 'Extension numbers' too?
3) Are we modifying Linkify.js too? If so, can we make a global constant such as PHONE_REGEX?
Flags: needinfo?(rnewman)
N.B., ";" is also in common use (and available in Android phone dialers) for pause.



> 1) Can there be a minimum and maximum lengths for the dial string?

Certainly sanity limits, but not low ones.

With punctuation we expect real numbers to be no more than 30 characters (E.164 says 15 digits), but full dialstring stuff could easily be more (imagine calling into remote voicemail: 0114412345123456p12345123456#pp1234# would be the basic 'login' string).

Minimum lengths: lots of carriers support three-digit numbers (150). There are also star-shortcuts like *72 for forwarding. There seems to be evidence for two-digit numbers being supported by some networks.



> 2) Should we handle 'Extension numbers' too?

IMO yes. Google's dialer seems to not support "ext" or "x", but I've seen dialers that do. It's in common use on other platforms.



> 3) Are we modifying Linkify.js too? If so, can we make a global constant
> such as PHONE_REGEX?

needinfo on wesj and fedepaol for that.
Flags: needinfo?(wjohnston)
Flags: needinfo?(rnewman)
Flags: needinfo?(fedepaol)
(In reply to Richard Newman [:rnewman] from comment #3)
> N.B., ";" is also in common use (and available in Android phone dialers) for
> pause.

> > 3) Are we modifying Linkify.js too? If so, can we make a global constant
> > such as PHONE_REGEX?
> 
> needinfo on wesj and fedepaol for that.

We were aware that the regex used in linkify is pretty rough, however, due to the concern expressed here https://bugzilla.mozilla.org/show_bug.cgi?id=566225#c43 regarding both the performance impacts and the fact that the linkification process could mess up the page, we choose to refine the regex afterwards.

Linkify is still disabled (as far as I know), but it would certainly benefit from a better phone number regex.
Flags: needinfo?(fedepaol)
(In reply to Richard Newman [:rnewman] from comment #3)
> N.B., ";" is also in common use (and available in Android phone dialers) for
> pause.
Ok

> > 1) Can there be a minimum and maximum lengths for the dial string?
> 
> Certainly sanity limits, but not low ones.
So I see it that this would do good:

/^[\s]*[+]?[\d\s,-.\(\)*#]{2,15}[pw\d\s,-.\(\)*#]*[\s]*$/
                             ^
  -  Min. & Max. for just the phone number, prefixed with country code
  -  No other limits can be set for further part of string as it may differ with service provider

> > 2) Should we handle 'Extension numbers' too?
> 
> IMO yes. Google's dialer seems to not support "ext" or "x", but I've seen
> dialers that do. It's in common use on other platforms.
Can you point me to an example web page with 'ext' or 'x' listed? It would help if I see possible presentation styles for such numbers
Flags: needinfo?(rnewman)
(In reply to Shashank VRSN Sabniveesu from comment #5)

> So I see it that this would do good:
> 
> /^[\s]*[+]?[\d\s,-.\(\)*#]{2,15}[pw\d\s,-.\(\)*#]*[\s]*$/
>                              ^
>   -  Min. & Max. for just the phone number, prefixed with country code

Not quite, because punctuation is included. There can be 15 digits in the number, plus any amount of punctuation and whitespace. Perhaps 30 or 40 characters is a reasonable upper bound.


> Can you point me to an example web page with 'ext' or 'x' listed? It would
> help if I see possible presentation styles for such numbers

01234 123456x70
+12341231234;ext=800
(123) 123-4500 x120
(123) 123-4500 ext 120
(123) 123-4500 ext. 120
Flags: needinfo?(rnewman)
*#06#                            --           Request for IMEI number
*#92702689#                      --           Nokia's warranty information

These two need to fail. bug 794034
(In reply to Kevin Brosnan [:kbrosnan] from comment #7)
> *#06#                            --           Request for IMEI number
> *#92702689#                      --           Nokia's warranty information
> 
> These two need to fail. bug 794034

Useful context, but I'm not sure I agree it applies to this case -- this isn't even clickable, let alone silently wiping your phone from a frameset!

Remember that all we're doing here is selecting a number; the user needs to tap the phone icon to hand it to the dialer. And the state of vulnerable dialers is likely different now than it was two years ago.

That said, if there's a plausible way a user would end up with a wiped phone without having to jump through hoops, I'd be happy to not match numbers starting with "*".

Perhaps a good compromise is to treat *#06# as dialable if you select it manually, but not include the leading * in auto-selection?
(In reply to Kevin Brosnan [:kbrosnan] from comment #7)
> *#06#                            --           Request for IMEI number
> *#92702689#                      --           Nokia's warranty information
> 
> These two need to fail. bug 794034

Thanks for the heads up, kbrosnan; I didn't see this possibility

(After rnewman's comment)
For this bug, this doesn't seem to be considered so seriously. I will go with working as it is now yet would look for more such cases from you and rnewman here
^[\s]*[+]?[\d\s,-.\(\)*#]{2,30}[[;pw\d\s,-.\(\)*#]*[\s]*]?[(ext|x)?[=\d\s,-.\(\)\ *#]{2,30}]?$

This conforms to these rules:

0) May begin with any no. of whitespaces
1) May begin with a single '+' sign
2) More than two digits should begin the dial string/appear immediately after single '+' sign; min=2, max=30
3) /semicolon/, 'p' or 'w' can occur only after the string satisfies the previous rules
4) Anywhere in the dial string, whitespaces, dashes, dots, parantheses, asterisks and pounds can appear
5) Irrespective of any spaces, the strings 'ext' or 'x' may appear, with an optional '=' sign
6) Any dial string further conforms to rule#2   NOTE: No allowance for characters specified in rule #3
5) May end with any no. of whitespaces

Cases:
------
+1-(333)-(333)-(3333)            --           Typical phone number
*#06#                            --           Request for IMEI number          NOTE: still handled
*#92702689#                      --           Nokia's warranty information     NOTE: still handled

(Cases with 'ext|x')
01234 123456x70
+12341231234;ext=800
(123) 123-4500 x120
(123) 123-4500 ext 120
(123) 123-4500 ext. 120

I'm proceeding to make a patch with constant PHONE_REGEX and let both 'SelectionHandler.js' and 'Linkify.js' point to it
Flags: needinfo?(rnewman)
(In reply to Kevin Brosnan [:kbrosnan] from comment #7)
> *#06#                            --           Request for IMEI number
> *#92702689#                      --           Nokia's warranty information
> 
> These two need to fail. bug 794034

I disagree that these need to fail. The difference is that the user has to highlight and then click the call button with this UX. With a tel URI, they can just be tricked into clicking an innocuous looking link.
Comment on attachment 8444678 [details] [diff] [review]
BUG 995609 - Rewrite regex for phone number in web pages

Review of attachment 8444678 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/chrome/content/Linkify.js
@@ +5,5 @@
>  const LINKIFY_TIMEOUT = 0;
>  
>  function Linkifier() {
>    this._linkifyTimer = null;
> +  this._phoneRegex = /^[\s]*[+]?[\d\s,-.\(\)*#]{2,30}[[;pw\d\s,-.\(\)*#]*[\s]*]?[(ext|x)?[=\d\s,-.\(\)*#]{2,30}]?$/g;

I don't think starting with [\s]* is correct. The original fragment -- (?:\s|^) -- seems like it meets our needs better, no? "Whitespace or the start of the line" versus "A line starting with any amount of whitespace".

I suspect that if we had tests for this, this change would break them.

Further, this regex should differ from the selection handler per Comment 11 -- it matches strings like "*2767*3855#" which we explicitly don't want to make clickable (Bug 794034).

Our stated direction is "you can select and then dial whatever valid dialstring you like, but we're not going to make it a one-click mistake to factory-reset your phone". I think that applies to linkifying, not just tel: URIs.

::: mobile/android/chrome/content/SelectionHandler.js
@@ +900,5 @@
>      }
>      this._closeSelection();
>    },
>  
> +  _phoneRegex: /^[\s]*[+]?[\d\s,-.\(\)*#]{2,30}[[;pw\d\s,-.\(\)*#]*[\s]*]?[(ext|x)?[=\d\s,-.\(\)*#]{2,30}]?$/g,

No need for the []s where you have single-character classes.

Let's add some tests for this in

mobile/android/base/tests/testSelectionHandler.java
Attachment #8444678 - Flags: review?(rnewman) → feedback+
Blocks: 1035218
Flags: needinfo?(rnewman)
Hello, Shashank - Are you still working on this issue?
Flags: needinfo?(shashank)
FWIW, this is what Android itself uses: https://code.google.com/p/libphonenumber/
(In reply to Mike Hoye [:mhoye] from comment #14)
> Hello, Shashank - Are you still working on this issue?

No, I cannot allot the time this analysis requires, for now.
Flags: needinfo?(shashank)
Hi I am Arjun Vijayvargiya.I am interested to solve this bug.
There are no strict restrictive rules for a current or future phone number format. So the question is what is better: false positives or false negatives. What about simply assuming that a phone number is at least two symbols?
Attached patch dialable-phone-numbers.patch (obsolete) — Splinter Review
Simply made dialable numbers to be atleast 2 symbols (false positives approach instead of a "comprehensive" false-negative one). Also I've added test cases for the private method "_isPhoneNumber". If needed we can test this function indirectly by calling ".actions.CALL.selector.matches()".
Attachment #8503615 - Flags: review?(rnewman)
Attached patch dialable-phone-numbers.patch (obsolete) — Splinter Review
Attachment #8503615 - Attachment is obsolete: true
Attachment #8503615 - Flags: review?(rnewman)
Attachment #8503617 - Flags: review?(rnewman)
There was failure on try for the added checks:

> TEST-UNEXPECTED-FAIL | testSelectionHandler | JS Test - +1-(333)-(333)-(3333) should be a phone number

https://tbpl.mozilla.org/php/getParsedLog.php?id=50030880&tree=Try&full=1#error1
Comment on attachment 8503617 [details] [diff] [review]
dialable-phone-numbers.patch

Review of attachment 8503617 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for writing a test. The solution is incrementally better than the current state, but only solves the exact mentioned problem, not a trivial extension, so let's try to do better.

::: mobile/android/base/tests/roboextender/testSelectionHandler.html
@@ +223,5 @@
> + * Tests matching a  phone number
> + *
> + */
> +function testPhoneNumbers() {
> +    var sh = getSelectionHandler();

`let` not `var`, always.

@@ +240,5 @@
> +        '(123) 123-4500 ext. 120': true,
> +        // non numbers
> +        'p': false,
> +        'w': false,
> +        '#': false

Add "##" and "pp" to this set.

Nit: add trailing comma for the last item in the map.

@@ +247,5 @@
> +    return Promise.all(Object.keys(testCases).map(function (selection) {
> +        var expected = testCases[selection];
> +        return ok(
> +            sh._isPhoneNumber(selection) === expected,
> +            selection + ' should be ' + (expected ? '':'not ') + 'a phone number'

Always double quotes for strings.

::: mobile/android/chrome/content/SelectionHandler.js
@@ +960,5 @@
>      }
>      this._closeSelection();
>    },
>  
> +  _phoneRegex: /^\+?[0-9\s,-.\(\)*#pw]{2,30}$/,

This just turns the bug into "pp, ww, ## are dialable phone numbers".

No, there is no perfect international standard for the way human beings write phone numbers. But they definitely don't consist only of dialstring pause instructions.

Earlier iterations in this bug were moving in the direction of representing most valid dialstrings (indeed, essentially all that you'd encounter on the web) without many false positives or negatives.
Attachment #8503617 - Flags: review?(rnewman)
Attachment #8503617 - Flags: review-
Attachment #8503617 - Flags: feedback+
Attached patch dialable-phone-numbers.patch (obsolete) — Splinter Review
You are right. "Dialable phone numbers" at least implies an existence of a number/digit :) So I've added a look ahead to check for digits.
Attachment #8503617 - Attachment is obsolete: true
Attachment #8503815 - Flags: review?(rnewman)
Attached patch dialable-phone-numbers.patch (obsolete) — Splinter Review
Attachment #8503815 - Attachment is obsolete: true
Attachment #8503815 - Flags: review?(rnewman)
Attachment #8503816 - Flags: review?(rnewman)
Attached patch dialable-phone-numbers.patch (obsolete) — Splinter Review
Attachment #8503816 - Attachment is obsolete: true
Attachment #8503816 - Flags: review?(rnewman)
Attachment #8503818 - Flags: review?(rnewman)
Attachment #8503818 - Attachment is obsolete: true
Attachment #8503818 - Flags: review?(rnewman)
Attachment #8503819 - Flags: review?(rnewman)
Comment on attachment 8503819 [details] [diff] [review]
dialable-phone-numbers.patch

Review of attachment 8503819 [details] [diff] [review]:
-----------------------------------------------------------------

Apologies for the delayed review.

::: mobile/android/base/tests/roboextender/testSelectionHandler.html
@@ +236,5 @@
> +        "01234 123456x70": true,
> +        "+12341231234;ext=800": true,
> +        "(123) 123-4500 x120": true,
> +        "(123) 123-4500 ext 120": true,
> +        "(123) 123-4500 ext. 120": true,

Add one more:

German, with slashes:

  +49/228/ 429-2538

@@ +247,5 @@
> +    return Promise.all(Object.keys(testCases).map(function (selection) {
> +        let expected = testCases[selection];
> +        return ok(
> +            sh._isPhoneNumber(selection) === expected,
> +            selection + " should be " + (expected ? "":"not ") + "a phone number"

Nit: make this line:

  selection + " should be " + (expected ? " " : "not ") + "a phone number."

::: mobile/android/chrome/content/SelectionHandler.js
@@ +963,5 @@
>  
> +  // ^                          start of the string
> +  // (?=.*[0-9])                positive look ahead to see if at least one digit exists
> +  // \+?                        optional international call prefix
> +  // ([0-9\s,-=.\(\)*#pwx]|ext) all allowed tokens

Add '/' to this set. I just discovered that some German numbers are formatted with slashes.

@@ +964,5 @@
> +  // ^                          start of the string
> +  // (?=.*[0-9])                positive look ahead to see if at least one digit exists
> +  // \+?                        optional international call prefix
> +  // ([0-9\s,-=.\(\)*#pwx]|ext) all allowed tokens
> +  // {2,30}                     number should be at least 2 symbols and atmost 30

Nit: "at most" is two words.

@@ +970,1 @@
>  

Strictly speaking, this won't match something like:

(+1) 555 555 5555

I don't mind, though.
Attachment #8503819 - Flags: review?(rnewman) → feedback+
Hi, Dmytrii - Are you still working on this?
Flags: needinfo?(wjohnston)
see: bug 1235508 for current status.

Pruning some old bugs obviated by new Core/Layout AccessibleCaret implementation, or being no longer observable.

(Please re-open if appropriate.)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INVALID
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: