Closed Bug 1165858 Opened 9 years ago Closed 9 years ago

[ServiceWorkers] Create a better algorithm to match URLs

Categories

(Firefox OS Graveyard :: Gaia, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(b2g-master fixed)

RESOLVED FIXED
FxOS-S1 (26Jun)
Tracking Status
b2g-master --- fixed

People

(Reporter: arcturus, Assigned: flaki)

References

Details

Attachments

(1 file)

We need to simplify the route-matching algorithm to keep it powerful but predictable.
Whiteboard: [s2]
Summary: [ServiWorkers] Create a better algoritm to match urls → [ServiceWorkers] Create a better algorithm to match URLs
Whiteboard: [s2] → [s3]
Assignee: nobody → falconmaster
Target Milestone: --- → NGA S2 (12Jun)
Hi! Following our mail discussions, I have added some tests and a preliminary implementation of a simplified path-matching algorithm.

The code can be viewed at:
https://github.com/gaia-components/serviceworkerware/compare/master...flaki:bug-1165858?expand=1

I have added basic support for the "*" token, which matches arbitrary-length substrings (it is basically translated to (.*?)), and for ":foobar123"-style tokens. The latter are currently simple aliases for the "*" token.
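
For reference, here is a minimal sketch of the translation described above (an illustration only, not the actual serviceworkerware code), assuming "*" maps to (.*?) and ":name" tokens are plain aliases for it:

```
// Illustrative sketch: translate a simple path into an anchored RegExp.
// A real implementation would also escape regex metacharacters in the literal parts.
function parseSimplePathSketch(path) {
  var translated = path
    .replace(/\*/g, '(.*?)')      // '*' -> arbitrary-length substring
    .replace(/:\w+/g, '(.*?)');   // ':foobar123' -> currently an alias for '*'
  return new RegExp('^' + translated + '$');
}

parseSimplePathSketch('foo/:id/*').test('foo/42/anything'); // true
```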

A few issues still remain, all of which are noted in the code above.

Main issue: what should we do about backward compatibility and arbitrary regexes? I think we should keep the possibility of passing RegExp objects as the path parameter when adding a middleware, so that matching rules of arbitrary complexity remain possible. Currently we pass a string, which is then converted to a RegExp.

The problem is that for some patterns of the algorithm (those that are invalid regex patterns, like "*foo") this works well: the plain regex conversion fails and we handle the pattern in _parseSimplePath. The problem arises with simple-path patterns that are also valid regexes (such as "a*b"): those are compiled to RegExp as-is and never get the _parseSimplePath treatment. This causes most of the new path-algorithm tests to fail, which is why I chose to disable RegExps until we decide how to proceed on this issue.
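
To make the ambiguity concrete, here is a tiny illustration (not library code) of why a "try RegExp first, fall back to _parseSimplePath" strategy can never treat a pattern like "a*b" as a simple path:

```
try {
  new RegExp('*foo');               // throws: '*foo' is not a valid regex,
} catch (e) {                       // so the fallback to _parseSimplePath kicks in
  console.log('invalid regex, handled as a simple path');
}

new RegExp('a*b');                  // compiles fine, so it is used as-is:
                                    // '*' acts as a quantifier, not as a wildcard
```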

There are two other minor issues, found in the router tests; both have test items currently disabled because of the issues in question.

The first question is what we should do with "invalid" patterns (patterns that do not make sense). I currently have one example: placeholder crowding for the anonymous placeholder, e.g. patterns like "foo**bar". This kind of pattern does not make sense, and we should decide what to do (fail silently, degrade gracefully, fix it silently, throw, etc.). Crowding of non-anonymous placeholders, on the other hand, should not be considered a problem (e.g. "foo/:one:two/bar").

A slightly related second issue concerns parseSimplePath: how should we handle invalid characters in the path? Currently, if the "translated" path cannot be converted to a RegExp (e.g. because it contains invalid characters such as ? or ], for example the path "foo/*/bar]/"), we fall back to interpreting the whole path literally (that is, without interpreting "*" and similar tokens). This might not always be desirable, and we should discuss how to handle it.
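
As an illustration of that fallback (an assumption about the behaviour described above, not the exact library code), an invalid translated pattern could simply be matched verbatim by escaping every regex metacharacter:

```
// Sketch: build a RegExp that matches the path literally.
function literalRegExp(path) {
  return new RegExp('^' + path.replace(/[.*+?^${}()|[\]\\]/g, '\\$&') + '$');
}

literalRegExp('foo/*/bar]/').test('foo/*/bar]/'); // true: matched verbatim
literalRegExp('foo/*/bar]/').test('foo/x/bar]/'); // false: '*' is no longer a wildcard
```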

Any other comments besides the issues above are, of course, most welcome.
Flags: needinfo?(salva)
(In reply to Szmozsánszky István [:flaki] from comment #1)
> Hi! According to our mail discussions I have added some tests and
> preliminary implementation for a simplified path matching algorithm.
> 
> The code could be viewed at:
> https://github.com/gaia-components/serviceworkerware/compare/master...flaki:
> bug-1165858?expand=1
> 
> I have added basic support for the "*" token which matches arbitrary-length
> substrings (and is basically translated to (.*?)) and the ":foobar123"-style
> tokens. The later tokens are currently simple aliases for the "*" token.
> 
> There still remains a few issues, all of which are noted in the code above.
> 
> Main issue: what should we do for backward-compatibility & arbitrary
> regexes? I think we should keep the possibility of passing RegExp objects as
> the path parameter when adding a middleware to make it possible to create
> matching rules of arbitrary complexity. Currently we are passing a string,
> which is then converted to a RegExp.

Not a priority right now. I don't mind breaking backward compatibility since SWW is still being tested. In the future we could provide syntax like this:

```
.use('/:id', mw1, { id: /\d+/ });
```

to specify arbitrarily complex fields. What do you think?
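
A rough sketch of how such per-placeholder constraints might be checked (the names here are hypothetical; this is not an existing SWW API):

```
// After the simple path has matched and extracted the named parameters,
// each constraint from the options object is tested against its value.
function checkConstraints(params, constraints) {
  return Object.keys(constraints || {}).every(function (name) {
    return constraints[name].test(params[name]);
  });
}

checkConstraints({ id: '42' },  { id: /^\d+$/ }); // true
checkConstraints({ id: 'abc' }, { id: /^\d+$/ }); // false
```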

> 
> The problem with this, is that for some patterns of the algorithm (that are
> invalid regex patterns, like the "*foo") this works well, as simple regex
> conversion fails and we handle the pattern in _parseSimplePath. The problem
> arises when we have valid-regex-simple-path-patterns (such as "a*b") --
> those in turn are compiled to RegExp as-is, and never get the treatment of
> _parseSimplePath. This causes most of the new path algo tests to fail, that
> is why I chose to disable RegExps until we decide on how to proceed on this
> issue.
> 
> There are two other minor issues, found in the router tests - both have
> items currently disabled for the issues in question.
> 
> First question would be what should we do on "invalid" patterns (patterns
> that do not make sense). Currently I have an example, the placeholder
> crowding for the anonymous placeholder, e.g. patterns like "foo**bar" - this
> kind of pattern doesn't make sense, we should decide what to do (fail
> silently, gracefully degrade, fix it silently, throw...? etc.). Crowding for
> non-anonymous placeholders should not, on the other hand, be considered a
> problem (e.g.: "foo/:one:two/bar")

Fortunately, it's up to us. We need some rules first, and I would throw if the URL does not match the syntax rules, in the same way JS throws when you try to build an incorrect regular expression. I think this is quite predictable (a small validation sketch follows the list below):

  1. The symbol * means any sequence.
  2. A :xxxx token is defined by a colon : and a sequence matching \w+.
  3. ** seems to me to be equivalent to *, but I would also accept throwing, as it could be a typo.
  4. :a:b is, for me, clearly incorrect right now, and I would throw.
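
A minimal, partial validation sketch for rules 3 and 4 above (an illustration only, assuming "**" collapses to "*" and adjacent placeholders throw):

```
function validateAndNormalize(path) {
  if (/:\w+:/.test(path)) {              // rule 4: ':a:b' is ambiguous -> throw
    throw new SyntaxError('Adjacent placeholders are not allowed: ' + path);
  }
  return path.replace(/\*{2,}/g, '*');   // rule 3: '**' behaves like '*'
}

validateAndNormalize('foo/**/bar');      // 'foo/*/bar'
validateAndNormalize('foo/:one:two/bar') // throws SyntaxError
```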

> 
> A slightly related second issue is with parseSimplePath - how to handle
> invalid characters in the path? Currently if the "translated" path can not
> be converted to a RegExp (e.g. because it contains invalid characters, like
> ? or ], for example the path "foo/*/bar]/") we currently fall back and
> interpret the whole path literally (that is, no interpreting tokens for "*"
> & the likes). This might not always be favorable, and should be discussed
> how should be handled.

RegExp might not be the answer to this problem; maybe a custom `match` is better.
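
A rough sketch of that idea (a hypothetical helper, not the serviceworkerware implementation): split the pattern on the "*" wildcard (":name" placeholders are left out for brevity) and match the literal pieces directly, so characters such as ? or ] need no regex escaping at all:

```
function literalWildcardMatch(pattern, url) {
  var pieces = pattern.split('*');
  if (pieces.length === 1) { return url === pattern; }    // no wildcard: exact match
  var first = pieces[0];
  var last = pieces[pieces.length - 1];
  if (!url.startsWith(first) || !url.endsWith(last)) { return false; }
  var pos = first.length;
  var end = url.length - last.length;
  if (pos > end) { return false; }                        // prefix and suffix overlap
  for (var i = 1; i < pieces.length - 1; i++) {           // middle pieces must appear in order
    var idx = url.indexOf(pieces[i], pos);
    if (idx === -1 || idx + pieces[i].length > end) { return false; }
    pos = idx + pieces[i].length;
  }
  return true;
}

literalWildcardMatch('foo/*/bar]/', 'foo/x/bar]/'); // true: ']' is handled literally
literalWildcardMatch('foo/*/bar]/', 'foo/x/baz/');  // false
```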

> 
> Any other comments besides the issues above are, of course, much welcome.

Francisco, can you take a second look?

Thank you.
Flags: needinfo?(salva) → needinfo?(francisco)
Flaki, when you want one of us to review your code or provide feedback, add an attachment with the URL of the WIP pull request and set the feedback flag to ? for one of us or both.
I have removed the backward compatibility with RegExp-based paths and created a PR; please review and comment.
Attachment #8616551 - Flags: review?(salva)
Attachment #8616551 - Flags: review?(francisco)
(In reply to Salvador de la Puente González [:salva] from comment #2)
> (In reply to Szmozsánszky István [:flaki] from comment #1)
> > Hi! According to our mail discussions I have added some tests and
> > preliminary implementation for a simplified path matching algorithm.
> > 
> > The code could be viewed at:
> > https://github.com/gaia-components/serviceworkerware/compare/master...flaki:
> > bug-1165858?expand=1
> > 
> > I have added basic support for the "*" token which matches arbitrary-length
> > substrings (and is basically translated to (.*?)) and the ":foobar123"-style
> > tokens. The later tokens are currently simple aliases for the "*" token.
> > 
> > There still remains a few issues, all of which are noted in the code above.
> > 
> > Main issue: what should we do for backward-compatibility & arbitrary
> > regexes? I think we should keep the possibility of passing RegExp objects as
> > the path parameter when adding a middleware to make it possible to create
> > matching rules of arbitrary complexity. Currently we are passing a string,
> > which is then converted to a RegExp.
> 
> Not a priority right now. I don't mind breaking backward-compatibility as
> SWW is still in test. In the future we could provide some syntax like this:
> 
> ```
> .use('/:id', mw1, { id: /\d+/ });
> ```

Agreed, this may come in the future if there is demand for it.

> 
> To specify arbitrary complex fields. What do you think?
> 
> > 
> > The problem with this, is that for some patterns of the algorithm (that are
> > invalid regex patterns, like the "*foo") this works well, as simple regex
> > conversion fails and we handle the pattern in _parseSimplePath. The problem
> > arises when we have valid-regex-simple-path-patterns (such as "a*b") --
> > those in turn are compiled to RegExp as-is, and never get the treatment of
> > _parseSimplePath. This causes most of the new path algo tests to fail, that
> > is why I chose to disable RegExps until we decide on how to proceed on this
> > issue.
> > 
> > There are two other minor issues, found in the router tests - both have
> > items currently disabled for the issues in question.
> > 
> > First question would be what should we do on "invalid" patterns (patterns
> > that do not make sense). Currently I have an example, the placeholder
> > crowding for the anonymous placeholder, e.g. patterns like "foo**bar" - this
> > kind of pattern doesn't make sense, we should decide what to do (fail
> > silently, gracefully degrade, fix it silently, throw...? etc.). Crowding for
> > non-anonymous placeholders should not, on the other hand, be considered a
> > problem (e.g.: "foo/:one:two/bar")
> 
> Fortunately, it's up to us. We need some rules first and I would throw if
> the URL is not matching syntax rules in the same way JS throws when you try
> to build an incorrect regular expression. I think this is quite predictable.
> 
>   1. Symbol * means any sequence.

+1

>   2. A :xxxx token is defined by a colon : and a sequence matching \w+

+1

>   3. ** seems to me to be equivalent to * but I would accept throwing as
> well as it could be a typo.

To me, it is equal to *.

>   4. :a:b its, for me, right now, clearly incorrect and I would throw.

I would throw, since we can have different data types (\w+ or \d+), but in theory we could have two consecutive strings, making it impossible to tell the parameters apart.
A :<identifier> must be the end of the expression or must be followed by a breaking character (or string), but never by another identifier.
> > 
> > A slightly related second issue is with parseSimplePath - how to handle
> > invalid characters in the path? Currently if the "translated" path can not
> > be converted to a RegExp (e.g. because it contains invalid characters, like
> > ? or ], for example the path "foo/*/bar]/") we currently fall back and
> > interpret the whole path literally (that is, no interpreting tokens for "*"
> > & the likes). This might not always be favorable, and should be discussed
> > how should be handled.
> 
> RegExp could be not the answer for this problem, maybe a custom `match` is
> better.
> 
> > 
> > Any other comments besides the issues above are, of course, much welcome.
> 
> Francisco, can you take a second look?

Looking pretty good.

> 
> Thank you.

I would add one more thing. :flaki, could you add a section to the README (I think it already exists) about the router: how to write the matching strings, the options, restrictions, and so on?
Flags: needinfo?(francisco)
Comment on attachment 8616551 [details] [review]
PR: [ServiceWorkers] Create a better algorithm to match URLs (#1165858) #21

Happy to land this, with the comments that Salva made :)
Attachment #8616551 - Flags: review?(francisco) → review+
Comment on attachment 8616551 [details] [review]
PR: [ServiceWorkers] Create a better algorithm to match URLs (#1165858) #21

Very good work, :flaki! Please address the comments on GitHub and ask for my review again.
Attachment #8616551 - Flags: review?(salva)
Status: NEW → ASSIGNED
Comment on attachment 8616551 [details] [review]
PR: [ServiceWorkers] Create a better algorithm to match URLs (#1165858) #21

I have updated the PR, please have a look. Thanks!
Attachment #8616551 - Flags: review?(salva)
Comment on attachment 8616551 [details] [review]
PR: [ServiceWorkers] Create a better algorithm to match URLs (#1165858) #21

Very good work! I have some doubts and comments. Address them before asking for the r? flag again.
Attachment #8616551 - Flags: review?(salva)
Whiteboard: [s3]
Target Milestone: NGA S2 (12Jun) → NGA S3 (26Jun)
Comment on attachment 8616551 [details] [review]
PR: [ServiceWorkers] Create a better algorithm to match URLs (#1165858) #21

Excellent work! Let's merge this.
Attachment #8616551 - Flags: superreview+
master: 72f8ea24cbbadae33c788e7551ed330fb6ad4667
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
As the NGA Program Manager suggested, let's replace the NGA-X milestones with the more generic FxOS-Sx ones, now that Bug 1174794 has landed.
Target Milestone: NGA S3 (26Jun) → FxOS-S1 (26Jun)