Closed Bug 1363581 Opened 7 years ago Closed 6 years ago

Investigate if RESTRequest may do bad things with utf-8 in edge-cases.

Categories

(Firefox :: Sync, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
Firefox 61
Iteration:
61.1 - Mar 26
Tracking Status
firefox61 --- fixed

People

(Reporter: markh, Assigned: tcsc)

References

Details

Attachments

(2 files)

Currently RESTRequest is implemented with a nsIStreamListener, and in the OnDataAvailable method we create a nsIConverterInputStream to decode the received chunk as utf-8. However, there seems a risk that the chunk fed to onDataAvailable splits a valid utf-8 sequence, causing some level of data corruption.

The 2 choices we seem to have are:

* work out how to use a single nsIConverterInputStream across all onDataAvailable calls - however, this relies on 2 assumptions: (1) that the stream implementation leaves behind not-yet-valid utf-8 chars ready for the next chunk and (2) that the same |stream| arg is passed to onDataAvailable on each call. (2) in particular seems unsafe to rely on.

* Buffer the bytes and do the encode when we know the stream has been consumed.

Given that (IIUC) no consumers of RESTRequest attempt to use the data as it streams in, the second option sounds slightly better (although with a slightly larger memory foot-print - there will be a brief period where both the raw stream and the decoded utf-8 are both in memory).
Priority: -- → P3
I think we should consider looking at this again - it would happen rarely, but it'd be fairly bad when it did.
Priority: P3 → --
See Also: → 1419203
Priority: -- → P2
Assignee: nobody → tchiovoloni
Iteration: --- → 61.1 - Mar 26
It certainly will do bad things to UTF8 in edge cases. Sadly, this patch makes the semantics of RESTRequest rather confusing, so (Specifically, onProgress is now a lie) I'd like to clear that up a bit before requesting review.
I ended up making it use promises rather than adapt the current callback-based API to more accurately reflect that `onProgress` is meaningless now.

I also modernized the code a bit, especially the tests.
Comment on attachment 8958650 [details]
Bug 1363581 - (part 1) Buffer RESTRequest response in memory before decoding to avoid corruption

https://reviewboard.mozilla.org/r/227568/#review234068

::: services/common/rest.js:515
(Diff revision 2)
>        this.onComplete(error);
>        this.onComplete = this.onProgress = null;
>        return;
>      }
>  
> +    try {

I think we want this block above the check for statusSuccess, so we can still examine the body on failure?

::: services/common/rest.js:518
(Diff revision 2)
>      }
>  
> +    try {
> +      // Decode this.response._rawBody,
> +      this.response.body = decodeString(this.response._rawBody, this.response.charset);
> +      // Call the 'progress' callback a single time with all the data.

and maybe kill `_rawBody` at this point?
Attachment #8958650 - Flags: review?(markh) → review+
Comment on attachment 8959060 [details]
Bug 1363581 - (part 2) Make RESTRequest's public API use promises and not callbacks

https://reviewboard.mozilla.org/r/227950/#review234070

Looks great, thanks!

::: services/common/hawkclient.js:221
(Diff revision 1)
> +      error = e;
> -        log.warn("hawk request error", error);
> +      log.warn("hawk request error", error);
> +      return request.response;
> +    });
> +
> +    // This seems impossible, but it seemed impossible in the old code too, but

this comment doesn't really make sense to me. Mentioning "the old code" will confuse future readers, but it also seems to be saying "it's impossible, but it's not"

::: services/common/hawkclient.js:233
(Diff revision 1)
> -      let status = restResponse.status;
> +    let status = restResponse.status;
>  
> -      log.debug("(Response) " + path + ": code: " + status +
> +    log.debug("(Response) " + path + ": code: " + status +
> -                " - Status text: " + restResponse.statusText);
> +              " - Status text: " + restResponse.statusText);
> -      if (logPII) {
> +    if (logPII) {
> -        log.debug("Response text: " + restResponse.body);
> +      log.debug("Response text: ", restResponse.body);

We don't need the trailing ": " on the message - Log.jsm will add it.

::: services/common/tests/unit/test_restrequest.js
(Diff revision 1)
> -    Assert.equal(request1.response.status, 200);
> +  Assert.equal(request1.response.status, 200);
> -    Assert.equal(request1.response.body, response);
> +  Assert.equal(request1.response.body, response);
> -    Assert.equal(request1.response.headers["content-type"],
> +  Assert.equal(request1.response.headers["content-type"],
> -                 contentType + charsetSuffix);
> +               contentType + charsetSuffix);
>  
> -    // Check that we default to UTF-8 if Content-Type doesn't have a charset.

this comment looks like it remains valid?

::: services/common/tests/unit/test_restrequest.js:202
(Diff revision 1)
> +  // real network requests there doesn't appear to be any pattern to the size of
> +  // the data `onDataAvailable` is called with), the smiling cat emoji encodes as
> +  // 4 bytes, and so when utf8 encoded, the `"a" + "
Attachment #8959060 - Flags: review?(markh) → review+
Comment on attachment 8959060 [details]
Bug 1363581 - (part 2) Make RESTRequest's public API use promises and not callbacks

https://reviewboard.mozilla.org/r/227950/#review234070

> this comment doesn't really make sense to me. Mentioning "the old code" will confuse future readers, but it also seems to be saying "it's impossible, but it's not"

I guess the reality here is that it can happen if the stream finishes before it starts. This seems broken, but it absolutely used to happen (I remember adding that check to address a issue, I think one in TPS, but it could have been an xpcshell-test -- having trouble finding it).

I'm going to assume it still can happen in edge cases, but update the comment to be more accurate.

> We don't need the trailing ": " on the message - Log.jsm will add it.

Thanks.

> This should either be uncommented or removed.

Also whoops.
Comment on attachment 8958650 [details]
Bug 1363581 - (part 1) Buffer RESTRequest response in memory before decoding to avoid corruption

https://reviewboard.mozilla.org/r/227568/#review234068

> I think we want this block above the check for statusSuccess, so we can still examine the body on failure?

I think I did this since I thought it would lead to us reporting real network errors as utf-8 decoding errors. Upon further thought, I'm pretty sure that this shouldn't happen though, since we replace dubious utf-8 with the replacement char.
Comment on attachment 8959060 [details]
Bug 1363581 - (part 2) Make RESTRequest's public API use promises and not callbacks

https://reviewboard.mozilla.org/r/227950/#review234070

> I guess the reality here is that it can happen if the stream finishes before it starts. This seems broken, but it absolutely used to happen (I remember adding that check to address a issue, I think one in TPS, but it could have been an xpcshell-test -- having trouble finding it).
> 
> I'm going to assume it still can happen in edge cases, but update the comment to be more accurate.

Actually it still shouldn't happen (although, for the same reasons it shouldn't have happened before). I've changed the comment to reflect the truth of the matter which is roughly "This shouldn't happen, but is easy to handle".
Pushed by tchiovoloni@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/492762b65abb
(part 1) Buffer RESTRequest response in memory before decoding to avoid corruption r=markh
https://hg.mozilla.org/integration/autoland/rev/2eb3eccb0d33
(part 2) Make RESTRequest's public API use promises and not callbacks r=markh
https://hg.mozilla.org/mozilla-central/rev/492762b65abb
https://hg.mozilla.org/mozilla-central/rev/2eb3eccb0d33
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 61
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: