Closed Bug 1201979 Opened 9 years ago Closed 8 years ago

Support requestBody in onBeforeRequest

Categories

(WebExtensions :: Untriaged, defect, P1)

defect

Tracking

(firefox50 fixed)

RESOLVED FIXED
mozilla50
Iteration:
48.3 - Apr 25
Tracking Status
firefox50 --- fixed

People

(Reporter: makeen, Assigned: ma1)

References

(Blocks 1 open bug)

Details

(Whiteboard: [webRequest]triaged)

Attachments

(1 file, 2 obsolete files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36

Steps to reproduce:

https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/WebRequest.jsm#Chrome_incompatibilities

I am requesting the addition of the "requestBody" in the opt_extraInfoSpec for onBeforeRequest so that I can access requestBody of the NetworkRequest. This would allow for similarity to the Chrome WebRequest extension API.


Actual results:

I cannot access the requestBody for outgoing network requests, I can only get/modify headers.


Expected results:

I was expecting for the WebRequest module to expose requestBody.
Component: General → WebExtensions
Product: Add-on SDK → Toolkit
Whiteboard: [webRequest]
Flags: blocking-webextensions-
Summary: Cannot access requestBody in the WebRequest.jsm (onBeforeRequest). → Support requestBody in onBeforeRequest
Status: UNCONFIRMED → NEW
Ever confirmed: true
See Also: → 1224581
Would be great to support exposing the http method as "method" too.
OS: Unspecified → All
Hardware: Unspecified → All
Whiteboard: [webRequest] → [webRequest]triaged
This is needed for NoScript's XSS filter.
Assignee: nobody → g.maone
Iteration: --- → 48.1 - Mar 21
Flags: blocking-webextensions- → blocking-webextensions+
Depends on: 1254204
Depends on: 1255894
Iteration: 48.1 - Mar 21 → 48.3 - Apr 25
No longer depends on: 1254204, 1255894
Attachment #8741198 - Flags: review?(kmaglione+bmo)
Comment on attachment 8741198 [details]
Bug 1201979 - Support requestBody in onBeforeRequest.

https://reviewboard.mozilla.org/r/46281/#review43147

I'm a bit worried about the way this is implemented.

The most obvious concern is performance. I don't think that we should read over the entire contents of file uploads in order to get their metadata. I really wish there were some clean way of just keeping hold of the original metadata from the form/FormData/File/..., but that would require a huge amount of refactoring.

The upside is that the nsIMultiplexInputStream that handles multi-part form data is seekable, and can completely skip the file input streams, if we seek over the size specified in the content-length header, or just handle the sub-streams individually.

There are some other issues too, though:

- When a File object is passed to xhr.send, it needs to be represented as `{file: filename}` in the request body segment. The only way I can see of handling this is to check if the stream in question is a nsIFileInputStream. The problem with that is that file streams don't give us any way to get the filename that they point to, so we'd need to add a getter to either nsIFileStream or nsIFileMetadata.
- We should really put a cap on the amount of data we read into memory. If we happen to mishandle a stream that points to a huge file, we shouldn't try to read gigabytes of data into memory.
- I really don't like the idea of reimplementing a whole new multi-part MIME data parser for this. Unfortunately, I'm not sure there's a better option at this point. The two implementations that are closest to what we need are nsMultiMixedConv, which doesn't let us seek over files, and BodyUtil.cpp, which requires the entire post body to be in an in-memory string. It would be nice if we could refactor that into a templated class that could also operate on a stream, but it might be too much work, and we'd need to run it by the module owner. Either way, it might be better to implement this logic in C++, which would at least give us some performance advantage, and give us access to some useful helper functions.

::: toolkit/components/extensions/test/mochitest/file_WebRequest_page1.html:32
(Diff revision 1)
>  <iframe src="redirection.sjs" width="200" height="200"></iframe>
>  <iframe src="data:text/plain,webRequestTest" width="200" height="200"></iframe>
>  <iframe src="data:text/plain,webRequestTest_bad" width="200" height="200"></iframe>
>  <iframe src="https://invalid.localhost/" width="200" height="200"></iframe>
>  <a href="file_WebRequest_page3.html?trigger=a" target="webrequest_link">link</a>
> -<form method="post" action="file_WebRequest_page3.html?trigger=form" target="webrequest_form"></form>
> +<form method="post" 

Please remove trailing whitespace, here and in the other form.

::: toolkit/components/extensions/test/mochitest/file_WebRequest_page1.html:53
(Diff revision 1)
>  "use strict";
> +{
> -for (let a of document.links) {
> +  for (let a of document.links) {
> -  a.click();
> +    a.click();
> -}
> +  }
> -for (let f of document.forms) {
> +  for (let f of document.forms) {

Can we call this `form` rather than `f`?

::: toolkit/components/extensions/test/mochitest/file_WebRequest_page1.html:56
(Diff revision 1)
> -  a.click();
> +    a.click();
> -}
> +  }
> -for (let f of document.forms) {
> +  for (let f of document.forms) {
> +    let action = new URL(f.action);
> +    let formData = new FormData(f);
> +    let webRequestFD = Object.create(null);

Please just use `{}`

::: toolkit/components/extensions/test/mochitest/file_WebRequest_page1.html:63
(Diff revision 1)
> +      name: "blobAsFile",
> +      content: new Blob(["A blob sent as a file"], {type: "application/json"}),
> +      fileName: "file.json",
> +    };
> +
> +    let update = () => {

Can you name this something a bit more descriptive?

::: toolkit/components/extensions/test/mochitest/file_WebRequest_page1.html:71
(Diff revision 1)
> +      }
> +      action.searchParams.set("upload", JSON.stringify(webRequestFD));
> +    };
> +    update();
> +    f.action = action;
> -  f.submit();
> +    f.submit();

We need to set an actual file for the file input element.

::: toolkit/components/extensions/test/mochitest/file_WebRequest_page1.html:84
(Diff revision 1)
> +      formData.append(file.name, file.content, file.fileName);
> +      update();
> +      post(formData);
> +      let byteLength = 16;
> +      action.searchParams.set("upload", `${byteLength} bytes`);
> +      post(new ArrayBuffer(byteLength));

We also need to test this with a bare File and Blob object.

::: toolkit/modules/addons/WebRequest.jsm:571
(Diff revision 1)
>        if (opts.responseHeaders) {
>          data.responseHeaders = this.getHeaders(channel, "visitResponseHeaders", kind);
>          responseHeaderNames = data.responseHeaders.map(h => h.name);
>        }
> +      if (opts.requestBody) {
> +        if (typeof requestBodyGetter === "undefined") {

Please just use `requestBodyGetter === undefined`

::: toolkit/modules/addons/WebRequestUpload.jsm:166
(Diff revision 1)
> +}
> +
> +function convertRawData(stream) {
> +  const CHUNK_SIZE = 8 * 1024; // Arbitrary, on Chrome it varies, maybe worth investigating ArrayBuffer limits.
> +  let raw = [];
> +  let binaryStream = Cc["@mozilla.org/binaryinputstream;1"].createInstance(Ci.nsIBinaryInputStream);

Please use an `nsIArrayBufferInputStream` instead.

::: toolkit/modules/addons/WebRequestUpload.jsm:197
(Diff revision 1)
> +        } else {
> +          throw new Error("Upload data not available anymore");
> +        }
> +      } catch (e) {
> +        Cu.reportError(e);
> +        requestBody = {error: e.message || e.toString()};

`String(e)`

::: toolkit/modules/addons/WebRequestUpload.jsm:207
(Diff revision 1)
> +  };
> +}
> +
> +var WebRequestUpload = {
> +  createGetter(channel) {
> +    return (channel instanceof Ci.nsIUploadChannel && channel.uploadStream) ? createGetter(Cu.getWeakReference(channel)) : null;

Please expand this out into a multi-line `if` statement.
https://reviewboard.mozilla.org/r/46281/#review43147

Thank you! I'm gonna investigate how much we can "exploit" nsIMultiplexInputStream for our purposes. Please notice, though, that the available content length information is about the whole upload, not the individual parts, and I kinda recall having read somewhere that touching the multiplex stream before the upload can break it, but I'll check first hand of course.
Anyway, since I've minimized the impact of requestBody support by making it completely on-demand (i.e. nothing happens until the listener actually tries to use the property) I believe we should ship an implementation as long as it's correct, even if suboptimal, especially given that multi-process support could significantly change the way we could optimize out these potential performance issues.

> Please just use `{}`

I just wanted to be sure there was no prototype, since property names reflect the form's element names and are therefore completely arbitrary. Was my concern moot, and if so why?

> We need to set an actual file for the file input element.

Do you know how to do it non-interactively? I resorted to adding the "blob as a file" in order to work-around this issue.
Anyway testing that an empty string is set if no file has been choosen is worth as well, isn't it?

> Please use an `nsIArrayBufferInputStream` instead.

I'm not sure I understand how. Isn't nsIArrayBufferInputStream just a wrapper to read an existent ArrayBuffer as a stream (much like nsIStringInputStream)? 
Isn't that the opposite of what we're trying to do here (transferring or, ideally, representing an opaque stream into an ArrayBuffer), or I'm misunderstanding something?
(In reply to Giorgio Maone [:mao] from comment #5)
> https://reviewboard.mozilla.org/r/46281/#review43147
> 
> Thank you! I'm gonna investigate how much we can "exploit"
> nsIMultiplexInputStream for our purposes. Please notice, though, that the
> available content length information is about the whole upload, not the
> individual parts,

Each part can have its own content-length header, but it looks like we don't
add it, so I guess it's not much help.

> and I kinda recall having read somewhere that touching the multiplex stream
> before the upload can break it, but I'll check first hand of course.

I don't think it should be a problem, as long as each part is seeked back to 0
when we're done. If it does cause problems, we can fix them.

> Anyway, since I've minimized the impact of requestBody support by making it
> completely on-demand (i.e. nothing happens until the listener actually tries
> to use the property) I believe we should ship an implementation as long as
> it's correct, even if suboptimal, especially given that multi-process
> support could significantly change the way we could optimize out these
> potential performance issues.

I'm not sure that making it on-demand is going to help much (and I'm not
honestly sure how on-demand you've made it, since I'm pretty sure the callback
arguments are structured-cloned).

> > Please just use `{}`
> 
> I just wanted to be sure there was no prototype, since property names
> reflect the form's element names and are therefore completely arbitrary. Was
> my concern moot, and if so why?

The only thing you're doing with the object is passing it to JSON.stringify,
which ignores prototypes anyway, and then parsing it with JSON.parse, which
always creates objects with the default prototype.

> > We need to set an actual file for the file input element.
> 
> Do you know how to do it non-interactively? I resorted to adding the "blob
> as a file" in order to work-around this issue.

You can just set the `value` property to a path, as long as you have chrome
privileges.

> Anyway testing that an empty string is set if no file has been choosen is
> worth as well, isn't it?

Testing that makes sense, yeah, but we also need to test with a real file.

> > Please use an `nsIArrayBufferInputStream` instead.
> 
> I'm not sure I understand how. Isn't nsIArrayBufferInputStream just a
> wrapper to read an existent ArrayBuffer as a stream (much like
> nsIStringInputStream)? 

Sorry, you're right. I always confuse the two. I meant please use
readArrayBuffer.
(In reply to Kris Maglione [:kmag] from comment #6)
> (In reply to Giorgio Maone [:mao] from comment #5)
> > https://reviewboard.mozilla.org/r/46281/#review43147
> > 
> > Thank you! I'm gonna investigate how much we can "exploit"
> > nsIMultiplexInputStream for our purposes. Please notice, though, that the

> Each part can have its own content-length header, but it looks like we don't
> add it, so I guess it's not much help.

The fact channel.uploadStream is an opaque nsIMimeInputStream rather than a nsIMultiplexInputStream doesn't help either, unfortunately. Any suggestion?


> I'm not sure that making it on-demand is going to help much (and I'm not
> honestly sure how on-demand you've made it, since I'm pretty sure the
> callback
> arguments are structured-cloned).

You're right, unfortunately. Maybe still worth keeping the on-demand machinery in order to run the potentially expensive data-building code just once.

Do we really need all this cloning, though, while we're still same-process, especially for sealed objects?

> You can just set the `value` property to a path, as long as you have chrome
> privileges.

Can I just do it inside the html web page or do I need to perform some magic dance to gain those privileges?
Flags: needinfo?(kmaglione+bmo)
(In reply to Giorgio Maone [:mao] from comment #7)
> The fact channel.uploadStream is an opaque nsIMimeInputStream rather than a
> nsIMultiplexInputStream doesn't help either, unfortunately. Any suggestion?

I think we may wind up having to change the interface. I don't like it, but I
don't really see any better options that allow us to handle raw file uploads
correctly. The only other option that I see for handling multi-part streams
with files is to add content-length headers to the individual parts, so that
we can seek over them, but that's not a trivial change.

> You're right, unfortunately. Maybe still worth keeping the on-demand
> machinery in order to run the potentially expensive data-building code just
> once.
> 
> Do we really need all this cloning, though, while we're still same-process,
> especially for sealed objects?

We do. The other option is to create the objects in the target compartments
directly, and make sure that every property we add to them is also in the
target compartment.

We could do that here, for the lazy getter, but I don't think we should. It's
not going to work when the add-on and the stream are in different processes,
for one thing, so we're going to wind up having to back it out sooner or later
anyway. And I'm not sure how helpful it will be, given that extensions are
expected to use event filters so they only get notified about streams where
they actually need the request body.

> > You can just set the `value` property to a path, as long as you have chrome
> > privileges.
> 
> Can I just do it inside the html web page or do I need to perform some magic
> dance to gain those privileges?

You can do it using SpecialPowers.wrap
Flags: needinfo?(kmaglione+bmo)
Necko crew, please help.

We need a way to parse a potentially gigabyte-long multipart nsIHttpChannel.uploadStream as fast as possible and extract all the name-value pairs for the "regular" form elements and just the field name and the file name as the for file input elements, obviously skipping over the actual file input streams.

ATM the most promising way seems to be using the nsIMultiplexInputStream index to move the cursor at the beginning of each part, then (rewinding of 1KB or so?) parse the headers and extract the required info.

The problem is that there's no way to grab this multiplex stream from the nsIMimeInputStream which wraps it.

Question: may I add a getter for the nsIMimeInputStream.data? 
Should I create a new interface or I can just change the existing one these days? 
Can you think of any better way to do this (maybe fixing bug 236084?)
Depends on: 236084
Flags: needinfo?(mcmanus)
Flags: needinfo?(jduell.mcbugs)
Flags: needinfo?(honzab.moz)
> 
> Question: may I add a getter for the nsIMimeInputStream.data? 
> Should I create a new interface or I can just change the existing one these
> days? 
> Can you think of any better way to do this (maybe fixing bug 236084?)

I think this stuff is special purpose enough already that you can add the getter to the existing interfaces.

I'm kinda on the fence about 236084 - so if that does go forward and eliminate the need for this change that would be a point in its favor.
Flags: needinfo?(mcmanus)
Depends on: 1266184
So, after a little chat with mcmanus we decided to implement a getter to extract the multiplex stream from its nsIMimeInputStream envelope, rather than exposing the content length of each subpart in the encoded body, so I dropped bug 236084 and instead created + fixed bug 1266184.
I'm about to submit a patch here using this getter to skip the file streams: in preliminary tests this takes milliseconds to produce the desired requestBody format even with huge uploads (tested with a ~600MB ISO image).
In facts, I'm pretty happy to say this will make NoScript's XSS filter much faster than it is now when file uploads are involved :)
Now, if I could only manage to stuff this into 48...
No longer depends on: 236084
Flags: needinfo?(jduell.mcbugs)
Flags: needinfo?(honzab.moz)
Comment on attachment 8741198 [details]
Bug 1201979 - Support requestBody in onBeforeRequest.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/46281/diff/1-2/
Attachment #8741198 - Attachment description: MozReview Request: Bug 1201979 - Implement requestBody. r?kmag → MozReview Request: Bug 1201979: Support requestBody in onBeforeRequest, take 2, not ready to land (missing tests). f?kmag
Attachment #8741198 - Flags: review?(kmaglione+bmo)
Comment on attachment 8741198 [details]
Bug 1201979 - Support requestBody in onBeforeRequest.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/46281/diff/2-3/
Attachment #8741198 - Attachment description: MozReview Request: Bug 1201979: Support requestBody in onBeforeRequest, take 2, not ready to land (missing tests). f?kmag → MozReview Request: Bug 1201979: Support requestBody in onBeforeRequest. r?kmag
Comment on attachment 8741198 [details]
Bug 1201979 - Support requestBody in onBeforeRequest.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/46281/diff/3-4/
Blocks: 1268885
Comment on attachment 8741198 [details]
Bug 1201979 - Support requestBody in onBeforeRequest.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/46281/diff/4-5/
Attachment #8741198 - Attachment description: MozReview Request: Bug 1201979: Support requestBody in onBeforeRequest. r?kmag → MozReview Request: Bug 1201979: Support requestBody in onBeforeRequest (rebased after bug 1267033 fix). r?kmag
Comment on attachment 8741198 [details]
Bug 1201979 - Support requestBody in onBeforeRequest.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/46281/diff/5-6/
Attachment #8741198 - Attachment description: MozReview Request: Bug 1201979: Support requestBody in onBeforeRequest (rebased after bug 1267033 fix). r?kmag → MozReview Request: Bug 1201979: Support requestBody in onBeforeRequest (rebased after bug 1267033 + fixed overlapping test forms). r?kmag
Attachment #8741198 - Flags: review?(kmaglione+bmo)
Comment on attachment 8741198 [details]
Bug 1201979 - Support requestBody in onBeforeRequest.

https://reviewboard.mozilla.org/r/46281/#review46129

Sorry this is taking so long to review. I'm still trying to get a handle on your request body parser. Will hopefully be finished by later today.

::: modules/libpref/init/all.js:5273
(Diff revision 2)
>  pref("toolkit.pageThumbs.screenSizeDivisor", 7);
>  pref("toolkit.pageThumbs.minWidth", 0);
>  pref("toolkit.pageThumbs.minHeight", 0);
>  
>  pref("webextensions.tests", false);
> +pref("webextensions.webRequest.requestBodyMaxRawBytes", 16777216);



::: modules/libpref/init/all.js:5329
(Diff revision 6)
>  pref("toolkit.pageThumbs.screenSizeDivisor", 7);
>  pref("toolkit.pageThumbs.minWidth", 0);
>  pref("toolkit.pageThumbs.minHeight", 0);
>  
>  pref("webextensions.tests", false);
> +pref("webextensions.webRequest.requestBodyMaxRawBytes", 16777216);

Can you add a comment that this is 16MB?

::: toolkit/modules/addons/WebRequestUpload.jsm:270
(Diff revision 6)
> +    }
> +    return false;
> +  }
> +  let unbuffered = outerStream.unbufferedStream || outerStream;
> +  if (unbuffered instanceof Ci.nsIMultiplexInputStream) {
> +    for (let j = 0, count = unbuffered.count; j < count; j++) {

Any particular reason to use `j` rather than `i` here?

::: toolkit/modules/addons/WebRequestUpload.jsm:299
(Diff revision 6)
> +        }
> +      } catch (e) {
> +        Cu.reportError(e);
> +        requestBody = {error: e.message || String(e)};
> +      }
> +      requestBody = Object.freeze(requestBody);

No need to assign the result.
https://reviewboard.mozilla.org/r/46281/#review47345

I should have submitted this earlier in the week. I wanted to go over the tests in more detail, but I still haven't had a chance. They seemed OK in the last version, though.

As for the request body parsing code, I think the general method is OK, but I had a really hard time following it. There are a bunch of small issues I pointed out in the review, but I had to refactor a lot of it before I could follow the flow well enough to be sure it was correct: https://gist.github.com/kmaglione/2fa1e69008e81a681a1870015ab2d48c

In the process, I think I came across a few issues, and removed some redundant/unnecessary logic. I didn't do anything about the more serious issues that I mentioned in the review, though. They still need to be addressed.

If you can have an update ready within the next week or so, I can promise to have it reviewed in a day or two. Sorry about the extremely long wait this time.

::: toolkit/modules/addons/WebRequestUpload.jsm:30
(Diff revision 6)
> +
> +function parseFormData(stream, channel) {
> +  const BUFFER_SIZE = 8192; // Empirically it seemed a good compromise.
> +  let formData;
> +  let streamIdx = 0;
> +  let multiplex = (stream.data || stream) instanceof Ci.nsIMultiplexInputStream ? (stream.data || stream) : null;

Please use a multi-line if for this. Something like:

    if (stream.data) {
        stream = stream.data;
    }
    let multiplex = null;
    if (stream instanceof Ci.nsIMultiplexInputStream) {
        multiplex = stream;
    }

::: toolkit/modules/addons/WebRequestUpload.jsm:31
(Diff revision 6)
> +function parseFormData(stream, channel) {
> +  const BUFFER_SIZE = 8192; // Empirically it seemed a good compromise.
> +  let formData;
> +  let streamIdx = 0;
> +  let multiplex = (stream.data || stream) instanceof Ci.nsIMultiplexInputStream ? (stream.data || stream) : null;
> +  let touchedStreams = [];

Please declare variables as close to their first use as possible.

::: toolkit/modules/addons/WebRequestUpload.jsm:35
(Diff revision 6)
> +  let multiplex = (stream.data || stream) instanceof Ci.nsIMultiplexInputStream ? (stream.data || stream) : null;
> +  let touchedStreams = [];
> +
> +  function createTextStream(stream) {
> +    let textStream = Cc["@mozilla.org/intl/converter-input-stream;1"].createInstance(Ci.nsIConverterInputStream);
> +    textStream.init(stream, "UTF-8", 0, textStream.DEFAULT_REPLACEMENT_CHARACTER);

I don't think we want a replacement character here. Chrome specifically says that it only processes streams that are valid UTF-8.

::: toolkit/modules/addons/WebRequestUpload.jsm:52
(Diff revision 6)
> +      }
> +    }
> +    return null;
> +  }
> +
> +  let textBuffer = {};

Please make this a local variable in `readString`.

::: toolkit/modules/addons/WebRequestUpload.jsm:53
(Diff revision 6)
> +    }
> +    return null;
> +  }
> +
> +  let textBuffer = {};
> +  let textStream = multiplex ? nextTextStream() : createTextStream(stream);

Please use a multi-line `if` for this.

::: toolkit/modules/addons/WebRequestUpload.jsm:68
(Diff revision 6)
> +    }
> +    return "";
> +  }
> +
> +  function multiplexRead() {
> +    let s = readString();

Please avoid one-letter variables. Maybe `str`?

::: toolkit/modules/addons/WebRequestUpload.jsm:78
(Diff revision 6)
> +      }
> +    }
> +    return s;
> +  }
> +
> +  let read = multiplex ? multiplexRead : readString;

Multi-line if, please. Also, can we rename this to something like `readChunk`?

::: toolkit/modules/addons/WebRequestUpload.jsm:80
(Diff revision 6)
> +    return s;
> +  }
> +
> +  let read = multiplex ? multiplexRead : readString;
> +
> +  function append(name, value) {

Can we make formData an argument, rather than a variable in the outer scope?

Also, can we name this something like `appendFormData`?

::: toolkit/modules/addons/WebRequestUpload.jsm:91
(Diff revision 6)
> +  }
> +
> +  function parseMultiPart(chunk) {
> +    let boundary;
> +    {
> +      let m = chunk.match(/^--\S+/);

`let match = ...`?

::: toolkit/modules/addons/WebRequestUpload.jsm:96
(Diff revision 6)
> +      let m = chunk.match(/^--\S+/);
> +      if (!m) {
> +        formData = null;
> +        return;
> +      }
> +      boundary = m[0];

Why aren't we using the boundary from the header?

::: toolkit/modules/addons/WebRequestUpload.jsm:99
(Diff revision 6)
> +        return;
> +      }
> +      boundary = m[0];
> +    }
> +
> +    let unslash = (s) => s.replace(/\\(?=.)/g, "");

`.replace(/\\(.)/g, "$1")`, otherwise a double backslash won't be handled correctly.

Also... The only character we actually quote with backslashes is double quote, so I guess it should be `.replace(/\\"/g, '"')`

::: toolkit/modules/addons/WebRequestUpload.jsm:112
(Diff revision 6)
> +        }
> +        parts = [tail];
> +        tail = "";
> +      } else {
> +        if (tail) {
> +          chunk = tail.concat(chunk);

`chunk = tail + chunk`. Same for the other instances of the same.

::: toolkit/modules/addons/WebRequestUpload.jsm:125
(Diff revision 6)
> +      let partsCount = parts.length;
> +      if (partsCount) {
> +        if (partsCount > 1) {
> +          tail = parts.pop();
> +        } else {
> +          tail = chunk ? parts[0] : "";

Please expand into an `if`.

    } else if (chunk) {
        tail = parts[0];
    } else {
        tail = "";
    }

Same for the other instance.

::: toolkit/modules/addons/WebRequestUpload.jsm:128
(Diff revision 6)
> +          tail = parts.pop();
> +        } else {
> +          tail = chunk ? parts[0] : "";
> +        }
> +        let partIdx = parts.length;
> +        for (let p of parts) {

`part`

::: toolkit/modules/addons/WebRequestUpload.jsm:130
(Diff revision 6)
> +          tail = chunk ? parts[0] : "";
> +        }
> +        let partIdx = parts.length;
> +        for (let p of parts) {
> +          partIdx--;
> +          let m = p.match(/^\s*Content-Disposition: form-data; name="(|(?:.*?)[^\\])"(?:;\s*filename="(|(?:.*?)[^\\])"|[^;])\r?\n(?:Content-Type: (\S+))?.*\r?\n/i);

`match`

The `\r` should be mandatory.

Why the `[^;]`? And why the `.*` on the Content-Type line?

I don't think this accounts for a lot of possibilities correctly. The header block ends with `\r\n\r\n`. We should probably split at the first occurrence of that, split the block on `\r\n`, and then parse the individual headers.

::: toolkit/modules/addons/WebRequestUpload.jsm:142
(Diff revision 6)
> +            inFile = partIdx === 0 && p.length - header.length > 4;
> +            append(name, fileName ? unslash(fileName) : "");
> +          } else {
> +            inFile = false;
> +            if (partsCount > 1) {
> +              append(name, p.substring(header.length, p.length - 2));

`part.slice(header.length, -2)`

::: toolkit/modules/addons/WebRequestUpload.jsm:156
(Diff revision 6)
> +    }
> +  }
> +
> +  function parseUrlEncoded(chunk) {
> +    let tail = "";
> +    for (;;) {

Maybe `while (chunk || tail)` ?

Same goes for the other instances of the same.

::: toolkit/modules/addons/WebRequestUpload.jsm:162
(Diff revision 6)
> +      let pairs;
> +      if (!chunk) {
> +        if (!tail) {
> +          break;
> +        }
> +        pairs = [chunk = tail];

No assignment as a side-effect, please.

::: toolkit/modules/addons/WebRequestUpload.jsm:165
(Diff revision 6)
> +          break;
> +        }
> +        pairs = [chunk = tail];
> +        tail = "";
> +      } else {
> +        chunk = chunk.replace(/^\s+|\s+$/g, "");

`chunk = chunk.trim()`

::: toolkit/modules/addons/WebRequestUpload.jsm:172
(Diff revision 6)
> +          chunk = tail.concat(chunk);
> +        }
> +        pairs = chunk.split("&");
> +        tail = pairs.pop();
> +      }
> +      for (let p of pairs) {

`pair`

::: toolkit/modules/addons/WebRequestUpload.jsm:188
(Diff revision 6)
> +  if (multiplex) {
> +    parse = parseMultiPart;
> +    touchedStreams.push(multiplex);
> +  } else {
> +    let contentType;
> +    if (/^Content-Type:/i.test(header)) {

Why do we want to treat this as a header, especially in preference to the channel's Content-Type header?

::: toolkit/modules/addons/WebRequestUpload.jsm:231
(Diff revision 6)
> +  }
> +  return null;
> +}
> +
> +function convertRawData(outerStream) {
> +  const MAX_BYTES = Services.prefs.getIntPref("webextensions.webRequest.requestBodyMaxRawBytes");

We should cache this. See `XPCOMUtils.defineLazyPreferenceGetter`.

::: toolkit/modules/addons/WebRequestUpload.jsm:257
(Diff revision 6)
> +        raw.push(chunk);
> +        totalBytes += size;
> +
> +        if (totalBytes >= MAX_BYTES) {
> +          if (size < available) {
> +            chunk.truncated = true;

We should probably also include the original size, somewhere.

::: toolkit/modules/addons/WebRequestUpload.jsm:291
(Diff revision 6)
> +      try {
> +        let channel = weakChannelRef.get();
> +        if (channel instanceof Ci.nsIUploadChannel && channel.uploadStream) {
> +          let stream = channel.uploadStream.QueryInterface(Ci.nsISeekableStream);
> +          let formData = createFormData(stream, channel);
> +          requestBody = formData ? {formData} : {raw: convertRawData(stream)};

Please expand this into a multi-line `if`.
https://reviewboard.mozilla.org/r/46281/#review47345

> I don't think we want a replacement character here. Chrome specifically says that it only processes streams that are valid UTF-8.

NoScript's XSS filter needs the decoder to be lenient, in order to detect payloads in malformed uploads. As a compromise, I'm adding a "lenientFormData" property in addition to "raw" when the parser fails because of malformed UTF-8 sequences.

> Why aren't we using the boundary from the header?

Because we don't always find a boundary in the header, e.g. when multipart form data is sent from XHR, and anyway if we detect a multiplex stream we don't want to try parsing the header (which is not accessible from the channel, but from the MIME stream) because we could accidentally exceed it and bleed into a file field, triggering a synchronous filesystem I/O operation.

> `match`
> 
> The `\r` should be mandatory.
> 
> Why the `[^;]`? And why the `.*` on the Content-Type line?
> 
> I don't think this accounts for a lot of possibilities correctly. The header block ends with `\r\n\r\n`. We should probably split at the first occurrence of that, split the block on `\r\n`, and then parse the individual headers.

I slightly modified and simplified the regular expression, introducing it with comments about its rationale.

> Why do we want to treat this as a header, especially in preference to the channel's Content-Type header?

Because when the uploadStream is a nsIMIMEInputStream, channel.getRequestHeader("Content-Type") usually fails (NS_ERROR_NOT_AVAILABLE), while the header is actually parseable from the stream's content.

> We should probably also include the original size, somewhere.

Added an originalSize field, which is defined and contains the original size if and only if truncated == true;

> Please expand this into a multi-line `if`.

Done. But is there a guideline banning the ternary operator even in trivial cases like this one?
Comment on attachment 8741198 [details]
Bug 1201979 - Support requestBody in onBeforeRequest.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/46281/diff/6-7/
Attachment #8741198 - Attachment description: MozReview Request: Bug 1201979: Support requestBody in onBeforeRequest (rebased after bug 1267033 + fixed overlapping test forms). r?kmag → MozReview Request: Bug 1201979: Support requestBody in onBeforeRequest, refactoring + lenientFormData + originalSize. r?kmag
Attachment #8741198 - Flags: review?(kmaglione+bmo)
https://reviewboard.mozilla.org/r/46281/#review47345

> Because we don't always find a boundary in the header, e.g. when multipart form data is sent from XHR, and anyway if we detect a multiplex stream we don't want to try parsing the header (which is not accessible from the channel, but from the MIME stream) because we could accidentally exceed it and bleed into a file field, triggering a synchronous filesystem I/O operation.

Can you give me an example of how to trigger this?

Either way, if we have a header with a boundary, we should use that boundary.

> Because when the uploadStream is a nsIMIMEInputStream, channel.getRequestHeader("Content-Type") usually fails (NS_ERROR_NOT_AVAILABLE), while the header is actually parseable from the stream's content.

Can you give me an example of how to reproduce this?

> Done. But is there a guideline banning the ternary operator even in trivial cases like this one?

There isn't, but it's generally frowned upon, especially in code as complex as this where the structure needs to be as easy to follow as possible.
https://reviewboard.mozilla.org/r/46281/#review47345

> Can you give me an example of how to trigger this?
> 
> Either way, if we have a header with a boundary, we should use that boundary.

I added an optional argument to parseFormData() to pass and use a boundary string if it could be found earlier, but I'd be surprised if that actually happens.
In order to trigger the situation I meant to avoid here you just need a form with @enctype="multipart/form-data" and a big file as the first input data. On submission, now try to parse the Content-Type header from the nsIMIMEInputStream (channel.getRequestHeader() won't work, as previously noted): unless you're lucky enough to guess the right buffer size, the chunk you read will also contain some data from the file.
https://reviewboard.mozilla.org/r/46281/#review47345

> I added an optional argument to parseFormData() to pass and use a boundary string if it could be found earlier, but I'd be surprised if that actually happens.
> In order to trigger the situation I meant to avoid here you just need a form with @enctype="multipart/form-data" and a big file as the first input data. On submission, now try to parse the Content-Type header from the nsIMIMEInputStream (channel.getRequestHeader() won't work, as previously noted): unless you're lucky enough to guess the right buffer size, the chunk you read will also contain some data from the file.

I tested this with an XHR, and I still got the expected content-type header. A content-type header in the body of the request should not be valid, so I still don't understand the purpose of this.
(In reply to Kris Maglione [:kmag] from comment #24)
> https://reviewboard.mozilla.org/r/46281/#review47345
> 
> > In order to trigger the situation I meant to avoid here you just need a form
> 
> I tested this with an XHR,

Did you test this with a HTML form, too (submitted by clicking it submit button or calling its submit() method), as I suggested?
I created a Firefox Internet Content Filter add-on which uses XPCOM HTTP Observers discussed here:
https://developer.mozilla.org/en-US/Add-ons/Overlay_Extensions/XUL_School/Intercepting_Page_Loads#HTTP_Observers 

Since XPCOM is going away I am converting to electrolisys and webextensions.  From a Firefox add-on I need to be able to intercept, view, and block web content as it streams into a page as the user browses the Internet.  Will the resolution of this bug give me that capability?  If not, how do I accomplish this?
(In reply to Quazi Mozi from comment #26)

> From a Firefox add-on I need to be able to intercept, view, and block web
> content as it streams into a page as the user browses the Internet.  Will
> the resolution of this bug give me that capability?

You probably want to track Bug 1255894, too.

Kris, is there anything else preventing this one from landing yet?
Flags: needinfo?(kmaglione+bmo)
Is there an update on the status of this one? Would like to land in 50 if possible.
Priority: -- → P1
(In reply to Andy McKay [:andym] from comment #28)
> Is there an update on the status of this one? Would like to land in 50 if
> possible.

The current patch does its job efficiently and includes the refactoring proposed by Kris.
He was still unhappy from the complexity added by the need of parsing the Content-type header from the request body, instad of using channel.getRequestHeader(), in the most common case (HTML form submission), and suggested to directly eradicate the Necko quirks which at this moment mandate this.

Having considered this option, I'd really prefer to land this as it is and "fix" Necko in a spin-off bug, taking in account other potential clients (either internal, e.g. devtools, or add-ons) which deal or even rely on the quirks we want to remove.
I'm OK with landing the version without necko changes as long as it's behind a non-release only flag. We can decide if we want to let it ride the trains, or try to uplift the necko changes, once it's baked in Aurora for a while.
Flags: needinfo?(kmaglione+bmo)
Comment on attachment 8741198 [details]
Bug 1201979 - Support requestBody in onBeforeRequest.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/46281/diff/7-8/
Attachment #8741198 - Attachment description: MozReview Request: Bug 1201979: Support requestBody in onBeforeRequest, refactoring + lenientFormData + originalSize. r?kmag → Bug 1201979: Support requestBody in onBeforeRequest, use boundary from header if ever possible, rebased.
Attachment #8754562 - Attachment is obsolete: true
Attachment #8754562 - Flags: review?(kmaglione+bmo)
(In reply to Kris Maglione [:kmag] from comment #30)
> I'm OK with landing the version without necko changes as long as it's behind
> a non-release only flag. We can decide if we want to let it ride the trains,
> or try to uplift the necko changes, once it's baked in Aurora for a while.

Thank you. I've rebased the patch to be sure, but it seems nothing actually needed to be changed.

Regarding the non-release flag, is it just "virtual" to be remembered from this discussion, or there's something actually to be persisted somewhere, e.g. in this bug's metadata?
Comment on attachment 8741198 [details]
Bug 1201979 - Support requestBody in onBeforeRequest.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/46281/diff/8-9/
I hope I correctly figured out how to put this feature behind a no-release-build flag in my last patch, as agreed. Kris, do you think we can finally stuff it in 50? Thank you! :)
Kris, is there anything else I need to do to have it landed in 50? Thanks!
Flags: needinfo?(kmaglione+bmo)
Attachment #8741198 - Flags: review?(kmaglione+bmo)
Comment on attachment 8741198 [details]
Bug 1201979 - Support requestBody in onBeforeRequest.

https://reviewboard.mozilla.org/r/46281/#review65132

Just a few minor changes to make and then this can land.

::: toolkit/components/extensions/ext-webRequest.js:64
(Diff revision 9)
>            data2[opt] = data[opt];
>          }
>        }
>  
> +      if (data.requestBodyGetter) {
> +        XPCOMUtils.defineLazyGetter(data2, "requestBody", data.requestBodyGetter);

Please just define the property directly. The lazy getter won't survive structured clone, and is likely to cause other problems in any case.

::: toolkit/modules/addons/WebRequest.jsm:579
(Diff revision 9)
> +      if (opts.requestBody) {
> +        if (requestBodyGetter === undefined) {
> +          requestBodyGetter = WebRequestUpload.createGetter(channel);
> +        }
> +        if (requestBodyGetter) {
> +          XPCOMUtils.defineLazyGetter(data, "requestBody", requestBodyGetter);

Non-lazy, please.

::: toolkit/modules/addons/WebRequest.jsm:666
(Diff revision 9)
>    },
>  };
>  
>  var onBeforeRequest = {
>    addListener(callback, filter = null, opt_extraInfoSpec = null) {
> -    // FIXME: Add requestBody support.
> +    let opts = parseExtra(opt_extraInfoSpec, ["blocking", "requestBody"]);

Check `RELEASE_BUILD` flag here.

::: toolkit/modules/addons/WebRequestUpload.jsm:329
(Diff revision 9)
> +}
> +
> +WebRequestUpload = {
> +  createGetter(channel) {
> +    if (channel instanceof Ci.nsIUploadChannel && channel.uploadStream) {
> +      return createGetter(Cu.getWeakReference(channel));

Please just use a strong reference here. We can't actually evaluate this lazily, so it doesn't gain us anything. And if it did, it would just give us unpredictable behavior.
Comment on attachment 8774882 [details]
Bug 1201979 - Support requestBody in onBeforeRequest, use boundary from header if ever possible, non-release builds only.

https://reviewboard.mozilla.org/r/67270/#review65134

Please roll this into the previous commit.

::: toolkit/components/extensions/test/mochitest/test_ext_webrequest.html:324
(Diff revision 1)
>    }
>  
>    function onUpload(details) {
>      let url = new URL(details.url);
>      let upload = url.searchParams.get("upload");
> -    if (!upload) {
> +    if (!upload || details._requestBodyUnsupported) {

No need for `requestBodyUnsupported`. `addListener` should throw in builds where we don't support `requestBody`, so let's check that instead.

::: toolkit/modules/addons/WebRequest.jsm:27
(Diff revision 1)
>                                    "resource://gre/modules/BrowserUtils.jsm");
>  XPCOMUtils.defineLazyModuleGetter(this, "WebRequestCommon",
>                                    "resource://gre/modules/WebRequestCommon.jsm");
>  XPCOMUtils.defineLazyModuleGetter(this, "WebRequestUpload",
>                                    "resource://gre/modules/WebRequestUpload.jsm");
> +XPCOMUtils.defineLazyModuleGetter(this, "AppConstants",

Please keep imports sorted by symbol name.

::: toolkit/modules/addons/WebRequest.jsm:577
(Diff revision 1)
>        if (opts.responseHeaders) {
>          data.responseHeaders = this.getHeaders(channel, "visitResponseHeaders", kind);
>          responseHeaderNames = data.responseHeaders.map(h => h.name);
>        }
> -      if (opts.requestBody) {
> +      if (opts.requestBody && !AppConstants.RELEASE_BUILD) {
> +        if (AppConstants.RELEASE_BUILD) {

This will never be true, given the conditional of the outer if.
Attachment #8774882 - Flags: review?(kmaglione+bmo)
Comment on attachment 8741198 [details]
Bug 1201979 - Support requestBody in onBeforeRequest.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/46281/diff/9-10/
Attachment #8741198 - Attachment description: Bug 1201979: Support requestBody in onBeforeRequest, use boundary from header if ever possible, rebased. → Bug 1201979 - Support requestBody in onBeforeRequest.
Attachment #8741198 - Flags: review?(kmaglione+bmo)
Attachment #8774882 - Attachment is obsolete: true
I hope it's all fixed and ready to land now ;)
Thank you!
Comment on attachment 8741198 [details]
Bug 1201979 - Support requestBody in onBeforeRequest.

https://reviewboard.mozilla.org/r/46281/#review65250
Attachment #8741198 - Flags: review?(kmaglione+bmo) → review+
Pushed by maglione.k@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/3b04c5fb4e57
Support requestBody in onBeforeRequest. r=kmag
https://hg.mozilla.org/mozilla-central/rev/3b04c5fb4e57
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Flags: needinfo?(kmaglione+bmo)
Depends on: 1314493
I tried adding 'requestBody' to my addListener extraInfoSpec:

function startRequestListener (blocklist, allowedHosts, entityList) {
  browser.webRequest.onBeforeRequest.addListener(
    blockTrackerRequests(blocklist, allowedHosts, entityList),
    {urls: ['*://*/*']},
    ['blocking', 'requestBody']
  )
}

But in 51.0b12 (64-bit) the browser console shows "Error: Invalid option requestBody" as soon as it loads my extension.
Are you hitting bug 1305162 groovecoder?
Yup, looks like I am.
See Also: → 1379131
Depends on: 1410755
Product: Toolkit → WebExtensions
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: