Closed Bug 803377 Opened 12 years ago Closed 12 years ago

Add function to compress string data

Categories

(Cloud Services :: Firefox: Common, defect)

defect
Not set
normal

Tracking

(firefox18 fixed, firefox19 fixed, firefox20 fixed)

RESOLVED FIXED
mozilla20
Tracking Status
firefox18 --- fixed
firefox19 --- fixed
firefox20 --- fixed

People

(Reporter: gps, Assigned: gps)

References

Details

Attachments

(1 file, 4 obsolete files)

AFAICT there is no generic JS module for performing string encodings. I constantly find myself having to muck about with nsIStringConverterService and friends. I think it would be really nice if all this XPCOM voodoo were hidden from view and we could easily perform common string encodings.

The attached patch adds a new "string" component to toolkit. In it, we have a single file, StringEncoding.jsm. In that, we have the StringEncoding object which exposes an API for performing gzip compression.

Other encodings can be added to this file in the future. Other features can be added to our string library as well. Now we have a place for them to call home.

Gavin gets review. Mossop gets f? for approving new toolkit additions (since he is toolkit module owner).

I'm not sure what to do about the test. I don't want to put in data I obtained by calling the API itself. And, I was unable to figure out a trusted input and output pair to verify with. I tried Python's zlib.compress and it returned something different. I /think/ our built-in gzip encoder is writing out extra header data (perhaps a file header?) but I really have no clue. I'm also not an expert with nsIStringConverterService and friends, so I might have done something horribly wrong in XPCOM land.
Attachment #673043 - Flags: review?(gavin.sharp)
Attachment #673043 - Flags: feedback?(dtownsend+bugmail)
Oh, you meant nsIStreamConverterService (not nsIStringConverterService) - that confused me for a bit. The idea of consolidating this code seems reasonable, but I only see one other potential caller of this specific method (in TelemetryPing.js). There are other test callers of asyncConvertData with string streams, but they mostly use different to/from parameters.

This seems like something that could live in a toolkit/modules, were we to create one (ala browser/modules). A whole new components directory to contain just a single module and its test seems like overkill.

The proliferation of small, very specific JS modules is a bit problematic because the fixed overhead of a JS module is rather high at the moment (see e.g. bug 773845 and bug 798491), but that's a problem we probably should be addressing separately either way, so I guess we shouldn't worry about it too much here. Worth being aware of, though.
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #1)
> Oh, you meant nsIStreamConverterService (not nsIStringConverterService) -
> that confused me for a bit. The idea of consolidating this code seems
> reasonable, but I only see one other potential caller of this specific
> method (in TelemetryPing.js). There are other test callers of
> asyncConvertData with string streams, but they mostly use different to/from
> parameters.

I have lots of code in services/common/utils.js and services/sync/modules/util.js that could be moved to this module and related string modules.

> This seems like something that could live in a toolkit/modules, were we to
> create one (ala browser/modules). A whole new components directory to
> contain just a single module and its test seems like overkill.

I will create toolkit/modules then.

> The proliferation of small, very specific JS modules is a bit problematic
> because the fixed overhead of a JS module is rather high at the moment (see
> e.g. bug 773845 and bug 798491), but that's a problem we probably should be
> addressing separately either way, so I guess we shouldn't worry about it too
> much here. Worth being aware of, though.

I think I'm going to bury my head in the sand here as being discouraged from writing small, loosely coupled and reusable modules is so wrong on so many levels. That being said, concatenating files together into a single module is doable with a little build system magic and maybe some minor changes to how we do modules in Gecko. Hopefully that's never my problem ;)
I'm not discouraging you from doing it, I'm making you aware of some of our current platform limitations that you should consider, as part of the cost/benefit analysis exercise that is software development :)
Normally I'd say the benefits of a line of code written in 1 place instead of N outweighs a slight memory increase due to module proliferation. But B2G. Dammit.
Comment on attachment 673043 [details] [diff] [review]
Implement string component with gzip encoding, v1

Review of attachment 673043 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/string/tests/unit/test_gzip_encoding.js
@@ +18,5 @@
> +
> +  do_check_true(result.length > 0);
> +
> +  // We should probably verify results against an expected value. However,
> +  // how does one obtain an expected value without calling the API itself?

FWIW, for the tests I added when fixing gzip bugs in the platform code, I used the result of gzip itself.  IIRC, there were a few fiddly details that required hex-editing the gzip output so our gzip implementation would be happy with it, all in non-required fields in the gzip format.

...but maybe you think doing that is scarier than not having any tests at all. ;)
Nathan: Can you just refer me to a known input and output that I can hard-code into the test? I'm basically looking for a smoke test.
Ping regarding comment #6.
Flags: needinfo?(nfroyd)
modules/libjar/zipwriter/test/unit/test_bug717061.js
modules/libjar/zipwriter/test/unit/data/test_bug717061.html
modules/libjar/zipwriter/test/unit/data/test_bug717061.gz

Are the bits I was talking about in comment 5.  See:

http://mxr.mozilla.org/mozilla-central/source/modules/libjar/zipwriter/test/unit/test_bug717061.js#51

for some of the fiddly bits.
Flags: needinfo?(nfroyd)
Comment on attachment 673043 [details] [diff] [review]
Implement string component with gzip encoding, v1

Review of attachment 673043 [details] [diff] [review]:
-----------------------------------------------------------------

As said elsewhere, let's pull the trigger and create toolkit/modules and put this in there.

::: toolkit/components/string/StringEncoding.jsm
@@ +10,5 @@
> +
> +/**
> + * Provides utility functions for performing encoding operations on strings.
> + *
> + * Every find yourself encoding a string to another type or format? This

s/Every/Ever/

@@ +14,5 @@
> + * Every find yourself encoding a string to another type or format? This
> + * object is for you! Simply add your encoder and/or decoder to this and
> + * share it with the world.
> + */
> +let StringEncoding = {

Is this actually appropriate? To me string encoding is all about character encodings (UTF8, UTF16, etc.). Gzip seems different to that. Maybe StringConverter is better?

@@ +24,5 @@
> +   * @param onComplete
> +   *        (function) Callback invoked with compressed output as a string
> +   *        parameter.
> +   */
> +  gzipEncodeToString: function gzipEncodeToString(s) {

This is a really confusing name. How about convertToGzip?

I'm concerned that this is a synchronous method and people might try to use it for large chunks of data. What are the performance characteristics of this?
Comment on attachment 673043 [details] [diff] [review]
Implement string component with gzip encoding, v1

I rewrote a lot of this patch locally. Clearing review flag.
Attachment #673043 - Flags: review?(gavin.sharp)
Attachment #673043 - Flags: feedback?(dtownsend+bugmail)
(In reply to Dave Townsend (:Mossop) from comment #9)
> Comment on attachment 673043 [details] [diff] [review]
> Implement string component with gzip encoding, v1
> 
> Review of attachment 673043 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> As said elsewhere, let's pull the trigger and create toolkit/modules and put
> this in there.

Already done in my local patch (to be uploaded).


> @@ +14,5 @@
> > + * Every find yourself encoding a string to another type or format? This
> > + * object is for you! Simply add your encoder and/or decoder to this and
> > + * share it with the world.
> > + */
> > +let StringEncoding = {
> 
> Is this actually appropriate? To me string encoding is all about character
> encodings (UTF8, UTF16, etc.). Gzip seems different to that. Maybe
> StringConverter is better?

Sure.

> @@ +24,5 @@
> > +   * @param onComplete
> > +   *        (function) Callback invoked with compressed output as a string
> > +   *        parameter.
> > +   */
> > +  gzipEncodeToString: function gzipEncodeToString(s) {
> 
> This is a really confusing name. How about convertToGzip?

If we move this to a StringConverter type, then that would definitely make sense.

> I'm concerned that this is a synchronous method and people might try to use
> it for large chunks of data. What are the performance characteristics of
> this?

If you start with a String, you have all the data available. So, making this async would require splitting data manually, event loop spinning, etc. I'm not convinced that's a good idea.

I think a better idea is to just not use these functions if you have large values: use streams instead.
(In reply to Gregory Szorc [:gps] from comment #11)
> (In reply to Dave Townsend (:Mossop) from comment #9)
> > Comment on attachment 673043 [details] [diff] [review]
> > Implement string component with gzip encoding, v1
> > 
> > Review of attachment 673043 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > As said elsewhere, let's pull the trigger and create toolkit/modules and put
> > this in there.
> 
> Already done in my local patch (to be uploaded).
> 
> 
> > @@ +14,5 @@
> > > + * Every find yourself encoding a string to another type or format? This
> > > + * object is for you! Simply add your encoder and/or decoder to this and
> > > + * share it with the world.
> > > + */
> > > +let StringEncoding = {
> > 
> > Is this actually appropriate? To me string encoding is all about character
> > encodings (UTF8, UTF16, etc.). Gzip seems different to that. Maybe
> > StringConverter is better?
> 
> Sure.
> 
> > @@ +24,5 @@
> > > +   * @param onComplete
> > > +   *        (function) Callback invoked with compressed output as a string
> > > +   *        parameter.
> > > +   */
> > > +  gzipEncodeToString: function gzipEncodeToString(s) {
> > 
> > This is a really confusing name. How about convertToGzip?
> 
> If we move this to a StringConverter type, then that would definitely make
> sense.
> 
> > I'm concerned that this is a synchronous method and people might try to use
> > it for large chunks of data. What are the performance characteristics of
> > this?
> 
> If you start with a String, you have all the data available. So, making this
> async would require splitting data manually, event loop spinning, etc. I'm
> not convinced that's a good idea.

It shouldn't need any event loop spinning, you'd just make a stream out of the string and async feed it to the stream converter and call a callback when done.

> I think a better idea is to just not use these functions if you have large
> values: use streams instead.

I think that's a great idea too, but it's difficult to enforce that on people so providing a way to do it seems like a potential footgun for the future.
Changing bug to reflect its original purpose and moving out of Toolkit. Hopefully I planted a seed and Toolkit will add something similar one day.
Component: General → Firefox: Common
Product: Toolkit → Mozilla Services
Summary: Add a component for string encodings → Add function to compress string data
I wish the test were more robust. Then again, I'm not really testing the core stream encoding code, am I?
Assignee: nobody → gps
Attachment #673043 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #677883 - Flags: review?(rnewman)
More generic solution to avoid XPCOM masturbation. I'm sure this function has already been implemented somewhere in the tree. But, I can't find it.
Attachment #677883 - Attachment is obsolete: true
Attachment #677883 - Flags: review?(rnewman)
Attachment #677889 - Flags: review?(rnewman)
Forgot to qref.
Attachment #677889 - Attachment is obsolete: true
Attachment #677889 - Flags: review?(rnewman)
Attachment #677891 - Flags: review?(rnewman)
Comment on attachment 677891 [details] [diff] [review]
Add CommonUtils.convertString, v2

Review of attachment 677891 [details] [diff] [review]:
-----------------------------------------------------------------

Let's get a UTF-8 test first. Even if the breakage is elsewhere in the codebase, I don't want to ship a handy-dandy utility that makes it easier to shoot oneself in the foot.

::: services/common/tests/unit/test_utils_convert_string.js
@@ +10,5 @@
> +  run_next_test();
> +}
> +
> +add_test(function test_compress_string() {
> +  const INPUT = "hello";

Please put non-ASCII characters into this test string. I opted for "pïgéons1" for my Android Sync text encoding test, but I encourage you to also pass the Turkey Test :)

E.g.,

"Árvíztűrő tükörfúrógép いろはにほへとちりぬるを Pijamalı hasta, yağız şoföre çabucak güvendi."

::: services/common/utils.js
@@ +611,5 @@
> +   */
> +  convertString: function convertString(s, source, dest) {
> +    let is = Cc["@mozilla.org/io/string-input-stream;1"]
> +               .createInstance(Ci.nsIStringInputStream);
> +    is.setData(s, s.length);

Check for !s.

@@ +614,5 @@
> +               .createInstance(Ci.nsIStringInputStream);
> +    is.setData(s, s.length);
> +
> +    let scs = Cc["@mozilla.org/streamConverters;1"]
> +                .getService(Ci.nsIStreamConverterService);

Is this something we ought to be wrapping in a lazy getter? We wrap our scriptableunicodeconverter…
Attachment #677891 - Flags: review?(rnewman) → feedback+
Additional Unicode string test did expose an error. Turns out JS String -> IDL's "string" throws away data in the high byte octet from JS strings. I documented this limitation in the API and made the test convert the input to an "octet string" before feeding into convertString(). That change made things work.
Attachment #677891 - Attachment is obsolete: true
Attachment #678478 - Flags: review?(rnewman)
Comment on attachment 678478 [details] [diff] [review]
Add CommonUtils.convertString, v3

Review of attachment 678478 [details] [diff] [review]:
-----------------------------------------------------------------

Hell yeah test coverage.
Attachment #678478 - Flags: review?(rnewman) → review+
This is going to get superreview before landing in m-c right?
(In reply to Dave Townsend (:Mossop) from comment #21)
> This is going to get superreview before landing in m-c right?

Why would this need super review?
Flags: needinfo?(dtownsend+bugmail)
(In reply to Gregory Szorc [:gps] from comment #22)
> (In reply to Dave Townsend (:Mossop) from comment #21)
> > This is going to get superreview before landing in m-c right?
> 
> Why would this need super review?

It adds an API
Flags: needinfo?(dtownsend+bugmail)
Because it's an API, under the SR policy: http://www.mozilla.org/hacking/reviewers.html

Feel free to tag me for SR.
Comment on attachment 678478 [details] [diff] [review]
Add CommonUtils.convertString, v3

SR flag has gone missing…?
Attachment #678478 - Flags: review?(mconnor)
Attachment #678478 - Flags: review?(mconnor) → superreview?(mconnor)
Comment on attachment 678478 [details] [diff] [review]
Add CommonUtils.convertString, v3

Bulk-setting approval flags for FHR landing for FxOS ADU ping (Bug 788894).
Attachment #678478 - Flags: superreview?(mconnor)
Attachment #678478 - Flags: approval-mozilla-beta?
Attachment #678478 - Flags: approval-mozilla-aurora?
Target Milestone: --- → mozilla20
https://hg.mozilla.org/mozilla-central/rev/48f311b7d355
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Comment on attachment 678478 [details] [diff] [review]
Add CommonUtils.convertString, v3

FHR for B2G ADU ping, won't be build/enabled for Mobile/Desktop.
Attachment #678478 - Flags: approval-mozilla-beta?
Attachment #678478 - Flags: approval-mozilla-beta+
Attachment #678478 - Flags: approval-mozilla-aurora?
Attachment #678478 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: