493975 - Make nsTextToSubURI threadsafe

Reporter

Description

•

15 years ago

Over in bug 455555, I'm moving location bar searches off of the main thread.  We call nsITextToSubURI::unEscapeURIForUI on our data to search through it.  We could proxy our calls back to the main thread, but given how often we'd call that, we'd have a significant slowdown in obtaining results, which is not the goal of this change.

Therefore, we need to make nsTextToSubURI threadsafe.  We have two options here:
1) Add appropriate locking
2) Rewrite this in JS since JS is threadsafe (this was suggested by timeless).

I don't think (1) or (2) are that different in terms of workload, and (2) gives us the advantage of having less code in C++.  I'm interested in hearing what smontagu and bz have to say, since they will likely be the reviewers for this code.

This will also require that nsCharsetConverterManager be threadsafe, which I'll be filing a bug for shortly.

Shawn Wilsher :sdwilsh

Reporter

Updated

•

15 years ago

Depends on: 493981

Simon Montagu :smontagu

Comment 1

•

15 years ago

I'm not generally enthusiastic about the trend to reimplement the world in JS, but this seems like a good candidate for it.

As I said on IRC, you probably want to use nsIScriptableUnicodeConverter, but that still requires making nsCharsetConvertManager threadsafe.

Boris Zbarsky [:bzbarsky]

Comment 2

•

15 years ago

In general, we don't support JS components that will be called into from C++ while there's content JS on the stack right now.  So doing (2) means either fixing bug 304048 (aka bug 360207) first or moving all of these callsites into JS as well:

http://mxr.mozilla.org/mozilla-central/search?string=unEscapeURI&find=.cpp&tree=mozilla-central
http://mxr.mozilla.org/mozilla-central/search?string=unEscapenonascii&find=.cpp&tree=mozilla-central
http://mxr.mozilla.org/mozilla-central/search?string=UnEscapeAndConvert&find=.cpp&tree=mozilla-central
http://mxr.mozilla.org/mozilla-central/search?string=ConvertAndEscape&find=.cpp&tree=mozilla-central

Not doing either of those will mean little things like getting location.hash in content JS will throw.

Boris Zbarsky [:bzbarsky]

Comment 3

•

15 years ago

Just to be clear, I'm all in favor of moving code into JS; we just need to fix this old bug about being unable to create really useful JS components first.

Shawn Wilsher :sdwilsh

Reporter

Comment 4

•

15 years ago

(In reply to comment #3)
> Just to be clear, I'm all in favor of moving code into JS; we just need to fix
> this old bug about being unable to create really useful JS components first.
How hard do we think this is going to be?  I've basically got the JS component done, baring one issue (I think just one) that I'm debugging right now.

With that said, I think I've come to the conclusion that I don't actually need to use this code in bug 455555 now that I fully understand what it was doing in the first place.  We are really only using this code to unescape, and then converting to UTF-16.

Shawn Wilsher :sdwilsh

Reporter

Comment 5

•

15 years ago

Attached patch v0.1 (obsolete) — Details — Splinter Review

I'm doing this in two parts.  First, I'm going to rewrite this in JS, with functionality being on par.  Then I'm going to make it threadsafe, which should be a really small patch, in theory.

This doesn't quite work.  I'm hung up on some places tests, but as far as I can tell the code does the same thing right now.  I've pushed this to the try server to see if anything else is broken at the moment.  Feel free to look at this and comment.

Shawn Wilsher :sdwilsh

Reporter

Comment 6

•

15 years ago

Attached patch v0.1 (obsolete) — Details — Splinter Review

It certainly helps to hg add and hg rm stuff...

Attachment #378738 - Attachment is obsolete: true

Shawn Wilsher :sdwilsh

Reporter

Comment 7

•

15 years ago

So, the issue that I'm currently hitting is that calling this._netUtil.unescapeString is dropping the high bits of my string that I'm passing in.  If I pass in the unicode characeter 0x30A8, the function returns 0xA8.

Trying something that dolske suggested...

Shawn Wilsher :sdwilsh

Reporter

Comment 8

•

15 years ago

Attached patch rewrite v0.9 (obsolete) — Details — Splinter Review

This passes xpcshell-tests and make check locally.  I've pushed to the try server to get more data (and performance numbers).

As far as I can tell, this is identical behavior to what we have now.  Tomorrow I want to go through and make sure we have unit tests for these code paths, and add tests if we do not.  Additionally, I'm going to revamp the IDL comments so they are actually useful and a bit more meaningful.  Comments and feedback are very much welcomed at this point.

Attachment #378740 - Attachment is obsolete: true

Attachment #378767 - Flags: superreview?(bzbarsky)

Attachment #378767 - Flags: review?(smontagu)

Shawn Wilsher :sdwilsh

Reporter

Comment 9

•

15 years ago

Forgot to add - after talking this over with dolske, there may be a bug in the existing code (and the reason for the try-catch block in _unEscapeAndConvert).  It's possible in the current code, and the new code, to pass in a valid UTF-8 string that gets turned into an invalid one because NS_UnescapeURL breaks it.  If you are a consumer who uses UTF-16, and then try to convert this unescaped string back to UTF-16, it will not be valid.

Boris Zbarsky [:bzbarsky]

Comment 10

•

15 years ago

I believe that bug came up sometime in a patch of smontagu's that I was reviewing, fwiw.  You might want to dig up that bug and see what was happening there.  I don't have the bug# offhand, sadly, but pretty sure it's assigned to smontagu.

Boris Zbarsky [:bzbarsky]

Comment 11

•

15 years ago

Was the stripping just because of the stupid (imo) signature in the IDL?  ACString is really only safe for ISO-8859-1-range strings...

Shawn Wilsher :sdwilsh

Reporter

Comment 12

•

15 years ago

(In reply to comment #10)
> I believe that bug came up sometime in a patch of smontagu's that I was
> reviewing, fwiw.  You might want to dig up that bug and see what was happening
> there.  I don't have the bug# offhand, sadly, but pretty sure it's assigned to
> smontagu.
Sadly, he's the assignee for every bug in this component, so it makes it hard to search for it.

(In reply to comment #11)
> Was the stripping just because of the stupid (imo) signature in the IDL? 
> ACString is really only safe for ISO-8859-1-range strings...
Are you talking about the signature of unescapeString on nsINetUtil.idl?  I suspect that that may be the case, and I'd be happy to fix that if we want to.  Native could should just be #including nsNetUtil.h and using the native methods anyway, right?  I can spin that off into another bug if you'd like.

Boris Zbarsky [:bzbarsky]

Comment 13

•

15 years ago

> Sadly, he's the assignee for every bug in this component

Bah.

Did some digging through my visited bugs, and it was bug 415491.


> Are you talking about the signature of unescapeString on nsINetUtil.idl? 

Yes.  When converting from JS string to ACString, we convert by just dropping the high byte of the 16-bit units (calling JS_GetStringBytes, without using UTF-8 JS strings).  So if you pass in a string which is not basically byte-inflated ISO-8859-1, then you'll lose by the time it gets into C++.

Spinning this off into another bug might be good.  For one thing, it's not obvious what the right behavior here really is.   Unescaping doesn't have to produce useful Unicode.... :(

Shawn Wilsher :sdwilsh

Reporter

Comment 14

•

15 years ago

I didn't see any unit test failures or performance regressions after pushing to the try server with this patch.

Shawn Wilsher :sdwilsh

Reporter

Updated

•

15 years ago

Depends on: 494217

Shawn Wilsher :sdwilsh

Reporter

Comment 15

•

15 years ago

(In reply to comment #13)
> Spinning this off into another bug might be good.  For one thing, it's not
> obvious what the right behavior here really is.   Unescaping doesn't have to
> produce useful Unicode.... :(
Done in bug 494217.  I can get rid of all the converting around the unescape call with the patch in that bug.

Shawn Wilsher :sdwilsh

Reporter

Comment 16

•

15 years ago

Attached patch v1.0 (obsolete) — Details — Splinter Review

I think there might be some slight behavior changes with this code in the rewrite.  I also don't think the IDL docs were completely in sync with the actual code from before.  Not sure if we should change the iid on the interface because of this or not (assuming not).

There are a few changes here from 0.9:
1) Added usable IDL comments to all methods
2) Removed the need to convert to UTF-8 and then back around the call to nsINetUtil::unescapeString
3) Fixed unEscapeURIForUI to not fail and assume UTF-8 if it does fail the first time through per IDL comments.  I'm not sure if we actually want that still, but that's up to the reviewers.

Sadly, there is just about zero test coverage on all of this code, so I'm going to have a write a bunch of tests too.  I'll put that up in another patch though, as I strongly suspect this code is correct.  The places test do hammer on unEscapeURIForUI, and those were the most difficult to pass initially.

Attachment #378767 - Attachment is obsolete: true

Attachment #378925 - Flags: superreview?(bzbarsky)

Attachment #378925 - Flags: review?(smontagu)

Attachment #378767 - Flags: superreview?(bzbarsky)

Attachment #378767 - Flags: review?(smontagu)

Shawn Wilsher :sdwilsh

Reporter

Updated

•

15 years ago

Whiteboard: [needs review smontagu][needs sr bz]

Shawn Wilsher :sdwilsh

Reporter

Comment 17

•

15 years ago

(In reply to comment #16)
> 2) Removed the need to convert to UTF-8 and then back around the call to
> nsINetUtil::unescapeString
I may need to add this back in based on discussion in bug 494217

Shawn Wilsher :sdwilsh

Reporter

Comment 18

•

15 years ago

Comment on attachment 378925 [details] [diff] [review]
v1.0

Canceling reviews until we figure out what is going on with bug 494217.

Attachment #378925 - Flags: superreview?(bzbarsky)

Attachment #378925 - Flags: review?(smontagu)

Shawn Wilsher :sdwilsh

Reporter

Updated

•

15 years ago

Whiteboard: [needs review smontagu][needs sr bz] → [pending bug 494217]

Shawn Wilsher :sdwilsh

Reporter

Comment 19

•

15 years ago

Attached patch v1.1 (obsolete) — Details — Splinter Review

Well, that was quick.  I'll get the tests up tomorrow or the next day hopefully.

Attachment #378925 - Attachment is obsolete: true

Attachment #379792 - Flags: superreview?(bzbarsky)

Attachment #379792 - Flags: review?(smontagu)

Shawn Wilsher :sdwilsh

Reporter

Updated

•

15 years ago

Whiteboard: [pending bug 494217] → [needs review smontagu][needs sr bz]

Shawn Wilsher :sdwilsh

Reporter

Comment 20

•

15 years ago

Comment on attachment 379792 [details] [diff] [review]
v1.1

>+  _unEscapeAndConvert: function TTSU_unEscapeAndConvert(aCharset,
>+                                                        aText,
>+                                                        aFlags)
>+  {
>+    // Un-escape our text, making sure to default to escape all if no flags were
>+    // specified.
>+    let text = this._netUtil.unescapeString(aText, (aFlags || 0));
>+
>+    // unescapeString returns an ACString, so we have to convert this back to
>+    // Unicode (xpconnect doesn't do this for us).  It's possible that we are
>+    // given back an invalid UTF-8 string, however.  If that's the case, we'll
>+    // just return the string that we got back (when conversion fails).
>+    try {
>+      let converter = this._newConverter("UTF-8");
>+      text = converter.ConvertToUnicode(text);
>+    }
>+    catch (e) {
>+    }
I realized last night that I think I want to convert from the specified character set to unicode before I call unescapeString.  XPConnect will then convert it to UTF-8 safely from Unicode for the call, and then we'll get an escaped string that may be UTF-8 back.

I'll need to add a test that passes in something that isn't Unicode here to be sure about this.

Boris Zbarsky [:bzbarsky]

Comment 21

•

15 years ago

That function's signature says |string| in the idl, no?  Passing non-ASCII data through that argument is technically not allowed.  Looking at the consumers, nsIndexedToHTML.cpp always passes through ASCII data, I can't tell for nsDirIndexParser.cpp (but could be checked with a dir with a non-ASCII name and files likewise in it), and the docshell caller seems to violate this (and will probably break the moment we stop escaping UTF-8 in refs).  The docshell caller might need changes in any case, to be compliant with the HTML5 spec for handling refs.

That said, the actual code for converting from C++ to JS strings used in XPConnect here will simply byte-inflate the string.  That is, it doesn't enforce the ASCII-ness.

Perhaps the right thing to do here is to make the argument ACString in the idl?  That will guarantee byte-inflation no matter what, and then your approach of converting before unescaping should be fine...  Or we could simply make it clear that the input must be ASCII, of course.

Shawn Wilsher :sdwilsh

Reporter

Comment 22

•

15 years ago

Well, it's only |string| for UnEscapeAndConvert, but the two other un-escape methods use ACString.  We should probably unify those I guess.  I guess I'm also not doing a simple refactoring anymore - should this work be spun out into a different bug blocking this one?

Boris Zbarsky [:bzbarsky]

Comment 23

•

15 years ago

I thought we were changing at least one of the other unescape methods to use AUTF8String, no?

But yeah, if we start changing the API that should go in a separate bug...

Shawn Wilsher :sdwilsh

Reporter

Updated

•

15 years ago

Depends on: 495080

Shawn Wilsher :sdwilsh

Reporter

Updated

•

15 years ago

Attachment #379792 - Attachment is obsolete: true

Attachment #379792 - Flags: superreview?(bzbarsky)

Attachment #379792 - Flags: review?(smontagu)

Shawn Wilsher :sdwilsh

Reporter

Comment 24

•

15 years ago

(In reply to comment #23)
> I thought we were changing at least one of the other unescape methods to use
> AUTF8String, no?
That was in nsINetUtil

> But yeah, if we start changing the API that should go in a separate bug...
Spun off into bug 495080.

No longer depends on: 494217

Whiteboard: [needs review smontagu][needs sr bz]

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

Depends on: 494217

Shawn Wilsher :sdwilsh

Reporter

Updated

•

15 years ago

No longer blocks: 455555

Shawn Wilsher :sdwilsh

Reporter

Updated

•

15 years ago

Assignee: sdwilsh → smontagu

Jim Mathies [:jimm]

Comment 25

•

3 years ago

Still valid?

Flags: needinfo?(hsivonen)

Henri Sivonen (:hsivonen)

Comment 26

•

3 years ago

mak, do we still care about using this stuff off-the-main-thread for the URL bar? AFAICT, the internals look rather off-the-main-thread-friendly at this point.

Flags: needinfo?(hsivonen) → needinfo?(mak)

Henri Sivonen (:hsivonen)

Updated

•

3 years ago

Assignee: smontagu → nobody

Status: ASSIGNED → NEW

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Comment 27

•

3 years ago

No, the urlbar doesn't use this from cpp anymore, I think we can close this bug.

Status: NEW → RESOLVED

Closed: 3 years ago

Flags: needinfo?(mak)

Resolution: --- → WONTFIX

v0.1 15 years ago Shawn Wilsher :sdwilsh 1.67 KB, patch		Details \| Diff \| Splinter Review
v0.1 15 years ago Shawn Wilsher :sdwilsh 20.63 KB, patch		Details \| Diff \| Splinter Review
rewrite v0.9 15 years ago Shawn Wilsher :sdwilsh 20.86 KB, patch		Details \| Diff \| Splinter Review
v1.0 15 years ago Shawn Wilsher :sdwilsh 24.19 KB, patch		Details \| Diff \| Splinter Review
v1.1 15 years ago Shawn Wilsher :sdwilsh 24.63 KB, patch		Details \| Diff \| Splinter Review