Closed
Bug 125617
Opened 23 years ago
Closed 22 years ago
uconv should use the category service and stop using nsIRegistry
Categories
(Core :: Internationalization, defect, P2)
Tracking
()
VERIFIED
FIXED
mozilla1.1beta
People
(Reporter: dougt, Assigned: alecf)
References
Details
(Keywords: embed, intl, topembed-, Whiteboard: fix in hand)
Attachments
(3 files, 2 obsolete files)
50.22 KB,
patch
|
dougt
:
review+
dveditz
:
superreview+
|
Details | Diff | Splinter Review |
46.89 KB,
patch
|
dveditz
:
superreview+
|
Details | Diff | Splinter Review |
1.99 KB,
patch
|
Details | Diff | Splinter Review |
For performance reasons, I would like to see uconv converted to use the category service. In this way, I may be able to remove components.reg from existance.
Assignee | ||
Comment 1•23 years ago
|
||
yeah! I'd love to see this. by the way, see bug 125055 for some minor cleanup. Once I land that patch, converting over should be a matter of updating the macros..
Reporter | ||
Comment 2•23 years ago
|
||
Attached is a patch which will convert the intl/uconv stuff over the the category services. Overview of patch: Changed nsUConverterRegSelf to register either one of two categories: "Charset Decoders", "Charset Encoders". The topic of either category will be the give charset. Internally, these strings can be concatenated to a known contract id base. Implmeneted nsUConverterUnRegSelf. Currently, if any of the uconv components are unregistered, they leave the registry a mess. removed nsCharsetConverterManager::GetRegistryEnumeration2() as it is not used. Massaged nsCharsetConverterManager::GetList(*) so that it would work with the category service. As noted in the code, we should change the interface which this function implements to return a nsISimpleEnumerator so that we do not have to populate an array. Note that I added a helper function so that both the decoders and encoders can share the same code. I changed what was passed to NS_UCONV_REG_UNREG in many places. The basic form of these changes was to: a. removed the trailing parameter (the cid) as it is not needed any more. b. convert "Unicode" to nsnull so that I could determine if the class was an encoder or decoder without having to strcmp. I also had to sprinkle some #include love so that nsICategoryManger was defined.
Assignee | ||
Comment 3•23 years ago
|
||
ok, so I landed the other bug.. repairing this patch shouldn't be too hard... I'll see if I can do it today...
Reporter | ||
Comment 4•23 years ago
|
||
alec, just to be sure so that we don't stomp each other..... you have landed and you are *not* working on this patch? ain't concurrent development with no communication fun??
Assignee | ||
Comment 5•23 years ago
|
||
no I _am_ working on it - see my previous patch. I ran out of time friday, I'll try to finish it today.
Assignee | ||
Comment 6•23 years ago
|
||
er s/previous patch/previous post in this bug/
Reporter | ||
Comment 7•23 years ago
|
||
you are a great person, alec! reassigning over to you to be really explicted about this whole thing.
Assignee: dougt → alecf
Assignee | ||
Comment 8•23 years ago
|
||
Ok, I applied what I could of the previous patch, and then hand-applied the rest. I also changed: 1) as the category value, I put in the CID - I did this because it's basically free right now and this allows someone to create objects right out of the category manager. 2) Added only one call to AddCategoryEntry/DeleteCategoryEntry, because the ultimate code size will be cheaper (which is good if we're duplicating this file all over the place) 3) fixed the DeleteCategoryEntry to actually delete the entry, and not the CONTRACTID_BASE
Attachment #69585 -
Attachment is obsolete: true
Reporter | ||
Comment 9•23 years ago
|
||
Comment on attachment 70182 [details] [diff] [review] updated patch, using faster registration scheme you don't need to allocate here: + char * value = entry->cid.ToString(); \ + \ + rv = catman->DeleteCategoryEntry(category, key, PR_TRUE); \ + CRTFREEIF(value); I do not understand this change and how it relates to this bug: src/nsCharsetConverterManager.cpp other than that, looks good. Assuming that you fix the above, r=dougt.
Reporter | ||
Updated•23 years ago
|
Attachment #70182 -
Flags: review+
Assignee | ||
Comment 10•23 years ago
|
||
Comment on attachment 70182 [details] [diff] [review] updated patch, using faster registration scheme I assume you meant src/nsCharsetAliasImp.cpp sorry that was part of another patch.. I fixed the allocation in my tree..
Assignee | ||
Comment 11•23 years ago
|
||
cc'ing shaver for possible sr=
Assignee | ||
Updated•23 years ago
|
Status: NEW → ASSIGNED
Priority: -- → P3
Whiteboard: fix in hand
Target Milestone: --- → mozilla0.9.9
Comment 12•23 years ago
|
||
One slight concern I have is that the intl team worked to delay their reading of registry data until it's needed. By putting this huge mess into the category manager it'll always get read at startup and sit around in memory whether needed or not. What problem are we trying to solve? I don't have a problem with component manager information in a separate file from more generic registry use. In fact I'd prefer it, so that if random components misuse the registry and manage to screw it up at least the precious component info stays safe. For a similar example see the bug where the plugin info makes the profile registry.dat grow and grow -- and now we'll have to go through hoops to get the profile data out of that file safely.
Assignee | ||
Comment 13•23 years ago
|
||
well, I'll let dougt answer as well, but I think that no matter what, this code should be using the more abstract Category Manager - I think the use of nsIRegistry here pre-dates the category manager and that this sort of pattern is exactly what the category manager was trying to solve. This will allow us to later change the implementation of the category manager without mucking with the consumers... i.e. so instead of working hard in the intl libraries to speed up this one use, we can work on things like lazily loading categories.. i.e. maybe we'll find a better more category-manager-specific storage mechanism...
Comment 14•23 years ago
|
||
Comment on attachment 70182 [details] [diff] [review] updated patch, using faster registration scheme >+#define NS_DATA_BUNDLE_REGISTRY_KEY "software/netscape/intl/xuconv/data/" >+#define NS_TITLE_BUNDLE_REGISTRY_KEY "software/netscape/intl/xuconv/titles/" These two point at a resource: and chrome: URL stored in the registry. Wouldn't all.js be a much better (lower overhead) place, while still allowing easy overriding for embeddors? Since the places that use these #defines don't appear in the patch you're not fully implementing "uconv should...stop using nsIRegistry" >+#define NS_UNICODEDECODER_NAME "Charset Decoders" >+#define NS_UNICODEENCODER_NAME "Charset Encoders" Will we only ever convert to and from Unicode? So far that's all I see, but the existing registry structure would lend itself to arbitrary source and destination conversions. I'm guessing we'd always convert through Unicode, just checking before we lose this potential. > const char* componentType, \ > const nsModuleComponentInfo *info) \ > { \ >- nsresult res = NS_OK; \ For involved patches more context than the default 3 lines helps reviewers (me, at least) figure out how the patch fits into places with which I'm not familiar. Even -6 would help, -9 often saves me having to bring up the original source file to complete the review. >+ if (entry->fromCharset) { \ >+ category = NS_UNICODEDECODER_NAME; \ >+ key = entry->fromCharset; \ >+ } else { \ >+ NS_ASSERTION(entry->toCharset, "Must have either a source or dest charset!\n"); \ >+ category = NS_UNICODEENCODER_NAME; \ >+ key = entry->toCharset; \ >+ } \ If you ever get both here you're screwed. You changed all the uses of the NS_UCONV_REG_UNREG macro anyway, why not change it to force people to do the right thing rather than having to put nsnull in the right place? Change the structure: > struct nsConverterRegistryInfo { > const PRBool decoder; > const char *charset; > nsCID cid; > }; Then make two macros: #define NS_UCONV_DECODER_REG(_From, _CID) \ { PR_TRUE, _From, _CID }, #define NS_UCONV_ENCODER_REG(_To, _CID) \ { PR_FALSE, _To, _CID }, then since these overwhelmingly come in pairs make life easier on folks: #define NS_UCONV_REG(_Charset, _DecoderCID, _EncoderCID) \ NS_UCONV_DECODER_REG(_Charset, _DecoderCID) \ NS_UCONV_ENCODER_REG(_Charset, _EncoderCID) >+ char * value = entry->cid.ToString(); \ >+ \ >+ rv = catman->AddCategoryEntry(category, key, value ? value : "", \ If there isn't a value aren't we screwed? If it's optional then why bother storing it at all? these things are big. Now I'm wondering why we store this information at all. The intl team already uses contract ID's like @mozilla.org/intl/unicode/decoder;1?charset=UTF-32LE @mozilla.org/intl/unicode/encoder;1?charset=ISO-8859-4 That's all they need to get a converter given an arbitrary charset name -- if it fails they don't support that conversion. Is the only reason for the category so they know which ones they've got ahead of time? If that's all they need couldn't they enumerate the contract ID's to get it? That might be slightly slower for them than a category (they'd have to skip the ones they weren't interested in), but then the component/category manager doesn't have to store essentially the same data twice. >+// we should change the interface so that we can just pass back a enumerator! >+NS_IMETHODIMP nsCharsetConverterManager::GetDecoderList(nsISupportsArray ** aResult) Would you mind filing a bug on that as a reminder?
Assignee | ||
Comment 15•23 years ago
|
||
ok, we're not doing this for 0.9.9 because the registry switch needs more bake-time
Assignee | ||
Updated•23 years ago
|
Priority: P3 → P1
Target Milestone: mozilla0.9.9 → mozilla1.0
Assignee | ||
Comment 16•22 years ago
|
||
actually, since we're not moving the component registry, we can push this past 1.0
Target Milestone: mozilla1.0 → mozilla1.1alpha
Comment 17•22 years ago
|
||
Marking it topembed-, embed (based on comments that we could push it past 1.0)
Assignee | ||
Updated•22 years ago
|
Priority: P1 → P2
Assignee | ||
Comment 18•22 years ago
|
||
mozilla 1.1alpha already passed, just moving out to 1.1beta
Target Milestone: mozilla1.1alpha → mozilla1.1beta
Assignee | ||
Comment 19•22 years ago
|
||
ok, here's an updted patch which I think addresses all of dan's comments: * as it turns out, yes the CID is of no value when stored in the registry. * the reason we need to do any of this is because we need a fast way to enumerate the existing charset converters, so we can create the View -> Character Coding menu * This patch doesn't get rid of ALL uses of nsIRegistry, it just gets rid of the biggest use. To make it easier to track regressions, I'd like to land this patch first, then tackle the whole * I liked the seperate macro idea, so I did more or less what you suggest. I ran into a little cpp issue (cpp doesn't like passing CID structs through nested macros - the commas in the CID definition end up getting evaluated too early..) but that's been commented in the patch
Attachment #70182 -
Attachment is obsolete: true
Reporter | ||
Comment 20•22 years ago
|
||
Comment on attachment 88152 [details] [diff] [review] updated patch with better macros + PRBool isDecoder; // 0 = decoder, 1 = encoder can we rename is isDecoder to isEncoder, or switch the values? Other than that, this is a great patch!!! The only stuff left in the component.reg file by unconv is nsRegistry::Common - software - netscape - intl - xuconv Right?
Attachment #88152 -
Flags: review+
Assignee | ||
Comment 21•22 years ago
|
||
ok, I've swapped isDecoder to isEncoder, and fixed up the macros to do the right thing there.. just need the sr= and this puppy is going in.
Comment 22•22 years ago
|
||
Comment on attachment 88152 [details] [diff] [review] updated patch with better macros >+ nsCOMPtr<nsICategoryManager> catman = \ >+ do_GetService(NS_CATEGORYMANAGER_CONTRACTID, &rv); \ Nit: Wouldn't this be better indented? >+ if (entry->isDecoder) { \ >+ category = NS_UNICODEDECODER_NAME; \ This appears to be wrong currently, since a decoder is isDecoder==0. You mentioned you're going to fix the oddness of that, just be careful you don't blindly swap things here. Ditto later. >+ nsCAutoString str(aRegistryPath); >+ str.Append(NS_LITERAL_CSTRING("defaultFile")); > >- char * p = ToNewCString(str); >- res = aRegistry->AddSubtree(nsIRegistry::Common, p, &key); >- nsMemory::Free(p); >+ res = aRegistry->AddSubtree(nsIRegistry::Common, str.get(), &key); >+ Is registry path always 7-bit ASCII (UTF-8, actually)? If not AddSubtree will return an error. > if (NS_FAILED(res)) return res; > res = aRegistry->SetStringUTF8(key, "name", "chrome://global/locale/charsetTitles.properties"); This is kinda odd. If we know it's this file confidently enough to hard code it then why do we need to store it? Whatever, not part of your changes. >+nsCharsetConverterManager::GetList(PRBool encoder, Could you make that aEncoder, or aIsEncoder? >+ rv = array->AppendElement(atom); >+ if (NS_FAILED(rv)) >+ continue; >+ } Checking NS_FAILED just so you can continue when you're already at the end of the loop is extra work. sr=dveditz
Attachment #88152 -
Flags: superreview+
Assignee | ||
Comment 23•22 years ago
|
||
cool, thanks for the reviews.. follow up comments: - turns out my comment was just backward (where I wrote "0 = decoder" - the bug was in the comment, not the code..) - in any case I made it "isEncoder" where isEncoder == PR_TRUE means its a decoder. - regarding the path thing, all I did was change a bad string usage, which saved us a string allocation...I didn't change the data that was actually going to the registry.. we were taking ASCII stored as unicode and converting it back to char-based ASCII. I got rid of the middle step. ok, that last patch has landed, just one more minor cleanup patch (Which actually addresses dveditz's comment about storing the property names in the registry)
Assignee | ||
Comment 24•22 years ago
|
||
ok, the whole other ugly part of this is the chardet module - it manually registers all its detectors using the registry. see http://lxr.mozilla.org/seamonkey/source/intl/chardet/src/nsCharDetModule.cpp#168 for the start of it. yuck.
Assignee | ||
Comment 25•22 years ago
|
||
sooo.. it turns out that there are actually two uses of the registry left: - charset title list - it turns out we're using a feature of nsIStringBundleService that nobody probably knew about - what you do is register a series of .properties file names in the registry, and the string bundle service loads them all into one big string bundle. - charset detector - each of the 8 or so charset detectors register themselves, and are enumerated later with nsCharsetConverterManager::GetCharsetDetectorList() I'm going to fix these in two seperate patches because the former requires a change to the string bundle service as well.
Assignee | ||
Comment 26•22 years ago
|
||
ok, turns out I had to do both operations at the same time! the stuff which enumerates character titles needed to go through the string bundle service, so I needed to fix everyone all at once.. so basically this: 1) fixes the startup registration such that the charset detectors are registered via the category manager 2) make extensible bundles now go through the category manager rather than the registry, and fixes the only consumer (the charset converter manager) to use this new mechanism reviews? dougt? dveditz?
Assignee | ||
Comment 27•22 years ago
|
||
after this, plugins are the only thing that seem to open the ApplicationComponentRegistry
Reporter | ||
Comment 28•22 years ago
|
||
Comment on attachment 89305 [details] [diff] [review] ok, final removal of registry code you have printf: + printf("######### Recycling cache entry for %s\n", cacheEntry->mHashKey->GetString()); remove that, ensure uconv works, and r=
Comment 29•22 years ago
|
||
Comment on attachment 89305 [details] [diff] [review] ok, final removal of registry code >+static NS_METHOD AddCategoryEntry(const char* category, I filed bug 154458 on adding category registration into the generic module macros. If categories are standard enough to be part of the new registry format they're standard enough to be part of the macros. >- rv = registry-> SetStringUTF8(key, "type", "off"); >- rv = registry-> SetStringUTF8(key, "defaultEnglishText", "Off"); >+ return AddCategoryEntry(NS_CHARSET_DETECTOR_CATEGORY, "off", "off"); Here the second value is a word... >- rv = registry-> SetStringUTF8(key, "type", "ja_parallel_state_machine"); >- rv = registry-> SetStringUTF8(key, "defaultEnglishText", "Japanese"); >+ return AddCategoryEntry(NS_CHARSET_DETECTOR_CATEGORY, >+ "ja_parallel_state_machine", >+ info->mContractID); ...and here a contractID. Would a blank string be better than "off"? are there other key values? is this format documented anywhere? I don't see anyone getting this value so maybe it doesn't even matter, although in that case the contractID is likely taking up considerably more memory than the previous label. Found the same thing in the commercial tree, in the alis detector again. Do you have an equivalent bugscape bug? I doubt anyone will squawk too loudly if you just attach the patch here, we do that with install and build stuff a lot. >Index: uconv/src/Makefile.in >=================================================================== >+ chardet \ Are there mac project file updates, too?
Comment 30•22 years ago
|
||
Does this commercial usage need to go at the same time? http://lxr.mcom.com/commercial/source/intl/chardet/src/nsAlisCharDetModule.cpp#59 Looks like commercial mail is looking for some AIM thing: http://lxr.mcom.com/commercial/source/mailnews/mime/aimstatus/build/nsMiscStatusFactory.cpp#57 (A registry use, but separate from these)
Assignee | ||
Comment 31•22 years ago
|
||
the "value" part of the category entry is never use for these detectors - so in most cases I just stuck the contractid to make debugging easier - the contractID is actually constructed at runtime by some other code... (and code that has no special knowledge of categories, so it wouldn't make sense to adapt it either) - in the case of "off" I just added that dummy value because we're actually using another components (an adaptor component to fake out the code into turning off autodetection) which doesn't actually have a relevant contractid, so I left it out. This is confusing I know :) I didn't even think about the commercial tree having detectors. I'll fix that and attach a patch (on a side note I'm going on vacation in about 2 hours, so I won't actually be able to check this in until July 7th... I was hoping I had all my bases covered and could land it today, but no such luck :)) also, regarding mac project changes - nope, no changes necessary - mac doesn't have a REQUIRES mechanism and the MachO build will pick up the dependency I added here. The dependency actually existed all along, its only needed to bring in the category name whereas before we were hardcoding the string. In the future I'm going to move the autodetector enumeration out of nsCharsetConversionManager and into an autodetection class where it really belongs.
Comment 32•22 years ago
|
||
Comment on attachment 89305 [details] [diff] [review] ok, final removal of registry code sr=dveditz
Attachment #89305 -
Flags: superreview+
Assignee | ||
Comment 33•22 years ago
|
||
ok, since dougt gave his "with that..r=" and dan reviewed the other stuff, I fixed the commercial tree and landed everything. Marking fixed.
Assignee | ||
Comment 34•22 years ago
|
||
oops, one more try.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Comment 35•22 years ago
|
||
You need to patch universalchardet too...
Assignee | ||
Comment 36•22 years ago
|
||
argh, I had that patch in my tree, forgot to check it in.
You need to log in
before you can comment on or make changes to this bug.
Description
•