Closed Bug 112979 Opened 23 years ago Closed 22 years ago

open source for nsIIDNService implementation

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla1.3beta

People

(Reporter: nhottanscp, Assigned: nhottanscp)

References

()

Details

(Keywords: intl)

Attachments

(7 files, 7 obsolete files)

37.59 KB, patch
Details | Diff | Splinter Review
12.74 KB, patch
shanjian
: review+
darin.moz
: superreview+
Details | Diff | Splinter Review
317.81 KB, patch
Details | Diff | Splinter Review
117.31 KB, patch
nhottanscp
: review+
darin.moz
: superreview+
Details | Diff | Splinter Review
13.59 KB, patch
darin.moz
: superreview+
Details | Diff | Splinter Review
42.51 KB, patch
nhottanscp
: review+
darin.moz
: superreview+
Details | Diff | Splinter Review
1.20 KB, text/html
Details
nsIIDNService implementation is currently not a part of Mozilla.
Adding a link to nsIIDNService.idl in LXR.
We also need a method to convert from ACE to Unicode (see bug 110028).
Also we might want a method to apply NamePrep, so NamePrep and ACE encoding can
be done separately.
cc to JPNIC
Added JDNA's ishisone-sab. Other JPN people are also from JDNA.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.2
cc to darin

From e-mail conversation, I heard that the development has started, so set to 0.9.9.
Component: Internationalization → Networking
Keywords: intl
Target Milestone: mozilla1.2 → mozilla0.9.9
it has come to my attention that the current interface ConvertUTF8toIDN is
technicially incorrect and essentially non-scriptable.  the problem is that
|string| is interpreted by xpconnect as ASCII.  as a result, JS code which
passes a non ASCII string into ConvertUTF8toIDN will have the high bytes chopped
off.

either this interface needs to be modified to take wstring (ie.
ConvertUCS2toIDN) or we need bug 84186 to be solved (which will add a UTF8
string type to XPCOM).
We probably need to change it to noscript. I don't think IDN encoding is needed
from scripting code.
The patch is not ready for posting but I got some info through e-mail conversation.
Here is a list of issues.

1) The patch is for Unix only (for now).

2) It needs an additional package in order to build (mDNkit).
http://www.nic.ad.jp/jp/research/idn/mdnkit/download/

3) Schedule, can this make 0.9.9 (20-Feb-2002)?
i don't know what mozilla's policy is for adding new external dependencies. 
perhaps the IDN service impl belongs in extensions?

brendan: what do you say?
Hi, folks,

I'm very new to here, so please forgive me and let me know if there is any bad
in manner.

As Hotta-san said, now JPNIC (and would succeed to JDNA) is developping open
source nsIIDNService implementation using JPNIC's mDNkit. Our intention is to
enable IDN capability as an usual feature in Mozilla 1.0. To achieve it, we'd
like to hear more advice from you.

Here, I have some questions to proceed our development correctly.

1. External library dependancy
  Our implementation depends on mDNkit's new library (hopefully released as a
part of mDNkit-2.3 in the near future). What is the recommended way to solve
this dependency?

2. Naming rules for Class/Contract ID
  Our implementation currently uses newly generated Class ID and already defined
Contract ID which is "@i-dns.net/IDNService;1" for XPIDN. If there are naming
rules for them, please let us know.

3. Configure option
  We added new configure option to build IDN component with mDNkit as follows:
    --with-mdnkit(=prefix)
      indicates to compile IDN component with mDNkit. prefix is a path name
where mDNkit is installed. default is "/usr/local/".
  If there are nameing rules or policies to add configure option, please let us
know.

3. How to enable IDN feature
  What is the correct way to enable IDN feature in Mozilla? Current XPIDN
requires to access resource:///res/idnconfig.html.

Thanks in advance.
cc'ing seawood and mcafee hoping that they can help with this 'how do i add an
external dependency to mozilla?' question.
Because win32 & classic mac do not have the concept of installing libraries on a
system to be used for general development, we have historically handled external
dependencies by checking them into the tree.  I really despise that approach &
hope that we can come up with something better.  I don't want to see us becoming
a maintainer of a fork of the external library if it's not necessary.

Anyway, if a component has a new build dependency, it needs to be run by
staff@mozilla.org & the build release team first so that the nightly automation
& tinderboxes can be updated before the new dependency is enabled.

I would prefer to use --with-mdnkit-prefix to tell where mDNkit was installed. 
This module sounds like it would start off life as an optional extension so the
existing --enable-extensions mechanism would work if it were checked in under
mozilla/extensions/ .  Otherwise, we would need an --enable-idn option.
Target Milestone: mozilla0.9.9 → mozilla1.2
FYI, an update on the standards effort:
On 3/4/2002, the IDN WG co-chairs sent an email to the mailing list,
idn@ops.ietf.org, stating that the document authors had summarized the
comments and drafted new versions
 - draft-ietf-idn-idna-07.txt
 - draft-ietf-idn-nameprep-08.txt
 - draft-ietf-idn-punycode-01.txt
 - draft-hoffman-stringprep-01.txt

These documents will be sent for IESG consideration for
Proposed Standard on March 11th 2002.
JPNIC published IDN XPCOM patch for Mozilla-0.9.8 on JPNIC's Web.

1) patch for libnecko
    http://www.nic.ad.jp/ja/idn/mdnkit/download/misc/mozilla-idn-builtin-patch
2) patch for shared library
    http://www.nic.ad.jp/ja/idn/mdnkit/download/misc/mozilla-idn-standalone-patch

Apologize this is out of date and nothing improved since I reported a few
monthes ago.  I hope others contribution to progress the work.

To use this patch, please follow following procedure.

a) download and install mDNkit-2.4
    http://www.nic.ad.jp/ja/idn/mdnkit/download/
b) apply patch which you like to Mozilla-0.9.8-release source
c) create configure with autoconf-2.1.3 (autoconf-2.5 will fail)
d) execute configure with '--with-mdnkit' option
YONEYA san, is it possible to separate components from mdnkit and check in to
Mozilla tree? 
I think Mozilla only needs nameprep and ACE encode/decode.
Hotta-san,

It's hard to extract NAMEPREP and ACE module from current mDNkit.
JPNIC is developping idnkit -- succeeder of mDNkit -- which will be released
within a few months.  It would be a good candidate to incorporate IDN libraries
to Mozilla.  Would you please wait a while?

BTW, JPNIC can't promise being in charge of incorporate work for Mozilla.  Any
other's cooperation is highly recommended :-)
I did some research reading the drafts.

http://www.ietf.org/internet-drafts/draft-ietf-idn-punycode-02.txt
The author put a sample C code in the draft. I put the code to mozilla and
worked fine. But I cannot find a test case, I mean live site to test with.

http://www.ietf.org/internet-drafts/draft-hoffman-stringprep-03.txt
I wrote a test code to estimate the size of mapping table for stringprep.
The result was about 11 kb for the mapping table encoded to a C code.

We also need KC nomalization (compatible decompose and canonical compose). ICU
(International Components for Unicode) has the normilization code. It uses a
binary data about 100 Kb to do the mapping. We need investigation if we can use
this for mozilla.


Hotta-san, you can try http://zq--wgv71a119e.dready.org/ (the nameprepped
version of <kanji-nihongo>.dready.org).
Depends on: 12063
It works if only no ASCII is feeded to the punycode encoder. 
I mean "<kanji-nihongo>" part but input "<kanji-nihongo>.dready.org" produces a
different result, ".dready.org-tj8wq5jvv8nw". 
Is there a spec about how to separate a host name and apply punycode?
Quoted from http://www.ietf.org/internet-drafts/draft-ietf-idn-nameprep-10.txt

> Nameprep is used to process domain name labels, not domain names. IDNA
> calls nameprep for each label in a domain name, not for the whole domain
> name.

Just split the domain name at U+002E to get the labels.
A prefix "zq--" is used in http://zq--wgv71a119e.dready.org/.

http://www.ietf.org/internet-drafts/draft-ietf-idn-idna-10.txt
section 5 is for ACE prefix which lists several prefixes but does not explain each.
Which prefix do we need to prepend to the encoded string?
JPNIC released IDN XPCOM patch for Mozilla-1.0 libnecko.

  http://www.nic.ad.jp/ja/idn/mdnkit/download/misc/mozilla-1.0-mdnkit-2.4.diff

Please refer to my previous message to apply it and compile.

  http://bugzilla.mozilla.org/show_bug.cgi?id=112979#c13

BTW, "zq--" is implicitly used for Punycode(AMC-ACE-Z)'s testing prefix.
Only "bq--" for RACE and "dq--" for DUDE were registered for testing by IDN WG.

  http://www.i-d-n.net/#ace_registry
Sorry, U+002E is not the only label separator. Section 3.1 in 
http://www.ietf.org/internet-drafts/draft-ietf-idn-idna-10.txt
says:

> 1) Whenever dots are used as label separators, the following characters
> MUST be recognized as dots: U+002E (full stop), U+3002 (ideographic full
> stop), U+FF0E (fullwidth full stop), U+FF61 (halfwidth ideographic full
> stop).
I see, I was wondering why those are not found in the string prep table.
Target Milestone: mozilla1.2alpha → ---
darin, could you review the patch?
Comment on attachment 100615 [details] [diff] [review]
Adding empty IDN implementation to make actual implementation done easily.

sr=darin
Attachment #100615 - Flags: superreview+
Comment on attachment 100615 [details] [diff] [review]
Adding empty IDN implementation to make actual implementation done easily.

r=shanjian
Attachment #100615 - Flags: review+
the patch was checked in to the trunk
the bug is still open since I only checked in the empty implementation
Depends on: 8275
No longer depends on: 12063
Blocks: 92109
Target Milestone: --- → mozilla1.3alpha
Updating list of propose standards:

 - draft-ietf-idn-idna-13.txt: Internationalizing Domain Names In
   Applications (IDNA) (Proposed Standard)

 - draft-ietf-idn-punycode-03.txt: Punycode:A Bootstring encoding
   of Unicode for IDNA (Proposed Standard)  

 - draft-ietf-idn-nameprep-11.txt: Nameprep: A Stringprep Profile
   for Internationalized Domain Names (Proposed Standard)

 - draft-hoffman-stringprep-07.txt: Preparation of
   Internationalized Strings ('stringprep') (Proposed Standard)

For status see http://www.ietf.org/IESG/status.html
Blocks: 162571
The code was taken from
http://www.ietf.org/internet-drafts/draft-ietf-idn-punycode-03.txt

ACE prefix can be specified by pref "network.IDN_prefix". But I think currently
no prefix is assigned for punycode.
Attachment #102590 - Attachment is obsolete: true
FYI, just received this email that the IDN IDs have been approved
as Proposed Standards by the IESG:

Subject: [idn] Protocol Action: Internationalizing Domain Names In
         Applications (IDNA) to Proposed Standard
Date: Thu, 24 Oct 2002 14:51:18 -0400
From: The IESG <iesg-secretary@ietf.org>

The IESG has approved publication of the following Internet-Drafts
as Proposed Standards:

o Internationalizing Domain Names In Applications (IDNA)
  as a Proposed Standard.  
o Nameprep: A Stringprep Profile for Internationalized Domain Names  
o Punycode: An encoding of Unocode for use with IDNA
According to "Internationalizing Domain Names In Applications (IDNA)",
http://www.ietf.org/internet-drafts/draft-ietf-idn-idna-14.txt
>The ACE prefix for IDNA is "IESG--".

Attachment #104034 - Attachment is obsolete: true
Changed QA contact to ylong@netscape.com.
QA Contact: teruko → ylong
Attached patch Updated JPNIC license. (obsolete) — Splinter Review
Attachment #104044 - Attachment is obsolete: true
Target Milestone: mozilla1.3alpha → mozilla1.3beta
Attachment #108818 - Flags: review?(ftang)
RFC 3454: Preparation of Internationalized Strings ("stringprep")
http://www.ietf.org/rfc/rfc3454.txt
Blocks: 188215
Blocks: 188218
Attached patch JPNIC source files (obsolete) — Splinter Review
separated the last patch into two, JPNIC code and the xpcom glue
Attachment #108818 - Attachment is obsolete: true
Attached patch xpcom glue code (obsolete) — Splinter Review
Attachment #110979 - Flags: review?(ftang)
idnkit-1.0pr2
http://www.nic.ad.jp/ja/idn/mdnkit/download/
Attachment #110978 - Attachment is obsolete: true
Attachment #110979 - Flags: review?(ftang) → review+
Comment on attachment 110979 [details] [diff] [review]
xpcom glue code

This is xpcom glue for the JPNIC IDN code (which also needs your review later).
Attachment #110979 - Flags: superreview?(darin)
Yoneya san,
I started to look at the nameprep implementation. There are three mapping tables
and some macros are defined. I can see that it generates an offset from the
input code point and map to one or more code points, but not clear about the detail.
Is it possible to put some comments in nameprep_template.c or nameprepdata.c to
explain about those tables?
Comment on attachment 110979 [details] [diff] [review]
xpcom glue code

nhotta: why do you need to make these changes to the DNS service?  why can't
everything live behind nsIIDNService?  why can't this service be implemented
inside mozilla/intl/?
in fact, nsDnsService should only need to deal with ASCII hostnames.  nsIURI
implmentations (such as nsStandardURL) should provide the mapping from UTF-8
hostnames to ACE hostnames (see nsIURI::asciiHost).
The patch implements nsIIDNService. Unicode normalization is implemented in
mozilla/intl (bug 8275), the stringprep and ACE implementation are specific to
IDN so implemented in nsIIDNService. 
Do you mean we should implement stringprep and ACE in nsIURI then nsIIDNService
to call them?
nhotta: i'm sorry for the confusion.  i misread your patch and thought you were
modifying the DNS service directly.  modifying nsIDNService is completely
correct.  sorry about that!! ;-)
Comment on attachment 110979 [details] [diff] [review]
xpcom glue code

>+CSRCS     = race.c  \
>+            nameprep.c \
>+            punycode.c

so these files are going to be added to necko as well?


>Index: dns/src/nsIDNService.cpp

>+  nsCOMPtr<nsIPref> prefs(do_GetService(NS_PREF_CONTRACTID, &rv));
>+  if (NS_SUCCEEDED(rv)) {
>+
>+    // to support test environment
>+    PRBool value;
>+    rv = prefs->GetBoolPref("network.IDN_testbed", &value);
>+    if (NS_SUCCEEDED(rv))
>+      mMltbd = value;
>+
>+    // read prefix from pref
>+    nsXPIDLCString prefix;
>+    rv = prefs->GetCharPref("network.IDN_prefix", getter_Copies(prefix));
>+    if (NS_SUCCEEDED(rv) && 
>+        prefix.Length() == kACEPrefixLen)
>+      strncpy(mACEPrefix, prefix.get(), kACEPrefixLen);
>+  }

mMltbd seems like a terribly cryptic name.  can you come up with something
better?  also it looks as if mMltbd will be uninitialized if you fail to
access the prefs service.  and, you should not use nsIPref in new code.  use
nsIPrefService/nsIPrefBranch instead.

> NS_IMETHODIMP nsIDNService::ConvertUTF8toACE(const char *input, char **_retval)
> {
>+  nsAutoString ustr;
>+
>+  ustr.Assign(NS_ConvertUTF8toUCS2(input));

replace this with the following:

    NS_ConvertUTF8toUCS2 ustr(input);

doing so eliminates one buffer copy.


>+      ace.Append(encodedBuf);
>+      ace.Append('.');

nit: 
	ace.Append(encodedBuf + NS_LITERAL_CSTRING("."));

each time you call Append on a string you are potentially calling malloc,
so it is worthwhile to coalesce calls to Append in this manner especially
from within a loop.


>+  *_retval = ToNewCString(ace);

the methods on nsIIDNService should actually take AUTF8String and ACString
types instead of "string" types.  as it currently exists the interface is
not scriptable because it uses "string" to represent UTF8 data.  xpconnect
will fail to convert UCS2 to UTF8 before calling ConvertUTF8toACE.  of
course the methods are currently marked |noscript| because of this fact.
we should file a bug to fix the interface or if you have time it would be
great if you could fix the interface now.  doing so would also cut down on
the number of buffer copies since you should not have to call ToNewCString
at the end of each method.

here's what the interface should look like:

interface nsIIDNService : nsISupports
{
  ACString    convertUTF8toACE(in AUTF8String input);
  AUTF8String convertACEtoUTF8(in ACString input);
};

also, we probably need to have a method on this interface to test whether
an ASCII string should be converted to UTF8 using convertACEtoUTF8.  this
method would be called by nsStandardURL in order to provide UTF8 to the
caller of nsIURI::GetHost.


>+static nsresult punycode(const char* prefix, const nsAString& in, nsACString& out)
...
>+  out.Assign(prefix);
>+  out.Append(encodedBuf);

nit:
    out.Assign(nsDependentCString(prefix) + nsDependentCString(encodedBuf));


>+static nsresult encodeToRACE(const char* prefix, const nsAString& in, nsACString& out)
...
>+  out.Assign(prefix);
>+  out.Append(encodedBuf);

nit:
    out.Assign(nsDependentCString(prefix) + nsDependentCString(encodedBuf));


>+nsresult nsIDNService::encodeToACE(const nsAString& in, nsACString& out)
>+{
>+  // RACE encode is supported for existing testing environment
>+  if (!strcmp("bq--", mACEPrefix))
>+    return encodeToRACE(mACEPrefix, in, out);
>+  // else do punycoce
>+  else
>+    return punycode(mACEPrefix, in, out);
>+}

else after return statement is meaningless.  do this instead:

 {
   if (...)
     return encodeToRACE(...);

   return punycode(...);
 }



>+
>+nsresult nsIDNService::stringPrepAndACE(const nsAString& in, nsACString& out)
>+{
>+  nsresult rv = NS_OK;
>+
>+  out.Truncate();
>+
>+  if (in.Length() >= kMaxDNSNodeLen) {
>+    NS_ERROR("IDN node too large");
>+    return NS_ERROR_FAILURE;
>+  }
>+
>+  if (IsASCII(in))
>+    out.Append(NS_LossyConvertUCS2toASCII(in));

nit:
      CopyASCIItoUCS2(in, out);

this is better since it eliminates the intermediate conversion buffer.


>+    nsCAutoString encodedBuf;
>+    rv = encodeToACE(strPrep, encodedBuf);
>+    out.Append(encodedBuf);

why don't you just do this:

      rv = encodeToACE(strPrep, out);

thereby eliminating the intermediate buffer "encodedBuf"
Attachment #110979 - Flags: superreview?(darin) → superreview-
>+CSRCS     = race.c  \
>+            nameprep.c \
>+            punycode.c

> so these files are going to be added to necko as well?

yes nameprep.c implements RFC 3454 and punycode is going to be a standard,
race.c can be removed after punycode becomes standard

>the methods on nsIIDNService should actually take AUTF8String and ACString
>types instead of "string" types. 

let me do the change separately (filed as bug 188410)


Attachment #110979 - Attachment is obsolete: true
Attachment #111093 - Flags: superreview?(darin)
Comment on attachment 111093 [details] [diff] [review]
xpcom glue code (address reviewer's comments)

>Index: nsIDNService.cpp

> nsIDNService::nsIDNService()
...
>+      if (NS_SUCCEEDED(rv))
>+        mMultilingualTestBed = value;

mMultilingualTestBed will be uninitialized if, for example,
the pref service is not available.  shouldn't you at least
initialize it to some sane value?

fix that and sr=darin
Attachment #111093 - Flags: superreview?(darin) → superreview+
Attachment #111097 - Flags: review+
Comment on attachment 111097 [details] [diff] [review]
JPNIC code based on idnkit-1.0pr2 (added comment about the mapping table)

This is from idnkit-1.0pr2.
http://www.nic.ad.jp/ja/idn/mdnkit/download/

punycode is from the internet draft
http://www.ietf.org/internet-drafts/draft-ietf-idn-punycode-03.txt
Attachment #111097 - Flags: superreview?(darin)
Attachment #110982 - Flags: review+
Comment on attachment 110982 [details] [diff] [review]
JPNIC nameprep data from idnkit-1.0pr2

This is from idnkit-1.0pr2, mapping table for RFC 3454,
http://www.nic.ad.jp/ja/idn/mdnkit/download/
Attachment #110982 - Flags: superreview?(darin)
Comment on attachment 110982 [details] [diff] [review]
JPNIC nameprep data from idnkit-1.0pr2

sr=darin
Attachment #110982 - Flags: superreview?(darin) → superreview+
Comment on attachment 111097 [details] [diff] [review]
JPNIC code based on idnkit-1.0pr2 (added comment about the mapping table)

sr=darin
Attachment #111097 - Flags: superreview?(darin) → superreview+
by how much does this IDN support code bloat necko.dll?  given that IDN is not
all that common, it might be worth it to break this out into its own library.
In today's debug build on my machine, necko.dll is 1321k without the patch and
1365k after the patch.

I am going to check in the JPNIC code without hooking up to the build. The glue
code has to wait for Unicode normalizer (bug 8275).
nhotta: the size change in an optimized build would be more interesting.  since
a good portion of the increase due to this patch is the large nameprep table, it
stands to reason that much of the 45k increase would be transfered to an
optimized build.  and, last i checked the optimized size of necko.dll is roughly
450k, so this change potentially increases the size of necko.dll by 10%.  if
true, then that is not a trivial increase.  perhaps it might make sense to move
the nsIDNService into necko2.dll, which contains the less frequently used
networking code.  at any rate, i just think this is something that should be
investigated.
>perhaps it might make sense to move the nsIDNService into necko2.dll, 
>which contains the less frequently used networking code.
I think that make sense at least for now.
darin, how can we detach nsIDNService from necko.dll?
Do we need to make a new directory like netwerk/idn and move the files from
netwerk/dns because we first need to detach it from neckodns_s.lib?
nhotta: right something like that would be required.  but before going to all
that trouble, can you determine if this is even necessary?  i'm only suggesting
that it might be a good idea, but without actually comparing optimized builds
it's nearly impossible to know for sure.
I disabled debug, ac_add_options --disable-debug.
necko.dll without IDN is 696k
necko.dll with IDN is 732k

so, difference is 36k. 
don't forget making a new DLL has the 10k disk / 32k runtime footprint overhead.
perhaps this belongs in necko2 so we don't take the hit of another dll?
alec, the current patch is adding to necko.dll which increase 36k for non debug
build (see comment #65). Do you think that is okay? Or do we want to put IDN to
necko2.dll?
I don't know enough about how much IDN is used to know where we should put it -
necko2 is no longer in the default embedding client (though embeddors are still
welcome to add it themselves) - so the question is, is it safe to leave out of
the default embedding builds?
IDN has to be included as default. It is going to be a standard soon.
Let me proceed with the current approach. We can do separate dll later if 36k is
a problem and the change should be a mechanical one.
nhotta: did your non-debug build include the "--enable-optimize" flag?  if not,
then 36k might be an overestimate which is good :)

i'm starting to agree more and more with just keeping this in necko.dll.  let's
go for that for now.
>nhotta: did your non-debug build include the "--enable-optimize" flag?
No, I didn't. With that flag,
necko.dll without IDN is 428k
necko.dll with IDN is 460k

32k is added by IDN
it isn't many hours since i built from CVS, and after re-building now, including
this checkin, i suddenly see HTTP headers in top of content area at 
http://www.aftenposten.no
Please ignore comment 72 - people see this with older builds too. Site error i think
Is this related to bug 157388, maybe even a duplicate? Or are the .nu domains
problems with letters like זרו etc. a different case?
sorry for the previous spam. Actually I thought I was standing on another bug
when submitting. Maybe still a bit related, but this one seems very technical ;-)
checked in to the trunk on 1/17/2003
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Attachment #108818 - Flags: review?(ftang)
RFC update.  In addtion to:
  - stringprep: RFC 3454 (http://www.ietf.org/rfc/rfc3454.txt)

3 more RFCs have been finalized:

  - IDNA: RFC 3490 (http://www.ietf.org/rfc/rfc3490.txt)
  - nameprep: RFC 3491 (http://www.ietf.org/rfc/rfc3491.txt)
  - punycode: RFC 3492 (http://www.ietf.org/rfc/rfc3492.txt)
Mark as verified since IDN works on trunk build by following the instruction in
comment #76, and the we have bugs for remain problems.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: