Remove MOZ_UTF16() macro in favor of using u"string" directly

RESOLVED FIXED in Firefox 50

Status

()

defect
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: xidorn, Assigned: cpeterson)

Tracking

unspecified
mozilla50
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox50 fixed)

Details

Attachments

(4 attachments)

We are currently using MOZ_UTF16() macro to workaround the difference between VC compiler (which uses L"") and other compilers (which accepts u""). This macro can be removed once we drop the support of VS2013, because VS2015 supports u"string" directly.
I have a WIP patch for this bug.
Assignee: nobody → cpeterson
Blocks: 904985, 796948
Depends on: 928351
Part 1: Now that we dropped VS2013 support in bug 1186064, use VS2015's real char16_t type instead of aliasing wchar_t. Also remove MOZ_CHAR16_IS_NOT_WCHAR #ifdefs because char16_t is no longer ever wchar_t.

Note that this change means MSVC Windows builds now use the `char16ptr` workaround added for mingw in bug 928351. I looked into removing char16ptr, replacing its implicit char16_t*/wchar_t* type conversations with explicit casts in Windows platform code, but the changes would affect a surprising amount of code.
Attachment #8772274 - Flags: review?(jwalden+bmo)
Part 2: Expand MOZ_UTF16() strings to u"" string literals. This patch is mostly a find-and-replace of MOZ_UTF16("…") to u"…" with some indenting fixes.
Attachment #8772275 - Flags: review?(jwalden+bmo)
Part 3: Expand MOZ_UTF16() characters to u'' character literals. This patch is mostly a find-and-replace of MOZ_UTF16('…') to u'…' with some indenting fixes.

Curiously, SpiderMonkey's wasm code is the only code in mozilla-central that uses char16_t character literals!
Attachment #8772276 - Flags: review?(luke)
Part 4: Remove all remaining uses of MOZ_UTF16(). These uses are mostly within other macro definitions like NS_LITERAL_STRING and NS_STATIC_ATOM_BUFFER.

This patch relies on preprocessor string concatenation of `u"x" "y"` producing `u"xy", thus upgrading "y" to a char16_t string literal. The alternative is a huge set of changes for things like `NS_LITERAL_STRING("…")` to `NS_LITERAL_STRING(u"…")` and changing every HTML5_ATOM() definition in parser/html/nsHtml5AtomList.h, which has other fallout. Note that a few NS_LITERAL_STRING() callers must be changed when their string literal parameter actually contains escaped Unicode characters like "\x2026".

Is removing the MOZ_UTF16() macro worth exposing the u"" string concatenation hack to some macro callers? At this point, the only remaining use of Char16.h is the Windows-only `char16ptr` class and some mostly irrelevant static asserts. Maybe keeping MOZ_UTF16() for these few special cases is OK.
Attachment #8772277 - Flags: review?(jwalden+bmo)
A mostly green Try run, save for some Android test timeouts:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b83211cf6d9b
OS: Unspecified → All
Hardware: Unspecified → All
(In reply to Chris Peterson [:cpeterson] from comment #2)
> Note that this change means MSVC Windows builds now use the `char16ptr`
> workaround added for mingw in bug 928351.

MSVC 2015 is already using char16ptr_t.
Comment on attachment 8772276 [details] [diff] [review]
part-3-use-unicode-chars.patch

Review of attachment 8772276 [details] [diff] [review]:
-----------------------------------------------------------------

\o/
Attachment #8772276 - Flags: review?(luke) → review+
Comment on attachment 8772274 [details] [diff] [review]
part-1-use-char16_t.patch

Review of attachment 8772274 [details] [diff] [review]:
-----------------------------------------------------------------

::: mfbt/HashFunctions.h
@@ -290,5 @@
> - * the same width!
> - */
> -#ifdef WIN32
> -MOZ_MUST_USE inline uint32_t
> -HashString(const wchar_t* aStr)

If wchar_t != char16_t everywhere, then these two overloads need to stay around so that HashString(const wchar_t*) and the other keep working.

::: mfbt/NumericLimits.h
@@ +30,4 @@
>  template<>
>  class NumericLimits<char16_t> : public std::numeric_limits<uint16_t>
>  {
>    // char16_t and uint16_t numeric limits should be exactly the same.

std::numeric_limits<char16_t> should exist now, everywhere, I think.  So in a separate bug (or at least patch), try removing 1) this specialization, then 2) NumericLimits entirely (replacing it with std::numeric_limits.  As the comment above suggests, NumericLimits existed basically only for char16_t emulation support.  Now that it's never emulated, I think NumericLimits{,.h} can just go away completely.

::: mfbt/TypeTraits.h
@@ -97,5 @@
>  template<> struct IsIntegralHelper<long long>          : TrueType {};
>  template<> struct IsIntegralHelper<unsigned long long> : TrueType {};
>  template<> struct IsIntegralHelper<bool>               : TrueType {};
> -template<> struct IsIntegralHelper<wchar_t>            : TrueType {};
> -#ifdef MOZ_CHAR16_IS_NOT_WCHAR

If "char16_t is no longer ever wchar_t", then you should have specializations for both char16_t and wchar_t here -- you shouldn't be removing the latter.

@@ -106,5 @@
>  
>  /**
>   * IsIntegral determines whether a type is an integral type.
>   *
>   * mozilla::IsIntegral<int>::value is true;

This comment has a "Note that the behavior of IsIntegral on char16_t and char32_t is unspecified." that you should remove.

@@ -443,5 @@
>  template<> struct IsPod<bool>               : TrueType {};
>  template<> struct IsPod<float>              : TrueType {};
>  template<> struct IsPod<double>             : TrueType {};
> -template<> struct IsPod<wchar_t>            : TrueType {};
> -#ifdef MOZ_CHAR16_IS_NOT_WCHAR

Ditto.
Attachment #8772274 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8772275 [details] [diff] [review]
part-2-use-unicode-strings.patch

Review of attachment 8772275 [details] [diff] [review]:
-----------------------------------------------------------------

Quasi-rs= here, I skimmed things super-quickly.
Attachment #8772275 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8772277 [details] [diff] [review]
part-4-remove-MOZ_UTF16.patch

Review of attachment 8772277 [details] [diff] [review]:
-----------------------------------------------------------------

::: parser/htmlparser/nsHTMLTags.cpp
@@ +15,5 @@
>  using namespace mozilla;
>  
>  // static array of unicode tag names
> +#define HTML_TAG(_tag, _classname) (u"" #_tag),
> +#define HTML_HTMLELEMENT_TAG(_tag) (u"" #_tag),

I...am not sure, but you might need to pass this by hsivonen, as I can't remember whether this is in the generated-from-Java part of the HTML parser or not.

::: widget/windows/TSFTextStore.cpp
@@ +834,5 @@
>      //      Therefore, we need to check the langid too.
>      return mLangID == 0x411 &&
>        (mActiveTIPKeyboardDescription.EqualsLiteral("Microsoft IME") ||
>         mActiveTIPKeyboardDescription.Equals(
> +         NS_LITERAL_STRING(u"Microsoft \xC785\xB825\xAE30")) ||

Not sure why the NS_LITERAL_STRING for this stuff when I thought we had EqualsLiteral that would take the u"" string directly, but I guess that's how it was before.
Attachment #8772277 - Flags: review?(jwalden+bmo) → review+
> > -HashString(const wchar_t* aStr)
> 
> If wchar_t != char16_t everywhere, then these two overloads need to stay
> around so that HashString(const wchar_t*) and the other keep working.

> ::: mfbt/TypeTraits.h
> > -template<> struct IsIntegralHelper<wchar_t>            : TrueType {};
> > -#ifdef MOZ_CHAR16_IS_NOT_WCHAR
> 
> If "char16_t is no longer ever wchar_t", then you should have
> specializations for both char16_t and wchar_t here -- you shouldn't be
> removing the latter.

No one was using the HashString(const wchar_t*) or IsIntegralHelper<wchar_t> overloads. I can add them back for completeness, if you prefer, but I assume we want to discourage people from using wchar_t considering its size varies across platforms.


> ::: mfbt/NumericLimits.h
> std::numeric_limits<char16_t> should exist now, everywhere, I think.  So in
> a separate bug (or at least patch), try removing 1) this specialization,
> then 2) NumericLimits entirely (replacing it with std::numeric_limits.  As
> the comment above suggests, NumericLimits existed basically only for
> char16_t emulation support.  Now that it's never emulated, I think
> NumericLimits{,.h} can just go away completely.

OK. I'll follow up in a separate bug for NumericLimits.h.
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #11)
> ::: parser/htmlparser/nsHTMLTags.cpp
> @@ +15,5 @@
> >  using namespace mozilla;
> >  
> >  // static array of unicode tag names
> > +#define HTML_TAG(_tag, _classname) (u"" #_tag),
> > +#define HTML_HTMLELEMENT_TAG(_tag) (u"" #_tag),
> 
> I...am not sure, but you might need to pass this by hsivonen, as I can't
> remember whether this is in the generated-from-Java part of the HTML parser
> or not.

parser/htmlparser/ is the legacy HTML parser (for about:blank). The generated-from-Java HTML parser is in parser/html/.
Pushed by cpeterson@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6e99af8e9fa4
Part 1: Use VS2015's real char16_t instead of aliasing wchar_t. r=Waldo
https://hg.mozilla.org/integration/mozilla-inbound/rev/282f7afd6765
Part 2: Expand MOZ_UTF16() strings to u"" string literals. r=Waldo
https://hg.mozilla.org/integration/mozilla-inbound/rev/4e02a5ff49f2
Part 3: Expand MOZ_UTF16() characters to u'' char16_t literals. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/dfd60b58fa64
Part 4: Remove MOZ_UTF16() macro. r=Waldo
Blocks: 1288355
Blocks: 1288603
(In reply to Chris Peterson [:cpeterson] from comment #5)
> Is removing the MOZ_UTF16() macro worth exposing the u"" string
> concatenation hack to some macro callers?

Why didn't port the u##s trick to former MOZ_UTF16() macro callers? (Sorry, I should have pointed out it faster.)
Flags: needinfo?(cpeterson)
(In reply to Masatoshi Kimura [:emk] from comment #16)
> (In reply to Chris Peterson [:cpeterson] from comment #5)
> > Is removing the MOZ_UTF16() macro worth exposing the u"" string
> > concatenation hack to some macro callers?
> 
> Why didn't port the u##s trick to former MOZ_UTF16() macro callers?

Why did I replace `MOZ_UTF16(NS_IOSERVICE_ONLINE)` (as one example) with `(u"" NS_IOSERVICE_ONLINE)` instead of `u##NS_IOSERVICE_ONLINE`?

Without the additional macro expansion of MOZ_UTF16() and MOZ_UTF16_HELPER(), `u##NS_IOSERVICE_ONLINE` is expanded to the bogus token `uNS_IOSERVICE_ONLINE` instead of the expected string `u"online"`. `u"" NS_IOSERVICE_ONLINE`, on the other hand, will be expanded to `u"" "online"` which the compiler will concatenate to the expected string `u"online"`.
Flags: needinfo?(cpeterson)
You need to log in before you can comment on or make changes to this bug.