Closed Bug 928351 Opened 6 years ago Closed 6 years ago

Mixing char16_t and wchar_t broke mingw builds

Categories

(Core :: String, defect)

All
Windows XP
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla28

People

(Reporter: jacek, Assigned: jacek)

References

(Blocks 1 open bug)

Details

(Whiteboard: [qa-])

Attachments

(2 files, 6 obsolete files)

From bug 895047:

 G++ follows standards so it has char16_t, which is different from wchar_t. It would require having tons of casts between win32 API, which uses wchar_t extensivelly, and the rest of code using char16_t.

Looking at the solution for MSVC from the patch:

#  define char16_t wchar_t

I'm pretty sure that's not a good idea to maintain. Trying the same with g++ results in having tons of redefinitions for types specialized both for wchar_t and char16_t. This will also cause similar problems on MSVC once (if) it will have proper char16_t support.
What redefinition errors do you get?
Component: General → String
Compiling with the same define that is used on MSVC results in errors like:

In file included from /usr/local/i686-w64-mingw32/include/c++/4.8.2/bits/stl_algobase.h:61:0,
                 from /usr/local/i686-w64-mingw32/include/c++/4.8.2/bits/char_traits.h:39,
                 from /usr/local/i686-w64-mingw32/include/c++/4.8.2/string:40,
                 from /home/jacek/mozilla/mozilla-central/mfbt/decimal/Decimal.h:44,
                 from /home/jacek/mozilla/mozilla-central/mfbt/decimal/Decimal.cpp:31:
/usr/local/i686-w64-mingw32/include/c++/4.8.2/bits/cpp_type_traits.h:184:12: error: redefinition of ‘struct std::__is_integer<wchar_t>’
     struct __is_integer<char16_t>
            ^
/usr/local/i686-w64-mingw32/include/c++/4.8.2/bits/cpp_type_traits.h:175:12: error: previous definition of ‘struct std::__is_integer<wchar_t>’
     struct __is_integer<wchar_t>
            ^

Relevant parts of cpp_type_traits.h are:

# ifdef _GLIBCXX_USE_WCHAR_T
  template<>
    struct __is_integer<wchar_t>
    {
      enum { __value = 1 };
      typedef __true_type __type;
    };
# endif

#if __cplusplus >= 201103L
  template<>
    struct __is_integer<char16_t>
    {
      enum { __value = 1 };
      typedef __true_type __type;
    };

Note that with char16_t defined to wchar_t, this is indeed a duplication.
Component: String → General
(Please let's keep this in Core::String.)

Can't we just #include <bits/c++config.h> and then #undef _GLIBCXX_USE_WCHAR_T?
Component: General → String
I tried that and got similar error about atomic<> redeclaration:

  /// Explicit specialization for wchar_t.
  template<>
    struct atomic<wchar_t> : public atomic_wchar_t
    {
      typedef wchar_t                   __integral_type;
      typedef atomic_wchar_t            __base_type;

      atomic() noexcept = default;
      ~atomic() noexcept = default;
      atomic(const atomic&) = delete;
      atomic& operator=(const atomic&) = delete;
      atomic& operator=(const atomic&) volatile = delete;

      constexpr atomic(__integral_type __i) noexcept : __base_type(__i) { }

      using __base_type::operator __integral_type;
      using __base_type::operator=;
    };

  /// Explicit specialization for char16_t.
  template<>
    struct atomic<char16_t> : public atomic_char16_t
    {
      typedef char16_t                  __integral_type;
      typedef atomic_char16_t           __base_type;

      atomic() noexcept = default;
      ~atomic() noexcept = default;
      atomic(const atomic&) = delete;
      atomic& operator=(const atomic&) = delete;
      atomic& operator=(const atomic&) volatile = delete;

      constexpr atomic(__integral_type __i) noexcept : __base_type(__i) { }

      using __base_type::operator __integral_type;
      using __base_type::operator=;
    };
(In reply to comment #4)
> I tried that and got similar error about atomic<> redeclaration:

Shoot...  Is that guarded by a macro?  Who #includes <atomic>?
Before I could reproduce the atomic error today, I hit hash<> redeclaration similar to previous ones, which is also not guarded by a macro. It's very early in the build process, so many more problems will probably show later. I do not think that trying to hack around all those is possible without changes to GCC that would disable char16_t type or make it an alias to char16_t (for which a patch could be rejected by GCC anyway, because that's not what spec says).

How about simply using a new type like char16 or Char16_t in Mozilla? I understand that using standard types is nice, but, as you can see, the standard type does not meet requirements. Those tricks for MSVC while may work, change char16_t to something it's not. This way, char16_t is not really char16_t (not in Windows builds) and calling it char16_t is missleading. Could we be explicit about this being a new type?
I'm not sure I understand. We do the #define for MSVC because it doesn't yet have a standard char16_t. But since mingw does already have that, why wouldn't we use it?
Windows API uses wchar_t in many places and we'd need casts for all of them (it's mostly for pointers, you can't pass char16_t* as wchar_t* argument).
We discussed this on IRC today at length.  I suggested trying to redefine WCHAR to char16_t in mingw builds, and Jacek said he'll try doing that.
I tried that and it's not enough. WCHAR is used in platform headers, but crt ones use wchar_t, so there are new conflicts (because wchar_t != WCHAR). I keep trying to work around that.
After more investigation, I decided that trying to make wchar_t=char16_t is not possible and it's better to try make the code right. Changing ns*String classes to accept both wchar_t and char16_t helps a lot. This, combined with a helper char16ptr_t class that I introduced to have a type that can be implicitly casted to both const char16_t* and const wchar_t* (on other build configs that's just |typedef char16_t char16ptr_t|), makes required changes significally smaller. Still, there are tons of places that need simple changes and casts. I have most of the tree compiling already, but it's still quite a lot of work to do.
(In reply to comment #11)
> After more investigation, I decided that trying to make wchar_t=char16_t is not
> possible and it's better to try make the code right. Changing ns*String classes
> to accept both wchar_t and char16_t helps a lot. This, combined with a helper
> char16ptr_t class that I introduced to have a type that can be implicitly
> casted to both const char16_t* and const wchar_t* (on other build configs
> that's just |typedef char16_t char16ptr_t|), makes required changes
> significally smaller. Still, there are tons of places that need simple changes
> and casts. I have most of the tree compiling already, but it's still quite a
> lot of work to do.

Can you please upload your WIP patch so that we can give you feedback on it?

Thanks!
For bug 514173 I'd rather not have to deal with anything other than typeof(MOZ_UTF16("")) == nsAString::char_type[1] etc.
Attached patch WIP (obsolete) — Splinter Review
Sure, I'm attaching the parts that do tricks I mentioned earlier. There are still tons of casts needed and this could probably be improved. char16ptr_t may be an unfortunate name... This type should act as close to |const char26_t*| as possible, but accept coth wchar_t and char16_t. Also, it could be useful to have another such type for non-const pointers. There are still some constructs where it can't be a transparent replacement for |const char16_t|... I have more changes in my tree, including tons of type changes, but this is the most important part to limit number of required casts. Feedback is appreciated.
Comment on attachment 822322 [details] [diff] [review]
WIP

Review of attachment 822322 [details] [diff] [review]:
-----------------------------------------------------------------

Your patch doesn't show where else char16ptr_t is used, but why can't we just modify nsTString.h and nsTSubstring.h to accept wchar_t as well as char16_t on mingw?

::: mfbt/Char16.h
@@ +68,5 @@
>  
> +#ifdef MOZ_USE_CHAR16_WRAPPER
> +  class char16ptr_t {
> +  private:
> +    const char16_t *mPtr;

Nit: we don't use this kind mMember naming convention most of the times.

@@ +69,5 @@
> +#ifdef MOZ_USE_CHAR16_WRAPPER
> +  class char16ptr_t {
> +  private:
> +    const char16_t *mPtr;
> +    typedef decltype(sizeof(int)) size_type;

Why this and not size_t?

@@ +74,5 @@
> +  public:
> +    char16ptr_t(const char16_t *ptr) : mPtr(ptr) {}
> +    char16ptr_t(const wchar_t *ptr) : mPtr(reinterpret_cast<const char16_t*>(ptr)) {}
> +
> +    constexpr char16ptr_t(decltype(nullptr)) : mPtr(nullptr) {}

Can't you just make the ptr argument to the first ctor default to nullptr?

@@ +76,5 @@
> +    char16ptr_t(const wchar_t *ptr) : mPtr(reinterpret_cast<const char16_t*>(ptr)) {}
> +
> +    constexpr char16ptr_t(decltype(nullptr)) : mPtr(nullptr) {}
> +
> +    operator bool() const {

This is bad, see <http://www.artima.com/cppsource/safebool.html>.

@@ +82,5 @@
> +    }
> +    operator const char16_t *() const {
> +      return mPtr;
> +    }
> +    operator const void *() const {

This is not needed.

@@ +88,5 @@
> +    }
> +    operator const wchar_t *() const {
> +      return reinterpret_cast<const wchar_t*>(mPtr);
> +    }
> +    explicit operator char16_t*() const {

Why are the non-const pointer conversion operators needed?

@@ +94,5 @@
> +    }
> +    explicit operator wchar_t*() const {
> +      return const_cast<wchar_t*>(static_cast<const wchar_t*>(*this));
> +    }
> +    explicit operator void*() const {

This shouldn't be needed.

@@ +97,5 @@
> +    }
> +    explicit operator void*() const {
> +      return const_cast<char16_t*>(mPtr);
> +    }
> +    bool operator==(const char16ptr_t &x) const {

I'm not sure why you're adding all of these other operators.

@@ +120,5 @@
> +      return mPtr[i];
> +    }
> +    };
> +
> +#else

We should try to not put anything into this branch of the #if.
Comment on attachment 822322 [details] [diff] [review]
WIP

Review of attachment 822322 [details] [diff] [review]:
-----------------------------------------------------------------

::: mfbt/Char16.h
@@ +88,5 @@
> +    }
> +    operator const wchar_t *() const {
> +      return reinterpret_cast<const wchar_t*>(mPtr);
> +    }
> +    explicit operator char16_t*() const {

If I'm not mistaken, MSVC doesn't support explicit operators.

@@ +112,5 @@
> +    }
> +    char16ptr_t operator+(size_type add) const {
> +      return char16ptr_t(mPtr+add);
> +    }
> +    long operator-(const char16ptr_t &other) const {

That is ptrdiff_t, not long.

::: xpcom/string/public/nsTString.h
@@ +72,5 @@
>        self_type& operator=( char_type c )                                                       { Assign(c);        return *this; }
>        self_type& operator=( const char_type* data )                                             { Assign(data);     return *this; }
>        self_type& operator=( const self_type& str )                                              { Assign(str);      return *this; }
> +#if defined(CharT_is_PRUnichar) && defined(MOZ_USE_CHAR16_WRAPPER)
> +      self_type& operator=( char16ptr_t data )                                                  { Assign(static_cast<char16_t>(data));     return *this; }

Should it not be static_cast<char16_t*>?

I also feel like the declaration should be operator=( const char16ptr_t data );
(In reply to :Ehsan Akhgari (needinfo? me!) [Away 10/29-11/6] from comment #15)
> @@ +69,5 @@
> > +#ifdef MOZ_USE_CHAR16_WRAPPER
> > +  class char16ptr_t {
> > +  private:
> > +    const char16_t *mPtr;
> > +    typedef decltype(sizeof(int)) size_type;
> 
> Why this and not size_t?

I imagine stddef.h is not included when Char16.h is...
(In reply to Joshua Cranmer [:jcranmer] from comment #16)
> ::: mfbt/Char16.h
> @@ +88,5 @@
> > +    }
> > +    operator const wchar_t *() const {
> > +      return reinterpret_cast<const wchar_t*>(mPtr);
> > +    }
> > +    explicit operator char16_t*() const {
> 
> If I'm not mistaken, MSVC doesn't support explicit operators.

This code is only compiled on MinGW.
OS: All → Windows XP
Thanks for your comments. I'm traveling until tomorrow, I will be back to this later this week.
Addresses most feedback and more changes.
Assignee: nobody → jacek
Attachment #822322 - Attachment is obsolete: true
Contains different changes across the tree (also a few unrelated mingw fixes). With this patch I'm able to build usable Firefox.
(In reply to :Ehsan Akhgari (needinfo? me!) [Away 10/29-11/6] from comment #15)
> Comment on attachment 822322 [details] [diff] [review]
> WIP

Thanks for the review.

> Review of attachment 822322 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Your patch doesn't show where else char16ptr_t is used, but why can't we
> just modify nsTString.h and nsTSubstring.h to accept wchar_t as well as
> char16_t on mingw?

The main motivation was to make get() to return something that may used both by code expecting char16_t and wchar_t. This is not possible without a new type. This is also useful in a few other places.

> ::: mfbt/Char16.h
> @@ +68,5 @@
> >  
> > +#ifdef MOZ_USE_CHAR16_WRAPPER
> > +  class char16ptr_t {
> > +  private:
> > +    const char16_t *mPtr;
> 
> Nit: we don't use this kind mMember naming convention most of the times.

OK, fixed.


> @@ +69,5 @@
> > +#ifdef MOZ_USE_CHAR16_WRAPPER
> > +  class char16ptr_t {
> > +  private:
> > +    const char16_t *mPtr;
> > +    typedef decltype(sizeof(int)) size_type;
> 
> Why this and not size_t?

For that we'd need to include stddef.h here.

> @@ +74,5 @@
> > +  public:
> > +    char16ptr_t(const char16_t *ptr) : mPtr(ptr) {}
> > +    char16ptr_t(const wchar_t *ptr) : mPtr(reinterpret_cast<const char16_t*>(ptr)) {}
> > +
> > +    constexpr char16ptr_t(decltype(nullptr)) : mPtr(nullptr) {}
> 
> Can't you just make the ptr argument to the first ctor default to nullptr?

Without this constructor, using nullptr is ambiguous as it could be casted both to const wchar_t* and const char16_t*.

> @@ +76,5 @@
> > +    char16ptr_t(const wchar_t *ptr) : mPtr(reinterpret_cast<const char16_t*>(ptr)) {}
> > +
> > +    constexpr char16ptr_t(decltype(nullptr)) : mPtr(nullptr) {}
> > +
> > +    operator bool() const {
> 
> This is bad, see <http://www.artima.com/cppsource/safebool.html>.

This class is not meant to be safe of anything like that, this is meant to be as similar to |typedef const char16_t *char16ptr_t| as possible so that there are as few mingw-specific changes as possible. This operator is part of that.

> @@ +82,5 @@
> > +    }
> > +    operator const char16_t *() const {
> > +      return mPtr;
> > +    }
> > +    operator const void *() const {
> 
> This is not needed.

Without that, assigning char16ptr_t to const void* is ambiguous (it could be casted both via const char16_t* and const wchar_t*).

> @@ +88,5 @@
> > +    }
> > +    operator const wchar_t *() const {
> > +      return reinterpret_cast<const wchar_t*>(mPtr);
> > +    }
> > +    explicit operator char16_t*() const {
> 
> Why are the non-const pointer conversion operators needed?

Because you may do casts like: (char16_t*)strptr and it's done in a lot of places in the code. We want to support that here, even if it doesn't look nice. I should probably put quite a few comments explaining that...

> @@ +94,5 @@
> > +    }
> > +    explicit operator wchar_t*() const {
> > +      return const_cast<wchar_t*>(static_cast<const wchar_t*>(*this));
> > +    }
> > +    explicit operator void*() const {
> 
> This shouldn't be needed.

It is to make (void*) casts not ambiguous.

> @@ +97,5 @@
> > +    }
> > +    explicit operator void*() const {
> > +      return const_cast<char16_t*>(mPtr);
> > +    }
> > +    bool operator==(const char16ptr_t &x) const {
> 
> I'm not sure why you're adding all of these other operators.

Because such operators are allowed on pure pointers.

> @@ +120,5 @@
> > +      return mPtr[i];
> > +    }
> > +    };
> > +
> > +#else
> 
> We should try to not put anything into this branch of the #if.

If we have the typedef for other build configs, we may sasely use char16ptr_t wherever we like and it will be no-op on anything other than mingw. See the patch I attached.
(In reply to Joshua Cranmer [:jcranmer] from comment #16)
> @@ +112,5 @@
> > +    }
> > +    char16ptr_t operator+(size_type add) const {
> > +      return char16ptr_t(mPtr+add);
> > +    }
> > +    long operator-(const char16ptr_t &other) const {
> 
> That is ptrdiff_t, not long.

Yes, except we can't use ptrdiff_t here. I used decltype((char*)0-(char*)0), which is ugly but does the job. I'd be glad to have better suggestion.
 
> ::: xpcom/string/public/nsTString.h
> @@ +72,5 @@
> >        self_type& operator=( char_type c )                                                       { Assign(c);        return *this; }
> >        self_type& operator=( const char_type* data )                                             { Assign(data);     return *this; }
> >        self_type& operator=( const self_type& str )                                              { Assign(str);      return *this; }
> > +#if defined(CharT_is_PRUnichar) && defined(MOZ_USE_CHAR16_WRAPPER)
> > +      self_type& operator=( char16ptr_t data )                                                  { Assign(static_cast<char16_t>(data));     return *this; }
> 
> Should it not be static_cast<char16_t*>?
> 
> I also feel like the declaration should be operator=( const char16ptr_t data
> );


Yes, good catch, thanks.
It's time to try to have it landed. This is the most tricky part. I will also attach accessible/ part as an example of how remaining changes looks like and hold other changes until those are done.
Attachment #825938 - Attachment is obsolete: true
Attachment #827945 - Flags: review?(ehsan)
Attached patch accessible/ part (obsolete) — Splinter Review
Attachment #827946 - Flags: review?(ehsan)
Attachment #827945 - Attachment description: patch.diff → Char16.h and xpcom/strings/ part.
Fixed a peoblem found on try. This version succeeded:
https://tbpl.mozilla.org/?tree=Try&rev=48e1eba111d3
Attachment #827945 - Attachment is obsolete: true
Attachment #827945 - Flags: review?(ehsan)
Attachment #828631 - Flags: review?(ehsan)
Attachment #827946 - Flags: review?(ehsan)
Comment on attachment 828631 [details] [diff] [review]
Char16.h and xpcom/strings/ part.

Review of attachment 828631 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, but I would like Benjamin to take a look at this as well.

::: mfbt/Char16.h
@@ +68,5 @@
>  
> +#ifdef MOZ_USE_CHAR16_WRAPPER
> +# include <string>
> +  /**
> +   * Win32 API is extensively uses wchar_t, which is represented by a separated

Nit: s/is //

@@ +73,5 @@
> +   * builtin type than char16_t per spec. It's not the case for MSVC, but GCC
> +   * follows the spec. We want to mix wchar_t and char16_t on Windows builds.
> +   * This class is supposed to make it easier. It stores char16_t const pointer,
> +   * but provides implicit casts for wchar_t as well. On other platforms, we
> +   * simply use |typedef const char16_t char16ptr_t|. Here, we want to make

Nit: char16_t*.

@@ +79,5 @@
> +   * are allowed by the typedef.
> +   */
> +  class char16ptr_t {
> +  private:
> +    const char16_t *ptr;

Your indentation is off here, see mfbt/STYLE.

Also, please add a static_assert here checking to make sure the sizes of the two types (wchar_t/char16_t) is the same.

@@ +84,5 @@
> +  public:
> +    char16ptr_t(const char16_t *ptr) : ptr(ptr) {}
> +    char16ptr_t(const wchar_t *ptr) : ptr(reinterpret_cast<const char16_t*>(ptr)) {}
> +
> +    constexpr char16ptr_t(decltype(nullptr)) : ptr(nullptr) {}

Please add a comment mentioning why this is necessary.
Attachment #828631 - Flags: review?(ehsan)
Attachment #828631 - Flags: review?(benjamin)
Attachment #828631 - Flags: review+
Thanks for the review. Here the patch updated with comments.
Attachment #828631 - Attachment is obsolete: true
Attachment #828631 - Flags: review?(benjamin)
Attachment #829746 - Flags: review?(benjamin)
Comment on attachment 829746 [details] [diff] [review]
Char16.h and xpcom/strings/ part.

I'm basing this on only using char16ptr_t as the result of nsString::get(). I couldn't work out whether what you achieve by using char16ptr_t in other places. In particular for assigning to strings I assumed a char16_t*/wchar_t* overload would suffice.

>+    /* Without this, nullptr assignment would be ambiguous */
>+    constexpr char16ptr_t(decltype(nullptr)) : ptr(nullptr) {}
I don't know why this might be needed. I couldn't find a use in the mozilla-central tree.

>+    operator const void *() const {
>+      return ptr;
>+    }
I see someone passing get() to memcpy. It seems a reasonable thing to do.

>+    operator bool() const {
>+      return ptr != nullptr;
>+    }
get() never returns nullptr, except in the case of an nsXPIDLString or its subclass nsAdoptingString, but in these cases there are better ways of testing for null.

>+    operator std::wstring() const {
>+      return std::wstring(static_cast<const wchar_t*>(*this));
>+    }
Not sure why this is needed, as std::wstring can't be constructed from a const char16_t*.

>+    /* Explicit cast operators to allow things like (char16_t*)str */
>+    explicit operator char16_t*() const {
>+      return const_cast<char16_t*>(ptr);
>+    }
>+    explicit operator wchar_t*() const {
>+      return const_cast<wchar_t*>(static_cast<const wchar_t*>(*this));
>+    }
In theory, nobody should be casting away const from get(). I even found someone casting away const from a .get() and writing to the string. (Fortunately the string was probably not shared, but that's still incorrect Mozilla code that you're wallpapering over.)

>+    explicit operator const char*() const {
>+      return reinterpret_cast<const char*>(ptr);
>+    }
>+    explicit operator const unsigned char*() const {
>+      return reinterpret_cast<const unsigned char*>(ptr);
>+    }
>+    explicit operator unsigned char*() const {
>+      return const_cast<unsigned char*>(reinterpret_cast<const unsigned char*>(ptr));
>+    }
>+    explicit operator void*() const {
>+      return const_cast<char16_t*>(ptr);
>+    }
These look plain wrong, although I found people casting a .get() to various byte types.

>+    /* Some operators used on pointers */
>+    char16_t operator[](size_t i) const {
>+      return ptr[i];
>+    }
Doesn't make sense. Code should just index the nsString directly.

>+    bool operator==(const char16ptr_t &x) const {
>+      return ptr == x.ptr;
>+    }
>+    bool operator==(decltype(nullptr)) const {
>+      return !ptr;
>+    }
>+    bool operator!=(const char16ptr_t &x) const {
>+      return ptr != x.ptr;
>+    }
>+    bool operator!=(decltype(nullptr)) const {
>+      return ptr != nullptr;
>+    }
Doesn't make sense to compare a transient pointer with anything.

>+    char16ptr_t operator+(size_t add) const {
>+      return char16ptr_t(ptr+add);
>+    }
It's unclear whether Mozilla code should be doing this. They should possibly be using Substring or Rebind instead. For instance:
> CopyUTF16toUTF8(fileName.get() + i + 1, fileExt);
should be
> CopyUTF16toUTF8(Substring(fileName, i + 1), fileExt);
(Note that nsExternalHelperAppService::GetTypeFromFile is badly written in general.)
while
> nsDependentString withoutHash(colorStr.get() + 1, colorStr.Length() - 1);
> if (NS_HexToRGB(withoutHash, &color)) {
should be
> if (NS_HexToRGB(Substring(colorStr, 1))) {
For cases I couldn't easily fix I worked around it by using BeginReading() instead.

>+    ptrdiff_t operator-(const char16ptr_t &other) const {
>+      return ptr-other.ptr;
>+    }
>+};
>+
>+inline decltype((char*)0-(char*)0) operator-(const char16_t *x, const char16ptr_t y) {
>+  return x - static_cast<const char16_t*>(y);
>+}
I'm not sure what the use case is for these. I couldn't find any in the mozilla-central tree.
Thanks for the review.

(In reply to neil@parkwaycc.co.uk from comment #29)
> Comment on attachment 829746 [details] [diff] [review]
> Char16.h and xpcom/strings/ part.
> 
> I'm basing this on only using char16ptr_t as the result of nsString::get().
> I couldn't work out whether what you achieve by using char16ptr_t in other
> places. In particular for assigning to strings I assumed a
> char16_t*/wchar_t* overload would suffice.

This is also useful in other places. There are quite a few cases where wchar_t is mixed with char16_t and for those simple type change is the easiest solution. Sorry I didn't make it clear earlier. I skipped some comments that were based on that assumption.

> >+    /* Without this, nullptr assignment would be ambiguous */
> >+    constexpr char16ptr_t(decltype(nullptr)) : ptr(nullptr) {}
> I don't know why this might be needed. I couldn't find a use in the
> mozilla-central tree.

That's for something like this:
char16ptr_t str = nullptr;

> >+    operator std::wstring() const {
> >+      return std::wstring(static_cast<const wchar_t*>(*this));
> >+    }
> Not sure why this is needed, as std::wstring can't be constructed from a
> const char16_t*.

It can be constructed on MSVC (because there is no char16_t is the same as wchar_t) and that's what I'm trying to emulate here. IPC code from chromium uses wstring in many places.

> >+    /* Explicit cast operators to allow things like (char16_t*)str */
> >+    explicit operator char16_t*() const {
> >+      return const_cast<char16_t*>(ptr);
> >+    }
> >+    explicit operator wchar_t*() const {
> >+      return const_cast<wchar_t*>(static_cast<const wchar_t*>(*this));
> >+    }
> In theory, nobody should be casting away const from get(). I even found
> someone casting away const from a .get() and writing to the string.
> (Fortunately the string was probably not shared, but that's still incorrect
> Mozilla code that you're wallpapering over.)

There are some places where such casts are made due to lack proper const support in Win32 API. See this for example (there are more cases even in the same file):
https://github.com/mozilla/mozilla-central/blob/master/embedding/components/printingui/src/win/nsPrintDialogUtil.cpp#L809


> >+    explicit operator const char*() const {
> >+      return reinterpret_cast<const char*>(ptr);
> >+    }
> >+    explicit operator const unsigned char*() const {
> >+      return reinterpret_cast<const unsigned char*>(ptr);
> >+    }
> >+    explicit operator unsigned char*() const {
> >+      return const_cast<unsigned char*>(reinterpret_cast<const unsigned char*>(ptr));
> >+    }
> >+    explicit operator void*() const {
> >+      return const_cast<char16_t*>(ptr);
> >+    }
> These look plain wrong, although I found people casting a .get() to various
> byte types.

There are such casts for different Win32 API. For example, strings are casted to BYTE* (aka. unsigned char*) for Reg* arguments. All of above I wrote after hitting compile error on a cast, they are not imagined cases.
Comment on attachment 829746 [details] [diff] [review]
Char16.h and xpcom/strings/ part.

Asking another review from Jeff on the MFBT part (mostly for MFBT style comments and such.)
Attachment #829746 - Flags: review?(jwalden+bmo)
(In reply to Jacek Caban from comment #30)
> (In reply to comment #29)
> > (From update of attachment 829746 [details] [diff] [review])
> > I'm basing this on only using char16ptr_t as the result of nsString::get().
> > I couldn't work out whether what you achieve by using char16ptr_t in other
> > places. In particular for assigning to strings I assumed a
> > char16_t*/wchar_t* overload would suffice.
> 
> This is also useful in other places. There are quite a few cases where
> wchar_t is mixed with char16_t and for those simple type change is the
> easiest solution. Sorry I didn't make it clear earlier. I skipped some
> comments that were based on that assumption.
Ah, so you're basically using it as a cast function? I hope that doesn't confuse people more than a bunch of reinterpret_cast<wchar_t*>()s would.

> > >+    operator std::wstring() const {
> > >+      return std::wstring(static_cast<const wchar_t*>(*this));
> > >+    }
> > Not sure why this is needed, as std::wstring can't be constructed from a
> > const char16_t*.
> 
> It can be constructed on MSVC (because there is no char16_t is the same as
> wchar_t) and that's what I'm trying to emulate here. IPC code from chromium
> uses wstring in many places.

Obviously MSVC can construct a std::wstring from a char16ptr_t because it is a typedef for const wchar_t*. I'm trying to work out why gcc would have trouble constructing a std::wstring from a char16ptr_t. I admit I wasn't using mingw but a basic test of gcc seemed to work for me.
Comment on attachment 829746 [details] [diff] [review]
Char16.h and xpcom/strings/ part.

Why do we need the new string constructors? If you have a char16ptr_t and you're constructing a nsString, won't the operators take care of the conversion?

This feel fragile: most people aren't going to know when they need to add these special char16ptr_t methods, and so this is going to break regularly :-(
(In reply to neil@parkwaycc.co.uk from comment #32)
> > This is also useful in other places. There are quite a few cases where
> > wchar_t is mixed with char16_t and for those simple type change is the
> > easiest solution. Sorry I didn't make it clear earlier. I skipped some
> > comments that were based on that assumption.
> Ah, so you're basically using it as a cast function?

Yes.

> I hope that doesn't confuse people more than a bunch of reinterpret_cast<wchar_t*>()s would.

The hope is that it will be needed much less places than reinterpret_casts would.

> > > >+    operator std::wstring() const {
> > > >+      return std::wstring(static_cast<const wchar_t*>(*this));
> > > >+    }
> > > Not sure why this is needed, as std::wstring can't be constructed from a
> > > const char16_t*.
> > 
> > It can be constructed on MSVC (because there is no char16_t is the same as
> > wchar_t) and that's what I'm trying to emulate here. IPC code from chromium
> > uses wstring in many places.
> 
> Obviously MSVC can construct a std::wstring from a char16ptr_t because it is
> a typedef for const wchar_t*. I'm trying to work out why gcc would have
> trouble constructing a std::wstring from a char16ptr_t. I admit I wasn't
> using mingw but a basic test of gcc seemed to work for me.

That's because C++ allows only one implicit cast. In case of mingw, char16ptr_t is a class, so there are two casts needed: char16ptr_t -> const wchar_t* -> std::wstring. Defining the cast inside char16ptr_t works around that.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #33)
> Comment on attachment 829746 [details] [diff] [review]
> Char16.h and xpcom/strings/ part.
> 
> Why do we need the new string constructors? If you have a char16ptr_t and
> you're constructing a nsString, won't the operators take care of the
> conversion?

They are needed for wchar_t* support. However, they can't take const wchar_t* as arugument, because that would make constructing from char16ptr_t ambiguous.


> This feel fragile: most people aren't going to know when they need to add
> these special char16ptr_t methods, and so this is going to break regularly
> :-(

Yes, I know. But I can't see any solution that would require less mingw-specific things. FWIW, since I got the tree compiling (about two weeks ago), I only had to rebase patches. There were no new char16_t-related problems introduced.
Comment on attachment 829746 [details] [diff] [review]
Char16.h and xpcom/strings/ part.

Review of attachment 829746 [details] [diff] [review]:
-----------------------------------------------------------------

::: mfbt/Char16.h
@@ +77,5 @@
> +   * simply use |typedef const char16_t* char16ptr_t|. Here, we want to make
> +   * the class as similar to this typedef, including providing some casts that
> +   * are allowed by the typedef.
> +   */
> +class char16ptr_t {

Brace on new line.

@@ +81,5 @@
> +class char16ptr_t {
> +  private:
> +    const char16_t* ptr;
> +    static_assert(sizeof(char16_t) == sizeof(wchar_t), "char16_t and wchar_t sizes differ");
> +  public:

Blank line before the public: line, please.

@@ +83,5 @@
> +    const char16_t* ptr;
> +    static_assert(sizeof(char16_t) == sizeof(wchar_t), "char16_t and wchar_t sizes differ");
> +  public:
> +    char16ptr_t(const char16_t *ptr) : ptr(ptr) {}
> +    char16ptr_t(const wchar_t *ptr) : ptr(reinterpret_cast<const char16_t*>(ptr)) {}

*/& goes by type name in all places in this file:

  char16ptr_t(const char16_t* ptr) : ptr(ptr) {}
  typedef const char16_t* char16ptr_t;
  operator const char16_t*() const
  static_cast<const char16_t*>(...)
  bool operator==(const char16ptr_t& x)

@@ +85,5 @@
> +  public:
> +    char16ptr_t(const char16_t *ptr) : ptr(ptr) {}
> +    char16ptr_t(const wchar_t *ptr) : ptr(reinterpret_cast<const char16_t*>(ptr)) {}
> +
> +    /* Without this, nullptr assignment would be ambiguous */

Period at end of sentence, please.

@@ +86,5 @@
> +    char16ptr_t(const char16_t *ptr) : ptr(ptr) {}
> +    char16ptr_t(const wchar_t *ptr) : ptr(reinterpret_cast<const char16_t*>(ptr)) {}
> +
> +    /* Without this, nullptr assignment would be ambiguous */
> +    constexpr char16ptr_t(decltype(nullptr)) : ptr(nullptr) {}

I guess we're just assuming native nullptr and decltype, no need for emulation of either?  Just flagging as something to keep in mind, my memory says we can get away with these if we're assuming basically gcc relatively new here.

@@ +98,5 @@
> +    operator const void *() const {
> +      return ptr;
> +    }
> +    operator bool() const {
> +      return ptr != nullptr;

Hmm, interesting.  I guess operator bool() isn't as completely broken as I thought!  (To the peanut gallery, this disambiguates the ambiguous conversion to bool through multiple different pointer types.  And char16ptr_t() + 1 uses the pointer conversions, so no bizarro summing up to an integer.)

@@ +101,5 @@
> +    operator bool() const {
> +      return ptr != nullptr;
> +    }
> +    operator std::wstring() const {
> +      return std::wstring(static_cast<const wchar_t*>(*this));

Is the * before this right?  I'd think it wouldn't be, or that *this would at least be ambiguous about which conversion operator it chose to invoke prior to dereferencing.

@@ +120,5 @@
> +    }
> +    explicit operator unsigned char*() const {
> +      return const_cast<unsigned char*>(reinterpret_cast<const unsigned char*>(ptr));
> +    }
> +    explicit operator void*() const {

These four const char*, const unsigned char*, unsigned char*, void* explicit operators are blargh.  How many instances were we actually hitting where these were necessary?  If it's relatively "few" I would probably go more for uglifying casts at those sites, or in some utility method in some central header that did the relevant sequence of casts instead.  (This would be followup-land, to be clear.)

I'd also put a comment before these explaining why you have this:

"""
Some Windows API calls accept BYTE* but require that data actually be WCHAR*.  Supporting this requires explicit operators to support the requisite explicit casts.
"""

@@ +132,5 @@
> +    bool operator==(const char16ptr_t &x) const {
> +      return ptr == x.ptr;
> +    }
> +    bool operator==(decltype(nullptr)) const {
> +      return !ptr;

ptr == nullptr is probably slightly clearer, paralleling the operator being defined, as it does.

@@ +141,5 @@
> +    bool operator!=(decltype(nullptr)) const {
> +      return ptr != nullptr;
> +    }
> +    char16ptr_t operator+(size_t add) const {
> +      return char16ptr_t(ptr+add);

Spaces around +.

@@ +144,5 @@
> +    char16ptr_t operator+(size_t add) const {
> +      return char16ptr_t(ptr+add);
> +    }
> +    ptrdiff_t operator-(const char16ptr_t &other) const {
> +      return ptr-other.ptr;

Spaces around -.

@@ +148,5 @@
> +      return ptr-other.ptr;
> +    }
> +};
> +
> +inline decltype((char*)0-(char*)0) operator-(const char16_t *x, const char16ptr_t y) {

inline decltype(static_cast<char*>(0) - static_cast<char*>(0))
operator-(const char16_t* x, const char16ptr_t y)
{
  ...
}

::: xpcom/string/public/nsTDependentString.h
@@ +53,5 @@
>          }
>  
> +#if defined(CharT_is_PRUnichar) && defined(MOZ_USE_CHAR16_WRAPPER)
> +      explicit
> +      nsTDependentString_CharT( const wchar_t *data )

I'd have thought this would take char16ptr_t like all the others, but this isn't *wrong*, at least, so okay.  :-)

::: xpcom/string/public/nsTString.h
@@ +40,5 @@
>          }
>  
> +#if defined(CharT_is_PRUnichar) && defined(MOZ_USE_CHAR16_WRAPPER)
> +      explicit
> +      nsTString_CharT( const wchar_t* data, size_type length = size_type(-1) )

Again I'd think char16ptr_t, but again, not *wrong*.

@@ +82,5 @@
>           * returns the null-terminated string
>           */
>  
> +#if defined(CharT_is_PRUnichar) && defined(MOZ_USE_CHAR16_WRAPPER)
> +      const char16ptr_t get() const

The const here doesn't hurt, but it doesn't help, either.

@@ +599,5 @@
>          }
>  
>          // return nullptr if we are voided
> +#if defined(CharT_is_PRUnichar) && defined(MOZ_USE_CHAR16_WRAPPER)
> +      const char16ptr_t get() const

Again const hurt/help stuff.

::: xpcom/string/public/nsTSubstring.h
@@ +374,5 @@
> +
> +      void Assign (char16ptr_t data, size_type length)
> +        {
> +          Assign(static_cast<const char16_t*>(data), length);
> +        }

I think you also need

      void Assign(char16ptr_t data, size_type length, const fallible_t&) NS_WARN_UNUSED_RESULT
        {
          return Assign(static_cast<const char16_t*>(data), length, fallible_t());
        }
Attachment #829746 - Flags: review?(jwalden+bmo) → review+
Attachment #827946 - Attachment is obsolete: true
Attachment #829746 - Attachment is obsolete: true
Attachment #829746 - Flags: review?(benjamin)
Thanks for the review. The attached version addresses all comments.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #36)
> @@ +86,5 @@
> > +    char16ptr_t(const char16_t *ptr) : ptr(ptr) {}
> > +    char16ptr_t(const wchar_t *ptr) : ptr(reinterpret_cast<const char16_t*>(ptr)) {}
> > +
> > +    /* Without this, nullptr assignment would be ambiguous */
> > +    constexpr char16ptr_t(decltype(nullptr)) : ptr(nullptr) {}
> 
> I guess we're just assuming native nullptr and decltype, no need for
> emulation of either?  Just flagging as something to keep in mind, my memory
> says we can get away with these if we're assuming basically gcc relatively
> new here.

Yes, I believe we may assume that here.

> @@ +101,5 @@
> > +    operator bool() const {
> > +      return ptr != nullptr;
> > +    }
> > +    operator std::wstring() const {
> > +      return std::wstring(static_cast<const wchar_t*>(*this));
> 
> Is the * before this right?  I'd think it wouldn't be, or that *this would
> at least be ambiguous about which conversion operator it chose to invoke
> prior to dereferencing.

The cast happens after dereferencing and it will use the cast operator, which is defined on char16ptr_t instance.

> @@ +120,5 @@
> > +    }
> > +    explicit operator unsigned char*() const {
> > +      return const_cast<unsigned char*>(reinterpret_cast<const unsigned char*>(ptr));
> > +    }
> > +    explicit operator void*() const {
> 
> These four const char*, const unsigned char*, unsigned char*, void* explicit
> operators are blargh.  How many instances were we actually hitting where
> these were necessary?  If it's relatively "few" I would probably go more for
> uglifying casts at those sites, or in some utility method in some central
> header that did the relevant sequence of casts instead.  (This would be
> followup-land, to be clear.)
> 
> I'd also put a comment before these explaining why you have this:
> 
> """
> Some Windows API calls accept BYTE* but require that data actually be
> WCHAR*.  Supporting this requires explicit operators to support the
> requisite explicit casts.
> """

There quite a few cases, but I don't have numbers. I will file a followup bug once this is landed.

> ::: xpcom/string/public/nsTDependentString.h
> @@ +53,5 @@
> >          }
> >  
> > +#if defined(CharT_is_PRUnichar) && defined(MOZ_USE_CHAR16_WRAPPER)
> > +      explicit
> > +      nsTDependentString_CharT( const wchar_t *data )
> 
> I'd have thought this would take char16ptr_t like all the others, but this
> isn't *wrong*, at least, so okay.  :-)

Good catch, I've changed this.
I landed it on m-i:

https://hg.mozilla.org/integration/mozilla-inbound/rev/30fe34b9ebf7

I will be cleaning up other parts as dependent bugs soon (as the time permits).
Blocks: 943870
(In reply to Jacek Caban from comment #38)
> The cast happens after dereferencing and it will use the cast operator,
> which is defined on char16ptr_t instance.

Er, right.  Obviously.  I'm just thinko-ing here, clearly!
Thank you *so much* for working on this, Jacek!
https://hg.mozilla.org/mozilla-central/rev/30fe34b9ebf7
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Blocks: 944427
Blocks: 944444
Blocks: 944894
Blocks: 944905
Blocks: 944907
Blocks: 944913
Blocks: 945230
Blocks: 945245
Whiteboard: [qa-]
I landed a trivial fixup revealed by parts of codebase that use -Wshadow:

https://hg.mozilla.org/integration/mozilla-inbound/rev/7b8c2a48d08b
You need to log in before you can comment on or make changes to this bug.