Closed Bug 1453456 Opened 6 years ago Closed 5 years ago

Add/use more ASCII alphanumeric testing/conversion functions to/from MFBT

Categories

(Core :: MFBT, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla67
Tracking Status
firefox67 --- fixed

People

(Reporter: Waldo, Assigned: Waldo)

Details

Attachments

(5 files)

      No description provided.
Comment on attachment 8967133 [details] [diff] [review]
Add/use more ASCII alphanumeric testing/conversion functions from MFBT in SpiderMonkey code in various places

Review of attachment 8967133 [details] [diff] [review]:
-----------------------------------------------------------------

No tests?

::: js/src/ctypes/CTypes.cpp
@@ +2942,2 @@
>      else
> +        return false;

Now that there aren't so many cases, do you just want to say:

if (!IsAsciiAlphanumeric(c))
    return false;

uint8_t digit = AsciiAlphanumericToNumber(c);

?  Seems a little clearer, and enables you to initialize `digit` when you declare it, which is always nice.

::: js/src/jsnum.cpp
@@ +231,2 @@
>          else
>              break;

Could consider the same switch here, to enable initializing `digit` when it's declared.

::: mfbt/TextUtils.h
@@ +35,5 @@
>  };
>  
>  } // namespace detail
>  
> +/** Returns true iff |aChar| matches [a-z]. */

As with IsAsciiAlpha, is "what you thought islower did, but doesn't depend on the locale" worth adding here?  (Though it looks like islower is required to recognize non-ascii, too...)

@@ +48,5 @@
> +
> +/** Returns true iff |aChar| matches [A-Z]. */
> +template<typename Char>
> +constexpr bool
> +IsAsciiUppercaseAlpha(Char aChar)

Same as above, but with isupper.

@@ +104,5 @@
> +  }
> +
> +  // The division here compile-time-enforces an IsAsciiAlphanumeric assertion
> +  // when possible in constexpr context and in runtime context does the
> +  // next-best thing.

I am having a hard time deciding whether this is incredible or incredibly dangerous.
Attachment #8967133 - Flags: review?(nfroyd) → review+
Comment on attachment 8967134 [details] [diff] [review]
Replace nsCRT::IsAscii{Alpha,Digit} with mfbt/TextUtils.h versions

Review of attachment 8967134 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, but I would like to know why the below is hanging around.

::: xpcom/ds/nsCRT.h
@@ +96,5 @@
>    static bool IsLower(char aChar) { return NS_IsLower(aChar); }
>  
>    static bool IsAscii(char16_t aChar) { return NS_IsAscii(aChar); }
>    static bool IsAscii(const char16_t* aString) { return NS_IsAscii(aString); }
> +  static bool IsAsciiAlpha(char16_t aChar) { return mozilla::IsAsciiAlpha(aChar); }

So the patch is said to be replacing nsCRT::IsAsciiAlpha, but we're keeping the function around?
Attachment #8967134 - Flags: review?(nfroyd) → review+
Comment on attachment 8967135 [details] [diff] [review]
Replace JS7_ISDEC with mozilla::IsAsciiDigit

Review of attachment 8967135 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/util/Text.h
@@ -28,5 @@
>  /*
>   * Shorthands for ASCII (7-bit) decimal and hex conversion.
>   * Manually inline isdigit and isxdigit for performance; MSVC doesn't do this for us.
>   */
> -#define JS7_ISDEC(c)    ((((unsigned)(c)) - '0') <= 9)

Out of curiosity, and given the above comment, have you checked that MSVC does generate the above code with our current implementation of IsAsciiDigit?
Attachment #8967135 - Flags: review?(nfroyd) → review+
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fde6a66eb317
Add/use more ASCII alphanumeric testing/conversion functions from MFBT in SpiderMonkey code in various places.  r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/6a84718ddfe1
Replace nsCRT::IsAscii{Alpha,Digit} with mfbt/TextUtils.h versions.  r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/e0f8f325c2ce
Replace JS7_ISDEC with mozilla::IsAsciiDigit.  r=froydnj
(In reply to Nathan Froyd [:froydnj] from comment #4)
> No tests?

Whoops, forgot.  Added a bunch.

> Now that there aren't so many cases, do you just want to say:
> 
> if (!IsAsciiAlphanumeric(c))
>     return false;
> 
> uint8_t digit = AsciiAlphanumericToNumber(c);

Sure, yeah.  I also considered making it a Maybe or a Result or something, then decided not to spend too much time on it.  But really Optional<uint8_t> is really what's wanted here IMO.

> As with IsAsciiAlpha, is "what you thought islower did, but doesn't depend
> on the locale" worth adding here?  (Though it looks like islower is required
> to recognize non-ascii, too...)
> Same as above, but with isupper.

Sure.

> I am having a hard time deciding whether this is incredible or incredibly
> dangerous.

Oh, that's easy to answer.  Yes.  (C++17 if constexpr is really what we want -- added a comment noting that.)

(In reply to Nathan Froyd [:froydnj] from comment #5)
> So the patch is said to be replacing nsCRT::IsAsciiAlpha, but we're keeping
> the function around?

Whoops, removed.

(In reply to Nathan Froyd [:froydnj] from comment #6)
> Out of curiosity, and given the above comment, have you checked that MSVC
> does generate the above code with our current implementation of IsAsciiDigit?

Not recently, but years ago I remember tracking down some documentation for this somewhere and us starting to remove the pattern generally.  Just no one's going around systematically doing it.

I'll probably do more along these lines, so keeping the bug open for that.
Keywords: leave-open
So it occurred to me that actually, we *could* use non-constexpr code in this function -- you can have such things in constexpr functions *as long as the non-constexpr bits aren't invoked from constexpr context*.  But it turns out gcc at least as of 6.4 doesn't like this.  For example, this testcase

//////////////////////////////////
static inline void NonConstexpr() {}

constexpr unsigned char
AsciiAlphanumericToNumber(unsigned char uc)
{
  if ('0' <= uc && uc <= '9')
    return uc - '0';

  NonConstexpr();
  return 0;
}

int main()
{
  static_assert(AsciiAlphanumericToNumber('0') == 0, "hi");
}
//////////////////////////////////

works great with recent clang, but in gcc 6.4 I get

constexpr-fail.cpp: In function ‘constexpr unsigned char AsciiAlphanumericToNumber(unsigned char)’:
constexpr-fail.cpp:9:15: error: call to non-constexpr function ‘void NonConstexpr()’
   NonConstexpr();
   ~~~~~~~~~~~~^~
constexpr-fail.cpp: In function ‘int main()’:
constexpr-fail.cpp:15:3: error: non-constant condition for static assertion
   static_assert(AsciiAlphanumericToNumber('0') == 0, "hi");
   ^~~~~~~~~~~~~
constexpr-fail.cpp:15:42: error: ‘constexpr unsigned char AsciiAlphanumericToNumber(unsigned char)’ called in a constant expression
   static_assert(AsciiAlphanumericToNumber('0') == 0, "hi");
                 ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~

so boo to the idea of just having MOZ_ASSERT_UNREACHABLE in the function tail.

Still looking into this.  Goal is to land tonight or tomorrow, 'cause next week is a team meetup and I may not be able to then, but this stuff should get done and stay done so people can start using it.
Flags: needinfo?(jwalden+bmo)
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/69fbc1090788
Add/use more ASCII alphanumeric testing/conversion functions from MFBT in SpiderMonkey code in various places.  r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/c530019cba77
Replace nsCRT::IsAscii{Alpha,Digit} with mfbt/TextUtils.h versions.  r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/89bb0e170cc0
Replace JS7_ISDEC with mozilla::IsAsciiDigit.  r=froydnj
The leave-open keyword is there and there is no activity for 6 months.
:Waldo, maybe it's time to close this bug?
Flags: needinfo?(jwalden+bmo)
Okay, fine, let's do one or two more things here here about nsCRT.h and close this out.
Attachment #9046442 - Flags: review?(nfroyd)
Flags: needinfo?(jwalden)
Comment on attachment 9046443 [details] [diff] [review]
Remove nsCRT::IsAscii(null-terminated string) in favor of mozilla::IsAsciiNullTerminated

Review of attachment 9046443 [details] [diff] [review]:
-----------------------------------------------------------------

::: mfbt/TextUtils.h
@@ +40,5 @@
>    auto uc = static_cast<UnsignedChar>(aChar);
>    return uc < 0x80;
>  }
>  
> +/** Returns true iff |aChar| is ASCII, i.e. in the range [0, 0x80). */

This comment should really say something more like "Returns true iff every character in the null-terminated string pointed to by |aChar| is ASCII, i.e. in the range [0, 0x80).", right?
Attachment #9046443 - Flags: review?(nfroyd) → review+
Comment on attachment 9046442 [details] [diff] [review]
Remove nsCRT::IsAscii(char16_t) in favor of mozilla::IsAscii

Review of attachment 9046442 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/streamconv/converters/mozTXTToHTMLConv.cpp
@@ +220,1 @@
>               (!isEmail || aInString[uint32_t(i)] != ')');

Such conditional.
Attachment #9046442 - Flags: review?(nfroyd) → review+
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e4565e7ddcb0
Remove nsCRT::IsAscii(char16_t) in favor of mozilla::IsAscii.  r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/18e52bd7c3ad
Remove nsCRT::IsAscii(null-terminated string) in favor of mozilla::IsAsciiNullTerminated.  r=froydnj
Keywords: leave-open
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6ad7a291385a
Followup for Windows-specific bustage about char16ptr_t where every other platform has a nice pointer-to-character type instead.  r=bustage in a CLOSED TREE
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
You need to log in before you can comment on or make changes to this bug.