Bug 1792086 Comment 2 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

(Drive-by comment, after seeing this bug listed in an autonag bugzilla bot's "Bugs with a ni on a bug marked as affecting a released version without activity for the last 1 week for the 2022-10-10" periodic-email):

This looks like a bug in MSVC's implementation of `std::regex_search`, I think.

The warning is complaining about `load of value -8193, which is not a valid value for type 'std::regex_constants::match_flag_type'`.  And Mozilla's own code is not explicitly passing `-8193` here, or working with this `match_flag_type` at all.

It looks like Mozilla code is just calling `std::regex_search` here, and we're not passing any numeric values that could be guilty of tripping this warning.

Under the hood, MSVC implements `regex_search` using an internal function called `std::_Regex_search1` (stack level 2 in comment 0), which takes an arg of type `enum std::regex_constants::match_flag_type` that is supposed to represent configuration options for regex matching.  So that arg (or a version of it at stack level 1) is probably what's involved here.

Some googling turns up https://learn.microsoft.com/en-us/cpp/standard-library/regex-constants-class?view=msvc-170#match_flag_type which has information about the available options:
```c++
enum match_flag_type
    {    // specify matching and formatting rules
    match_default = 0x0000,
    match_not_bol = 0x0001,
    match_not_eol = 0x0002,
    match_not_bow = 0x0004,
    match_not_eow = 0x0008,
    match_any = 0x0010,
    match_not_null = 0x0020,
    match_continuous = 0x0040,
    match_prev_avail = 0x0100,
    format_default = 0x0000,
    format_sed = 0x0400,
    format_no_copy = 0x0800,
    format_first_only = 0x1000,
    _Match_not_null = 0x2000
    };
```

I have a theory about how we would end up with -8193 here. The value -8193 in hex is `0xffffdfff` which can also be expressed as `~0x2000`, i.e. `~_Match_not_null` if the above enum list can be trusted.  So I suspect part of Microsoft's regex implementation is trying to clear that bit (e.g. after testing for and handling it) by doing a bitwise operation with the inverse of that value, e.g. bitwise `~_Match_not_null`.  And the named literal happens to be treated as if it were signed, which makes it appear that we're putting a signed negative value into an unsigned enum type.

(Support for this theory:  stack level 1 is a function called `_Clearf` which sounds like it's clearing a flag; and stack level 0 is `operator&=` which is the operation that you would use if you were going to clear a bit by bitwise-`and`'ing it with its inverse.)

In any case, this is unlikely to be "really" undefined since it's in code provided by the compiler itself (in its implementation of the standard libraries), and unlikely to be something we could do anything about (except by wholly avoiding std::regex).  Possibly something fixed in newer MSVC versions, or something we should report upstream to Microsoft / MSVC folks about their regex impl.

Tyson, do we have procedures here for suppressing/ignoring UBSan warnings in external code? If so we probably want to bucket this under that category.

(Drive-by comment, after seeing this bug listed in an autonag bugzilla bot's "Bugs with a ni on a bug marked as affecting a released version without activity for the last 1 week for the 2022-10-10" periodic-email):

This looks like a bug in MSVC's implementation of `std::regex_search`, I think.

The warning is complaining about `load of value -8193, which is not a valid value for type 'std::regex_constants::match_flag_type'`.  And Mozilla's own code is not explicitly passing `-8193` here, or working with this `match_flag_type` at all.

It looks like Mozilla code is just calling `std::regex_search` here, and we're not passing any numeric values that could be guilty of tripping this warning.

Under the hood, MSVC implements `regex_search` using an internal function called `std::_Regex_search1` (stack level 2 in comment 0), which takes an arg of type `enum std::regex_constants::match_flag_type` that is supposed to represent configuration options for regex matching.  So that arg (or a version of it at stack level 1) is probably what's involved here.

Some googling turns up https://learn.microsoft.com/en-us/cpp/standard-library/regex-constants-class?view=msvc-170#match_flag_type which has information about the available options:
```c++
enum match_flag_type
    {    // specify matching and formatting rules
    match_default = 0x0000,
    match_not_bol = 0x0001,
    match_not_eol = 0x0002,
    match_not_bow = 0x0004,
    match_not_eow = 0x0008,
    match_any = 0x0010,
    match_not_null = 0x0020,
    match_continuous = 0x0040,
    match_prev_avail = 0x0100,
    format_default = 0x0000,
    format_sed = 0x0400,
    format_no_copy = 0x0800,
    format_first_only = 0x1000,
    _Match_not_null = 0x2000
    };
```

I have a theory about how we would end up with -8193 here. The value -8193 in hex is `0xffffdfff` which can also be expressed as `~0x2000`, i.e. `~_Match_not_null` if the above enum list can be trusted.  So I suspect part of Microsoft's regex implementation is trying to clear that bit (e.g. after testing for and handling it) by doing a bitwise operation with the inverse of that value, e.g. bitwise `~_Match_not_null`.  And the named literal happens to be treated as if it were signed, which makes it appear that we're putting a signed negative value into an unsigned enum type.

(Support for this theory:  stack level 1 is a function called `_Clearf` which sounds like it's clearing a flag; and stack level 0 is `operator&=` which is the operation that you would use if you were going to clear a bit by bitwise-`and`'ing it with its inverse.)

In any case, this is unlikely to be "really" undefined since it's in code provided by the compiler itself (in its implementation of the standard libraries), and unlikely to be something we could do anything about (except by wholly avoiding std::regex).  Possibly something fixed in newer MSVC versions, or something we should report upstream to Microsoft / MSVC folks about their regex impl.

Tyson, do we have procedures here for suppressing/ignoring UBSan warnings in external code? If so we probably want to bucket this under that category.

(Drive-by comment, after seeing this bug listed in an autonag bugzilla bot's "Bugs with a ni on a bug marked as affecting a released version without activity for the last 1 week for the 2022-10-10" periodic-email):

This looks like a bug in MSVC's implementation of `std::regex_search`, I think.

The warning is complaining about `load of value -8193, which is not a valid value for type 'std::regex_constants::match_flag_type'`.  And Mozilla's own code is not explicitly passing `-8193` here, or working with this `match_flag_type` at all.

It looks like Mozilla code is just calling `std::regex_search` here, and we're not passing any numeric values that could be guilty of tripping this warning.

Under the hood, MSVC implements `regex_search` using an internal function called `std::_Regex_search1` (stack level 2 in comment 0), which takes an arg of type `enum std::regex_constants::match_flag_type` that is supposed to represent configuration options for regex matching.  So that arg (or a version of it at stack level 1) is probably what's involved here.

Some googling turns up https://learn.microsoft.com/en-us/cpp/standard-library/regex-constants-class?view=msvc-170#match_flag_type which has information about the available options:
```c++
enum match_flag_type
    {    // specify matching and formatting rules
    match_default = 0x0000,
    match_not_bol = 0x0001,
    match_not_eol = 0x0002,
    match_not_bow = 0x0004,
    match_not_eow = 0x0008,
    match_any = 0x0010,
    match_not_null = 0x0020,
    match_continuous = 0x0040,
    match_prev_avail = 0x0100,
    format_default = 0x0000,
    format_sed = 0x0400,
    format_no_copy = 0x0800,
    format_first_only = 0x1000,
    _Match_not_null = 0x2000
    };
```

I have a theory about how we would end up with -8193 here. The value -8193 in hex is `0xffffdfff` which can also be expressed as `~0x2000`, i.e. `~_Match_not_null` if the above enum list can be trusted.  So I suspect part of Microsoft's regex implementation is trying to clear that bit (e.g. after testing for and handling it) by doing a bitwise operation with the inverse of that value, e.g.  `flags &= ~_Match_not_null` or something along those lines.  And the named literal happens to be treated as if it were signed, which makes it appear that we're putting a signed negative value into an unsigned enum type.

(Support for this theory:  stack level 1 is a function called `_Clearf` which sounds like it's clearing a flag; and stack level 0 is `operator&=` which is the operation that you would use if you were going to clear a bit by bitwise-`and`'ing it with its inverse.)

In any case, this is unlikely to be "really" undefined since it's in code provided by the compiler itself (in its implementation of the standard libraries), and unlikely to be something we could do anything about (except by wholly avoiding std::regex).  Possibly something fixed in newer MSVC versions, or something we should report upstream to Microsoft / MSVC folks about their regex impl.

Tyson, do we have procedures here for suppressing/ignoring UBSan warnings in external code? If so we probably want to bucket this under that category.

(Drive-by comment, after seeing this bug listed in an autonag bugzilla bot's "Bugs with a ni on a bug marked as affecting a released version without activity for the last 1 week for the 2022-10-10" periodic-email):

This looks like a bug in MSVC's implementation of `std::regex_search`, I think.

The warning is complaining about `load of value -8193, which is not a valid value for type 'std::regex_constants::match_flag_type'`.  And Mozilla's own code is not explicitly passing `-8193` here, or working with this `match_flag_type` at all.

It looks like Mozilla code is just calling `std::regex_search` here, and we're not passing any numeric values that could be guilty of tripping this warning.

Under the hood, MSVC implements `regex_search` using an internal function called `std::_Regex_search1` (stack level 2 in comment 0), which takes an arg of type `enum std::regex_constants::match_flag_type` that is supposed to represent configuration options for regex matching.  So that arg (or a version of it at stack level 1) is probably what's involved here.

Some googling turns up https://learn.microsoft.com/en-us/cpp/standard-library/regex-constants-class?view=msvc-170#match_flag_type which has information about the available options:
```c++
enum match_flag_type
    {    // specify matching and formatting rules
    match_default = 0x0000,
    match_not_bol = 0x0001,
    match_not_eol = 0x0002,
    match_not_bow = 0x0004,
    match_not_eow = 0x0008,
    match_any = 0x0010,
    match_not_null = 0x0020,
    match_continuous = 0x0040,
    match_prev_avail = 0x0100,
    format_default = 0x0000,
    format_sed = 0x0400,
    format_no_copy = 0x0800,
    format_first_only = 0x1000,
    _Match_not_null = 0x2000
    };
```

I have a theory about how we would end up with -8193 here. The value -8193 in hex is `0xffffdfff` which can also be expressed as `~0x2000`, i.e. `~_Match_not_null` if the above enum list can be trusted.  So I suspect part of Microsoft's regex implementation is trying to clear that bit (e.g. after testing for and handling it) by doing a bitwise operation with the inverse of that value, e.g.  `flags &= ~_Match_not_null` or something along those lines.  And the named literal happens to be treated as if it were signed, so its inverse is signed and negative, and that makes it appear that we're putting a negative value into an unsigned enum type.

(Support for this theory:  stack level 1 is a function called `_Clearf` which sounds like it's clearing a flag; and stack level 0 is `operator&=` which is the operation that you would use if you were going to clear a bit by bitwise-`and`'ing it with its inverse.)
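To make the arithmetic behind this theory concrete, here is a minimal sketch. It is not MSVC's actual code: `sketch_flags`, `inverted_mask`, and `clear_flag` are hypothetical names made up for illustration, and only the `0x0020` / `0x2000` values are taken from the enum list above. It assumes a 32-bit two's-complement `int`.

```cpp
#include <cstdint>

// Hypothetical stand-in for match_flag_type (NOT the real MSVC declaration).
// No fixed underlying type, so its enumerators promote to int in arithmetic.
enum sketch_flags {
    sketch_default = 0x0000,
    sketch_not_null = 0x0020,
    sketch_internal_not_null = 0x2000  // same value as _Match_not_null above
};

// Applying ~ to the enumerator promotes it to int first, so the inverted
// mask is a negative int rather than a value of the enum type.
constexpr int inverted_mask() { return ~sketch_internal_not_null; }

static_assert(inverted_mask() == -8193, "~0x2000 is -8193 as a signed int");
static_assert(static_cast<std::uint32_t>(inverted_mask()) == 0xffffdfffu,
              "the same bit pattern viewed as unsigned is 0xffffdfff");

// One way to clear a bit without ever holding the inverted mask in a
// variable of the enum type: do the arithmetic in int and convert back
// only once the result is within the enum's value range again.
constexpr sketch_flags clear_flag(sketch_flags flags, sketch_flags bit) {
    return static_cast<sketch_flags>(static_cast<int>(flags) &
                                     ~static_cast<int>(bit));
}

static_assert(clear_flag(static_cast<sketch_flags>(0x2020),
                         sketch_internal_not_null) == sketch_not_null,
              "clearing 0x2000 from 0x2020 leaves 0x0020");
```

If the real implementation instead materializes the inverted mask as a `match_flag_type` value before and-ing it in (which is only my guess from the stack trace), that intermediate value would be -8193, which is the number UBSan prints.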

In any case, this is unlikely to be "really" undefined since it's in code provided by the compiler itself (in its implementation of the standard libraries), and unlikely to be something we could do anything about (except by wholly avoiding std::regex).  Possibly something fixed in newer MSVC versions, or something we should report upstream to Microsoft / MSVC folks about their regex impl.

Tyson, do we have procedures here for suppressing/ignoring UBSan warnings in external code? If so we probably want to bucket this under that category.
