Tests for TextFormatter to validate the usefulness of compare-locales checks

RESOLVED FIXED in Firefox 57

Status

()

enhancement
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: Pike, Assigned: tromey)

Tracking

unspecified
mozilla57
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox57 fixed)

Details

Attachments

(10 attachments, 1 obsolete attachment)

59 bytes, text/x-review-board-request
froydnj
: review+
Details
59 bytes, text/x-review-board-request
froydnj
: review+
Details
59 bytes, text/x-review-board-request
froydnj
: review+
Details
59 bytes, text/x-review-board-request
froydnj
: review+
Details
59 bytes, text/x-review-board-request
froydnj
: review+
Details
59 bytes, text/x-review-board-request
froydnj
: review+
Details
59 bytes, text/x-review-board-request
froydnj
: review+
Details
59 bytes, text/x-review-board-request
froydnj
: review+
Details
59 bytes, text/x-review-board-request
froydnj
: review+
Details
59 bytes, text/x-review-board-request
pbro
: review+
flod
: review+
Details
In bug 890900, we'd like to review the assumptions on what's bad and what's OK for compare-locales and TextFormatter, aka, bundle.format...

To actually usefully do that, I would like to have tests that reproduce the problems. gtest works for that, thanks to Ted for the tip.

My C++ foo is weak, in particular when it comes down to which kinds of literals we're allowed to use in our code right now, so I'll need feedback on that.
I've pushed my latest version to try, and I'm puzzled.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=41de7b5fa692d8686b87c230424a695421ebec7b&filter-searchStr=opt%20gtest shows that only one platform has these tests passing. Weirdly enough, not the platform I'm on. I'm building on mac, and the tests pass. Mac on try not so much. Also, there are different test failures per platform.

Anyone here able to help on how to phrase this as a reliable test?

Also, one test is hitting a MOZ_ASSERT. I do want to include that pattern into our tests, but I need something that's allowing me to run one variant expecting a crash, and one variant not expecting a properly formatted string. Do we have good defines for that? I'd look for MOZ_NOT_GOING_TO_ASSERT_IN_TEST or so ;-)

I'm also interesting in general feedback on how bad my C++ is here, I wouldn't be surprised if some of the dependencies are due to that.

Not that I know why a float is a good pointer to a string on some platforms.
I don't understand why you're trying to write the death tests.  I guess the ones trying to pass off -1 as a char* are not that bad, as -1 is virtually guaranteed to be a bad address.  But with trying to pass off 3.5 as a char*, the bits of 3.5 interpreted as a pointer likely point to some reasonable address, and depending on the platform and the contents of memory, you're going to try and read a very large string from that address.

Those death tests look like the only problematic ones...so can you explain the rationale behind including those tests?
Flags: needinfo?(l10n)
Background to this bug is that once upon a time, '%1$S is %1$S' used to trigger fatal errors in nsTextFormatter, and now it does, and we never updated compare-locales to allow that.

Which supports that the number of fatal error cases in nsTextFormatter isn't static, and I'd like to have a full list of those and tests for them, to ensure that once someone fixes an error condition, they fail the test, and thus read the comment in the test to tell me to fix the l10n infrastructure.

Thus I'd like to have automated tests for all cases where something that a localizer might do generates a problem at runtime, whether it's a crash or just an empty string or something else. And I'd like that list to be exhaustive, 'cause, as I put it, localization is fuzz testing.

OTH, I want to make sure that all the things I forbid in the l10n toolchain actually isn't working. So that we don't end up again in situations where en-US uses a code pattern, and l10n can't because compare-locales rips the strings out at build time. As it does for errors in l10n.

On my choice of -1 as integer, that's a vague test, and mostly is there to show the difference between signed and unsigned formatters. The more frequent way to get this wrong is to mistake a %d for a %S, and then the code passes in 3. And then crash. Or bewildering other things.

PS: mail and suite strings in comm-central are the most prominent offenders these days, but I don't want to rely too much on "it's just mail" ;-) . mozilla-central code generally only goes through nsIStringBundle, which only supports unicode strings anyway. %S -> %d is bad, but less bad than the other way around.
Flags: needinfo?(l10n)
(In reply to Axel Hecht [:Pike] from comment #5)
> Background to this bug is that once upon a time, '%1$S is %1$S' used to
> trigger fatal errors in nsTextFormatter, and now it does, and we never
> updated compare-locales to allow that.

I'm assuming you meant "...and now it doesn't..." here.

> Which supports that the number of fatal error cases in nsTextFormatter isn't
> static, and I'd like to have a full list of those and tests for them, to
> ensure that once someone fixes an error condition, they fail the test, and
> thus read the comment in the test to tell me to fix the l10n infrastructure.

This is a noble goal.

> Thus I'd like to have automated tests for all cases where something that a
> localizer might do generates a problem at runtime, whether it's a crash or
> just an empty string or something else. And I'd like that list to be
> exhaustive, 'cause, as I put it, localization is fuzz testing.
> 
> OTH, I want to make sure that all the things I forbid in the l10n toolchain
> actually isn't working. So that we don't end up again in situations where
> en-US uses a code pattern, and l10n can't because compare-locales rips the
> strings out at build time. As it does for errors in l10n.
> 
> On my choice of -1 as integer, that's a vague test, and mostly is there to
> show the difference between signed and unsigned formatters. The more
> frequent way to get this wrong is to mistake a %d for a %S, and then the
> code passes in 3. And then crash. Or bewildering other things.
> 
> PS: mail and suite strings in comm-central are the most prominent offenders
> these days, but I don't want to rely too much on "it's just mail" ;-) .
> mozilla-central code generally only goes through nsIStringBundle, which only
> supports unicode strings anyway. %S -> %d is bad, but less bad than the
> other way around.

I don't have a good sense of how all the l10n infrastructure works, which might be causing part of my confusion.  But my impression is that all of the l10n stuff is coming from JS, where you can really only pass in strings anyway.  So the sort of errors you'd want to catch there are "the format string contains a format directive that's not %s or %S".  You're testing for very different sorts of errors with these gtest death tests.  You can't even get into the situation you cite above where somebody mistakes %d for %S.

Tests like these are also very weird:

  DisableCrashReporterForTextFormatter();
  NS_NAMED_LITERAL_STRING(fmt, "%d");
  nsString out;
  // just for completeness, this is our format, and works
  out.Adopt(nsTextFormatter::smprintf(fmt.get(), int(-1)));
  EXPECT_STREQ("-1", NS_ConvertUTF16toUTF8(out).get());
  out.Adopt(nsTextFormatter::smprintf(fmt.get(), (uint32_t)-1));
  EXPECT_STRNE("360999", NS_ConvertUTF16toUTF8(out).get());  // ???
  out.Adopt(nsTextFormatter::smprintf(fmt.get(), float(3.5)));
  EXPECT_STRNE("3.5", NS_ConvertUTF16toUTF8(out).get());     // ???

What are these sorts of tests even trying to accomplish in the context of the way l10n uses nsTextFormatter?  (I don't even know where this l10n code you're referring to lives; all I see is the call from nsStringBundle, and that is only subject to the %s or %S problem, and maybe some problems around positional arguments.)
If you want some sort of type-safe API, so bad things don't happen when somebody passes floats for %d or integers for %s, then the right thing to do is to have something that parses the format string and checks that the arguments you're going to pass in actual match up with what the format directives are requesting.
Some use cases from comm-central:

https://dxr.mozilla.org/comm-central/source/mailnews/import/outlook/src/nsOutlookImport.cpp#346 using %S and %d
https://dxr.mozilla.org/comm-central/source/mailnews/imap/src/nsImapMailFolder.cpp#1608 uses %c

Using code search for call-sites like this is sadly tedious, because most usage of nsTextFormatter is actually non-l10n, but not all of it.
OK, yeah, for such usages, we'd want something more like what comment 7 suggests.  I'm not super excited about including tests in the style of comment 6.
We will have a type-safe API for l20n.

Right now, I need to deal with the fact that as long as nsTextFormatter is in the code-base, there will be developers feeding strings from localizations to it. And localizers will try to pass in garbage that we need to fix at some point in the build chain, and report early to ask them to not do that.

That's why I need reliable checks for the status quo. I can't just decide that I want the world to turn the other way around.

If we can only make this reliable for one platform, well, so be it. If we need to use constructed funky values for our params, that's fine, too.
(In reply to Axel Hecht [:Pike] from comment #10)
> Right now, I need to deal with the fact that as long as nsTextFormatter is
> in the code-base, there will be developers feeding strings from
> localizations to it. And localizers will try to pass in garbage that we need
> to fix at some point in the build chain, and report early to ask them to not
> do that.
> 
> That's why I need reliable checks for the status quo. I can't just decide
> that I want the world to turn the other way around.

Adding a checker like comment 7 describes, and then using it in appropriate places in the C++ codebase, doesn't seem at all like turning the world around.  You don't even have to change the JS API or ask localizers to do anything different.
So, in layman terms, rewrite nsTextFormatter to be type safe, and mass-rewrite mozilla-central and comm-central to use the new code instead of the old?

Any idea who could lift that, and in which timeframe?
(In reply to Axel Hecht [:Pike] from comment #12)
> So, in layman terms, rewrite nsTextFormatter to be type safe, and
> mass-rewrite mozilla-central and comm-central to use the new code instead of
> the old?
> 
> Any idea who could lift that, and in which timeframe?

Nathan?
Flags: needinfo?(nfroyd)
I could, tromey probably could too, but I highly doubt it'd be for 57, unless we wanted to try and uplift it while 57 is in beta.  Maybe 58, more likely 59, and that's speaking only for myself.

(In reply to Axel Hecht [:Pike] from comment #10)
> We will have a type-safe API for l20n.

When is this coming?
Flags: needinfo?(nfroyd)
(In reply to Nathan Froyd [:froydnj] from comment #14)
> (In reply to Axel Hecht [:Pike] from comment #10)
> > We will have a type-safe API for l20n.
> 
> When is this coming?

Our current plan is to start enabling it around November targeting 59, but I cannot yet predict when we'll be able to complete the migration away from the old formats.
I wrote a patch to change nsTextFormatter to be more robust.
It does two things:

1. Type-safety by having the compiler tell nsTextFormatter the type.
   It's no longer possible to, say, try print("%s", 23)
2. In most cases, if the format and the type mismatch, just fall back to
   printing based on the type.  That is, that print("%s", 23) example will
   just print "23", because that's the type of the argument.
   This reduces the number of possible error cases drastically, pretty much
   just leaving a mismatch between %n and its argument (well, and some things
   like non-integer dynamic width arguments).

I can attach it soon if you want; but meanwhile I'm writing some tests.
Probably going to copy the existing Printf tests and go from there.
Assignee: nobody → ttromey
Working my way through the tests in the attached patch.
(I reworded them because smprintf is gone now...)
I'm preserving the corner cases as tested there, I hope that's ok.

In some cases the new formatter can do a bit better.  For example:

  // Dropping leading arguments doesn't
  nsTextFormatter::ssprintf(out, u" %2$S ", u"1", u"2");
  EXPECT_STRNE(" 2 ", NS_ConvertUTF16toUTF8(out).get());

  // Dropping middle arguments doesn't
  nsTextFormatter::ssprintf(out, u" %3$S %1$S ", u"1", u"2", u"3");
  EXPECT_STRNE(" 3 1 ", NS_ConvertUTF16toUTF8(out).get());

... here those can be changed to STREQ, skipping arguments is no issue any more.
Attachment #8895431 - Attachment is obsolete: true
Comment on attachment 8903772 [details]
Bug 1388789 - replace hex strings with static arrays;

https://reviewboard.mozilla.org/r/175522/#review180630
Attachment #8903772 - Flags: review?(nfroyd) → review+
Comment on attachment 8903771 [details]
Bug 1388789 - remove prio.h include from nsTextFormatter.h;

https://reviewboard.mozilla.org/r/175520/#review180642
Attachment #8903771 - Flags: review?(nfroyd) → review+
Comment on attachment 8903774 [details]
Bug 1388789 - make nsTextFormatter runtime type-safe;

https://reviewboard.mozilla.org/r/175526/#review180770

Tom, wow, this looks really nice.

I got a question about the tests, and what this means for linting localizations. I got two classes of feedback, one is Warning, and the other is Error. A warning generally means that something could be wrong, or better, and an error breaks the build. For errors, I strip the localized string from the build, for warnings I don't.

It seems to me that all STRNE tests should be errors in the l10n linter, and all other mismatches should be warnings?

::: xpcom/tests/gtest/TestTextFormatter.cpp:62
(Diff revision 1)
> +  nsTextFormatter::ssprintf(out, u"%2S %S %1$S", u"1", u"2", u"3");
> +  EXPECT_STRNE("2 3 1", NS_ConvertUTF16toUTF8(out).get());
> +
> +  // Referencing an extra param returns empty strings
> +  nsTextFormatter::ssprintf(out, u" %2$S ", u"1");
> +  EXPECT_STRNE(" 1 ", NS_ConvertUTF16toUTF8(out).get());

Does `out` in this case equal "" or "  "?

From the l10n pov, if we can have the latter that'd be preferable. I'd still warn localizers about that mistake, but we wouldn't drop the localized string in this case.
(In reply to Axel Hecht [:Pike] from comment #25)

> It seems to me that all STRNE tests should be errors in the l10n linter, and
> all other mismatches should be warnings?

> > +  // Referencing an extra param returns empty strings
> > +  nsTextFormatter::ssprintf(out, u" %2$S ", u"1");
> > +  EXPECT_STRNE(" 1 ", NS_ConvertUTF16toUTF8(out).get());
> 
> Does `out` in this case equal "" or "  "?
> 
> From the l10n pov, if we can have the latter that'd be preferable. I'd still
> warn localizers about that mistake, but we wouldn't drop the localized
> string in this case.

Thanks for pointing these out.  I didn't intend to leave any STRNE tests in there,
but I forgot to convert a few!

I think the output in that case should be "  ".

Also my overlooking of the NE cases led me to implement something a funny way: in
the current code, a numbered argument ("%2$S") will also set the index for the
implicit argument (in that case to 3).  However, that's just my misreading of the
test ... I think actually it would make a bit more sense for implicit arguments to
just proceed through the argument vector without being reset.

What do you think of that?

A test case could be like ("%1$S %S", "x", "y").
With the current code this would be "x y", but here I'm saying it
should be "x x".
Had a quick chat with flod, and we agree that positioning the %S independently from %{n}$S is a good idea.

We'll hunt folks down to not mix numbered and not, but this way, you most likely get some content printed, instead of possibly "" for an %S that got out-of-bounds.

So "x x" it is.
FWIW, the Linux man page for printf(3) says that:

       The arguments must correspond properly (after type promotion) with the  conver‐
       sion  specifier.   By default, the arguments are used in the order given, where
       each '*' (see Field width and Precision below) and  each  conversion  specifier
       asks for the next argument (and it is an error if insufficiently many arguments
       are given).  One can also specify explicitly which argument is taken,  at  each
       place  where an argument is required, by writing "%m$" instead of '%' and "*m$"
       instead of '*', where the decimal integer m denotes the position in  the  argu‐
       ment list of the desired argument, indexed starting from 1.  Thus,

           printf("%*d", width, num);

       and

           printf("%2$*1$d", width, num);

       are  equivalent.  The second style allows repeated references to the same argu‐
       ment.  The C99 standard does not include the style using '$', which comes  from
       the Single UNIX Specification.  If the style using '$' is used, it must be used
       throughout for all conversions taking an argument and all width  and  precision
       arguments, but it may be mixed with "%%" formats, which do not consume an argu‐
       ment.  There may be no gaps in the numbers of arguments  specified  using  '$';
       for example, if arguments 1 and 3 are specified, argument 2 must also be speci‐
       fied somewhere in the format string.

That is, if you're going to use numbered arguments, you should use them for *all* directives in the string, and not merely for some.  Could we just return an error for mixed numbered arguments and not?  That would at least be consistent with the printf spec, though perhaps backward compatibility concerns would dictate that we do something like comment 26.
For mixed numbered and not, we currently report an error in the l10n linters, and remove the string from the localization, replacing it with the one from en-US.

So yes, we can continue to do that.

Going with flod over the question about missing argument references, we do have use-cases for that, but also an established work-around. We advise folks to use a zero-width arguments in that case. We think it'd be good to keep being strict on that on the l10n side.
So we'd not pass that in to the TextFormatter, it'd be up to you two what you'd prefer to return.
(In reply to Nathan Froyd [:froydnj] from comment #28)

> Could we just
> return an error for mixed numbered arguments and not?

It's certainly easy from my end.  I'll make the change.
>            printf("%2$*1$d", width, num);

Note that this was never supported by nsTextFormatter.

I will also make it reject "*" when using positional arguments, for consistency.
We could implement the above it is needed.
Consider this comment:

https://dxr.mozilla.org/mozilla-central/rev/632e42dca494ec3d90b70325d9c359f80cb3f38a/xpcom/string/nsTextFormatter.h#57

What does "infallible" mean in this context?
This function definitely can fail under some circumstances, as discussed above.
Perhaps the failures are not interesting, but that's up to the caller.
Flags: needinfo?(nfroyd)
Maybe it just means what is implemented: errors from dosprintf are ignored and some result
is returned, whether or not it is useful.
Is this really desirable?
It means that a failure code is never reported to the caller; any failures cause crashes (e.g. OOM) or--as you have discovered--are silently ignored.

Whether that's desirable depends on the sort of errors that we're going to report from dosprintf or similar.  What kinds of errors are these?
Flags: needinfo?(nfroyd)
(In reply to Nathan Froyd [:froydnj] from comment #34)

> Whether that's desirable depends on the sort of errors that we're going to
> report from dosprintf or similar.  What kinds of errors are these?

There are many possibilities, but for example mixing numbered and implicit arguments falls into this.
I found some other oddities, so I'm going to tack on a few fixes.
Comment on attachment 8903770 [details]
Bug 1388789 - change return values of nsTextFormatter::vs{s,v}printf;

https://reviewboard.mozilla.org/r/175518/#review181980

::: commit-message-e4105:1
(Diff revision 2)
> +Bug 1388789 - change nsTextFormatter returns; r?froydnj

Maybe "change return values of nsTextFormatter::vs{s,n}printf"?
Attachment #8903770 - Flags: review?(nfroyd) → review+
Comment on attachment 8905086 [details]
Bug 1388789 - use nsTextFormatter::ssprintf in more places;

https://reviewboard.mozilla.org/r/176896/#review181982

This is nicer, thank you.
Attachment #8905086 - Flags: review?(nfroyd) → review+
Comment on attachment 8905087 [details]
Bug 1388789 - normalize null string handling in nsTextFormatter;

https://reviewboard.mozilla.org/r/176898/#review181984
Attachment #8905087 - Flags: review?(nfroyd) → review+
Comment on attachment 8905088 [details]
Bug 1388789 - clean up \0 emission in nsTextFormatter;

https://reviewboard.mozilla.org/r/176900/#review181986

How sure are you that you handled all the places where this behavior matters?

::: xpcom/tests/gtest/TestTextFormatter.cpp:201
(Diff revision 1)
> +  EXPECT_EQ(nsTextFormatter::snprintf(buf, 10, u"%s", "more than 10 characters"), 9u);
> +  EXPECT_EQ(buf[9], '\0');
> +  EXPECT_STREQ("more than", NS_ConvertUTF16toUTF8(buf).get());
> +
> +  nsString out;
> +  nsTextFormatter::ssprintf(out, u"%s", "more than 10 characters");
> +  // The \0 isn't written here.
> +  EXPECT_EQ(out.Length(), 23u);

I assume these fail before your changes and pass afterwards...
Attachment #8905088 - Flags: review?(nfroyd) → review+
Comment on attachment 8903773 [details]
Bug 1388789 - handle unrecognized escapes in nsTextFormatter;

https://reviewboard.mozilla.org/r/175524/#review181988

I think this makes sense, though I can also see an argument for erroring and returning -1 here as well.  The man page for `printf(3)` is silent on what happens for unsupported conversion characters--I mean, if you were a Real C Programmer, you'd never make that sort of mistake, would you?

WDYT about the alternative, erroring behavior?
Attachment #8903773 - Flags: review?(nfroyd) → review+
Comment on attachment 8903774 [details]
Bug 1388789 - make nsTextFormatter runtime type-safe;

https://reviewboard.mozilla.org/r/175526/#review180636

I had some comments on an earlier review of the diff.

::: xpcom/string/nsTextFormatter.h:51
(Diff revision 1)
> -  static uint32_t snprintf(char16_t* aOut, uint32_t aOutLen,
> -                           const char16_t* aFmt, ...);
> +  template<typename ...T>
> +  static uint32_t snprintf(char16_t* aOut, uint32_t aOutLen, const char16_t* aFmt, T... aArgs) {
> +    BoxedValue values[] = { BoxedValue(aArgs)... };
> +    return vsnprintf(aOut, aOutLen, aFmt, sizeof...(aArgs), values);
> +  }

This is so nice!  I sat and thought about how to go from templated varargs to actual varargs and still handle `%1$d` nicely, but couldn't figure out how to do it; this is very elegant.

WDYT about passing `mozilla::Span` here instead of a `length, pointer` tuple?

When looking at this initially, I wondered if we'd get codebloat from `BoxedValue` construction, but on reflection, it's probably not *that* much work than having to construct varargs lists at the callsite, right?  But anybody who wanted to receive varargs and then pass a `va_list` into `TextFormatter` now has to be templated themselves, correct?  That certainly seems like it could cause some codebloat...though there probably are not that many callers...

::: xpcom/string/nsTextFormatter.cpp:698
(Diff revision 1)
> -      case 'C':
> +        if (!thisArg->PointerCompatible()) {
> -        /* XXX not supported I suppose */
> -        MOZ_ASSERT(0);
> -        break;
> +          break;
> -#endif
> -
> -      case 'S':
> -        u.S = va_arg(aAp, const char16_t*);
> -        rv = cvt_S(aState, u.S, width, prec, flags);
> -        if (rv < 0) {
> -          va_end(aAp);
> -          FREE_IF_NECESSARY(nas);
> -          return rv;
>          }

This seems like an error, not an opportunity for `break`.

::: xpcom/string/nsTextFormatter.cpp:713
(Diff revision 1)
>        case 'n':
> -        u.ip = va_arg(aAp, int*);
> -        if (u.ip) {
> +        if (thisArg->mKind != INTPOINTER) {
> +          return -1;
> -          *u.ip = aState->cur - aState->base;
>          }
> -        break;
> +        

Nit: trailing whitespace.

::: xpcom/string/nsTextFormatter.cpp:749
(Diff revision 1)
> +        if (c != 'f' && c != 'E' && c != 'e' && c != 'G' && c != 'g') {
> +          // Pick some default.
> +          c = 'g';
> +        }

Don't we want to fail here, because we have a format string mismatch?
Comment on attachment 8903774 [details]
Bug 1388789 - make nsTextFormatter runtime type-safe;

https://reviewboard.mozilla.org/r/175526/#review180636

> Don't we want to fail here, because we have a format string mismatch?

We should talk about this and some other comments and try to get to the design we want.

At first I was just aiming for runtime-type-safety.  That is, no matter what input is presented, don't crash.  And I think we've achieved that.

But while doing this I realized that this style of printf could afford to be more lenient than is usual.  In particular, most things can be printed based on the actual argument type, without reference to the format string.  In fact, only %n really requires an error for a mismatch.

Now, this is based on the theory that it's better to print something kinda-valid rather than error.  However, maybe we already shot that theory down by disallowing the mixing of implicit and numbered arguments.  So we could also go all the way and just reject all mismatches.  Or we could restore the argument behavior.  Or keep it like it is in this patch.

I guess what I'm asking for is, rather than spot review, we should treat this topic as one of the design facets.  It doesn't affect safety, maybe just behavior in corner cases -- which maybe the catalog-checking code will rule out anyhow.

Anyway, let me know what you'd like; all the different approaches are equally easy to do.
Comment on attachment 8905088 [details]
Bug 1388789 - clean up \0 emission in nsTextFormatter;

https://reviewboard.mozilla.org/r/176900/#review181986

Reasonably sure given:

* nsString is inherently null-terminated, and there was already a workaround for this in the "stuff" function, so I think effectively no change there; and
* there are only two remaining calls to snprintf, both of which seem obviously ok
Comment on attachment 8905088 [details]
Bug 1388789 - clean up \0 emission in nsTextFormatter;

https://reviewboard.mozilla.org/r/176900/#review181986

> I assume these fail before your changes and pass afterwards...

The snprintf one does, because the return-value computation there was incorrect previously.

The other one does not fail beforehand, because the stuff function had that special case.  However I wrote the test to help assure myself that I didn't break anything.
Comment on attachment 8903770 [details]
Bug 1388789 - change return values of nsTextFormatter::vs{s,v}printf;

https://reviewboard.mozilla.org/r/175518/#review182006

::: commit-message-e4105:1
(Diff revision 2)
> +Bug 1388789 - change nsTextFormatter returns; r?froydnj

I made the change, thanks.
Comment on attachment 8903774 [details]
Bug 1388789 - make nsTextFormatter runtime type-safe;

https://reviewboard.mozilla.org/r/175526/#review180636

> This is so nice!  I sat and thought about how to go from templated varargs to actual varargs and still handle `%1$d` nicely, but couldn't figure out how to do it; this is very elegant.
> 
> WDYT about passing `mozilla::Span` here instead of a `length, pointer` tuple?
> 
> When looking at this initially, I wondered if we'd get codebloat from `BoxedValue` construction, but on reflection, it's probably not *that* much work than having to construct varargs lists at the callsite, right?  But anybody who wanted to receive varargs and then pass a `va_list` into `TextFormatter` now has to be templated themselves, correct?  That certainly seems like it could cause some codebloat...though there probably are not that many callers...

I'll use `mozilla::Span`.  I didn't know about it previously.

I would assume there is some bloat.  I didn't look at the size change.

However -- aside from the `%g` issue mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1331760, nearly all uses of this code can be converted to just use `mozilla::Smprintf`.  There aren't many places requiring any of the extensions, and maybe only one spot requiring runtime type-safety.  Nothing in-tree seems to use `%S`, either.

So, one option would be to add the `%g` code to `Printf.cpp` and then convert most of the users to the other API.

Anyone wanting `va_list`-like support would have to add some code to allow it.  We could certainly add something, see comment #37, which would require this.

There was only one spot in tree that had to be converted to a template, and it only had one caller.  So, that at least wasn't a big deal.
Comment on attachment 8903774 [details]
Bug 1388789 - make nsTextFormatter runtime type-safe;

https://reviewboard.mozilla.org/r/175526/#review180636

Are these those?  Otherwise I didn't see them :)
Comment on attachment 8903774 [details]
Bug 1388789 - make nsTextFormatter runtime type-safe;

https://reviewboard.mozilla.org/r/175526/#review180636

> I'll use `mozilla::Span`.  I didn't know about it previously.
> 
> I would assume there is some bloat.  I didn't look at the size change.
> 
> However -- aside from the `%g` issue mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1331760, nearly all uses of this code can be converted to just use `mozilla::Smprintf`.  There aren't many places requiring any of the extensions, and maybe only one spot requiring runtime type-safety.  Nothing in-tree seems to use `%S`, either.
> 
> So, one option would be to add the `%g` code to `Printf.cpp` and then convert most of the users to the other API.
> 
> Anyone wanting `va_list`-like support would have to add some code to allow it.  We could certainly add something, see comment #37, which would require this.
> 
> There was only one spot in tree that had to be converted to a template, and it only had one caller.  So, that at least wasn't a big deal.

I made the `Span` change.
Comment on attachment 8903770 [details]
Bug 1388789 - change return values of nsTextFormatter::vs{s,v}printf;

https://reviewboard.mozilla.org/r/175518/#review182310

::: xpcom/string/nsTextFormatter.cpp:1227
(Diff revision 2)
>    ss.cur = 0;
>    ss.maxlen = 0;
>    ss.stuffclosure = &aOut;
>  
>    aOut.Truncate();
> -  int n = dosprintf(&ss, aFmt, aAp);
> +  dosprintf(&ss, aFmt, aAp);

Should this (void) dosprintf?
Comment on attachment 8903770 [details]
Bug 1388789 - change return values of nsTextFormatter::vs{s,v}printf;

https://reviewboard.mozilla.org/r/175518/#review182310

> Should this (void) dosprintf?

It's not mandatory to do that, and I never really liked it as a matter of style.  I think it's basically an old-school version of mozilla::Unused (and one that doesn't work with newer compilers anyhow).

Is it Mozilla style?  I can add it if need be.
Comment on attachment 8903770 [details]
Bug 1388789 - change return values of nsTextFormatter::vs{s,v}printf;

https://reviewboard.mozilla.org/r/175518/#review182310

> It's not mandatory to do that, and I never really liked it as a matter of style.  I think it's basically an old-school version of mozilla::Unused (and one that doesn't work with newer compilers anyhow).
> 
> Is it Mozilla style?  I can add it if need be.

No idea, I was just surprised to see it go missing. Maybe actually use mozilla::Unused?

I'm really just going by pretty colors here, not by good or bad code :-/
Comment on attachment 8903774 [details]
Bug 1388789 - make nsTextFormatter runtime type-safe;

https://reviewboard.mozilla.org/r/175526/#review180636

> We should talk about this and some other comments and try to get to the design we want.
> 
> At first I was just aiming for runtime-type-safety.  That is, no matter what input is presented, don't crash.  And I think we've achieved that.
> 
> But while doing this I realized that this style of printf could afford to be more lenient than is usual.  In particular, most things can be printed based on the actual argument type, without reference to the format string.  In fact, only %n really requires an error for a mismatch.
> 
> Now, this is based on the theory that it's better to print something kinda-valid rather than error.  However, maybe we already shot that theory down by disallowing the mixing of implicit and numbered arguments.  So we could also go all the way and just reject all mismatches.  Or we could restore the argument behavior.  Or keep it like it is in this patch.
> 
> I guess what I'm asking for is, rather than spot review, we should treat this topic as one of the design facets.  It doesn't affect safety, maybe just behavior in corner cases -- which maybe the catalog-checking code will rule out anyhow.
> 
> Anyway, let me know what you'd like; all the different approaches are equally easy to do.

flod and I are torn. flod likes the idea that format mismatch continues to be fatal, I prefer to have it non-fatal, but probably still error (and warn for sure) on the linter side.

I'm optimistic that most of the time, being a tad more forgiving will make our lives easier at runtime. We can be as pendantic as we want on dashboards and build independently.
I chatted with :Pike about this on irc today:

<tromey> Pike: I've been thinking about the strictness issue  [09:56]
<tromey> if the catalog checker is rejecting invalid localizations pretty
	 strictly, it seems to me that there's not much benefit to having
	 nsTextFormatter be lenient
<tromey> since presumably the bad formats will never wind up there anyhow
<tromey> so just rejecting them is ok and would address froydnj 's comments
								        [09:57]
<Pike> tromey: OTH, there might be messages that slip through the cracks on
       the linter.  [09:58]
<tromey> if we think that's possible then lenient seems better  [09:59]
<Pike> in particular,there are messages in en-US that break printf, but as
       they never go to printf, they're fine. So there are some holes I had to
       design into the checks
<tromey> like maybe the output will be garbled but recognizable
<Pike> right
<tromey> I'd be interested in an example of one of these holes
<Pike> tromey: one caveat is that I never proven that my regex-based parser
       actually parses the same stuff that textformatter does,
       https://hg.mozilla.org/l10n/compare-locales/file/tip/compare_locales/checks.py#l61
								        [10:02]
<tromey> I'll take a look  [10:04]
<tromey> but yes, I see the difficulty



My take on this is that we ought to proceed with the lenient approach.
Maybe we should even restore the mixing of numbered and implicit arguments on this basis.

Anyway, awaiting final word from :froydnj.
Comment on attachment 8903774 [details]
Bug 1388789 - make nsTextFormatter runtime type-safe;

https://reviewboard.mozilla.org/r/175526/#review180636

> flod and I are torn. flod likes the idea that format mismatch continues to be fatal, I prefer to have it non-fatal, but probably still error (and warn for sure) on the linter side.
> 
> I'm optimistic that most of the time, being a tad more forgiving will make our lives easier at runtime. We can be as pendantic as we want on dashboards and build independently.

(Sorry, I thought I had published this last week, but apparently mozreview defeated my efforts.)

My initial feeling is that format mismatches should be errors, maybe even assertions in `DEBUG` builds: we don't get format-string checking with `nsTextFormatter` and since `nsTextFormatter` does get used from C++ where we don't have linting for errors, we should try to make errors as loud and obvious as possible.  As tromey has discovered, people are lax about even checking for error returns with this stuff.

I don't know what returning errors would look like for the l10n side of things, though: do you get errors in the browser console?  Do things not render in the chosen locale, and use en-US instead?  Do things just not render period?  I'm just worried that printing best-effort things will result in people not actually noticing errors, though perhaps with some sort of source linting that concern is assuaged.

How about a compromise: format string mismatches assert in `DEBUG` builds, but perform best-effort printing based on the actually passed type otherwise?  That would make the tests a little complicated, since we'd only be able to run some tests in release mode.  But at least we'd have a better shot at catching things on the C++ side.  I have a small preference for keeping the requirement for all-numbered-args or no-numbered-args regardless of what format mismatches do, though.
(In reply to Nathan Froyd [:froydnj] from comment #64)
> Comment on attachment 8903774 [details]

> My initial feeling is that format mismatches should be errors, maybe even
> assertions in `DEBUG` builds: we don't get format-string checking with
> `nsTextFormatter` and since `nsTextFormatter` does get used from C++ where
> we don't have linting for errors, we should try to make errors as loud and
> obvious as possible.

I think the essential issue is that this code is trying to serve two roles:
one for platform, and another for l20n.

However, most or all of the "platform" uses could be replaced, as in bug 1331760.
So, I tend to think it makes the most sense to consider how this should serve l20n.

But, at the same time, bug 1331760 isn't fixed yet, and it sounds like leniency isn't
a hard requirement for l20; the only hard requirement is not crashing -- and the
proposed plan does this.  So, I'll make the compromise change.  Thanks!

> Do things just not
> render period?  I'm just worried that printing best-effort things will
> result in people not actually noticing errors, though perhaps with some sort
> of source linting that concern is assuaged.

If they can't notice the errors and the linter doesn't catch them ... then those
errors are probably not really worth fixing anyhow :-)
I had to tweak the unrecognized-% test a bit to avoid failing because the
argument is out of range.
Comment on attachment 8907284 [details]
Bug 1388789 - make va_list nsTextFormatter private;

https://reviewboard.mozilla.org/r/178960/#review184398

Here's hoping that not many other consumers require this same templating treatment.  I guess we can work around that if need be.
Attachment #8907284 - Flags: review?(nfroyd) → review+
Comment on attachment 8903774 [details]
Bug 1388789 - make nsTextFormatter runtime type-safe;

https://reviewboard.mozilla.org/r/175526/#review184402

This looks good, just a few comments.

::: xpcom/string/nsTextFormatter.cpp:645
(Diff revision 3)
>          }
>        }
>      }
>  
> -    /* size */
> -    type = NumArgState::INTN;
> +    /* Size.  Defaults to 32 bits.  */
> +    uint64_t mask = 0xffffffff;

Maybe use `UINT32_MAX` here, and `UINT16_MAX`, `UINT64_MAX` below, so the reader doesn't have to count f's to tell if you got the right thing?

::: xpcom/string/nsTextFormatter.cpp:668
(Diff revision 3)
> -        radix = 10;
> -        goto fetch_and_convert;
> +        // Note that this is actually always safe, so is not a release
> +        // assert.

Maybe instead of repeating this comment all over, we should just put a single comment at the top of this switch:

"Several `MOZ_ASSERT`s below check for argument compatibility with the format specifier.  These are only debug assertions, not release assertions, and exist to catch problems in C++ callers of `nsTextFormatter`, as we do not have compile-time checking of format strings.  In release mode, these assertions will be no-ops, and we will fall through to printing the argument based on the known type of the argument."

or something like that, just so the reader has some sense of the rationale behind the decisions we've arrived at on this bug.

::: xpcom/string/nsTextFormatter.cpp:803
(Diff revision 3)
> +          int64_t val = thisArg->mValue.mInt;
> +          if ((flags & _UNSIGNED) == 0 && val < 0) {
> +            val = -val;
> +            flags |= _NEG;
> +          }

Does this do the right thing for printing INT64_MIN as a signed integer?
Attachment #8903774 - Flags: review?(nfroyd) → review+
Comment on attachment 8903774 [details]
Bug 1388789 - make nsTextFormatter runtime type-safe;

https://reviewboard.mozilla.org/r/175526/#review184402

> Maybe instead of repeating this comment all over, we should just put a single comment at the top of this switch:
> 
> "Several `MOZ_ASSERT`s below check for argument compatibility with the format specifier.  These are only debug assertions, not release assertions, and exist to catch problems in C++ callers of `nsTextFormatter`, as we do not have compile-time checking of format strings.  In release mode, these assertions will be no-ops, and we will fall through to printing the argument based on the known type of the argument."
> 
> or something like that, just so the reader has some sense of the rationale behind the decisions we've arrived at on this bug.

I liked your text so much that I used it verbatim.

> Does this do the right thing for printing INT64_MIN as a signed integer?

I've added a test.
Comment on attachment 8903774 [details]
Bug 1388789 - make nsTextFormatter runtime type-safe;

https://reviewboard.mozilla.org/r/175526/#review180636

> This seems like an error, not an opportunity for `break`.

As discussed, this is now a debug assert.
Blocks: 1399627
That started poorly.
Looks like I will have to use type traits to deal with the overloading.
Nope, no type traits -- I just missed that we needed a char16ptr_t case when MOZ_USE_CHAR16_WRAPPER is defined.
Whew, because the type trait thing was really not working out.
The devtools/client/commandline/test/browser_cmd_appcache_invalid.js failure seems to be caused by this series.
I'm investigating.
Hah, that one is an invalid format!

https://dxr.mozilla.org/mozilla-central/rev/ffe6cc09ccf38cca6f0e727837bbc6cb722d1e71/devtools/client/locales/en-US/appcacheutils.properties#86

It says "%20", which the formatter now preserves.
The test is expecting "%" though -- relying on the old behavior.
I think I will change the property to read "%%" and leave the test as-is.

:Pike, you might want to make sure your checker is catching this.
Flags: needinfo?(l10n)
Actually it's clearer if it reads %%20 and then the test is updated.
The webrtc thing fails in the same way on my pristine artifact build.
So, I think that's not my fault.
Comment on attachment 8909877 [details]
Bug 1388789 - fix invalid format in appcacheutils.properties;

https://reviewboard.mozilla.org/r/181368/#review186666
Attachment #8909877 - Flags: review?(pbrosset) → review+
Comment on attachment 8909877 [details]
Bug 1388789 - fix invalid format in appcacheutils.properties;

Please don't land this without an additional review by flod.
Attachment #8909877 - Flags: review?(francesco.lodolo)
Comment on attachment 8909877 [details]
Bug 1388789 - fix invalid format in appcacheutils.properties;

https://reviewboard.mozilla.org/r/181368/#review186674

::: devtools/client/locales/en-US/appcacheutils.properties:86
(Diff revision 1)
>  asteriskInWrongSection2=Asterisk (*) incorrectly used in the %1$S section at line %2$S. If a line in the NETWORK section contains only a single asterisk character, then any URI not listed in the manifest will be treated as if the URI was listed in the NETWORK section. Otherwise such URIs will be treated as unavailable. Other uses of the * character are prohibited.
>  
>  # LOCALIZATION NOTE (escapeSpaces): the associated cache manifest has a space
>  # in a URI. Spaces must be replaced with %20. Parameters: %S is the line
>  # number where this error occurs.
> -escapeSpaces=Spaces in URIs need to be replaced with %20 at line %S.
> +escapeSpaces=Spaces in URIs need to be replaced with %%20 at line %S.

This is not going to be caught and fixed by any language, we need to update the string ID here (including the reference in localization comment), and the code calling it.

https://developer.mozilla.org/en-US/docs/Mozilla/Localization/Localization_content_best_practices#Changing_existing_strings

Looks like there are only a couple of places
http://searchfox.org/mozilla-central/search?q=escapeSpaces&path=
Attachment #8909877 - Flags: review?(francesco.lodolo) → review-
Comment on attachment 8909877 [details]
Bug 1388789 - fix invalid format in appcacheutils.properties;

https://reviewboard.mozilla.org/r/181368/#review186678

This works. Even better if you add an explanation in the comment, e.g.

# %% will be displayed as a single % character (% is commonly used to define
# format specifiers, so it needs to be escaped).
Attachment #8909877 - Flags: review?(francesco.lodolo) → review+
I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1401307
Flags: needinfo?(l10n)
All the oranges seem to be pre-existing to me, so I'm going to land this.
Pushed by ttromey@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e9844f691218
make va_list nsTextFormatter private; r=froydnj
https://hg.mozilla.org/integration/autoland/rev/ec3f626df9d1
remove prio.h include from nsTextFormatter.h; r=froydnj
https://hg.mozilla.org/integration/autoland/rev/c118b2a8f1b1
replace hex strings with static arrays; r=froydnj
https://hg.mozilla.org/integration/autoland/rev/ed7a96166648
fix invalid format in appcacheutils.properties; r=flod,pbro
https://hg.mozilla.org/integration/autoland/rev/2aa6bbd7fca8
handle unrecognized escapes in nsTextFormatter; r=froydnj
https://hg.mozilla.org/integration/autoland/rev/2c6474a8bc31
change return values of nsTextFormatter::vs{s,v}printf; r=froydnj
https://hg.mozilla.org/integration/autoland/rev/5a294c3eff7c
make nsTextFormatter runtime type-safe; r=froydnj
https://hg.mozilla.org/integration/autoland/rev/e261f147b65f
use nsTextFormatter::ssprintf in more places; r=froydnj
https://hg.mozilla.org/integration/autoland/rev/2409839d34ba
normalize null string handling in nsTextFormatter; r=froydnj
https://hg.mozilla.org/integration/autoland/rev/76ba215c8f25
clean up \0 emission in nsTextFormatter; r=froydnj
Depends on: 1401821
You need to log in before you can comment on or make changes to this bug.