Closed Bug 1615064 Opened 4 years ago Closed 3 years ago

Subject column (and others) in message list may show any message header starting with a substring of known RFC header, like Su: instead of Subject:

Categories

(Thunderbird :: Folder and Message Lists, defect)

defect
Not set
normal

Tracking

(thunderbird_esr78 wontfix)

RESOLVED FIXED
88 Branch
Tracking Status
thunderbird_esr78 --- wontfix

People

(Reporter: c.buhtz, Assigned: gds)

References

Details

Attachments

(3 files, 6 obsolete files)

Attached image fake subject.png

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0

Steps to reproduce:

I can reproduce this on Windows 10 and Debian 10.

Simple variant, simulating reality:

  • Open a mbox-like file in Thunderbird profile folder with a text editor
  • Before the real "Subject: " line Insert another line beginning with "S: " and following any string.

Real variant:
Use ClawsMail to export mails to mbox format. ClawsMail stores (sometimes, not always) its meta data into the mbox files and create e.g. "S: " lines before the "Subject: "

Here is an example of an Thunderbirds mbox-like-file entry

From - Wed Feb 12 21:59:52 2020
S: Fake Subject
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
FCC: mailbox://ich%40gmail.com@pop.gmail.com/Sent
X-Identity-Key: id1
X-Account-Key: account1
To: receiver@world.uni
From: vorname nachname <ich@gmail.com>
Subject: This is the real subject
Message-ID: <3adb0f5d-9992-9211-073c-fa163304fd4d@gmail.com>
Date: Wed, 12 Feb 2020 21:59:52 +0100
X-Mozilla-Draft-Info: internal/draft; vcard=0; receipt=0; DSN=0; uuencode=0;
attachmentreminder=0; deliveryformat=4
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.4.1
MIME-Version: 1.0
Content-Type: text/html; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit

<html>
<head>

<meta http-equiv="content-type" content="text/html; charset=UTF-8">

</head>
<body>
<p>Text<br>
</p>
</body>
</html>

Actual results:

It causes the message list to use the wrong line as subject for the subject column.
The screenshot attached ilustrates the problem.

The message list contains a mail with "Fake subect" but in the message view the subject is displayed correct.

This behavior was adopted in Bug 172104 - multiple subject headers causes different subject in thread pane than preview pane, should use first subject - and so is working as designed.

Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Resolution: --- → INVALID

It is inpolite to close a ticket befor waiting for opener feedback.
Re-open the bug.

"S: " is not a subject.

Please reference the RFC, ISO or what ever document if I am wrong. I did a small research about it before opening this bug and couldn't find any informations about that "S: " is a valid subject header in e-mails.

Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---

Gene, do you have references that are applicable?

Flags: needinfo?(gds)
From - Wed Feb  5 14:47:01 2020
S: what the hay?
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
Return-Path: <0101016fdf28f741-e4fb7e66-b500-452c-ba6e-1147ba84d499-000000@us-west-2.amazonses.com>
:
:
From: "Bugzilla@Mozilla" <bugzilla-daemon@mozilla.org>
To: gds@chartertn.net
Subject: 
 =?UTF-8?Q?=5BBug_1517464=5D_Thunderbird_crash_in_nsXPCWrappedJS=3A=3ARelea?=
 =?UTF-8?Q?se_sending_mail_=28or_saving_to_Sent_or_Drafts=29?=

I changed every message in the imap mbox file like the example above, restarted tb and I don't see a variation on the subject. It is always the correct "Subject:" in the list and on the mail reading area. The "S: what the hay?" seems to be ignored. Maybe I'm missing something while duplicating what you did?

I couldn't find anywhere in tb code where "S:" is interpreted as abbreviated "Subject:" but I might not have looked in the right place. Don't see in rfc where you can abbreviate the header names and have them recognized.

Anyhow, probably trying to "import" or modify tb mbox files is something you shouldn't do. The mbox files are created or modified only when new messages are downloaded (imap fetched) from the server or when new emails are composed and saved. I tried this on Local Folders too and all it did was mess up the display so that only the raw source was visible in 2nd and later messages, didn't affect the Subject line anywhere.

Probably need more details. What protocol mbox were you changing: imap, pop3, news, Local Folders, etc. ? (I see a reference to pop in your comment, but not sure if that's the type of mbox you changed.)

Flags: needinfo?(gds)

c.buhtz ?

Flags: needinfo?(c.buhtz)

I was using POP3 and mbox-format in Thunderbird when reporting this bug.
I tried again and can reproduce with mdir-format, too.

BUT please delete your msf-file (where subjects stored also) before you re-open thunderbird.

Maybe there is a mystic different between my test mail (in the first comment) and your real mails?

Flags: needinfo?(c.buhtz)
Flags: needinfo?(gds)

Ok, I am able to duplicate this with a message in Local Folders. Looks like any header that starts with "s" is seen as the subject as long as there is no "wrong" chars, e.g., s: works, subjec: works but suject: doesn't.

Pop folders are essentially Local Folders, I think. Don't see it with imap however, just Local stuff.

Also, didn't have to delete the .msf file to see it.

I think non-standard headers are supposed to start with X so this might be a claws bug too.

This issue was also reported here:
http://forums.mozillazine.org/viewtopic.php?f=39&t=1994903

Wayne, if you see this as a significant problem we might fix it (it could affect headers other than Subject). The problem occurs here: https://searchfox.org/comm-central/rev/0af59a692f43286cbc0d239cafd250e33c4582ec/mailnews/local/src/nsParseMailbox.cpp#938
because "S: whatever" is assumed to be either Subject, Sender or Status and since Subject is 1st in the list, it is set as the subject. Also, since the length of the string between the "S" and colon (1) is the length passed to strncasecmp() "Subject" is seen to match just "S".

Flags: needinfo?(gds)
Attached patch no-abbreviated-header-names.diff (obsolete) — Splinter Review

I don't know if it's a bug for tb to accept headers that are truncated strings from the official names. I don't see any specs that allow for this. But not sure I've looked at them all.

Assuming full header names are required to be recognized, this shows the changes that might be required to allow proper matching to all possible header names.
Note: Some X_MOZILLA_* headers already required a length match but no others did.

So leaving this unconfirmed since not sure partial matches are a bug.

Assignee: nobody → gds
Attachment #9180621 - Flags: feedback?(mkmelin+mozilla)

Agreed it's a bug.

Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Comment on attachment 9180621 [details] [diff] [review]
no-abbreviated-header-names.diff

Review of attachment 9180621 [details] [diff] [review]:
-----------------------------------------------------------------

Maybe for testing you could also modify one of our test mails to have an S: header. 
Like https://searchfox.org/comm-central/source/mailnews/test/data/bugmail11 for test_folderLoaded.js
Attachment #9180621 - Flags: feedback?(mkmelin+mozilla) → feedback?(benc)

Re: comment 0 (attachment 9126233 [details]): With the current code I also don't understand why the subject in the reading pane "This is the real subject" show up instead of the subject in the threads pane "Fake Subject" (both should be "Fake Subject"). The change in comment 1 above referencing Bug 172104 is supposed to cause the first "subject" encountered in the header (which would be "Fake Subject") to be the one that appears everywhere. But I think this is moot if my proposed patch is in effect.

The issue of shortened header fields was actually mentioned 15 years ago here: Bug 321887. Bug 321887 was "resolved" by duping to another bug 310189 but the issue was never addressed and bug 310189 is still in state NEW.

(In reply to Magnus Melin [:mkmelin] from comment #11)

Maybe for testing you could also modify one of our test mails to have an S:
header.
Like https://searchfox.org/comm-central/source/mailnews/test/data/bugmail11
for test_folderLoaded.js

The tests that include bugmail11 don't fail with the short subject like "S: XXXX" added. However the test that includes
https://searchfox.org/comm-central/source/mailnews/test/data/bugmail1 and
https://searchfox.org/comm-central/source/mailnews/test/data/draft1,
test
https://searchfox.org/comm-central/source/mailnews/local/test/unit/test_folderLoaded.js
does fail. And it passes with my diff applied. So I think I should modify bugmail1 and draft1.

Comment on attachment 9180621 [details] [diff] [review]
no-abbreviated-header-names.diff

Review of attachment 9180621 [details] [diff] [review]:
-----------------------------------------------------------------

I can't think of any reason not to fix this bug - seems bad to accept abbreviated header names. I can think of a few (admittedly contrived) situations where it might blow up much worse than this.

The patch looks generally fine to me, although fiddling the buffer (`*end=0` .... `*end=':'`) seems a little icky.
I'd probably bite the bullet and switch to using the string classes:

```
nsDependentCSubstring header(start,end);

...

if (header.LowerCaseEqualsASCII("subject")) {
...
}
```
(actually, probably `LowerCaseEqualsLiteral()` instead - it'll early-out if the string lengths don't match. But I can't remember the literal syntax off the top of my head).

Do we have any policy on the this kind of thing? We should definitely be moving away from PR_Malloc/PR_Free pairs in favour of string classes and nsTArray and whatever, but maybe we should also aim to be moving away from the PR_ string functions too?
Attachment #9180621 - Flags: feedback?(benc) → feedback+

Thanks Ben. This first pass is just to illustrate a possible fix with minimal change and not to replace the "stone age" string functions. Don't know the policy on this either.

(In reply to gene smith from comment #16)

Thanks Ben. This first pass is just to illustrate a possible fix with minimal change and not to replace the "stone age" string functions. Don't know the policy on this either.

Perhaps then finish this off, and file a follow up bug to change strings?

Flags: needinfo?(gds)

Here's another attempt at a patch. I went ahead and used the LowerCaseEqualsLiteral() as Ben suggested in his original feedback except in the "case 'x'" where I mostly kept the original method since it uses "symbolic literals" instead of "literal literals" and I didn't see a clean way to change it and since it checked the length it is OK anyhow.

I also made a change to a bugmail1 test file that will fail with the current tb code but passes with the proposed patch. It shows that the abbreviated header is rejected and verifies that a 2nd but valid header of type "Subject" will be ignored.

Attachment #9180621 - Attachment is obsolete: true
Flags: needinfo?(gds)
Attachment #9209220 - Flags: review?(benc)
Comment on attachment 9209220 [details] [diff] [review]
Bug1615064--reject-abbreviated-header-names.patch

Review of attachment 9209220 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, but I think those remaining PL_strncasecmp() should be still be replaced.
I'll r+ this patch - it fixes the bug, after all. I'm happy for the r+ to cover removing the remaining PL_strncasecmp(), but if you'd prefer another check, just r? me again!

::: mailnews/local/src/nsParseMailbox.cpp
@@ +942,3 @@
>          if (X_MOZILLA_STATUS2_LEN == end - buf &&
>              !PL_strncasecmp(X_MOZILLA_STATUS2, buf, end - buf) &&
>              !m_IgnoreXMozillaStatus && !m_mozstatus2.length)

I think :
```
if (headerStr.EqualsIgnoreCase(nsLiteralCString(X_MOZILLA_STATUS2))) &&
 !m_IgnoreXMozillaStatus &&
 !m_mozstatus2.length)
```
would work here, and for the PL_strncasecmp() uses. And it'd save the strlen() calls you added.
(I think nsLiteralCString() knows the length automagically. The '_ns' macro is the preferred way to use it, but it's complicated here because HEADER_X_STATUS2 is a #define...)
Also, I ditched the "X_MOZILLA_STATUS2_LEN == end - buf" in this example because the nsCString comparison functions all do a length check as their first step anyway.
Attachment #9209220 - Flags: review?(benc) → review+

Sounds good. All this ns* string handling is something I'm not real familiar with. Was hoping if there was a way you would point it out, so thanks. I'll update, test and resubmit the patch.

Also, FWIW, I did a try build and see some errors in the "X" tests. I don't think these are caused by my changes but not sure. See https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=7eb9936769ca8ba6e01b8766fca6847cebd4117e
The try build was actually for another user with a one-line change (bug 163964) but I included the changes for this bug in it.

I've never understood exactly what the _ns thing after strings is for. I see it a lot. Wasn't sure if it was part of standard c++ or moz specific.

P/S: I also need to run the changes for this bug through the code formatter since at least one line extends past col 80 and there are some trailing blanks. (Seems like the formatter should allow some tolerance, like maybe line length up to 85 oe 90 is OK.)

Attached file case-x.c (obsolete) —

Ben, I couldn't get your suggested way to build. I don't know why. But I finally, after trial and error, got it to work using a somewhat different method that gets rid of the str...() functions. It's all under "case 'x'" so, rather than making a new patch now, I'll just attach the file case-x.c for you to look at. If it's OK with you, I'll go with it and make a new patch; otherwise, I'll try something else that you might suggest. Thanks!

Attachment #9209336 - Flags: feedback?(benc)

(In reply to gene smith from comment #20)

Sounds good. All this ns* string handling is something I'm not real familiar with. Was hoping if there was a way you would point it out, so thanks. I'll update, test and resubmit the patch.

To be honest, I'm still figuring out a lot of it myself...
Best docs I've found so far are: https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Guide/Internal_strings
But I still end up trudging through the source code trying to figure stuff out. Some times I go looking for other places in the code to find an example, other times I'll dig through the string code. There's usually a lot of template magic and helper classes which complicate things there :-(

Also, FWIW, I did a try build and see some errors in the "X" tests. I don't think these are caused by my changes but not sure. See https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=7eb9936769ca8ba6e01b8766fca6847cebd4117e
The try build was actually for another user with a one-line change (bug 163964) but I included the changes for this bug in it.

I just ran all the ../mach xpcshell-test comm/mail comm/mailnews locally with your patch there and it all worked fine, so I don't think you broke any of them.

I've never understood exactly what the _ns thing after strings is for. I see it a lot. Wasn't sure if it was part of standard c++ or moz specific.

It's just a fancy way to say nsLiteralCString("string").
See https://searchfox.org/mozilla-central/source/xpcom/string/nsLiteralString.h#23
The reason for using nsLiteralCString() is that it can see the length of the literal char array, so the string length can be set at compile time, rather than having to measure the string at runtime (via strlen or equivalent).

P/S: I also need to run the changes for this bug through the code formatter since at least one line extends past col 80 and there are some trailing blanks. (Seems like the formatter should allow some tolerance, like maybe line length up to 85 oe 90 is OK.)

I just spotted this too when I was playing with your extra patch. Seems I just totally rely on ../mach clang-format now, to the point that I don't even know most of the format rules :-)

(In reply to gene smith from comment #21)

Ben, I couldn't get your suggested way to build. I don't know why. But I finally, after trial and error, got it to work using a somewhat different method that gets rid of the str...() functions. It's all under "case 'x'" so, rather than making a new patch now, I'll just attach the file case-x.c for you to look at. If it's OK with you, I'll go with it and make a new patch; otherwise, I'll try something else that you might suggest. Thanks!

I did a little more digging and found Equals() takes a comparator arg, so this works:

if (headerStr.Equals(nsLiteralCString(X_MOZILLA_STATUS),
                             nsCaseInsensitiveCStringComparator)) 

A bit verbose, but hey.

Actually, thinking about it, I don't see why the other tests shouldn't just be like this:

else if (headerStr.Equals("return-receipt-to"_ns))

Seems simpler than LowerCaseEqualsLiteral(), and the _ns provides the compile-time length we want...

I think the switch/case is kind of redundant now - I'm guessing it was put in speed up the tests, but don't think they'd be that bad anyway. The nsCString comparison functions are all inlined anyway, and they early-out if the string lengths differ.
So if you really feel adventurous you could rip out the switch entirely :-)

Phew. I know this is all a bit excessive for a conceptually simple fix, but I too am really curious to figure out the "proper" way to do ns*string handling! I'm writing up some TB C++ docs, so I shall write up a section with all this stuff...

Attachment #9209336 - Flags: feedback?(benc)

Your last advice worked well. I got rid of the switch and now just a long series of if/else if. Passes the unit tests (after fixing a transcription errors I made that took a while to find). Also, clang formatted.

Attachment #9209220 - Attachment is obsolete: true
Attachment #9209336 - Attachment is obsolete: true
Attachment #9209640 - Flags: review?(benc)
Comment on attachment 9209640 [details] [diff] [review]
[checked in] Bug1615064--reject-abbreviated-header-names-V2.patch

Review of attachment 9209640 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great to me!
Attachment #9209640 - Flags: review?(benc) → review+
Summary: Subject column in message list: Wrong field → Subject column in message list: Wrong field for multiple heaaders starting with S, like Subject: and S:
Target Milestone: --- → 88 Branch
Summary: Subject column in message list: Wrong field for multiple heaaders starting with S, like Subject: and S: → Subject column in message list: Wrong field for multiple headers starting with S, like Subject: and S:
Summary: Subject column in message list: Wrong field for multiple headers starting with S, like Subject: and S: → Subject column (and others) in message list may show any message header starting with a substring of known RFC header, like Su: instead of Subject:

Bug 321887 mentions that not only subject header is affected, but all default headers like From: To: Date: etc. (see mkmelin's Bug 321887 Comment 1). The patch seems to fix this for all of the known headers, thanks Gene!!!

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/48afdbe161e8
Ignore abbreviated header names, e.g., "Sub" for "Subject" r=benc

Status: ASSIGNED → RESOLVED
Closed: 4 years ago3 years ago
Resolution: --- → FIXED

Gene, thank you for this fix! Maybe with the same awesome energy, take a stab at this one? (same code which you have touched here)

Bug 705587 - For malformed message with multiple headers of same type (but RFC 5322 max-number=1), TB should only use the first header instance consistently (show/use only the first From:, To:, Date: etc. in message list, message reader, replies, filtering etc.)

I've prepared a minimal test case (attachment 9209975 [details]), screenshot (attachment 9209986 [details]), and tabular overview of current inconsistent behaviour (Bug 705587 Comment 6) to pave the way...

In malformed messages (e.g. AV duplicating subject), if there are multiple instances of the same header (To:, From:, Date: etc.), only the first instance should be used, per RFC 5322 max-number (but what if there's more To-recipients than fit in one long line? not sure...). Thunderbird is currently inconsistent about this, sometimes using the 1st instance of a header (msg reader), sometimes the 2nd (msg list), and sometimes both (To, Cc, but not Bcc). Replies also affected (recipients and attribution line).

This was fixed for Subject header with a one-liner in this patch for bug 172104. Which is exactly the same code which you have touched here.
The one-liner added && !m_subject.length to the subject condition:

} else if (headerStr.Equals("subject"_ns) && !m_subject.length)

Flags: needinfo?(gds)
Blocks: 705587

Please do a follow-up here and replace all the headerStr.Equals("xxx"_ns) with header.EqualsLiteral("xx"). Once upon a time fixed all string processing in C-C and now it's being watered down again :-( - You're aware what the_ns` does, right? That creates a "smart string" out of a literal, so that's not required here. Ben, please check for string processing when you do reviews.

Messed up the formatting, sorry.

The rationale for this was provided by Ben in comment 23. He said that the _ns operator provides a compile-time string length that allows better run-time so can get rid of the switch statement.
It seems that header.EqualsLiteral("xx") builds OK. So does just header.Equals("xx").

From comment 23:

I asked:
I've never understood exactly what the _ns thing after strings is for. I see it a lot. Wasn't sure if it was part of standard c++ or moz specific.

Ben answered:
It's just a fancy way to say nsLiteralCString("string").
See https://searchfox.org/mozilla-central/source/xpcom/string/nsLiteralString.h#23
The reason for using nsLiteralCString() is that it can see the length of the literal char array, so the string length can be set at compile time, rather than having to measure the string at runtime (via strlen or equivalent).

Flags: needinfo?(gds)

The "xx"_ns is a replacement of what used to be NS_LITERAL_CSTRING("xx"), see bug 1648802.

Before what got landed here, there was a single occurrence of Equals("xx"_ns) in all of C-C:
https://searchfox.org/comm-central/search?q=Equals%5C%28.*_ns&path=&case=true&regexp=true
the one in nsPop3Sink.cpp.

I fixed all the calls in bug 1402750. We did that to optimise the code, see reasoning in bug 870698 comment #23.

I don't know what makes you assume the that the compiler doesn't know the length of a literal string.

So please do the follow-up I requested in comment #29 to be consistent with the rest of the code base.

P.S.: The original NI came from Thomas in comment #28. That was just to draw your attention to similar bugs.

So please do the follow-up I requested in comment #29 to be consistent with the rest of the code base.

I'll wait until Ben looks at this and if he agrees with you assessment I'll do the follow up (or maybe he will want to just back this out and resubmit the patch under this bug number?).

P.S.: The original NI came from Thomas in comment #28. That was just to draw your attention to similar bugs.

re-NI'd myself
and NI'd Ben.

Flags: needinfo?(gds)
Flags: needinfo?(benc)

Mr. addons,
For the X_MOZ... checks, in your opinion should, e.g., this

 else if (headerStr.Equals(nsLiteralCString(X_MOZILLA_STATUS2),
                              nsCaseInsensitiveCStringComparator) &&
             !m_IgnoreXMozillaStatus && !m_mozstatus2.length)

be changed to this

 else if (headerStr.EqualsLiteral(X_MOZILLA_STATUS2,
                              nsCaseInsensitiveCStringComparator) &&
             !m_IgnoreXMozillaStatus && !m_mozstatus2.length)

? I haven't tried it. Seems like it should work even though X_MOZILLA_STATUS2 is #defined string (macro).

Flags: needinfo?(gds)

Mostly we're aiming for consistency and optimisation. For the case-insensitive comparison, see here:
https://searchfox.org/comm-central/search?q=Equals.*%2C+nsCaseInsensitiveCStringComparator&path=&case=true&regexp=true

Yes, a literal is a literal, even if it's provided via a #define since the preprocessor replaces it before the compiler sees it.

(In reply to Addons from comment #29)

Ben, please check for string processing when you do reviews.

Ha! I did. I suggested Gene replaced EqualsLiteral("foo") with Equals("foo"_ns).
Obviously, it was a bad call :-)
(My reasoning was: I figured that Equals() was more general - there are a couple that need a case-insensitive check which EqualsLiteral() doesn't do. And that the temporary nsLiteralCString() would get optimised away anyway - probably a bit much faith in the compiler there, hehe)

I'm writing up some C++ reference docs/guides, so I'll definitely add this snippet of information.

Gene: I'm pretty sure there's no case-insensitive EqualsLiteral(), so the X_MOZILLA_STATUS2 check will have to stay as you have it.
If you prefer, I'm happy to fix the other ones up, since I was the one who got it wrong!

Jorg: (re: comment #35) the X_MOZILLA_STATUS #define is mixed-case, so we need a case-insensitive compare unfortunately.

Flags: needinfo?(benc)
Attached patch 1615064-follow-up.patch (obsolete) — Splinter Review

Like this. I don't have a current build, so I didn't compile or run clang-format.

Do what you want with the patch, change the author, whatever. As for case-insensitive compare, check:
https://searchfox.org/mozilla-central/search?q=LowerCaseEqualsLiteral&redirect=false
but that doesn't fit the bill here.

Comment on attachment 9210149 [details] [diff] [review]
1615064-follow-up.patch

Review of attachment 9210149 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/local/src/nsParseMailbox.cpp
@@ +916,2 @@
>        header = GetNextHeaderInAggregate(m_toList);
> +    else if (headerStr.Equals(X_MOZILLA_STATUS2,

Why remove the nsLiteralCString() here -  won't that force a runtime strlen()?
nsLiteralCString() ctor works out the length at compile time, or am I missing something?

Gene: ...If you prefer, I'm happy to fix the other ones up...

That's fine with me. Thanks!

(In reply to Ben Campbell from comment #39)

Why remove the nsLiteralCString() here - won't that force a runtime strlen()?
nsLiteralCString() ctor works out the length at compile time, or am I missing something?

Jolly good question. Well, as I said, everyone is using it like this:
https://searchfox.org/comm-central/search?q=Equals.*%2C+nsCaseInsensitiveCStringComparator&path=&case=true&regexp=true
https://searchfox.org/mozilla-central/search?q=Equals.*%2C+nsCaseInsensitiveCStringComparator&path=&case=true&regexp=true
Are all those authors wrong?

How about this?
https://searchfox.org/mozilla-central/search?q=EqualsIgnoreCase&path=&case=true&regexp=true
Does that compile? It would be nice to get rid of the ugly nsCaseInsensitiveCStringComparator argument.

If you look at
https://searchfox.org/mozilla-central/rev/200bfc652c5128011e7725fc7c201eb125d453e7/xpcom/string/nsTStringRepr.h#196-208
it looks to me that the "raw" string is passed into the Compare() function and surely you need to go through the string once for the compare and stop at the null byte.

Let me ask the other way:
xx.Equals(nsLiteralCString("huhu") - won't that create and destroy an object for the simple purpose of comparison? Is that better or worse than a runtime strlen? If in doubt, we'd need to ask an M-C string guru, or just imitate hoping that that will yield the correct result.

(In reply to Addons from comment #41)

How about this?
https://searchfox.org/mozilla-central/search?q=EqualsIgnoreCase&path=&case=true&regexp=true
Does that compile? It would be nice to get rid of the ugly nsCaseInsensitiveCStringComparator argument.

If you look at
https://searchfox.org/mozilla-central/rev/200bfc652c5128011e7725fc7c201eb125d453e7/xpcom/string/nsTStringRepr.h#196-208
it looks to me that the "raw" string is passed into the Compare() function and surely you need to go through the string once for the compare and stop at the null byte.

Yep. My reading of EqualsIgnoreCase() is that only takes the raw pointer, so forcing a strlen()-style scan unless you explicitly pass in the length as well :-( It seems like a bit of a misplaced function compared to all the support Equals() has.

Let me ask the other way:
xx.Equals(nsLiteralCString("huhu") - won't that create and destroy an object for the simple purpose of comparison? Is that better or worse than a runtime strlen? If in doubt, we'd need to ask an M-C string guru, or just imitate hoping that that will yield the correct result.

Yes, it will create a temp nsLiteralCString object... but looking through the code I think it should all boil away when compiled - no data is copied, the literal pointer and size are taken as is. Lots of constexpr ctors too, which seems like it'd help... (I need to brush up on my constexpr though).
I think these are the relevant bits:
https://searchfox.org/mozilla-central/source/xpcom/string/nsTLiteralString.h#48
https://searchfox.org/mozilla-central/source/xpcom/string/nsTStringRepr.h#315

So I think that the intention was that it should be way better than doing a strlen(), but we'd probably have to look at the compiler output to be 100% certain...

Either way, I'll do another pass over the followup patch (compile, test, clang-format etc etc).
Thanks!

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch back-to-switch.diff (obsolete) — Splinter Review

After all this I'm inclined to go back to using the switch again. Considering the "X-..." headers are usually first, seem like a lot of string compares to go through to find them.
I did notice that the header called "priority" was under "case 'x'" so it would never be found, so I moved it to a "case 'p'", but moot of we stay with the "if/else" if structure.
Anyhow, Ben and Mr. Addon, whatever you think best to do here is fine with me.

Comment on attachment 9210149 [details] [diff] [review]
1615064-follow-up.patch

Review of attachment 9210149 [details] [diff] [review]:
-----------------------------------------------------------------

Compiles fine, clang-format is clean and xpcshell-tests pass for me here, so I think it's fine as is.
Any savings adding nsLiteralCString on those #define ones are theoretical, so I say we just go with less typing for now :-)
r=benc
Attachment #9210149 - Flags: review+

Sorry, gene, I missed your comment 43 + patch...

(In reply to gene smith from comment #43)

After all this I'm inclined to go back to using the switch again. Considering the "X-..." headers are usually first, seem like a lot of string compares to go through to find them.

Pretty much all the string compares do a (known) length check first, so I don't think the switch/case is much (/any) more efficient than that.

I did notice that the header called "priority" was under "case 'x'" so it would never be found, so I moved it to a "case 'p'", but moot of we stay with the "if/else" if structure.

ha! I vote for if/else structure - simple stupid code is usually the best kind :-)

Hi Emilio, look's like you worked on EqualsIgnoreCase() in bug 1591191. Can you help us out here. Which of the two is better:
xx.EqualsIgnoreCase("huhu") or xx.Equals("huhu", nsCaseInsensitiveCStringComparator).
Surely the first one looks a lot nicer.

Flags: needinfo?(emilio)

Either should work. If you have lots of headers given the header name is usually short it's gonna be faster to lowercase upfront and call EqualsLiteral. Both of those would be faster if you added _ns to the literal, and if this code doesn't need microoptimizing I'd just peek the one you think looks better. Otherwise profile and peek the faster one? :)

Flags: needinfo?(emilio)
Attached patch 1615064-follow-up.patch - take 2 (obsolete) — Splinter Review

Thanks, Emilio. I think this looks better and I don't think there is a need for micro-optimising. nsParseMailMessageState::ParseHeaders() appears to be called once per message, so I don't think there's any gain to be made here.

Ben, please pick the patch you prefer, obsolete the others and set checkin-needed.

Attachment #9210403 - Flags: review?(benc)

Maybe Gene's optimisation has its merit, see for example here:
https://searchfox.org/comm-central/rev/9fcd975bd0e858d633f236acfd5ebf754ff0904b/mailnews/base/src/nsMsgDBView.cpp#1880-1963
from https://hg.mozilla.org/comm-central/rev/802e67be3e3e
If we go this way, I'd still suggest to use xx.EqualsIgnoreCase("huhu") for better readability.

I've merged Gene's "back to switch" with the backout of what was already landed so it can be landed as a follow-up.

I think the switch has it's merits. Before the discussion was about micro-optimising string comparisons, with the switch we should be able to cut down on comparisons.

I put my money on this one ;-)

Attachment #9210149 - Attachment is obsolete: true
Attachment #9210403 - Attachment is obsolete: true
Attachment #9210403 - Flags: review?(benc)
Attachment #9210533 - Flags: review?(benc)
Attachment #9210533 - Flags: feedback+
Attachment #9210533 - Attachment is patch: true

I tried this and it works:

    if (headerStr.Equals(HEADER_X_MOZILLA_KEYWORDS ""_ns,
                              nsCaseInsensitiveCStringComparator) &&  !m_keywords.length)
      header = &m_keywords;

Anyhow, no idea here on speed but I suppose readability is still somewhat worse than EqualsIgnoreCase().

Yes, those concatenations work, what also works is u"" HEADER_X_MOZILLA_KEYWORDS if for some reason you wanted to turn this into a wide-string. But let's not micro-optimise string comparisons, let's avoid them with the switch as you proposed.

Comment on attachment 9210533 [details] [diff] [review]
1615064-follow-up-gds.patch - Gene's back to switch

Review of attachment 9210533 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM

::: mailnews/local/src/nsParseMailbox.cpp
@@ +941,5 @@
> +        if (headerStr.EqualsLiteral("to"))
> +          header = GetNextHeaderInAggregate(m_toList);
> +        break;
> +      case 'x':
> +        if (headerStr.EqualsIgnoreCase(X_MOZILLA_STATUS2) &&

Just to be clear, EqualsIgnoreCase() will cause a strlen at runtime. I'm pretty sure Gene's approach in comment 51 would avoid it and be faster.
But.
The speed aspect is already addressed by being behind the `case 'x':`, and I think EqualsIgnoreCase() is a more readable. So I'd say go with this patch as it is.
Attachment #9210533 - Flags: review?(benc) → review+
Attachment #9209640 - Attachment description: Bug1615064--reject-abbreviated-header-names-V2.patch → [checked in] Bug1615064--reject-abbreviated-header-names-V2.patch
Attachment #9210197 - Attachment is obsolete: true

Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/2a5f6099970b
Follow-up: Optimize string processing in nsParseMailMessageState::ParseHeaders(). r=benc

Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Regressions: 1710213
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: