Last Comment Bug 58769 - don't collect addresses into CAB that occur in other AB's
: don't collect addresses into CAB that occur in other AB's
Status: VERIFIED FIXED
: helpwanted
Product: MailNews Core
Classification: Components
Component: Address Book (show other bugs)
: Trunk
: All All
: P3 normal with 31 votes (vote)
: Thunderbird 3.0rc1
Assigned To: Mark Banner (:standard8, afk until Dec)
:
:
Mentors:
: 75050 77524 155969 199196 209211 230158 237455 240935 252008 280780 281653 294703 320993 365452 386076 393555 433822 461895 (view as bug list)
Depends on: 450149 451370
Blocks: 126022 129393 446571 446574
  Show dependency treegraph
 
Reported: 2000-11-01 12:16 PST by Tim Taylor
Modified: 2015-04-09 12:53 PDT (History)
51 users (show)
asa: blocking1.7a-
mkmelin+mozilla: blocking‑thunderbird3-
mkmelin+mozilla: wanted‑thunderbird3+
standard8: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Stop collecting addresses in other ABs (16.56 KB, patch)
2008-11-10 16:21 PST, Mark Banner (:standard8, afk until Dec)
mozilla: review+
neil: superreview+
Details | Diff | Splinter Review

Description Tim Taylor 2000-11-01 12:16:54 PST
Don't collect addresses into the Collected Address Book if the address already
exists in other Address Books (PAB or user created).

Current behavior:

  1. create an entry for tim.taylor@iname.com in my PAB
  2. send an email to or receive an email from tim.taylor@iname.com
  3. tim.taylor is collected into CAB

or

  1. send an email to or receive an email from tim.taylor@iname.com
  2. tim.taylor is collected into CAB
  3. "move" tim.taylor from CAB to PAB
  4. send/receive another email to/from tim.taylor@iname.com
  5. tim.taylor is collected into CAB

In both cases, we end up with two entries for tim.taylor in the address books:
one in the PAB and another in the CAB.  If I want to fill in some fields for
that address, I need to enter them in both AB entries.  This defeats the purpose
of having a PAB at all.

Expected behavior:

1. create an entry for tim.taylor@iname.com in my PAB
2. send an email to or receive an email from tim.taylor@iname.com
3. tim.taylor isn't collected into CAB

or

1. send an email to or receive an email from tim.taylor@iname.com
2. tim.taylor is collected into CAB
3. "move" tim.taylor from CAB to PAB
4. send/receive another email to/from tim.taylor@iname.com
5. tim.taylor isn't collected into CAB

This should also take into account the "additional email" field in the address
book entry when determining whether to collect into the CAB.  In other words:

Expected Behavior:
1. create an entry for tim.taylor@iname.com in my PAB
2. specify tim_taylor_mi@yahoo.com as additional email address for tim.taylor
3. send an email to or receive an email from tim.taylor@iname.com
4. tim.taylor isn't collected into CAB
5. send/receive email to/from tim_taylor_mi@yahoo.com
6. tim_taylor_mi isn't collected into CAB
Comment 1 Boris Zbarsky [:bz] (still a bit busy) 2000-11-13 00:00:21 PST
Confirming.  I see the current behavior on linux trunk 2000111106 and the
expected behavior would definitely be nicer.  Right now I get a friend of mine
in three different incarnations in autocomplete; all three end up going to the
same address and two are collected while also being listed in my personal
address book.
Comment 2 pmock 2000-11-28 17:15:17 PST
Changing qa assign
Comment 3 scottputterman 2000-12-21 09:29:20 PST
reassigning to chuang.
Comment 4 fenella 2001-02-20 14:01:30 PST
QA-assign-to myself.
Comment 5 Boris Zbarsky [:bz] (still a bit busy) 2001-04-25 10:38:50 PDT
*** Bug 77524 has been marked as a duplicate of this bug. ***
Comment 6 Chris Dolan 2001-07-23 12:13:23 PDT
Adding myself to CC list -- I like this enhancement idea!
Comment 7 scottputterman 2001-11-01 00:13:18 PST
reassigning to cavin.
Comment 8 Ninoschka Baca 2002-01-18 17:41:32 PST
*** Bug 75050 has been marked as a duplicate of this bug. ***
Comment 9 Erich 'Ricky' Iseli 2002-06-09 03:33:28 PDT
I've been dreaming of this for a while now, but only today decided to look for
it in Bugzilla.

What needs to be mentioned too is that only the e-mail address, case-sensitive
should be checked for duplicates. Actually some people change the
case-sensitivity and/or name:
John Travolta <john.travolta@showbiz.com>
Travolta, John <John.Travolta@showbiz.com>
should be considered as the same address.
Comment 10 R.K.Aa. 2002-07-05 16:26:14 PDT
*** Bug 155969 has been marked as a duplicate of this bug. ***
Comment 11 Steve VanSlyck 2002-07-05 16:28:48 PDT
It would also be nice to have some way, perhaps interactive with the user at
runtime, to accurately set the html preference of the new entry before it is saved.
Comment 12 Ian Neal 2002-09-13 07:37:27 PDT
This also affects LDAP address books, see bug 126022
Comment 13 Boris 'pi' Piwinger 2002-10-09 03:01:03 PDT
This bug was marked as depending on bug 150360 which was duped onto bug 85344.
Correcting dependency.

Anyways, the target milestone was missed, so there is something to be done, but
this bug is not even assigned.

pi
Comment 14 R.K.Aa. 2003-03-25 13:27:38 PST
*** Bug 199196 has been marked as a duplicate of this bug. ***
Comment 15 Sam Hulick 2003-03-25 13:32:08 PST
This bug was opened in Nov of 2000??  am i reading that right?
Comment 16 Ian Neal 2003-03-25 19:15:58 PST
Yes, you are reading that correctly.
Comment 17 Sam Hulick 2003-03-26 07:58:24 PST
wow, then this is pretty old.  it seems to be fairly minor though so I guess no
one is breaking down the doors for this one, though there are 12 votes
Comment 18 Jim Booth 2003-06-15 13:26:27 PDT
*** Bug 209211 has been marked as a duplicate of this bug. ***
Comment 19 Jim Booth 2004-02-01 09:50:04 PST
*** Bug 230158 has been marked as a duplicate of this bug. ***
Comment 20 Asa Dotzler [:asa] 2004-02-02 12:49:20 PST
transferring nomination from duped bug.
Comment 21 Asa Dotzler [:asa] 2004-02-09 14:52:49 PST
Not going to block for this. Who can own this?
Comment 22 Bogdan Stroe 2004-04-25 21:33:20 PDT
*** Bug 240935 has been marked as a duplicate of this bug. ***
Comment 23 Jo Hermans 2004-07-18 11:18:45 PDT
*** Bug 252008 has been marked as a duplicate of this bug. ***
Comment 24 OstGote! 2004-11-26 01:58:16 PST
*** Bug 237455 has been marked as a duplicate of this bug. ***
Comment 25 OstGote! 2005-02-03 02:45:28 PST
*** Bug 280780 has been marked as a duplicate of this bug. ***
Comment 26 Steve Anonymous 2005-02-03 03:32:15 PST
This would be a fantastic improvement as well.  I am going to use one of my
votes for it.
Comment 27 OstGote! 2005-02-09 07:48:08 PST
*** Bug 281653 has been marked as a duplicate of this bug. ***
Comment 28 Shriramana Sharma 2005-08-14 22:56:14 PDT
Please can we see some action on this bug? It's really annoying when I set out
to clean my Collected Addresses book. 

I should also request that the check should be carried out on the mail id, not
the same, as people write stuff like "Taylor, Jim" or "Jim Taylor" with the same
mail id, or sometimes their mail clients munge their names that way. 

Also, the mail id check should be carried out case-insensitive.
Comment 29 Mark Banner (:standard8, afk until Dec) 2005-08-23 04:57:12 PDT
*** Bug 294703 has been marked as a duplicate of this bug. ***
Comment 30 Mark Banner (:standard8, afk until Dec) 2005-12-20 12:36:21 PST
*** Bug 320993 has been marked as a duplicate of this bug. ***
Comment 31 Mark Banner (:standard8, afk until Dec) 2005-12-21 05:05:23 PST
I don't believe that the dependency on bug 85344 is valid (why should adding more fields help us to implement not collecting other addresses???) - therefore I'm removing it.
Comment 32 Matt Taylor 2007-04-19 12:15:46 PDT
This really should be a bug not an enhancement.  Maybe the impact is not understood, let me try to explain:

- Used has 200 addresses in "Collected Addresses".
- They sort them into separate address books ("Friends", "Work" etc.)
- Collected Addresses is now empty.

3 months later, they have sent lots of emails, mostly to old contacts, but some to new contacts.

- They open "Collected Addresses".  It contains 205 addresses.

How are they supposed to know which are the 5 new addresses?
Comment 33 Daniel Beardsmore 2007-07-01 13:54:08 PDT
*** Bug 365452 has been marked as a duplicate of this bug. ***
Comment 34 Phil Hannent 2007-08-09 02:57:21 PDT
Just poking around to see if I could spot where this is happening.  

In mozilla\mail\components\compose\content\MsgComposeCommands.js there is a function called GenericSendMessage which I would have expected to check for the preference to add the recipient's to the CAB.  However its not there and there and it ends with a call to 

gMsgCompose.SendMsg(); 

However in the 30mins I spent looking I couldn't find where this was defined.  As it would be the next logical place for me to look.

I am expecting that the add to address book function call just needs extra checks against the address book database but until I find where they are getting added I cannot be sure.

Anybody care to point me in the right direction?
Comment 35 Phil Hannent 2007-08-09 03:17:11 PDT
I think I found it:

mozilla\mailnews\addrbook\src\nsAbAddressCollecter.cpp

Comment 36 Phil Hannent 2007-08-09 03:32:03 PDT
Re: nsAbAddressCollecter.cpp

From my figuring (which could be wrong) we have a function called CollectAddress and GetCardFromAttribute (for comparing the emails).

in GetCardFromAttribute thee is the following:
// Please DO NOT change the 3rd param of GetCardFromAttribute() call to 
// PR_TRUE (ie, case insensitive) without reading bugs #128535 and #121478.
return m_database->GetCardFromAttribute(m_directory, aName, aValue, PR_FALSE /* retain case */, aCard);

Which indicates that the comparison is case sensitive due to the bugs listed.
Comment 37 Jose 2007-12-11 12:02:04 PST
According to bugzilla, those bugs (bugs #128535 and #121478) are "verified fixed".  And this bug is listed as an "enhancement".... it's not.  It's a bug, a severe enough bug as to make "collected addresses" useless.

Jose
Comment 38 zug_treno 2008-03-10 06:56:59 PDT
*** Bug 386076 has been marked as a duplicate of this bug. ***
Comment 39 Mark Banner (:standard8, afk until Dec) 2008-05-15 01:58:03 PDT
*** Bug 433822 has been marked as a duplicate of this bug. ***
Comment 40 Scott 2008-07-03 13:32:09 PDT
I agree with Jose, >>"collected addresses" useless<< without this fix.

Comment 41 Greg Colyer 2008-08-11 16:56:06 PDT
There are times when it may be useful to have variant forms of the Display Name field. (An example is if you want to send a message with bare addresses only. At some point this became impossible for an address with an AB entry unless there is an AB entry for that address with a blank Display Name. Actually I consider that a separate bug, but for the sake of this argument let's assume it will stay that way.) On this theory any new variant should perhaps still be collected into CAB.

Comment #32 could still be dealt with, by UI that indicates whether a given address is duplicated or is unique. In fact, if this alone were fixed, the remainder of the bug would not be very serious and could easily be worked around.
Comment 42 Greg Colyer 2008-08-11 17:03:26 PDT
For clarity, another example of the above requirement is if you receive (reply to) a message from "Smith, Bob (home)" and a message from "Smith, Bob (work)"; it is useful not to discard the additional information about home vs. work just because you already have an entry for "Bob Smith" corresponding to one of them. (You may later want to tidy them up to only two entries, but the information loss would already have occurred.)
Comment 43 Greg Ackerson 2008-08-12 08:56:24 PDT
Sure, but wouldn't Bob@Home and Bob@Work be different email addresses? If you only store Collected Addresses on the email address, Bob would indeed end up with two separate entries, which is what you'd want.
Comment 44 Jose 2008-08-12 09:49:44 PDT
re: comment 41:
I can think of few reasons to have variations on the display name field automatically created by the program.  Sending messages with bare addresses should be done by the SeaMonkey client itself, not "worked around" by duplicating addresses in the address book.  And while it's true that email addresses are sometimes shared among several people, and people have several email addresses, the way to handle this is to make the address book into a relational database.  I'm not holding my breath.  I'd be happy if the program simply did not collect any Email address with a case insensitive duplication anywhere in any of my address books.  I'll handle the exceptions myself by hand.

Jose
Comment 45 Greg Colyer 2008-08-12 14:27:35 PDT
I can certainly see no benefit, and can see some inconvenience, in collecting an entry into CAB that agrees in its entirety with an existing entry in some other AB (where such agreement means, I think: every non-blank field in the collected entry is the same as the corresponding field in the existing entry). But if the two entries agree in address but differ elsewhere (in a non-blank field of the collected entry) then the collection is not necessarily a bug. Two people with the same address is indeed a better example of why. Exceptions will always be dealt with by hand, but the question is what should be the exception and what should be the norm.

Comment #32 is still an issue, however, and I was pointing out that UI to fix it would also enable a better workaround to the bug. Not that this would be as good as fixing it, but it would be better than what we have.
Comment 46 Jose 2008-08-12 14:49:56 PDT
>>
But if the two entries agree in address but differ elsewhere (in a non-blank
field of the collected entry) then the collection is not necessarily a bug.
<<

The only "fields" that can be collected are the Email address and the display name.

Email addresses that differ only in case or leading or trailing spaces are identical and should not be considered different.  Right now they are.  Bug.

Display names most often differ because of spelling, leading spaces, inclusion (or omission) of a middle initial, nickname, etc.  These are the same person.  Sometimes a display name will differ because of collision between "First name + Last name" and "display name".  These are the same person too.

Rarely, I'll get an Email from somebody else who is sharing an Email address.  I prefer to put that in by hand.  It happens once, whereas case and spelling stuff happens all the time.

The "collected addresses" is a convenience if it gets ONLY the new Email addresses, and leaves the other fields to the user.  It is =useless= if the computer is constantly insisting that
J smith 
J. Smith
 John Smith
John Smith
JOhn Smith
Jon S.
j. smith 
 JohnSmith
and John SMith

are different even though they share the same Email address.

>>
the question is what should be the exception
and what should be the norm.
<<

The norm should be that if the two Email addresses are identical in a case insensitive comparison of non-whitespace characters, then nothing should be collected.  GO through your own collected addresses and see how many are dupes, and how many are legitimate cases of "different people".

Jose

Comment 47 Rick Stockton 2008-08-13 14:15:42 PDT
even worse, in a related case:

If I click "add to address book" (defaulting to PAB), it appears to create a new card -- even when a 100% identical card ALREADY EXISTS in the SAME address book!

This is not merely identical with respect to "case-insensitive, whitesapce removed": they are 100% identical duplicates, created in exactly the same way.
Comment 48 Greg Colyer 2008-08-13 16:40:57 PDT
I understand the points being made; however, let's not exaggerate.

Comment #46: CAB is not "useless" at present; it is certainly a convenience for some. One may be perfectly happy to choose between duplicates when they are offered as alternative completions.

Comment #47: You are explicitly asking to do something that you then complain is done! Yes, the redundancy could potentially be detected, but this is hardly "even worse" than the other bug(s).
Comment 49 Greg Colyer 2008-08-13 16:56:20 PDT
Rather more importantly, the bit before the @ sign of an e-mail address is in principle case sensitive. RFC 2821 says so, in the strongest way (section 2.4):

                                         The local-part of a mailbox
   MUST BE treated as case sensitive.  Therefore, SMTP implementations
   MUST take care to preserve the case of mailbox local-parts.  Mailbox
   domains are not case sensitive.  In particular, for some hosts the
   user "smith" is different from the user "Smith".  However, exploiting
   the case sensitivity of mailbox local-parts impedes interoperability
   and is discouraged.

Clearly, most implementations uphold the last sentence by violating the other ones! If most really means all, then we could do the same.
Comment 50 Greg Colyer 2008-08-13 17:03:22 PDT
Most final delivery agents, that is.

In our case, ABs are used to provide addresses for SMTP, so the RFC certainly applies.
Comment 51 Jose 2008-08-13 17:38:00 PDT
>>
Comment #46: CAB is not "useless" at present; it is certainly a convenience for
some. One may be perfectly happy to choose between duplicates when they are
offered as alternative completions.
<<

I have not met such a one.  

If duplication is meaningful, it should be under the control of the user, not
the machine.  That is, in the rare instances where I actually want to choose
between "J. Smith" and "J Smith" (and "Jon Smith" and "JOn Smith" =every=
=time= I send an Email, I'll set those cards up myself.  In the much more
common instance that I reply to people who, while being the same person, send
with different display names, I end up with a collection of meaningless
duplicates.  

This means...

1:   that I have to choose among meaningless duplicates every time, and also
means that in the uncommon (but important) instance where I actually send to a
totally new person with a totally different name and a completely different
Email address (for example, the first time I send to 
Amadeus Mozart <tunesmith@longhairs.com>
this name gets tossed not in a pile of NEW PEOPLE, but rather, in a huge
collection of meaningless variants of existing people.

2:  that if I have a bunch of people (say, a theater cast) to whom I am sending
replies, and I may not have all of the people already in my book, but I'm not
sure who I do have and who I don't have, and who has just changed Email
addresses, I have to check EACH AND EVERY Email address in my CAB against the
ever growing "theater people" address book, because the computer, which is
supposed to be able to do this for me, won't.

So, when I go to put my new collected addresses where they belong, instead of
having five, I have five hundred to sort through. and four hundred ninety five
need to go in the trash.  Further, unless I restrict all my Email activity to
this one address book, I have to go through =each= of my address books to check
for dupes.  (I don't know if fred35@guesswhere.com is a theater person, a music
person, a friend with a new address, a clay person, a commerce person, or
belongs to any of the other categories.

>>
Rather more importantly, the bit before the @ sign of an e-mail address is in
principle case sensitive. RFC 2821 says so, in the strongest way (section 2.4):
<<

Ouch.

A Pox on RFC 2821.  I always thought that internet addresses were case
insensitive.  They =should= be case insensitive.  But Quantum Mechanics should
be nothing more than a bad dream.  Alas...

So, I propose that we do NOT break RFC 2821, but we still need to work around
the issue so that the CAB is actually useful, by adding a few bells.

One workaround:

First, leading and trailing whitespace would be trimmed.  Then...

1:  CAB would have another field... a dynamic virtual field which would
indicate the kind of duplication, if any, that exists with this card.  The only
field to be considered would be the Email address. 

2:  The user would be able to hide or display the following kinds of
duplications:

1: hide/display all entries which duplicate, in a case INsensitive manner, the
Email address of any existing entry.  The CAB entry "rSmith@abc.com would be a
duplicate of the existing RSmith@abc.com, and would be affected here.  The
Email address would be a link, clicking on it would open a window listing the
duplicates (and their locations), along with the new CAB entry.  Users could
edit all entries using that window.

2: hide/display all entries which duplicate, in a case =sensitive= manner, the
same as above.  
Robert<RSmith@abc.com> 
and 
Bob<RSmith@abc.com> 

would be duplicates here because their Email addresses are identical, even
though their NAME fields are different.

IF the user chooses to hide both of these, they would only see the entries that
reflect Email address that do not exist in any of their ABs.  They would also
not see "new" Email addresses that differ only in case... but at least they
could choose to view those too if they wanted, after they dealt with the
guaranteed new addresses first.

Another possible workaround is a user preference for collecting:

[x]  DO NOT COLLECT into the CAB if the email
     address exactly matches an existing entry
     in any address book, irrespective of
     differences in any other field (i.e. "name")

[x]  DO NOT COLLECT into the CAB if the email
     address matches an existing entry in any
     address book, ignoring capitalization as 
     well as differences in any other field.

A third, similar, workaround would be to have three different CABs.  One would
be for guaranteed unique Email addresses.  The only entries here would be those
whose Email address fails to match any entry in any address book, in a case
insensitive test, irrespective of any other fields.  There'd be another CAB for
case sensitive comparisons, and another for entries that differ anywhere (such
as the name field).  That way you could have the best of all worlds.

Jose
Comment 52 Trevor Morgan 2008-08-14 03:15:49 PDT
RFC 2821:
* is about SMTP (i.e. from address book through to destination mail server).
* says that the case of local parts must be preserved.
* discourages the use of case-sensitive local parts.
i.e. Case MUST be preserved up to where mail is sorted into mailboxes, where it SHOULD be ignored.

The decision about whether to collect an e-mail address or not is outside the scope of SMTP.  If the address is collected, however, the case (of local parts) must be preserved.

Collected address book is most useful where only new addresses are added.  Because _people_ frequently violate RFC 2821 case preservation when writing down or typing in e-mail addresses, case variations are common and should be ignored in this instance.

Likewise I get many variations of display name and only a few people who share an address (and they don't think of changing their display name).  So variations in display name should not be collected.

Adding options for case sensitivity and display name matching may be useful to keep the pedants quiet, but are probably not necessary.

Trevor
Comment 53 Steve Simms 2008-09-01 10:28:33 PDT
*** Bug 393555 has been marked as a duplicate of this bug. ***
Comment 54 Magnus Melin 2008-10-21 12:45:06 PDT
Wanted yes, but not a blocker.
Comment 55 Mark Banner (:standard8, afk until Dec) 2008-11-10 16:21:17 PST
Created attachment 347399 [details] [diff] [review]
Stop collecting addresses in other ABs

This patch stops us collecting addresses that exist in other address books. It does not address the case-sensitivity issues. They will be addressed in a later patch/bug; the work for this patch is a big enough change that case-sensitivity needs to be thought about and addressed later.

nsIAbAddressCollector::getCardFromAttribute is no longer needed (we removed the last js instance recently), hence the removal.

The rest of the code is reworked so that we're not using mork-specific code (which will mean we match the UI), and hence as more address book types become writeable, we'll be able to correctly collect to those as well.

Also included is a unit test to cover the new functionality.
Comment 56 David :Bienvenu 2008-11-10 16:51:53 PST
Comment on attachment 347399 [details] [diff] [review]
Stop collecting addresses in other ABs

Maybe change the name of SetAbUri? It's odd to pass in a pref branch - maybe SetAbUriFromPrefs?

"adding new cards and modifying existing ones" reads better.

+ * This tests the main collection functions for adding new cards, modifying
+ * existing ones.

I guess we hold all local AB's open, so this won't have a huge performance impact.
Comment 57 Mark Banner (:standard8, afk until Dec) 2008-11-11 09:49:11 PST
(In reply to comment #56)
> + * This tests the main collection functions for adding new cards, modifying
> + * existing ones.
> 
> I guess we hold all local AB's open, so this won't have a huge performance
> impact.

That's correct with how it is set up at the moment. If we change it in the future then we'll have to revisit this and various other areas.
Comment 58 Mark Banner (:standard8, afk until Dec) 2008-11-11 10:23:11 PST
Checked in: http://hg.mozilla.org/comm-central/rev/f1f7bcf2dee2

This now fixes this bug - addresses in other AB's won't be collected. It does not fix it in the LDAP situation - that is bug 129393.

The case sensitivity issues will be covered by bug 446571 and or bug 446574. I'll comment in those in a while to point them back at the discussion on this bug.
Comment 59 Mark Banner (:standard8, afk until Dec) 2008-11-11 10:24:44 PST
(In reply to comment #58)
> The case sensitivity issues will be covered by bug 446571 and or bug 446574.
> I'll comment in those in a while to point them back at the discussion on this
> bug.

and bug 129393
Comment 60 Przemyslaw Bialik 2008-11-16 13:48:58 PST
->VERIFIED

Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.1b2pre) Gecko/20081116 Lightning/1.0pre Shredder/3.0b1pre + Mac
Comment 61 Mark Banner (:standard8, afk until Dec) 2008-11-17 05:19:30 PST
*** Bug 461895 has been marked as a duplicate of this bug. ***
Comment 62 Merlin Cul Kirkpatrick 2009-08-23 15:48:26 PDT
To the best of my knowledge the case sensitivity bug still exists in the main code.  To get it fixed please go to bug 129393 and click the vote button to raise its level of importance.
Comment 63 BUILDERGREGORYMATTHEWACKERSONBUILDER 2015-04-09 12:31:11 PDT Comment hidden (spam)

Note You need to log in before you can comment on or make changes to this bug.