Closed Bug 507317 Opened 15 years ago Closed 14 years ago

Leaks which access private collection by public means in Add-on Collector

Categories

(addons.mozilla.org Graveyard :: Collections, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wis.master, Assigned: fligtar)

References

Details

(Whiteboard: [z][needs spec])

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.1) Gecko/20090715 Firefox/3.5.1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.1) Gecko/20090715 Firefox/3.5.1

1. Privacy: Add-on Collection is not truly private even if you
checked "Only people I invite can view my collection":
  1. The supposedly private collection will be showed in
"Related Collections"
  2. The supposedly private collection will show in the search
if it matches
  3. The supposedly private collection is indexed and cached in
Google and other web search engines
  4. The supposedly private collection can still be accessed by typing the URL directly (either by name alias or exact string number)

Reproducible: Always




This is a major privacy bug. Please fix it ASAP!!
Attached patch Patch, rev. 1Splinter Review
This patch hides private collections from "popular collections" lists by default.
Assignee: nobody → fwenzel
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #391555 - Flags: review?(jbalogh)
ad 2) I cannot confirm it being listen in search. I find

"$_where[] = "`Collection`.`listed`='1'";"

in the search code, and I just tried it out and private collections indeed do not show up.

ad 3) CCing Fligtar: Google is of course a problem here. If someone's collection is linked to from anywhere in the whole wide web, it's going to be indexed sooner rather than later. Do we want that? Only way to avoid that would be making private collections login-only. But that smells like the old sandbox and thus may not fly with the users. Any thoughts?

ad 4) This is (currently) expected behavior. How else would the people you give the link to be able to access the collection? Of course, we might consider something like a guest password of sorts (I am thinking of what flickr does, here). Fligtar may have thoughts on this one, too.
OS: Windows XP → All
Hardware: x86 → All
Clouserw: Should we get this fix into 5.0.8? The other questions will have to wait, but it seems like a good patch to include.
Attachment #391555 - Flags: review?(jbalogh) → review+
Committed to r48162.

I'll keep this one open for the questions in comment 2.
Assignee: fwenzel → fligtar
Status: ASSIGNED → NEW
2) Step to reproduce it

a. Add one addon into your private collection
b. Go to that addon page > Related Collections > Click on "and XX more collection"
Result: The search result will return private collections too.

3) I didn't post the private collection page anywhere. I'm not sure why the page has been cached. Perhaps it's leaked by the bugs listed above. To prevent the problem, I think all pages in private collection should be marked as exclusion so search engines won't index it:
http://www.google.com/support/webmasters/bin/answer.py?answer=93710

4) A layman way of fixing this problem:

IF PERSON_NOT_IN_FRIEND_LIST
  Page not exist
ELSE
  Load the content


By the way, would AMO have a better to indicate which collection in my list is private? For example, a different icon, different name color, an asterisk (*) beside the name, put the word "private" after the name, or italic name. Any way is fine. I just want an easy way to identify which collection is private in the list.
Fred's patch also fixes #2.
(In reply to comment #5)
> To prevent
> the problem, I think all pages in private collection should be marked as
> exclusion so search engines won't index it:
> http://www.google.com/support/webmasters/bin/answer.py?answer=93710

Hm, that would probably work indeed. If that'll get the pages *deleted* out of Google's index, I don't know.

> 4) A layman way of fixing this problem:
> 
> IF PERSON_NOT_IN_FRIEND_LIST
>   Page not exist
> ELSE
>   Load the content

That makes sense, of course, but AMO does not currently have the concept of a "friend".
(In reply to comment #2)
> ad 3) CCing Fligtar: Google is of course a problem here. If someone's
> collection is linked to from anywhere in the whole wide web, it's going to be
> indexed sooner rather than later. Do we want that? Only way to avoid that would
> be making private collections login-only. But that smells like the old sandbox
> and thus may not fly with the users. Any thoughts?

Perhaps the URL of private collections should be collection/private/{guid} and then we can mark everything in /private as non-indexed. I wasn't concerned about indexing if someone posted their private URL on a publicly accessible resource that Google indexes. But since it was our fault we were linking to these private collections publicly, we should probably fix it.

> ad 4) This is (currently) expected behavior. How else would the people you give
> the link to be able to access the collection? Of course, we might consider
> something like a guest password of sorts (I am thinking of what flickr does,
> here). Fligtar may have thoughts on this one, too.

This is working as intended.
(In reply to comment #7)
> (In reply to comment #5)
> > To prevent
> > the problem, I think all pages in private collection should be marked as
> > exclusion so search engines won't index it:
> > http://www.google.com/support/webmasters/bin/answer.py?answer=93710
> 
> Hm, that would probably work indeed. If that'll get the pages *deleted* out of
> Google's index, I don't know.
> 

Hm, Google won't update the page next time it crawls there. And I believe old cache would be deleted eventually. 

Well in case I'm wrong, another workaround is to let Google update but returns the empty page or page not found so Google will refresh its cache which won't leak the private collection any more.

Just a reminder you should tell *all* search crawlers to not index those pages in future, not just Google.

> > 4) A layman way of fixing this problem:
> > 
> > IF PERSON_NOT_IN_FRIEND_LIST
> >   Page not exist
> > ELSE
> >   Load the content
> 
> That makes sense, of course, but AMO does not currently have the concept of a
> "friend".

Who can view your collection?
-> Only people I invite can view my collection

Who can publish add-ons to your collection?
-> Myself and these users:
--- Enter the e-mail address of a Firefox Add-ons account:

Who can manage my collection?
-> Myself and these users:
--- Enter the e-mail address of a Firefox Add-ons account:

Give us the capability of compiling e-mail addresses as a "friend list". Use existing codes. It should work very similarly, and problem is resolved.

The new interface would be:

Who can view your collection?
-> Only people I invite can view my collection
--- Enter the e-mail address of a Firefox Add-ons account: ***NEW***

Who can publish add-ons to your collection?
-> Myself and these users:
--- Enter the e-mail address of a Firefox Add-ons account:

Who can manage my collection?
-> Myself and these users:
--- Enter the e-mail address of a Firefox Add-ons account:
We thought about this a good bit when figuring out the permissions system, and we would rather keep it very lightweight with just a secret, non-guessable URL that someone can pass around to people that should see the collection.

It's just a list of add-ons, after all.
(In reply to comment #10)
> We thought about this a good bit when figuring out the permissions system, and
> we would rather keep it very lightweight with just a secret, non-guessable URL
> that someone can pass around to people that should see the collection.

How do you verify the permissions of persons who are allowed to publish addons and manage your collection? Are you trying to say you don't have any verification for this sort of permissions either? If it's false, why not extend a bit if you already have a permission system in place? After all it's safer and more fail-safe.

Why? The private collection is still protected in case there are bugs in your system which reveals the links of the private collections (like this incident). You can't guarantee there are no bugs anywhere now which will reveal the secret URLs too. A simple verification can act as a fail-safe against this sort of bugs and mistakes.

There is a use case which a simple verification can save you. I tell my friends to view that private collection. One of my friends tells other friends about my link. One of the other friends share my link to someone else. Eventually the link may leak out by some thoughtless friend (that I may not even know). A non-guessable link doesn't help to prevent an irresponsible person from leaking your private link. 

A friend may forget that your link is private and leak it out by mistake. To err is human. :(

Or a friend post the private link in his personal blog. Although the blog isn't meant to be public but he forget to configure the blog properly so a search engine can access it and index your private collection.

How are you going to fix the private collections which have been indexed by different search engines already? The non-guessable URLs have been recorded. People can freely access your private collections when the search matches. With a permission system at least people can't access your private collection even if your link is still indexed in some search engines.

I realize not all search engines respect no-index rules. A crawler is much more capable of finding your secret links than a human. This may pose some other problems.

(In reply to comment #10)
> It's just a list of add-ons, after all.
Yes but it seems people enjoy complete privacy nowadays. It's enough to cast their concerns even their bookmarks are never revealed but bookmark data has been aggregating for statistics purpose. Aggregating bookmark data has no way to identify you personally but people are still concerned about it.

Some people get annoyed and complain when an addon is recording hit counts. They want it completely off.

Sometimes people are concerned even their bookmarks don't have any porn or embarrassing sites being bookmarked. They just don't want people know about their browsing habits to be revealed. By the same reason some people just don't want others to know what addons they install or use. 

So I guess some people would be upset when the private collection isn't meant to be truly private. They are going to be frustrated and have negative impressions on AMO if their collections are leaked out in the above scenarios. They usually don't understand the implications behind so they will get surprised when they discover it.

I think some people may not realise the link isn't secured. What I mean is they don't know the fact that an anonymous person can access the page as long as he knows the link. Creating a name alias for that secret link would make it easier for an anonymous to guess your secret links.

I believe AMO treats privacy seriously so either do something to comprehensively fix the current and potential problems, or at least post a warning so people knows the limitation of this so-called "private collection. Tell them your link isn't truly private. People can freely access it if your secret link is leaked out in some way (eg. crawlers and search engines).

After all I think you should apply the patch to AMO as soon as possible. More private collections being indexed means more problems harder to fix.
Just a short question. When are you going to apply the current patch to AMO?

The longer time it get patches, the more pages being indexed which we have no control on and so more messes left behind. Why the wait? Why not apply it now?

I'm eager to see the private collection leaks as soon as possible.
In my opinion, owners of private collections should be notified of Problems 1, 2 and 3 by some means, especially the fact that private collections may have been indexed by search engines.
(In reply to comment #8)
> Perhaps the URL of private collections should be collection/private/{guid} and
> then we can mark everything in /private as non-indexed. I wasn't concerned
> about indexing if someone posted their private URL on a publicly accessible
> resource that Google indexes. But since it was our fault we were linking to
> these private collections publicly, we should probably fix it.

Changing the URLs of private collections may be a good idea, but marking the new URLs as non-indexed does not make much sense, because we do make them public.  Instead, we should mark the old URLs as non-indexed.
(In reply to comment #12)
> Just a short question. When are you going to apply the current patch to AMO?

Our next push date is currently scheduled for this Thursday night.

(In reply to comment #13)
> In my opinion, owners of private collections should be notified of Problems 1,
> 2 and 3 by some means, especially the fact that private collections may have
> been indexed by search engines.

I'll discuss this with the team. We may want to offer these users an easy way to generate a new GUID for their collection.
(In reply to comment #14)
> 
> Changing the URLs of private collections may be a good idea, but marking the
> new URLs as non-indexed does not make much sense, because we do make them
> public.  Instead, we should mark the old URLs as non-indexed.

Why it's not a good idea? If the old URLs were marked as "no-index" in the first place such a problem like this won't appear at all? Don't assume your system is always bug-free or search engine has no way to find out the links even if we don't deliberately publish it. 

After all the new URLs aren't supposed to be indexed anyway. There is nothing negative to mark them as "no-index".
(In reply to comment #14)
> Changing the URLs of private collections may be a good idea, but marking the
> new URLs as non-indexed does not make much sense, because we do make them
Oops: we do -> we do not
> public.  Instead, we should mark the old URLs as non-indexed.

(In reply to comment #16)
> Why it's not a good idea?

My point in comment 14 is that changing the URLs of private collections and marking the new pages as non-indexed will not fix the problem in the way which Justin wrote in comment 8, because it does not tell search engines to discard their “cache.”  Marking the old pages as non-indexed will.

Marking the new pages as non-indexed might be good as well, but for a different reason.  I am not arguing that point.
OK, thanks for clearing that up. So we should add "no-index" to both old and new pages - a good idea as always!

I would also like to know more about the part of permission checking. Unless I get something very wrong we do have permission checks for people who manage collections and publish addons, don't we? Who says we don't have a permission system in AMO? 

We can take advantage of this and just extend the permission check to view collections too. What do you think?
As I wrote in IRC #amo yesterday, the number of collections shown on the add-on details page still counts private collections.  This is confusing.

To fix it, we seem to have to change the function getCollectionCountForAddon similarly to Frédéric’s patch in comment 4.

If this should be a separate bug, feel free to file one or tell me to do so.
Priority: -- → P3
Target Milestone: --- → 5.6
Whiteboard: [z][needs spec]
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: addons.mozilla.org → addons.mozilla.org Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: