Closed Bug 528671 Opened 15 years ago Closed 15 years ago

Private collection is not private. No privacy is in private collections.

Categories

(addons.mozilla.org Graveyard :: Collections, defect)

defect
Not set
major

Tracking

(Not tracked)

VERIFIED DUPLICATE of bug 507317

People

(Reporter: nico.nelson-fodpsfst, Unassigned)

Details

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5

I created some private collections recently (about 1 month ago). Those collections are only for me, and I supposed I had total privacy on them, but the facts prove me wrong.

I NEVER share the collections.
I NEVER show the collections anywhere.
I NEVER did anything or left any hints which might expose my private collections.

Very much to my surprise, I found my private collections in the search engines. I kept checking and found a truth I could hardly believe. In fact ANYONE can browse your private collections freely without any restrictions EVEN THOUGH you have carefully selected that only you can view the collection. Mozilla Foundation clearly doesn't want to honor your request.

I'm sorry if I sound rude and aggressive but I'm very disappointed. How come a private collection is meant to be shared around the world and stored in the wild without my permission? That's not what I understand by a private collection. I supposed Mozilla Foundation cared about people's privacy. Mozilla Foundation gives me a false sense of safety and privacy by offering private collections. I'm very disappointed.

Please, for God's sake, change the misleading name. Mozilla Foundation, please add a warning telling people that a private collection will still be shared with other people.

Reproducible: Always

Steps to Reproduce:
1. Create a private collection
2. The private collection has no privacy. Other people can still browse your private collections without being granted permission. Your collections will eventually be indexed in search engines.
My presumptions:

The indexing happens because there is no "Disallow" rule in the "robots" file.
And this is the case because the unique URL for collections does not differ between the "public" and "private" settings.
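To illustrate why a shared URL scheme makes a "Disallow" rule hard to write, here is a minimal sketch using Python's standard urllib.robotparser. The paths, UUID-like names and the "ExampleBot" user agent are made up for illustration and are not the site's real layout. Since robots rules match on the URL path, a rule on a shared prefix either blocks both kinds of collection or neither.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if a robots.txt-respecting crawler may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical robots files: one without any restriction, one that blocks
# the whole (assumed) collection prefix.
NO_RULE = "User-agent: *\nDisallow:\n"
BLANKET_RULE = "User-agent: *\nDisallow: /collection/\n"

# Hypothetical URLs: public and private collections share one URL shape.
public_path = "/collection/aaaa-1111"
private_path = "/collection/bbbb-2222"

for name, rules in (("no rule", NO_RULE), ("blanket rule", BLANKET_RULE)):
    print(name, can_crawl(rules, public_path), can_crawl(rules, private_path))
```

With no rule both paths are crawlable, and with the blanket rule both are blocked, so the robots file cannot single out the private one.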

Furthermore the text in the options

"By default, collections appear in the public Collection Directory and are discoverable by anyone. If you want to restrict your collection to be viewable only by people who are given a special link"

is not very clear on the issue of excluding search engines. Maybe it's all about a misunderstanding of the meaning of "private".
OS: Windows XP → All
Hardware: x86 → All
The intention is for private collections to not be listed by search engines. We didn't add it to the robots file because we shouldn't be linking to it anywhere that a search engine can find out about. If it's in a search engine and you didn't link to it, there must be a bug somewhere on the site that lists these collections.

I created a private collection just now and tried to look for it in all the places it might appear and didn't see it. So, I'm not sure how this is happening.
What is the name and URL of this mysterious (un)secret collection?
The following are situations or reasons why a private collection might be indexed. I remember other people complaining about this too, some time ago on some forum.

Scenario 1
As far as I remember collection is public by default and it doesn't ask you whether you want to create a private or public collection during the creation process. That means your collection is public until you configure the settings properly. It's possible that a web crawler came by just at the moment you were changing it into a private collection.

Once the web crawler knows your URL, changing the collection to private doesn't help. The crawler can still access and index your page. You have no way to exclude yourself.

Also, I believe some people add descriptions and some addons first, then change the setting. That gives the web crawler more time to index the page.

Scenario 2
Similar to the above, but you use the auto-publisher to publish your collection. There may also be some bugs in that process.

Scenario 3
It's not clear whether your collection is public or private. By viewing the front page or seeing the name alone, you don't really know whether your collection has been correctly configured as private. The only way to know is to check the setting. It's possible to forget to change your collection to private by mistake, especially when you are creating multiple collections.

Scenario 4
Perhaps you told your friends about your URL but your friend is not as clever as you. He may post your link in his "private" blog but this blog is accessible to the public (although the majority aren't interested in it).

Scenario 5
You can set an alias for your collection. It's possible that people can find your collection by guessing the name. Perhaps someone used the alias before and deleted it later, but the web crawler had already indexed that alias. Now you use that alias, so your private collection gets indexed even though you did nothing wrong on your part.
What we could probably do is put private collections under .../collection/private/{uuid}, disallow all of /collection/private for crawlers, and subsequently use a 301 (permanent) redirect from the public URL to the private one. That keeps collections accessible even when the privacy status changes. The permanent redirect will let Google update its URLs, and (hopefully?) due to the robots.txt entry will make it remove the item from the index.
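A rough sketch of how that routing could behave, assuming a hypothetical in-memory table of collections and the path prefixes named above. The redirect from a stale private URL back to the public one is an extra assumption that the proposal does not spell out.

```python
from http import HTTPStatus
from typing import Optional, Tuple

PUBLIC_PREFIX = "/collection/"
PRIVATE_PREFIX = "/collection/private/"  # would be disallowed in robots.txt

# Hypothetical store for illustration only: uuid -> is_private
COLLECTIONS = {"1111": False, "2222": True}


def route(path: str) -> Tuple[HTTPStatus, Optional[str]]:
    """Return (status, redirect target or None) for a collection URL."""
    # Check the longer private prefix first, since it also starts with
    # the public prefix.
    if path.startswith(PRIVATE_PREFIX):
        uuid, requested_private = path[len(PRIVATE_PREFIX):], True
    elif path.startswith(PUBLIC_PREFIX):
        uuid, requested_private = path[len(PUBLIC_PREFIX):], False
    else:
        return HTTPStatus.NOT_FOUND, None

    if uuid not in COLLECTIONS:
        return HTTPStatus.NOT_FOUND, None

    is_private = COLLECTIONS[uuid]
    if is_private == requested_private:
        return HTTPStatus.OK, None

    # The URL no longer matches the privacy setting: send a permanent
    # redirect to the canonical location.
    target = (PRIVATE_PREFIX if is_private else PUBLIC_PREFIX) + uuid
    return HTTPStatus.MOVED_PERMANENTLY, target


print(route("/collection/2222"))
```

The matching robots.txt entry would be a "Disallow: /collection/private/" line for all user agents.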
(In reply to comment #5)
> What we could probably do is put private collections under
> .../collection/private/{uuid}, disallow all of /collection/private for
> crawlers, and subsequently use a 301 (permanent) redirect from the public URL
> to the private one. That keeps collections accessible even when the privacy
> status changes. The permanent redirect will let Google update its URLs, and
> (hopefully?) due to the robots.txt entry will make it remove the item from the
> index.

This is a pretty good idea. At least it offers some basic protection against this kind of risk. But there are many different web crawlers on the Internet, for example crawlers from Russia and China, and I realize not all of them read or respect robots.txt, so this method may not work perfectly.

We may consider giving private collections a visual indicator, let's say a private icon beside the name. We should also move the private/public configuration into the creation process. Those measures would prevent the problems caused by Scenarios 1-3.
(In reply to comment #5)
> What we could probably do is put private collections under
> .../collection/private/{uuid}, disallow all of /collection/private for
> crawlers, and subsequently use a 301 (permanent) redirect from the public URL
> to the private one. That keeps collections accessible even when the privacy
> status changes. The permanent redirect will let Google update its URLs, and
> (hopefully?) due to the robots.txt entry will make it remove the item from the
> index.

It'd be easier just to stick a meta tag on private pages. <meta name="robots" content="noindex">
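As a sketch of that idea, the page template could emit the tag only when the collection is private. The render_head helper and its arguments are hypothetical and not taken from the site's code.

```python
from html import escape

NOINDEX_META = '<meta name="robots" content="noindex">'


def render_head(title: str, is_private: bool) -> str:
    """Build a minimal <head>; private pages ask crawlers not to index."""
    parts = [f"<title>{escape(title)}</title>"]
    if is_private:
        parts.append(NOINDEX_META)
    return "<head>" + "".join(parts) + "</head>"


print(render_head("My collection", is_private=True))
```

Note that a crawler has to be able to fetch the page to see the tag, so this approach works against also blocking the same URL in robots.txt.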
(In reply to comment #5)
> What we could probably do is put private collections under
> .../collection/private/{uuid}, disallow all of /collection/private for
> crawlers, and subsequently use a 301 (permanent) redirect from the public URL
> to the private one. That keeps collections accessible even when the privacy
> status changes. The permanent redirect will let Google update its URLs, and
> (hopefully?) due to the robots.txt entry will make it remove the item from the
> index.

This measure gives the additional benefit of allowing users to know the status of the collection by viewing the URL. You no longer need to go to the collection configuration page to verify whether your collection has been set to private.
Prevention is better than cure. On second thought I have come up with a preventive measure which should close up all possible leakage and solve this problem once and for all.

Login requirement
Just as we can require users to log in to install sandboxed addons, we can require users to log in to visit any private collection too. This would ensure a private collection won't leak no matter which scenario happens, and no matter what hidden bugs the system has that we are still not aware of. You no longer need to worry about your friends leaking your URL by mistake. A web crawler can't index the page no matter how advanced its crawling techniques are. What do you think?
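A minimal sketch of such a check, assuming hypothetical numeric user ids and assuming that only the owner may view a private collection (the comment itself only asks for a login step):

```python
from http import HTTPStatus
from typing import Optional


def access_status(is_private: bool, owner_id: int,
                  viewer_id: Optional[int]) -> HTTPStatus:
    """Decide the response for a collection page request.

    viewer_id is None for anonymous visitors, which includes crawlers.
    """
    if not is_private:
        return HTTPStatus.OK
    if viewer_id is None:
        # Not logged in: the private page content is never served.
        return HTTPStatus.UNAUTHORIZED
    if viewer_id != owner_id:
        # Assumption: logged-in users other than the owner are refused.
        return HTTPStatus.FORBIDDEN
    return HTTPStatus.OK


print(access_status(is_private=True, owner_id=7, viewer_id=None))
```

Because an anonymous request never receives the page body, a leaked URL gives a crawler nothing to index.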

We should fix this as soon as possible to limit the damage and the number of affected users.
(In reply to comment #4)
> As far as I remember collection is public by default and it doesn't ask you
> whether you want to create a private or public collection during the creation
> process.

The creation process does let you choose the privacy setting.
(In reply to comment #5)
> What we could probably do is put private collections under
> .../collection/private/{uuid}, disallow all of /collection/private for
> crawlers, and subsequently use a 301 (permanent) redirect from the public URL
> to the private one. That keeps collections accessible even when the privacy
> status changes. The permanent redirect will let Google update its URLs, and
> (hopefully?) due to the robots.txt entry will make it remove the item from the
> index.

With this method, it is difficult to make sure that a collection is accessible when the owner changes it from *private* to *public* and the old 301 response is cached by the browser.  I would avoid using 301 and prefer the meta element (comment #7).  I do not know whether search engines remove the old pages when we replace it by 301 or add meta to the pages, though.

(In reply to comment #8)
> (In reply to comment #5)
...
> This measure gives the additional benefit of allowing users to know the status of
> the collection by viewing the URL. You no longer need to go to the collection
> configuration page to verify whether your collection has been set to private.

This is true, but having to look at the URL to check if the collection is public or private is a poorly-designed UI anyway.  It is better to mark private collections as such in the list of My Collections and the collection page itself, and this is not in the scope of this bug.
(In reply to comment #11)
> I do not know whether search engines remove the old pages when
> we replace it by 301 or add meta to the pages, though.

Google definitely does.  I assume any other reputable one will also.
Status: UNCONFIRMED → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
Hello! I'm afraid it isn't a duplicate of bug 507317. I did a test and I couldn't see the private collection in the search result or the addon's page. Bug 507317 seems to be fixed and should be marked as fixed.

A private collection may still leak for the other reasons and bugs listed in the scenarios in comment #4. Search crawlers nowadays have advanced crawling techniques, so it isn't surprising that a crawler can find a private collection in some way.
(In reply to comment #14)
> I'm afraid it isn't a duplicate of bug 507317.

This bug is the same as the item 3 in bug 507317 comment #0.
-> Verified duplicate
Status: RESOLVED → VERIFIED
But not all problems are the result of item 3 (read Comment 4).

After all, isn't it one bug per ticket? Why group them together?
(In reply to comment #16)
> But not all problems are the result of item 3 (read Comment 4).

I have not read the long comment #4, but if your problem is not about private collections being indexed by search engines, then it is a different bug from this bug (528671).  Please file it as a separate bug if the same problem is not reported in Bugzilla yet.
Product: addons.mozilla.org → addons.mozilla.org Graveyard