Closed Bug 949354 Opened 11 years ago Closed 9 years ago

Provide free database download of collected data

Categories

(Cloud Services :: Server: Location, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: qxc, Unassigned)

Details

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36

Steps to reproduce:

Since Mozilla Location Services describes itself as a free and open project, the data collected by users should be provided as a free download (not only via the API but also as a database dump, CSV file, etc.) that contains all collected information.
Thanks for opening the issue. It's good to have this as a tracking bug.

We have an ongoing discussion internally and on the mailing list about this topic. I've also blogged a bit more about it with some background at http://blog.hannosch.eu/2013/12/mozilla-location-service-what-why-and.html. It's unfortunately a very complicated topic with no good or easy answers.

Since this is an open-ended question, it'd be best to keep discussing it on the mailing list for now.
I don't think this is complicated: if it is REALLY a free and open project, all data should be opened to the public. That's the usual way in open source.

If it is not, it should be stated clearly and unambiguously on the project page that data usage is secret. I think the people who collect this data should know whether they are contributing to an open project or whether the data is collected for some other reason, including the possibility that it is sold to somebody.

Privacy is not an issue here since the API provides the same data, so either both should be closed or both, the API and the download, should be available. Offering only one of the two makes this project look unserious.
The matter is complicated, because it is unclear whether or not we can legally provide an open data set or what form that data set could have (for example in an anonymized or aggregated form). When we started out with this project we assumed that we would only be dealing with underlying public data sources. 

Unfortunately that didn't turn out to be the case. In the same way you cannot just publish all existing books in the world (since there's copyright), you can also not just publish all WiFi access point locations or detailed measurements of cell locations, since there are personal rights and privacy regulations to respect. It's a complicated legal / privacy matter with very different views in different countries. Investigating those issues and also figuring out how to balance the privacy concerns for all the involved actors takes time and we are making slow but steady progress on this.

Please continue any debate on the mailing list, bugzilla is quite bad for having discussions.
How can it be legal to collect data while it is (possibly) illegal to provide this data for download? Does Mozilla exist outside the law, or what is the explanation for this strange opinion?

So either stop collecting data or provide the data for download; the current state is implausible.
@satzklauer: Consider Google Street View as a similar case. Taking street-side photos is legal in most countries. And Google can provide a service showing the taken photos. But before publishing those photos, they need to remove certain aspects from them, like blur out faces or license plate numbers.

In the same way it's legal to collect WiFi and cell data, but certain protections need to be in place, before offering this as part of a service. Everyone else adds those protections as part of a service layer, so this is a known and understood problem. But offering the raw or some part of the raw data is an entirely different matter.

You shouldn't be able to use Street View to track and follow the movements of anyone in the photos. And in the same way you shouldn't be able to use a location service to track the movements of anyone carrying a WiFi device or the movements of the persons uploading the data.

We've been discussing various approaches and techniques in the meetings and on the mailing list. So far we haven't found a solution to this problem. So for now the best we can promise is to run an open service.
Comparing it with Google Street View is nonsense: when I go to a particular coordinate in Street View, I get all the images around that geolocation. Is there a function in your API where I get all APs close to a specified geolocation? No, there isn't.

But it's good to have this bug open to the public here, so everyone can see that Mozilla is the same unserious data grabber as Google, Skyhook, Wigle and all the others that just collect the data and abuse the community's resources but do not give anything back.
openBmap has collected a similar database and releases it for download under the Open Database License (ODbL):
http://openbmap.org/
OpenWLANMap has collected a similar database and releases it for download under GNU FDL license: http://www.openwlanmap.org/download.php

So it does not seem to be a problem for any of these truly open projects.
From an outsider's point of view, it looks as if Mozilla, as it enters the mobile phone OS market, urgently needs a map/location service. While building its own location service, Mozilla is trying to take an open approach, but from my point of view this is not really the case:

1) The collected data are not freely available (only at Mozilla's own discretion, and through a controlled API). From this point of view the data are as closed as the data of Google, Apple or any other location service provider.
2) Only limited improvements can be made to the algorithm used to calculate positions, because the data are lacking. If people have other ideas, they will have to start their own collection to get a data set to work with.
3) Multiple "open" services of the same kind will still compete, as some provide the raw data or have a more open approach.
4) Even if some people trust Mozilla more than Google or Apple, they still have to provide their position to a controlled server to get an answer, meaning that this data stays out of their control.
5) Currently there is no point in providing an open source server without the data...
6) The current map is already enough to breach some people's privacy, and that doesn't prevent Mozilla from publishing it.

In the end, looking at only half of the problem (how to gather the data) and not at how to open it suggests that this project's purpose was mainly to solve an internal Mozilla issue, not really to answer the question of how to bring an open source approach to location services.
Blocks: 862827
Status: UNCONFIRMED → NEW
Ever confirmed: true
I agree with the above comments: it seems that instead of trying to find ways of publishing the data, you are trying to find excuses not to do so. There are examples above of projects publishing their datasets without an issue. 

At the moment I do not understand how this service can be called "Open": even to use the API, a user/developer must request an access key. This is as "Closed" as it can get.

As you described in the blog post, there are three parties involved: contributors, users and WiFi AP owners. The first category explicitly agrees to submit data, and so does the second (which is not affected when it comes to opening up the dataset). This leaves us with only the hypothetical people who will "complain that their BSSID is now tied to lat/lon in some DB on the internet". Is this really a major issue? Is providing the coordinates of a hex string a crime in some country? Can a BSSID be considered personal data if it's hard to even prove "ownership" of one?

I believe just adding a page describing how to opt out by hiding the BSSID or appending _nomap to the SSID would make the 1% of paranoids happy, as well as the community that believes in open data.
FWIW, we currently publish the cell tower database (updated hourly) with a CC0 public domain license:

https://location.services.mozilla.com/downloads
Chris, that's a good start. Is there a way to run your own location service using the server-side source code and this dataset? That's what I would call a good way to test the usefulness of the data currently made public.

This page says that it's possible but does not explain how, and it also does not say whether that's enough to operate the service:
https://github.com/mozilla/ichnaea/blob/master/docs/import_export.rst
(In reply to Chris Peterson (:cpeterson) from comment #11)
> FWIW, we currently publish the cell tower database (updated hourly) with a
> CC0 public domain license:
> 
> https://location.services.mozilla.com/downloads

Sorry, but this is nothing more than another attempt to stall everything. About a year ago you stated that your "discussion is ongoing". That would have been enough time to ask a lawyer, to find a solution in internal or public discussion, or to solve the unsubscribe issue.

And it is more than obvious why you are publishing this list: compared to WLAN networks, the number of cell towers is very limited, and similar public databases are already available. So you are not publishing anything new.

If you were at all honest, you would tell people clearly and loudly that you are not interested in opening this data and never will be. You should do that especially for the people who are collecting the data for you.
I think it should at least be possible to get our own data back (without forking your app). Currently I don't really see a fundamental difference between contributing to this Mozilla project and to a similar one from MS or Google.
No longer blocks: 862827
Severity: normal → enhancement
Is this discussion only about publishing WiFi data now?

If not, it can be closed, because cell data can already be downloaded at
https://location.services.mozilla.com/downloads
This bug can be closed. We've gone through this in great length with our lawyers and haven't found any legal loopholes or technical workarounds to share more data. We publicly share the aggregated cell data under CC-0 as stated. We also explain the situation for the other data sets at the bottom of the https://location.services.mozilla.com/downloads page and the https://wiki.mozilla.org/CloudServices/Location/FAQ is updated.

I urge anyone who disagrees with our stance on privacy to contribute to other projects in this space, like OpenCellID or one of the various WLAN/WiFi mapping projects.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
@Djfe: Cell data are something completely different from WiFi data: it is easy for Mozilla to publish them because they are taken out of an already established and open database.

@Hanno: so your lawyers told you it is legal to collect and use all this data but illegal to publish it? Sorry, but I would bet my right leg that you never asked a lawyer and made this decision for marketing reasons only. In other words: Mozilla is the same untrustworthy data grabber like Google, Skyhook,...
I agree with Elmi: there is no information presented whatsoever about how sharing this could be considered illegal in any country. Why is publishing maps not illegal? Maps match the physical addresses of people's homes to longitude and latitude. Pretty much the same idea as a BSSID, if you ask me.

I also see no effort to anonymize the dataset before publishing it. For example, what about removing or replacing a couple of leading symbols of the BSSID (usually the vendor ID) to generalize the data while still leaving it useful for geolocation?
This is just utter nonsense. How can you justify that it is not even possible for someone to retrieve their own data?

But I have understood the message. Bye bye Mozilla!
Calm down, people...

I totally understand that they don't want to publish that data.

Criminals could pick spots where many people with easily attackable routers live (the BSSID contains the vendor ID, after all).

@evgheni
Replacing the vendor ID would make the BSSID no longer unique -> it wouldn't be very useful for geolocation anymore

Encrypting the vendor ID also doesn't make sense to me, since you could guess certain numbers/names from the sheer number of entries
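A minimal Python sketch of the vendor-ID masking idea, assuming colon- or hyphen-separated BSSIDs whose first three octets are the vendor's OUI. The function name `mask_oui` and the sample addresses are hypothetical. It illustrates the uniqueness objection: two access points from different vendors that share the last three octets end up with the same masked key.

```python
def mask_oui(bssid):
    """Replace the first three octets (the vendor OUI) of a BSSID with 'xx'."""
    octets = bssid.lower().replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"not a 6-octet MAC address: {bssid!r}")
    return ":".join(["xx"] * 3 + octets[3:])


# Hypothetical APs from two vendors sharing the device-specific half of the MAC.
a = "00:1a:2b:11:22:33"
b = "f4:f2:6d:11:22:33"
print(mask_oui(a))                 # xx:xx:xx:11:22:33
print(mask_oui(a) == mask_oui(b))  # True: the two APs can no longer be told apart
```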

@elmi
"it is easy for Mozilla to publish them because they are taken out of an already established and open database." -> nope, they have their own database, even for cells, they only look up OpenCellID if they don't know a certain cell tower

I'm not sure whether it would be entirely against any law, but people could get upset with Mozilla for publishing such data, because Mozilla promised that data privacy is one of its goals

But since WiFi BSSIDs, unlike cell towers, aren't public, they fall under data privacy


"Mozilla is the same untrustworthy data grabber like Google, Skyhook,..."
-> Why do you think so? And what do you think Mozilla is doing with that data?
How does publishing make the service trustworthy? I mean, Mozilla etc. could still do whatever they want with that data despite publishing it;
this is just you getting upset about it and insulting Mozilla without any foundation


@greatpatton
"This is just utter nonsense. How can you justify that it is not even possible for someone to retrieve their own data?"
-> Just for your information: MozStumbler already allows exporting to a KML file, and you can fork/download Ichnaea to set up your own service if you want to ;)
https://github.com/Mozilla/Ichnaea

The complete client-side code is available on GitHub -> you can build your own and change the API URL

And the devs are working on more (keeping data after upload, a public library with the stumbling functionality, and so on).
If you have any ideas, leave your feedback here: https://github.com/Mozilla/MozStumbler/issues


I don't want to offend anyone, I'm just discussing objectively

But if they said they won't do it, then there's not much left to say
There were a few ideas suggested on the mailing list that could mitigate privacy issues. For example:
* they could hash the BSSIDs, which would make it impossible for someone to find all the access points from a certain manufacturer.
* they could give only partial information, e.g. if the last bit of the BSSID is 0, give the latitude; if it's 1, give the longitude. People can still locate themselves as long as they can see a few access points, but you can't track down a single one.
* they could filter out any WiFi point that has been seen in multiple places, to avoid including the locations of mobile phones or of people who have moved house.
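A minimal Python sketch of the three mailing-list ideas, combined into one hypothetical export pipeline. Everything here is an assumption for illustration, not Ichnaea's actual behaviour: the function names, the use of SHA-256, reading "the last bit of the BSSID" as the lowest bit of its final octet, and the 1 km spread threshold. One caveat on the first idea: an unsalted hash can be brute-forced, since each vendor prefix covers only 2^24 addresses, so hashing alone may not prevent enumerating a manufacturer's access points.

```python
import hashlib
import math
from statistics import mean

EARTH_RADIUS_KM = 6371.0

def hash_bssid(bssid):
    """Idea 1: publish an unsalted SHA-256 digest instead of the raw BSSID."""
    normalized = bssid.lower().replace("-", ":")
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()

def last_bit(bssid):
    """Lowest bit of the final octet of the BSSID (assumed meaning of 'last bit')."""
    return int(bssid.replace("-", ":").split(":")[-1], 16) & 1

def publish_partial(bssid, lat, lon):
    """Idea 2: return (key, record) with only latitude (even) or only longitude (odd)."""
    key = hash_bssid(bssid)
    if last_bit(bssid) == 0:
        return key, {"lat": lat}
    return key, {"lon": lon}

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def is_stationary(sightings, max_spread_km=1.0):
    """Idea 3: reject an AP whose sightings lie more than max_spread_km apart (hypothetical threshold)."""
    spread = max((haversine_km(p, q) for p in sightings for q in sightings), default=0.0)
    return spread <= max_spread_km

def build_dataset(sightings_by_bssid):
    """Drop moving APs, then publish one partial record per remaining AP."""
    published = {}
    for bssid, sightings in sightings_by_bssid.items():
        if not is_stationary(sightings):
            continue
        lat = mean(p[0] for p in sightings)
        lon = mean(p[1] for p in sightings)
        key, record = publish_partial(bssid, lat, lon)
        published[key] = record
    return published

def locate(visible_bssids, published):
    """Average the partial coordinates of the visible APs found in the dataset."""
    lats, lons = [], []
    for bssid in visible_bssids:
        record = published.get(hash_bssid(bssid))
        if record is None:
            continue  # unknown or filtered-out AP
        if "lat" in record:
            lats.append(record["lat"])
        else:
            lons.append(record["lon"])
    if not lats or not lons:
        return None  # need at least one AP of each parity for a fix
    return mean(lats), mean(lons)

# Hypothetical sightings: three home APs plus a phone hotspot seen in Berlin and Munich.
sightings = {
    "00:1a:2b:11:22:30": [(52.5200, 13.4050)],
    "00:1a:2b:11:22:31": [(52.5202, 13.4052)],
    "00:1a:2b:11:22:32": [(52.5204, 13.4054)],
    "00:1a:2b:11:22:33": [(52.5200, 13.4050), (48.1370, 11.5750)],
}
published = build_dataset(sightings)
print(len(published))                      # 3: the moving hotspot was dropped
print(locate(list(sightings), published))  # roughly (52.5202, 13.4052)
```

A device that sees only one of these access points gets no fix at all (`locate` returns `None`), which is the property the second bullet relies on.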
Most important question for me: if the folks at Mozilla REALLY had SERIOUS doubts about the legality and privacy of the collected data, why don't they provide a possibility to let people unsubscribe their WLANs? OpenWLANMap has this possibility and provides a database download, with full respect for privacy. The same is true for openBmap.
(In reply to Elmi from comment #22)
> why don't they provide a possibility to let people unsubscribe their WLANs?

Add "_nomap" to the end of your WiFi SSID; Mozilla, as well as most other mapping services, will honour that.
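A minimal Python sketch of how a collector might honour this opt-out before storing WiFi scans, assuming a simple case-sensitive check for an SSID ending in "_nomap" (the suffix described on the /optout page). The names `opted_out` and `drop_opted_out` and the observation layout are hypothetical; real services may match the marker differently.

```python
NOMAP_SUFFIX = "_nomap"

def opted_out(ssid):
    """True if the network name ends with the _nomap opt-out suffix (assumed case-sensitive)."""
    return ssid is not None and ssid.endswith(NOMAP_SUFFIX)

def drop_opted_out(observations):
    """Keep only WiFi observations whose SSID does not request opt-out."""
    return [obs for obs in observations if not opted_out(obs.get("ssid"))]

# Hypothetical scan results.
scans = [
    {"bssid": "00:1a:2b:11:22:33", "ssid": "HomeNet"},
    {"bssid": "00:1a:2b:44:55:66", "ssid": "HomeNet_nomap"},
]
print(drop_opted_out(scans))  # only the HomeNet entry remains
```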
:hannosch I know that "_nomap" has been on the website before, but now I can't find it: please re-add that useful information to https://location.services.mozilla.com/


"* filter out an wifi point that has been seen in multiple places, to avoid containing locations of mobile phones, or people who have moved house." -> Ichnaea is doing that already...
(In reply to Djfe from comment #24)
> :hannosch I know that "_nomap" has been on the website before, now I can't
> find it: pls readd that useful information to
> https://location.services.mozilla.com/

The homepage has a bullet point about "As the owner of a WiFi access point you can learn more about opting out of this service..." which leads to the /optout page where the _nomap suffix is explained.