911306 - Does Mozilla have a preferred "open data" license for publishing databases?

Reporter

Description

12 years ago

The Mozilla Services and B2G teams are working on a geolocation research project:

https://wiki.mozilla.org/Services/Location

We are prototyping a crowd-sourced version of Google's StreetView cars that correlate Wi-Fi access points and cell tower IDs to GPS positions. We would like to publish this geo data (with proper anonymization and obfuscation) so other projects might be able to build new geolocation services that don't depend on proprietary geo services.

There are some hobbyist "net stumbler" groups that already collect this data (on a small scale) and publish their geo databases under a few different licenses. We're investigating whether we can merge (technically and license-wise) their data into our geo database. These groups use the following licenses:

* CC-BY-SA 3.0
* dual-licensed CC-BY-SA 3.0 and Open Database License (ODbL) v1.0
* GFDL

1. Does Mozilla have a preferred "open data" license we should use when publishing our geo database?

2. What third-party data licenses would be compatible for importing to create a *derived* product? Would we need to silo "their" data from "our" data in the derived product?

The following wiki page has links to the third-party data sources we are researching:

https://wiki.mozilla.org/Services/Location/Bootstrap

Gervase Markham [:gerv]

Assignee

Comment 1

12 years ago

> 1. Does Mozilla have a preferred "open data" license we should use when
> publishing our geo database?

I don't think there has ever been an internal discussion of this question, so perhaps it's premature to say that there's a _Mozilla_ opinion.

I have sympathy with those who say that CC licences are not intended for data. The ODBL is supposed to be the fix for that; however, AIUI it's not a pure licence but a contract (because it has to deal with rights beyond copyright, such as database right in the EU, which have different legal frameworks). Which means it's a bit different in how you use it.

The ODBL is also purely a license for the database; you need a separate licence for the content inside it. OSM uses the Database Contents License v1.0:
http://opendatacommons.org/licenses/dbcl/1-0/

I think that something with a share-alike requirement would be very wise. There's no risk of the data licences tainting the code itself or the rest of the device, and we really want to avoid lots of little proprietary silos of this data starting to exist. Whereas a single global freely-licensed regularly-updated source of it would be awesome.

Whatever you do, you should design your service and database such that data items can have a license identifier associated with them. That's a good future-proofing step.

> 2. What third-party data licenses would be compatible for importing to
> create a *derived* product that we could republish without restrictions
> on attribution/etc?

Your "etc." needs expanding for this question to be answerable. What sort of requirements are acceptable and what are not acceptable? What are you trying to avoid?

> The following wiki page has links to the third-party data sources we are
> researching:
>
> https://wiki.mozilla.org/Services/Location/Bootstrap

How long do you think we would take to blow past the sizes of the *non*-commercial databases in this list, if we deployed this technology?

If the answer is "not very long", then that affects how much we should let their license choices determine ours.

Gerv

Chris Peterson [:cpeterson]

Reporter

Comment 2

12 years ago

(In reply to Gervase Markham [:gerv] from comment #1)
> I have sympathy with those who say that CC licences are not intended for
> data. The ODBL is supposed to be the fix for that; however, AIUI it's not a
> pure licence but a contract (because it has to deal with rights beyond
> copyright, such as database right in the EU, which have different legal
> frameworks). Which means it's a bit different in how you use it.
>
> The ODBL is also purely a license for the database; you need a separate
> licence for the content inside it. OSM uses the Database Contents License
> v1.0:
> http://opendatacommons.org/licenses/dbcl/1-0/

The DbCL looks like a good match because we want the share-alike provision and (AFAIK) we won't need to distinguish between the database and contents rights.

> > 2. What third-party data licenses would be compatible for importing to
> > create a *derived* product that we could republish without restrictions
> > on attribution/etc?
>
> Your "etc." needs expanding for this question to be answerable. What
> sort of requirements are acceptable and what are not acceptable? What
> are you trying to avoid?

We were trying to avoid the "lots of little proprietary data silos" problem you describe above. We might also "remix" the aggregated data into new forms, e.g. trilateration of related measurements or adding new obfuscations to increase privacy.

> How long do you think we would take to blow past the sizes of the
> *non*-commercial databases in this list, if we deployed this technology?
>
> If the answer is "not very long", then that affects how much we should
> let their license choices determine ours.

TBD. Our database is very far behind theirs, but Mozilla has a large community. :)

Gervase Markham [:gerv]

Assignee

Comment 3

12 years ago

(In reply to Chris Peterson (:cpeterson) from comment #2)
> The DbCL looks like a good match because we want the share-alike provision
> and (AFAIK) we won't need to distinguish between the database and contents
> rights.

Just to be clear: The DbCL does not have a share-alike provision, but the ODBL does. The DbCL is an ultra-permissive licence - "do what you like".

The way this works is that if you use the database _as_a_database_ then the share-alike provisions apply (either by licence or contract). But if you just extract a few bits of data, e.g. making map tiles, then the content licence applies. See http://opendatacommons.org/faq/licenses/ - "Why Do You Distinguish Between the “Database” and its “Contents”?" This is good for us, because it means (I think) we can calm any fears that people have about using the data in applications, but people who enhance the data itself will be required to share-alike.

Note that, sadly, it seems to me that none of the existing databases of this data use ODBL/DbCL. openBMap uses ODBL/CC-BY-SA and the rest just use CC-BY-SA (despite the inadvisability of that). CC-BY-SA alone is what OpenStreetMap used to do, until they switched to ODBL/DbCL. (Compatibility with their licence is also a good thing about this choice.)

It seems to me that if you put CC-BY-SA data into a database with other similar data, a reasonable interpretation of the SA is that the SA applies to the other data too.

So here's what I suggest might work, if you feel you want to use some of that bootstrap data. You should put the database under the ODBL, and have all data submitted to the database *by our apps* be dual-licensed DbCL and CC-BY-SA. Then, you can also import any useful bootstrap data from any of the projects you list which use CC-BY-SA. You should make sure data is source-tagged. Once we are bootstrapped, you can remove the CC-BY-SA-only data (that came from the other projects) and then the whole database can be considered ODBL/DbCL. (At this point you could then, if you wanted, change the data submission terms to be DbCL only.)

That, I think, gets us where we want to be while allowing us to bootstrap from others.

> > Your "etc." needs expanding for this question to be answerable. What
> > sort of requirements are acceptable and what are not acceptable? What
> > are you trying to avoid?
>
> We were trying to avoid the "lots of little proprietary data silos" problem
> you describe above. We might also "remix" the aggregated data into new
> forms, e.g. trilateration of related measurements or adding new obfuscations
> to increase privacy.

See above for my solution.

Another thing we could try is getting some of these projects to collaborate with us, either by switching their licensing to something compatible, or by dual-licensing (with in-DB tagging), and by getting their collection apps to collect the hashes we need. Have you reached out to any of them yet?

Gerv
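The source-tagging and later removal of CC-BY-SA-only records described in this comment could look roughly like the sketch below. It is only an illustration under assumptions: the `Measurement` fields, the license identifier strings and the `drop_cc_by_sa_only` helper are hypothetical names, not a schema the project has defined.

```python
from dataclasses import dataclass

# Hypothetical license identifiers; the thread does not define a tagging scheme.
DBCL = "DbCL-1.0"
CC_BY_SA = "CC-BY-SA-3.0"


@dataclass(frozen=True)
class Measurement:
    key: str              # opaque access point or cell identifier
    lat: float
    lon: float
    source: str           # e.g. "mozilla-stumbler" or a third-party project name
    licenses: frozenset   # every license this record is available under


def drop_cc_by_sa_only(records):
    """Keep only records that are available under the DbCL."""
    return [r for r in records if DBCL in r.licenses]


records = [
    # Submitted by our own apps: dual-licensed DbCL and CC-BY-SA.
    Measurement("ap-1", 37.39, -122.08, "mozilla-stumbler", frozenset({DBCL, CC_BY_SA})),
    # Imported bootstrap data: CC-BY-SA only.
    Measurement("ap-2", 52.52, 13.40, "third-party-import", frozenset({CC_BY_SA})),
]

# After bootstrapping, only the dual-licensed record remains.
remaining = drop_cc_by_sa_only(records)
```

Keeping the license set on every record is what makes the later clean-up a simple filter rather than a forensic exercise.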

Chris Peterson [:cpeterson]

Reporter

Comment 4

12 years ago

(In reply to Gervase Markham [:gerv] from comment #3)
> The way this works is that if you use the database _as_a_database_ then the
> share-alike provisions apply (either by licence or contract). But if you
> just extract a few bits of data, e.g. making map tiles, then the content
> licence applies. See http://opendatacommons.org/faq/licenses/ - "Why Do You
> Distinguish Between the “Database” and its “Contents”?" This is good for us,
> because it means (I think) we can calm any fears that people have about
> using the data in applications, but people who enhance the data itself will
> be required to share-alike.

I'm still confused about the distinction between the database and its contents. Individual WiFi measurements from the database are not very useful in isolation. Most applications or services would want the whole database (or a filtered subset or a "remixed" version) so they could query it for WiFi access points. Is that what "using the database as a database" means?

We will probably publish our data in a simple text file format, not an actual MySQL or SQLite database file. Any application or service will need to import our data into their preferred database format to query the data efficiently. Would they still be using "our database as a database"?

> So here's what I suggest might work, if you feel you want to use some of
> that bootstrap data. You should put the database under the ODBL, and have
> all data submitted to the database *by our apps* be dual-licensed DbCL and
> CC-BY-SA. Then, you can also import any useful bootstrap data from any of
> the projects you list which use CC-BY-SA. You should make sure data is
> source-tagged. Once we are bootstrapped, you can remove the CC-BY-SA-only
> data (that came from the other projects) and then the whole database can be
> considered ODBL/DbCL. (At this point you could then, if you wanted, change
> the data submission terms to be DbCL only.)

After we remove the third-party CC-BY-SA data, are we allowed to relicense our dual-licensed DbCL/CC-BY-SA data to just DbCL? We wouldn't need to keep making the dual-licensed data available under CC-BY-SA?

> Another thing we could try is getting some of these projects to collaborate
> with us, either by switching their licensing to something compatible, or by
> dual-licensing (with in-DB tagging), and by getting their collection apps to
> collect the hashes we need. Have you reached out to any of them yet?

We definitely plan to reach out to them, but we haven't yet. We wanted to get our plans (product and license) figured out first.

Gervase Markham [:gerv]

Assignee

Comment 5

12 years ago

(In reply to Chris Peterson (:cpeterson) from comment #4)
> I'm still confused about the distinction between the database and its
> contents. Individual WiFi measurements from the database are not very useful
> in isolation.

That may indeed be the case for this particular database.

> Most applications or services would want the whole database
> (or a filtered subset or a "remixed" version) so they could query it for
> WiFi access points. Is that what "using the database as a database" means?

I perhaps should have said "modify" rather than "use". The OpenStreetMap ODbL FAQ also has more on the difference between a Derivative Database, a Produced Work and a Collective Database:
https://wiki.openstreetmap.org/wiki/Legal_FAQ/ODbL

"3b. If I have data derived from OSM data, do I have to distribute it?
The license does not force you to distribute or make any data available. But if you do distribute or publicly use anything derived from it - a Derivative Database - then the derivative database must be available under the same licence as the OSM data (the Open Database License). You must make the derivative database available on request to anyone who received your data, viewed the work made from it, or used your service. ..."

"3c. If I make something with OSM data, do I now have to apply your license to my whole work?
No. For example, if you have written a game or published an artistic map which includes OSM data, only the data is covered by the license. This is called a Produced Work."

I would say that a location calculated using the data is a Produced Work, but a version of the database with additional signal strength info, or further data points, is a Derivative Database.

> We will probably publish our data in a simple text file format, not an
> actual MySQL or SQLite database file. Any application or service will need
> to import our data into their preferred database format to query the data
> efficiently. Would they still be using "our database as a database"?

AIUI, a database is a compilation of structured data; the exact form it's distributed in is not important.

> After we remove the third-party CC-BY-SA data, are we allowed to relicense
> our dual-licensed DbCL/CC-BY-SA data to just DbCL? We wouldn't need to keep
> making the dual-licensed data available under CC-BY-SA?

Yes. If something is dual-licensed, a distributor has the option of dropping one part of the dual and continuing to use only the other part.

Gerv

Chris Peterson [:cpeterson]

Reporter

Comment 6

12 years ago

The privacy implications of Wi-Fi measurements may require us to maintain multiple data sets:

1. A "raw database" of the original measurements uploaded by stumblers.

2. A "sanitized database", a filtered subset of the raw database that removes measurements that look bogus (e.g. in the ocean) or might be mobile devices (because they were seen in multiple locations).

3. A "synthesized database" that averages duplicate measurements from the sanitized database into a single latitude/longitude prediction for each access point.

For privacy protection, we do not want to publish the raw database. We might also want to limit distribution of the sanitized database to trusted researchers.

From my reading of the OSM Legal FAQ, I am unclear whether the sanitized and synthesized databases would be Derivative Databases or Produced Works. If we publish the synthesized database with an ODBL/DbCL license, what are the license implications on the sanitized and raw databases?

Our stumbler contract might need to clarify that the raw measurements users upload will not be published as-is, but that a Derivative Database will be published as ODBL/DbCL.
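The raw, sanitized and synthesized stages listed in this comment can be sketched as a small pipeline. This is a rough illustration only: the tuple layout, the `MAX_SPREAD_KM` threshold and the function names are assumptions, the "in the ocean" check is left out, and the plain averaging of coordinates ignores edge cases such as the antimeridian.

```python
from collections import defaultdict
from math import asin, cos, radians, sin, sqrt

MAX_SPREAD_KM = 1.0  # assumed threshold for "seen in multiple locations"


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def sanitize(raw):
    """raw: iterable of (key, lat, lon, timestamp).

    Drops timestamps and drops every key whose sightings are spread too far
    apart, on the assumption that it is a mobile device.
    """
    by_key = defaultdict(list)
    for key, lat, lon, _timestamp in raw:
        by_key[key].append((lat, lon))
    return {
        key: points
        for key, points in by_key.items()
        if all(haversine_km(points[0], p) <= MAX_SPREAD_KM for p in points)
    }


def synthesize(sanitized):
    """Average the remaining sightings into one position per key."""
    return {
        key: (
            sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points),
        )
        for key, points in sanitized.items()
    }


raw = [
    ("a", 10.000, 20.0, 1),
    ("a", 10.002, 20.0, 2),   # about 0.2 km away: treated as the same place
    ("b", 10.0, 20.0, 1),
    ("b", 48.0, 2.0, 5),      # far away: "b" is treated as mobile
]
synthesized = synthesize(sanitize(raw))
```

Note that only the synthesized output has lost both the timestamps and the individual sightings.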

Gervase Markham [:gerv]

Assignee

Comment 7

12 years ago

cpeterson: why is publishing the raw database more of a risk to people's privacy than publishing the synthesized database? Are the raw measurements more accurate? And given that these devices are broadcasting their location, why is it a privacy issue to make that information available to people who are already in the region and receiving those broadcasts? Who on the privacy team is working with you on this project? Anyway, if you had that scheme, the sanitized and synthesized databases would be Derivative Databases. You are right that if you get people to submit data directly under ODBL/DbCL, then you would need to publish the raw and sanitized databases once you'd published the synthesized one. Therefore, if you end up doing this the way you suggest, you should get people to give you unlimited rights to the data, and in return make a commitment to publish the synthesized database under an open licence. You can explain that this is part of your privacy protection strategy. Gerv

Chris Peterson [:cpeterson]

Reporter

Comment 8

12 years ago

(In reply to Gervase Markham [:gerv] from comment #7)
> cpeterson: why is publishing the raw database more of a risk to people's
> privacy than publishing the synthesized database? Are the raw measurements
> more accurate? And given that these devices are broadcasting their location,
> why is it a privacy issue to make that information available to people who
> are already in the region and receiving those broadcasts?

The raw database has timestamps, so stumblers' paths can be recreated. The raw database may also have measurements of mobile devices that slipped through the stumbler's mobile device filters. (The sanitized database can detect moving mobile devices by looking at the entire database.)

Regarding devices broadcasting their locations, the problem is that our database (even the synthesized database) would be available to everyone, not just people near the devices. Google's location service prevents people from tracking individual access points by requiring geolocation requests to include 2-3 access points that Google knows are near each other. This "proves" the requester is actually near the access points.

We have some ideas on how we might be able to create an offline version of that scheme, hashing two MAC addresses into one key in our published database. Users would need both MAC addresses to decrypt the access point position.

If this crypto scheme is inadequate or infeasible, we can publish just cell tower measurements but make the WiFi measurements available behind a web service (and use the 2-3 access point requirement that Google does).

> Who on the privacy team is working with you on this project?

I'm working with Alina Hua from the Privacy team and Jishnu Menon from Legal.
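One way to picture the "hash two MAC addresses into one key" idea is the toy sketch below. It is not the project's design and not a vetted cryptographic construction: the function names, the use of SHA-256, the domain-separation prefixes and the XOR pad are all assumptions chosen to keep the example standard-library only. MAC addresses are only 48 bits, so anyone who already knows one address can brute-force the other; that is exactly the kind of weakness a real review would have to examine.

```python
import hashlib
import struct


def _pair_material(mac_a, mac_b):
    """Order-independent byte string built from two normalized MAC addresses."""
    first, second = sorted(m.lower().replace("-", ":") for m in (mac_a, mac_b))
    return (first + "|" + second).encode("ascii")


def lookup_key(mac_a, mac_b):
    """Key under which the sealed position is published."""
    return hashlib.sha256(b"lookup:" + _pair_material(mac_a, mac_b)).hexdigest()


def _pad(mac_a, mac_b):
    """16-byte pad derived from the pair; separate from the lookup key."""
    return hashlib.sha256(b"pad:" + _pair_material(mac_a, mac_b)).digest()[:16]


def seal(mac_a, mac_b, lat, lon):
    """XOR the packed position with the pair-derived pad (toy scheme)."""
    plain = struct.pack(">dd", lat, lon)  # two big-endian doubles, 16 bytes
    return bytes(x ^ y for x, y in zip(plain, _pad(mac_a, mac_b)))


def unseal(mac_a, mac_b, blob):
    """Recover (lat, lon); only works with the same two MAC addresses."""
    plain = bytes(x ^ y for x, y in zip(blob, _pad(mac_a, mac_b)))
    return struct.unpack(">dd", plain)


# Publisher side: the database holds only hashes and sealed blobs.
published = {
    lookup_key("AA:BB:CC:00:00:01", "AA:BB:CC:00:00:02"):
        seal("AA:BB:CC:00:00:01", "AA:BB:CC:00:00:02", 37.39, -122.08),
}

# Client side: a device that sees both access points can find and open the entry.
blob = published[lookup_key("aa:bb:cc:00:00:02", "aa:bb:cc:00:00:01")]
position = unseal("aa:bb:cc:00:00:02", "aa:bb:cc:00:00:01", blob)
```

A client that knows only one of the two addresses cannot compute the lookup key, which is the offline analogue of the 2-3 access point requirement.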

Gervase Markham [:gerv]

Assignee

Comment 9

12 years ago

Ah, the privacy of the _stumblers_. I hadn't thought of that.

What is the privacy problem if the person making a request is not near the access point they are asking about? If I ask "Where is the access point of MAC XX:XX:XX and SSID Foo", whose privacy is compromised if I get an answer?

Requiring inclusion of 2-3 access points near each other may be OK in the Valley, but it would significantly reduce the number of times the service is useful in places where there may only be one local access point. How can we avoid that? Assuming that a restriction is necessary, can we tie it to local cell tower information instead of to other local wifi access points?

Also, can you avoid excluding irregularly-moving access points? If I move house, I don't want my access point permanently removed from the database as a valid source. Perhaps we can have some "known static for X amount of time" value, or have an assumption that an access point that was static for > N days, if it moves, will continue to be static at the new location.

The data sets that we are thinking of using for bootstrapping don't have any restrictions on who can query them for what (obviously). It would be good if we can publish full data with any kind of access control built into the data, rather than requiring a web service and correlation - because then anyone else can set up a service for their own use, which is surely one major goal of doing this in an open manner with an open licence.

Gerv
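The "known static for X amount of time" idea could be approximated as below. This is a conservative variant rather than exactly what is proposed above: an access point that moves becomes usable again only after it has been seen often enough, over enough days, at its new place. The thresholds, the crude degree-based closeness test and the function name are all placeholders.

```python
from datetime import date

MIN_SIGHTINGS = 3        # placeholder for "seen X times"
MIN_DAYS = 7             # placeholder for "over Y days"
SAME_PLACE_DEG = 0.005   # crude closeness test in degrees, an assumption


def current_static_location(sightings):
    """sightings: list of (day, lat, lon) tuples sorted by day, oldest first.

    Returns the (lat, lon) of the most recent place if the access point has
    stayed there long enough, otherwise None.
    """
    if not sightings:
        return None
    _, lat0, lon0 = sightings[-1]
    run = []
    # Walk backwards from the newest sighting while the AP stays in one place.
    for day, lat, lon in reversed(sightings):
        if abs(lat - lat0) > SAME_PLACE_DEG or abs(lon - lon0) > SAME_PLACE_DEG:
            break
        run.append(day)
    span_days = (run[0] - run[-1]).days  # newest minus oldest day in the run
    if len(run) >= MIN_SIGHTINGS and span_days >= MIN_DAYS:
        return (lat0, lon0)
    return None


history = [
    (date(2013, 1, 1), 37.39, -122.08),   # old house
    (date(2013, 1, 20), 37.39, -122.08),
    (date(2013, 2, 1), 40.71, -74.00),    # new house after the move
    (date(2013, 2, 5), 40.71, -74.00),
    (date(2013, 2, 10), 40.71, -74.00),
]
settled = current_static_location(history)
just_moved = current_static_location(history[:3])
```

With this history the access point is trusted again at the new address once three sightings span at least a week there, and it is withheld right after the move.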

Hanno Schlichting

Comment 10

12 years ago

(In reply to Gervase Markham [:gerv] from comment #9)
> What is the privacy problem if the person making a request is not near the
> access point they are asking about? If I ask "Where is the access point of
> MAC XX:XX:XX and SSID Foo", whose privacy is compromised if I get an answer?

There are two problematic use-cases. The first is based on many mobile phones being wifi hotspots for personal internet sharing. As an outsider I can walk by you in a coffee shop and write down your mac/ssid combination. Later on I can then look up your location from the service. If we publish underlying data on a regular basis, I can look up where you went over time.

There are two protections against this. One is trying to detect any wifi access points (APs) which are moving constantly and removing them from the data set, in addition to some client-side heuristics that detect ad-hoc networks or look for typical SSID names of phone hotspots. The other is the "you need to know two APs" approach. Even if we don't successfully detect that an AP is moving, the service only tells you the location of an AP combination if you already know a second nearby AP. If you aren't actually at the location, you won't know a second matching AP.

The second use-case is a bit more obscure. Assume Anna and Bob are living together and sharing an AP. Bob gets obsessive and threatening and Anna leaves, taking the AP with her. Bob wrote down the mac/ssid combination of the AP earlier. Now Anna installs the AP in her new place. She doesn't want to be found by Bob. But Bob is now able to look up the AP location in our service or data. Since the AP is stationary, there's no reason for us to filter it out. But the "you need to know two APs" protection is still effective here.

On a meta-level the mac/ssid/location combination is considered personally identifiable information in some countries, very much like public phone or address books. So far we are still allowed to build such a service with an opt-out approach for wifi owners. But if we and the other players aren't careful, some countries might demand that we switch to an opt-in approach. Realistically that would make it impractical to build this service.

> Requiring inclusion of 2-3 access points near each other may be OK in the
> Valley, but it would significantly degrade the amount of times the service
> is useful in places where there may only be one local access point. How can
> we avoid that? Assuming that a restriction is necessary, can we tie it to
> local cell tower information instead of to other local wifi access points?

We want this service to work for laptops and tablets as well, most of which don't have cellular connectivity.

> Also, can you avoid excluding irregularly-moving access points? If I move
> house, I don't want my access point permanently removed from the database as
> a valid source. Perhaps we can have some "known static for X amount of time"
> value, or have an assumption that an access point that was static for > N
> days, if it moves, will continue to be static at the new location.

Yes, this is planned. Figuring out the exact thresholds for "AP seen X times, over Y days" is going to take some time. Right now we just internally blacklist all APs which moved at all, to be on the safe side.

> The data sets that we are thinking of using for bootstrapping don't have any
> restrictions on who can query them for what (obviously). It would be good if
> we can publish full data with any kind of access control built into the
> data, rather than requiring a web service and correlation - because then
> anyone else can set up a service for their own use, which is surely one
> major goal of doing this in an open manner with an open licence.

The goal is certainly to publish as much of this data as is possible. But there are real legal and privacy concerns that limit what we can and should do. A lot of the other hobby projects in this space haven't really looked at the privacy implications too closely. And some of them are rather old and were founded before location tracking via wifi got a lot of attention. Most of the media outcries, data protection agency settlements and lawsuits over this happened in the last two years and those projects haven't all adjusted yet.
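The "you need to know two APs" protection can be sketched as a service-side check like the one below. It is a simplified illustration: the `KNOWN_APS` table, the `NEAR_DEG` threshold and the midpoint answer are assumptions, and a real service would presumably use proper distances and more than a pairwise test.

```python
# Hypothetical server-side table: opaque AP key -> (lat, lon).
KNOWN_APS = {
    "ap-a": (37.3900, -122.0800),
    "ap-b": (37.3905, -122.0795),   # a neighbour of ap-a
    "ap-c": (52.5200, 13.4000),     # far away from both
}
NEAR_DEG = 0.01  # crude "near each other" test in degrees, an assumption


def locate(observed_keys):
    """Answer only if the request names two distinct known APs that are
    near each other; otherwise reveal nothing."""
    known = [KNOWN_APS[k] for k in sorted(set(observed_keys)) if k in KNOWN_APS]
    for i, a in enumerate(known):
        for b in known[i + 1:]:
            if abs(a[0] - b[0]) <= NEAR_DEG and abs(a[1] - b[1]) <= NEAR_DEG:
                # Midpoint of the matching pair as a rough position estimate.
                return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    return None


nearby_request = locate(["ap-a", "ap-b"])     # requester plausibly on site
single_ap_request = locate(["ap-a"])          # one written-down AP: refused
mismatched_request = locate(["ap-a", "ap-c"]) # two APs that are not neighbours
```

In the Anna and Bob scenario, Bob's request looks like `single_ap_request`: he knows one identifier but cannot name a second access point at the new location.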

Chris Peterson [:cpeterson]

Reporter

Comment 11

12 years ago

(In reply to Hanno Schlichting [:hannosch] from comment #10)
> The second use-case is a bit more obscure. Assume Anna and Bob living
> together and sharing an AP. Bob gets obsessive and threatening and Anna
> leaves taking the AP with her. Bob did write down the mac/ssid combination
> of the AP before. Now Anna installs the AP in her new place. She doesn't
> want to be found by Bob. But Bob is now able to look up the AP location in
> our service or data. Since the AP is stationary, there's no reason for us to
> filter it out. But the "you need to know two AP" protection is still
> effective here.

We also minimize this threat in part because, unlike Google's Location Service, we require both a MAC address and SSID. If Anna changes her Wi-Fi router's SSID, then its hash ID (i.e. SHA1(MAC+SSID)) will change, hiding her new location. Our database only contains hashes, not MAC addresses or SSIDs.

In the short term, we could publish only the cell tower measurements. They are not as accurate, but their positions are fixed and they don't have the Wi-Fi privacy problems.
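The SHA1(MAC+SSID) identifier mentioned here can be illustrated with a few lines. The exact normalization and concatenation are not specified in this bug, so lowercasing the MAC and appending the SSID as UTF-8 are assumptions, and `ap_hash` is a hypothetical name.

```python
import hashlib


def ap_hash(mac, ssid):
    """Hex SHA-1 of the lowercased MAC followed by the SSID (assumed encoding)."""
    return hashlib.sha1((mac.lower() + ssid).encode("utf-8")).hexdigest()


before = ap_hash("AA:BB:CC:DD:EE:FF", "AnnaAndBob")
after = ap_hash("AA:BB:CC:DD:EE:FF", "SomethingNew")  # same router, renamed
```

Because the SSID is part of the input, renaming the network yields an unrelated identifier, which is the property described above.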

Gervase Markham [:gerv]

Assignee

Comment 12

12 years ago

Anyway, we've got off-topic (and the conversation is now continuing in m.d.security, which is great). I think we have an idea of how we want to go forward here. Let me know if you need more from me. Gerv

Jishnu Menon :jishnu

Comment 13

12 years ago

Hi Gerv I'm actually thinking we use CC0 since the goal of this database is utilization - can you, me and Chris get on a call? Thanks jishnu

Gervase Markham [:gerv]

Assignee

Comment 14

12 years ago

Hey Jishnu, Does this discussion need to happen synchronously? I agree that the goal is for the database to be used. I don't think that this goal obviously means that we have to use do-whatever-you-want licensing. OpenStreetMap is using the licensing scheme I'm proposing (after a great deal of consideration) and their data is widely used. (And our goal for our core source code is to be used, and it is copylefted.) I think that we benefit most, and so do others, if there is one pile of this data rather than a set of silos. A share-alike provision which applies to the _database_ (but not individual data items) achieves that rather well. How familiar are you with the ODbL/DbCL combination? Using CC0 straight would also mean that we would be unable to incorporate data from any of the bootstrapping data sources that Chris mentions. Gerv

Gervase Markham [:gerv]

Assignee

Comment 15

12 years ago

(In reply to Gervase Markham [:gerv] from comment #14) > Does this discussion need to happen synchronously? Sorry, that's not clear :-( By that, I meant: "surely it would be OK to continue to have this discussion in this bug, asynchronously?". This leads to a better and more open record of what was said and decided than a phone call. Gerv

Hanno Schlichting

Comment 16

12 years ago

Based on the security feedback we got, I think there's very little data we can actually share openly, or share in a form that allows others to actually use it for anything useful. Most likely just some form of aggregated cell tower locations.

But the way we are aggregating the cell information is going to change over time. At first because we make our algorithms smarter and later because we'll switch approaches over to "fingerprinting". Basically instead of figuring out actual cell tower locations, you divide the entire globe into grid cells, and build "signal fingerprints" for each such grid cell. This is the current standard way to approach the problem of signal strengths not actually being correlated with distance to cell towers. It requires a lot more data, so we can only do this later once we have sufficient data.

But this leads to a situation where we won't actually produce and share the same kind of cell data over time. Which means it won't be a dependable data set for anyone else to build services on top of. With the limited amount of data we can share, you certainly won't be able to actually use the data for the same purpose that we do.

All of this leads me to believe that there is little value in trying to share some of the aggregated data. Instead we might be better off trying to share all the underlying raw data under contract terms which limit what you can use the data for and demand certain privacy restrictions. Potentially the content ownership could at some point be transferred to an independent organization, foundation or business alliance.

This might not be in line with what we'd usually do, but it might be the best approach to share as much of this data as possible under the privacy restrictions we face, and actually get others to collaborate on and contribute to the same data set.
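The grid-cell "fingerprinting" approach described in this comment can be sketched in miniature as follows. Everything specific here is an assumption for illustration: the `CELL_DEG` grid size, the observation tuple layout, mean signal strength as the fingerprint, and mean squared difference as the matching score.

```python
from collections import defaultdict
from math import floor

CELL_DEG = 0.01  # assumed grid resolution in degrees


def grid_cell(lat, lon):
    """Map a position to an integer grid cell index."""
    return (floor(lat / CELL_DEG), floor(lon / CELL_DEG))


def build_fingerprints(observations):
    """observations: iterable of (lat, lon, tower_id, signal_dbm).

    Returns {cell: {tower_id: mean signal strength seen inside that cell}}.
    """
    sums = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for lat, lon, tower, dbm in observations:
        acc = sums[grid_cell(lat, lon)][tower]
        acc[0] += dbm
        acc[1] += 1
    return {
        cell: {tower: total / count for tower, (total, count) in towers.items()}
        for cell, towers in sums.items()
    }


def best_cell(fingerprints, scan):
    """scan: {tower_id: signal_dbm}. Return the cell whose fingerprint is
    closest on the shared towers, or None if no cell shares a tower."""
    best, best_score = None, float("inf")
    for cell, fingerprint in fingerprints.items():
        shared = set(fingerprint) & set(scan)
        if not shared:
            continue
        score = sum((fingerprint[t] - scan[t]) ** 2 for t in shared) / len(shared)
        if score < best_score:
            best, best_score = cell, score
    return best


observations = [
    (37.3951, -122.0812, "t1", -60.0),
    (37.3957, -122.0815, "t1", -64.0),
    (37.3953, -122.0811, "t2", -90.0),
    (37.4051, -122.0812, "t1", -95.0),   # a neighbouring grid cell
    (37.4053, -122.0813, "t2", -55.0),
]
fingerprints = build_fingerprints(observations)
match = best_cell(fingerprints, {"t1": -63.0, "t2": -88.0})
```

The published artifact in such a scheme is a per-cell signal profile rather than a list of tower positions, which is why the shared data would change shape when the approach changes.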

Gervase Markham [:gerv]

Assignee

Comment 17

12 years ago

Hanno: is "the security feedback" the stuff in the discussion forum, or was there other feedback? Is it published somewhere?

Why did you ditch the ESSID hash idea? I think that the database non-vandalism and ability-to-change-your-identifier properties it gives you are worth having. If the hashing is done on the client, surely there's no privacy issue? If you also want raw MACs, perhaps you could submit both. (That might allow brute-forcing of the ESSID for common ESSIDs by anyone who had the raw database, but you aren't sharing the database anyway, any more, and you could promise not to do that.)

I think this is valuable enough that if we are saying we are not doing it for legal reasons, I'd like that legal opinion to come from an expert lawyer. Maybe I'm not CCed on the right bugs, but I've not seen that.

This situation is interesting enough (re: the privacy problems with aggregate data) that I'd like to blog about it. Is that OK?

Just to step back for a moment, what's our end goal here? "A single global shared wireless signal fingerprint database that anyone can contribute to, and anyone can use, at zero cost, to geolocate themselves - but without violating anyone's privacy." Is that it?

Gerv

Chris Peterson [:cpeterson]

Reporter

Comment 18

12 years ago

(In reply to Gervase Markham [:gerv] from comment #17)
> Hanno: is "the security feedback" the stuff in the discussion forum, or was
> there other feedback? Is it published somewhere?

I believe Hanno is referring to the discussion we had on dev-security:
https://groups.google.com/forum/#!topic/mozilla.dev.security/0qT0grS_PIY

> Why did you ditch the ESSID hash idea? I think that the database
> non-vandalism and ability-to-change-your-identifier properties it gives you
> are worth having. If the hashing is done on the client, surely there's no
> privacy issue? If you also want raw MACs, perhaps you could submit both.

We haven't totally ditched the idea. And the "submit MAC and hash(MAC+SSID)" idea is one we're considering. :)

> I think this is valuable enough that if we are saying we are not doing it
> for legal reasons, I'd like that legal opinion to come from an expert
> lawyer. Maybe I'm not CCed on the right bugs, but I've not seen that.

We're finalizing a list of the data collection options that would work for us, then Jishnu will get a legal opinion.

> This situation is interesting enough (re: the privacy problems with
> aggregate data) that I'd like to blog about it. Is that OK?

Sure, but I think we should finish our geolocation privacy FAQ before we go public. Also, Hanno was preparing a blog post specifically about our project. Would it make sense to introduce the project before discussing the privacy of aggregate data in general?

> Just to step back for a moment, what's our end goal here? "A single global
> shared wireless signal fingerprint database that anyone can contribute to,
> and anyone can use, at zero cost, to geolocate themselves - but without
> violating anyone's privacy." Is that it?

Yes, I think that's a good summary of our goals.

Gervase Markham [:gerv]

Assignee

Comment 19

12 years ago

(In reply to Chris Peterson (:cpeterson) from comment #18)
> Sure, but I think we should finish our geolocation privacy FAQ before we go
> public. Also, Hanno was preparing a blog post specifically about our
> project. Would it make sense to introduce the project before discussing the
> privacy of aggregate data in general?

Sounds good. I'm now reading .geolocation; but please do let me know when you are ready, and I'll riff off your post.

Gerv

Chris Peterson [:cpeterson]

Reporter

Comment 20

12 years ago

Gerv, we would like to take our geolocation project public very soon. We need to determine whether community data collection is even viable, so we can "fail fast". To expedite our launch, we plan to collect just BSSIDs (and cell data) without promising to publish a WiFi or cell database in the near term.

If we decide to publish our location data, we would probably publish "synthesized" location data derived from both our stumbler data and third-party databases (which use CC-BY-SA 3.0 or GNU FDL licenses).

* What data license text (if any) do we need to include in the stumbler privacy notice and website privacy FAQ that will allow us to collect stumbler data without publishing it yet (but keeping that option open for the future)?

Flags: needinfo?(gerv)

Gervase Markham [:gerv]

Assignee

Comment 21

12 years ago

My advice: I would ask for all rights (a broad grant - work with the Legal team on exact wording), and add a FAQ item which says that you are still working out the thorny licensing and privacy issues here, and the "all rights" is for flexibility so you can implement whatever you come up with. You could note that people retain rights to the data they collect also. Make sure you don't over-set expectations by saying that we currently think it's unlikely we can publish the raw data, but we are still trying to work out how to publish something usable. Ideally, make it so the stumbler can export it in some format, just to make good on that promise. (A CSV data dump is fine.) But this isn't a blocker. Gerv

Flags: needinfo?(gerv)

Hanno Schlichting

Updated

12 years ago

Blocks: 862827

Gervase Markham [:gerv]

Assignee

Comment 22

12 years ago

Is there more you need from me here? :-) Gerv

Chris Peterson [:cpeterson]

Reporter

Comment 23

12 years ago

Not for now. Thanks.

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Bugzilla

Categories

(mozilla.org :: Licensing, task)

Tracking

(Not tracked)

People

(Reporter: cpeterson, Assigned: gerv)


Security

(public)
