Closed Bug 1624687 Opened 6 years ago Closed 5 years ago

Investigate unique MLS requests

Categories

(Location :: General, enhancement, P3)

Tracking

(Not tracked)

RESOLVED MOVED

People

(Reporter: jwhitlock, Assigned: jwhitlock)

Details

ckolos has a hypothesis that the majority of requests are from static IP addresses with the same data. If true, this would allow server-side caching, such as a CDN, to reduce the traffic processed by the servers.

Requests and responses vary by:

  • URL, which includes the API key as a URL parameter
  • Client IP address, which is used in GeoIP requests, and
  • POST content, which includes WiFi, cell, and Bluetooth stations.

Work may include:

  • Looking into existing metrics, such as the API user metrics,
  • Analyzing logs to see if uniqueness can be determined,
  • Designing an experiment to measure uniqueness

The locate.user metric measures the unique IP addresses that use a service, to a theoretical 1% accuracy. For March 5, there were 128 million requests (Grafana) and 500,000 unique IP addresses (Grafana). This is an average of about 256 requests per IP address.

Possible further analysis:

  • Generate a signature for the request, to identify if the requests are identical
  • Add a log metric for the number of requests by that IP address, to identify the "whale" users

It's not clear if there is an easy win here. CDNs don't traditionally cache POST requests, since they imply changing state. However, it appears most POST requests to MLS have no radio data, so they would be identical to the equivalent GET request with no body data. Further, most CDNs do not vary on IP address, but this is required for effective caching of location requests.

One possible solution would be:

  • Update docs to say "Use GET when only IP is available"
  • Update GeoClue, other libraries to send a GET when only the IP is available
  • Add a CDN that caches empty GET requests varying by IP
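
To make the "only IP is available" case concrete, here is a minimal sketch of how a request could be classified and keyed for caching. The radio field names and the cache key layout are assumptions for illustration, not taken from the MLS code or from any particular CDN.

```python
import json

# Assumed names of the radio lists in a geolocate request body.
RADIO_FIELDS = ("wifiAccessPoints", "cellTowers", "bluetoothBeacons")


def is_ip_only(body: bytes) -> bool:
    """Return True if the request body carries no radio data."""
    if not body.strip():
        return True  # An empty body is equivalent to a GET.
    try:
        data = json.loads(body)
    except ValueError:
        return False  # Malformed bodies are not treated as cacheable.
    if not isinstance(data, dict):
        return False
    # Missing or empty radio lists both count as "no radio data".
    return not any(data.get(field) for field in RADIO_FIELDS)


def cache_key(path: str, api_key: str, client_ip: str) -> str:
    """Build a hypothetical cache key that varies by API key and client IP."""
    return f"{path}?key={api_key}|ip={client_ip}"
```

A request for which `is_ip_only` returns True could be sent as a GET by the client, and a cache in front of the service would then need to include the client IP in its key, as `cache_key` does.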

I'm setting this P3 because there is definitely staff-only work to analyze usage by IP address, but a lot of the proposed actions would be in the client libraries, and it is unclear if CDNs support the MLS caching use case.

Type: task → enhancement
Priority: -- → P3
Blocks: 1635789

This work would also help determine the impact of implementing location caching in Firefox. My plan is:

  • Generate a signature for the request
  • Use Redis's HyperLogLog to count unique daily requests
  • Report counts in logs
  • Create a Data Studio report to find averages and identify outliers
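
A rough sketch of the first two steps of this plan, assuming a redis-py style client (any object with `pfadd`, `pfcount`, and `expire`). The key name, the retention period, and the signature inputs are hypothetical choices for illustration, not what was implemented.

```python
import hashlib
from datetime import date


def request_signature(api_key: str, client_ip: str, body: bytes) -> str:
    """Hash the parts of a request that can change the response."""
    digest = hashlib.sha256()
    for part in (api_key.encode(), client_ip.encode(), body):
        # Length-prefix each part so different splits cannot collide.
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.hexdigest()


def daily_key(day: date) -> str:
    """Hypothetical Redis key holding one HyperLogLog per day."""
    return f"unique-requests:{day.isoformat()}"


def record_request(client, day: date, signature: str) -> None:
    """Add the signature to the day's HyperLogLog (PFADD)."""
    key = daily_key(day)
    client.pfadd(key, signature)
    client.expire(key, 7 * 24 * 3600)  # Assumed one-week retention.


def unique_requests(client, day: date) -> int:
    """Return the approximate number of unique signatures (PFCOUNT)."""
    return client.pfcount(daily_key(day))
```

With a real server, `client` would be something like `redis.Redis()`. The count is an estimate, so comparing it against the total request count gives an approximate repeat rate rather than an exact one.
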
Assignee: nobody → jwhitlock
Priority: P3 → P1
Depends on: 1641363
No longer blocks: 1635789

We went through a few rounds of review on PR 1170, but we had too many questions about what we should measure for the request body, which is very similar to the question of what should vary to bust a client cache:

  • The raw bytes
  • The JSON with sorted keys
  • The JSON with sorted keys and radio lists
  • The JSON with sorted keys, radio lists, and "noisy" parameters like signal strength removed or normalized
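
For reference, the four options could be sketched roughly as below. This is not the code from PR 1170. The level names and the set of "noisy" fields are assumptions, and for simplicity it sorts every list in the body rather than only the radio lists.

```python
import hashlib
import json

# Assumed "noisy" per-station fields; the real set was an open question.
NOISY_FIELDS = {"signalStrength", "age", "signalToNoiseRatio", "timingAdvance"}
LEVELS = ("raw", "keys", "lists", "denoised")


def _normalize(value, sort_lists: bool, drop_noisy: bool):
    """Recursively drop noisy fields and optionally sort lists."""
    if isinstance(value, dict):
        return {
            key: _normalize(item, sort_lists, drop_noisy)
            for key, item in value.items()
            if not (drop_noisy and key in NOISY_FIELDS)
        }
    if isinstance(value, list):
        items = [_normalize(item, sort_lists, drop_noisy) for item in value]
        if sort_lists:
            # Sort by canonical JSON so station order does not matter.
            items.sort(key=lambda item: json.dumps(item, sort_keys=True))
        return items
    return value


def body_signature(body: bytes, level: str = "raw") -> str:
    """Return a SHA-256 signature of the body at the given level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    if level == "raw":
        payload = body
    else:
        data = _normalize(
            json.loads(body),
            sort_lists=level in ("lists", "denoised"),
            drop_noisy=level == "denoised",
        )
        payload = json.dumps(data, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(payload).hexdigest()
```

Each level treats more requests as identical than the one before it, which is why the choice matters: the "raw" level undercounts repeats, while the "denoised" level may merge requests that would get different responses.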

Instead, we focused on the suggested metric from bug 1635789:

How many requests, made from Firefox / with a Firefox API Key return the same result to the same IP? (it would probably be useful to segment this with Desktop vs Android)

We can revisit the idea of a request signature in the future. With the metrics added for bug 1635789, we should be able to estimate the upper limit for repeat requests (api_key_repeat_ip=True && api_repeat_response=True), and decide if we need the more precise number.
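
As an illustration of that estimate, a minimal sketch assuming each log record is available as a mapping with the two boolean fields named above. How the records are loaded from the logs is left out.

```python
from typing import Iterable, Mapping


def repeat_request_share(records: Iterable[Mapping[str, object]]) -> float:
    """Return the share of requests that repeat both IP and response.

    A record counts as a repeat only when api_key_repeat_ip and
    api_repeat_response are both True. The result is an upper limit on
    repeat requests, since the request bodies may still have differed.
    """
    total = 0
    repeats = 0
    for record in records:
        total += 1
        if (
            record.get("api_key_repeat_ip") is True
            and record.get("api_repeat_response") is True
        ):
            repeats += 1
    return repeats / total if total else 0.0
```

If this share turns out to be small, the more precise request signature is probably not worth adding. If it is large, the signature would show how much of it is truly identical traffic.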

Priority: P1 → P3
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → MOVED