Closed Bug 1669826 Opened 5 years ago Closed 5 years ago

Observation queues exceed maximum size targets

Categories

(Location :: General, defect)


Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1669825

People

(Reporter: jwhitlock, Unassigned)

Details

After bug 1613493, the MLS observation queues were stabilized by setting the sample rate of the largest geolocate API user to 35%. Since then, the queues have occasionally exceeded the target of no more than 10 million observations waiting to be processed. The standard rate has since been lowered and is now at 10%.

Some of this may be due to increased usage and more data per day. Some may also be due to the growing database causing operations to slow down (bug 1602958 may address this).

One issue is that processing stops for 10-15 minutes during a production deploy, while the observations continue to accumulate. The processing system has difficulty catching up, even during the overnight slow periods.

The manual workaround has been:

  1. Reduce the sample rate for the large API user to 1%
  2. Wait for the queues to drain to a low level (10K - 100K each), which takes 1 to 6 hours
  3. Return the sample rate to the original value

One way to automate this would be to add a global sample rate and a maximum queue size. As the total observation backlog approaches the maximum, the global rate can be reduced from 100%, slowing the rate of incoming observations. When the backlog is back under control, the global rate can grow back to 100%, if the async workers can handle the incoming data.
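
A minimal sketch of how such a global rate could be derived from the backlog is below. It is not the project's code: the function names, the linear ramp, the 50% ramp start, and the 1% floor are all assumptions chosen for illustration. Only the 10 million ceiling comes from this bug.

```python
import random

# Target ceiling from this bug: no more than 10 million queued observations.
MAX_BACKLOG = 10_000_000
# Hypothetical: start throttling once the backlog passes 50% of the ceiling.
RAMP_START = 0.5
# Hypothetical floor, so some data is still collected at the ceiling.
MIN_RATE = 0.01


def global_sample_rate(backlog, max_backlog=MAX_BACKLOG,
                       ramp_start=RAMP_START, min_rate=MIN_RATE):
    """Return a global sample rate in [min_rate, 1.0] for a given backlog.

    The rate is 1.0 below the ramp threshold, then falls linearly to
    min_rate as the backlog approaches max_backlog.
    """
    threshold = ramp_start * max_backlog
    if backlog <= threshold:
        return 1.0
    if backlog >= max_backlog:
        return min_rate
    fraction = (backlog - threshold) / (max_backlog - threshold)
    return 1.0 - fraction * (1.0 - min_rate)


def effective_sample_rate(user_rate, backlog):
    """Combine a per-API-user sample rate with the global rate."""
    return user_rate * global_sample_rate(backlog)


def should_queue(user_rate, backlog, rng=random.random):
    """Decide whether one incoming observation batch is queued."""
    return rng() < effective_sample_rate(user_rate, backlog)
```

Because the global rate multiplies the per-user rate, the existing per-user settings would keep their meaning when the backlog is low, and the rate would recover on its own as the workers drain the queues. In practice the backlog would be read periodically from the queue store rather than on every request, and some smoothing might be needed to avoid oscillation.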

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → DUPLICATE