Helping libmozdata go faster

NEW
Unassigned


bugzilla.mozilla.org
API
a year ago
a year ago

People

(Reporter: dylan, Unassigned)

Tracking

(Depends on: 1 bug)

Production

Details

(Reporter)

Description

a year ago
Today libmozdata (or something with that UA) made 14k requests.
I'm not here to complain about that!

I analyzed all of them and put them into buckets based on average response time (controlled for other concurrent activity, including the user that did a full text search).

I profiled the API calls and compared them against the recent performance work I've done. The good news is that most of them will benefit from my changes.

That leaves one particular query that is pretty bad:
One that performs four substring operations against cf_crash_signature.
Because of how substrings work, those perform a full table scan against bugs.*, which is 1.3 million rows. The fact that it only takes a few seconds is a testament to how powerful our DB servers are!

However, I'd like to know more about why that is a substring match vs. something else -- because even when we move to Elasticsearch, searching 1.3 million documents based on a substring is slow. 

What does libmozdata need out of its queries for cf_crash_signature?
Would a whitespace-ignoring but otherwise exact match work?
How important are the symbols? Can we use a tokenized search instead?

This bug should answer those questions.
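
To make the second question concrete, here is a minimal sketch of what a whitespace-ignoring but otherwise exact match could look like on the client side. It assumes the cf_crash_signature value holds one bracketed signature per line (for example "[@ foo ]"); the function names are hypothetical and this is not how Bugzilla or libmozdata implement matching.

```python
import re


def normalize(signature):
    # Remove every whitespace character so "[@ foo ]" and "[@foo]" compare equal.
    return re.sub(r"\s+", "", signature)


def parse_signatures(field_value):
    # Assumption: one "[@ ... ]" entry per line in cf_crash_signature.
    signatures = set()
    for line in field_value.splitlines():
        line = line.strip()
        if line.startswith("[@") and line.endswith("]"):
            signatures.add(normalize(line[2:-1]))
    return signatures


def matches(field_value, signature):
    # Exact match after whitespace removal; a mere prefix does not match.
    return normalize(signature) in parse_signatures(field_value)


field = "[@ js::Foo ]\n[@nsTArray::operator[] ]"
print(matches(field, "js::Foo"))
print(matches(field, "js::Fo"))
```

Taking each line as a whole, rather than searching for the first closing bracket, keeps signatures that themselves contain brackets intact.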
(Reporter)

Comment 1

a year ago
Can you redirect this needinfo to the appropriate person in Release Management or the equivalent? I don't need a lot of anyone's time, just a few minutes to understand this query.
Flags: needinfo?(ehumphries)
Marco, are you the right person to answer this question, or is this for :sylvestre?
Flags: needinfo?(ehumphries) → needinfo?(mcastelluccio)
(In reply to Dylan Hardison [:dylan] (he/him) from comment #0)
> Today libmozdata (or something with that UA) made 14k requests.
> I'm not here to complain about that!

libmozdata is used by several different projects.
We should probably start using a different UA for each one.
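
As a rough illustration of a per-project user agent, the sketch below attaches a project name to the User-Agent header using only the standard library. The "libmozdata (project)" format and the build_request helper are hypothetical; nothing here reflects how libmozdata actually sets its headers.

```python
import urllib.request


def build_request(url, project):
    # Hypothetical UA format: library name followed by the calling project.
    user_agent = "libmozdata (%s)" % project
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# The request is only constructed here, not sent.
req = build_request("https://bugzilla.mozilla.org/rest/bug", "stab-crashes")
print(req.get_header("User-agent"))
```

With a distinct value per project, server logs could be grouped by caller instead of by library.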

> That leaves one particular query that is pretty bad:
> One that performs four substring operations against cf_crash_signature.
> Because of how substrings work, those perform a full table scan against
> bugs.*, which is 1.3 million rows. The fact that it only takes a few seconds
> is a testament to how powerful our DB servers are!
> 
> However, I'd like to know more about why that is a substring match vs.
> something else -- because even when we move to Elasticsearch, searching 1.3
> million documents based on a substring is slow. 
> 
> What does libmozdata need out of its queries for cf_crash_signature?
> Would a whitespace-ignoring but otherwise exact match work?
> How important are the symbols? Can we use a tokenized search instead?

I can answer for https://github.com/mozilla/stab-crashes.

It needs to find the bugs associated with a given signature.
Here's the source code: https://github.com/mozilla/stab-crashes/blob/68e79b43d75c7e291ceb8bb3b75755a17d645fa2/generate-data.py#L169.

For each signature (2000 in total), it is doing a query with four substring operations (because a regexp didn't work):
o1=substring&f1=cf_crash_signature&v1=[@SIGNATURE]
o2=substring&f2=cf_crash_signature&v2=[@ SIGNATURE]
o3=substring&f3=cf_crash_signature&v3=[@SIGNATURE ]
o4=substring&f4=cf_crash_signature&v4=[@ SIGNATURE ]

The fields it is requesting are 'resolution', 'id', 'last_change_time', 'cf_tracking_firefox_XXX', 'cf_status_firefox_XXX'.
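
The query described above can be sketched roughly as follows. This is an illustration rather than the code in generate-data.py: the signature_query helper is hypothetical, joining the four clauses with j_top=OR is an assumption, and the version-specific tracking and status fields are left for the caller to pass in.

```python
from urllib.parse import urlencode

BASE_FIELDS = ["resolution", "id", "last_change_time"]


def signature_query(signature, extra_fields=()):
    """Build search parameters for one crash signature (hypothetical helper)."""
    # The four spacing variants listed in the comment above.
    variants = [
        "[@" + signature + "]",
        "[@ " + signature + "]",
        "[@" + signature + " ]",
        "[@ " + signature + " ]",
    ]
    params = {"j_top": "OR"}  # assumption: any one variant is enough to match
    for n, value in enumerate(variants, start=1):
        params["f%d" % n] = "cf_crash_signature"
        params["o%d" % n] = "substring"
        params["v%d" % n] = value
    # extra_fields is where cf_tracking_firefox_* / cf_status_firefox_* would go.
    params["include_fields"] = ",".join(BASE_FIELDS + list(extra_fields))
    return params


params = signature_query("js::Foo")
print(urlencode(params))
```

Run once per signature, this yields 2000 requests with four substring clauses each, which is the load pattern discussed in this bug.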

We discussed this a bit on IRC in the past, and you ended up filing bug 1285998.
Flags: needinfo?(mcastelluccio)
Once bug 1285998 is resolved, does that resolve this bug?
Depends on: 1285998
Flags: needinfo?(mcastelluccio)
(In reply to Emma Humphries ☕️ (she/her) [:emceeaich] (UTC-8) +needinfo me from comment #4)
> Once bug 1285998 is resolved, does that resolve this bug?

I hope so. Once that bug is resolved, we will use the new search feature in libmozdata; then Dylan can look at the logs again and confirm that the requests are far fewer and faster.
Flags: needinfo?(mcastelluccio)