Closed Bug 949280 Opened 11 years ago Closed 11 years ago

optional "signature" argument for TCBS model

Categories

(Socorro :: Webapp, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: rhelmer, Assigned: rhelmer)

References

Details

For bug 915373 and bug 913266, it would be really handy to have a way to tell the TCBS model (and underlying mware service) a list of the signatures that should be returned.

The query still needs to run over all signatures in the requested date range though, in order to calculate the relative ranking.
I don't think we want this on second thought - we'll probably get a better cache hit leaving it alone. Will measure and reopen if it seems useful.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
If we genuinely need TCBS *by signature* then I think we should have this functionality. Relying on caching is too weak IMHO. Also, the TCBS takes `days` and a `limit` so what if you do::

 api= TCBS()
 result = api.get(product=foo, version=bar, limit=10, days=100)
 relevant = [x for x in result if x['signature'] == my_signature]

Then you might be missing out on relevant results that are "masked" by the limit if you catch my drift.
(In reply to Peter Bengtsson [:peterbe] from comment #2)
> If we genuinely need TCBS *by signature* then I think we should have this
> functionality. Relying on caching is too weak IMHO.

Agreed, reopening - thanks for digging this up.

> Also, the TCBS takes
> `days` and a `limit` so what if you do::
> 
>  api= TCBS()
>  result = api.get(product=foo, version=bar, limit=10, days=100)
>  relevant = [x for x in result if x['signature'] == my_signature]
> 
> Then you might be missing out on relevant results that are "masked" by the
> limit if you catch my drift.

True.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
(In reply to Peter Bengtsson [:peterbe] from comment #2)
> If we genuinely need TCBS *by signature* then I think we should have this
> functionality. Relying on caching is too weak IMHO.

So my thinking behind WONTFIXing this was that we could move the backend service to use matviews instead of doing ranking on-demand like this, since it's slow. This would be per-product+version - we could still do this but I think adding the "signature" dimension into the mix would be prohibitively complex.

I don't know if restricting signatures will actually make the query faster, it's worth measuring as I think it just might.
(In reply to Robert Helmer [:rhelmer] from comment #4)
> (In reply to Peter Bengtsson [:peterbe] from comment #2)
> > If we genuinely need TCBS *by signature* then I think we should have this
> > functionality. Relying on caching is too weak IMHO.
> 
> So my thinking behind WONTFIXing this was that we could move the backend
> service to use matviews instead of doing ranking on-demand like this, since
> it's slow. This would be per-product+version - we could still do this but I
> think adding the "signature" dimension into the mix would be prohibitively
> complex.
> 
Not sure I understand that second sentence. You think adding signature as a optional parameter to tcbs is "prohibitively complex"? Or to add a matview is prohibitively complex?

> I don't know if restricting signatures will actually make the query faster,
> it's worth measuring as I think it just might.

Should be easy to benchmark. You mean, SQL versus being able to memcache all TCBS locallly for each signature. 
However, there's still the point I made above about using the data "correctly" with the limit and stuff.
(In reply to Peter Bengtsson [:peterbe] from comment #5)
> (In reply to Robert Helmer [:rhelmer] from comment #4)
> > (In reply to Peter Bengtsson [:peterbe] from comment #2)
> > > If we genuinely need TCBS *by signature* then I think we should have this
> > > functionality. Relying on caching is too weak IMHO.
> > 
> > So my thinking behind WONTFIXing this was that we could move the backend
> > service to use matviews instead of doing ranking on-demand like this, since
> > it's slow. This would be per-product+version - we could still do this but I
> > think adding the "signature" dimension into the mix would be prohibitively
> > complex.
> > 
> Not sure I understand that second sentence. You think adding signature as a
> optional parameter to tcbs is "prohibitively complex"? Or to add a matview
> is prohibitively complex?

I was thinking of the matview.

> 
> > I don't know if restricting signatures will actually make the query faster,
> > it's worth measuring as I think it just might.
> 
> Should be easy to benchmark. You mean, SQL versus being able to memcache all
> TCBS locallly for each signature. 
> However, there's still the point I made above about using the data
> "correctly" with the limit and stuff.

That makes sense for the current situation, but if you specify a list of signatures then you should only get that number of results back, for each product/version specified, so it should be possible to set it so this won't happen. Setting it to 0 might work to disable it too.
Ugh, so due to the way this query works we cannot restrict by signature using a WHERE clause, since this is where it does the relative ranking: https://github.com/mozilla/socorro/blob/master/socorro/external/postgresql/tcbs.py#L70

Having the service limit the signatures it sends back to a list will require us to do a little less work and is probably worth it, but it's definitely not going to be any significant perf savings, unfortunately.
QA Contact: rhelmer
Depends on: 957463
(In reply to Robert Helmer [:rhelmer] from comment #7)
> Ugh, so due to the way this query works we cannot restrict by signature
> using a WHERE clause, since this is where it does the relative ranking:
> https://github.com/mozilla/socorro/blob/master/socorro/external/postgresql/
> tcbs.py#L70

That's why the idea came up to store the TCBS rank in a matview so that this as well as topchangers could use that instead of recalculating all the time.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #8)
> (In reply to Robert Helmer [:rhelmer] from comment #7)
> > Ugh, so due to the way this query works we cannot restrict by signature
> > using a WHERE clause, since this is where it does the relative ranking:
> > https://github.com/mozilla/socorro/blob/master/socorro/external/postgresql/
> > tcbs.py#L70
> 
> That's why the idea came up to store the TCBS rank in a matview so that this
> as well as topchangers could use that instead of recalculating all the time.

Yes I think we should do this, but it's a larger project.
Blocks: 952246
Assignee: nobody → rhelmer
Target Milestone: --- → 73
Had a chat w/ peterbe and we went full circle on this - due to the way this query works, the best we could do right now is filter the output from the service, we may as well stay the course and do this in django, the marginal savings in data transfer isn't worth complicating the service further.

The longer-term (hopefully q2 goal) vision is to pre-calculate TCBS rankings, probably via matview and stored procedure.
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → WONTFIX
QA Contact: rhelmer
You need to log in before you can comment on or make changes to this bug.