This started again after the 10-01-12 SUMO release.
There seem to be some cases where an answer gets saved but the question's denormalized fields don't get updated. (I have no idea how this can happen.)
Problem still happening. Related explanation: https://support.mozilla.org/en-US/forums/contributors/708120
I still have no idea how this happens. Making a 1pter to look into it. This does seem infrequent, so we shouldn't spend more than ~half a day going down a rat hole.
Grabbing this one to look into on Monday.
The "No Replies" message is displayed if question.num_answers is 0. I think if the question comes from cache, then num_answers could be 0, but question.answers.all() could kick off a query that doesn't come from cache and show the answers in the db, and thus you could get two different "understandings" of the state of things.
I think the easy fix is to change questions/answers.html to do one of two things:
1. rely on question.num_answers: if num_answers is 0, then don't do a db query and don't show answers
2. rely on question.answers: check whether question.answers is empty when deciding whether to show "No replies"
I'm inclined to go with the latter. I don't think it'll be any worse performance-wise since we're already doing the query to show the answers.
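To make the second option concrete, here is a minimal Python sketch. It is not the actual questions/answers.html template or kitsune model code: `Question`, `reply_section`, and the list-backed `answers` field are hypothetical stand-ins. The point is only that the "No replies" decision and the rendered answers come from the same fetched list, so they cannot disagree.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """Hypothetical stand-in for the real model (assumption, not kitsune code)."""
    num_answers: int = 0                          # denormalized counter, may be stale
    answers: list = field(default_factory=list)   # what a fresh query would return


def reply_section(question):
    """Decide what to render from the fetched answers, not from num_answers."""
    answers = list(question.answers)  # one fetch, reused for both decisions
    if not answers:
        return ["No replies"]
    return answers


# Stale counter (0) but one answer actually stored.
stale = Question(num_answers=0, answers=["Try clearing your cache."])
print(reply_section(stale))
```

With this approach a stale num_answers no longer affects what the page shows, though it does nothing to repair the counter itself.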
Hrm... according to the screenshot, the reply was 42 minutes later. So I think that probably nixes the cache theory. I'll look into how an answer could be added without updating num_answers.
I can reproduce the problem (or something that looks like the problem) locally when I have memcache enabled. I create a new question, then I answer it. The problem occurs when the answer gets saved. There's a section of code that does:
    self.question.num_answers = self.question.answers.count()
after the answer has been saved. However, that self.question.answers.count() is returning 0 even though there should be an answer there, so self.question.num_answers gets a 0, gets saved, and that's how we end up with a mismatch.
Three interesting things:
1. There is a test for this already and the test works fine (test_models.py TestAnswer.test_new_answer_updates_question).
2. If I switch to the dummy cache, I can't reproduce it anymore with my steps.
3. If I change the count query to:
    self.question.num_answers = Answer.objects.filter(
        question=self.question, upvotes__gte=0).count()
that correctly returns the number of answers. The key part is the upvotes__gte part, which is totally goofy, but causes the query to get the answer from the db rather than cache. I'd rather not go with that as a solution, though.
Is there a way to invalidate the cache or force the queryset to get the results from the db for "get me all answers for question x"?
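The suspected mechanism can be sketched as a toy model in plain Python. This is an illustration under stated assumptions, not kitsune's code or any caching library's real API: `ToyQueryCache` and its methods are hypothetical, and the sketch assumes that results are cached per query and that saving an answer does not invalidate them.

```python
class ToyQueryCache:
    """Toy stand-in for a caching query layer (hypothetical, for illustration)."""

    def __init__(self):
        self.rows = []    # the "database": (question_id, text) tuples
        self.cache = {}   # query key -> cached result

    def add_answer(self, question_id, text):
        # Assumption for this sketch: the write does NOT invalidate cached queries.
        self.rows.append((question_id, text))

    def count_uncached(self, question_id):
        # Always counts rows in the "database" directly.
        return sum(1 for qid, _ in self.rows if qid == question_id)

    def count_cached(self, question_id):
        # Returns the first result ever computed for this query key.
        key = ("answer_count", question_id)
        if key not in self.cache:
            self.cache[key] = self.count_uncached(question_id)
        return self.cache[key]


db = ToyQueryCache()
db.count_cached(1)              # query before any answer exists: caches 0
db.add_answer(1, "An answer")   # the answer is saved afterwards
stale = db.count_cached(1)      # cached value, would be written to num_answers
fresh = db.count_uncached(1)    # value read straight from the rows
print(stale, fresh)
```

In this model the cached count stays at 0 after the answer is saved, while the uncached count is 1, which matches the mismatch seen in the local reproduction. Whether the real cache layer behaves this way on dev, stage, or production is a separate question.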
Switched it to use uncached. That fixes the num_answers mismatch for me. https://github.com/mozilla/kitsune/pull/506
Checked into master in https://github.com/mozilla/kitsune/commit/41a5e7326ba5aa7ecfd1b500eebb4efb039c0608 . I think that change fixes the problem here. If not, we'll go back to the drawing board with new information.
Is there a way to test the fix manually on stage? I haven't been able to repro.
(In reply to Will Kahn-Greene [:willkg] from comment #12)
> Checked into master in
> https://github.com/mozilla/kitsune/commit/41a5e7326ba5aa7ecfd1b500eebb4efb039c0608 .
>
> I think that change fixes the problem here. If not, we'll go back to the
> drawing board with new information.
So... the problem with my steps to reproduce and premise was that it assumed we were caching counts, which turned out to be wrong. Given that, although I was able to trivially reproduce the issue locally, that isn't the case on dev, stage or production.
After talking this over with James and Ricky, we decided to leave the fix in because it shouldn't have an adverse effect, and if we ever did decide to cache counts, this would be something that would just continue to work without additional changes.
The reported issue happens sporadically. It's probably hard to test in production, though we'll find out if the frequency of the issue goes down. It's probably harder to test on staging where we have less activity and data.
I'm closing this as [qa-] due to the difficulty of testing on stage. This will re-open if the problem persists on production after the push.
The problem possibly continues: it was apparently noticed on a recent Sumo Day. See for instance /forums/contributors/708120#post-46131
Reopening. Given that this has been worked on several times before over the last year (possibly year and a half), I think it warrants serious effort the next time we look into it since it's hard to reproduce and we've probably exhausted the set of easy causes. So it should probably be a 3 pter.
I've been seeing this a lot recently on the forums, especially over the past two weeks or so.
Landed in prod. I have a good feeling this will fix it: https://github.com/mozilla/kitsune/commit/4fbc42f0ec3fbfc8ecc3ce4cc7ed46e3e59baad0