Bug 706948 - In some cases, a question displays "No replies" with a reply below
Status: RESOLVED FIXED
Whiteboard: u=contributor c=questions p=1
Product: support.mozilla.org
Classification: Other
Component: Questions
Version: unspecified
Platform: All All
Importance: P2 normal
Target Milestone: 2012.8
Assigned To: Ricky Rosario [:rrosario, :r1cky]
Mentors:
Duplicates: 707070 717163 736586
Depends on:
Blocks:
Reported: 2011-12-01 13:12 PST by Ricky Rosario [:rrosario, :r1cky]
Modified: 2012-05-07 18:28 PDT
CC List: 8 users
See Also:
QA Whiteboard:
Iteration: ---
Points: ---

Description Ricky Rosario [:rrosario, :r1cky] 2011-12-01 13:12:34 PST
For example:

https://support.mozilla.com/en-US/questions/901169
http://cl.ly/2q2D1Y3q0j3V3x0v063w
Comment 1 Swarnava Sengupta (:Swarnava) 2011-12-01 23:59:08 PST
*** Bug 707070 has been marked as a duplicate of this bug. ***
Comment 2 Ricky Rosario [:rrosario, :r1cky] 2012-01-11 06:23:37 PST
*** Bug 717163 has been marked as a duplicate of this bug. ***
Comment 3 Swarnava Sengupta (:Swarnava) 2012-01-11 06:37:27 PST
This started again after the 10-01-12 SUMO release.
Comment 4 Ricky Rosario [:rrosario, :r1cky] 2012-01-11 06:39:30 PST
There seem to be some cases where an answer gets saved but the question's denormalized fields don't get updated. (I have no idea how this can happen.)
Comment 5 Muhammed Hasan 2012-01-28 10:00:06 PST
The problem is still happening. Related explanation: https://support.mozilla.org/en-US/forums/contributors/708120
Comment 6 Ricky Rosario [:rrosario, :r1cky] 2012-02-14 07:37:14 PST
I still have no idea how this happens. Making a 1pter to look into it. This does seem infrequent, so we shouldn't spend more than about half a day going down a rat hole.
Comment 7 Will Kahn-Greene [:willkg] 2012-02-17 14:49:29 PST
Grabbing this one to look into on Monday.
Comment 8 Will Kahn-Greene [:willkg] 2012-02-21 13:49:02 PST
The "No Replies" message is displayed if question.num_answers is 0.

I think if the question comes from cache, then num_answers could be 0, while question.answers.all() could kick off a query that doesn't come from cache and shows the answers in the db. The page would then render two different "understandings" of the state of things.

I think the easy fix is to change questions/answers.html to do one of two things:

1. rely on question.num_answers: if num_answers is 0, then don't do a db query and don't show answers

2. rely on question.answers: check whether question.answers is empty when deciding whether to show "No replies"


I'm inclined to go with the latter. I don't think it'll be any worse performance-wise since we're already doing the query to show the answers.
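
The two options can be compared with a small stand-in model. This is only a sketch: the `Question` dataclass and the `render_*` helpers are hypothetical names standing in for the model and the display logic in questions/answers.html; none of this is kitsune code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Question:
    # Hypothetical stand-in: num_answers is the denormalized counter,
    # answers is what a fresh answers query would return.
    num_answers: int = 0
    answers: List[str] = field(default_factory=list)


def render_mixed(question: Question) -> List[str]:
    """Behavior as described in this bug: message from the counter, list from the query."""
    lines = []
    if question.num_answers == 0:
        lines.append("No replies")
    lines.extend(question.answers)
    return lines


def render_by_counter(question: Question) -> List[str]:
    """Option 1: trust num_answers and skip the answers query when it is 0."""
    if question.num_answers == 0:
        return ["No replies"]
    return list(question.answers)


def render_by_query(question: Question) -> List[str]:
    """Option 2: run the answers query once and derive the message from its result."""
    answers = list(question.answers)
    if not answers:
        return ["No replies"]
    return answers


# A question whose counter is stale: one answer exists, but num_answers is 0.
stale = Question(num_answers=0, answers=["Try safe mode"])
print(render_mixed(stale))
print(render_by_counter(stale))
print(render_by_query(stale))
```

With a stale counter, both options make the page self-consistent, but only option 2 still shows the existing answer; option 1 hides it.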
Comment 9 Will Kahn-Greene [:willkg] 2012-02-21 14:05:38 PST
Hrm... according to the screenshot, the reply was 42 minutes later. So I think that probably nixes the cache theory.

I'll look into how an answer could be added without updating num_answers.
Comment 10 Will Kahn-Greene [:willkg] 2012-02-22 07:11:30 PST
I can reproduce the problem (or something that looks like the problem) locally when I have memcache enabled. I create a new question, then I answer it. The problem occurs when the answer gets saved. There's a section of code that does:

   self.question.num_answers = self.question.answers.count()

after the answer has been saved. However, that self.question.answers.count() is returning 0--even though there should be an answer there--so then self.question.num_answers gets a 0, gets saved, and that's how we end up with a mismatch.

Three interesting things:

1. There is a test for this already and the test works fine (test_models.py TestAnswer.test_new_answer_updates_question).

2. If I switch to the dummy cache, I can't reproduce it anymore with my steps.

3. If I change the count query to:

       self.question.num_answers = Answer.objects.filter(
           question=self.question, upvotes__gte=0).count()

   that correctly returns the number of answers. The key part is the upvotes__gte
   part which is totally goofy, but causes the query to get the answer from the db
   rather than cache.


I'd rather not go with that as a solution, though. Is there a way to invalidate the cache or force the queryset to get the results from the db for "get me all answers for question x"?
Comment 11 Will Kahn-Greene [:willkg] 2012-02-22 08:38:38 PST
Switched it to use uncached. That fixes the num_answers mismatch for me.

https://github.com/mozilla/kitsune/pull/506
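
The local reproduction from comment 10 and the effect of bypassing the cache can be modeled with a toy example. This is a sketch under loud assumptions: `ToyQueryCache` and `ToyAnswerStore` are hypothetical, the cache is keyed by query text and is never invalidated on write, and nothing here reflects the real caching layer or the contents of the pull request. Comment 14 also notes that counts turned out not to be cached outside the local setup.

```python
class ToyQueryCache:
    """Caches results by query text (hypothetical, no invalidation on write)."""

    def __init__(self):
        self._results = {}

    def run(self, key, compute):
        # Only hit the "database" when this exact query text has not been seen.
        if key not in self._results:
            self._results[key] = compute()
        return self._results[key]


class ToyAnswerStore:
    def __init__(self):
        self.rows = []  # stands in for the answers table
        self.cache = ToyQueryCache()

    def add_answer(self, question_id):
        self.rows.append({"question": question_id, "upvotes": 0})

    def uncached_count(self, question_id):
        # Always reads the current rows, bypassing the cache.
        return sum(1 for row in self.rows if row["question"] == question_id)

    def cached_count(self, question_id):
        key = f"count answers where question={question_id}"
        return self.cache.run(key, lambda: self.uncached_count(question_id))

    def cached_count_with_extra_filter(self, question_id):
        # Different query text means a different cache key, so this misses the cache.
        key = f"count answers where question={question_id} and upvotes>=0"
        return self.cache.run(
            key,
            lambda: sum(
                1
                for row in self.rows
                if row["question"] == question_id and row["upvotes"] >= 0
            ),
        )


store = ToyAnswerStore()
before = store.cached_count(1)      # counted before any answer exists: 0 is cached
store.add_answer(1)
stale = store.cached_count(1)       # served from cache, still 0
workaround = store.cached_count_with_extra_filter(1)  # new key, reads the rows
fresh = store.uncached_count(1)     # bypasses the cache entirely
print(before, stale, workaround, fresh)
```

In this model the extra filter only works by accident, because it changes the cache key. Reading around the cache states the intent directly, which is the shape of the fix described in this comment.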
Comment 12 Will Kahn-Greene [:willkg] 2012-02-22 09:36:11 PST
Checked into master in https://github.com/mozilla/kitsune/commit/41a5e7326ba5aa7ecfd1b500eebb4efb039c0608 .

I think that change fixes the problem here. If not, we'll go back to the drawing board with new information.
Comment 13 Rebecca Billings [:rbillings] 2012-02-29 16:53:21 PST
Is there a way to test the fix manually on stage? I haven't been able to repro.
Comment 14 Will Kahn-Greene [:willkg] 2012-03-01 06:06:19 PST
(In reply to Will Kahn-Greene [:willkg] from comment #12)
> Checked into master in
> https://github.com/mozilla/kitsune/commit/41a5e7326ba5aa7ecfd1b500eebb4efb039c0608 .
> 
> I think that change fixes the problem here. If not, we'll go back to the
> drawing board with new information.

So... the problem with my steps to reproduce and my premise was that they assumed we were caching counts, which turned out to be wrong. So although I was able to trivially reproduce the issue locally, that reproduction doesn't apply on dev, stage or production.

After talking this over with James and Ricky, we decided to leave the fix in because it shouldn't have an adverse effect and if we ever did decide to cache counts, this would be something that would just continue to work without additional changes.

The reported issue happens sporadically. It's probably hard to test in production, though we'll find out if the frequency of the issue goes down. It's probably harder to test on staging where we have less activity and data.
Comment 15 Rebecca Billings [:rbillings] 2012-03-01 16:30:30 PST
I'm closing this as [qa-] due to the difficulty of testing on stage. This will re-open if the problem persists on production after the push.
Comment 16 John Hesling [:John99] (NeedInfo me) 2012-04-21 12:39:14 PDT
The problem possibly continues. Apparently it was noticed on a recent Sumo Day; see for instance /forums/contributors/708120#post-46131
Comment 17 Will Kahn-Greene [:willkg] 2012-04-22 06:48:58 PDT
Reopening.

Given that this has been worked on several times before over the last year (possibly year and a half), I think it warrants serious effort the next time we look into it since it's hard to reproduce and we've probably exhausted the set of easy causes. So it should probably be a 3 pter.
Comment 18 Tyler Downer [:Tyler] 2012-04-25 09:16:24 PDT
I've been seeing this a lot recently on the forums, especially over the past two weeks or so.
Comment 19 Ricky Rosario [:rrosario, :r1cky] 2012-04-26 11:36:42 PDT
Landed in prod. I have a good feeling this will fix it:

https://github.com/mozilla/kitsune/commit/4fbc42f0ec3fbfc8ecc3ce4cc7ed46e3e59baad0
Comment 20 Ricky Rosario [:rrosario, :r1cky] 2012-05-07 18:28:51 PDT
*** Bug 736586 has been marked as a duplicate of this bug. ***