Closed Bug 291413 Opened 20 years ago Closed 16 years ago

Top rated should take into account the number of ratings

Categories

(addons.mozilla.org Graveyard :: Public Pages, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cameron, Assigned: rdoherty)

References


Details

Attachments

(5 files, 1 obsolete file)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.7.7) Gecko/20050414 Firefox/1.0.3
Build Identifier:

The top rated page should take into account the number of ratings received for the extension/theme. Items which have 30 five-star ratings should be ranked above those with only one five-star rating.

Reproducible: Always

Steps to Reproduce:

Actual Results:
Currently, they are simply ordered by the number of stars and then alphabetically.

Expected Results:
Should be ordered by number of stars, then number of votes, then alphabetically.
One way to do this would be to include only entries that have a certain minimum number of ratings or, maybe even better, only the top 10% most-rated entries. In any case, a list that, as is the case now, shows all the entries with only one 5-star rating first is simply useless.
Status: UNCONFIRMED → NEW
Ever confirmed: true
The "Downloads in last 7 days" is a strong criterion of popularity also. I think that ordering should include it.
There are a couple of other things that came up in this thread in the MZ extension forum: http://forums.mozillazine.org/viewtopic.php?t=255987 For example, ratings can and should change between different versions of the same extension. Maybe it would be good to use the MZ discussion thread to further work on ideas for how to make this feature better. But: is there a way for non-Mozilla Foundation employees to change anything or create a patch for this at all? Is the code of the UMO CGIs available anywhere?
*** Bug 296599 has been marked as a duplicate of this bug. ***
Looking at http://lxr.mozilla.org/mozilla/source/webtools/update/extensions/showlist.php - it seems there is no field for the number of ratings; it is always calculated dynamically. I think this is a case where you have to give up normalization. This field should be there, to be updated whenever a rating is added or removed. Then you could: 1. Remove anything with fewer than 10 ratings from the "Top Rated" list - listing these is simply pointless 2. Show the number of ratings in *all* listings - otherwise you don't know whether the average rating is actually meaningful 3. Add a new list for extensions with the most comments (well, this is optional :)
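A minimal sketch of that denormalization (the totalreviews column, the addon/review column names, and the threshold of 10 are illustrative, not the actual AMO schema):

    -- store the review count on the add-on row instead of recomputing it per page view
    ALTER TABLE addons ADD COLUMN totalreviews INT NOT NULL DEFAULT 0;

    -- refresh it whenever a rating is added or removed
    UPDATE addons a
       SET a.totalreviews = (SELECT COUNT(*) FROM reviews r WHERE r.addon_id = a.id);

    -- "Top Rated" then skips add-ons with too few ratings and can show the count
    SELECT id, averagerating, totalreviews
      FROM addons
     WHERE totalreviews >= 10
     ORDER BY averagerating DESC, totalreviews DESC;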
AMO bugspam. Correcting QA contacts on OLD bugs (mozilla.update@update.bugs) -> Correct QA contact (listings@update.bugs) Filtermeplzkthx
QA Contact: mozilla.update → listings
*** Bug 325778 has been marked as a duplicate of this bug. ***
Severity: enhancement → normal
Component: Listings → Maintenance Scripts
QA Contact: listings → maintenance
Target Milestone: 1.0 → ---
Version: unspecified → 2.0
Severity: normal → minor
Target Milestone: --- → 3.x (triaged)
Component: Maintenance Scripts → Public Pages
QA Contact: maintenance → web-ui
Anyone ever plan to do something about this? This bug has been untouched for years. Sorting by rating shows nothing useful and just looks stupid. :|
Well, any suggestion on how to press this into a formula? A very, very easy way to handle this (frequently employed by other apps that rate products or movies or whatever) is a rating threshold. We could just assign a rating of 0 to all add-ons that have fewer than 5 ratings, for example? Of course, that would still put an add-on with 5 x 5 stars before one with 9 x 5 stars and 1 x 1 star. A way to get around this, often used in statistical contexts, is a "truncated mean" (feel free to wikipedia it) that gives a measure more resistant to outliers. We could cut off the upper and lower 10% and require 10 ratings before counting. That, however, may be too restrictive to deal with sparse data: for a lot of smaller add-ons, 10 ratings may be hard to receive (add-ons have a "long tail" in that respect). The question is, do people want the "all time favorites" when they click on top rated, or do they want to discover relatively new, but also good, add-ons?
I think using some sort of smooth function is the best way to go here so we don't have to have logic deciding where to truncate and we sort simply for everything. Otherwise yes, sorting the bulk of add-ons with a small number of ratings wouldn't be that doable. Any formula also has to rank add-ons with large numbers of ratings above those with smaller numbers. The quick and lazy way to do things would be to just sum up the total ratings and sort by that. (i.e. sort by avg*count) However, while better than the current system this would still have a few problems, as those with high counts could have loads of crappy ratings. The average rating would need more weight. Maybe something looking vaguely like this: sort by: exp(avgRating)*count Making 'avgRating' a truncated average, as suggested, would cut down on some of the volatility of the ranking for newer extensions, if desired. Of course this is just a wild guess and could give awful results. Not sure off the top of my head. ;) This all should spit out a list of "all time favorites" within the selected group, which is what I think most people would expect this to do. Newer popular add-ons already show up (to a degree) when sorting by weekly downloads.
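A rough sketch of that idea as a sort expression (EXP() is MySQL's natural exponential; the column names follow later comments in this bug, and the weighting is, as stated above, a guess):

    SELECT id, averagerating, totalreviews,
           EXP(averagerating) * totalreviews AS rank_score
      FROM addons
     ORDER BY rank_score DESC;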
(In reply to comment #10) > Well, any suggestion on how to press this into a formula? A very very easy way > to handle this (that is frequently employed by other apps that use rating of > product or movies or whatever) is a rating threshold. We could just assign a > rating of 0 to all add-ons that have fewer than 5 ratings, for example? > > Of course, that would still put an add-on with 5 x 5 stars before one with 9 x > 5 stars and 1 x 1 star. How about that: ORDER BY IF(totalreviews >= 5, averagerating, 0) DESC, totalreviews DESC
(In reply to comment #12) > How about that: > > ORDER BY IF(totalreviews >= 5, averagerating, 0) DESC, totalreviews DESC Oh yeah, the first part is exactly what I was talking about earlier, and I also like that an add-on that for example gets 4.8 points out of 200 reviews is rated higher than one that gets it out of 5.
Ideally, 3.7 points out of 200 reviews would even be sorted before 3.9 points out of 10 reviews but after 4.8 out of 5 reviews - simply because the reviews have to be weighted less if the number of reviews is low, the average is too unreliable for that. A simple approach would be: ORDER BY averagerating + totalreviews / MAX(totalreviews) DESC That would "increase" the rating of add-ons with many reviews by up to 1. Problem is choosing the constant correctly (does it have to be 1? maybe 0.5 or 1.5 is better), also it is questionable whether you want a linear increase here (maybe SQRT(totalreviews / MAX(totalreviews)) instead which would give less advantage to add-ons with very many reviews). Another way to deal with this is considering standard deviation. In general, you wouldn't want to show add-ons with a zero standard deviation unless they really have a high number of reviews. So you would do something like: ORDER BY IF(ratingdeviation > 0 OR totalreviews >= 10, averagerating, 0) DESC, totalreviews DESC You don't really need to calculate standard deviation here, a boolean flag "had rating different from the average" would be sufficient. But you might still want to do it, this information is useful. E.g. you might want to slightly prefer add-ons with a low standard deviation because the average ratings for those are more reliable. You would do something like: ORDER BY IF(ratingdeviation > 0 OR totalreviews >= 10, averagerating - ratingdeviation / 5, 0) DESC, totalreviews DESC Note that calculating the standard deviation is *not* a problem, it can be calculated incrementally without going through all the individual ratings every time.
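For the incremental calculation mentioned at the end, keeping a running count, sum and sum of squares per add-on is enough; a sketch with hypothetical columns (ratingcount, ratingsum, ratingsumsq), where NEW_RATING and ADDON_ID stand for the values being processed:

    -- update the running sums when a rating comes in
    UPDATE addons
       SET ratingcount = ratingcount + 1,
           ratingsum   = ratingsum   + NEW_RATING,
           ratingsumsq = ratingsumsq + NEW_RATING * NEW_RATING
     WHERE id = ADDON_ID;

    -- standard deviation recovered from the running sums (variance = E[X^2] - E[X]^2)
    SELECT SQRT(ratingsumsq / ratingcount - POW(ratingsum / ratingcount, 2)) AS ratingdeviation
      FROM addons
     WHERE id = ADDON_ID;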
Blocks: 443026
I just filed bug 443026 to update the review counts and ratings incrementally when changes occur, and to include this change in it.
Version: 2.0 → 3.2
(In reply to comment #14) > Ideally, 3.7 points out of 200 reviews would even be sorted before 3.9 points > out of 10 reviews but after 4.8 out of 5 reviews - simply because the reviews > have to be weighted less if the number of reviews is low, the average is too > unreliable for that. A simple approach would be: > > ORDER BY averagerating + totalreviews / MAX(totalreviews) DESC > > That would "increase" the rating of add-ons with many reviews by up to 1. > Problem is choosing the constant correctly (does it have to be 1? maybe 0.5 or > 1.5 is better), also it is questionable whether you want a linear increase here > (maybe SQRT(totalreviews / MAX(totalreviews)) instead which would give less > advantage to add-ons with very many reviews). I like the idea of factoring it in like that. I do agree increasing linearly is a bad idea. However, I also dislike just using MAX as it is quite vulnerable to an outlier. In other words: This measure wouldn't help much anymore for distinguishing a 1-review add-on from one that has 10, once any other add-on has accumulated 1000 reviews. > You don't really need to calculate standard deviation here, a boolean flag "had > rating different from the average" would be sufficient. But you might still > want to do it, this information is useful. E.g. you might want to slightly > prefer add-ons with a low standard deviation because the average ratings for > those are more reliable. You would do something like: > > ORDER BY IF(ratingdeviation > 0 OR totalreviews >= 10, averagerating - > ratingdeviation / 5, 0) DESC, totalreviews DESC This would penalize high standard deviations with up to .5 points, correct? Only problem I see here is that an add-on with a deviation of 0 would still be rated 0, even if it consistently received, say, 4 stars, from 20 people. Not likely, but unexpected behavior. > Note that calculating the standard deviation is *not* a problem, it can be > calculated incrementally without going through all the individual ratings every > time. That's correct, it is self-maintainable, which makes it great for incremental updates.
(In reply to comment #16) > > ORDER BY IF(ratingdeviation > 0 OR totalreviews >= 10, averagerating - > > ratingdeviation / 5, 0) DESC, totalreviews DESC > > This would penalize high standard deviations with up to .5 points, correct? Yes, if 0 is a valid rating value then 2.5 should be the maximum possible standard deviation. > Only problem I see here is that an add-on with a deviation of 0 would still be > rated 0, even if it consistently received, say, 4 stars, from 20 people. Not > likely, but unexpected behavior. Yes, not likely. If all opinions on the add-on are identical despite there being many of them, something is seriously wrong. The only way to get the same rating from 20 people is to manipulate your rating, IMO.
Target Milestone: 3.x (triaged) → 3.4.6
Assignee: nobody → basil
Blocks: 378657
Target Milestone: 3.4.6 → 3.4.7
Agree it's *highly* unlikely. Perhaps not quite impossible. Lazarus Form Recovery is currently on an unmanipulated 4.9 - 5 stars from 12 of 13 reviews, and the guy who gave us 4 stars promised to give us 5 if we fixed a small bug (which we fixed a month ago...) https://addons.mozilla.org/en-US/firefox/reviews/display/6984
Basil, any progress on this? I need it for bug 446122 which is supposed to be in tomorrow's freeze. Should I push that out?
Target Milestone: 3.4.7 → 3.5.1
It's taken forever to come up with, I'm sorry. Here's my rationale and what I propose - please tell me if it's too crazy or hard to implement.

Based on the feedback I've seen so far, popularity should be multi-factor based rather than just last week's download count. I propose that it include the following:
1) Downloads (DL) for the last week - how many downloaded this past week - a measure of recent interest
2) Active daily users (AU) - how many continue to use it - a measure of long-term interest (modulo users that have it disabled or are not actively using it), but it's the best proxy that we currently have
3) Rating Count (RC) - how many have rated it
4) Average Rating (AR) - what the rating was (1-5)
5) Rating Engagement (RE) - do users care enough to rate it?

One calculates a popularity index which factors the above into a score that you can sort by when generating a listing by popularity. The frequency of calculating this and keeping it updated is implementation TBD.

Popularity = k*DL + m*AU + n*RC + x*AR + y*RE

There are several problems with this.

1) The data is not normalized across the population, so you can't combine the values in this way. Solution: normalize based on totals in the AMO population (note that for AR we need to divide by 5 since the range of values is 1-5). This yields:

Popularity = k*DL/(AMO DL Total) + m*AU/(AMO AU Total) + n*RC + x*AR/5 + y*RE

OK, now that we are normalized, what values should we pick for the weights?

2) I suggest that we consider a weighting where 40% is based on recent interest, 20% on long-term interest, 10% on rating count, 20% on rating and 10% on rating engagement. (These are just gut numbers, I'm willing to be swayed. We also need to see what it does in real-life situations.) This yields:

Popularity = 0.4*DL/(AMO DL Total) + 0.2*AU/(AMO AU Total) + 0.1*RC + 0.2*AR/5 + 0.1*RE

3) This formula still suffers from the overweighting of review counts described in comment 5 and other comments. So, we simply penalize add-ons with fewer than 10 ratings by not including those terms. This yields:

If RC >= 10
Popularity = 0.4*DL/(AMO DL Total) + 0.2*AU/(AMO AU Total) + 0.1*RC + 0.2*AR/5 + 0.1*RE
Else
Popularity = 0.4*DL/(AMO DL Total) + 0.2*AU/(AMO AU Total) + 0.4*AR
(rationale being that with so few

Can everyone critique this algorithm please, as I'm in the process of trying this with the current featured list and playing with the weights to see the results. Any ideas on how to best determine what the right weight values should be? I can't currently share the spreadsheet since it contains active user info for add-ons whose authors have opted to not have it be public. Thanks.
I am missing pieces in your explanation. 1) What exactly is RE? 2) How do you normalize RC?
That's a bit complex. Not sure if it's too complex or not, but I agree with the idea that all these sorts of things should be taken into account. Few issues: 1) As stated by Wladimir, RE is not defined; how is this to be calculated? A normalization of RC could automatically integrate this concept. 2) Things would be simpler if the conditional were dropped to only have one expression. I don't think it's needed using this method. With RC<10 the RC component is quite small and can just be ignored. I'm not quite sure I agree with the formulation of AR and RE components into this, as well. 3) Your parenthetical comment after the conditional seems to have been cut off. ;)
Sorry about that, I should have reviewed it a bit more carefully (I was cutting and pasting from the spreadsheet). RE is rating engagement, defined as RC/AU. I normalize Review Count (RC) by dividing RC by the AMO RC total. So, to correct comment 20, it would be:

If RC >= 10
Popularity = 0.4*DL/(AMO DL Total) + 0.2*AU/(AMO AU Total) + 0.1*RC/(AMO RC Total) + 0.2*AR/5 + 0.1*RE
Else
Popularity = 0.4*DL/(AMO DL Total) + 0.2*AU/(AMO AU Total) + 0.4*AR/5

My parenthetical comment was meant to say that, instead of trying to figure out the STDEV and all the other calculations, the average is fine, and that for small values of RC what matters most is AR, so we drop the other components and weight AR more heavily.
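Spelled out as a query, the corrected proposal could look roughly like this (all column names and the precomputed @-totals are illustrative, not the real schema; NULLIF just avoids dividing by zero for add-ons with no active users):

    -- @dl_total, @au_total, @rc_total = AMO-wide sums computed beforehand
    SELECT id,
           CASE WHEN ratingcount >= 10 THEN
                  0.4 * weeklydownloads / @dl_total
                + 0.2 * activeusers     / @au_total
                + 0.1 * ratingcount     / @rc_total
                + 0.2 * averagerating   / 5
                + 0.1 * ratingcount     / NULLIF(activeusers, 0)   -- RE = RC/AU
                ELSE
                  0.4 * weeklydownloads / @dl_total
                + 0.2 * activeusers     / @au_total
                + 0.4 * averagerating   / 5
           END AS popularity
      FROM addons
     ORDER BY popularity DESC;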
The obvious issue is that we are likely adding apples and oranges here, despite the normalization. My suspicion is that MAX(DL/(AMO DL Total)), MAX(AU/(AMO AU Total)), MAX(RC/(AMO RC Total)), MAX(RC/AU) are all very different values - so that the weightings in the formula don't have the desired effect.
PS: Seeing histograms for each of the equation parts would certainly help understanding what a combination of those will produce.
It is certainly interesting and useful to come up with the uber-formula that will end all discussion, but in the meantime, would it be possible to just fix the total uselessness of the current situation by e.g. requiring a minimum number of reviews before including the addon in the list? The minimum number could, for the time being, be a constant determined once based on the current distribution of ratings per addon: use the highest number that still gives a reasonably long list in the current database. Then we have some slack time to solve this rather complex issue so that it 1) works and 2) satisfies most commenters here. My comment on using too many factors is that this usually leads to totally unexplainable and weird effects. No ranking will ever be perfect, but if it is based on a simpler formula the ranking will at least be understandable. For example, it is easy to see why the current ranking fails badly.
Continuing the progress from comment 24, I built a few histograms, added some min, max, avg and stddev figures (based on the entire AMO population), and uploaded them to http://people.mozilla.com/~basil/amo/HistogramsForPopularityAlgo.pdf. Any advice about how to normalize/combine these different distributions?
Let's go back to the topic of this bug: top rated should take into account the number of ratings. The underlying problem here is that addons with only a very few ratings get a high average rating (as the first couple of reviewers, who could even be the authors themselves, rate it as 5). Removing these rates from the equation is not fair, but we really want to smooth the noise on the average rates when the number of rates is low. A very simple way is to add the overall rate average into the equation to 'smooth' the averages of addons with a low number of rates: RATE_SCORE = (SUM(rates of addon) + OVERALL_AVERAGE) / (1 + NUMBER_ADDON_RATES). So sum the rates of that addon, add the overall average rate, and divide by the number of rates of that addon plus one. See the attached diagram for how this would work. Note, rates with score '0' should not be used, as these are reviews where the reviewer forgot to set a rate. This formula only uses the rates, is very simple to implement, and is predictable enough for everyone (although one might need to explain that the average rate is corrected towards the overall rating average).
This example shows how addons with one, two or three rates get a slightly lower corrected average rate. An addon with 3 fives gets a higher corrected average rate than one with only one rate. Note, the data used for this is random, except for the first two rates, which are set to 5 to simulate the rating behaviour on AMO.
To simulate different numbers, just press F9 when you have opened the spreadsheet. Excel (or OO's version) will then regenerate the numbers.
First of all, I would normalize by MAX(x) rather than SUM(x) - that way the range for each of the equation parts is really 0..1 and we can play around with weightings. The histogram for the average rating shows clearly that the average rating is meaningless by itself - there are ~750 extensions with average rating 5, most of those probably have only one rating. I'll have to think some more about how we can combine the values in a meaningful way.
Alfred, thank you for your suggestion, but I don't understand the rationale for your formula. Ratings are not generally a random distribution. Let S = sum of the ratings, n = number of ratings, avg rating = S/n. Your formula calculates (S+avg)/(n+1) = (S+S/n)/(n+1) = 1/(n+1) * (Sn+S)/n = 1/(n+1) * S(n+1)/n = S/n, which is the same as the average. Just to confirm, I exported the entire reviews database and there is no difference between the formula score and the average. What am I missing here? (Interestingly, the spreadsheet is in Dutch and I changed the function to RANDBETWEEN() to make it work in English Excel.)
The bit you are missing is that the first S is the sum of ratings of only that particular addon, n is the number of rates for that addon, and avg is the average of rates across all addons. So, if an addon (e.g. Starry Skin) has the following rates: 3 5 4 4 4 5, its average is 4.17. And another 'top' theme has two rates: 5 5, so its average is 5. The formula becomes, respectively (assuming the current overall average is 3.49): (SUM(3 5 4 4 4 5) + 3.49) / (6 + 1) = 4.07 and (SUM(5 5) + 3.49) / (2 + 1) = 4.50. This shows clearly how both averages are corrected by adding the overall average rate. In terms of the formula: (Saddon + avgTotal) / (nAddon + 1) becomes (Saddon + (STotal / nTotal)) / (nAddon + 1)
Alfred, so you are just adding one more "rating" to the list of rating values and calculating the average as usual. The more ratings the extension gets, the closer it will get to the "real" average. This is a perfect way to get lots of bug reports like "AMO's average rating is always a little off". But with the overall rating average being a pretty random value (2.39 right now), I wonder why we would use it here. Why not use a fixed value, e.g. 2.5 or 3? That way we can choose exactly how much we want to penalize extensions with too few ratings.
Using a fixed value (roughly based on the average) such as 3 is fine by me, and would indeed be easier to explain to the AMO visitors.
Yes, |rating_average = (SUM(rating) + 3) / (COUNT(rating) + 1)| sounds like a simple and good enough solution for the original problem to me. And we can continue looking into measuring popularity in the best possible way.
I would propose to use this bug for the simple solution, and spin off a separate bug for a 'popularity' calculation.
Could you clarify what you are trying to achieve here, and why and how this formula is supposed to achieve it? I do not think that this formula:

rating_average = (SUM(rating) + 3) / (COUNT(rating) + 1)

is very helpful for improving the ranking considerably.

Let n = count(rating) and s = sum(rating). Then the old rating is the average rating, which is a = s/n. The new rating you propose is (s+3)/(n+1), right? So the new rating is s/(n+1) + 3/(n+1), or expressed in terms of the old average a: new = a * n/(n+1) + 3/(n+1). So you essentially give the old average a penalty that depends on n by the factor n/(n+1) and then add 3/(n+1). However, since the impact is multiplicative in a, it is different for low rankings and high rankings, which is not very intuitive.

To illustrate: an addon with a single ranking of 1 would get score 2 (double the original), while an addon with 100 rankings of 1 would get score ~1.02. An addon with a single ranking of 5 would get score 4 (1 less than the original), while an addon with 100 rankings of 5 would be slightly below 5. The new score for addons with old score 3 will never change, no matter how many rankings they get. So no addon with old score 3 will ever be ranked higher than an addon that has just a few rankings averaging slightly above 3. A single 5 ranking would be just as good as two rankings with old average 4.5. However, even an addon with 100 rankings with old average 4.5 would not achieve the score 4.5 that just 3 rankings of 5 achieve (which yields new score 4.5). No number of old average 1 rankings will ever be as good as

You can directly derive what-if questions from your formula to see what number of ratings is necessary for the ranking to change, depending on what the old average was.

I still believe that the quick best solution (and maybe even the best solution in the long run) would be to simply not include addons in the ranking that do not have a minimum number of ratings. What do you consider bad about that approach?
Quite simple: that will mean cutting out those addons with just a few rates completely, which is not fair at all. (And if you make it public that it requires 5 ratings before you get into the list, people will ask 5 friends to vote for them, making the rating even more biased.) The overall idea is just to soften the averages of addons with a few rates a little bit by pulling them towards the overall average, so that an addon with just one 5 doesn't get immediately to the top of the list. So, indeed, give those addons with just a few high rates a penalty. The same goes for addons with just one rate of 1. Without correction those addons will then move to the bottom of the whole list, below addons which have ten 1's and one 2. In my view ten 1's and one 2 is clearly less appreciated than the addon with just one rate. So, with my formula the corrected rate of the first will be 2 and of the second 1.25. In your example, the addon with one 5 will get a score of 4, but an addon with a *hundred* fives will get a score of 4.99. Same here: in my view the addon with a *hundred* rates is clearly much higher rated than the first. The discussion on this bug is already very long, and there is no perfect formula here. The overall point is to make the 'Top Rated' list more meaningful by trying to prevent the first 10 entries from all being addons with just one 5, hiding wonderful addons with 100's of rates.
(In reply to comment #38) > So you essentially give the old average a penalty that depends on n by the > factor n/(n+1) and then add 3/(n+1). That's a very complicated way of looking at it. We are simply giving each extension an additional rating value (that's 3 in the formula above but could be something different as well). So each average will be "skewed" towards 3 - and the more ratings an extension gets, the less "skew" will be there because a single rating won't really matter. That means that for extensions with no ratings we just assume that they are average, not too good and not too bad. Once the users start rating the extension, the average will very soon reflect the real opinion users have about the extension. But as long as the ratings are few we are "softening" the very good and very bad ratings.
There are two issues here: average rating, and ranking order. One need not determine the other. The rating average shown to developers can still be an exact average, thus defusing Wladimir's concern in #34, and the star rating shown to users can still be what you would get if you tallied up all the ratings and decided anything above 4.0 was 5 stars, but the ranking order could be determined differently, by one of the methods suggested above. From the public's perspective all will seem reasonably fine just so long as no 4-star extensions end up ranked above any 5-star extensions. There might be a few nitpickers who notice that some addons with higher averages end up below those with more ratings but lower average ratings, but as long as there is a "how the order is determined" info box / link somewhere, I think that would defuse that problem. So we might want to think about ranking within 5 star, showing those first, then ranking within 4 star, showing those next, etc. Just another clause in the ORDER BY, pretty easy to do.
To clarify what I mean by my above comment, where we separate out the average rating from the actual ranking, let me give an example:

Addon A has an average rating of 4.2 from 5 reviews
Addon B has an average rating of 4.1 from 50 reviews
Addon C has an average rating of 3.9 from 500 reviews

Say that the rating formula works as it does now, and A and B are 5 star, C is 4 star, and that, as is the case now, the developer can see their average rating on their control panel. Addon A's developer would probably notice if their "average rating" was significantly different from 4.2. They probably *wouldn't* notice if they appeared in the rankings below Addon B. They probably *would* notice if they appeared in the rankings below Addon C. Essentially we want an algorithm that is (a) fair and (b) does not generate complaints from confused users and developers. I think this would be best achieved by showing the actual average rating to the developer, making sure that the average rating matches the number of stars shown for the addon, but ranking the addons *within each star grouping* based on a more complicated formula that takes into account the number of ratings.
It is probably worth considering bug #435263 (what to do about multiple reviews by same author) in combination with this one. My latest suggestion on that bug is to *keep* all the old reviews, but only consider the latest review by any given user when doing the average rating calculation.
While it is nice to search and debate for the perfect formula here with all aspects calculated in, we need to note that there is a problem that needs to be solved quickly. Addons that have existed for a long time and are appreciated by lots of people get pushed away by new addons with only one rating (generally from the author or his friends). My proposition is to factor some weight into the average so that the averages of addons with just a few ratings are smoothed a bit. This is a very simple fix and could be applied very quickly, and after that we could debate some more about the perfect solution...
Blocks: 455579
OK, I want to ensure we continue movement on this bug, as I would like to see it fixed in the upcoming 4.0.2 release. So we'll use Alfred's formula, and we'll create another bug to help adjust the formula again with other variables (including download counts and active users, which I consider critical).

I performed the following query on our production database:

select sum(rating)/count(id) from reviews;

This gives us the average rating across all add-ons; the value as of 9-16-2008, 12:30pm PST is 3.9223. So that's the value that we'll smooth towards. The adjusted rating will be calculated by:

(SUM(ratings) + 3.9223) / (COUNT(ratings) + 1)

As described above, this has the smoothing behavior that this bug was looking for. To verify, I've plotted a chart (http://people.mozilla.com/~basil/amo/AvgRatingDiffChart.pdf, warning: very large file) covering add-ons with fewer than 80 reviews. The x-axis is the number of ratings an add-on has; the y-axis is the difference between the plain average rating and the above formula. Notice that for a small number of ratings the variation is higher, but it gets closer to the average rating as the number of ratings grows.

Hopefully this closes the first chapter for modifying the popularity rating. I've created bug 455579 to track progress against the ideal formula for popularity. The only open question in my mind is how often we should be recalculating the smoothing value. I'll leave that up to the implementor to decide.
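As a sketch (not the committed patch), the per-add-on adjusted value from that formula can be produced in one pass, with the smoothing constant re-derived from the reviews table on each run instead of hard-coding 3.9223:

    -- site-wide mean used for smoothing
    SELECT @overall := SUM(rating) / COUNT(id) FROM reviews;

    -- adjusted (smoothed) rating per add-on
    SELECT addon_id,
           (SUM(rating) + @overall) / (COUNT(rating) + 1) AS adjusted_rating
      FROM reviews
     GROUP BY addon_id
     ORDER BY adjusted_rating DESC;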
Assignee: basil → nobody
No longer blocks: 446122
Priority: -- → P1
Reassigning to Smita. Smita, looks like you'll have to find and update the code where the average rating is calculated to use this new formula.
Assignee: nobody → smita.periyapatna
Note that ideally this should only affect sorting, not the displayed average rating (which should stay the same as it is now).
Attached patch Patch for Bug 291413 (obsolete) — Splinter Review
Comment on attachment 342314 [details] [diff] [review] Patch for Bug 291413 This does what it's supposed to do, but I'm not able to apply this patch. It looks like it fails because the file hasn't been updated since the patch for bug 435263 was committed. I think the SQL query will need to be updated to take multiple ratings into account and only count the latest one when computing the values.
Attachment #342314 - Flags: review-
Checked out the latest version and updated the file after the reviewer's comments.
Attachment #342314 - Attachment is obsolete: true
Comment on attachment 342660 [details] [diff] [review] Patch for bug 291413 after review comments This patch looks like it would work, but I can't get it to apply. I get a 'reversed or previously applied patch detected' error. Did you svn up before creating the patch? I would recommend backing up your work, reverting maintenance.php, svn up'ing and adding your changes back in before running svn diff.
Attachment #342660 - Flags: review-
Attached patch Patch for 291413 - Splinter Review
I did check out the latest version from svn the last time as well, but I'm not sure why it failed. After getting the latest version from svn again, I have made the changes.
Attachment #342893 - Flags: review?(rdoherty)
My patch for bug 435263 had some changes in maintenance.php for calculating average ratings - might that have caused an issue here?
(In reply to comment #53) > My patch for bug 435263 had some changes in maintenance.php for calculating > average ratings - might that have caused an issue here? Works now, not sure why the patch was broken previously. Seemed like some of the earlier changes made in bug 435263 were in the patch.
Attachment #342893 - Flags: review?(rdoherty) → review+
Comment on attachment 342893 [details] [diff] [review] Patch for 291413 Looks good and works well. Good work!
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
I'd still like to test this before we tag and push tonight (!); I have the formula from comment 45, but still don't have a good testcase. I have access to my test add-ons (some with 0 ratings), so if there's a way I could test this, please let me know, thanks!
Stephen, let's work on this now.
Verified FIXED; Ryan and I worked on his khan instance with my test add-on (7431, same add-on #, since it was a prod dump), and gave it a rating of 1, which, when run through the algorithm, came out to be 2.5, and was subsequently rounded up to 3 stars. Then, I modified the rating of 1 to be 5, and the rating changed internally to be 4.5, and was rounded up to 5 stars. (After each rating change, he re-ran maintenance.php to recalculate the ratings.) Big props to Ryan for helping me verify this...
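For reference, those values follow directly from the comment 45 formula with the site-wide average of roughly 3.92: a single rating of 1 gives (1 + 3.92) / (1 + 1) ≈ 2.5, which rounds up to 3 stars, and a single rating of 5 gives (5 + 3.92) / (1 + 1) ≈ 4.5, which rounds up to 5 stars.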
Status: RESOLVED → VERIFIED
Has this patch been put on preview or production? I still see add-ons with a single 5 star review at the top for both lists. https://addons.mozilla.org/en-US/firefox/browse/type:1/cat:all?show=20&sort=rated https://preview.addons.mozilla.org/en-US/firefox/browse/type:1/cat:all?show=20&sort=rated
It's on preview, not quite sure about prod. One 5-star review averaged with the mean (~4) will give you 4.5, which rounds to 5. The algorithm is running correctly; we tested it with a prod db dump in a test environment. So the new algorithm won't change the sort, it just slightly changes the review average of add-ons with few reviews.
So you are sorting by rounded rating averages? This sort of defeats the whole purpose of this patch, don't you think?
(In reply to comment #62) > So you are sorting by rounded rating averages? Yeah I don't think it's actually rounding.
Problem is, what I see on preview.addons.mozilla.org doesn't make sense if this bug was fixed. It is still mostly add-ons with 1 review and 0 weekly downloads that appear at the top.
On the preview page, the list of all firefox addons (the default, just sorted by rating) shows on the first page 4 with two ratings and the rest (16) all with only one rating: https://preview.addons.mozilla.org/en-US/firefox/search?q=&cat=all&as=true&vfuz=true&appid=1&lver=1.0&hver=1.0&atype=0&pid=0&lup=&pp=20&sort=averagerating
This still doesn't work, see comments above - reopening.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
I agree that the algorithm supplied by Basil doesn't fix the core issue/feature request. I think Seth's ideas in comment 41 and comment 42 are a good idea. Sorting within individual average ratings (1,2,3,4,5) seems fine by me.
I think that the algorithm that has been added works by itself. However, the problem is with the step that follows the avg_rate calculation in maintenance.php:

while ($row = mysql_fetch_array($rating_result)) {
    $_avg = round($row['avg'],1);
    if ($db->query("UPDATE addons SET averagerating = '{$_avg}' WHERE id = {$row['addon_id']}", true)) {

Here the average rate that has been calculated is rounded (so to 0, 1, 2, 3, 4, or 5), and then there is no further sorting within each level. So addons with 2 rates of 5 can still end up in level 5, and even above addons with 500 rates of 5! So, within the rounded rates, sorting is also needed.

I checked the first 8 themes in the 'themes sorted by rating' list: https://addons.mozilla.org/en-US/firefox/browse/type:2/cat:all?show=20&sort=rated The result is (assuming an overall average of 3.6):

Theme                      ID    Sum  Count  Corrected avg
OrbitGreyCustom 1.0.5      717     5      1  4.3
NoGoTheme 1.0.1           3860     5      1  4.3
CrystalFox RoundStyle 1.1 4519     6      2  3.2
Aquamarine 1.1.1          4198    45      9  4.86
Foxkeh Theme 3.0.0        5198   909    188  4.828571429
Kempelton Large 2.0.3     5448    51     12  4.2
Scribblies Brite 2.9.1     303   102     21  4.8
glowyblue 3.1.2.4         8889    92     19  4.78

So, themes with corrected average rates of 4.3 and even 3.2 get sorted above themes with corrected average rates of 4.86, 4.82 and 4.78! Note that all addons with a corrected average above 4.7 are those with at least 45 reviews...

The easiest way to fix this would be to remove the round() from maintenance.php, so that 'averagerating' in the addons table is no longer rounded but contains the 'real' average rating. Only where this number is translated into a number of stars is rounding needed. That way, sorting on rating works on the 'real' average rating, so that addons with an average rating of 4.78 do end up above those with 4.3...
Target Milestone: 4.0.2 → 5.0.1
Ryan/Smita - is this close to being done? What is the ETA for this bug?
(In reply to comment #69) > Ryan/Smita - is this close to being done? What is the ETA for this bug? Smita is on vacation right now and working on getting a dev env setup, but I think the bigger problem is the average rating algorithm/sorting issue. The 'new' algorithm in comment 68 really won't fix the issue; add-ons with fewer reviews could still show up higher in listings. I think we should just sort by rating AND number of reviews. For example, add-ons with an average rating of 4 stars but 1, 5 and 10 ratings each would sort like this:

Addon 1 - 4 stars - 10 reviews
Addon 2 - 4 stars - 5 reviews
Addon 3 - 4 stars - 1 review
Addon 4 - 3 stars - 4 reviews, etc.

The fix for this shouldn't be too much work. I don't see a deadline for 5.0.1 on the wiki though. My initial estimate is by the end of next week.
(In reply to comment #70) > I think we should just sort by rating AND number of reviews. For example, 3 > addons with an average rating of 4 stars, but each has 1, 5 and 10 ratings > would sort like this: > > Addon 1 - 4 stars - 10 reviews > Addon 2 - 4 stars - 5 reviews > Addon 3 - 4 stars - 1 review > Addon 4 - 3 stars - 4 reviews, etc > > The fix for this shouldn't be too much work. I don't see a deadline for 5.0.1 > on the wiki though. My initial estimate is by the end of next week. This may not fully do everything intended in some of these discussions, namely it still leaves open the possibility that a 5-star add-on with 1 review will show above a 4-star one with 100 reviews. However, the current sorting as implemented is useless, so just switching to something simple like this sounds very good. The less-than-perfect handling farther down the list is something most users probably wouldn't even care about, as I would think not too many people are going to browse through the entire thing in order.
Maybe you want to re-read my original comments (#1, #3, #26). Users who want to browse the list by rating are interested essentially only in the top-rated addons, not those that get bad ratings. More importantly, they are only interested in addons that have reliable ratings that somewhat reflect overall opinion, i.e. that have a certain minimum number of ratings.

No matter how you do this, as long as you do not filter out addons that have too few ratings, these addons will show up in the list at ranks where they make it extremely hard to browse the interesting ones. Even if you sort all the 4-star addons with only one review to the bottom of all 4-star addons, you will end up with an endless list of addons with just one rating before you get to the 3-star addons with a reliable rating. It is hard to imagine why anyone would want that or how that would be user friendly.

I therefore repeat my suggestion: solve this by simply including only addons that have a minimum number of ratings and then sorting by the exact, uncorrected average rating. The minimum number could be determined statically from a few experiments, or dynamically so that there is a certain minimum number of addons per rating.

I also think that explaining the list to the user is much easier with this approach: "addons with a minimum of N ratings, sorted by rating" is much easier to understand than any of the weighting schemes or other methods that try to include all of the addons.
Nick, thoughts? I think a minimum number of ratings or a Bayesian sorting by rank (http://www.stat.columbia.edu/~cook/movabletype/archives/2007/03/bayesian-sortin.html) would work. I do predict developer outcry though. The main issue would be defining the minimum number of reviews. (First or second standard deviation?)
Why do you predict developer outcry? I think if we communicate our reasoning well, nobody can object to such a rating (unless they disagree that 1x5 stars should be *under* 10x4). Good article you linked to there, Ryan. Also make sure to read the comment posted there on BeerAdvocate's rating. Maybe that's something for us?
Reading further in that comment thread, it appears that many sites including Beer Advocate and Rotten Tomatoes use that formula, and they tend to credit IMDB as the first to do so. So I think if you had a page/callout titled "How are the Addons Ranked" that said something like "Only addons with more than X ratings appear in the list, and they are ranked according to this formula [...], which is the same formula used by IMDB, Rotten Tomatoes, and Beer Advocate. The formula is designed to take into account the number of ratings as well as the average of the ratings, so that an addon with 10 five-star ratings and 2 four-star ratings will rank above an addon with 1 five-star rating." I would tend to think that because there are *so many* addons in each category, Johann is correct in comment #72 when he says "Users who want to browse the list by rating are interested essentially only in the top rated addons, not those that get bad ratings.", and I think in most categories they might not even look down as far as the 4-star addons, hence rendering my concern in comment #41 about what happens when a 4-star ranks above a 5-star less important. Developers would be more likely to notice such a discrepancy when they look up how they compare with their peers, but they would also be more likely to click on a link about how the ranking works and more likely to understand the explanation and be satisfied by it. I also agree with Johann that "explaining the list to the user is much easier with this approach: 'addons with a minimum of N ratings, sorted by rating'", but I would say that 99% of the users are not going to care how the addons are ranked, and the 1% that do are probably a bit smarter than average and willing to put up with a more complicated explanation.
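The formula those sites are usually credited with (IMDB's weighted rating) can be written directly as a sort expression; here m is the minimum-ratings cutoff and C the site-wide mean, both of which would have to be chosen for AMO (the values below are placeholders, not a decided policy):

    -- m = 10 (minimum ratings to be listed), @site_avg = C, computed beforehand
    SELECT id, averagerating, totalreviews,
           (totalreviews / (totalreviews + 10)) * averagerating
         + (10          / (totalreviews + 10)) * @site_avg AS weighted_rating
      FROM addons
     WHERE totalreviews >= 10
     ORDER BY weighted_rating DESC;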
And please remove the rounding that also causes randomization of rates...
And I do care about how addons are ranked, as my themes do have many high scores but still are not visible on the first page of top-rated themes, because of the many one-rating items and the faulty rounding of average ratings... So, in terms of 'developer outcry': "I have been working on my themes since Mozilla M10, about 10 years now, and still my themes are not in the top-rated list because of these rating-sorting problems and discussions. I really feel that I should stop my efforts in supporting my themes".
Note that the fix proposed by me and partly implemented is the 'frequentist mean', as we are adding in the overall average. As described before, the problem is now with the 'round(avg)' that removes the accuracy we are looking for when sorting all those extensions that are in the range of 4 to 5. The 'frequentist mean' as described in the article from comment #73 is the one that we are looking for, but it is just poorly implemented: rounding and other calculation issues... Note, requiring at least n rates before you can get into the list will only encourage people to 'fake' n rates...
I don't have any experience with this code or the db schema, so apologies in advance if I'm not understanding something. Looking at r19035 and other code in maintenance.php, I think that this line from r19035: $avg_sql = "SELECT SUM(rating)/count(rating) as avg from reviews"; should be: $avg_sql = "SELECT SUM(rating)/count(rating) as avg from reviews WHERE rating > 0 AND reply_to IS NULL"; That is, code a few lines further down seems to indicate that there are reviews with a rating of 0, and replies (reply_to not NULL), both of which should be ignored. So it would make sense to ignore them in the average that is used for smoothing. However, maybe it doesn't make a difference other than smoothing down further, thus making it a little bit harder to have a high average rating. (Maybe that's intentional.)
Right now it looks like a quick fix would be to simply not round the averaged value.
Assignee: smita.periyapatna → rdoherty
(In reply to comment #81) > Right now it looks like a quick fix would be to simply not round the averaged > value. Arg, committed before finishing the comment. So we can stop rounding the average rating and fix the total average to not include ratings of 0. This would be a quick fix, and we can figure out the details of implementing a new, more complicated algorithm afterwards.
If it doesn't take long to try out, I think that's fine. Out of 83K reviews in the system only 1K are zeros. Incorporating them with a complex algorithm wouldn't be much different in the end from just ignoring the zeroes.
Here's the distribution. Why do we have null ratings in the DB?

count(id)  rating
3043       NULL
1361       0
7010       1
3787       2
5872       3
16863      4
44999      5
(In reply to comment #84) > Why do we have null ratings in the DB? Just a guess - developer replies?
Ah, good one. Anyway, I could do a deeper analysis, but I'm going to go out on a limb and say that ignoring 0 ratings will only affect very bad add-ons that already have a low average rating (i.e. change the average from 0 to 1). These add-ons are usually poorly rated for obvious reasons.
Hmmm, I think *not* ignoring 0 ratings will have an adverse effect on some good add-ons, so yes, they should be ignored. I'm not sure if it's *possible* to zero-rate an add-on anymore, but in the past the only reason I've seen it done was when someone wanted to leave a comment about the add-on *without rating it* - so they weren't actually trying to say it was *bad*, and including that 0 in the average would unfairly drag down the average for the add-on. E.g. here are the first two reviews of Lazarus Form Recovery, from https://addons.mozilla.org/en-US/firefox/reviews/display/6984?show=20&page=2 -------- Not yet rated by thebigabdo on April 10, 2008 oh man,that happen to me all the time, thank you for this addons Not yet rated by coolare1 on April 10, 2008 this looks like something everyone could use!! i will test it out and come back to rate it. ------- Now I'm not sure if in the database those two ratings are zeros or nulls, but either way they shouldn't be allowed to affect the average score for the add-on.
To summarize: * Leave out the 0 and NULL rates from total average * Leave out the 0 and NULL rates from extension average * Remove the rounding. This should make the average calculation overall more stable and predictable.
Perhaps replace all 0 ratings with NULL?
(In reply to comment #88) > To summarize: > * Leave out the 0 and NULL rates from total average > * Leave out the 0 and NULL rates from extension average > * Remove the rounding. > > This should make the average calculation overall more stable and predictable. 0 and NULLs were already ignored in extension averages and the rounding was to the nearest tenth, not whole number. I'm going to change the averaging to hundredths, but it won't make a huge difference.
Here's the patch that ignores 0 and NULL ratings and averages the rating to 2 decimal places.
Attachment #358044 - Flags: review?(fwenzel)
Comment on attachment 358044 [details] [diff] [review] Patch for 0 and NULL values and rounding Looks good.
Attachment #358044 - Flags: review?(fwenzel) → review+
Target Milestone: 5.0.1 → 5.0.2
I've done some research here and it seems like a Bayesian rating system might address this. Found this here: http://www.thebroth.com/blog/118/bayesian-rating

br = ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes)

Legend:
* avg_num_votes: the average number of votes of all items that have num_votes > 0
* avg_rating: the average rating of each item (again, of those that have num_votes > 0)
* this_num_votes: number of votes for this item
* this_rating: the rating of this item

The problem with a BR rating is that it won't output a rating that is obviously calculable from the ratings. For instance, if our average site-wide rating is 4.2, an add-on with 1 rating of 5 will still get roughly a 4.2. I think what we should do is continue with Ryan's patch, and add another bug to create a BR score for each add-on that isn't necessarily exposed, which we can use to determine sort order.
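A sketch of that calculation against the column names used earlier in this bug (the two @-averages would be computed first, over add-ons that have at least one rating; names are illustrative):

    -- @avg_votes  = average number of ratings per rated add-on
    -- @avg_rating = average rating across those add-ons
    SELECT id, averagerating, totalreviews,
           (@avg_votes * @avg_rating + totalreviews * averagerating)
           / (@avg_votes + totalreviews) AS bayesian_rating
      FROM addons
     WHERE totalreviews > 0
     ORDER BY bayesian_rating DESC;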
Blocks: 477343
(In reply to comment #94) > I think what we should > do is continue with Ryan's patch, and add another bug to create a BR score for > each add-on that isn't necessarily exposed that we can use to determine sort > order. I've committed my patch for removing ratings with a 0 and rounding to the hundredths. I'm not sure what else is required for this bug if we've created another bug for the new algorithm.
Marking as fixed, bug 477343 tracks the algorithm implementation.
Status: REOPENED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
I am not really sure that this now works as designed. When I run, on March 7, 2009: https://addons.mozilla.org/en-US/firefox/browse/type:2/cat:all?sort=rated the first three results are: No Go Theme: 1 review with one 5 rating. Orbit Grey Custom: 1 review with one 5 rating. CrystalFox RoundStyle: 3 reviews: one 1 rating, one 5 rating, and one without a rating. It is strange that these three end up at the top of the list. Reopening...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Another example, using https://addons.mozilla.org/en-US/firefox/search?q=&cat=2,0&as=true&vfuz=true&appid=1&lver=3.0&hver=any&atype=2&pid=0&lup=&pp=20&sort=averagerating&show=20&page=3: Walnut2 appears below Strata Bauhaus. Walnut2 has 11 reviews, of which 10 at 5 and one at 4, averaging 4.91. Strata Bauhaus has 11 reviews, of which 1 at 2, 1 at 3, 2 at 4 and 7 at 5, averaging 4.36. There is something completely wrong here; even if a Bayesian rating is used, there is no way that Walnut2 should end up below Strata...
This bug is fixed and scheduled for release 3/12.
Status: REOPENED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Product: addons.mozilla.org → addons.mozilla.org Graveyard