TL;DR: Read “Bayesian ranking of items with up and downvotes or 5 star ratings” by Jules Jacobs. I’ll get back to you on how to tune the priors and utility values.
Years ago, when I first had the idea that prompted me to register ficfan.org as a placeholder page, I went looking for a mathematical solution to the problem of bias in story ratings built from very small samples.
Given that this explanation of Wilson scores brought it back to mind, I thought I might as well blog about the topic for anyone who wants to know how to calculate average ratings properly.
Back then, I’d recently finished a very basic statistics course and, when I stumbled across “How Not To Sort By Average Rating” by Evan Miller, I recognized the principle of using the lower bound of a confidence interval. However, I didn’t know how to generalize it to the multi-valued data that is an out-of-5 rating, and I didn’t find any candidates that were as well-suited to use by a stats novice as Miller’s post.
Since then, that very question got asked on Cross Validated (the StackExchange site for statistics), and a good point was made: The “lower bound of a confidence interval” method will seriously under-estimate things with a low number of ratings.
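For reference, the lower bound in question can be sketched in a few lines of Python. This is the standard Wilson score interval for up/down votes; the function name wilson_lower_bound and the default z of 1.96 (roughly 95% confidence) are choices made for this illustration, not something copied from Miller’s post.

```python
import math


def wilson_lower_bound(upvotes, total, z=1.96):
    """Lower bound of the Wilson score interval for an up/down vote tally.

    z=1.96 (about 95% confidence) is an assumed default for illustration.
    """
    if total == 0:
        return 0.0  # no votes at all: nothing to estimate from
    phat = upvotes / total  # observed fraction of upvotes
    z2 = z * z
    centre = phat + z2 / (2 * total)
    margin = z * math.sqrt((phat * (1 - phat) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)


single_perfect_vote = wilson_lower_bound(1, 1)
well_rated_item = wilson_lower_bound(90, 100)
```

A single upvote with no downvotes scores about 0.21, well below an item with 90 upvotes out of 100 (about 0.83), which is the under-estimation the Cross Validated answer is pointing at.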
raegtin gives a good theoretical answer for methods of resolving that problem, but I didn’t yet have the much better stats textbook that’s now in my TODO pile and wasn’t in the mood to soldier through the theory on my own, so I kept looking.
Not long after that answer, Evan Miller came back with a less hacky solution for up/down ratings which makes good reading if you’re trying to learn the theory without a textbook… but still didn’t meet my “I don’t trust my math. Give me something someone else trusts.” needs.
Now, that said, some people do apply the Wilson score to multi-value data by scaling the range down to between 0 and 1, so a middling rating counts as half an up vote. (That’s what this MySQL solution and this Node.js module do.) However, it has a big flaw. As Apocalisp pointed out when someone else thought of the idea, it leaves you with 300 3-star ratings being equivalent to 100 5-star ratings. He suggests calculating a Wilson confidence interval for each possible score, then working from there (and I’ve seen it suggested elsewhere), but I wouldn’t feel comfortable with that even if his suggestion had been around earlier, because I couldn’t find a detailed breakdown of why it would produce good results.
Ironically, the oldest resource (“What is a better way to sort by a 5 star rating?” on StackOverflow) is one I found just recently (Google Fu failure!). Despite that, it’s probably the most useful of the StackOverflow answers.
That said, let’s get on to the aforementioned resources it links.
In 2014, Evan Miller came back with “Ranking Items With Star Ratings”, which is a detailed look at the problem and how to solve it… unfortunately, I was sleep deprived when I encountered it and the massive wall of equations which didn’t end in sample code prompted me to shelve it for later. (Ironically, I can’t evaluate it right now either, because today is the one day this week that I slept terribly and I’m too busy to risk delaying this post until I have time.)
Finally, I came across “Bayesian ranking of items with up and downvotes or 5 star ratings” by Jules Jacobs (written in response to Evan Miller’s improved upvote/downvote code) which is in a form my sleep-fogged brain can handle.
It’s a simple, easy-to-understand explanation for people with minimal background in statistics, it guides you through thinking about the problem from the right direction (e.g. what does the utility function really mean?), and it comes with Python example code in case you really can’t be bothered to do anything more than copy and paste.
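To give a feel for how little code is involved, here is a minimal sketch of the general idea rather than a copy of Jacobs’s code. It assumes the prior is expressed as a list of pretend votes, one per star level, and that each star level is assigned a utility. The names bayesian_score, PRETEND_VOTES and UTILITIES, and the numbers in them, are placeholders made up for illustration.

```python
def bayesian_score(votes, pretend_votes, utilities):
    """Utility-weighted mean of the real votes plus the prior's pretend votes.

    All three lists are indexed by star level (index 0 is 1 star).
    """
    if not (len(votes) == len(pretend_votes) == len(utilities)):
        raise ValueError("votes, pretend_votes and utilities must match in length")
    combined = [v + p for v, p in zip(votes, pretend_votes)]
    total = sum(combined)
    if total == 0:
        raise ValueError("need at least one real or pretend vote")
    return sum(c * u for c, u in zip(combined, utilities)) / total


# Placeholder values, NOT tuned recommendations.
PRETEND_VOTES = [2, 2, 2, 2, 2]
UTILITIES = [1, 2, 3, 4, 5]

no_votes = bayesian_score([0, 0, 0, 0, 0], PRETEND_VOTES, UTILITIES)
one_five_star = bayesian_score([0, 0, 0, 0, 1], PRETEND_VOTES, UTILITIES)
fifty_four_star = bayesian_score([0, 0, 0, 50, 0], PRETEND_VOTES, UTILITIES)
```

With these placeholder numbers, an unrated item starts at 3.0, a single 5-star vote only nudges it to about 3.18, and fifty 4-star votes pull it up to about 3.83, so a lone rave review no longer outranks a well-established favourite.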
As for making it efficient, the fact that it’s a modified arithmetic mean allows us to calculate it incrementally as long as we store both the score and the total number of votes with enough precision that we don’t have to worry about rounding errors. To fold in a new vote:
- Multiply the average by the total vote count to reverse the final step of the process and produce what I’ll call the “expanded average”.
- Perform the weighting calculations for the new value.
- Add it to the expanded average.
- Divide the expanded average by the new total number of votes.
This works because of two properties:
- Addition is commutative, so it doesn’t matter which order you sum together the individual ratings.
- Division and multiplication by the same number cancel out, so 1 + 2 + 3 + 4 + 5 and ((1 + 2 + 3 + 4) / 4 * 4) + 5 are mathematically equivalent.
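The incremental update can be sketched as follows. The function name add_vote and the UTILITIES table are hypothetical, and Fraction is used only so that the result is exact and the rounding question is sidestepped; a real system might instead store a decimal, or keep the expanded sum itself.

```python
from fractions import Fraction

# Hypothetical utility for each star level (index 0 is 1 star).
UTILITIES = [1, 2, 3, 4, 5]


def add_vote(average, total_votes, stars, utilities=UTILITIES):
    """Fold one new rating into a stored (average, total_votes) pair."""
    expanded = average * total_votes   # undo the final division
    expanded += utilities[stars - 1]   # weight the new vote and add it
    new_total = total_votes + 1
    return expanded / new_total, new_total


# Start empty, then add the ratings 1, 2, 3 and 4 one at a time.
average, total = Fraction(0), 0
for stars in [1, 2, 3, 4]:
    average, total = add_vote(average, total, stars)

# Adding a 5 gives the same result as averaging 1..5 in one go.
average, total = add_vote(average, total, 5)
```

After the last call, average is exactly 3 and total is 5, matching (1 + 2 + 3 + 4 + 5) / 5. A prior could presumably be folded in by starting from its weighted mean and weight instead of from zero.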
It’s not perfect, since you don’t have a nice averaged number in the range from 0 to 5 to display, but it’s definitely a good start if sorting and graphical visualizations are your goal. (Evan Miller’s approach is probably best if you need to display numbers.)
That leaves only one question: How do you tune your priors and utilities? …I’ll get back to you on that one after I have time to research it.