I'm new to PS so I'm not sure if this topic has already been beaten to death, but I wanted to weigh in with my experience with the rating system and see what other members think about it.
First, I'm grateful PS exists at all. My intention here is not to fruitlessly complain, but to offer constructive feedback on the rating system. I understand that the admins are not earning a large salary for their time and effort, and I appreciate all they've done so far (it seems that Robin is the person responsible for website design and function - including ratings - so thanks for that Robin!). Second, there was a post / poll about 2 years ago concerning the issues I'm going to discuss (Poll: Should criteria for the Top 100 be changed?). I don't mean to just repeat that old thread, but I will make some comments that echo what was discussed there. The following is a list of the issues I've found with the rating system.
1) Six rating buttons: An even number of rating options doesn't let a rater mark a game as truly "average" on any score, even with the weighted sliders. I don't mean average in the statistical sense; I mean "average" as in a game's relative standing (which is what the rating system / Top 100 system is really picking up on). That is, we try to think of games objectively when we rate them, as though they existed in a vacuum, but one must have played a game with good rules to understand what "decent" or "not so good" rules would be like. Anyway, there are several games whose rules, layouts, etc. aren't really "decent", but aren't "not so good" either. They are just "average" compared to other games. I think seven buttons would be a nice feature. What do you think?
2) Ranking systems and ratings: I'm going to be bold and say that there really are no "1" machines, and there are really no "10" machines. Those are idealized endpoints meant to bound realistic ratings. Are there really any games that are *absolutely* "terrible" or *absolutely* "excellent"? There are some real dogs out there, but considering the R&D and production costs that go into any machine, would an *absolutely* "terrible" game even make it through all the steps it takes to get a game to market? The odds of every single feature being terrible are pretty low. On the flip side, a perfect "10" would be a game in which a player can't find even the slightest flaw - a game that makes all other games obsolete. No matter how much a player loves a particular game, the odds are very low that a critical eye couldn't find at least one or two flaws in a putative "10."
This was discussed a little in that older thread, and the proposed fix for actual "1" or "10" votes was to discard outliers based on standard deviation. I think that's a good way to handle too many "1's" and "10's", but there might be other solutions that limit them even further (I don't have them, which is why I'm petitioning the members here).
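As a sketch of what that outlier trimming might look like - assuming a simple "drop anything more than k standard deviations from the mean" rule, since I don't know Pinside's actual cutoff:

```python
from statistics import mean, stdev

def trim_outliers(ratings, k=2.0):
    """Drop ratings more than k sample standard deviations from the mean.
    Hypothetical rule: Pinside's actual cutoff isn't public."""
    if len(ratings) < 3:
        return list(ratings)  # too few votes to judge outliers
    m, s = mean(ratings), stdev(ratings)
    if s == 0:
        return list(ratings)  # everyone agrees; nothing to trim
    return [r for r in ratings if abs(r - m) <= k * s]

# A lone "1" among mostly 7s and 8s gets dropped; the "10" survives.
print(trim_outliers([7, 7, 8, 8, 7, 10, 1]))  # [7, 7, 8, 8, 7, 10]
```

Note that a single extreme vote only gets trimmed once there are enough other votes to establish a consensus - with very few ratings, everything is kept.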
For a *ranking* system like the Top 100, it seems the best way to aggregate the data would be a Condorcet method rather than averaging ratings. In a way, that is how the Top 100 is constructed, but a Condorcet system would require raters to rank games in strict order. That would mean reinventing the wheel (though being too conservative can stagnate a website, too), and Condorcet methods aren't practical for ranking more than 20 or so options - it would be unreasonable to expect raters to rank hundreds of machines against each other. Maybe we could still take a hint from ranking systems, though. The system could require that there are no ties at the extremes of a rater's personal rankings, so each rater could give only one game a "1" and only one game a "10". If two games were rated a "1" or a "10", the system would prompt the rater to rank the two machines against each other, and the ratings would adjust accordingly. So, for someone who has MM and TZ both as "10's", the system would ask them to rank the games against each other. Say the rater puts TZ above MM; then MM's rating would be decreased by some small amount. Since no two games are exactly alike on a fine-grained analysis, this doesn't seem unreasonable. Would that help curb ratings tampering? Any other ideas? I'm asking because I've seen way too many "10's" in the ratings (not sure if I've seen any "1's" yet, but I'm sure they're out there).
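To make the MM/TZ tie-breaking idea concrete - the 0.1 step size and the dict-based shape here are my own invention, purely for illustration:

```python
def break_tie(ratings, winner, loser, delta=0.1):
    """When two games share the same score, ask the rater which they
    prefer and nudge the other one down by a small fixed amount.
    Illustrative only; the step size would need tuning."""
    assert ratings[winner] == ratings[loser], "only ties need breaking"
    adjusted = dict(ratings)
    adjusted[loser] = round(adjusted[loser] - delta, 2)
    return adjusted

# Rater holds both at 10 but prefers TZ, so MM slips just below it.
print(break_tie({"TZ": 10.0, "MM": 10.0}, winner="TZ", loser="MM"))
# {'TZ': 10.0, 'MM': 9.9}
```

The point is that a single pairwise question resolves the tie without making the rater rank their whole collection.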
3) Rating inflation: This is more of a plea to members than a criticism of the site design. It seems to me that if a member's ratings don't span the 3-9 range, they are over- (or under-) inflating their ratings. As an analogy, evaluation inflation is an issue with grading in educational settings (especially in postsecondary education in the US). Everyone wants A's, and school administrators want their students to appear successful. A B+ used to mean a student was well above the average C or C+, but now a B+ is viewed as an insult to a student. The problem is that grade inflation destroys the evaluation system, because A's aren't worth as much as they were 20 or even 10 years ago. I think some of that is going on with members' ratings on PS too.
Since I'm approaching the rating system as a relative system, there just can't be as many games in the 85th+ percentile as there seem to be. A "9" should mean a game stands out above all the other games *by far*. An "8" or "7" still means the game is really good, but it shares that status with a few other games. A "6" is still *better* than average - a good game, sharing its spot with many others - and "5's" are decent, middle-ground games. But when I compare my personal ratings (currently 28 total) with the average ratings on the Top 100, or with the Pinside Admin ratings, my scores come in well below those other ratings on the same games. However, my games *rank* in roughly the same order as they do on the Top 100. That is, I seem to agree with the relative order of the games, but my ratings are usually a whole point or more lower than the average rating. So, for example, if I thought CFTBL should be higher than TSPP on the Top 100, I'd have to change my reasonable rating of CFTBL from 6.9 to something around 8.3+. But I'm not doing that - CFTBL doesn't deserve that high a rating, even though it is still a good game. I understand that there is no set distribution for the ratings of the games, but I just find it hard to believe that so many games can be rated so high.
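That "same rank order, lower absolute scores" observation can be checked mechanically. Here's a toy sketch - the 8.1/8.4 site averages are made-up numbers, not real Pinside figures:

```python
from statistics import mean

def compare_to_site(mine, site):
    """Return the average gap between site ratings and mine, and
    whether both lists rank the same games in the same order."""
    games = list(mine)
    gap = mean(site[g] - mine[g] for g in games)
    my_order = sorted(games, key=lambda g: mine[g], reverse=True)
    site_order = sorted(games, key=lambda g: site[g], reverse=True)
    return gap, my_order == site_order

mine = {"CFTBL": 6.9, "TSPP": 7.2}   # my (relative) ratings
site = {"CFTBL": 8.1, "TSPP": 8.4}   # hypothetical site averages
gap, same_order = compare_to_site(mine, site)
print(round(gap, 1), same_order)  # 1.2 True
```

A constant offset with identical ordering is exactly the "inflation" pattern: everyone agrees on which games are better, but the absolute numbers have drifted upward.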
Part of the problem with the rating inflation might be the way the six rating buttons are worded. The word "good" is pretty vague, so I can see how there might be some confusion about whether or not a member should rate a game's qualities or features as "good." I think a lot of games are good, but my personal rating list shouldn't be decided in the thousandths column of a bunch of "8's" - the games aren't that close to each other in quality and features.
Like I said, I know that the admins work hard on the site, and I really do appreciate Pinside being around. But if I don't say something, the rating system will just remain unbalanced (imo, of course).
tl;dr:
1) Can / should we have an odd number of rating buttons, like seven? Why or why not?
2) What are some ways to eliminate too many "10's" or "1's"? What are some better ways to rank games?
3) What can be done to encourage members to use the full spectrum of the rating system?