(Topic ID: 133765)

Rating System Analysis

By Ramtuathal

8 years ago


Topic Stats

  • 25 posts
  • 15 Pinsiders participating
  • Latest reply 8 years ago by L_satan

    #1 8 years ago

    I'm new to PS so I'm not sure if this topic has already been beat to death, but I wanted to weigh in on my experience with the rating system and see what other members think about it.

    First, I'm grateful PS exists at all. My intention here is not to fruitlessly complain, but to offer constructive feedback on the rating system. I understand that the admins are not earning a large salary for their time and effort, and I appreciate all they've done so far (it seems that Robin is the person responsible for website design and function - including ratings - so thanks for that Robin!). Second, there was a post / poll about 2 years ago concerning the issues I'm going to discuss (Poll: Should criteria for the Top 100 be changed?). I don't mean to just repeat that old thread, but I will make some comments that echo what was discussed there. The following is a list of the issues I've found with the rating system.

    1) Six rating buttons: Even numbered evaluations don't allow for the rater to truly rate a game as "average" on any score, even with the weighted sliders. I don't mean average in the statistical sense, I mean "average" as in a game's relative score (which is what the rating system / Top 100 system is really picking up on). That is, we try to think of games objectively when we rate them, as though they were in a vacuum, but one must have played a game with good rules to understand what "decent" or "not so good" rules would be like. Anyway, there are several games that I think of as having rules or layouts etc. that aren't really "decent", but that aren't "not so good" either. They are just "average" compared to other games. I think seven buttons would be a nice feature. What do you think?

    2) Ranking systems and ratings: I'm going to be bold and say that there really are no "1" machines, and there are really no "10" machines. Those are idealized numerical barriers that are meant to contain realistic ratings. Are there really any games that are *absolutely* "terrible" or *absolutely* "excellent"? There are some real dogs out there, but when one thinks about the R&D that goes into any machine and the production costs, would an *absolutely* "terrible" game even make it through all the steps that it takes to get a game to the market? The odds of every single feature of the game being terrible are pretty low. On the flipside, a perfect "10" would be a game that a player can't find even the slightest flaw in - a game that makes all other games obsolete. No matter how much a player loves a particular game, again, the odds are very low that a critical eye couldn't find at least one or two flaws in a putative "10."

    This was discussed a little in that older thread, but the resolution of having actual "1" or "10" votes seemed to be to throw out the outliers on standard deviation. I think this is a good solution to having too many "1's" and "10's", but there might be other solutions that limit "1's" and "10's" even more (I don't have them, that's why I'm petitioning the members here).
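As a rough illustration of the "throw out the outliers on standard deviation" idea, here is a minimal Python sketch. It is not Pinside's actual method, which isn't described in this thread; the function name `trimmed_mean`, the cutoff `k=2.0`, and the sample ratings are all assumptions made up for the example.

```python
from statistics import mean, pstdev

def trimmed_mean(ratings, k=2.0):
    """Average the ratings after dropping any that lie more than k
    population standard deviations from the mean (k is an assumed cutoff)."""
    mu = mean(ratings)
    sigma = pstdev(ratings)
    if sigma == 0:
        # Every rating is identical, so there is nothing to trim.
        return mu
    kept = [r for r in ratings if abs(r - mu) <= k * sigma]
    return mean(kept)

# Made-up ratings for one game: seven similar scores and one flat "1".
ratings = [8.1, 7.9, 8.4, 7.6, 8.0, 8.2, 7.8, 1.0]
print("plain mean:  ", round(mean(ratings), 2))
print("trimmed mean:", round(trimmed_mean(ratings), 2))
```

With a small number of ratings a single extreme vote also inflates the standard deviation, so a cutoff like this only starts to work well once a game has a reasonable number of votes.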

    For a *ranking* system like the Top 100, it seems that the best way to input the data is through a Condorcet method rather than through averages of ratings. In a way, that is how the Top 100's are constructed, but a Condorcet system would require raters to rank games in distinct order. However, this would be reinventing the wheel (though being too conservative can also stagnate a website). And the Condorcet system is not amenable to ranking more than 20 or so options. It would be unreasonable to expect raters to rank hundreds of machines against each other. Maybe we could take a hint from the ranking systems though. The system could require that there are no ties in a rater's personal rankings. The result would be that raters would be limited to rating only one game a "1" and one game a "10". If two games were rated as a "1" or "10", the system would require that the rater rank the two machines against each other and the rating would adjust accordingly. So, for someone who has MM and TZ both as "10's", the system would prompt them to rank the games against each other. Say the rater puts TZ above MM, then MM's original rating would get decreased by a certain small percentage. Since no two games are exactly alike on a fine-grained analysis, this doesn't seem unreasonable. Would that help curb ratings tampering? Any other ideas? I'm asking because I've seen way too many "10's" in the ratings (not sure if I've seen any "1's" yet, but I'm sure they're out there).
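For readers unfamiliar with the Condorcet idea, a minimal Python sketch follows. It assumes every rater submits a complete ranking of the same small set of games (best first), which is exactly the limitation noted above. The function name `condorcet_winner` and the three ballots are hypothetical.

```python
from itertools import combinations

def condorcet_winner(ballots):
    """Return the game that wins every head-to-head comparison,
    or None if no such game exists (for example, a cycle or a tie).

    Assumes each ballot is a full ranking of the same games, best first."""
    games = sorted(ballots[0])
    wins = {g: 0 for g in games}
    for a, b in combinations(games, 2):
        # Count the raters who placed a above b; the rest placed b above a.
        a_over_b = sum(1 for ballot in ballots if ballot.index(a) < ballot.index(b))
        b_over_a = len(ballots) - a_over_b
        if a_over_b > b_over_a:
            wins[a] += 1
        elif b_over_a > a_over_b:
            wins[b] += 1
    for g in games:
        if wins[g] == len(games) - 1:
            return g
    return None

# Three made-up ballots over three games.
ballots = [
    ["TZ", "MM", "CFTBL"],
    ["MM", "TZ", "CFTBL"],
    ["TZ", "CFTBL", "MM"],
]
print(condorcet_winner(ballots))
```

The number of pairwise comparisons grows with the square of the number of games, and each rater has to order all of them, which is why this approach gets impractical for hundreds of machines.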

    3) Rating inflation: This is more of a plea to members than a criticism of the site design. It seems to me that if a member's ratings don't span the 3-9 spectrum, they are over- (or under-) inflating their ratings. As an analogy, evaluation inflation is an issue with grading in educational settings (especially in postsecondary ed in the US). Everyone wants A's, and school administrators want their students to appear successful. Getting a B+ used to mean that a student was well above the average C or C+, but now a B+ is viewed as an insult to a student. The problem is, grade inflation destroys the evaluation system because A's aren't worth as much as they were 20 or even 10 years ago. I think that there is some of that going on with members' ratings on PS too.

    Since I'm approaching the rating system as a relative system, there just can't be as many games in the 85th+ percentile as there seem to be. A "9" should mean that game stands out above all the other games *by far*. An "8" or "7" still means that the game is really good, but those games share their really good status with a few other games. A "6" is still *better* than average, meaning it is still a good game, and a "6" game shares its spot with many other games. "5's" are decent but middle-ground games. But when I compare my personal ratings (currently 28 total) with the average ratings on the Top 100, or the Pinside Admin ratings, it seems that my score is well below those other ratings on the same games. However, my games *rank* in roughly the same order as they do on the Top 100. That is, I seem to be in line with the rank of the games compared to others, but my ratings are usually a whole point or more lower than the average rating. So, for example, if I thought that CFTBL should be higher than TSPP on the Top 100, I'd have to change my reasonable rating of CFTBL from 6.9 to something around 8.3+. But I'm not doing that - CFTBL doesn't deserve that high of a rating, but it is still a good game. I understand that there is no set distribution for the ratings of the games, but I just find it hard to believe that there can be so many games that are rated so high.
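The "same rank order, but a point lower" observation can be illustrated with a small Python sketch that rescales each rater's scores relative to that rater's own mean and spread (z-scores). This is only one possible normalization, not something the site is known to do. The name `standardize` and all of the scores are invented for the example, apart from the 6.9 for CFTBL mentioned above.

```python
from statistics import mean, pstdev

def standardize(ratings):
    """Map one rater's scores to z-scores: (score - rater mean) / rater std dev.

    Assumes the rater did not give every game the same score."""
    values = list(ratings.values())
    mu = mean(values)
    sigma = pstdev(values)
    return {game: (score - mu) / sigma for game, score in ratings.items()}

# Two hypothetical raters who agree on the order but sit a full point apart.
strict = {"TZ": 7.5, "CFTBL": 6.9, "TSPP": 7.2}
generous = {"TZ": 8.5, "CFTBL": 7.9, "TSPP": 8.2}

for game in strict:
    print(game, round(standardize(strict)[game], 3), round(standardize(generous)[game], 3))
```

After rescaling, the two raters produce essentially the same numbers for each game, so a constant offset between a strict and a generous rater no longer affects the comparison.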

    Part of the problem with the rating inflation might be the way the six rating buttons are worded. The word "good" is pretty vague, so I can see how there might be some confusion about whether or not a member should rate a game's qualities or features as "good." I think a lot of games are good, but my personal rating list shouldn't be decided in the thousandths column of a bunch of "8's" - the games aren't that close to each other in quality and features.

    Like I said, I know that the admins work hard on the site, and I really do appreciate Pinside being around. But if I don't say something, the rating system will just remain unbalanced (imo, of course).

    tl/dr;
    1) Can / should we have an odd number of rating buttons, like seven? Why or why not?
    2) What are some ways to eliminate too many "10's" or "1's"? What are some better ways to rank games?
    3) What can be done to encourage members to use the full spectrum of the rating system?

    #3 8 years ago

    Snowyetti, yeah, I know. The tl/dr is pretty much all I wanted to know, but the whole post offers plenty of explanation for why I'm asking those questions. You never know, there might be pinsiders out there who are having trouble getting to sleep.

    #4 8 years ago

    [Images: image.jpg, image.jpg]

    #5 8 years ago
    Quoted from Ramtuathal:

    I'm new to PS so I'm not sure if this topic has already been beat to death,

    Welcome. It has been beaten to death again and again and again and again.

    #6 8 years ago

    [Images: image.jpg, image.jpg]

    #7 8 years ago

    Yes, ratings make no sense. Basically every single game is from a 6 to an 8.5. I'd prefer just rating each game from a 1-10. I have no interest in rating the sound effects vs. the cabinet art and having that affect my overall rating. Demo Man is one of my top games ever but the art brings down the score so really my rankings are meaningless even to me.

    Robin doesn't run Pinside full time and there are enough bugs to fix that I think overhauling the basically working ratings system isn't worth his time.

    #8 8 years ago

    @ Ramtuathal

    I very much agree, but it is what it is. The most you can do is flag ratings that are obviously just there to mess with the aggregate scores, like flat 1s or 10s. My own scores are in the 3.5-9 range but even those are pretty skewed.

    Take the ratings for what they are and consider that the more someone "really likes" a game, the more likely they are to come on here and rate it. That, plus people here like pinball in general and are going to use evaluations like "pretty good" most of the time. There are also categories like rule balance that *most* games should be rated fairly high on unless there are crazy exploits or broken randomness. Those factors could account for a lot of the skew.

    I'd be interested to see what the average position trends over time actually look like, normalized for the number of games included in the ratings system. I feel like that would be a more meaningful gauge of how well liked a game is compared to others and how much ratings are impacted by the release of new games or the hype of old ones.
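One simple way to normalize a list position for the number of games in the system is to map the rank onto a 0 to 1 scale. The Python sketch below is only a guess at what such a normalization could look like; the function name `normalized_position` and the example numbers are hypothetical.

```python
def normalized_position(rank, total_games):
    """Map a 1-based rank to the range [0, 1]:
    1.0 is the top spot and 0.0 is the last spot."""
    if total_games < 2:
        raise ValueError("need at least two ranked games")
    return (total_games - rank) / (total_games - 1)

# The same raw rank means more when more games are being ranked.
print(normalized_position(10, 100))
print(normalized_position(10, 300))
```

Tracking this value over time, instead of the raw rank, would keep a game from appearing to slide just because new titles were added below or around it.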

    #9 8 years ago

    Oops - I see that a member downvoted my post. Just curious, is that because I went on and on (and on and on), or because this topic is despised, jaded, etc.? Or both?

    I was just trying to fully explain the issues I found so that we could have an open dialogue about the rating system. Sure, our solutions might not be practical or implementable now, but it doesn't hurt to try to figure out if the site could be improved.

    @Law, thanks, that helps me understand how the sample selection isn't random and how the balance of rating options affects the overall rating (in the Top 100, etc.).

    @ DefaultGen, I see where you're coming from, but can't the rating system strive to make sense?

    Thanks for humoring the newbie.

    #10 8 years ago

    I agree with you, but these ratings are subjective and, like anything else, subject to fanboy manipulation. I mean, when I see MM rated at 1 out of 10, meaning it's at the bottom of all pins ever, you know something's not right.

    The best I can hope for is that enough of the stupid good and stupid bad cancel each other out and globally the ratings are semi-OK.

    However, that doesn't mean I agree with all the ratings, but if a game is truly God-awful it won't be in the top 40, and if a game is really great it won't be out of the top 40 (ish).

    I only gave the Gladiator pic a thumbs down because he was rating a fight, and the rating system lets us all be like Commodus and give a rating on a game.

    I have only rated games I own and give all between 7.7 and 8.9 (best game ever TZ).

    I don't think it's quite fair that people already rated The Hobbit and Magic Girl; those pins are not even released yet.

    #11 8 years ago
    Quoted from DefaultGen:

    I have no interest in rating the sound effects vs. the cabinet art and having that affect my overall rating. Demo Man is one of my top games ever but the art brings down the score so really my rankings are meaningless even to me.

    Can't you change the weight of the cabinet and sound effects in your total rating? That way if someone does think the side art is important it can be represented in their score.

    I actually like the scoring system. It states in the section that the scores are not final and you can change them whenever you want. I use that quite often when I score a game off my first impression and then later find I have changed my opinion. I just use the ratings for fun. When I want to buy a game, the comments are more meaningful to me.

    #12 8 years ago
    Quoted from dmbjunky:

    Can't you change the weight of the cabinet and sound effects in your total rating? That way if someone does think the side art is important it can be represented in their score.

    But you have to spend all your weight points, so I have points in things like cabinet art, backglass, and DMD animations which are a solid 0 to me. The system is too granular for me. I don't know if sound variety on Star Wars Trilogy is a 3 or a 7, but I know the game overall feels like a 6/10 to me regardless of how you break it down. Even a weighted average of all these categories often results in a score that feels wrong.
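To make the weighting discussion concrete, here is a minimal Python sketch of a weighted average over rating categories. The category names, scores, and weights are invented, and the site's actual formula is not given in this thread, so treat `weighted_score` as a hypothetical stand-in.

```python
def weighted_score(scores, weights):
    """Weighted average of category scores: sum(score * weight) / sum(weight)."""
    total_weight = sum(weights.values())
    return sum(scores[c] * weights[c] for c in weights) / total_weight

# Hypothetical category scores for one game.
scores = {"gameplay": 9, "cabinet art": 3, "sound": 6}

# Equal weights: the weak cabinet art drags the total down.
equal = {"gameplay": 1, "cabinet art": 1, "sound": 1}
# Weights for a rater who does not care about cabinet art at all.
no_art = {"gameplay": 3, "cabinet art": 0, "sound": 1}

print(weighted_score(scores, equal))
print(weighted_score(scores, no_art))
```

If the system forces some minimum weight onto every category, as described above, the second set of weights would not be allowed, and the total stays closer to the first result.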

    #13 8 years ago

    Pinside game rankings are good for comedy relief only. Unless you need somebody else to tell you how to spend your money.

    #14 8 years ago
    Quoted from Ramtuathal:

    Oops - I see that a member downvoted my post. Just curious, is that because I went on and on (and on and on), or because this topic is despised, jaded, etc.? Or both?

    Both.

    #15 8 years ago

    @ TheLaw
    Thanks for letting me know. I get that most people don't want to read that much, and now I know that bringing up ratings is a pointless pursuit.

    #16 8 years ago

    Thanks to everyone for weighing in. I still think that the subjectivity of ratings can be mitigated by good data collection and management, but it's fairly clear that many members don't take the ratings / Top 100 seriously, or at least they take them with a large grain of salt.

    #17 8 years ago

    I think trying to rank which game is number 1 and which is number 105 is flawed. Rather, I like the idea of saying this game is an "A" game and this one is a "B" etc. There would be distinct cutoffs for the grades and no one could see the actual number rating. But I also agree that ratings are subjective and what really matters is whether you like to play it or not. POTC is not very highly rated but I enjoy the heck out of it, though I am also not an expert. In many instances I find it a more enjoyable game (FOR ME AT MY CURRENT SKILL LEVEL) than, say, LOTR. So I think your experience and skill level also help color your impressions of a given game. If you really wanted to get nuts you could break out the rating systems based upon skill level - beginner, intermediate and expert. I am sure that a game like Taxi is totally an "A" for beginners / intermediates but maybe it is only a "C" for the expert. Sort of like Thomas the Tank Engine is great for a 3 year-old but not for a 12 year-old. Does that make any sense?
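A letter-grade scheme with fixed cutoffs could be sketched in Python as below. The cutoff values and the four-letter scale are purely hypothetical; nothing in this thread specifies where the boundaries would sit.

```python
from bisect import bisect_right

# Hypothetical grade boundaries on the 1-10 rating scale.
CUTOFFS = [6.0, 7.0, 8.0]
GRADES = ["D", "C", "B", "A"]

def letter_grade(rating):
    """Return the grade whose band contains the rating.
    A rating exactly on a cutoff gets the higher grade."""
    return GRADES[bisect_right(CUTOFFS, rating)]

for rating in (5.9, 7.0, 7.6, 8.3):
    print(rating, letter_grade(rating))
```

Two games at 8.31 and 8.29 would both show as "A", which is the point of the suggestion: small differences in the decimals stop deciding the order.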

    #18 8 years ago
    Quoted from AJB4:

    I think trying to rank which game is number 1 and which is number 105 is flawed. Rather, I like the idea of saying this game is an "A" game and this one is a "B" etc. There would be distinct cutoffs for the grades and no one could see the actual number rating. But I also agree that ratings are subjective and what really matters is whether you like to play it or not. POTC is not very highly rated but I enjoy the heck out of it, though I am also not an expert. In many instances I find it a more enjoyable game (FOR ME AT MY CURRENT SKILL LEVEL) than, say, LOTR. So I think your experience and skill level also help color your impressions of a given game. If you really wanted to get nuts you could break out the rating systems based upon skill level - beginner, intermediate and expert. I am sure that a game like Taxi is totally an "A" for beginners / intermediates but maybe it is only a "C" for the expert. Sort of like Thomas the Tank Engine is great for a 3 year-old but not for a 12 year-old. Does that make any sense?

    Definitely agree. There's a huge shift based on skill level and goals.

    POTC is a great example- it's fun, but the terribly broken single-ball blow it up max jackpot scoring strategy ruins it a bit for me and makes it a huge bear to play in league. Nonlinear scoring is fine, but only having that scoring potential on one ball....

    #19 8 years ago

    @ AJB4
    You raise an excellent point. I've noticed that even in the short time I've been playing pinball (again), my impression of most games has changed as I (slowly) improve. It would be interesting to see if skill level could be incorporated into ratings.

    I also like the idea of grading games, but I wonder if taking away the transparency of the ratings will encourage more wild voting; if a rater wants to make sure a game gets an "A", no need to be nuanced about it - they just give it the highest ratings across the board without worrying that anyone else will know they did so. But still, a good idea if some changes were eventually made to the system.

    #20 8 years ago

    If Robin would let us download the raw data that goes into these rankings, it would be neat to see what people can do for different ranking systems. Personally, I'd like to see the distribution of scores for a game. Being normal vs bi-modal would give a level of detail that the current system lacks.
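As a sketch of what looking at the distribution could reveal, the Python below builds a histogram of whole-number ratings and counts its local peaks. Both sets of ratings are made up, the helper names are hypothetical, and counting strict peaks is only a crude stand-in for a real bimodality test.

```python
from collections import Counter

def histogram(ratings):
    """Count how many ratings fall on each whole number from 1 to 10.
    Assumes the ratings are whole numbers."""
    counts = Counter(ratings)
    return [counts.get(score, 0) for score in range(1, 11)]

def peaks(hist):
    """Scores whose bin is strictly taller than both neighbouring bins."""
    padded = [0] + hist + [0]  # padded[i] holds the count for score i
    return [
        i for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    ]

# Two made-up games with the same average rating of 7.
consensus = [6, 7, 7, 7, 8]
divisive = [4, 4, 4, 10, 10, 10]

print(peaks(histogram(consensus)))
print(peaks(histogram(divisive)))
```

Both games average 7, but the first has a single cluster of opinion while the second is split into two camps, which is the detail a single average hides.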

    #21 8 years ago

    My two cents: I don't think it will make a bit of difference. IPDB has a rating system and, though its ratings differ slightly, many of the games land in comparable positions. Whatever algorithm or rating method one chooses, I predict we would find nearly the same games in the same ranking.

    #22 8 years ago
    Quoted from L_satan:

    My two cents: I don't think it will make a bit of difference. IPDB has a rating system and, though its ratings differ slightly, many of the games land in comparable positions. Whatever algorithm or rating method one chooses, I predict we would find nearly the same games in the same ranking.

    I agree with this... The system is obviously imperfect, but the results - especially for games that get a high vote sample - tend to be pretty accurate... IPDB's list is indeed very close.

    As for the "Top 10" - I agree there is no real "#1".... And further, I'd argue there are close to 40 games that could sit in the Top 10, depending on what your preferences are.

    #23 8 years ago
    Quoted from lyonsden:

    If Robin would let us download the raw data that goes into these rankings, it would be neat to see what people can do for different ranking systems. Personally, I'd like to see the distribution of scores for a game. Being normal vs bi-modal would give a level of detail that the current system lacks.

    PM him, ask, see what he says. Perhaps he would be willing to offer an anonymized version for research.

    #24 8 years ago

    One well-written, objective review is worth a thousand ratings.

    #25 8 years ago
    Quoted from swampfire:

    One well-written, objective review is worth a thousand ratings.

    Totally Total!


    This page was printed from https://pinside.com/pinball/forum/topic/rating-system-analysis and we tried optimising it for printing. Some page elements may have been deliberately hidden.