
PCS/Reputation Judging

moriel

Record Breaker
Joined
Mar 18, 2015
That will not be reasonable unless we have the same judges at each event. Until then you have to accept that judges use discretion. Plain and simple, that is the nature and intent of judging. The judges, in essence, have replaced the "scoreboard".

Technically, if they are to follow the rules, the scores *should* be comparable between competitions even with different panels.
 

CanadianSkaterGuy

Record Breaker
Joined
Jan 25, 2013
Technically, if they are to follow the rules, the scores *should* be comparable between competitions even with different panels.

:laugh: That's a big if.

I still say the computer should drop the 2 lowest and 2 highest scores when it comes to GOE/PCS for a panel of 9 judges (and drop 1 low 1 high for a panel of 5 or 7). Yes, it will make some judges irrelevant (especially if one judge is harsher or more lenient than the others), but at least it will skew the mean GOE and PCS less severely.
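In code, that proposal is just a heavier-trimmed mean. A rough Python sketch (the trim widths follow the panel sizes suggested above; the marks are invented for illustration):

def trimmed_panel_mean(marks):
    # Drop the 2 highest and 2 lowest marks for a 9-judge panel,
    # or 1 high / 1 low for a panel of 5 or 7, then average the rest.
    trim = 2 if len(marks) >= 9 else 1
    kept = sorted(marks)[trim:-trim]
    return sum(kept) / len(kept)

# One inflated mark and one lowball simply never reach the average:
print(trimmed_panel_mean([-1, 0, 0, 1, 1, 1, 1, 2, 3]))  # 0.8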

Maybe if judges/federations realized that biased judging would be rendered useless they would be less inclined to have them skew scores.

Ending anonymous judging was supposed to get rid of judging bias. It hasn't done much. The recommendation not to give 10s with a fall? It hasn't done much either. The panel that's supposed to review judges who stray too far from the mean... any updates on that? Probably not much.

The only way to reduce judging bias is to reduce the power/influence of each individual judge, which means cancelling out more skewed marks, tracking which judges continually have their marks thrown out, and imposing repercussions for overscoring.
 
Joined
Jun 21, 2003
Here are a couple of pre-IJS free skates that haven't inspired a lot of fan analysis in the past, so even if you have seen them before you should be able to watch them with relatively fresh eyes and few preconceptions:

https://www.youtube.com/watch?v=72jCfd8UlRc
https://www.youtube.com/watch?v=pmuArdNNsVk

Choose one or more or all of the IJS program components as currently defined.

A prediction. No one will take you up on this, which is kind of interesting in its own right. :yes:

Here is my impression (I do not aspire to judge ;) ). Claudia Leistner: I liked the verve and ta-da of the first part, and I thought that she used the music well. The two slow parts in the middle were not very interesting to me. (Scott Hamilton mentioned that her "languid" style did not capture the judges' attention, and I agree.) IMHO she did not move with much grace or purpose in these sections. The last part was better, largely because of the footwork section, although not as compelling as the beginning.

Filipowski's performance held my attention throughout. He did not do much more than skate from one element to the next, but the total effect was of a coherent program. He skated to the character of the music and also hit the musical beats well. The "second mark" was greatly enhanced by the excellence of the jumps. The program built to a satisfying climax. The brief slow sections were largely throwaways, but OK.

(How'd I do? :) )
 

rabbit1234

On the Ice
Joined
Aug 17, 2017
That will not be reasonable unless we have the same judges at each event. Until then you have to accept that judges use discretion. Plain and simple, that is the nature and intent of judging. The judges, in essence, have replaced the "scoreboard".


If so, the GPF's selection criteria are also unfair, since total scores are compared across events.
Highest scores, score rankings, and personal bests are meaningless too.

To make judging fair, I think the ISU should set more detailed standards and minimize, as much as possible, the variance due to the judges' discretion.
In particular, it should be clear whether a fall affects PCS.
I think that ambiguity is the cause of disordered discussions like this one.
 
Joined
Jun 21, 2003
To make judging fair, I think the ISU should set more detailed standards and minimize, as much as possible, the variance due to the judges' discretion.

They're trying, they're trying! This was the whole intent of the CoP judging system in the first place. Every year the ISU adds a few more rules, comes up with some extra words to "clarify" the existing rules, tweaks this and that, holds more judges' training seminars, and revisits the role of the Judges' Oversight Committee.

Who knows? The ISU might even read Internet skating forums to get new ideas for improvements.
 

gkelly

Record Breaker
Joined
Jul 26, 2003
I don't exactly know what they discuss at those meetings, but if it's about how the judging, tech panel calls, etc. went at the competition, then I'd like to see videos of those meetings made public. Or even just transcripts of what was said, if they think it has to be anonymous.

I'd even be satisfied with the referee giving a summary of what was discussed, so judges could speak freely during the discussion. (E.g., if someone says "The ice coverage sucked," the referee could paraphrase it into more press-conference-friendly language.)

Are there any translations of her commentaries? I would love to see them if possible

Me too.

See Kolyada's SP at Rostelecom and at CoC. See Kostner's FS at Rostelecom and at NHK.

Thanks. I'll check them out later when I'm not at work.

That is not how it works (and I speak here as a statistician, not an FS fan).
Suppose that we have a way to measure exactly the real score for, let's say, SS and IN of a specific skater. Let's call those values SS_R and IN_R.

Now, we have a pool of trained judges, and each of them produces their own estimate of those values. Those are the scores the judges give. Of course, the score each judge gives will not necessarily be equal to SS_R and IN_R. The assumption the current skating system makes is that, if we had a very large number of judges, the average of the scores they give would be very close to SS_R and IN_R. Let's call this the estimated score, SS_E and IN_E.
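In code, the model that quote describes would look something like this (the "real" value SS_R and the judge noise are of course invented for the example):

import random

SS_R = 8.75   # hypothetical "real" skating skills value
SPREAD = 0.5  # hypothetical judge-to-judge error

def panel_average(n_judges):
    # each judge reports the real value plus their own random error
    scores = [random.gauss(SS_R, SPREAD) for _ in range(n_judges)]
    return sum(scores) / n_judges

for n in (9, 99, 9999):
    print(n, round(panel_average(n), 3))
# the estimate SS_E drifts toward SS_R as the panel grows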

There is no such thing as a "real score" that exists prior to human evaluation. There is only the consensus of multiple experts who each offer their own score based on their own perceptions of the actual skating and their understandings of the scoring guidelines.

There is no such thing as SS_R, only SS_E. Or maybe we should call it SS_C for consensus. We could compare each individual judge to the composite consensus of the panel. We could even get more experts, including non-judge skating experts such as coaches, commentators, or other retired competitors, and non-skating experts with significant expertise in evaluating other performing arts, to contribute to that consensus -- but it had better be representative of a full range of nationalities and other allegiances, not just representatives of the silver medalist federation or fans of that style of skating.

Which is why it's meaningless to talk about correct or incorrect scores. There is only better or worse scoring, and correct or incorrect, as well as better or worse, processes to arrive at those scores.
(We don't know exactly what any of the judges' thought processes were -- at least not without access to their shorthand notes and whatever they have to say for themselves in roundtable discussions -- so we are not in a position to say whether their process was incorrect. The referee at the roundtable would be better placed to pass judgment of that sort. And we can state that fans who offer "I liked it" or "I didn't like it" as their sole explanation for how they would have scored a program, or even "The jumps were all cheated so I marked down all the components," are not using the correct process. But if we're discussing with other fans, I'd hope that our responses could be more along the lines of "Well, I thought that..." or "well, according to the criteria for this component..." or "But what about..." rather than "You're wrong!")

TL;DR: Average *MUST* conserve the difference between 2 categories.

If you're talking about the math, I admit that is not my area of expertise. All I'll say is that if you have 9 judges, some of them give large gaps between their highest and lowest components for each performance, and the judges disagree with each other on which component was better or worse, then the gaps between the average scores will be smaller than the gaps on the individual wide-marking judges' cards.

So, I have a question. I understand that PC categories usually correlate, and you cannot really score 10 in transitions and 0 in skating skills. But is that accurate? Is it true that most skaters have their skills very evenly distributed across the PCS categories, all around the same level? Is it not possible for a skater to have SS 9 and IN 8, for example? Because, based on ACTUAL DATA, that seems like something VERY unlikely to happen.

I agree that it is impossible to deserve a 10 in transitions and a 0 in skating skills. (Speedskaters might be able to achieve close to the opposite.)

I do think that it is very possible to deserve 9 in SS and 8 in IN. We do see gaps of that size or larger sometimes from individual judges. It would probably be appropriate to see that more often than we do. But I think it will still be a minority of skaters, even if it should be a larger minority than it already is.

For any given performance, though, I can't speak to what the "real" score should be, because there is no such thing as a real score independent of the opinion of experts. I could speak to how I would have scored the various components, and you and other fans can add how you would score them. And we could get a panel of people off the street who know nothing about skating, hand them the PCS guidelines with no further explanation, and ask them to score as well.

Our consensus might be very similar or very different from the expert panel or the ignorant panel. I'd say that a consensus of expert opinions is better than a consensus of opinions without expertise. But I wouldn't say that there is a right answer or a correct or "real" score. A different panel might score differently. The same expert panel might score differently if you show them the same performances on video from a different angle.

A good scoring system should not be this debatable. For real, if people who have watched all FS competitions, including juniors, for years and know the rules by heart cannot come anywhere close to some agreement, that means, in the best case, that the variance of our measure (the score) is too high and depends on too many factors other than the actual performance. Everywhere outside FS, this means the measure is bad and should be replaced by something more accurate.


Technically, skaters get paid because people watch them.

Most skaters invest more money in their training than they ever receive in prize money or federation support. It's no longer a strictly amateur sport, but it's not a professional sport either. Most competitors lose money by participating in competitions.

Take gymnastics: you see two girls; one falls from the bars, the other doesn't. Does the score the judges give usually match your expectation that the first one will lose and the second one will win?

Yes, if both gymnasts are attempting similar skills with similar levels of expertise.
If I watched a competition with a wide range of skill levels in which the best gymnast fell from the bars at the end of a difficult and otherwise well-performed routine, and the worst gymnast did not fall but had an easier routine with a lower start value, and various form breaks and technical errors that I can't necessarily recognize, I would not be surprised for the stronger gymnast to score higher than the weak gymnast on that bar routine.

Over the course of a whole competition, a very strong gymnast could fall on bars and lose many points there compared to the next-best gymnasts in the event, but if s/he is strong enough at the other apparatuses s/he might make up enough points in all the other areas to counteract the deficit on bars.

With skating, there aren't other apparatuses. There are short and long programs. And there are other sets of scores for each program that measure different skills.

If one skater earns +2 and +3 on all her successful elements and -3 on the element with a fall, and another skater with similar base value earns 0 and +1 on all her elements, the former can come out ahead on TES despite the fall.

If two skaters have identical TES but one skater is in the "above average" to "good" range on components and another skater is "good" to "very good," the latter can lose points to a fall but still deservedly come out ahead on the strength of PCS.
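To make those two scenarios concrete, here is a sketch with invented numbers (GOE here stands for the already-factored points added to each element, and 1.6 is the ladies' free skate PCS factor):

# Scenario 1: big GOE can beat a fall on TES
base = 60.0
tes_a = base + 7 * 2.0 - 3.0   # strong GOE on 7 clean elements, one fall: 71.0
tes_b = base + 8 * 0.5         # clean skate but only 0/+1 GOE: 64.0

# Scenario 2: identical TES, "good" vs "very good" components
pcs_gap = (8.5 - 7.5) * 5 * 1.6  # 1.0 per component x 5 components x factor = 8.0
fall_deduction = 1.0             # a single fall costs far less than that gap

print(tes_a, tes_b, pcs_gap - fall_deduction)  # 71.0 64.0 7.0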

Those concepts should be easy enough to understand if explained.

The tricky part is getting fans to see the differences between good vs. very good, or getting judges to agree on them. There are multiple criteria that go into each number, and that means there will always be room for disagreement.

If you had judges score each criterion separately, that would cut down on a lot of the disagreement. But as long as much of what is being assessed is qualitative, I don't think it's possible to eliminate all subjectiveness.

However, people will still perceive those qualitative aspects even if they aren't captured by any scores. So if you end up using only TES, for example, there would still be outrage that the better skating or better performer didn't win.
 
Joined
Jun 21, 2003
That is not how it works (and I speak here as a statistician, not an FS fan).

Suppose that we have a way to measure exactly the real score for, let's say, SS and IN of a specific skater. Let's call those values SS_R and IN_R. ...

Strange as it seems, there is actually a (small) literature about the appropriateness of this sampling model in applications like figure skating judging. For instance, is the underlying population of all possible judges' scores (whose mean presumably is what we are calling the "real" (population) mean -- it is, as gkelly points out, hard to see what other meaning we can give to the "real" mean) -- is this population Gaussian, or even symmetric? Failing such assumptions as these, we are cast into the wacky world of non-parametric statistics.

This is especially evident when we consider that it is the trimmed mean that produces the final score. (CanadianSkaterGuy is even suggesting greater trimming.) It is not clear that one can do any better than bootstrapping methods. In particular, none of the formulas that have sigma/sqrt(n) in them obtain. It is interesting (and appropriate, I think) that the ISU uses a different measure of variation than the standard deviation in its Judges' Evaluation rules.
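For anyone who wants to try it, bootstrapping the trimmed mean takes only a few lines of Python (the panel of marks is invented):

import random

def trimmed_mean(scores, trim=2):
    kept = sorted(scores)[trim:-trim]
    return sum(kept) / len(kept)

panel = [7.0, 7.25, 7.5, 7.5, 7.75, 8.0, 8.0, 8.25, 9.0]  # made-up 9-judge panel

# Resample the panel with replacement and recompute the statistic each time;
# the spread of the results estimates its sampling variation,
# no sigma/sqrt(n) formula required.
boot = sorted(trimmed_mean([random.choice(panel) for _ in panel])
              for _ in range(10000))
print(boot[250], boot[9750])  # a rough 95% interval for the trimmed mean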

I have always maintained, not that anyone is interested, that ordinal judging, for all its faults, at least gave us some interesting mathematics to play with. Add up a column of numbers -- what's the fun of that? :)
 
Last edited:

moriel

Record Breaker
Joined
Mar 18, 2015
If you're talking about the math, I admit that is not my area of expertise. All I'll say is that if you have 9 judges, some of them give large gaps between their highest and lowest components for each performance, and the judges disagree with each other on which component was better or worse, then the gaps between the average scores will be smaller than the gaps on the individual wide-marking judges' cards.

So what happens is: 9 judges evaluate 1 component. Then, the differences average out.
If we have the judges evaluate 2 different components, the differences average out within each component, but not between them.
Let's take the extreme example of a speedskater: awful transitions and good skating skills.
Now, we get judges to evaluate that.

To make things simpler, let's have just 3 judges, but it can easily be extended to 9.

Transitions
Judge 1 will give him 0 in transitions
Judge 2 will give a 2 in transitions
Judge 3 will give him a 1 in transitions
Average: 1

SS
Judge 1: 9
Judge 2: 10 (this guy is biased, maybe some national bias)
Judge 3: 8
Average: 9

Difference between SS and TR is 8 points



Now, with a different panel, this would be evaluated as something like:
TR: 1 1 2, average 1.3
SS 9 9 8, average 8.7
Difference: 7.4

Now, we can just keep scoring this guy with different panels, but we will still keep getting averages close to 1 for TR and close to 9 for SS.
Because he has awful transitions and amazing skating skills.

Now, suppose that we do the same with a panel of 9 judges.
TR: 0 2 1 0 2 2 1 1 1, average 1.1
SS: 8 10 8 8 9 10 9 8 9, average 8.8
Difference: 7.7

We can even come up with an arbitrary panel of, dunno, 999 judges just for fun:
TR: average 1.01
SS: average 9.00
Difference: 7.99

Notice that the average cancels the differences between scores inside each category: one judge gives a 2, another gives a 0, and it averages to 1.
But it does not cancel the difference BETWEEN DIFFERENT CATEGORIES. Why? Because if a skater has SS stronger than TR, the judges will tend to score his SS higher.
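The same point as a toy simulation (the "true" values and the noise spread are assumptions of the toy model, not real data):

import random

TR_TRUE, SS_TRUE = 1.0, 9.0  # our hypothetical speedskater
SPREAD = 1.0                 # judge-to-judge disagreement within a category

def panel_mean(true_value, n):
    scores = [random.gauss(true_value, SPREAD) for _ in range(n)]
    return sum(scores) / n

for n in (3, 9, 999):
    gap = panel_mean(SS_TRUE, n) - panel_mean(TR_TRUE, n)
    print(n, round(gap, 2))
# within-category noise averages away as n grows,
# but the SS-vs-TR gap stays near 8.0 at every panel size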

If you're talking about the math, I admit that is not my area of expertise. All I'll say is that if you have 9 judges, some of them give large gaps between their highest and lowest components for each performance, and the judges disagree with each other on which component was better or worse, then the gaps between the average scores will be smaller than the gaps on the individual wide-marking judges' cards.
So, the thing is: suppose a skater has better SS than TR. Don't you think that the judges will tend to AGREE that this skater's SS is higher than his TR, and it will NOT CANCEL OUT?

This is the thing.
The average "removes" the individual preferences of each judge, by averaging it out with individual preferences of other judges.
So just the trend is left.
So it is expected that differences between averages will represent better the differences in skill across different PCs categories than the differences of the individual scores.

The fact that those differences average out instead means that either:
1. skaters have very small variance of skill across the different PCS categories (a skater is VERY UNLIKELY to have SS at level 9 and IN at level 8, for example), or
2. the scores do not represent the actual skills the skater has.
 
Joined
Jun 21, 2003
Let's take the extreme example of a speedskater: awful transitions and good skating skills.

Now, we get judges to evaluate that.

I think that is the problem, from a purely statistical point of view. You are assuming some sort of God-like determination, or Truth with a capital T, in the statement that the skater has awful transitions and good skating skills. Absent judging, there is no basis for making this assumption.

It is not like the situation where an iron rod has a "real" length and we look at a sample of measurements of this length. It is more like quantum physics, where the only thing that is real is the interaction between the measuring device and the thing being measured.

(That's what I think, anyway. ;) )
 

YesWay

Taking four whole years…
Record Breaker
Joined
Sep 28, 2013
Are there any translations of her commentaries? I would love to see them if possible

I'm afraid I don't know of any.

They are shown on the JSports4 channel. This is available in Japan of course, but also via a fujitv.live subscription outside of Japan. (That's how we watch, since our Japan-based Slingbox died.) I have the privilege of on-the-fly translations by my wife, but I'm afraid we're not really inclined to transcribe and post translations online - it would take quite some time and effort o_O

Maybe someone else with access to that channel would be willing and able?
 

gkelly

Record Breaker
Joined
Jul 26, 2003
So, the thing is: suppose a skater has better SS than TR. Don't you think that the judges will tend to AGREE that this skater's SS is higher than his TR, and it will NOT CANCEL OUT?

Yes, if all judges agree SS is significantly better than TR, the TR score will always be lower than the SS score, and the size of the gap will be the average of the individual judges' gaps.
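That is just the linearity of the average:

average(SS) - average(TR) = average(SS - TR)

The gap between the averaged components equals the average of the per-judge gaps, signs included. So if one judge's gap is -1.0 and another's is +1.0, the averaged gap is 0 -- which is exactly what the example below shows.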

But let's take a different example.

Say we have a figure skater who is much slower than the speedskater, with average speed and edge depth and balance and precision. But she does almost no crossovers: instead her skating is full of three turns and mohawks and choctaws in all directions, and rockers and counters and brackets and loops and twizzles, and cross rolls and power pulls, and also a few toe steps.

And she also has some beautiful spirals and split jumps and spread eagles.

But she tends to lose speed and fluidity when setting up her jumps and uses very simple entries and exits to jumps and spins.

Judge A focuses on the power and edges and gives this skater a 5.0 for Skating Skills. He is impressed by the great variety of the steps and turns and believes those should best be rewarded under Transitions, along with the high-quality highlight moves. So he gives 6.0 for Transitions.

Judge B focuses on the one-foot skating and multidirectional skating criteria of the SS component, so she gives 5.5 for this component. She acknowledges the variety and difficulty of the transitions between elements and the quality of the positions, but that spiral edge was pretty shallow, and Judge B really doesn't like the telegraphed jumps or the fact that none of the transitional moves were directly connected to elements. So she gives 4.5 for TR.

The referee had encouraged all the judges to spread their marks. So these two judges both gave a 1.0 scoring difference between these two components -- but in different directions, because they disagreed on what to reward in which component.

So if we just take the average of these two judges, the skater ends up with 5.25 for both components.

Maybe judge C makes more of an effort to balance all the criteria of each component and ends up with 5.25 for SS and 5.25 again, or maybe 5.0, for TR, ignoring the ref's encouragement to spread the marks or believing it was not appropriate to do so for these marks for this performance.

So which judge is right and which is wrong? Or are they all doing their jobs correctly, but somewhat differently?

You personally might agree more with Judge A or Judge B or Judge C. But that doesn't make your assessment closer to the "real" score -- there is no real score apart from the assessment of a panel. A situation like this might make for fruitful discussion in the judges' room and on message boards.

It's hard to imagine a network TV commentator delving into the finer details of the various component criteria, though. Even if the commentator has the knowledge and the analytical mindset to do so, the producer will encourage them to dumb down the commentary to the lowest common denominator of viewers.
 

moriel

Record Breaker
Joined
Mar 18, 2015
Yes, if all judges agree SS is significantly better than TR, the TR score will always be lower than the SS score, and the size of the gap will be the average of the individual judges' gaps.

But let's take a different example.

Say we have a figure skater who is much slower than the speedskater, with average speed and edge depth and balance and precision. But she does almost no crossovers: instead her skating is full of three turns and mohawks and choctaws in all directions, and rockers and counters and brackets and loops and twizzles, and cross rolls and power pulls, and also a few toe steps.

And she also has some beautiful spirals and split jumps and spread eagles.

But she tends to lose speed and fluidity when setting up her jumps and uses very simple entries and exits to jumps and spins.

Judge A focuses on the power and edges and gives this skater a 5.0 for Skating Skills. He is impressed by the great variety of the steps and turns and believes those should best be rewarded under Transitions, along with the high-quality highlight moves. So he gives 6.0 for Transitions.

Judge B focuses on the one-foot skating and multidirectional skating criteria of the SS component, so she gives 5.5 for this component. She acknowledges the variety and difficulty of the transitions between elements and the quality of the positions, but that spiral edge was pretty shallow, and Judge B really doesn't like the telegraphed jumps or the fact that none of the transitional moves were directly connected to elements. So she gives 4.5 for TR.

The referee had encouraged all the judges to spread their marks. So these two judges both gave a 1.0 scoring difference between these two components -- but in different directions, because they disagreed on what to reward in which component.

So if we just take the average of these two judges, the skater ends up with 5.25 for both components.

Maybe judge C makes more of an effort to balance all the criteria of each component and ends up with 5.25 for SS and 5.25 again, or maybe 5.0, for TR, ignoring the ref's encouragement to spread the marks or believing it was not appropriate to do so for these marks for this performance.

So which judge is right and which is wrong? Or are they all doing their jobs correctly, but somewhat differently?

You personally might agree more with Judge A or Judge B or Judge C. But that doesn't make your assessment closer to the "real" score -- there is no real score apart from the assessment of a panel. A situation like this might make for fruitful discussion in the judges' room and on message boards.

It's hard to imagine a network TV commentator delving into the finer details of the various component criteria, though. Even if the commentator has the knowledge and the analytical mindset to do so, the producer will encourage them to dumb down the commentary to the lowest common denominator of viewers.

In your case, what happens is: there is no real difference, but there is the judges' personal bias, which is removed by the average, basically.
Indeed, different judges may give different weights to different things and, based on that, produce different scores. This is why we take an average.

But, for example, do you agree that Kostner should receive higher SS scores than, dunno, Anastasiya Galustyan?
This difference is not a matter of a judge's opinion, or preferences, or whatever they consider more important. It is a real difference that is there and does not depend on the judges.

My point here is: looking at the scores of real skaters, they ALL seem to have a VERY small difference between all the PCS categories. This difference is usually under 0.5, and pretty much always under 1.0 (counting the averaged differences, which better represent the real difference, as there is less individual bias).
Now, is that really true? Looking at the skaters in most events, that's not what I see. Some skaters are really strong in PE and IN, and visibly weaker in SS, and so on.

I feel the real differences are larger than what the scores show us, because the scores are just too even compared to the differences we see between skaters.
 

moriel

Record Breaker
Joined
Mar 18, 2015
I think that is the problem, from a purely statistical point of view. You are assuming some sort of God-like determination, or Truth with a capital T, in the statement that the skater has awful transitions and good skating skills. Absent judging, there is no basis for making this assumption.

It is not like the situation where an iron rod has a "real" length and we look at a sample of measurements of this length. It is more like quantum physics, where the only thing that is real is the interaction between the measuring device and the thing being measured.

(That's what I think, anyway. ;) )

From a statistical point of view, if our distribution has no mean, all the current judging is totally pointless, and we should just shrug and go do something else =)
You know, no central limit theorem and so on, which means taking averages makes no sense, and so on.

Also, given one competition, we expect the measuring device to interact the same way with all the things being measured. So if skater A and skater B have exactly the same level of SS, they will get very close SS scores. If they don't, the measuring device does not satisfy our need to rank the things.

At any rate, quantum physics still does not explain the absence of differences between the scores for the different PCS categories.
Because when we look out there, there are quite a few skaters who can show, for example, very good IN and PE but average SS and TR, and things like that. And this does not seem to translate into the scores.
 
Joined
Jun 21, 2003
My point here is: looking at the scores of real skaters, they ALL seem to have a VERY small difference between all the PCS categories. This difference is usually under 0.5, and pretty much always under 1.0.

First, to me, the appropriate conclusion to draw from this is not that PCS should be eliminated, but that they should be combined. There is no point in having a performance score and a composition score and a musical interpretation score if that just means that the judges write down the same score three times. You could combine them into one score, while still listing all the criteria and bullet points that the judges are supposed to be paying attention to.

What I have found is that TR is usually half a point lower than SS. I always figured that this was because it is easier for judges actually to count the transitions and evaluate their variety at least, if not quality -- so they are more confident in giving a lower score.

The other thing that I have noticed is that for the top skaters all of the component scores are about the same. But this often becomes less and less true as you work your way down. At the lower levels it is easier to find examples of a skater who gets high marks in SS and considerably lower marks in Presentation, or vice versa. For the very top skaters, I for one am not surprised that the same competitors who have the strongest skating skills, for instance, also get good marks for turning those skills to the service of musical interpretation, etc.
 
Joined
Jun 21, 2003
From a statistical point of view, if our distribution has no mean, all the current judging is totally pointless, and we should just shrug and go do something else =)

That is the question. What is the distribution that we are attempting to study?

You know, no central limit theorem and so on, which means taking averages makes no sense, and so on.

Indeed. ;)
 
Last edited:

gkelly

Record Breaker
Joined
Jul 26, 2003
I think a big part of the problem is:

A skating program consists of hundreds, maybe thousands, of relevant details.

A casual viewer will only notice a handful.

An educated fan, or a less experienced judge, might notice dozens.

The best judges will see hundreds of details.

It's not difficult for an experienced judge to see three different relevant details in a single second of program watching time. But it would take more than one second to make shorthand notes on paper about all those details, let alone to record them all in ways that can be captured by computer and thereby conveyed to the skaters and fans and other interested parties.

So judges take mental note of everything that they see and synthesize those observations into categories that can be translated into a manageable number of scores.

By the end of the program, they may have forgotten some of the specific details that they noticed, but they do remember the synthesized impression that all those momentary impressions about skating skill or musical interpretation added up to in their minds while they were watching.

Under 6.0 judging, judges had to boil down all those impressions into 2 scores per program. In IJS, it's up to 18 per program. (Plus there are other details that the technical panel and the referee are responsible for that judges don't need to worry about.)

That gives more information than 2 scores, but still much less than the actual number of data points the judges noticed while watching, maybe much less than they would remember if you asked them to tell you everything they could immediately after the program ended, with reference to whatever written notes they took.

Because there are multiple data points/impressions going into each score, and sometimes there will be good and bad, or good and mediocre or mediocre and bad, aspects that are all part of the same score, there is always going to be room for disagreement.

The scores aren't binary decisions like "Did the ball go into the goal?" (yes = 1, no = 0).

It's not even "Did the skater land the jump on one foot?" How the skater got there and how the jump was landed also count.

Soccer fans, or refs, can see how the ball got to the goal and how it went in. They might have opinions (and likely often different opinions) about the quality of the ball handling. The experts see the details, but those details usually don't matter to the scoring. Soccer isn't a judged sport; there aren't scores for quality, only for getting the ball into the goal. In that kind of sport, there's much less room for disagreement.

Still, there will sometimes be gray areas that inspire fans or players to dispute the referee's calls. Just imagine how much more contentious soccer scoring would be if refs or judges could award extra points for elegant plays and deductions for sloppy ones, etc.
 

moriel

Record Breaker
Joined
Mar 18, 2015
That is the question. What is the distribution that we are attempting to study?



Indeed. ;)

As long as it has a mean, we are just fine doing nonparametric stuff ;)
Now, if it's a Cauchy, then we should definitely use a different approach.
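For anyone curious, the Cauchy pathology takes two lines to see (using the fact that the ratio of two independent standard normals is standard Cauchy):

import random

# A Cauchy distribution has no mean, so the average of n draws is itself
# Cauchy: averaging more judges would buy exactly nothing.
for n in (10, 1000, 100000):
    draws = [random.gauss(0, 1) / random.gauss(0, 1) for _ in range(n)]
    print(n, sum(draws) / n)  # does not settle down as n grows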
 
Last edited by a moderator: