Judging math details | Page 7 | Golden Skate

moriel

Record Breaker
Joined
Mar 18, 2015
You weren't listening closely. I said that in a skater's own range of possible PCS, which is around 2 points, a 0.5 difference in scores is big enough. That range can change through the years, and with changes to the programmes or to the skater's own skating, but at any given point in time when the skater competes it is more or less defined. Every skater knows (from feedback) which PCS marks to expect. He/she can't get 5 in one category and 9 in another, or 5 in one competition and 9 in another, because he/she knows that the skills shown in a programme skated again and again are worth, for example, from 7.5 to, let's say, 9 at his/her best, and may drop to a minimum of 6.5 if he/she skates really badly. So, within the range where an individual skater could possibly score, a 0.5 difference between categories, or between one score in a category and his/her own potential maximum, is big. That 0.4 difference is significant for that individual skater in any possible universe, including the statistical one, because the sample of scores for an individual skater spans not a 10 point range but a 2 point range.

"in skaters own range of possible PCS he/she may get"
Ok, so we have 9 judges giving a skater scores in this 2 point range. The thing is, when we take the trimmed mean, this 0.5 point difference is not really relevant, because it depends on the judges' behavior. With a different judging panel, you may get the very same difference just because the panel changed.

For example, I did a simulation for a range 7 to 9 and got those 2 scores on my first 2 attempts: 7.11 and 8. There is a large difference. But it is not because anything about the skater changed.
 
Joined
Jun 21, 2003
Do you mean that you generated scores at random in the range 7 to 9?

By the way, the trimmed mean obtained by leaving off the lowest and highest mark is sometimes called the "Olympic mean" because of its use in judged Olympic sports. It is also widely used in management science.

https://en.wikipedia.org/wiki/Truncated_mean

Estimating the standard deviation of the sample trimmed means is, in general, a hopeless task. In our case, though, the following method is not bad. Discard the highest and lowest, and then replace them with a duplicate of the lowest remaining score and the highest remaining score. Take the mean of the resulting 9 numbers. Don't forget n=9, not 7. This is called the Winsorized mean. Now compute the (sample) standard deviation of this new sample of size 9. (n-1 = 8). Now apply the usual formula for the standard error (SE) of the sampling mean: SE = s/sqrt(n). Now we have a 68% confidence interval about the listed trimmed mean (use twice the standard error if we want a 95% confidence interval).

estimate for true mean = trimmed mean from protocol +/- SE.

Do this for each component score. Now we can see the degree to which the number on the protocol is significant, whether the score for one skater is significantly different from another skater for that component, etc. Or if the contest is "too close to call" by the IJS.
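
The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nine marks in `panel` are invented, and `trimmed_mean`, `winsorize` and `winsorized_se` are hypothetical helper names, not anything the ISU publishes.

```python
import math
import statistics


def trimmed_mean(marks):
    """Mean after dropping the single highest and single lowest mark."""
    ordered = sorted(marks)
    return statistics.mean(ordered[1:-1])


def winsorize(marks):
    """Replace the lowest and highest mark with their nearest neighbours."""
    ordered = sorted(marks)
    return [ordered[1]] + ordered[1:-1] + [ordered[-2]]


def winsorized_se(marks):
    """Standard error from the Winsorized sample: s / sqrt(n), n = 9 here."""
    sample = winsorize(marks)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return s / math.sqrt(len(sample))


# Hypothetical panel of nine marks for one component (invented for illustration).
panel = [7.75, 8.0, 8.0, 8.25, 8.25, 8.5, 8.5, 8.75, 9.25]

tm = trimmed_mean(panel)
se = winsorized_se(panel)
print(f"trimmed mean  = {tm:.2f}")
print(f"68% interval  = {tm - se:.2f} to {tm + se:.2f}")
print(f"95% interval  = {tm - 2 * se:.2f} to {tm + 2 * se:.2f}")
```

Two skaters whose intervals for a component overlap heavily would be the "too close to call" case.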
 

moriel

Record Breaker
Joined
Mar 18, 2015
Nah, I used normal variation with a trimmed mean (7 out of 9).
Ofc not a good example, but still fun.
We could probably try accessing the actual distribution and parameters for that, but it sounds kinda pointless really.
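
A rough sketch of that kind of simulation, for anyone who wants to try it. It is not a reconstruction of the run that gave 7.11 and 8: the centre of 8.0, the standard deviation of 0.5, the clipping to the 7 to 9 range, the rounding to 0.25 steps and the seed are all assumptions made up for this example.

```python
import random
import statistics


def round_to_quarter(x):
    """Snap a raw value to the nearest 0.25, the step judges mark in."""
    return round(x * 4) / 4


def simulate_panel(rng, true_level=8.0, spread=0.5, low=7.0, high=9.0, judges=9):
    """Draw one panel of marks: normal noise around true_level, kept in [low, high]."""
    marks = []
    for _ in range(judges):
        raw = rng.gauss(true_level, spread)
        clipped = min(max(raw, low), high)
        marks.append(round_to_quarter(clipped))
    return marks


def trimmed_mean(marks):
    """Drop the highest and lowest mark, average the remaining 7, round to 0.01."""
    ordered = sorted(marks)
    return round(statistics.mean(ordered[1:-1]), 2)


rng = random.Random(2015)  # fixed seed so the run is repeatable
results = [trimmed_mean(simulate_panel(rng)) for _ in range(10)]
print(results)
print("spread between panels:", round(max(results) - min(results), 2))
```

The skater is identical in every run; only the simulated panel changes, so any spread in `results` comes from the judges alone.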
 

Miller

Final Flight
Joined
Dec 29, 2016
That's a good point. However, I still think that it is tricking people to say that Zhenya got 9 and Caro got 8.75, but this does not mean that Zhenya exhibited better skating skills. The average person would think that it does mean that Zhenya skated better: 9.00 beats 8.75.

On a separate topic, if we look at the criteria for the five components

http://www.usfigureskating.org/content/ISU program-component-chart_sandp-and-id_08-16.pdf

it makes me wonder why Composition and Interpretation are two separate categories.

Composition: Phrase and form (movements and parts of the program to match the musical phrasing)

Interpretation: Expression of the music's character, feeling and rhythm, when clearly identifiable.

And so on. Actually this criterion under Interpretation is kind of funny. The ISU seems to be saying, most skating music does not have any identifiable character, feeling or rhythm -- but if it does, the skater should respect it.

There is another problem with the Composition (Choreography) component. The actual choreography is done by someone else. A criterion like "multidimensional use of space and design of movements" is the choreographer's work, not the skater's. What the skater should get credit for is being able (thanks to her Skating Skills and talent for Interpreting Music) to perform complex choreography and fully to commit to the music while doing so. I do not really see the value in having a separate Composition component.

My final thought on my proposed 3 part PCS system - SS, TR + combined PE/CO/IN component.

You are dead right re this, the final one I would call 'Performance and Interpretation' with nothing at all from Composition, unless there's something that has to be in there, perhaps the judges recognise this by making sure CO and IN are so similar.

Hence final system = SS 25%, TR 25%, PE/IN 50% (you could actually have 4 components but that would defeat the object of having a simplified 3 part one). Also, dropping CO would have a very similar effect to when I recalculated the ladies LP with a merged 3 component one: I got a swing of 0.30 in PCS compared with 0.28 above, and Dabin Choi and Elena Radionova (real commitment to choreography/relatively high IN score) were now the outliers.
 

Baron Vladimir

Record Breaker
Joined
Dec 18, 2014
The problem is that if we merge PE/IN/CO into one mark, that mark will cover too many concepts and requirements, so a skater can achieve the same score in too many different ways. That mark will then not have much meaning for the skater, because it would change only within a smaller range between competitions. We will see more differences in a skater's PCS, and give better feedback, if we have more categories, not fewer; for example, if every single requirement in every category gets its own mark independently. For example, when we judge the CO criteria for 2 skaters, the scores could be:
Purpose (idea, concept, vision, mood): 7.25 for one skater and 8.25 for the other
Pattern/ice coverage: 8.50 and 7.25
Multidimensional use of space and design of movements: 8.25 and 7.50
Phrase and form (movements & parts of the program to match the musical phrasing): 7.75 and 8.25
Originality of the composition: 7.25 and 8.00
You see, the overall mark for CO will probably be 7.75 for both skaters. So the same skater, or some other skater, can get the same mark by fulfilling those requirements differently. If we put all the requirements of the 3 categories (so 15 of them) into one score, those scores will probably be even more alike for a skater across different competitions, and across many different kinds of skaters. So the problem may in fact be that the new mark (composed of CO/PE/IN) will say even less in the end about a skater's PCS.
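
The arithmetic behind that 7.75 can be checked quickly. The sketch below assumes the overall CO mark is the plain average of the five criterion marks, snapped to the nearest 0.25; that rule is an assumption for illustration, since judges do not literally compute the mark this way.

```python
import statistics

# Criterion marks from the example, in the order listed:
# Purpose, Pattern/ice coverage, Multidimensional use of space,
# Phrase and form, Originality of the composition.
skater_1 = [7.25, 8.50, 8.25, 7.75, 7.25]
skater_2 = [8.25, 7.25, 7.50, 8.25, 8.00]


def overall_mark(marks):
    """Average the criterion marks, then snap to the nearest 0.25 (assumed rule)."""
    return round(statistics.mean(marks) * 4) / 4


mean_1 = statistics.mean(skater_1)
mean_2 = statistics.mean(skater_2)
print(f"skater 1: average {mean_1:.2f}, overall mark {overall_mark(skater_1):.2f}")
print(f"skater 2: average {mean_2:.2f}, overall mark {overall_mark(skater_2):.2f}")
```

Two quite different profiles collapse to the same single mark, which is the loss of information being described.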
 

Miller

Final Flight
Joined
Dec 29, 2016
I think the point about dropping CO is that it is so similar to IN: across 40 skaters/couples at CoC it was virtually identical. Plus, as above, CO is often down to what choreographer you can afford. Hence with PE and IN still remaining (albeit as one component) you are actually increasing the chances of the skater getting feedback on the things that are down to them - the judges could always provide feedback to the original choreographer if they felt that was necessary.

Also, you could actually still have 4 components (SS, TR, PE, IN); my personal preference would be to merge, for simplicity. Whether the judges can really split the individual components down as you say under competition conditions I don't know, but I would have thought general feedback on a merged component, i.e. plus or minus on what they actually scored, should still be possible.
 

moriel

Record Breaker
Joined
Mar 18, 2015
The mark already has no significant meaning.
There is no meaningful difference between getting 8 and 8.25 for example.
 

Baron Vladimir

Record Breaker
Joined
Dec 18, 2014
I understand your point. But we also saw that for some skaters CO is most similar to SS, and only for some to IN... Let's say we put some CO criteria, such as pattern, ice coverage and use of space, into SS; Purpose and Originality into PE; and phrase/form/design of movements into IN. So we get 4 components in the end (I see that as valid, I agree)... BUT the possible problem is that skaters wouldn't know why they got that 8.50 mark: because they were skating to the rhythm (good timing), or because of their movements. From the skater's point of view it is better to have more categories rather than to merge them, because they get more information about their performance.
 

Miller

Final Flight
Joined
Dec 29, 2016
The mark already has no significant meaning.
There is no meaningful difference between getting 8 and 8.25 for example.

Yes, it's only 0.4 marks in a Ladies LP (0.25 x 1.6). It's only worth doing something if, by merging/simplifying, you actually end up with something different from what you get now - the max spread I got across all components for one competitor was 1.00, the next highest 0.67, and for all dance and pairs teams it was less than 0.50.
 

Miller

Final Flight
Joined
Dec 29, 2016
I understand your point. But we also saw that for some skaters CO is most similar to SS, and only for some to IN... Let's say we put some CO criteria, such as pattern, ice coverage and use of space, into SS; Purpose and Originality into PE; and phrase/form/design of movements into IN. So we get 4 components in the end (I see that as valid, I agree)... BUT the possible problem is that skaters wouldn't know why they got that 8.50 mark: because they were skating to the rhythm (good timing), or because of their movements. From the skater's point of view it is better to have more categories rather than to merge them, because they get more information about their performance.

Yes, it all depends on your point of view as to what the numbers represent and what they are used for. My personal preference is for fewer categories, which IMO might lead to more accurate figures: with 5 numbers I think you get a lot of plus-or-minus-ing from a base mark, whereas with 3 they might, IMO, be more accurate. However, as you say, you might have less scope for the individual breakdown. I guess it boils down to whether you think the numbers as keyed now are fully accurate.
 

Baron Vladimir

Record Breaker
Joined
Dec 18, 2014
Well, the point of the numbers in PCS is the individual breakdown. You give numbers in PCS by judging every individual skater on a scale from 1 to 10. If one skater fulfills 3 requirements of a component with good quality and 2 with very good quality, he/she will get 6.50; if another fulfills 2 requirements well, 2 very well and 1 excellently, he/she will get 7.00, etc.
 
Joined
Jun 21, 2003
Well, the point of the numbers in PCS is the individual breakdown.

Well, that is exactly the issue that is in question. Is this really the point of numbers in PCS? Or is the primary point of all these numbers to determine the winner of the competition?

It is nice to give feedback to the skaters. But Moriel and others are saying that the current system is on shaky ground when it claims to provide an accurate determination of the true winner of a close contest.
 

Baron Vladimir

Record Breaker
Joined
Dec 18, 2014
In the same way, teachers determine who is 'the best' student in the class, employers determine who would be 'the best' for a particular job, etc. Whoever meets the requirements better in a defined competition is the winner of that competition. That's the philosophy behind it... But the winner sometimes is not the 'best player', or the 'best team', just the one who played that competition better, or the one who scored more ... If competitions were there just to determine a 'true winner', there would be no point in holding them. We already know Bolt is the fastest man in the world, or that Brazilians play football the best, etc. If they were supposed to win every competition, there would be no need for competitions...
 

gkelly

Record Breaker
Joined
Jul 26, 2003
My final thought on my proposed 3 part PCS system - SS, TR + combined PE/CO/IN component.

You are dead right re this, the final one I would call 'Performance and Interpretation' with nothing at all from Composition, unless there's something that has to be in there, perhaps the judges recognise this by making sure CO and IN are so similar.

The current Composition criteria are
*Purpose (idea, concept, vision, mood)
*Pattern/ice coverage
*Multidimensional use of space and design of movements
*Phrase and form (movements & parts of the program to match the musical phrasing)
*Originality of the composition

You could move Pattern/ice coverage to the technical/Skating Skills component and most of the others to a redefined Performance/Interpretation component, so you don't lose them altogether.

People would have to stop complaining about backloaded (or frontloaded) programs being bad choreography by definition that must be punished, if there is no component for considering choreography in terms of program layout.

If the combined components are worth approximately 25% instead of 10% each compared to the total score including TES (e.g., with factors of 2.5 instead of 1.0 for the men's short program, assuming there isn't a rejiggering of the men's factors to account for the outsized TES values with multiple difficult quads), then let's give the judges 0.1 instead of 0.25 increments to distinguish performances that are similar but not identical in their mastery of all the criteria for these combined components.

If the factor is 2.5 for a men's SP and 5.0 for a men's freeskate, then what is the smallest difference in score judges should be able to give between, say, Good and Very Good? If two performances, or ten performances in the same large event, are all better than just Good at Performance/Interpretation, and neither reaches the judge's standard for Very Good, how much flexibility should a judge have available to distinguish them? If there were no decimal places, the only choices would be to round some down to 7.0 and some up to 8.0 with large point gaps between them after factoring so that small differences result in scoring differences equal to a triple jump, or else giving all these performances the exact same P/I score and erasing the real differences each judge sees between them.

Hence final system = SS 25%, TR 25%, PE/IN 50% (you could actually have 4 components but that would defeat the object of having a simplified 3 part one).

You could also combine SS with TR and include the ice coverage and pattern criterion from Composition there.
But if you combine and raise the factor for a combined S/T component, the same consideration about factors and decimal places would apply.

I think the point about dropping CO is that it is so similar to IN, across 40 skaters/couples at CoC it was virtually identical,

Before drawing conclusions and prescribing global solutions that would affect all skaters in all competitions, at least all senior and junior competitions, I would urge you to analyze scores from whole large events including Worlds, Euros/4Cs, Junior Worlds, as well as JGPs and senior B events, and maybe national championships, maybe even sectional qualifiers, in large skating countries with diverse senior and junior-level populations.

It would be a lot more effort to analyze all that extra data especially if you're doing it manually. But it would give a much better idea of the scope of the "problem" that you're trying to solve and how your proposed solution would affect all the

It may be that Grand Prix scores are the least likely to show the variation you're looking for (both because of the narrow range of participants and because judges at that level may be less willing to spread their marks, for whatever reason). If so, prescribing a solution based on such a small population could be entirely inappropriate for other populations the same rules apply to.
 

Miller

Final Flight
Joined
Dec 29, 2016
I didn't quite follow your point about 0.1 increments and factoring. I would still be proposing 0.25 increments in each component, even the combined one, but see below. Each 0.25 in the final component average in the LP equates to 0.3 marks for the Free Dance, 0.4 for the Ladies LP and 0.5 for the Men's. However, the final result isn't reported to 0.25, it's to 0.01. Each individual judge is slightly different. If you take 7 counting judges, each 0.25 increment per component from one judge is, to a round figure, 0.04 for the FD (0.3/7), 0.06 for Ladies and 0.07 for Men.

Hence if you increase the factoring from 1.2/1.6/2.0 to 3.0/4.0/5.0 for the combined component for the FD/Ladies LP/Men's LP, each 0.25 increment from one judge would be worth 0.11/0.14/0.18. That is about the same as a +1 GOE on a normal triple (0.7/7) or on a TA or a quad (1.0/7). The 0.18 would be more, but on only one program of one discipline, and the judges are obviously fine with what each +1 GOE allows them to do. The other way to think about it is that each 0.25 in the final average would be worth 0.75 for the FD, 1.00 for the Ladies LP and 1.25 for the Men's LP; then divide by 7 to get the points per judge for a single 0.25 increment.

However this is with 0.25 increments, 40 in total per component. If the points per increment are too high you could go to a 0.2 increment i.e. 50 in 10 marks. This would reduce the 0.18 to 0.14 for a 0.20 increment for a Men's LP. Going to 0.1 i.e. effectively marks out of 100 would surely be too far.
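
For anyone who wants to check these figures, here is a small sketch of the arithmetic. It assumes 7 counting judges and that one judge moving one component mark by one step changes the factored score by step x factor / 7; the function name `points_per_judge_step` is made up for this example.

```python
def points_per_judge_step(step, factor, counting_judges=7):
    """Change in the factored score when one counting judge moves a mark by one step."""
    return step * factor / counting_judges


current = {"FD": 1.2, "Ladies LP": 1.6, "Men's LP": 2.0}
proposed = {"FD": 3.0, "Ladies LP": 4.0, "Men's LP": 5.0}  # combined component

for segment in current:
    now = points_per_judge_step(0.25, current[segment])
    merged = points_per_judge_step(0.25, proposed[segment])
    print(f"{segment}: {now:.2f} now, {merged:.2f} with the combined component")

# Effect of a finer 0.20 step on the Men's LP combined component.
print(f"Men's LP with 0.20 steps: {points_per_judge_step(0.20, 5.0):.2f}")

# For comparison: a +1 GOE from one judge on a triple (0.7) or a quad (1.0).
print(f"+1 GOE triple: {0.7 / 7:.2f}, +1 GOE quad: {1.0 / 7:.2f}")
```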

Fair point re other events, but the other point about CO was that it also reflects the skater's ability to hire a good choreographer, and not all would be able to pay. I was also trying to stress that the system needs to score the things that the skater has control over, not the skater's ability to pay. I did say that some aspects of Composition could be kept if they had to be, and you yourself have already come up with ideas for how to do it. I don't know how many of these are down to the skater and how many to the choreographer, but if they are the former then the more the merrier, and by including them you are effectively keeping CO, or parts of it, in there.

As you say you also need to make sure you don't lose the fine level of scoring detail by combining the components, but hopefully I have shown that even with the combined one the increase in marks per judge for 0.25 of component is only equal to around that of a +1 GOE for a triple or a quad in the worst case scenario, and even then you could increase the level of detail by going to a 0.2 increase for a component.
 
Joined
Jun 21, 2003
But the winner sometimes is not the 'best player', or the 'best team', just the one who played that competition better, or the one who scored more ...

No one disputes this. It is certainly possible for someone to nip Usain Bolt at the finish line, even though Bolt is the greatest of all time. Under ordinal judging Michelle Kwan lost to Tara Lipinski and to Sarah Hughes at the Olympics -- no numbers involved.

The question is this: How can we tell whether the sprinter really nipped Usain Bolt at the finish or not? In the case of a foot race, the answer is clear -- all you need is a stopwatch and a camera. What is not so clear is whether all the numbers carried out to the second decimal place provide an adequate stopwatch in a figure skating competition.

Consider this situation. There are three judges (after trimming) and two skaters. The scores in Composition are

Skater A: 8.25, 8.5, 8.75 (Average 8.50)
Skater B: 8.75, 8.5, 7.75 (Average 8.33)

Can we be confident from the two numbers 8.50 and 8.33 that Skater A really fulfilled 8 bullet points (and deserves to win) while skater B only fulfilled 6 (and deserves to lose)? Or would we say that the variation in scores among the three judges completely swamps the minuscule difference (0.17 points) between the two averages?
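
To put numbers on "swamps", here is a short sketch that compares the gap between the two averages with the spread among the three judges. Using the sample standard deviation as the measure of spread is a choice made for this illustration, not part of the scoring rules.

```python
import statistics

skater_a = [8.25, 8.5, 8.75]
skater_b = [8.75, 8.5, 7.75]

mean_a = statistics.mean(skater_a)
mean_b = statistics.mean(skater_b)
gap = mean_a - mean_b

# Sample standard deviation of the three marks for each skater.
sd_a = statistics.stdev(skater_a)
sd_b = statistics.stdev(skater_b)

print(f"Skater A: mean {mean_a:.2f}, spread {sd_a:.2f}")
print(f"Skater B: mean {mean_b:.2f}, spread {sd_b:.2f}")
print(f"Gap between averages: {gap:.2f}")
```

The judges disagree with each other about Skater B by considerably more than the gap that separates the two skaters.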

Here is another example, where we cannot be certain that one skater fulfilled more bullet points than the other. Who "really" skated best this day and deserves to win?

Here are the averages of the five component scores:

Skater A

SS = 8.33
TR = 7.33
PE = 8.0
CO = 8.5
IN = 7.5

Total PCS = 39.66 (factored by 1.6 = 63.46)

Skater B

SS = 8.00
TR = 7.67
PE = 8.25
CO = 7.50
IN = 8.25

Total PCS = 39.67 (factored by 1.6 = 63.47)

The only reason that Skater B won is that he had one score (TR) rounded up from .66666666..., while skater A had two scores (SS and TR) that were both rounded down from .33333333... .

Well, this is an extreme example. But the point is, presenting numbers to two decimal places does not give anyone confidence that the scoring system produced the correct winner according to the scale of values and the bullet points for program components.
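
The rounding effect can be reproduced exactly. The sketch below makes several assumptions: the thirds are taken as exact fractions (25/3, 22/3, 23/3), the other averages are taken as exact, each average is rounded to two places before summing, and Skater B's Composition mark is taken as 7.50, which is the value that makes the listed total of 39.67 add up.

```python
from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction

CENT = Decimal("0.01")
FACTOR = Decimal("1.6")


def to_two_places(value):
    """Round an exact Fraction to two decimal places, halves rounding up."""
    as_decimal = Decimal(value.numerator) / Decimal(value.denominator)
    return as_decimal.quantize(CENT, rounding=ROUND_HALF_UP)


def factored_pcs(averages):
    """Sum the rounded component averages, then apply the factor."""
    total = sum(to_two_places(a) for a in averages)
    return total, (total * FACTOR).quantize(CENT, rounding=ROUND_HALF_UP)


# Exact panel averages in the order SS, TR, PE, CO, IN (assumed values).
skater_a = [Fraction(25, 3), Fraction(22, 3), Fraction(8), Fraction(17, 2), Fraction(15, 2)]
skater_b = [Fraction(8), Fraction(23, 3), Fraction(33, 4), Fraction(15, 2), Fraction(33, 4)]

print("exact totals equal:", sum(skater_a) == sum(skater_b))
print("Skater A (total, factored):", factored_pcs(skater_a))
print("Skater B (total, factored):", factored_pcs(skater_b))
```

Before rounding the two totals are identical; the 0.01 that decides the result comes entirely from rounding each component average.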
 
Joined
Jun 21, 2003
The current Composition criteria are

*Purpose (idea, concept, vision, mood)
*Pattern/ice coverage
*Multidimensional use of space and design of movements
*Phrase and form (movements & parts of the program to match the musical phrasing)
*Originality of the composition

You could move Pattern/ice coverage to the technical/Skating Skills component and most of the others to a redefined Performance/Interpretation component, so you don't lose them altogether.

People would have to stop complaining about backloaded (or frontloaded) programs being bad choreography by definition that must be punished, if there is no component for considering choreography in terms of program layout.

There is not any such consideration in the bullet points for choreography now, so this would not be a change if components were combined (nor would it stop people from complaining about backloaded programs ;) ). If the ISU wished, they could put in a bullet point: "technical highlights shall be used throughout the program to create a harmonious whole." They obviously don't want to do this, since they haven't.

As for me, my biggest complaint is that the choreography can hit all the bullet points and still be crumby choreography. There is no bullet point for, "This is a great program: 9.50!" I guess this is covered by: "This program is outstanding, so the score should be in the 9s. It hit a few bullet points, so I'll nudge it up to 9.5 (especially since I gave the last guy 9.25 and this guy is better)."
 

Miller

Final Flight
Joined
Dec 29, 2016
Consider this situation. There are three judges (after trimming) and two skaters. The scores in Composition are

Skater A: 8.25, 8.5, 8.75 (Average 8.50)
Skater B: 8.75, 8.5, 7.75 (Average 8.33)

Can we be confident from the two numbers 8.50 and 8.33 that Skater A really fulfilled 8 bullet points (and deserves to win) while skater B only fulfilled 6 (and deserves to lose)?

I was wondering whether it is possible to say, from a statistical point of view, that a skater has actually won, or has, say, a better component score than another skater. For example, when I was at Uni I did a stats course where they talked about 95% probability, and I took this as meaning the point at which a statistician would say this was 'a certainty', whereas somebody else might have said it wasn't long before then. E.g. if a skater were to score a component score of 8.27 on a trimmed 7 out of 9 basis, at what point would a statistician say that it was 95% certain that the score was higher than that of another skater - 8.00? 7.75? Also, is it possible to do so on a final score? Say a lady skater wins a competition with 200 points, with 19 GOE elements and 10 PCS scores in there. At what point would a statistician say it was 95% probable she'd actually won - a 2 point advantage, 3?
 

Baron Vladimir

Record Breaker
Joined
Dec 18, 2014

The question is this: How can we tell whether the sprinter really nipped Usain Bolt at the finish or not? In the case of a foot race, the answer is clear -- all you need is a stopwatch and a camera. What is not so clear is whether all the numbers carried out to the second decimal place provide an adequate stopwatch in a figure skating competition.

But is the watch giving us the 'right number'? How can we be sure? What if two runners cross the line in the same time, let's say 9.80 (which does happen in the 100m)? Then the winner has to be decided by thousandths of a second, because that is what the rules say. But is one of them really faster if he is faster by a thousandth of a second, a difference humans can't perceive at all? What's more, wouldn't it be better to take each runner's starting time and subtract it from the time at the finish, because that is what really tells us how long a runner took over the 100m? When we try to determine who is the fastest, shouldn't we also consider other measures of speed, such as km/h? So does the number we get really tell us who is faster? If we want to decide who is the best long jumper, shouldn't we consider other factors, such as the gap to the takeoff board before the jump, or the average of all jumps rather than just the longest one, etc.?

So even in sports that look simple, such as track and field, the winner is determined only by the current rules, and you can find problems everywhere with how that is done. But one of the goals of every competition is to determine a winner; every competition has rules for how to determine it, and every sport does it by its own rules. FS decided to do it this way, using this math method... That is why I was saying that your view that a sport should determine the true winner is a bit too much to expect, because the rules and other factors (such as the numbers) are what make the winner. And every competitor is aware of it: the numbers also tell you when someone is the clear winner and when other factors can play a role. Because of that, every winner is a winner in some exact context; he/she can't be a winner without that context (and so there is no point in using the words 'true winner' to describe it).
 

Baron Vladimir

Record Breaker
Joined
Dec 18, 2014
As for me, my biggest complaint is that the choreography can hit all the bullet points and still be crumby choreography.

Because 'crumby choreography' is a subjective statement. 'Great choreography' is a subjective statement. Hitting the Composition bullet points is (or can be) an objective statement, since Composition is defined through those bullet points. The problem is that it is hard to define great choreography objectively, so you wouldn't know how to reward it... For example, even though most die-hard (and older) FS fans would disagree, for my younger sister and her friends T/M FS has great choreography, partly because they enjoy it more when someone skates to 'modern music' and they can more easily see the concept behind it... Also, there is no logic in a skating competition judging the choreographer's work (the choreography itself), so you judge only the skater's work (which is to show some idea and concept while skating, no matter what that concept is).
 