Judging math details | Page 5 | Golden Skate

gkelly

Record Breaker
Joined
Jul 26, 2003
But they don't all get the exact same score.

At worst, a skater might get feedback that amounts to "Your performance was solidly in the Above Average range. You were better than just average in all areas, but not up to Good standard in any. Within that range, the program was somewhat lacking in transitions, and you could have done a better job of interpreting the music."

At best, if at least some judges are brave enough to use wider ranges, the system as is allows for clearer feedback when a performance really is unbalanced. If you take away that possibility because not enough judges are using it, that only exacerbates the problem -- it doesn't solve it.

If anything, I'd rather see more encouragement and guidelines and training for judges to separate the scores more when warranted. Not saying "Oh screw it, they're not using this tool anyway so instead of teaching them how to use it better, let's throw it away."
 

Baron Vladimir

Record Breaker
Joined
Dec 18, 2014
I disagree.
Yes, skaters train them all at the same time. Still, there are visible differences for most skaters. There are very few skaters who are actually well rounded.

But ok, suppose they are all the same because skaters train everything, and they need the 5 different categories to give skaters feedback. But why? Since all the categories get the same score, this feedback is as useful as if there was just 1 number, period.

Some visible differences are sometimes just our own preferences, I think... Also, giving several marks and then extracting one general mark from them should be better, in terms of more detailed feedback and more objective evaluation, than giving just one mark, even if all of those marks are exactly the same. You have the same principle in school practice: in the end it is one number, but skaters can see how judges came to that one number.
 

moriel

Record Breaker
Joined
Mar 18, 2015
But they don't all get the exact same score.

At worst, a skater might get feedback that amounts to "Your performance was solidly in the Above Average range. You were better than just average in all areas, but not up to Good standard in any. Within that range, the program was somewhat lacking in transitions, and you could have done a better job of interpreting the music."

At best, if at least some judges are brave enough to use wider ranges, the system as is allows for clearer feedback when a performance really is unbalanced. If you take away that possibility because not enough judges are using it, that only exacerbates the problem -- it doesn't solve it.

If anything, I'd rather see more encouragement and guidelines and training for judges to separate the scores more when warranted. Not saying "Oh screw it, they're not using this tool anyway so instead of teaching them how to use it better, let's throw it away."

The difference between the different categories is smaller than the difference between different judges =)
So they do get the same score really.
 

moriel

Record Breaker
Joined
Mar 18, 2015
Some visible differences are sometimes just our own preferences, I think... Also, giving several marks and then extracting one general mark from them should be better, in terms of more detailed feedback and more objective evaluation, than giving just one mark, even if all of those marks are exactly the same. You have the same principle in school practice: in the end it is one number, but skaters can see how judges came to that one number.

Now, if there are no differences, again why 5 scores instead of 1?
If there is no difference, there is no feedback lols
 

Miller

Final Flight
Joined
Dec 29, 2016
Some actual figures from a real life competition to think about.

I’ve taken the recent Cup of China GP, at random, but where there may be some interest on the Ladies side because there was some controversy over whether Wakaba Higuchi or Alina Zagitova should have won. However firstly I’ll look at the PCS range across the entire competition, all disciplines, and come up with the range of highest component (final figure not individual judges) to lowest, and then the PE/CH/IN range which I was proposing to be merged into a single component earlier on in the thread. I’ll then do a second post looking at ladies in particular where I’ll look at the effect of going to my proposed 25% SS/25% TR/50% ‘Artistry/Performance’ Component. Let’s see what happens.

Men – there’s an awful lot of figures so I’ll show you a couple of skaters to show you what I mean and then attempt to summarise the ranges and the average differences.

Max Aaron – lowest to highest range 7.71 - 8.21, PE to IN 7.89 - 8.18. Differences 0.50/0.29.
Vincent Zhou – all components 7.29 – 7.64, PE to IN 7.50 – 7.61. Differences 0.35/0.11.

Across the 12 men the highest range was 0.67 for Boyang Jin, smallest was Keiji Tanaka 0.28, average was 0.42. Across PE/CH/IN biggest was Alexander Petrov 0.64, smallest Boyang Jin 0.03, average 0.26.

Repeating for 11 Ladies, highest range was 1.00 (Xiangning Li), smallest 0.40 (Gabrielle Daleman), average was 0.55. For PE to IN, highest range was 0.57 (Xiangning Li), smallest 0.07 (Gabrielle Daleman + Elizaveta Tuktamysheva), average 0.23.

For 7 Pairs, across all 5 components the highest range was 0.36 (Marchei/Hotarek), smallest 0.14 (Cain/LeDuc), average 0.25. Across PE to IN it was highest 0.29 (Zhang/Song), smallest 0.07 (Marchei/Hotarek), average 0.16.

For 10 Dance highest/smallest/average across all 5 was highest 0.46 (McNamara/Carpenter), smallest 0.22 (Papadakis/Cizeron), average 0.35. Across PE to IN it was highest 0.25 (Chen/Zhao), smallest 0.03 (McNamara/Carpenter), average 0.10.

That’s 40 skaters/teams across all 4 disciplines. Enough I would have thought to draw some conclusions.

Biggest range for any individual skater was 1.00 for Xiangning Li, next highest was Boyang Jin 0.67, while for 32 skaters/couples they were all within 0.50 of one another including all dance and pairs teams. Also for PE to IN biggest range was 0.64 for Alexander Petrov, next highest 0.57, Xiangning Li. 37 skaters/couples were 0.50 or less, in fact 0.40 or less, and all dance/pairs teams less than 0.30. Think you can already tell that for pairs and especially dance you might as well merge the final 3 components into 1, at least simplify things (though see PE comment in next paragraph).

Looking a bit closer at the figures any sort of separation almost always occurred between SS and TR with TR the lower by some way. The only separation in PE to IN was PE for skaters like Javier Fernandez and Alexander Petrov who had pretty disastrous skates, plus there weren’t many like them. However any sort of merging of all 3 needs to take account of this i.e. the judges would need to input a fair average across the board, and that also reflected any disastrous skates. CH and IN were invariably very close together across all 4 disciplines, no matter how the skater performed. Biggest gap was 0.25 (Elena Radionova). These can definitely be merged into one component, IMO.

Finally, though can see no apparent reason for this, the PE to IN range was almost always towards the top of the overall range. Seems that judges like to be nice to skaters re this, but clamp down on TR if anything.

Ladies analysis to follow...

Also I would be grateful if someone could fully explain how PE works. I’m not the greatest at components and to me it always seems like there’s been an element of double counting e.g. ‘performance and execution’ has already been reflected in things like GOE, deductions and interpretation. Is there an element to it that has not already been covered in things the judges mark on? For example it can account for up to 24 marks across a ladies competition and should it really be if it’s already been counted – where I’m coming from is maybe a further proposal of 3 components, SS, TR and combined CH and IN which as we’ve seen are already very similar. Could this work as a simplified system that then allows further separation to occur? Only problem I can see with this is that skaters who have low PE gain, but I would have thought the sort of performances that Javier and Alexander gave could be reflected in a ‘tweaked’ combined CH and IN, e.g. you haven’t particularly ‘interpreted’ very well if you‘ve had a fall or step out, and so something that reflected this means you could have a combined PE/CH/IN component after all.
 

Baron Vladimir

Record Breaker
Joined
Dec 18, 2014
Now, if there are no differences, again why 5 scores instead of 1?
If there is no difference, there is no feedback lols

Because you are getting 5 pieces of feedback, not only one. That's the whole point. The point is not whether all those pieces of feedback are the same (aka good, good, good, good, good); the point is that there are more of them. Giving 5 detailed pieces of feedback is still considered better than giving one general one in any human practice. The question here is whether 5 is the optimal number: maybe 3 or 4 is enough, or maybe 6 is better, as long as they don't lead to cognitive overload because the information is redundant/unnecessary or there is too much of it to take in. But giving only one general piece of feedback is not likely to be the optimal choice.
I can give an example to explain what I mean. Let's say Wakaba needs to compete. She already knows what marks she is capable of from the feedback she gets in training. So she knows she is capable of skating to around 8.5, but she is aiming for even more; she wants to skate better than she ever has. She finishes and her scores are 8.65, 8.21, 8.56, 8.49, 8.40. So she can be very happy with how she presented her SS, she is fine with PE and CO, and she could work a little more on IN and TR to reach her own max. Even though her point range is less than 0.5, she knows in which part of her skating she didn't give her max, where she nearly gave it, and where her scores are in the range of her usual self.
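For anyone who wants to put numbers on this example, here is a minimal sketch of the two ranges discussed earlier in the thread (all five components, and PE to IN only). It assumes the five hypothetical marks are listed in the usual protocol order SS, TR, PE, CO, IN, which matches how the paragraph above reads them; the helper name `spread` is made up for illustration.

```python
# Hypothetical marks from the example above, assumed to be in the
# order SS, TR, PE, CO, IN.
scores = {"SS": 8.65, "TR": 8.21, "PE": 8.56, "CO": 8.49, "IN": 8.40}


def spread(values):
    """Return highest minus lowest, rounded to two decimals."""
    values = list(values)
    return round(max(values) - min(values), 2)


# Range across all five components.
overall_range = spread(scores.values())

# Range across the three components proposed for merging.
pe_to_in_range = spread(scores[c] for c in ("PE", "CO", "IN"))

print("all five:", overall_range)
print("PE to IN:", pe_to_in_range)
```

Whether a spread of that size means anything to the skater is exactly what the thread is arguing about; the sketch only does the subtraction.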

EDIT: Regarding the topic, I definitely see 3 different things skaters are doing while skating their programme: 1) skating on the ice surface (in the ice rink), 2) skating to the rhythm of the music, 3) skating for the audience. So those are the 3 components (goals skaters are trying to achieve) there.
 

Baron Vladimir

Record Breaker
Joined
Dec 18, 2014
Some actual figures from a real life competition to think about.

I’ve taken the recent Cup of China GP, at random, but where there may be some interest on the Ladies side because there was some controversy over whether Wakaba Higuchi or Alina Zagitova should have won. However firstly I’ll look at the PCS range across the entire competition, all disciplines, and come up with the range of highest component (final figure not individual judges) to lowest, and then the PE/CH/IN range which I was proposing to be merged into a single component earlier on in the thread. I’ll then do a second post looking at ladies in particular where I’ll look at the effect of going to my proposed 25% SS/25% TR/50% ‘Artistry/Performance’ Component. Let’s see what happens.

I don't know if you also saw what I see, but with half of the skaters there is less difference between SS and CO than between CO and IN (or the difference is similar). So, for some skaters it would be better to merge SS and CO into one mark. Also, the PE mark is the most dependent on skaters' mistakes, while the CO and IN marks are not so much. Interesting work tho :agree:
And some general explanations of Program Components are here http://www.usfigureskating.org/content/ISU program-component-chart_sandp-and-id_08-16.pdf
 

moriel

Record Breaker
Joined
Mar 18, 2015
Because you are getting 5 pieces of feedback, not only one. That's the whole point. The point is not whether all those pieces of feedback are the same (aka good, good, good, good, good); the point is that there are more of them. Giving 5 detailed pieces of feedback is still considered better than giving one general one in any human practice. The question here is whether 5 is the optimal number: maybe 3 or 4 is enough, or maybe 6 is better, as long as they don't lead to cognitive overload because the information is redundant/unnecessary or there is too much of it to take in. But giving only one general piece of feedback is not likely to be the optimal choice.
I can give an example to explain what I mean. Let's say Wakaba needs to compete. She already knows what marks she is capable of from the feedback she gets in training. So she knows she is capable of skating to around 8.5, but she is aiming for even more; she wants to skate better than she ever has. She finishes and her scores are 8.65, 8.21, 8.56, 8.49, 8.40. So she can be very happy with how she presented her SS, she is fine with PE and CO, and she could work a little more on IN and TR to reach her own max. Even though her point range is less than 0.5, she knows in which part of her skating she didn't give her max, where she nearly gave it, and where her scores are in the range of her usual self.

EDIT: Regarding the topic, I definitely see 3 different things skaters are doing while skating their programme: 1) skating on the ice surface (in the ice rink), 2) skating to the rhythm of the music, 3) skating for the audience. So those are the 3 components (goals skaters are trying to achieve) there.

Ok, if you are getting the same feedback 5 times, what is the point?
Because that is what happens.

Let's look at your example: Wakaba got 8.65, 8.21, 8.56, 8.49, 8.40. See, while those numbers may look mildly different, in fact they are not. If you put in a different judge, she may as well get 8.65, 8.56, 8.21, 8.49, 8.40.
 
Joined
Jun 21, 2003
She finished and her scores are 8.65, 8.21, 8.56, 8.49, 8.40.

Honestly, I don't think she learned anything at all from these marks. Transitions are always the lowest, apparently by some kind of hidden judges' convention. This is true whether the skater did any transitions or not. Otherwise, Wakaba learned that she skated pretty well, about what she always gets.

The differences among her scores are so slight that they provide no useful feedback. They just reflect random statistical noise. I do not believe that the skater will conclude, "Gee, I better work on my interpretation. Maybe I can bring that 8.40 up to 8.49." In fact, I don't believe that the judges really thought that her interpretation was worse than her Composition at all. Some judges most likely just threw out an 8.25 for one and an 8.5 for the other, while other judges did the opposite, and it came out that way in the average.

Moriel made a good point above, that the difference among judges is greater than the differences among components, with both sets of differences swamped by "sampling error." (This can be confirmed statistically by, for instance, a two-way analysis of variance.)

Well, that's what I think, anyway. :cool:
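The two-way analysis of variance mentioned above can be sketched in a few lines. This is only an illustration under loud assumptions: the panel below is invented (four judges instead of a real nine-judge panel, marks made up, not taken from any protocol), and the function name `two_way_anova` is hypothetical. It splits the total variation into a judge effect, a component effect, and a residual, and returns the two F ratios.

```python
# Invented panel for illustration only: rows are judges, columns are
# the components SS, TR, PE, CO, IN. Not real protocol data.
panel = [
    [8.75, 8.25, 8.50, 8.50, 8.25],
    [8.50, 8.00, 8.50, 8.25, 8.50],
    [8.75, 8.50, 8.75, 8.75, 8.50],
    [8.25, 8.00, 8.25, 8.50, 8.25],
]


def two_way_anova(table):
    """Two-way ANOVA without replication on a judges x components table."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(row[j] for row in table) / n_rows for j in range(n_cols)]

    # Variation explained by which judge gave the mark.
    ss_judges = n_cols * sum((m - grand) ** 2 for m in row_means)
    # Variation explained by which component was being marked.
    ss_components = n_rows * sum((m - grand) ** 2 for m in col_means)
    # What is left after removing both effects.
    ss_residual = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    ss_total = sum((x - grand) ** 2 for row in table for x in row)

    df_judges, df_components = n_rows - 1, n_cols - 1
    ms_residual = ss_residual / (df_judges * df_components)
    return {
        "ss_judges": ss_judges,
        "ss_components": ss_components,
        "ss_residual": ss_residual,
        "ss_total": ss_total,
        "f_judges": (ss_judges / df_judges) / ms_residual,
        "f_components": (ss_components / df_components) / ms_residual,
    }


result = two_way_anova(panel)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Each F ratio would then be compared against a critical value from an F table for the matching degrees of freedom. Running the same function on the judges' marks from a real protocol is what would actually test the claim.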
 

Baron Vladimir

Record Breaker
Joined
Dec 18, 2014
For some of you those are just numbers with no significant difference between them. But in this case your view is not really important. For skaters, who of course know what their range of skating is, those could be significant pieces of information. In the example of Wakaba, who knows that her range of skating is, let's say, 7.25 to 9, the difference between 8.2 and 8.65 (the scores she got) may be big, as is the difference between 8.2 and 9 (her own maximum).
 

GGFan

Record Breaker
Joined
Nov 9, 2013
Honestly, I don't think she learned anything at all from these marks. Transitions are always the lowest, apparently by some kind of hidden judges' convention. This is true whether the skater did any transitions or not. Otherwise, Wakaba learned that she skated pretty well, about what she always gets.

The differences among her scores are so slight that they provide no useful feedback. They just reflect random statistical noise. I do not believe that the skater will conclude, "Gee, I better work on my interpretation. Maybe I can bring that 8.40 up to 8.49." In fact, I don't believe that the judges really thought that her interpretation was worse than her Composition at all. Some judges most likely just threw out an 8.25 for one and an 8.5 for the other, while other judges did the opposite, and it came out that way in the average.

Moriel made a good point above, that the difference among judges is greater than the differences among components, with both sets of differences swamped by "sampling error." (This can be confirmed statistically by, for instance, a two-way analysis of variance.)

Well, that's what I think, anyway. :cool:

Baron Vladimir, I don't think the problem is that there isn't a difference between 8 and 8.5, the problem is that it may be meaningless from a mathematical point of view. A difference of .5 doesn't matter if the margin of error is plus or minus 1 for example. Wakaba might well take some lessons away but she shouldn't given those numbers. As I've said I don't know what the margin of error actually is but the fact that the system doesn't take it into account bothers me.
 

Miller

Final Flight
Joined
Dec 29, 2016
I don't know if you also saw what I see, but with half of the skaters there is less difference between SS and CO than between CO and IN (or the difference is similar). So, for some skaters it would be better to merge SS and CO into one mark. Also, the PE mark is the most dependent on skaters' mistakes, while the CO and IN marks are not so much. Interesting work tho :agree:
And some general explanations of Program Components are here http://www.usfigureskating.org/content/ISU program-component-chart_sandp-and-id_08-16.pdf

No, I hadn't noticed (re SS and CO). Had noticed PE to IN range was towards the top with TR lower as you said. Throws a whole new complexion on things. The average difference between SS and CO for the whole event was only 0.14 (barely 0.2 of a mark), with Xiangning Li by far the biggest outlier with a 0.57 difference (Boyang Jin next with 0.35). Given that the max difference between 40 skaters/couples for CO and IN was 0.25, i.e. 0.4 of a mark in reality (0.5 for men), then could you really combine SS, CO and IN into 1 component? Am going to have to read your link, but if anyone can come up with a simple reason why they are so similar then it would be gratefully received!
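The "in reality" conversions above are just the component gap multiplied by the PCS factor. A minimal sketch, assuming the 2017-18 factors as I understand them (ladies 0.8 short / 1.6 free, men 1.0 short / 2.0 free); the thread itself never states the factors, so treat the constants and the helper name `gap_in_marks` as assumptions:

```python
# PCS factors assumed for the 2017-18 season (not stated in the thread).
SP_FACTOR = {"ladies": 0.8, "men": 1.0}
FS_FACTOR = {"ladies": 1.6, "men": 2.0}


def gap_in_marks(component_gap, discipline):
    """Convert a gap on the 0-10 component scale into free skate points."""
    return round(component_gap * FS_FACTOR[discipline], 2)


# The biggest CO/IN gap quoted above, 0.25, in actual points.
ladies_gap = gap_in_marks(0.25, "ladies")
men_gap = gap_in_marks(0.25, "men")

# The average SS/CO gap of 0.14, using the ladies factor.
average_gap = gap_in_marks(0.14, "ladies")

# Most a single component can be worth across a ladies short + free,
# which matches the 24-mark figure quoted earlier in the thread.
max_component_points = round(10 * (SP_FACTOR["ladies"] + FS_FACTOR["ladies"]), 2)

print(ladies_gap, men_gap, average_gap, max_component_points)
```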

Cup of China protocol - http://www.isuresults.com/results/season1718/gpchn2017/
 

Baron Vladimir

Record Breaker
Joined
Dec 18, 2014
Baron Vladimir, I don't think the problem is that there isn't a difference between 8 and 8.5, the problem is that it may be meaningless from a mathematical point of view. A difference of .5 doesn't matter if the margin of error is plus or minus 1 for example. Wakaba might well take some lessons away but she shouldn't given those numbers. As I've said I don't know what the margin of error actually is but the fact that the system doesn't take it into account bothers me.

Which mathematical point of view is important to you regarding the numbers in PC? The only one I see is winning the competition. But just raising your PC categories to your own maximum is not enough to win, is it? The point of the numbers in PC is that they carry information. The number 8.25 means one thing to one skater and something different to another. In one skater's own point range, 7 can mean bad, 7.5 average, 8 good and 8.5 excellent. For another, 8.5 is bad. PC numbers have no point without that context. The difference between 8.5 and 8.25 here is not mathematical (0.25 points); rather, it says that you are one range better. PC numbers work the same way as the numbers or letters given in school when grading students. If you want to see who your best student is, to work with you for example, then you can find the students' average scores in 5 classes and sum them up... Like I already said in earlier posts when I discussed the 2014 ice dance comp, I also see a problem there, but I don't think those two things are necessarily connected, because judging a 0.5 difference in PC is not there to explain differences between skaters or to give some of them a medal, but just to differentiate a skater's own abilities.
 

GGFan

Record Breaker
Joined
Nov 9, 2013
Which mathematical point of view is important to you regarding the numbers in PC? The only one I see is winning the competition. But just raising your PC categories to your own maximum is not enough to win, is it? The point of the numbers in PC is that they carry information. The number 8.25 means one thing to one skater and something different to another. In one skater's own point range, 7 can mean bad, 7.5 average, 8 good and 8.5 excellent. For another, 8.5 is bad. PC numbers have no point without that context. The difference between 8.5 and 8.25 here is not mathematical (0.25 points); rather, it says that you are one range better. PC numbers work the same way as the numbers or letters given in school when grading students. If you want to see who your best student is, to work with you for example, then you can find the students' average scores in 5 classes and sum them up... Like I already said in earlier posts when I discussed the 2014 ice dance comp, I also see a problem there, but I don't think those two things are necessarily connected, because judging a 0.5 difference in PC is not there to explain differences between skaters or to give some of them a medal, but just to differentiate a skater's own abilities.

I think this works well at the developmental level where there are clear differences and one can really take leaps in improving things. My focus of course (and I'm biased) is at the very top level where the distinctions are finer and medals are being handed out. I don't think if an elite skater like Wakaba receives a 8.12 in a component and then a 8.37 at the next she can really draw any conclusions that she improved in that component. That .25 should be meaningful according to the system but I don't think it is.

It might just as likely be due to lots of other factors like panel composition, crowd response, standings, etc. So the differences between the components are not very helpful but also the differences from competition to competition are not very helpful either at least in the short term. I think Mathman and moriel have made these points more eloquently.

I think the most useful information is the corridor that the judges put the skater in. I would love for the judges to explain that and be more open about it. For example, at her best someone like Carolina is going to be in a certain range and someone like Wakaba is going to be in a certain range. I see no problem with the judges/ISU etc. being more explicit about why that is the case. There basically seems to be a PCS base value that is assumed but not discussed.
 

gkelly

Record Breaker
Joined
Jul 26, 2003
Some actual figures from a real life competition to think about.

Thanks for the analysis.

Also I would be grateful if someone could fully explain how PE works. I’m not the greatest at components and to me it always seems like there’s been an element of double counting e.g. ‘performance and execution’ has already been reflected in things like GOE, deductions and interpretation. Is there an element to it that has not already been covered in things the judges mark on?

As of last year the third component is officially called just Performance.
There are some disparate criteria being considered in this one mark (so if anything it might make sense to break it down further rather than merging it with something else), but even when the word "Execution" was part of the name, none of the criteria explicitly had anything to do with the success of the elements.

The current criteria are
*Physical, emotional, intellectual involvement and projection
*Carriage & Clarity of movement
*Variety and contrast of movements and energy
*Individuality / personality
(And then unison/"oneness" and spatial awareness between partners for pairs and dance)

These apply throughout the whole program. You can get a sense right from the beginning before any elements of
-how into the program the skater is and how well she's projecting it to the audience;
-how good her posture and body line are and how precisely or vaguely she moves;
-whether all the movements are slow, quick, or medium; smooth, sharp, or medium; high, low, or medium; small, large, or medium; internally or externally focused, etc. (if the movements and movement qualities are all the same, most likely all medium, then the skater is not demonstrating much variety and contrast)
-whether you get a sense that this performance is coming from this specific skater, that she's not just a body going through the motions

Then the question is whether she can sustain the good qualities present in the opening movements throughout the whole program, or add even more as the program progresses.

Again, the score applies to every second of the program -- how well does the skater do all this while just getting from one end of the ice to the other, during steps or field moves or other transitions, while setting up a jump or spin, during the elements, exiting the elements, in opening and closing poses and any posing in between, etc.

The exits of the jumps, good or failed, are each a couple of seconds at most. And while they are very important to the GOE of the jump, they are no more important than any other second in the program to the Performance of the program.

So a fall or stumble in itself would not affect the PE score. But if a failed jump landing leads to the skater losing connection with the music and audience for the rest of the performance afterward, leads to skating more stiffly and with less clarity and variety for the rest of the program, or leads to more and more errors -- or if the error is a symptom of a larger problem the skater was having with performing today even before the error -- then we would expect a lower PE score than on a good day.

A skater who is ill or injured or very nervous/feeling pressure or otherwise just in a bad mood, or working with unreliable equipment, might have an uncharacteristically bad skate by their standards, and we would expect the scores to be lower than when they're having a good day -- even if there are no falls or the same number of falls on both days. Of course, a fall that really hurts/slightly injures a skater or gets the skater depressed that they're not going to win now or takes so long to recover from that they never catch up with the music might be the cause of the rest of the program being below standard.

But it's the struggle throughout the program (if present) and the lack of involvement/clarity/variety/projection of personality, not a fall in itself, that would be penalized in this mark.

Honestly, I don't think she learned anything at all from these marks. Transitions are always the lowest, apparently by some kind of hidden judges' convention. This is true whether the skater did any transitions or not.

That is generally the case. But then the skaters can look at the size of the gaps. If we know that the norm is for judges to reduce the Transitions score, then how much they reduce it does carry some meaning.

Let's say that TR score 0.25 to 0.5 lower than SS is the norm and doesn't really tell us anything except that judges are automatically deducting one or two increments from the SS. Differences in that range would tell skaters only that judges were not especially impressed by their transitions favorably or unfavorably and just defaulted to the usual reduction.

Less than 0.25 difference between SS and TR would mean that some individual judges thought the transitions were on par with or even better than the skating skill. This can tell skaters that they're doing better than the norm with their transitions relative to their skating skills.

(E.g., at 2017 Worlds freeskate, three judges gave Jason Brown the same score for SS and TR. Three gave Patrick Chan the same score for those two components, and one gave 0.25 higher for Transitions, with a gap of only 0.11 between the averages of those components.

Lower in the standings, Julian Zhi Jie Yee had a TR score that was 0.04 higher than his SS. In fact SS was his lowest component. So that tells him he needs to work on his technical skating, but he did a better job than most skaters at maximizing his transitions within the limitations of his technique.)

If the gap is more than 0.5, that means that most judges scored the more severe of the two "default" differences and some judges went at least 0.75 lower on TR than SS. (E.g., at Worlds Maxim Kovtun had the largest gap of 0.53, with two judges going down 0.75 from their SS score.) That would tell the skater that at least some judges thought their program was deficient in transitions compared to the rest of the field.
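The rule of thumb in the last few paragraphs can be written down as a small lookup. This is only a sketch of the reading proposed here, not anything official: the 0.25 and 0.5 cutoffs are the ones suggested above, and the function name `read_tr_gap` and the wording of the labels are made up for illustration.

```python
def read_tr_gap(gap):
    """Interpret the gap SS minus TR (panel averages) per the rule of thumb above."""
    if gap < 0.25:
        # Some judges scored TR on par with or above SS.
        return "better than the norm"
    if gap <= 0.5:
        # Judges defaulted to the usual one- or two-increment reduction.
        return "default reduction"
    # At least some judges went 0.75 or more below their SS mark.
    return "deficient in transitions"


# Gaps quoted above from the 2017 Worlds freeskate.
examples = {
    "Patrick Chan": 0.11,
    "Julian Zhi Jie Yee": -0.04,  # TR was 0.04 higher than SS
    "Maxim Kovtun": 0.53,
}

for skater, gap in examples.items():
    print(f"{skater}: {gap:+.2f} -> {read_tr_gap(gap)}")
```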

In general, there is a whole scale of 0-10 to cover the whole range of skating from beginners to best in the world.

The scores for any given skater or for any group of skaters will be in a much narrower range, so the meaningful differences will be much smaller -- on a scale that differences in the way individual judges use numbers does confound but doesn't completely obscure.

The differences among her scores are so slight that they provide no useful feedback.

I disagree. I think there is some valuable information to be teased out of the patterns of scores and the detailed protocols for those who look more closely and take into account that judges will have different opinions about which skater was better or which component was better for a given skater.

For some of you those are just numbers with no significant difference between them. But in this case your view is not really important. For skaters, who of course know what their range of skating is, those could be significant pieces of information. In the example of Wakaba, who knows that her range of skating is, let's say, 7.25 to 9, the difference between 8.2 and 8.65 (the scores she got) is big, as is the difference between 8.2 and 9 (her own maximum).

Yes.

I also think that the system as written has the potential to give even clearer information, if judges can be encouraged to be bolder in spreading their scores for different components for the same performance. So for me, the solution to the current situation is to keep the rules and improve the performance of the judges, not to throw away the rules.

But even if the ISU impresses on the judges "Go for it. Stop worrying about the corridor and use your scores to tell the skaters what you really think," they're not all going to think the same thing.

E.g., for one program, one judge might think "There was a really clear clever concept behind that program: I understood the character she was playing, the program really told a story, there were lots of little details during the step sequence and choreo sequence and even during the spins and jump landings to express the music and fit in with the theme. Great choreography! But she looked like she was executing the choreographer's vision a bit mechanically, and she got behind the music for a while after that fall and missed the big swell in the music at the start of the spiral. 9.0 for Composition, 8.0 for Interpretation."

And another judge thinks "That program did a great job of telling a story and carrying through a theme, and it was timed really well to the music except when the skater got behind for a few seconds there in the middle. But in terms of pattern there was a lot of just skating around in counterclockwise circles or end to end to set up jumps, and I didn't like that all three spins were placed in pretty much the same spot on the ice. 8.0 for Composition, 9.0 for Interpretation."

So both judges have taken to heart the directive to spread their marks, and both have good reasons for rewarding one of those components more than the other. But the average score is still going to come out the same for both marks.

Is that better or worse than if each judge individually split the difference and gave 8.5 for both marks?
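To make the arithmetic of that comparison concrete, here is a minimal sketch using only the two hypothetical judges described above. The variable names are invented; the point is just that the panel average cannot tell the "bold but opposite" panel apart from the "cautious" one, while the per-judge gaps in the detailed protocol can.

```python
from statistics import mean

# Two hypothetical judges who spread their marks in opposite directions.
bold = {"CO": [9.0, 8.0], "IN": [8.0, 9.0]}
# The same two judges each splitting the difference.
cautious = {"CO": [8.5, 8.5], "IN": [8.5, 8.5]}


def panel_averages(marks):
    """Average each component across the judges."""
    return {component: mean(values) for component, values in marks.items()}


def per_judge_gaps(marks):
    """CO minus IN for each judge, as visible in the detailed protocol."""
    return [co - inn for co, inn in zip(marks["CO"], marks["IN"])]


print(panel_averages(bold), per_judge_gaps(bold))
print(panel_averages(cautious), per_judge_gaps(cautious))
```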

Even if the skater looks at the protocol and sees that these two judges had completely opposite opinions of those two components, they would have to guess as to the reasons. If they go to critique sessions early in the season, they could get more detailed feedback about what the strengths and weaknesses of that choreography are and make some changes to address the weaknesses.

If each judge got to score each bullet point for each component separately, then it would be easier to guess the reasons behind the numbers.

But asking judges to document all those decisions during the program or during the few minutes afterward would be asking a lot more of the judges than just having those thoughts and consolidating them into five scores.
 

gkelly

Record Breaker
Joined
Jul 26, 2003
I think this works well at the developmental level where there are clear differences and one can really take leaps in improving things. My focus of course (and I'm biased) is at the very top level where the distinctions are finer and medals are being handed out.

And my focus and my bias are on applying the system to the whole range of skaters, not just to the very top seniors where the important medals are handed out.

Thus our difference in perspective.

It may be appropriate to have different rules for developmental levels. For example, USFS has recently changed the component rules for lower levels to remove some components entirely at juvenile and intermediate levels and to use different component factors so that Skating Skills is worth considerably more than the others for juvenile, intermediate, and novice.

So the skaters at these levels, especially novice, still get some feedback about their presentation qualities, and they do figure into the results to some degree. But by design the biggest determinant of the results, at least on the PCS side, is Skating Skills, because that's where USFS wants skaters at those levels to be focusing their development.

However, all skaters at each of those levels compete under the same rules as others in the same level, whether they are realistically aiming for National medals or just squeaked past their tests and want to have the experience of competing at regionals or getting detailed feedback on their elements and components, or anywhere in between.

The ISU makes rules for seniors and juniors (which national federations tend to follow verbatim for domestic events). That means all seniors and all juniors compete under the same rules, whether they are realistically aiming for world medals, or whether they can only do double jumps but are the best age-eligible skater in their country to represent their federation internationally on the JGP, or are 19 or 25 but still love to compete at their own expense nationally or in lower tier senior B events. And anywhere in between, including those who can meet the minimums to compete at Worlds or Olympics but for whom making it to the freeskate, or making top 10, would be a major accomplishment, both personally and for their federations.

The same rules need to serve all those skaters -- including the medal contenders, but not just the medal contenders. If in fact the distinctions are finer at the top levels, then we need a system that is able to make those fine distinctions.

It may be that there could be a viable way to make those distinctions more accurately with different rules than we have now. But combining all the PCS into one would not be the right direction to make that improvement.
 

GGFan

Record Breaker
Joined
Nov 9, 2013
And my focus and my bias are on applying the system to the whole range of skaters, not just to the very top seniors where the important medals are handed out.

Thus our difference in perspective.

It may be appropriate to have different rules for developmental levels. For example, USFS has recently changed the component rules for lower levels to remove some components entirely at juvenile and intermediate levels and to use different component factors so that Skating Skills is worth considerably more than the others for juvenile, intermediate, and novice.

So the skaters at these levels, especially novice, still get some feedback about their presentation qualities, and they do figure into the results to some degree. But by design the biggest determinant of the results, at least on the PCS side, is Skating Skills, because that's where USFS wants skaters at those levels to be focusing their development.

However, all skaters at each of those levels compete under the same rules as others in the same level, whether they are realistically aiming for National medals or just squeaked past their tests and want to have the experience of competing at regionals or getting detailed feedback on their elements and components, or anywhere in between.

The ISU makes rules for seniors and juniors (which national federations tend to follow verbatim for domestic events). That means all seniors and all juniors compete under the same rules, whether they are realistically aiming for world medals, or whether they can only do double jumps but are the best age-eligible skater in their country to represent their federation internationally on the JGP, or are 19 or 25 but still love to compete at their own expense nationally or in lower tier senior B events. And anywhere in between, including those who can meet the minimums to compete at Worlds or Olympics but for whom making it to the freeskate, or making top 10, would be a major accomplishment, both personally and for their federations.

The same rules need to serve all those skaters -- including the medal contenders, but not just the medal contenders. If in fact the distinctions are finer at the top levels, then we need a system that is able to make those fine distinctions.

It may be that there could be a viable way to make those distinctions more accurately with different rules than we have now. But combining all the PCS into one would not be the right direction to make that improvement.


I'm sure someone with more expertise can say the following more eloquently, but it's the teasing of these small numbers that bothers me. The scale is meant to cover everyone in skating, but at the top it really bunches people between 7.5-9.5 and I think some of the small differences are eaten up by variables and statistical noise. I think you're probably able to draw some conclusions after the season but it's more difficult to do so from competition to competition.

Can they reset the scale on the elite level? I have no problem with 1-10 at developmental levels meaning something different than 1-10 at the elite level. I doubt that judges would ever use a 1, but even a 3-10 would be helpful. I'm more confident of judges distinguishing with a wider range.
 

gkelly

Record Breaker
Joined
Jul 26, 2003
I'm sure someone with more expertise can say the following more eloquently, but it's the teasing of these small numbers that bothers me. The scale is meant to cover everyone in skating, but at the top it really bunches people between 7.5-9.5

It bunches people at similar skill levels, whatever that level is.

In large events, there will be several tiers of skill levels. The large-scale distinctions (whole numbers) are between tiers. The small-scale distinctions are within a tier (decimal places).

You can't spread out the scores for the top tier while also distinguishing them from the lower tiers in the same event. You could use a different scale for the Grand Prix Final, but not at Four Continents or Olympics.

and I think some of the small differences are eaten up by variables and statistical noise. I think you're probably able to draw some conclusions after the season but it's more difficult to do so from competition to competition.

If it's valuable at all, then some is better than none.

The statistical noise is a problem. But I don't think the best way to overcome it is to throw up our hands and stop trying to make any distinctions at all.

Can they reset the scale on the elite level? I have no problem with 1-10 at developmental levels meaning something different than 1-10 at the elite level. I doubt that judges would ever use a 1, but even a 3-10 would be helpful. I'm more confident of judges distinguishing with a wider range.

Here's a protocol from an elite competition in which the highest score awarded by a single judge for a single component was 10.0 and the lowest was 4.00:
http://www.isuresults.com/results/season1617/ec2017/ec2017_Men_SP_Scores.pdf

Here's an event from the same 2017 Cup of China discussed for other disciplines above, where the scores ranged from 10.0 to 4.5:
http://www.isuresults.com/results/season1718/gpchn2017/gpchn2017_Pairs_FS_Scores.pdf

Here's a senior event where the scores awarded ranged from 7.50 for the winner down to 2.00 in last place:
http://www.isuresults.com/results/season1718/csger2017/csger2017_Ladies_SP_Scores.pdf

9.00 down to 3.00:
http://www.figureskatingresults.fi/results/1718/CSFIN2017/CSFIN2017_Ladies_FS_Scores.pdf

And so forth.

That's pretty much the range for senior competition.

You can't say we're going to distinguish between medal contenders by using scores down to 4 or 3 for the weakest components of the medal contenders and also distinguish those top skaters from the non-medal contenders competing in the same competition.
 

Baron Vladimir

Record Breaker
Joined
Dec 18, 2014
I'm sure someone with more expertise can say the following more eloquently, but it's the teasing of these small numbers that bothers me. The scale is meant to cover everyone in skating, but at the top it really bunches people between 7.5-9.5 and I think some of the small differences are eaten up by variables and statistical noise. I think you're probably able to draw some conclusions after the season but it's more difficult to do so from competition to competition.

Can they reset the scale on the elite level? I have no problem with 1-10 at developmental levels meaning something different than 1-10 at the elite level. I doubt that judges would ever use a 1, but even a 3-10 would be helpful. I'm more confident of judges distinguishing with a wider range.

The problem is that all top skaters have something exceptionally good. Under the current system, judges should reward every special detail, so more than one skater can earn a big transition score (in different ways). Subjectively, we as viewers may like one way more than another, but the judges' job is to reward every kind of exceptionally good skating. I think this is fine, because the other way (if the rules were more rigid) we might see better differentiation between skaters' PC scores, but I'm afraid all programs would become the same, and competition would become something like the compulsory dance of the past, boring for the casual viewer. So the elements score (TES) is the one that is there to decide the winner. Another problem is that TES is numerically too far from PCS in men, ladies, and pairs, so small differences in PCS mean even less in the end. If the two scores were brought back to a 50/50 weighting (multiplying PCS by a larger factor would be the better option, I think), those differences of 0.25 could mean more in the total score.
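A rough sketch of that last point in Python: the marks and the factor values here are hypothetical, not taken from the ISU rules, and `pcs_points` is just an illustrative helper that sums the five component marks and multiplies by a factor.

```python
def pcs_points(component_marks, factor):
    """Sum the component marks and scale them by the segment factor."""
    return sum(component_marks) * factor


# Two hypothetical skaters whose marks differ by 0.25 on one component.
skater_a = [9.00, 8.75, 9.00, 9.00, 9.00]
skater_b = [9.00, 8.50, 9.00, 9.00, 9.00]

# Hypothetical factors, chosen only to show how the gap scales.
for factor in (1.0, 2.0, 3.0):
    gap = pcs_points(skater_a, factor) - pcs_points(skater_b, factor)
    print(f"factor {factor}: gap = {gap:.2f} points")
```

The gap in the total grows in direct proportion to the factor, which is why a larger PCS multiplier would make a 0.25 difference count for more against the TES.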
 
Joined
Jun 21, 2003
Looking a bit closer at the figures any sort of separation almost always occurred between SS and TR with TR the lower by some way.

Although the TR scores are uniformly lower than SS, they are all lower by the same amount. I always figured that this was because transitions (number, variety, difficulty, and quality) are more objective than any of the other four components. Therefore the judges feel a little more confident in giving out lower scores to good skaters.

Statistically, the correlation between SS and the other 4 is the highest among the 5 components. If you are given the SS score you can predict the other 4 with great confidence. I think that this is because the judges enter the SS score first. In any kind of exercise of this type, the judges will enter one mark and then will key on that mark for the rest. It is like a political poll. You ask a leading question first, then the answers to the rest of the questions will all tend to turn out the way you want them to. IIRC the ISU did some experiments in which the judges entered some other component first. Whichever was entered first had the greatest predictive power for all the rest.
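For anyone who wants to check this kind of claim against a real protocol, here is a minimal sketch of the calculation in Python. The marks below are made up for illustration (they are not from any actual event), and `pearson` is a hand-written helper, so this shows only how the correlation would be computed, not what the real figures are.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up SS and TR marks for six skaters, one pair of marks per skater.
ss = [9.25, 8.75, 8.50, 8.00, 7.50, 7.00]
tr = [8.75, 8.25, 8.25, 7.50, 7.25, 6.50]

r = pearson(ss, tr)
print(f"SS vs TR correlation: {r:.3f}")
```

A value close to 1 means one component is nearly predictable from the other. Testing the anchoring explanation would take the same calculation on panels where a different component was entered first.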
 