Could judges be replaced by computers in giving TES score? | Page 2 | Golden Skate

jersey1302

On the Ice
Joined
Jan 10, 2016
Country
Canada
I think they potentially could but in saying that I don’t think they will. I’m sure someone would figure out how to hack the system and rig it anyways lol 😆
 

Casual

On the Ice
Joined
Jan 26, 2018
What they need to do is establish post-competition audit of all scores - and investigate if "post-game" independent scores differ too much from the scores during competitions.

Also, computers can be used to analyze actual scores to identify bias and "fixed" competitions.

However, any such suggestions assume that ISU is actively trying to achieve fair results - which we know, they are not. :laugh:
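As a rough illustration of the audit idea above, here is a minimal Python sketch that compares competition scores with scores from an independent post-competition panel and flags large gaps. The skater labels, the scores, and the 3-point tolerance are all invented for the example; none of it reflects an actual ISU procedure.

```python
def audit_flags(competition_scores, audit_scores, tolerance):
    """Return {skater: difference} where the competition score differs
    from the independent audit score by more than `tolerance` points."""
    flags = {}
    for skater, comp in competition_scores.items():
        diff = comp - audit_scores[skater]
        if abs(diff) > tolerance:
            flags[skater] = round(diff, 2)
    return flags

# Hypothetical segment scores: what was awarded vs. what an audit panel gave later.
competition = {"Skater A": 75.0, "Skater B": 68.5, "Skater C": 80.0}
audit = {"Skater A": 74.0, "Skater B": 63.0, "Skater C": 79.5}

print(audit_flags(competition, audit, tolerance=3.0))
```

Only Skater B is flagged here, because the other two gaps stay inside the tolerance. Choosing that tolerance is the hard part, and the sketch does not try to answer it.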
 

lzxnl

Final Flight
Joined
Nov 8, 2018
What they need to do is establish post-competition audit of all scores - and investigate if "post-game" independent scores differ too much from the scores during competitions.

Also, computers can be used to analyze actual scores to identify bias and "fixed" competitions.

However, any such suggestions assume that ISU is actively trying to achieve fair results - which we know, they are not. :laugh:

ISU have already rejected proposals to have a separate PCS judging panel. Either they are committed against fair judging, or they don't have the funds. Either option suggests your post-competition audit (which, for the record, I agree with) isn't likely to proceed.
 

NaVi

Medalist
Joined
Oct 30, 2014
Yesterday I read an article on, I think, TASS from the high-up Russian in the ISU saying that they'll look into using technology as a tool to help the judges.

I think it makes a lot of sense that after quads and triple axels become regular in ladies skating, big triples (ones with at least X amount of airtime) get a bonus.
 

Miller

Final Flight
Joined
Dec 29, 2016
How would this proposed "post-competition audit" differ from the current officials assessment procedures?

See ISU Communication 2194 at https://www.isu.org/figure-skating/rules/fsk-communications?limit=20&limitstart=40

This is an absolute cracker. Based on typical base values a senior ladies' SP could be scored by an individual judge as anywhere between +/- 10 points from the average and still be deemed OK, plus a LP could be +/- 20 points and still be OK! Plus you're allowed 1 error every 8 skaters anyway so you could be outside the range a lot of times and not be evaluated.

SP - acceptable GOEs = 1.5 avg, so +/- 4 points per segment based on typical elite BV of 27/28 points after removing lower value jumps in combinations/10% stuff. PCS = 7.5 net component scores so 6 points converted to actual points. Hence a 60 point SP could be scored by an individual judge as anywhere between 50 and 70 points and still be OK.

Similarly for LP. 1.5 avg per GOE gives +/- 8 points in GOE, based on a typical elite skater's BV of 52/53 points after removing lower value jumps/10% stuff, while 7.5 in component scores gives +/- 12 points in PCS i.e. a 140 Free Skate could be scored as high as 160 or as low as 120 and still be deemed OK (event total = +/-30 points on a score of 200 say) and as above you're allowed 1 'error' every 8 skaters anyway!
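The arithmetic above can be checked with a short Python sketch. The 1.5 GOE grades, the 7.5 component points, and the typical base values come from the post; the 10% of base value per GOE grade and the ladies' PCS factors of 0.8 (SP) and 1.6 (FS) are assumptions added here to reproduce the quoted point totals.

```python
# Allowed deviations as quoted in the post.
GOE_GRADES_ALLOWED = 1.5   # average GOE grades per element
PCS_POINTS_ALLOWED = 7.5   # net component points before factoring

# Assumption: each GOE grade is worth 10% of an element's base value.
GOE_STEP = 0.10

def corridor(base_value, pcs_factor):
    """Return (GOE points, PCS points, total) of allowed deviation for a segment."""
    goe_points = GOE_GRADES_ALLOWED * GOE_STEP * base_value
    pcs_points = PCS_POINTS_ALLOWED * pcs_factor
    return goe_points, pcs_points, goe_points + pcs_points

# Base values of 27.5 and 52.5 stand in for the post's "27/28" and "52/53".
# PCS factors 0.8 and 1.6 are assumed, not taken from the post.
for name, bv, factor in [("SP", 27.5, 0.8), ("FS", 52.5, 1.6)]:
    goe, pcs, total = corridor(bv, factor)
    print(f"{name}: GOE +/-{goe:.1f}, PCS +/-{pcs:.1f}, total +/-{total:.1f}")
```

Under those assumptions the totals come out at roughly 10 points for the SP and 20 for the FS, which matches the rounded figures in the post.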
 

CanadianSkaterGuy

Record Breaker
Joined
Jan 25, 2013
So long as we have the GOE system, the answer will be a hard no to complete automation of judging. It will, at best, be computer-assisted human judging. As a reminder, here are the six bullet points for jump GOE:

1) very good height and very good length (of all jumps in a combo or sequence)
2) good take-off and landing
3) effortless throughout (including rhythm in Jump combination)
4) steps before the jump, unexpected or creative entry
5) very good body position from take-off to landing
6) element matches the music

#1 I can see this being assisted very easily with computers, with raw data and real-time ranking of height, speed, and distance relative to the rest of the field and to data history.
#2 About half here. Speed going in and out of a jump can be computer assisted, but a skid or toe slip on the take-off or the type of error and how much to deduct is something a human judge can do much better.
#3 "Effortless" is completely determined by human judgment.
#4 The computer can detect steps, but only a human can say whether something is "unexpected" or "creative".
#5 Maybe a bit here for good body position for detecting any wraps or leg separation, but again, a "good" body position is mainly something only a human judge can determine.
#6 A computer would need to detect the beat and maybe phrasing of an element to the music, but that's tough to code, and with musical sections that don't have a clear beat, "musicality" and "phrasing" are hard to determine without a human judge.

So in summary, a computer would help about 25-35% in determining GOE overall.

That said, where the computer would help tremendously is in detecting edge changes, pre-rotations, and under-rotations since the human eye is TERRIBLE at this. Jump placement, angle, and bias can all seep into these types of calls - I mean, we even have the "!" call where the edge change is unsure. Having several cameras around the rink to determine blade rotation and edge angle would make much more sense and assist the technical panel significantly.

I noticed that certain people online/Twitter are up in arms about Eric Radford mentioning how subjective aspects of GOE are, and your/this post is a good outline as to why GOE is subjective (and why those people need to chill, lol). :biggrin:

Yes, things like "good height and distance" are measurable (by an accurate computer) but for now they are treated as subjective absolutes and not quantifiable relatives -- judges obviously can't compare the heights of jumps by two singles skaters side by side ... so they need to make a subjective call as to whether, in their opinion, the height and distance are very good. Expecting them or the ISU to look at IceScope data is futile - nobody's going to remember that, and skaters jump differently depending on the competition (skater A might do a 50 cm 3A in Skate Canada and a 60 cm 3A at Worlds).

"Effortlessness" is also a super subjective call. So is whatever constitutes preceding steps/unexpected/creative entry (or even steps - some judges might consider Mohawks/3-turns sufficient whereas others might look for Choctaws/counters/twizzles... some might need a minimum of 3+ steps leading up to a jump, others might only give GOE if the takeoff edge leads directly from the preceding steps instead of a change of edge or foot). "Good" takeoff/landing and body position are subjective too.

As I've said before, the worst subjective one is the musical structure bullet. There's no quantifiable measure of an element matching the music - some say a jump matches a musical phrase if the skater takes off on the beat, some say if they land on the beat, some say if the exit transition (e.g. a kick up or falling leaf or whatever) is on the music, some say a jump on the on beat matches, some say one on the off beat matches, etc.

Basically, every GOE bullet is currently subjective because they use fundamentally qualitative descriptor words like "good"/"very good"/"unexpected"/"matches the structure". There's no X cm threshold for the amount of height a jump must have to get a bullet; there's no protractor measuring a skater's axis of rotation in the air to ensure their body position is completely perpendicular to the ice with a lean allowance of 3 degrees (or whatever quantifiable criteria body position might have) to get a bullet. As you said, there is no computer syncing the landing of a jump to the beat of the music to get a bullet. Same goes for deductions - there's no rule as to how long a skater can prep for a jump before it is considered telegraphed, and for jumps with a "range of deduction" (like a poor takeoff being -2 or -3) there's no rule as to what constitutes a -2 versus a -3 (page 16: https://www.isu.org/docman-document...munications/17142-isu-communication-2168/file). These are some specifics the ISU can clarify, and doing so would be a good step towards more accurate, subjective judging.

For objective things like height and distance and correct landing edge and even rotation, technology can definitely help but we are a ways away from determining how reliable and accurate it can be.

Some folks are already using blog posts that measure jump height based on YouTube frame rates, or treating IceScope data (of which there's a small sample size at that) like it's dogma. There's no published paper, as far as I know, on the accuracy of these methods. If skaters' scores/results are going to be partially determined by tech, then that validation is a must-have.

Jumps aren't always black and white and so the assessment of them will always be subjective but the introduction of technology - that is verified for accuracy - can definitely help assess technical aspects more reliably. Bringing this tech to every competition would be expensive as heck, but maybe it could be a future reality.

Trust me, I would love the judges to have less power, and the scores to be more computer-generated but right now that is impossible, given that the qualities of a jump are more than just objective, quantifiable ones.

But for now, as Radford says, GOE is subjectively derived - I mean, duh, if it were objectively derived then every judge would share the exact same opinion and give out the exact same GOE scores! :laugh:
 

NaVi

Medalist
Joined
Oct 30, 2014
I don't think it's possible to use jump data to consistently apply GOE because there are too many other factors that go into a jump... but I do think it's possible to use jump data to apply a binary big jump bonus for triple jumps that really do have a "wow" amount of height... say jumps in the 90th percentile of airtime as measured in the previous season. I haven't thought it through, but if this was done it might be a good idea to restrict the bonus... say to certain kinds of jumps or to 1 in the SP and 2 in the FS.
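One way the binary bonus described above could be sketched in Python: take a threshold from the previous season's airtimes, then count qualifying jumps up to a per-segment cap. The airtime values are invented, the nearest-rank percentile is just one of several possible definitions, and the function names are hypothetical; only the 90th percentile and the cap of 1 in the SP and 2 in the FS come from the post.

```python
def percentile_threshold(airtimes, pct=90):
    """Nearest-rank percentile of a list of airtimes (seconds)."""
    ordered = sorted(airtimes)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct * n / 100) using integers
    return ordered[max(rank, 1) - 1]

def big_jump_bonuses(segment_airtimes, threshold, cap):
    """Count jumps at or above the threshold, limited to `cap` bonuses per segment."""
    qualifying = sum(1 for t in segment_airtimes if t >= threshold)
    return min(qualifying, cap)

# Hypothetical triple-jump airtimes (seconds) measured in a previous season.
previous_season = [0.50, 0.52, 0.55, 0.56, 0.58, 0.60, 0.61, 0.63, 0.66, 0.70]
threshold = percentile_threshold(previous_season)

# Hypothetical free skate with three big jumps; the cap of 2 follows the post.
free_skate = [0.67, 0.70, 0.71, 0.55]
print(threshold, big_jump_bonuses(free_skate, threshold, cap=2))
```

The threshold depends entirely on which jumps go into `previous_season`, so the choice of that sample would have to be settled before anything like this could be used.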
 

tral

Match Penalty
Joined
Mar 27, 2019
Expecting them or the ISU to look at IceScope data is futile - nobody's going to remember that, and skaters jump differently depending on the competition (skater A might do a 50 cm 3A in Skate Canada and a 60 cm 3A at Worlds).
Lol, as if anyone said that they should be looking at data from five competitions back to make judgments in the current one. The call is to try and figure out a way to incorporate real time data into real time judging.
 

CanadianSkaterGuy

Record Breaker
Joined
Jan 25, 2013
Lol, as if anyone said that they should be looking at data from five competitions back to make judgments in the current one. The call is to try and figure out a way to incorporate real time data into real time judging.

There are definitely people (on Twitter/forums) who are already drawing conclusions from IceScope, and some who have expressed wanting to leverage the IceScope data (particularly ones who were upset about the scoring at Worlds and other competitions). Speaking of the sample size being limited - what are the competitions it's been used at? AFAIK it's been used primarily in Japan so far?

Using real time data to augment judging is a nice thought but it's a stat on something that represents just a fraction of what makes up a good program. Every bit helps of course but we are talking about one GOE bullet - and developing accurate tech for usage in all major competitions is a huge undertaking.
 
Joined
Jun 21, 2003
This is an absolute cracker. Based on typical base values a senior ladies' SP could be scored by an individual judge as anywhere between +/- 10 points from the average and still be deemed OK.

Frankly, it's OK with me, too. Who says the "average" is right? I often find that my scoring at home is different from what the average of the judges comes up with. I am usually pretty sure that I am right and the average is wrong. :yes:
 

gkelly

Record Breaker
Joined
Jul 26, 2003
but I do think it's possible to use jump data to apply a binary big jump bonus for triple jumps that really do have a "wow" amount of height... say jumps in the 90th percentile of airtime as measured in the previous season.

90th percentile of what population of jumps the previous season? Which competitions do you take the data from?

90th percentile of all jumps executed at Worlds will be a lot higher than 90th percentile of all jumps executed on the JGP. But judges should be applying the same standards to both.
 

NaVi

Medalist
Joined
Oct 30, 2014
90th percentile of what population of jumps the previous season? Which competitions do you take the data from?

90th percentile of all jumps executed at Worlds will be a lot higher than 90th percentile of all jumps executed on the JGP. But judges should be applying the same standards to both.

Whatever makes sense... this is like a 5 minute discussion among the major stakeholders. TBH, the last 4 seasons of the world championships would be a good sample. And they could calculate that for juniors too or just use the senior sample. And they could just calculate it once and revise it whenever.

But the point is to reward jumps that "wow" people, though less than a quad "wows" people. Which means setting the bar up high at the 90th percentile. I don't think it should count as much as a quad, but I think it would be good to make it easier for some without quads to compete with those who do have quads.
 

gkelly

Record Breaker
Joined
Jul 26, 2003
Of course, skaters who have quads are more likely to have big jumps and to earn even more of the big-jump rewards.

If you want to give opportunities for non-quad-jumpers to win points to help them compete with the quadsters, it would be better to emphasize skills that don't rely on air time.

Or go back to explicitly mentioning delayed rotation as a positive GOE criterion, to reward those who can get the air time but not the extra rotation.



I don't think there should be a separate measurable jump size standard for juniors than for seniors. There are more factors that go into making a jump wow audiences and judges on account of its size than what level competition the skater is entered in. After all, some 16-year-olds are competing as juniors and some as seniors in any given season, and some individual skaters compete in both types of competitions in the same season.

Sex also makes a big difference in jump size.

Also the skater's absolute size and their body type, which is correlated with both sex and age but can vary significantly within the same sex and age cohort based on genetics.

The size of the jump will be influenced by all of those things, as well as the skaters' technique.

You could easily have two skaters of the same age and sex who execute the same jump with the same measurable height and distance, but one makes you think "Wow!" and the other doesn't.

So what is the reward for good height and distance on a jump really supposed to reward?
 

NaVi

Medalist
Joined
Oct 30, 2014
Of course, skaters who have quads are more likely to have big jumps and to earn even more of the big-jump rewards.

If you want to give opportunities for non-quad-jumpers to win points to help them compete with the quadsters, it would be better to emphasize skills that don't rely on air time.

Watch this video on the biomechanics of figure skating jumps. Quads have far more to do with spinning faster than gaining height. And skaters often jump less high than they can for the sake of consistency. Skaters who can do quads are more likely to be skinny than to jump high. There are plenty of skaters who can jump high who can't do quads.

https://www.youtube.com/watch?v=TacSOaXZgJQ
 
Joined
Jun 21, 2003
I don't think it's possible to use jump data to consistently apply GOE because there are too many other factors that go into a jump... but I do think it's possible to use jump data to apply a binary big jump bonus for triple jumps that really do have a "wow" amount of height...

I don't know how the details would work out, but I do like this way of looking at the bullet points for GOE. IMHO the biggest problem with GOE judging is that not enough 0's are given out. The jump has OK height, works OK with the music, has acceptable flow out of the landing, and is basically "thanks, next."

Your reward for doing an OK triple Lutz, even a "textbook" one, is the base value. Plus GOE should mean something special, just like negative GOE means an identifiable error.
 
Joined
Jun 21, 2003
I'm by no means an expert on computers (I barely know anything), but we may be underestimating what they would be able to do. We're looking at this as a coding challenge, but the way that computers have handled complex processes is to teach themselves and improve over time. Given enough time and inputs of programs perhaps based on measurable aspects (speed, height, etc.) computers would be able to teach themselves to reach a more accurate score than human beings. ...

I think the problem would be, how would the computer know when it is getting better instead of getting worse? In successful artificial intelligence projects there is a well-defined goal. When the machine does something that gets it closer to the goal, that sort of behavior is internally rewarded and repeated.

The canonical example would be something like a program that plays chess. The machine "knows" whether it won or lost. What would constitute "winning" a figure skating judging exercise?
 

Miller

Final Flight
Joined
Dec 29, 2016
Frankly, it's OK with me, too. Who says the "average" is right? I often find that my scoring at home is different from what the average of the judges comes up with. I am usually pretty sure that I am right and the average is wrong. :yes:

Actually I was being sarcastic. Something like Bradie Tennell's SP at the World Championships could be scored anywhere between 59 and 79 points and be deemed OK, while her Free Skate could be anywhere between 123 and 163, which is crazy really (IMO). However, this is way off topic for this thread; maybe it's something for a judging bias thread one day, along with how much an unscrupulous judge could get away with before being called into question.
 

gkelly

Record Breaker
Joined
Jul 26, 2003
I don't know how the details would work out, but I do like this way of looking at the bullet points for GOE. IMHO the biggest problem with GOE judging is that not enough 0's are given out.

I think that there were many more 0s given out in the early years of IJS and that changes to the GOE rules have developed along with consensus within the judging community that judges should be making more distinctions between the quality of elements, not fewer.

(Meanwhile there have also been more gradations introduced in the base values themselves with the lowered base values for < and e calls, the +REP designation, etc.)

But I think it's hard to get a good sense of what is "average" vs. "good" when you're mainly focusing on the elite. Even the 24th best skater in the world is typically well above average compared to the senior field as a whole. And the average senior is typically much stronger than the average skaters at lower levels, who are also scored by the same standards.

I think there would be something wrong if these elite skaters were not doing many of their elements at a higher standard than non-elites trying the same skills, and something wrong with the system if they weren't being rewarded for it.

Also with the change to the +5/-5 gradations, I think there are a lot more opportunities to find at least one positive quality in a basically average successful jump. Especially at the elite level, I'd say that +1 is the new 0.
 

CanadianSkaterGuy

Record Breaker
Joined
Jan 25, 2013
You could easily have two skaters of the same age and sex who execute the same jump with the same measurable height and distance, but one makes you think "Wow!" and the other doesn't.

So what is the reward for good height and distance on a jump really supposed to reward?


Exactly. This is a SUBJECTIVE bullet (based on opinion) - and ideally an ABSOLUTE bullet (based on that skater, independent of other skaters before or after them). The judges are not referring to IceScope for the OBJECTIVE height/distance (based on cm, which mind you is partially based on a subjective camera man), and they certainly aren't remembering that skater's height/distance RELATIVE to other skaters.

It's almost comical how up in arms people are that the IceScope data (which has yet to even be verified as accurate, and has a super small sample size) should have some bearing on GOE. If there were an objective minimum height (oh, let's say, 0.5 m height and 2.50 m distance), then skaters who surpass that get the bullet -- whether they jump 0.6 m high with 2.52 m distance or 0.8 m high with 2.7 m distance.

It's like complaining that a skater who did 6 difficult variations in a spin should get level 6... no, they get level 4... sure they did more than the bare minimum, but they don't get extra for going beyond. If a skater has very good amplitude, they score the same as a skater who has exceptional amplitude - because the bullet only calls for very good height/distance to be awarded the bullet -- exceptional height/distance does not get more points (and if it should get more points it would be something incredibly marginal like a few extra hundredths/tenths -- some of y'all are acting like having the best amplitude means that skater deserves like 10 points more TES, lol).

Element scores don't start with whoever executes the best version of it, with everyone else scaled accordingly lower than that. That's not how the system is designed to work. Rather, elements are assessed in real time, without thinking "Oh, well, skater B has yet to skate so I'm only going to give skater A's quad a +4 even though it was perfect, leaving room in case skater B does it bigger and better". Again, IJS is (ideally) supposed to be ABSOLUTE scoring of a skater, and not RELATIVE scoring to skaters who have already skated or have yet to skate.
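The threshold idea in the paragraph above amounts to a simple pass/fail check. A minimal Python sketch, using the 0.5 m and 2.50 m figures floated in the post purely as placeholders (no such minimums exist in the rules, and the function name is made up for this example):

```python
# Placeholder minimums taken from the post's "let's say" example.
MIN_HEIGHT_M = 0.50
MIN_DISTANCE_M = 2.50

def earns_height_distance_bullet(height_m, distance_m):
    """True if a jump clears both minimums; how far it clears them is ignored."""
    return height_m >= MIN_HEIGHT_M and distance_m >= MIN_DISTANCE_M

# Both jumps from the post's example get the same single bullet.
print(earns_height_distance_bullet(0.60, 2.52))
print(earns_height_distance_bullet(0.80, 2.70))
```

Because the result is a plain yes or no, a jump that barely clears the bar and one that clears it by a wide margin are treated identically, which is the absolute (not relative) scoring being described.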

Judges have wayyy more things to think about than how an IceScope cameraman assessed Skater A's jump as 8 cm taller and 30 cm longer than Skater B's, and accordingly adjust their GOE scoring based on that. :laugh:

Again, the GOE bullet is very good height/length - not "superior" height/length or "better according to IceScope" height/length or anything that implies how that element was executed relative to other skaters.
 