Can we use machine learning to score elements

bobbob

Medalist
Joined
Feb 7, 2014
Before you call me crazy: I think machine learning could actually help scoring quite a bit, which has been plagued by inconsistency and subjectivity. Taking a video of an element as pixels and outputting a score is well within the realm of what technology can do right now. Plus, there's the added bonus of getting precise measurements of things like rotation, height, and speed. Bias is certainly still possible and machines do have issues (for instance, the color of your costume might affect the pixels of your video and thus the score), but on the whole it will almost certainly have less bias than human judges.

If you trained a model right now on past scores, the bullet-point criteria, etc., and ran it on some videos of elements, I reckon it would do quite well. Perhaps even more "accurate" than real judges. Again, maybe a crazy idea, but I think it has its merits. I do think the biggest barrier is just humans getting over the fact that machines can do things like this quite well. (Maybe not PCS yet, but it could get some metrics like speed and transitions too.)
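Something like this toy sketch is the shape of what I mean, assuming PyTorch; the clips and labels below are random stand-ins, not real data:

```python
# A toy sketch only, assuming PyTorch. Random tensors stand in for real
# element clips and for past judges' GOEs; a real model would need thousands
# of labelled clips, careful features, and far more than ten training steps.
import torch
import torch.nn as nn

class ElementScorer(nn.Module):
    """Maps a short video clip to a predicted GOE in [-5, +5]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, clips):                     # (batch, 3, frames, H, W)
        return 5.0 * torch.tanh(self.net(clips))  # squash into [-5, +5]

model = ElementScorer()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

clips = torch.randn(8, 3, 32, 64, 64)             # fake video clips
judge_goe = torch.randint(-5, 6, (8, 1)).float()  # fake past GOE labels

for _ in range(10):  # toy training loop
    optimizer.zero_grad()
    loss = loss_fn(model(clips), judge_goe)
    loss.backward()
    optimizer.step()
```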
 

Vandevska

U don't have to build the end of the world out of it.
Medalist
Joined
Dec 18, 2017
bobbob said:
Before you call me crazy: I think machine learning could actually help scoring quite a bit, which has been plagued by inconsistency and subjectivity. Taking a video of an element as pixels and outputting a score is well within the realm of what technology can do right now. Plus, there's the added bonus of getting precise measurements of things like rotation, height, and speed. Bias is certainly still possible and machines do have issues (for instance, the color of your costume might affect the pixels of your video and thus the score), but on the whole it will almost certainly have less bias than human judges.

If you trained a model right now on past scores, the bullet-point criteria, etc., and ran it on some videos of elements, I reckon it would do quite well. Perhaps even more "accurate" than real judges. Again, maybe a crazy idea, but I think it has its merits. I do think the biggest barrier is just humans getting over the fact that machines can do things like this quite well. (Maybe not PCS yet, but it could get some metrics like speed and transitions too.)
Hmmmm, I agree with this, but I don't know how wise it is to train it on previous scoring, seeing as that is biased as well. But I do think it's highly possible, though it would need lots of work.
 

TontoK

Hot Tonto
Record Breaker
Joined
Jan 28, 2013
Country
United-States
I'm not calling you crazy. This is a really interesting prospect.

I suppose one challenge would be scoring original moves and innovative choreographic elements that would not have been programmed into the software.
 

Mawwerg

Final Flight
Joined
Nov 8, 2014
The idea is not crazy. I also believe machine learning could be used for such aims. However, the problem I see here is that, as a result, we just obtain yet another judge.
 

lanceupper1114

On the Ice
Joined
Jun 3, 2018
It's great in theory. But first of all, remember that the ISU has, like, no money for anything. Implementing technology is expensive because of the equipment. For skaters' music, many events still require CDs, compact discs!!! I mean, my car doesn't even have a CD player, and it wasn't even an available option! Then there's the cost of developing the tech. Can you imagine if Japan lost interest in figure skating? Everyone would be left competing in abandoned buildings, skating on uneven frozen patches of sewage.

Next, people that work in the ISU are tech illiterates still trying to calculate GOE multipliers with an abacus. Did you watch the ISU web seminars for judges to learn about the new changes in the judging system? That was the most ghetto thing I’ve watched online in a very long time. It looked like a high school PowerPoint presentation that was slapped together in the morning by a student that was out very late partying the night before. I'm honestly pleasantly surprised skating protocols are in PDF format, and not calligraphy on parchment paper.

Also, don't forget that the many corrupt officials don't want the option to be corrupt taken away. They will only reconfigure the system and rules to make it seem like they're trying to fix things. Skating rules and scoring have changed a lot over the years. But the one thing that needs to change, judges' ability to unfairly manipulate placements, has not changed one bit.

So: idea, not crazy. Thinking it can happen within the next 100 years: crazy. We would sooner see a quintuple salchow done in a one-arm Biellmann position in the air.
 

tofuetoffee

Rinkside
Joined
Jun 12, 2017
I think the idea has reason behind it. Setting aside costs, and assuming they can't take away the human judges because of the federations, just having a system that better informs the judges about what is being done on the ice during a high-level international competition like Worlds (since the technology may be too expensive to use otherwise) is a step up. I think this would give judges the ability to award better GOE, determine edges, etc. But I mean, the ISU won't even give its judges better camera angles or replays of jump takeoffs, so this is probably too advanced and leaves too little room to play around for them.
 

gkelly

Record Breaker
Joined
Jul 26, 2003
I think any machine-based scoring of the skaters should, wherever possible, score the actual skating and not video images thereof.

Which would mean developing some kind of sensors that can be attached to/built into the skates without interfering with the execution of the moves.

Objective scoring would be good for things like absolute speed (average across a program or a step sequence, maximum heading into and out of a jump) and efficiency of acceleration and deceleration, correct takeoff edges and rotation, height and distance of jumps, centering and rotational speed and number of rotations of a spin. Maybe overall ice coverage of a program or step sequence.

More sophisticated instruments might be able to evaluate the depth of edge, correctness and cleanness of edges on turns within and between step sequences and dance patterns, identify/count every difficult turn within a program, etc.

Additional sensors or markers on body parts would probably be necessary for evaluating when skaters achieve positions in spins, death spirals, etc., required for the element to count at all or to achieve certain level features with more accuracy than can be done by the human eye with video replay available. Similarly for evaluating basic good form in elements and transitional moves such as spread eagles and split jumps.
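To give a sense of what such sensor output might look like, here are purely illustrative containers for the kinds of objective measurements described above; every field name is my own assumption, not any ISU specification:

```python
# Purely illustrative: containers for the kinds of objective measurements
# described above. Field names and units are assumptions about what a
# hypothetical sensor package might report, not any ISU specification.
from dataclasses import dataclass

@dataclass
class JumpMeasurements:
    takeoff_edge: str        # e.g. "back outside"
    revolutions: float       # measured in the air, e.g. 3.92
    height_cm: float
    distance_m: float
    entry_speed_mps: float   # speed heading into the jump
    exit_speed_mps: float    # speed coming out of the landing

@dataclass
class SpinMeasurements:
    revolutions: int
    mean_rpm: float
    centre_travel_cm: float  # drift of the spin centre across the ice
```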

What machines could not evaluate would be qualities like musical interpretation (timing maybe, but not expressiveness), originality, the visual beauty of body positions or movement qualities, connection with the audience, etc. Those are subjective judgments, so as long as the skating community considers them important to the definition of "good skating," human evaluators with differing perspectives will be necessary for assessing those qualities.
 

Neenah16

On the Ice
Joined
Dec 4, 2016
I have never worked on machine learning myself, but I have known people who are really good at it, and I can tell you that theoretically it is possible. However, you need more sources of data than just a video, especially in the training stage. Multiple video angles, plus sensors in the rink and on the boots, are probably necessary to give the system enough data to distinguish different moves and elements regardless of the lighting, costumes, skating styles, and of course skill level.

If anyone wants to do this, they should start with something basic, like identifying elements (spins, jumps, steps), counting rotations, measuring the size of jumps and ice coverage, and distinguishing inside and outside edges. Scoring is subjective, but providing accurate information to the judges will definitely help improve scoring even if the system itself does not give scores or points.
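As a hedged sketch of that "start basic" step: classify the element type from a few simple motion features. Everything here is a made-up stand-in (random features, placeholder labels); real input would come from the multiple camera angles and boot/rink sensors suggested above:

```python
# Toy element classifier. Features and labels are random stand-ins, so the
# accuracy printed below is roughly chance; the point is only the shape of
# the pipeline, not a working recogniser.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical per-element features: [peak angular velocity, peak vertical
# acceleration, duration in seconds, fraction of time the blade touches ice]
X = rng.random((300, 4))
y = rng.choice(["jump", "spin", "steps"], size=300)  # stand-in labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(f"toy accuracy: {clf.score(X_test, y_test):.2f}")  # ~chance on noise
```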

Cost is also a real issue. I knew someone who spent a few thousand pounds on a study of animal behaviour in a small barn using video and sensors (it took two years to complete). That was a very controlled environment, with animals that were basically doing the same things over and over again. Imagine trying to apply something similar to skating, which happens in different rinks, under different ice conditions, with different skaters, etc. This would require a fortune just for the study, let alone to implement the system in competitions.

If you are doing it, I would like to know more about your approach.
 

lanceupper1114

On the Ice
Joined
Jun 3, 2018
Top skaters already use biomechanical analysis to evaluate and improve their jumps, so the sensors to measure height, rotation, and whatever else already exist. I'm pretty sure Nathan Chen did this to get his quads.
 

moriel

Record Breaker
Joined
Mar 18, 2015
This is indeed possible, either with sensors or with videos.

The main issue here would actually be making people accept the decisions. For example, see all the discussions about unfair judging, sometimes coming from actual athletes and coaches. When the evaluation is done by a computer, all of this transforms into "your code is wrong" and results in very low acceptance of the whole thing.
 

Danny T

Medalist
Joined
Mar 21, 2018
I dabbled a bit in this field a few years back, and yes, it is definitely possible in theory to develop code that recognises at least jump type and rotation. But it would need more camera angles, or it'll be very hard to pin down any accuracy at all. I mean, a super good development team may be able to work around this (maybe, 'cause AI is the most tricky/dumb thing ever), but this is the ISU, not NASA, so I'm assuming that even if such a project were green-lighted, the funding's going to be limited :laugh: Spins are even easier than jumps, but steps and turns might prove a bit of a challenge, considering 1) the ice quality after even one skater/team is going to be very bad, so it's hard to correctly read the ice tracing, and 2) at least for singles and pairs, most don't have extremely deep edges like the top ice dancers, so scanned images might prove a bit of an optical illusion. I mean, even ice dance teams are losing levels left and right, so edges are just tricky.

Sensors are a good option in theory to overcome this problem, but I don't think they're going to be very practical. Sensors are, as the name suggests, sensitive, so I'm thinking they would be quite easily broken during high-impact elements in singles and pairs. The landing force of a quad is what, 8 times the skater's weight? Sensors durable enough for that might exist, but they are going to be verrrry expensive. Plus, who is paying for the sensors? Installing them on the spot at a competition is a huge no; no sane skater is letting anyone mess with their blades 1-2 days before competition. Having them pre-installed by the manufacturing company is an option (which might allow coaches to monitor skaters' training too), but skaters are not exactly swimming in money for that kind of upgraded tech, tech they would have to replace every few months.

But if machine learning/sensors are used, skaters/feds should have the right to ask for a review. I mean, I think they should be able to do so now too, but especially if machines are involved. I competed in taekwondo before, and we could ask for a panel review if, let's say, I felt my foot made contact with my opponent's head but the sensors didn't pick it up. (Fun fact: rules used to be different and you had to hit audibly to be awarded points, but now a tap is good enough, which is lame, but whatever :rolleye:)
 

drivingmissdaisy

Record Breaker
Joined
Feb 17, 2010
I think the biggest problem would be that, to train a machine learning model, you would need to have training data classified by humans. For example, if you have a training example of a +2 3Lz, it has to be a human (or group of humans) that determines it was a +2 element. So I don't see the advantage if you aren't removing the human element anyway. It seems more useful to use machine learning when you want efficiency rather than accuracy.
 

moriel

Record Breaker
Joined
Mar 18, 2015
drivingmissdaisy said:
I think the biggest problem would be that, to train a machine learning model, you would need to have training data classified by humans. For example, if you have a training example of a +2 3Lz, it has to be a human (or group of humans) that determines it was a +2 element. So I don't see the advantage if you aren't removing the human element anyway. It seems more useful to use machine learning when you want efficiency rather than accuracy.

As a data scientist, I would approach this somewhat differently: by parts. (I would also evaluate the GOE bullets separately. I mean, not learn what a +2 jump is, but go GOE bullet point by GOE bullet point.)
1. Jump type, rotations, pre-rotations, URs, edges. Basically, this determines that the jump is a 3Lz with the correct edge, with pre-rotations and under-rotations under an acceptable limit. This could maybe be done even without machine learning, by creating models for the correct jumps and then comparing the actual parameters of the jump with the "ideal" parameters. Very rough example (see the sketch after this list): for a specific jump, you need to do X rotations in the air. Then, based on the sensor on the blade, you can count how many rotations the skater performed while the sensor was above a certain height (which means the blade was not touching the ice). Two-footed landings and so on are also pretty trivial and objective to evaluate.
2. Some GOE points are also pretty easy to do with sensors: "very good height and very good length" is pretty trivial to measure; you just need to define what good height and good length are, for example, say that good height is anything above 35 cm and good length anything beyond some x horizontal displacement. This can be done jump by jump too.
3. Then there are lots of things that are subjective. I would leave those to human judges at this point, because I'm lazy and because it would still create tons of controversy if a computer did it.
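A rough sketch of points 1 and 2, under invented assumptions: the blade sensor reports height above the ice and cumulative rotation once per sample, and the 2 cm and 35 cm thresholds are placeholders, not real calibration values:

```python
# Rough sketch of points 1-2 above. Assumes a hypothetical blade sensor that
# reports (height_above_ice_cm, cumulative_rotation_deg) once per sample;
# 2 cm and 35 cm are placeholder thresholds, not calibrated values.
AIRBORNE_CM = 2.0  # hypothetical "blade is off the ice" threshold

def airborne_rotations(samples):
    """Count rotations performed while airborne.

    samples: list of (height_cm, cumulative_rotation_deg) tuples.
    """
    start = end = None
    for height_cm, rotation_deg in samples:
        if height_cm > AIRBORNE_CM:
            if start is None:
                start = rotation_deg
            end = rotation_deg
    return 0.0 if start is None else (end - start) / 360.0

def good_height_bullet(peak_height_cm, good_height_cm=35.0):
    # point 2: "good height is anything above 35 cm" (example value only)
    return peak_height_cm > good_height_cm

samples = [(0.5, 0), (10, 90), (45, 600), (12, 1070), (0.4, 1080)]
print(airborne_rotations(samples))  # ~2.7 rotations counted while airborne
print(good_height_bullet(41.0))     # True
```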

So overall, the algo would work like this.

1. Jump evaluated and identified, BV and some of the GOE is assigned.
- type of jump, number of rotations, <, ! and e wherever they apply.
- there would be an issue with downgrades, because a very badly UR'd 3Lz would simply be identified as a 2Lz, since the computer cannot read minds and guess that it was supposed to be a triple. Imho this is actually better than the current system with <<, which is unnecessarily complicated. Not entirely sure, but I suppose there could be issues with very bad cases of "e" too.
- several characteristics of the jump, such as
>> very good height and very good length
>> SP: Jump element not according to requirements final GOE must be -5
>> Fall
>> Landing on two feet in a jump
>> Stepping out of landing in a jump
>> 2 three turns in between (jump combo)
>> Wrong edge take off F/Lz (sign “e”)
>> Unclear edge take off F/Lz (sign “!”)
>> Under-rotated (sign < )
>> Poor speed, height, distance, or air position
>> Touch down with both hands in a jump
>> Touch down with one hand or free foot
2. Then judges would be presented with some of the following bullets to evaluate. Not all of them: for example, if the jump was called <, there is no need for the judge to further subtract from GOE, since the skater was already punished, so the judge would not be shown the "Lacking rotation (no sign) including half loop in a combo" bullet. Or, if the skater fell on the jump, "good take-off and landing" would automatically be a NO, because we already know the landing is bad. But then the judge also wouldn't be able to subtract GOE for "Weak landing (bad pos./wrong edge/scratching etc)", because that was already covered by the fall and is intended for more subtle cases.
>> good take-off and landing
>> effortless throughout (including rhythm in Jump combination)
>> steps before the jump, unexpected or creative entry
>> very good body position from take-off to landing
>> element matches the music
>> Unclear edge take off F/Lz (no sign)
>> Poor take-off
>> Lacking rotation (no sign) including half loop in a combo
>> Loss of flow/direction/rhythm between jumps (combo/seq.)
>> Weak landing (bad pos./wrong edge/scratching etc)
>> Long preparation
3. Add everything up: computer BV + GOE factor * (computer GOE + human judge GOE). (A toy version of steps 2 and 3 is sketched below.)

I dunno, I would do something like this.
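Something like this toy sketch, where the bullet names are shortened and the numbers in the example (the 5.9 base value, the 0.59 GOE factor) are placeholders I picked for illustration, not official values:

```python
# Toy version of steps 2-3: machine calls suppress redundant judge bullets,
# then the total combines machine and human GOE. Bullet names are shortened
# and the example numbers are placeholders, not official ISU values.
def bullets_for_judges(machine_calls):
    bullets = {
        "good take-off and landing",
        "lacking rotation (no sign)",
        "weak landing",
    }
    if "<" in machine_calls:     # UR already punished via the < call
        bullets.discard("lacking rotation (no sign)")
    if "fall" in machine_calls:  # landing already known to be bad
        bullets.discard("good take-off and landing")
        bullets.discard("weak landing")
    return bullets

def element_score(base_value, goe_factor, machine_goe, judge_goe):
    # step 3: computer BV + GOE factor * (computer GOE + human judge GOE)
    return base_value + goe_factor * (machine_goe + judge_goe)

print(sorted(bullets_for_judges({"fall"})))  # bullets still worth asking about
print(element_score(5.9, 0.59, -5, 0))       # e.g. a fallen jump -> 2.95
```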
 

Danny T

Medalist
Joined
Mar 21, 2018
drivingmissdaisy said:
I think the biggest problem would be that, to train a machine learning model, you would need to have training data classified by humans. For example, if you have a training example of a +2 3Lz, it has to be a human (or group of humans) that determines it was a +2 element. So I don't see the advantage if you aren't removing the human element anyway. It seems more useful to use machine learning when you want efficiency rather than accuracy.

Well, CMMIIW, but I don't think anyone was proposing a complete replacement of judging by code? When writing my reply I certainly wasn't expecting machines to be able to tell if a spin has "creativity and originality", or whether a jump "matches the music". But some other bullets (height and distance, speed of the spin, good take-off and landing, even ice coverage I would say) would benefit from being held to a clear standard. For example, a quad toe should be at least 60 cm high and travel 3 m (+/-5%) to tick the first bullet. A specific spin position should be (x) rpm to count as good speed, a death spiral (y) rpm, etc. Yes, the standards would have to be provided by the ISU after enough data is collected, but these standards would then be applied across the field. It's not like the algorithm knows: oh, this is Miyahara, her standard is 30 cm, but this is Jin, his is 70 cm.

Plus, the algorithm wouldn't be there to judge "a +2 jump". That's not how it works; it's there to judge whether the jump ticks off bullet 1 or 2 or 3. The +2 or whatever is simple mathematics. Another advantage: a lot of people (including me) have said they want judges to make clear which bullets an element satisfies when awarding GOE. A notable problem is that, with so many bullets, judges wouldn't be fast enough to do this in real time while still watching the performance and the next elements. Maybe with the algorithm taking away half the bullets, the judges could tick off the remaining bullets individually, and the overall GOE would then be calculated automatically. Less "wow, that was a +5", more "wow, that's certainly a bullet 5 and 6".
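For illustration, a minimal sketch of that "tick bullets against a standard, then simple mathematics" idea. The 60 cm / 3 m quad toe standard, the 5% tolerance, and the one-point-per-bullet rule are all invented placeholders, not the real GOE table:

```python
# Minimal sketch: tick bullets against fixed standards, then do arithmetic.
# The quad toe standard, the tolerance, and the one-point-per-bullet rule
# are invented placeholders, not the real GOE table.
QUAD_TOE_STANDARD = {"height_cm": 60.0, "distance_m": 3.0}
TOLERANCE = 0.05

def ticks_height_and_length(height_cm, distance_m, standard=QUAD_TOE_STANDARD):
    """True if the jump meets the standard within the tolerance."""
    return (height_cm >= standard["height_cm"] * (1 - TOLERANCE)
            and distance_m >= standard["distance_m"] * (1 - TOLERANCE))

def goe_from_bullets(positive_ticks, negative_ticks):
    # the "+2 whatever is simple mathematics" part, clamped to [-5, +5]
    return max(-5, min(5, positive_ticks - negative_ticks))

print(ticks_height_and_length(58.0, 3.1))                    # True (within 5%)
print(goe_from_bullets(positive_ticks=4, negative_ticks=1))  # 3
```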
 

gkelly

Record Breaker
Joined
Jul 26, 2003
Yes, if there is going to be objective scoring of elements, in the sense of measuring whether they meet the minimum standards for the element as defined, carry specific errors that require penalties, or show specific positive qualities that deserve automatic rewards, those determinations should be applied separately from any judge-based evaluation of more subjective qualities.

So you'd have the identification of the element and the determination of the objective pluses and minuses to the base value by machine, and then you could separately have quality evaluations in the form of GOEs. Or some other type of score to factor in the human evaluation. The more objective determinations get built into the scoring, the further we would get from IJS scoring as originally designed and closer to something entirely new, perhaps as different from the current IJS as IJS is from the holistic 6.0 approach.

I don't think a machine-based system that measures the measurable aspects of elements and applies penalties and bonuses would completely replace the human technical panel, though.

To begin with, most of the level calls for non-jump elements are not the sorts of things that can easily be evaluated by machine. Technology that can accurately identify every step and turn in a step sequence, let alone the upper body movement etc., would need to be more advanced than technology to identify and measure jumps.

Second, a lot of what the tech panel does involves rule vetting. Yes, computer algorithms certainly can do, and to some extent I believe already do, things like automatically inserting the +REP for repeated jumps without combinations, or not counting the last jump in a program if no axel has been performed, or the last spin if no flying spin or one-position spin has yet been performed, etc.
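That sort of mechanical vetting is easy to picture in code. A hugely simplified sketch of the repeated-jump case (the real repetition rules have more conditions; this only shows the shape of the check):

```python
# Hugely simplified sketch of rule vetting: flag a repeated jump performed
# outside a combination with +REP. The real repetition rules have more
# conditions; this only shows the shape of the check.
def mark_repeats(jumps):
    """jumps: (jump_code, performed_in_combo) pairs, in program order."""
    seen = {}
    marked = []
    for code, in_combo in jumps:
        seen[code] = seen.get(code, 0) + 1
        repeated_solo = seen[code] > 1 and not in_combo
        marked.append(code + "+REP" if repeated_solo else code)
    return marked

# second solo 3T (the third attempt overall) gets the +REP tag
print(mark_repeats([("3T", False), ("3T", True), ("3T", False)]))
# -> ['3T', '3T', '3T+REP']
```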

But sometimes when a skater doesn't stick exactly to their plan because of errors of one kind or another (including incorrect takeoff edges or amount of revolutions in jumps), intention matters in order to give the skater the benefit of the doubt. Which means that sometimes calls for early elements need to get revised after the fact based on what the skater does later. And sometimes determining what a skater intended to do involves "reading" the approach to an element or aborted element or even reading the skater's facial expression and body language, which are not things machines would be able to do any time soon.

moriel said:
there would be an issue with downgrades, because a very badly UR'd 3Lz would simply be identified as a 2Lz, since the computer cannot read minds and guess that it was supposed to be a triple. Imho this is actually better than the current system with <<, which is unnecessarily complicated. Not entirely sure, but I suppose there could be issues with very bad cases of "e" too.

At the beginning of IJS downgraded jumps were called as the jump with the same takeoff and one fewer rotations. This led to problems with applying the Zayak rule accurately: E.g., if a downgraded 4T was called as 3T and the skater also performed two intentional and clean 3Ts elsewhere in the program (or one, plus two other repeated jumps). That's why the < designation was introduced instead -- and later the distinction between << vs. <.

It was also a problem for underrotated single jumps -- particularly single axels at elite levels, although now it would apply to Eulers as well.

For lutzes and flips, often the way the skater approaches the jump lets people familiar with skating practice infer which of the two jumps the skater intended even if the edge (or flat) that the blade is on at the moment of takeoff is not correct according to the definition of the intended jump. Human determination of intention can sometimes be fallible in these cases, if the technique is so weak that the skater is never on the intended edge at all or intentionally rocks onto the correct edge immediately before takeoff after holding the wrong one for a while, or if the skater is attempting an unusual approach such as forward outside rocker into a lutz that is misinterpreted as an overchecked three turn into a wrong-edge flip. But machine evaluation would be even more fallible at determining intention. If we went to a purely automated method of jump evaluation based solely on the edge at the moment of takeoff, some skaters would get called for too many flips or too many lutzes in the same program and earn no points at all for the last one(s), even if the last one was a correctly performed intended jump.

Now that preceding steps are no longer required for the short program solo jump, it's less of an issue, but when they were required the presence or absence of steps could be a clue to which jump was intended as the combination and which as the solo jump in a short program with no combination actually attempted (generally because of a fall/bad landing on the intended first jump of the combo).

Also if a skater steps/stumbles out of a jump and then tacks another jump on afterward. Would a machine-based identification method identify the later jump as intended to connect to the same element and be able to determine whether to call, e.g., 3T+2T*+SEQ or just 3T+SEQ or two separate elements 3T and 2T?

At the very least, there would still need to be one official to apply human perception and understanding to the machine-based calls.

And we haven't even addressed how machines could assist or take over identifying or calling levels for non-jump elements.

In short: I expect some kind of machine-assisted scoring sometime within the next few decades. I don't expect (nor will I live long enough to see) any primarily machine-generated skating scoring within this century.
 

moriel

Record Breaker
Joined
Mar 18, 2015
Yep this.
But while machine scoring wouldn't entirely replace live judges, it would remove a lot of work from them. And, the main thing, it would greatly increase accountability.
All the machine decisions could easily be made public, to start with. Then, since judges would have fewer things to evaluate and more time, they could be required to explain their decisions, for instance by evaluating individual bullet points rather than giving a total.
This would allow us to have a full explanation of why a specific element was given a specific GOE or level.

Step sequences could be evaluated too. I dunno, you could identify specific steps and then see if the steps done match the level requirements. It would still need some human assistance, but overall it would be a big deal too.
 

gkelly

Record Breaker
Joined
Jul 26, 2003
moriel said:
Then, since judges would have fewer things to evaluate and more time, they could be required to explain their decisions, for instance by evaluating individual bullet points rather than giving a total.
This would allow us to have a full explanation of why a specific element was given a specific GOE or level.
Step sequences could be evaluated too.

I think this would be a lot more time consuming and a lot less straightforward than you're allowing for.

How long does it take you to watch a jump, tick off in your head the pluses and minuses you think it deserves, and then write down or enter on a computer a single score?

Now how long does it take you to note each plus and minus on paper using an efficient shorthand or a form to check boxes on?
Checking boxes on a screen would take about the same amount of time as doing so on paper. Either way you'd need to look away from the live skating long enough to enter it all.
By which time you might have missed the next element entirely.

How many pages would it take to show online or print out protocols for a single program by one skater, showing every checked box for every judge for every element? (And some similar documentation for each program component?) Then multiply by all the skaters in the event.

How much time would it take for one judge to explain out loud their reasoning for every bullet point awarded in one skater's program? Now multiply by all judges on the panel. Or for all judges to each give their reasoning for the same element, and then multiply by the number of elements.

Then multiply that by all skaters in the event.

Typing up what they might say into sentences would take even longer.

Of course skaters would mainly only be interested in their own elements for which the judges had mixed opinions or where the judges' opinions in general differed from how the element felt to them while executing it. And the press and public would mainly be interested in details on specific elements that were unusual or controversial or notable in some other way, including comparing elements by medal contenders.

But if that kind of information were to be made available, all judges would need to document it for all elements in real time.

moriel said:
I dunno, you could identify specific steps

Who could identify them? The artificial intelligence?
The tricky part would be teaching it to recognize 8 different kinds (forward/backward, right/left, inside/outside edge entries) for each kind of turn and many steps, plus number of rotations on twizzles, plus various other types of steps where the edge isn't relevant -- from all possible viewing angles on the ice surface.

Knowledgeable humans can do this in real time, but they can't document it in real time.

If you can get a computer to be able to recognize them, the recognition could probably be documented instantaneously.

moriel said:
and then see if the steps done match the level requirements

That would be the easy part.
 

moriel

Record Breaker
Joined
Mar 18, 2015
It would take judges less time to evaluate 10 or so points than the 20+ they currently have to evaluate on every element.
If they currently score like "this jump looks +4", then most of the rules are useless and pointless anyway. Clicking 10 checkboxes is much faster and requires less attention than going through 20 of them in your head or on paper or whatever.

Presenting all the checkboxes online is fairly simple. For example, a simple format would be to have two protocols: one like the current one, and another, detailed one. I really see zero issues with a 300-page PDF, considering that the internet is fast and most PDF viewers have a search function. If that is too unwieldy, they could release one protocol per skater.
 

Metis

Shepherdess of the Teal Deer
Record Breaker
Joined
Feb 14, 2018
bobbob said:
Before you call me crazy: I think machine learning could actually help scoring quite a bit, which has been plagued by inconsistency and subjectivity. Taking a video of an element as pixels and outputting a score is well within the realm of what technology can do right now. Plus, there's the added bonus of getting precise measurements of things like rotation, height, and speed. Bias is certainly still possible and machines do have issues (for instance, the color of your costume might affect the pixels of your video and thus the score), but on the whole it will almost certainly have less bias than human judges.

If you trained a model right now on past scores, the bullet-point criteria, etc., and ran it on some videos of elements, I reckon it would do quite well. Perhaps even more "accurate" than real judges. Again, maybe a crazy idea, but I think it has its merits. I do think the biggest barrier is just humans getting over the fact that machines can do things like this quite well. (Maybe not PCS yet, but it could get some metrics like speed and transitions too.)

Yes, but that only gets you so far. AI could also measure height, ice coverage, and rotation speed (jumps and spins), and assess centering/traveling in spins. However, you're always going to have a GIGO (garbage in, garbage out) problem, and training an AI to take over the more contentious issues a technical panel handles wouldn't be an immediate fix. (I think it's the direction the ISU should start heading, however.) Assessing GOE bullets is a bridge too far, in my opinion, and would be a more complex task than just taking over edge calls. That being said, AI could certainly be used to rule out certain positive features or to detect negative ones (such as "very good height and length" and "poor takeoff", respectively), but human review is likely always going to be necessary to some extent.
 