Introduction
Australian Football (AF) is an intermittent effort sport played between two teams of 18 players, plus four players on the interchange bench. During competitive matches, players are generally required to cover between 11,000 and 13,500 m whilst also repeatedly performing several technical actions such as kicks, handballs, tackles and marks.1,2 Previous research highlights the importance of both these physical and technical outputs in relation to successful match performance.3–7 Specifically, Wing et al. 4 established the importance of tackling and marking during successful phases of defence, with superior running performance noted during successful phases of offence. Additionally, team kicking and goal conversion have been previously highlighted as the most important performance metrics when explaining team success. 5
The importance of these physical and technical outputs to team success highlights the need for training that effectively develops AF players' running ability and skill proficiency. A comprehensive study by Tribolet et al. 8 describes the multifaceted nature of AF team training modalities, which include fundamentals, skill development, line training, match simulation and fitness or conditioning. Additionally, Loader et al. 9 were able to classify AF training drills into three clusters: game specific conditioning, skill refining moderate intensity dominant, and skill refining low intensity dominant. During skill development drills (e.g., skill development, line training and fundamentals), players are often placed into groups based upon their playing position (e.g., backs, midfielders, and forwards), where training is focused upon position-specific tasks. Although there are undeniable benefits to dividing players into positional groups for the purposes of training, the tactical evolution of AF 10 has seen the growth of hybrid type players, capable of playing within multiple positions. 7 Additionally, previous clustering of player positions within the Australian Football League Women's (AFLW) competition revealed that multiple player roles (e.g., high-disposal defenders) may be present within each playing position. 11 Therefore, it may be beneficial to explore alternative grouping strategies within modern AF training practices. This may be particularly important when training time is at a premium, as emphasising the most relevant individual traits from competitive matches may optimise training effectiveness within training time constraints. 5
A statistical clustering method that can account for multiple metrics may help to achieve optimal grouping. Several different clustering methods (e.g., fuzzy, k-median, k-medoid, k-means, hierarchical) can be employed to achieve this, each with their own strengths and weaknesses.12,13 Specific to sports performance research, clustering methods have been employed in soccer (fuzzy clustering),14,15 basketball (fuzzy clustering, complex networks, Bayesian networks, two-step clustering),16–19 American football (k-means), 20 tennis (fuzzy, stochastic blockmodel), 21 and AF (Ward's two-way hierarchical, k-means, CLARA).9,11,22 The choice of method relies upon striking a balance between statistical robustness, ease/burden of computation, and practical application. The present study uses the k-means clustering method previously utilised by Shelly et al., 20 who have made progress towards establishing the use of k-means for devising training groups in American football. However, further studies are warranted within this area to validate the technique in this context. Additionally, the study by Shelly et al. 20 was limited to global positioning system data only, whereas the inclusion of technical actions (e.g., kicks) may prove beneficial when devising training groups. This is due to the reported benefit of performing these actions at match speed in order to increase positive transfer to performance.8,23
K-means clustering is an unsupervised machine learning method whereby
To perform this successfully as it relates to training design, both physical performance (e.g., distance per minute) and technical actions (e.g., kicks) should be included within the data collected. This is not only due to the importance of these metrics to successful performance, but it has also been noted that technical actions should be performed at match speed in order to increase positive transfer from training to match performance.8,23 This concept is further supported by Farrow et al. 25 who demonstrated that generally open drills displayed significantly more game-like decisions and higher cognitive complexity scores, as well as higher physical outputs, as opposed to closed drills. Therefore, the aim of this study was to assess the potential and utility of k-means clustering as a method for aiding the division of training groups for AF players using match performance data.
The remainder of this paper is structured as follows: Section 2 outlines the methodology utilised for the analysis; Section 3 presents the findings; Section 4 discusses the findings with respect to Australian Football training design; Section 5 outlines the limitations and future research recommendations; and Section 6 provides concluding remarks.
Methods
Participants
Microsensor technology and technical action data were collected on 38 male sub-elite AF players (age: 23.0 ± 3.3 years; body mass: 82.8 ± 10.1 kg; height: 185.8 ± 8.2 cm) from one club competing in the West Australian Football League (WAFL) during the 2021 season across 22 matches. Individual player observations were removed if the player was unable to complete the entire match (e.g., sustained an injury), or if there was failure of the recording device. 23 This missing data is considered to be missing completely at random (MCAR). A total of 454 player observations (average observations per player 11.9 ± 6.9; range 2–22) were included in the final analyses. Additionally, players were split into 7 positional groups. For each individual match, a player was first allocated the position in which they spent the most on-ground playing time. This data was obtained from the interchange software which provides the time played in each position (Interchanger, Australia). Players were then assigned the position they played the most matches in as their final position for the purposes of this study. This included full back (n = 4), half back (n = 8), inside midfield (n = 10), wing (n = 4), half forward (n = 5), full forward (n = 5), and ruck (n = 2). This study was approved by the Edith Cowan University Human Research Ethics Committee (HREC ID: WING-2023–04977).
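The two-step positional assignment described above can be sketched as follows. This is a minimal Python illustration, not the procedure used in the study (which relied on the interchange software output); the function names and the minutes shown are hypothetical, and the tie-breaking rule (first position encountered) is an assumption, as the source does not state one.

```python
from collections import Counter


def match_position(minutes_by_position):
    """Return the position with the most on-ground minutes in one match."""
    return max(minutes_by_position, key=minutes_by_position.get)


def final_position(matches):
    """Return the position a player was allocated in the most matches.

    Ties go to the position seen first (an assumption for this sketch).
    """
    per_match = [match_position(m) for m in matches]
    return Counter(per_match).most_common(1)[0][0]


# Hypothetical on-ground minutes for one player across three matches.
player_matches = [
    {"half back": 70, "full back": 25},
    {"half back": 40, "full back": 55},
    {"half back": 88, "wing": 10},
]
assigned = final_position(player_matches)
print(assigned)
```

In this toy example the player is allocated half back in two of the three matches and therefore receives half back as the final position.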
Data collection
Microsensor technology data was collected using the Playertek device (Catapult Innovations, Melbourne, Australia) sampling at 10 Hz. The accuracy of these devices has been previously reported. 26 To reduce interunit variability, each player wore the same unit for every match, fitted into a specifically designed pouch sewn into the playing shirt. Upon completion of each match, the data was downloaded onto the proprietary software (Playertek Cloud), where the start and end times of each quarter were synced from the Playertek live-feed application. All time spent on the interchange bench was removed so that the analysis comprised on-ground time only. The following metrics were collected from the microsensor device and expressed per minute of playing time for the entire match (i.e., the sum of the four quarters): total distance (m), high-speed distance (HSR; >18 km h−1), very high-speed distance (VHSR; >24 km h−1), PlayerLoad™ (PL; AU; the sum of accelerations across the three accelerometer vectors, accounting for the instantaneous rate of acceleration and divided by a scaling factor 27), accelerations (efforts >3 m s−2), and high-intensity efforts (HIE; >18 km h−1 for ≥2 s duration). These metrics were selected as they have been commonly reported within the AF literature.4,23,28–31
Additionally, technical action data was acquired from Champion Data (Melbourne, Australia), a company that provides statistics to the Australian Football League and the WAFL. Their coding of events has been previously shown to demonstrate acceptable levels of accuracy. 32 These technical actions included 33 :
These technical actions were included as they are both commonly highlighted within the AF literature, and have been reported to be important to successful match performance.1,4,23,32 Each player was assigned a season match average for all microsensor and technical action data within the final data set. Utilising a combination of data modalities (e.g., microsensor technology and match statistics), can provide a greater spectrum of data to understand player performance. 34
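To make the data preparation concrete, the sketch below expresses a microsensor metric per minute of on-ground time for each match and then averages across matches to give a season match average. It is a Python illustration with hypothetical values and function names. Averaging the per-match rates (rather than dividing season totals by season minutes) is an assumption, as the source does not specify which was used.

```python
from statistics import mean


def per_minute(total, minutes_on_ground):
    """Express a match total relative to on-ground playing time."""
    return total / minutes_on_ground


def season_average(match_values):
    """Average the per-match values to give one value per player."""
    return mean(match_values)


# Hypothetical match totals for one player (bench time already removed).
matches = [
    {"distance_m": 12000.0, "minutes": 100.0},
    {"distance_m": 11000.0, "minutes": 88.0},
]
dist_per_min = [per_minute(m["distance_m"], m["minutes"]) for m in matches]
season_dist_per_min = season_average(dist_per_min)
print(dist_per_min, season_dist_per_min)
```

Each player then contributes a single row of season averages to the clustering, one column per metric.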
Statistical analyses
All statistical analyses were performed in Microsoft Excel (Microsoft Corporation, Redmond, WA, USA), and R (R for Statistical Computing, version 4.3.2) 35 where the packages “factoextra” 36 and “cluster” 37 were utilised. To ensure all data points contributed equally to the model, the data was first scaled. This was performed by using the scale function within R, which first centres the data by subtracting the mean, and then scales it through dividing by the standard deviation. To determine the optimal number of clusters, the elbow method was utilised through producing a scree plot where the total within-cluster sum of squares was plotted against each candidate of

Figure 1. Scree plot of total within sum of squares plotted against number of clusters (1 to 10), utilised as the “elbow method” for ascertaining optimal number of clusters used within the final analyses.
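The scaling and elbow steps described above can be sketched in a few lines. The study performed them in R; the block below is a standard-library Python analogue offered for illustration only. The helper names, the toy data, the random initialisation and the number of restarts are all assumptions. The scaling uses the sample standard deviation, which is what R's scale function applies by default.

```python
import random
from statistics import mean, stdev


def zscore_columns(rows):
    """Centre each column on its mean and divide by its sample SD."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, seed=0, n_iter=100):
    """Lloyd's algorithm from one random start; returns (labels, centres)."""
    rng = random.Random(seed)
    centres = rng.sample(rows, k)
    labels = None
    for _ in range(n_iter):
        new = [min(range(k), key=lambda j: sq_dist(r, centres[j])) for r in rows]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = [mean(col) for col in zip(*members)]
    return labels, centres


def total_wss(rows, labels, centres):
    """Total within-cluster sum of squares."""
    return sum(sq_dist(r, centres[lab]) for r, lab in zip(rows, labels))


def elbow_curve(rows, k_values, n_starts=10):
    """Best (lowest) total WSS over several random starts for each k."""
    curve = {}
    for k in k_values:
        runs = [kmeans(rows, k, seed=s) for s in range(n_starts)]
        curve[k] = min(total_wss(rows, labels, centres) for labels, centres in runs)
    return curve


# Hypothetical season averages: [distance per min, kicks per match].
raw = [[120, 7], [122, 8], [121, 6], [140, 12], [142, 13], [141, 11]]
scaled = zscore_columns(raw)
curve = elbow_curve(scaled, k_values=range(1, 5))
for k, wss in curve.items():
    print(k, round(wss, 3))
```

The elbow is read from where the curve flattens. With scaled data, the value at one cluster equals (number of players − 1) × (number of metrics), which provides a quick sanity check on the scaling step.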
Following the application of K-means clustering, silhouette coefficients were extracted to assess clustering performance. Additionally, divisive hierarchical clustering (DIANA) analysis was also performed post clustering to further assess clustering performance.
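For readers unfamiliar with silhouette coefficients, the sketch below computes them from first principles: for each observation, a is the mean distance to the other members of its own cluster, b is the lowest mean distance to the members of any other cluster, and the coefficient is (b − a) / max(a, b). This is a Python illustration on hypothetical data, not the R routine used in the study, and it assumes Euclidean distance and at least two clusters.

```python
from math import dist
from statistics import mean


def silhouette_scores(rows, labels):
    """Silhouette coefficient (b - a) / max(a, b) for each observation.

    Assumes at least two clusters are present in `labels`.
    """
    clusters = set(labels)
    scores = []
    for i, (row, lab) in enumerate(zip(rows, labels)):
        own = [dist(row, r) for j, (r, other_lab) in enumerate(zip(rows, labels))
               if other_lab == lab and j != i]
        if not own:  # singleton cluster: coefficient set to 0 by convention
            scores.append(0.0)
            continue
        a = mean(own)
        b = min(
            mean(dist(row, r) for r, other_lab in zip(rows, labels) if other_lab == c)
            for c in clusters if c != lab
        )
        scores.append((b - a) / max(a, b))
    return scores


# Hypothetical scaled data with two well-separated groups.
rows = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
scores = silhouette_scores(rows, labels)
average_width = mean(scores)
print(round(average_width, 3))
```

Values range from −1 to 1. Coefficients near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest an observation may sit closer to a neighbouring cluster than to its own.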
Results
The results of the elbow method demonstrated the optimal number of clusters to be four, which are depicted in Figure 2, where each coloured shape represents a cluster, and each number a player within that cluster. The post clustering silhouette coefficients yielded an average score of 0.22, with a positive score deemed acceptable. The k-means clustering was performed again at all iterations of

Figure 2. Cluster plot revealing the results of the K-means clustering using four clusters (centres) as derived from the scree plot (Figure 1). Each cluster represents a group of players, characterised by specific match performance characteristics: cluster one, low technical action players; cluster two, low physical output players; cluster three, high technical action players; cluster four, high physical output players. Each number within each cluster represents an individual player. Dim 1 and Dim 2 represent the first two principal components, with Dim 1 accounting for the greatest variability in the data set (48.7%), and Dim 2 the next highest (20.9%) whilst being orthogonal to Dim 1. 20
Finally, the k-means clustering was checked for sensitivity by performing a “leave-one-out” cross validation, where the k-means was re-performed 38 more times with one player systematically left out on each occasion, with the clustering results compared to those obtained utilising the full data set. This cross validation demonstrated that 31 players remained in the same cluster on all 38 occasions. Five players remained in the same cluster on 37/38 occasions, one player remained in the same cluster on 31/38 occasions, with one player remaining in the same cluster on 22/38 occasions.
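A leave-one-out sensitivity check of this kind can be sketched as below. This is a standard-library Python illustration rather than the R procedure used in the study. The function names, the toy data and the deterministic farthest-point initialisation are assumptions. Because cluster numbers are arbitrary between runs, the sketch maps each reduced-run cluster to the full-run cluster that most of its members came from. This matching rule is also an assumption, as the source does not describe how clusters were matched across runs. In this sketch a player is only counted in the runs in which they were present.

```python
from collections import Counter
from statistics import mean


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(rows, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point start."""
    centres = [rows[0]]
    while len(centres) < k:
        centres.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centres)))
    labels = None
    for _ in range(n_iter):
        new = [min(range(k), key=lambda j: sq_dist(r, centres[j])) for r in rows]
        if new == labels:
            break
        labels = new
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centres[j] = [mean(col) for col in zip(*members)]
    return labels


def loo_stability(rows, k):
    """Count, per player, the reduced runs in which they keep the cluster
    they were given on the full data set."""
    full = kmeans_labels(rows, k)
    n = len(rows)
    same = [0] * n  # runs in which player i kept their cluster
    runs = [0] * n  # runs in which player i was present
    for left_out in range(n):
        keep = [i for i in range(n) if i != left_out]
        reduced = kmeans_labels([rows[i] for i in keep], k)
        # Map each reduced-run cluster to the full-run cluster that most
        # of its members belonged to.
        mapping = {}
        for c in set(reduced):
            votes = Counter(full[i] for i, lab in zip(keep, reduced) if lab == c)
            mapping[c] = votes.most_common(1)[0][0]
        for i, lab in zip(keep, reduced):
            runs[i] += 1
            same[i] += int(mapping[lab] == full[i])
    return same, runs


# Hypothetical scaled data: two clearly separated groups of three players.
rows = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]]
same, runs = loo_stability(rows, k=2)
print(list(zip(same, runs)))
```

Players whose count falls well below the number of runs sit near a cluster boundary, which is the pattern reported above for the two least stable players.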
Each of the clusters can be broadly classified, with cluster one containing players who recorded lower values for handballs (vs cluster three), kicks (vs cluster three), and tackles (vs cluster three and four). Cluster two represents the low physical output players (e.g., those who recorded the lowest microsensor technology values, predominately for distance and high-speed running). Cluster three represents the high technical action players (e.g., those who recorded the highest values for kicks and handballs). Cluster four represents the high physical output players (e.g., those who recorded the highest values for the microsensor technology metrics, predominately for distance and high-speed running). The descriptives for each cluster are shown within Table 1.
Table 1. Descriptives (average ± SD) for microsensor technology metrics, technical actions, and playing positions for the four clusters.
DIST: distance/min, HSR: high-speed running/min (>18 km/h), HIE: High-intensity efforts/min, PL: PlayerLoad/min, VHSR: very high-speed running/min (>24 km/h), ACCEL: accelerations/min (>3 m/s/s), FB: Full back, HB: Half back, M: Inside-midfield, W: Wing, R: Ruck, HF: Half forward, FF: Full forward.
Cluster one contained 14 players, of which two were full backs, five half backs, one wing, four half forwards, and two full forwards. These players were generally the lowest for the technical actions, but high for high-speed (23.1 ± 3.1 m min−1) and very high-speed distances (4.4 ± 0.8 m min−1), as well as high-intensity efforts (0.7 ± 0.1 efforts per minute). Cluster two contained nine players, of which two were full backs, one half back, one inside midfield, two rucks, and three full forwards. These players were the lowest for all microsensor technology metrics, but highest for marks (3.7 ± 1.9). Cluster three contained nine players, of which two were half backs, and seven inside midfielders. These players were high for technical actions, in particular kicks (12.2 ± 1.7) and handballs (8.4 ± 2.1). Finally, cluster four contained six players, of which two were inside midfielders, three wings, and one half forward. These players were the highest for all microsensor technology metrics and tackles (4.1 ± 2.4).
Discussion
The utility of k-means clustering as a method for devising training groups in a population of AF players was assessed. The results demonstrated that this population of sub-elite AF players could be divided statistically into four groups, each with a unique identity in terms of match performance data, which can subsequently be targeted with specific training interventions. It should be noted that the number of clusters derived from the elbow method may differ by team, and be influenced by playing style, the mix of player profiles, and the specific metrics being assessed. 20 Although there was some evidence of grouping in terms of playing positions, the players within each cluster were not completely unique to a particular playing position, which is in line with findings reported in American football. 20 This is perhaps an unsurprising finding amongst AF populations, given the tactical evolution of game play has led to more hybrid-type players, who often switch positions, sometimes even during the same game as well as across the season.7,10 Therefore, it may be that technical and physical match performance is more person and role specific, as opposed to position specific in modern AF.
Cluster one was defined by the presence of low technical action players, recording the lowest values for kicks (7.2 ± 2.2), handballs (4.3 ± 1.3), tackles (2.0 ± 0.8), and the second lowest for marks (3.3 ± 0.9). However, they did display some of the highest values for the high velocity microsensor technology metrics: HSR (23.1 ± 3.1 m min−1), VHSR (4.4 ± 0.8 m min−1), and HIE (0.7 ± 0.1 efforts per min). This cluster also contained five of the eight (63%) half-backs, and four of the five (80%) half-forwards within the study, who together made up 64% of the 14 players within the cluster. Although it appears counterintuitive to cluster together half forwards and half backs, this is common practice within research, where these two positions are combined to form a half-line grouping.1,23,39,40 Oftentimes, this occurs because these players play directly against each other in competitive matches and are therefore likely to display similar running demands and movement patterns. Interestingly, in our study, these positions were still predominantly clustered together when technical actions were also included, potentially due to the low number recorded. Additionally, three of the five players who were not classified as half-line players within this cluster had played some games during the season in a half-line position. This strengthens the case for grouping players by match performance data, as opposed to playing position.
Cluster two was defined by low physical output players, recording the lowest values for all microsensor technology variables. However, this group did record the highest number of marks (3.7 ± 1.9), and the second highest number of kicks (8.0 ± 2.3) per game. This cluster consisted of two full backs, both ruck position players, and three full forwards (50%, 100%, and 60% of the study population respectively), which when combined accounted for 78% of the cluster's population. The clustering together of these positions is somewhat unsurprising considering that these positions are often grouped together and referred to as fixed, full, tall or key position players in AF research.23,29,39,40 Previous research delineating AF populations into discrete positional groups has noted that these playing positions record the smallest running distances and the smallest distances at high velocities. 28 This can be attributed to several factors: their role within the team requires them to play in a more restricted area of the playing oval (e.g., forward or defensive 50), they are typically larger in stature, and they have been reported to record the lowest scores on physical performance measures. 41 However, their large stature often provides them with an advantage when marking the ball, which is highlighted in the high number of marks recorded by players in this cluster.
Perhaps surprisingly, the other two players within this cluster were a half back and an inside midfielder, who might have been expected to be clustered into other groups. However, it should be noted that the half back played nine of their 20 games as a full back (with the other 11 at half back), which may have led to the player being clustered to this group. Additionally, it may be that their specific role within the team produces an output (physically and technically) that more closely resembles a key position player. For example, the inside-midfielder in this cluster may have played more of a role as the protector, where his job is to block and create space for other midfielders to receive the ball. 42 In this instance it is likely that the player would have fewer kicks and handballs (in comparison to the other midfielders), and potentially cover smaller running distances. However, without deidentifying the data and receiving the technical/tactical expertise of the coaching staff, this is somewhat difficult to confirm.
Cluster three was defined by high technical action players, with this group recording the highest number of kicks (12.2 ± 1.7) and handballs (8.4 ± 2.1), as well as recording high values for marks (3.7 ± 0.9) and tackles (4.0 ± 1.3). These players also recorded high values for all microsensor technology metrics. The cluster was formed of seven inside-midfielders (70% of study population), and two half backs (25% of study population). The grouping together of the majority (7 out of 10) of the inside-midfielders within the study is encouraging and strengthens the validity of the methodology. Typically, these players have the most possessions and disposals during an AF match, 43 as well as covering large running distances, 28 which is mirrored within the descriptives for cluster three. Similarly, the two half backs within this cluster recorded more average disposals (kicks and handballs combined) per match than any of the other half backs within the study, which may be the predominant reason they have been grouped into cluster three.
Cluster four was defined by high physical output players, recording the highest values for all microsensor technology metrics. This group also recorded the highest tackles per game (4.1 ± 2.4). The cluster was formed of three of the four wing position players, two inside-midfielders and one half forward (75%, 20%, and 20% of the study population respectively). Similar to cluster three, it is encouraging that the majority of the wing position players were clustered together. Previous research has reported wing position players to cover the largest running distances in competitive matches, 44 which is in agreement with the descriptives of cluster four. The combination of high number of tackles, low number of the other technical actions (i.e., kicks, handballs, and marks), and high physical output suggests that the primary role of the players in this group may be to perform a high number of defensive pressure acts. 1 However, additional data is required to confirm this hypothesis.
The findings of the present study support the potential use of k-means clustering as one valuable method to devise training groups of shared characteristics within AF populations. Similar to American football, 20 the clusters contained several players from the same playing position or group; however, there appear to be some subtle differences to traditional training groupings which may add value within the practical setting for some aspects of training. For example, grouping together 70% (7/10) of the inside midfielders aligns with common AF training groups; however, the other three inside midfielders were assigned to different clusters, which may provide novel insights for training prescription. Although separating three players from a positional group to perform a different task for part of training may feel counterintuitive, coaches should feel confident that by using the match as a reference point to derive training prescription, a higher level of specificity can be achieved. Using the inside-midfield group as an example, if physical training is prescribed at the intensity of the 7 grouped into cluster three (distance; 130.4 ± 8.4 m min−1, high-speed; 21.1 ± 4.1 m min−1), then this is unlikely to provide a sufficient stimulus for the 2 inside-midfielders grouped into cluster four. These two players recorded season averages for distance of 147.3 m min−1 and 145.7 m min−1, and for high-speed distance of 31.6 m min−1 and 30.2 m min−1 respectively. Therefore, by ensuring a part of training replicates these physical demands, coaches can have a greater level of confidence that players are physically prepared for competition and that the successful transfer of technical actions can potentially be increased.
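The size of this gap can be illustrated by expressing each of the two players' season averages in standard deviation units of the cluster three values quoted above. The short Python sketch below uses only figures reported in this paragraph. Treating the cluster three SD as the yardstick is an illustrative assumption rather than an analysis performed in the study, and the function name is hypothetical.

```python
def sd_units_above(value, group_mean, group_sd):
    """Distance of a value from a group mean, in group SD units."""
    return (value - group_mean) / group_sd


# Cluster three mean and SD as quoted in the text (m per min).
cluster_three = {"distance": (130.4, 8.4), "high_speed": (21.1, 4.1)}

# Season averages of the two inside-midfielders grouped into cluster four.
cluster_four_mids = [
    {"distance": 147.3, "high_speed": 31.6},
    {"distance": 145.7, "high_speed": 30.2},
]

gaps = [
    {metric: round(sd_units_above(value, *cluster_three[metric]), 2)
     for metric, value in player.items()}
    for player in cluster_four_mids
]
print(gaps)
# [{'distance': 2.01, 'high_speed': 2.56}, {'distance': 1.82, 'high_speed': 2.22}]
```

On these figures both players sit roughly two standard deviations above the cluster three average for distance and more than two above for high-speed distance, which is consistent with the argument that training prescribed at cluster three intensity would under-load them.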
It is important to note that grouping players based upon the clustering results may not achieve optimal outcomes for all aspects of a training session/week, but would be better utilised within specific scenarios. As described by Tribolet et al., 8 AF training can be separated into two broad categories, training-form (fundamentals, fitness, skills), and playing-form (breakdown, transition, line-specific, match simulation). The groups derived by k-means may be better utilised during training-form drills, where players are often divided into small groups and perform repetitive type practice. 8 Traditionally, players would often perform these drills with players of a similar playing position; however, this may not truly represent the nature in which they play. Recently, van der Vegt et al. 11 established that up to thirteen playing roles may be present within a population of AFLW players. The nuance of these roles should be captured for the purposes of training prescription. 11 An example of this could be seen within the defensive-mid role (a defender playing a role that is partly representative of a midfielder). 11 This group of players will likely need to develop skill under different constraints, most notably time in possession and heightened opposition pressure,45,46 compared to those they will likely experience in a traditional backs (defenders) grouping. To achieve this, grouping them with similar players who can create such an environment during training-form type training would appear advantageous. Conversely, playing-form type training may be better served with players divided into traditional groupings, so that team cohesion and tactical practice can be optimised, whilst also exposing players to the vast dynamics of AF match play.
To assist coaches and players to utilise this information in AF practice, potential training prescriptions and areas of focus based upon the cluster results are presented in Table 2. The specific targeting of both these physical and technical attributes can maximise both training time and outcomes that are specific to the way in which the athlete performs during competitive matches. Additionally, ensuring technical actions (e.g., kicks and handballs) are performed in training at game pace has the potential for greater transfer to competitive matches.8,23 Therefore, it may be prudent for those in cluster four not only to practise tackle technique, but also to perform tackles within high-intensity running drills, as the players within this cluster recorded the highest values for both tackling and microsensor technology metrics.
Table 2. Training recommendations based upon the cluster results.
MAS: Maximum aerobic speed.
Limitations and future recommendations
This study is limited by a few factors, which include the number of technical actions utilised within the data set. Including actions such as pressure acts, inside and rebound 50s, and score involvements may have helped describe a player's role more accurately. Additionally, it may have been valuable to include physical performance measures such as jump height, lower body strength, and aerobic capacity (e.g., time trial), which could help to guide the development of specific physical qualities for each cluster of players. Future research may also wish to obtain tactical information from the coaching staff 20 to better understand the specific role the player has been asked to play for the team. Additionally, it may be interesting to compare the results of the k-means clustering to the coach's perception of which players have similar traits. As previously mentioned, there are several clustering methods available to practitioners; future studies may therefore wish to explore these as alternatives to the k-means method.
Conclusion
The use of k-means clustering can provide some novel insights into training groupings within AF players. Through its use, players can be clustered with other players who have similar match outputs, and training can be tailored to specifically improve upon these. It is important to note that although there appears to be some value in the use of this methodology, it is not intended to be a standalone method for training organisation. Rather, it can be used as an additional tool to provide an extra layer of detail concerning some aspects of training. For example, in Table 2, we have recommended a focus on physical conditioning for players within cluster four, but not for those in cluster two. Whilst all players are required to be physically fit, including those in cluster two, physical conditioning appears to be a greater priority for those in cluster four, and therefore an additional training stimulus may be required. Finally, the results of this study, combined with those reported in American football, 20 support the notion that k-means clustering could be successfully employed across multiple team sports of a similar nature to that of Australian football.
