Abstract
Introduction
A fundamental challenge in basketball performance evaluation is the team nature of the game. Contributions to team success occur in the context of a five-player lineup, and isolating the specific contribution of an individual is a difficult problem with a considerable history. Among the many approaches to the player evaluation problem are well-known metrics like player efficiency rating (PER), wins produced (WP), adjusted plus-minus (APM), box plus-minus (BPM), win shares (WS), value over replacement player (VORP), and offensive and defensive ratings (OR and DR) to name only a few (Basketball-Reference). While these individual player metrics help create a more complete understanding of player value, some contributions remain elusive. Setting good screens, ability to draw defenders, individual defense, and off-ball movement are all examples of important contributions that are difficult to measure and quantify. In part, these contributions are elusive because they often facilitate the success of a teammate who ultimately reaps the statistical benefit.
Even beyond contributions that are difficult to quantify, the broader question of chemistry between players is a critical aspect of team success or failure. It is widely accepted that some groups of players work better together than others, creating synergistic lineups that transcend the sum of their individual parts. Indeed, finding (or fostering) these synergistic groups of players is fundamental to the role of a general manager or coach. There are, however, far fewer analytic approaches to identifying and quantifying these synergies between players. Such positive or negative effects among teammates represent an important, but much less well understood, aspect of team basketball.
In this paper we propose spectral analysis (Diaconis, 1988) as a novel approach to identifying and quantifying group effects in NBA play-by-play data. Spectral analysis is based on algebraic signal processing, a methodology that has garnered increasing attention from the machine learning community (Kakarala, 2011; Kondor et al., 2007; Kondor and Dempsey, 2012), and is particularly well suited to take advantage of the underlying structure of basketball data. The methodology can be understood as a generalization of traditional Fourier analysis, an approach whose centrality in a host of scientific and applied data analysis problems is well-known, and speaks to the promise of its application in new contexts from social choice to genetic epistasis and more (Paudel et al., 2013; Jurman et al., 2008; Lawson et al., 2006; Uminsky et al., 2018; Uminsky et al., 2019). The premise of spectral analysis in a basketball context is simple: team success (appropriately measured) can be understood as a function on lineups. Such functions have rich structure which can be analyzed and exploited for data analytic insights.
Previous work in basketball analytics has addressed similar questions from a different perspective. Both Kuehn (2016) and Maymin et al. (2013) studied lineup synergies on the level of player skills. In Maymin et al. (2013) the authors used a probabilistic framework for game events, along with simulated games to evaluate full-lineup synergies and find trades that could benefit both teams by creating a better fit on both sides. In Kuehn (2016), on the other hand, the author used a probabilistic model to determine complementary skill categories that suggest the effect of a player in the context of a specific lineup. Work in Grassetti et al. (2019a) and Grassetti et al. (2019b) modeled lineup and player effects in the Italian Basketball League (Serie A1) based on an adjusted plus-minus framework.
Our approach is different in several respects. First, we study synergies on the level of specific player groups independent of particular skill sets. We also ignore individual production statistics and infer synergies directly from observed team success, as defined below. As a consequence of this approach, our analysis is roster constrained–we don’t suggest trades based on prospective synergies across teams. We can, however, suggest groupings of players that allow for more optimal lineups within the context of available players, a central problem in the course of an NBA game or season. Further, our approach uses orthogonality to distinguish between the contributions of a group and nested subgroups. So, for example, a group of three players that appears to exhibit positive synergies may, in fact, be benefiting from strong individual and pair contributions while the triple of players adds no particular value as a pure triple. We tease apart these higher-order correlations.
Furthermore, spectral analysis is not a model-based approach. Our methodology is therefore notably free of modeling assumptions: rather than fitting the data, spectral analysis reports the observed data, albeit projected into a new basis with new information. Thus, it is a direct translation of what actually happened on the court (as we make precise below). Our methodology is consequently at least complementary to existing work, and is also promising in presenting a new approach to understanding and appreciating the nuances of team basketball.
Finally, we note that while the methodology that underlies the spectral analysis approach is challenging, the resulting intuitions and insights are readily approachable. In what follows, we have stripped the mathematical details to a minimum and relegated them to references for the interested reader. The analysis, on the other hand, shows promise as a new and practical approach to a difficult problem in basketball analytics.
Data
We start with lineup-level play-by-play data from the 2015-2016 NBA season. Such play-by-play data is publicly available on ESPN.com or NBA.com, or can be purchased from websites like bigdataball.com, already processed into csv format. For a given team, we restrict attention to the 15 players on the roster having the most possessions played on the season, and filter the play-by-play data to periods of games involving only those players. Next, we compute the aggregated raw plus-minus (PM) for each lineup. Suppose lineup
Since a lineup consists of 5 players on the floor, there are $\binom{15}{5} = 3003$ possible lineups, though most see little or no playing time. We thus naturally arrive at a function on lineups by associating with
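The aggregation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the processing pipeline used for the paper: the stint format `(players, points_for, points_against)`, the function name `lineup_plus_minus`, the placeholder player labels, and the choice to assign 0 to lineups that never played are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations
from math import comb

def lineup_plus_minus(stints, roster):
    """Sum raw plus-minus over all stints for each five-player lineup."""
    roster_set = set(roster)
    totals = defaultdict(float)
    for players, points_for, points_against in stints:
        lineup = frozenset(players)
        # Keep only stints whose five players all belong to the 15-man roster.
        if len(lineup) == 5 and lineup <= roster_set:
            totals[lineup] += points_for - points_against
    # Assumption of this sketch: lineups that never played get the value 0.
    return {frozenset(c): totals.get(frozenset(c), 0.0)
            for c in combinations(sorted(roster), 5)}

# Hypothetical roster labels and stints, for illustration only.
roster = [f"P{i:02d}" for i in range(1, 16)]
stints = [
    (("P01", "P02", "P03", "P04", "P05"), 12, 7),
    (("P01", "P02", "P03", "P04", "P05"), 3, 6),
    (("P01", "P02", "P03", "P04", "P06"), 5, 5),
]
f = lineup_plus_minus(stints, roster)
print(len(f), comb(15, 5))
```

Keying the result by `frozenset` reflects the fact that a lineup is an unordered set of five players.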
Methodology
Our goal is now to decompose the function
Observe that a full lineup is an unordered set of five players. Any reshuffling of the five players on the floor, or the ten on the bench, does not change the lineup under consideration. Moreover, given a particular lineup, a permutation (or reshuffling) of the fifteen players on the team will result in a new lineup. The set of such permutations has a rich structure as a mathematical group. In this case, all possible permutations of fifteen players are described by
Let
Take
To understand
The decomposition continues in an analogous way, though the computations become more involved. Several computational approaches are described in Diaconis (1988) and Maslen et al. (2003). In our case of the symmetric group
The decomposition in (2) is particularly useful for two reasons. First, each
Secondly, the decomposition in (2) is orthogonal (signified by the ⊕ notation). From a data analytic perspective, this means that there is no overlap among the spaces, and group effects are independent. Thus, for instance, a contribution attributed to a group of three players can be understood as a pure third-order contribution. All constituent pair and individual contributions have been removed and quantified separately in the appropriate lower-order spaces. We thus avoid erroneous attribution of success due to multicollinearity among groups. For example, is a big three really adding value as a triple, or is its success better understood as a strong pair plus an individual? The spectral decomposition in (2) provides a quantitative basis for answering such questions.
The advantage of the orthogonality of the spaces in (2), however, presents a challenge with respect to direct interpretation of contributions for particular groups. This is evident when considering the dimension of each of the respective effect spaces in Table 1, which is strictly smaller than the number of groups of that size we might wish to analyze.
Dimension of each effect space, along with the number of natural groups of each size
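The gap between the dimension of an effect space and the number of groups of that size can be checked numerically. The sketch below assumes the standard decomposition of functions on $k$-subsets of an $n$-element set, in which the $j$-th order space has dimension $\binom{n}{j} - \binom{n}{j-1}$ (valid for $n \geq 2k$); the function name `effect_space_dims` is introduced only for this illustration.

```python
from math import comb

def effect_space_dims(n=15, k=5):
    """Dimension of the j-th order effect space for j = 0..k, assuming n >= 2k."""
    return [comb(n, j) - (comb(n, j - 1) if j > 0 else 0) for j in range(k + 1)]

dims = effect_space_dims()
group_counts = [comb(15, j) for j in range(6)]
for order, (dim, groups) in enumerate(zip(dims, group_counts)):
    print(f"order {order}: dimension {dim}, groups of that size {groups}")
```

Under this assumption the dimensions sum to the total number of lineups, and for every order from one to five the dimension is strictly smaller than the number of groups of that size.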
Since we have rosters of fifteen players, there are fifteen individual contributions to consider. The space
We deal with this issue using Mallows’ method of following easily interpretable vectors as in Diaconis (1988). Let
To quantify the contribution of
To ground the ideas of the previous section we present a small-scale example in detail. Consider a version of basketball where a team consists of 5 players, two of which play at any given moment. The set of possible lineups consists of the ten unordered pairs {
Following a season of play, we obtain a success function that gives the plus-minus (or other success metric) of each lineup. We might observe a function like that in Table 2.
Success function for two-player lineups
Summing
Preliminary analysis of sample team using individual plus-minus (PM), which is the sum of the lineup PM over lineups that include a given individual
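The preliminary individual plus-minus described here is simple to compute. The sketch below uses hypothetical lineup values, not the values of Table 2, so the resulting ranking is illustrative only.

```python
from itertools import combinations

players = [1, 2, 3, 4, 5]
lineups = list(combinations(players, 2))  # the ten unordered pairs

# Hypothetical plus-minus values; NOT the values reported in Table 2.
pm_values = [2, -1, 0, 3, 4, 1, -2, 5, -3, 1]
f = dict(zip(lineups, pm_values))

# Individual PM: sum of lineup PM over all lineups containing the player.
individual_pm = {p: sum(v for lineup, v in f.items() if p in lineup)
                 for p in players}
ranking = sorted(players, key=lambda p: individual_pm[p], reverse=True)
print(individual_pm, ranking)
```

Because every lineup contains exactly two players, the individual values sum to twice the total of the lineup values, which shows how this preliminary measure counts the same points for more than one player.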
Player 3 is the top rated individual, followed by 2, 4, 5, and 1. Lineup rankings are given by
Now compare the analysis above with spectral analysis. In this context the vector space of functions on lineups is 10-dimensional and has a basis consisting of vectors
First order (or individual) effects beyond the mean are encoded in
Finally, the orthogonal complement of
We can now project
Turning to the question of interpretability, section 3 proposes Mallows’ method of using readily interpretable vectors projected into the appropriate effect space. To that end, the individual indicator function
Contributions from spectral analysis as measured by Mallows’ method are given in Table 4 for both individuals and (two-player) lineups.
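For the small example, the orthogonal decomposition and the projection of interpretable vectors can be carried out with elementary linear algebra. The following is a sketch under stated assumptions: the success function values are hypothetical (not those of Table 2), the projectors are built from indicator vectors using a pseudoinverse rather than any specialized algorithm, and the contribution of a group is taken to be the inner product of the projected success function with the projected indicator vector. Any normalization used in the original tables is not reproduced.

```python
import numpy as np
from itertools import combinations

players = [1, 2, 3, 4, 5]
lineups = list(combinations(players, 2))          # the ten two-player lineups

# Hypothetical success function; NOT the values of Table 2.
f = np.array([2, -1, 0, 3, 4, 1, -2, 5, -3, 1], dtype=float)

def projector(M):
    """Orthogonal projector onto the column space of M."""
    return M @ np.linalg.pinv(M)

# Column p-1 is the indicator of the lineups that contain player p.
A = np.array([[1.0 if p in lineup else 0.0 for p in players]
              for lineup in lineups])

P_mean = projector(np.ones((len(lineups), 1)))    # zeroth-order (mean) space
P_upto1 = projector(A)                            # mean plus individual effects
P_first = P_upto1 - P_mean                        # pure first-order space
P_second = np.eye(len(lineups)) - P_upto1         # pure second-order space

def contribution(P, indicator):
    """Assumed form of Mallows' method: <P f, P indicator>."""
    return float((P @ f) @ (P @ indicator))

first_order = {p: contribution(P_first, A[:, p - 1]) for p in players}
second_order = {lineup: contribution(P_second, np.eye(len(lineups))[i])
                for i, lineup in enumerate(lineups)}
print(first_order)
print(second_order)
```

The three projections of `f` sum back to `f`, and the first-order and second-order contributions each sum to zero, since the mean has been removed into its own space.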
Spectral value (Spec) for each individual player and two-player lineup, and rank of each lineup, along with the preliminary rank given by
The table also includes both the spectral and preliminary (based on
A challenge inherent in working with real lineup-level data is the wide disparity in the number of possessions that lineups play. Most teams have a dominant starting lineup that plays far more possessions than any other. For example, the starting lineup of the ’16 Golden State Warriors played approximately 1140 possessions while the next most used lineup played 535 possessions. Only 12 lineups played more than 100 possessions for the Warriors on the season. For the Boston Celtics, the starters played 1413 possessions compared to 257 for the next most utilized, with 13 lineups playing more than 100 possessions. By contrast, the Celtics had 255 lineups that played fewer than 10 possessions (but at least one), and the Warriors had 236. Numbers are similar across the league. This is another reason for using raw plus-minus in defining the team success function
Despite these challenges, however, we’ll see below that there are significant insights to be gained in working with lineup level data. Moreover, since spectral analysis is a non-model-based description of complete lineup-level game data, it has the advantage of maintaining close proximity to the actual gameplay observed by coaches, players, and fans. There are always five players on the floor, so all data begins at the level of full lineups.
Consider the first order effects for the 15-16 Golden State Warriors in Table 5. Draymond Green, Stephen Curry, and Klay Thompson are the top three players. The ordering, specifically Green ranked above Curry, is perhaps interesting, though it’s worth noting that this ordering agrees with ESPN’s real plus-minus (RPM). (Green led the entire league in RPM in 15-16.) Other metrics like box plus-minus (BPM) and wins-above-replacement (WAR) rank Curry higher. Because SCLP is based on ability of lineups to outscore opponents when the player is on the floor (like RPM), however, as opposed to metrics like BPM and WAR which are more focused on points produced, the ordering is defensible.
Top and bottom five first-order effects for GSW. SCLP is the spectral contribution per log possession, PM is the player’s raw plus-minus, and Poss is the number of possessions for that player
In fact, a closer look at the interpretable vector
The second-order effects are given in Table 6, and quantify the contributions of player pairs, having removed the mean, individual, and higher-order group effects. The top and bottom five pairs (in terms of SCLP) are presented here, with more complete data in Table 16 in the appendix.
Top and bottom five SCLP pairs with at least 200 possessions, along with raw plus-minus and possessions
Even after accounting for and removing their strong individual contributions, however, it is notable that Green–Curry, Curry–Thompson, and Green–Thompson are the dominant pair contributors by a considerable margin, with SCLP values that are all more than twice as large as for the next largest pair (Barbosa–Speights). These large positive SCLP values represent true synergies: These pairs contribute to team success
Reserves Leandro Barbosa, Marreese Speights, and Ian Clark, on the other hand, were poor individual contributors, but managed to combine effectively in several pairs. In particular, the Barbosa–Speights pairing is notable as the fourth best pure pair on the team (in 983 possessions). After accounting for individual contributions, lineups that include the Barbosa–Speights pairing benefited from a real synergy that positively contributed to team success. This suggests favoring, when feasible, lineup combinations with those two players together to leverage this synergy and mitigate their individual weaknesses.
Tables 7 and 8 show pair values for players Andrew Bogut and Shaun Livingston (again in pairs with at least 150 possessions, and with more detailed tables in the appendix). Both players are interesting with respect to second order effects. While Bogut was a positive individual contributor, and was a member of the Warriors’ dominant starting lineup that season, he largely fails to find strong pairings. His best pairings are with Klay Thompson and Harrison Barnes, while he pairs particularly poorly with Andre Iguodala (in a considerable 785 possessions). This raises interesting questions as to why Bogut’s style of play is better suited to players like Thompson or Barnes rather than players like Curry or Iguodala. Also noteworthy is the fact that the Bogut–Iguodala pairing has a positive plus-minus value of 107. The spectral interpretation is that this pairing’s success should be attributed to the individual contributions of the players, and once those contributions are removed, the group lacks value as a pure pair.
Select pairs involving Andrew Bogut (with at least 150 possessions)
Select pairs involving Shaun Livingston (with at least 150 possessions)
Shaun Livingston, on the other hand, played an important role as a reserve point guard for the Warriors. Interestingly, Livingston’s worst pairing by far was with Klay Thompson. Again, considering the particular styles of these players compels interesting questions from the perspective of analyzing team and lineup compositions and playing style. It’s also noteworthy that this particular pairing saw 1412 possessions, and it seems entirely plausible that its underlying weakness was overlooked due to the healthy 111.8 plus-minus with that pair on the floor. The success of those lineups should be attributed to other, better synergies. For example, one rotation added Livingston as a sub for Barnes (112 possessions). Another put Livingston and Speights with Thompson, Barnes, and Iguodala (70 possessions). Finally, it’s also interesting to note that Livingston appears to pair better with other reserves than with starters (save Draymond Green, further highlighting Green’s overall value), an observation that raises important questions about how players understand and occupy particular roles on the team.
Table 9 shows the best and worst triples with at least 200 possessions.
Best and worst third-order effects for GSW with at least 200 possessions
The grouping of Green–Curry–Thompson is far and away the most dominant triple, and safely (and unsurprisingly) earns designation as the Warriors’ big three. Other notable triples include starters like Green and Curry or Green and Thompson together with Andre Iguodala who came off the bench, and more lightly used triples like Curry–Barbosa–Speights who had an SCLP of 4.6 in 245 possessions. Analyzing subpairs of these groups shows a better stacking of synergies in the triples that include Iguodala–he pairs well with Green, Curry, and Thompson in the second order space as well, while either of Barbosa or Speights paired poorly with Curry. Still, Barbosa with Speights was quite strong as a pair, and we see that the addition of Curry does provide added value as a pure triple. Interesting ineffective triples include Iguodala and Bogut with either of Curry or Green, especially in light of the fact that Bogut–Iguodala was also a weak pairing (see detailed tables in the appendix).
Figure 1 shows that the most effective player-triples as identified by spectral analysis are positively correlated with higher values of plus-minus.

Third-order effects for triples with more than 100 possessions for the 2015-2016 Golden State Warriors. The
As raw group plus-minus decreases, however, we see considerable variation in the spectral contributions of the groups (and in number of possessions played). This suggests the following narrative: while it may be relatively easy to identify the team’s top groups, it is considerably more difficult to identify positive and negative synergies among the remaining groups, especially when controlling for lower-order contributions. Spectral analysis suggests several opportunities for constructing more optimal lineups with potential for untapped competitive advantage, especially when more obvious dominant groupings are unavailable.
Table 10 shows top and bottom three third-order effects for the 15-16 Boston Celtics. (The appendix includes more complete tables for Boston including effects of all orders.) Figure 2 gives contrasting bar plots of the third-order effects for both Boston and Golden State.
Top and bottom three third-order effects for BOS with at least 150 possessions

Bar graph of third order spectral contributions per log possession (SCLP) for BOS and GSW for groups with more than 150 possessions.
The Celtics have fewer highly dominant groups. In particular, we note that the spectral signature of the Celtics is distinctly different from that of the Warriors in that Boston lacks anything resembling the big three of Golden State. While SCLP values are not directly comparable across teams (they depend, for instance, on the norm of the overall team success function when projected into each effect space), the relative values within an effect space are comparable. Similarly, the SCLP values also depend on the norm of the interpretable vector used in Mallows’ method. As a result, the values are not directly comparable across effect spaces, a problem we return to below.
In the fourth- and fifth-order spaces the number of high-possession groups begins to decline, as alluded to above. (See appendix for complete tables.) Still, it is interesting to note that spectral analysis flags the Warriors’ small lineup of Green–Curry–Thompson–Barnes–Iguodala as the team’s best, even over the starting lineup with Bogut replacing Barnes. It also prefers two lesser-used lineups to the Warriors’ second most-used lineup of Green–Curry–Thompson–Bogut–Rush. Also of note is the fact that Golden State’s best group of three and best group of four are both subsets of the starting lineup (another instance of stacking of positive effects), while neither of Boston’s best groups of three or four is part of its starting lineup.
Before moving on, we consider the connection between spectral analysis and a related approach via linear regression which will likely be more familiar to the sports analytics community.
Recalling our assumption of a 15-man roster, consider the problem of modeling a lineup’s plus-minus, given by
Let
The fitted coefficients
Tables 11 and 12 give the ridge regression coefficients associated with the top 5 individuals, pairs, and triples for the Warriors.
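A linear model of this kind can be sketched as a ridge regression on group-indicator features. The example below is only an illustration under several assumptions: a small hypothetical roster of six players with three-player lineups, synthetic response values, no intercept term, an arbitrary penalty `lam`, and the closed-form ridge solution. The helper names `group_design_matrix` and `ridge_fit` are introduced here and are not taken from the paper.

```python
import numpy as np
from itertools import combinations

def group_design_matrix(lineups, roster, max_order=3):
    """Indicator features: one column per group of size 1..max_order."""
    groups = [g for r in range(1, max_order + 1)
              for g in combinations(roster, r)]
    X = np.array([[1.0 if set(g) <= set(lineup) else 0.0 for g in groups]
                  for lineup in lineups])
    return X, groups

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge solution (X'X + lam I)^{-1} X'y, without intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Hypothetical small example: 6 players, lineups of 3, synthetic plus-minus.
roster = list(range(1, 7))
lineups = list(combinations(roster, 3))
rng = np.random.default_rng(0)
y = rng.normal(size=len(lineups))

X, groups = group_design_matrix(lineups, roster, max_order=2)
beta = ridge_fit(X, y, lam=1.0)
top = sorted(zip(groups, beta), key=lambda t: t[1], reverse=True)[:5]
print(top)
```

Because the indicator columns of nested groups overlap, the feature matrix has more columns than rows even in this small case, which is one reason a penalty such as ridge is needed to obtain a unique fit.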
Best individuals and pairs using the linear model
Top triples according to the linear model
Comparing with Tables 5, 6, and 9 shows both some overlap in the top rated groups, but also significant differences with respect to both ordering and magnitude of contribution. In particular, the linear model appears to value the contributions of Andrew Bogut considerably more than spectral analysis. It is also notable that spectral analysis identifies a clearly dominant big three of Green–Curry–Thompson, in contrast to the considerably different result arising from the modeling approach which ranks that group third.
We can interpret the linear model determined by
Still, there are important differences between (2) and (9). While
The decomposition in (ref
In this section we take a first step to addressing questions of the stability of spectral analysis. We seek evidence that spectral analysis is indicative of a true signal, and that should the data have turned out slightly differently, the analysis would not change dramatically. Since spectral analysis works on the lineup function
To that end, we start with the actual 15-16 season for the Boston Celtics. We can then build a bootstrapped season by sampling plays, with replacement, from the set of all plays in the actual season. (We sample the same number of plays as in the actual season.) A play is defined as a connected sequence of events surrounding a possession in the team’s play-by-play data. For example, a play might involve a sequence like a missed shot, offensive rebound, and a made jump shot; or, a defensive rebound followed by a bad pass turnover. When sampling from a team’s plays, a particular lineup will be selected with a probability proportional to the number of plays in which that lineup participated. We generate 500 bootstrapped seasons, process each using the methodology of sections 2 and 3 to produce success functions
The variability in group PM presents a challenge in gauging the stability of the spectral analysis associated with a player group. Take, for example, the Thomas–Bradley–Crowder triple for the Celtics. The actual season’s plus-minus for this group was 154.8 in 2572 possessions. Over the bootstrapped seasons the group has means of 145.9 and 2574.1 for plus-minus and possessions, respectively. On the other hand, the standard deviation of the plus-minus values is 82.8 versus only 47.7 for possessions. Thus, some of the variability in the spectral contribution of the group over the bootstrapped seasons should be expected since, in fact, the group was less effective in some of those seasons. Figure 3 shows SCLP plotted against PMperLP for the Thomas–Bradley–Crowder triple in 500 bootstrapped seasons. Of course, spectral analysis purports to do more than raw plus-minus by removing otherwise confounding collinearities and overlapping effects. Not surprisingly, therefore, we still see variability in SCLP within a band of plus-minus values, but the overall positive correlation, whereby SCLP increases in seasons where the group tended to outscore its opponents, is reasonable.
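The resampling step of the bootstrap can be sketched as follows. The sketch assumes each play has already been reduced to a lineup index and a net point value for that play; this data layout and the function name `bootstrap_seasons` are assumptions of the illustration, and the toy inputs are not real play-by-play data.

```python
import numpy as np

def bootstrap_seasons(play_lineup_ids, play_pm, n_lineups, n_seasons=500, seed=0):
    """Resample plays with replacement and rebuild lineup plus-minus totals.

    Returns an array of shape (n_seasons, n_lineups): one success function
    per bootstrapped season.
    """
    rng = np.random.default_rng(seed)
    n_plays = len(play_pm)
    seasons = np.zeros((n_seasons, n_lineups))
    for s in range(n_seasons):
        # Same number of plays as the actual season, drawn with replacement.
        idx = rng.integers(0, n_plays, size=n_plays)
        seasons[s] = np.bincount(play_lineup_ids[idx],
                                 weights=play_pm[idx],
                                 minlength=n_lineups)
    return seasons

# Toy data: six plays spread over three lineups (hypothetical values).
play_lineup_ids = np.array([0, 0, 0, 1, 1, 2])
play_pm = np.array([2.0, -1.0, 0.0, 3.0, -2.0, 1.0])
boot = bootstrap_seasons(play_lineup_ids, play_pm, n_lineups=3, n_seasons=50)
print(boot.mean(axis=0), boot.std(axis=0))
```

Sampling plays uniformly means that a lineup is drawn with probability proportional to the number of plays in which it participated, as described above. Each row of the output would then be passed through the same decomposition as the actual season.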

Spectral contribution per log possession (SCLP) versus plus-minus per log possession (PMperLP) for Thomas–Bradley–Crowder triple in 500 bootstrapped seasons. Each bootstrapped season consists of sampling plays (connected sequences of game events) with replacement from the set of all season plays. Resampled season data is then processed as in section 2 and group contributions are computed via spectral analysis as in section 3.
Also intuitively, the strength of the correlation between group plus-minus and spectral contribution depends on the number of possessions played. Fewer possessions means that a group’s contribution is more dependent on other groups and hence exhibits more variability. The Thomas–Bradley–Crowder triple in Figure 3 averages 2574 possessions, and has a Pearson correlation of
Another natural question is how to value the relative importance of the group-effect spaces. One way to gauge importance uses the squared
Distribution of the squared $L^2$-norm of the team success function over the effect spaces
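The share of the squared norm carried by each effect space can be computed directly once orthogonal projectors are available. The sketch below builds the projectors from nested group-indicator matrices using a pseudoinverse; this brute-force construction, the function names, the small case of six players with three-player lineups, and the synthetic success function are all assumptions chosen so that the example runs quickly. For a 15-man roster the same construction involves 3003 by 3003 matrices and would be correspondingly slower.

```python
import numpy as np
from itertools import combinations

def effect_projectors(n, k):
    """Orthogonal projectors onto the effect spaces of order 0..k."""
    lineups = list(combinations(range(n), k))
    size = len(lineups)
    projectors, previous = [], np.zeros((size, size))
    for j in range(k + 1):
        groups = list(combinations(range(n), j))
        # Column g is the indicator of the lineups containing group g.
        M = np.array([[1.0 if set(g) <= set(lineup) else 0.0 for g in groups]
                      for lineup in lineups])
        cumulative = M @ np.linalg.pinv(M)      # effects of order <= j
        projectors.append(cumulative - previous)
        previous = cumulative
    return projectors

def norm_fractions(f, projectors):
    """Fraction of the squared norm of f lying in each effect space."""
    total = float(f @ f)
    return [float((P @ f) @ (P @ f)) / total for P in projectors]

projectors = effect_projectors(6, 3)
rng = np.random.default_rng(0)
f = rng.normal(size=20)                          # synthetic success function
fractions = norm_fractions(f, projectors)
print([round(x, 3) for x in fractions])
```

Because the spaces are mutually orthogonal and together span all functions on lineups, the fractions sum to one.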
By this measure, the higher-order spaces are dominant as they hold most of the mass of the success function. An issue with this metric, however, is the disparity in the dimensions of the spaces. Because
Moreover, we can take the true success function of a team and break the dependence on the actual player groups as follows. Recall that the raw data
Average fraction of squared
An alternative measure of the importance of each effect space is given by measuring the extent to which projections onto
Spectral analysis proposes a new approach to understanding and quantifying group effects in basketball. By thinking of the success of a team as a function on lineups, we can exploit the structure of functions on permutations to decompose the team success function. The resulting Fourier expansion is naturally interpreted as quantifying the contributions of groups to overall team success. The resulting analysis brings insight into important and difficult questions like which groups of players work effectively together, and which do not. Furthermore, the spectral analysis approach is unique in addressing questions of lineup synergies by presenting an EDA summary of the actual team data without making the kind of modeling or skill-based assumptions of other methods.
There are several directions for future work. First, the analysis presented used raw lineup-level plus-minus to measure success. This approach has the advantage of keeping the analysis tethered to data that is intuitive, and helps avoid pitfalls arising from low-possession lineups. Still, adjusting the lineup-level plus-minus to account for quality of opponent, for example, seems like a valuable next step. Another straightforward adjustment to raw plus-minus data would involve devaluing so-called garbage time possessions when the outcome of the game is not in question.
As presented here, spectral analysis provides an in-depth exploratory analysis of a team’s lineups. Still, the results of spectral analysis could also add valuable inputs to more traditional predictive models or machine learning approaches to projecting group effects. Similarly, it would be interesting to use spectral analysis as a practical tool for lineup suggestions. While the orthogonality of the spectral decomposition facilitates valuation of pure player-groups, the question of lineup construction realistically begins at the level of individuals and works up, hopefully stacking the contributions of individuals with strong pairs, triples, and so on. A strong group of three, for instance, without any strong individual players may be interesting from an internal development perspective, or at the edges of personnel utility, but may also be of limited practical value from the perspective of constructing a strong lineup. Development of a practical tool would likely require further analysis of the ideas in sections 7 and 8 based on ideas in Diaconis et al. (1998). For example, given data (a function on lineups), we might fix the projection of that data onto certain spaces (like the first or second order), and then generate new sample data conditional on that fixed projection. The resulting projections in the higher-order spaces would give some evidence for how the fixed lower-order projections affect the mass of
