Abstract
There is a long-held belief in the chess community that the player with the white pieces has an advantage from making the first move. This phenomenon has been observed repeatedly in over-the-board games between high-level players and professionals. However, less is known about the prevalence of white’s advantage in games played between amateurs in more casual settings. This article attempts to identify a first-move advantage in chess by examining a large database of amateur games played online on a dedicated chess website. Win rates are calculated for various rating levels, and the influence of opening choice is also explored. These results can help determine whether there is an inherent first-move advantage in chess that is observable for all players in multiple settings, or whether this effect is seen only among highly skilled players in games played in person.
Introduction
In many turn-based games, there is an accepted belief that the player making the first move has a distinct advantage. The advantage of the initiative can be so extreme that, in certain cases, the player going first has a defined path to victory. Connect-Four and Go-Moku are examples of games with perfect information where computer solutions exist showing a win for the player making the first move (Uiterwijk & van den Herik, 2000). A first-move advantage can also be seen in other types of games, such as games without perfect information for which no computer solutions exist. Game mechanics are often designed to mitigate this imbalance between the players: in some games, players bid on turn order, and in others, the starting resources differ according to turn order.
In chess, two players compete in a strategy game with perfect information. Players have an equal number of pieces arranged on a board divided into an 8 × 8 grid of squares.
So far, research assessing white’s advantage in chess has focused mainly on games played between very skilled or professional players. Less is known about whether this first-move advantage also applies to amateur chess players. Many differences between expert and amateur chess players have been documented (Sheridan & Reingold, 2014), so it is reasonable to suspect that the advantage of moving first could differ for amateurs. Moreover, most research in this area has concentrated on games played over the board, even though interest in online chess has grown substantially, especially in the years following the COVID-19 pandemic. Consequently, little information is available on white’s advantage in games played in the less traditional setting of the internet.
This work will address the current gap in knowledge regarding the advantage of moving first in chess for amateur play online. A publicly available database of online chess games will be analyzed in order to determine whether white’s advantage is also seen in this context. Game characteristics and game results will be summarized. In particular, win rates will be calculated and comparisons will be made for different rating ranges, time controls, and opening choices.
Methods
Data Source
Game information is taken from Lichess, an online chess website that hosts millions of chess games each day (https://lichess.org/). Lichess is free to access and open-source, and many high-profile professional chess players compete regularly on the platform. Users on Lichess can play live or correspondence games with a variety of time controls. Games can be rated or unrated, and rated games use the Glicko-2 rating system. A chess rating is a numerical score that attempts to measure player skill; players with higher ability and more success in games earn higher ratings. Various chess websites and organizations maintain rating systems to help pair players of similar skill. In addition to playing standard chess games, players may also play chess variants like Fischer random chess, complete puzzles, and train openings.
Lichess maintains an open database of games completed on their server that is available to the public (https://database.lichess.org/). Games are arranged in files by month spanning back to 2013, and the data for each month can be downloaded in portable game notation (PGN) format. Most months contain information for millions or tens of millions of games. For this analysis, data from one randomly chosen month, December 2019, was used, and it is assumed that this month provides a representative look at amateur player behavior online. This original dataset contains 10.9 GB of data for 44,055,757 games.
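Each monthly file stores every game as a block of PGN tag pairs (the Lichess exports include tags such as Result, WhiteElo, BlackElo, and ECO) followed by the movetext. The article’s processing was done in R; the Python sketch below is only an illustration of how such records could be split into per-game dictionaries and how a game’s length in full moves could be read from its move numbers. It assumes plain, uncompressed PGN text, and the sample game and player names are invented.

```python
import re
from typing import Dict, Iterator, List

# A PGN tag pair has the form [Name "value"].
TAG_PATTERN = re.compile(r'^\[(\w+)\s+"(.*)"\]$')
# Move numbers such as "12." or "12..." in the movetext.
MOVE_NUMBER = re.compile(r"(\d+)\.")
# Brace comments, e.g. clock or evaluation annotations like { [%eval 0.17] }.
COMMENT = re.compile(r"\{[^}]*\}")

# Invented example record in the Lichess export layout.
SAMPLE_PGN = """[Event "Rated Blitz game"]
[White "playerA"]
[Black "playerB"]
[Result "1-0"]
[WhiteElo "1512"]
[BlackElo "1507"]
[ECO "C20"]

1. e4 e5 2. Qh5 Nc6 3. Bc4 Nf6 4. Qxf7# 1-0
"""


def build_game(tag_lines: List[str], move_lines: List[str]) -> Dict[str, str]:
    """Turn the tag lines and movetext lines of one game into a dictionary."""
    game: Dict[str, str] = {}
    for line in tag_lines:
        match = TAG_PATTERN.match(line)
        if match:
            game[match.group(1)] = match.group(2)
    game["movetext"] = " ".join(move_lines)
    return game


def iter_games(pgn_text: str) -> Iterator[Dict[str, str]]:
    """Yield one dictionary per game found in a multi-game PGN string."""
    tag_lines: List[str] = []
    move_lines: List[str] = []
    for raw in pgn_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and move_lines:
            # A tag line after movetext means the previous game is complete.
            yield build_game(tag_lines, move_lines)
            tag_lines, move_lines = [], []
        if line.startswith("["):
            tag_lines.append(line)
        else:
            move_lines.append(line)
    if tag_lines or move_lines:
        yield build_game(tag_lines, move_lines)


def count_full_moves(movetext: str) -> int:
    """Highest move number in the movetext, ignoring brace comments."""
    numbers = [int(n) for n in MOVE_NUMBER.findall(COMMENT.sub(" ", movetext))]
    return max(numbers, default=0)


if __name__ == "__main__":
    for game in iter_games(SAMPLE_PGN):
        print(game["Result"], game["WhiteElo"], count_full_moves(game["movetext"]))
        # prints: 1-0 1512 4
```

Comments are stripped before counting because evaluation annotations contain decimals (for example 12.5) that would otherwise be mistaken for move numbers.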
Data Preparation
The raw data contained various information about each game, but some preprocessing was necessary before statistical analysis. All data wrangling and analyses were completed in the statistical programming language R (R Core Team, 2024), and the R package
Many of the games in the original dataset were not appropriate to include in the analysis. For example, a game with a large discrepancy in player ratings would not be relevant when attempting to isolate the advantage of moving first as the game outcome would likely be heavily influenced by the relative difference in player strengths. Also, one would not expect games where one of the players resigns after the first move to be particularly insightful. Therefore, a subset of the full dataset was created before proceeding to the statistical analysis.
Several characteristics were used to filter the games into a suitable subset. First, games that ended in fewer than 4 moves were removed from the data set. This cutoff was chosen because the focus of this study is amateur play, and the scholar’s mate, a checkmating pattern common in lower-level games, requires four moves. Next, only games with classical or blitz time controls were kept. Lichess allows the creation of games with custom time controls, and it would be problematic to include games like hyperbullet, where players might have only 30 seconds on the clock, since those games rely heavily on reaction time. Finally, ratings were used to further eliminate unwanted games. For a game to be retained, both players had to have ratings below 2000, and the two ratings had to be within 10 points of each other. This restricts the data to amateur players of similar skill, and as a consequence of this process, all unrated games were removed from consideration. In addition, both players’ ratings had to fall within the same rating range, corresponding to the classifications used by the United States Chess Federation. This was done to make comparisons of the first-move advantage across rating ranges easier to interpret. After filtering, 3,241,159 games remained for the final analysis.
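The filtering rules above can be summarized as a single predicate. The Python sketch below is illustrative only (the article’s processing was done in R). It assumes each game has already been reduced to its length in full moves, a time-control label, and the two ratings, with None standing in for an unrated game. Treating “within 10 points” as inclusive and using 200-point bands from 600–799 up to 1800–1999 (the seven categories used in the rating comparison) are assumptions, as are the field and function names.

```python
from dataclasses import dataclass
from typing import Optional

MIN_MOVES = 4                      # scholar's mate needs four moves
KEPT_TIME_CONTROLS = {"blitz", "classical"}
MAX_RATING = 2000                  # both players must be rated below this
MAX_GAP = 10                       # assumed inclusive: |white - black| <= 10


@dataclass
class Game:
    full_moves: int                # number of full moves played
    time_control: str              # assumed label, e.g. "blitz" or "classical"
    white_elo: Optional[int]       # assumption: None marks an unrated game
    black_elo: Optional[int]


def rating_band(rating: int) -> Optional[int]:
    """Lower bound of the 200-point band (600-799, ..., 1800-1999), else None."""
    if 600 <= rating < MAX_RATING:
        return rating // 200 * 200
    return None


def keep_game(game: Game) -> bool:
    """True if the game passes every filter described in the text."""
    if game.full_moves < MIN_MOVES:
        return False
    if game.time_control not in KEPT_TIME_CONTROLS:
        return False
    if game.white_elo is None or game.black_elo is None:
        return False  # unrated games drop out
    if game.white_elo >= MAX_RATING or game.black_elo >= MAX_RATING:
        return False
    if abs(game.white_elo - game.black_elo) > MAX_GAP:
        return False
    band = rating_band(game.white_elo)
    return band is not None and band == rating_band(game.black_elo)
```

A pair such as 1595 and 1605 is within 10 points but straddles two bands, so it is rejected by the final check.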
Statistical Analysis
Summary statistics for several variables related to the players and types of games played will be calculated. Data analysis will primarily concentrate on calculating the win rates for each color. For additional comparisons, game results will be stratified according to rating category, time controls, and selection of opening, and hypothesis tests will be conducted to assess the statistical independence of these variables. All statistical inference will be conducted using a significance level of
Engine Analysis
A chess engine will be used to analyze a selection of games. Chess engines are computer programs that evaluate chess positions to produce a quantitative assessment of a given game state. In general, an engine analysis can identify the strongest moves available in a given position and produce a numerical estimate of which player, if either, has the advantage at a given point in the game. Chess engines typically measure the advantage in a position in centipawns, where one centipawn equals one hundredth of a pawn. The evaluation can be positive or negative: positive values show an advantage for white, negative values indicate an advantage for black, and values close to zero indicate an equal position. In addition, each chess piece can be assigned a relative value in pawn units. For example, rooks and queens are typically thought to be worth five and nine pawns, respectively.
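As a rough illustration of the pawn-unit scale and the sign convention, the sketch below counts material from the piece-placement field of a FEN string and reports white’s material minus black’s. It is a crude material count, not an engine evaluation. The rook and queen values follow the text; the knight and bishop values of 3 are the usual textbook figures and are an assumption here.

```python
# Piece values in pawns: rook = 5 and queen = 9 as in the text;
# knight = bishop = 3 are the conventional values (assumed here).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}
CENTIPAWNS_PER_PAWN = 100

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"


def material_balance(fen: str) -> int:
    """White material minus black material, in pawns.

    Positive values favor white and negative values favor black,
    the same sign convention used for engine evaluations.
    """
    placement = fen.split()[0]  # first FEN field: piece placement
    balance = 0
    for symbol in placement:
        value = PIECE_VALUES.get(symbol.lower())
        if value is None:
            continue  # digits (empty squares) and '/' separators
        balance += value if symbol.isupper() else -value
    return balance


def material_balance_cp(fen: str) -> int:
    """The same balance expressed in centipawns."""
    return CENTIPAWNS_PER_PAWN * material_balance(fen)
```

For example, from the starting position with black’s a8 rook removed, material_balance returns 5, or 500 centipawns in white’s favor.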
Here, games played using the classical time control will be analyzed using the open-source engine Stockfish 17 (https://stockfishchess.org/). Classical games have a larger amount of time on the game clock relative to blitz games, so focusing on these longer games might help reduce the influence that time pressure can have on the evaluation of the advantage. Each player’s first 10 moves, or their total number of moves if the game ended before a player had the chance to make 10 moves, will be analyzed with Stockfish using a depth of 10. If playing with the white pieces does come with an inherent advantage for amateur players online that affects the game outcome, then it would be reasonable to expect to see the engine evaluation show white gaining an advantage in the opening phase of the game.
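The selection rule above amounts to analyzing at most the first 20 half-moves of each game. The Python sketch below expresses that rule with the engine call abstracted away: evaluate is a hypothetical function that receives the moves played so far and a search depth and returns a centipawn score from white’s point of view. In practice it might wrap a UCI engine such as Stockfish, for instance through the python-chess package’s chess.engine.SimpleEngine and chess.engine.Limit(depth=10); that wiring is an assumption and is not described in the article.

```python
from typing import Callable, List, Sequence

MOVES_PER_PLAYER = 10  # first 10 moves for each color
SEARCH_DEPTH = 10      # engine search depth stated in the text


def plies_to_analyze(total_plies: int) -> int:
    """Half-moves covered by each player's first 10 moves (or the whole game if shorter)."""
    return min(total_plies, 2 * MOVES_PER_PLAYER)


def evaluate_opening(
    moves: Sequence[str],
    evaluate: Callable[[Sequence[str], int], int],
) -> List[int]:
    """Centipawn evaluation (white's point of view) after each analyzed half-move.

    `evaluate(move_prefix, depth)` is a hypothetical engine wrapper: it gets
    the moves played so far and the search depth and returns a score.
    """
    n = plies_to_analyze(len(moves))
    return [evaluate(moves[: i + 1], SEARCH_DEPTH) for i in range(n)]
```

Keeping the engine behind a plain function makes the selection logic testable with a stand-in scorer and leaves the choice of engine interface open.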
Results and Discussion
Game and Player Characteristics
An initial examination of descriptive statistics summarized the distributions of the variables in the data set, in order to gain insight into the types of games these players completed. First, the length of games was examined. The games in the data set had a mean of 33.65 moves with a standard deviation of 15.88. The longest game had 285 moves, while the middle half of games lasted between 22 and 43 moves (the first and third quartiles). A histogram of the number of moves is shown in Figure 1. The plot shows a right-skewed distribution, with the majority of games lasting fewer than 50 moves and a small number of outliers beyond 100 moves. In total, 2,019 games went over 100 moves, which represents less than 1% of the data set.
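The length summaries above (mean, standard deviation, quartiles, and the count of games over 100 moves) can be sketched in a few lines of Python; this is an illustration, not the article’s R code, and the move counts in the test are invented. Quartiles use method="inclusive", which applies the same linear interpolation as R’s default quantile type 7; which definition the article used is not stated.

```python
import statistics
from typing import Dict, Sequence


def summarize_lengths(moves: Sequence[int], long_cutoff: int = 100) -> Dict[str, float]:
    """Descriptive statistics for game lengths measured in moves."""
    q1, _median, q3 = statistics.quantiles(moves, n=4, method="inclusive")
    n_long = sum(1 for m in moves if m > long_cutoff)
    return {
        "mean": statistics.mean(moves),
        "sd": statistics.stdev(moves),  # sample standard deviation
        "q1": q1,
        "q3": q3,
        "max": max(moves),
        "n_long": n_long,               # games longer than the cutoff
        "pct_long": 100 * n_long / len(moves),
    }
```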

Histogram showing the distribution of the number of moves.
Next, the frequencies of various openings were calculated to better understand amateurs’ opening preferences when playing online. In chess, openings are specific move-order choices in the first phase of the game. Different opening choices typically lead to different game structures and can influence styles of play. For example, flank openings are characterized by moves focused on the sides of the board rather than the center. Chess analysts and authors have built an extensive body of opening theory that outlines optimal play, and players routinely study openings as part of their chess training. Similar opening sequences can be grouped together, and the Encyclopedia of Chess Openings (ECO) is a popular reference that separates openings into categories and subcategories. In total, 484 different ECO subcategories were represented in the data set. The most common individual ECO code was the Scandinavian Defence (B01), with 160,349 games. To make interpretation easier, Table 1 summarizes the results according to the five broadest ECO volume categories (A through E). Open games were the most common opening choice, followed by semi-open games, while Indian defences were the least popular. Overall, the openings played in these amateur games are unevenly distributed across categories.
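Grouping by ECO volume only requires the first letter of each code. A minimal Python sketch, assuming each game carries a three-character ECO code such as B01; codes that do not look like ECO codes (for instance a placeholder “?”) are skipped, which is an assumption about how missing values would be handled. The volume names follow the usual ECO volume titles.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

ECO_VOLUMES = {
    "A": "Flank openings",
    "B": "Semi-open games (other than the French Defence)",
    "C": "Open games and the French Defence",
    "D": "Closed and semi-closed games",
    "E": "Indian defences",
}


def eco_volume(code: str) -> Optional[str]:
    """Volume letter (A-E) of an ECO code such as 'B01', or None if malformed."""
    code = code.strip().upper()
    if len(code) == 3 and code[0] in ECO_VOLUMES and code[1:].isdigit():
        return code[0]
    return None


def volume_shares(codes: Iterable[str]) -> Dict[str, float]:
    """Percentage of games in each ECO volume, ignoring malformed codes."""
    counts = Counter(v for v in map(eco_volume, codes) if v is not None)
    total = sum(counts.values())  # raises ZeroDivisionError below if no valid codes
    return {volume: 100 * counts[volume] / total for volume in ECO_VOLUMES}
```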
Percentage of Games Played for Each Opening By Main ECO Volume Category.
Time control preferences were also investigated. Blitz games made up the majority of the data set, with 94.11% of the games having that time control. While classical games accounted for only 5.89% of the data set, that still totaled over 191,000 games. These figures show the overwhelming popularity of online blitz and may indicate that online chess is played more casually, given the much smaller share of games played with a longer time control.
Finally, the distribution of player skill, as measured by rating, was also explored. As can be seen in Figure 2, the histograms of ratings for the white and black players were nearly identical. Players of both colors have sample means of 1560 with standard deviations of 249.24. These sample means are slightly above the Lichess starting rating of 1500 for new accounts. The distributions are not uniform and show left skew, with relatively few players having extremely low ratings. In addition, a two-sample Kolmogorov–Smirnov test for equality of distributions was conducted. The conclusion of the test was to fail to reject the null hypothesis of equality of distributions with
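The two-sample Kolmogorov–Smirnov statistic is the largest vertical distance between the two empirical distribution functions. The Python sketch below computes it together with the usual large-sample (asymptotic) p-value; it is an independent illustration rather than the article’s R code. Because ratings are integers with many ties, the asymptotic p-value should be read as approximate.

```python
import bisect
import math
from typing import Sequence


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_x(v) - F_y(v)|."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # The gap between two empirical step functions can only peak at an observed value.
    for v in set(xs) | set(ys):
        fx = bisect.bisect_right(xs, v) / len(xs)
        fy = bisect.bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def ks_pvalue(d: float, n: int, m: int, terms: int = 100) -> float:
    """Asymptotic two-sided p-value from the Kolmogorov distribution.

    Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2),
    with lam = sqrt(n m / (n + m)) * d.
    """
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # Q(lam) equals 1 to many decimal places in this range
    total = sum(
        (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2 * total))
```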

Histograms showing the distributions of ratings for both white and black.
The main focus of the analysis is an examination of game outcomes in order to determine winning percentages for each color. Overall, white won 49.91% of games while black won 45.98%, with 4.11% of the games ending in a draw. This shows that the white player in amateur online games has a slight advantage. However, the difference in winning percentages between white and black in this context appears to be approximately half that observed in professional games. This could be explained in several ways. Lower-level players may lack detailed knowledge of opening theory, which might hinder their ability to convert an opening advantage into a decisive result. Alternatively, amateur players could squander advantages by committing frequent or large blunders that are not seen in games played between higher-level players and professionals. In addition, the draw rate in these lower-level games is substantially lower than that reported for higher-level games. This characteristic could also have multiple plausible explanations. For example, an amateur player might lose a theoretically drawn game because they are not familiar with a specific drawing technique.
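The win and draw percentages above are simple proportions over the PGN result codes 1-0, 0-1, and 1/2-1/2. A minimal Python sketch follows; skipping unfinished games marked * is an assumption. Applied to the overall figures reported here, white’s edge is 49.91 - 45.98 = 3.93 percentage points.

```python
from collections import Counter
from typing import Dict, Iterable

# PGN result codes mapped to outcomes; unfinished games ("*") are skipped.
OUTCOMES = {"1-0": "white", "0-1": "black", "1/2-1/2": "draw"}


def outcome_percentages(results: Iterable[str]) -> Dict[str, float]:
    """Percentage of white wins, black wins, and draws among finished games."""
    counts = Counter(OUTCOMES[r] for r in results if r in OUTCOMES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no finished games")
    return {label: 100 * counts[label] / total for label in ("white", "black", "draw")}


def white_edge(percentages: Dict[str, float]) -> float:
    """White's win rate minus black's win rate, in percentage points."""
    return percentages["white"] - percentages["black"]
```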
For a more detailed breakdown of white’s advantage, winning percentages were also calculated for the two time controls. Table 2 shows the win and draw rates for the blitz and classical time controls. These results indicate that white has a similar advantage in both: white won 50.31% of classical games to black’s 45.55%, and 49.89% of blitz games to black’s 46.01%. These results are also very close to the overall winning percentages discussed above. A chi-squared test of independence was conducted to determine whether a relationship exists between game outcome and time control, which helps determine whether white’s advantage is the same for both time controls. The test produced a test statistic of
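A Pearson chi-squared test of independence compares each observed count in an outcome-by-group table with the count expected if outcome and group were unrelated (row total × column total / grand total). The Python sketch below is an illustration, not the article’s R code, and the test counts are invented. With three outcomes, the tests reported in this article have (3 - 1)(2 - 1) = 2 degrees of freedom for time control, (3 - 1)(5 - 1) = 8 for the ECO volumes, and (3 - 1)(7 - 1) = 12 for the rating categories. All are even, so the p-value can use the closed-form chi-squared tail for even degrees of freedom instead of a statistics library.

```python
import math
from typing import Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with even df.

    For even df the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def chi_squared_independence(table: Sequence[Sequence[float]]) -> Tuple[float, int, float]:
    """Pearson chi-squared test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom, p-value). Assumes every row and
    column total is positive and that (r - 1)(c - 1) is even.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, chi2_sf_even_df(statistic, df)
```

For a time-control table, each row would hold the white-win, black-win, and draw counts for one time control.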
Win and Draw Percentages By Time Control.
Game results were also analyzed separately for different opening choices. Outcomes for openings grouped by the five ECO volume codes are given in Table 3. White maintains an advantage in every opening category. The general trend matches the overall winning percentages, as well as the time control results, with white winning just under half of the games on average, black winning around 46% of games, and the remaining 4% of games ending in draws. White has the largest advantage in closed openings, winning slightly over 51% of those games to black’s 44.48%. The gap between white’s and black’s winning percentages is smallest for semi-open games, at 2.01 percentage points. Another chi-squared test was conducted to evaluate the possible dependence between game outcome and opening category. The resulting test statistic was
Win and Draw Percentages By Opening ECO Code.
Finally, win rates were calculated within different rating ranges. Games were assigned to one of seven categories based on the ratings of the two players. This was done to examine whether white’s advantage depends on player skill, since it would be reasonable to expect that players with higher ratings know more opening theory and better understand how to achieve and convert opening advantages. Table 4 shows these results. White again has an advantage in every rating category, winning between 49% and 51% of games. Black wins between 45% and 47% of games, with draws accounting for around 4%. Differences between the rating categories are minimal: the gap in white’s winning percentage between the categories with the largest and smallest advantage is less than 1 percentage point. Players in the 600–799 rating category have the largest estimated advantage, with white winning 50.43% of games, while players rated between 1800 and 1999 have the smallest, at 49.46%. A test of independence between game outcome and rating category was conducted and produced a test statistic of
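Per-category win rates and their spread can be computed by grouping (category, result) pairs. In the Python sketch below, the band label is taken from either player’s rating, since the filtering step kept only games where both players fall in the same band; the 200-point band labels and the function names are assumptions for illustration. With the extreme values reported above, the spread is 50.43 - 49.46 = 0.97 percentage points.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

RESULTS = ("1-0", "0-1", "1/2-1/2")


def band_label(rating: int) -> str:
    """200-point band label such as '600-799'."""
    lower = rating // 200 * 200
    return f"{lower}-{lower + 199}"


def rates_by_group(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Win and draw percentages per group from (group, PGN result) pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for group, result in pairs:
        if result in RESULTS:  # unfinished games are ignored
            counts[group][result] += 1
    table = {}
    for group, c in counts.items():
        n = sum(c.values())
        table[group] = {
            "white": 100 * c["1-0"] / n,
            "black": 100 * c["0-1"] / n,
            "draw": 100 * c["1/2-1/2"] / n,
        }
    return table


def white_win_spread(table: Dict[str, Dict[str, float]]) -> float:
    """Highest minus lowest white win percentage across groups."""
    rates = [row["white"] for row in table.values()]
    return max(rates) - min(rates)
```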
Win and Draw Percentages By Rating Category.
For each game, the Stockfish evaluations are first converted from centipawns to pawns. This is done to present results in a more interpretable unit. Then, the game evaluations are divided into several subsets based on game outcome and opening choice. For each subset, the mean evaluation is calculated for each move. The means are then plotted in order to detect whether the engine evaluations show an advantage developing throughout the opening phase of the game.
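The averaging step described above can be sketched in Python as follows, assuming each game has already been reduced to a group label (an outcome or an ECO volume) and its list of centipawn evaluations in move order. How mate scores were handled is not stated in the text; this sketch assumes they were already replaced by finite centipawn values. Because games that end early contribute fewer evaluations, the means at later moves rest on fewer games.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Sequence, Tuple


def mean_curves(games: Iterable[Tuple[str, Sequence[int]]]) -> Dict[str, List[float]]:
    """Mean evaluation in pawns at each move, separately for each group.

    `games` yields (group, centipawn evaluations in move order) pairs.
    """
    by_group: Dict[str, Dict[int, List[float]]] = defaultdict(lambda: defaultdict(list))
    for group, evals_cp in games:
        for move, cp in enumerate(evals_cp, start=1):
            by_group[group][move].append(cp / 100)  # centipawns -> pawns
    return {
        group: [mean(moves[m]) for m in sorted(moves)]
        for group, moves in by_group.items()
    }
```

Each returned list can then be plotted against move number, one line per group.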
The first subset investigated is based on game outcome alone. Games are divided into wins for white, wins for black, and draws. Mean evaluations for each of these groups are shown in Figure 3, with the move number on the horizontal axis and the evaluation in pawns on the vertical axis. The plot shows that the mean evaluations for games won by white are above zero and steadily increase as the game progresses. Since positive evaluations correspond to an advantage for white, this suggests that, on average, players with the white pieces build an advantage out of the opening in games they win. The results for games won by black show a decreasing trend, with the mean evaluations dipping below zero. Negative evaluations indicate an advantage for black, so black also tends to steadily increase their advantage in games they win. Games that end in a draw have mean evaluations falling in the middle, with values remaining close to zero, which indicates an equal game. At first glance, these results might seem obvious and expected. However, these are amateur games, and amateurs tend to make more frequent and serious mistakes that can drastically change the game evaluation. It is therefore worth asking whether the outcomes of these amateur games are determined mainly by skill and intentional play, or whether blunders, such as an unforced error that loses a valuable piece, decide who wins. If isolated mistakes were the primary driving force behind game outcomes, one would expect the evaluations to show sudden and large changes, but this is not seen here. In addition, not all advantages are equal. At move 20, in games won by white, the mean evaluation is 1.51 pawns. However, the mean evaluation in games won by black is only
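Mean curves pool many games, so one way to look directly for the sudden, large changes discussed above is to scan each game’s evaluation sequence for big jumps between consecutive moves. The Python sketch below does this; the 3-pawn threshold (roughly the conventional value of a minor piece) is a hypothetical choice, and this per-game check is not part of the article’s analysis.

```python
from typing import List, Sequence, Tuple


def flag_swings(
    evals_pawns: Sequence[float], threshold: float = 3.0
) -> List[Tuple[int, float]]:
    """(move, change) pairs where the evaluation moves by at least `threshold`
    pawns relative to the previous move; moves are numbered from 1.

    The default threshold of 3 pawns is a hypothetical choice.
    """
    return [
        (i + 1, evals_pawns[i] - evals_pawns[i - 1])
        for i in range(1, len(evals_pawns))
        if abs(evals_pawns[i] - evals_pawns[i - 1]) >= threshold
    ]
```

A game flagged at several moves would point to decisive blunders, while a game with no flags and a steadily growing evaluation fits the gradual build-up seen in the mean curves.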

Line plots showing the mean engine evaluation (in pawns) based on game outcome.
Next, engine evaluations for the five ECO opening categories are considered. As before, subsets of games are created and the mean evaluation at each move is calculated. Overall opening results, which do not take game outcome into account, are shown in Figure 4. For each opening category, the mean evaluations fluctuate around values close to zero. Therefore, no opening category appears to promote an overall advantage for white or black in the first 20 moves of the game when all outcomes are considered together. To gain additional insight, these opening results are further divided into two sets of games: wins for white and wins for black. The resulting mean evaluations for the openings, separated by game outcome, are shown in Figure 5, with games won by white on the left and games won by black on the right. For the games won by white, the plot shows increasing mean evaluations for each opening, with all values above zero. In addition, each opening shows an advantage for white of at least one pawn at move 20. In these victories, white achieves the largest mean advantage at move 20, 1.65 pawns, with openings of the Open type. When black wins, all of the opening categories show a downward trend, and most display an advantage for black at move 20. However, there are notable differences from the evaluations for games where white wins. Perhaps most importantly, the magnitude of black’s advantage is again noticeably smaller than the advantage white had at move 20. Flank openings produce the largest mean advantage for black in wins at move 20, but the mean evaluation is only

Line plots showing the mean engine evaluation (in pawns) based on opening ECO code with evaluations aggregated across all three possible game outcomes.

Line plots showing the mean engine evaluation (in pawns) based on opening ECO code with evaluations separated based on game outcome. Results for games won by white are on the left while games won by black are on the right.
This work investigated whether the initiative white gains from moving first is an advantage in amateur online chess. A large database of games was examined in order to quantify differences in win rates for white and black. Players with the white pieces were found to win a higher percentage of games than those playing black. This advantage is consistent across time controls, opening choices, and rating categories. While white does appear to have an advantage in lower-level games played on the internet, this advantage is smaller than previously reported results for games played over the board by players with higher ratings. In addition, the observed draw rate was substantially lower than that found for professionals. The engine analysis also supported the notion that white has an advantage in this context, because the advantage white establishes early in games won by white is larger than the corresponding edge black establishes in games won by black.
Several open questions remain in this area. Exploring the lack of draws seen in these online amateur games could be a reasonable direction for future research. Also, with the recent proliferation of online professional chess events, it could be interesting to investigate possible differences in the first-move advantage for professionals and highly skilled players playing online versus over the board. A more thorough examination of outcomes for specific opening lines could suggest practical opening recommendations. In addition, this analysis focuses on data from only one chess website during one month, and expanding the analysis to include data from other months would allow for the identification of any changes in white’s advantage over time. Finally, a more detailed and expansive move-by-move analysis of games with the aid of a chess engine could help further identify the main factors, such as advantages gained out of the opening or tactical blunders, that prove decisive in these low-level games played online.
Footnotes
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
