Tactical Spectacle: Forming a Soccer Formation

Formation, Shmormation

This project explores the relationship between a team's formation and its aggregated statistical tendencies. Can we predict a team's formation from its historical season statistics? The aim was not to suggest any causal relationship; the project is an exercise in exploratory computational statistics.

To get our data, we used a soccer database API. We then fit several models from tidymodels. First, we used a CART (classification and regression tree) to understand feature importance in the dataset. Then we trained a tuned Random Forest as our final model. It reached around 80% accuracy, which was encouraging, but one issue we ran into was the over-prevalence of the 4-4-2 formation in the dataset, which skewed the model toward over-predicting 4-4-2.
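To make the class-imbalance issue concrete, here is a minimal sketch in Python rather than the project's R code. It compares overall accuracy with a "always predict the most common formation" baseline and with per-class recall. The labels are made up for illustration and are not the project's data; the function names are hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def per_class_recall(actual, predicted):
    """Fraction of each actual class that was predicted correctly."""
    totals = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {label: hits[label] / totals[label] for label in totals}


# Made-up labels for illustration only; NOT the project's data.
actual = ["4-4-2"] * 7 + ["4-3-3"] * 2 + ["3-5-2"]
predicted = ["4-4-2"] * 7 + ["4-4-2", "4-3-3"] + ["4-4-2"]

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
baseline = majority_baseline_accuracy(actual)
recall = per_class_recall(actual, predicted)

print(f"accuracy: {accuracy:.2f}, majority baseline: {baseline:.2f}")
print(recall)
```

In this toy example the overall accuracy looks healthy, yet it sits only slightly above the majority baseline, and the recall for the rarer formations is much lower than for 4-4-2. Checking per-class metrics like these is one way to see how far a dominant class is inflating a headline accuracy figure.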

Using this model, we created an interactive Shiny app in R that asks users to rank team attributes by importance. Based on those rankings, the model predicts the formation best suited to a team whose statistics are strongest in the attributes the user ranked highest.
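The description does not say how the app turns a user's ranking into model input, so the following is only a sketch of one possible approach, written in Python rather than the app's R code. It assumes rank-sum weights (the top-ranked attribute gets the largest weight, and weights sum to 1) and scores each formation by a weighted sum. The attribute names, the profile values, and the function names are all hypothetical.

```python
def rank_sum_weights(ranked_attributes):
    """Map an ordered list (most important first) to weights that sum to 1."""
    n = len(ranked_attributes)
    total = n * (n + 1) / 2
    return {name: (n - i) / total for i, name in enumerate(ranked_attributes)}


def best_formation(ranked_attributes, profiles):
    """Return the formation with the highest weighted score, plus all scores."""
    weights = rank_sum_weights(ranked_attributes)
    scores = {
        formation: sum(weights[attr] * stats[attr] for attr in weights)
        for formation, stats in profiles.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical attributes with made-up scores (higher is better);
# these are NOT values from the project's dataset.
profiles = {
    "4-4-2": {"goals_scored": 0.2, "possession": -0.1, "defensive_solidity": 0.3},
    "4-3-3": {"goals_scored": 0.5, "possession": 0.4, "defensive_solidity": -0.2},
}

ranking = ["possession", "goals_scored", "defensive_solidity"]
weights = rank_sum_weights(ranking)
choice, scores = best_formation(ranking, profiles)
print(choice, scores)
```

A weighted sum is used here only because it is easy to inspect; the actual app may instead pass the rankings to the trained Random Forest in some other form.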