Logo

dev-resources.site

for different kinds of informations.

Analyzing Svenskalag Data using DBT and DuckDB

Published at
6/17/2024
Categories
dataengineering
dbt
duckdb
football
Author
calleo
Author
6 person written this
calleo
open
Analyzing Svenskalag Data using DBT and DuckDB

As a youth football coach and data engineer, I have been dreaming of that Moneyball moment to become true.

But coaching 10-year old girls, we (the coaching team) are less concerned about batting averages and player valuations. Our primary goal is to get everyone to enjoy football and develop as a player and a teammate. By getting these foundational parts right, we hope that many of todays' team members will keep playing football for a long time.

But this doesn't mean that you can't use data to improve. Most clubs these days use standardized software to track attendance and team members over time. This is extremely helpful just to get players to show up to practice and games, but the data can also be used when planning for the season:

  • Which days to practice?
  • How many teams to register?
  • Which level of difficulty (league) to pick?
  • If you are bringing in an external coach, which day of the week would be the best one?

There are many more potential questions you might find the answer to, especially these days when teams keep track of scores, shots, running distance, etc.

I have been wanting to try DuckDB for a long time, and this seemed like the perfect excuse. Follow along to see how to scrape a website using Python and transform data using DBT. Best of all, it's all done on your local machine using DuckDB for persistence and querying.

Getting the Data

Within our club, we use Svenskalag.se which has become a very popular system used to manage sports teams in Sweden.

This system offers some basic reporting functionality, but you quickly run out of options if you want to do anything else than just seeing how many training sessions each player has attended.

There is no public API available to extract the data, so the only option left is the dirtiest trick in the book: web scraping!

Using Scrapy I fetched the data needed (activities and attendance). Scrapy handled authentication using a form request in a very simple way:

yield scrapy.FormRequest(
  login_url,
  formdata={'UserName': username, 'UserPass': password},
  callback=self.parse
)
Enter fullscreen mode Exit fullscreen mode

Scrapy relies on XPath to extract data. I admit it, I rarely get those expressions right the first time, so it was a big help to use Chrome Developer Tools to test them.

At the beginning I searched for individual elements to extract the data. However, after a while I noticed that all the data I needed was rendered as JavaScript/JSON within script tags. Example:

<script>
var initData = {
  teams: [
    {
      id: 7335,
      name: "Soccer Dads"
    },
    {
      id: 9495,
      name: "Soccer Moms"
    }
  ]
}                  
</script>
Enter fullscreen mode Exit fullscreen mode

This made things I whole lot easier. By getting the text content from the script tag, I could use calmjs.parse to convert JavaScript into a Python data structure. Much easier than finding tags and extracting text using XPath.

Data Modelling

After fetching the data I ended up with JSON-objects that I stored in DuckDB. These needed to be transformed into something that could be analyzed more easily.

I decided to use DBT for this task, together with the DuckDB connector. DuckDB is especially brilliant when you are working locally on a project like this.

I had some issues at the beginning, but it was because DBT is very picky about the naming of the profile file (profiles.yml and NOT profiles.yaml) 🤦

In the DBT profile I configured DuckDB by attaching the database with the raw data and loading the extensions needed (ICU for time zone parsing). After that it felt like any other DBT project I have worked on.

As a frequent Snowflake user, I appreciate the simplicity when it comes to handling unstructured data. Turns out DuckDB can do it just as well. I used UNNEST to pick a part the JSON payload, which was almost hassle free (learnt about 1-based indexing). A positive surprise was the inclusion of QUALIFY into DuckDB.

Analysis

Over time I have gotten to know my coach colleagues pretty well, and they like Microsoft Excel. I recognize this all too well from my day-time job: You create fancy data models in a data warehouse, just for them to be exported into Microsoft Excel or Google Sheets.

But this time I came prepared and built a wide table containing all the data you could imagine, which can then be exported to a CSV file (using DuckDB) with a simple command. Anyone can then open it in Excel and be happy 🙂

Summary

It is a breeze to get your analytics project up and running using tools such as DBT and DuckDB. And although fragile, web scraping can be a life saver and tool worth having around.

You can find the source code for this project on GitHub. Don't be a stranger, I'd love to hear your feedback.

Now let's get back to enjoying the football played at Euro 2024 🎉

football Article's
30 articles in total
Favicon
انواع شرط بندی فوتبال آنلاین: میکس، تکی، لایو و بیشتر
Favicon
10 Greatest Russian Soccer Players in History
Favicon
football artificial intelligence
Favicon
OkeStream Guide: Your Ultimate Companion for Today’s Football Action
Favicon
James Daunte Harrell: A Rising Star in the World of Entertainment
Favicon
How to Calculate a Good and Correct Mix Parlay
Favicon
The New Era of Football: Blockchain and Digital Assets Redefining the Sport
Favicon
Can I use GenAI services to predict football results? - My experience during European Football Championship 2024
Favicon
Analyzing Svenskalag Data using DBT and DuckDB
Favicon
Football news - Mbappe!
Favicon
Is Retro Bowl Still Worth Playing in 2024?
Favicon
Watch Full Matches Replay & Highlights Football https://sportuniversaltv.com/
Favicon
Live Scores, Football News and Transfer Updates | Sportsdunia
Favicon
پیش بینی فوتبال
Favicon
How Oracle Cloud has enhanced the English Premier League
Favicon
An article on the game of football
Favicon
Ball physics in GOALS
Favicon
C# Tutorial – Create a Football Penalty Kick Game in Visual Studio with Win Forms
Favicon
Attempt to create an Elifoot style game for Football (🏈)m
Favicon
How to build a football live scores notification app with Python?
Favicon
Goalookr - The Best Sports News
Favicon
How football made me a better team player at work
Favicon
Mathematical Modelling of Football
Favicon
Maximize the potential of your football players with Machine Learning
Favicon
Is Neymar Better Than Average?
Favicon
Probabilidade E Estatística Para Previsão de Resultados Esportivos
Favicon
Using Probability and Statistics to Predict Sportive Results
Favicon
Fantasy Premier League with AI - First 10 Gameweeks Review
Favicon
Bitcoin and sports betting: Play sports betting with the cryptocurrency
Favicon
Coding, Concussions, or Country Music?

Featured ones: