Skip to content

A Quick, Interactive Approach to Learning Data Analytics with SQL using Spring Data, JOOQ and Window Functions

Notifications You must be signed in to change notification settings

essentialprogramming/spring-boot-data-analytics

Repository files navigation

Spring-Boot-Data-Analytics

A Quick, Interactive Approach to Learning Data Analytics with SQL using Spring Data, JOOQ and Window Functions

📑 What you will learn

  • Understand what data analytics is and why it is important
  • Experiment with data analytics using basic and advanced queries
  • Interpret data through descriptive statistics and aggregate functions
  • Work with and manipulate data using SQL joins and constraints
  • Speed up your data analysis workflow by optimizing queries

🔖 What are window functions ?

  • Nothing to do with Windows OS
  • Standard functionality added to T-SQL
  • Functions that operate on a set or window of rows
  • Always with an OVER clause (but sometimes you will see an OVER clause without a window funtion)
  • Always found in the SELECT and ORDER BY
  • Makes queries easier to write
  • Often better performance

📃 Introduction to SQL Window Functions

Window functions are an advanced SQL feature offered to improve the execution performance of queries. These functions act on a group of rows related to the targeted row called window frame. Unlike a GROUP BY clause, Window functions do not collapse the rows to a single row, preserving the details of each row instead. This new approach to querying data is invaluable in data analytics and business intelligence.

Window functions follow three key steps:

  • They split data into groups
  • They perform a calculation on each group
  • They combine the results of those calculations back into the original dataset three_key

have_over_works

🔩 Example 1:

Let’s say we have the following problem statement: find the team with most points from each group over the entire dataset.

Teams_-_random

Before jumping into this question, it’s worth examining a few similar questions that we can easily answer without window functions.

  • If our task was to find the team with most points by group that would have been very simple.
  • Or if we wanted to find the team with most points overall, that would have been trivial.
  • What makes our original question different, is that it asks us to find out a piece of information separately for specific groups within our data, but also to preserve information from the original dataset. We need the team with most points from each group and we need to calculate that separately.

Moving on — let’s first answer this question conceptually, using the three steps outlined above:

  • Separate data into groups: In this case, we want to split up by group.
  • Perform an aggregation: Once we have our data separated, we can rank by team points.
  • Combine the data back together: When we combine back together, each original record will have a column indicating its rank order compared to all other teams within same group.

three_key_steps

  • Teams within a group represent a window:

Groups_-_each_group_is_a_window

  • We rank teams within each window:

Teams_-_ranking_per_group

  • Combine the data back together and select all teams which have rank 1. And we have all group winners: Group_Winners

About

A Quick, Interactive Approach to Learning Data Analytics with SQL using Spring Data, JOOQ and Window Functions

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •