Skip to content

jecastrom/lab-sql-9

 
 

Repository files navigation

Lab SQL Queries 9

Instructions:

In this lab we will find the customers who were active in consecutive months of May and June. Follow the steps to complete the analysis.

1. Create a table rentals_may

to store the data from rental table with information for the month of May

Answer:

CREATE TABLE rentals_may AS
SELECT
    *
FROM
    rental
LIMIT
    0;

to verify the table:

DESCRIBE rental;
148646240 bfa84b35 c659 423e 8c34 d35ad73be688

2. Insert values in the table rentals_may

using the table rental, filtering values only for the month of May

Answer:

INSERT INTO rentals_may
SELECT
    *
FROM
    rental
WHERE
    monthname(rental_date) = 'May';

to verify that the data was inserted into the table:

SELECT
    *
FROM
    rentals_may
LIMIT
    3;
148646630 74596c5a ed60 458c 979a 14b92a58ac9e
💡
the last two questions can be done in only one query as follow:

Answer:

CREATE TABLE rentals_may AS
SELECT
    *
FROM
    rental
WHERE
    monthname(rental_date) = 'May';

To verify that the table was created along with the data:

SELECT
    *
FROM
    rentals_may
LIMIT
    5;
148645242 51757c8c 1dc1 446e af79 857815ac0274

3. Create a table rentals_june

to store the data from rental table with information for the month of June

Answer:

CREATE TABLE rentals_june AS
SELECT
    *
FROM
    rental
WHERE
    monthname(rental_date) = 'June';

to verify the table and data:

SELECT
    *
FROM
    rentals_june
LIMIT
    3;
148647187 f60e8282 a517 4b63 a1ed e9ec9e8477de

4. Insert values in the table rentals_june

using the table rental, filtering values only for the month of June

Answer:

Already done in the previous question

5. Check the number of rentals for each customer for May

Answer:

SELECT
    concat((b.last_name), ' ', (b.first_name)) AS customer_name,
    count(*) AS number_of_films_rented_may
FROM
    rentals_may a
    INNER JOIN customer b ON a.customer_id = b.customer_id
GROUP BY
    1
ORDER BY
    1
LIMIT
    5;
148648028 3b49ad95 78ae 4783 9705 aacb8e5d1c60

6. Check the number of rentals for each customer for June

Answer:

SELECT
    concat((b.last_name), ' ', (b.first_name)) AS customer_name,
    count(*) AS number_of_films_rented_june
FROM
    rentals_june a
    INNER JOIN customer b ON a.customer_id = b.customer_id
GROUP BY
    1
ORDER BY
    1
LIMIT
    5;
148648836 054ddcd5 e2eb 4510 9020 de75cbe55766

7. Create a Python connection with your SQL database

and retrieve the results of the last two queries

(also mentioned below) as dataframes:

Answer:

from sqlalchemy import create_engine
import pandas as pd
import getpass
password = getpass.getpass()
DATABASE_LOCATION = 'mysql+pymysql://root:' + password + '@localhost/sakila'
engine = create_engine(DATABASE_LOCATION, echo=True)
148679545 ff40bb35 a222 4c18 a646 cf88a7535dce

7.1. Check the number of rentals for each customer for May

Answer:

query_1 = """
SELECT
    concat((b.last_name), ' ', (b.first_name)) AS customer_name,
    count(*) AS number_of_films_rented_may
FROM
    rentals_may a
    INNER JOIN customer b ON a.customer_id = b.customer_id
GROUP BY
    1
ORDER BY
    1;

"""
data_1 = pd.read_sql_query(query_1, engine)
data_1.head()
148654352 294f3733 c329 4036 b2c2 df55e4f16354
148652392 17cbf615 3593 4fa9 a511 0b9a325e7cf7
148652430 e8985110 7bf8 4bb4 9e98 98e5ba2df7d9

7.2. Check the number of rentals for each customer for June

query_2 = """
SELECT
    concat((b.last_name), ' ', (b.first_name)) AS customer_name,
    count(*) AS number_of_films_rented_june
FROM
    rentals_june a
    INNER JOIN customer b ON a.customer_id = b.customer_id
GROUP BY
    1
ORDER BY
    1;
"""
data_2 = pd.read_sql_query(query_2, engine)
data_2.head()
148654396 c843b02a 20fd 49ba a42f 6efd75d740ce
148652806 1196bc07 c22e 4093 951e d0e5a9b2e4c7

8. Write a function that checks if customer borrowed more or less films in the month of June as compared to May

Answer:

Declaring the two SQL queries into separate variables:

query_1 = """
SELECT concat((b.last_name), ' ', (b.first_name)) AS customer_name,
    count(*) AS number_of_films_rented_may
FROM rentals_may a
    INNER JOIN customer b ON a.customer_id = b.customer_id
GROUP BY 1
ORDER BY 1;
"""
query_2 = """
SELECT concat((b.last_name), ' ', (b.first_name)) AS customer_name,
    count(*) AS number_of_films_rented_june
FROM rentals_june a
    INNER JOIN customer b ON a.customer_id = b.customer_id
GROUP BY 1
ORDER BY 1;
"""

Function:

def rents(customer_name):
    from sqlalchemy import create_engine
    import pandas as pd
    import getpass
    import numpy as np

    # Connecting to the database
    password = getpass.getpass()
    DATABASE_LOCATION = 'mysql+pymysql://root:' + password + '@localhost/sakila'
    engine = create_engine(DATABASE_LOCATION)

    # Fetching rentals for May with the SQL query
    data_05 = pd.read_sql_query(query_1, engine)

    # Fetching rentals for June with the SQL query
    data_06 = pd.read_sql_query(query_2, engine)

    # Merging the two DataFrames and replacing NULL values with (0)
    rentals_05_06 = pd.merge(data_05, data_06)
    rentals_05_06.fillna(value=0, inplace=True)

    # Setting the index by customer name
    rentals_05_06.set_index('customer_name', inplace = True)

    if customer_name in rentals_05_06.index:
        if rentals_05_06.at[customer_name, 'number_of_films_rented_may'] > rentals_05_06.at[customer_name, 'number_of_films_rented_june']:
            print('Customer', customer_name, 'decreace of rents in June')
        elif rentals_05_06.at[customer_name, 'number_of_films_rented_may'] < rentals_05_06.at[customer_name, 'number_of_films_rented_june']:
            print('Customer', customer_name, 'increace of rents in June')
        else:
            print('Customer', customer_name, 'had same rents as May')
    else:
        print('Customer', customer_name, ' had no rents in May and June')

Testing the function by passing as argument the name of some customers):

148707042 7ac1042e 6ffe 4cef 89dd 5a139020276b

9. Connecting to a Microsoft Azure SQL Database with pyodbc

Answer:

from sqlalchemy import create_engine
import textwrap
import pyodbc
import getpass
import pandas as pd
import numpy as np
  • Setting up the query variables and connection string (database location)

sql_query_1 = """
SELECT last_name, first_name,
    count(rental_id) AS number_of_films_rented_may
FROM rental a
    INNER JOIN customer b ON a.customer_id = b.customer_id
WHERE MONTH(rental_date) = 5
GROUP BY last_name, first_name
ORDER BY 1;
"""
  • Driver: find the DRIVERS that are available using the pyodbc.drivers() method

pyodbc.drivers()
148929057 efe8076b 2466 478e 9c68 5092b2be8526
# Creating the server URL:
server='{server_name}.database.windows.net,1433'.format(server_name=server_name)

# Driver:
driver='{SQL Server Native Client 11.0}'

# Server name and DataBase name:
server_name='tcp:serv-sakila-2'
database_name='sakila'

# User name and password
username='a100jcd'
token=getpass.getpass()
# Setting up the Azure SQL database connection

DATABASE_LOCATION = textwrap.dedent('''
    Driver={driver};
    Server={server};
    Database={database};
    Uid={username};
    Pwd={token};
    Encrypt=yes;
    TrustServerCertificate=no;
    Connection Timeout=30;
    '''.format(driver=driver,server=server,
    database=database_name, username=username, token=token))

Here I used textwrap. This python module provides formatting of text by adjusting the line breaks in the input paragraph. The dedent function in this case is particularly useful to keep the code tidy without adding the actual line breaks and also to be able to use .format. If I need to connect to another Azure SQL Server database, I only replace the variables:

  • server_name

  • database_name

  • username

  • token

A connection string must be indicated in the same line. If I arbitrarily just format the string with triple quotes, I would be adding line breaks (\n) which would break the string.

cnx = pyodbc.connect(DATABASE_LOCATION)
data_1 = pd.read_sql_query(sql_query_1, cnx)
data_1.head()
148937894 797a01be e011 4934 904f fd17b243928f

9.1. Most common issues trying to connect to a cloud database (Azure SQL)

  • Server Driver:

⚠️
Error message:
InterfaceError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_19372/2813816242.py in <module>
 1 cnx = pyodbc.connect(DATABASE_LOCATION)

InterfaceError: ('IM002', [IM002] [Microsoft][ODBC Driver Manager] Data source name
not found and no default driver specified (0) (SQLDriverConnect))

At the Azure portal, Azure SQL databases provide a ready to use connection string for ODBC[1] specifying the ODBC Driver 13 for SQL Server.

However SQLAlchemy has several dialects/DBAPI options available[2] such as PyODBC, mxODBC[3] and pymssql. So if we run into error messages of this sort, this is when the .driver method comes handy to check the available drivers so we can test the connection.

For example in my case, I could not connect to the database using the ODBC Driver 13 for SQL Server as specified by Azure. I could connect successfully using the ODBC Driver 17 for SQL Server. Also we have to make sure that this driver is actually install in our pc.

148947267 9e6cceac 4281 4852 ba2c beced8c0a4ff
  • Azure SQL server firewall settings:

⚠️
Error message:
---------------------------------------------------------------------------
ProgrammingError                          Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_19372/2813816242.py in <module>
----> 1 cnx = pyodbc.connect(DATABASE_LOCATION)

"""ProgrammingError: (42000, [42000] [Microsoft][ODBC Driver 17 for SQL Server]
[SQL Server]Cannot open server serv-sakila-2 requested by the login. Client with IP address
2.206.133.211 is not allowed to access the  server.  To enable access, use the Windows Azure
Management Portal or run sp_set_firewall_rule on the master database to create a firewall rule
for this IP address or address range.  It may take up to five minutes for this change to
take effect. (40615) (SQLDriverConnect); [42000] [Microsoft][ODBC Driver 17 for
SQL Server]Invalid connection string attribute (0); [42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server] Cannot open server serv-sakila-2 requested by the login. Client with IP address
2.206.133.211 is not allowed to access the server.  To enable access, use the Windows
Azure Management Portal or run sp_set_firewall_rule on the master database to create
a firewall rule for this IP address or address range. It may take up to five minutes
for this change to take effect. (40615); [42000] [Microsoft][ODBC Driver 17
for SQL Server]Invalid connection string attribute (0))"""

When a Azure SQL Database or Azure Synapse Analytics are first created, by default, the firewall blocks all access to the public endpoint for the server. It is necessary to add a firewall rule at the server-level to be able to access the resource. We can configure server-level IP firewall rules by using the Azure portal, PowerShell, or Transact-SQL statements.

148958618 f18a654e 0a2b 40ea 8b17 075f55ab781d





1. ODBC stands for Open Database Connectivity. This is an API for accessing a database
2. DB-API is an acronym for DataBase Application Programming Interface and a library that lets Python connect to the database server.
3. mxODBC Deprecated since version 1.4: The mxODBC DBAPI is deprecated and will be removed in a future version. Please use one of the supported DBAPIs to connect to mssql.