Useing SQL (postresql and magic)

This is taken from the Socratica SQL youtube video for SELECT. I have also added in a bit more

SELECT is for choosing columns
FROM is for choosing a database
WHERE is for choosing the rows
ORDER BY is for sorting and sorts on a specified column

In [5]:
# Load in the sql extensions:
%load_ext sql
The sql extension is already loaded. To reload it, use:
  %reload_ext sql
In [44]:
#% sql sqlite://
In [45]:
! postgres -V
postgres (PostgreSQL) 11.2

I used pgAdmin to prepare a database and import the csv file. The following code allows me to connect to the database.

In [2]:
DB_ENGINE='postgresql' # tells ipython-sql that we will use postgreSQL as our database engine
DB_USER='postgres'  # id of the user who is logging in
DB_PWD='postgres' # the user's password
DB_ADDR='localhost:5432' # the host and port on which the database engine is listening
DB_NAME='earthquake' # the name of the database
In [3]:
#Construct the connection string
DB_CONNECTION = '{engine}://{user}:{pwd}@{addr}/{name}'.format(engine=DB_ENGINE,
                                                               user=DB_USER,
                                                               pwd=DB_PWD,
                                                               addr=DB_ADDR,
                                                               name=DB_NAME)

#Preview the connection string
DB_CONNECTION
Out[3]:
'postgresql://postgres:postgres@localhost:5432/earthquake'
In [4]:
%sql $DB_CONNECTION
Out[4]:
'Connected: postgres@earthquake'
In [49]:
%%sql
SELECT * FROM earthquake
LIMIT 8;
 * postgresql://postgres:***@localhost:5432/earthquake
8 rows affected.
Out[49]:
earthquake_id occurred_on latitude longitude depth magnitude calculation_method network_id place cause
1 1969-01-01 09:07:06 51.096 -179.392 45 5.6 mw iscgem812771 Andreanof Islands, Aleutian Islands, Alaska earthquake
2 1969-01-02 17:50:48 -56.096 -27.842 80.1 6 mw iscgemsup812819 South Sandwich Islands region earthquake
3 1969-01-03 03:16:40 37.14 57.899 10 5.5 mw iscgem812826 Turkmenistan-Iran border region earthquake
4 1969-01-03 13:28:12 51.132 -179.306 15 5.9 mw iscgem812841 Andreanof Islands, Aleutian Islands, Alaska earthquake
5 1969-01-04 22:36:48 -6.85 129.821 105 5.8 mw iscgem812879 Banda Sea earthquake
6 1969-01-05 13:26:43 -7.97 158.888 60 7 mw iscgem812898 Solomon Islands earthquake
7 1969-01-06 15:39:00 -10.407 164.479 10 6.9 mw iscgem812954 Santa Cruz Islands region earthquake
8 1969-01-08 21:55:51 11.823 143.419 45 5.6 mw iscgem813024 south of the Mariana Islands earthquake
In [50]:
%%sql
SELECT COUNT(*) FROM earthquake;
 * postgresql://postgres:***@localhost:5432/earthquake
1 rows affected.
Out[50]:
count
23119
In [51]:
%%sql
SELECT magnitude, place, occurred_on FROM earthquake
LIMIT 8;
 * postgresql://postgres:***@localhost:5432/earthquake
8 rows affected.
Out[51]:
magnitude place occurred_on
5.6 Andreanof Islands, Aleutian Islands, Alaska 1969-01-01 09:07:06
6 South Sandwich Islands region 1969-01-02 17:50:48
5.5 Turkmenistan-Iran border region 1969-01-03 03:16:40
5.9 Andreanof Islands, Aleutian Islands, Alaska 1969-01-03 13:28:12
5.8 Banda Sea 1969-01-04 22:36:48
7 Solomon Islands 1969-01-05 13:26:43
6.9 Santa Cruz Islands region 1969-01-06 15:39:00
5.6 south of the Mariana Islands 1969-01-08 21:55:51

Using a condition for WHERE

In [52]:
%%sql
SELECT *  
FROM earthquake
WHERE occurred_on >= '2010-1-1' AND occurred_on <= '2010-12-31'
ORDER BY magnitude DESC -- The DESC means descending order 
LIMIT 1;
 * postgresql://postgres:***@localhost:5432/earthquake
1 rows affected.
Out[52]:
earthquake_id occurred_on latitude longitude depth magnitude calculation_method network_id place cause
18756 2010-02-27 06:34:12 -36.122 -72.898 22.9 8.8 mww official20100227063411530_30 offshore Bio-Bio, Chile earthquake

Aggregation

In [53]:
%%sql
SELECT MIN(occurred_on), MAX(occurred_on) 
FROM earthquake;
 * postgresql://postgres:***@localhost:5432/earthquake
1 rows affected.
Out[53]:
min max
1969-01-01 09:07:06 2018-12-31 02:35:38
In [54]:
%%sql
SELECT MIN(magnitude), MAX(magnitude) 
FROM earthquake;
 * postgresql://postgres:***@localhost:5432/earthquake
1 rows affected.
Out[54]:
min max
5.5 9.1
In [83]:
%%sql
SELECT AVG(magnitude)
FROM earthquake;
 * postgresql://postgres:***@localhost:5432/earthquake
1 rows affected.
Out[83]:
avg
5.8744322851334400

The DISTINCT keyword

Show the distinct values for cause

In [56]:
%%sql
SELECT DISTINCT cause
FROM earthquake;
 * postgresql://postgres:***@localhost:5432/earthquake
3 rows affected.
Out[56]:
cause
explosion
nuclear explosion
earthquake

Counting the entries for these causes

In [57]:
%%sql
SELECT COUNT(*)
FROM earthquake 
WHERE cause = 'earthquake';
 * postgresql://postgres:***@localhost:5432/earthquake
1 rows affected.
Out[57]:
count
22942
In [58]:
%%sql
SELECT COUNT(*)
FROM earthquake 
WHERE cause = 'nuclear explosion';
 * postgresql://postgres:***@localhost:5432/earthquake
1 rows affected.
Out[58]:
count
173

Count the number of earthquakes that have explosion as the cause

In [59]:
%%sql
SELECT COUNT(*)
FROM earthquake 
WHERE cause = 'explosion';
 * postgresql://postgres:***@localhost:5432/earthquake
1 rows affected.
Out[59]:
count
4

Show the 4 rows with explosion as cause

In [60]:
%%sql
SELECT *
FROM earthquake 
WHERE cause = 'explosion';
 * postgresql://postgres:***@localhost:5432/earthquake
4 rows affected.
Out[60]:
earthquake_id occurred_on latitude longitude depth magnitude calculation_method network_id place cause
1483 1973-02-16 05:02:58 49.835 78.232 0 5.6 mb usp00000jc eastern Kazakhstan explosion
2461 1975-08-23 08:59:58 73.369 54.641 0 6.4 mb usp0000cjx Novaya Zemlya, Russia explosion
6107 1984-11-02 20:44:59 -21.883 -138.994 0 5.7 mb usp00028r7 Tuamotu Archipelago, French Polynesia region explosion
6380 1985-05-08 20:27:59 -21.823 -139.048 0 5.7 mb usp0002ejj Tuamotu Archipelago, French Polynesia region explosion

The above table is actually nuclear explosions and not explosion. This is an example of messy data and needs to be cleaned. Evidence for the explosions being nuclear explosions
eastern Kazakhstan https://en.wikipedia.org/wiki/1973_Soviet_nuclear_tests
Novaya Zemlya, Russia https://en.wikipedia.org/wiki/1975_Soviet_nuclear_tests
Tuamotu Archipelago, French Polynesia region, 1984 and 1985 https://en.wikipedia.org/wiki/1983%E2%80%9385_French_nuclear_tests

The DESC keyword and the 10 strongest earthquakes for the dataset

DESC arranges the column in numerical descending order

In [69]:
%%sql 
SELECT place, magnitude, occurred_on 
FROM earthquake
ORDER BY magnitude DESC
LIMIT 10;
 * postgresql://postgres:***@localhost:5432/earthquake
10 rows affected.
Out[69]:
place magnitude occurred_on
off the west coast of northern Sumatra 9.1 2004-12-26 00:58:53
near the east coast of Honshu, Japan 9.1 2011-03-11 05:46:24
offshore Bio-Bio, Chile 8.8 2010-02-27 06:34:12
off the west coast of northern Sumatra 8.6 2012-04-11 08:38:37
northern Sumatra, Indonesia 8.6 2005-03-28 16:09:37
near the coast of southern Peru 8.4 2001-06-23 20:33:14
southern Sumatra, Indonesia 8.4 2007-09-12 11:10:27
south of Sumbawa, Indonesia 8.3 1977-08-19 06:08:55
Kuril Islands 8.3 2006-11-15 11:14:14
Hokkaido, Japan region 8.3 2003-09-25 19:50:06

The LIKE and BETWEEN keywords

How many aftershocks were there after the 'near the east coast of Honshu, Japan' earthquake with a magnitude of 6 or greater?

In [9]:
%%sql
SELECT COUNT(*) AS "Aftershocks"
FROM earthquake
WHERE place LIKE '%Honshu%Japan%'
    AND occurred_on BETWEEN '2011-03-11' AND '2011-03-18'
    AND magnitude >= 6;
 * postgresql://postgres:***@localhost:5432/earthquake
1 rows affected.
Out[9]:
Aftershocks
51

Notice i used the AS keyword to rename the column header from count to Aftershocks

In [7]:
#! postgres --help