This course teaches an important aspect of data science: data visualization. A picture is worth a thousand words.
What kind of data do you think is needed to draw this map?
Past 7-days M2.5+ Earthquakes
https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_week.csv
https://earthquake.usgs.gov/earthquakes/feed/v1.0/csv.php
Does it have the kind of data you need to draw the activity map?
```python
import pandas as pd
df_7d = pd.read_csv(
    'https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.csv')
df_30d = pd.read_csv(
    'https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv')
```
The code uses the `read_csv` function, which takes the URL of the CSV data and returns a DataFrame. You can peek at the DataFrame's content by simply evaluating it in a cell, or by calling its `.head()` method.
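As a minimal sketch of what `read_csv` returns, the snippet below parses a tiny in-memory CSV rather than the live feed. The three rows are made-up sample values, not real USGS records; only the column names (`time`, `latitude`, `longitude`, `depth`, `mag`) mirror the ones used in this deck.

```python
import io
import pandas as pd

# Hypothetical sample rows; the real feed has many more rows and columns.
sample_csv = io.StringIO(
    "time,latitude,longitude,depth,mag\n"
    "2020-01-01T00:00:00.000Z,35.0,-118.0,10.0,2.7\n"
    "2020-01-01T01:00:00.000Z,-20.5,-70.1,45.0,4.9\n"
    "2020-01-01T02:00:00.000Z,38.2,142.5,30.0,6.1\n"
)

# read_csv accepts a URL, a file path, or a file-like object such as StringIO.
sample_df = pd.read_csv(sample_csv)

first_rows = sample_df.head(2)   # only the first two rows
print(sample_df.shape)           # (number of rows, number of columns)
print(first_rows)
```

Passing a number to `.head()` controls how many rows are shown; with no argument it shows the first five.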
```python
df_7d.shape
df_30d.columns
```
```python
df_7d['time'] # <-- or df_7d.time
```
```python
map_df_7d = df_7d[['time', 'longitude', 'latitude', 'mag']]
map_df_7d.head()
```
```python
for index, row in map_df_7d.iterrows():
    print(index)
    print(row['time'])
```
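`iterrows()` is fine for peeking at a few rows. As a hedged aside, `itertuples()` is a commonly used alternative that is usually faster on large frames. The sketch below uses a small made-up frame (`demo_df`, a hypothetical name) in place of `map_df_7d`.

```python
import pandas as pd

# Made-up stand-in for map_df_7d; the values are illustrative only.
demo_df = pd.DataFrame({
    'time': ['2020-01-01T00:00:00.000Z', '2020-01-01T01:00:00.000Z'],
    'mag': [2.7, 4.9],
})

seen = []
for row in demo_df.itertuples():
    # Each row is a namedtuple: row.Index holds the index label,
    # and every column is available as an attribute.
    seen.append((row.Index, row.time, row.mag))
    print(row.Index, row.time, row.mag)
```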
Histograms are an excellent way to visualize the frequency of data measurements.
```python
from matplotlib import pyplot as plt
%matplotlib inline
plt.style.use('seaborn') # <-- 'seaborn' is just a style name; Matplotlib 3.6+ renames it 'seaborn-v0_8'
plt.figure(figsize=(10, 5))
plt.hist(df_7d.mag, bins=50, alpha=0.3, label='7-day')
plt.hist(df_30d.mag, bins=50, alpha=0.3, label='30-day')
plt.ylabel('Occurrences')
plt.xlabel('Magnitude')
plt.title('7-day and 30-day earthquake magnitudes')
plt.legend() # <-- draw the legend box. Must have `label` param in `hist()`
plt.show()
```
`bins`: how many bins to divide the data into
`alpha`: transparency, from 0 to 1, with 1 being fully opaque
For example, suppose you want to study earthquakes that are stronger than M2.5 but weaker than M5.0.
```python
minor_df = df_30d[df_30d.mag < 2.5]
print('Minor earthquake count:', minor_df.shape[0])
medium_df = df_30d[(df_30d.mag >= 2.5) & (df_30d.mag < 5)]
print('Medium earthquake count:', medium_df.shape[0])
```
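The filter above combines two boolean masks with `&`. The sketch below, on a made-up frame standing in for `df_30d`, spells out the intermediate masks. The parentheses in the one-line form matter because `&` binds more tightly than comparison operators such as `>=`.

```python
import pandas as pd

# Made-up magnitudes; `demo_df` is a hypothetical stand-in for df_30d.
demo_df = pd.DataFrame({'mag': [1.2, 2.5, 3.8, 5.0, 6.3]})

at_least_2_5 = demo_df.mag >= 2.5        # one True/False per row
below_5 = demo_df.mag < 5
medium_mask = at_least_2_5 & below_5     # element-wise AND of the two masks

medium_demo = demo_df[medium_mask]       # keep only rows where the mask is True
print(medium_mask.tolist())
print(medium_demo.mag.tolist())
```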
```python
plt.style.use('seaborn')
plt.figure(figsize=(10, 5))
plt.hist(df_30d.depth, bins=50, alpha=0.3)
plt.ylabel('Occurrences')
plt.xlabel('Depth in km')
plt.title('30-day earthquake depth')
plt.show()
```
```python
plt.style.use('seaborn')
plt.figure(figsize=(10, 5))
plt.hist(df_30d[df_30d.depth < 100].depth, bins=50, alpha=0.3)
plt.ylabel('Occurrences')
plt.xlabel('Depth in km')
plt.title('30-day earthquake depth < 100km')
plt.show()
```
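To make the `bins` parameter used in the histograms above more concrete, here is a rough pure-Python sketch of equal-width binning. The helper `bin_counts` is hypothetical and written only for illustration. It follows the same general idea as `plt.hist` (equal-width bins spanning the minimum to the maximum, with the last bin including its right edge), but it is not Matplotlib's implementation.

```python
def bin_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: put everything in the first bin
        else:
            # The maximum value would land one past the end, so clamp it
            # into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

# Made-up magnitudes for illustration.
demo_counts = bin_counts([1.0, 1.5, 2.0, 2.5, 3.0, 5.0], bins=4)
print(demo_counts)
```

More bins give a finer picture of the distribution, at the cost of fewer measurements per bin.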
`Basemap` is part of the Matplotlib libraries.
Matplotlib is an essential tool for scientific graph and chart plotting.
```python
%matplotlib inline
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
my_map = Basemap(projection='ortho', lat_0=50, lon_0=-100, resolution='l')
my_map.drawcoastlines()
plt.show()
```
```python
my_map = Basemap(projection='eck4', lat_0=0, lon_0=-120, resolution='l')
```
```python
%matplotlib inline
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

plt.figure(figsize=(18,9)) # <-- set a larger canvas size
my_map = Basemap(projection='eck4', lat_0=0, lon_0=-90, resolution='l')
my_map.drawmapboundary(fill_color='aqua')
my_map.drawcoastlines()
my_map.drawcountries()
my_map.fillcontinents(color='coral', lake_color='aqua')
plt.title('A blank map')
plt.show()
```
The coordinates of Ardsley, NY are (41.0107° N, 73.8437° W).
```python
...
longitude = -73.8437
latitude = 41.0107
# convert lon and lat to a (x,y) coordinate using the map's projection type
x, y = my_map(longitude, latitude)
my_map.scatter(x, y, color='red', marker='o', s=64,
               zorder=2) # <- zorder=2 makes the marker show up on top
plt.show()
```
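The negative longitude in the snippet above follows the usual convention that west and south are negative while east and north are positive. A small helper can do the conversion from the "degrees plus hemisphere letter" form; `to_signed_degrees` is a hypothetical name introduced here for illustration.

```python
def to_signed_degrees(value, hemisphere):
    """Convert e.g. (73.8437, 'W') into a signed decimal degree value."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ('N', 'S', 'E', 'W'):
        raise ValueError(f'Unknown hemisphere: {hemisphere!r}')
    # South latitudes and west longitudes are negative.
    return -value if hemisphere in ('S', 'W') else value

# Ardsley, NY: 41.0107° N, 73.8437° W
latitude = to_signed_degrees(41.0107, 'N')
longitude = to_signed_degrees(73.8437, 'W')
print(longitude, latitude)
```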
Sometimes you will see people plot directly with longitude and latitude by passing `latlon=True`.
```python
...
longitude = -73.8437
latitude = 41.0107
my_map.scatter(longitude, latitude, latlon=True, color='red', marker='o',
               s=64, zorder=2)
plt.show()
```
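Calling `my_map(longitude, latitude)` applies the map's projection to turn degrees into plot coordinates. To give a feel for what such a conversion does, here is a sketch of the much simpler equirectangular projection. It is not the formula Basemap uses for `eck4` or `ortho`; the function name `equirectangular` and the spherical Earth radius are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)

def equirectangular(lon, lat, lon_0=0.0):
    """Project degrees to planar (x, y) in metres, centred on longitude lon_0."""
    x = EARTH_RADIUS_M * math.radians(lon - lon_0)
    y = EARTH_RADIUS_M * math.radians(lat)
    return x, y

# Ardsley, NY on a map centred at 90° W: it lies east of the centre (x > 0)
# and north of the equator (y > 0).
x, y = equirectangular(-73.8437, 41.0107, lon_0=-90)
print(round(x), round(y))
```

Real projections such as Eckert IV use more involved formulas, which is why it is convenient to let the `Basemap` object do the conversion.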
Matplotlib deals with lists (or list-like containers) of data seamlessly.
Let's plot two more cities on the map, with names next to the marker:
Beijing: 39.9042° N, 116.4074° E
Singapore: 1.3521° N, 103.8198° E
```python
longitudes = [-73.8437, 116.4074, 103.8198]
latitudes = [ 41.0107, 39.9042, 1.3521]
names = ['Ardsley', 'Beijing', 'Singapore']

xs, ys = my_map(longitudes, latitudes)
my_map.scatter(xs, ys, color='red', marker='o', s=64, zorder=2)

for i in range(0, len(names)):
    plt.text(xs[i], ys[i], names[i], fontsize=12, color='black')
```
`text()` only takes projected coordinates.
```python
%matplotlib inline
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

plt.figure(figsize=(18,9))
my_map = Basemap(projection='eck4', lat_0=0, lon_0=-90, resolution='l')
my_map.drawmapboundary(fill_color='aqua')
my_map.drawcoastlines()
my_map.drawcountries()
my_map.fillcontinents(color='coral', lake_color='aqua')

#---- This is the important part
data = map_df_7d[map_df_7d.mag > 2.5]
xs, ys = my_map(data['longitude'].tolist(), data['latitude'].tolist())
my_map.scatter(xs, ys, color='yellow', marker='o', s=64, zorder=2)
#---- End of important part

plt.title('Map with 7-day earthquakes > M2.5 marked')
plt.show()
```
List comprehension is a very handy and intuitive way to create a list based on another list.
Suppose you, the teacher, got the list of points (out of 100) everyone in the class scored in the last exam. What the class doesn't know is that instead of marking their papers, you randomly generated their scores. Ha!
```python
import random
points = random.sample(range(40, 101), 25)
points
```
You want to assign a letter grade from A to F for each score. The grade range is as follows: F < 50, E < 60, D < 70, C < 80, B < 90, A otherwise. To express this rule in Python:
```python
def assign_grade(point):
    if point < 50:
        return 'F'
    elif point < 60:
        return 'E'
    elif point < 70:
        return 'D'
    elif point < 80:
        return 'C'
    elif point < 90:
        return 'B'
    else:
        return 'A'
```
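The same rule can also be written as a table lookup. The sketch below uses `bisect_right` from the standard library; `assign_grade_bisect`, `CUTOFFS`, and `LETTERS` are hypothetical names, and the cut-offs are copied from the rule above.

```python
from bisect import bisect_right

CUTOFFS = [50, 60, 70, 80, 90]   # lowest score for E, D, C, B, A respectively
LETTERS = 'FEDCBA'               # one more letter than there are cut-offs

def assign_grade_bisect(point):
    # bisect_right counts how many cut-offs are <= point,
    # which is exactly the position of the letter in LETTERS.
    return LETTERS[bisect_right(CUTOFFS, point)]

for p in (49, 50, 89, 90):
    print(p, assign_grade_bisect(p))
```

The `if..elif..else` version is easier to read for a handful of ranges; the lookup version keeps the rule in data, which can be convenient if the cut-offs change.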
Now, we use list comprehension to produce the list of letter grades:
```python
grades = [assign_grade(p) for p in points]
grades
```
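For comparison, a list comprehension is shorthand for a loop that appends to a list, and it can also filter items with a trailing `if`. The scores below are fixed, made-up values so that the result is reproducible.

```python
scores = [45, 72, 88, 91, 59]   # made-up exam scores

# Loop form: build the list step by step.
passing_loop = []
for s in scores:
    if s >= 60:
        passing_loop.append(s)

# Comprehension form: same result in one expression.
passing = [s for s in scores if s >= 60]

# A comprehension can also transform each item, here adding a
# 5-point curve capped at 100.
curved = [min(s + 5, 100) for s in scores]

print(passing)
print(curved)
```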
The rule: green for < M3.0, yellow for < M5.0, red otherwise
The data: all earthquakes > M2.5
Hint:
```python
def get_marker_color(magnitude):
    ? # EXPRESS THE RULE USING IF..ELIF..ELSE

data = map_df_7d[map_df_7d.mag > 2.5]
intensity_colors = ? # USE LIST COMPREHENSION
```
```python
def get_marker_color(magnitude):
    if magnitude < 3:
        return 'green'
    elif magnitude < 5.0:
        return 'yellow'
    else:
        return 'red'

data = map_df_7d[map_df_7d.mag > 2.5]
intensity_colors = [get_marker_color(m) for m in data['mag']]
```
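As an alternative to the list comprehension, pandas can apply the same function to a whole column with `Series.map`. This sketch repeats `get_marker_color` so that it runs on its own, and uses a made-up frame (`demo_df`, a hypothetical name) in place of `map_df_7d`.

```python
import pandas as pd

def get_marker_color(magnitude):
    if magnitude < 3:
        return 'green'
    elif magnitude < 5.0:
        return 'yellow'
    else:
        return 'red'

# Made-up magnitudes, including the boundary values 3.0 and 5.0.
demo_df = pd.DataFrame({'mag': [2.6, 3.0, 4.9, 5.0, 6.4]})

# map() calls the function once per value and returns a new Series.
colors = demo_df['mag'].map(get_marker_color).tolist()
print(colors)
```

Both forms give the same list; the comprehension needs no pandas-specific knowledge, while `map` keeps the result aligned with the frame's index if you skip the `.tolist()` call.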
```python
...
data = map_df_7d[map_df_7d.mag > 2.5]
intensity_colors = [get_marker_color(m) for m in data['mag']]
xs, ys = my_map(data['longitude'].tolist(), data['latitude'].tolist())
my_map.scatter(xs, ys, color=intensity_colors, marker='o', s=64, zorder=2)
...
```
Note: for an earthquake of magnitude `m`, use a marker size of `m**4`, or `m` to the 4th power.
```python
def get_marker_size(magnitude):
    return magnitude**4

...
data = map_df_7d[map_df_7d.mag > 2.5]
intensity_colors = [get_marker_color(m) for m in data['mag']]
intensity_sizes = [get_marker_size(m) for m in data['mag']]
xs, ys = my_map(data['longitude'].tolist(), data['latitude'].tolist())
my_map.scatter(xs, ys, color=intensity_colors, marker='o',
               s=intensity_sizes, zorder=2, alpha=0.6)
...
```
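Raising the magnitude to the 4th power makes the markers grow quickly with magnitude. The sketch below tabulates the resulting sizes for a few sample magnitudes, chosen only for illustration. In Matplotlib's `scatter`, the `s` parameter is the marker area in points squared.

```python
def get_marker_size(magnitude):
    return magnitude ** 4

sample_magnitudes = [2.5, 3.0, 5.0, 7.0]   # illustrative values
sizes = [get_marker_size(m) for m in sample_magnitudes]

for m, s in zip(sample_magnitudes, sizes):
    print(f'M{m}: s={s}')

# Ratio between the largest and smallest marker areas in this sample.
ratio = sizes[-1] / sizes[0]
print(round(ratio, 1))
```

With sizes this different, the `alpha=0.6` used above helps keep small markers visible underneath large ones.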