Worksheet
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = sns.load_dataset("titanic")
# Note: if you have problems, try:
# df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv')
2: Cleanup
2.1
Modify the dataset to provide more reasonable dtypes.
survived can be bool
alive can be bool (a little trickier)
pclass, sex, embarked, who, and embark_town can be categorical
Remember, you are modifying the DataFrame in place, so you probably won't be able to rerun this cell without first rerunning the one before it.
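One possible approach to the conversions, sketched on a small hypothetical stand-in DataFrame named toy (the rows are made up, and only the column names come from the dataset). It assumes alive holds the strings "yes" and "no", which is why it needs more care than survived: calling astype(bool) on strings would turn every non-empty string into True.

```python
import pandas as pd

# Hypothetical stand-in rows; only the column names match the Titanic data
toy = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "alive": ["no", "yes", "yes", "no"],
    "pclass": [3, 1, 3, 2],
    "sex": ["male", "female", "female", "male"],
    "embarked": ["S", "C", "S", "Q"],
    "who": ["man", "woman", "child", "man"],
    "embark_town": ["Southampton", "Cherbourg", "Southampton", "Queenstown"],
})

# 0/1 integers convert directly to bool
toy["survived"] = toy["survived"].astype(bool)

# Compare against "yes" to get a real boolean column from the strings
toy["alive"] = toy["alive"] == "yes"

# Convert the low-cardinality columns to categorical in one call
cat_cols = ["pclass", "sex", "embarked", "who", "embark_town"]
toy = toy.astype({col: "category" for col in cat_cols})

print(toy.dtypes)
```

The same three steps should carry over to df, and they also show why the cell is not safely rerunnable: once alive is already bool, comparing it to "yes" again gives False for every row.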
What fraction survived the Titanic (in the dataset we have)?
What fraction of children survived the Titanic?
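A minimal sketch of both survival questions, again on a made-up stand-in DataFrame named toy. It relies on the mean of a boolean column being the fraction of True values. Treating who == "child" as the definition of a child is an assumption here; a cutoff on the age column would be another reasonable choice.

```python
import pandas as pd

# Hypothetical stand-in rows; survived is assumed to be bool already
toy = pd.DataFrame({
    "survived": [False, True, True, False, True],
    "who": ["man", "woman", "child", "child", "man"],
})

# Fraction of all passengers who survived
frac_survived = toy["survived"].mean()

# Restrict to children with a boolean mask, then take the same mean
children = toy[toy["who"] == "child"]
frac_children = children["survived"].mean()

print(frac_survived, frac_children)
```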
What was the average fare for adult passengers in each class (pclass)?
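The fare question can be approached as a filter followed by a groupby. The sketch below uses a made-up stand-in DataFrame named toy and assumes that an adult is anyone whose who value is not "child".

```python
import pandas as pd

# Hypothetical stand-in rows; fares are invented for illustration
toy = pd.DataFrame({
    "pclass": [1, 1, 2, 3, 3, 3],
    "who": ["man", "woman", "man", "child", "woman", "man"],
    "fare": [80.0, 100.0, 20.0, 15.0, 8.0, 10.0],
})

# Keep adults only (assumption: everyone not labeled "child")
adults = toy[toy["who"] != "child"]

# Mean fare within each passenger class
avg_fare = adults.groupby("pclass")["fare"].mean()

print(avg_fare)
```

If pclass has been converted to a categorical dtype, passing observed=True to groupby restricts the result to classes that actually occur in the filtered data.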