Effective pedagogies for teaching data science

Mike Gelbart

UBC Jupyter Days 2020

import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder, FunctionTransformer
from sklearn.feature_extraction.text import CountVectorizer

from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate
from sklearn.model_selection import GridSearchCV

import matplotlib.pyplot as plt
plt.rcParams['font.size'] = 16

import re

# To fix later, don't want to break the Jupyter Book
try:
    import graphviz
    graphviz_imported = True
except ImportError:
    graphviz_imported = False
    
def display_tree(df, tree):
    """For binary classification only: render a fitted tree without the samples/value lines.

    Assumes the target is the last column of df; returns None if graphviz is not installed.
    """
    if not graphviz_imported:
        return None
    dot = export_graphviz(tree, out_file=None, feature_names=list(df.columns[:-1]),
                          class_names=list(tree.classes_), impurity=False)
    # adapted from https://stackoverflow.com/questions/44821349/python-graphviz-remove-legend-on-nodes-of-decisiontreeclassifier
    # Raw strings: each \\n in the regex matches the literal backslash-n that export_graphviz writes in node labels.
    dot = re.sub(r'(\\nsamples = [0-9]+)(\\nvalue = \[[0-9]+, [0-9]+\])(\\nclass = [A-Za-z0-9]+)', '', dot)
    dot = re.sub(r'(samples = [0-9]+)(\\nvalue = \[[0-9]+, [0-9]+\])\\n', '', dot)
    return graphviz.Source(dot)

Note to self: run the imports above.

In this session I will discuss a few tips for teaching data science, especially with Jupyter notebooks:

Real-time collaborative documents (10 min)
Live coding with Jupyter (5 min)
Problem-based learning (20 min)
Break (5-10 min)
Sample activity: decision trees (20 min)
My implementation of think-pair-share (5-10 min, time-permitting)
Q&A (remaining time)

About me

Assistant Professor of Teaching, UBC Computer Science
Option Co-Director, UBC Master of Data Science program
Background in machine learning

Real-time documents (10 min)

Especially great for online teaching.
Here is a Google Doc for today.
Let’s start by completing the Introductions and Activity 1.
This tool pairs well with Jupyter.
- For example, I had students create visualizations and screenshot them in the document.

Live coding in Jupyter (5 min)

I’m a big fan of live experiments.
I don’t do much “live coding” but I do “live execution”.
You’ve probably seen a lot of this but here’s my version:

vs.

Main ideas:

No “magic”.
Cultivates an atmosphere of experimentation.

Problem-based learning (20 min)

My CPSC 340 syllabus:

Day 1: Decision Trees
Day 2: $k$-nearest neighbours
etc.

My CPSC 330 syllabus:

Week 1: census dataset
Week 2: housing dataset
etc.

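The CPSC 340 syllabus is organized by method, while the CPSC 330 syllabus is organized by dataset (problem). Jupyter facilitates this “problem-based learning” really well.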

Example: Canadian cheese dataset

Below I will show a pretend lesson using the Canadian Cheese Directory dataset from Agriculture and Agri-Food Canada. This data is distributed under the Canadian Open Government License, so I have bundled it with this Jupyter Book.

We will be predicting FatContentPercent based on the other features.

df = pd.read_csv("data/canadianCheeseDirectory.csv", index_col=0)

FatContentPercent is not available for all the cheeses, so I will first drop the rows where it is missing:

df = df.dropna(subset=['FatContentPercent'])

df_train, df_test = train_test_split(df, random_state=123) 

y_train = df_train['FatContentPercent']

lm = Ridge()

# lm.fit(df_train, y_train)

Now we run into our first error.
I like to leave these in.
- However, they cause problems when running all cells (or building a Jupyter Book?).
- Fixing this issue seems to be in progress.
Not all the “problems” we encounter are Python errors.
Problem types:
- Python error
- Poor model accuracy
- Prohibitively slow code
- Problems you didn’t know you had (the worst kind!), especially around train/test contamination
- I don’t know where to start
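Leaving an error in a notebook can stop “Run All” (or a book build) at that cell. One possible workaround, offered here as an assumption rather than the lesson’s own approach, is to catch the error so students still see the message but execution continues. A minimal sketch, using a hypothetical two-row stand-in for df_train with one text column:

```python
import pandas as pd
from sklearn.linear_model import Ridge

# Hypothetical stand-in for df_train: one text column and one numeric column.
X_demo = pd.DataFrame({"ManufacturerProvCode": ["QC", "ON"],
                       "MoisturePercent": [50.0, 39.0]})
y_demo = pd.Series([22.0, 30.0])

try:
    Ridge().fit(X_demo, y_demo)
    fit_failed = False
except ValueError as err:
    # Show the error in the output, but let the rest of the notebook keep running.
    fit_failed = True
    print(f"{type(err).__name__}: {err}")
```

The printed message points at the text column, which is the same “non-numeric data” problem the lesson tackles next.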

Problem: non-numeric data.

df_train.head()

	CheeseNameEn	CheeseNameFr	ManufacturerNameEn	ManufacturerNameFr	ManufacturerProvCode	ManufacturingTypeEn	ManufacturingTypeFr	WebSiteEn	WebSiteFr	FatContentPercent	...	Organic	CategoryTypeEn	CategoryTypeFr	MilkTypeEn	MilkTypeFr	MilkTreatmentTypeEn	MilkTreatmentTypeFr	RindTypeEn	RindTypeFr	LastUpdateDate
CheeseId
1432	NaN	Chèvre des Alpes BIO	Damafro	Damafro	QC	Industrial	Industrielle	http://www.damafro.ca/en/home.html	http://www.damafro.ca	22.0	...	1	Fresh Cheese	Pâte fraîche	Goat	Chèvre	Pasteurized	Pasteurisé	No Rind	Sans croûte	2016-02-03
2281	NaN	Frère Chasseur (Le)	NaN	Fromagerie Au Gré des Champs	QC	Artisan	Artisanale	NaN	http://www.augredeschamps.com	35.0	...	1	Firm Cheese	Pâte ferme	Cow	Vache	Raw Milk	Lait cru	NaN	NaN	2016-02-03
1908	NaN	Mon précieux	NaN	Fromagerie Couland	QC	Artisan	Artisanale	NaN	NaN	22.0	...	0	Fresh Cheese	Pâte fraîche	Goat	Chèvre	Pasteurized	Pasteurisé	No Rind	Sans croûte	2016-02-03
2224	NaN	Tomme de Brebis de Charlevoix	NaN	Maison d'affinage Maurice Dufour (La)	QC	Artisan	Artisanale	NaN	http://www.fromagefin.com	33.0	...	0	Firm Cheese	Pâte ferme	Ewe	Brebis	NaN	NaN	Washed Rind	Croûte lavée	2016-02-03
2007	NaN	Cheddar Littoral	NaN	Fromagerie Ferme du littoral	QC	Farmstead	Fermière	NaN	NaN	30.0	...	0	Firm Cheese	Pâte ferme	Cow	Vache	Pasteurized	Pasteurisé	No Rind	Sans croûte	2016-02-03

5 rows × 29 columns

Now perhaps the lesson goes to encoding categorical variables…

df["ManufacturerName"] = df["ManufacturerNameEn"].fillna(df["ManufacturerNameFr"])
df = df.drop(columns=["ManufacturerNameEn", "ManufacturerNameFr"])
df["CheeseName"] = df["CheeseNameEn"].fillna(df["CheeseNameFr"])
df = df.drop(columns=["CheeseNameEn", "CheeseNameFr"])
df = df.drop(columns=[col for col in df.columns if col.endswith("Fr")])
df_train, df_test = train_test_split(df, random_state=123)

numeric_features = ['MoisturePercent']
categorical_features = ['ManufacturerProvCode', 'ManufacturingTypeEn', 'Organic', 'CategoryTypeEn', 'MilkTypeEn', 'MilkTreatmentTypeEn', 'RindTypeEn', 'ManufacturerName']
text_features = ['CheeseName', 'FlavourEn', 'CharacteristicsEn']
drop_features = ['WebSiteEn', 'ParticularitiesEn', 'RipeningEn', 'LastUpdateDate']
target_column = 'FatContentPercent'
assert set(numeric_features + categorical_features + text_features + drop_features + [target_column]) == set(df_train.columns)

y_train = df_train[target_column]
y_test = df_test[target_column]

Note how I’ve collapsed some of the code cells.
In class, I would show more of them, but not necessarily all: some are beyond the scope, not relevant, or old news.

Now to the key code:

numeric_transformer = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline([
    ('imputer', SimpleImputer(strategy='constant')),
    # `sparse_output` was called `sparse` before scikit-learn 1.2
    ('onehot', OneHotEncoder(sparse_output=False))
])

# Fit a separate CountVectorizer for each of the text columns.
# Need to convert the resulting sparse matrices to dense separately.
text_transformer = Pipeline([
    ('imputer', SimpleImputer(strategy='constant', fill_value='')),
    ('tolist', FunctionTransformer(lambda x: x.ravel(), validate=False)),
    ('countvec', CountVectorizer(max_features=10, stop_words='english')),
    ('todense', FunctionTransformer(lambda x: x.toarray(), validate=False))
])

preprocessor = ColumnTransformer([
    ('numeric', numeric_transformer, numeric_features),
    ('categorical', categorical_transformer, categorical_features)
] + [(f, text_transformer, [f]) for f in text_features])

preprocessor.fit(df_train);

def get_column_names(preprocessor):
    """
    Gets the feature names from a preprocessor.
    This entails looking at the OHE feature names and also
    the words used by the CountVectorizers.
    
    Arguments
    ---------
    preprocessor: ColumnTransformer
        A fit preprocessor following the specific format above.
    
    Returns
    -------
    list
        A list of column names.
    """
    # get_feature_names_out replaced get_feature_names (the old name was removed in scikit-learn 1.2)
    ohe_feature_names = list(preprocessor.named_transformers_['categorical'].named_steps['onehot'].get_feature_names_out(categorical_features))
    text_feature_names = [f + "_" + word for f in text_features for word in preprocessor.named_transformers_[f].named_steps['countvec'].get_feature_names_out()]
    return numeric_features + ohe_feature_names + text_feature_names

new_columns = get_column_names(preprocessor)
    
df_train_enc = pd.DataFrame(preprocessor.transform(df_train), index=df_train.index, columns=new_columns)
df_train_enc.head()

	MoisturePercent	ManufacturerProvCode_AB	ManufacturerProvCode_BC	ManufacturerProvCode_MB	ManufacturerProvCode_NB	ManufacturerProvCode_NL	ManufacturerProvCode_NS	ManufacturerProvCode_ON	ManufacturerProvCode_PE	ManufacturerProvCode_QC	...	CharacteristicsEn_cheese	CharacteristicsEn_colored	CharacteristicsEn_creamy	CharacteristicsEn_interior	CharacteristicsEn_pressed	CharacteristicsEn_rind	CharacteristicsEn_ripened	CharacteristicsEn_smooth	CharacteristicsEn_texture	CharacteristicsEn_white
CheeseId
1432	1.127656	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	1.0	...	1.0	0.0	1.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
2281	-1.459684	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	1.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
1908	2.266086	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	1.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
2224	-1.459684	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	1.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
2007	-0.528242	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	1.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0

5 rows × 224 columns

Awesome, we’ve got everything into numeric data!
Now let’s train a model:

pipeline_dummy = Pipeline([
    ('preprocessor', preprocessor),
    ('model', DummyRegressor())])

# scores_dummy = cross_validate(pipeline_dummy, df_train, y_train, cv=5, return_train_score=True)
# pd.DataFrame(scores_dummy)[["train_score", "test_score"]].mean()

Oh no, another error! What is going on this time? And so the lesson continues.
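The error message isn’t shown here, so this is a guess rather than a diagnosis: a common cause at this stage is a category (for example, a manufacturer name) that appears in a cross-validation fold’s validation split but not in its training split, which OneHotEncoder rejects by default. The sketch below reproduces that situation on hypothetical toy data and shows OneHotEncoder’s handle_unknown="ignore" option.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical manufacturer names seen in a training fold...
X_fold_train = np.array([["Damafro"], ["Fromagerie Couland"]], dtype=object)
# ...and a validation fold containing a manufacturer never seen during fit.
X_fold_valid = np.array([["Fromagerie Ferme du littoral"]], dtype=object)

# Default behaviour (handle_unknown="error"): transform raises a ValueError.
strict_ohe = OneHotEncoder().fit(X_fold_train)
try:
    strict_ohe.transform(X_fold_valid)
    strict_raised = False
except ValueError as err:
    strict_raised = True
    print("Error:", err)

# handle_unknown="ignore" encodes the unseen category as an all-zeros row instead.
lenient_ohe = OneHotEncoder(handle_unknown="ignore").fit(X_fold_train)
encoded = lenient_ohe.transform(X_fold_valid).toarray()
print(encoded)  # [[0. 0.]]
```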

Going back to the problem types:

Python error
Poor model accuracy
Prohibitively slow code
Problems you didn’t know you had (the worst kind!), especially around train/test contamination
I don’t know where to start

Most of the topics / learning outcomes in the course fit into one of these categories.

Topic	Problem
One-hot encoding	Python error
Overfitting	Poor model accuracy
Fancy models, e.g. CatBoost	Poor model accuracy
Feature importances, e.g. SHAP	Lack of interpretability
Pipelines	Problems you didn’t know you had, hard-to-maintain code
Cross-validation	Poor model accuracy, problems you didn’t know you had
Survival analysis	Problems you didn’t know you had
Pre-trained deep networks	Don’t know where to start

etc.

Another quick activity before the break

Let’s generate a dataset for another supervised learning problem: predicting whether someone has seen the movie The Lion King.
Let’s head back to the Google Doc.

Break (5-10 min)

Sample activity: decision trees (20 min)

Basic idea: ask a bunch of yes/no questions until you end up at a prediction.
E.g.
- If you are scared of lions, predict “No”
- Otherwise, if you’ve seen a movie in the last 2 weeks, predict “Yes”
- Otherwise, if you’d pay > \$5 for a movie, predict “Yes”
- Otherwise, predict “No”
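These rules are ordinary nested if/else logic. A hand-coded sketch makes that concrete (the function and argument names are hypothetical); the decision tree algorithm’s job is to find rules like these from data instead of having us write them.

```python
def predict_seen_lion_king(scared_of_lions, seen_movie_recently, max_ticket_price):
    """Hand-written version of the question rules (hypothetical helper)."""
    if scared_of_lions:
        return "No"
    if seen_movie_recently:        # seen a movie in the last 2 weeks
        return "Yes"
    if max_ticket_price > 5:       # would pay more than 5 dollars for a movie
        return "Yes"
    return "No"

print(predict_seen_lion_king(False, False, 10))  # Yes
print(predict_seen_lion_king(True, True, 10))    # No
```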

This “series of questions” approach can be drawn as a tree:

            Are you scared of lions?
            /          \
           / True       \  False
          /              \
         No           Have you seen a movie in the last 2 weeks?
                        /      \
                  True /        \ False
                      /          \ 
                    Yes         Would you pay more than $5 for a movie?  
                                 /           \
                                / True        \ False
                               /               \
                              Yes              No

The decision tree algorithm automatically learns a tree like this, based on the dataset!
- We don’t have time to go through how the algorithm works.
- But it’s worth noting that it supports two types of inputs:

Categorical (e.g., Yes/No or more options)
Numeric (a number)

In the numeric case, the decision tree algorithm also picks the threshold (\$5 in this case).
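To illustrate how a threshold could be picked, here is a simplified sketch: try the midpoint between each pair of consecutive distinct values, let each side of the split predict its majority label, and keep the most accurate split. This is a simplification: scikit-learn’s DecisionTreeClassifier scores splits with an impurity measure (Gini by default) rather than accuracy, although on this tiny made-up example both pick the same threshold.

```python
from sklearn.tree import DecisionTreeClassifier

def best_threshold(values, labels):
    """Return (threshold, accuracy) of the best split 'value <= threshold'.

    Simplified sketch: candidates are midpoints between consecutive distinct
    sorted values; each side predicts its majority label; quality = accuracy.
    """
    distinct = sorted(set(values))
    best = (None, -1.0)
    for left_val, right_val in zip(distinct, distinct[1:]):
        thr = (left_val + right_val) / 2
        correct = 0
        for goes_left in (True, False):
            side = [y for v, y in zip(values, labels) if (v <= thr) == goes_left]
            # The majority label on this side counts as a correct prediction.
            correct += max(side.count(c) for c in set(side))
        acc = correct / len(values)
        if acc > best[1]:
            best = (thr, acc)
    return best

prices = [2, 4, 8, 10]               # hypothetical "how much would you pay" answers
seen = ["No", "No", "Yes", "Yes"]
print(best_threshold(prices, seen))  # (6.0, 1.0)

# scikit-learn also places the threshold at a midpoint between observed values.
stump = DecisionTreeClassifier(max_depth=1).fit([[p] for p in prices], seen)
print(stump.tree_.threshold[0])      # 6.0
```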

Let’s apply a decision tree to our Lion King dataset.

lion_king = pd.read_csv("data/lionking_test.csv")
lion_king.tail()

	Are you scared of lions?	Have you watched a movie in the last 2 weeks?	How much would you pay to see a good movie? (in $)	Have you seen the movie The Lion King?
0	No	No	10	Yes
1	Yes	Yes	10	Yes
2	Yes	No	10	No
3	No	No	0	No

# Encode "No" as 1 and "Yes" as 0: sklearn splits a 0/1 feature at "<= 0.5", so the
# left (True) branch then means "Yes", which makes the tree diagrams more readable.
yes_no_cols = lion_king.columns[:2]
lion_king[yes_no_cols] = (lion_king[yes_no_cols] == "No").astype(int)

tree = DecisionTreeClassifier(max_depth=1).fit(lion_king.iloc[:,:-1], lion_king.iloc[:,-1])

display_tree(lion_king, tree)


lion_king.iloc[:,-1].value_counts()

No     2
Yes    2
Name: Have you seen the movie The Lion King?, dtype: int64

21/30

0.7

tree.score(lion_king.iloc[:,:-1], lion_king.iloc[:,-1])

0.75
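If display_tree shows nothing (it needs both the graphviz Python package and the Graphviz dot executable), scikit-learn’s export_text can print a plain-text version of a fitted tree instead. The sketch below re-types the four rows from the lion_king.tail() output above, encodes them the same way, and fits the same depth-1 tree. On this data two different splits tie, so which one gets printed may depend on random_state; the training accuracy is 0.75 either way, matching the score above.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# The four rows shown in the lion_king.tail() output, re-typed by hand.
toy = pd.DataFrame({
    "Are you scared of lions?": ["No", "Yes", "Yes", "No"],
    "Have you watched a movie in the last 2 weeks?": ["No", "Yes", "No", "No"],
    "How much would you pay to see a good movie? (in $)": [10, 10, 10, 0],
    "Have you seen the movie The Lion King?": ["Yes", "Yes", "No", "No"],
})
yes_no_cols = toy.columns[:2]
toy[yes_no_cols] = (toy[yes_no_cols] == "No").astype(int)  # "No" -> 1, "Yes" -> 0

X, y = toy.iloc[:, :-1], toy.iloc[:, -1]
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)

# export_text needs no external executables, unlike rendering with graphviz.
tree_text = export_text(stump, feature_names=list(X.columns))
print(tree_text)
print(stump.score(X, y))  # 0.75
```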

Now, let’s make this more interesting by increasing the depth of the tree.

tree2 = DecisionTreeClassifier(max_depth=3).fit(lion_king.iloc[:,:-1], lion_king.iloc[:,-1])

display_tree(lion_king, tree2)


Please ignore the “<= 0.5” in the tree above; it’s a detail we don’t need to get into here.

tree2.score(lion_king.iloc[:,:-1], lion_king.iloc[:,-1])

1.0

tree_max = DecisionTreeClassifier(max_depth=100).fit(lion_king.iloc[:,:-1], lion_king.iloc[:,-1])

tree_max.score(lion_king.iloc[:,:-1], lion_king.iloc[:,-1])

1.0

display_tree(lion_king, tree_max)

