Effective pedagogies for teaching data science¶
Mike Gelbart
UBC Jupyter Days 2020
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder, FunctionTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate
from sklearn.model_selection import GridSearchCV
import matplotlib.pyplot as plt
plt.rcParams['font.size'] = 16
import re
# To fix later, don't want to break the Jupyter Book
try:
import graphviz
graphviz_imported = True
except ImportError:
graphviz_imported = False
def display_tree(df, tree):
""" For binary classification only """
if not graphviz_imported:
return
dot = export_graphviz(tree, out_file=None, feature_names=df.columns[:-1], class_names=tree.classes_,impurity=False)
# adapted from https://stackoverflow.com/questions/44821349/python-graphviz-remove-legend-on-nodes-of-decisiontreeclassifier
dot = re.sub('(\\\\nsamples = [0-9]+)(\\\\nvalue = \[[0-9]+, [0-9]+\])(\\\\nclass = [A-Za-z0-9]+)', '', dot)
dot = re.sub( '(samples = [0-9]+)(\\\\nvalue = \[[0-9]+, [0-9]+\])\\\\n', '', dot)
return graphviz.Source(dot)
Note to self: run these imports ^
In this session I will discuss a few tips for teaching data science, especially with Jupyter notebooks:
Real-time collaborative documents (10 min)
Live coding with Jupyter (5 min)
Problem-based learning (20 min)
Break (5-10 min)
Sample activity: decision trees (20 min)
My implementation of think-pair-share (5-10 min, time-permitting)
Q&A (remaining time)
About me¶
Assistant Professor of Teaching, UBC Computer Science
Option Co-Director, UBC Master of Data Science program
Background in machine learning
Real-time documents (10 min)¶
Especially great for online teaching.
Here is a Google Doc for today.
Let’s start by completing the Introductions and Activity 1.
This tool pairs well with Jupyter.
For example, I had students create visualizations and screenshot them in the document.
Live coding in Jupyter (5 min)¶
I’m a big fan of live experiments.
I don’t do much “live coding” but I do “live execution”.
You’ve probably seen a lot of this but here’s my version:
vs.
Main ideas:
No “magic”.
Cultivates an atmosphere of experimentation.
Problem-based learning (20 min)¶
My CPSC 340 syllabus:
Day 1: Decision Trees
Day 2: \(k\)-nearest neighbours
etc.
My CPSC 330 syllabus:
Week 1: census dataset
Week 2: houseing dataset
etc.
Jupyter facilitates this “problem-based learning” really well.
Example: Canadian cheese dataset¶
Below I will show a pretend lesson using the Canadian Cheese Directory dataset from Agriculture and Agri-Food Canada. This data is distributed under the Canadian Open Government License so I have bundled it with this Jupyterbook.
We will be predicting FatContentPercent
based on the other features.
df = pd.read_csv("data/canadianCheeseDirectory.csv", index_col=0)
We will be predicting FatContentPercent
, which is not available for all the cheeses, so I will first filter out those where this is not available:
df = df.dropna(subset=['FatContentPercent'])
df_train, df_test = train_test_split(df, random_state=123)
y_train = df_train['FatContentPercent']
lm = Ridge()
# lm.fit(df_train, y_train)
Now we run into our first error.
I like to leave these in.
However, they cause problems when running all cells (or building a Jupyter Book?).
Fixing this issue seems to be in progress.
Not all the “problems” we encounter are Python errors.
Problem types:
Python error
Poor model accuracy
Prohibitively slow code
Problems you didn’t know you had (the worst kind!), especially around train/test contamination
I don’t know where to start
Problem: non-numeric data.
df_train.head()
CheeseNameEn | CheeseNameFr | ManufacturerNameEn | ManufacturerNameFr | ManufacturerProvCode | ManufacturingTypeEn | ManufacturingTypeFr | WebSiteEn | WebSiteFr | FatContentPercent | ... | Organic | CategoryTypeEn | CategoryTypeFr | MilkTypeEn | MilkTypeFr | MilkTreatmentTypeEn | MilkTreatmentTypeFr | RindTypeEn | RindTypeFr | LastUpdateDate | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CheeseId | |||||||||||||||||||||
1432 | NaN | Chèvre des Alpes BIO | Damafro | Damafro | QC | Industrial | Industrielle | http://www.damafro.ca/en/home.html | http://www.damafro.ca | 22.0 | ... | 1 | Fresh Cheese | Pâte fraîche | Goat | Chèvre | Pasteurized | Pasteurisé | No Rind | Sans croûte | 2016-02-03 |
2281 | NaN | Frère Chasseur (Le) | NaN | Fromagerie Au Gré des Champs | QC | Artisan | Artisanale | NaN | http://www.augredeschamps.com | 35.0 | ... | 1 | Firm Cheese | Pâte ferme | Cow | Vache | Raw Milk | Lait cru | NaN | NaN | 2016-02-03 |
1908 | NaN | Mon précieux | NaN | Fromagerie Couland | QC | Artisan | Artisanale | NaN | NaN | 22.0 | ... | 0 | Fresh Cheese | Pâte fraîche | Goat | Chèvre | Pasteurized | Pasteurisé | No Rind | Sans croûte | 2016-02-03 |
2224 | NaN | Tomme de Brebis de Charlevoix | NaN | Maison d'affinage Maurice Dufour (La) | QC | Artisan | Artisanale | NaN | http://www.fromagefin.com | 33.0 | ... | 0 | Firm Cheese | Pâte ferme | Ewe | Brebis | NaN | NaN | Washed Rind | Croûte lavée | 2016-02-03 |
2007 | NaN | Cheddar Littoral | NaN | Fromagerie Ferme du littoral | QC | Farmstead | Fermière | NaN | NaN | 30.0 | ... | 0 | Firm Cheese | Pâte ferme | Cow | Vache | Pasteurized | Pasteurisé | No Rind | Sans croûte | 2016-02-03 |
5 rows × 29 columns
Now perhaps the lesson goes to encoding categorical variables…
df["ManufacturerName"] = df["ManufacturerNameEn"].fillna(df["ManufacturerNameFr"])
df = df.drop(columns=["ManufacturerNameEn", "ManufacturerNameFr"])
df["CheeseName"] = df["CheeseNameEn"].fillna(df["CheeseNameFr"])
df = df.drop(columns=["CheeseNameEn", "CheeseNameFr"])
df = df.drop(columns=[col for col in df.columns if col.endswith("Fr")])
df_train, df_test = train_test_split(df, random_state=123)
numeric_features = ['MoisturePercent']
categorical_features = ['ManufacturerProvCode', 'ManufacturingTypeEn', 'Organic', 'CategoryTypeEn', 'MilkTypeEn', 'MilkTreatmentTypeEn', 'RindTypeEn', 'ManufacturerName']
text_features = ['CheeseName', 'FlavourEn', 'CharacteristicsEn']
drop_features = ['WebSiteEn', 'ParticularitiesEn', 'RipeningEn', 'LastUpdateDate']
target_column = 'FatContentPercent'
assert set(numeric_features + categorical_features + text_features + drop_features + [target_column]) == set(df_train.columns)
y_train = df_train[target_column]
y_test = df_test[target_column]
Note how I’ve minimized some of the code cells.
In class, I would show more of them.
But not necessarily all, if they are beyond the scope, not relevant, or old news.
Now to the key code:
numeric_transformer = Pipeline([
('imputer', SimpleImputer(strategy='median')),
('scaler', StandardScaler())
])
categorical_transformer = Pipeline([
('imputer', SimpleImputer(strategy='constant')),
('onehot', OneHotEncoder(sparse=False))
])
# Fit a separate CountVectorizer for each of the text columns.
# Need to convert the resulting sparse matrices to dense separately.
text_transformer = Pipeline([
('imputer', SimpleImputer(strategy='constant', fill_value='')),
('tolist', FunctionTransformer(lambda x: x.ravel(), validate=False)),
('countvec', CountVectorizer(max_features=10, stop_words='english')),
('todense', FunctionTransformer(lambda x: x.toarray(), validate=False))
])
preprocessor = ColumnTransformer([
('numeric', numeric_transformer, numeric_features),
('categorical', categorical_transformer, categorical_features)
] + [(f, text_transformer, [f]) for f in text_features])
preprocessor.fit(df_train);
def get_column_names(preprocessor):
"""
Gets the feature names from a preprocessor.
This entails looking at the OHE feature names and also
the words used by the CountVectorizers.
Arguments
---------
preprocessor: ColumnTransformer
A fit preprocessor following the specific format above.
Returns
-------
list
A list of column names.
"""
ohe_feature_names = list(preprocessor.named_transformers_['categorical'].named_steps['onehot'].get_feature_names(categorical_features))
text_feature_names = [f + "_" + word for f in text_features for word in preprocessor.named_transformers_[f].named_steps['countvec'].get_feature_names()]
return numeric_features + ohe_feature_names + text_feature_names
new_columns = get_column_names(preprocessor)
df_train_enc = pd.DataFrame(preprocessor.transform(df_train), index=df_train.index, columns=new_columns)
df_train_enc.head()
MoisturePercent | ManufacturerProvCode_AB | ManufacturerProvCode_BC | ManufacturerProvCode_MB | ManufacturerProvCode_NB | ManufacturerProvCode_NL | ManufacturerProvCode_NS | ManufacturerProvCode_ON | ManufacturerProvCode_PE | ManufacturerProvCode_QC | ... | CharacteristicsEn_cheese | CharacteristicsEn_colored | CharacteristicsEn_creamy | CharacteristicsEn_interior | CharacteristicsEn_pressed | CharacteristicsEn_rind | CharacteristicsEn_ripened | CharacteristicsEn_smooth | CharacteristicsEn_texture | CharacteristicsEn_white | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CheeseId | |||||||||||||||||||||
1432 | 1.127656 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2281 | -1.459684 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1908 | 2.266086 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2224 | -1.459684 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2007 | -0.528242 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 224 columns
Awesome, we’ve got everything into numeric data!
Now let’s train a model:
pipeline_dummy = Pipeline([
('preprocessor', preprocessor),
('model', DummyRegressor())])
# scores_dummy = cross_validate(pipeline_dummy, df_train, y_train, cv=5, return_train_score=True)
# pd.DataFrame(scores_dummy)[["train_score", "test_score"]].mean()
Oh no, another error - what is going on this time? Etc.
Going back to the problem types:
Python error
Poor model accuracy
Prohibitively slow code
Problems you didn’t know you had (the worst kind!), especially around train/test contamination
Most of the topics / learning outcomes in the course fit into one of these categories.
Topic |
Problem |
---|---|
One-hot encoding |
Python error |
Overfitting |
Poor model accuracy |
Fancy models, e.g. CatBoost |
Poor model accuracy |
Feature importances, e.g. SHAP |
Lack of interpretability |
Pipelines |
Problems you didn’t know you had, hard-to-maintain code |
Cross-validation |
Poor model accuracy, problems you didn’t know you had |
Survival analysis |
Problems you didn’t know you had |
Pre-trained deep networks |
Don’t know where to start |
etc.
Another quick activity before the break¶
Let’s generate a dataset for another supervised learning problem: predicting whether someone has seen the movie The Lion King.
Let’s head back to the Google Doc.
Break (5-10 min)¶
Sample activity: decision trees (20 min)¶
Basic idea: ask a bunch of yes/no questions until you end up at a prediction.
E.g.
If you are scared of lions, predict “No”
Otherwise, if you’ve seen a movie in the last 2 weeks, predict “Yes”
Otherwise, if you’d pay > \$5 for a movie, predict “Yes”
Otherwise, predict “No”
This “series of questions” approach can be drawn as a tree:
Are you scared of lions?
/ \
/ True \ False
/ \
No Have you seen a movie in the last 2 weeks?
/ \
True / \ False
/ \
Yes Would you pay more than $5 for a movie?
/ \
/ True \ False
/ \
Yes No
The decision tree algorithm automatically learns a tree like this, based on the data set!
We don’t have time to go through how the algorithm works.
But it’s worth noting that it support two types of inputs:
Categorical (e.g., Yes/No or more options)
Numeric (a number)
In the numeric case, the decision tree algorithm also picks the threshold (\$5 in this case)
Let’s apply a decision tree to our The Lion King dataset.
lion_king = pd.read_csv("data/lionking_test.csv")
lion_king.tail()
Are you scared of lions? | Have you watched a movie in the last 2 weeks? | How much would you pay to see a good movie? (in $) | Have you seen the movie The Lion King? | |
---|---|---|---|---|
0 | No | No | 10 | Yes |
1 | Yes | Yes | 10 | Yes |
2 | Yes | No | 10 | No |
3 | No | No | 0 | No |
lion_king.loc[:,lion_king.columns[:2]] = lion_king.loc[:,lion_king.columns[:2]] == "No"
# Note on the above: Encoding "No" as 1 and "Yes" as 0 because sklearn uses <= 0.5; this makes the tree diagrams more readable.
tree = DecisionTreeClassifier(max_depth=1).fit(lion_king.iloc[:,:-1], lion_king.iloc[:,-1])
display_tree(lion_king, tree)
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/graphviz/backend.py in run(cmd, input, capture_output, check, encoding, quiet, **kwargs)
163 try:
--> 164 proc = subprocess.Popen(cmd, startupinfo=get_startupinfo(), **kwargs)
165 except OSError as e:
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
799 errread, errwrite,
--> 800 restore_signals, start_new_session)
801 except:
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
1550 err_msg += ': ' + repr(err_filename)
-> 1551 raise child_exception_type(errno_num, err_msg, err_filename)
1552 raise child_exception_type(err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'dot': 'dot'
During handling of the above exception, another exception occurred:
ExecutableNotFound Traceback (most recent call last)
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
343 method = get_real_method(obj, self.print_method)
344 if method is not None:
--> 345 return method()
346 return None
347 else:
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/graphviz/files.py in _repr_svg_(self)
111
112 def _repr_svg_(self):
--> 113 return self.pipe(format='svg').decode(self._encoding)
114
115 def pipe(self, format=None, renderer=None, formatter=None, quiet=False):
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/graphviz/files.py in pipe(self, format, renderer, formatter, quiet)
136 out = backend.pipe(self._engine, format, data,
137 renderer=renderer, formatter=formatter,
--> 138 quiet=quiet)
139
140 return out
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/graphviz/backend.py in pipe(engine, format, data, renderer, formatter, quiet)
242 """
243 cmd, _ = command(engine, format, None, renderer, formatter)
--> 244 out, _ = run(cmd, input=data, capture_output=True, check=True, quiet=quiet)
245 return out
246
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/graphviz/backend.py in run(cmd, input, capture_output, check, encoding, quiet, **kwargs)
165 except OSError as e:
166 if e.errno == errno.ENOENT:
--> 167 raise ExecutableNotFound(cmd)
168 else:
169 raise
ExecutableNotFound: failed to execute ['dot', '-Tsvg'], make sure the Graphviz executables are on your systems' PATH
<graphviz.files.Source at 0x7f383690fa90>
lion_king.iloc[:,-1].value_counts()
No 2
Yes 2
Name: Have you seen the movie The Lion King?, dtype: int64
21/30
0.7
tree.score(lion_king.iloc[:,:-1], lion_king.iloc[:,-1])
0.75
Now, let’s make this more interesting by increasing the depth of the tree.
tree2 = DecisionTreeClassifier(max_depth=3).fit(lion_king.iloc[:,:-1], lion_king.iloc[:,-1])
display_tree(lion_king, tree2)
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/graphviz/backend.py in run(cmd, input, capture_output, check, encoding, quiet, **kwargs)
163 try:
--> 164 proc = subprocess.Popen(cmd, startupinfo=get_startupinfo(), **kwargs)
165 except OSError as e:
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
799 errread, errwrite,
--> 800 restore_signals, start_new_session)
801 except:
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
1550 err_msg += ': ' + repr(err_filename)
-> 1551 raise child_exception_type(errno_num, err_msg, err_filename)
1552 raise child_exception_type(err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'dot': 'dot'
During handling of the above exception, another exception occurred:
ExecutableNotFound Traceback (most recent call last)
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
343 method = get_real_method(obj, self.print_method)
344 if method is not None:
--> 345 return method()
346 return None
347 else:
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/graphviz/files.py in _repr_svg_(self)
111
112 def _repr_svg_(self):
--> 113 return self.pipe(format='svg').decode(self._encoding)
114
115 def pipe(self, format=None, renderer=None, formatter=None, quiet=False):
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/graphviz/files.py in pipe(self, format, renderer, formatter, quiet)
136 out = backend.pipe(self._engine, format, data,
137 renderer=renderer, formatter=formatter,
--> 138 quiet=quiet)
139
140 return out
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/graphviz/backend.py in pipe(engine, format, data, renderer, formatter, quiet)
242 """
243 cmd, _ = command(engine, format, None, renderer, formatter)
--> 244 out, _ = run(cmd, input=data, capture_output=True, check=True, quiet=quiet)
245 return out
246
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/graphviz/backend.py in run(cmd, input, capture_output, check, encoding, quiet, **kwargs)
165 except OSError as e:
166 if e.errno == errno.ENOENT:
--> 167 raise ExecutableNotFound(cmd)
168 else:
169 raise
ExecutableNotFound: failed to execute ['dot', '-Tsvg'], make sure the Graphviz executables are on your systems' PATH
<graphviz.files.Source at 0x7f383c6f7ad0>
Please ignore the <= 0.5
above, it’s a detail we don’t need to get into here.
tree2.score(lion_king.iloc[:,:-1], lion_king.iloc[:,-1])
1.0
tree_max = DecisionTreeClassifier(max_depth=100).fit(lion_king.iloc[:,:-1], lion_king.iloc[:,-1])
tree_max.score(lion_king.iloc[:,:-1], lion_king.iloc[:,-1])
1.0
display_tree(lion_king, tree_max)
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/graphviz/backend.py in run(cmd, input, capture_output, check, encoding, quiet, **kwargs)
163 try:
--> 164 proc = subprocess.Popen(cmd, startupinfo=get_startupinfo(), **kwargs)
165 except OSError as e:
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
799 errread, errwrite,
--> 800 restore_signals, start_new_session)
801 except:
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
1550 err_msg += ': ' + repr(err_filename)
-> 1551 raise child_exception_type(errno_num, err_msg, err_filename)
1552 raise child_exception_type(err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'dot': 'dot'
During handling of the above exception, another exception occurred:
ExecutableNotFound Traceback (most recent call last)
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
343 method = get_real_method(obj, self.print_method)
344 if method is not None:
--> 345 return method()
346 return None
347 else:
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/graphviz/files.py in _repr_svg_(self)
111
112 def _repr_svg_(self):
--> 113 return self.pipe(format='svg').decode(self._encoding)
114
115 def pipe(self, format=None, renderer=None, formatter=None, quiet=False):
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/graphviz/files.py in pipe(self, format, renderer, formatter, quiet)
136 out = backend.pipe(self._engine, format, data,
137 renderer=renderer, formatter=formatter,
--> 138 quiet=quiet)
139
140 return out
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/graphviz/backend.py in pipe(engine, format, data, renderer, formatter, quiet)
242 """
243 cmd, _ = command(engine, format, None, renderer, formatter)
--> 244 out, _ = run(cmd, input=data, capture_output=True, check=True, quiet=quiet)
245 return out
246
/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/graphviz/backend.py in run(cmd, input, capture_output, check, encoding, quiet, **kwargs)
165 except OSError as e:
166 if e.errno == errno.ENOENT:
--> 167 raise ExecutableNotFound(cmd)
168 else:
169 raise
ExecutableNotFound: failed to execute ['dot', '-Tsvg'], make sure the Graphviz executables are on your systems' PATH
<graphviz.files.Source at 0x7f38368dbf10>