DSCI 100: Introduction to Data Science

Time and Place

Note: synchronous class sessions will be conducted using the Zoom video conferencing platform. You will need Version 5.3.0 or higher of the Zoom desktop application installed on your computer (not available on Chrome OS). Please download this before coming to class on Tuesday! For more info: https://skylight.science.ubc.ca/lt/guides/zoom/student-breakout-rooms. If you aren’t able to download Zoom, please contact the instructor for your course section.

Quizzes

Students must write their quizzes during the lecture time for which they registered. The quizzes and exam will be conducted with Canvas quizzes and will be invigilated using Zoom. Please ensure that you meet the system requirements to use Zoom and Canvas.

Description

Use of data science tools to summarize, visualize, and analyze data. Sensible workflows and clear interpretations are emphasized.

Prerequisite Mathematical Knowledge

As an example, British Columbia’s Math 12 or Pre-Calculus 12 courses would satisfy the prerequisite.

Textbook

We are using an open source textbook available free on the web: https://ubc-dsci.github.io/introduction-to-datascience/

Expanded Course Description

In recent years, virtually all areas of inquiry have seen an uptake in the use of data science tools. Skills in the areas of assembling, analyzing, and interpreting data are more critical than ever. This course is designed as a first experience in honing such skills. Students who have completed this course will be able to implement a data science workflow in the R programming language, by “scraping” (downloading) data from the internet, “wrangling” (managing) the data intelligently, and creating tables and/or figures that convey a justifiable story based on the data. They will be adept at using tools for finding patterns in data and making predictions about future data. There will be an emphasis on intelligent and reproducible workflow, and clear communication of findings. No previous programming skills are necessary; beginners are welcome!

Course Software Platforms

Students will learn to perform their analysis using the R programming language. Worksheets and tutorial problem sets, as well as the final project analysis, development, and reports, will be done using Jupyter Notebooks. Students will access the worksheets and tutorials in Jupyter Notebooks through Canvas. Students will require a laptop, Chromebook, or tablet in both lectures and tutorials. A student who does not have their own laptop or Chromebook may be able to borrow a laptop from the UBC library.

Learning Outcomes

By the end of the course, students will be able to:

Teaching Team

Note that your TAs may have class right before their DSCI 100 office hours, so they may run a few minutes late. Please be patient!

| Section | Position | Name | Email | Office Hours | Office Location |
| --- | --- | --- | --- | --- | --- |
| 002 | Instructor | Tiffany Timbers | tiffany.timbers@stat.ubc.ca | Thursdays 9:00 - 10:00 AM (PT) | Zoom |
| 003 | Instructor | Melissa Lee | melissa.lee@stat.ubc.ca | Thursdays 12:30 - 1:30 PM (PT) | Zoom |
| 002 & 003 | TA | Alex | | Wednesdays 12 - 1 PM (PT) | Zoom |
| 002 & 003 | TA | Cathy | | Fridays 12 - 1 PM (PT) | Zoom |

Assessment

In each class (lecture and tutorial) there will be an assignment. Lecture and tutorial worksheet due dates are posted on Canvas. To open the assignment, click the link (e.g. worksheet_01) from Canvas. To submit your assignment, just make sure your work is saved (File -> Save and Checkpoint to be sure) on our server (i.e., using the link from Canvas) before the deadline. Our server will automatically snapshot at the due date/time.

Course breakdown

| Deliverable | Percent Grade |
| --- | --- |
| Lecture worksheets | 5 |
| Tutorial problem sets | 15 |
| Group project | 20 |
| Two quizzes | 40 |
| Final exam | 20 |

Due to the global pandemic, we will drop the lowest lecture worksheet grade, as well as the lowest tutorial worksheet grade. We still recommend that students complete all assigned work for the course, as it is critical for learning. Dropping the lowest lecture and tutorial worksheet grades is intended to give students more flexibility in navigating the challenges they face due to the global pandemic.
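As a rough illustration of how the weights in the course breakdown combine, here is a minimal sketch in Python (the course itself uses R). It assumes that scores within a category are averaged with equal weight and that exactly one lowest score is dropped for lecture worksheets and for tutorial problem sets; the syllabus does not spell out the averaging method, and the function names and student scores below are made up for illustration.

```python
# Weights taken from the course breakdown table (percent of final grade).
WEIGHTS = {
    "lecture_worksheets": 5,
    "tutorial_problem_sets": 15,
    "group_project": 20,
    "quizzes": 40,
    "final_exam": 20,
}


def mean_dropping_lowest(scores):
    """Average a list of percentage scores after dropping the single lowest one."""
    if not scores:
        raise ValueError("at least one score is required")
    if len(scores) == 1:
        # Assumption: with a single score there is nothing sensible to drop.
        return scores[0]
    kept = sorted(scores)[1:]
    return sum(kept) / len(kept)


def final_grade(lecture, tutorial, project, quizzes, exam):
    """Combine category scores (each on a 0-100 scale) into a final percentage.

    Assumption: scores within a category are averaged with equal weight.
    """
    category_scores = {
        "lecture_worksheets": mean_dropping_lowest(lecture),
        "tutorial_problem_sets": mean_dropping_lowest(tutorial),
        "group_project": project,
        "quizzes": sum(quizzes) / len(quizzes),
        "final_exam": exam,
    }
    weighted = sum(WEIGHTS[name] * score for name, score in category_scores.items())
    return weighted / 100


# Hypothetical student: all scores below are invented for illustration.
example = final_grade(
    lecture=[100, 80, 0, 90],
    tutorial=[70, 90, 50, 80],
    project=85,
    quizzes=[75, 65],
    exam=80,
)
print(round(example, 2))
```

In this made-up example the missed lecture worksheet (0) and the weakest tutorial score (50) are the ones dropped, so neither lowers the category average.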

Group project breakdown

| Deliverable | Percent Grade |
| --- | --- |
| Proposal | 3 |
| Final report | 11 |
| Team work | 5 |
| Group contract | 1 |

Schedule

Lectures are held on Tuesdays. Tutorials are held on Thursdays and build on the concepts learned in lecture.

| Topic | Description |
| --- | --- |
| Chapter 1: Introduction to Data Science | Learn to use the R programming language and Jupyter notebooks as you walk through a real world data science application that includes downloading data from the web, wrangling the data into a useable format and creating an effective data visualization. |
| Chapter 2: Reading in data locally and from the web | Learn to read in various cases of data sets locally and from the web. Once read in, these data sets will be used to walk through a real world data science application that includes wrangling the data into a useable format and creating an effective data visualization. |
| Chapter 3: Cleaning and wrangling data | This week will be centered around tools for cleaning and wrangling data. Again, this will be in the context of a real world data science application and we will continue to practice working through a whole case study that includes downloading data from the web, wrangling the data into a useable format and creating an effective data visualization. |
| Chapter 4: Effective data visualization | Expand your data visualization knowledge and tool set beyond what we have seen and practiced so far. We will move beyond scatter plots and learn other effective ways to visualize data, as well as some general rules of thumb to follow when creating visualizations. All visualization tasks this week will be applied to real world data sets. Again, this will be in the context of a real world data science application and we will continue to practice working through a whole case study that includes downloading data from the web, wrangling the data into a useable format and creating an effective data visualization. |
| Transition week | Quiz 1 |
| Chapter 6: Classification | Introduction to classification using K-nearest neighbours (k-nn) |
| Chapter 7: Classification, continued | Classification continued |
| Chapter 8: Regression | Introduction to regression using K-nearest neighbours (k-nn). We will focus on prediction in cases where there is a response variable of interest and a single explanatory variable. |
| Chapter 9: Regression, continued | Continued exploration of k-nn regression in higher dimensions. We will also begin to compare k-nn to linear models in the context of regression. |
| Transition week | Quiz 2 |
| Chapter 10: Clustering | Introduction to clustering using K-means |
| Chapter 11: Introduction to statistical inference | Introduce sampling and estimation for sample means and proportions. |
| Chapter 12: Introduction to statistical inference, continued | Introduce confidence intervals, and calculating them via bootstrapping. |
| Exam period | Final Exam |

Policies

Late Assignments / Quiz Absence

Students must be present at the invigilation venue (in class, on Zoom, examination centre, etc.) to take the quiz; otherwise they will be considered to have missed the quiz and will be assigned a grade of zero. Students who will miss a quiz must provide a self-declaration and make arrangements (e.g., schedule an oral make-up quiz) with the Instructor prior to the quiz. Failing to present a declaration within a reasonable timeframe before the quiz will result in a grade of zero.

We will not provide extensions for the lecture and tutorial worksheets; late assignments will receive a grade of zero. Instead, we will drop the lowest grade on tutorials and worksheets for the semester.

For all other assignments and the course project, a late submission will receive a 50% penalty.

Late Registration

Students who register for the class late have 1 week from their registration date on Canvas to complete all prior assignments.

Autograder Policy

Many of the questions in assignments are graded automatically by software. The grading computer has exactly the same hardware setup as the server that students work on. No assignment, when completed, should take longer than 5 minutes to run on the server. The autograder will automatically stop (time out) for each student assignment after a maximum of 5 minutes; any ungraded questions at that point will receive a score of 0.

Furthermore, students are responsible for making sure their assignments are reproducible and run from beginning to end on the autograding computer. In particular, please ensure that any data the assignment needs is downloaded by the assignment notebook itself, with the correct filename and to the correct folder. A common mistake is to download data manually while working on the assignment, which leaves the autograder unable to find the data and often results in an assignment grade of 0.
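To make the reproducibility requirement concrete, here is a minimal sketch of downloading data from inside the notebook rather than by hand. It is written in Python for illustration only (course assignments use R), and the `fetch_data` helper, the file names, and the folder layout are hypothetical. The demo uses a local `file://` URL as a stand-in for a real web address so that it runs without network access.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve


def fetch_data(url, dest):
    """Download `url` to `dest` from code, creating parent folders as needed."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Downloading in code (not by hand) means a fresh machine, such as an
    # autograder, ends up with the same file under the same name and folder.
    urlretrieve(url, str(dest))
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for a remote data set: a small local CSV file.
        source = Path(tmp) / "source.csv"
        source.write_text("x,y\n1,2\n")
        # Hypothetical destination path inside the assignment folder.
        saved = fetch_data(source.as_uri(), Path(tmp) / "data" / "example.csv")
        print(saved.read_text())
```

Because the download step lives in the notebook, running every cell from top to bottom on another computer recreates the data file in the expected location.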

In short: whatever grade the autograder returns after 5 minutes (assuming the teaching team did not make an error) is the grade that will be assigned.

Re-grading

If you have concerns about the way your work was graded, please contact the TA who graded it within one week of having the grade returned to you. After this one-week window, we may deny your request for re-evaluation. Also, please keep in mind that your grade may go up or down as a result of re-grading.

Device/Browser

Students are responsible for using a device and browser compatible with all functionality of Canvas. Chrome or Firefox browsers are recommended; Safari has had issues with Canvas quizzes in the past.

While classes are remote due to the ongoing pandemic, students are responsible for having a stable internet connection and a functioning webcam and microphone. Absence of any of these will not be considered a valid reason not to take a quiz/exam or submit an assignment on time.

Missed Final Exam

Students who miss the final exam must report to their faculty advising office within 72 hours of the missed exam, and must supply supporting documentation. Only your faculty advising office can grant deferred standing in a course. You must also notify your instructor prior to (if possible) or immediately after the exam. Your instructor will let you know when you are expected to write your deferred exam. Deferred exams will ONLY be provided to students who have applied for and received deferred standing from their faculty.

Academic Concession Policy

Please see UBC’s concession policy for detailed information on dealing with missed coursework, quizzes, and exams under circumstances of an acute and unanticipated nature.

Academic Integrity

The academic enterprise is founded on honesty, civility, and integrity. As members of this enterprise, all students are expected to know, understand, and follow the codes of conduct regarding academic integrity. At the most basic level, this means submitting only original work done by you and acknowledging all sources of information or ideas and attributing them to others as required. This also means you should not cheat, copy, or mislead others about what is your work. Violations of academic integrity (i.e., misconduct) lead to the breakdown of the academic enterprise, and therefore serious consequences arise and harsh sanctions are imposed. For example, incidences of plagiarism or cheating may result in a mark of zero on the assignment or exam and more serious consequences may apply if the matter is referred to the President’s Advisory Committee on Student Discipline. Careful records are kept in order to monitor and prevent recurrences.

A more detailed description of academic integrity, including the University’s policies and procedures, may be found in the Academic Calendar at http://calendar.ubc.ca/vancouver/index.cfm?tree=3,54,111,0.

Plagiarism

Students must correctly cite any code or text that has been authored by someone else or by the student themselves for other assignments. Cases of plagiarism may include, but are not limited to:

An “adequate acknowledgement” requires a detailed identification of the (parts of the) code or text reused and a full citation of the original source code that has been reused.

The above attribution policy applies only to assignments. No code or text may be copied (with or without attribution) from any source during a quiz or exam. Answers must always be in your own words. At a minimum, copying will result in a grade of 0 for the related question.

Repeated plagiarism of any form could result in larger penalties, including failure of the course.

Dealing With COVID-19

The COVID-19 pandemic has affected us all in different ways: it’s okay to not be okay, and we all need to support each other during this time. With that said:

Further, teaching and learning a course like this online is very new to all of us. If you have feedback on how I can improve the teaching experience, don’t hesitate to reach out - I’m sure things won’t be perfect from the get-go.

Finally, here is an official statement from UBC regarding the online learning experience:

During this pandemic, the shift to online learning has greatly altered teaching and studying at UBC, including changes to health and safety considerations. Keep in mind that some UBC courses might cover topics that are censored or considered illegal by non-Canadian governments. This may include, but is not limited to, human rights, representative government, defamation, obscenity, gender or sexuality, and historical or current geopolitical controversies. If you are a student living abroad, you will be subject to the laws of your local jurisdiction, and your local authorities might limit your access to course material or take punitive action against you. UBC is strongly committed to academic freedom, but has no control over foreign authorities (please visit http://www.calendar.ubc.ca/vancouver/index.cfm?tree=3,33,86,0 for an articulation of the values of the University conveyed in the Senate Statement on Academic Freedom). Thus, we recognize that students will have legitimate reason to exercise caution in studying certain subjects. If you have concerns regarding your personal situation, consider postponing taking a course with manifest risks, until you are back on campus or reach out to your academic advisor to find substitute courses. For further information and support, please visit: http://academic.ubc.ca/support-resources/freedom-expression.

Attribution

Parts of this syllabus (particularly the policies) have been copied and derived from the UBC MDS Policies.