Reproducible analysis

Reproducible analysis is the practice of distributing all data, code and tools required to produce the results obtained through an analysis. Making an analysis reproducible may be the lowest-hanging fruit for improving analytical outputs and processes.

A reproducible analysis serves as in-depth documentation of the analysis. Results can be recreated by others and repeated at any time. It promotes collaboration among several analysts working on the same project. If appropriately set up, the analysis can be rerun with different parameters, making follow-up analyses straightforward and painless. The rigor required to make an analysis reproducible increases the confidence of both analysts and clients in the results obtained.

Course syllabus

  • We start the course with an introduction to reproducible research, its principles and the spectrum of reproducibility that applies when complete replication of an analysis is not feasible.
  • We discuss a standardized directory structure for all analytics projects.
  • The most common way to make an analysis reproducible is to script it using a programming language. We cover the most basic concepts of the statistical programming language R. Although previous programming knowledge is beneficial, it is not essential to follow the course.
  • Using code in the analytical process allows coordinating work among multiple analysts and keeping track of all changes throughout the development of the analysis. We show how to use a version control system to automatically track all files of the analysis.
  • The documentation of an analysis should be reproducible as well. We discuss how to script the documentation and regenerate it whenever parts of the analysis change.
  • Usually an analysis is composed of several scripts that divide it into logical parts, e.g. data cleaning, exploratory data analysis, modeling and documentation. In the last part of the course we talk about the glue between these scripts that ensures they are executed in a specific order.

Course prerequisites

The course is aimed at business and data analysts who regularly work with data. It covers the very basics of the statistical programming language R; previous knowledge of R or other programming languages is beneficial but not required.

What should you bring?

You should bring a laptop to participate in the practical part of the course. After registration we will provide you with links to download all course materials and the relevant software to install.

Instructor — Stefan Schliebs, PhD

Stefan is the lead data scientist at Quantiful. He holds a Master's degree in Computer Science from the University of Leipzig, Germany, and a PhD from the Auckland University of Technology. He has received numerous academic awards, published more than 35 scientific articles in international journals, conferences and books, and lectured data science courses at university level to hundreds of students. He has worked as a data scientist in various industries and has 10+ years of experience in commercial and academic environments. He is a co-organizer of the R User Meetup Group Auckland and a hackathon participant.

Location

Details will be announced closer to the course date.

Course fees

The price for a seat still needs to be confirmed. A limited number of seats will be available at a discounted price for early career professionals and graduates.

Interested?

Please leave us your email address and we will get back to you with more details shortly.