
Data Science: Foundations using R Specialization

Language

English

Course format Online
Date 2021-01-01 - 2021-05-31
Duration suggested 8 hours per week
Cost 41€/month

If you are like me and want some more in-depth knowledge about R programming, this might be a course for you.

The same course is taught under the name “Advanced Data Science I” by the Johns Hopkins University Bloomberg School of Public Health. The course is designed for their 2nd-year PhD and/or 2nd-year master’s students.

The description is taken from Coursera:

About this Specialization

Ask the right questions, manipulate data sets, and create visualizations to communicate results.

This Specialization covers foundational data science tools and techniques, including getting, cleaning, and exploring data, programming in R, and conducting reproducible research.

Applied Learning Project

In taking the Data Science: Foundations using R Specialization, learners will complete a project at the end of each course in this specialization. Projects include installing tools, programming in R, cleaning data, performing analyses, as well as peer review assignments.

The courses are divided into weeks, but you're free to choose your own pace; see the detailed description under Learning outcomes.

To be eligible to earn a certificate, you must either pay for enrollment or qualify for financial aid.

Prerequisites

Basic understanding of statistics.

Application procedure

None, just enrol (and pay the tuition fee :-/ ...)

Grant opportunities

You can apply for financial aid on the Coursera website when you enrol.

Learning outcomes

Course 1: The Data Scientist’s Toolbox (19h)

Data Science Fundamentals 5h

In this module, we'll introduce and define data science and data itself. We'll also go over some of the resources that data scientists use to get help when they're stuck.

R and RStudio 5h

In this module, we'll help you get up and running with both R and RStudio. Along the way, you'll learn some basics about both and why data scientists use them.

Version Control and GitHub 4h

During this module, you'll learn about version control and why it's so important to data scientists. You'll also learn how to use Git and GitHub to manage version control in data science projects.

R Markdown, Scientific Thinking, and Big Data 5h

During this final module, you'll learn to use R Markdown and get an introduction to three concepts that are incredibly important to every successful data scientist: asking good questions, experimental design, and big data.

Course 2: R Programming (57h)

Background, Getting Started, and Nuts & Bolts 25h

This week covers the basics to get you started with R. The Background Materials lesson contains information about course mechanics and some videos on installing R. The Week 1 videos cover the history of R and S, go over the basic data types in R, and describe the functions for reading and writing data.

Programming with R 12h

This week, we take the gloves off, and the lectures cover key topics like control structures and functions. We also introduce the first programming assignment for the course, which is due at the end of the week.

Loop Functions and Debugging 9h

We have now entered the third week of R Programming, which also marks the halfway point. The lectures this week cover loop functions and the debugging tools in R. These aspects of R make R useful for both interactive work and writing longer code, and so they are commonly used in practice.

Simulation & Profiling 11h

This week covers how to simulate data in R, which serves as the basis for doing simulation studies. We also cover the profiler in R, which lets you collect detailed information on how your R functions are running and identify bottlenecks that can be addressed. The profiler is a key tool in helping you optimize your programs. Finally, we cover the str function, which I personally believe is the most useful function in R.

Course 3: Getting and Cleaning Data (19h)

Week 1 2h

In this first week of the course, we look at finding data and reading different file types.

Week 2 1h

The primary goal is to introduce you to the most common data storage systems and the appropriate tools to extract data from the web or from databases like MySQL.

Week 3 11h

This week the lectures will focus on organizing, merging and managing the data you have collected using the lectures from Weeks 1 and 2.

Week 4 5h

This week we finish up with lectures on text and date manipulation in R. In this final week we will also focus on peer grading of Course Projects.

Course 4: Exploratory Data Analysis (54h)

Week 1 19h

This week covers the basics of analytic graphics and the base plotting system in R. We've also included some background material to help you install R if you haven't done so already.

Week 2 17h

This week covers some of the more advanced graphing systems available in R: the Lattice system and the ggplot2 system. While the base graphics system provides many important tools for visualizing data, it was part of the original R system and lacks many features that may be desirable in a plotting system, particularly when visualizing high dimensional data. The Lattice and ggplot2 systems also simplify the laying out of plots making it a much less tedious process.

Week 3 13h

This week covers some of the workhorse statistical methods for exploratory analysis. These methods include clustering and dimension reduction techniques that allow you to make graphical displays of very high dimensional data (many, many variables). We also cover novel ways to specify colours in R so that you can use colour as an important and useful dimension when making data graphics.

Week 4 5h

This week, we'll look at two case studies in exploratory data analysis. The first involves the use of cluster analysis techniques, and the second is a more involved analysis of some air pollution data. How one goes about doing EDA is often personal, but I'm providing these videos to give you a sense of how you might proceed with a specific type of dataset.

Course 5: Reproducible Research (4h)

Concepts, Ideas, & Structure 1h

This week will cover the basic ideas of reproducible research since they may be unfamiliar to some of you. We also cover structuring and organizing a data analysis to help make it more reproducible.

Markdown & knitr 1h

This week we cover some of the core tools for developing reproducible documents. We cover the literate programming tool knitr and show how to integrate it with Markdown to publish reproducible web documents. We also introduce the first peer assessment which will require you to write up a reproducible data analysis using knitr.

Reproducible Research Checklist & Evidence-based Data Analysis 1h

This week covers what one could call a basic checklist for ensuring that a data analysis is reproducible. While it's not absolutely sufficient to follow the checklist, it provides a necessary minimum standard that would be applicable to almost any area of analysis.

Case Studies & Commentaries 1h

This week there are two case studies involving the importance of reproducibility in science for you to watch.

ISCED Categories

Statistics
Scientific modelling