Software Training Institute in Chennai with 100% Placements – SLA Institute

Data Science with R Tutorial

Published On: August 9, 2025

Learning data science can feel like climbing a mountain, particularly when you are just beginning. Are you unsure where to start with R, how to bridge theory and real-world application, or how to build a portfolio that really shines? This Data Science with R tutorial is intended to cut through that confusion with concise guidance on learning data science using R. Ready to push past these roadblocks and begin your data science journey? Learn about our extensive Data Science with R course syllabus today!

Data Science with R: Your Beginner’s Guide to Unleashing Data Potential

R is an incredibly powerful, open-source programming language and environment for statistical computing and graphics. It’s one of the favorite tools of data scientists all over the world, rendering it essential for anyone planning to achieve excellence in data analytics and more.

We’re going to guide you through the basics, discussing typical issues and ensuring you feel secure at every step. Ditch the jargon and technical theory for a little while; we’re going to learn by doing, and with practical applications in mind.

Why R for Data Science?

You may ask, “Why R? Why not Python?” Both are great tools for data science, and many use both. But R stands out uniquely with its solid statistical functionality and beautiful visualization capabilities.

  • Statistical Powerhouse: R was created by statisticians, for statisticians. As a result, it has a vast ecosystem of packages for advanced statistical modeling, hypothesis testing, and machine learning.
  • Exceptional Visualizations: Making attractive, informative graphs is straightforward in R, thanks to libraries such as ggplot2. Graphing your data is important for interpreting and communicating your findings.
  • Vast Community and Resources: R has an enormous, thriving community, so you’ll have copious tutorials, forums, and packages available to assist with nearly any data-driven task. This also makes it easy to learn R for data analysis.
  • Free and Open Source: No licensing costs! You can use R and its entire collection of packages free of charge, a tremendous boon for new learners and veteran professionals alike.

Although R and Python each have their own strengths, concentrating on R for data science gives you a skill set that is much sought after by employers. This tutorial deals mainly with using R for data science.

Getting Started: Installing R and RStudio

Before we dive into the exciting stuff, we need to set up our environment. Think of R as the engine and RStudio as the dashboard that makes driving the engine much easier.

Install R:

  • Go to the official R Project website: https://cran.r-project.org/
  • Click on “Download R for [Your Operating System]”.
  • Follow the installation instructions for your platform (Windows, macOS, Linux). It’s usually a simple procedure, like installing any other program.

Install RStudio Desktop (Recommended):

  • Visit the RStudio page: https://posit.co/download/rstudio-desktop/
  • Select the free “RStudio Desktop” version.
  • Download and install it.

After installing both of them, start RStudio. You’ll have a number of panes:

  • Source Editor (Top-Left): This is where you enter your R code.
  • Console (Bottom-Left): Here is where R runs your code and shows you outputs. You can even enter commands directly here.
  • Environment/History (Top-Right): The Environment window contains all the objects (variables, data sets, functions) presently loaded into your R session. The History window stores your previous commands.
  • Files/Plots/Packages/Help/Viewer (Bottom-Right): There are several tabs in this pane. “Plots” displays your plots, “Packages” assists in the management of installed packages, and “Help” is a lifesaver for searching documentation.

Your First Steps in R: Basic Operations and Data Types

Let’s begin with some basic R operations. Don’t worry if it comes across as basic at first; constructing a solid foundation is essential to becoming proficient in R for data scientists.

Basic Arithmetic

R can be used like a sophisticated calculator.

# Addition
2 + 3

# Subtraction
10 - 5

# Multiplication
4 * 6

# Division
20 / 4

# Exponentiation
2^3 # 2 to the power of 3
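
R's calculator mode also covers integer division and remainders, and it follows the usual precedence rules. The following is a minimal sketch; the variable names are ours and are chosen only for illustration.

```r
# Integer division and remainder (modulo)
quotient  <- 17 %/% 5   # 3
remainder <- 17 %% 5    # 2

# Precedence: ^ is evaluated before * and /, which come before + and -
no_parens   <- 2 + 3 * 2^2    # 2 + (3 * 4) = 14
with_parens <- (2 + 3) * 2^2  # 5 * 4 = 20

print(c(quotient, remainder, no_parens, with_parens))
```

When in doubt about precedence, add parentheses; they cost nothing and make the intent obvious.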

Variables and Assignment

Values can be stored in variables via the <- (assignment operator) or =. <- is the traditional choice in R.

my_number <- 15
my_text <- "Hello, Data Science!"

# You can also use =
another_number = 25

# Print the values
print(my_number)
print(my_text)
print(another_number)

Data Types in R

Familiarity with data types is important for successful R programming. R infers the data type automatically, but it’s a good idea to know them.

Numeric: Holds real numbers. A whole number typed without a suffix (such as 7) is also stored as numeric.

x <- 10.5
y <- 7
class(x) # Output: "numeric"
class(y) # Output: "numeric"

Integer: Holds whole numbers, declared explicitly with the L suffix.

z <- 10L
class(z) # Output: "integer"

Character (String): Holds text data.

name <- "Alice"
class(name) # Output: "character"

Logical (Boolean): Holds TRUE or FALSE.

is_true <- TRUE
is_false <- FALSE
class(is_true) # Output: "logical"

Complex: For complex numbers (not generally used in introductory data science).
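
Since R infers types for you, it helps to know how to inspect a type and convert between types. The sketch below uses only base R functions (typeof(), is.integer(), and the as.*() family); the variable names are illustrative.

```r
y <- 7
typeof(y)       # "double": a plain 7 is stored as a double, not an integer
is.integer(y)   # FALSE

# Explicit conversion between types
y_int <- as.integer(y)
class(y_int)    # "integer"

num_from_text  <- as.numeric("3.14")   # 3.14
text_from_num  <- as.character(42)     # "42"
flag_from_text <- as.logical("TRUE")   # TRUE

# Text that is not a number becomes NA (R would also emit a warning)
not_a_number <- suppressWarnings(as.numeric("abc"))
is.na(not_a_number)   # TRUE

# A complex number, for completeness
w <- 3 + 2i
class(w)        # "complex"
```

Checking for NA after a conversion is a useful habit when cleaning imported data, where numbers often arrive as text.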

Data Structures: The Building Blocks of Data in R

Data science using R is heavily dependent on the way data is structured. R supports a number of basic data structures.

Vectors: Your First Collection of Data

The most basic data structure in R is the vector. It is an ordered collection of identically typed components.

# Numeric vector
ages <- c(25, 30, 22, 35, 28)
print(ages)
class(ages) # Output: "numeric"

# Character vector
names <- c("John", "Jane", "Mike", "Sarah")
print(names)
class(names) # Output: "character"

# Logical vector
is_student <- c(TRUE, FALSE, TRUE, TRUE)
print(is_student)
class(is_student) # Output: "logical"

# What happens if you mix types? R coerces them to the most flexible type.
mixed_vector <- c(1, "hello", TRUE)
print(mixed_vector) # Output: "1"     "hello" "TRUE"
class(mixed_vector) # Output: "character"

Accessing Vector Elements: Elements can be accessed by position (index). R employs 1-based indexing.

ages[1]    # First element: 25

ages[3]    # Third element: 22

ages[c(1, 4)] # First and fourth elements: 25 35

ages[2:4]  # Elements from second to fourth: 30 22 35
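
Vectors become especially useful once you combine indexing with R's vectorized operations: arithmetic and comparisons apply to every element at once, and a logical vector can itself be used as an index. A short sketch using the same ages values:

```r
ages <- c(25, 30, 22, 35, 28)

# Arithmetic is applied element by element
ages_next_year <- ages + 1        # 26 31 23 36 29

# A comparison returns one TRUE/FALSE per element
is_over_27 <- ages > 27           # FALSE TRUE FALSE TRUE TRUE

# Logical indexing keeps only the elements where the condition is TRUE
older_ages <- ages[ages > 27]     # 30 35 28

# A negative index drops that position
without_first <- ages[-1]         # 30 22 35 28

length(ages)   # 5
mean(ages)     # 28
```

Logical indexing is the same idea that filter() in dplyr builds on later in this tutorial.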

Matrices: Two-Dimensional Arrays of Same Type

A matrix is a two-dimensional array of elements of the same type of data. Imagine it as a grid or table.

# Create a matrix
my_matrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
print(my_matrix)
# Output:
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# byrow = FALSE (default): fills by column
my_matrix_col <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = FALSE)
print(my_matrix_col)
# Output:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

Accessing Matrix Elements: Use [row, column].

my_matrix[1, 2]   # Element in first row, second column: 2

my_matrix[2, ]    # All elements in the second row: 4 5 6

my_matrix[, 3]    # All elements in the third column: 3 6
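
Beyond indexing, matrices support whole-matrix operations. The sketch below shows dim(), the transpose t(), element-wise arithmetic, and true matrix multiplication with %*%, all from base R.

```r
my_matrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)

matrix_shape <- dim(my_matrix)     # 2 3 (rows, columns)

# Transpose: rows become columns, giving a 3 x 2 matrix
transposed <- t(my_matrix)

# The * operator works element by element
doubled <- my_matrix * 2

# %*% is matrix multiplication: (2 x 3) times (3 x 2) gives a 2 x 2 result
product <- my_matrix %*% transposed
print(product)
#      [,1] [,2]
# [1,]   14   32
# [2,]   32   77
```

Mixing up * and %*% is a common beginner mistake, so it is worth checking dim() of the result when it looks wrong.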

Data Frames: The Unifying Structure of Data Science in R

Data frames are the single most important data structure for data science in R. They are tables in which each column can hold a different type of data, but all the items within a column must be of the same type. That is exactly how you would expect a dataset to be structured!

# Create a data frame
students_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "Diana"),
  Age = c(21, 23, 22, 24),
  Major = c("CS", "Math", "Physics", "CS"),
  GPA = c(3.8, 3.5, 3.9, 3.7)
)

print(students_data)
# Output:
#      Name Age   Major GPA
# 1   Alice  21      CS 3.8
# 2     Bob  23    Math 3.5
# 3 Charlie  22 Physics 3.9
# 4   Diana  24      CS 3.7

# Check class
class(students_data) # Output: "data.frame"

Accessing Data Frame Elements:

By column name using $ (most readable and common):

students_data$Name

students_data$Age

By column name using [] or [[]]:

students_data["Major"]   # Returns a data frame with one column
students_data[["Major"]] # Returns a vector

By row and column index:

students_data[1, 2]    # First row, second column (Age of Alice): 21
students_data[3, ]     # Third row (Charlie's data)
students_data[, "GPA"] # GPA column
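
The row position in [row, column] can also be a logical condition, which lets you pick out rows by value and not only by number. A minimal base R sketch with the same four students (the result variable names are illustrative):

```r
students_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "Diana"),
  Age = c(21, 23, 22, 24),
  Major = c("CS", "Math", "Physics", "CS"),
  GPA = c(3.8, 3.5, 3.9, 3.7)
)

# Rows where GPA is above 3.6, all columns
high_gpa <- students_data[students_data$GPA > 3.6, ]

# Names of the CS majors (a condition applied to a single column)
cs_names <- students_data$Name[students_data$Major == "CS"]   # "Alice" "Diana"

# Rows where Age is above 21, keeping only two columns
older <- students_data[students_data$Age > 21, c("Name", "Age")]

print(high_gpa)
print(older)
```

Note the comma after the condition: leaving the column position empty means "keep every column".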

Useful Data Frame Functions:

str(): It is used to display the data frame structure (column data types).

str(students_data)

summary(): It is used to display summary statistics for every column.

summary(students_data)

head(): It displays the first 6 rows.

head(students_data)

tail(): It displays the last 6 rows.

tail(students_data)

dim(): It returns the dimensions (rows, columns).

dim(students_data) # Output: 4 4

colnames() / names(): It returns column names.

colnames(students_data)
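
A few more base R helpers round out these inspection functions. The sketch below assumes R 4.0 or later, where data.frame() keeps text columns as character; in older versions they would be converted to factors by default.

```r
students_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "Diana"),
  Age = c(21, 23, 22, 24),
  Major = c("CS", "Math", "Physics", "CS"),
  GPA = c(3.8, 3.5, 3.9, 3.7)
)

row_count <- nrow(students_data)   # 4
col_count <- ncol(students_data)   # 4

# The class of every column at once, as a named character vector
col_types <- sapply(students_data, class)
print(col_types)

# head() and tail() accept n to control how many rows are shown
first_two <- head(students_data, n = 2)
last_one  <- tail(students_data, n = 1)
```

Running these right after an import is a quick sanity check that the data arrived with the shape and types you expected.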

Lists: The Most Flexible Structure

A list is an assortment of objects of various types. It is a general-purpose container. A list may even contain lists, data frames, vectors, etc.

my_list <- list(
  name = "Dr. Einstein",
  age = 76,
  is_active = FALSE,
  hobbies = c("Physics", "Music", "Sailing"),
  research_data = students_data # Our data frame!
)

print(my_list)

Accessing List Elements:

Using $ for named elements:

my_list$name

my_list$hobbies[1] # Access an element within a vector in the list

Using [[]] for elements by name or index (returns the actual object):

my_list[["age"]]
my_list[[4]] # The 'hobbies' vector

Using [] (returns a sub-list):

my_list[1] # Returns a list containing only 'name'
my_list[c("name", "age")] # Returns a list with 'name' and 'age'
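
Lists can also be changed after they are created. The sketch below updates, adds, and removes elements with the same $ and [[ ]] syntax; the list name profile and the added values are purely illustrative.

```r
profile <- list(
  name = "Dr. Einstein",
  age = 76,
  hobbies = c("Physics", "Music", "Sailing")
)

# Update an existing element
profile$age <- 77

# Add a new element by assigning to a name that does not exist yet
profile$city <- "Princeton"   # illustrative value

# Extend a vector stored inside the list
profile[["hobbies"]] <- c(profile$hobbies, "Violin")

element_names <- names(profile)     # "name" "age" "hobbies" "city"
element_sizes <- lengths(profile)   # the length of each element

# Assigning NULL removes an element
profile$city <- NULL
names(profile)                      # "name" "age" "hobbies"
```

Because model objects and many function results in R are lists underneath, these same access patterns apply to them too.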

Packages: Unlocking R’s Potential

One of R's greatest strengths for data science is its enormous package ecosystem. Packages are collections of functions, data, and compiled code in a standard format. They extend R's capabilities for particular tasks, such as data manipulation, visualization, or machine learning.

Packages are similar to smartphone apps – they introduce new features.

Installing and Loading Packages

Install: You install a package only once.

install.packages("dplyr") # For data manipulation
install.packages("ggplot2") # For stunning visualizations
install.packages("readr") # For reading various data formats

Note: When you run install.packages(), R may prompt you to select a CRAN mirror. Pick one near you for quicker downloads.

Load: Once a package is installed, you must load it with library() in every new R session in which you want to use its functions.

library(dplyr)

library(ggplot2)

library(readr)

Pro-tip: If you attempt to call a function from a package without loading it, R will return an error such as: could not find function "function_name".
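
Two related habits can save time here: checking whether a package is installed before loading it, and calling a single function with the package::function() form without attaching the whole package. This is a sketch using base R's requireNamespace(); the variable names are ours, and the install line is left commented out so nothing is installed unless you choose to.

```r
wanted <- c("dplyr", "ggplot2", "readr")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise
is_installed <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)
not_installed <- wanted[!is_installed]

if (length(not_installed) > 0) {
  message("Not installed yet: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)  # uncomment to install them
}

# Call one function without library(): package::function()
middle_value <- stats::median(c(1, 5, 9))   # 5
```

The package::function() form also makes it clear to readers where a function comes from, which helps when two packages export functions with the same name.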

Data Import and Export: Piping Your Data In and Out

Real-world data science with R starts with getting your data into R.

Importing Data

R can read many different file formats:

CSV (Comma Separated Values): The most widely used format. Use read_csv() from the readr package (typically faster and more consistent than base R's read.csv()).

The example below creates a small file called students.csv in your working directory and reads it back. You can find your working directory with getwd(). To switch directories, use setwd("path/to/your/folder").

# Create a dummy CSV file for demonstration
# (You would typically have this file already)
sample_data_text <- "ID,Name,Score\n1,Alice,85\n2,Bob,92\n3,Charlie,78"
writeLines(sample_data_text, "students.csv")

library(readr)
my_data <- read_csv("students.csv")
print(my_data)
# Output:
# # A tibble: 3 × 3
#      ID Name    Score
#   <dbl> <chr>   <dbl>
# 1     1 Alice      85
# 2     2 Bob        92
# 3     3 Charlie    78

Note: read_csv() returns a "tibble", a modern variant of the data frame with some enhancements (for example, tidier printing). You can treat it much like a regular data frame.
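
For comparison, here is the same import done with base R's read.csv(), which needs no extra packages. This sketch writes to a temporary file (via tempfile()) so that it does not clutter your working directory; the variable names are illustrative.

```r
# Write a small CSV to a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("ID,Name,Score", "1,Alice,85", "2,Bob,92", "3,Charlie,78"),
  csv_path
)

# read.csv() returns a plain data.frame
base_data <- read.csv(csv_path)
class(base_data)   # "data.frame"

# Whole-number columns are read as integer here (read_csv() reports them as dbl)
base_types <- sapply(base_data, class)
print(base_types)

str(base_data)
```

Both functions give you a table you can work with; the main practical differences are speed on large files, the column types chosen, and how the result prints.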

Excel Files (.xlsx, .xls): Use the readxl package.

# install.packages("readxl")
library(readxl)
# my_excel_data <- read_excel("my_data.xlsx", sheet = "Sheet1")

Other Formats:

  • haven package for SAS, SPSS, Stata files.
  • jsonlite for JSON files.
  • XML for XML files.

Exporting Data

You can also export your data from R after processing.

CSV: use write_csv() from readr.

library(readr)
write_csv(my_data, "processed_students.csv")

R Data Format (.RData or .rds): R-specific data formats for directly saving R objects. .rds is usually the best choice for single objects.

saveRDS(my_data, "my_processed_data.rds")

# To load it back:
# loaded_data <- readRDS("my_processed_data.rds")
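
The difference between the two R formats is easiest to see in a round trip. In this sketch (temporary files and illustrative names), readRDS() returns the object so you choose its new name, while load() restores objects under the names they were saved with.

```r
scores <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(85, 92, 78)
)

# .rds: one object per file, assigned to any name on reading
rds_path <- tempfile(fileext = ".rds")
saveRDS(scores, rds_path)
restored <- readRDS(rds_path)
identical(scores, restored)   # TRUE: column types are preserved

# .RData: one or more objects, restored under their original names
rdata_path <- tempfile(fileext = ".RData")
save(scores, file = rdata_path)
rm(scores)
loaded_names <- load(rdata_path)   # returns the names it restored: "scores"
exists("scores")                   # TRUE again
```

Unlike CSV, both formats keep R-specific details such as column types, which is why they suit intermediate results within an R project.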

Data Manipulation with dplyr: Your Data Science Superpower

Data transformation and cleaning are essential operations in R and data science. The dplyr package is an essential R tool for data scientists since it simplifies these operations to an extremely intuitive and effective level. It follows a “grammar of data manipulation” that is straightforward to learn.

Let’s extend our students_data data frame with one more student and an Enrolled_Year column:

students_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  Age = c(21, 23, 22, 24, 21),
  Major = c("CS", "Math", "Physics", "CS", "Biology"),
  GPA = c(3.8, 3.5, 3.9, 3.7, 3.2),
  Enrolled_Year = c(2022, 2021, 2022, 2021, 2023)
)

library(dplyr)

Key dplyr Functions:
  1. select(): Choosing Columns
  • Select specific columns.

# Select Name and GPA columns
selected_cols <- students_data %>%
  select(Name, GPA)
print(selected_cols)

  • Select all columns except one.

# Select all columns except Enrolled_Year
no_year_col <- students_data %>%
  select(-Enrolled_Year)
print(no_year_col)

  2. filter(): Filtering Rows
  • Filter rows based on conditions.

# Students with GPA greater than 3.6
high_gpa_students <- students_data %>%
  filter(GPA > 3.6)
print(high_gpa_students)

# CS majors enrolled in 2022
cs_2022_students <- students_data %>%
  filter(Major == "CS", Enrolled_Year == 2022)
print(cs_2022_students)

  3. mutate(): Creating New Columns
  • Add new columns or update existing ones.

# Add a column for 'Is_Excellent' based on GPA
students_with_status <- students_data %>%
  mutate(Is_Excellent = GPA >= 3.7)
print(students_with_status)

# Calculate Age_in_5_Years
students_data <- students_data %>%
  mutate(Age_in_5_Years = Age + 5)
print(students_data)

  4. arrange(): Sorting Data
  • Sort rows by one or more columns.

# Sort by GPA in descending order
sorted_by_gpa <- students_data %>%
  arrange(desc(GPA))
print(sorted_by_gpa)

# Sort by Major (ascending) then by Age (ascending)
sorted_by_major_age <- students_data %>%
  arrange(Major, Age)
print(sorted_by_major_age)

  5. summarise() / summarize(): Summarizing Data
  • Calculate summary statistics (mean, median, count, etc.).

# Calculate average GPA and total number of students
summary_stats <- students_data %>%
  summarise(
    Average_GPA = mean(GPA),
    Total_Students = n()
  )
print(summary_stats)

  6. group_by(): Grouping for Aggregation
  • Perform operations on groups of rows. Often used with summarise(). This is incredibly powerful for data analysis.

# Calculate average GPA per major
gpa_by_major <- students_data %>%
  group_by(Major) %>%
  summarise(
    Average_GPA = mean(GPA),
    Count = n()
  )
print(gpa_by_major)
# Output:
# # A tibble: 4 × 3
#   Major   Average_GPA Count
#   <chr>         <dbl> <int>
# 1 Biology         3.2     1
# 2 CS              3.75    2
# 3 Math            3.5     1
# 4 Physics         3.9     1
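
If you want to cross-check a grouped summary without dplyr, base R's aggregate() and table() produce the same numbers. This is offered only as a sanity check under the same sample data, not as a replacement for the dplyr approach above; the result names are illustrative.

```r
students_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  Major = c("CS", "Math", "Physics", "CS", "Biology"),
  GPA = c(3.8, 3.5, 3.9, 3.7, 3.2)
)

# Mean GPA per major: the formula reads "GPA grouped by Major"
gpa_base <- aggregate(GPA ~ Major, data = students_data, FUN = mean)
print(gpa_base)

# Number of students per major
major_table <- table(students_data$Major)
print(major_table)
```

The CS row should show a mean of 3.75 (the average of 3.8 and 3.7), matching the dplyr output.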

The Pipe Operator (%>%) 

You’ve likely noticed the %>% (pipe) operator in the examples. It comes from the magrittr package, and dplyr re-exports it, so loading dplyr makes it available. It makes it much easier to write neat, readable R code.

Rather than: function2(function1(data))

You write: data %>% function1() %>% function2()

It passes the output of the left-hand side as the first argument to the function on the right-hand side. This lets your code read in a logical order, mirroring how you think through data manipulation steps.
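
To make the equivalence concrete, the sketch below writes one three-step transformation both ways and confirms that the results match. It assumes dplyr is installed; the variable names nested and piped are ours.

```r
library(dplyr)

students_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  GPA = c(3.8, 3.5, 3.9, 3.7, 3.2)
)

# Nested calls: read from the inside out
nested <- arrange(filter(select(students_data, Name, GPA), GPA > 3.6), desc(GPA))

# Piped calls: read top to bottom, one step per line
piped <- students_data %>%
  select(Name, GPA) %>%
  filter(GPA > 3.6) %>%
  arrange(desc(GPA))

identical(nested, piped)   # TRUE
print(piped$Name)          # "Charlie" "Alice" "Diana"
```

Both versions do exactly the same work; the piped form simply lists the steps in the order they happen.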

Data Visualization with ggplot2: Telling Your Data Story

Data visualization is not merely about pretty pictures; it’s about finding insights, spotting trends, and communicating what you find. ggplot2 (part of the tidyverse, like dplyr) is the go-to package for data visualization in R.

It follows a “grammar of graphics,” where you construct plots by layering elements, making it fantastically flexible and powerful.

library(ggplot2)

Let’s reuse our students_data again.

Elements of a ggplot2 Plot:

  • ggplot(): The central function, where you set the data and global aesthetics (mappings of variables to visual properties).
  • aes() (Aesthetics): Sets mappings of your variables to visual properties of the plot (e.g., x-axis, y-axis, color, size).
  • geom_*() (Geometries): Specifies the type of geometric object to draw (e.g., points, bars, lines, boxes).

Examples:

  1. Scatter Plot: Age vs. GPA

ggplot(data = students_data, aes(x = Age, y = GPA)) +
  geom_point() +
  labs(title = "Student Age vs. GPA",
       x = "Age of Student",
       y = "Grade Point Average") +
  theme_minimal()

  • ggplot(data = students_data, aes(x = Age, y = GPA)): Set up the plot, defining the data and mapping Age to the x-axis and GPA to the y-axis.
  • geom_point(): Puts points on the plot for each data point.
  • labs(): Adds title and axis labels.
  • theme_minimal(): Uses a clean, minimal theme.
  2. Bar Chart: Students per Major

Let’s first get the counts per major.

major_counts <- students_data %>%
  group_by(Major) %>%
  summarise(Count = n())

ggplot(data = major_counts, aes(x = Major, y = Count, fill = Major)) +
  geom_bar(stat = "identity") + # stat = "identity" means use the y-values as is
  labs(title = "Number of Students per Major",
       x = "Major",
       y = "Number of Students") +
  theme_classic()

  • fill = Major: Colors bars according to the Major variable.
  • geom_bar(stat = “identity”): It produces a bar chart with the height of bars set by the Count variable.
  3. Histogram: Distribution of GPAs

ggplot(data = students_data, aes(x = GPA)) +
  geom_histogram(binwidth = 0.2, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Student GPAs",
       x = "GPA",
       y = "Frequency") +
  theme_light()

These are only a few simple examples. ggplot2 can produce nearly any statistical graphic, which makes it an essential tool for data analysis in R.
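
Because a ggplot2 plot is an ordinary R object, you can store it in a variable and keep adding layers to it. The sketch below assumes ggplot2 is installed; the variable names and the output file name in the commented ggsave() line are illustrative.

```r
library(ggplot2)

students_data <- data.frame(
  Major = c("CS", "Math", "Physics", "CS", "Biology"),
  GPA = c(3.8, 3.5, 3.9, 3.7, 3.2)
)

# Start with the data and the global aesthetics only (nothing is drawn yet)
base_plot <- ggplot(data = students_data, aes(x = Major, y = GPA))

# Add layers one at a time: boxes first, then the raw points on top
layered_plot <- base_plot +
  geom_boxplot() +
  geom_point(aes(color = Major), size = 3) +
  labs(title = "GPA by Major",
       x = "Major",
       y = "GPA") +
  theme_minimal()

# Printing the object draws it in the Plots pane
print(layered_plot)

# To save it to a file instead:
# ggsave("gpa_by_major.png", plot = layered_plot, width = 6, height = 4)
```

With only one or two students per major the boxes are not very informative; the point here is the layering pattern, which stays the same on real data.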

Basic Statistical Concepts in R

Data science with R is naturally statistical. Although a complete statistics class is out of scope for this tutorial, let’s briefly cover some basics and how to implement them with R.

Descriptive Statistics

We’ve already encountered summary() for simple descriptive statistics. You can also compute them one by one.

# Mean GPA
mean(students_data$GPA)

# Median GPA
median(students_data$GPA)

# Standard Deviation of GPA
sd(students_data$GPA)

# Quartiles and Interquartile Range
quantile(students_data$GPA)
IQR(students_data$GPA)

Correlation

The degree and direction of a linear relationship between two quantitative variables are indicated by correlation.

# Correlation between Age and GPA
cor(students_data$Age, students_data$GPA)

  • A positive value indicates a positive relationship (as one rises, the other tends to rise).
  • A negative value indicates a negative relationship (as one rises, the other tends to fall).
  • A value close to 0 indicates a weak or no linear relationship.
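
To see what cor() is doing, you can compute the Pearson correlation by hand: the sum of the products of the deviations from each mean, divided by the square root of the product of the summed squared deviations. This sketch uses the five-student sample from this tutorial; the intermediate variable names are ours.

```r
age <- c(21, 23, 22, 24, 21)
gpa <- c(3.8, 3.5, 3.9, 3.7, 3.2)

# Deviations of each value from its mean
age_dev <- age - mean(age)
gpa_dev <- gpa - mean(gpa)

# Pearson correlation computed step by step
r_manual <- sum(age_dev * gpa_dev) /
  sqrt(sum(age_dev^2) * sum(gpa_dev^2))

# The built-in function gives the same value
r_builtin <- cor(age, gpa)

print(r_manual)                  # about 0.19 for this toy data
all.equal(r_manual, r_builtin)   # TRUE
```

A value near 0.19 is a weak positive relationship, and with only five rows it should not be read as meaningful evidence of anything.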

Simple Linear Regression

Linear regression is a statistical technique for modeling the relationship between a dependent variable (response) and one or more independent variables (predictors).

Let’s try to predict GPA from Age.

# Build a linear regression model
gpa_model <- lm(GPA ~ Age, data = students_data)

# View the model summary
summary(gpa_model)

The summary() output shows:

  • Coefficients: The intercept and the slope for Age. The slope tells you how much GPA is expected to change for a one-unit increase in Age.
  • R-squared: The proportion of the variation in GPA that Age explains.
  • P-values: The statistical significance of each coefficient.
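
You can also pull these numbers out of the model programmatically and use the model to make a prediction. The sketch below uses base R's coef(), summary(), and predict(); the new student aged 25 is a hypothetical input chosen for illustration.

```r
students_data <- data.frame(
  Age = c(21, 23, 22, 24, 21),
  GPA = c(3.8, 3.5, 3.9, 3.7, 3.2)
)

gpa_model <- lm(GPA ~ Age, data = students_data)

# Named vector with two entries: "(Intercept)" and "Age" (the slope)
model_coefs <- coef(gpa_model)
print(model_coefs)

# R-squared as a single number
r_squared <- summary(gpa_model)$r.squared

# Predict GPA for a hypothetical 25-year-old student
new_student <- data.frame(Age = 25)
predicted_gpa <- predict(gpa_model, newdata = new_student)

# The prediction is just intercept + slope * Age
by_hand <- model_coefs[["(Intercept)"]] + model_coefs[["Age"]] * 25
all.equal(unname(predicted_gpa), by_hand)   # TRUE
```

With a single predictor, R-squared equals the squared correlation between Age and GPA, so a weak correlation means the line explains very little of the variation.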

Beyond the Basics: What’s Next in Your R Data Science Journey?

This tutorial has given you a foundation in data science with R. You’ve learned:

  • Why R is well suited to data science.
  • How to configure your R environment.
  • Fundamental R syntax, data types, and structures.
  • Importing and exporting data.
  • Basic data manipulation with dplyr.
  • Making effective visualizations with ggplot2.
  • Fundamental statistical principles in R.

To become a true master of R for data scientists, the following are areas to study next:

  • More Advanced Data Manipulation: Joining data frames (left_join, inner_join), reshaping data (pivot_longer, pivot_wider), string manipulation (stringr).
  • Feature Engineering: Deriving new variables from existing variables to enhance model performance.
  • Machine Learning:
    • Supervised Learning: Linear Regression (which we briefly mentioned), Logistic Regression, Decision Trees, Random Forests, Support Vector Machines. The tidymodels ecosystem is great for this.
    • Unsupervised Learning: Clustering (K-Means), Principal Component Analysis (PCA).
  • Time Series Analysis: For data gathered over time.
  • Big Data with R: Packages for processing large datasets that do not fit into memory.
  • R Markdown: For writing dynamic reports that integrate code, output, and text (crucial for reproducible research).
  • Shiny: For creating interactive web applications from R directly.

Becoming proficient in R for data analysis takes time. Practice regularly, apply it to real-world projects, and don’t be afraid to refer to R’s comprehensive documentation and community forums.

Conclusion

The flexibility and power of R, along with the passionate community, make it an unbeatable resource for anyone aiming towards a data career. We hope this data science with R tutorial has prepared you with the essentials you need to tackle actual data challenges with confidence.

Ready to go further and learn the advanced techniques that will set you apart? Our Data Science with R course provides detailed step-by-step modules, interactive projects, and access to experts to help you go from beginner to expert in data science. Join today and unlock your potential!
