Data Structures

Vectors in R

A vector is the most basic structure in R
It is one-dimensional; its single dimension is its length
A vector of length n has n cells
Each cell can hold a single value, like a numeric or string value
- In general, vectors can only hold ONE type of data

numVec <- c(2,3,4)      # <- is the assignment operator
numVec

## [1] 2 3 4
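
As a minimal sketch of the points above, length() reports a vector's single dimension and square brackets pick out individual cells (R counts cells from 1). The name numVec is reused from the example above; nothing else is assumed.

```r
numVec <- c(2, 3, 4)

length(numVec)   # number of cells: 3
numVec[1]        # first cell (indexing starts at 1): 2
typeof(numVec)   # "double": numbers typed without L are stored as doubles
```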

Examples of Vectors

Examples of integer, character, and logical vectors:

intVec <- c(2L, 3L, 4L)
intVec

## [1] 2 3 4

charVec <- c("red", "green", "blue")
charVec

## [1] "red"   "green" "blue"

logVec <- c(TRUE, FALSE, FALSE, T, F)
logVec

## [1]  TRUE FALSE FALSE  TRUE FALSE
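
typeof() confirms what each of these vectors holds. The sketch below reuses the vectors above and adds a complex vector, compVec, a name introduced here only for illustration.

```r
intVec  <- c(2L, 3L, 4L)
charVec <- c("red", "green", "blue")
logVec  <- c(TRUE, FALSE, FALSE, T, F)
compVec <- c(1+2i, 3-1i)   # hypothetical example vector, not from the slides

typeof(intVec)    # "integer"
typeof(charVec)   # "character"
typeof(logVec)    # "logical"
typeof(compVec)   # "complex"
```

T and F are ordinary variables that can be reassigned, so spelling out TRUE and FALSE is usually the safer habit.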

Matrices

A matrix is a special case of a vector
- Unlike vectors, matrices have a dimension attribute

myMat <- matrix(nrow = 2, ncol = 4)
myMat

##      [,1] [,2] [,3] [,4]
## [1,]   NA   NA   NA   NA
## [2,]   NA   NA   NA   NA

attributes(myMat)

## $dim
## [1] 2 4

Matrices

myMat <- matrix(1:8, nrow = 2, ncol = 4)
myMat # matrices are filled in column-wise

##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8
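
If row-wise filling is wanted instead, matrix() accepts a byrow argument. A short sketch, where byRowMat is a name made up for this example:

```r
byRowMat <- matrix(1:8, nrow = 2, ncol = 4, byrow = TRUE)
byRowMat   # values now run along each row

##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
```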

A matrix is a special case of a vector

myVec <- 1:8
myVec

## [1] 1 2 3 4 5 6 7 8

dim(myVec) <- c(2,4)
myVec

##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8

Similar to vectors, all elements of a matrix must be of the same data type
- If they are not, R automatically coerces them to a common type
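
A small sketch of that coercion, assuming a matrix built from mixed inputs (mixedMat is a hypothetical name): a single character value turns every cell into a character string.

```r
mixedMat <- matrix(c(1, "a", TRUE, 2), nrow = 2)

typeof(mixedMat)   # "character"
mixedMat[1, 1]     # "1": the number was converted to a string
mixedMat[1, 2]     # "TRUE": the logical was converted to a string
```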

Other Ways to Create a Matrix

Intuitively, matrices seem to be a combination of vectors that are put next to each other (either column-wise or row-wise).
rbind() and cbind() (row bind and column bind) achieve this:

vec1 <- 1:4
vec2 <- sample(1:100, 4, replace = FALSE)
vec3 <- sample(1:20, 4, replace=TRUE)
colMat <- cbind(vec1, vec2, vec3)
colMat

##      vec1 vec2 vec3
## [1,]    1   50   15
## [2,]    2   72   18
## [3,]    3   15    8
## [4,]    4   93   19

Other Ways to Create a Matrix

vec1 <- 1:4
vec2 <- sample(1:100, 4, replace = TRUE)
vec3 <- sample(1:20, 4, replace=FALSE)
rowMat <- rbind(vec1, vec2, vec3)
rowMat

##      [,1] [,2] [,3] [,4]
## vec1    1    2    3    4
## vec2   86   94   82   60
## vec3   14   16    3    7
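
Because sample() draws random numbers, the values shown above change from run to run. The sketch below fixes the random seed so the result is reproducible, then compares the shapes of the two matrices. The seed value 123 is arbitrary and chosen only for illustration.

```r
set.seed(123)   # arbitrary seed, only so that repeated runs give the same draws
vec1 <- 1:4
vec2 <- sample(1:100, 4, replace = FALSE)
vec3 <- sample(1:20, 4, replace = TRUE)

colMat <- cbind(vec1, vec2, vec3)   # each vector becomes a column
rowMat <- rbind(vec1, vec2, vec3)   # each vector becomes a row

dim(colMat)                # 4 3
dim(rowMat)                # 3 4
all(t(colMat) == rowMat)   # TRUE: one matrix is the transpose of the other
```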

Factors

A factor is a vector used to specify a discrete classification (categorical values).
Factors can be ordered or un-ordered
Levels of a factor are better when labeled (self-descriptive)
- Consider sex as (0, 1) as opposed to labeled (“F”, “M”)

Sex <- rep(c("Female", "Male"), times = 3)
Sex

## [1] "Female" "Male"   "Female" "Male"   "Female" "Male"

SexFac1 <- factor(Sex)
SexFac1

## [1] Female Male   Female Male   Female Male  
## Levels: Female Male

Factors

levels(SexFac1)

## [1] "Female" "Male"

table(SexFac1)

## SexFac1
## Female   Male 
##      3      3

Factors

SexFac1 # levels are ordered alphabetically by default; the 1st level is the base level

## [1] Female Male   Female Male   Female Male  
## Levels: Female Male

SexFac2 <- factor(Sex, levels = c("Male", "Female"))
SexFac1

## [1] Female Male   Female Male   Female Male  
## Levels: Female Male

SexFac2

## [1] Female Male   Female Male   Female Male  
## Levels: Male Female
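
Changing the level order changes the integer codes that R stores behind the labels, which as.integer() makes visible. The sketch also shows an ordered factor; the dose variable and its levels are hypothetical and not part of the original example.

```r
Sex <- rep(c("Female", "Male"), times = 3)
SexFac1 <- factor(Sex)
SexFac2 <- factor(Sex, levels = c("Male", "Female"))

as.integer(SexFac1)   # 1 2 1 2 1 2  (Female = 1, Male = 2)
as.integer(SexFac2)   # 2 1 2 1 2 1  (Male = 1, Female = 2)

# An ordered factor: comparisons follow the declared level order
dose <- factor(c("low", "high", "medium"),
               levels = c("low", "medium", "high"),
               ordered = TRUE)
dose[1] < dose[2]     # TRUE, because low comes before high
```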

Lists

Think of a list as a vector with the following main differences:
- Each element of a list can have its own data structure regardless of other elements
  - vector, matrix, another list
- This means, each element can be of a different data type and a different length

myVec <- c(10, "R", 5L, T)
myVec

## [1] "10"   "R"    "5"    "TRUE"

Lists

myList <- list(10, "R", 5L, T)
myList

## [[1]]
## [1] 10
## 
## [[2]]
## [1] "R"
## 
## [[3]]
## [1] 5
## 
## [[4]]
## [1] TRUE

Elements of a list are shown with [[]]
Elements of a vector are shown with []
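
The two bracket styles also behave differently when extracting: [[ ]] returns the element itself, while [ ] returns a shorter list. A brief sketch; namedList and its element names are made up for illustration.

```r
myList <- list(10, "R", 5L, T)

myList[[2]]          # "R": the element itself
class(myList[[2]])   # "character"
class(myList[2])     # "list": a list of length 1 that contains "R"

# Elements can be named and can differ in type and length
namedList <- list(number = 10, chars = c("a", "b"), flag = TRUE)
namedList$chars           # "a" "b"
length(namedList$chars)   # 2
```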

Data Frames

A data frame is a special list in which every element (column) has the same length
A data frame looks very similar to a matrix; however, different columns in a data frame can be different data types

studentID <- paste("S#", sample(c(6473:7392), 10), sep = "")
score <- sample(c(0:100), 10)
sex <- sample(c("female", "male"), 10, replace = TRUE)
data <- data.frame(studentID = studentID, score = score, sex = sex)
head(data)

##   studentID score    sex
## 1    S#7019    40 female
## 2    S#6968     9   male
## 3    S#7025    14   male
## 4    S#6972    73 female
## 5    S#7320    78   male
## 6    S#7279    79 female
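
Since the data above are random, the sketch below uses a small hand-written data frame (its three rows copy the first rows of the output above) to show that a data frame is a list of equal-length columns with possibly different types. The name scores is hypothetical, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
scores <- data.frame(studentID = c("S#7019", "S#6968", "S#7025"),
                     score     = c(40, 9, 14),
                     sex       = c("female", "male", "male"),
                     stringsAsFactors = FALSE)

is.list(scores)        # TRUE: a data frame is a list of columns
dim(scores)            # 3 3
sapply(scores, class)  # studentID "character", score "numeric", sex "character"
scores$score           # 40  9 14: a single column, extracted with $
mean(scores$score)     # 21
```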

Special Values

There are some special values in R:

Append L to a number to make it an integer value, e.g., 1L
R knows infinity: Inf, -Inf
NaN: refers to “Not a number”

intVec <- c(1L, 2L, 3L, 4L) 
intVec

## [1] 1 2 3 4

typeof(intVec)

## [1] "integer"

intVec*Inf

## [1] Inf Inf Inf Inf

a <- Inf; b <- 0
rslt <- c(b/a, a/a)
rslt

## [1]   0 NaN
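
A few more cases, as a quick sketch of how these special values arise and how to test for them with base R's is.finite() and is.infinite():

```r
1 / 0        # Inf
-1 / 0       # -Inf
Inf - Inf    # NaN

is.finite(c(1, Inf, -Inf, NaN))     # TRUE FALSE FALSE FALSE
is.infinite(c(1, Inf, -Inf, NaN))   # FALSE TRUE TRUE FALSE
```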

Missing Values

There are two kinds of missing values in R:
- NaN: stands for “Not a Number” and is produced by an undefined numerical computation such as 0/0
- NA: stands for “Not Available” and is used when a value is missing
NaN is also treated as NA by is.na() (the reverse is NOT true: is.nan() does not treat NA as NaN)

a <- c(1,2)
a[3]

## [1] NA

b <- 0/0
b

## [1] NaN
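
The asymmetry between NA and NaN can be checked directly with is.na() and is.nan(). A minimal sketch, where x is a hypothetical example vector:

```r
x <- c(1, NA, 3, NaN)

is.na(x)    # FALSE  TRUE FALSE  TRUE  (NaN counts as NA)
is.nan(x)   # FALSE FALSE FALSE  TRUE  (NA does not count as NaN)

is.na(mean(x))          # TRUE: missing values propagate through computations
mean(x, na.rm = TRUE)   # 2: both NA and NaN are dropped before averaging
```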

Data Type Coercion

In general, vectors CANNOT have mixed types of objects

numCharVec <- c(3.14, "a")
numCharVec                 # What do you expect to be printed? 

numLogVec <- c(pi, T)
numLogVec                   

charLogVec <- c("a", TRUE)
charLogVec

The above are examples of implicit coercion
Explicit coercion is also possible
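
To check your expectations for the three vectors above, class() shows the type that R settled on in each case. This sketch only inspects the existing examples:

```r
numCharVec <- c(3.14, "a")
numLogVec  <- c(pi, T)
charLogVec <- c("a", TRUE)

class(numCharVec)   # "character": 3.14 becomes "3.14"
class(numLogVec)    # "numeric":   TRUE becomes 1
class(charLogVec)   # "character": TRUE becomes "TRUE"
```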

Data Type Coercion

as(): explicitly coerces objects from one type to another

numVec <- seq(from = 1200, to = 1300, by = 15)
numVec

## [1] 1200 1215 1230 1245 1260 1275 1290

numToChar <- as(numVec, "character")
numToChar

## [1] "1200" "1215" "1230" "1245" "1260" "1275" "1290"

numToChar==as.character(numVec)

## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE

logVec <- c(F, T, F, T, T)
as(logVec, "numeric")

## [1] 0 1 0 1 1

as.numeric(logVec)

## [1] 0 1 0 1 1

Data Type Coercion

Coercion does not always work! Be careful about warnings:

charVec <- c("2.5", "3", "2.8", "1.5", "zero")
as(charVec, "numeric")

## Warning in asMethod(object): NAs introduced by coercion

## [1] 2.5 3.0 2.8 1.5  NA

charVec <- c("2.5", "3", "2.8", "1.5", zero)   # zero is unquoted, so R looks for an object named zero

## Error in eval(expr, envir, enclos): object 'zero' not found
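
One way to find out which entries failed to convert is to look for values that are NA after coercion but were not NA before. This is a sketch rather than the only approach; converted and failed are names introduced here, and suppressWarnings() is used only to keep the output quiet.

```r
charVec <- c("2.5", "3", "2.8", "1.5", "zero")

converted <- suppressWarnings(as.numeric(charVec))

failed <- is.na(converted) & !is.na(charVec)   # NA after conversion, but not before
charVec[failed]   # "zero"
which(failed)     # 5
```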

Troubleshooting

Troubleshooting is a vital skill for anyone using a programming language
Assume things will always break.

Try to replicate the error. If you know what types of input will cause an error and which types won’t, this is a clue.
Narrow down where the error is occurring. This typically involves running the code line-by-line or block-by-block.
Try fixing the error.
Search online (e.g., Google)
- copy and paste your error/warning message
- CRAN (Comprehensive R Archive Network)
- RDocumentation
- Stack Exchange / Stack Overflow
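
The "replicate and narrow down" steps can be partly automated by trying inputs one at a time. The sketch below assumes the problem is a coercion warning like the one shown earlier; warnsOnConvert is a hypothetical helper written for this example, built on base R's tryCatch().

```r
# Hypothetical helper: TRUE if converting x to numeric raises a warning
warnsOnConvert <- function(x) {
  tryCatch({
    as.numeric(x)
    FALSE
  }, warning = function(w) TRUE)
}

inputs <- c("2.5", "3", "zero")
sapply(inputs, warnsOnConvert)   # FALSE for "2.5" and "3", TRUE for "zero"
```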

Summary

There are 5 commonly used atomic data types in R: logical, integer, numeric (double), complex, and character. A sixth, raw, is rarely needed.
Vectors and matrices are used to store data of the same type.
Lists and data frames are used to store data of different types.
R can handle infinity (Inf), missing values (NA), and undefined results such as 0/0 (NaN)
Different data types inside a vector or matrix will be coerced to the most flexible type (logical < integer < numeric < complex < character).

Data Structures

CSULB Intro to R
