Subsetting

Sometimes we want to take a subset of a vector, matrix, list, or data frame.
There are three main operators for taking a subset of an object:

[] single brackets return an object of the same class as the original object (typically used with vectors and matrices)
[[]] double brackets extract a single element from a list or data frame, by numeric index or by name
$ extracts a single element from a list or data frame by name

[] allows us to select more than one element
[[]] and $ allow us to select only one (though this one may be a structure with multiple elements)
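
As a minimal sketch of the difference between the three operators, the snippet below compares what each one returns. The list demoList and its element names are hypothetical and used only for this illustration.

```r
# Hypothetical list, made up for this illustration
demoList <- list(first = 1:3, second = "b")

single <- demoList[1]      # single brackets: the result is still a list
double <- demoList[[1]]    # double brackets: the element itself
dollar <- demoList$first   # $ with a name: the same element as [[ ]]

class(single)              # "list"
class(double)              # "integer"
identical(double, dollar)  # TRUE
```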

Subsetting Examples

myVec <- 1:10
myVec[3]

## [1] 3

myList <- list(obj1 = "a", obj2 = 10, obj3 = TRUE)
myList[[3]]

## [1] TRUE

myList$obj3

## [1] TRUE

Subsetting with `[]`

By using single brackets, we can choose more than one element of an object

x <- seq(from=0, to=100, by=10) 
x

##  [1]   0  10  20  30  40  50  60  70  80  90 100

x[1]  # select the first element

## [1] 0

x[10]  # select the 10th element

## [1] 90

x[1:3]  # select the first, second, and third elements

## [1]  0 10 20

Subsetting with `[]` - Index Vectors

Another way to select more than one element from an object is by using index vectors
An index vector is a vector of indices that is used to select a subset of another vector (or matrix)

x <- seq(from=0, to=100, by=10)
x

##  [1]   0  10  20  30  40  50  60  70  80  90 100

x[c(2,4,6)]  # select the second, fourth, and sixth elements

## [1] 10 30 50

IndVec <- c(1, 2, 3, 4, 5)       # index vector to select the first 5 elements 
x[IndVec]

## [1]  0 10 20 30 40

Index Vectors

There are four types of index vectors:

Vector of positive integers
Vector of negative integers
Vector of character strings
Logical index vector

Example

Suppose we have grades of ten students

set.seed(1234)  # so that everyone has the same grades
grades <- sample(0:100, 10)  # randomly choose 10 numbers between 0 and 100
attributes(grades)

## NULL

names(grades) <- letters[1:10]
attributes(grades)

## $names
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

grades

##  a  b  c  d  e  f  g  h  i  j 
## 11 62 60 61 83 97  0 21 95 47

We will explore the different ways to subset this vector using index vectors

1. Index Vector of Positive Integers

A vector of positive integers corresponding to the elements you want to subset
The values in this type of index vector should lie in 1:length(x); an index beyond the length of the vector returns NA

posIndVec <- 4:7
posIndVec

## [1] 4 5 6 7

grades[posIndVec]

##  d  e  f  g 
## 61 83 97  0
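
A short sketch of how positive indices behave beyond a simple range. The vector gr is a stand-in that copies the values of grades shown above, so that the snippet runs on its own.

```r
# Stand-in for grades, with the values copied from the output above
gr <- c(a = 11, b = 62, c = 60, d = 61, e = 83,
        f = 97, g = 0, h = 21, i = 95, j = 47)

reordered <- gr[c(3, 1)]   # order of the index is respected: c, then a
repeated  <- gr[c(2, 2)]   # an index may be repeated: b twice
outside   <- gr[11]        # position beyond length(gr): NA

names(reordered)           # "c" "a"
length(repeated)           # 2
is.na(outside)             # TRUE
```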

2. Index Vector of Negative Integers

A vector of negative integers indicates the values to be excluded from the vector

negIndVec <- -1:-5
negIndVec

## [1] -1 -2 -3 -4 -5

grades[negIndVec]

##  f  g  h  i  j 
## 97  0 21 95 47
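
The sketch below shows two common uses of negative indices, and that positive and negative indices cannot be mixed in one index vector. The vector gr is a stand-in copying the values of grades; try() is used only so that the error does not stop the script.

```r
# Stand-in for grades, with the values copied from the output above
gr <- c(a = 11, b = 62, c = 60, d = 61, e = 83,
        f = 97, g = 0, h = 21, i = 95, j = 47)

noEnds <- gr[-c(1, 10)]      # drop the first and last elements
noLast <- gr[-length(gr)]    # drop the last element, whatever the length

names(noEnds)                # "b" "c" "d" "e" "f" "g" "h" "i"
length(noLast)               # 9

# Mixing positive and negative indices is an error
mixed <- try(gr[c(-1, 2)], silent = TRUE)
inherits(mixed, "try-error") # TRUE
```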

3. Vector of Character Strings

If a vector has a names attribute, we can take a subset of the vector by referring to the names of its elements

chIndVec <- c("a")
chIndVec

## [1] "a"

grades[chIndVec]

##  a 
## 11
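
A character index vector can hold several names, as sketched below. The vector gr is a stand-in copying the values of grades, and the name "z" is deliberately one that does not exist in it.

```r
# Stand-in for grades, with the values copied from the output above
gr <- c(a = 11, b = 62, c = 60, d = 61, e = 83,
        f = 97, g = 0, h = 21, i = 95, j = 47)

picked  <- gr[c("j", "a")]   # several names, returned in the order requested
missing <- gr["z"]           # a name that is not present: NA

picked                       # j = 47, a = 11
is.na(missing)               # TRUE
```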

4. Logical Index Vector

A vector of TRUE/FALSE values, normally the same length as the vector from which we are subsetting (a shorter logical vector is recycled).
Values corresponding to TRUE in the index vector are selected

logIndVec <- rep(c(TRUE, FALSE), each = 5)
logIndVec

##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

grades[logIndVec]

##  a  b  c  d  e 
## 11 62 60 61 83
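
If the logical index vector is shorter than the vector being subset, R recycles it. The sketch below illustrates this with gr, a stand-in copying the values of grades.

```r
# Stand-in for grades, with the values copied from the output above
gr <- c(a = 11, b = 62, c = 60, d = 61, e = 83,
        f = 97, g = 0, h = 21, i = 95, j = 47)

# c(TRUE, FALSE) is recycled to length 10, keeping every odd position
oddPos <- gr[c(TRUE, FALSE)]

names(oddPos)                # "a" "c" "e" "g" "i"
```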

4. Logical Index Vector

Logical index vectors can also be generated with comparison operators such as ==, !=, <, >, <=, and >=

logIndVec <- grades > 60
logIndVec

##     a     b     c     d     e     f     g     h     i     j 
## FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE

grades[logIndVec]

##  b  d  e  f  i 
## 62 61 83 97 95

grades[grades < 40]

##  a  g  h 
## 11  0 21
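
Conditions can be combined with & (and) and | (or), and which() converts a logical vector into positions. The following is a minimal sketch using gr, a stand-in copying the values of grades.

```r
# Stand-in for grades, with the values copied from the output above
gr <- c(a = 11, b = 62, c = 60, d = 61, e = 83,
        f = 97, g = 0, h = 21, i = 95, j = 47)

midRange <- gr[gr >= 60 & gr < 90]   # both conditions must hold: b, c, d, e
extremes <- gr[gr < 20 | gr > 90]    # either condition may hold: a, f, g, i
above60  <- which(gr > 60)           # positions of the TRUE values: 2 4 5 6 9
top      <- names(gr)[gr == max(gr)] # name of the highest grade: "f"
```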

Subsetting Matrices

We also use the single square brackets to subset matrices
In the square brackets, the first position refers to the row(s) and the second position refers to the column(s)

myMat <- matrix(1:8, ncol = 4)
myMat

##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8

Let’s go over the various ways to subset this matrix

Subsetting Matrices

myMat[1,1]            # retrieve the element in the first row, first column

## [1] 1

myMat[2,]             # retrieve the second row

## [1] 2 4 6 8

myMat[,-3]             # remove the third column

##      [,1] [,2] [,3]
## [1,]    1    3    7
## [2,]    2    4    8
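
The index vectors introduced for vectors also work in each position of a matrix subset. A short sketch, using a matrix m built the same way as myMat:

```r
# Same construction as myMat in the text
m <- matrix(1:8, ncol = 4)

twoCols  <- m[, c(1, 4)]      # positive index vector: columns 1 and 4
bigVals  <- m[m > 5]          # logical matrix as index: returns a vector
someCols <- m[, m[1, ] > 2]   # columns whose first-row value exceeds 2

dim(twoCols)                  # 2 2
bigVals                       # 6 7 8
dim(someCols)                 # 2 3
```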

Subsetting Matrices

By default, when the result of a matrix subset has only one row or one column, R drops the dimension attribute and returns a vector. We can turn this behavior off by setting drop = FALSE

myMat[1,1]

## [1] 1

myMat[1,1, drop = FALSE]

##      [,1]
## [1,]    1

myMat[2,]

## [1] 2 4 6 8

myMat[2,, drop = FALSE]

##      [,1] [,2] [,3] [,4]
## [1,]    2    4    6    8

Subsetting Lists

myList <- list(ID = paste("ID", sample(c(100:199), 3), sep = ""), 
               Age = sample(c(18:99), 3), 
               Sex = sample(c("M", "F"), 3, replace = TRUE))
myList

## $ID
## [1] "ID169" "ID153" "ID127"
## 
## $Age
## [1] 93 41 84
## 
## $Sex
## [1] "M" "M" "M"

myList[1]  # subset is still a list

## $ID
## [1] "ID169" "ID153" "ID127"

Subsetting Lists

myList[1:2]  # return the first two objects; subset is still a list

## $ID
## [1] "ID169" "ID153" "ID127"
## 
## $Age
## [1] 93 41 84

myList[[1]]  # return the 1st object; subset is a character vector

## [1] "ID169" "ID153" "ID127"

myList$ID  # alternative to [[]]

## [1] "ID169" "ID153" "ID127"

Subsetting Lists

myList[[1]][2] # return the 2nd element of the 1st object

## [1] "ID153"

myList$ID[2]

## [1] "ID153"

myList[[c(1,2)]]  # recursive indexing: same as myList[[1]][[2]]

## [1] "ID153"
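
The sketch below contrasts [[c(1, 2)]] with [c(1, 2)] and shows [[ ]] with a name. The list people and the variable field are hypothetical and use fixed values so that the results do not depend on sample().

```r
# Hypothetical list with fixed values
people <- list(ID = c("ID101", "ID102", "ID103"), Age = c(30, 41, 52))

byName    <- people[["Age"]]     # [[ ]] also accepts a name
recursive <- people[[c(1, 2)]]   # 2nd element of the 1st object: "ID102"
twoItems  <- people[c(1, 2)]     # a list holding the first two objects

# [[ ]] can use a name stored in a variable; $ looks for the literal name
field <- "Age"
viaVar    <- people[[field]]     # 30 41 52
viaDollar <- people$field        # NULL: there is no element called "field"
```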

Subsetting Data Frames

studentID <- paste("S#", sample(c(6473:7392), 10), sep = "")
score <- sample(c(0:100), 10)
sex <- sample(c("female", "male"), 10, replace = TRUE)
data <- data.frame(studentID = studentID, score = score, sex = sex,
                   stringsAsFactors = TRUE)  # default is FALSE since R 4.0.0
str(data)

## 'data.frame':    10 obs. of  3 variables:
##  $ studentID: Factor w/ 10 levels "S#6509","S#6618",..: 4 6 5 2 1 3 8 7 10 9
##  $ score    : int  4 45 26 29 49 17 72 18 24 91
##  $ sex      : Factor w/ 2 levels "female","male": 2 2 2 1 2 1 2 2 1 1

data

##    studentID score    sex
## 1     S#6686     4   male
## 2     S#6763    45   male
## 3     S#6750    26   male
## 4     S#6618    29 female
## 5     S#6509    49   male
## 6     S#6673    17 female
## 7     S#7213    72   male
## 8     S#6952    18   male
## 9     S#7307    24 female
## 10    S#7230    91 female

Subsetting Data Frames

data[1:8,]  # first eight rows, all columns

##   studentID score    sex
## 1    S#6686     4   male
## 2    S#6763    45   male
## 3    S#6750    26   male
## 4    S#6618    29 female
## 5    S#6509    49   male
## 6    S#6673    17 female
## 7    S#7213    72   male
## 8    S#6952    18   male

Subsetting Data Frames

head(data$sex)  # first six values of the sex column

## [1] male   male   male   female male   female
## Levels: female male

head(data[,c("studentID", "score")])  # first six rows of two columns selected by name

##   studentID score
## 1    S#6686     4
## 2    S#6763    45
## 3    S#6750    26
## 4    S#6618    29
## 5    S#6509    49
## 6    S#6673    17
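
Logical index vectors are a common way to select rows of a data frame. The following is a minimal sketch; the data frame df and its values are made up for this illustration, so that the results do not depend on sample().

```r
# Hypothetical data frame with fixed values
df <- data.frame(studentID = c("S1", "S2", "S3", "S4"),
                 score = c(4, 45, 72, 91),
                 sex = c("male", "female", "male", "female"),
                 stringsAsFactors = FALSE)

passed <- df[df$score > 40, ]                 # rows meeting a condition
topFem <- df[df$sex == "female" & df$score > 50, "studentID"]  # "S4"

asVector <- df[["score"]]    # [[ ]] returns the column as a vector
asFrame  <- df["score"]      # [ ] returns a one-column data frame

nrow(passed)                 # 3
class(asFrame)               # "data.frame"
```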

CSULB Intro to R
