Subsetting

CSULB Intro to R

April 13, 2018

Subsetting

  1. [] single brackets return an object of the same class as the original object (typically vectors, matrices, lists) and can select several elements at once
  2. [[]] double brackets extract a single element from a list or data frame, usually by numeric index (a name also works)
  3. $ extracts a single element from a list or data frame by its name

Subsetting Examples

myVec <- 1:10
myVec[3]
## [1] 3
myList <- list(obj1 = "a", obj2 = 10, obj3 = TRUE)
myList[[3]]
## [1] TRUE
myList$obj3
## [1] TRUE
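One way to see the difference between the three operators is to check the class of what each one returns. This sketch rebuilds the same `myList`; on a list, `[` keeps the list wrapper, while `[[` and `$` return the element itself.

```r
myList <- list(obj1 = "a", obj2 = 10, obj3 = TRUE)

class(myList[3])     # "list": single brackets keep the container
class(myList[[3]])   # "logical": double brackets extract the element
class(myList$obj3)   # "logical": $ extracts by name, like myList[["obj3"]]
```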

Subsetting with []

By using single brackets, we can select more than one element of an object.

x <- seq(from = 0, to = 100, by = 10)
x
##  [1]   0  10  20  30  40  50  60  70  80  90 100
x[1]  # select the first element
## [1] 0
x[10]  # select the 10th element
## [1] 90
x[1:3]  # select the first, second, and third elements
## [1]  0 10 20
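Positions do not have to be typed as literals. In this small sketch with the same `x`, `length()` gives the position of the last element, so the index keeps working if the vector changes length.

```r
x <- seq(from = 0, to = 100, by = 10)

x[length(x)]                  # last element: 100
x[(length(x) - 2):length(x)]  # last three elements: 80 90 100
```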

Subsetting with [] - Index Vectors

x <- seq(from = 0, to = 100, by = 10)
x
##  [1]   0  10  20  30  40  50  60  70  80  90 100
x[c(2,4,6)]  # select the second, fourth, and sixth elements
## [1] 10 30 50
IndVec <- c(1, 2, 3, 4, 5)  # index vector that selects the first five elements
x[IndVec]
## [1]  0 10 20 30 40
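An index vector can also repeat or reorder positions; R returns the elements in the order the index vector lists them. A short sketch with the same `x`:

```r
x <- seq(from = 0, to = 100, by = 10)

x[c(3, 1, 1)]    # third element, then the first element twice: 20 0 0
x[length(x):1]   # all elements in reverse order, same as rev(x)
```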

Index Vectors

There are four types of index vectors:

  1. Vector of positive integers
  2. Vector of negative integers
  3. Vector of character strings
  4. Logical index vector

Example

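set.seed(1234)  # so that everyone gets the same grades (R >= 3.6.0 samples differently, so newer versions print different numbers)
grades <- sample(0:100, 10)  # randomly choose 10 distinct integers from 0 to 100 (without replacement)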
attributes(grades)
## NULL
names(grades) <- letters[1:10]
attributes(grades)
## $names
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
grades
##  a  b  c  d  e  f  g  h  i  j 
## 11 62 60 61 83 97  0 21 95 47

1. Index Vector of Positive Integers

posIndVec <- 4:7
posIndVec
## [1] 4 5 6 7
grades[posIndVec]
##  d  e  f  g 
## 61 83 97  0
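A common use of a positive-integer index vector is sorting: `order()` returns the positions of the elements from smallest to largest, and indexing with those positions rearranges the vector. In this sketch the grade values are typed in from the output above (an assumption made so the example does not depend on the random number generator), and `ord` is just an illustrative name.

```r
# Grades copied from the printed output above, not regenerated with sample()
grades <- c(a = 11, b = 62, c = 60, d = 61, e = 83,
            f = 97, g = 0, h = 21, i = 95, j = 47)

ord <- order(grades)   # positions of the grades in ascending order
ord                    # 7 1 8 10 3 4 2 5 9 6
grades[ord]            # grades sorted ascending, names kept
```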

2. Index Vector of Negative Integers

A vector of negative integers specifies the positions to exclude from the vector.

negIndVec <- -1:-5
negIndVec
## [1] -1 -2 -3 -4 -5
grades[negIndVec]
##  f  g  h  i  j 
## 97  0 21 95 47
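Negative indices do not have to form a contiguous range, but they cannot be mixed with positive ones. This sketch uses the same grade values, typed in from the output above; `res` is an illustrative name.

```r
# Grades copied from the printed output above
grades <- c(a = 11, b = 62, c = 60, d = 61, e = 83,
            f = 97, g = 0, h = 21, i = 95, j = 47)

grades[-c(2, 4)]         # drop b and d, keep the other eight
grades[-length(grades)]  # drop the last element

# Mixing signs is an error; try() captures it so the script keeps running
res <- try(grades[c(-1, 2)], silent = TRUE)
inherits(res, "try-error")  # TRUE
```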

3. Vector of Character Strings

If a vector has a names attribute, we can subset it by indexing with the names of its elements.

chIndVec <- c("a")
chIndVec
## [1] "a"
grades[chIndVec]
##  a 
## 11
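A character index vector can name several elements at once, and the result follows the order of the names given. A name that does not exist gives `NA` rather than an error. A sketch with the grade values typed in from the output above:

```r
# Grades copied from the printed output above
grades <- c(a = 11, b = 62, c = 60, d = 61, e = 83,
            f = 97, g = 0, h = 21, i = 95, j = 47)

grades[c("a", "j")]   # a = 11, j = 47
grades[c("j", "a")]   # same elements, order follows the index vector
grades["z"]           # no element named "z": the result is NA
```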

4. Logical Index Vector

logIndVec <- rep(c(TRUE, FALSE), each = 5)
logIndVec
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
grades[logIndVec]
##  a  b  c  d  e 
## 11 62 60 61 83
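A logical index vector shorter than the vector being subset is recycled. With `c(TRUE, FALSE)` the pattern repeats across all ten grades, so every other element is kept. A sketch with the grade values typed in from the output above:

```r
# Grades copied from the printed output above
grades <- c(a = 11, b = 62, c = 60, d = 61, e = 83,
            f = 97, g = 0, h = 21, i = 95, j = 47)

grades[c(TRUE, FALSE)]   # recycled to length 10: keeps a, c, e, g, i (11 60 83 0 95)
```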

4. Logical Index Vector

Logical index vectors can also be generated with comparison operators such as ==, !=, <, >, <=, and >=

logIndVec <- grades > 60
logIndVec
##     a     b     c     d     e     f     g     h     i     j 
## FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE
grades[logIndVec]
##  b  d  e  f  i 
## 62 61 83 97 95
grades[grades < 40]
##  a  g  h 
## 11  0 21
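Comparisons can be combined with `&` (and) and `|` (or), and `which()` converts a logical vector into the positions of its `TRUE` values. A sketch with the grade values typed in from the output above:

```r
# Grades copied from the printed output above
grades <- c(a = 11, b = 62, c = 60, d = 61, e = 83,
            f = 97, g = 0, h = 21, i = 95, j = 47)

grades[grades >= 40 & grades < 70]   # b 62, c 60, d 61, j 47
grades[grades < 20 | grades > 90]    # a 11, f 97, g 0, i 95
which(grades > 90)                   # positions of the TRUE values: f = 6, i = 9
```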

Subsetting Matrices

myMat <- matrix(1:8, ncol = 4)
myMat
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8
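R stores a matrix as a single vector filled column by column, so a single index with no comma counts down the columns. A sketch with the same `myMat`:

```r
myMat <- matrix(1:8, ncol = 4)

myMat[5]      # 5th value in column-major order: row 1, column 3, which is 5
myMat[2, 3]   # row/column form: row 2, column 3, which is 6
dim(myMat)    # 2 4
```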

Subsetting Matrices

myMat[1,1]            # retrieve the element in the first row, first column
## [1] 1
myMat[2,]             # retrieve the second row
## [1] 2 4 6 8
myMat[,-3]            # remove the third column
##      [,1] [,2] [,3]
## [1,]    1    3    7
## [2,]    2    4    8
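The row and column positions accept the same index vectors as plain vectors, including logical ones. In this sketch the second line keeps only the columns whose first-row value is greater than 3.

```r
myMat <- matrix(1:8, ncol = 4)

myMat[, c(1, 4)]          # first and last columns
myMat[, myMat[1, ] > 3]   # first row is 1 3 5 7, so columns 3 and 4 are kept
```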

Subsetting Matrices

By default, when a subset of a matrix has only one row or one column, R drops its dim attribute and returns a plain vector. Setting drop = FALSE keeps the result a matrix.

myMat[1,1]
## [1] 1
myMat[1,1, drop = FALSE]
##      [,1]
## [1,]    1
myMat[2,]
## [1] 2 4 6 8
myMat[2,, drop = FALSE]
##      [,1] [,2] [,3] [,4]
## [1,]    2    4    6    8
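Dropping matters when later code expects a matrix. In this sketch (`col2` and `col2_mat` are illustrative names), `nrow()` on a single extracted column returns `NULL` because the result is a plain vector, while `drop = FALSE` keeps the two-row shape.

```r
myMat <- matrix(1:8, ncol = 4)

col2 <- myMat[, 2]                    # dim dropped: integer vector 3 4
col2_mat <- myMat[, 2, drop = FALSE]  # still a 2 x 1 matrix

nrow(col2)       # NULL
nrow(col2_mat)   # 2
```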

Subsetting Lists

myList <- list(ID = paste("ID", sample(c(100:199), 3), sep = ""), 
               Age = sample(c(18:99), 3), 
               Sex = sample(c("M", "F"), 3, replace = TRUE))
myList
## $ID
## [1] "ID169" "ID153" "ID127"
## 
## $Age
## [1] 93 41 84
## 
## $Sex
## [1] "M" "M" "M"
myList[1]  # subset is still a list
## $ID
## [1] "ID169" "ID153" "ID127"

Subsetting Lists

myList[1:2]  # return the first two objects; subset is still a list
## $ID
## [1] "ID169" "ID153" "ID127"
## 
## $Age
## [1] 93 41 84
myList[[1]]  # return the 1st object; subset is a character vector
## [1] "ID169" "ID153" "ID127"
myList$ID  # extract by name; same as myList[["ID"]]
## [1] "ID169" "ID153" "ID127"
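`$` takes a literal name, so it cannot use a name stored in a variable; `[[` can. This sketch uses a fixed copy of the list with values typed in from the output above, and `field` is an illustrative variable name.

```r
# Fixed copy of myList, values typed in from the printed output above
myList <- list(ID = c("ID169", "ID153", "ID127"),
               Age = c(93, 41, 84),
               Sex = c("M", "M", "M"))

field <- "Age"
myList[[field]]   # 93 41 84: [[ evaluates field and looks up "Age"
myList$field      # NULL: $ looks for an element named "field", and there is none
```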

Subsetting Lists

myList[[1]][2] # return the 2nd element of the 1st object
## [1] "ID153"
myList$ID[2]
## [1] "ID153"
myList[[c(1,2)]]  # recursive indexing: same as myList[[1]][[2]]
## [1] "ID153"

Subsetting Data Frames

studentID <- paste("S#", sample(c(6473:7392), 10), sep = "")
score <- sample(c(0:100), 10)
sex <- sample(c("female", "male"), 10, replace = TRUE)
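data <- data.frame(studentID = studentID, score = score, sex = sex,
                   stringsAsFactors = TRUE)  # keep the Factor columns shown below; the default became FALSE in R 4.0.0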
str(data)
## 'data.frame':    10 obs. of  3 variables:
##  $ studentID: Factor w/ 10 levels "S#6509","S#6618",..: 4 6 5 2 1 3 8 7 10 9
##  $ score    : int  4 45 26 29 49 17 72 18 24 91
##  $ sex      : Factor w/ 2 levels "female","male": 2 2 2 1 2 1 2 2 1 1
data
##    studentID score    sex
## 1     S#6686     4   male
## 2     S#6763    45   male
## 3     S#6750    26   male
## 4     S#6618    29 female
## 5     S#6509    49   male
## 6     S#6673    17 female
## 7     S#7213    72   male
## 8     S#6952    18   male
## 9     S#7307    24 female
## 10    S#7230    91 female
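A data frame is a list of equal-length columns, so the list tools from the previous slides also work on it. This sketch builds a small data frame from the first three rows above (`scores_df` is an illustrative name, used to avoid reusing `data`).

```r
# First three rows of the table above, typed in by hand
scores_df <- data.frame(studentID = c("S#6686", "S#6763", "S#6750"),
                        score = c(4, 45, 26),
                        sex = c("male", "male", "male"))

is.list(scores_df)        # TRUE
length(scores_df)         # 3: one element per column
scores_df[[2]]            # second column as a vector: 4 45 26
scores_df[["score"]]      # same column by name
scores_df$score           # same column again with $
```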

Subsetting Data Frames

data[1:8,]
##   studentID score    sex
## 1    S#6686     4   male
## 2    S#6763    45   male
## 3    S#6750    26   male
## 4    S#6618    29 female
## 5    S#6509    49   male
## 6    S#6673    17 female
## 7    S#7213    72   male
## 8    S#6952    18   male
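Rows can also be chosen with a logical index vector built from a column, and base R's `subset()` offers a shorthand for the same filtering. This sketch uses the first four rows above under the illustrative name `scores_df`.

```r
# First four rows of the table above, typed in by hand
scores_df <- data.frame(studentID = c("S#6686", "S#6763", "S#6750", "S#6618"),
                        score = c(4, 45, 26, 29),
                        sex = c("male", "male", "male", "female"))

scores_df[scores_df$score > 25, ]              # rows 2, 3, and 4, all columns
scores_df[scores_df$sex == "female", "score"]  # 29
subset(scores_df, score > 25, select = c(studentID, score))  # filter rows and pick columns
```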

Subsetting Data Frames

head(data$sex)
## [1] male   male   male   female male   female
## Levels: female male
head(data[,c("studentID", "score")])
##   studentID score
## 1    S#6686     4
## 2    S#6763    45
## 3    S#6750    26
## 4    S#6618    29
## 5    S#6509    49
## 6    S#6673    17

Up Next