R’s subsetting operators are powerful and fast. Mastery of subsetting allows you to succinctly express complex operations in a way that few other languages can match.
In this article we will cover the following topics:
- The three subsetting operators.
- The six types of subsetting.
- Important differences in subsetting behavior for different objects.
- Using subsetting in conjunction with assignment.
Subsetting atomic vectors
> x <- c(1.4, 2.2, 3.0, 4.5, 5.2, 6.9, 7.6, 8.1, 9.5, 10.0)
We can subset this vector in the following ways.
Positive integers return elements at the specified positions:
> x[c(1)]
[1] 1.4
> x[c(5,6,2)]
[1] 5.2 6.9 2.2
# Duplicated indices yield duplicated values
> x[c(1,1)]
[1] 1.4 1.4
# Real numbers are silently truncated to integers
> x[c(7.1, 7.9, 7.5)]
[1] 7.6 7.6 7.6
Negative integers omit elements at the specified positions:
# skip the first element
> x[-1]
[1] 2.2 3.0 4.5 5.2 6.9 7.6 8.1 9.5 10.0
# skip the 3rd, 5th, and 7th
> x[-c(3, 5, 7)]
[1] 1.4 2.2 4.5 6.9 8.1 9.5 10.0
You can’t mix positive and negative integers in a single subset.
> x[c(-1, 4)]
Error in x[c(-1, 4)] : only 0's may be mixed with negative subscripts
Logical vectors select elements where the corresponding logical value is TRUE. This is probably the most useful type of subsetting because you write the expression that creates the logical vector.
# A logical vector shorter than x is recycled until it reaches the last
# element of x; the elements at the TRUE positions are returned.
> x[c(TRUE, TRUE, FALSE, FALSE)]
[1] 1.4 2.2 5.2 6.9 9.5 10.0
# Can also be based on condition
> x[ x > 5]
[1] 5.2 6.9 7.6 8.1 9.5 10.0
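# which.min() and which.max() return the integer position of the smallest
# and largest value, which can then be used as an index
> x[which.min(x)]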
[1] 1.4
> x[which.max(x)]
[1] 10
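Since the logical vector is just the result of an expression, conditions can be combined before subsetting. The following is a minimal sketch using the same vector; the thresholds and the variable names mid and pos are chosen only for illustration.

```r
x <- c(1.4, 2.2, 3.0, 4.5, 5.2, 6.9, 7.6, 8.1, 9.5, 10.0)

# & combines two logical vectors element by element:
# keep the values strictly between 3 and 8
mid <- x[x > 3 & x < 8]
print(mid)

# which() turns a logical vector into the integer positions of its TRUE values
pos <- which(x > 8)
print(pos)

# Subsetting by those positions gives the same result as the logical index
print(identical(x[pos], x[x > 8]))
```

Using which() is optional here; it is mainly useful when the positions themselves are needed.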
Nothing returns the original vector. This is not useful for vectors but is very useful for matrices, data frames, and arrays. It can also be useful in conjunction with assignment.
> x[]
[1] 1.4 2.2 3.0 4.5 5.2 6.9 7.6 8.1 9.5 10.0
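Every form of subsetting can also appear on the left-hand side of an assignment to modify the selected elements. The sketch below is only an illustration: the cap of 9 and the matrix m are made-up examples, not part of the data above.

```r
x <- c(1.4, 2.2, 3.0, 4.5, 5.2, 6.9, 7.6, 8.1, 9.5, 10.0)

# Replace the elements at positions 1 and 2
x[c(1, 2)] <- c(0, 0)

# Replace by condition: cap every value above 9 at 9
x[x > 9] <- 9
print(x)

# Empty subsetting on the left keeps the structure of the object
# and overwrites every element
m <- matrix(1:6, nrow = 2)
m[] <- 0L
print(dim(m))  # still 2 rows and 3 columns
```

Writing `m <- 0L` instead would replace the whole matrix with a single value, which is why the empty brackets matter in assignment.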
Zero returns a zero-length vector. This is not something you usually do on purpose, but it can be helpful for generating test data.
> x[0]
numeric(0)
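Character vectors return the elements with matching names, which only works when the vector has names. The vector x above is unnamed, so this sketch assumes a small named vector y introduced purely for illustration.

```r
# A named vector (hypothetical example data)
y <- c(a = 1.4, b = 2.2, c = 3.0)

# Select a single element by name
one <- y["b"]
print(one)

# The result follows the order of the index, not the order of the vector
sel <- y[c("c", "a")]
print(sel)

# As with positive integers, repeated names give repeated elements
dup <- y[c("a", "a")]
print(dup)
```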
Subsetting lists
Subsetting a list works in the same way as subsetting an atomic vector. Subsetting a list with [ will always return a list; [[ and $, as described below, let you pull out the components of the list.
> x <- as.list(1:10)
> x[1:4]
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 4
To extract an individual element from a list, use the [[ operator:
# to get element 2
> x[[2]]
[1] 2
> class(x[[2]])
[1] "integer"
# Using name
> names(x) <- letters[1:10]
> x$a
[1] 1
> x[c('a', 'b')]
$a
[1] 1
$b
[1] 2
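The difference between [ and [[ is easiest to see side by side. This is a minimal sketch on a small list; lst, sub, el and key are illustrative names that do not appear in the examples above.

```r
lst <- list(a = 1:3, b = "text", c = TRUE)

sub <- lst["a"]    # a list of length 1 that contains the element a
el  <- lst[["a"]]  # the element itself, an integer vector

print(class(sub))  # "list"
print(class(el))   # "integer"

# $ is shorthand for [[ with a literal name
print(identical(lst$a, lst[["a"]]))

# [[ also works when the name is stored in a variable, which $ does not
key <- "b"
print(lst[[key]])
```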
Subsetting matrices
A matrix is subset with two arguments within single brackets, [], separated by a comma: the first argument specifies the rows and the second the columns. Leaving an argument empty selects all rows or all columns.
# Create a matrix
> mat <- matrix(1:9, nrow = 3)
> colnames(mat) <- LETTERS[1:3]
> mat
     A B C
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> mat[1:3,'A']
[1] 1 2 3
> mat[1:3,'C']
[1] 7 8 9
> mat[1:3,]
     A B C
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
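One behavior worth knowing is that selecting a single row or column simplifies the result to a vector unless drop = FALSE is given. The sketch below rebuilds the same matrix; the result names are chosen only for illustration.

```r
mat <- matrix(1:9, nrow = 3)
colnames(mat) <- LETTERS[1:3]

# A single row is simplified to a named integer vector
row_vec <- mat[1, ]
print(row_vec)

# drop = FALSE keeps the result as a 1 x 3 matrix
row_mat <- mat[1, , drop = FALSE]
print(dim(row_mat))

# Rows and columns can be selected together: rows 1-2, columns A and C
block <- mat[1:2, c("A", "C")]
print(block)

# Negative indices work per dimension: everything except column 2
no_b <- mat[, -2]
print(colnames(no_b))
```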
Subsetting data frames
Data frames possess the characteristics of both lists and matrices: if you subset with a single vector, they behave like lists; if you subset with two vectors, they behave like matrices.
> df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])
# Get the rows where column x equals 2
> df[df$x == 2, ]
  x y z
2 2 2 b
# There are two ways to select columns from a data frame
# As list
> df[c('x', 'z')]
  x z
1 1 a
2 2 b
3 3 c
# As a matrix
> df[, c('x', 'z')]
  x z
1 1 a
2 2 b
3 3 c
> str(df)
'data.frame': 3 obs. of 3 variables:
$ x: int 1 2 3
$ y: int 3 2 1
$ z: chr "a" "b" "c"
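The list style and the matrix style give the same result for several columns, but they differ when a single column is selected. The following sketch shows the difference on the same data frame; the result names are illustrative only.

```r
df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])

# List style: the result is still a data frame with one column
as_list <- df["x"]
print(class(as_list))

# Matrix style: a single column is simplified to a vector
as_matrix <- df[, "x"]
print(class(as_matrix))

# drop = FALSE keeps the data frame in matrix style as well
kept <- df[, "x", drop = FALSE]
print(class(kept))

# A row condition and a column selection can be combined
picked <- df[df$y >= 2, c("x", "z")]
print(picked)
```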
Conclusion
We saw how to subset an atomic vector, a list, a matrix, and a data frame, and how to access individual elements in each of those data structures.
This brings us to the end of this blog. We really appreciate your time and hope you liked it.
Do visit our page www.zigya.com/blog for more informative blogs on Data Science.
Keep reading! Cheers!
Zigya Academy