Newcomers to R are often unsure how to summarize data, because the language offers several functions for the job. So which one is the right one? This article answers that question. My advice: pick one function first and learn it well. Only then move on to the next.

In this article, I will discuss the primary functions for summarizing data sets. Hopefully this makes the journey smoother than it first seems.

## apply()

`apply()` returns a vector, array, or list of values obtained by applying a function to the rows or columns of a matrix or array. It is the simplest of the functions covered here. However, it is limited to collapsing along either rows or columns.

### Usage

``> apply(X, MARGIN, FUN, …)``

### Example

``````# Create a matrix
> mat <- matrix(c(1:20), nrow = 5, ncol=4)
> mat
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

# 2 indicates columns
> apply(mat, 2, mean)
[1]  3  8 13 18

# 1 indicates rows
> apply(mat, 1, mean)
[1]  8.5  9.5 10.5 11.5 12.5``````
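`FUN` does not have to be a built-in summary such as `mean`. As a minimal sketch (the matrix is the same `mat` as above, rebuilt so the snippet runs on its own; `col_ranges` and `row_spread` are illustrative names), a function that returns more than one value per column makes `apply()` return a matrix instead of a vector, and an anonymous function can be passed as well:

```r
# Rebuild the 5 x 4 matrix from the example above
mat <- matrix(1:20, nrow = 5, ncol = 4)

# range() returns two values (min, max) per column,
# so the result is a 2 x 4 matrix: one column per input column
col_ranges <- apply(mat, 2, range)
col_ranges

# An anonymous function works too: max minus min of each row
row_spread <- apply(mat, 1, function(x) max(x) - min(x))
row_spread
```

The shape of the result therefore depends on what `FUN` returns, not only on `MARGIN`.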

## lapply()

`lapply()` applies a function to each element of a list, vector, or data frame and always returns a list. The returned list has the same length as the input, and each element is the result of applying `FUN` to the corresponding element of the input.

### Usage

``> lapply(X, FUN, …)``

### Arguments

The `l` in `lapply()` stands for list. The difference between `lapply()` and `apply()` lies in the return value: the output of `lapply()` is always a list. Unlike `apply()`, `lapply()` can also be used on objects such as data frames and lists.

`lapply()` does not need a `MARGIN` argument.

A very easy example is to convert the strings in a character vector to lower case with the `tolower` function. We use the built-in `month.abb` vector, which holds the abbreviated month names with a capitalized first letter.

### Example

``````> month <- month.abb
> month
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"

> lower_month <- lapply(month,tolower)
> str(lower_month)
List of 12
 $ : chr "jan"
 $ : chr "feb"
 $ : chr "mar"
 $ : chr "apr"
 $ : chr "may"
 $ : chr "jun"
 $ : chr "jul"
 $ : chr "aug"
 $ : chr "sep"
 $ : chr "oct"
 $ : chr "nov"
 $ : chr "dec"
``````
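Because the result is always a list, `lapply()` also copes with inputs whose elements have different lengths. The following is a small sketch, assuming a made-up list called `scores` (not part of the original example); it also shows that extra arguments placed after `FUN` are passed on to `FUN`:

```r
# A hypothetical list whose elements have different lengths
scores <- list(a = c(90, 85, 70), b = c(60, NA, 75), c = 100)

# Apply mean() to every element; na.rm = TRUE is forwarded to mean()
avg <- lapply(scores, mean, na.rm = TRUE)

# The result is a list with the same length and names as the input
str(avg)
```

If a plain vector is more convenient than a list, `unlist(avg)` flattens the result.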

## sapply()

`sapply()` takes a list, vector, or data frame as input and returns a vector or matrix. It does the same job as `lapply()`, but simplifies the result to a vector or matrix where possible instead of returning a list.

### Usage

``> sapply(X, FUN)``

### Arguments

We can compute the minimum speed and the minimum stopping distance from the built-in `cars` dataset, once with `lapply()` and once with `sapply()`, to compare the two return types.

### Example

``````# Let's load car dataset
> dt <- cars
> lmn_cars <- lapply(dt, min)
> smn_cars <- sapply(dt, min)

> lmn_cars
$speed
[1] 4

$dist
[1] 2

> smn_cars
speed  dist
    4     2
``````
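How far `sapply()` simplifies depends on what `FUN` returns. The sketch below reuses the `cars` data from the example; the names `mins`, `rng`, and `uneven` are only illustrative. It covers the three cases: one value per element, several values of equal length per element, and results of unequal length.

```r
dt <- cars

# One value per column: sapply() simplifies to a named numeric vector
mins <- sapply(dt, min)
class(mins)

# Two values per column: sapply() simplifies to a 2 x 2 matrix
rng <- sapply(dt, range)
rng

# Results of different lengths cannot be simplified, so a list comes back
uneven <- sapply(list(a = 1:3, b = 1:5), function(x) x[x > 1])
class(uneven)
```

When the return type must be predictable, for example inside a function, `lapply()` is the safer choice because it never changes shape.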

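We can summarize the difference between `apply()`, `sapply()` and `lapply()` in the following table:

| Function | Usage | Input | Output |
|---|---|---|---|
| `apply()` | `apply(X, MARGIN, FUN)` | Matrix or array | Vector, array, or list |
| `lapply()` | `lapply(X, FUN)` | List, vector, or data frame | List |
| `sapply()` | `sapply(X, FUN)` | List, vector, or data frame | Vector or matrix |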

## tapply()

None of the functions discussed so far can do what SQL achieves with grouped aggregation (`GROUP BY`), that is, compute a summary for each group. `tapply()` fills that gap in R. In its usage, `X` is “an atomic object, typically a vector” and `INDEX` is “a list of factors, each of same length as X”. The example below makes the usage clear.

### Usage

``> tapply(X, INDEX, FUN = NULL, …, default = NA, simplify = TRUE)``

### Example

``````> df <- iris
> tp <- tapply(df$Petal.Length, df$Species, mean)
> tp
    setosa versicolor  virginica
     1.462      4.260      5.552
``````
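Since `INDEX` can be a list of factors, `tapply()` can also group by more than one variable at once. As a hedged sketch, the snippet below uses the built-in `mtcars` dataset rather than `iris` (an assumption made only because `mtcars` has two natural grouping columns, `cyl` and `am`); `mpg_by_group` is an illustrative name.

```r
# mtcars ships with R; cyl = number of cylinders, am = transmission (0 or 1)
mpg_by_group <- tapply(mtcars$mpg,
                       list(cyl = mtcars$cyl, am = mtcars$am),
                       mean)

# The result is a matrix with one row per cyl value and one column per am value
mpg_by_group

# Each cell is the mean mpg of one (cyl, am) group
mpg_by_group["4", "1"]
```

This is roughly what grouping by two columns would give in SQL, laid out as a cross-table instead of one row per group.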

## by()

Next comes a slightly more involved function. `by()` is an object-oriented wrapper for `tapply()` applied to data frames: it splits a data frame by one or more factors and applies a function to each subset. Hopefully the example makes this clearer.

### Usage

``> by(data, INDICES, FUN, …, simplify = TRUE)``

### Example

``````> df <- iris
> mean_col <- by(df[,1:4], df$Species, colMeans)
> mean_col
df$Species: setosa
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       5.006        3.428        1.462        0.246
------------------------------------------------------------
df$Species: versicolor
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       5.936        2.770        4.260        1.326
------------------------------------------------------------
df$Species: virginica
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       6.588        2.974        5.552        2.026
``````
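What sets `by()` apart from `tapply()` is that `FUN` receives a whole data frame per group, so it can combine several columns. A minimal sketch, assuming we want the correlation between sepal length and petal length within each species (`cor_by_species` is an illustrative name):

```r
df <- iris

# FUN receives the rows of one species as a data frame,
# so it can use more than one column at a time
cor_by_species <- by(df, df$Species,
                     function(d) cor(d$Sepal.Length, d$Petal.Length))

# Printing shows one block per species, as in the example above
cor_by_species
```

The same calculation is not possible with a single `tapply()` call on one vector, because `tapply()` only passes the grouped values of `X` to `FUN`.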

## Conclusion

In this article, we looked at functions that help summarize data in R: `apply()`, `lapply()`, `sapply()`, `tapply()`, and `by()`, each with its definition, its usage, and an example.

That brings us to the end of this post. We really appreciate your time.

Hope you liked it.