How to remove NA values in R?

Zigya Academy

Removing NA values in a vector

Let's first create a vector containing NA values so that we have something to remove.

# Create a vector with NA values
> list1 <- c(10, 20, NA, 30, NA, 50)
> list1
[1] 10 20 NA 30 NA 50

As you can see from the console output, our example vector contains four numeric values and two NAs. Let's remove these NAs.

To do this, we can create a new vector that leaves out the NA values. The is.na() function returns a logical vector that is TRUE wherever an element is NA. Negating it with ! and using the result as an index keeps only the non-missing elements.

# Create a new vector without NA
> list2 <- list1[!is.na(list1)]
> list2
[1] 10 20 30 50
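
To see why this works, it can help to look at the intermediate logical vector. The following is a small sketch using the same example vector; the names na_mask and na_positions are made up for illustration.

```r
# Same example vector as above
list1 <- c(10, 20, NA, 30, NA, 50)

# is.na() returns TRUE at each position that holds an NA
na_mask <- is.na(list1)
print(na_mask)            # FALSE FALSE TRUE FALSE TRUE FALSE

# which() converts the logical mask into positions
na_positions <- which(na_mask)
print(na_positions)       # 3 5

# Summing a logical vector counts its TRUE values, i.e. the number of NAs
print(sum(na_mask))       # 2

# Negating the mask keeps only the non-missing elements
print(list1[!na_mask])    # 10 20 30 50
```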

Another possibility is to ignore NA values inside a function call by using the na.rm argument.

If we want to exclude missing values from a mathematical operation, we pass na.rm = TRUE. If we do not exclude them, summary functions such as mean() and sum() return NA.

# A vector with NA values
> list1 <- c(10, 20, NA, 30, NA, 50)

# including NA values will produce an NA output
> mean(list1)
[1] NA
> sum(list1)
[1] NA

# excluding NA values will calculate the 
# mathematical operation for all non-missing values
> mean(list1, na.rm=TRUE)
[1] 27.5
> sum(list1, na.rm=TRUE)
[1] 110
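
As a quick check, na.rm = TRUE should give the same result as dropping the NAs by hand with is.na(). The sketch below also tries two other base R summary functions that accept na.rm; it assumes the same example vector as above.

```r
list1 <- c(10, 20, NA, 30, NA, 50)

# na.rm = TRUE gives the same result as removing the NAs first
mean(list1, na.rm = TRUE) == mean(list1[!is.na(list1)])   # TRUE

# Other summary functions accept the same argument
max(list1, na.rm = TRUE)      # 50
median(list1, na.rm = TRUE)   # 25

# The original vector is not modified; it still holds all six elements
length(list1)                 # 6
```

Note that na.rm only affects the calculation. If the NAs should be gone for good, the vector still has to be subset as shown earlier.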

Removing NA values in a Data Frame

A useful application of subsetting data frames is to find and remove rows with missing data. The R function to check for this is complete.cases(). You can try it on the built-in dataset airquality, a data frame with a fair amount of missing data.

First, let's check the structure of the airquality dataset.

# airquality dataset
> str(airquality)
'data.frame':   153 obs. of  6 variables:
 $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...

The result of complete.cases() is a logical vector with the value TRUE for rows that are complete and FALSE for rows that contain at least one NA value.

> complete.cases(airquality)
  [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
 [13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [25] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
 [37] FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE
 [49]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [61] FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
 [73]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
 [85]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
 [97] FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE
[109]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE
[121]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[133]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[145]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE

# subset with complete.cases to get complete cases
> airquality[complete.cases(airquality), ]
    Ozone Solar.R Wind Temp Month Day
1      41     190  7.4   67     5   1
2      36     118  8.0   72     5   2
3      12     149 12.6   74     5   3
4      18     313 11.5   62     5   4
7      23     299  8.6   65     5   7
8      19      99 13.8   59     5   8
...

# or subset with the `!` operator to get the incomplete cases
> airquality[!complete.cases(airquality), ]
    Ozone Solar.R Wind Temp Month Day
5      NA      NA 14.3   56     5   5
6      28      NA 14.9   66     5   6
10     NA     194  8.6   69     5  10
11      7      NA  6.9   74     5  11
25     NA      66 16.6   57     5  25
26     NA     266 14.9   58     5  26
27     NA      NA  8.0   57     5  27
32     NA     286  8.6   78     6   1
...
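
Since the printed output is long, it is often more practical to summarise it. The following sketch counts complete and incomplete rows and shows which columns hold the NAs; the names complete_rows and ozone_only are made up for illustration.

```r
# airquality ships with base R (datasets package)
complete_rows <- complete.cases(airquality)

# Number of complete and incomplete rows
sum(complete_rows)    # 111
sum(!complete_rows)   # 42

# Number of NAs per column shows where the missing data is
colSums(is.na(airquality))

# complete.cases() can also be applied to a single column,
# e.g. to keep every row that has an Ozone reading
ozone_only <- airquality[complete.cases(airquality$Ozone), ]
nrow(ozone_only)      # 116
```

Restricting the check to the columns you actually need avoids throwing away rows because of NAs in columns that are irrelevant to the analysis.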

As always with R, there is more than one way of achieving your goal. A shorthand alternative is na.omit(), which omits all rows that contain NA values:

# or use na.omit() to get the same complete cases as above
> na.omit(airquality) 
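
To confirm that na.omit() keeps the same rows as the complete.cases() subset, a short check like the one below can be used. The names cleaned and dropped are made up for illustration.

```r
cleaned <- na.omit(airquality)

# No missing values remain
anyNA(cleaned)                                     # FALSE

# Same number of rows as the complete.cases() subset
nrow(cleaned) == sum(complete.cases(airquality))   # TRUE

# na.omit() records the rows it dropped in the "na.action" attribute
dropped <- attr(cleaned, "na.action")
length(dropped)                                    # 42
```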

Conclusion

We covered how to deal with missing values in a vector using is.na() and the na.rm argument of a function, and how to handle NA values in a data frame using complete.cases() and na.omit().

This brings us to the end of this blog. We really appreciate your time.

Hope you liked it.

Do visit our page www.zigya.com/blog for more informative blogs on Data Science

Keep Reading! Cheers!
