What is a Data Frame in R?

Zigya Academy

First of all, let us look at where the concept of a data frame comes from. It grew out of intensive empirical research in the world of statistical software, where data is usually tabular. A data frame represents such tabular data. In particular, it is a data structure in R for cases where there are a number of observations (rows), each with a number of measurements (columns).

A data frame is used for storing data tables. Internally, it is a list of vectors of equal length, one vector per column.
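The list-of-vectors idea can be checked directly. The snippet below is a minimal sketch; the `people` data frame and its columns are made up for illustration.

```r
# Hypothetical example data: three people, two columns.
people <- data.frame(
  name = c("Ann", "Bob", "Cy"),
  height_cm = c(170, 182, 165),
  stringsAsFactors = FALSE
)

# A data frame is a list underneath.
print(is.list(people))

# Each list element is one column; all columns have the same length.
col_lengths <- sapply(people, length)
print(col_lengths)

# Pulling out one list element gives back the underlying vector.
print(people[["height_cm"]])
```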

Characteristics of R Data Frame

Now, let’s discuss the characteristics of a data frame in R.

  • The column names should be non-empty.
  • The row names should be unique.
  • A data frame can hold numeric, character, or factor data, and different columns can have different types.
  • Each column should contain the same number of data items.
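Some of these rules can be seen in action. The sketch below assumes a small helper, `fails()`, which is not part of R and is defined here only to report whether an expression raises an error.

```r
# Hypothetical helper: TRUE if evaluating `expr` raises an error.
fails <- function(expr) inherits(try(expr, silent = TRUE), "try-error")

# Columns whose lengths cannot be reconciled are rejected.
unequal <- fails(data.frame(x = 1:3, y = c("a", "b")))

# Duplicate row names are rejected as well.
duplicated_rows <- fails(data.frame(x = 1:2, row.names = c("a", "a")))

# Columns passed without a name still end up with non-empty names.
auto <- data.frame(1:2, c("a", "b"))
named <- all(nzchar(names(auto)))

print(c(unequal = unequal, duplicated_rows = duplicated_rows, named = named))
```

One caveat: data.frame() recycles a shorter vector when its length divides the longest column evenly, so columns of length 4 and 2 would be accepted rather than rejected.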

Create Data Frame

# Create the data frame.
df <- data.frame(
   id = 1:4,
   name = c("Sam", "Dan", "Zack", "Ryan"),
   age = c(62, 51, 61, 72),
   stringsAsFactors = FALSE
)
# Print the data frame.
print(df)

Output of the above code

     id     name        age     
1     1     Sam         62
2     2     Dan         51
3     3     Zack        61
4     4     Ryan        72

Get the Structure of the R Data Frame

The structure of the data frame can be inspected with the str() function.

# To get the structure of a data frame
> str(df)

'data.frame':	4 obs. of  3 variables:
 $ id  : int  1 2 3 4
 $ name: chr  "Sam" "Dan" "Zack" "Ryan"
 $ age : num  62 51 61 72
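The same information that str() prints can also be fetched piece by piece, which is handy inside scripts. A minimal sketch, rebuilding the example data frame so that it runs on its own:

```r
# Rebuild the example data frame from this post.
df <- data.frame(
  id = 1:4,
  name = c("Sam", "Dan", "Zack", "Ryan"),
  age = c(62, 51, 61, 72),
  stringsAsFactors = FALSE
)

print(dim(df))      # number of rows, then number of columns
print(nrow(df))     # number of observations
print(ncol(df))     # number of variables
print(names(df))    # column names

# Class of each column, matching the int / chr / num shown by str().
col_classes <- sapply(df, class)
print(col_classes)
```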

Operations on Data Frame

Extract the first two columns

# First two columns of the data frame, selected by name
# so that the column names id and name are kept
> two_col <- df[, c("id", "name")]
> print(two_col)

Output of the above code

     id     name            
1     1     Sam         
2     2     Dan         
3     3     Zack        
4     4     Ryan        
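There are several ways to pull columns out of a data frame, and they do not all return the same kind of object. The sketch below compares a few of them on the example data; the variable names are chosen only for illustration.

```r
df <- data.frame(
  id = 1:4,
  name = c("Sam", "Dan", "Zack", "Ryan"),
  age = c(62, 51, 61, 72),
  stringsAsFactors = FALSE
)

# Selecting by name or by position gives the same two-column data frame.
by_name <- df[, c("id", "name")]
by_pos <- df[, 1:2]
print(identical(by_name, by_pos))

# `$` returns the column as a plain vector, not a data frame.
name_vec <- df$name
print(name_vec)

# Single brackets with one name keep the data frame structure.
name_df <- df["name"]
print(is.data.frame(name_df))

# Passing df$id and df$name to data.frame() without naming them
# produces the column names df.id and df.name.
rebuilt <- data.frame(df$id, df$name)
print(names(rebuilt))
```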

Extract the first two rows with all columns

# Extract the first two rows with all columns.
# (A variable name in R cannot start with a digit.)
first_two_rows <- df[1:2, ]
print(first_two_rows)

Output of the above code

     id     name        age     
1     1     Sam         62
2     2     Dan         51

Extract the 3rd and 4th rows with the 2nd and 3rd columns

# Extract rows 3 and 4, columns 2 and 3.
output <- df[3:4, 2:3]
print(output)

Output of the above code

     name        age
3     Zack        61
4     Ryan        72
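Two related details of the square-bracket operator are worth a quick look: what happens when only one column is selected, and how rows can be picked by a condition instead of by position. This is a minimal sketch on the example data, and the age threshold of 60 is arbitrary.

```r
df <- data.frame(
  id = 1:4,
  name = c("Sam", "Dan", "Zack", "Ryan"),
  age = c(62, 51, 61, 72),
  stringsAsFactors = FALSE
)

# Selecting a single column drops down to a plain vector by default.
as_vector <- df[3:4, 2]
print(as_vector)

# drop = FALSE keeps the result as a data frame.
as_frame <- df[3:4, 2, drop = FALSE]
print(as_frame)

# A logical vector in the row position keeps only the rows where it is TRUE.
over_60 <- df[df$age > 60, c("name", "age")]
print(over_60)
```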

Summary of Data in Data Frame

A statistical summary of the data, along with the type of each column, can be obtained by applying the summary() function.

# To get the summary of any data
> summ <- summary(df)
> print(summ)

Output of the above code

       id           name                age      
 Min.   :1.00   Length:4           Min.   :51.0  
 1st Qu.:1.75   Class :character   1st Qu.:58.5  
 Median :2.50   Mode  :character   Median :61.5  
 Mean   :2.50                      Mean   :61.5  
 3rd Qu.:3.25                      3rd Qu.:64.5  
 Max.   :4.00                      Max.   :72.0
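To see where the numbers in the age column of this summary come from, the same statistics can be computed by hand. A small sketch using the age values from the example:

```r
# Age values from the example data frame.
age <- c(62, 51, 61, 72)

# quantile() returns the minimum, the three quartiles, and the maximum.
age_quantiles <- quantile(age)
print(age_quantiles)

# The mean is reported separately by summary().
age_mean <- mean(age)
print(age_mean)

# summary() on a single numeric vector combines both.
print(summary(age))
```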

Built-in Data Frame

For our tutorials, we will use the built-in data frames that ship with R. One example is mtcars.

Motor Trend Car Road Tests

Description

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).

Usage

mtcars

> mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
.....

The top line of the table, called the header, contains the column names. Each horizontal line afterward denotes a data row, which begins with the name of the row and is followed by the actual data. Each data member of a row is called a cell.

To retrieve the data in a cell, we enter its row and column coordinates in the single square bracket “[]” operator. The two coordinates are separated by a comma. In other words, the coordinates begin with the row position, followed by a comma, and end with the column position. The order is important.

To access the value in the 1st row and 4th column

> mtcars[1, 4] 
[1] 110
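Swapping the two coordinates selects a different cell, which is why the order matters. A short sketch using mtcars:

```r
# Row 1, column 4: the hp value of the first car.
cell_1_4 <- mtcars[1, 4]
print(cell_1_4)

# Row 4, column 1: the mpg value of the fourth car.
cell_4_1 <- mtcars[4, 1]
print(cell_4_1)

# The row and column names confirm which cells were read.
print(rownames(mtcars)[c(1, 4)])
print(colnames(mtcars)[c(4, 1)])
```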

Moreover, we can use the row and column names instead of the numeric coordinates.

> mtcars["Honda Civic", "mpg"]
[1] 30.4
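Names work for whole rows and for several rows or columns at once, not just for single cells. A minimal sketch; the cars and columns picked here are arbitrary.

```r
# Leaving the column position empty returns the whole row.
civic <- mtcars["Honda Civic", ]
print(civic)

# Vectors of names select several rows and columns together.
small_cars <- mtcars[c("Honda Civic", "Fiat 128"), c("mpg", "hp")]
print(small_cars)

# The size of the full data frame: rows, then columns.
print(dim(mtcars))
```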

To learn more about the dataset, use the help() command.

> help(mtcars)

And to get a list of all the datasets available in R:

> data()

Output

Data sets in package 'datasets':

AirPassengers           Monthly Airline Passenger Numbers 1949-1960
BJsales                 Sales Data with Leading Indicator
BJsales.lead (BJsales)
                        Sales Data with Leading Indicator
BOD                     Biochemical Oxygen Demand
CO2                     Carbon Dioxide Uptake in Grass Plants
ChickWeight             Weight versus age of chicks on different diets
DNase                   Elisa assay of DNase
EuStockMarkets          Daily Closing Prices of Major European Stock
                        Indices, 1991-1998
Formaldehyde            Determination of Formaldehyde
HairEyeColor            Hair and Eye Color of Statistics Students
Harman23.cor            Harman Example 2.3
Harman74.cor            Harman Example 7.4
mtcars                  Motor Trend Car Road Tests
...
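The listing can also be used from code instead of being read on screen. The sketch below relies on the object returned by data(), whose results component is a character matrix with Item and Title columns. It restricts the listing to the datasets package, which is a choice made here for illustration.

```r
# Fetch the listing for the 'datasets' package as a character matrix.
available <- data(package = "datasets")$results

# Show the first few dataset names with their titles.
print(head(available[, c("Item", "Title")]))

# Check whether a particular dataset is in the listing.
has_mtcars <- "mtcars" %in% available[, "Item"]
print(has_mtcars)
```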

Conclusion

So, we have learned what a data frame is and looked at its characteristics in detail. We have also discussed different operations on a data frame: creating one, inspecting its structure, extracting rows and columns, and summarizing its data. The examples above should make it easier to start working with data frames on your own.

This brings us to the end of this blog. We really appreciate your time.

Hope you liked it.

Do visit our page www.zigya.com/blog for more informative blogs on Data Science

Keep Reading! Cheers!
