What Does na.omit Do in R?
When doing data analysis, you will inevitably encounter data sets with missing values. This happens because real-world data collection is rarely as well controlled as an experiment in a lab. Missing data needs to be taken into account, and how you handle it can affect your results.
Deal with missing values in R
R represents missing data with the NA value. Because NA is not a true numerical value, any calculation that includes it generally returns NA as well. This means missing values need to be detected and excluded from calculations. Many summary functions, such as mean() and sum(), have a logical argument called na.rm that controls this within the function. However, there are times when a row containing missing data simply needs to be removed; this is the job of the na.omit() function. Both tools help you deal with missing values in R by preventing NA results from spreading through your calculations.
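As a minimal sketch of the behavior described above, the following uses a small illustrative vector `v` (an assumption for demonstration, not part of any real data set) to show how a single NA affects mean() and how the na.rm argument changes the result.

```r
# Illustrative vector with one missing value
v <- c(3, 8, NA, 5, 9)

# A summary that touches an NA returns NA
mean_with_na <- mean(v)

# na.rm = TRUE drops the NA values before the mean is computed
mean_without_na <- mean(v, na.rm = TRUE)

print(mean_with_na)     # NA
print(mean_without_na)  # 6.25
```

Note that na.rm only affects the one function call; the vector itself still contains the NA.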
Detect missing values in R
One of the key problems when dealing with missing values is that they must first be detected. The NA value is a big step toward this. However, one purpose of writing a program to analyze data is to avoid the tedious job of finding the missing data yourself. Two functions that help with this task are is.na(), which returns TRUE for every NA value it finds, and na.omit(), which removes any rows that contain an NA value.
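The detection step can be sketched as follows. The data frame `df` and its values are hypothetical, and combining is.na() with colSums(), rowSums(), and which() is just one possible way to summarize where the missing values are.

```r
# Hypothetical data frame with two missing values (illustrative only)
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

# is.na() returns a logical matrix of the same shape: TRUE where a value is NA
missing_mask <- is.na(df)

# Count the missing values in each column
missing_per_column <- colSums(missing_mask)

# Find the rows that contain at least one NA
rows_with_na <- which(rowSums(missing_mask) > 0)

print(missing_per_column)
print(rows_with_na)
```

Here is.na() only reports where the missing values are; it does not change the data frame.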
na.omit in R
One way of dealing with missing data is the na.omit() function. It is called as na.omit(dataframe) and simply removes any rows of the data frame that contain NA values.
# na.omit in R example - cleaning up a data frame
> x <- data.frame(a = c(2, 3, 5, 8, 12), b = c(3, 8, NA, 5, 9), c = c(10, 4, 6, 11, 15), d = c(22, 41, 26, 31, 54))
> x
   a  b  c  d
1  2  3 10 22
2  3  8  4 41
3  5 NA  6 26
4  8  5 11 31
5 12  9 15 54
>
> na.omit(x)
   a  b  c  d
1  2  3 10 22
2  3  8  4 41
4  8  5 11 31
5 12  9 15 54
If you look at the result of na.omit(x), you can see that row 3, the only row containing an NA, has been removed. Calculations can now be run on this version of the data frame without running into NA values.
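As a short follow-up sketch using the same data frame, the code below stores the cleaned result in a variable named `clean` (a name chosen here for illustration) and inspects it. It relies on the fact that na.omit() keeps the original row names and records the dropped rows in an "na.action" attribute.

```r
# Same data frame as in the example above
x <- data.frame(a = c(2, 3, 5, 8, 12), b = c(3, 8, NA, 5, 9),
                c = c(10, 4, 6, 11, 15), d = c(22, 41, 26, 31, 54))
clean <- na.omit(x)

# Row names are preserved, so the gap at row 3 stays visible
print(rownames(clean))

# The indices of the removed rows are stored in the "na.action" attribute
dropped <- as.integer(attr(clean, "na.action"))
print(dropped)

# Column means can now be computed without any NA handling
print(colMeans(clean))
```

Keeping the result in a new variable leaves the original `x` untouched, which can be useful if you later want to check how many rows were discarded.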
Applications
There are many applications for na.omit in R. Any time you deal with real-world data, there is a possibility that some of it is missing. While you can avoid this when you are the source of the data, that will not always be the case. So any time you import data that neither you nor your program created, the na.omit() function will be a useful tool.
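To illustrate the import case, here is a minimal sketch that reads a small inline CSV with read.csv() and then drops the incomplete rows. The column names and values are invented for this example; in practice you would pass a file path instead of the `text` argument.

```r
# Hypothetical CSV content with one blank field and one explicit NA
csv_text <- "id,height,weight
1,170,65
2,,72
3,182,NA
4,175,80"

# In numeric columns, read.csv() treats blank fields and the string NA as missing
raw <- read.csv(text = csv_text)

# Keep only the rows with no missing values
clean <- na.omit(raw)

print(raw)
print(clean)
```

Only the records with ids 1 and 4 survive, since each of the other two rows is missing one measurement.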
Missing data is always a potential problem, and the na.omit() function is one of the solutions to that problem. The details of dealing with missing data vary from case to case, but it always needs to be dealt with.