This code uses a dataset file with population estimates by the US Census Bureau (more info).
tbl <- read.table(file.choose(), header=TRUE, sep=',')
population <- tbl[c("NAME","POPESTIMATE2009","NPOPCHG_2009")]
smallest.state.pop <- min(population$POPESTIMATE2009)
print(population[population$POPESTIMATE2009==smallest.state.pop,])

      NAME POPESTIMATE2009 NPOPCHG_2009
56 Wyoming          544270        11289
This piece of code extracts the row for the least populous state from the data frame.
The first line reads the data from the CSV file (as explained here).
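To try the reading step without the Census file at hand, the sketch below feeds a few CSV lines to read.csv through its text argument. The inline CSV is a stand-in for the real file (an assumption for illustration); its values are copied from the output shown in this post.

```r
# Stand-in for the Census CSV: two rows copied from the output in this post.
csv.text <- "NAME,POPESTIMATE2009,NPOPCHG_2009
Alabama,4708708,31244
Wyoming,544270,11289"

# read.csv is a wrapper around read.table whose defaults
# include header = TRUE and sep = ","
tbl <- read.csv(text = csv.text)

# Inspect the column names and types that were detected
str(tbl)
```

With the real file you would pass a path (or file.choose()) instead of text.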
Picking specific columns out of a data frame
The second line limits the columns to the state name, the population estimate for 2009 and the total population change for 2009.
Let's use the head function to look at what we get:
head(population)
           NAME POPESTIMATE2009 NPOPCHG_2009
1 United States       307006550      2631704
2     Northeast        55283679       223483
3       Midwest        66836911       241314
4         South       113317879      1296857
5          West        71568081       870050
6       Alabama         4708708        31244
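The column selection can also be tried on a small hand-built data frame. This is a minimal sketch: the data frame toy and its EXTRA column are made up for illustration, and the numbers are taken from the output in this post.

```r
# Small stand-in data frame; EXTRA is a hypothetical column that exists
# only so there is something to leave out.
toy <- data.frame(
  NAME            = c("Alabama", "Wyoming"),
  POPESTIMATE2009 = c(4708708, 544270),
  NPOPCHG_2009    = c(31244, 11289),
  EXTRA           = c("a", "b")
)

# A character vector of names in single brackets keeps just those columns
by.name <- toy[c("NAME", "POPESTIMATE2009")]

# The same selection written with an empty row index before the comma
by.matrix <- toy[, c("NAME", "POPESTIMATE2009")]

# Single brackets with one name still give a data frame;
# the $ operator gives the column itself as a vector
one.col <- toy["NAME"]
vec     <- toy$NAME

names(by.name)
```

The single-bracket form used in this post always returns a data frame, which is why head can print it as a table.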
Finding the lowest value in a vector
First the POPESTIMATE2009 column is selected:
population$POPESTIMATE2009
 [1] 307006550  55283679  66836911 113317879  71568081   4708708    698473
 [8]   6595778   2889450  36961664   5024748   3518288    885122    599657
[...]
[50]   2784572    621760   7882590   6664195   1819777   5654774    544270
[57]   3967288
Then the min function is used to find the minimum:
min(population$POPESTIMATE2009)
[1] 544270
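As a side note, base R also has which.min, which returns the position of the smallest value instead of the value itself. The sketch below uses a handful of the numbers printed above; the vector pop is just an illustrative name, and which.min is an alternative, not what the code in this post uses.

```r
# A few of the population values printed above
pop <- c(4708708, 698473, 6595778, 544270, 3967288)

min(pop)             # the smallest value: 544270
which.min(pop)       # the position of the first smallest value: 4
pop[which.min(pop)]  # back to the value via its position: 544270

# If the column could contain missing values, min() returns NA
# unless they are dropped explicitly
min(c(pop, NA), na.rm = TRUE)  # 544270
```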
Selecting the row with the lowest population value
You can use something like a SQL WHERE clause in the row index of a data frame:
data.frame[condition, ]
This works because the condition produces a logical vector with one TRUE or FALSE per row, depending on whether the field value is a match:
population$POPESTIMATE2009==smallest.state.pop
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
Only the second-to-last element (row 56) is TRUE, so only that row is selected. The comma after the row index leaves the column index empty, which means we get all the columns:
population[population$POPESTIMATE2009==smallest.state.pop,]
      NAME POPESTIMATE2009 NPOPCHG_2009
56 Wyoming          544270        11289
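The whole row-selection step can be replayed on a three-row stand-in. This is a sketch under assumptions: toy and is.smallest are illustrative names, and the values come from the output in this post.

```r
# Three rows copied from the output shown in this post
toy <- data.frame(
  NAME            = c("United States", "Alabama", "Wyoming"),
  POPESTIMATE2009 = c(307006550, 4708708, 544270),
  NPOPCHG_2009    = c(2631704, 31244, 11289)
)

# One TRUE/FALSE per row: does this row hold the minimum?
is.smallest <- toy$POPESTIMATE2009 == min(toy$POPESTIMATE2009)
is.smallest                # FALSE FALSE TRUE

# Logical vector before the comma picks rows; empty after it keeps all columns
toy[is.smallest, ]

# A column name after the comma narrows the result to that column
toy[is.smallest, "NAME"]

# Alternative: index by position instead of by condition
toy[which.min(toy$POPESTIMATE2009), ]
```

If several rows shared the minimum, the == condition would return all of them, while which.min would return only the first. Without the comma, R would treat the condition as a column selector rather than a row selector.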