This code uses a dataset file of population estimates from the US Census Bureau.
tbl <- read.table(file.choose(), header=TRUE, sep=",")
population <- tbl["POPESTIMATE2009"]
print(summary(population[-1:-5,]))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
  544300  1734000  4141000  5980000  6613000 36960000
Reading a CSV file
read.table can read a variety of basic data formats into tables or "data frames".
sep specifies the separator for the data, which is a comma for CSV files.
header indicates whether the first row contains the names of the data columns.
The first argument is the name of the file to read.
In this case file.choose() is used instead: it opens a file selection dialog and returns the path of the chosen file.
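To try these arguments without the Census file or a dialog, the sketch below passes a small CSV string through the text argument of read.table. The NAME column and the "State A" label are made up for illustration; the numbers are taken from the outputs shown in this article.

```r
# Small stand-in for the Census file (NAME column and "State A" are hypothetical)
csv_text <- "NAME,POPESTIMATE2009
United States,307006550
Northeast,55283679
State A,4708708"

# text = supplies the data directly instead of a file name
tbl <- read.table(text = csv_text, header = TRUE, sep = ",")
str(tbl)

# read.csv is a wrapper around read.table with header = TRUE and sep = "," as defaults
tbl2 <- read.csv(text = csv_text)
dim(tbl2)
```

With a real file, replace text = csv_text with the file name or with file.choose().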
Indexing data frames
See the R documentation for more information.
Getting a specific column
You can use the column name as a string in brackets: tbl["POPESTIMATE2009"]
  POPESTIMATE2009
1       307006550
2        55283679
3        66836911
[...]
Using the column number also works: tbl[17].
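A minimal sketch of both forms on a toy data frame (the NAME column and its labels are made up; in this toy table the population column is number 2, not 17). Note that single brackets return a data frame with one column, not a bare vector.

```r
# Toy data frame; NAME and its labels are hypothetical
tbl <- data.frame(
  NAME = c("Total", "Region", "State"),
  POPESTIMATE2009 = c(307006550, 55283679, 4708708)
)

by_name  <- tbl["POPESTIMATE2009"]  # select by column name
by_index <- tbl[2]                  # select by column position

class(by_name)                # "data.frame"
identical(by_name, by_index)  # TRUE
```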
Getting a column as a vector
You can use the dollar sign for this: tbl$POPESTIMATE2009
 [1] 307006550  55283679  66836911 113317879  71568081   4708708    698473
 [8]   6595778   2889450  36961664   5024748   3518288    885122    599657
[...]
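The difference from single brackets can be checked directly. The sketch below uses a toy data frame (the NAME column is made up) and also shows double brackets, which extract the same vector as the dollar sign.

```r
# Toy data frame; NAME and its labels are hypothetical
tbl <- data.frame(
  NAME = c("Total", "Region", "State"),
  POPESTIMATE2009 = c(307006550, 55283679, 4708708)
)

vec <- tbl$POPESTIMATE2009

class(vec)          # "numeric"
is.data.frame(vec)  # FALSE

# Double brackets with the column name give the same vector
identical(vec, tbl[["POPESTIMATE2009"]])  # TRUE
```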
Fetching specific rows and columns
Here the table is indexed like a 2-dimensional matrix.
To get the first 5 rows from the population table:
population[1:5,] # first the rows, then the columns
[1] 307006550 55283679 66836911 113317879 71568081
The comma after the row information indicates that we want all columns. In this case we could also have written [1:5,1] because we only have 1 column in population.
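The output above is a plain vector rather than a table because R simplifies the result when a selection covers only one column. A small sketch, using the first seven values shown earlier, compares the forms; drop = FALSE is the standard argument for keeping the data frame structure.

```r
# One-column data frame built from the first values printed in this article
population <- data.frame(
  POPESTIMATE2009 = c(307006550, 55283679, 66836911, 113317879,
                      71568081, 4708708, 698473)
)

first5 <- population[1:5, ]            # rows 1 to 5, all columns
is.numeric(first5)                     # TRUE: simplified to a vector
identical(first5, population[1:5, 1])  # TRUE: same as naming column 1

# drop = FALSE keeps the result as a data frame
kept <- population[1:5, , drop = FALSE]
class(kept)                            # "data.frame"
```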
Look at this data from the first 5 rows in the population column:
[1] 307006550 55283679 66836911 113317879 71568081
These values are too large to be the populations of individual US states.
They are the total US population and the populations of the four US Census Bureau regions: Northeast, Midwest, South and West.
Since we are only interested in the states we can drop them like this:
population[-1:-5,]
Negative numbers in matrix indices can be used to omit specific rows or columns.
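A short sketch of negative indexing on a plain vector with made-up values. The expression -1:-5 is the sequence -1, -2, -3, -4, -5, because the minus sign is applied before the colon operator.

```r
x <- c(10, 20, 30, 40, 50, 60, 70)  # made-up values

-1:-5          # -1 -2 -3 -4 -5
x[-1:-5]       # 60 70
x[-(1:5)]      # 60 70, the same selection written differently
x[-c(1, 3)]    # 20 40 50 60 70

# x[-1:5] would fail: -1:5 runs from -1 up to 5 and mixes
# negative and positive indices
```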
A short equivalent of the code
You can also fetch the population column at the same time as you remove the multi-state rows. Replace
population <- tbl["POPESTIMATE2009"]
print(summary(population[-1:-5,]))
with
print(summary(tbl[-1:-5,"POPESTIMATE2009"]))
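That the two forms select the same values can be checked on a toy table. The NAME column and its labels are made up; the numbers are the first seven values printed earlier.

```r
# Toy table: five aggregate rows followed by two state rows (labels hypothetical)
tbl <- data.frame(
  NAME = c("Total", "R1", "R2", "R3", "R4", "State A", "State B"),
  POPESTIMATE2009 = c(307006550, 55283679, 66836911, 113317879,
                      71568081, 4708708, 698473)
)

# Two-step form
population <- tbl["POPESTIMATE2009"]
long_form <- population[-1:-5, ]

# One-step form: drop rows 1 to 5 and pick the column together
short_form <- tbl[-1:-5, "POPESTIMATE2009"]

identical(long_form, short_form)  # TRUE
short_form                        # 4708708 698473
```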
The summary function
summary calculates a few values based on the data passed as the first argument. The exact values calculated depend on the class of the data.
summary(1:10)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00    3.25    5.50    5.50    7.75   10.00
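To see how the result depends on the class, the sketch below calls summary on a numeric vector, a factor and a data frame. The inputs are made up for illustration.

```r
# Numeric vector: quartiles, mean and extremes
s_num <- summary(1:10)
s_num[["Mean"]]   # 5.5

# Factor: a count for each level
s_fac <- summary(factor(c("a", "b", "a")))
s_fac             # a: 2, b: 1

# Data frame: a summary of every column, formatted as a table
s_df <- summary(data.frame(x = 1:10, g = factor(rep(c("a", "b"), 5))))
print(s_df)
```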