This code uses a dataset file with population estimates by the US Census Bureau (more info).

tbl <- read.table(file.choose(),header=TRUE,sep=",")
population <- tbl["POPESTIMATE2009"]
print(summary(population[-1:-5,]))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
  544300  1734000  4141000  5980000  6613000 36960000 

Reading a CSV file

read.table can read a variety of basic data formats into tables or "data frames".
sep specifies the separator for the data, which is a comma for CSV files.
header indicates whether the first row contains the names of the data columns.

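The first argument contains the file name. In this case file.choose is used to show a file selection dialog.

A file name can also be passed directly as a string; a relative path is resolved against the current working directory. (The user's home folder is the default working directory in RStudio.)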
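
As a minimal sketch of the arguments described above, the following writes a tiny made-up CSV file to a temporary location and reads it back. The column names and values are invented for illustration and do not come from the Census Bureau file.

```r
# Write a small, made-up CSV file to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("NAME,POP",
             "Alpha,100",
             "Beta,250"), csv_path)

# header = TRUE: the first row holds the column names.
# sep = ",":     fields are separated by commas.
demo <- read.table(csv_path, header = TRUE, sep = ",")
print(demo)

# read.csv is a convenience wrapper around read.table
# that uses header = TRUE and sep = "," by default.
demo2 <- read.csv(csv_path)
print(names(demo2))
```

Passing a path string instead of file.choose() makes the script repeatable, because no dialog has to be answered on each run.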

Indexing data frames

See the R documentation for more information.

Getting a specific column

You can use the column name as a string in brackets: tbl["POPESTIMATE2009"]

1        307006550
2         55283679
3         66836911

Using the column number also works: tbl[17].

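Getting a column as a vector

You can use the dollar sign for this: tbl$POPESTIMATE2009. Unlike the bracket form, which returns a data frame with one column, this returns the values as a plain vector: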

[1] 307006550  55283679  66836911 113317879  71568081   4708708    698473
[8]   6595778   2889450  36961664   5024748   3518288    885122    599657
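
The difference between the bracket form and the dollar sign can be checked on a small example. This is only a sketch; the data frame demo and its columns are hypothetical stand-ins for the census table.

```r
# A small, made-up data frame (hypothetical names and values).
demo <- data.frame(NAME = c("Alpha", "Beta", "Gamma"),
                   POP  = c(100, 250, 75))

by_name   <- demo["POP"]   # data frame with one column
by_number <- demo[2]       # same column, selected by position
as_vector <- demo$POP      # plain numeric vector

print(class(by_name))                       # "data.frame"
print(identical(by_name, by_number))        # TRUE
print(class(as_vector))                     # "numeric"
print(identical(as_vector, demo[["POP"]]))  # TRUE: [[ ]] also extracts the vector
```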

Fetching specific rows and columns

Here the table will be treated as a 2-dimensional matrix.
To get the first 5 rows from the population table:

population[1:5,]  #  first the rows, then the columns
[1] 307006550  55283679  66836911 113317879  71568081

The comma after the row information indicates that we want all columns. In this case we could also have written [1:5,1] because we only have 1 column in population.
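
The following sketch illustrates row and column indexing on a made-up data frame (the names demo, NAME and POP are assumptions for illustration). It also shows why indexing a one-column data frame by rows prints a plain vector: R drops the data frame structure when only one column remains, unless drop = FALSE is given.

```r
# A small, made-up data frame (hypothetical names and values).
demo <- data.frame(NAME = c("Alpha", "Beta", "Gamma", "Delta"),
                   POP  = c(100, 250, 75, 300))

first_two <- demo[1:2, ]                     # rows 1-2, all columns: a data frame
pop_only  <- demo[1:2, "POP"]                # rows 1-2, one column: a plain vector
pop_df    <- demo[1:2, "POP", drop = FALSE]  # same selection, kept as a data frame

# A one-column data frame behaves like the population table:
one_col <- demo["POP"]
print(one_col[1:2, ])    # prints a vector: 100 250
print(one_col[1:2, 1])   # same result
```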

Look at this data from the first 5 rows in the population column:

[1] 307006550  55283679  66836911 113317879  71568081

These are too big to be population values for US states. They are the total US population and the populations of the four US Census Bureau regions: Northeast, Midwest, South and West.
Since we are only interested in the states, we can drop them like this:

population[-1:-5,]

Negative numbers in matrix indices can be used to omit specific rows or columns.
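
A short sketch of negative indexing, using made-up values rather than the census data. Note that -1:-5 is read as (-1):(-5), that is, the sequence -1, -2, -3, -4, -5.

```r
# Negative indices on a plain vector.
x <- c(10, 20, 30, 40, 50, 60, 70)
kept <- x[-1:-5]      # omit elements 1 to 5
print(kept)           # 60 70
print(x[-(1:5)])      # equivalent spelling, same result

# The same idea on the rows of a made-up data frame,
# where the first two rows are aggregates to be removed.
demo <- data.frame(NAME = c("Total", "Region", "Alpha", "Beta"),
                   POP  = c(350, 350, 100, 250))
states_only <- demo[-1:-2, ]
print(states_only)
```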

A short equivalent of the code

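You can also fetch the population column at the same time as you remove the multi-state rows. Replace

population <- tbl["POPESTIMATE2009"]

with

population <- tbl[-1:-5,"POPESTIMATE2009"]

After this, population already excludes the first five rows.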

The summary function

summary calculates a few values based on the data passed as the first argument. The exact values calculated depend on the class of the data. For a numeric vector such as 1:10, summary(1:10) prints:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    3.25    5.50    5.50    7.75   10.00
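
To illustrate how the result depends on the class of the data, here is a small sketch that calls summary on a numeric vector, a factor and a data frame. All inputs are made up for illustration.

```r
# Numeric vector: minimum, quartiles, median, mean and maximum.
s_num <- summary(1:10)
print(s_num)

# Factor: a count for each level.
s_fac <- summary(factor(c("a", "b", "a")))
print(s_fac)

# Data frame: one summary per column, printed as a table.
s_df <- summary(data.frame(x = 1:10, y = c(2.5, 3.5, 1, 4, 6, 8, 7, 9, 5, 10)))
print(s_df)
```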