Getting data from the Web with R and some basic functionality

We will load a dataset into R and run a few simple manipulations to demonstrate some basic functionality.

The dataset can be downloaded here: http://www.bls.gov/oes/current/oes_ca.htm#15-0000. It is a subset of the May 2014 State Occupational Employment and Wage Estimates report for California, covering Computer and Mathematical Occupations. The original dataset can be found here: http://www.bls.gov/oes/current/oes_ca.htm#(8).

  1. The dataset has to be downloaded into your working directory. The getwd() function returns an absolute file path representing the current working directory of the R process. To change your working directory, use the setwd(dir) function or go to the File menu in the R Console and choose “Change dir”.
  2. Create a directory for the data
if (!file.exists("salaries")) {
   dir.create("salaries")
}
  3. Download the file
fileUrl <- "your link here"
download.file(fileUrl, destfile = "./salaries/computer.xls")
  4. The next step is to install the xlsx R package if you have not done so previously. To install xlsx, use install.packages("xlsx"). To check whether you already have it, enter find.package("xlsx") in the console. Once xlsx has finished installing, load it using library(xlsx).
  5. Read the file
salariesData <- read.xlsx("./salaries/computer.xls",
                          sheetIndex = 1,
                          header = TRUE)
head(salariesData)
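
The steps above can be sketched as one script. This is only a sketch under a few assumptions: the helper names ensure_dir and fetch_xls are made up for illustration, the demo writes to tempdir() rather than the working directory, and mode = "wb" is added on the assumption that the .xls file is binary and could otherwise be damaged on Windows. The download is wrapped in a function and not called, because the real link is not given above.

```r
# Hypothetical helper: create a folder if it is missing and return its path.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  path
}

# Hypothetical helper: download a spreadsheet to destfile.
# mode = "wb" requests a binary transfer (assumes a binary .xls file;
# this matters mostly on Windows).
fetch_xls <- function(url, destfile) {
  download.file(url, destfile = destfile, mode = "wb")
  invisible(destfile)
}

# Demo in a temporary folder so the working directory is left untouched.
dataDir  <- ensure_dir(file.path(tempdir(), "salaries"))
destFile <- file.path(dataDir, "computer.xls")

# Check whether xlsx is installed without loading it.
hasXlsx <- nzchar(system.file(package = "xlsx"))

cat("Data directory exists:", dir.exists(dataDir), "\n")
cat("xlsx installed:", hasXlsx, "\n")

# With a real link in place of the placeholder, the remaining steps would be:
# fetch_xls("your link here", destFile)
# salariesData <- xlsx::read.xlsx(destFile, sheetIndex = 1, header = TRUE)
```

Keeping the download in its own function makes it easy to rerun the rest of the script without fetching the file again.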

Question 1

What are the column names of the dataset?

names(salariesData)
[1] "ST" "OCC_CODE" "OCC_TITLE" "OCC_GROUP" "TOT_EMP" "EMP_PRSE" "JOBS_1000" "LOC_Q"
[9] "H_MEAN" "A_MEAN" "MEAN_PRSE" "H_PCT10" "H_PCT25" "H_MEDIAN" "H_PCT75" "H_PCT90"
[17] "A_PCT10" "A_PCT25" "A_MEDIAN" "A_PCT75" "A_PCT90"
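
Column names alone do not say how R has stored each column. As a small sketch, the toy data frame below reuses three of the column names above with invented values (placeholders, not BLS figures) and shows names() next to the class of each column.

```r
# Toy data frame: column names taken from the output above,
# values are invented placeholders, NOT real BLS figures.
toy <- data.frame(
  OCC_TITLE = c("Occupation A", "Occupation B", "Occupation C"),
  TOT_EMP   = c(100, 200, 300),
  H_MEAN    = c(10.5, 20.5, 30.5),
  stringsAsFactors = FALSE
)

colNames <- names(toy)  # character vector of column names
# First class of every column, returned as a named character vector
colTypes <- vapply(toy, function(col) class(col)[1], character(1))

print(colNames)
print(colTypes)
```

On the real data, str(salariesData) gives a similar overview in a single call.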

Question 2

How many observations are in the dataset?

nrow(salariesData)
[1] 20

Question 3

Extract the first 3 rows of the data and print them to the console. What does the output look like?

print(salariesData[1:3, ])
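
head() is a common alternative to indexing with 1:3. The sketch below uses an invented two-row data frame (hypothetical values) to show one difference between the two: indexing past the end pads the result with NA rows, while head() returns only the rows that exist.

```r
# Invented two-row data frame, only for illustration.
small <- data.frame(id = c(1, 2), value = c("a", "b"))

byIndex <- small[1:3, ]    # asks for a third row that does not exist
byHead  <- head(small, 3)  # returns at most 3 rows

print(byIndex)
print(byHead)
print(nrow(byIndex))
print(nrow(byHead))
```

With the 20 rows of salariesData, both forms should return the same three rows.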

Question 4

Extract the last 3 rows of the data and print them to the console.

n <- nrow(salariesData)
print(salariesData[(n-2):n, ])

or

tail(salariesData, 3)
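
To check that the two approaches agree, their results can be compared directly. A minimal sketch on an invented five-row data frame (hypothetical values), using all.equal():

```r
# Invented five-row data frame, only for illustration.
toy <- data.frame(id = 1:5, score = c(10, 20, 30, 40, 50))

n <- nrow(toy)
byIndex <- toy[(n - 2):n, ]  # rows n-2, n-1 and n
byTail  <- tail(toy, 3)      # last 3 rows

print(byTail)
# TRUE when both results hold the same rows, values and row names
print(isTRUE(all.equal(byIndex, byTail)))
```

tail() is usually the safer choice, since the (n-2):n form assumes the data has at least three rows.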
