An xlsx is a file format used for Microsoft Excel spreadsheets. Excel can be used to store tabular data.
R has a built-in functionality that makes it easy to read and write a xlsx file.
Sample xlsx File
To demonstrate how we read xlsx files in R, let's suppose we have an excel file named studentinfo.xlsx
with following data:
We will be reading these datas with the help of R's built-in functions.
Install and Load xlsx Package
In order to read, write, and format Excel files into R, we first need to install and load the xlsx
package as:
# install xlsx package
install.package("xlsx")
# load xlsx file
library("xlsx")
Here, we have successfully installed and loaded the xlsx package.
Now, we are able to read data from an xlsx file.
Read a xlsx File in R
In R, we use the read.xlsx()
function to read a xlsx file available in our current directory. For example,
# install xlsx package
install.package("xlsx")
# load xlsx file
library("xlsx")
# read studentinfo.xlsx file from our current directory
read_data <- read.xlsx("studentinfo.xlsx", sheetIndex = 1)
# display xlsx file
print(read_data)
Output
Name Age Faculty State 1 Abby 24 Business Florida 2 Hazzle 23 Engineering Arizona 3 Cathy 20 Engineering Colorado 4 Paterson 22 Arts Texas 5 Sammy 20 Economics Ohio 6 Pam 2 Arts Arizona
In the above example, we have read the studentinfo.xlsx
file that is available in our current directory. Notice the code,
read_data <- read.xlsx("studentinfo.xlsx", sheetIndex = 1)
Here,
read.xlsx()
- reads the xlsx filestudentinfo.xlsx
and creates a dataframe which is stored in the read_data variable.sheetIndex = 1
- reads specified worksheet i.e. 1
Note:
- If the file is in some other location, we have to specify the path along with the file name as:
read.xlsx("D:/folder1/studentinfo.xlsx", sheetIndex = 1)
. - We can also use the
read.xlsx2()
function if the dataset we are working on is larger.
xlsx rowIndex and colIndex Argument in R
In R, we can also read a specific range of data from excel files. We can pass the rowIndex
and colIndex
argument inside read.xlsx()
to read specific range.
rowIndex
- reads a specific range of rowscolIndex
- read a specific range of columns
Example: Read Range of Rows
# install xlsx package
install.package("xlsx")
# load xlsx file
library("xlsx")
# read first five rows of xlsx file
read_data <- read.xlsx("studentinfo.xlsx",
sheetIndex = 1,
rowIndex = 1:5
)
# display xlsx file
print(read_data)
Output
Name Age Faculty State 1 Abby 24 Business Florida 2 Hazzle 23 Engineering Arizona 3 Cathy 20 Engineering Colorado 4 Paterson 22 Arts Texas
In the above example, we have passed rowIndex = 1:5
inside read.xlsx()
so the function reads only the first five rows from the studentinfo.xlsx
file.
Example: Read Range of Columns
# install xlsx package
install.package("xlsx")
# load xlsx file
library("xlsx")
# read first three columns of xlsx file
read_data <- read.xlsx("studentinfo.xlsx",
sheetIndex = 1,
colIndex = 1:3
)
# display xlsx file
print(read_data)
Output
Name Age Faculty 1 Abby 24 Business 2 Hazzle 23 Engineering 3 Cathy 20 Engineering 4 Paterson 22 Arts 5 Sammy 20 Economics 6 Pam 21 Arts
Here, colIndex = 1:3
inside read.xlsx()
reads only the first three columns from the studentinfo.xlsx
file.
xlsx startRow Argument in R
Sometimes the excel file may contain headers at the beginning which we may not want to include. For example,
Here, the 1st Row of excel file contains a header, and the 2nd row is empty. So we don't want to include these two rows.
To start reading data from a specific row in excel worksheet, we pass the startRow
argument inside read.xlsx()
.
Let's take a look at example,
# install xlsx package
install.package("xlsx")
# load xlsx file
library("xlsx")
# start reading from 3rd row
read_data <- read.xlsx("studentinfo.xlsx",
sheetIndex = 1,
startRow = 3
)
# display xlsx file
print(read_data)
Output
Name Age Faculty State 1 Abby 24 Business Florida 2 Hazzle 23 Engineering Arizona 3 Cathy 20 Engineering Colorado 4 Paterson 22 Arts Texas 5 Sammy 20 Economics Ohio 6 Pam 21 Arts Arizona
In the above example, we have used the startRow
argument inside the read.xlsx()
function to start reading from the specified row.
startRow = 3
means the first two rows are ignored and read.xlsx()
starts reading data from the 3rd row.
Write Into xlsx File in R
In R, we use the write.xlsx()
function to write into a xlsx file. We pass the data in the form of dataframe. For example,
# install xlsx package
install.package("xlsx")
# load xlsx file
library("xlsx")
# create a data frame
dataframe1 <- data.frame (
Name = c("Juan", "Alcaraz", "Simantha"),
Age = c(22, 15, 19),
Vote = c(TRUE, FALSE, TRUE))
# write dataframe1 into file1 xlsx file
write.xlsx(dataframe1, "file1.xlsx")
In the above example, we have used the write.xlsx()
function to export a data frame named dataframe1 to a xlsx file. Notice the arguments passed inside write.xlsx()
,
write.xlsx(dataframe1, "file1.xlsx")
Here,
dataframe1
- name of the data frame we want to exportfile1.xlsx
- name of the xlsx file
Finally, the file1.xlsx
file would look like this in our directory:
Rename Current Worksheet
We can rename the current worksheet by using the sheetName
argument inside the write.xlsx()
function. For example,
# install xlsx package
install.package("xlsx")
# load xlsx file
library("xlsx")
# create a data frame
dataframe1 <- data.frame (
Name = c("Juan", "Alcaraz", "Simantha"),
Age = c(22, 15, 19),
Vote = c(TRUE, FALSE, TRUE))
# name current worksheet
write.xlsx(dataframe1, "file1.xlsx",
sheetName = "Voting Eligibility"
)
Here, we have passed sheetname = "Voting Eligibility"
inside write.xlsx()
, so the name of the sheet is changed to "Voting Eligibility"
.
So the file1.xlsx
looks like this: