How to Create a Dataframe from Vectors in R

Dataframe from Scratch

In this tutorial, we will learn how to create a dataframe from scratch in R, using two vectors of the same length as its columns.

Let us get started by creating two vectors in R. We use results from O’Reilly’s 2021 Data/AI Salary Survey, which reports the average salary associated with some top programming languages.

Our first vector holds the names of the programming languages.

language <- c("R", "Python", "SQL", "Java", "JavaScript", "Rust", "Go")
language
## [1] "R"          "Python"     "SQL"        "Java"       "JavaScript"
## [6] "Rust"       "Go"

Our second vector is the average US salary for each of the programming languages.

salary <- c(143000, 150000, 144000, 155000, 146000, 180000, 179000)
salary
## [1] 143000 150000 144000 155000 146000 180000 179000

We can create a dataframe using the data.frame() function available in base R. We simply provide the names of the vectors as arguments.

df <- data.frame(language, salary)

In this example, we have two vectors, so the resulting dataframe has two columns: one for the programming language and the other for the average salary.

df
##     language salary
## 1          R 143000
## 2     Python 150000
## 3        SQL 144000
## 4       Java 155000
## 5 JavaScript 146000
## 6       Rust 180000
## 7         Go 179000
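Since the dataframe is built column by column from the vectors, it can help to confirm that the vectors line up before combining them. The following is a minimal sketch of such a check; it re-creates the two vectors from above so that it runs on its own.

```r
language <- c("R", "Python", "SQL", "Java", "JavaScript", "Rust", "Go")
salary <- c(143000, 150000, 144000, 155000, 146000, 180000, 179000)

# Stop early if the two vectors do not have the same number of elements
stopifnot(length(language) == length(salary))

df <- data.frame(language, salary)

# dim() reports the number of rows and columns:
# one row per element, one column per vector
dim(df)
## [1] 7 2
```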

When we create a dataframe using the data.frame() function, it also creates a name for each row. By default, these are the row numbers stored as character strings. We can access the row names by using rownames().

rownames(df)
## [1] "1" "2" "3" "4" "5" "6" "7"
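Row names do not have to stay as the default numbers. As a small sketch, assuming the language names are unique (row names must be), we could assign them as row names and then look up a row by name.

```r
df <- data.frame(
  language = c("R", "Python", "SQL", "Java", "JavaScript", "Rust", "Go"),
  salary = c(143000, 150000, 144000, 155000, 146000, 180000, 179000)
)

# Replace the default row names with the language names
rownames(df) <- df$language

# A row can now be selected by its name instead of its position
df["Rust", "salary"]
## [1] 180000
```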

Similarly, we can get the column names of the dataframe by using the colnames() function. In this example, the column names are the names of the vector variables.

colnames(df)
## [1] "language" "salary"

We can access the values in any column of the dataframe by using the “$” symbol followed by the column name, as shown below.

df$language
## [1] "R"          "Python"     "SQL"        "Java"       "JavaScript"
## [6] "Rust"       "Go"
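A column extracted with “$” is an ordinary vector, so the usual vector functions work on it. The sketch below rebuilds the same dataframe so that it is self-contained, then shows a few common uses.

```r
df <- data.frame(
  language = c("R", "Python", "SQL", "Java", "JavaScript", "Rust", "Go"),
  salary = c(143000, 150000, 144000, 155000, 146000, 180000, 179000)
)

# Average of the salary column
mean(df$salary)
## [1] 156714.3

# Double brackets with the column name return the same vector as $
identical(df$salary, df[["salary"]])
## [1] TRUE

# Language with the highest average salary
df$language[which.max(df$salary)]
## [1] "Rust"
```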

Sometimes we would like to specify the column names ourselves. We can do so by passing each vector to data.frame() as a named argument, as shown below.

df <- data.frame(Language = language, Salary = salary)
df
##     Language Salary
## 1          R 143000
## 2     Python 150000
## 3        SQL 144000
## 4       Java 155000
## 5 JavaScript 146000
## 6       Rust 180000
## 7         Go 179000
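
Column names can also be changed after the dataframe has been created, by assigning a character vector to colnames(). A minimal sketch, starting from a dataframe with lowercase names:

```r
df <- data.frame(
  language = c("R", "Python", "SQL", "Java", "JavaScript", "Rust", "Go"),
  salary = c(143000, 150000, 144000, 155000, 146000, 180000, 179000)
)

# Assign one new name per column, in column order
colnames(df) <- c("Language", "Salary")
colnames(df)
## [1] "Language" "Salary"
```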