In this tutorial, we will learn how to create a dataframe from scratch in R by combining two vectors of the same length.
Let us get started by creating two vectors in R. We use results from O'Reilly's 2021 Data/AI Salary Survey, which reports the average salary associated with some top programming languages.
Our first vector holds the names of the programming languages.
language <- c("R", "Python", "SQL", "Java", "JavaScript", "Rust", "Go")
language
## [1] "R"          "Python"     "SQL"        "Java"       "JavaScript"
## [6] "Rust"       "Go"
Our second vector is the average US salary for each of the programming languages.
salary <- c(143000, 150000, 144000, 155000, 146000, 180000, 179000)
salary
## [1] 143000 150000 144000 155000 146000 180000 179000
We can create a dataframe using the data.frame() function available in base R. We simply provide the names of the vectors as arguments.
df <- data.frame(language, salary)
In this example we have two vectors, so the resulting dataframe has two columns: one for the programming language and the other for the average salary.
df
##     language salary
## 1          R 143000
## 2     Python 150000
## 3        SQL 144000
## 4       Java 155000
## 5 JavaScript 146000
## 6       Rust 180000
## 7         Go 179000
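The tutorial relies on the two vectors having the same length. The sketch below checks that assumption and shows what happens when it fails. The vector `short_salary` is a hypothetical name introduced only for illustration, and the exact wording of the error message may vary with your R version and locale.

```r
# Vectors copied from the tutorial so the block runs on its own.
language <- c("R", "Python", "SQL", "Java", "JavaScript", "Rust", "Go")
salary <- c(143000, 150000, 144000, 155000, 146000, 180000, 179000)

# Both vectors have 7 elements, so data.frame() pairs them row by row.
stopifnot(length(language) == length(salary))
df <- data.frame(language, salary)
dim(df)  # number of rows, then number of columns

# Hypothetical shortened vector: 7 is not a multiple of 3,
# so data.frame() signals an error instead of building the dataframe.
short_salary <- salary[1:3]
result <- tryCatch(
  data.frame(language, short_salary),
  error = function(e) conditionMessage(e)  # keep the message as a string
)
result
```

Note that data.frame() recycles a shorter vector when its length divides the longer one evenly, so checking lengths explicitly can catch mistakes that would otherwise pass silently.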
When we create a dataframe using the data.frame() function, it also creates a name for each row. By default these are the row numbers, stored as character strings. We can access the row names by using rownames().
rownames(df)
## [1] "1" "2" "3" "4" "5" "6" "7"
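Row names can also be replaced after the dataframe has been created. The sketch below is one possible illustration, not part of the original example: it reuses the language names as row names so that a row can be looked up by name.

```r
# Vectors copied from the tutorial so the block runs on its own.
language <- c("R", "Python", "SQL", "Java", "JavaScript", "Rust", "Go")
salary <- c(143000, 150000, 144000, 155000, 146000, 180000, 179000)
df <- data.frame(language, salary)

# Illustrative choice: use the language names as row names.
# Row names must be unique, which holds for this vector.
rownames(df) <- language
rownames(df)

# A row can now be selected by its name instead of its position.
rust_salary <- df["Rust", "salary"]
rust_salary
```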
Similarly, we can get the column names of the dataframe by using the colnames() function. In this example, the column names are taken from the names of the vector variables.
colnames(df)
## [1] "language" "salary"
We can access the values in any column of the dataframe using the "$" operator followed by the column name, as shown below.
df$language
## [1] "R"          "Python"     "SQL"        "Java"       "JavaScript"
## [6] "Rust"       "Go"
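Because "$" returns the column as an ordinary vector, the usual vector functions can be applied to it directly. The following is a small sketch of that idea. The names `avg_salary` and `top_language` are introduced here for illustration only.

```r
# Vectors copied from the tutorial so the block runs on its own.
language <- c("R", "Python", "SQL", "Java", "JavaScript", "Rust", "Go")
salary <- c(143000, 150000, 144000, 155000, 146000, 180000, 179000)
df <- data.frame(language, salary)

# Mean of the salary column across all seven languages.
avg_salary <- mean(df$salary)
avg_salary

# Language with the highest average salary, found by logical indexing.
top_language <- df$language[df$salary == max(df$salary)]
top_language

# Double-bracket indexing by name extracts the same column as `$`.
identical(df$salary, df[["salary"]])
```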
Sometimes we would like to specify the column names ourselves. We can do so by passing each vector to data.frame() as a named argument, as shown below.
df <- data.frame(Language = language, Salary = salary)
df
##     Language Salary
## 1          R 143000
## 2     Python 150000
## 3        SQL 144000
## 4       Java 155000
## 5 JavaScript 146000
## 6       Rust 180000
## 7         Go 179000
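If the dataframe already exists, the column names can also be changed afterwards by assigning to colnames(). The sketch below compares that approach with the named-argument approach. The variable names `df_renamed` and `df_named` are assumptions made for this illustration.

```r
# Vectors copied from the tutorial so the block runs on its own.
language <- c("R", "Python", "SQL", "Java", "JavaScript", "Rust", "Go")
salary <- c(143000, 150000, 144000, 155000, 146000, 180000, 179000)

# Approach 1: create the dataframe first, then rename its columns.
df_renamed <- data.frame(language, salary)
colnames(df_renamed) <- c("Language", "Salary")

# Approach 2: name the columns while creating the dataframe.
df_named <- data.frame(Language = language, Salary = salary)

# Both approaches end up with the same column names.
identical(colnames(df_renamed), colnames(df_named))
```

Naming the columns at creation time keeps the names next to the data they describe, while assigning to colnames() is handy when the dataframe comes from somewhere else.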