Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science

dplyr rows_update(): Modify existing rows

rstats101 · September 20, 2024 ·

In this post, we will learn how to use dplyr’s rows_update() function with examples. rows_update() modifies specific rows in a data frame based on matching values in one or more key columns. It takes two dataframes, x and y, and updates existing rows in the target dataframe x with values from y wherever the keys match.

library(tidyverse)
packageVersion("dplyr")

[1] '1.1.4'

The basic syntax of dplyr’s rows_update() function is given below. It takes two dataframes as input, along with several optional arguments.

rows_update(
  x,
  y,
  by = NULL,
  ...,
  unmatched = c("error", "ignore"),
  copy = FALSE,
  in_place = FALSE
)
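
Of the optional arguments, unmatched is probably the one most worth trying out early. It controls what happens when y contains a key that does not exist in x. The sketch below uses small hypothetical dataframes (x and y with an id column, not the ones used in the rest of this post) to compare the default behaviour with unmatched = "ignore".

```r
library(dplyr)

# Hypothetical data, for illustration only
x <- tibble(id = 1:3, value = c(10, 20, 30))
y <- tibble(id = c(2L, 4L), value = c(99, 77))  # id 4 does not exist in x

# Default (unmatched = "error"): the unmatched key 4 raises an error,
# which is caught here and turned into a simple marker string
result_default <- tryCatch(
  rows_update(x, y, by = "id"),
  error = function(e) "error"
)

# unmatched = "ignore": id 2 is updated, the row with id 4 is skipped
result_ignore <- rows_update(x, y, by = "id", unmatched = "ignore")
result_ignore
```

With "ignore", rows_update() never adds rows to x. It only skips the rows of y that have no match.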

Let us create two small dataframes. The first dataframe is our target dataframe, in which we want to update some rows.

# Original data frame
df1 <- tibble(
  student_id = 1:5,
  name = c("Alice", "Bob", "Charlie", "Liz", "Sam"),
  score = c(85, 90, 88, 92, 89)
)

df1

# A tibble: 5 × 3
  student_id name    score
       <int> <chr>   <dbl>
1          1 Alice      85
2          2 Bob        90
3          3 Charlie    88
4          4 Liz        92
5          5 Sam        89

The second dataframe contains the new values, along with a key column (student_id) that identifies which rows we want to update.

# Data frame with updated scores for some student_ids
df2 <- tibble(
  student_id = c(1, 5),
  score = c(100, 98)  # updated scores for Alice and Sam
)

df2

# A tibble: 2 × 2
  student_id score
       <dbl> <dbl>
1          1   100
2          5    98

In this example, we want to update the scores of two students. Originally, students 1 and 5 had scores of 85 and 89 (first dataframe). We now want to update them to 100 and 98 (second dataframe).

df1 |>
  rows_update(df2)

Matching, by = "student_id"
# A tibble: 5 × 3
  student_id name    score
       <int> <chr>   <dbl>
1          1 Alice     100
2          2 Bob        90
3          3 Charlie    88
4          4 Liz        92
5          5 Sam        98

dplyr’s rows_update() picks the key column automatically (by default it uses the first column of y, as the “Matching, by” message above shows), but we can also specify it explicitly using the “by” argument as shown below.

df1 |>
  rows_update(df2, by="student_id")

# A tibble: 5 × 3
  student_id name    score
       <int> <chr>   <dbl>
1          1 Alice     100
2          2 Bob        90
3          3 Charlie    88
4          4 Liz        92
5          5 Sam        98
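
The “by” argument also accepts more than one column name, which helps when a single column does not identify a row uniquely. Here is a minimal sketch, assuming a hypothetical scores dataframe (not part of the example above) in which a row is identified by student_id and subject together.

```r
library(dplyr)

# Hypothetical data: one row per student and subject
scores <- tibble(
  student_id = c(1, 1, 2, 2),
  subject = c("math", "art", "math", "art"),
  score = c(85, 70, 90, 95)
)

# New score for student 1 in art only
updates <- tibble(
  student_id = 1,
  subject = "art",
  score = 75
)

# Both columns together form the key
updated <- scores |>
  rows_update(updates, by = c("student_id", "subject"))
updated
```

Only the row where both student_id and subject match is changed. Student 1's math score is left alone.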

Note that the columns of the second dataframe y must be the same as, or a subset of, the columns of x. If y contains a column that is not present in x, rows_update() will throw an error. Here is an example showing that.

# Data frame with updated scores for some student_ids
df2_new <- tibble(
  student_id = c(1, 5),
  score = c(100, 98),
  grade = c("A", "A") # extra column not present in the first dataframe
)

df2_new

# A tibble: 2 × 3
  student_id score grade
       <dbl> <dbl> <chr>
1          1   100 A    
2          5    98 A    

df1 |>
  rows_update(df2_new, by="student_id")

Error in `rows_update()`:
! All columns in `y` must exist in `x`.
ℹ The following columns only exist in `y`: `grade`.
Backtrace:
 1. dplyr::rows_update(df1, df2_new, by = "student_id")
 2. dplyr:::rows_update.data.frame(df1, df2_new, by = "student_id")
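
One possible way around this error, if the extra column is not needed in the result, is to drop it from y before updating. The sketch below recreates the two dataframes from this example and uses select() with any_of() to keep only the columns of df2_new that also exist in df1. The name fixed is just an illustrative variable name.

```r
library(dplyr)

df1 <- tibble(
  student_id = 1:5,
  name = c("Alice", "Bob", "Charlie", "Liz", "Sam"),
  score = c(85, 90, 88, 92, 89)
)

df2_new <- tibble(
  student_id = c(1, 5),
  score = c(100, 98),
  grade = c("A", "A")  # extra column not present in df1
)

# Keep only the columns of df2_new that also exist in df1, then update
fixed <- df1 |>
  rows_update(
    df2_new |> select(any_of(names(df1))),
    by = "student_id"
  )
fixed
```

This simply discards grade. If the new column should end up in the result, a join would be the more natural tool than rows_update().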
