R: To prioritize rows with full records when there are multiple occurrences with the same value in column A to retain them
(Edited)
Given a data frame with certain rows having missing data in certain columns, though they belong to the same with similar A values, I want to prioritize rows with full records when there are multiple occurrences with the same value in column A to retain them.
# Create a sample dataframe
df <- data.frame(
A = c(1, 1, 2, 2, 3, 3, 4, 4),
B = c("a", "a", "b", "b", "c", "c", "d", "e"),
C = c("z", "z", "y", "y", "x", "x", "w", "v"),
D = c(6, NA, 7, NA, 8, 8, NA, 10),
E = c("f", NA, "g", NA, "h", "h", NA, "j")
)
#filter it this way
df_filtered <- df %>%
group_by(A) %>%
arrange(rowSums(is.na(.))) %>%
slice(1) %>%
ungroup()
The filtering process is as follows:
Group by column A.
For each group:
a. First, arrange by the number of NA values in descending order (so rows with fewer NA values come first). arrange(rowSums(is.na(.)))
b. Then, slice to pick the first row.
Finally, ungroup.
0
0
0.000
Congratulations @snippets! You have completed the following achievement on the Hive blockchain And have been rewarded with New badge(s)
Your next target is to reach 50 upvotes.
You can view your badges on your board and compare yourself to others in the Ranking
If you no longer want to receive notifications, reply to this comment with the word
STOP