Look up (and Replace) values in data frame based on lookup table (with more than one column)

I’m trying to find the most elegant (simple and concise) way of replacing the value of certain columns based on matching two columns from another data frame.

Here is the table with the columns I wish to replace (based on what values they contain).

> cost.table
  Identifier Phase.0.Difficulty Phase.1.Complexity Phase.2.Complexity Phase.3.Complexity Phase.4.Complexity Phase.5.Complexity
1        FS1                Low                Low                Low             Medium             Medium               High
2        FS2               High               High               High             Medium             Medium             Medium
3        FS3               High                Low                Low               High               High               High
4        FS4               High             Medium             Medium             Medium             Medium             Medium
5        FS5               High             Medium             Medium               High             Medium             Medium
  Phase.6.Complexity Transaction.Feasibility Approach
1               High                  Medium        B
2             Medium                  Medium        I
3               High                  Medium        B
4             Medium                  Medium        I
5             Medium                  Medium        B

And here are the lookup tables that I wish to use to find the correct replacement value.

> cost.approach.difficulty
  Approach Difficulty   Phase 0  Phase 1  Phase 2   Phase 3  Phase 4  Phase 5  Phase 6
1        B       High 18102.778 29481.67 29481.67 11822.222 30737.78 21634.67 12768.00
2        B        Low  3860.694 15978.47 11175.69  7448.000 12768.00 11467.56 11467.56
3        B     Medium  5323.694 24974.44 15184.17  9221.333 15368.89 12768.00 12768.00
4        I       High 18102.778 74184.44 29481.67 44747.111 69160.00 45249.56 32245.11
5        I        Low  3860.694 26008.89 11175.69 16551.111 35910.00 16876.22 14275.33
6        I     Medium  5323.694 41156.11 15184.17 22373.556 44776.67 23378.44 16876.22
7       RV       High 18102.778 28373.33 29481.67 44747.111 69160.00 45249.56 32245.11
8       RV        Low  3860.694 14870.14 11175.69 16551.111 44776.67 16876.22 14275.33
9       RV     Medium  5323.694 22757.78 15184.17 22373.556 44776.67 23378.44 16876.22

I’m trying to find a simple solution to look up the corresponding value in the cost.approach.difficulty table for ‘Approach’ and ‘Difficulty’.

So for example, in the cost.table, I would like to have the first row, column Phase.0.Difficulty, be replaced with 3860.694 (because it is a ‘B’ approach and low difficulty.

Does anyone have an elegant, simple solution to looking up a value based on two (or more columns) and replacing the value along multiple columns?

Thank you,

Andrew

UPDATE –

There are two proposed answers related to using merge. My goal is to find a more succinct, concise, and elegant solution. Here is the best that I’ve come up with so far:

cost.approach.difficulty$Phase.0[match(paste(cost.table$Approach, cost.table$Phase.0.Difficulty), paste(cost.approach.difficulty$Approach, cost.approach.difficulty$Difficulty))]

The problem with this solution is that I’d need to know the column names ahead of time and still seems like a hack. Anyone have a more concise solution?

##############################################################

If you want this to work for a variable number of columns, I suggests reshaping your cost table and your lookup table(s) into a more standardized format.

First, it would have been easier to answer this question if you had given your data in a reproducible format:

# Create the example data
cost.table <- data.frame(
  "Identifier" = c("FS1", "FS2",  "FS3",  "FS4",  "FS5"),
  "Phase.0.Difficulty" = c("Low", "High", "High", "High", "High"),
  "Phase.1.Complexity" = c("Low", "High", "Low", "Medium", "Medium"),
  "Phase.2.Complexity" = c("Low", "High", "Low", "Medium", "Medium"),
  "Phase.3.Complexity" = c("Medium", "Medium", "High", "Medium", "High"),
  "Phase.4.Complexity" = c("Medium", "Medium", "High", "Medium", "Medium"),
  "Phase.5.Complexity" = c("High", "Medium", "High", "Medium", "Medium"),
  "Phase.6.Complexity" = c("High", "Medium", "High", "Medium", "Medium"),
  "Transaction.Feasibility" = c("Medium", "Medium", "Medium", "Medium", "Medium"),
  "Approach" = c("B", "I", "B", "I", "B"),
  stringsAsFactors = FALSE)

cost.approach.difficulty <- data.frame(
  "Approach" = c("B", "B", "B", "I", "I", "I", "RV", "RV", "RV"),
  "Difficulty" = c("High", "Low", "Medium", "High", "Low", "Medium", "High", "Low", "Medium"),
  "Phase.0" = c(18102.778, 3860.694, 5323.694, 18102.778, 3860.694, 5323.694, 18102.778, 3860.694, 5323.694),
  "Phase.1" = c(29481.67,15978.47, 24974.44, 74184.44, 26008.89, 41156.11, 28373.33, 14870.14, 22757.78),
  "Phase.2" = c(29481.67, 11175.69, 15184.17, 29481.67, 11175.69, 15184.17, 29481.67, 11175.69, 15184.17),
  "Phase.3" = c(11822.222, 7448, 9221.333, 44747.111, 16551.111, 22373.556, 44747.111, 16551.111, 22373.556),
  "Phase.4" = c(30737.78, 12768, 15368.89, 69160, 35910, 44776.67, 69160, 44776.67, 44776.67),
  "Phase.5" = c(21634.67, 11467.56, 12768, 45249.56, 16876.22, 23378.44, 45249.56, 16876.22, 23378.44),
  "Phase.6" = c(12768, 11467.56, 12768, 32245.11, 14275.33, 16876.22, 32245.11, 14275.33, 16876.22),
  stringsAsFactors = FALSE)

Once I had recreated your example data, I used the melt.data.frame function from the reshape2package:

# Reshape the data
require(reshape2)

cost.table <- melt(cost.table, id.vars = c("Identifier", "Approach"), 
  value.name = "Size")
cost.table$Phase <- gsub("(\\w+\\.\\d+)\\.(\\w+)", "\\1", 
  as.character(cost.table$variable), perl = TRUE)
cost.table$Type <- gsub("(\\w+\\.\\d+)\\.(\\w+)", "\\2", 
  as.character(cost.table$variable), perl = TRUE)

head(cost.table)

  Identifier Approach           variable Size   Phase       Type
1        FS1        B Phase.0.Difficulty  Low Phase.0 Difficulty
2        FS2        I Phase.0.Difficulty High Phase.0 Difficulty
3        FS3        B Phase.0.Difficulty High Phase.0 Difficulty
4        FS4        I Phase.0.Difficulty High Phase.0 Difficulty
5        FS5        B Phase.0.Difficulty High Phase.0 Difficulty
6        FS1        B Phase.1.Complexity  Low Phase.1 Complexity

cost.approach.difficulty <- melt(cost.approach.difficulty, 
  id.vars = c("Difficulty", "Approach"), variable.name = "Phase")
cost.approach.difficulty$Phase <- as.character(cost.approach.difficulty$Phase)
cost.approach.difficulty$Type <- "Difficulty"
colnames(cost.approach.difficulty)[
  colnames(cost.approach.difficulty) == "Difficulty"] <- "Size"

head(cost.approach.difficulty)

    Size Approach   Phase     value       Type
1   High        B Phase.0 18102.778 Difficulty
2    Low        B Phase.0  3860.694 Difficulty
3 Medium        B Phase.0  5323.694 Difficulty
4   High        I Phase.0 18102.778 Difficulty
5    Low        I Phase.0  3860.694 Difficulty
6 Medium        I Phase.0  5323.694 Difficulty

Once the two tables are in a standardized format, you can call merge:

cost.table.filled <- merge(cost.table, cost.approach.difficulty, 
  by = c("Approach", "Size", "Phase", "Type"), all.x = TRUE, all.y = FALSE)

Then, if you didn’t look up the values for certain columns, you can re-insert the original values (otherwise you end up with a bunch of NAs):

cost.table.filled$value[is.na(cost.table.filled$value)] <- 
  cost.table.filled$Size[is.na(cost.table.filled$value)]

Then you can dcast the thing back into the original format:

cost.table.final <- dcast(cost.table.filled, Identifier + Approach ~ Phase + Type)

head(cost.table.final)

  Identifier Approach Phase.0_Difficulty Phase.1_Complexity Phase.2_Complexity Phase.3_Complexity Phase.4_Complexity Phase.5_Complexity Phase.6_Complexity Transaction.Feasibility_Transaction.Feasibility
1        FS1        B           3860.694                Low                Low             Medium             Medium               High               High                                          Medium
2        FS2        I          18102.778               High               High             Medium             Medium             Medium             Medium                                          Medium
3        FS3        B          18102.778                Low                Low               High               High               High               High                                          Medium
4        FS4        I          18102.778             Medium             Medium             Medium             Medium             Medium             Medium                                          Medium
5        FS5        B          18102.778             Medium             Medium               High             Medium             Medium             Medium                                          Medium

To replace all columns, I would melt each lookup table and then cbind them all together into a single lookup table. That way, you only have to call merge once and you don’t have to worry about replacing NAs.