Scraping Together a Recipe, Episode III

Converting to Grams

Rather than rolling our own conversion dictionary, let’s turn to the measurements package, which sports the conv_unit() function for going from one unit to another. For example, converting 12 inches to centimeters, we get:

conv_unit(12, "inch", "cm")
## [1] 30.48

Let’s see how that’ll work with our data. Grabbing the first few recipes from scratch and generating a sample_recipes_df, we begin with

sample_recipes_df <- get_recipes(urls[1:3]) %>% 
  dfize() %>% 
  get_portions(pare_portion_info = TRUE) %>% 
  add_abbrevs()
## Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada)
## Easy 4-Ingredient Margarita
## Blueberry and Spice Smoothie
## Number bad URLs: 0
## Number duped recipes: 0
sample_recipes_df %>% 
  select(recipe_name, ingredients, portion, portion_abbrev) %>% 
  slice(1:5) %>% 
  kable()
| recipe_name | ingredients | portion | portion_abbrev |
|---|---|---|---|
| Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 12 ounces chipotle cooking sauce (such a Knorr®) | 12.00 | oz |
| Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 1 (14 ounce) can reduced-sodium beef broth | 14.00 | oz |
| Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 1/4 cup chopped fresh cilantro (optional) | 0.25 | cup |
| Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 2 tablespoons vegetable oil | 2.00 | tbsp |
| Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 1 onion, thinly sliced | 1.00 | |

Let’s take our 12 oz to grams.

conv_unit(x = sample_recipes_df[1, ]$portion, 
          from = sample_recipes_df[1, ]$portion_abbrev, 
          to = "g")
## [1] 340.1943
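
As a quick sanity check on that number, here is the same arithmetic by hand. This is only a sketch: the constant is the standard definition of the avoirdupois ounce (28.349523125 g), and the variable names are ours, not something measurements provides.

```r
# Grams per avoirdupois ounce (exact by definition); the name is our own
grams_per_oz <- 28.349523125

# The 12 oz of chipotle cooking sauce, converted by hand
by_hand_g <- 12 * grams_per_oz
round(by_hand_g, 4)
```

That should agree with the 340.1943 that conv_unit() gave us.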

Let’s see which of our units conv_unit() can successfully convert out of the box.

We’ll set up exception handling so that conv_unit() gives us an NA rather than an error if it encounters a value it can’t convert properly.

try_conv <- possibly(conv_unit, otherwise = NA)
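
If possibly() is new to you, it behaves roughly like a tryCatch() wrapper. Here is a base R sketch of the same idea. safe_wrap() and picky_sqrt() are made-up names for illustration; they are not part of purrr or measurements.

```r
# Wrap a function so that any error becomes a fallback value instead
safe_wrap <- function(f, otherwise = NA) {
  function(...) {
    tryCatch(f(...), error = function(e) otherwise)
  }
}

# A toy function that errors on input it can't handle
picky_sqrt <- function(x) {
  if (x < 0) stop("negative input")
  sqrt(x)
}

try_sqrt <- safe_wrap(picky_sqrt)
try_sqrt(9)   # the wrapped function's usual result
try_sqrt(-1)  # the fallback (NA) rather than an error
```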

We’ll mutate our abbreviation dictionary, adding a new column that converts to grams if our unit is a solid or to milliliters if it’s a liquid. We’ll treat these as 1-to-1 (1 g = 1 ml), which is only exact for water but is a workable approximation for our purposes, so we’ll take whichever one of these is not a missing value and put that in our converted column.

We’ll use a sample value of 10 for everything.

test_abbrev_dict_conv <- function(dict, key_col, val = 10) {
  
  quo_col <- enquo(key_col)
  
  out <- dict %>% 
    rowwise() %>% 
    mutate(
      converted_g = try_conv(val, !!quo_col, "g"),
      converted_ml = try_conv(val, !!quo_col, "ml"),
      converted = case_when(
        !is.na(converted_g) ~ converted_g,
        !is.na(converted_ml) ~ converted_ml
      )
    )
  
  return(out)
}
test_abbrev_dict_conv(abbrev_dict, key)
## Source: local data frame [14 x 5]
## Groups: <by row>
## 
## # A tibble: 14 x 5
##    name        key      converted_g converted_ml converted
##    <chr>       <chr>          <dbl>        <dbl>     <dbl>
##  1 ounce       oz             283.           NA      283. 
##  2 pint        pt              NA            NA       NA  
##  3 pound       lb              NA            NA       NA  
##  4 kilogram    kg           10000.           NA    10000. 
##  5 gram        g               10.0          NA       10.0
##  6 liter       l               NA         10000.   10000. 
##  7 deciliter   dl              NA          1000.    1000. 
##  8 milliliter  ml              NA            10.      10.0
##  9 tablespoon  tbsp            NA            NA       NA  
## 10 teaspoon    tsp             NA            NA       NA  
## 11 fluid ounce fluid oz        NA            NA       NA  
## 12 gallon      gal             NA            NA       NA  
## 13 quart       qt              NA            NA       NA  
## 14 cup         cup             NA            NA       NA
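
The case_when() inside test_abbrev_dict_conv() simply picks the first non-missing value out of the two conversions. dplyr’s coalesce() expresses the same idea in a single call. A small sketch, with made-up vectors standing in for the converted_g and converted_ml columns:

```r
library(dplyr)

# Toy stand-ins for the two conversion columns (values are illustrative)
toy_g  <- c(283.5, NA, NA)
toy_ml <- c(NA, 10000, NA)

# Take the gram value where present, else the milliliter value, else NA
toy_converted <- coalesce(toy_g, toy_ml)
toy_converted
```

Whether to swap this in is a matter of taste; case_when() makes the order of preference a little more explicit.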

What proportion of the portion abbreviations are we able to convert to grams off the bat?

converted_units <- test_abbrev_dict_conv(abbrev_dict, key)
mean(!is.na(converted_units$converted))
## [1] 0.4285714

We can take a look at the units that measurements provides conversions for to see if we’ll need to go elsewhere to do the conversion math ourselves.

conv_unit_options$volume
##  [1] "ul"        "ml"        "dl"        "l"         "cm3"      
##  [6] "dm3"       "m3"        "km3"       "us_tsp"    "us_tbsp"  
## [11] "us_oz"     "us_cup"    "us_pint"   "us_quart"  "us_gal"   
## [16] "inch3"     "ft3"       "mi3"       "imp_tsp"   "imp_tbsp" 
## [21] "imp_oz"    "imp_cup"   "imp_pint"  "imp_quart" "imp_gal"

This explains why pint, cup, etc. weren’t convertible. It looks like we need to put the prefix "us_" before some of our units. We’ll add a new accepted column to our abbreviation dictionary that provides the convertible versions of our units.

abbrev_dict_w_accepted 
## # A tibble: 14 x 3
##           name      key accepted
##          <chr>    <chr>    <chr>
##  1       ounce       oz       oz
##  2        pint       pt  us_pint
##  3       pound       lb      lbs
##  4    kilogram       kg       kg
##  5        gram        g        g
##  6       liter        l        l
##  7   deciliter       dl       dl
##  8  milliliter       ml       ml
##  9  tablespoon     tbsp  us_tbsp
## 10    teaspoon      tsp   us_tsp
## 11 fluid ounce fluid oz       oz
## 12      gallon      gal   us_gal
## 13       quart       qt us_quart
## 14         cup      cup   us_cup
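
The code that builds abbrev_dict_w_accepted isn’t shown here, so this is one way it could be done. Treat it as a sketch rather than the original implementation: it assumes the dictionary has the name and key columns printed earlier, and dict_sketch and accepted_lookup are names we made up.

```r
# Stand-in for abbrev_dict, rebuilt from the names and keys printed above
dict_sketch <- data.frame(
  name = c("ounce", "pint", "pound", "kilogram", "gram", "liter", "deciliter",
           "milliliter", "tablespoon", "teaspoon", "fluid ounce", "gallon",
           "quart", "cup"),
  key = c("oz", "pt", "lb", "kg", "g", "l", "dl", "ml", "tbsp", "tsp",
          "fluid oz", "gal", "qt", "cup"),
  stringsAsFactors = FALSE
)

# Only keys that conv_unit() doesn't accept as-is need a replacement
# (fluid oz is mapped to oz here to match the table above)
accepted_lookup <- c(
  pt = "us_pint", lb = "lbs", tbsp = "us_tbsp", tsp = "us_tsp",
  "fluid oz" = "oz", gal = "us_gal", qt = "us_quart", cup = "us_cup"
)

# Use the replacement where one exists, otherwise keep the key unchanged
needs_swap <- dict_sketch$key %in% names(accepted_lookup)
dict_sketch$accepted <- ifelse(
  needs_swap,
  unname(accepted_lookup[dict_sketch$key]),
  dict_sketch$key
)
```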

What percentage of units are we able to convert now?

test_abbrev_dict_conv(abbrev_dict_w_accepted, accepted)
## Source: local data frame [14 x 6]
## Groups: <by row>
## 
## # A tibble: 14 x 6
##    name        key      accepted converted_g converted_ml converted
##    <chr>       <chr>    <chr>          <dbl>        <dbl>     <dbl>
##  1 ounce       oz       oz             283.          NA       283. 
##  2 pint        pt       us_pint         NA         4732.     4732. 
##  3 pound       lb       lbs           4536.          NA      4536. 
##  4 kilogram    kg       kg           10000.          NA     10000. 
##  5 gram        g        g               10.0         NA        10.0
##  6 liter       l        l               NA        10000.    10000. 
##  7 deciliter   dl       dl              NA         1000.     1000. 
##  8 milliliter  ml       ml              NA           10.0      10.0
##  9 tablespoon  tbsp     us_tbsp         NA          148.      148. 
## 10 teaspoon    tsp      us_tsp          NA           49.3      49.3
## 11 fluid ounce fluid oz oz             283.          NA       283. 
## 12 gallon      gal      us_gal          NA        37854.    37854. 
## 13 quart       qt       us_quart        NA         9464.     9464. 
## 14 cup         cup      us_cup          NA         2366.     2366.

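Looks like all of them! Good stuff. One caveat: fluid ounce is mapped to oz, so it gets converted as a weight; us_oz from the volume list would be the volume equivalent.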

Let’s write a function to convert units for our real dataframe.

convert_units <- function(df, name_col = accepted, val_col = portion,
                          pare_down = TRUE, round_to = 2) {
  
  quo_name_col <- enquo(name_col)
  quo_val_col <- enquo(val_col)
  
  out <- df %>% 
    rowwise() %>% 
    mutate(
      converted_g = try_conv(!!quo_val_col, !!quo_name_col, "g") %>% round(digits = round_to),
      converted_ml = try_conv(!!quo_val_col, !!quo_name_col, "ml") %>% round(digits = round_to), 
      converted = case_when(
        !is.na(converted_g) ~ as.numeric(converted_g), 
        !is.na(converted_ml) ~ as.numeric(converted_ml), 
        is.na(converted_g) && is.na(converted_ml) ~ NA_real_ 
      ) %>% round(digits = round_to)
    ) 
  
  if (pare_down == TRUE) {
    out <- out %>% 
      select(-converted_g, -converted_ml)
  }
  
  return(out)
}

Next let’s add an accepted column onto our dataframe to get our units in the right format and run our function.

sample_recipes_df %>% 
  left_join(abbrev_dict_w_accepted, by = c("portion_abbrev" = "key")) %>% 
  sample_n(10) %>%
  convert_units() %>% 
  kable(format = "html")
| ingredients | recipe_name | raw_portion_num | portion_name | approximate | portion | portion_abbrev | name | accepted | converted |
|---|---|---|---|---|---|---|---|---|---|
| 1 onion, thinly sliced | Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 1 | | FALSE | 1.0 | | NA | NA | NA |
| 4 bolillo rolls, halved and lightly toasted | Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 4 | | FALSE | 4.0 | | NA | NA | NA |
| 1 cup low-fat vanilla yogurt | Blueberry and Spice Smoothie | 1 | cup | FALSE | 1.0 | cup | cup | us_cup | 236.59 |
| 3 cloves garlic, minced | Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 3 | | FALSE | 3.0 | | NA | NA | NA |
| 1 teaspoon ground cinnamon | Blueberry and Spice Smoothie | 1 | teaspoon | FALSE | 1.0 | tsp | teaspoon | us_tsp | 4.93 |
| 2 tablespoons vegetable oil | Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 2 | tablespoon | FALSE | 2.0 | tbsp | tablespoon | us_tbsp | 29.57 |
| 4 sprigs fresh cilantro, or to taste (optional) | Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 4 | | TRUE | 4.0 | | NA | NA | NA |
| 1/2 lemon, juiced | Easy 4-Ingredient Margarita | 1/2 | | FALSE | 0.5 | | NA | NA | NA |
| 1 pound thinly sliced deli roast beef | Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 1 | pound | FALSE | 1.0 | lb | pound | lbs | 453.59 |
| 1 (14 ounce) can reduced-sodium beef broth | Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 1, 14 | ounce | FALSE | 14.0 | oz | ounce | oz | 396.89 |
All the data

Let’s put it all together, scraping all of our URLs and doing all of the munging in order.

recipes_raw <- more_urls %>% get_recipes(sleep = 3)
recipes <- recipes_raw[!recipes_raw == "Bad URL"]

recipes_df <- recipes %>% 
  dfize() %>% 
  get_portions() %>% 
  add_abbrevs() %>% 
  left_join(abbrev_dict_w_accepted, by = c("portion_abbrev" = "key")) %>% 
  convert_units()
set.seed(1234)

recipes_df %>% 
  sample_n(20) %>%
  select(ingredients, recipe_name, portion, portion_abbrev, converted) %>% 
  kable(format = "html")
| ingredients | recipe_name | portion | portion_abbrev | converted |
|---|---|---|---|---|
| 1 (14.5 ounce) can diced tomatoes | No Ordinary Meatloaf | 14.5000000 | oz | 411.068085 |
| 2 eggs | Blueberry Banana Nut Bread | 2.0000000 | | NA |
| 1/2 cup KRAFT LIGHT DONE RIGHT! Raspberry Vinaigrette Reduced Fat Dressing | Chicken and Citrus Salad | 0.5000000 | cup | 118.294118 |
| 2 tablespoons cold butter | Blueberry Banana Nut Bread | 2.0000000 | tbsp | 29.573530 |
| 1 cup superfine sugar | Banana Coffee Cake with Pecans | 1.0000000 | cup | 236.588236 |
| 1 onion, chopped | Claire’s Curried Butternut Squash Soup | 1.0000000 | | NA |
| 2/3 cup milk | Cabbage Cakes | 0.6666667 | cup | 157.725491 |
| 1/2 cup red salsa | Chorizo, Potato and Green Chile Omelet | 0.5000000 | cup | 118.294118 |
| 1/2 cup raisins | Fudge Drops | 0.5000000 | cup | 118.294118 |
| 1/4 cup brown sugar | Mango Chicken with Greens | 0.2500000 | cup | 59.147059 |
| 4 tablespoons Safeway SELECT Verdi Olive Oil | Bistro Beef Salad | 4.0000000 | tbsp | 59.147059 |
| 1/4 cup canola oil | Hungarian Salad | 0.2500000 | cup | 59.147059 |
| 4 ounces PHILADELPHIA Neufchatel Cheese, 1/3 Less Fat than Cream Cheese | Creamy Two-Layer Pumpkin Pie | 4.0000000 | oz | 113.398093 |
| 1 tablespoon milk | Caramel Pear Crumble | 1.0000000 | tbsp | 14.786765 |
| 3 tablespoons Worcestershire sauce | Traveling Oven-Barbecued Baby Back Ribs | 3.0000000 | tbsp | 44.360294 |
| 1 teaspoon crushed red pepper flakes | Artichoke and Shrimp Linguine | 1.0000000 | tsp | 4.928922 |
| 1 tablespoon freshly ground black pepper | African Chicken in Spicy Red Sauce | 1.0000000 | tbsp | 14.786765 |
| 2 cups sugar snap peas, trimmed | Sugar Snap Salad | 2.0000000 | cup | 473.176473 |
| 1 teaspoon canola oil | Apples ‘n’ Onion Topped Chops | 1.0000000 | tsp | 4.928922 |
| 1 (8 ounce) package KRAFT Mexican Style Shredded Four Cheese with a Touch of PHILADELPHIA, divided | Chorizo, Potato and Green Chile Omelet | 8.0000000 | oz | 226.796185 |

We now have usable data! 🙌

What can we discern from it? What types of foods tend to co-occur and in what proportions? Can we visualize a network of foods interacting with each other across all of these recipes found in the wild?