Scraping Together a Recipe, Episode III
- 2018/25/02
- 8 min read
Converting to Grams
Rather than rolling our own conversion dictionary, let’s turn to the measurements package, which provides the conv_unit() function for going from one unit to another. For example, converting 12 inches to centimeters, we get:
conv_unit(12, "inch", "cm")
## [1] 30.48
Let’s see how that’ll work with our data. Grabbing the first few recipes from scratch and generating a sample_recipes_df, we begin with:
sample_recipes_df <- get_recipes(urls[1:3]) %>%
  dfize() %>%
  get_portions(pare_portion_info = TRUE) %>%
  add_abbrevs()
## Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada)
## Easy 4-Ingredient Margarita
## Blueberry and Spice Smoothie
## Number bad URLs: 0
## Number duped recipes: 0
sample_recipes_df %>%
  select(recipe_name, ingredients, portion, portion_abbrev) %>%
  slice(1:5) %>%
  kable()
recipe_name | ingredients | portion | portion_abbrev |
---|---|---|---|
Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 12 ounces chipotle cooking sauce (such a Knorr®) | 12.00 | oz |
Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 1 (14 ounce) can reduced-sodium beef broth | 14.00 | oz |
Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 1/4 cup chopped fresh cilantro (optional) | 0.25 | cup |
Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 2 tablespoons vegetable oil | 2.00 | tbsp |
Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 1 onion, thinly sliced | 1.00 | |
Let’s take our 12 oz to grams.
conv_unit(x = sample_recipes_df[1, ]$portion,
          from = sample_recipes_df[1, ]$portion_abbrev,
          to = "g")
## [1] 340.1943
Let’s see which of our units conv_unit() can successfully convert out of the box.
We’ll set up exception handling so that conv_unit() gives us an NA rather than an error if it encounters a value it can’t convert properly.
try_conv <- possibly(conv_unit, otherwise = NA)
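To make it clearer what this wrapper is doing, here is a minimal base R sketch of the same idea. Both possibly_sketch() and toy_conv() are hypothetical names made up for illustration; they are stand-ins, not the actual purrr or measurements code.

```r
# Hypothetical stand-in for purrr::possibly(), written in base R for illustration:
# it returns a new function that yields `otherwise` whenever `f` throws an error.
possibly_sketch <- function(f, otherwise) {
  function(...) {
    tryCatch(f(...), error = function(e) otherwise)
  }
}

# A toy converter (not the measurements package) that errors on unknown units.
toy_conv <- function(x, from) {
  g_per_unit <- c(oz = 28.349523125, lbs = 453.59237)
  if (!from %in% names(g_per_unit)) stop("unknown unit: ", from)
  x * g_per_unit[[from]]
}

safe_toy_conv <- possibly_sketch(toy_conv, otherwise = NA_real_)

safe_toy_conv(10, "oz")   # a number of grams
safe_toy_conv(10, "cup")  # NA instead of an error
```

The wrapped function behaves like the original when the conversion succeeds and quietly falls back to the default when it doesn’t, which is what lets us run it over a whole column without stopping at the first bad unit.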
We’ll mutate our abbreviation dictionary, adding new columns that convert to grams if our unit is a weight or to milliliters if it’s a volume. We’ll treat these as a 1-to-1 conversion (1 g ≈ 1 ml, which holds for water but is only an approximation for other ingredients), so we’ll take whichever one of these is not a missing value and put that in our converted column.
We’ll use a sample value of 10 for everything.
test_abbrev_dict_conv <- function(dict, key_col, val = 10) {
  quo_col <- enquo(key_col)
  out <- dict %>%
    rowwise() %>%
    mutate(
      converted_g = try_conv(val, !!quo_col, "g"),
      converted_ml = try_conv(val, !!quo_col, "ml"),
      converted = case_when(
        !is.na(converted_g) ~ converted_g,
        !is.na(converted_ml) ~ converted_ml
      )
    )
  return(out)
}
test_abbrev_dict_conv(abbrev_dict, key)
## Source: local data frame [14 x 5]
## Groups: <by row>
##
## # A tibble: 14 x 5
## name key converted_g converted_ml converted
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 ounce oz 283. NA 283.
## 2 pint pt NA NA NA
## 3 pound lb NA NA NA
## 4 kilogram kg 10000. NA 10000.
## 5 gram g 10.0 NA 10.0
## 6 liter l NA 10000. 10000.
## 7 deciliter dl NA 1000. 1000.
## 8 milliliter ml NA 10. 10.0
## 9 tablespoon tbsp NA NA NA
## 10 teaspoon tsp NA NA NA
## 11 fluid ounce fluid oz NA NA NA
## 12 gallon gal NA NA NA
## 13 quart qt NA NA NA
## 14 cup cup NA NA NA
What proportion of the portion abbreviations are we able to convert to grams or milliliters off the bat?
converted_units <- test_abbrev_dict_conv(abbrev_dict, key)
mean(!is.na(converted_units$converted))
## [1] 0.4285714
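To spell out the two steps behind that number, here is a small base R sketch on made-up stand-in vectors (not the real dictionary): first pick grams when present and fall back to milliliters, then compute the share of rows that ended up with a value.

```r
# Made-up stand-ins for the converted_g and converted_ml columns
converted_g  <- c(283.495, NA, 10000, NA, NA)
converted_ml <- c(NA, NA, NA, 10000, NA)

# Take grams when present, otherwise fall back to milliliters
converted <- ifelse(!is.na(converted_g), converted_g, converted_ml)

# Proportion of rows with a usable conversion:
# the mean of a logical vector is the share of TRUE values
prop_converted <- mean(!is.na(converted))
prop_converted
```

Rows where both attempts failed stay NA, so they count against the proportion.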
We can take a look at the units that measurements provides conversions for to see if we’ll need to go elsewhere to do the conversion math ourselves.
conv_unit_options$volume
## [1] "ul" "ml" "dl" "l" "cm3"
## [6] "dm3" "m3" "km3" "us_tsp" "us_tbsp"
## [11] "us_oz" "us_cup" "us_pint" "us_quart" "us_gal"
## [16] "inch3" "ft3" "mi3" "imp_tsp" "imp_tbsp"
## [21] "imp_oz" "imp_cup" "imp_pint" "imp_quart" "imp_gal"
This explains why pint, cup, etc. weren’t convertible. It looks like we need to put the prefix "us_" before some of our units (and spell pound as lbs). We’ll add a new accepted column to our abbreviation dictionary that provides the convertible versions of our units.
abbrev_dict_w_accepted
## # A tibble: 14 x 3
## name key accepted
## <chr> <chr> <chr>
## 1 ounce oz oz
## 2 pint pt us_pint
## 3 pound lb lbs
## 4 kilogram kg kg
## 5 gram g g
## 6 liter l l
## 7 deciliter dl dl
## 8 milliliter ml ml
## 9 tablespoon tbsp us_tbsp
## 10 teaspoon tsp us_tsp
## 11 fluid ounce fluid oz oz
## 12 gallon gal us_gal
## 13 quart qt us_quart
## 14 cup cup us_cup
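The code that builds this dictionary isn’t shown here, so the following is only a sketch of one way such a column might be added, using a cut-down, hypothetical abbrev_dict_small and a lookup vector whose values are copied from the table above. It uses base R to stay self-contained.

```r
# A cut-down, hypothetical version of the abbreviation dictionary
abbrev_dict_small <- data.frame(
  name = c("ounce", "pint", "pound", "cup"),
  key  = c("oz", "pt", "lb", "cup"),
  stringsAsFactors = FALSE
)

# Keys whose spelling differs from what conv_unit() accepts
# (values taken from the accepted column shown above)
accepted_lookup <- c(pt = "us_pint", lb = "lbs", cup = "us_cup")

# Use the lookup where there is an entry; otherwise keep the key as-is
mapped <- unname(accepted_lookup[abbrev_dict_small$key])
abbrev_dict_small$accepted <- ifelse(is.na(mapped), abbrev_dict_small$key, mapped)
abbrev_dict_small
```

Keeping the exceptions in one named vector means a new unit only needs an entry when its key and its accepted spelling differ.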
What percentage of units are we able to convert now?
test_abbrev_dict_conv(abbrev_dict_w_accepted, accepted)
## Source: local data frame [14 x 6]
## Groups: <by row>
##
## # A tibble: 14 x 6
## name key accepted converted_g converted_ml converted
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 ounce oz oz 283. NA 283.
## 2 pint pt us_pint NA 4732. 4732.
## 3 pound lb lbs 4536. NA 4536.
## 4 kilogram kg kg 10000. NA 10000.
## 5 gram g g 10.0 NA 10.0
## 6 liter l l NA 10000. 10000.
## 7 deciliter dl dl NA 1000. 1000.
## 8 milliliter ml ml NA 10.0 10.0
## 9 tablespoon tbsp us_tbsp NA 148. 148.
## 10 teaspoon tsp us_tsp NA 49.3 49.3
## 11 fluid ounce fluid oz oz 283. NA 283.
## 12 gallon gal us_gal NA 37854. 37854.
## 13 quart qt us_quart NA 9464. 9464.
## 14 cup cup us_cup NA 2366. 2366.
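Looks like all of them! Good stuff. One caveat: fluid ounce is mapped to oz here, so it gets converted as a weight in grams rather than as a volume via us_oz.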
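As a sanity check on those numbers, we can redo the arithmetic by hand. This sketch hard-codes the standard US customary definitions (milliliters or grams per unit) rather than calling measurements, so the vector names are just labels chosen to match the accepted column.

```r
# US customary definitions: milliliters per unit and grams per unit
ml_per_unit <- c(us_tsp = 4.92892159375, us_tbsp = 14.78676478125,
                 us_cup = 236.5882365, us_pint = 473.176473,
                 us_quart = 946.352946, us_gal = 3785.411784)
g_per_unit <- c(oz = 28.349523125, lbs = 453.59237)

val <- 10  # same sample value used in the dictionary test

# Multiply through and round for comparison with the converted column
round(val * ml_per_unit, 1)
round(val * g_per_unit, 1)
```

Rounded to whole numbers, these line up with the converted column in the table above.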
Let’s write a function to convert units for our real dataframe.
convert_units <- function(df, name_col = accepted, val_col = portion,
                          pare_down = TRUE, round_to = 2) {
  # Capture the bare column names so they can be unquoted inside mutate()
  quo_name_col <- enquo(name_col)
  quo_val_col <- enquo(val_col)
  out <- df %>%
    rowwise() %>%
    mutate(
      # Try a weight and a volume conversion; each is NA if it fails
      converted_g = try_conv(!!quo_val_col, !!quo_name_col, "g") %>% round(digits = round_to),
      converted_ml = try_conv(!!quo_val_col, !!quo_name_col, "ml") %>% round(digits = round_to),
      # Keep whichever conversion succeeded, or NA if neither did
      converted = case_when(
        !is.na(converted_g) ~ as.numeric(converted_g),
        !is.na(converted_ml) ~ as.numeric(converted_ml),
        TRUE ~ NA_real_
      )
    )
  # Optionally drop the intermediate columns
  if (pare_down) {
    out <- out %>%
      select(-converted_g, -converted_ml)
  }
  return(out)
}
Next let’s add an accepted column onto our dataframe to get our units in the right format and run our function.
sample_recipes_df %>%
  left_join(abbrev_dict_w_accepted, by = c("portion_abbrev" = "key")) %>%
  sample_n(10) %>%
  convert_units() %>%
  kable(format = "html")
ingredients | recipe_name | raw_portion_num | portion_name | approximate | portion | portion_abbrev | name | accepted | converted |
---|---|---|---|---|---|---|---|---|---|
1 onion, thinly sliced | Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 1 | | FALSE | 1.0 | | NA | NA | NA |
4 bolillo rolls, halved and lightly toasted | Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 4 | | FALSE | 4.0 | | NA | NA | NA |
1 cup low-fat vanilla yogurt | Blueberry and Spice Smoothie | 1 | cup | FALSE | 1.0 | cup | cup | us_cup | 236.59 |
3 cloves garlic, minced | Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 3 | | FALSE | 3.0 | | NA | NA | NA |
1 teaspoon ground cinnamon | Blueberry and Spice Smoothie | 1 | teaspoon | FALSE | 1.0 | tsp | teaspoon | us_tsp | 4.93 |
2 tablespoons vegetable oil | Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 2 | tablespoon | FALSE | 2.0 | tbsp | tablespoon | us_tbsp | 29.57 |
4 sprigs fresh cilantro, or to taste (optional) | Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 4 | | TRUE | 4.0 | | NA | NA | NA |
1/2 lemon, juiced | Easy 4-Ingredient Margarita | 1/2 | | FALSE | 0.5 | | NA | NA | NA |
1 pound thinly sliced deli roast beef | Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 1 | pound | FALSE | 1.0 | lb | pound | lbs | 453.59 |
1 (14 ounce) can reduced-sodium beef broth | Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 1, 14 | ounce | FALSE | 14.0 | oz | ounce | oz | 396.89 |
All the data
Let’s put it all together, scraping all of our URLs and doing all of the munging in order.
recipes_raw <- more_urls %>% get_recipes(sleep = 3)
recipes <- recipes_raw[recipes_raw != "Bad URL"]
recipes_df <- recipes %>%
  dfize() %>%
  get_portions() %>%
  add_abbrevs() %>%
  left_join(abbrev_dict_w_accepted, by = c("portion_abbrev" = "key")) %>%
  convert_units()
set.seed(1234)
recipes_df %>%
  sample_n(20) %>%
  select(ingredients, recipe_name, portion, portion_abbrev, converted) %>%
  kable(format = "html")
ingredients | recipe_name | portion | portion_abbrev | converted |
---|---|---|---|---|
1 (14.5 ounce) can diced tomatoes | No Ordinary Meatloaf | 14.5000000 | oz | 411.068085 |
2 eggs | Blueberry Banana Nut Bread | 2.0000000 | | NA |
1/2 cup KRAFT LIGHT DONE RIGHT! Raspberry Vinaigrette Reduced Fat Dressing | Chicken and Citrus Salad | 0.5000000 | cup | 118.294118 |
2 tablespoons cold butter | Blueberry Banana Nut Bread | 2.0000000 | tbsp | 29.573530 |
1 cup superfine sugar | Banana Coffee Cake with Pecans | 1.0000000 | cup | 236.588236 |
1 onion, chopped | Claire’s Curried Butternut Squash Soup | 1.0000000 | | NA |
2/3 cup milk | Cabbage Cakes | 0.6666667 | cup | 157.725491 |
1/2 cup red salsa | Chorizo, Potato and Green Chile Omelet | 0.5000000 | cup | 118.294118 |
1/2 cup raisins | Fudge Drops | 0.5000000 | cup | 118.294118 |
1/4 cup brown sugar | Mango Chicken with Greens | 0.2500000 | cup | 59.147059 |
4 tablespoons Safeway SELECT Verdi Olive Oil | Bistro Beef Salad | 4.0000000 | tbsp | 59.147059 |
1/4 cup canola oil | Hungarian Salad | 0.2500000 | cup | 59.147059 |
4 ounces PHILADELPHIA Neufchatel Cheese, 1/3 Less Fat than Cream Cheese | Creamy Two-Layer Pumpkin Pie | 4.0000000 | oz | 113.398093 |
1 tablespoon milk | Caramel Pear Crumble | 1.0000000 | tbsp | 14.786765 |
3 tablespoons Worcestershire sauce | Traveling Oven-Barbecued Baby Back Ribs | 3.0000000 | tbsp | 44.360294 |
1 teaspoon crushed red pepper flakes | Artichoke and Shrimp Linguine | 1.0000000 | tsp | 4.928922 |
1 tablespoon freshly ground black pepper | African Chicken in Spicy Red Sauce | 1.0000000 | tbsp | 14.786765 |
2 cups sugar snap peas, trimmed | Sugar Snap Salad | 2.0000000 | cup | 473.176473 |
1 teaspoon canola oil | Apples ‘n’ Onion Topped Chops | 1.0000000 | tsp | 4.928922 |
1 (8 ounce) package KRAFT Mexican Style Shredded Four Cheese with a Touch of PHILADELPHIA, divided | Chorizo, Potato and Green Chile Omelet | 8.0000000 | oz | 226.796185 |
We now have usable data! 🙌
What can we discern from it? What types of foods tend to co-occur and in what proportions? Can we visualize a network of foods interacting with each other across all of these recipes found in the wild?