Food for Thought II -- Building and Testing

Building Menus

Now that we’ve got our data, on to building a daily menu. The only constraint we’ll worry about for now is that menus have to contain at least 2300 calories. Our strategy is simple; pick one serving of a food at random from our dataset and, if it doesn’t yet exist in our menu, add it. We do this until we’re no longer under 2300 calories.

That’s implemented in add_calories() below, which we’ll as a helper inside the main building function, build_menu(). The reason I’ve spun add_calories() out into its own function is so that we can easily add more foods to existing menus. Unlike build_menu() which takes a dataframe of possible foods to choose from as its first argument, add_calories() takes menu as its first argument, making it convenient to pipe a menu in that needs more calories.

add_calories <- function(menu = NULL, df = abbrev, seed = NULL, ...) {

  # If we're starting from an existing menu
  if (! is.null(menu)) {
    menu <- menu %>% drop_na_(all_nut_and_mr_df$nutrient) %>%  filter(!(is.na(Energ_Kcal)) & !(is.na(GmWt_1)))
    cals <- sum((menu$Energ_Kcal * menu$GmWt_1), na.rm = TRUE)/100   # Set calories to our current number of calories
  # If we're starting from scratch
  } else {
    cals <- 0   
    menu <- NULL
  }

  while (cals < 2300) {
    df <- df %>% filter(!NDB_No %in% menu$NDB_No)   # Only add foods we don't already have

    if (nrow(df) == 0) {
      message("No more elligible foods to sample from. Returning menu too low in calories.")
      return(menu)
    } else {
      food_i <- df %>%
        sample_n(1)   # Sample a new index from a food that doesn't already exist in our menu
    }

    this_food_cal <- (food_i$Energ_Kcal * food_i$GmWt_1)/100   
    cals <- cals + this_food_cal    

    menu <- bind_rows(menu, food_i)   
  }
  return(menu)   
}

Okay now for build_menu(). We’ll make sure we don’t have missing values in any of our nutrient columns, calories, or the food weight. The from_better_cutoff argument allows us to specify that we only want to pull foods that have at least a certain z-score on our scaled_score dimension. More on scoring in a bit.

The default, though, will just be to pull foods from our main abbrev dataframe.

build_menu <- function(df = abbrev, menu = NULL, seed = NULL, from_better_cutoff = NULL, do_mutates = TRUE) {
  if (!is.null(seed)) {
    set.seed(seed)
  }
  
  df <- df %>% drop_na_(all_nut_and_mr_df$nutrient) %>%
    filter(!(is.na(Energ_Kcal)) & !(is.na(GmWt_1)))    # Filter out rows that have NAs in columns that we need
  
  # Optionally choose a floor for what the z-score of each food to build from should be
  if (!is.null(from_better_cutoff)) {
    assert_that(is.numeric(from_better_cutoff), msg = "from_better_cutoff must be numeric or NULL")
    if (! "scaled_score" %in% names(df)) {
      df <- df %>% 
        add_ranked_foods()
    }
    df <- df %>% 
      filter(scaled_score > from_better_cutoff)
  }
  
  if (nrow(df) == 0) {
    stop("No foods to speak of; you might try a lower cutoff.")
  }
  
  # Add one serving of food until we hit 2300
  menu <- add_calories(menu = menu, df = df)
  
  return(menu)
}
our_random_menu <- build_menu()
our_random_menu %>% kable(format = "html")
shorter_desc solution_amounts GmWt_1 serving_gmwt cost Lipid_Tot_g Sodium_mg Cholestrl_mg FA_Sat_g Protein_g Calcium_mg Iron_mg Magnesium_mg Phosphorus_mg Potassium_mg Zinc_mg Copper_mg Manganese_mg Selenium_µg Vit_C_mg Thiamin_mg Riboflavin_mg Niacin_mg Panto_Acid_mg Vit_B6_mg Energ_Kcal Shrt_Desc NDB_No score scaled_score
ALCOHOLIC BEV 1 34.8 34.8 4.81 0.30 8 0 0.106 0.10 1 0.06 3 6 30 0.03 0.040 0.017 0.3 0.0 0.004 0.012 0.144 0.000 0.000 308 ALCOHOLIC BEV,LIQUEUR,COFFEE,63 PROOF 14534 -3362.759 -0.3443655
BEANS 1 120.0 120.0 7.51 0.10 259 0 0.023 0.80 24 0.90 13 19 98 0.20 0.070 0.335 0.2 3.4 0.025 0.051 0.200 0.106 0.030 15 BEANS,SNAP,YEL,CND,REG PK,SOL&LIQUIDS 11727 -3492.567 -0.5925234
VANILLA EXTRACT 1 4.2 4.2 2.87 0.06 9 0 0.010 0.06 11 0.12 12 6 148 0.11 0.072 0.230 0.0 0.0 0.011 0.095 0.425 0.035 0.026 288 VANILLA EXTRACT 02050 -3366.897 -0.3522761
RICE 1 174.0 174.0 1.53 0.19 5 0 0.039 2.02 2 0.14 5 8 10 0.41 0.049 0.262 5.6 0.0 0.020 0.013 0.290 0.215 0.026 97 RICE,WHITE,GLUTINOUS,UNENR,CKD 20055 -3323.860 -0.2700009
PORK SAUSAGE 1 30.0 30.0 5.34 41.66 990 79 13.557 15.12 17 0.95 15 193 225 1.59 0.085 0.065 22.5 0.0 0.243 0.180 4.328 0.644 0.137 438 PORK SAUSAGE,LINK/PATTY,FULLY CKD,MICROWAVED 07953 -3562.512 -0.7262400
LAMB 1 85.0 85.0 8.53 17.53 86 102 6.004 28.16 25 1.89 20 152 183 4.80 0.138 0.002 6.4 0.0 0.026 0.130 3.654 0.411 0.089 270 LAMB,NZ,IMP,BREAST,LN,CKD,BRSD 17359 -3191.959 -0.0178412
BEEF 1 85.0 85.0 9.21 2.98 40 199 0.922 19.00 158 3.72 24 168 102 2.39 0.106 6.066 4.3 0.0 0.022 0.103 2.616 0.226 0.013 103 BEEF,NZ,IMP,VAR MEATS & BY-PRODUCTS,TRIPE CKD,BLD 23444 -3166.645 0.0305520
FROZEN YOGURTS 1 72.0 72.0 7.91 6.00 98 5 3.630 4.00 147 1.25 27 139 261 0.49 0.131 0.121 2.4 0.3 0.036 0.211 0.305 0.674 0.074 160 FROZEN YOGURTS,CHOC,SOFT-SERVE 19393 -3034.619 0.2829497
PUDDINGS 1 128.0 128.0 8.06 1.73 159 7 1.017 2.94 108 0.06 12 83 137 0.34 0.013 0.004 2.1 0.7 0.031 0.146 0.075 0.278 0.034 101 PUDDINGS,VANILLA,DRY MIX,REG,PREP W/ 2% MILK 19212 -3146.193 0.0696504
TURKEY 1 85.0 85.0 4.45 7.39 103 109 2.155 28.55 14 1.09 30 223 239 2.48 0.093 0.014 29.8 0.0 0.045 0.281 9.573 0.948 0.616 189 TURKEY,WHL,MEAT & SKN,CKD,RSTD 05166 -3069.747 0.2157956
SHALLOTS 1 0.9 0.9 4.71 0.50 59 0 0.084 12.30 183 6.00 104 296 1650 1.93 0.425 1.417 5.7 39.0 0.300 0.100 1.000 1.408 1.675 348 SHALLOTS,FREEZE-DRIED 11640 -3353.798 -0.3272339
CRACKERS 1 14.2 14.2 4.38 20.60 190 0 5.178 8.60 49 4.40 62 220 203 1.60 0.318 1.781 33.7 0.0 0.505 0.327 4.961 0.522 0.136 473 CRACKERS,WHEAT,LOW SALT 18428 -3320.740 -0.2640355
WATERCHESTNUTS 1 70.0 70.0 7.70 0.06 8 0 0.016 0.88 4 0.87 5 19 118 0.38 0.100 0.161 0.7 1.3 0.011 0.024 0.360 0.221 0.159 50 WATERCHESTNUTS,CHINESE,CND,SOL&LIQUIDS 11590 -3273.837 -0.1743701
PUDDINGS 1 147.0 147.0 5.10 2.90 267 11 1.740 2.70 99 0.06 11 205 127 0.32 0.011 0.003 1.8 0.8 0.032 0.135 0.070 0.262 0.035 115 PUDDINGS,LEMON,DRY MIX,INST,PREP W/ WHL MILK 19331 -3130.586 0.0994880
BEEF 1 85.0 85.0 6.66 10.67 82 73 4.998 19.25 18 2.12 19 187 367 7.66 0.066 0.010 22.4 0.0 0.070 0.155 4.560 0.675 0.414 173 BEEF,CHUCK EYE RST,BNLS,A BF RST,LN & FAT,0" FAT,SEL,RAW 13791 -2967.945 0.4104136
CARROTS 1 9.7 9.7 3.65 0.18 58 0 0.030 0.76 30 0.34 10 30 235 0.20 0.017 0.155 0.7 3.6 0.066 0.044 0.645 0.232 0.153 35 CARROTS,CKD,BLD,DRND,WO/SALT 11125 -3349.391 -0.3188088
BOLOGNA 1 100.0 100.0 9.76 24.59 960 60 9.301 15.20 85 1.21 17 163 315 2.30 0.052 0.018 24.6 0.8 0.217 0.185 2.521 0.415 0.297 308 BOLOGNA,BF & PORK 07008 -3800.076 -1.1803975
BEEF 1 85.0 85.0 2.35 10.34 56 80 3.917 28.66 13 2.58 22 226 293 4.66 0.071 0.003 26.9 0.0 0.060 0.212 7.460 0.320 0.714 208 BEEF,LON,TP LON STK,BNLSS,LIP-ON,LN,1/8" FAT,CHC,CKD,GRILLD 23392 -2969.924 0.4066290
SAUSAGE PORK & BF W/ CHEDDAR CHS SMOKED 1 77.0 77.0 6.71 25.84 848 63 9.482 12.89 57 0.73 13 178 206 2.25 0.070 0.032 7.5 0.0 0.249 0.160 2.900 0.760 0.130 296 SAUSAGE PORK & BF W/ CHEDDAR CHS SMOKED 07917 -3731.781 -1.0498363
SALAD DRSNG 1 15.0 15.0 8.69 44.54 901 26 6.964 1.32 28 0.30 5 186 64 0.17 0.019 0.033 3.5 0.0 0.015 0.087 0.054 0.272 0.030 430 SALAD DRSNG,RANCH DRSNG,REG 04639 -3477.456 -0.5636340

Alright nice – we’ve got a random menu that’s at least compliant on calories. Is it compliant on nutrients and must restricts?

Testing Compliance

A few quick functions for testing whether we’re compliant on the other dimensions. Nothing fancy here; all we’re doing is going through positives and must restricts and figuring out how much of a given nutrient we’ve got and comparing that to the requirement. If we’re below the minimum on any positives, above the maximum on any must restricts, or we’re below 2300 calories we’re out of compliance. To make it easier to see where we’re out of compliance, we’ll return a dataframe of the nutrients we’re uncompliant on.

For must restricts:

test_mr_compliance <- function(orig_menu, capitalize_colname = TRUE) {
  
  compliance_df <- list(must_restricts_uncompliant_on = vector(), 
                        `difference_(g)` = vector()) %>% as_tibble()
  
  for (m in seq_along(mr_df$must_restrict)) {    
    nut_to_restrict <- mr_df$must_restrict[m]    # Grab the name of the nutrient we're restricting
    to_restrict <- (sum(orig_menu[[nut_to_restrict]] * orig_menu$GmWt_1, na.rm = TRUE))/100   # Get the amount of that must restrict nutrient in our original menu
    
    if ((to_restrict - mr_df$value[m]) > 0.01) {    # Account for rounding error
      this_compliance <- list(must_restricts_uncompliant_on = nut_to_restrict,
                              `difference_(g)` = (to_restrict - mr_df$value[m]) %>% round(digits = 2)) %>% as_tibble()
      compliance_df <- bind_rows(compliance_df, this_compliance)
    }
  }
  if (capitalize_colname == TRUE) {
    compliance_df <- compliance_df %>% cap_df()
  }
  return(compliance_df)
}

Same idea for positives. Then to test whether we’re compliant overall, we’ll see whether we pass all of these tests. If not, we’re not compliant.

test_all_compliance <- function(orig_menu) {
  combined_compliance <- "Undetermined"
  
  if (nrow(test_mr_compliance(orig_menu)) + nrow(test_pos_compliance(orig_menu)) > 0 |
      test_calories(orig_menu) == "Calories too low") {
    combined_compliance <- "Not Compliant"
    
  } else if (nrow(test_mr_compliance(orig_menu)) + nrow(test_pos_compliance(orig_menu)) == 0 &
             test_calories(orig_menu) == "Calorie compliant") {
    combined_compliance <- "Compliant"
    
  } else {
    combined_compliance <- "Undetermined"
  }
  return(combined_compliance)
}

Let’s see where we are with our random menu.

our_random_menu %>% test_all_compliance()
## [1] "Not Compliant"

😔 We’ve got some work to do!


Scoring Menus

At this point I want an objective and preferably single scalar metric by which to judge menus. We want a metric that takes into account the following things:

  • You don’t get extra credit for going above the daily minimum on positive nutrients
    • This reflects the fact that your body can only absorb up to a certain amount of a vitamin
  • You do, however, keep getting penalized for going farther and farther above the minimum on must_restricts
    • There’s no cap on how bad an increase in bad stuff will keep

For simplicity and because I’m not a doctor, we’ll assume a linear relationship between increasing and decreasing nutrients and their effect on our score. Though they’re really two different dimensions, I want to be able to combine a must restrict score with a postiive score to get a single number out. The directionality of the scores will also have to be the same if we want to combine them; so in both cases, more positive scores mean worse.

Similar to how we tested compliance, I’ll do is go through a given menu and multiply the nutrient value per 100g by GmWt_1, the amount of the food we have. That will give us the raw amount of this nutrient. Then I’ll see how much that raw amount differs from the minimum or maximum daily value of that nutrient we’re supposed to have and give it a score accordingly. Then I’ll add it up.

First, the must restricts. For each must restrict, we find the difference between our maximum allowed value and the value of that must restrict and add those all up. (Perhaps percent above the maximum would be a better metric.)

\(\sum_{i=1}^{k} MaxAllowedAmount_{i} - AmountWeHave_{i}\)

where k is the total number of must restricts.

So the farther above our max we are on must restricts, the higher our score will be.

mr_score <- function(orig_menu) {
  total_mr_score <- 0
  
  for (m in seq_along(mr_df$must_restrict)) {    
    mr_considering <- mr_df$must_restrict[m]    
    val_mr_considering <- (sum((orig_menu[[mr_considering]] * orig_menu$GmWt_1), 
                                na.rm = TRUE))/100   
    
    mr_score <- mr_df$value[m] - val_mr_considering  # max amount it's supposed to be - amount it is

    total_mr_score <- total_mr_score + mr_score
  }
  return(total_mr_score)
}

Similar story for the positives: we’ll take the difference between our minimum required amount and the amount of the nutrient we’ve got in our menu and multiply that by -1:

\(\sum_{i=1}^{k} (-1) * (MaxAllowedAmount_{i} - AmountWeHave_{i})\)

where k is the total number of positive nutrients in our constriants.

That means that if we’ve got less than the amount we need, this value will be negative; if we’ve got more than we need it’ll be positive. Next, to make the best score 0 I’ll turn everything greater than 0 into 0. This takes away the extra brownie points for going above and beyond. Same as with must restricts, lower scores mean “better.”

pos_score <- function(orig_menu) {
  total_nut_score <- 0
  
  for (p in seq_along(pos_df$positive_nut)) {    
    nut_considering <- pos_df$positive_nut[p]    
    val_nut_considering <- (sum((orig_menu[[nut_considering]] * orig_menu$GmWt_1), 
                                na.rm = TRUE))/100   
    
    nut_score <- (-1)*(pos_df$value[p] - val_nut_considering)    # (-1)*(min amount it's supposed to be - amount it is here)
    
    if (nut_score > 0) {
      nut_score <- 0
    } else if (is.na(nut_score)) {
      message("Nutrient has no score")
      break
    }
    total_nut_score <- total_nut_score + nut_score
  }
  return(total_nut_score)
}

Last step is just a simple sum:

score_menu <- function(orig_menu) {
  healthiness_score <- pos_score(orig_menu) + mr_score(orig_menu)
  return(healthiness_score)
}

Let’s build a random menu,

our_random_menu <- build_menu()

and see what its score is.

our_random_menu %>% score_menu()
## [1] -3251.263

In part three we’ll code up a way to solve these menus using the simplex algorithm and couple hand-rolled algorithms for swapping out the bad for the better foods.