beer.ai

Using Machine Learning to generate new beers.

This project is a fun attempt to create beer recipes based on learning other recipes from online sources.

Getting Started

Setting up your environment is standard. Note that this project requires Python 3.6+.

python3.6 -m venv env
source env/bin/activate
pip install -r requirements.txt

Once installed, you're good to go. Note that there is currently an annoying print statement in the pybeerxml package. You can get rid of it by deleting line 22 in env/lib/python3.6/site-packages/pybeerxml/fermentable.py.

After this, you need data. Currently, you can either run the web scraper to get your own data (which takes days) or ask Rory for the data. It should be stored in beer.ai/recipes/*.xml. Once we have finished pre-processing the recipes, you will likely be able to skip this step and just load an HDF5 file containing all the recipes.
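As a rough illustration of what the recipe files contain, the sketch below pulls a few fields out of a BeerXML document using only the standard library. The sample document and the `summarize_recipes` helper are made up for this example; the project itself parses recipes with pybeerxml, and real files carry many more fields.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written BeerXML document (illustrative, not from the dataset).
SAMPLE = """<RECIPES>
  <RECIPE>
    <NAME>Example Pale Ale</NAME>
    <BATCH_SIZE>19.0</BATCH_SIZE>
    <BOIL_TIME>60</BOIL_TIME>
    <FERMENTABLES>
      <FERMENTABLE><NAME>2 row</NAME><AMOUNT>4.5</AMOUNT></FERMENTABLE>
      <FERMENTABLE><NAME>vienna malt</NAME><AMOUNT>0.5</AMOUNT></FERMENTABLE>
    </FERMENTABLES>
  </RECIPE>
</RECIPES>
"""


def summarize_recipes(root):
    """Hypothetical helper: collect a few fields from each RECIPE element.

    Assumes NAME, BATCH_SIZE and BOIL_TIME are present on every recipe.
    """
    summaries = []
    for rec in root.iter("RECIPE"):
        fermentables = [
            (f.findtext("NAME"), float(f.findtext("AMOUNT")))
            for f in rec.iter("FERMENTABLE")
        ]
        summaries.append(
            {
                "name": rec.findtext("NAME"),
                "batch_size": float(rec.findtext("BATCH_SIZE")),
                "boil_time": float(rec.findtext("BOIL_TIME")),
                "fermentables": fermentables,
            }
        )
    return summaries


recipes = summarize_recipes(ET.fromstring(SAMPLE))
print(recipes[0]["name"], recipes[0]["fermentables"])
```

To try this on a real file, you could pass `ET.parse(path).getroot()` to `summarize_recipes` for each path matching beer.ai/recipes/*.xml.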

Project Layout

This will be filled in more at the end.

beer.ai contains the primary code, at the moment.
beer.ai/recipes contains all the beer xml recipe files
...

Data and Formats

There are a number of files containing useful data available for this project. The list below describes what is contained in each file.

all_recipes.h5 - Raw recipe data in HDF5 format. Contains every recipe entry that has been parsed from beer xml files. The recipes are split into 2 tables:
- core - A table containing the core information about a recipe. The index of this table is a unique identifier for the recipe, and matches with the ingredients table entries.
- Fields:

Name	Description	Unit	Data type	Typical values
`batch_size`	The volume in the fermenter before fermentation, or the volume in the package (bottle or keg).	L	float64	19.684141256, 18.927059, 22.712471
`boil_size`	The volume in the kettle, pre-boil.	L	float64	23.658823625, 30.283294, 11.356235
`boil_time`	The duration of the boil.	minutes	float64	60, 90, 120
`brewer`	The username of the recipe author.	-	str	anonymous, capitalcityhomebrewsupply, bitter & esters
`efficiency`	The ratio of extract in the wort (kg) to grain in the mash (kg).	%	float64	0.75, 0.80, 0.35
`fg`	The final gravity of the beer. (brewersfriend only)	SG	float64	1.010, 1.002, 1.031
`ibu`	The bitterness of the beer. (brewersfriend only)	IBU	float64	99, 55.0, 1
`name`	The name of the recipe.	-	str	untitled specialty beer, ipa, continental drift wheat
`og`	The original gravity of the beer. (brewersfriend only)	SG	float64	1.054, 1.072, 1.031
`origin`
`recipe_file`
`style_category`
`style_name`
`style_version`

- ingredients - A table containing the actual ingredients for each recipe. The index of this table matches the index of the core table. Note that a given recipe can (and usually does) span multiple rows, since each row holds at most one ingredient of each type (see below) and a recipe usually has several of each type.
- ferm_amount, ferm_color, ferm_display_amount, ferm_name, ferm_origin, ferm_potential, ferm_yield.
- hop_alpha, hop_amount, hop_display_amount, hop_form, hop_name, hop_origin, hop_time, hop_use.
- misc_amount, misc_amount_is_weight, misc_name, misc_time, misc_use.
- yeast_amount, yeast_attenuation, yeast_flocculation, yeast_form, yeast_laboratory, yeast_name, yeast_product_id, yeast_type
recipe_vecs.h5 - A representation of recipes in a simple format.
- The data is stored under the vecs key.
- Each recipe is represented as an (N+1) length vector, where N is the number of possible ingredients (similar to one-hot encodings). The index in each vector represents a specific ingredient, and the value in that index represents how much of that ingredient is present (in mass/liter units).
- The last entry in each recipe vector represents the boil time in minutes. This is the +1 above.
- The index-to-ingredient lookup can be derived by inverting the mapping in vocab.pickle (see below).
vocab.pickle - A mapping of ingredient string -> unique id.
- The map is stored as a simple python dict going from str -> int.
- The unique ids in this map are the positions in each recipe_vecs vector (see above) that hold the amount of the corresponding ingredient.
*map.pickle - A mapping of various ingredient strings to the "standard" version of that string.
- The * above can be replaced with any of the ingredient categories (ferm, hop, misc, yeast).
- The "standard" version is usually either a less specific version of an ingredient (e.g. "Munton's roasted barley" -> "roasted barley"), a symbol-standard version ("2-row" -> "2 row"), or a properly-spelled version (e.g. "veinna malt" -> "vienna malt").
- The maps have been painstakingly created by hand and represent our best (first) effort to standardize ingredients. There are a lot of assumptions that get built into this process. For a discussion on this, see XXX.
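To make the vector layout described above concrete, here is a small, self-contained sketch of encoding and decoding a recipe with a vocab-style dict. The vocabulary, the amounts, and the `encode`/`decode` helpers are invented for illustration. The sketch assumes ids run from 0 to N-1 and that the boil time sits in the final slot; the real files may differ in detail.

```python
# Toy vocabulary: ingredient string -> unique id (illustrative values only).
vocab = {
    "ferm_2 row": 0,
    "ferm_vienna malt": 1,
    "hop_cascade": 2,
    "yeast_us-05": 3,
}


def encode(amounts, vocab, boil_time):
    """Build an (N+1)-length vector: N ingredient slots plus the boil time."""
    vec = [0.0] * (len(vocab) + 1)
    for name, amount in amounts.items():
        vec[vocab[name]] = amount  # amount per liter of batch
    vec[-1] = boil_time  # the "+1" slot, in minutes
    return vec


def decode(vec, vocab):
    """Invert the vocab and return the non-zero ingredients and the boil time."""
    inv_vocab = {idx: name for name, idx in vocab.items()}
    amounts = {inv_vocab[i]: v for i, v in enumerate(vec[:-1]) if v != 0}
    return amounts, vec[-1]


vec = encode({"ferm_2 row": 0.2, "hop_cascade": 0.003}, vocab, boil_time=60.0)
amounts, boil_time = decode(vec, vocab)
print(vec)
print(amounts, boil_time)
```

Most slots in a real vector are zero, since a recipe uses only a handful of the possible ingredients.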

Loading in Data

This section gives information on how to read in recipes, manipulate them, and make sense of the data.

Raw Recipes

Loading in raw recipes is fairly straightforward with pandas.

import pandas as pd

store = pd.HDFStore("all_recipes.h5", "r")

# Read in rows 10-19 of the core table. "start"/"stop" are row positions
# (stop is exclusive), not index values.
recipes = store.select("/core", start=10, stop=20)

# Read in the ingredients for recipes with index 10-19 (this matches the rows
# above only if their index values are 10-19). "where" filters on index
# values, which is not the same as selecting by row position in the
# `ingredients` table.
ingredients = store.select("/ingredients", where="index >= 10 & index < 20")

# Join the dataframes together. Note that this will duplicate the information
# in `recipes` to match the number of rows in ingredients. For other ways of
# joining, look at the pandas documentation for `join` and `merge`.
df = recipes.join(ingredients)

The above example will load in the raw recipes, which is mostly just useful for inspecting the various data columns and doing exploratory data analysis (EDA) on the base quantities.
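The row duplication caused by the join can be seen on a toy example. The two small frames below are invented stand-ins for the `core` and `ingredients` tables, with only a couple of columns each, so treat this as a sketch of the join behaviour rather than of the real data.

```python
import pandas as pd

# Toy stand-ins for the two tables; index values play the role of recipe ids.
core = pd.DataFrame(
    {"name": ["ipa", "stout"], "boil_time": [60.0, 90.0]},
    index=[10, 11],
)
ingredients = pd.DataFrame(
    {
        "ferm_name": ["2 row", "vienna malt", "roasted barley"],
        "hop_name": ["cascade", None, "fuggle"],
    },
    index=[10, 10, 11],  # recipe 10 spans two rows
)

# Joining on the index repeats each core row once per ingredient row.
df = core.join(ingredients)

# Number of ingredient rows per recipe.
rows_per_recipe = ingredients.groupby(level=0).size()

# Collapse back to one entry per recipe, e.g. a list of fermentables.
ferms_per_recipe = df.groupby(level=0)["ferm_name"].apply(list)

print(df)
print(rows_per_recipe)
print(ferms_per_recipe)
```

Grouping on the index, as in the last two statements, is one way to get back to per-recipe quantities after the join.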

If you wanted to "standardize" (see "Data and Formats" above) all of the ingredients for a particular category, you could do the following (starting with ingredients from above):

import pickle

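# Standardize our fermentables
with open("fermmap.pickle", "rb") as f:
    fermmap = pickle.load(f)

# Assign the result back instead of calling replace(..., inplace=True) on a
# column selection, which may act on a copy in recent pandas versions.
ingredients["ferm_name"] = ingredients["ferm_name"].replace(fermmap)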
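To see what this standardization does without loading the real map, here is a minimal sketch with a hand-written dict. The three entries are the examples from "Data and Formats"; the series of names and the `untouched` check are made up for illustration.

```python
import pandas as pd

# Hand-written stand-in for the fermentable map.
fermmap = {
    "Munton's roasted barley": "roasted barley",  # less specific
    "2-row": "2 row",                             # symbol-standard
    "veinna malt": "vienna malt",                 # spelling fix
}

names = pd.Series(
    ["2-row", "veinna malt", "crystal 60", "Munton's roasted barley"],
    name="ferm_name",
)

# Exact-match replacement: values that are not keys in the map pass through.
standardized = names.replace(fermmap)

# Names the map did not touch, which may be worth reviewing by hand.
untouched = names[~names.isin(list(fermmap))]

print(standardized.tolist())
print(untouched.tolist())
```

Checking the untouched names is a quick way to spot ingredients the map does not cover.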

Simple Recipe Vector Representation

To play around with the simple recipe representation, you can do the following:

with pd.HDFStore("recipe_vecs.h5","r") as store:
    vecs = store.get("/vecs")
vecs.shape
# (171699, 793)

In the above example, we see that we have 171,699 recipes, each represented as a length-793 vector (792 ingredient slots plus the boil time). A vector can be translated into a human-readable recipe with the following code:

with open("vocab.pickle", "rb") as f:
    vocab = pickle.load(f)

# Change from str -> int to int -> str
inv_vocab = {v:k for k,v in vocab.items()}

# Look at first recipe
rec1 = vecs.iloc[0]
rec1[rec1 != 0].reset_index().replace({"name": inv_vocab})

#                           name        0.0
# 0              ferm_flaked rice   0.011782
# 1  yeast_brettanomyces lambicus   0.052674
# 2          yeast_east coast ale   0.024258
# 3         yeast_fermentis us-05   0.013862
# 4                     hop_comet   1.000000
# 5                    hop_waimea   0.000193
# 6                    boil_time   60.000000
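Once a recipe has been decoded into names and amounts, it can be handy to split it by ingredient category. The sketch below groups entries by the prefix in front of the first underscore. The `ferm_`, `hop_`, and `yeast_` prefixes are taken from the sample output above; `misc_` is assumed by analogy, and the `group_by_category` helper is hypothetical.

```python
from collections import defaultdict

CATEGORIES = ("ferm", "hop", "misc", "yeast")


def group_by_category(decoded):
    """Split a {name: amount} dict into per-category dicts keyed by prefix."""
    groups = defaultdict(dict)
    for key, value in decoded.items():
        prefix, _, rest = key.partition("_")
        if prefix in CATEGORIES:
            groups[prefix][rest] = value
        else:
            # Anything without a known prefix (e.g. boil_time) is kept as-is.
            groups["other"][key] = value
    return dict(groups)


# A few entries copied from the sample output above.
decoded = {
    "ferm_flaked rice": 0.011782,
    "yeast_brettanomyces lambicus": 0.052674,
    "hop_comet": 1.0,
    "boil_time": 60.0,
}

groups = group_by_category(decoded)
print(groups)
```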

Recipe Assumptions

Most of the assumptions about how we handle a recipe are stored in sequence_notes.md. This mentions what fields we're using, where we're normalizing values, and the general procedure of how we create a recipe from a beer xml file.

Authors

Rory Woods
Rob Welch

