12 Nov 2012
Reusable Cookbooks Revisited
It seems reusable cookbooks are a hot topic at the moment. I recently sat in on the Reusable Cookbook Patterns hangout run by the most excellent Food Fight show, where Noah Kantrowitz gave his thoughts on "Application" versus "Library" cookbooks. His approach aligned with the way we have approached cookbook reusability (see "Evolving towards cookbook reusability in Chef" for a basic overview of our view on reusability after using Chef for six months).
If I were to simplify Noah's view, I believe it would be that "library" cookbooks are a collection of LWRPs that manipulate resources. The "library" cookbook may also include a default recipe that installs the actual bits on the system. The "application" cookbooks depend on the "library" cookbook and then use the "library" cookbook's LWRPs to configure the system. (It should be noted that the term "application" cookbook seemed to identify any cookbook that uses a "library" cookbook.) The way that an "application" cookbook communicates with a "library" cookbook is through what Noah describes as "data capsules", which I believe just means rich data types passed into the LWRPs.
Our basic pattern for reusable cookbooks follows a similar approach, except that we communicate with the reusable cookbooks using simple types: essentially anything that can be represented in JSON, namely numbers, strings, booleans, arrays and hashes. We go one step further in that we also define a recipe that reads node attributes and interprets them to invoke the required LWRPs. The motivation for this was to DRY up our cookbooks. It also makes it easy to use other cookbooks that manipulate attribute data, such as Heavywater's bag_config cookbook.
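To make the idea concrete, here is a minimal sketch in plain Ruby of how such a recipe might interpret attribute data. It is not the code from any real cookbook: the attribute layout, the section names and the `resources_from_attributes` helper are all assumptions made up for illustration, and where this sketch returns a triple a real recipe would invoke an LWRP.

```ruby
# Hypothetical attribute data, using only JSON-representable types.
DOMAIN_ATTRIBUTES = {
  'port' => 8080,
  'jdbc_pools' => {
    'MyAppPool' => { 'url' => 'jdbc:postgresql://db.example.com/myapp' }
  },
  'custom_resources' => {
    'myapp/env/greeting' => { 'value' => 'hello' }
  },
  'deployables' => {
    'myapp' => { 'url' => 'http://repo.example.com/myapp.war' }
  }
}.freeze

# Maps each attribute section to the (hypothetical) resource type it drives.
SECTION_TO_RESOURCE = {
  'jdbc_pools' => :jdbc_connection_pool,
  'custom_resources' => :custom_resource,
  'deployables' => :deployable
}.freeze

# Walks the attribute hash and returns one [type, name, params] triple per
# declared component. Sections that are absent contribute nothing.
def resources_from_attributes(attributes)
  SECTION_TO_RESOURCE.flat_map do |section, resource_type|
    attributes.fetch(section, {}).map do |name, params|
      [resource_type, name, params]
    end
  end
end

resources_from_attributes(DOMAIN_ATTRIBUTES).each do |type, name, _params|
  puts "#{type}[#{name}]"
end
```

The point of the sketch is that the caller only supplies data; the knowledge of which resource each section maps to lives in one place.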
An Example
To highlight this I will make use of the glassfish cookbook again. GlassFish is an application server in which you install sub-components such as web applications, libraries, database pools, message broker references, etc.
Below are two ways of configuring a small, simple web application. The application uses a database and has a single configuration entry accessible via JNDI. The actual code in the two recipes is not important for the conversation but it is presented to give you a feel of the different approaches.
Using an attribute_driven recipe
Using raw LWRPs
Comparison
The attribute_driven recipe is marginally smaller (56 lines versus 68 lines), and this is mostly due to the repetition when using raw LWRPs. However, the greatest advantage that we see for the attribute_driven approach is the simpler cognitive model.
In most cases, using raw LWRPs requires that the caller understands the implicit ordering requirements; for example, database pools and resources need to be set up before the application is deployed. The user of the raw LWRPs also needs to manually manage the removal of resources when they are no longer required. Compare this to the attribute_driven recipe approach, which can automatically determine that a database pool, deployable or other component is no longer required (as it no longer appears in attribute data) and remove the component from the glassfish server.
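The ordering and clean-up logic can be sketched in a few lines of plain Ruby. This is only an illustration under stated assumptions: the component kinds, the `CREATE_ORDER` list and the `plan_convergence` helper are hypothetical names, and the real recipe would call LWRPs rather than return a plan.

```ruby
# Order in which component kinds must be created (assumed for illustration):
# pools and resources first, deployables last.
CREATE_ORDER = %w[jdbc_pools custom_resources deployables].freeze

# desired:  { kind => [names] } taken from attribute data.
# existing: { kind => [names] } currently present on the server.
# Returns a list of [action, kind, name] steps.
def plan_convergence(desired, existing)
  # Emit a create step for everything desired; resources are assumed to be
  # idempotent, so creating something that already exists is a no-op.
  creates = CREATE_ORDER.flat_map do |kind|
    desired.fetch(kind, []).map { |name| [:create, kind, name] }
  end
  # Anything on the server that is absent from the attribute data is stale.
  # Delete in reverse order so dependents go before what they depend on.
  deletes = CREATE_ORDER.reverse.flat_map do |kind|
    stale = existing.fetch(kind, []) - desired.fetch(kind, [])
    stale.map { |name| [:delete, kind, name] }
  end
  creates + deletes
end

desired  = { 'deployables' => ['myapp'], 'jdbc_pools' => ['MyAppPool'] }
existing = { 'deployables' => %w[myapp oldapp], 'jdbc_pools' => %w[MyAppPool OldPool] }

plan_convergence(desired, existing).each do |action, kind, name|
  puts "#{action} #{kind} #{name}"
end
```

Note that the caller listed the deployable before the pool, yet the plan still creates the pool first; the ordering knowledge sits in the recipe rather than with the caller.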
Using the attribute_driven recipe does not remove the ability to use the raw LWRPs directly when needed. However, 95% of the time we can get away with working at a higher level using the attribute_driven recipe.
Our approach also makes it easy to build up configuration from multiple sources. In our environment we typically build up configuration data from data bags in the Chef server, a separate configuration service, LDAP/ActiveDirectory and a rule layer, as well as occasionally hard coding the configuration into a recipe. However, after we have collected the configuration from the various sources, we just need to apply it as node attribute data and include the attribute_driven recipe. Hopefully there are fewer problems resulting from transcribing the configuration from one source to the node data than there would be if we had to interpret the configuration data and invoke the LWRPs in the correct sequence.
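Because every source ends up as plain hashes, combining them can be as simple as a recursive merge. The sketch below is plain Ruby with hypothetical helper names (`deep_merge`, `combine_sources`) and made-up sample data; it assumes that later sources should override earlier ones, which may not match the precedence rules of any particular setup.

```ruby
# Recursively merges two hashes; on conflicts the value from `override` wins,
# except that nested hashes are merged key by key.
def deep_merge(base, override)
  base.merge(override) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

# Folds any number of configuration sources into one hash, left to right.
def combine_sources(*sources)
  sources.reduce({}) { |merged, source| deep_merge(merged, source) }
end

# Made-up data standing in for a data bag, a directory lookup and a recipe.
from_data_bag  = { 'glassfish' => { 'domains' => { 'myapp' => { 'port' => 8080 } } } }
from_directory = { 'glassfish' => { 'domains' => { 'myapp' => { 'admin_user' => 'ops' } } } }
hard_coded     = { 'glassfish' => { 'domains' => { 'myapp' => { 'port' => 9090 } } } }

config = combine_sources(from_data_bag, from_directory, hard_coded)
puts config['glassfish']['domains']['myapp']['port']
puts config['glassfish']['domains']['myapp']['admin_user']
```

The combined hash is what would then be applied as node attribute data before including the attribute_driven recipe.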
In fact, we have recently introduced a 'search_driven' recipe that crystallizes a common approach to collecting configuration data. It searches a particular index using a particular query, extracts data from the matching items and applies that data to the node in the correct location. Essentially, that means we can store all our configuration data for a particular glassfish domain in data bags.
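The shape of that recipe can be sketched in plain Ruby. Everything here is an assumption for illustration: the index is modelled as an in-memory array of hashes, the query as a block, and `dig_path`, `store_path` and `apply_search` are hypothetical helpers. Storing each item's data under its `id` is also just one possible convention, not necessarily what the real recipe does.

```ruby
# Reads the value at a path of keys, or nil if any key is missing.
def dig_path(hash, path)
  path.reduce(hash) { |current, key| current.is_a?(Hash) ? current[key] : nil }
end

# Writes value at a path of keys, creating intermediate hashes as needed.
def store_path(hash, path, value)
  *parents, last = path
  target = parents.reduce(hash) { |current, key| current[key] ||= {} }
  target[last] = value
  hash
end

# Selects the items the block accepts, extracts the data found at
# source_path in each, and stores it on the node under target_path + [id].
def apply_search(node, index, source_path, target_path, &matches)
  index.select(&matches).each do |item|
    data = dig_path(item, source_path)
    next if data.nil?
    store_path(node, target_path + [item['id']], data)
  end
  node
end

# Made-up index contents standing in for data bag items.
index = [
  { 'id' => 'myapp', 'environment' => 'internal', 'config' => { 'port' => 8080 } },
  { 'id' => 'other', 'environment' => 'external', 'config' => { 'port' => 9090 } }
]

node = {}
apply_search(node, index, ['config'], %w[glassfish domains]) do |item|
  item['environment'] == 'internal'
end
puts node['glassfish']['domains'].keys.join(', ')
```

Once the data is on the node, the attribute_driven recipe can take over exactly as it would for any other source.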
Using a search_driven recipe
When to use reusable Cookbooks?
One question that the hangout did not spend much time on was when to use "library" cookbooks. We are strong proponents of reusable cookbooks and yet, in our infrastructure, only 5 of our 70+ cookbooks fall into this category. I can envision the ratio going up to as many as 9 in ~55 cookbooks, but that is still a small proportion of our cookbooks. The reusable cookbooks include core functionality such as firewalls, monitoring, the application server, the message broker and the content management system. Our other cookbooks may be reusable to one degree or another, but no other cookbook follows the "library" design pattern.
There seemed to be a strong turnout from those who have come from the developer tradition, in contrast to the operations tradition, which may account for the strong push towards reuse and higher level abstractions. Our LWRPs tend to be thin veneers on top of abstractions in the underlying tool, and the attribute_driven recipes are thin veneers on top of the LWRPs. I can see that higher level abstractions that are widely applicable may have merit and may even drive infrastructure decisions. Rails was remarkable in the way it simplified development through a set of conventions and higher level abstractions, and maybe that approach could be just as successful in Chef. However, that is not something we do locally, so I don't have a feeling for how good or bad it could be.
Overall I enjoyed the hangout - it is pleasing to see a lot of smart and passionate people in the Chef community.