How to store configuration data for Ansible

Thinking in data structures is not very common among system administrators. Even with DevOps and Infrastructure as Code, things are changing only slowly. It requires a completely different way of looking at problems.

Everyone knows the kind of environment that used to be managed by a whole bunch of bash scripts no one really understands. The good thing about bash scripts is that they allow us to start with automation quickly, without thinking too much about it. The downside is that things can get really messy just as quickly.

Fortunately, today there are tools like Ansible that allow us to manage even large environments with ease and keep everything clean and tidy. But the tool itself cannot do magic. We have to think about how we use it. Still, a lot of people use Ansible as if it were a 1:1 replacement for bash scripts.

When I used Ansible for the first time, I had no clue about YAML and JSON, and it didn’t make sense to me that there were two formats for the same thing. It took a while until I realized that one (JSON) is made for communication between machines and the other (YAML) is used for communication between humans and machines. Think about using Ansible to make some API calls. We enter our data in YAML and use the JSON representation of that data to query it with json_query (JMESPath).
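
To make the point concrete, here is a small sketch with made-up names (`server`, `server_as_json`, `web1` and the port numbers are assumptions for illustration only). JSON notation like this is also accepted by a YAML parser, so both notations can sit in the same file and describe the same data.

```yaml
# Block style: the way a human would usually write it
server:
  name: web1
  ports:
    - 80
    - 443

# JSON notation of the same data; a YAML parser accepts this as well
server_as_json: {"name": "web1", "ports": [80, 443]}
```

Loaded by a YAML parser, both keys should end up holding the same mapping. Only the notation differs.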

Since YAML and JSON are just two ways of viewing the same data, it makes sense to think about that data on a more abstract level. The same ideas apply to a whole bunch of programming languages, and going into the details of one implementation too early may narrow our horizon.

Think about setting up 100 machines (web1 to web100) with Ansible. We will want to use some kind of for-each loop to set our machines up. To do that, we need a list of similar elements that we can use in our loop.

Each of the elements in our loop may have a unique configuration that we roll out.

- name: server1
  configuration: customer1
- name: server2
  configuration: customer2
- name: server3
  configuration: customer1
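
A loop over such a list could look roughly like the following sketch. The variable name `servers`, the play running against `localhost`, and the use of `debug` in place of a real rollout task are assumptions for illustration.

```yaml
- name: Roll out per-server configuration (sketch)
  hosts: localhost
  gather_facts: false
  vars:
    # The list from above, stored under a hypothetical variable name
    servers:
      - name: server1
        configuration: customer1
      - name: server2
        configuration: customer2
      - name: server3
        configuration: customer1
  tasks:
    # debug stands in for whatever task would really apply the configuration
    - name: Show what would be rolled out
      ansible.builtin.debug:
        msg: "{{ item.name }} gets the configuration of {{ item.configuration }}"
      loop: "{{ servers }}"
```

Because every element has the same keys, the task body never needs to know how many servers there are.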

However, when it comes to billing our customers for our services, we need a more or less complex json_query to find out which servers belong to which customer. From that point of view, it would have been better to store our information as follows:

- customer1:
    - server1
    - server3
- customer2:
    - server2

Does that make things better? Now we need a json_query to find our servers instead.
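
As a rough sketch of the billing case, this is what such a query might look like against the first layout. It assumes the `community.general` collection and the `jmespath` Python library are installed. The variable names `servers` and `query` are made up for this example.

```yaml
- name: Find the servers of one customer (sketch)
  hosts: localhost
  gather_facts: false
  vars:
    servers:
      - name: server1
        configuration: customer1
      - name: server2
        configuration: customer2
      - name: server3
        configuration: customer1
  tasks:
    - name: List all servers billed to customer1
      ansible.builtin.debug:
        msg: "{{ servers | community.general.json_query(query) }}"
      vars:
        # JMESPath: keep elements whose configuration is customer1, return their names
        query: "[?configuration=='customer1'].name"
```

With the data above, this should print a list containing server1 and server3. In the second layout the same answer can be read directly from the data, but the reverse question (which customer a given server belongs to) is the one that needs a query.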

Both types of data structures come with their advantages and disadvantages. What we should understand is that both are just views of the same information. In that way it is very similar to what YAML and JSON are: just different views of the same thing. A long time ago, relational database systems were so popular that even kids in school learned about normalizing data. Having different views of the same data can also be largely beneficial with Infrastructure as Code. Combining SQL with Infrastructure as Code is probably one of the next evolutionary steps.
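
One way to see that the two layouts are views of the same information is to derive one from the other. The sketch below assumes the per-server list is stored in a hypothetical variable `servers` and uses the `groupby` filter that Jinja2 provides. It is only one of several ways to do this.

```yaml
- name: Derive the per-customer view from the per-server list (sketch)
  hosts: localhost
  gather_facts: false
  vars:
    servers:
      - name: server1
        configuration: customer1
      - name: server2
        configuration: customer2
      - name: server3
        configuration: customer1
  tasks:
    # groupby yields pairs: item[0] is the customer, item[1] the matching servers
    - name: Print one line per customer with its servers
      ansible.builtin.debug:
        msg: "{{ item[0] }}: {{ item[1] | map(attribute='name') | join(', ') }}"
      loop: "{{ servers | groupby('configuration') }}"
```

Only one layout has to be maintained by hand. The other can be computed whenever it is needed, much like a view in a relational database.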

Thinking about data structures helps to get a clearer view of what Ansible actually does. If you look at it from this perspective, you may understand that Ansible playbooks are exactly that: a data structure written in YAML syntax.
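
The following sketch illustrates this with a made-up playbook (the play names and messages are assumptions). The first play is written the usual way. The second has the same shape but uses JSON notation, which a YAML parser also accepts.

```yaml
# A playbook is a list of plays; each play is a mapping
- name: Say hello
  hosts: localhost
  gather_facts: false
  tasks:                # "tasks" is again a list of mappings
    - name: Print a message
      ansible.builtin.debug:
        msg: Hello from a data structure

# A second play with the same shape, written in JSON notation
- {"name": "Say hello again", "hosts": "localhost", "gather_facts": false,
   "tasks": [{"name": "Print a message",
              "ansible.builtin.debug": {"msg": "Hello again"}}]}
```

Seen this way, writing a playbook is mostly a matter of filling in lists and mappings that Ansible knows how to interpret.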