The other day I was unable to decrypt a few of my Ansible Vault encrypted host_var files in a playbook. As best I can tell, the problem was related to my use of an executable vault-password-file and the 1Password CLI for fetching passwords.
Follow along for a frightening story and some interesting technical tidbits.
Background on how I stored Vault passwords
I rely on Ansible Vault to encrypt sensitive data in my homelab Ansible playbooks. I store the Vault password in 1Password, and retrieve it using the 1Password command line interface (CLI).
Generating a new Vault password goes like this:
- Create a new password item in 1Password, or duplicate an existing one.
- Let 1Password generate a random password containing letters, numbers, and symbols. I pick the length arbitrarily.
- Save it with a name in this template:
<PLAYBOOK NAME> - Ansible Vault Password.
Ansible playbooks are run on a control node in my homelab. Retrieving a Vault’s password for use looks like this:
- Create a file .vault_pass in the directory containing playbook.yaml.
- Put the below script in it.
- Make the file executable.
- In ansible.cfg set vault_password_file = ./.vault_pass.
The simple script:
#!/bin/zsh
VAULT_ID="<VAULT_ID>"
VAULT_ANSIBLE_NAME="<PLAYBOOK NAME> - Ansible Vault Password"
op item get "$VAULT_ANSIBLE_NAME" --vault $VAULT_ID --fields password
The VAULT_ID alpha-numeric string can be sourced by running op vault list. The script runs the op item get command to fetch the specific item out of 1Password, and grabs just the password field on it.
Now when I run ansible-vault encrypt host_vars/my-host, Ansible executes .vault_pass which triggers the op command and the 1Password Agent asks me to authenticate. Following successful authentication the password value is returned to Ansible Vault, and the file is encrypted using it. This is very convenient!
A problem decrypting
One day, an attempt to decrypt a host_var file failed with an error, something about no suitable password being found for decryption.
My heart stopped.
Normally I don’t think about which password secures the files because it “just works”. The 1Password Agent is running on my machine, I run a command that needs a password, the Agent supplies that password, and life goes on. It’s 100% transparent to me, which is great from a usability standpoint!
After a few deep breaths, I started debugging by manually supplying the password to the Vault command:
ansible-vault decrypt host_vars/my-host --ask-vault-pass
The --ask-vault-pass flag let me type in the password used to decrypt the file. I copied the password from 1Password and pasted it into the CLI when prompted. Everything worked fine. But why was my .vault_pass file solution no longer working?
I noticed that when I manually ran the op item get command I received this:
# Command
op item get --vault=$VAULT_ID "$VAULT_ANSIBLE_NAME" --fields password
# Response
ID: Password_ID
Title: <PLAYBOOK NAME> - Ansible Vault Password
Vault: Vault Name (Vault_ID)
Created: 11 months ago
Updated: 11 months ago by Matt Edwards
Favorite: false
Tags: ansible
Version: 1
Category: PASSWORD
Fields:
password: [use 'op item get <Password_ID> --reveal' to reveal]
password: [use 'op item get <Password_ID> --reveal' to reveal]
Well that looks… less than promising. I was not familiar with the --reveal flag, which was added in v2.30.0 of the 1Password CLI (released 2024-07-29). The return code of op item get without --reveal was still 0, so Ansible would not have seen a failure. I am left wondering whether a run of ansible-vault encrypt used that placeholder response as the password, which would have been:
password: [use 'op item get <Password_ID> --reveal' to reveal]
Using this as a raw string would not decrypt the vault. Oh well.
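A guard along these lines might have caught the problem before anything was encrypted. This is only a sketch under my own assumptions: the function name is_plausible_password and its three rules (non-empty, single line, no placeholder text) are hypothetical and are not something Ansible or 1Password provides.

```shell
#!/bin/sh
# Hypothetical guard: reject output that is clearly not a real password.
# Returns 0 when the value looks usable, 1 otherwise.
is_plausible_password() {
    value=$1
    # Empty output means the lookup returned nothing.
    [ -n "$value" ] || return 1
    case $value in
        # The placeholder that op item get prints without --reveal.
        *"[use 'op item get"*) return 1 ;;
        # More than one line suggests an item dump, not a single field.
        *"
"*) return 1 ;;
    esac
    return 0
}
```

In .vault_pass, the output of op could be captured into a variable and passed through this check, exiting non-zero with a message on stderr when it fails, so that ansible-vault stops instead of using the wrong string.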
Peculiarly, there are two different fields both named password. Which one was being returned? The 1Password app GUI showed only a single password value, and no historical passwords for that field. Where was this second value coming from?
Changing 1Password CLI commands
There is a second CLI command from 1Password which can return a value from a saved item, op read. I ran this alongside op item get to determine if they would return the same value. I also added in the --reveal flag.
op item get --vault=$VAULT_ID "$VAULT_ANSIBLE_NAME" --fields password --reveal
# Response: ABC
op read "op://HomeNetwork/pihole - ansible vault password/password"
# Response: DEF
The two commands returned two different values.
Uh oh. But at least I have both values and surely one of them must decrypt the file.
…nope.
The script is missing something
Deeper into this debugging I noticed something I should have caught immediately. For this particular playbook folder the .vault_pass file was not granted executable permissions.
$ ls -l .vault_pass
-rw-r--r-- 1 matt staff 251 July 29 13:53 .vault_pass
According to Ansible’s (somewhat skimpy!) documentation of the matter, the config value of vault_password_file should be a path which is one of:
- A file containing the password string
- An executable file which returns the password string
Since this file was not executable maybe Ansible had been using the raw contents of the script as the password?
Unfortunately, this is where the mystery ends as it is too late to go back and validate this. I decided instead to nuke-and-pave the file. I sorely wish I had saved the file so I could test out this theory.
Start from scratch
Since the host machine was alive in prod, I decided to simply re-create the host_vars/my-host file manually. I logged in, grabbed the passwords, and put them in a new file created using ansible-vault create host_vars/my-host.
A little annoying, but the impacted playbook had a fairly tame host_vars file, so I opted for the simplest solution to the problem.
Lessons learned
Make sure script files are executable. This is a strange case where a script file that lacks the executable permission does not cause a failure. Instead, the contents of the script may be used as the password.
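A small pre-flight check can make this failure loud. The sketch below assumes a helper of my own naming, vault_pass_is_executable, which could be run before any ansible-vault command; it is not part of Ansible.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the vault password file
# exists and the current user is allowed to execute it.
vault_pass_is_executable() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 1
    fi
    if [ ! -x "$file" ]; then
        echo "not executable: $file (try: chmod u+x $file)" >&2
        return 1
    fi
    return 0
}
```

Calling vault_pass_is_executable ./.vault_pass in each playbook directory would have flagged the file with -rw-r--r-- permissions right away.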
Automation can be a risk! If a script is returning a password used to encrypt something, you had better hope that script returns the value you actually have stored in your password manager. The format of my Ansible Vault encrypted files and the passwords in them are mostly not kept anywhere else, other than the live system they are running on. Losing the encryption key to those files is a PITA because not only might I have to rotate passwords, I also need to work out all of the variable names I had defined in that file. On the other hand, this sort of automation means I don’t have to copy-and-paste passwords into my CLI when running a playbook, and risk accidentally pasting a password somewhere I should not have.
Upgrading your tools quickly can be dangerous. I routinely update my tools. 1Password CLI is installed via Homebrew, and I update all of those Homebrew-installed packages using brew update && brew upgrade about once per week. It’s good hygiene! But it can also result in situations like this. Had I been using op from the command line I might have noticed that passwords were no longer returned without the new --reveal flag; however, I had wrapped op into an executable script which I had no visibility into. I do not think this is either a good thing, or a bad thing, just a regular thing that can happen as you layer various tools on top of each other in your workflow.
Use the right tool for the job. As I dug through 1Password’s documentation for op item and op read, I noticed this line:
To retrieve the contents of a specific field, use op read instead. When using service accounts, you must specify a vault with the --vault flag or through piped input.
Guess that settles it, I’ll use op read from now on. The syntax of op read is a bit nicer; no more alpha-numeric VAULT_ID in the script.
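A reworked .vault_pass built on op read might look roughly like this. Treat it as a sketch: the helper names secret_ref and print_vault_password are mine, and the OP_VAULT and OP_ITEM values are placeholders to be replaced with a real vault name and item title.

```shell
#!/bin/sh
# Placeholder values; substitute the real vault name and item title.
OP_VAULT="HomeNetwork"
OP_ITEM="<PLAYBOOK NAME> - Ansible Vault Password"

# Build a secret reference of the form op://vault/item/field.
secret_ref() {
    printf 'op://%s/%s/%s' "$1" "$2" "$3"
}

# Print the password field of the configured item via op read.
print_vault_password() {
    op read "$(secret_ref "$OP_VAULT" "$OP_ITEM" password)"
}
```

The script would end with a call to print_vault_password so that the password is the only thing written to stdout.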
Test all of your assumptions. While writing this blog post, I decided to make a testing item in 1Password with two fields named “password”. What does op read return in this case?
op read "op://HomeNetwork/Test Password/password"
For a password-type item in 1Password, it returns the special/first password field. It ignores any additional fields I add named “password”. And if I remove the value from the special/first password field, it returns an empty string even if a secondary field is defined and named “password” with a value in it.