How to dynamize your Kubernetes local storage

In this article, I will explain how I used a combination of { Ansible + local-storage-static-provisioner + Kustomize } to manage local volumes (on on-premise VMs) in a pseudo-dynamic way.

Some issues with the traditional way

Managed Kubernetes clusters are great. Especially when they are managed by major cloud providers, like GCP or AWS.
But shit happens, and you may have to deal with the full complexity of on-premise Kubernetes clusters yourself and, even worse, with no volume driver to rely on (e.g. the vSphere volume plugin).

When you are dealing with the local StorageClass, there is no provisioner associated with it, so there are some drawbacks compared to the common dynamic provisioners. Here is a non-exhaustive list:

  • you have to create and delete your Persistent Volumes (PVs) manually

  • a PV is tied to a specific node, so you have to play with node labels and affinity to ensure that your workloads get scheduled on the correct node(s)

  • you have to create one or more StorageClasses manually, depending on the granularity you want for storage requests. Indeed, if you have different kinds of workloads, you don’t want a PVC requesting only 1Gi of storage to be served a large pre-created 10Gi PV! So, as a response to this issue, it’s common to prepare a dedicated SC for each type of workload.
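To illustrate the first two points, here is roughly what each of those hand-crafted PVs looks like (a minimal sketch; the name, size, path and hostname are illustrative):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-worker-1-0        # hand-picked name, one per volume and per node
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/local-storage/vol0  # a directory or mount that must already exist on the node
  nodeAffinity:                    # pins the PV to one specific node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-1

Multiply this by the number of volumes and nodes and you get an idea of the maintenance burden.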

A brief reminder of provisioner traits

Internal VS External provisioner

The difference between those two is really well explained here in a few words.

For example, a few volume plugins like 'Local' and 'NFS' have no internal provisioner in Kubernetes, but they can be handled by external provisioners.

Dynamic VS Static provisioner

Dynamic provisioners handle the PV lifecycle for you, whereas with static provisioning you are, in effect, the "static" provisioner yourself.

The local-storage-static-provisioner

In this article I’m speaking about local storage management, so here comes the local-storage-static-provisioner. Its purpose is to detect local directories under one - or more - given discovery mount(s) and to create the associated PVs.

These directories can be either mounted filesystems or mounted block devices. In my case, I’m playing with LVM and mounting ext4-formatted LVs on the filesystem.

Once an LV is mounted on a sub-directory under a known parent dir (which is associated with one given StorageClass), this provisioner creates the associated PV for that new sub-directory.

You can have all your PVs as "instances" of one single StorageClass if they share a common purpose. Or you can play with multiple SCs if you want to manage them by size and purpose.
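To make this more concrete, the heart of the provisioner’s configuration is a simple mapping from a discovery directory to a StorageClass; something along these lines (a simplified sketch of its storageClassMap entry, not the full ConfigMap):

storageClassMap: |
  local-storage:                   # the StorageClass served by this discovery directory
    hostDir: /mnt/local-storage    # parent dir scanned on each node
    mountDir: /mnt/local-storage   # same path, as seen from inside the DaemonSet pod

Every sub-directory discovered under hostDir gives birth to one PV of the corresponding class.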

To catch up with the provisioner definitions given above, we are facing here a semi-static / semi-dynamic external provisioner. I hope this makes things neat 😏

The plan with an example

Well, I’ll use the example of an ElasticSearch cluster deployment. As you may know, there are two kinds of nodes in an ES topology: master & data nodes.

So here, I want to use 3 PVs of 5Gi for the 3 master nodes and 2 other PVs of 10Gi for the data nodes.

Of course, I want to spend minimal effort getting those PVs ready, so I will use a combination of Ansible playbooks and roles:

  1. deploy the local-storage-static-provisioner: it spawns a DaemonSet, pretty easily thanks to its Helm chart

  2. manage the underlying storage with LVs

  3. create as many StorageClasses as there are usage types: here, we need to differentiate master nodes from data nodes because of their volume size requirements, so 2 SC in total (see the sketch right after this list)

  4. update the local-storage-static-provisioner configuration via Kustomize, so that it auto-magically creates the corresponding PVs

  5. run the Helm chart of ElasticSearch with a templatized values.yaml referencing our new StorageClasses
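About step 3: such a StorageClass is tiny, the whole trick being the no-provisioner / WaitForFirstConsumer combo. Here is a sketch of the master one (the data one only differs by its name):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: elastic-master-sc
provisioner: kubernetes.io/no-provisioner   # no dynamic provisioning behind it
volumeBindingMode: WaitForFirstConsumer     # delay binding until a pod is actually scheduled
reclaimPolicy: Delete                       # matches the pv_reclaimPolicy variable shown later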

Here is the Big Picture:

Woh! nice plan! 🎢

Implementation

First, some context about my environment: the Kubernetes cluster was installed with kubeadm and it’s composed of 1 master and 2 workers. helm and kubectl commands are run from the master node.
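For reference, the Ansible inventory behind all this looks roughly like the following (the hostnames are made up); the k8s-workers group is the one the playbook relies on later:

all:
  children:
    k8s-masters:
      hosts:
        k8s-master-1:
    k8s-workers:
      hosts:
        k8s-worker-1:
        k8s-worker-2: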

Given the plan above, the implementation is pretty straightforward with Ansible.

Step 1: local-volume-provisioner

I won’t describe how the local-storage-static-provisioner was deployed, because I just ran the Helm chart, wrapped in an Ansible role. The only value that was overridden was the classes: one, emptied completely (I don’t need those default classes).
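For the record, the override really boils down to emptying the classes list in the chart values; something like this (a sketch of the values file passed to the chart):

# values override for the provisioner Helm chart: drop the default classes,
# they will be fed later through the Kustomize patches
classes: []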

Steps 2, 3 & 4: LVs, StorageClasses and provisioner configuration

Let’s start with the definition of our volumes:

---
local_storage: (1)
  name: "local-storage"
  base_mount_point: "/mnt/local-storage"
  vg_name: "rvg"

volume_elastic_master: (2)
  class_name: "elastic-master-sc"
  lv_size: "5G"
  lv_name: "elastic_master"
  pv_reclaimPolicy: Delete
  number_of_volumes: 3

volume_elastic_data: (3)
  class_name: "elastic-data-sc"
  lv_size: "10G"
  lv_name: "elastic_data"
  pv_reclaimPolicy: Delete
  number_of_volumes: 2
1 Define the base mount path on each k8s worker node and some global vars
2 Define the requirements for the master nodes of the ES cluster: 3 volumes, with LVs named and sized as described
3 Same story for ES data nodes
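Under the hood, these variables translate into a mount layout like the following on the workers (a sketch; the sub-directories actually end up spread across the 2 worker nodes, see the shuffle trick below):

/mnt/local-storage/
├── elastic-master-sc/
│   ├── elastic_master0    # one 5G ext4 LV mounted here → one PV of class elastic-master-sc
│   ├── elastic_master1
│   └── elastic_master2
└── elastic-data-sc/
    ├── elastic_data0      # one 10G LV → one PV of class elastic-data-sc
    └── elastic_data1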

Those variables describe the volume requirements for each type of ES node. All of them are managed by a single Ansible role, the so-called smart-local-volumes:

- name: "smart management of volumes"
  become: yes
  run_once: true
  include_role:
    name: smart-local-volumes
  vars:
    volume: "{{ item }}"
    workers: "{{ groups['k8s-workers'] }}"
    remote_basepath_kustomize_assets: "{{ local_storage.name }}"
  with_items:
    - "{{ volume_elastic_master }}"
    - "{{ volume_elastic_data }}"

So, we have one role in charge of:

  • the creation of all our LVs and mounts

  • the creation of the 2 associated StorageClasses

  • the update of the local-storage provisioner config

Note that there are a few subtleties that might interest you, like the random distribution of volumes across the cluster. See this task, from line 47:

- name: global vars
  set_fact:
    nbWorkers: "{{ workers | length }}"
    shuffled_workers: "{{ workers | shuffle }}"

<...>

# LV
- name: "create LV(s)"
  become: yes
  run_once: true
  include_role:
    name: lv-mgmt
    apply: # https://github.com/ansible/ansible/issues/35398#issuecomment-451301837
      delegate_to: "{{ shuffled_workers[loop_index | int % nbWorkers | int] }}" (2)
  vars:
    logical_volume:
      vg: "{{ local_storage.vg_name }}"
      lv: "{{ volume.lv_name }}{{ loop_index }}"
      size: "{{ volume.lv_size }}"
      mount_point: "{{ local_storage.base_mount_point }}/{{volume.class_name }}/{{ volume.lv_name }}{{ loop_index }}"
  delegate_facts: true
  with_sequence: start=0 end={{ volume.number_of_volumes - 1 }} (1)
  loop_control:
    loop_var: loop_index
1 for the requested number of volumes…
2 create each volume iteratively, distributing them round-robin over a shuffled list of workers (i.e. starting from a random worker node)
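For completeness, the lv-mgmt role called above essentially boils down to the classic LVM trio - create, format, mount - roughly equivalent to this simplified sketch using the standard Ansible modules:

# create the LV in the given VG, format it in ext4 and mount it persistently
- name: "create the logical volume"
  lvol:
    vg: "{{ logical_volume.vg }}"
    lv: "{{ logical_volume.lv }}"
    size: "{{ logical_volume.size }}"

- name: "format it in ext4"
  filesystem:
    fstype: ext4
    dev: "/dev/{{ logical_volume.vg }}/{{ logical_volume.lv }}"

- name: "mount it under the provisioner discovery directory (and persist it in fstab)"
  mount:
    path: "{{ logical_volume.mount_point }}"
    src: "/dev/{{ logical_volume.vg }}/{{ logical_volume.lv }}"
    fstype: ext4
    state: mounted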

When it comes to notifying the provisioner that we want it to manage new kinds of StorageClasses, we use a combination of Ansible templates and Kustomize in order to patch both the ConfigMap and the DaemonSet.

Have a look at the Ansible script, from line 67 to the end.

As you can see, the kustomization.yaml file applies 2 patches: one for the ConfigMap, the other one for the DaemonSet.

resources:
  - previous_ds.yaml
  - previous_cm.yaml
patches:
  - config.yaml
  - ds.yaml

Patching a DaemonSet is a built-in feature but, as far as I know, there is no pre-built plugin for merging a given item into a ConfigMap (which sounds legit). Let me know if you think Kustomize can do it natively.
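This is why the config.yaml patch is itself generated by an Ansible template: the whole storageClassMap blob is re-rendered with the up-to-date list of classes and replaces the previous value. Roughly (a sketch, reusing the names seen elsewhere in this article):

# config.yaml - strategic merge patch replacing the provisioner's storageClassMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-provisioner-config
  namespace: kube-system
data:
  storageClassMap: |
    elastic-master-sc:
      hostDir: /mnt/local-storage/elastic-master-sc
      mountDir: /mnt/local-storage/elastic-master-sc
    elastic-data-sc:
      hostDir: /mnt/local-storage/elastic-data-sc
      mountDir: /mnt/local-storage/elastic-data-sc

The ds.yaml patch does the equivalent job on the DaemonSet side, adding the matching hostPath volumes and volumeMounts for each new discovery directory.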

Step 5: deploy ElasticSearch

It’s nothing more than another set of small Ansible tasks, pushing the Helm chart and a templatized values.yaml, itself referencing the same volume vars.

You will find it here.

As you can see, I can use the Helm chart in a standard way, just by referencing the newly created StorageClasses:

  master:
    name: master
    replicas: "{{ volume_elastic_master.number_of_volumes }}"
    heapSize: "512m"
    # additionalJavaOpts: "-XX:MaxRAM=512m"
    persistence:
      enabled: true
      accessMode: ReadWriteOnce
      name: data
      size: "{{ volume_elastic_master.lv_size }}"
      storageClass: "{{ volume_elastic_master.class_name }}"
    resources:
      limits:
        cpu: "1"
        memory: "1024Mi"
      requests:
        cpu: "25m"
        memory: "512Mi"

That’s it! PVCs will bind to the pre-created PVs, according to their StorageClasses.
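For instance, each master pod ends up with a PVC roughly like this one (a sketch; the actual name depends on the release name), which gets bound to one of the pre-created 5G PVs on whichever node the pod lands on:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-elasticsearch-master-0   # generated by the StatefulSet, name is illustrative
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: elastic-master-sc
  resources:
    requests:
      storage: 5G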

ElasticSearch should be up & running! 🍩

PV renewal

Depending on the reclaim policy that you have configured on each StorageClass or PV, your PV may be just "Released" or deleted once the associated PVC is deleted.

If you set that property to "Delete", the PV will be deleted and automatically re-created in a few seconds by our local provisioner.

Conversely, if you set that property to "Retain", you will have to delete the PVC and the PV yourself, and to clean the underlying storage volume; see below.

Cleaning tasks

At the moment, the removal of both the PVs and the underlying storage - i.e. the LVs and filesystem mounts - is not scripted yet.

I didn’t script this part because I was not sure whether I would really need it in a production environment.

Also, it may happen that you just want to remove a subset of volumes for a given class. So cleaning scripts should take this kind of option into consideration.

Here is my personal TODO-list when I want to clear the volumes associated with an application and clean the system:

# alias k='kubectl'
# alias ks='kubectl -n kube-system'

# purge Helm release of the app
$ helm delete --purge <release-name>
# list PVC, if bound, stop app and delete PVC
$ k get pvc -A
$ k delete pvc <...>
# edit the configMap to delete the given SC
$ ks edit cm local-provisioner-config
# edit the daemonSet to do so with volumes and mounts
$ ks edit ds local-volume-provisioner
# check that the DS pods of the provisioner are automatically restarted
$ ks get po
# delete PV
$ k get pv
$ k delete pv <...>
# delete storageClass
$ k get sc
$ k delete sc <...>
# on each worker, delete the content of the mounted dir
$ lsblk
$ rm -rf /mnt/local-storage/registry-sc/registry0/*
# umount dir
$ lsblk
$ umount /mnt/local-storage/registry-sc/registry0
# delete the mounted dir
$ rmdir ...
# lvremove
$ lvs
$ lvremove ...
# update fstab
$ vim /etc/fstab