April 30, 2019

Fixes with GlusterFS Block Volume in CNS

Container native storage, or CNS, is a technology that is widely used on Kubernetes and many other container platforms.

No Space Error when Provisioning Block Volume

OpenShift supports GlusterFS block volumes natively. When deploying OpenShift with Ansible, adding the following configuration to the inventory file makes the automated installation deploy GlusterFS block storage as well.

[OSEv3:vars]
....
openshift_storage_glusterfs_block_deploy=true
openshift_storage_glusterfs_block_host_vol_size=400
openshift_storage_glusterfs_block_storageclass=true
....

After installation, the GlusterFS pods are set up in the openshift-storage project, and the glusterfs-storage-block storage class is created as well. With this configuration, regular GlusterFS volumes work fine with OpenShift. But when provisioning persistent volumes with the glusterfs-storage-block storage class, the claim stays stuck in "Pending" forever.

Starting with the log from the glusterblock-storage-provisioner-dc pod, it seems to be a very simple problem.

Error: Failed to allocate new volume: No space

A quick check with heketi-cli topology info shows that every node has over 200GB of free space. However, provisioning a block volume of 64GB or even less fails. Meanwhile, creating a block volume of the same size directly via heketi-cli volume create succeeds.

Rodrigo Bersa in "Re: Provisioning persistence for metrics with GlusterFS": The Gluster Block volumes works with the concept of block-hosting volume, and these ones are created with 100GB by default. To clarify, the block volumes will be provisioned over the block hosting volumes. Let's say you need a 10GB block volume, it will create a block hosting volume with 100GB and then the 10GB block volume over it, as the next block volumes requested until it reaches the 100GB. After that a new block hosting volume will be created and so on.

Also, according to the OpenShift documentation:

gluster-block volumes require the presence of a GlusterFS block-hosting volume with enough capacity to hold the full size of any given block volume's capacity. By default, if no such block-hosting volume exists, one will be automatically created at a set size. The default for this size is 100 GB.
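The allocation behaviour described in the two quotes above can be modelled roughly as follows. This is only a simplified sketch, not heketi's actual code: the Cluster and HostingVolume classes are hypothetical, free space is treated as a single number, and replication and reserved overhead are ignored. It illustrates why a small block volume request can fail with "No space" when the configured block-hosting volume size is larger than the space available for a new hosting volume.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HostingVolume:
    """Hypothetical model of a block-hosting volume (sizes in GB)."""
    size_gb: int
    used_gb: int = 0

    @property
    def free_gb(self) -> int:
        return self.size_gb - self.used_gb


@dataclass
class Cluster:
    """Hypothetical, heavily simplified model of the storage cluster."""
    free_gb: int                      # space available for new hosting volumes
    host_vol_size_gb: int = 100       # size of each auto-created hosting volume
    hosting_volumes: List[HostingVolume] = field(default_factory=list)

    def provision_block(self, size_gb: int) -> HostingVolume:
        # Reuse an existing block-hosting volume if one has enough room.
        for vol in self.hosting_volumes:
            if vol.free_gb >= size_gb:
                vol.used_gb += size_gb
                return vol
        # Otherwise a new hosting volume of the configured size is needed,
        # and it must fit into the free space as a whole.
        if size_gb > self.host_vol_size_gb or self.host_vol_size_gb > self.free_gb:
            raise RuntimeError("Failed to allocate new volume: No space")
        self.free_gb -= self.host_vol_size_gb
        vol = HostingVolume(self.host_vol_size_gb, used_gb=size_gb)
        self.hosting_volumes.append(vol)
        return vol


if __name__ == "__main__":
    # With a 100GB hosting volume size, a 64GB request fits into 200GB free.
    Cluster(free_gb=200, host_vol_size_gb=100).provision_block(64)
    # With a 400GB hosting volume size, the same request cannot be served.
    try:
        Cluster(free_gb=200, host_vol_size_gb=400).provision_block(64)
    except RuntimeError as err:
        print(err)
```

In this model the size of the requested block volume hardly matters. What decides success is whether a whole new block-hosting volume of the configured size can still be created.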

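It seems I set the wrong block-hosting volume size. The inventory above sets it to 400GB, so a new block-hosting volume presumably does not fit on nodes that only have a little over 200GB free. This size can be defined by setting the openshift_storage_glusterfs_block_host_vol_size parameter during OpenShift installation. But how can it be changed afterwards? Where does this parameter end up?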

To find out, I searched for block_host_vol in the openshift-ansible project and found that it is used in roles/openshift_storage_glusterfs/templates/heketi.json.j2.

....
    "_block_hosting_volume_size": "New block hosting volume will be created in size mentioned, This is considered only if auto-create is enabled.",
    "block_hosting_volume_size": {{ glusterfs_block_host_vol_size }}
....

So the value ends up in the block_hosting_volume_size parameter, which means we need to change the heketi configuration file. Let's first check the heketi deployment config to find out where that configuration lives.

....
          volumeMounts:
            - mountPath: /var/lib/heketi
              name: db
            - mountPath: /etc/heketi
              name: config
....
      volumes:
        - glusterfs:
            endpoints: heketi-db-storage-endpoints
            path: heketidbstorage
          name: db
        - name: config
          secret:
            defaultMode: 420
            secretName: heketi-storage-config-secret
....

The configuration is stored in an Opaque secret, heketi-storage-config-secret, which contains heketi.json encoded in base64. We can save it into a file, decode it, edit it, and re-encode it in base64 to change the setting. Finally, update the secret with the new base64 string and delete all heketi pods; the new configuration is applied when they restart.
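The decode, edit and re-encode step can be sketched in Python as below. This is a minimal sketch under a few assumptions: the base64 string has already been copied out of the secret by hand, the helper names patch_heketi_secret and set_key_everywhere are hypothetical, and the code searches the whole document for block_hosting_volume_size rather than assuming where in heketi.json the key sits.

```python
import base64
import json
from typing import Any


def set_key_everywhere(node: Any, key: str, value: Any) -> int:
    """Set every occurrence of `key` in nested JSON data; return how many were changed."""
    changed = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                changed += 1
            else:
                changed += set_key_everywhere(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            changed += set_key_everywhere(item, key, value)
    return changed


def patch_heketi_secret(encoded: str, new_size_gb: int) -> str:
    """Decode a base64 heketi.json, change the hosting volume size, re-encode it."""
    config = json.loads(base64.b64decode(encoded))
    if set_key_everywhere(config, "block_hosting_volume_size", new_size_gb) == 0:
        # Refuse to produce a new secret value if nothing was changed.
        raise KeyError("block_hosting_volume_size not found in heketi.json")
    patched = json.dumps(config, indent=2).encode("utf-8")
    return base64.b64encode(patched).decode("ascii")
```

Calling patch_heketi_secret(old_value, 100) returns the string to put back into the secret. The neighbouring "_block_hosting_volume_size" description key is left untouched because only exact key matches are rewritten.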

Null Block Volume Response

....
E0428 08:01:01.821670       1 glusterblock-provisioner.go:451] BLOCK VOLUME RESPONSE: <nil>
E0428 08:01:01.821709       1 glusterblock-provisioner.go:453] [heketi] failed to create volume: 
E0428 08:01:01.821722       1 controller.go:895] Failed to provision volume for claim "results-mining/elasticsearch-data-2" with StorageClass "glusterfs-storage-block": failed to create volume: [heketi] failed to create volume: 
E0428 08:01:01.821765       1 goroutinemap.go:165] Operation for "provision-results-mining/elasticsearch-data-2[957c1231-698b-11e9-a2f6-6451060dfbac]" failed. No retries permitted until 2019-04-28 08:01:02.821752299 +0000 UTC m=+1380528.403251822 (durationBeforeRetry 1s). Error: "failed to create volume: [heketi] failed to create volume: "
....