Note: This article is reproduced from stackoverflow.com.
h5py hdf5 python

Storing a list of strings to a HDF5 Dataset from Python using VL format

Posted on 2020-03-27 10:16:50

I expected the following code to work, but it doesn't.

import h5py
import numpy as np

with h5py.File('file.hdf5','w') as hf:
    dt = h5py.special_dtype(vlen=str)
    feature_names = np.array(['a', 'b', 'c'])
    hf.create_dataset('feature names', data=feature_names, dtype=dt)

I get the error message TypeError: No conversion path for dtype: dtype('<U1'). The following code does work, but using a for loop to copy the data seems a bit clunky to me. Is there a more straightforward way to do this? I would prefer to be able to pass the sequence of strings directly into the create_dataset function.

import h5py
import numpy as np

with h5py.File('file.hdf5','w') as hf:
    dt = h5py.special_dtype(vlen=str)
    feature_names = np.array(['a', 'b', 'c'])
    ds = hf.create_dataset('feature names', (len(feature_names),), dtype=dt)

    for i in range(len(feature_names)):
        ds[i] = feature_names[i]

Note: My question follows from this answer to Storing a list of strings to a HDF5 Dataset from Python, but I don't consider it a duplicate of that question.

Questioner: mhwombat
Viewed: 318
Answered by teegaar, 2019-07-03 21:23

You almost did it, the missing detail was to pass dtype to np.array:

import h5py
import numpy as np

with h5py.File('file.hdf5','w') as hf:
    dt = h5py.special_dtype(vlen=str)
    feature_names = np.array(['a', 'b', 'c'], dtype=dt)
    hf.create_dataset('feature names', data=feature_names)

PS: This looks like a bug to me: create_dataset appears to ignore the given dtype and doesn't apply it to the given data.
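If the strings already live in a NumPy unicode array (dtype '<U1', as in the question), the same idea can be wrapped in a small helper. This is only a sketch: write_vlen_strings and read_strings are hypothetical names, not part of h5py, and the read side assumes that strings may come back as either bytes or str depending on the h5py version.

```python
import os
import tempfile

import h5py
import numpy as np


def write_vlen_strings(path, name, strings):
    """Write a sequence of Python strings as a variable-length string dataset."""
    dt = h5py.special_dtype(vlen=str)
    # Convert lists and '<U' arrays alike to plain Python str objects first,
    # then build the array with the vlen dtype, as in the answer above.
    values = np.asarray(strings).tolist()
    data = np.array(values, dtype=dt)
    with h5py.File(path, 'w') as hf:
        hf.create_dataset(name, data=data)


def read_strings(path, name):
    """Read the dataset back as a list of str."""
    with h5py.File(path, 'r') as hf:
        raw = hf[name][:]
    # Assumption: elements may be bytes or str depending on the h5py version.
    return [s.decode('utf-8') if isinstance(s, bytes) else s for s in raw]


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'file.hdf5')
    feature_names = np.array(['a', 'b', 'c'])  # dtype '<U1', as in the question
    write_vlen_strings(path, 'feature names', feature_names)
    print(read_strings(path, 'feature names'))
```

Going through tolist() avoids the per-element loop from the question while still handing h5py an object array tagged with the variable-length string dtype.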