Working with ndarrays

The overview introduced the ndarray – a typed, multi-dimensional array whose bulk numeric data lives in a binary block while its shape, datatype, and byte order are described by metadata in the YAML tree. This page is a practical guide to doing things with ndarrays in libasdf: reading them out of a file, copying and converting their data (including sub-array “tiles”), and building new arrays to write back out.

Everything here is declared in asdf/core/ndarray.h (pulled in by the umbrella asdf.h) with supporting datatype definitions in asdf/core/datatype.h.

The asdf_ndarray_t structure

An ndarray is represented by an asdf_ndarray_t. When reading, the library heap-allocates one for you and fills in the metadata; when writing, you fill one in yourself, typically on the stack. The publicly visible fields are:

Field      Type              Meaning
---------  ----------------  ------------------------------------------------------------
ndim       uint32_t          Number of dimensions.
shape      const uint64_t *  Length of each dimension; an array of ndim values.
datatype   asdf_datatype_t   Element datatype (see Datatypes and byte order below).
byteorder  asdf_byteorder_t  Byte order of the stored data (ASDF_BYTEORDER_LITTLE or ASDF_BYTEORDER_BIG).
source     size_t            Index of the binary block holding the data.
offset     uint64_t          Optional byte offset into the block where the data begins.
strides    const int64_t *   Optional per-dimension strides; NULL means C-contiguous.
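To make "C-contiguous" concrete, the sketch below computes the strides that a NULL strides field implies. It is an illustration only: contiguous_strides is a hypothetical helper, not a libasdf function, and it assumes strides are measured in bytes, as the ASDF standard's ndarray schema defines them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libasdf.
 * Fill strides[0..ndim-1] with the byte strides of a C-contiguous
 * (row-major) array: the last axis advances by one element, and each
 * earlier axis advances by the full extent of all later axes. */
void contiguous_strides(uint32_t ndim, const uint64_t *shape,
                        size_t elem_size, int64_t *strides)
{
    assert(ndim > 0 && shape != NULL && strides != NULL);

    int64_t step = (int64_t)elem_size;
    for (uint32_t dim = ndim; dim-- > 0;) {
        strides[dim] = step;
        step *= (int64_t)shape[dim];
    }
}
```

For a 3 x 4 array of 4-byte elements this gives strides of 16 and 4 bytes: moving one row forward skips four elements, moving one column forward skips one.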

Note

Only a subset of full ndarray functionality is implemented so far. In particular, complex datatypes, string (ascii / ucs4) and structured datatypes, masks, and arbitrarily strided data are not yet fully supported for reading. See the asdf/core/ndarray.h header for the current status.

Reading an ndarray from a file

If you know the YAML Pointer path to an array, read it with asdf_get_ndarray:

asdf_ndarray_t *array = NULL;

if (asdf_get_ndarray(file, "data", &array) != ASDF_VALUE_OK) {
    fprintf(stderr, "no ndarray at /data\n");
    return 1;
}

printf("%u-dimensional array\n", array->ndim);
for (uint32_t dim = 0; dim < array->ndim; dim++)
    printf("  axis %u: %" PRIu64 " elements\n", dim, array->shape[dim]);

The library allocates array; release it with asdf_ndarray_destroy when you are done:

asdf_ndarray_destroy(array);

If you do not know the path in advance, you can search the tree for the first ndarray with asdf_value_find and the asdf_value_is_ndarray predicate, then cast the matching value with asdf_value_as_ndarray:

asdf_value_t *root = asdf_get_value(file, "");
asdf_value_t *found = asdf_value_find(root, asdf_value_is_ndarray);

asdf_ndarray_t *array = NULL;
if (found && asdf_value_as_ndarray(found, &array) == ASDF_VALUE_OK) {
    /* ... */
}

asdf_value_destroy(found);
asdf_value_destroy(root);

See Reading values out of the ASDF tree for more on generic value handles and tree traversal.

Accessing the raw data

The quickest way to reach the element data is asdf_ndarray_data, which returns a pointer to the (decompressed) bytes and optionally their size:

size_t nbytes = 0;
const void *data = asdf_ndarray_data(array, &nbytes);

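This pointer is owned by array and must not be freed. The data is in the array’s source datatype and byte order, exactly as stored – no conversion is performed. Two helpers describe the array’s extent without touching the data: asdf_ndarray_size returns the total number of elements, and asdf_ndarray_nbytes returns the total size of the data in bytes.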

asdf_ndarray_data_raw is a lower-level variant that, for compressed arrays, returns the still-compressed bytes without decompressing them; for uncompressed arrays it is equivalent to asdf_ndarray_data.

Because asdf_ndarray_data exposes the data in its original layout, you are responsible for honoring byteorder and datatype yourself. When you want the data converted to a convenient host type, use the reading functions in the next section instead.
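As an illustration of what honoring byteorder involves, here is a minimal sketch that decodes one big-endian float32 element from raw bytes. The helper names are hypothetical and not part of libasdf, and the code assumes the host float is a 32-bit IEEE 754 value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of libasdf.
 * Assemble a 32-bit value from four bytes stored most-significant first.
 * Shifting rather than casting makes this independent of host endianness. */
uint32_t load_u32_be(const unsigned char *bytes)
{
    assert(bytes != NULL);
    return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8) | (uint32_t)bytes[3];
}

/* Reinterpret the assembled bits as a float (assumed IEEE 754 binary32).
 * memcpy avoids the undefined behaviour of pointer type-punning. */
float load_f32_be(const unsigned char *bytes)
{
    uint32_t bits = load_u32_be(bytes);
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Element i of a contiguous big-endian float32 array would then be read as load_f32_be((const unsigned char *)data + 4 * i). Doing this for every datatype and byte order is the work that the converting read functions take off your hands.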

Reading data with conversion

asdf_ndarray_read_all copies the whole array into a buffer, converting it to the host’s native byte order and, optionally, to a different numeric datatype:

double *values = NULL;
asdf_ndarray_err_t err =
    asdf_ndarray_read_all(array, ASDF_DATATYPE_FLOAT64, (void **)&values);

if (err == ASDF_NDARRAY_OK) {
    uint64_t n = asdf_ndarray_size(array);
    /* values[0 .. n-1] are now native-endian doubles */
    free(values);
}

When the destination pointer is NULL on entry (as values is above), the library allocates a buffer of the right size; the caller then owns that memory and must free() it. Alternatively, pre-allocate a buffer of asdf_ndarray_nbytes bytes (or, when converting, the element count times the size of the destination type) and pass its address. Pass ASDF_DATATYPE_SOURCE as the destination datatype to keep the array’s original element type and only normalize byte order.
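The "allocate if NULL, otherwise fill the caller's buffer" convention can be sketched in isolation. The function below is a hypothetical stand-in written for this guide, not libasdf code; it only shows how a void ** destination parameter supports both ownership modes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libasdf: copy nbytes from src into *dst.
 * If *dst is NULL a buffer is allocated and ownership passes to the caller,
 * who must free() it; otherwise the caller's buffer is filled in place.
 * Returns 0 on success, -1 if allocation fails. */
int copy_out(const void *src, size_t nbytes, void **dst)
{
    assert(src != NULL && dst != NULL);

    if (*dst == NULL) {
        /* Request at least one byte so a zero-size copy still succeeds. */
        *dst = malloc(nbytes > 0 ? nbytes : 1);
        if (*dst == NULL)
            return -1;
    }
    memcpy(*dst, src, nbytes);
    return 0;
}
```

With this shape of interface, forgetting to initialize the destination pointer is the main hazard: an uninitialized pointer looks like a caller-supplied buffer, so always set it to NULL or to a real buffer before the call.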

Reading tiles

Often you only need a rectangular sub-region (“tile”) of a large array. asdf_ndarray_read_tile_ndim reads an arbitrary N-dimensional cutout given an origin and a shape, each an array of ndim values:

/* A 5x5 tile anchored at the array origin of a 2-D array */
uint64_t origin[2] = {0, 0};
uint64_t shape[2]  = {5, 5};

float *tile = NULL;
asdf_ndarray_err_t err = asdf_ndarray_read_tile_ndim(
    array, origin, shape, ASDF_DATATYPE_FLOAT32, (void **)&tile);

if (err == ASDF_NDARRAY_OK) {
    /* tile holds 25 native-endian floats in row-major order */
    free(tile);
}

The same buffer-ownership and datatype-conversion rules as asdf_ndarray_read_all apply: pass NULL to have a buffer allocated for you, or your own buffer to read into, and ASDF_DATATYPE_SOURCE to keep the source type. A tile that extends past the bounds of the array yields ASDF_NDARRAY_ERR_OUT_OF_BOUNDS.

For the common two-dimensional case, asdf_ndarray_read_tile_2d offers a friendlier signature taking x, y, width, and height directly (plus an optional plane_origin selecting a plane of a higher-dimensional array):

float *tile = NULL;
asdf_ndarray_read_tile_2d(
    array, /* x= */ 10, /* y= */ 20, /* width= */ 8, /* height= */ 8,
    /* plane_origin= */ NULL, ASDF_DATATYPE_FLOAT32, (void **)&tile);

/* ... use the 8x8 tile ... */
free(tile); /* library-allocated buffer is owned by the caller */
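For intuition about what a 2-D tile read does, the following sketch copies a tile out of an in-memory row-major image. It is not how libasdf implements the call; copy_tile_2d is a hypothetical function, and it assumes x indexes columns (the fastest-varying axis) and y indexes rows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libasdf.
 * Copy a width x height tile whose top-left element is at column x, row y
 * of a row-major image with image_width elements per row. The tile is
 * written to out as height consecutive rows of width floats.
 * Returns 0 on success, -1 if the tile does not fit inside the image. */
int copy_tile_2d(const float *image, size_t image_width, size_t image_height,
                 size_t x, size_t y, size_t width, size_t height, float *out)
{
    assert(image != NULL && out != NULL);

    if (x > image_width || width > image_width - x ||
        y > image_height || height > image_height - y)
        return -1;

    for (size_t row = 0; row < height; row++) {
        /* Each tile row is contiguous in the source, so copy it whole. */
        const float *src = image + (y + row) * image_width + x;
        memcpy(out + row * width, src, width * sizeof *out);
    }
    return 0;
}
```

Each tile row is a contiguous run in the source, which is why tile reads over row-major data reduce to one copy per row.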

Datatypes and byte order

An element datatype is described by asdf_datatype_t, whose type field is one of the asdf_scalar_datatype_t enums – for example ASDF_DATATYPE_UINT8, ASDF_DATATYPE_INT32, ASDF_DATATYPE_FLOAT32, or ASDF_DATATYPE_FLOAT64. For simple numeric arrays, setting type is all that is required; asdf_datatype_size then reports the size of a single element in bytes.

Byte order is given by asdf_byteorder_t: ASDF_BYTEORDER_LITTLE or ASDF_BYTEORDER_BIG for explicit endianness, or ASDF_BYTEORDER_DEFAULT to let the library choose when writing. asdf_scalar_datatype_from_string and asdf_scalar_datatype_to_string convert between the enumerators and their ASDF string names (e.g. "float64").
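The relationship between datatype names and element sizes can be illustrated with a small standalone table. This is a sketch, not libasdf's implementation: the struct, the table, and dtype_size_from_string are hypothetical, and only the integer and floating-point names from the ASDF standard are covered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative lookup table, not libasdf's own: ASDF numeric datatype
 * names paired with the size in bytes of one element. */
struct dtype_info {
    const char *name;
    size_t size;
};

static const struct dtype_info DTYPES[] = {
    {"int8", 1},    {"uint8", 1},  {"int16", 2}, {"uint16", 2},
    {"int32", 4},   {"uint32", 4}, {"int64", 8}, {"uint64", 8},
    {"float32", 4}, {"float64", 8},
};

/* Return the element size for a datatype name, or 0 if it is unknown. */
size_t dtype_size_from_string(const char *name)
{
    assert(name != NULL);

    for (size_t idx = 0; idx < sizeof DTYPES / sizeof DTYPES[0]; idx++) {
        if (strcmp(DTYPES[idx].name, name) == 0)
            return DTYPES[idx].size;
    }
    return 0;
}
```

In real code, prefer asdf_scalar_datatype_from_string followed by asdf_datatype_size, which stay in step with the datatypes the library actually supports.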

Building an ndarray to write

For writing you stack-allocate an asdf_ndarray_t and fill in the fields that describe the array, then allocate a data buffer for its contents with asdf_ndarray_data_alloc:

const uint64_t shape[] = {128, 128};
asdf_ndarray_t nd = {
    .datatype  = (asdf_datatype_t){.type = ASDF_DATATYPE_FLOAT32},
    .byteorder = ASDF_BYTEORDER_LITTLE,
    .ndim      = 2,
    .shape     = shape,
};

float *data = asdf_ndarray_data_alloc(&nd);
for (int idx = 0; idx < 128 * 128; idx++)
    data[idx] = (float)idx;

asdf_set_ndarray(file, "image", &nd);

asdf_ndarray_data_alloc allocates a correctly sized buffer on the heap based on the array’s shape and datatype. After the file has been written, release it with asdf_ndarray_data_dealloc (safe to call after asdf_close):

asdf_write_to(file, "out.asdf");
asdf_close(file);
asdf_ndarray_data_dealloc(&nd);
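The size of the buffer follows from the shape and the element size. The sketch below shows that arithmetic with an overflow check; dense_nbytes is a hypothetical helper written for this guide, and whether asdf_ndarray_data_alloc performs the same check is not something this page states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libasdf.
 * Compute the byte size of a dense array: the product of all shape
 * entries times elem_size. Stores the result in *nbytes and returns true,
 * or returns false if the product would not fit in size_t. */
bool dense_nbytes(uint32_t ndim, const uint64_t *shape, size_t elem_size,
                  size_t *nbytes)
{
    assert(shape != NULL && nbytes != NULL);

    size_t total = elem_size;
    for (uint32_t dim = 0; dim < ndim; dim++) {
        if (shape[dim] > SIZE_MAX) /* only possible where size_t is narrower */
            return false;
        size_t extent = (size_t)shape[dim];
        if (extent != 0 && total > SIZE_MAX / extent)
            return false;
        total *= extent;
    }
    *nbytes = total;
    return true;
}
```

For the 128 x 128 float32 image above this yields 65536 bytes, which is also the number of bytes the fill loop writes (128 * 128 elements of 4 bytes each).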

Note

When building an ndarray inside an extension’s serialize callback, use asdf_ndarray_data_alloc_temp instead – its buffer is freed automatically once the write completes.

See Writing ASDF files for the surrounding file-writing workflow.

Storage modes

By default an array’s data is written into a binary block (“internal” storage). asdf_ndarray_storage_set selects a different asdf_array_storage_t mode:

  • ASDF_ARRAY_STORAGE_INTERNAL – data is written to a binary block in the same file (the default).

  • ASDF_ARRAY_STORAGE_INLINE – data is serialized directly into the YAML tree as a nested sequence. This is convenient for very small arrays; a warning is logged when the element count exceeds a configurable threshold.

  • ASDF_ARRAY_STORAGE_EXTERNAL – reserved for data stored in a separate file; not yet supported.

asdf_ndarray_storage_set(&nd, ASDF_ARRAY_STORAGE_INLINE);

asdf_ndarray_storage reports the mode that will be used for an array when writing it out. If the array was read from an existing file, it also reports the storage mode that was used in that file.
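To show what "a nested sequence" looks like for inline storage, the sketch below formats a small 2-D integer array as YAML flow sequences. The exact style libasdf emits (flow versus block, spacing) is an assumption here, and format_inline_2d and append_text are hypothetical helpers, not library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Append text to buf at offset *used; returns -1 if it does not fit
 * (including the terminating NUL) in a buffer of cap bytes. */
static int append_text(char *buf, size_t cap, size_t *used, const char *text)
{
    int n = snprintf(buf + *used, cap - *used, "%s", text);
    if (n < 0 || (size_t)n >= cap - *used)
        return -1;
    *used += (size_t)n;
    return 0;
}

/* Hypothetical helper, not part of libasdf.
 * Format a rows x cols row-major int array as a nested YAML flow
 * sequence such as [[1, 2], [3, 4]]. Returns the number of characters
 * written, or -1 if the buffer is too small. */
int format_inline_2d(const int *data, size_t rows, size_t cols,
                     char *buf, size_t cap)
{
    assert(data != NULL && buf != NULL);

    size_t used = 0;
    char number[32];

    if (append_text(buf, cap, &used, "[") != 0)
        return -1;
    for (size_t r = 0; r < rows; r++) {
        if (append_text(buf, cap, &used, r ? ", [" : "[") != 0)
            return -1;
        for (size_t c = 0; c < cols; c++) {
            snprintf(number, sizeof number, "%s%d", c ? ", " : "",
                     data[r * cols + c]);
            if (append_text(buf, cap, &used, number) != 0)
                return -1;
        }
        if (append_text(buf, cap, &used, "]") != 0)
            return -1;
    }
    if (append_text(buf, cap, &used, "]") != 0)
        return -1;
    return (int)used;
}
```

A 2 x 2 array holding 1, 2, 3, 4 formats as [[1, 2], [3, 4]]. The text grows with every element, which is why inline storage only suits very small arrays.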

Compression

Internal block data can be compressed. Call asdf_ndarray_compression_set on the array before passing it to asdf_set_ndarray (or asdf_write_to):

asdf_ndarray_compression_set(&nd, "lz4");
asdf_set_ndarray(file, "image", &nd);

Compressed arrays are transparently decompressed when read back. See Compression for the full list of supported compressors.