Working with ndarrays¶
The overview introduced the ndarray – a typed, multi-dimensional array whose bulk numeric data lives in a binary block while its shape, datatype, and byte order are described by metadata in the YAML tree. This page is a practical guide to doing things with ndarrays in libasdf: reading them out of a file, copying and converting their data (including sub-array “tiles”), and building new arrays to write back out.
Everything here is declared in asdf/core/ndarray.h (pulled in by the
umbrella asdf.h) with supporting datatype definitions in
asdf/core/datatype.h.
The asdf_ndarray_t structure¶
An ndarray is represented by an asdf_ndarray_t. When reading, the library heap-allocates one for you and fills in the metadata; when writing, you fill one in yourself, typically on the stack. The publicly visible fields are:
Field |
Type |
Meaning |
|---|---|---|
|
|
Number of dimensions. |
|
|
Length of each dimension; an array of |
|
Element datatype (see Datatypes and byte order below). |
|
|
Byte order of the stored data ( |
|
|
|
Index of the binary block holding the data. |
|
|
Optional byte offset into the block where the data begins. |
|
|
Optional per-dimension strides; |
Note
Only a subset of full ndarray functionality is implemented so far. In
particular complex datatypes, string (ascii / ucs4) and
structured datatypes, masks, and arbitrarily strided data are not yet fully
supported for reading. See the
asdf/core/ndarray.h
header for the current status.
Reading an ndarray from a file¶
If you know the YAML Pointer path to an array, read it with asdf_get_ndarray:
asdf_ndarray_t *array = NULL;
if (asdf_get_ndarray(file, "data", &array) != ASDF_VALUE_OK) {
fprintf(stderr, "no ndarray at /data\n");
return 1;
}
printf("%u-dimensional array\n", array->ndim);
for (uint32_t dim = 0; dim < array->ndim; dim++)
printf(" axis %u: %" PRIu64 " elements\n", dim, array->shape[dim]);
The library allocates array; release it with asdf_ndarray_destroy when
you are done:
asdf_ndarray_destroy(array);
If you do not know the path in advance you can search the tree for the first
ndarray with asdf_value_find and the asdf_value_is_ndarray predicate, then
cast the matching value with asdf_value_as_ndarray:
asdf_value_t *root = asdf_get_value(file, "");
asdf_value_t *found = asdf_value_find(root, asdf_value_is_ndarray);
asdf_ndarray_t *array = NULL;
if (found && asdf_value_as_ndarray(found, &array) == ASDF_VALUE_OK) {
/* ... */
}
asdf_value_destroy(found);
asdf_value_destroy(root);
See Reading values out of the ASDF tree for more on generic value handles and tree traversal.
Accessing the raw data¶
The quickest way to reach the element data is asdf_ndarray_data, which returns a pointer to the (decompressed) bytes and optionally their size:
size_t nbytes = 0;
const void *data = asdf_ndarray_data(array, &nbytes);
This pointer is owned by array and must not be freed. The data is in the
array’s source datatype and byte order, exactly as stored – no conversion is
performed. Two helpers describe the array’s extent without touching the data:
asdf_ndarray_size – total number of elements (the product of the shape).
asdf_ndarray_nbytes – total size of the data in bytes.
asdf_ndarray_data_raw is a lower-level variant that, for compressed arrays, returns the still-compressed bytes without decompressing them; for uncompressed arrays it is equivalent to asdf_ndarray_data.
Because asdf_ndarray_data exposes the data in its original layout, you are
responsible for honoring byteorder and datatype yourself. When you want
the data converted to a convenient host type, use the reading functions in the
next section instead.
Reading data with conversion¶
asdf_ndarray_read_all copies the whole array into a buffer, converting it to the host’s native byte order and, optionally, to a different numeric datatype:
double *values = NULL;
asdf_ndarray_err_t err =
asdf_ndarray_read_all(array, ASDF_DATATYPE_FLOAT64, (void **)&values);
if (err == ASDF_NDARRAY_OK) {
uint64_t n = asdf_ndarray_size(array);
/* values[0 .. n-1] are now native-endian doubles */
free(values);
}
Passing NULL for the destination (as above, via a pointer whose value is
NULL) asks the library to allocate a buffer of the right size; the caller
then owns that memory and must free() it. Alternatively, pre-allocate a
buffer of asdf_ndarray_nbytes (or the appropriate size for the converted
type) and pass its address. Pass ASDF_DATATYPE_SOURCE as the destination
datatype to keep the array’s original element type and only normalize byte
order.
Reading tiles¶
Often you only need a rectangular sub-region (“tile”) of a large array.
asdf_ndarray_read_tile_ndim reads an arbitrary N-dimensional cutout given an
origin and a shape, each an array of ndim values:
/* A 5x5 tile anchored at the array origin of a 2-D array */
uint64_t origin[2] = {0, 0};
uint64_t shape[2] = {5, 5};
float *tile = NULL;
asdf_ndarray_err_t err = asdf_ndarray_read_tile_ndim(
array, origin, shape, ASDF_DATATYPE_FLOAT32, (void **)&tile);
if (err == ASDF_NDARRAY_OK) {
/* tile holds 25 native-endian floats in row-major order */
free(tile);
}
The same buffer-ownership and datatype-conversion rules as
asdf_ndarray_read_all apply: pass NULL to have a buffer allocated for you,
or your own buffer to read into, and ASDF_DATATYPE_SOURCE to keep the source
type. A tile that extends past the bounds of the array yields
ASDF_NDARRAY_ERR_OUT_OF_BOUNDS.
For the common two-dimensional case, asdf_ndarray_read_tile_2d offers a
friendlier signature taking x, y, width, and height directly
(plus an optional plane_origin selecting a plane of a higher-dimensional
array):
float *tile = NULL;
asdf_ndarray_read_tile_2d(
array, /* x= */ 10, /* y= */ 20, /* width= */ 8, /* height= */ 8,
/* plane_origin= */ NULL, ASDF_DATATYPE_FLOAT32, (void **)&tile);
Datatypes and byte order¶
An element datatype is described by asdf_datatype_t, whose type field is
one of the asdf_scalar_datatype_t enums – for example ASDF_DATATYPE_UINT8,
ASDF_DATATYPE_INT32, ASDF_DATATYPE_FLOAT32, or ASDF_DATATYPE_FLOAT64.
For simple numeric arrays setting type is all that is required;
asdf_datatype_size then reports the size of a single element in bytes.
Byte order is given by asdf_byteorder_t: ASDF_BYTEORDER_LITTLE or
ASDF_BYTEORDER_BIG for explicit endianness, or ASDF_BYTEORDER_DEFAULT to
let the library choose when writing. asdf_scalar_datatype_from_string and
asdf_scalar_datatype_to_string convert between the enumerators and their ASDF
string names (e.g. "float64").
Building an ndarray to write¶
For writing you stack-allocate an asdf_ndarray_t and fill in the fields that describe the array, then allocate a data buffer for its contents with asdf_ndarray_data_alloc:
const uint64_t shape[] = {128, 128};
asdf_ndarray_t nd = {
.datatype = (asdf_datatype_t){.type = ASDF_DATATYPE_FLOAT32},
.byteorder = ASDF_BYTEORDER_LITTLE,
.ndim = 2,
.shape = shape,
};
float *data = asdf_ndarray_data_alloc(&nd);
for (int idx = 0; idx < 128 * 128; idx++)
data[idx] = (float)idx;
asdf_set_ndarray(file, "image", &nd);
asdf_ndarray_data_alloc allocates a correctly sized buffer on the heap based on the array’s shape and datatype. After the file has been written, release it with asdf_ndarray_data_dealloc (safe to call after asdf_close):
asdf_write_to(file, "out.asdf");
asdf_close(file);
asdf_ndarray_data_dealloc(&nd);
Note
When building an ndarray inside an extension’s serialize callback, use asdf_ndarray_data_alloc_temp instead – its buffer is freed automatically once the write completes.
See Writing ASDF files for the surrounding file-writing workflow.
Storage modes¶
By default an array’s data is written into a binary block (“internal” storage). asdf_ndarray_storage_set selects a different asdf_array_storage_t mode:
ASDF_ARRAY_STORAGE_INTERNAL– data is written to a binary block in the same file (the default).ASDF_ARRAY_STORAGE_INLINE– data is serialized directly into the YAML tree as a nested sequence. This is convenient for very small arrays but a warning is logged above a configurable element-count threshold.ASDF_ARRAY_STORAGE_EXTERNAL– reserved for data stored in a separate file; not yet supported.
asdf_ndarray_storage_set(&nd, ASDF_ARRAY_STORAGE_INLINE);
asdf_ndarray_storage reports the mode that will be used for an array when writing it out. If the array was ready from an existing file, it will also report the storage format that was used in that file.
Compression¶
Internal block data can be compressed. Call asdf_ndarray_compression_set on the array before passing it to asdf_set_ndarray (or asdf_write_to):
asdf_ndarray_compression_set(&nd, "lz4");
asdf_set_ndarray(file, "image", &nd);
Compressed arrays are transparently decompressed when read back. See Compression for the full list of supported compressors.