asdf/core/ndarray.h

Minimalistic prototype implementation of the core/ndarray-1.1.0 schema

Support is not yet fully complete. What is implemented:

  • Direct access to the raw data of an ndarray (via a void *); users must use the metadata provided in the asdf_ndarray_t struct to interpret the data. Alternatively, data can be copied out as tiles using the asdf_ndarray_read_tile_ndim and asdf_ndarray_read_tile_2d functions.

  • ASDF internal block sources

  • All int and most float data types (other data types can be read but are not fully implemented)

  • shape, byteorder, and offset

  • strides are partially supported
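Because the raw data is exposed only as an untyped void *, decoding an element means honouring the byteorder field yourself. A minimal standalone sketch, assuming 4-byte unsigned integer and IEEE 754 float32 elements; `load_u32` and `load_f32` are hypothetical helpers, not libasdf functions, and mapping an asdf_byteorder_t value to the `big_endian` flag is left to the caller:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode the idx-th 4-byte unsigned element from an
 * untyped buffer stored either big-endian (most significant byte first)
 * or little-endian. Building the value byte by byte avoids unaligned
 * loads and gives the same result on any host byte order. */
static uint32_t load_u32(const void *data, size_t idx, bool big_endian) {
    const unsigned char *p = (const unsigned char *)data + idx * 4;
    if (big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8) | (uint32_t)p[0];
}

/* Hypothetical helper: float32 elements are decoded the same way; the
 * assembled bit pattern is then reinterpreted via memcpy (assumes the
 * host float is IEEE 754 binary32). */
static float load_f32(const void *data, size_t idx, bool big_endian) {
    uint32_t bits = load_u32(data, idx, big_endian);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For bulk access there is no need for this: asdf_ndarray_read_all and the tile functions already convert to the host native byte order when necessary.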

What is not yet supported:

  • Shape containing ‘*’

  • Reading complex64, complex128, or float16 datatypes

  • Reading string datatypes (ascii or ucs4)

  • Reading structured datatypes (the datatypes are parsed, but there are no APIs yet for interpreting structured array data)

  • Reading arbitrarily strided data

  • Masks are not parsed or used at all, whether simple mask values or mask arrays (though if present a warning is logged indicating lack of support)

The current limitations are purely artificial: they exist so that we can rapidly develop the minimum viable product needed to make ASDF ndarray data available in common use cases.

Complete ndarray support will follow gradually.

enum asdf_ndarray_err_t

Error codes returned by some functions that read ndarray data

enumerator ASDF_NDARRAY_OK = 0

Indicates that the ndarray was read successfully

enumerator ASDF_NDARRAY_ERR_OUT_OF_BOUNDS

Return value indicating that an attempt was made to read beyond the bounds of the ndarray

struct asdf_ndarray_t

Public definition of the asdf_ndarray_t type

This is the main object through which ndarrays are used. They can be retrieved via asdf_get_ndarray and asdf_value_as_ndarray. The library allocates memory for this data structure, which must be freed by the user with asdf_ndarray_destroy when no longer needed.

For convenience, some basic fields are made public for now, though the struct layout may not be ABI-stable in future releases.

size_t source

The index of the binary block containing the ndarray data

uint32_t ndim

The number of dimensions of the array

const uint64_t *shape

The shape of the array, itself an array of size .ndim

asdf_datatype_t datatype

The datatype of the array as represented by asdf_datatype_t

asdf_byteorder_t byteorder

The byteorder of the array data, where applicable

uint64_t offset

Optional offset into the binary block where the array data begins

const int64_t *strides

Optional strides to use when iterating over or indexing array data (an array of size .ndim giving the stride for each dimension)
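Together, shape, offset, strides, and the element size locate each element within the binary block. A sketch, assuming strides are expressed in bytes (as in the ASDF ndarray schema), that a NULL strides pointer means C-contiguous (row-major) layout, and that the indices are already in bounds; `elem_byte_offset` is a hypothetical helper, not part of libasdf:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte position of the element at `index` (ndim
 * indices) relative to the start of the binary block.
 *   offset  - byte offset where the array data begins in the block
 *   strides - per-dimension byte strides, or NULL for C-contiguous data
 *   elsize  - size of one element in bytes
 * No bounds checking: every index[i] is assumed to be < shape[i]. */
static int64_t elem_byte_offset(uint32_t ndim, const uint64_t *shape,
                                const int64_t *strides, uint64_t offset,
                                uint64_t elsize, const uint64_t *index) {
    int64_t pos = (int64_t)offset;
    if (strides != NULL) {
        /* Explicit strides may be negative, hence the signed arithmetic. */
        for (uint32_t i = 0; i < ndim; i++)
            pos += (int64_t)index[i] * strides[i];
        return pos;
    }
    /* C order: the last axis varies fastest, so walk from the last
     * dimension to the first while accumulating the contiguous stride. */
    int64_t stride = (int64_t)elsize;
    for (uint32_t i = ndim; i-- > 0;) {
        pos += (int64_t)index[i] * stride;
        stride *= (int64_t)shape[i];
    }
    return pos;
}
```

The same function covers Fortran-ordered data: its strides are simply smallest on the first axis instead of the last.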

asdf_value_err_t asdf_get_ndarray(asdf_file_t *file, const char *path, asdf_ndarray_t **out)

Get an asdf_ndarray_t* out of the ASDF tree

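Parameters:
  • file – The asdf_file_t* handle to the open ASDF file

  • path – The path of the ndarray value within the ASDF tree

  • out – Output location that receives the asdf_ndarray_t* on success

Returns: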

ASDF_VALUE_OK if the value exists and is an ndarray, otherwise ASDF_VALUE_ERR_NOT_FOUND or ASDF_VALUE_ERR_TYPE_MISMATCH.

asdf_value_err_t asdf_value_as_ndarray(asdf_value_t *value, asdf_ndarray_t **out)

Cast a generic asdf_value_t* as an ndarray value, if possible

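Parameters:
  • value – The asdf_value_t* to cast

  • out – Output location that receives the asdf_ndarray_t* on success

Returns: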

ASDF_VALUE_OK if the value is an ndarray, otherwise ASDF_VALUE_ERR_TYPE_MISMATCH.

void asdf_ndarray_destroy(asdf_ndarray_t *ndarray)

Release data structures and memory allocated for an asdf_ndarray_t

Parameters:
  • ndarray – The asdf_ndarray_t* to destroy

ndarray methods

const void *asdf_ndarray_data(asdf_ndarray_t *ndarray, size_t *size)

Return a pointer to the ndarray data

const void *asdf_ndarray_data_raw(asdf_ndarray_t *ndarray, size_t *size)

Return a pointer to the raw (compressed, in the case of compressed arrays) ndarray data

On non-compressed arrays this is equivalent to asdf_ndarray_data.

uint64_t asdf_ndarray_size(const asdf_ndarray_t *ndarray)

Return the total number of elements (not bytes) in the ndarray

Parameters:
  • ndarray – The asdf_ndarray_t* handle to the ndarray

Returns:

Total number of elements in the array (just the product of its shape)

uint64_t asdf_ndarray_nbytes(const asdf_ndarray_t *ndarray)

Return the total number of bytes in the ndarray data

Parameters:
  • ndarray – The asdf_ndarray_t* handle to the ndarray

Returns:

The byte size of the ndarray (this is just its size times the datatype nbytes)
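Both quantities follow from the struct metadata: the size is the product of shape, and nbytes multiplies it by the datatype's element size. A minimal sketch with an overflow guard; `shape_size`, `shape_nbytes`, and the explicit `elsize` argument are hypothetical (libasdf takes the element size from the datatype):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: number of elements, i.e. the product of the shape.
 * Returns false if the product would overflow uint64_t. With ndim == 0
 * the result is the empty product, 1. */
static bool shape_size(uint32_t ndim, const uint64_t *shape, uint64_t *out) {
    uint64_t n = 1;
    for (uint32_t i = 0; i < ndim; i++) {
        if (shape[i] != 0 && n > UINT64_MAX / shape[i])
            return false;
        n *= shape[i];
    }
    *out = n;
    return true;
}

/* Hypothetical: total byte size = element count * bytes per element. */
static bool shape_nbytes(uint32_t ndim, const uint64_t *shape,
                         uint64_t elsize, uint64_t *out) {
    uint64_t n;
    if (!shape_size(ndim, shape, &n))
        return false;
    if (elsize != 0 && n > UINT64_MAX / elsize)
        return false;
    *out = n * elsize;
    return true;
}
```

For example, a float64 array of shape (2, 3, 4) has 24 elements and 192 bytes.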

void *asdf_ndarray_data_alloc(asdf_ndarray_t *ndarray)

Allocate heap memory large enough to store the data for the ndarray

Every call to this function should have a corresponding asdf_ndarray_data_dealloc to free the allocated memory when it is no longer needed (such as after writing the file). The memory is not automatically freed.

Parameters:
  • ndarray – An asdf_ndarray_t* whose shape and datatype are already set

Returns:

A void* to the allocated memory or NULL if the memory could not be allocated; subsequent calls on the same ndarray will return the same memory

void *asdf_ndarray_data_alloc_temp(asdf_file_t *file, asdf_ndarray_t *ndarray)

Allocate a temporary data buffer for an ndarray to be written to a file, with automatic cleanup after the write completes.

Like asdf_ndarray_data_alloc but the allocated memory is freed automatically after asdf_write_to (or asdf_close) is called. Extension authors should use this instead of asdf_ndarray_data_alloc when building ndarrays inside a serialize callback.

Parameters:
  • file – The asdf_file_t* to register the cleanup with

  • ndarray – An asdf_ndarray_t* whose shape and datatype are already set

Returns:

A void* to the zero-initialized buffer, or NULL on OOM

void asdf_ndarray_data_dealloc(asdf_ndarray_t *ndarray)

Free ndarray data allocated with asdf_ndarray_data_alloc

If the ndarray never had data allocated this is a no-op but does produce a debug log message if logging is enabled.

Parameters:
  • ndarray – The asdf_ndarray_t* whose data should be freed

int asdf_ndarray_compression_set(asdf_ndarray_t *ndarray, const char *compression)

Set the compression method to use for the ndarray data when writing

This is a shortcut for asdf_block_compression_set, applied to the block created to hold this ndarray’s data.

Parameters:
  • ndarray – An asdf_ndarray_t* handle

  • compression – String representing the compressor to use (e.g. “bzp2”) if any, or NULL or the empty string to set no compression

Returns:

Non-zero if the compression could not be set (e.g. an invalid or unknown compressor); use asdf_error to check the error code

asdf_array_storage_t asdf_ndarray_storage(asdf_ndarray_t *ndarray)

Return the storage mode that will be used when the ndarray is written.

If the ndarray was read from a file this reflects how it was originally stored. For a newly constructed ndarray this reflects whatever was last passed to asdf_ndarray_storage_set, or ASDF_ARRAY_STORAGE_INTERNAL if the storage was never explicitly set.

Parameters:
  • ndarray – An asdf_ndarray_t* handle

Returns:

The asdf_array_storage_t for this ndarray.

void asdf_ndarray_storage_set(asdf_ndarray_t *ndarray, asdf_array_storage_t storage)

Set the storage mode used when the ndarray is written.

ASDF_ARRAY_STORAGE_INLINE serializes the data as a nested YAML sequence under the data key. A warning is logged if the number of elements exceeds the configured threshold (see asdf_emitter_cfg_t.inline_ndarray_warning_thresh).

ASDF_ARRAY_STORAGE_INTERNAL writes the data in a binary block (the default when no storage mode is set).

ASDF_ARRAY_STORAGE_EXTERNAL is not yet supported; calling this function with that value logs an error and leaves the storage mode unchanged.
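To make the inline layout concrete, the sketch below renders a small row-major 2-D int array as a nested YAML flow sequence and applies the same "exceeds the threshold" test that triggers the warning. It is a hypothetical illustration: `inline_2d_yaml` and `inline_should_warn` are not library APIs, and libasdf's emitter may choose a different YAML style or spacing.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append printf-style text at buf + *len; false if it would not fit. */
static bool appendf(char *buf, size_t cap, size_t *len, const char *fmt, ...) {
    if (*len >= cap)
        return false;
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + *len, cap - *len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= cap - *len)
        return false;
    *len += (size_t)n;
    return true;
}

/* Hypothetical: write a rows x cols row-major array into buf as a nested
 * flow sequence such as "[[1, 2, 3], [4, 5, 6]]". Returns the number of
 * characters written (excluding the NUL), or -1 if buf is too small. */
static int inline_2d_yaml(char *buf, size_t cap, const int *data,
                          size_t rows, size_t cols) {
    size_t len = 0;
    if (!appendf(buf, cap, &len, "["))
        return -1;
    for (size_t r = 0; r < rows; r++) {
        if (!appendf(buf, cap, &len, "%s", r ? ", [" : "["))
            return -1;
        for (size_t c = 0; c < cols; c++)
            if (!appendf(buf, cap, &len, "%s%d", c ? ", " : "",
                         data[r * cols + c]))
                return -1;
        if (!appendf(buf, cap, &len, "]"))
            return -1;
    }
    if (!appendf(buf, cap, &len, "]"))
        return -1;
    return (int)len;
}

/* Hypothetical: warn only when the element count exceeds the threshold. */
static bool inline_should_warn(size_t nelems, size_t thresh) {
    return nelems > thresh;
}
```

The output grows with every element, which is why inline storage suits only small arrays and why the threshold warning exists.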

Parameters:
  • ndarray – An asdf_ndarray_t* handle

  • storage – The asdf_array_storage_t storage mode to use

asdf_block_t *asdf_ndarray_block(asdf_ndarray_t *ndarray)

Get the pointer to the open asdf_block_t associated with the ndarray

This is mostly for debugging/low-level inspection and is not needed for typical use cases.

Parameters:
  • ndarray – An asdf_ndarray_t* handle

Returns:

A pointer to the asdf_block_t structure representing the binary block underlying the array, if any. This will be NULL if, for example, the ndarray uses inline data.

asdf_ndarray_err_t asdf_ndarray_read_all(asdf_ndarray_t *ndarray, asdf_scalar_datatype_t dst_t, void **dst)

Read the full ndarray, copying into the provided buffer (or allocating a destination buffer if dst = NULL)

This is like asdf_ndarray_read_tile_ndim but with a default “tile” size of the full array. Like asdf_ndarray_read_tile_ndim it will also convert the data to the host native byte order if necessary, and can convert it to a different numeric type than the source array.

Parameters:
  • ndarray – The asdf_ndarray_t* handle to the ndarray

  • dst_t – An asdf_scalar_datatype_t to convert to, or ASDF_DATATYPE_SOURCE to keep the original source datatype

  • dst – Pointer to a destination void* already allocated to receive the exact number of bytes in the source ndarray, or NULL to indicate that a buffer should be allocated. In the latter case the caller is responsible for freeing the allocated buffer.

asdf_ndarray_err_t asdf_ndarray_read_tile_ndim(asdf_ndarray_t *ndarray, const uint64_t *origin, const uint64_t *shape, asdf_scalar_datatype_t dst_t, void **dst)

Read tiles of up to N-dimensions out of N-D arrays

Tiles can be slices of any number of dimensions <= N and of any shape, so long as they don’t extend past the bounds of the array (otherwise ASDF_NDARRAY_ERR_OUT_OF_BOUNDS is returned).

Parameters:
  • ndarray – The asdf_ndarray_t* handle to the ndarray

  • origin – The indices of the first pixel of the tile (an array of size ndim)

  • shape – The shape of the tile to read (an array of size ndim)

  • dst_t – The output datatype, if conversion from the source array’s datatype to the output datatype is possible. You can pass the special value ASDF_DATATYPE_SOURCE to indicate that the output datatype is the source datatype. Currently, if no conversion is possible the tile data is copied without conversion; this may change in the future to become an error.

  • dst – Pointer to a destination void* already allocated to receive the exact number of bytes in the output tile based on shape and datatype, or NULL to indicate that a buffer should be allocated. In the latter case the caller is responsible for freeing the allocated buffer.
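Conceptually, reading a tile means validating origin + shape against the array shape, then copying the selected elements into a densely packed buffer. A minimal sketch for a C-contiguous source in native byte order, without datatype conversion; the C-order packing of the output, `copy_tile`, and its return codes are assumptions for illustration (-1 stands in for ASDF_NDARRAY_ERR_OUT_OF_BOUNDS), not libasdf internals:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TILE_MAX_NDIM 32 /* hypothetical limit for this sketch */

/* Copy the tile [origin, origin + tshape) of a C-contiguous array of
 * elsize-byte elements into dst, packed in C order. Returns 0 on
 * success, -1 if the tile exceeds the array bounds, and -2 if ndim is
 * outside what this sketch supports. */
static int copy_tile(const void *src, uint32_t ndim, const uint64_t *shape,
                     size_t elsize, const uint64_t *origin,
                     const uint64_t *tshape, void *dst) {
    if (ndim == 0 || ndim > TILE_MAX_NDIM)
        return -2;
    int empty = 0;
    for (uint32_t i = 0; i < ndim; i++) {
        /* Overflow-safe form of: origin[i] + tshape[i] > shape[i] */
        if (origin[i] > shape[i] || tshape[i] > shape[i] - origin[i])
            return -1;
        if (tshape[i] == 0)
            empty = 1;
    }
    if (empty)
        return 0; /* zero-sized tile: nothing to copy */

    /* Source strides in elements for C order (last axis fastest). */
    uint64_t stride[TILE_MAX_NDIM];
    stride[ndim - 1] = 1;
    for (uint32_t i = ndim - 1; i-- > 0;)
        stride[i] = stride[i + 1] * shape[i + 1];

    /* Copy one contiguous run along the last axis at a time, stepping an
     * odometer-style counter over the leading ndim - 1 tile axes. */
    uint64_t idx[TILE_MAX_NDIM] = {0};
    size_t run_bytes = (size_t)tshape[ndim - 1] * elsize;
    const unsigned char *in = src;
    unsigned char *out = dst;
    for (;;) {
        uint64_t off = origin[ndim - 1];
        for (uint32_t i = 0; i + 1 < ndim; i++)
            off += (origin[i] + idx[i]) * stride[i];
        memcpy(out, in + off * elsize, run_bytes);
        out += run_bytes;

        int done = 1;
        for (uint32_t d = ndim - 1; d-- > 0;) {
            if (++idx[d] < tshape[d]) {
                done = 0;
                break;
            }
            idx[d] = 0; /* carry into the next-outer axis */
        }
        if (done)
            return 0;
    }
}
```

Copying whole runs along the last axis keeps each memcpy contiguous; the byte-order and datatype conversion that the real functions also perform is omitted here.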

asdf_ndarray_err_t asdf_ndarray_read_tile_2d(asdf_ndarray_t *ndarray, uint64_t x, uint64_t y, uint64_t width, uint64_t height, const uint64_t *plane_origin, asdf_scalar_datatype_t dst_t, void **dst)

Like asdf_ndarray_read_tile_ndim but with conveniences for the common 2-D case

Parameters:
  • ndarray – The asdf_ndarray_t* handle to the ndarray

  • x – The x coordinate of the tile origin

  • y – The y coordinate of the tile origin

  • width – The width of the tile in the x direction

  • height – The height of the tile in the y direction

  • plane_origin – If the source array has more than two dimensions, an array of ndim - 2 plane coordinates. May be NULL if the source array is 2-D; for higher-dimensional arrays, passing NULL selects the outer-most plane

  • dst_t – The output datatype, if conversion from the source array’s datatype to the output datatype is possible. You can pass the special value ASDF_DATATYPE_SOURCE to indicate that the output datatype is the source datatype. Currently, if no conversion is possible the tile data is copied without conversion; this may change in the future to become an error.

  • dst – Pointer to a destination void* already allocated to receive the exact number of bytes in the output tile based on shape and datatype, or NULL to indicate that a buffer should be allocated. In the latter case the caller is responsible for freeing the allocated buffer.
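One way to picture the 2-D convenience function is as a thin wrapper that builds the N-D origin and shape arrays expected by an N-D tile reader. The sketch below assumes that x indexes the last (fastest-varying) axis and y the second-to-last, that plane_origin fills the leading ndim - 2 axes, and that a NULL plane_origin means index 0 on each of them; this mapping is an assumption, not confirmed by the documentation above, and `tile_2d_to_ndim` is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: expand 2-D tile parameters into N-D origin/shape arrays
 * (each of length ndim, ndim >= 2). Assumed mapping:
 *   x -> axis ndim-1, y -> axis ndim-2,
 *   plane_origin (NULL meaning index 0) -> axes 0 .. ndim-3, extent 1.
 * Returns 0 on success, -1 if ndim < 2. */
static int tile_2d_to_ndim(uint32_t ndim, uint64_t x, uint64_t y,
                           uint64_t width, uint64_t height,
                           const uint64_t *plane_origin,
                           uint64_t *origin, uint64_t *shape) {
    if (ndim < 2)
        return -1;
    for (uint32_t i = 0; i + 2 < ndim; i++) {
        origin[i] = plane_origin ? plane_origin[i] : 0;
        shape[i] = 1; /* a single plane along each leading axis */
    }
    origin[ndim - 2] = y;
    shape[ndim - 2] = height;
    origin[ndim - 1] = x;
    shape[ndim - 1] = width;
    return 0;
}
```

Under these assumptions, reading a 5 x 6 tile at (x=10, y=20) from plane 4 of a 3-D cube becomes an N-D read with origin (4, 20, 10) and shape (1, 6, 5).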