High-level overview

What does this library actually do?

To answer that, it helps to understand what an ASDF file is.

At its core, an ASDF file is a human-readable YAML document with optional binary data blocks attached. The YAML portion, called the tree, describes the structure and metadata of the file. The binary blocks that may follow typically store large numerical arrays, images, or other data that would be inefficient to represent directly in YAML.

Together, the YAML tree and the binary blocks form a single coherent data model: the tree provides the structure and metadata, while the blocks hold the raw data. The tree itself usually follows a particular schema that defines the expected keys, value types, and overall organization.

All data in an ASDF file—whether metadata or large arrays—is accessed through this tree. Conceptually, you can think of the file as one big nested mapping structure (like a Python dict) containing values of many different kinds.

The basic building blocks

A YAML document contains three main types of values:

  • Mappings: collections of key/value pairs (also known as hash maps or dictionaries)

  • Sequences: ordered collections (also known as arrays or lists)

  • Scalars: simple values such as numbers, strings, or booleans
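One way to picture these three kinds of values in C is as a tagged union. The sketch below is purely illustrative: the names node_t, node_kind_t, pair_t, and count_scalars are hypothetical and are not part of libasdf's API.

```c
#include <stddef.h>

/* Hypothetical node type for illustration only; not libasdf's API. */
typedef enum { NODE_SCALAR, NODE_SEQUENCE, NODE_MAPPING } node_kind_t;

typedef struct node node_t;

typedef struct {
    const char *key;        /* mapping keys are kept as plain strings here */
    const node_t *value;
} pair_t;

struct node {
    node_kind_t kind;
    union {
        const char *scalar; /* raw scalar text */
        struct { const node_t *const *items; size_t len; } sequence;
        struct { const pair_t *pairs; size_t len; } mapping;
    } u;
};

/* Recursively count the scalar leaves at or below a node. */
size_t count_scalars(const node_t *n)
{
    size_t total = 0;
    switch (n->kind) {
    case NODE_SCALAR:
        return 1;
    case NODE_SEQUENCE:
        for (size_t i = 0; i < n->u.sequence.len; i++)
            total += count_scalars(n->u.sequence.items[i]);
        return total;
    case NODE_MAPPING:
        for (size_t i = 0; i < n->u.mapping.len; i++)
            total += count_scalars(n->u.mapping.pairs[i].value);
        return total;
    }
    return total;
}
```

Because mappings and sequences can contain any kind of node, including each other, a whole tree is just one node_t at the root.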

Note

YAML itself does not prescribe strict types for scalar values; it effectively treats them as plain strings. However, the YAML Core Schema defines a common set of interpretations (e.g., numbers, booleans, and null), and most YAML parsers for high-level languages such as Python and JavaScript follow this convention.

libasdf adheres to this schema as well when reading and writing scalars, though it is also possible to access values as raw scalars.
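As a rough illustration of what following the Core Schema involves, the sketch below classifies a plain scalar string according to the resolution rules of the YAML 1.2 Core Schema (null, booleans, integers, floats, and otherwise a string). It is a simplified stand-alone example, not libasdf's implementation, and the names scalar_kind_t and classify_scalar are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    SCALAR_NULL, SCALAR_BOOL, SCALAR_INT, SCALAR_FLOAT, SCALAR_STRING
} scalar_kind_t;

/* Return 1 if s equals any of the n candidate strings. */
static int one_of(const char *s, const char *const *cands, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(s, cands[i]) == 0)
            return 1;
    return 0;
}

static int is_dec(int c)   { return c >= '0' && c <= '9'; }
static int is_octal(int c) { return c >= '0' && c <= '7'; }
static int is_hex(int c)   { return isxdigit(c) != 0; }

/* Length of the leading run of characters accepted by pred. */
static size_t digits(const char *s, int (*pred)(int))
{
    size_t n = 0;
    while (s[n] != '\0' && pred((unsigned char)s[n]))
        n++;
    return n;
}

/* Classify a plain scalar following the YAML 1.2 Core Schema patterns. */
scalar_kind_t classify_scalar(const char *s)
{
    static const char *const nulls[] = { "", "~", "null", "Null", "NULL" };
    static const char *const bools[] = { "true", "True", "TRUE",
                                         "false", "False", "FALSE" };
    static const char *const nans[] = { ".nan", ".NaN", ".NAN" };
    static const char *const infs[] = { ".inf", ".Inf", ".INF" };

    if (one_of(s, nulls, sizeof nulls / sizeof nulls[0])) return SCALAR_NULL;
    if (one_of(s, bools, sizeof bools / sizeof bools[0])) return SCALAR_BOOL;
    if (one_of(s, nans, sizeof nans / sizeof nans[0])) return SCALAR_FLOAT;

    /* Unsigned octal (0o17) and hexadecimal (0x1F) integers. */
    if (s[0] == '0' && (s[1] == 'o' || s[1] == 'x')) {
        size_t n = digits(s + 2, s[1] == 'o' ? is_octal : is_hex);
        return (n > 0 && s[2 + n] == '\0') ? SCALAR_INT : SCALAR_STRING;
    }

    const char *p = s;
    if (*p == '+' || *p == '-')
        p++;                              /* optional sign */
    if (one_of(p, infs, sizeof infs / sizeof infs[0])) return SCALAR_FLOAT;

    size_t int_digits = digits(p, is_dec);
    p += int_digits;
    if (int_digits > 0 && *p == '\0')
        return SCALAR_INT;                /* plain decimal integer */

    size_t frac_digits = 0;
    if (*p == '.') {
        p++;
        frac_digits = digits(p, is_dec);
        p += frac_digits;
    }
    if (int_digits == 0 && frac_digits == 0)
        return SCALAR_STRING;             /* no digits in the mantissa */

    if (*p == 'e' || *p == 'E') {         /* optional exponent */
        p++;
        if (*p == '+' || *p == '-')
            p++;
        size_t exp_digits = digits(p, is_dec);
        if (exp_digits == 0)
            return SCALAR_STRING;
        p += exp_digits;
    }
    return *p == '\0' ? SCALAR_FLOAT : SCALAR_STRING;
}
```

Under these rules the scalar 1024 resolves to an integer and int64 to a string, while a value such as yes remains a string because the Core Schema only recognizes the true/false spellings as booleans.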

Beyond the core types: tags

In addition to these core YAML types, ASDF supports values that represent complex or domain-specific objects. This is achieved using YAML tags, which associate a value with a particular type definition known to the software.

Tags allow arbitrary objects—such as coordinate systems, physical units, or n-dimensional arrays—to be serialized and deserialized in a structured way. For example, a tag might tell libasdf that a particular mapping should be interpreted as an ndarray instead of a plain dictionary.

A mapping tagged as an ndarray
data: !core/ndarray-1.1.0
   source: 0
   datatype: int64
   byteorder: little
   shape: [1024, 1024]

This mechanism is similar to how YAML-based serializers in Python can store and restore instances of custom classes.

ASDF was originally designed around a Python reference implementation, and while the format itself is language-independent, it retains this spirit of structured object serialization. One of the most common tagged objects is the ndarray, which provides efficient storage for numerical array data—described next.

ndarrays (N-dimensional typed arrays)

One of the most important and widely used types in ASDF is the ndarray. The concept originates from NumPy, the Python library for efficient numerical computing with n-dimensional arrays. In the Python ASDF implementation, NumPy arrays are serialized under the tag core/ndarray. Although the tag name and conventions come from Python, the underlying idea is language-independent: an ndarray represents a typed, multi-dimensional array of (typically) numerical values.

When an ndarray is stored in an ASDF file, the actual numeric data is not written directly into the YAML document. Instead, it is stored in a separate binary block, and the ndarray node in the YAML tree contains only the metadata needed to interpret that block. This metadata includes information such as the array’s shape, data type, byte order, and the reference (or “source”) of the binary data.
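To make the role of this metadata concrete, the sketch below collects the fields from the tagged mapping shown earlier into a small C struct and computes how many bytes the referenced block must hold. The struct and function names are hypothetical and deliberately do not mirror libasdf's asdf_ndarray_t; only a few fixed-size datatypes are handled, and overflow is not checked.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DIMS 8

/* Hypothetical container for ndarray metadata; not libasdf's asdf_ndarray_t. */
typedef struct {
    size_t source;           /* index of the binary block */
    const char *datatype;    /* e.g. "int64" */
    const char *byteorder;   /* "little" or "big" */
    size_t ndim;             /* number of dimensions in use */
    uint64_t shape[MAX_DIMS];
} ndarray_meta_t;

/* Size in bytes of one element, or 0 for a datatype this sketch does not know. */
size_t datatype_size(const char *datatype)
{
    static const struct { const char *name; size_t size; } table[] = {
        { "int8", 1 },  { "uint8", 1 },  { "int16", 2 },  { "uint16", 2 },
        { "int32", 4 }, { "uint32", 4 }, { "int64", 8 },  { "uint64", 8 },
        { "float32", 4 }, { "float64", 8 },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(datatype, table[i].name) == 0)
            return table[i].size;
    return 0;
}

/* Bytes occupied by the array: element size times the product of the shape.
 * Returns 0 if the datatype is unknown. */
uint64_t ndarray_nbytes(const ndarray_meta_t *meta)
{
    uint64_t n = datatype_size(meta->datatype);
    for (size_t i = 0; i < meta->ndim; i++)
        n *= meta->shape[i];
    return n;
}
```

For the example mapping (datatype int64, shape [1024, 1024]) this gives 1024 × 1024 × 8 = 8,388,608 bytes.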

From the point of view of the ASDF file format, a binary block is just a contiguous sequence of bytes with no intrinsic meaning or structure. It is the corresponding ndarray metadata in the tree that gives those bytes their shape and semantic content—turning them into a structured numerical array.
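The following sketch shows how metadata turns raw bytes into array elements. It assumes a two-dimensional array of little-endian int64 values laid out in row-major (C) order with no custom strides; the function names are hypothetical and the code does not use libasdf.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one little-endian int64 from 8 raw bytes, independent of the
 * byte order of the machine running the code. */
int64_t read_le_int64(const unsigned char *p)
{
    uint64_t bits = 0;
    for (int i = 7; i >= 0; i--)
        bits = (bits << 8) | p[i];   /* most significant byte is stored last */
    int64_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Element (row, col) of a 2-D int64 array stored row-major in block,
 * where ncols is the length of the last dimension of the shape. */
int64_t element_at(const unsigned char *block, size_t ncols,
                   size_t row, size_t col)
{
    size_t index = row * ncols + col;
    return read_le_int64(block + index * sizeof(int64_t));
}
```

The same bytes read with a different shape, datatype, or byte order would yield entirely different values, which is why the block is meaningless without its ndarray node.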

This separation between structure (in YAML) and data (in binary blocks) is one of the key design principles of ASDF. It allows the format to combine human readability and flexibility in metadata with efficient storage and access for large numerical datasets.

To summarize

So to come back to the question at the top of this page: What does libasdf do? It simply reads values of different types from (and eventually writes to) the tree structure of an ASDF file.

In addition to the standard mapping, sequence, and scalar types, it supports the core ASDF data types as well as custom data types through an extension mechanism, which allows them to be read into C-native data structures such as asdf_ndarray_t. It also includes a few convenience functions for working with standard data types, such as reading ndarray data by tiles, with more to be added as common use cases are discovered.
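To give a sense of what reading by tiles means, the sketch below copies a rectangular tile out of a two-dimensional, row-major array that is already in memory. It is a generic illustration under those assumptions: copy_tile is a hypothetical helper, not one of libasdf's convenience functions, and it performs no bounds checking.

```c
#include <stddef.h>
#include <string.h>

/* Copy a tile of tile_rows x tile_cols elements, starting at (row0, col0),
 * out of a row-major 2-D array with ncols columns of elem_size-byte elements.
 * dst must have room for tile_rows * tile_cols * elem_size bytes, and the
 * caller must ensure the tile lies entirely inside the source array. */
void copy_tile(void *dst, const void *src, size_t ncols, size_t elem_size,
               size_t row0, size_t col0, size_t tile_rows, size_t tile_cols)
{
    const unsigned char *in = src;
    unsigned char *out = dst;
    size_t row_bytes = tile_cols * elem_size;

    for (size_t r = 0; r < tile_rows; r++) {
        /* Each tile row is contiguous in the source, so copy it in one go. */
        const unsigned char *row_start =
            in + ((row0 + r) * ncols + col0) * elem_size;
        memcpy(out + r * row_bytes, row_start, row_bytes);
    }
}
```

Working tile by tile keeps memory use bounded when only part of a large array is needed at a time.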