High-level overview
###################

What does this library actually *do*? To answer that, it helps to understand
what an ASDF file is.

At its core, an ASDF file is a human-readable **YAML document** with optional
binary data blocks attached. The YAML portion, called the :ref:`tree `,
describes the structure and metadata of the file. The binary blocks that may
follow typically store large numerical arrays, images, or other data that
would be inefficient to represent directly in YAML.

Together, the YAML tree and the binary blocks form a single coherent data
model: the tree provides the structure and metadata, while the blocks hold the
raw data.

The tree itself usually follows a particular *schema* that defines the
expected keys, value types, and overall organization. All data in an ASDF
file, whether metadata or large arrays, is accessed through this tree.
Conceptually, you can think of the file as one big nested mapping structure
(like a Python :py:obj:`dict`) containing values of many different kinds.

The basic building blocks
=========================

A YAML document contains three main types of values:

* Mappings: collections of key/value pairs (aka hash maps, dictionaries, etc.)
* Sequences: ordered collections (aka arrays, lists, etc.)
* Scalars: simple values such as numbers, strings, or booleans

.. note::

   YAML itself does not prescribe strict types for scalar values; it
   effectively treats them as plain strings. However, the `YAML Core Schema`_
   defines a common set of interpretations (e.g., numbers, booleans, and
   null), and most YAML parsers for high-level languages such as Python and
   JavaScript follow this convention. libasdf adheres to this schema as well
   when reading and writing scalars, though it is also :c:func:`possible ` to
   access values as raw scalars.

Beyond the core types: tags
===========================

In addition to these core YAML types, ASDF supports values that represent
complex or domain-specific objects.
This is achieved using **YAML tags**, which associate a value with a
particular type definition known to the software. Tags allow arbitrary objects
(such as coordinate systems, physical units, or n-dimensional arrays) to be
serialized and deserialized in a structured way.

For example, a tag might tell libasdf that a particular mapping should be
interpreted as an ``ndarray`` instead of a plain dictionary.

.. code-block:: yaml
   :caption: A mapping tagged as an ndarray

   data: !core/ndarray-1.1.0
     source: 0
     datatype: int64
     byteorder: little
     shape: [1024, 1024]

This mechanism is similar to how YAML-based serializers in Python can store
and restore instances of custom classes. ASDF was originally designed around a
Python reference implementation, and while the format itself is
language-independent, it retains this spirit of structured object
serialization.

One of the most common tagged objects is the **ndarray**, which provides
efficient storage for numerical array data and is described next.

.. _ndarrays:

ndarrays (N-dimensional typed arrays)
=====================================

One of the most important and widely used types in ASDF is the **ndarray**.
The concept originates from `NumPy`_, the Python library for efficient
numerical computing with *n*-dimensional arrays.

In the :external:doc:`Python ASDF implementation `, NumPy arrays are
serialized under the tag :ref:`core/ndarray `. Although the tag name and
conventions come from Python, the underlying idea is language-independent: an
``ndarray`` represents a typed, multi-dimensional array of (typically)
numerical values.

When an ``ndarray`` is stored in an ASDF file, the actual numeric data is not
written directly into the YAML document. Instead, it is stored in a separate
**binary block**, and the ``ndarray`` node in the YAML tree contains only the
metadata needed to interpret that block. This metadata includes the array's
shape, data type, and byte order, as well as a reference (the "source") to the
binary block that holds the data.
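
The role of this metadata can be sketched in a few lines of plain Python,
using only the standard library. This is an illustration of the idea, not
libasdf's API or the Python ASDF implementation: ``decode_block`` and
``STRUCT_CODES`` are hypothetical names, the block is a hand-built stand-in
for a real binary block, and the sketch assumes a two-dimensional array laid
out in row-major (C) order.

.. code-block:: python

   import struct

   # Hypothetical metadata using the same keys as the core/ndarray example
   # above, but with a small shape so the result is easy to inspect.
   metadata = {"datatype": "int64", "byteorder": "little", "shape": [2, 3]}

   # Stand-in for a binary block: six little-endian int64 values (48 bytes).
   block = struct.pack("<6q", 0, 1, 2, 3, 4, 5)

   # Illustrative subset of datatypes mapped to struct format codes.
   STRUCT_CODES = {"int32": "i", "int64": "q", "float32": "f", "float64": "d"}


   def decode_block(block, metadata):
       """Interpret raw bytes as a nested list using ndarray-style metadata."""
       rows, cols = metadata["shape"]  # assumption: 2-D arrays only
       prefix = "<" if metadata["byteorder"] == "little" else ">"
       code = STRUCT_CODES[metadata["datatype"]]
       flat = struct.unpack(f"{prefix}{rows * cols}{code}", block)
       # Assumed row-major layout: each row is a contiguous run of values.
       return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


   array = decode_block(block, metadata)
   print(array)  # [[0, 1, 2], [3, 4, 5]]

   # The same bytes read with a different byte order give different numbers.
   swapped = decode_block(block, {**metadata, "byteorder": "big"})
   print(swapped[0][1])  # 72057594037927936, i.e. 1 << 56

The second call shows why the metadata matters: the bytes are unchanged, yet
the decoded values differ once the declared byte order changes.
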
From the point of view of the ASDF file format, a binary block is just a
contiguous sequence of bytes with no intrinsic meaning or structure. It is the
corresponding ``ndarray`` metadata in the tree that gives those bytes their
shape and semantic content, turning them into a structured numerical array.

This separation between structure (in YAML) and data (in binary blocks) is one
of the key design principles of ASDF. It allows the format to combine human
readability and flexibility in metadata with efficient storage and access for
large numerical datasets.

To summarize
============

So, to come back to the question at the top of this page: what does libasdf
*do*?

It reads values of different types from (and eventually writes to) the tree
structure of an ASDF file. In addition to the standard mapping, sequence, and
scalar types, it supports the core ASDF data types as well as custom data
types through an extension mechanism, which allows them to be read into
C-native data structures like `asdf_ndarray_t`.

It also includes a few convenience functions for working with standard data
types, such as reading ndarray data by :c:func:`tiles `, with more to be added
as common use cases are discovered.
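
To give a feel for what reading an array "by tiles" involves, here is a
generic sketch in plain Python. It does not reflect libasdf's functions or
their signatures: ``read_tile`` is a hypothetical helper, and the sketch
assumes a two-dimensional array whose values are already decoded into a flat,
row-major sequence.

.. code-block:: python

   def read_tile(flat, shape, origin, tile_shape):
       """Copy a 2-D tile out of a flat, row-major sequence of values."""
       rows, cols = shape
       r0, c0 = origin
       tile_rows, tile_cols = tile_shape
       if r0 < 0 or c0 < 0 or r0 + tile_rows > rows or c0 + tile_cols > cols:
           raise ValueError("tile extends past the array bounds")
       tile = []
       for r in range(r0, r0 + tile_rows):
           start = r * cols + c0  # offset of the tile's first value in row r
           tile.append(list(flat[start:start + tile_cols]))
       return tile


   # A 4x5 array stored flat: the value at (r, c) is r * 5 + c.
   flat = list(range(20))
   tile = read_tile(flat, (4, 5), origin=(1, 2), tile_shape=(2, 3))
   print(tile)  # [[7, 8, 9], [12, 13, 14]]

Only the requested rows and columns are copied, which is the point of tiled
access when the full array is too large to handle at once.
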