Application Interface and Tiering

DAOS Container Management

DAOS containers are the unit of data management for users.

Container Creation/Destroy

Containers can be created and destroyed through the daos_cont_create/destroy() functions exported by the DAOS API. A user tool called daos is also provided to manage containers.

To create a container:

$ daos container create --pool=a171434a-05a5-4671-8fe2-615aa0d05094 --svc=0
Successfully created container 008123fc-6b6c-4768-a88a-a2a5ef34a1a2
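When scripting, the UUID printed by the create command usually has to be captured for later commands. The following is a minimal sketch, assuming the output line has exactly the format shown above. The name extract_cont_uuid is a hypothetical helper, not part of the daos tool.

```shell
#!/bin/sh
# Hypothetical helper: pull the container UUID out of the line printed by
# "daos container create". Assumes the "Successfully created container <uuid>"
# format shown above; any trailing text after the UUID is ignored.
extract_cont_uuid() {
    # Read stdin and print the first 36-character UUID that follows the prefix.
    sed -n 's/^Successfully created container \([0-9a-fA-F-]\{36\}\).*$/\1/p' | head -n 1
}

# Sample line copied from the transcript above, so no DAOS installation is needed.
echo "Successfully created container 008123fc-6b6c-4768-a88a-a2a5ef34a1a2" | extract_cont_uuid
```

In a real script, the output of the create command would be piped into the helper instead of the sample line. If the format differs in a given release, the helper prints nothing.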

The container type (i.e., POSIX or HDF5) can be passed via the --type option. As shown below, the pool UUID, container UUID, and container attributes can be stored in the extended attributes of a POSIX file or directory for convenience. Subsequent invocations of the daos tool then only need to reference the path to that POSIX file or directory.

$ daos container create --pool=a171434a-05a5-4671-8fe2-615aa0d05094 --svc=0 --path=/tmp/mycontainer --type=POSIX --oclass=large --chunk_size=4K
Successfully created container 419b7562-5bb8-453f-bd52-917c8f5d80d1 type POSIX
$ daos container query --svc=0 --path=/tmp/mycontainer
Pool UUID:      a171434a-05a5-4671-8fe2-615aa0d05094
Container UUID: 419b7562-5bb8-453f-bd52-917c8f5d80d1
Number of snapshots: 0
Latest Persistent Snapshot: 0
DAOS Unified Namespace Attributes on path /tmp/mycontainer:
Container Type: POSIX
Object Class:   large
Chunk Size:     4096
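The query output above is a list of "Name: value" lines, which makes individual fields easy to read from a script. The sketch below assumes that layout stays as shown. The name query_field is a hypothetical helper, and the sample text is copied from the transcript so that the example runs without a DAOS installation.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one "Name: value" field from the
# output of "daos container query" (format assumed to match the transcript above).
query_field() {
    # $1 = field name, e.g. "Chunk Size"; the query output is read from stdin.
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }'
}

# Sample output copied from the transcript above.
sample='Pool UUID:      a171434a-05a5-4671-8fe2-615aa0d05094
Container UUID: 419b7562-5bb8-453f-bd52-917c8f5d80d1
Number of snapshots: 0
Latest Persistent Snapshot: 0
DAOS Unified Namespace Attributes on path /tmp/mycontainer:
Container Type: POSIX
Object Class:   large
Chunk Size:     4096'

printf '%s\n' "$sample" | query_field "Chunk Size"      # prints 4096
printf '%s\n' "$sample" | query_field "Container Type"  # prints POSIX
```

Note that the chunk size passed as 4K at creation time is reported in bytes (4096) by the query.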

Container Properties

At creation time, a list of container properties can be specified:

While those properties are currently stored persistently with container metadata, many of them are still under development. The ability to modify some of these properties on an existing container will also be provided in a future release.

Container Snapshot

Similar to container create/destroy, a container can be snapshotted through the DAOS API by calling daos_cont_create_snap(). Additional functions are provided to destroy and list container snapshots.

The API also provides the ability to subscribe to container snapshot events and to roll back the content of a container to a previous snapshot, but those operations are not yet fully implemented.

This section will be updated once container snapshots are supported by the daos tool.

Container User Attributes

Similar to POSIX extended attributes, users can attach some metadata to each container through the daos_cont_{list/get/set}_attr() API.

Container ACLs

Support for per-container ACLs is scheduled for DAOS v1.2. Similar to pool ACLs, container ACLs will implement a subset of the NFSv4 ACL standard. This feature will be documented here once available.

Native Programming Interface

Building against the DAOS library

To build an application or I/O middleware against the native DAOS API, include the daos.h header file in your program and link with -ldaos. Examples are available under src/tests.
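As a rough illustration, the compiler is told where the DAOS headers and libraries live with -I and -L, and the library itself is named with -l. The sketch below only prints the command line. DAOS_PREFIX, the include and lib subdirectories, and the file names myapp.c and myapp are all assumptions for illustration (some installations may use lib64 instead of lib).

```shell
#!/bin/sh
# DAOS_PREFIX is a placeholder for the DAOS install directory (assumption).
DAOS_PREFIX=${DAOS_PREFIX:-/path/to/daos/install}

# Hypothetical helper: print the compile-and-link command for a single-file
# application. $1 = C source file, $2 = output binary.
daos_build_cmd() {
    echo "cc -I$DAOS_PREFIX/include $1 -L$DAOS_PREFIX/lib -ldaos -o $2"
}

# Print the command for a hypothetical myapp.c; run the printed line to build.
daos_build_cmd myapp.c myapp
```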

DAOS API Reference

libdaos is written in C and uses Doxygen comments that are added to C header files.

[TODO] Generate Doxygen document and add a link here.

Bindings to Different Languages

API bindings for both Python^1 and Go^2 are available.

POSIX Filesystem

A regular POSIX namespace can be encapsulated into a DAOS container. This capability is provided by the libdfs library, which implements the file and directory abstractions over the native libdaos library. The POSIX emulation can be exposed to applications or I/O frameworks either directly (e.g., for frameworks like Spark or TensorFlow, or benchmarks like IOR or mdtest that support different storage backend plugins), or transparently via a FUSE daemon, optionally combined with an interception library that addresses some of the FUSE performance bottlenecks by delivering full OS bypass for POSIX read/write operations.

libdfs

DFS stands for DAOS File System and is a library that allows a DAOS container to be accessed as a hierarchical POSIX namespace. It supports files, directories, and symbolic links, but not hard links. Access permissions are inherited from the parent pool and not implemented on a per-file or per-directory basis. setuid() and setgid() programs, as well as supplementary groups, are currently not supported.

While libdfs can be tested from a single instance (i.e., a single process, or a single client node if used through dfuse), special care is required when the same POSIX container is mounted concurrently by multiple processes. Concurrent DFS mounts are not recommended. Support for concurrency control is under development and will be documented here once ready.

dfuse

A FUSE daemon called dfuse is provided to mount a POSIX container in the local filesystem tree. dfuse exposes one mount point as a single DFS namespace with a single pool and container, and can be mounted by regular users (provided that they are granted access to the pool and container). To mount an existing POSIX container with dfuse, run the following command:

$ dfuse --pool a171434a-05a5-4671-8fe2-615aa0d05094 -s 0 --container 464e68ca-0a30-4a5f-8829-238e890899d2 -m /tmp/daos

The UUIDs after --pool and --container should be replaced with the pool and container UUIDs, respectively. -s should be followed by the pool service rank list, and -m is the local directory where the mount point will be set up. When done, the file system can be unmounted via fusermount3:

$ fusermount3 -u /tmp/daos
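The mount and unmount commands above can be wrapped in a small script so that the UUIDs and mount point are passed once. This is a sketch under stated assumptions: mount_cont, umount_cont, run, and the DRY_RUN variable are hypothetical names, and only the dfuse and fusermount3 options already shown above are used.

```shell
#!/bin/sh
# With DRY_RUN=1 the commands are printed instead of executed, which allows
# the script to be checked on a machine without DAOS.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = pool UUID, $2 = pool service rank list, $3 = container UUID, $4 = mount point
mount_cont() {
    run dfuse --pool "$1" -s "$2" --container "$3" -m "$4"
}

# $1 = mount point
umount_cont() {
    run fusermount3 -u "$1"
}

# Dry run with the values from the example above.
DRY_RUN=1
mount_cont a171434a-05a5-4671-8fe2-615aa0d05094 0 464e68ca-0a30-4a5f-8829-238e890899d2 /tmp/daos
umount_cont /tmp/daos
```

Unsetting DRY_RUN (or setting it to 0) makes the same functions execute the commands.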

libioil

An interception library called libioil is available to work with dfuse. This library works in conjunction with dfuse and allows POSIX I/O calls to be intercepted and the I/O operations to be issued directly from the application context through libdaos, without any application changes. This provides kernel bypass for I/O data, leading to improved performance. To use it, set LD_PRELOAD to point to the shared library in the DAOS install directory:

LD_PRELOAD=/path/to/daos/install/lib/libioil.so

Support for libioil is currently planned for DAOS v1.2.
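Exporting LD_PRELOAD for a whole session also affects every other program started from that shell. One option is to set it for a single command only, as in the sketch below. The wrapper name with_ioil and the IOIL_LIB variable are hypothetical, and the library path is the placeholder used above.

```shell
#!/bin/sh
# Placeholder path taken from the text above; adjust to the real install directory.
IOIL_LIB=${IOIL_LIB:-/path/to/daos/install/lib/libioil.so}

# Hypothetical wrapper: run one command with libioil preloaded, so that
# LD_PRELOAD does not leak into the rest of the shell session.
with_ioil() {
    if [ -f "$IOIL_LIB" ]; then
        env LD_PRELOAD="$IOIL_LIB" "$@"
    else
        # Fall back to running the command unmodified if the library is missing.
        echo "warning: $IOIL_LIB not found, running without interception" >&2
        "$@"
    fi
}

# Example: run any command through the wrapper. An application reading or
# writing files under a dfuse mount point would be started the same way.
with_ioil echo "done"
```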

Unified Namespace

The DAOS tier can be tightly integrated with the Lustre parallel filesystem in which DAOS containers will be represented through the Lustre namespace. This capability is under development and is scheduled for DAOS v1.2.

The current state of work can be summarized as follows:

HPC I/O Middleware Support

Several HPC I/O middleware libraries have been ported to the native API.

MPI-IO

DAOS has its own MPI-IO ROMIO ADIO driver located in an MPICH fork on GitHub:

https://github.com/daos-stack/mpich

This driver has been submitted upstream for integration.

To build the MPI-IO driver:

Set PATH and LD_LIBRARY_PATH to point to the installed MPICH in the environment where you build client applications or libraries that use MPI.

Build any client (HDF5, ior, MPI test suites) normally with mpicc and the MPICH library installed above (see child pages).

To run an example:

  1. Launch the DAOS server(s) and create a pool as specified in the previous section. This will return a pool UUID "puuid" and a service rank list "svcl".
  2. On the client side, set the following environment variables:
    export PATH=/path/to/mpich/install/bin:$PATH
    export LD_LIBRARY_PATH=/path/to/mpich/install/lib:$LD_LIBRARY_PATH
    export MPI_LIB=""
    export CRT_ATTACH_INFO_PATH=/path/  # whatever was passed to daos_server start -a
    export DAOS_SINGLETON_CLI=1
  3. Set the pool connection information:
    export DAOS_POOL=puuid; export DAOS_SVCL=svcl
    This is a temporary measure until there is a better way of passing pool connection information to MPI-IO and other middleware over DAOS.
  4. Run the client application or test.
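Because the pool connection information is passed only through environment variables, a missing variable is easy to overlook. The sketch below checks the DAOS-related variables named in the steps above before a client is launched. The function name check_daos_env is hypothetical, and the sample values are the placeholders used in the steps, not real settings.

```shell
#!/bin/sh
# Hypothetical helper: verify that the DAOS-related variables listed above
# are set and non-empty. Returns 0 if all are present, 1 otherwise.
check_daos_env() {
    missing=0
    for var in CRT_ATTACH_INFO_PATH DAOS_SINGLETON_CLI DAOS_POOL DAOS_SVCL; do
        # Indirectly read the variable whose name is stored in $var.
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "missing environment variable: $var" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Placeholder values from the steps above (replace with real ones).
export CRT_ATTACH_INFO_PATH=/path/
export DAOS_SINGLETON_CLI=1
export DAOS_POOL=puuid
export DAOS_SVCL=svcl

check_daos_env && echo "environment looks complete"
```

A launch script could call check_daos_env first and start the MPI client only when it succeeds.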

Limitations to the current implementation include:

HDF5

A prototype version of an HDF5 DAOS connector is available. Please refer to the DAOS VOL connector user guide^3 for instructions on how to build and use it.

Spark Support

Spark integration with libdfs is under development and is scheduled for DAOS v1.0 or v1.2.

Data Migration

Migration to/from a POSIX filesystem

A dataset mover tool is under consideration to move a snapshot of a POSIX, MPI-IO or HDF5 container to a POSIX filesystem and vice versa. The copy will be performed at the POSIX or HDF5 level. The resulting HDF5 file over the POSIX filesystem will be accessible through the native HDF5 connector with the POSIX VFD.

The first version of the mover tool is currently scheduled for DAOS v1.4.

Container Parking

The mover tool will also eventually support the ability to serialize and deserialize a DAOS container to a set of POSIX files that can be stored or "parked" in an external POSIX filesystem. This transformation is agnostic to the data model and container type and will retain all DAOS internal metadata.