DAOS System Administration¶
System monitoring and telemetry data will be provided as part of the control plane and will be documented in a future revision.
NVMe SSD Health Monitoring¶
Useful admin dmg commands to query NVMe SSD health:
- Query Per-Server Metadata (SMD):
dmg storage query smd
Queries persistently stored device and pool metadata tables. The device table maps device UUID to attached VOS target IDs. The pool table maps VOS target IDs to attached SPDK blob IDs.
$ dmg -l boro-11 storage query smd --devices --pools
boro-11:10001: connected
SMD Device List:
boro-11:10001:
    Device:
        UUID: 5bd91603-d3c7-4fb7-9a71-76bc25690c19
        VOS Target IDs: 0 1 2 3
SMD Pool List:
boro-11:10001:
    Pool:
        UUID: 01b41f76-a783-462f-bbd2-eb27c2f7e326
        VOS Target IDs: 0 1 3 2
        SPDK Blobs: 4294967404 4294967405 4294967407 4294967406
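The two tables can be pictured as plain mappings. The following is an illustrative Python sketch, not DAOS code: the dictionary layout and the helper name `blobs_on_device` are hypothetical, and it assumes that each pool target ID pairs with the blob ID at the same position in the sample output.

```python
# Illustrative model of the two SMD tables, using the sample values above.
# The layout and the positional target/blob pairing are assumptions.

# Device table: device UUID -> VOS target IDs attached to that device.
device_table = {
    "5bd91603-d3c7-4fb7-9a71-76bc25690c19": [0, 1, 2, 3],
}

# Pool table: pool UUID -> {VOS target ID: SPDK blob ID}.
pool_targets = [0, 1, 3, 2]
pool_blobs = [4294967404, 4294967405, 4294967407, 4294967406]
pool_table = {
    "01b41f76-a783-462f-bbd2-eb27c2f7e326": dict(zip(pool_targets, pool_blobs)),
}


def blobs_on_device(dev_uuid):
    """Return {pool UUID: [blob IDs]} for every target attached to a device."""
    targets = device_table[dev_uuid]
    result = {}
    for pool_uuid, target_to_blob in pool_table.items():
        blobs = [target_to_blob[t] for t in targets if t in target_to_blob]
        if blobs:
            result[pool_uuid] = blobs
    return result
```

Chaining the two lookups answers the question an administrator typically has when a device misbehaves: which pools, and which blobs within them, live on that SSD.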
- Query Blobstore Health Data:
dmg storage query blobstore-health
Queries in-memory health data for the SPDK blobstore (i.e., NVMe SSD). This includes a subset of the SPDK device health stats, as well as I/O error and checksum counters.
$ dmg -l boro-11 storage query blobstore-health --devuuid=5bd91603-d3c7-4fb7-9a71-76bc25690c19 -l=boro-11:10001
boro-11:10001: connected
Blobstore Health Data:
boro-11:10001:
    Device UUID: 5bd91603-d3c7-4fb7-9a71-76bc25690c19
    Read errors: 0
    Write errors: 0
    Unmap errors: 0
    Checksum errors: 0
    Device Health:
        Error log entries: 0
        Media errors: 0
        Temperature: 289
        Temperature: OK
        Available Spare: OK
        Device Reliability: OK
        Read Only: OK
        Volatile Memory Backup: OK
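When many devices are polled, it can help to pull the numeric counters out of this report and flag any that are nonzero. The Python sketch below is one possible way to do that; it assumes the output is laid out one `Name: value` pair per line, and the function names and the nonzero sample values are hypothetical.

```python
def parse_counters(text):
    """Collect every 'Name: <integer>' line into a dict of counters."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counters[name.strip()] = int(value)
    return counters


def nonzero_errors(counters):
    """Return the counters whose name ends in 'errors' and whose value is > 0."""
    return {k: v for k, v in counters.items() if k.endswith("errors") and v > 0}


# Hypothetical sample: the nonzero values are invented for illustration.
sample = """\
Read errors: 0
Write errors: 2
Unmap errors: 0
Checksum errors: 1
Media errors: 0
Temperature: 289
Temperature: OK
"""
print(nonzero_errors(parse_counters(sample)))
```

Non-numeric fields such as `Temperature: OK` and the `host:port: connected` banner are skipped because their value part is not a plain integer.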
- Query Persistent Device State:
dmg storage query device-state
Queries the current persistently stored device state of the specified NVMe SSD (either NORMAL or FAULTY).
$ dmg storage query device-state --devuuid=5bd91603-d3c7-4fb7-9a71-76bc25690c19 -l=boro-11:10001
boro-11:10001: connected
Device State Info:
boro-11:10001:
    Device UUID: 5bd91603-d3c7-4fb7-9a71-76bc25690c19
    State: NORMAL
- Manually Set Device State to FAULTY:
dmg storage set nvme-faulty
Allows the admin to manually set the device state of the given device to FAULTY, which will trigger faulty device reaction (all targets on the SSD will be rebuilt and the SSD will remain in an OUT state until reintegration is supported).
$ dmg storage set nvme-faulty --devuuid=5bd91603-d3c7-4fb7-9a71-76bc25690c19 -l=boro-11:10001
boro-11:10001: connected
Device State Info:
boro-11:10001:
    Device UUID: 5bd91603-d3c7-4fb7-9a71-76bc25690c19
    State: FAULTY
Full Shutdown and Restart¶
A DAOS system can be restarted after a controlled shutdown, provided no configuration changes have been made after the initial format.
The DAOS Control Server instance acting as access point records DAOS I/O Server instances that join the system in a "membership".
When up and running, the entire system (all I/O Server instances) can be shut down with the command:
dmg -l <access_point_addr> system stop
After shutdown, DAOS Control Servers continue to operate and listen on the management network.
To start the system again (with no configuration changes) after a controlled shutdown, run the command:
dmg -l <access_point_addr> system start
The DAOS I/O Servers managed by DAOS Control Servers will be started.
To query the system membership, run the command:
dmg -l <access_point_addr> system query
This lists details (rank/uuid/control address/state) of DAOS I/O Servers in the system membership.
Controlled start/stop has some known limitations. "start" restarts all configured instances on all harnesses that can be located in the system membership, regardless of member state. Moreover, supplying a list of ranks to "start" and "stop" is not yet supported.
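The first limitation can be illustrated with a small model of the membership. This Python sketch is purely illustrative: the record fields mirror the rank/uuid/control address/state columns, while the host names, UUID strings, state names, and the function `harnesses_to_start` are hypothetical placeholders.

```python
# Hypothetical membership records; the values are placeholders, not real output.
membership = [
    {"rank": 0, "uuid": "uuid-0", "addr": "host-a:10001", "state": "Stopped"},
    {"rank": 1, "uuid": "uuid-1", "addr": "host-a:10001", "state": "Started"},
    {"rank": 2, "uuid": "uuid-2", "addr": "host-b:10001", "state": "Evicted"},
]


def harnesses_to_start(members):
    """Return every control address in the membership, ignoring member state."""
    return sorted({m["addr"] for m in members})


print(harnesses_to_start(membership))
```

Because member state is not consulted, a harness is contacted even when its only recorded member is in a state where a restart may not be wanted.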
To reset the DAOS metadata across all hosts, the system must be reformatted.
First, ensure all daos_server processes on all hosts have been stopped. Then, for each SCM mount specified in the config file (scm_mount in the servers section), unmount the filesystem and wipe its FS signatures.
Example illustration with two IO instances specified in the config file:
clush -w wolf-[118-121,130-133] umount /mnt/daos1
clush -w wolf-[118-121,130-133] umount /mnt/daos0
clush -w wolf-[118-121,130-133] wipefs -a /dev/pmem1
clush -w wolf-[118-121,130-133] wipefs -a /dev/pmem0
Then restart DAOS Servers and format.
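For configurations with more I/O instances, the same command list can be generated rather than typed by hand. The Python sketch below only builds and prints the command strings and does not execute anything. The function name is hypothetical, and the pairing of each mount point with a pmem device is an assumption taken from the example; check it against the actual config file before use.

```python
def reformat_prep_commands(hosts, scm_mounts):
    """Build the umount and wipefs commands for each (mount point, device) pair."""
    umounts = [f"clush -w {hosts} umount {mount}" for mount, _ in scm_mounts]
    wipes = [f"clush -w {hosts} wipefs -a {device}" for _, device in scm_mounts]
    # All unmounts come first so no device is wiped while still mounted.
    return umounts + wipes


commands = reformat_prep_commands(
    "wolf-[118-121,130-133]",
    # Assumed mount point to device pairing, taken from the example above.
    [("/mnt/daos1", "/dev/pmem1"), ("/mnt/daos0", "/dev/pmem0")],
)
for command in commands:
    print(command)
```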
Fault Domain Maintenance and Reintegration¶
Details on how to drain an individual storage node or fault domain (e.g. rack) in preparation for maintenance activity and how to reintegrate it will be provided in a future revision.
DAOS System Extension¶
Ability to add new DAOS server instances to a pre-existing DAOS system will be documented in a future revision.
DAOS relies on massively distributed single-ported storage. Each target is thus effectively a single point of failure. DAOS achieves availability and durability of both data and metadata by providing redundancy across targets in different fault domains.
Fault Detection & Isolation¶
DAOS servers are monitored within a DAOS system through a gossip-based protocol called SWIM [1] that provides accurate, efficient, and scalable server fault detection.
Storage attached to each DAOS target is monitored through periodic local health assessment. Whenever a local storage I/O error is returned to the DAOS server, an internal health check procedure is called automatically. This procedure makes an overall health assessment by analyzing the I/O error code and device SMART/Health data. If the result is negative, the target is marked as faulty, and further I/Os to this target are rejected and re-routed.
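The control flow described here (an I/O error triggers an assessment, and a negative result causes later I/O to be rejected) can be sketched as follows. This is an illustrative Python model, not the DAOS implementation: the `Target` class and its method names are hypothetical, and the assessment criteria are deliberately left to an injected callable because the actual rules are not specified here.

```python
class Target:
    """Minimal model of a target that stops serving I/O once marked faulty."""

    def __init__(self, target_id, assess_health):
        self.target_id = target_id
        # Hypothetical hook: callable(error) -> True if the device is healthy.
        self.assess_health = assess_health
        self.faulty = False

    def submit(self, io_call):
        """Run io_call(); on an I/O error, assess health and maybe mark faulty."""
        if self.faulty:
            # Further I/O is rejected so the caller can re-route it.
            raise RuntimeError(f"target {self.target_id} is faulty: I/O rejected")
        try:
            return io_call()
        except OSError as err:
            # An I/O error triggers the health assessment automatically.
            if not self.assess_health(err):
                self.faulty = True
            raise
```

Passing the assessment in as a callable keeps the sketch honest about what it does not know: any rule combining the error code with SMART data can be plugged in without changing the flow.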
Once detected, the faulty target or servers (effectively a set of targets) must be excluded from each pool membership. This process is triggered either manually by the administrator or automatically (see the next section for more information). Upon exclusion from the pool map, each target starts the collective rebuild process automatically to restore data redundancy. The rebuild process is designed to operate online while servers continue to process incoming I/O operations from applications.
Tools to monitor and manage rebuild are still under development.
The rebuild process may consume significant resources on each server and can be throttled to reduce the impact on application performance. The current logic relies on CPU cycles on the storage nodes. By default, the rebuild process is configured to consume up to 30% of the CPU cycles, leaving the other 70% for regular I/O operations.
During the rebuild process, the user can set the throttle to guarantee that the rebuild will not use more resources than the user setting. For now, the user can only set the CPU cycle limit. For example, if the user sets the throttle to 50, then the rebuild will use at most 50% of the CPU cycles. The default rebuild throttle for CPU cycles is 30. This parameter can be changed via the daos_mgmt_set_params() API call and will eventually be available through the management tools.
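The arithmetic behind the throttle is simple enough to state as a small helper. This Python sketch is illustrative only: the function name `cpu_split` is hypothetical, it assumes the throttle is a percentage between 0 and 100, and it does not call daos_mgmt_set_params().

```python
DEFAULT_REBUILD_THROTTLE = 30  # default percentage stated in the text


def cpu_split(throttle=DEFAULT_REBUILD_THROTTLE):
    """Return (max rebuild share, remaining share for regular I/O) in percent."""
    # Assumption: valid throttle values are percentages from 0 to 100.
    if not 0 <= throttle <= 100:
        raise ValueError("throttle must be a percentage between 0 and 100")
    return throttle, 100 - throttle


print(cpu_split())    # default setting
print(cpu_split(50))  # the example from the text
```

The first value is an upper bound on rebuild usage, not a reservation: a rebuild that needs less than its limit leaves the remainder available to regular I/O.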
Interoperability in DAOS is handled via protocol and schema versioning for persistent data structures. Further instructions on how to manage DAOS software upgrades will be provided in a future revision.
Limited protocol interoperability is provided by the DAOS storage stack. Version compatibility checks will be performed to verify that:
- All targets in the same pool run the same protocol version.
- Client libraries linked with the application may be up to one protocol version older than the targets.
If a protocol version mismatch is detected among storage targets in the same pool, the entire DAOS system will fail to start up and will report failure to the control API. Similarly, the connection from clients running a protocol version incompatible with the targets will return an error.
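The two compatibility rules can be expressed as simple predicates. The Python sketch below is illustrative and not the DAOS check itself: it assumes protocol versions are consecutive integers, the function names are hypothetical, and it treats a client newer than the targets as incompatible, which the text does not state explicitly.

```python
def targets_consistent(target_versions):
    """True if all targets in a pool report the same protocol version."""
    return len(set(target_versions)) <= 1


def client_compatible(client_version, target_version):
    """True if the client is at most one protocol version older than the targets.

    Assumption: a client newer than the targets is treated as incompatible.
    """
    return target_version - 1 <= client_version <= target_version


print(targets_consistent([3, 3, 3]))  # all targets agree
print(client_compatible(2, 3))        # one version older than the targets
print(client_compatible(1, 3))        # two versions older than the targets
```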
Persistent Schema Compatibility and Update¶
The schema of persistent data structures may evolve from time to time to fix bugs, add new optimizations, or support new features. To that end, the persistent data structures support schema versioning.
Upgrading the schema version will not be performed automatically and must be initiated by the administrator. A dedicated upgrade tool will be provided to upgrade the schema version to the latest one. All targets in the same pool must have the same schema version. Version checks are performed at system initialization time to enforce this constraint.
To limit the validation matrix, each new DAOS release will be published with a list of supported schema versions. To run with the new DAOS release, administrators will then need to upgrade the DAOS system to one of the supported schema versions. New pool shards will always be formatted with the latest version. This versioning scheme applies only to data structures stored in persistent memory, not to block storage, which stores only user data with no metadata.
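The schema constraints combine into a short decision procedure. The Python sketch below is a hypothetical illustration: it assumes schema versions are integers, and the function names and returned status strings are invented for the example rather than taken from any DAOS tool.

```python
def schema_check(pool_shard_versions, supported_versions):
    """Classify a pool against the schema versions a release supports."""
    versions = set(pool_shard_versions)
    if len(versions) != 1:
        # All targets in the same pool must share one schema version.
        return "mismatch"
    if versions.pop() not in supported_versions:
        # The administrator must run the upgrade before using this release.
        return "upgrade required"
    return "ok"


def format_version(supported_versions):
    """New pool shards are always formatted with the latest supported version."""
    return max(supported_versions)


print(schema_check([4, 4], {4, 5}))  # pool already on a supported version
print(schema_check([3, 3], {4, 5}))  # pool older than anything supported
print(format_version({4, 5}))        # version used for new shards
```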
Support for end-to-end data integrity is planned for DAOS v1.2 and background checksum scrubbing for v2.2. Once available, that functionality will be documented here.