Figure 1 compares a conventional RAID layout with an equivalent declustered array.

Figure 1. An example of how GPFS Native RAID improves client performance during rebuild operations by using the throughput of all disks in the declustered array. The figure compares a conventional configuration of three RAID arrays with a single declustered array, both using seven disks. To decluster the conventional arrays, the disks are divided into seven tracks, two strips per array, as shown in the upper left. The strips from each group are then combinatorially spread across all seven disk positions, for a total of 21 virtual tracks, as shown in the upper right.
Published (Last): 19 May 2009
Assign a primary and backup server to each recovery group. Each JBOD array should be connected to two servers to protect against server failure. Each server should also have two independent paths to each physical disk to protect against path failure and provide higher throughput to the individual disks. Define multiple recovery groups on a JBOD array, if the architecture suggests it, and use mutually reinforcing primary and backup servers to spread the processing evenly across the servers and the JBOD array.
Configure recovery group servers with a large vdisk track cache and a large pagepool. In general, a large number of vdisk track descriptors should be cached. If the expected vdisk NSD access pattern is random across all defined vdisks and within individual vdisks, a larger value for nsdRAIDTracks might be warranted. If the expected access pattern is sequential, a smaller value can be sufficient. It is not necessary to configure the pagepool to cache all the data for every cached vdisk track descriptor, but a simple sizing calculation can provide guidance in determining appropriate values for nsdRAIDTracks and nsdRAIDBufferPoolSizePct.
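As a rough sketch of such a sizing calculation, the snippet below estimates the pagepool needed for a given nsdRAIDTracks setting. The parameter names come from the text above; the 4 MiB track size, the fraction of cached descriptors whose data is also cached, and the buffer-pool percentage are illustrative assumptions, not recommendations:

```python
# Back-of-the-envelope GNR buffer pool sizing; all numbers are examples.

def required_pagepool_bytes(nsd_raid_tracks, track_size_bytes,
                            cached_data_fraction, buffer_pool_pct):
    """Pagepool size such that buffer_pool_pct percent of it can hold
    full-track data for cached_data_fraction of the cached descriptors."""
    buffer_pool = nsd_raid_tracks * track_size_bytes * cached_data_fraction
    return buffer_pool * 100 / buffer_pool_pct

# Example: 128 Ki cached track descriptors, 4 MiB vdisk tracks, data
# cached for 1% of them, 80% of the pagepool given to the buffer pool.
pagepool = required_pagepool_bytes(128 * 1024, 4 * 1024**2, 0.01, 80)
print(f"pagepool >= {pagepool / 1024**3:.1f} GiB")  # prints: pagepool >= 6.4 GiB
```

A sequential workload would justify a smaller nsdRAIDTracks value and correspondingly smaller pagepool, per the guidance above.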
Define each recovery group with at least one large declustered array. This is defined as at least nine pdisks plus the effective spare capacity. A minimum spare capacity equivalent to two pdisks is strongly recommended in each large declustered array.
The code width of the vdisks must also be considered. The effective number of non-spare pdisks must be at least as great as the largest vdisk code width. Place the log vdisk in a separate declustered array of solid-state disks (SSDs). These SSDs should be isolated in a small log declustered array, and the log vdisk should be the only vdisk defined there.
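The code-width constraint can be expressed as a simple check. The widths listed below follow the usual convention (data strips plus parity strips for Reed-Solomon codes, N for N-way replication) and are assumptions for illustration:

```python
# Check from the text: non-spare pdisks in a declustered array must be
# at least the largest vdisk code width. Widths here are assumed values.

CODE_WIDTH = {
    "8+2p": 10,              # 8 data strips + 2 parity strips
    "8+3p": 11,              # 8 data strips + 3 parity strips
    "3WayReplication": 3,    # N-way replication has code width N
    "4WayReplication": 4,
}

def can_host(pdisks, spares, raid_code):
    """True if the array's non-spare pdisks can hold one full code track."""
    return pdisks - spares >= CODE_WIDTH[raid_code]

print(can_host(11, 2, "8+2p"))  # 9 non-spare pdisks < width 10: False
print(can_host(12, 2, "8+2p"))  # 10 non-spare pdisks >= width 10: True
```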
One pdisk of spare capacity should be defined, which is the default for a small declustered array. For example, if the log declustered array contains four physical SSDs, it should have one spare defined and the log vdisk should use 3-way replication. The recommended track size for the log vdisk is 1 MiB, and the recommended total size is 2-4 GiB.
Determine the declustered array maintenance strategy. Disks will fail and need replacement, so a general strategy of deferred maintenance can be used. For example, failed pdisks in a declustered array are only replaced when the spare capacity of the declustered array is exhausted. This is implemented with the replacement threshold for the declustered array set equal to the effective spare capacity.
This strategy is useful in installations with a large number of recovery groups where disk replacement might be scheduled on a weekly basis.
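A minimal sketch of this deferred-maintenance policy, assuming (as described above) that the replacement threshold is set equal to the array's effective spare capacity:

```python
# Deferred maintenance: replacement is only scheduled once failed pdisks
# have consumed the declustered array's effective spare capacity.

def replacement_needed(failed_pdisks, effective_spares):
    threshold = effective_spares  # deferred-maintenance policy from the text
    return failed_pdisks >= threshold

print(replacement_needed(1, 2))  # one failure, spare capacity remains: False
print(replacement_needed(2, 2))  # spares exhausted, replace disks: True
```

Under this policy an administrator can batch replacements, e.g. on a weekly schedule, rather than reacting to each individual disk failure.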
The choice of vdisk RAID codes depends on the level of redundancy protection required versus the amount of space required for user data, and on the ultimate intended use of the vdisk NSDs in a GPFS file system. Reed-Solomon vdisks are more space-efficient, but when a partial track of a Reed-Solomon vdisk is written, the parity strips must be recalculated, which makes small writes more expensive. Replicated vdisks are less space-efficient, but small or partial write operations can complete faster under N-way replication.
The file system metadata is typically written in small chunks, which takes advantage of the faster small and partial write operations of the replicated RAID code. Applications are often tuned to write file system user data in whole multiples of the file system block size, which works to the strengths of the Reed-Solomon RAID codes both in terms of space efficiency and speed.
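To make the space-efficiency side of this trade-off concrete, the sketch below computes the fraction of raw capacity that holds user data for a few typical codes; the specific codes chosen are illustrative assumptions:

```python
# Space efficiency = user-data strips as a fraction of total strips.

def efficiency(data_strips, total_strips):
    return data_strips / total_strips

for name, data, total in [("8+2p Reed-Solomon", 8, 10),
                          ("8+3p Reed-Solomon", 8, 11),
                          ("3-way replication", 1, 3),
                          ("4-way replication", 1, 4)]:
    print(f"{name}: {efficiency(data, total):.0%} of raw capacity is user data")
```

The Reed-Solomon codes keep 70-80% of raw capacity for user data, versus 25-33% for replication, which is why replication is usually reserved for the (small) metadata and Reed-Solomon for bulk file data.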
When segregating vdisk NSDs for file system metadataOnly and dataOnly disk usage, the metadataOnly replicated vdisks can be created with a smaller block size and assigned to the GPFS file system's system storage pool; the dataOnly Reed-Solomon vdisks can be created with a larger block size and assigned to data storage pools.
When using multiple storage pools, a GPFS placement policy must be installed to direct file system data to non-system storage pools. When write performance optimization is not important, it is acceptable to use Reed-Solomon vdisks as dataAndMetadata NSDs for better space efficiency. All vdisks within all recovery groups in a given JBOD array should be assigned the same failure group number.
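A minimal placement policy of the kind described above might look like the following, assuming a hypothetical data pool named 'data'; the rule directs all newly created file data away from the system pool:

```
/* Place all new file data in the (hypothetical) 'data' storage pool. */
RULE 'defaultPlacement' SET POOL 'data'
```

Such a policy file is installed for a file system with the mmchpolicy command; without an installed placement policy, GPFS will not write file data to non-system storage pools.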