Appendix D: Advanced Data Storage Solutions

In the past, when the processing of all institution data was primarily in the mainframe environment, all data was stored centrally within the IT operations center on magnetic media (tapes and disks) directly connected to the processor. The introduction of PCs and LANs into the processing environment effectively decentralized information systems processing, bringing the computing power and data storage closer to the end user. With the subsequent proliferation of LANs and WANs, management of the increasing volume of data and the associated storage resources has become more challenging. Nevertheless, small, noncomplex institutions can still satisfactorily store data locally at the PC, network server, mid-range or mainframe level, with oversight responsibility assigned to local users, administrators, or operations personnel. Common storage solutions include the following:

  • PC, server, and mid-range: hard drives, floppy disks, compact discs (CDs), and digital video discs (DVDs); and
  • Archival systems: computer output to laser disk (COLD), digital audio tape (DAT), and digital linear tape (DLT).

Within the traditional data center environment using mainframe or mid-range computers, data storage options include arrays of direct access storage devices (DASD), which are large drives of stacked magnetic disks. Other storage options include magnetic tape cartridge devices, automated tape library (ATL), and "jukeboxes". An ATL is a storage unit that contains one or more tape drives, a robotic arm, and a shelf of tapes. The ATL, also called a tape silo, is able to load and unload tapes into the tape drive from the shelf without operator intervention. More sophisticated tape libraries are able to identify each tape; for example, the robotic arm can use a bar-code reader to scan each tape's barcode and identify it. Jukeboxes, containing a series of optical disks, are conceptually similar to ATL units.

Larger, more complex institutions are turning to newer automated data storage solutions to meet their needs. Decision factors motivating the selection of automated storage solutions include:

  • Significant growth in the volume of data (particularly mission critical data);
  • The need to have data continuously available, and the resultant shrinking timeframe available for data back-up;
  • The need for scalability to very large sizes; and
  • The need to facilitate data back-up for business continuity purposes.

Storage Area Network
A storage area network (SAN) is a collection of interconnected storage devices, ultimately connected to host systems over a high-speed optical network. SANs allow institutions to centralize data and connect servers across the network to that data. SANs provide the ability to incorporate multiple storage solutions with different performance characteristics into a single storage pool. Management can map application requirements to the most appropriate storage option. Applications that are throughput-intensive may benefit from one configuration, while applications that are update-intensive may benefit from another configuration. SAN administrators should manage storage from the perspective of the individual applications, so storage monitoring and problem resolution can appropriately address the unique issues of the specific business lines. SANs support disk mirroring, back-up and restoration capabilities, archiving and retrieval of data, data migration from one storage device to another, and data sharing among servers within a network.

Many large institutions can benefit financially from the deployment of a SAN. Because a SAN consolidates storage and its management, it can lower the total cost of ownership and yield a high return on investment. The operational benefits of SANs include:

  • Greater speed and performance through Fibre Channel protocol;
  • Increased disk utilization (multiple servers access the same physical disk resulting in more effective allocation of free space);
  • Higher availability of storage through multiple access paths (multiple physical connections from multiple servers);
  • More efficient staff utilization (enabling fewer people to manage more data);
  • Enhanced data recovery capabilities (mirroring capabilities);
  • Improved reliability through clustering (using shared drives, if one storage device fails, another takes over); and
  • Non-disruptive scalability (storage devices can be added to a SAN without affecting other network devices).

A SAN has three physical layers. The top layer, or host layer, consists of the servers. The major components of the host layer are the host bus adapter (HBA), or I/O adapter card, through which the server connects to the SAN, and the fiber optic cables. The middle layer is the fabric layer, which includes hubs, switches, and additional cabling. A hub is a device that physically connects cables. A switch also physically connects cables, but has the additional functionality of being able to intelligently route data from the host layer to the storage layer. The third layer is the storage layer, where all the storage devices and data are located.

Several protocols are used in SANs. Protocols enable computer systems to communicate with other devices. Protocols are divided into layers and logically sequenced into a stack. Each layer provides different functionality. The bottom layer is the physical layer, which includes all hardware (cabling, hubs, and switches). The software layers of the protocol stack lie on top of the physical layer. The primary SAN protocol is Fibre Channel, since it supports both peripheral interfaces and network interfaces. Fibre Channel protocol actually includes two protocols: Fibre Channel Arbitrated Loop (FC-AL), which works with hubs; and Fibre Channel Switched (FC-SW), which works with switches. Fibre Channel is the foundational protocol in SANs, as other protocols such as small computer system interface (SCSI) run on top of it. SCSI allows computer applications to talk to storage devices.

Fibre Channel SANs employ fiber optic cables, which use pulses of light to transmit data. The high signal speed and bandwidth of fiber make it well suited to data communications. (In a vacuum, light travels approximately 300,000 kilometers per second. In a strand of fiber optic cable, whose core is made of glass, light slows to about 200,000 kilometers per second because of the refractive index of the glass.) The movement of data from the server to the data storage device requires significant bandwidth. SANs generally operate at 1-2 gigabits per second, although faster speeds are being introduced.
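The figures above can be turned into a rough estimate of how long data spends on the cable and on the link. The following Python sketch is illustrative only: the function names, the 10-kilometer distance, and the 100-gigabyte data size are assumptions, and the link rate is expressed in gigabits per second, the unit in which Fibre Channel rates are normally quoted. It ignores protocol and encoding overhead, so actual throughput would be lower.

```python
# Rough estimates only; names and sample values are illustrative assumptions.

SPEED_IN_FIBER_KM_PER_S = 200_000  # approximate speed of light in glass fiber


def propagation_delay_ms(distance_km: float) -> float:
    """One-way time for a light pulse to cover the cable, in milliseconds."""
    return distance_km / SPEED_IN_FIBER_KM_PER_S * 1000


def transfer_time_s(data_gigabytes: float, link_gigabits_per_s: float) -> float:
    """Time to move the data at the raw link rate, ignoring overhead."""
    data_gigabits = data_gigabytes * 8  # 8 bits per byte
    return data_gigabits / link_gigabits_per_s


if __name__ == "__main__":
    print(f"10 km of fiber: {propagation_delay_ms(10):.3f} ms one way")
    print(f"100 GB at 1 Gbit/s: {transfer_time_s(100, 1):.0f} s")
    print(f"100 GB at 2 Gbit/s: {transfer_time_s(100, 2):.0f} s")
```

Under these assumptions, the time on the cable is negligible next to the time needed to push a large volume of data through the link, which is why bandwidth dominates back-up planning.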

Another performance consideration in SANs is latency, which is the time needed for data to travel from one point to another. Latency can be caused by too much distance between the server and the storage device or by too many hops between the servers and the storage device. Each hop adds approximately a one-millisecond delay. Proper planning and design of a SAN is essential to minimize the number of hops. One or two hops are normal. Additional hops add latency and degrade performance.
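The two sources of latency described above, distance and hops, can be combined into a simple estimate. The sketch below is a minimal model, not a measurement tool: the function name and sample values are assumptions, the per-hop figure is the approximation quoted in the text, and the fiber speed is the approximate 200,000 kilometers per second cited earlier in this appendix.

```python
# Illustrative latency model; names and sample values are assumptions.

SPEED_IN_FIBER_KM_PER_S = 200_000  # approximate speed of light in fiber
PER_HOP_DELAY_MS = 1.0             # approximate per-hop delay cited in the text


def path_latency_ms(distance_km: float, hops: int) -> float:
    """Estimate one-way latency from cable length plus per-hop delay."""
    propagation_ms = distance_km / SPEED_IN_FIBER_KM_PER_S * 1000
    return propagation_ms + hops * PER_HOP_DELAY_MS


if __name__ == "__main__":
    for hops in (1, 2, 5):
        print(f"1 km, {hops} hop(s): {path_latency_ms(1, hops):.3f} ms")
```

In this model, hop count outweighs cable length at data center distances, which is consistent with the emphasis on minimizing hops during SAN design.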

In order to reduce the risk of major system problems, redundancy is an important consideration in designing SANs. A SAN should have at least two separate fabrics (cabling, hubs and switches), redundant HBAs in each server, and fail-over and/or load balancing software on the servers for the HBAs.
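The redundancy guidance above can be illustrated with a short Python sketch. It is a simplified model and does not represent any vendor's fail-over or multipath software; the `Path` class, the function names, and the HBA and fabric labels are all hypothetical. It checks whether a server's paths share a single HBA or fabric, and shows fail-over as choosing the first healthy path.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Path:
    """One route from a server to storage (hypothetical model)."""
    hba: str
    fabric: str
    healthy: bool = True


def has_single_point_of_failure(paths: List[Path]) -> bool:
    """True if every path depends on the same HBA or the same fabric."""
    hbas = {p.hba for p in paths}
    fabrics = {p.fabric for p in paths}
    return len(hbas) < 2 or len(fabrics) < 2


def select_path(paths: List[Path]) -> Path:
    """Simple fail-over: use the first path that is still healthy."""
    for path in paths:
        if path.healthy:
            return path
    raise RuntimeError("no healthy path to storage")


if __name__ == "__main__":
    paths = [Path("hba0", "fabric_a"), Path("hba1", "fabric_b")]
    print(has_single_point_of_failure(paths))
    paths[0].healthy = False  # simulate a failed HBA or fabric
    print(select_path(paths))
```

Load balancing would distribute requests across all healthy paths rather than always taking the first one; that variation is omitted here for brevity.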

Redundant Array of Independent Disks
Redundant Array of Independent Disks (RAID) configurations are often incorporated into SANs. RAID refers to multiple individual physical hard drives that are combined to form one bigger drive, known as a RAID set. The RAID set presents all the smaller physical drives to the server as one logical disk, which the server addresses by its Logical Unit Number (LUN). RAID configurations typically use many small-capacity disks to store large amounts of data in order to provide increased reliability and redundancy. RAID is another form of DASD. It offers improved performance because the server has more disks to read from when data is accessed. Availability is increased because the RAID controller can recreate the data lost on a failed drive by using the parity information held on the surviving disks. The parity information is created when the data is initially written to the disks. Management can use a variety of different storage techniques (RAID types) to achieve different levels of redundancy, error recovery, and performance.
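The parity-based recovery described above can be sketched in a few lines of Python. The text does not say which RAID type is meant; this example assumes single-parity protection using exclusive-or (XOR), as used in parity-based RAID levels such as RAID 5. The function names and sample blocks are illustrative, and a real controller works on disk stripes rather than short byte strings.

```python
from functools import reduce
from typing import List, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks: List[bytes], parity: bytes) -> bytes:
    """Recreate the one lost block from the survivors and the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    data = [b"ACCT", b"LOAN", b"DEPO"]  # one block per data disk
    parity = xor_blocks(data)           # computed when the data is written
    rebuilt = rebuild_missing([data[0], data[2]], parity)  # second disk failed
    print(rebuilt == data[1])
```

XOR works here because combining the parity with every surviving block cancels their contributions and leaves only the missing block. A single parity block can therefore recover from the loss of one drive, not two.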

Network Attached Storage
Another concept in data storage is Network Attached Storage (NAS). NAS allows servers to connect to storage and to move data between servers over a standard Internet Protocol (IP) network. A NAS usually resides on a LAN, while a SAN is its own network. Institutions can use NAS as primary or secondary storage within a network. The strength of NAS is its ease of installation as another node on the network. However, introducing an additional network node may reduce performance. An alternative approach to NAS uses the Internet Small Computer System Interface (iSCSI) protocol, which connects servers to storage devices using a standard TCP/IP network adapter. iSCSI encapsulates SCSI commands and data blocks in IP packets, allowing the transmission of block-based SCSI data to storage devices over a standard IP network. The advantages of iSCSI include ease of deployment, the ability to leverage existing knowledge of IP networking, and lower cost than a Fibre Channel SAN.

Storage Virtualization
As large institutions wrestle with growing volumes of data, the concept of storage virtualization is gaining prominence. Storage virtualization takes many different physical storage networks and devices and makes them appear as one entity. This offers institutions the ability to centralize and streamline storage services, thereby providing an efficient means of managing enterprise-wide storage across multiple platforms. Storage virtualization also improves staff efficiency by allowing fewer people to manage more data. Storage virtualization is one part of the broader network virtualization concept, in which storage and computing capacity are centralized into a single virtual location so that processing capacity and other network administration tasks can be managed more effectively.
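The idea of presenting many physical devices as one entity can be illustrated with a minimal Python sketch. It assumes the simplest possible mapping, in which devices are concatenated end to end into one logical block range; actual virtualization products use more elaborate mapping. The `VirtualVolume` class and the device names and sizes are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


class VirtualVolume:
    """Presents several physical devices as one range of logical blocks."""

    def __init__(self, devices: List[Tuple[str, int]]):
        # devices: (name, size in blocks), concatenated in the order given
        self.devices = list(devices)
        # ends[i] is the first logical block that lies beyond device i
        self.ends = list(accumulate(size for _, size in self.devices))

    @property
    def total_blocks(self) -> int:
        return self.ends[-1] if self.ends else 0

    def locate(self, logical_block: int) -> Tuple[str, int]:
        """Map a logical block to (device name, block offset on that device)."""
        if not 0 <= logical_block < self.total_blocks:
            raise IndexError("logical block outside the virtual volume")
        index = bisect_right(self.ends, logical_block)
        start = self.ends[index - 1] if index else 0
        name, _ = self.devices[index]
        return name, logical_block - start


if __name__ == "__main__":
    volume = VirtualVolume([("array_a", 100), ("array_b", 50), ("nas_c", 200)])
    print(volume.total_blocks)
    print(volume.locate(120))
```

The server sees a single volume of 350 blocks and never needs to know which device holds a given block, which is what allows administrators to add or rearrange physical storage behind the virtual layer.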



