IT Baseline Protection Manual
S 6.35 Stipulating data backup procedures
Initiation responsibility: IT Security Management
Implementation responsibility: IT Security Management, IT Procedures Officer
The process governing how data backups are to be made is determined by the influential factors set forth in S 6.34 Determining the factors influencing data backup. Data backup procedures must be stipulated for every IT system and every type of data. If necessary, individual applications of the IT system should also be distinguished if they require different data backup strategies; this applies particularly to mainframe computers.
The following methods of making data backups should be considered when determining a backup system:
Type of data backup
Frequency and time of the data backup
Number of generations
Procedure and storage medium
Responsibility for data backup
Storage site
Requirements concerning the data backup archive
Transport modes
Storage modes
The dependency between data backup modes and influential factors is shown in the following table, where X implies direct influence and (X) implies indirect influence.
Remarks:
Type of data backup
The following types of data backup can be distinguished:
Data mirroring: With this procedure, copies of data are stored redundantly on several different media. These data media are usually fast, so that duplication of the data media and the required control software result in high costs. The major advantage of data mirroring is that a failure of one of these data media can be counteracted quickly.
Full data backup: With this procedure, all data requiring backup are stored on an additional data medium without consideration as to whether the files have been changed since the last backup. For this reason, full data backup requires a high storage capacity. Its advantage is the simple and quick restoration of data due to the fact that only the relevant files need to be extracted from the last full data backup. If full data backups are carried out infrequently, extensive changes to a file can result in major updating requirements.
Incremental data backup: In contrast to full data backup, this procedure simply stores the files which have been changed since the last (incremental or full) backup. This saves storage capacity and shortens the time required for the data backup. The restoration time for data is generally high, as the relevant files must be extracted from backups made at different stages. Incremental data backups are always based on full data backups and should be interspersed periodically by full data backups. During restoration, the last full backup is taken as a basis which is then extended with the updates from subsequent, incremental backups.
Differential data backup: This procedure stores only the files that have been changed since the last full data backup. A differential backup requires more storage space than an incremental backup, but files can be restored more quickly and easily: the last full data backup together with the most recent differential backup is sufficient for restoration. This is not the case with incremental backups, where under some circumstances many data backups must be read one after the other.
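The difference in restoration effort between the three strategies can be sketched in a small model. The function below computes which backup media must be read to restore a given day's state; all names and the day-based scheme are illustrative assumptions, not terminology from this manual.

```python
from dataclasses import dataclass

@dataclass
class Backup:
    day: int
    kind: str  # "full", "incremental" or "differential"

def restore_chain(backups, target_day):
    """Return the backups that must be read, in order, to restore the
    state as of target_day (illustrative model; assumes at least one
    full backup exists, as incremental/differential backups are always
    based on a full backup)."""
    history = [b for b in backups if b.day <= target_day]
    # Restoration always starts from the last full backup.
    last_full = max((b for b in history if b.kind == "full"),
                    key=lambda b: b.day)
    after_full = [b for b in history if b.day > last_full.day]
    differentials = [b for b in after_full if b.kind == "differential"]
    if differentials:
        # Differential: the last full plus only the newest differential.
        return [last_full, max(differentials, key=lambda b: b.day)]
    # Incremental: the last full plus every incremental since, in order.
    incrementals = [b for b in after_full if b.kind == "incremental"]
    return [last_full] + sorted(incrementals, key=lambda b: b.day)
```

With a full backup on day 0 and daily backups through day 5, the incremental scheme must read six backups, the differential scheme only two, which mirrors the trade-off described above.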
A special form of the above-mentioned data backup strategies is image backup. This procedure backs up the physical sectors of the hard disk instead of the individual files on it. This is a full backup which allows very quick restoration on hard disks of the same type.
Another form of backup is Hierarchical Storage Management (HSM). This primarily involves the cost-effective utilisation of expensive data media. Depending on the frequency at which they are accessed, files are stored on fast on-line media (hard disks), near-line media (automatic data-media changing systems) or off-line media (magnetic tape). Generally, these HSM systems also allow a combination of incremental and full data backup.
RAID (Redundant Array of Inexpensive Disks) systems provide redundant data storage. The RAID concept links several hard disks under the control of an array controller. There are various RAID levels; RAID level 1 involves data mirroring.
RAID systems are no replacement for data backups! They offer no protection against theft or fire. The data stored on RAID systems must therefore also be backed up to additional media, which must be kept in different fire sections.
To select a suitable and economically efficient data backup strategy, the following factors should be taken into account:
Availability requirements:
If availability requirements are extremely high, data mirroring should be considered. If availability requirements are high, full data backup is preferable to incremental data backup.
Data and modification volumes:
If the modification volume is similar to the data volume (e.g. in the use of a database), the storage capacity saved by incremental data backup is so negligible that full backup should be considered. However, if the modification volume is much smaller than the data volume, the storage capacity saved by incremental data backup is considerable and reduces costs to a large extent.
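The economic argument above can be made concrete with a rough storage estimate. The function below compares one week of daily backups under a pure full-backup scheme against a full backup plus six daily incrementals; the parameter names and the weekly cycle are illustrative assumptions.

```python
def weekly_storage_mb(data_mb, daily_change_mb, strategy):
    """Rough storage consumed by one week of daily backups (sketch).

    "full": the complete data set is stored every day.
    "incremental": one full backup plus six daily change sets.
    """
    if strategy == "full":
        return 7 * data_mb
    if strategy == "incremental":
        return data_mb + 6 * daily_change_mb
    raise ValueError(f"unknown strategy: {strategy}")
```

For a 1,000 MB data stock changing 50 MB per day, the incremental scheme needs roughly 1,300 MB per week against 7,000 MB for daily full backups; if the daily change volume approaches the data volume itself, the two figures converge, as stated above.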
Data modification times:
Data modification times can have a minor influence on the data backup strategy. If an application requires backup of the entire database at certain intervals (e.g. daily, weekly, monthly or annual bookkeeping statements), only full backups are recommended for this purpose.
Knowledge of IT users:
Implementing data mirroring requires appropriate knowledge of the system administrator but no previous knowledge of the IT user. A full data backup can be carried out by an IT user with little system knowledge. Compared with full data backup, incremental data backups require much greater familiarity with the system being used.
Frequency and times of data backup
If data is lost (e.g. due to a head crash on the hard disk), all data changes since the last backup must be restored. The shorter the backup intervals, the less the restoration effort in general. At the same time, it must be noted that in addition to regular data backup intervals (daily, weekly, every workday...), event-dependent backup intervals (e.g. after certain transactions or following the execution of certain programs after system modifications) might also be required.
The following factors must be considered during the determination of the frequency and times of data backup:
Availability requirements, reconstruction effort without data backup, modification volumes:
The interval between data backups should be selected so that the time needed to restore the data changed within this period (the modification volume), i.e. the data not yet backed up, is shorter than the maximum permissible downtime.
Data modification times:
If data are changed to a large extent (e.g. program sequence for salary payments or different software version) or the entire database needs to be made available at certain points in time, it is advisable to carry out a full data backup immediately afterwards. Regular as well as event-dependent intervals need to be stipulated here.
Number of generations
Data backups are repeated at short intervals in order to have up-to-date data available; at the same time, the data backup must guarantee that saved data are stored for as long as possible. If a full data backup is considered as a generation, the number of generations should be determined, as should the time intervals which must be observed between the generations. These requirements are illustrated by the following examples:
If a file is deleted intentionally or unintentionally, it will no longer be available in later data backups. If it turns out that the deleted file is still required, it can only be restored by using a backup version made before the time of deletion. If such a generation no longer exists, the file must be created again.
A loss of integrity in a file (e.g. due to a technical failure, inadvertent modification or computer virus) will probably be noticed at a later stage instead of immediately. The integrity of such files can only be restored using a generation dated earlier than the occurrence of the loss.
It is always possible for data backups to be carried out incompletely or incorrectly. In such cases, an additional generation often proves to be useful.
For the generation principle to remain useful, a basic condition must be fulfilled, i.e. the time interval between generations must not fall short of a minimum value. Example: an automatic data backup process is disrupted repeatedly; as a result, all existing generations are overwritten successively. This is prevented by overwriting generations only after ensuring that their minimum age has been maintained.
The generation principle is characterised by two values: the minimum age of the oldest generation and the number of available generations. The following applies here:
The higher the minimum age of the oldest generation, the greater the probability of the existence of a previous version of a file in which a loss of integrity has occurred (including deleted files which would have proved useful later).
The greater the number of available generations, the more up to date the available previous versions.
However, the number of available generations is directly related to the costs of data backup, as a sufficient number of data media must be available, too. This is because every generation needs separate data media. For reasons of economy, the number of generations must be restricted to an appropriate value.
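The generation principle with a minimum age can be sketched as a small rotation check. The function below decides which generation, if any, may be overwritten for the next backup; the day-based timestamps and parameter names are illustrative assumptions.

```python
def overwrite_candidate(generations, now, min_age_days, max_generations):
    """Pick which generation (if any) may be overwritten next.

    generations: list of creation timestamps in days.
    A generation may only be reused once it has reached the minimum
    age, so a repeatedly failing backup job cannot overwrite all
    existing generations in succession.
    """
    if len(generations) < max_generations:
        return None  # a free slot is available, nothing is overwritten
    oldest = min(generations)
    if now - oldest >= min_age_days:
        return oldest
    return None  # refuse to overwrite: minimum age not yet reached
```

With three generations created on days 8, 9 and 10 and a minimum age of 7 days, nothing may be overwritten on day 10 even though all slots are full; this is exactly the safeguard against the disrupted automatic backup described above.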
The parameters of the generation principle are selected in accordance with the following standards:
Data availability and integrity requirements:
The higher the data availability or integrity requirements, the greater the number of generations required to minimise the time needed to recover from a loss of integrity. If file loss or integrity infringement cannot be detected until very late, additional quarterly or annual data backups are recommended.
Reconstruction effort without data backup:
If the database is extensive but can be reconstructed without backups, it can be considered as an additional "pseudo generation".
Data volumes:
The higher the volume of data, the higher the costs of maintaining a generation, due to the increased storage requirement. High volumes of data can therefore restrict the number of generations for reasons of economy.
Modification volume:
The higher the modification volume, the shorter the intervals between the generations should be in order to achieve close updating of files and minimum restoration effort.
Procedure and storage medium
Having determined the type of data backup, the frequency and the generation principle, it is now necessary to select the procedure, including appropriate and economically feasible data media. Examples of standard data backup procedures are described in the following:
Example 1: Manual, decentralised data backup on PCs
On non-networked PCs, backups of application data are usually performed manually by IT users as full backups. Floppy disks are used as data media.
Example 2: Manual, central data backup in Unix systems
For Unix systems with connected terminals or PCs with terminal emulation, central data backup is advisable due to the central data stock. In such cases, data backup often consists of a combination of weekly full backups and daily incremental backups, performed manually by the Unix administrator using streamer tapes.
Example 3: Manual, central data backup in LANs
In LANs (Local Area Networks) with connected PCs, data backup is often carried out by having the PC user back up his application data on a central network server, after which the network administrator backs up these data centrally; this typically involves a weekly full backup and daily incremental backups.
Example 4: Automatic, central data backup on mainframe computers
Similar to example 2, central data backups on mainframe computers consist of a combination of weekly full backups and daily incremental backups. Often, this is done automatically using HSM (Hierarchical Storage Management) tools. For individual IT applications, additional event-dependent full backups are often performed.
Example 5: Automatic, central data backup in distributed systems
Another alternative consists of a combination of examples 3 and 4. The local data of distributed systems are transmitted to a central mainframe computer or server, where a combination of full and incremental data backups is performed.
Example 6: Fully automatic centralised backup of decentralised data in distributed systems
As opposed to the above example, the transfer from the decentralised to the centralised system is automatic. Tools are now available which allow access from a central data backup server to decentralised data. Data backup can thus be performed centrally for decentralised users.
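The weekly-full/daily-incremental combination used in several of the above examples can be sketched as a simple scheduling rule. The choice of Sunday as the day for the full backup is an illustrative assumption.

```python
import datetime

def backup_type_for(date, full_weekday=6):
    """Return which backup to run on a given date, assuming a weekly
    full backup on one fixed weekday (default 6 = Sunday) and
    incremental backups on all other days."""
    return "full" if date.weekday() == full_weekday else "incremental"
```

An automatic backup program would evaluate such a rule each night; event-dependent full backups (e.g. after major system modifications) would be triggered in addition to, not instead of, this regular cycle.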
To minimise the volume of data on the storage medium, data compression algorithms can also be used. They allow the volume of data to be reduced by up to 80%. When compression is employed for backup, the selected parameters and algorithms must be documented and observed later during data restoration (decompression).
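The requirement to document the selected compression parameters can be met by recording them alongside the backup routine itself. The sketch below uses the standard gzip algorithm; the parameter record and function names are illustrative assumptions.

```python
import gzip

# The compression parameters used for backup must be recorded so that
# restoration later applies a matching decompression step.
COMPRESSION = {"algorithm": "gzip", "level": 6}

def compress_for_backup(data: bytes) -> bytes:
    """Compress backup data with the documented parameters."""
    return gzip.compress(data, compresslevel=COMPRESSION["level"])

def decompress_for_restore(blob: bytes) -> bytes:
    """Restore the original data; gzip streams are self-describing,
    so the documented algorithm is what matters here."""
    return gzip.decompress(blob)
```

For typical text or database exports such compression reduces the volume substantially, although the figure of up to 80% quoted above depends heavily on the data's redundancy.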
Two parameters must be specified for the backup procedure: the degree of automation and the centralisation (storage location).
There are two degrees of automation: manual and automatic.
Manual data backup implies manual triggering of the backup procedure. Its advantage is that the operator can individually select the interval of data backup in accordance with the work schedule. Its disadvantage is that the efficiency of data backup depends on the discipline and motivation of the operator. Data backups may not be made due to illness or other reasons for absence.
Automatic data backups are triggered by a program at certain intervals. Their advantage is that discipline and reliability are not required of the operator, provided the backup schedule is complete and accurate. Their disadvantage is that the backup program generates costs and that the backup schedule must be updated whenever the work schedule changes; otherwise important changes might not be backed up in time.
There are two degrees of centralisation: central and decentralised data backup.
Central data backups are characterised by the fact that the data backup is performed on a central IT system by one operator, where the backup data are also stored. This procedure is advantageous in that only the operator requires thorough training and the remaining IT users are relieved of this responsibility. Furthermore, increased centralisation of the database allows more economical usage of data media. The disadvantage is that confidential data might be transferred and disclosed to non-authorised persons.
Decentralised data backups are performed by IT users without being transferred to a central IT system. Their advantage is that IT users are able to control the information flow and data media, particularly if confidential data are backed up. Their disadvantage is that the consistency of data backup depends on the reliability of the IT user; furthermore, decentralised procedures are more time-consuming for IT users.
Following selection of manual or automatic, central or decentralised data backup, a suitable storage medium must be found for the backup copies. The following parameters can be considered for this:
Acquisition time for data media: The time required to initiate data restoration depends on the time required for identifying the data media needed for restoration and making them available to the system. Cassettes in a robot system can be made available for restoration within a matter of minutes, whereas externally stored tapes may first have to be transported in an elaborate procedure and then loaded.
Access time, transfer rate: The time required for actually restoring the data depends on the average time needed to access the data on the storage medium and the rate of data transfer. Hard disks allow access to certain files in a few milliseconds, whereas magnetic tapes must first be wound to the correct position. When selecting the data medium, it should be noted that the transfer channels must not be overloaded in the case of high transfer rates.
Practicability/storage capacity: The more elaborate a data backup procedure, the greater the risk of it being performed incorrectly or even ignored. Data media with a low storage capacity prevent effective data backup, as their repeated interchange is time-consuming and susceptible to errors.
Costs: The effort of data backup, i.e. the costs of procuring read/write devices and data media as well as the times required for computations and operations, must be commensurate with the importance of the backup. The life and reliability of the data media should also be taken into consideration. On no account must the running cost of data backup exceed the total cost of restoration without backup including the consequential damage.
The following table (1995 version) contains key figures on acquisition costs, access times, transfer rates etc., providing a basis for selecting the correct procedure and storage medium.
Due to the steady drop in the price of data media and continuing technological advances, the above figures can only be used for rough orientation. Currently applicable prices are to be established during the actual selection of the data media.
The following factors are of significance here:
Availability requirements:
The higher the availability requirements, the faster the required access to data media for backup purposes, and the shorter the required time for re-importing the relevant data from the data media. For reasons of availability, it must be ensured that the data media are still usable for restoration even if a reading device fails. A compatible and fully operational replacement for this reading device must be obtainable at short notice.
Data and modification volumes:
With an increasing data volume, use is generally made of economical tape media such as magnetic tapes or cassettes (data cartridges).
Deadlines:
If erasure deadlines are to be maintained (e.g. in the case of person-related data), the selected storage medium must allow this erasure. Data media for which erasure is impossible or difficult (e.g. WORM) should be avoided here.
Data confidentiality and integrity requirements:
If the confidentiality and integrity requirements of the original data are high, the same is applicable to the data media used for backing up this data. If encrypted data backup is not possible, consideration should be given to selecting data media whose design and transport characteristics would allow their storage in appropriate cabinets or safes.
Knowledge of IT users:
The knowledge and data processing capabilities of IT users are instrumental in determining whether the selected procedure should allow IT users to personally and manually perform data backups, whether different, qualified persons should perform decentralised backup, or whether automatic data backup would be more practical.
Responsibility for data backup
One of three groups can be assigned responsibility for carrying out data backups: IT users (usually for decentralised, non-networked systems), system managers, or administrators appointed specifically for data backup. Parties responsible for data backups not performed by the IT users themselves must be committed to keeping these data confidential, and encryption should be considered.
Persons responsible for organising data restoration must also be appointed, in addition to persons authorised to access backup data media, particularly if these are archived. Only these authorised persons must be allowed to access these archives. Furthermore, persons authorised to carry out restorations of complete data stocks or selected, individual files must be appointed.
When determining these responsibilities, particular regard must be given to data confidentiality and integrity requirements, as well as the reliability of the employees in question. It must be ensured that the person-in-charge is available at all times and a substitute should be appointed and trained.
The following factor is influential in this context:
Knowledge possessed by IT users:
The knowledge and data processing capabilities of each IT user determine whether these individuals can be charged with the responsibility of carrying out data backups. If the IT user in question does not possess sufficient knowledge, responsibility must be transferred to the system administrator or a qualified person.
Storage site
Data backup media and original data media must always be stored in different fire sections. In the event that data backup media are stored in a different building or off the premises, the probability of backup copies being damaged in a crisis situation is lowered. However, the greater the distance between the data media and the IT periphery required for restoration (e.g. tape station), the longer the potential transport routes and times, and the longer the resulting restoration periods. The following factor is influential in this context:
Availability requirements:
The higher the availability requirements, the more quickly the data media need to be obtained for data backup. If data media with high availability requirements are stored externally for safety reasons, consideration should be given to storing additional backup copies in the immediate vicinity of the IT system.
Data confidentiality and integrity requirements:
The higher these requirements are, the more important it is to prevent data media from being manipulated. The necessary access control can generally be achieved by appropriate infrastructural and organisational measures, see Chapter 4.3.3 Data Media Archive.
Data volume:
With increasing data volumes, the security of the storage site increases in importance.
Requirements concerning the data backup archive
Due to the concentration of data on backup data media, the degree of confidentiality and integrity of the backed up data is at least as high as that of the original data. Consequently, appropriate IT security measures, e.g. access control, are required for data media stored in a central archive.
In addition, organisational and personnel-related measures should be implemented (data media management) to allow quick and accurate access to required data media. For this, the measures in S 2.3 Data media control and Chapter 4.3.3 Data media archive must be observed.
The following factors are influential in this context:
Availability requirements:
The higher the availability requirements, the faster the required access to relevant data media. If manual inventory-keeping does not fulfil the availability requirements, automatic access systems (e.g. robotic cassette archives) can be used.
Data volumes:
The data volume decisively determines the number of data media to be stored. Large data volumes require correspondingly large storage capacities of the data archive.
Deadlines:
If erasure deadlines need to be maintained, the data backup archive must be organised appropriately and equipped with the required erasure devices. Erasures are to be executed and documented in the data backup archive by the specified deadlines. In the event that erasure is not technically possible, organisational measures must prevent reuse of the files to be erased.
Data confidentiality and integrity requirements:
The higher these requirements are, the more important it is to prevent data media from being manipulated. In general, the access control necessary for this can only be achieved by the infrastructure and organisation-related measures described in Chapter 4.3.3 Data Media Archive.
Transport modes
Data are transferred during any backup process. The following must be observed in such situations, irrespective of whether data are being transferred through a network or line, or whether data media are being dispatched to an archive.
Availability requirements:
The higher the availability requirements, the more quickly data need to be obtained for restoration. This is to be considered during the selection of the transmission medium or transport mode.
Data volumes:
If data required for restoration are to be transferred through a network, the selection of the network's transmission capacity must also be based on the data volumes. It must be ensured that the data volumes can be transmitted within the required time periods (availability requirement).
Data modification times:
If data backups are performed through a network (particularly at specified intervals), the data volumes involved can result in congestion during transmission. Sufficient transmission capacity must therefore be ensured at the time of data backup.
Data confidentiality and integrity requirements:
The higher these requirements are, the more important it is to prevent data from being intercepted, copied or manipulated by unauthorised persons during transport. Encryption or cryptographic measures against manipulation must be considered for such data transmissions. Secure containers and routes must be selected for physical transport, and the degree and usefulness of encryption procedures should also be evaluated here.
Storage modes
As part of the data backup policy, it must also be established whether storage or erasure deadlines need to be maintained for certain data.
Deadlines:
If storage deadlines need to be maintained, this can be achieved by archiving a data backup generation. In the case of extended storage deadlines, additional consideration must be given to the required inventory of reading devices and the fact that a refresh (renewed import of magnetically stored data) might become necessary, as such media are demagnetised over long periods of time, so that their data content is eventually lost.
If erasure deadlines are to be maintained, appropriate organisation is necessary; availability of the required erasure devices must also be ensured. Erasure is to be initiated and executed at the specified intervals.
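Tracking erasure deadlines across backup generations can be sketched as a simple due-date check. The dictionary of labelled generations and the day-count retention period are illustrative assumptions; actual deadlines follow from the applicable legal or contractual stipulations.

```python
import datetime

def erasure_due(backups, retention_days, today):
    """Return the labels of backup generations whose erasure deadline
    has been reached.

    backups: mapping of generation label -> creation date.
    retention_days: stand-in for the stipulated erasure deadline.
    """
    limit = datetime.timedelta(days=retention_days)
    return sorted(label for label, created in backups.items()
                  if today - created >= limit)
```

A routine like this would be run periodically in the data backup archive, with each executed erasure documented as required above.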
Additional controls:
Are data backup procedures updated in accordance with changes to the IT system?
Are data restoration exercises carried out periodically?
Is adherence to the conditions stipulated in the data backup policy being checked?
Are the persons responsible for data backup sufficiently trained?