ITS Disaster Recovery

 

Volume I  
Revised 05/08/2006  

Table of Contents

I. Introduction

II. Disaster Plan

  1. Background and Business Impact Assessment    Details: Volume II
  2. General Preventative Activities
  3. Contingencies
  4. Off Site Storage
  5. Backups
  6. Security
  7. Testing of the Plan
  8. Update and Maintenance of the Plan

III. Emergency Procedures

  1. Building Interruption/Disaster
  2. Service Interruption/Disaster
  3. Degraded Level of Service
  4. Activation of the Disaster Recovery Plan
  5. Disaster Recovery Managers
  6. Disaster Recovery Teams
  7. Disaster Recovery Team Leaders

IV. Response Strategies   Detailed Action Plan:   Volume III

  1. Environmental Failure
  2. Hardware / Software Failure
  3. Application Failure

V. Communication Plan

  1. ITS Communication Guidelines
  2. University of Iowa Communication Guidelines

VI. Attachments

  1. Schedule and Milestones
  2. Disaster Planning Prioritization Criteria
  3. Emergency Call List
  4. Maintenance Record
  5. Building Coordinators

I.    Introduction

The University of Iowa depends significantly on Information Technology Services as the campus service provider for computer-supported information processing, campus-wide networks, telecommunications, and technology support for University of Iowa students, faculty, and staff.

The increasing dependency on computers, networks, and telecommunications for operational support poses the risk that a lengthy loss of these capabilities could seriously affect the overall performance of the University. A business impact and risk assessment of University departments identified several systems as critical, comprising those functions whose loss could cause a major impact to the University. It also categorized many University functions as essential.

Every business unit within the University should develop a plan for how it will conduct business in the event of either a disaster in its own building or a disaster at Information Technology Services that removes its access to voice and data communications for a period of time. Those business units need a means to function while the computers, networks, and/or telephones are down, and they need a plan to synchronize the data that is restored on the central computers with the current state of affairs. For example, if the Payroll Office is able to produce a payroll while the central computers are down, that payroll data will have to be re-entered into the central computers when they return to service. Having a means of tracking all expenditures such as payroll while the central computers are down is extremely important.
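
The re-entry step described above can be pictured as replaying a log of transactions, recorded while the central computers were down, against the restored data. The following is only a minimal sketch under stated assumptions: the function name `resynchronize`, the account names, and the use of whole cents are hypothetical illustrations and are not part of any ITS system or procedure.

```python
def resynchronize(restored_balances, offline_transactions):
    """Apply transactions tracked during an outage to restored balances.

    restored_balances: dict of account name -> balance in cents, as
        recovered from the most recent backup (hypothetical structure).
    offline_transactions: list of (account, amount_in_cents) pairs, in
        the order they were recorded while central systems were down.
    Returns a new dict; the restored data itself is left unchanged.
    """
    current = dict(restored_balances)  # copy so the restored snapshot is preserved
    for account, amount in offline_transactions:
        # Accounts first seen during the outage start from zero.
        current[account] = current.get(account, 0) + amount
    return current


# Hypothetical example: a payroll run and a supplies purchase made
# while the central computers were unavailable.
restored = {"payroll": 100000}
offline = [("payroll", -25000), ("supplies", -1200)]
print(resynchronize(restored, offline))
```

The point of the sketch is that re-entry is only possible if every expenditure made during the outage was recorded somewhere in order, which is why a tracking method matters.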

The purpose of the plan is to define procedures for a contingency plan for recovery from disruption of telecommunications, computer and/or network services. However, while we will have a huge technical task of restoring computer and network operations ahead of us, we can't lose sight of the human interests at stake.

This disruption may come from total destruction of central sites or from minor disruptive incidents. There is a great deal of similarity in the procedures to deal with the different types of incidents affecting different departments in Information Technology Services. However, special attention and emphasis are given to an orderly recovery and resumption of those operations that concern the critical business of running the University. Consideration is given to recovery within a reasonable time and within cost constraints.

The plan provides guidelines for ensuring that needed personnel and resources are available for both disaster preparation and response and that the proper steps will be carried out to permit the timely restoration of services.

Therefore, the goals of the Disaster Plan are to:

II.    Disaster Plan

Background and Business Impact Assessment

A plan framework for the project was developed, a Project Team was assembled, and awareness sessions were conducted. McGladrey and Pullen consultants were used for the Business Impact Assessment of University departments to: identify critical systems, processes and functions; assess the economic impact of incidents and disasters that result in a denial of access to systems and services; and assess the length of time business units can survive without access to systems, services and facilities.

The Business Impact Assessment Report identified critical service functions and the timeframes in which they must be recovered after interruption. The Business Impact Assessment Report was used as a basis for identifying systems and resources required to support the critical services provided by Information Technology Services. (Volume II)

A business continuity project work group was established with differing levels and types of responsibilities for business continuity, as follows:

Each of the above departments worked with its staff members in their respective areas to prepare their disaster recovery procedures. Recovery plan components were defined and plans were documented. In the event of a disaster affecting any of the departmental areas, the Director serves as liaison between the service unit(s) affected and other departments providing major services. These services include the support provided by Facilities Management, security provided by the Department of Public Safety, and public dissemination handled by University Relations, among others.

General Preventative Activities

Certain preparations have been made in advance to facilitate recovery from a disaster that destroys all or part of the services that Information Technology Services provides. This document describes what has been done to allow a quick and orderly restoration of the facilities and services that Information Technology Services operates. The following are the general procedures for Disaster Preparedness.

Contingencies

General situations that can interrupt or destroy computer, network, or telecommunication services usually occur under the following major categories:

Environmental Failures

Hardware/Software Failures

Application Failures

There are different levels of severity of these contingencies necessitating different strategies and different types and levels of recovery. This plan covers strategies for:

Off-Site Storage

Off-Site Storage is responsible on an ongoing basis for the off-site storage of required recovery programs, files, and data. Following the decision to activate the alternate site, each group is responsible for the orderly and timely transfer of the required off-site stored material to the alternate site location. All central file backups are on DAT tapes or other compact media and are stored off site. The Group Leaders, Managers, and other key staff have access to keys for the location where the tapes are stored.

Backups

All systems should be backed up on a periodic basis. Those backups should be stored in an area separate from the original data. Physical security of the data storage area for backups should be considered. Standards should be established on the number of backup cycles to retain and the length of their retention.

The actual backup and cycling procedures vary somewhat depending on the computer platform. Details of these procedures and storage locations are contained in the Response Strategies, Volume III.
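
A retention standard of the kind described above (a number of backup cycles to keep and a maximum retention length) can be expressed as a small rule. The following is only a sketch under stated assumptions: the function name `backups_to_keep`, its parameters, and the weekly cycle used in the example are hypothetical, and the actual procedures remain those documented per platform in Volume III.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, cycles_to_keep, max_age_days, today):
    """Return the backup dates to retain, newest first.

    Assumed policy (hypothetical): a backup is retained only if it is
    among the newest `cycles_to_keep` backups and is no older than
    `max_age_days` as of `today`.
    """
    newest_first = sorted(set(backup_dates), reverse=True)
    cutoff = today - timedelta(days=max_age_days)
    return [d for d in newest_first[:cycles_to_keep] if d >= cutoff]


# Hypothetical example: six weekly backups, keep at most four cycles,
# none older than 14 days.
today = date(2006, 5, 8)
weekly = [today - timedelta(days=7 * i) for i in range(6)]
for kept in backups_to_keep(weekly, cycles_to_keep=4, max_age_days=14, today=today):
    print(kept.isoformat())
```

Other policies are equally plausible, for example keeping a backup if it satisfies either condition; the sketch only shows that both numbers have to be chosen explicitly.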

Security

Security can be defined as safety, or a state of being free from doubt or danger. As it relates to information, security involves protection from damage or attack, being stable, reliable, and free of failure. Another way to think of it is as a guarantee. Securing information is guaranteeing its confidentiality (levels of privacy), integrity (being complete and true), and availability (being accessible).

The primary points of Information Technology Services' Administrative Computing Policy are that all information will be secured physically and electronically, all users of information will be individually identified, all applications and systems will be password protected, and all access authority requests will be documented.

All systems should have security products installed to protect against unauthorized entry. All systems should be protected by passwords, especially those permitting updates to data. All users should be required to change their passwords on a regular basis. All security systems should log invalid attempts to access data, and security administrators should review these logs on a regular basis.
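
Reviewing logs of invalid access attempts, as recommended above, usually starts with counting failures per user. The following is only a sketch under stated assumptions: the log format (`<timestamp> <user> <OK|FAIL>`), the function name `flag_repeated_failures`, and the threshold of three are hypothetical and do not describe any particular security product.

```python
from collections import Counter


def flag_repeated_failures(log_lines, threshold=3):
    """Return {user: failure_count} for users at or above the threshold.

    Assumes a hypothetical three-field log line:
        "<timestamp> <user> <OK|FAIL>"
    Lines that do not match this shape are ignored.
    """
    failures = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) == 3 and fields[2] == "FAIL":
            failures[fields[1]] += 1
    return {user: count for user, count in failures.items() if count >= threshold}


# Hypothetical sample log for illustration only.
sample_log = [
    "2006-05-08T09:00 jdoe FAIL",
    "2006-05-08T09:01 jdoe FAIL",
    "2006-05-08T09:02 asmith OK",
    "2006-05-08T09:03 jdoe FAIL",
    "2006-05-08T09:04 asmith FAIL",
]
print(flag_repeated_failures(sample_log))
```

A summary like this does not replace the administrator's regular review; it only narrows down which accounts to look at first.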

Steps you should take immediately when a system has been compromised:

If you feel threatened or if system damage has occurred, you should report the incident to the Department of Public Safety. They will advise you on legal aspects of the computer crime.

The plan is predicated on the validity of some general assumptions, but does not include all special situations that can occur. Any decisions needed at the time of an incident for situations not covered in this plan will be made by senior technology staff members on site.

Testing of the Plan

Testing the Disaster Recovery Plan is an essential element of preparedness. Partial tests of individual components and recovery plans of specific teams will be carried out on a regular basis. A comprehensive exercise of our continuity capabilities and support by our designated recovery facilities will be performed on an annual basis.

Update and Maintenance of the Plan

It is inevitable in the changing environment of the computer and telecommunication industry that this disaster recovery plan will become outdated and unusable unless it is kept up to date. Changes that will likely affect the plan fall into several categories:

As changes occur in any of the areas mentioned above, the CIO and Directors will determine if changes to the plan are necessary. This decision will require that they be familiar with the plan in some detail. A document referencing common changes that will require plan maintenance will be made available and updated when required.

The staff in the affected area will make changes that affect the departmental recovery portions of the plan. After the changes have been made, the Directors will be advised that the updated documents are available. They will incorporate the changes into the body of the plan and distribute it as required.

The following lists some of the types of changes that may require revisions to the disaster recovery plan. Any change that can potentially affect whether the plan can be used to successfully restore the operations of the department's computer, network, and telecommunications systems should be reflected in the plan.

Hardware

Software

Facilities

Personnel

Procedural

III.    Emergency Procedures

In case an incident has happened or is imminent that will drastically disrupt operations, the following minimum steps should be taken to reduce the probability of personal injuries and/or limit the extent of the damage.

Building Interruption/Disaster

Service Interruption/Disaster (Refer to Volume III Response Strategies)

A primary goal of the recovery process is to restore all computer operations without the loss of any data. It is important that the Administrative Information Systems Recovery Team Leader convene the Administrative Information Systems Recovery Team quickly so that they can immediately set about the task of protecting and salvaging any storage media on which data may be stored. This includes any magnetic tapes, optical disks, CD-ROMs, and disk drives.

The recovery strategy is to restore the University's data center's computer processing capability and to recover computer support services. This group determines Hardware/Software requirements for recovery processing. The planned recovery hardware is kept current and reviewed periodically by this group as is the configuration, support, and application software.

This group is responsible for the recovery planning of the required Recovery Network, services they provide and the maintaining of its currency. It is also responsible for the implementation of the Recovery Network and services within the time constraints necessary to meet the requirements of operating the critical systems.

Degraded Computer, Network, and/or Telecommunication Services at Central Sites

Activation of the Disaster Recovery Plan

This plan will be invoked upon the occurrence of an incident. The senior staff member on site at the time of the incident or the first on site following an incident will contact the Chief Information Officer (CIO) and/or the Directors of Administrative Information Systems, Systems and Platform Administration, and Telecommunication and Network Services for a determination of the need to declare an incident.

The senior technology staff member on site at the time of the incident will assume immediate responsibility. The responsibility will be to see that people are evacuated as needed. If injuries have resulted or may occur as a result of the incident, immediate attention will be given to those persons injured. The Department of Public Safety and Facilities Management will be notified if necessary. If the situation allows, attention will be focused on shutting down systems, turning off power, etc., but evacuation is the highest priority.

Once an incident that is covered by this plan has been declared, the plan, duties, and responsibilities will remain in effect until the incident is resolved and the proper authorities are notified.

Invoking this plan implies that a recovery operation has begun and will continue with top priority until computer, network, and/or telephone support to the University has been re-established.

This disaster recovery plan will be invoked under one of the following circumstances:

Disaster Recovery Managers: Responsible for the overall recovery progress and makes decisions as necessary for the timely execution of the Disaster Plan. The Chief Information Officer provides liaison with the President and Vice Presidents for reporting the status of the recovery operation.

Responsibilities include:

Disaster Recovery Teams

In case of a disaster, the emergency call list (Attachment 3) will need to be used. General duties of the disaster recovery managers are discussed. Recovery team leaders have been assigned in each area and general duties given. The team leader will assign personnel in each major area to specific tasks during the recovery stage for that area.

Each member of the recovery groups will follow this general plan of action:

Disaster Recovery Team Leader

The Team Leader(s) of the Administrative Information Systems Recovery Team, Systems and Platform Administration Recovery Team, and Telecommunications and Network Services Recovery Team are directly responsible for monitoring the execution of the Disaster Plan by the Recovery Organization and for reporting progress and problems to the Disaster Recovery Coordinators.

Responsibilities include:

IV. Response Strategies

This section provides high-level information about the organization of recovery efforts and the role of the service units. The Disaster Plan Response Strategies (Volume III) documents the detailed recovery procedures for each of the computer, network, and telecommunication systems to be restored at the recovery facility. All major support sections in Information Technology Services will need to function together in a disaster, although a specific plan of action is written for each department. Each department documents the list of equipment necessary to restore service, power and cooling requirements, cabling and networking requirements, operating system and data restoration procedures, and procedures for placing the system into final form. Downtime/Recovery is the responsibility of each department; therefore, refer to the individual department's Volume III for appropriate downtime/recovery procedures.

The following portion of the plan reviews the various threats that can lead to a disaster, where our vulnerabilities are, and steps we should take to minimize our risk. The threats covered here are common to all ITS facilities. For the most part, the major problems that can cause a computing system to be inoperable for a length of time result from environmental problems related to the computing systems. The various situations or incidents that can disable, partially or completely, or impair support of ITS's computing facilities are identified. A working plan for how to deal with each situation in detail is provided in Volume III.

Fire

If you discover a fire, extinguish it only if you can do so safely and quickly. After it is extinguished, call Public Safety.

If a fire cannot be extinguished:

It is the individual responsibility of every employee to know the evacuation procedures for the building in which he or she works.

Power Interruption

Facilities Management should be notified in the event of a power outage; a mechanic will be dispatched to evaluate and correct the problem.

The following should be provided:

Heating, Ventilating or Air Conditioning Failure

Facilities Management should be notified in the event of a heating, ventilating, or air conditioning failure; a mechanic will be dispatched to evaluate and correct the problem.

The following should be provided:

Building or System Security Failure

Bomb Threat

All bomb threats received via telephone or mail must be treated as authentic. Panic must be prevented. The threat should not be discussed more than necessary, and rumors should not be started or shared.

Receiving a bomb threat:

Receipt of a mail bomb threat:

Explosion

In the event an explosion occurs within the building:

Wind Damage/Tornado

Snow or Ice Storm

Other Severe Weather

Upon notification of a warning:

Flood and/or Water Damage

8am - 5pm weekdays:

After hours, holidays and weekends:

Riot, Demonstration

Intruders

Hazardous Waste

In the event of an incident involving Hazardous Waste on the University of Iowa Campus:

Response Strategies

| Event | Triggers | Action | Responders | Comments | Volume III Section |
| --- | --- | --- | --- | --- | --- |
| Environmental/Facilities Failures | | | | | |
| Fire | Fire Alarm, smoke, or flames | Call 911 | IC Fire Dept., Public Safety | | Sec. II. 1.2, Sec. III. 4.1 |
| Power Failure | Loss of electrical service | 335-5071 | Facilities Management | | Sec. II. 3.1, Sec. III. 4.2 |
| Air Conditioning Failure | Area too hot | 335-5071 | Facilities Management | | Sec. II. 3.2, Sec. III. 3.3 |
| Steam Failure | Loss of heat | 335-5071 | Facilities Management | | Sec. II. 3.2 |
| Building Mechanical Failures | Loss of mechanical services, elevators | 335-5071 | Facilities Management | | |
| Security System Failures | False alarms or alarm failure | 335-5071 | Facilities Management | | |
| Flood | High water, drain blockage | 335-5022 | Public Safety | | Sec. II. 1.1 |
| Weather Emergency | Siren, or extreme conditions (ice, rain, etc.) | 335-5022, 356-6020 | Public Safety, IC Police, IC Fire, Johnson County Sheriff | | Sec. II. 1.3 |
| Riot, demonstration | Large group of people gathered for a specific purpose | Call 911 | IC Police, Fire, Johnson County Sheriff | | Sec. II. 2.3 |
| Bomb Threat | Phone Call or Mail | Call 911 | Public Safety, Johnson County Sheriff | | Sec. II. 2.2 |
| Intruders | Disruptive or abusive behavior | Call 911 | Public Safety | | |
| Alternate Recovery Site | Unable to occupy existing space | Relocate staff, equipment to alternate site | Facilities Management, Information Technology Services, Outside service provider | | Sec. II. Appendix B |
| Hardware/Software Failures | | | | | |
| Servers - Less than 2 hours MTTR | No server access | 384-4357 | Operations Center | | Sec. II. Appendix D |
| Servers - More than 2 hours MTTR | No server access | 384-4357 | Operations Center | | Sec. II. Appendix D |
| Network - Less than 2 hours MTTR | No network access | 384-4357 | Enterprise File & Print Services, TNS | | Sec. II. Appendix D, Sec. III 4.4 |
| Network - More than 2 hours MTTR | No network access | 384-4357 | Enterprise File & Print Services, TNS | | Sec. II. Appendix D, Sec. III 4.4 |
| Application Failures | | | | | |
| Computing Infrastructure Interruption | Application fails | JH operators notify responsible employee for application. | AIS | | Sec. I. 1.1 |
| Application System Malfunction or Error | Application fails | JH operators notify responsible employee for application. | AIS, SPA | | Sec. I. 1.2, Sec. II. 3.3 |
| Office Environment Inaccessible or Uninhabitable | Unable to occupy office | Evacuate area | AIS, Facilities Management | | Sec. I. 1.3 |
| Unauthorized or Improper Modification of Software or Sensitive Data | Application fails | JH operators notify responsible employee for application. | AIS, SPA | | Sec. I. 1.4, Sec. II. 2.5 |
| Vendor Software Failure | Application fails | JH operators notify responsible employee for application. | AIS | | Sec. I. 1.5 |
| Damage, Destruction, or Corruption of Software/Data | Application fails | JH operators notify responsible employee for application. | AIS, SPA | | Sec. I. 1.6, Sec. II. 2.7 |
| Enterprise Systems (e-mail, calendaring) | Application fails | JH operators notify responsible employee for application. | AIS, SPA | | |

V. Communication Plan

The ITS Communication Plan is designed to provide an orderly flow of accurate, effective, and timely information to the ITS staff and campus during the onset of a crisis situation, or a situation of potential crisis, affecting the University of Iowa campus telephone, data network, and computer and information systems.

It is the responsibility of each department to communicate with its customers and other ITS staff. Coordination with the Campus Services Help Desk and other key entry points provides the link for communicating service interruptions.

ITS Communication Guidelines

The focus of this section is to decide in advance how ITS departments will communicate with internal and external audiences in the event of an unplanned service interruption.

This plan recognizes the importance of addressing and supporting communication needs and issues that emerge at the service level. Individual departments will need to extend this plan for the specific requirements of their area. This plan and the University of Iowa Enterprise Information Technology Disaster Plan are intended to provide a framework for partner business units in developing their plans. See http://cio.uiowa.edu/itsecurity/documents/Enterprise-IT-Disaster-Plan.pdf

Communication includes all forms of media, as well as formal and informal interpersonal communication activities.

This section is designed to:

Enterprise IT Emergency Communications

During a campus IT emergency, defined as a serious situation not (or perhaps not yet) having been declared a disaster, the Security Officer has primary responsibility for immediate response. All emergency IT messages will be sent to the college and departmental emergency contact list by the University IT Security Officer.

Develop a Plan of Action.

Determine how ITS will respond to any service interruption by defining the specific actions to be taken, outlining the way that appropriate information should flow to different audiences, and identifying appropriate spokespersons for various constituents. Particular attention should be paid to determine a priority order under which audiences will receive information, as well as a regular schedule of news updates.

Each department will work with University Relations to gather accurate and substantial information regarding the situation and details regarding the University response. University Relations, working with the department, will provide notification to customers, employees, and the general public on progress toward recovery.

Audiences/contacts that should be considered during a crisis:

Plan Enactment.

Notices should be issued in a timely manner, before the story and speculation start leaking out on their own. It is the organization’s policy to be open and honest in communication no matter where the blame lies. Provide factual information to University Relations and authorities as quickly as facts have been verified, and use every means of communication available to offset rumors and misstatements.

Follow Up.

After the plan is activated, the Disaster Recovery Manager will determine subsequent actions and decide if other employees need to be involved. The following information must be gathered and its accuracy verified to provide an incident report to the Director.

What impact may this crisis have on the organization

University of Iowa Communication Guidelines

University Relations serves as the authorized spokesperson for the Institution. All public information must be coordinated and disseminated by its staff.

University policy requires that only certain administrators may speak on behalf of the University. These spokespeople are the president, the vice president for university relations, and the director for university relations. Under certain circumstances, the previously named administrators may name others as spokespersons.

In the event that regular telecommunications on campus are not available, University Relations will center media relations at a designated location. Information will be available there for the news media and, as possible, for faculty, staff, and students. Cellular and other emergency telephone numbers are available to Public Safety and other designated units. Official information will be made available as quickly as possible to the Campus Information Center, IMU.

In the event of an emergency or other unusual circumstance in which media attention may be focused on the University, you should call University Relations. A University Relations representative is on call evenings, weekends, and holidays to assist University units in communicating with the campus and the general public and in dealing with media emergencies and other unusual circumstances. The representative on call will provide media assistance and alert appropriate University administrators as necessary.

 

ATTACHMENTS

Attachment 1. Work Plan and Timetable

Project Schedule

| Item | Assigned | Date Start | Date Due | Estimate of Hours |
| --- | --- | --- | --- | --- |
| 1. Organize Project and establish a Planning Work Group | Nickels, Dobbins, Grout | 09/01/97 | 03/31/99 | |
| 2. Gathered Detailed Department Data and issued BIA Report | McGladrey & Pullen | 09/01/97 | 10/31/98 | |
| 3. Review McGladrey & Pullen material | Fleagle, Noel, Sawin | 04/01/99 | 06/31/99 | 10 |
| 4. Develop Detailed Response Strategies | Fleagle, Noel, Sawin | 04/01/99 | 12/31/99 | |
| 5. Developed 1st Draft of Response Strategies | Fleagle, Noel, Sawin | 04/30/99 | 08/30/99 | |
| 6. Develop 2nd Draft of Response Strategies | Fleagle, Noel, Sawin | 04/30/99 | 10/29/99 | 30 |
| 7. Test Disaster Scenarios and Plans | Management Team & Functional Units | 09/01/99 | 12/31/99 | |
| 8. Review Y2K readiness and responses | Management Team | 09/01/99 | 12/31/99 | |
| 9. Implement Responses as necessary | Functional Units | | Ongoing | |
| 10. Evaluation and Maintenance | Management Team | | Ongoing | |

Attachment 2. Disaster Planning Prioritization Criteria

  1. Protect Human Life; prevent/minimize personal injury
  2. Protect the Environment
  3. Prevent/minimize damage to physical assets, including structures, animals, and research data
  4. Restore normal operations

Critical Services

 

Electrical / Steam

Communication Services

Potable Water

Transportation (for evacuation)

Chilled Water

Control of Hazardous Materials

Information Systems Software / Hardware

Primary Importance – Safety, Security & Life Support Services – Facilities

Fire protections & security alarms, campus lighting, and areas of controlled access/exits

Research, medical care & support, animal care, records, access/exit of buildings, docks, labs, parking, and elevators.

Worker safety, occupational – confined spaces, hazardous work/areas.

Emergency signs, emergency response equipment, HVAC controls/systems

Public safety, public relations/communications protocol

Secondary Importance – Services

Residence halls (Food & Shelter), Final Exams & Classes, Special Events/Concerts, Daycare, Public Transportation

Financial Services (Payroll, Accounts Payable, Banking, Bonds)

Contracted Services (Conferences, Continuing Education, Health Services, Registration Records, Receivables, Libraries)

Attachment 3. Confidential Emergency Call List

This list is kept up to date by each functional unit and the Director. It is available only on a "need-to-know" basis.

Attachment 4. Maintenance Record

| Updated | Reason for Update | Comments |
| --- | --- | --- |
| 03/09/99 | Numerous changes since original creation date | |
| 12/31/99 | Numerous updates – Distributed complete new document | |
| 03/31/00 | Quarterly Update | |
| 07/2004 | Routine Update | |
| 05/21/2005 | Routine Update | |

Attachment 5. ITS Building Coordinators

| Building Name | Building Address | Building Coordinator, Room Number | Phone Number | Alternate Building Coordinator, Room Number | Phone Number |
| --- | --- | --- | --- | --- | --- |
| Jessup Hall | 5 W. Jefferson St. | Glenn Anson, B4 | 335-0284 | | |
| Lindquist Center | 240 S. Madison St. | Peggy Streb, S107 | 335-5971 | Donna D'Ambrose, 134E | 335-5398 |
| North Hall | 20 W. Davenport St. | Denny Dunlap, 400 | 335-5511 | | |
| University Services Building | 1 W. Prentiss St. | Sue Fangman, 3rd fl. | 335-6305 | Joyce Craig, 3rd fl. | 384-0750 |

Copyright © 2005 The University of Iowa. All rights reserved.