Availability Management

Definition

"The practice of ensuring that services deliver the agreed levels of availability to meet the needs of customers and users."

To fulfill this purpose, organizations must:

Availability: The ability of an IT service or configuration item to perform its agreed function when required.

Availability (%) = (Agreed Service Time - Downtime) / Agreed Service Time × 100
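
As a minimal sketch of the formula above, the function below computes availability from an agreed service time and a measured downtime. The function name, the choice of minutes as the unit, and the sample figures are assumptions made for illustration only.

```python
def availability_pct(agreed_service_minutes: float, downtime_minutes: float) -> float:
    """Availability (%) = (agreed service time - downtime) / agreed service time x 100."""
    if agreed_service_minutes <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_minutes <= agreed_service_minutes:
        raise ValueError("downtime must lie between 0 and the agreed service time")
    return (agreed_service_minutes - downtime_minutes) / agreed_service_minutes * 100


# Hypothetical example: a 30-day month of 24x7 service with 90 minutes of downtime.
agreed = 30 * 24 * 60  # 43,200 minutes of agreed service time
print(f"{availability_pct(agreed, 90):.3f}%")  # prints 99.792%
```

Downtime that falls outside the agreed service time (for example, planned maintenance in an agreed window) would normally be excluded before calling the function.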

% Availability	Downtime/Month	Downtime/Year	Typical Use
99%	~7.3 hours	~3.65 days	Internal systems, non-critical
99.9%	~43.8 minutes	~8.76 hours	Business applications
99.99%	~4.4 minutes	~52.6 minutes	E-commerce, financial services
99.999%	~26 seconds	~5.3 minutes	Emergency services, critical infrastructure
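
The downtime figures in the table can be approximated with a short calculation. This sketch assumes 24x7 service, a 365-day year, and an average month of 730 hours; the names used are illustrative.

```python
HOURS_PER_YEAR = 365 * 24              # 8,760 hours, ignoring leap years
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730 hours, an average month


def allowed_downtime_hours(target_pct: float, period_hours: float) -> float:
    """Maximum downtime (in hours) that still meets the availability target."""
    if not 0 <= target_pct <= 100:
        raise ValueError("target must be a percentage between 0 and 100")
    return period_hours * (1 - target_pct / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    per_month_min = allowed_downtime_hours(target, HOURS_PER_MONTH) * 60
    per_year_min = allowed_downtime_hours(target, HOURS_PER_YEAR) * 60
    print(f"{target}%: {per_month_min:.2f} min/month, {per_year_min:.2f} min/year")
```

If the agreed service time is shorter than 24x7 (for example, business hours only), the same target allows proportionally less downtime.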

Metric	Abbreviation	Description
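Mean Time Between Failures	MTBF	Average time a service runs without interruption, from restoration to the next failure
Mean Time to Restore Service	MTRS	Average time from a failure until the service is fully restored
Mean Time Between Service Incidents	MTBSI	Average time from the start of one incident to the start of the next (MTBF plus MTRS)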
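
The three metrics above can be derived from an incident log, as the following sketch shows. The log format (a list of failure-start and service-restored timestamps, sorted by start time) and the sample data are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incident log: (failure start, service restored), sorted by start.
incidents = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 10, 0)),
    (datetime(2024, 1, 13, 9, 0), datetime(2024, 1, 13, 11, 0)),
    (datetime(2024, 1, 23, 9, 0), datetime(2024, 1, 23, 9, 30)),
]


def hours(delta: timedelta) -> float:
    """Convert a timedelta to fractional hours."""
    return delta.total_seconds() / 3600


def mtrs(log):
    """Mean time to restore service: average outage duration."""
    return mean(hours(end - start) for start, end in log)


def mtbf(log):
    """Mean uptime between failures: restoration to the next failure start."""
    return mean(hours(nxt[0] - cur[1]) for cur, nxt in zip(log, log[1:]))


def mtbsi(log):
    """Mean time between the starts of consecutive incidents."""
    return mean(hours(nxt[0] - cur[0]) for cur, nxt in zip(log, log[1:]))


print(f"MTRS:  {mtrs(incidents):.2f} h")
print(f"MTBF:  {mtbf(incidents):.2f} h")
print(f"MTBSI: {mtbsi(incidents):.2f} h")
```

MTBF and MTBSI need at least two incidents, since both are measured between consecutive failures. On a short log such as this one, MTBF plus MTRS only approximates MTBSI, because MTRS is averaged over one more outage than there are gaps.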

Technique	Description
Redundancy	Duplicate critical components to prevent single points of failure
Fault Tolerance	Design systems to continue operating when components fail
Active Monitoring	Detect problems before users notice them
Automated Failover	Automatically switch to backup systems during failures
Regular Testing	Verify that availability measures work as intended
Capacity Management	Prevent failures caused by resource exhaustion
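
To illustrate why redundancy helps, the sketch below applies the standard reliability formulas for components in parallel and in series. It assumes component failures are independent, which real systems often violate (shared power, shared network, common software faults), so treat the results as an upper bound. The function names and figures are illustrative.

```python
from math import prod


def parallel_availability(availabilities):
    """Redundant components: the service is up if at least one component is up."""
    return 1 - prod(1 - a for a in availabilities)


def series_availability(availabilities):
    """Chained components: the service is up only if every component is up."""
    return prod(availabilities)


# Hypothetical example: two servers at 99% each, behind a 99.9% load balancer.
servers = parallel_availability([0.99, 0.99])
service = series_availability([0.999, servers])
print(f"redundant server pair: {servers:.4%}")
print(f"end-to-end service:    {service:.4%}")
```

The example also shows that a non-redundant component in the chain (here the load balancer) caps end-to-end availability, which is why single points of failure are the first target of redundancy.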

Metric	What It Measures
Products/services with documented availability criteria	Coverage of availability requirements
Critical products/services with SLA-based availability requirements	Alignment with business needs
Timely updates to availability requirements	Responsiveness to change
Products/services monitored for availability	Monitoring coverage
Minimum time between failures	System reliability
Number of service disruptions	Incident frequency
Total downtime over the period	Aggregate impact
Maximum service outage	Worst-case impact
MTRS (Mean Time to Restore Service)	Recovery effectiveness
Effective availability controls	Control quality
Ratio of actual losses to expected losses	Accuracy of risk assessment
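
Several of the outage-based metrics in this table can be computed from a simple list of outages for the reporting period. The sketch below is one possible approach; the data format (start and end as hours since the period began, sorted by start) and the dictionary keys are assumptions.

```python
# Hypothetical outages in one reporting period, as (start, end) in hours
# since the period began, sorted by start time.
outages = [(50.0, 51.5), (300.0, 300.5), (610.0, 613.0)]


def outage_summary(outages):
    """Summarize disruption count, downtime, worst outage, and shortest uptime gap."""
    durations = [end - start for start, end in outages]
    # Uptime gap: from the end of one outage to the start of the next.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(outages, outages[1:])]
    return {
        "disruptions": len(outages),
        "total_downtime_h": sum(durations),
        "max_outage_h": max(durations, default=0.0),
        "min_time_between_failures_h": min(gaps, default=None),
    }


for name, value in outage_summary(outages).items():
    print(f"{name}: {value}")
```

With fewer than two outages there is no gap to measure, so the minimum time between failures is reported as None.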

Availability Manager: Coordinates availability management activities, designs availability solutions, and reports on availability performance.