Availability Management

Definition

"The practice of ensuring that services deliver the agreed levels of availability to meet the needs of customers and users."

To fulfill this purpose, organizations must:

  • Establish a shared view of target service levels
  • Identify availability requirements
  • Measure, assess, and report service availability
  • Treat service availability risks

Key Terms

Availability: The ability of an IT service or configuration item to perform its agreed function when required.

Measuring Availability

Calculating Availability

Availability (%) = ((Agreed Service Time - Downtime) / Agreed Service Time) × 100
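
As a minimal sketch of the formula above, the following function computes the availability percentage from the agreed service time and the recorded downtime. The function name, the choice of hours as the unit, and the example figures are assumptions for illustration only; any unit works as long as both inputs use the same one.

```python
def availability_percent(agreed_service_time: float, downtime: float) -> float:
    """Return availability as a percentage of the agreed service time.

    Both arguments must use the same unit (hours are assumed here).
    """
    if agreed_service_time <= 0:
        raise ValueError("agreed_service_time must be positive")
    if not 0 <= downtime <= agreed_service_time:
        raise ValueError("downtime must be between 0 and agreed_service_time")
    return (agreed_service_time - downtime) / agreed_service_time * 100


# Hypothetical month: 720 agreed service hours, 3 hours of downtime.
example = availability_percent(720, 3)
print(f"Availability: {example:.2f}%")
```

Downtime that falls outside the agreed service time (for example, planned maintenance in an agreed window) would normally be excluded before calling the function.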

Common Availability Targets

% Availability | Downtime/Month | Downtime/Year | Typical Use
99%            | ~7.3 hours     | ~3.65 days    | Internal systems, non-critical
99.9%          | ~43.8 minutes  | ~8.76 hours   | Business applications
99.99%         | ~4.4 minutes   | ~52.6 minutes | E-commerce, financial services
99.999%        | ~26 seconds    | ~5.3 minutes  | Emergency services, critical infrastructure
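
The downtime figures in the table above can be approximated from the target percentage alone. The sketch below assumes a service agreed for continuous (24x7) operation, a 365-day year, and a month taken as one twelfth of a year; the function and constant names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years are ignored (assumption)


def downtime_budget_hours(target_percent: float, period_hours: float) -> float:
    """Return the maximum downtime (in hours) allowed by an availability target."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    return period_hours * (1 - target_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    per_year = downtime_budget_hours(target, HOURS_PER_YEAR)
    per_month = per_year / 12  # average month
    print(f"{target}%: {per_month * 60:.1f} min/month, {per_year:.2f} h/year")
```

For a service with a shorter agreed service time (for example, business hours only), pass that period instead of `HOURS_PER_YEAR`; the allowed downtime shrinks proportionally.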

MTBF, MTRS, and MTBSI

Metric                              | Abbreviation | Description
Mean Time Between Failures          | MTBF         | Average uptime between the restoration of service and the next failure
Mean Time to Restore Service        | MTRS         | Average time to restore service after a failure
Mean Time Between Service Incidents | MTBSI        | Average time from the start of one incident to the start of the next

Formulas

  • MTBSI = MTBF + MTRS
  • Availability = MTBF / (MTBF + MTRS)
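
The two formulas above can be checked against an outage log. The sketch below assumes one common convention: MTBF is total uptime divided by the number of failures, and MTRS is total downtime divided by the number of failures. The `Outage` class, the function name, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    start: float  # hours since the start of the observation period
    end: float    # hour at which service was restored


def reliability_metrics(period_hours: float, outages: list[Outage]) -> dict[str, float]:
    """Compute MTBF, MTRS, MTBSI and availability for one observation period."""
    if not outages:
        raise ValueError("at least one outage is needed to compute the means")
    failures = len(outages)
    downtime = sum(o.end - o.start for o in outages)
    uptime = period_hours - downtime
    mtbf = uptime / failures    # average uptime per failure
    mtrs = downtime / failures  # average time to restore service
    return {
        "mtbf": mtbf,
        "mtrs": mtrs,
        "mtbsi": mtbf + mtrs,
        "availability": mtbf / (mtbf + mtrs),
    }


# Hypothetical 720-hour month with three outages totalling 6 hours.
log = [Outage(100, 102), Outage(400, 401), Outage(600, 603)]
metrics = reliability_metrics(720, log)
print(metrics)
```

Under this convention, MTBF / (MTBF + MTRS) equals uptime divided by the observation period, so it agrees with the percentage formula given under Calculating Availability when the period is the agreed service time.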

Processes

Managing Product and Service Availability

  1. Analyze requirements to understand business availability needs
  2. Propose and verify solution design with appropriate controls
  3. Support and verify solution testing and implementation
  4. Support and verify monitoring and reporting
  5. Analyze data and initiate improvements

Measuring and Reporting Availability

  1. Analyze measurement and reporting needs and capabilities
  2. Agree availability measurement and reporting requirements
  3. Design availability measurements and reports
  4. Implement availability measurement and reporting
  5. Review availability measurement and reporting

Techniques to Improve Availability

Technique           | Description
Redundancy          | Duplicate critical components to prevent single points of failure
Fault Tolerance     | Design systems that continue operating when components fail
Active Monitoring   | Detect problems before users notice them
Automated Failover  | Automatically switch to backup systems during failures
Regular Testing     | Verify that availability measures work as intended
Capacity Management | Prevent failures caused by resource exhaustion
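
To illustrate why redundancy raises availability, the sketch below uses the standard simplified reliability model: components in series must all work, while redundant (parallel) components fail only if every copy fails. It assumes component failures are independent, which real systems often violate (shared power, shared software defects), so treat the results as an upper bound. The function names and the 99% figure are illustrative.

```python
from math import prod


def serial_availability(components: list[float]) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(components)


def parallel_availability(components: list[float]) -> float:
    """Availability of redundant components where any one is sufficient."""
    return 1 - prod(1 - a for a in components)


single = 0.99  # hypothetical availability of one server
print(f"Two servers in series:   {serial_availability([single, single]):.4f}")
print(f"Two servers in parallel: {parallel_availability([single, single]):.4f}")
```

The same model shows the cost of long dependency chains: each additional serial component multiplies the overall availability by a factor below 1.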

Recommendations for Practice Success

  • Understand service consumer needs and expectations beyond technical metrics
  • Understand legal and regulatory availability requirements
  • Design for the availability that actual business needs require, not for theoretical maximums
  • Keep improving service availability without significant cost increases
  • Automate availability controls where practical
  • Integrate the practice into organizational value streams

Key Metrics

Metric                                                              | What It Measures
Products/services with documented availability criteria             | Coverage of availability requirements
Critical products/services with SLA-based availability requirements | Alignment with business needs
Timely updates to availability requirements                         | Responsiveness to change
Products/services monitored for availability                        | Monitoring coverage
Minimum time between failures                                       | System reliability
Number of service disruptions                                       | Incident frequency
Total downtime over the period                                      | Aggregate impact
Maximum service outage                                              | Worst-case impact
MTRS (Mean Time to Restore Service)                                 | Recovery effectiveness
Effective availability controls                                     | Control quality
Ratio of actual losses to expected losses                           | Accuracy of risk assessment
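
Several of the metrics above (number of disruptions, total downtime, maximum outage, minimum time between failures, MTRS) can be derived from the same outage log. The following is a minimal sketch, assuming outages are recorded as non-overlapping (start, end) pairs in hours; the function name, dictionary keys, and sample data are hypothetical.

```python
def outage_metrics(outages: list[tuple[float, float]]) -> dict:
    """Summarize an outage log of non-overlapping (start_hour, end_hour) pairs."""
    ordered = sorted(outages)
    durations = [end - start for start, end in ordered]
    # Uptime gap between the end of one outage and the start of the next.
    gaps = [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return {
        "disruptions": len(ordered),
        "total_downtime": sum(durations),
        "max_outage": max(durations, default=0.0),
        "min_time_between_failures": min(gaps) if gaps else None,
        "mtrs": sum(durations) / len(durations) if durations else None,
    }


# Hypothetical log for one reporting period (hours since period start).
summary = outage_metrics([(100.0, 102.0), (400.0, 401.0), (600.0, 603.0)])
print(summary)
```

The coverage and control-quality metrics in the table depend on service catalogue and risk data rather than outage timestamps, so they are not covered by this sketch.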

Key Roles

  • Availability Manager: Coordinates availability management activities, designs availability solutions, and reports on performance

Software Tools

  • Availability and capacity modelling and management tools
  • Automated testing tools
  • Monitoring and event management tools
  • Architecture management tools
  • Analysis and reporting tools
  • Service catalogue and CMDB tools
  • Risk management tools