
Operate

Stage 6 in the Lifecycle

Operate maintains live products and supporting systems in an "agreed working state" so delivery can proceed without unnecessary disruption. While customers rarely interact directly with operations, they feel the impact when availability or performance fails.

What you should take away

  • State the official purpose of operate
  • Define event, incident, and change as introduced in this chapter
  • Explain SRE and observability at Foundation level
  • List the three workflow steps

Official purpose

The purpose of operate is to maintain and monitor digital products and supporting systems, ensuring they remain reliable and perform as agreed.

Operations work includes:

  • Running platforms and systems
  • Routine testing (continuity, security)
  • Backups and monitoring
  • Event handling
  • Policy and compliance maintenance

Good operations remain largely invisible; failures become visible through delivery and support channels.

Key facts

Question | Answer
Why? | Maintain and monitor products, ensuring optimal performance and reliability
Who? | Product teams, IT operations teams, SRE teams
When? | Continually; triggered by transitioned solutions and onboarded suppliers
Key outputs? | Operating products/services, performance records and reports
Success metrics? | Monitoring coverage/effectiveness, reliability, incident impact, stakeholder satisfaction

Key definitions

  • Event: Any change of state that matters for managing a product, service, or configuration item
  • Incident: An unplanned interruption to a service or a reduction in its quality; causes include technology errors, human error, external factors, or unauthorized changes
  • Change: Addition, modification, or removal of anything affecting products or services
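To make the difference between an event and an incident concrete, here is a minimal Python sketch. It assumes utilisation-style events with values between 0 and 1, hypothetical warning and exception thresholds, and the informational/warning/exception split commonly used in event management. `Event`, `Incident`, `classify`, and `handle` are illustrative names, not part of any ITIL tool. Only exceptions open an incident, because only they signal an interruption or quality reduction.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventClass(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


@dataclass
class Event:
    source: str   # product, service, or configuration item whose state changed
    metric: str   # e.g. "cpu" (hypothetical metric name)
    value: float  # assumed to be a fraction between 0.0 and 1.0


@dataclass
class Incident:
    source: str
    summary: str


# Hypothetical thresholds; real ones would come from SLAs and monitoring configuration.
WARNING_AT = 0.80
EXCEPTION_AT = 0.95


def classify(event: Event) -> EventClass:
    """Classify an event by comparing its value against the assumed thresholds."""
    if event.value >= EXCEPTION_AT:
        return EventClass.EXCEPTION
    if event.value >= WARNING_AT:
        return EventClass.WARNING
    return EventClass.INFORMATIONAL


def handle(event: Event) -> Optional[Incident]:
    """Open an incident only for exceptions, i.e. events signalling service degradation."""
    if classify(event) is EventClass.EXCEPTION:
        return Incident(event.source, f"{event.metric} at {event.value:.0%}")
    return None
```

Most events never become incidents, and in practice incidents also arrive from users through support channels rather than from monitoring alone.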

Reliability, SRE, and observability

Reliability = the ability to perform an intended function for a required period of time or number of cycles.
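The definition leaves "required time or cycles" abstract. One common way to put a number on it, not stated in this chapter, is to assume a constant failure rate (an exponential model) derived from mean time between failures (MTBF). The sketch below does that for time, and for cycles assumes each cycle succeeds independently with a fixed probability; the function names are hypothetical.

```python
import math


def reliability_over_time(hours: float, mtbf_hours: float) -> float:
    """Probability of running `hours` without failure, assuming a constant
    failure rate (exponential model) with rate = 1 / MTBF."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    failure_rate = 1.0 / mtbf_hours
    return math.exp(-failure_rate * hours)


def reliability_over_cycles(cycles: int, success_per_cycle: float) -> float:
    """Probability of completing `cycles` independent cycles without failure."""
    return success_per_cycle ** cycles


# A component with a 10,000-hour MTBF running for a 30-day (720-hour) month:
print(round(reliability_over_time(720, 10_000), 3))  # prints 0.931
```

Real failure behaviour is often not constant (early-life and wear-out failures), so treat this as a first approximation.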

Site Reliability Engineering (SRE) applies software-engineering discipline to operations, building scalable, reliable systems.
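A widely used SRE technique, not described in this chapter, is the error budget: an availability target (service level objective, SLO) such as 99.9% implies a fixed allowance of unreliability that teams can spend on releases and experiments. The sketch below assumes a time-based availability SLO over a 30-day window; the function names are hypothetical.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO (e.g. 0.999) allows per window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("SLO must be strictly between 0 and 1")
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, bad_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent; negative means it is overspent."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - bad_minutes) / budget


print(round(error_budget_minutes(0.999), 1))    # prints 43.2
print(round(budget_remaining(0.999, 21.6), 2))  # prints 0.5
```

A common convention is that once the budget is exhausted, teams shift effort from new releases to reliability work until it recovers.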

Observability = the ability to infer a system's internal state from its external signals (metrics, logs, traces). Products should be "designed for observability" so that they produce strong operational data.
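"Designed for observability" can be illustrated with a hypothetical request handler that emits all three signal types as structured JSON, tied together by one trace ID so they can be correlated later. This is a stdlib-only sketch; `emit`, `handle_checkout`, and the field names are assumptions, and production systems usually rely on a standard such as OpenTelemetry instead.

```python
import json
import time
import uuid


def emit(kind: str, trace_id: str, **fields) -> str:
    """Build one structured JSON record; a real system would ship it to a collector."""
    record = {"kind": kind, "trace_id": trace_id, "ts": time.time(), **fields}
    return json.dumps(record, sort_keys=True)


def handle_checkout(order_id: str) -> list:
    """Hypothetical request handler instrumented to emit a log, a span, and a metric."""
    trace_id = uuid.uuid4().hex  # one ID correlates every signal from this request
    start = time.perf_counter()
    records = [emit("log", trace_id, level="info", msg="checkout started", order_id=order_id)]

    # ... business logic would run here ...

    elapsed_ms = (time.perf_counter() - start) * 1000
    records.append(emit("span", trace_id, name="checkout", duration_ms=elapsed_ms))
    records.append(emit("metric", trace_id, name="checkout_latency_ms", value=elapsed_ms))
    return records


for line in handle_checkout("A-1001"):
    print(line)
```

The shared `trace_id` is the design choice that matters: it lets an operator move from a latency spike in a metric to the exact logs and spans behind it.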

Collaboration into Operate

Effective operations depend on involving operations and SRE teams early, during design and transition, to provide visibility, knowledge transfer, and feedback loops. Dedicated SRE teams coordinate with multiple product teams.

High-level workflow (three steps)

  1. Assess transitioned solutions and operational requirements
  2. Plan operational activities; confirm resource availability
  3. Execute operational plans; report status to stakeholders
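The three workflow steps can be sketched as a small pipeline. Everything here is hypothetical: the readiness checks (runbook, monitoring, owner team), the default activities, and the names `TransitionedSolution`, `assess`, `plan`, and `execute` are illustrative assumptions, not an ITIL-defined data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransitionedSolution:
    name: str
    has_runbook: bool
    has_monitoring: bool
    owner_team: Optional[str]


def assess(sol: TransitionedSolution) -> list:
    """Step 1: check the solution against (assumed) operational requirements."""
    gaps = []
    if not sol.has_runbook:
        gaps.append("runbook")
    if not sol.has_monitoring:
        gaps.append("monitoring")
    if sol.owner_team is None:
        gaps.append("owner team")
    return gaps


def plan(sol: TransitionedSolution, gaps: list, available_teams: set) -> dict:
    """Step 2: list activities and confirm the owning team has capacity."""
    activities = [f"close gap: {g}" for g in gaps] + ["scheduled backups", "routine checks"]
    return {"solution": sol.name, "activities": activities,
            "resourced": sol.owner_team in available_teams}


def execute(op_plan: dict) -> str:
    """Step 3: carry out the plan (here, only summarised) and report status."""
    if not op_plan["resourced"]:
        return f"{op_plan['solution']}: blocked, no confirmed resources"
    return f"{op_plan['solution']}: {len(op_plan['activities'])} activities in progress"


payments = TransitionedSolution("payments", has_runbook=True, has_monitoring=False, owner_team="sre-a")
print(execute(plan(payments, assess(payments), {"sre-a", "sre-b"})))
# prints "payments: 3 activities in progress"
```

Note that step 2 can block step 3: if resources cannot be confirmed, the status report says so instead of pretending the plan is running.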

Triggers and outputs

Triggered by: Deployments to live environments, transitioned resources, onboarded suppliers

Outputs feed: Deliver and support activities. Stable products underpin delivery, while deviations trigger support. Operational data also informs discover, design, build, and transition.

Extended operations view

Monitoring and response

  • Real-time monitoring, alerting, dashboards
  • AIOps-style pattern recognition
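AIOps tools use a range of statistical and machine-learning models; as a deliberately simple stand-in, the sketch below flags a metric value that sits more than three standard deviations from a rolling baseline (a z-score test). The window size, threshold, and minimum history are assumed defaults, and `RollingZScoreDetector` is a hypothetical name.

```python
from collections import deque
from statistics import mean, pstdev


class RollingZScoreDetector:
    """Flags a value that sits more than `threshold` standard deviations away from
    the mean of the last `window` values."""

    def __init__(self, window: int = 20, threshold: float = 3.0, min_history: int = 5):
        self.history = deque(maxlen=window)
        self.threshold = threshold
        self.min_history = min_history  # no verdicts until a baseline exists

    def observe(self, value: float) -> bool:
        anomalous = False
        if len(self.history) >= self.min_history:
            baseline = list(self.history)
            mu = mean(baseline)
            sigma = pstdev(baseline)
            if sigma == 0:
                anomalous = value != mu  # flat baseline: any change stands out
            else:
                anomalous = abs(value - mu) / sigma > self.threshold
        self.history.append(value)  # the value joins the baseline either way
        return anomalous


detector = RollingZScoreDetector()
latencies_ms = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 250]
print([v for v in latencies_ms if detector.observe(v)])  # prints [250]
```

Because anomalous values also enter the baseline, a sustained shift gradually becomes the new normal; real tools typically add seasonality and correlation across signals.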

Incident and problem (operational lens)

  • Restore service quickly
  • Reduce repeat incidents through problem management
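One simple way to surface repeat incidents for problem management is to group incident records by a signature and flag signatures that recur. The signature (service plus error code), the threshold of three repeats, and the function names are hypothetical choices for illustration.

```python
from collections import Counter


def signature(incident: dict) -> str:
    """Hypothetical grouping key: the affected service plus its error code."""
    return f"{incident['service']}:{incident['error']}"


def problem_candidates(incidents: list, min_repeats: int = 3) -> list:
    """Signatures recurring at least `min_repeats` times, most frequent first.
    Each one is a candidate for problem management to investigate."""
    counts = Counter(signature(i) for i in incidents)
    return [sig for sig, n in counts.most_common() if n >= min_repeats]


history = (
    [{"service": "orders-db", "error": "timeout"}] * 4
    + [{"service": "web", "error": "502"}] * 3
    + [{"service": "dns", "error": "nxdomain"}]
)
print(problem_candidates(history))  # prints ['orders-db:timeout', 'web:502']
```

Incident management restores each occurrence; the recurring signatures are where removing an underlying cause pays off.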

Related management practices

Practice | Role
Monitoring and Event Management | Observe and classify
Incident Management | Restore service
Problem Management | Reduce underlying causes
Infrastructure and Platform Management | Run platforms
Information Security Management | Operational security
Availability Management | Meet availability targets
Capacity and Performance Management | Performance and capacity

Inputs and outputs

Inputs: Live product, runbooks, SLAs, monitoring configuration

Outputs: Operational metrics, incidents/problems, reports, improvement signals

Metrics (examples)

  • Availability
  • MTTD / MTTR (mean time to detect / mean time to restore)
  • Incident volume and trends
  • Automation rate
  • Customer-impacting incidents
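Several of these metrics can be computed from basic incident timestamps. The sketch below assumes a hypothetical `IncidentRecord` with start, detection, and restoration times; it measures MTTR from the start of impact (some teams measure from detection instead) and counts only customer-impacting, non-overlapping incidents as downtime for availability.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentRecord:
    started: datetime   # when users were first affected
    detected: datetime  # when monitoring or a user flagged it
    restored: datetime  # when service was restored
    customer_impacting: bool = True


def mttd(incidents: list) -> timedelta:
    """Mean time to detect: average of detected - started."""
    if not incidents:
        return timedelta(0)
    return sum((i.detected - i.started for i in incidents), timedelta()) / len(incidents)


def mttr(incidents: list) -> timedelta:
    """Mean time to restore, measured here from start of impact (an assumption)."""
    if not incidents:
        return timedelta(0)
    return sum((i.restored - i.started for i in incidents), timedelta()) / len(incidents)


def availability(incidents: list, period: timedelta) -> float:
    """1 - downtime/period, counting only customer-impacting incidents and
    assuming they do not overlap."""
    downtime = sum((i.restored - i.started for i in incidents if i.customer_impacting), timedelta())
    return 1 - downtime / period


t0 = datetime(2024, 1, 1)
month = [
    IncidentRecord(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=30)),
    IncidentRecord(t0 + timedelta(days=9), t0 + timedelta(days=9, minutes=15),
                   t0 + timedelta(days=9, minutes=60), customer_impacting=False),
]
print(mttd(month), mttr(month))                     # prints 0:10:00 0:45:00
print(f"{availability(month, timedelta(days=30)):.4%}")  # prints 99.9306%
```

The exact definitions (start of impact versus detection, which incidents count as downtime) matter more than the arithmetic, so agree on them before comparing figures across teams.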
