Monitoring and Event Management

Definition

"The practice of systematically observing services and service components, recording and reporting selected changes of state identified as events."

To fulfil the purpose, an organization needs to:

  • Establish and maintain event models and monitoring needs
  • Provide timely, relevant monitoring data to stakeholders
  • Detect, interpret, and respond to events promptly

Key Terms

Event: Any change of state that has significance for the management of a service or other Configuration Item (CI).

Monitoring: Repeated observation of a system, practice, process, service, or other entity to detect events and to ensure that the current status is known.

Threshold: The value of a metric that triggers a pre-defined response. Thresholds translate abstract service level targets into concrete, actionable monitoring rules.

Alert: A notification that an action needs to be taken, a threshold has been reached, something has changed, or a failure has occurred.
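The definitions of Monitoring and Event imply that not every observation is an event; only a significant change of state is. The Python sketch below illustrates that distinction, assuming observations arrive in time order as (CI, timestamp, state) records. The `Observation` and `Event` classes, the `detect_events` function, and the "up"/"down" state labels are hypothetical names for illustration, not part of any ITIL-defined model or tool API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Observation:
    ci: str           # configuration item being observed
    timestamp: float  # seconds since monitoring started
    state: str        # e.g. "up" or "down" (illustrative labels)


@dataclass(frozen=True)
class Event:
    ci: str
    timestamp: float
    old_state: Optional[str]  # None for the first observation of a CI
    new_state: str


def detect_events(observations: Iterable[Observation]) -> Iterator[Event]:
    """Yield an Event only when a CI's observed state differs from its last known state."""
    last_state: Dict[str, str] = {}
    for obs in observations:
        previous = last_state.get(obs.ci)
        if previous != obs.state:
            yield Event(obs.ci, obs.timestamp, previous, obs.state)
            last_state[obs.ci] = obs.state


# Five repeated observations of one hypothetical web server.
samples = [
    Observation("web-01", 0, "up"),
    Observation("web-01", 60, "up"),
    Observation("web-01", 120, "down"),
    Observation("web-01", 180, "down"),
    Observation("web-01", 240, "up"),
]
for event in detect_events(samples):
    print(event.timestamp, event.old_state, "->", event.new_state)
```

The five samples yield three events: the initial state at 0 s, the failure at 120 s, and the recovery at 240 s. Repeated "up" or "down" readings are observations, not events, which keeps event volume proportional to actual changes.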

Event Types

Type | Description | Response
Informational | A routine change of state that requires no action | Log for reference; may be useful for trend analysis
Warning | A metric approaching a threshold, or an unusual pattern | Review and assess; may require proactive action
Exception | A threshold has been breached or a failure has occurred | Immediate response required; typically triggers incident management
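Thresholds are what turn a raw metric value into one of these three event types. A minimal sketch, assuming a metric where higher values are worse and two hypothetical thresholds per metric (`warning_at`, `exception_at`); the `EventType` enum, the `classify` function, and the response strings are illustrative, not prescribed by ITIL.

```python
from enum import Enum


class EventType(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


# Illustrative response plan, one entry per event type.
RESPONSES = {
    EventType.INFORMATIONAL: "log for reference",
    EventType.WARNING: "review and assess",
    EventType.EXCEPTION: "raise an incident",
}


def classify(value: float, warning_at: float, exception_at: float) -> EventType:
    """Classify a reading for a metric where higher values are worse (e.g. CPU %)."""
    if warning_at > exception_at:
        raise ValueError("warning threshold must not exceed the exception threshold")
    if value >= exception_at:
        return EventType.EXCEPTION
    if value >= warning_at:
        return EventType.WARNING
    return EventType.INFORMATIONAL


# Example: CPU utilisation with hypothetical thresholds of 80 % (warning) and 95 % (exception).
for cpu in (42.0, 83.5, 97.0):
    kind = classify(cpu, warning_at=80.0, exception_at=95.0)
    print(cpu, kind.value, RESPONSES[kind])
```

For metrics where lower values are worse (for example, free disk space), the comparisons would need to be inverted.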

From monitoring to action: Monitoring generates raw data. Event management interprets that data and triggers the appropriate response. The quality of your event classification and correlation rules determines whether your monitoring investment creates value or just noise.

Processes

Monitoring Planning

Designing what, how, and why to monitor:

  1. Define the objective: Why this component is monitored
  2. Define what to monitor: What needs to be and can be monitored (feasibility assessment)
  3. Define event types: The types of events for the object of monitoring (informational, warning, exception)
  4. Define thresholds: Thresholds for each type of event
  5. Define service health model: A service health model describing end-to-end event correlations
  6. Define event correlations: Correlation rule sets describing how events relate to each other
  7. Define monitoring action plans: What happens when events occur
  8. Define tool capabilities: Required monitoring tool capabilities (tooling requirements)
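The service health model and event correlation rules defined during planning can be pictured as a dependency graph: a service is only as healthy as the components it relies on. The sketch below rolls component status up to the service and flags likely root causes. The status labels, the `health` and `probable_causes` functions, and the example topology are assumptions for illustration, and the graph is assumed to be acyclic.

```python
from typing import Dict, List, Optional

# Severity order for the roll-up; the labels are illustrative assumptions.
SEVERITY = {"ok": 0, "degraded": 1, "down": 2}


def health(node: str,
           depends_on: Dict[str, List[str]],
           own_status: Dict[str, str],
           memo: Optional[Dict[str, str]] = None) -> str:
    """Worst status among `node` and everything it transitively depends on.

    Assumes the dependency graph is acyclic; CIs with no reported status count as "ok".
    """
    if memo is None:
        memo = {}
    if node not in memo:
        worst = own_status.get(node, "ok")
        for child in depends_on.get(node, []):
            child_status = health(child, depends_on, own_status, memo)
            if SEVERITY[child_status] > SEVERITY[worst]:
                worst = child_status
        memo[node] = worst
    return memo[node]


def probable_causes(depends_on: Dict[str, List[str]], own_status: Dict[str, str]) -> List[str]:
    """Unhealthy CIs whose dependencies all roll up as "ok": candidate root causes."""
    memo: Dict[str, str] = {}
    causes = []
    for ci, status in own_status.items():
        if status == "ok":
            continue
        if all(health(dep, depends_on, own_status, memo) == "ok"
               for dep in depends_on.get(ci, [])):
            causes.append(ci)
    return sorted(causes)


# Hypothetical topology: the service depends on a web server and an API, which uses a database.
depends_on = {"online-shop": ["web-01", "payments-api"], "payments-api": ["db-01"]}
own_status = {"web-01": "ok", "payments-api": "down", "db-01": "down"}
print(health("online-shop", depends_on, own_status))   # down
print(probable_causes(depends_on, own_status))         # ['db-01']
```

Both `payments-api` and `db-01` report "down", but only `db-01` is flagged, because `payments-api` depends on something already unhealthy and its event is treated as a symptom. Production correlation rules typically also consider timing and known failure patterns.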

Event Handling

Responding to detected events:

  1. Detect event: Monitoring tools identify the event
  2. Log event: Record the event with timestamp, source, and context
  3. Filter and correlate event: Remove noise; link related events
  4. Classify event: Determine whether the event is informational, warning, or exception
  5. Select event response: Choose the appropriate action based on classification
  6. Send notifications and carry out the response: Execute the response plan
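These handling steps can be sketched as a small pipeline. This Python example assumes events arrive already carrying a severity label, reduces "filter and correlate" to duplicate suppression (same source and kind within a time window, escalating if a later duplicate is more severe), and maps each severity to a response from a hypothetical response plan. `RawEvent`, `HandledEvent`, `RESPONSE_PLAN`, `handle`, and the 300-second default window are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical severity ranking and response plan; real plans are defined per service.
SEVERITY_RANK = {"informational": 0, "warning": 1, "exception": 2}
RESPONSE_PLAN = {
    "informational": "log only",
    "warning": "notify operations",
    "exception": "open incident",
}


@dataclass
class RawEvent:
    source: str       # CI that emitted the event
    kind: str         # e.g. "link_down", "high_cpu" (illustrative)
    timestamp: float  # seconds
    severity: str     # already classified: a key of SEVERITY_RANK


@dataclass
class HandledEvent:
    key: Tuple[str, str]  # (source, kind) used as the correlation key
    first_seen: float
    count: int
    severity: str
    response: str


def handle(events: List[RawEvent],
           dedup_window: float = 300.0) -> Tuple[List[RawEvent], List[HandledEvent]]:
    """Log every event, fold duplicates within the window, and select a response per group."""
    log: List[RawEvent] = []
    open_groups: Dict[Tuple[str, str], HandledEvent] = {}
    handled: List[HandledEvent] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        log.append(ev)  # every detected event is logged, even if filtered below
        key = (ev.source, ev.kind)
        group = open_groups.get(key)
        if group is not None and ev.timestamp - group.first_seen <= dedup_window:
            # Filter/correlate: a repeat within the window joins the existing group...
            group.count += 1
            # ...and escalates the group if it is more severe.
            if SEVERITY_RANK[ev.severity] > SEVERITY_RANK[group.severity]:
                group.severity = ev.severity
                group.response = RESPONSE_PLAN[ev.severity]
            continue
        # New group: classify and select the response from the plan.
        group = HandledEvent(key, ev.timestamp, 1, ev.severity, RESPONSE_PLAN[ev.severity])
        open_groups[key] = group
        handled.append(group)
    return log, handled


events = [
    RawEvent("router-1", "link_down", 0, "warning"),
    RawEvent("router-1", "link_down", 60, "exception"),
    RawEvent("web-01", "high_cpu", 30, "informational"),
    RawEvent("router-1", "link_down", 400, "warning"),
]
log, handled = handle(events)
for h in handled:
    print(h.key, h.count, h.severity, h.response)
```

Four raw events become three handled groups: the two router events within five minutes merge and escalate to an incident, while the router event at 400 s falls outside the window and starts a new group.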

Monitoring and Event Management Review

Periodic review of the practice:

  1. Major events review: Analyse significant events for lessons learned
  2. Review of filtering and correlation analysis: Tune rules to reduce noise
  3. Review of service health models: Update models to reflect infrastructure changes
  4. Review of event response procedures and automation: Optimize response plans
  5. Review of monitoring and event tools: Assess tool capability and gaps
  6. Review of statistical information: Trend analysis and capacity planning

Recommendations for Practice Success

  • Develop the monitoring strategy with proper tools and processes, and review it regularly
  • Understand component purposes and stakeholder needs before designing monitoring
  • Adjust monitoring based on event context and significance
  • Avoid monitoring events of unknown significance without a clear reason; this reduces noise
  • Review monitoring report usage and effectiveness regularly
  • After incidents, collaborate to improve monitoring so that similar incidents can be prevented
  • Use automation to assess event significance and respond appropriately

Key Metrics

Metric | What it measures
Satisfaction with practice approach | Stakeholder confidence in the monitoring strategy
Organizational adherence to the approach | Consistency of monitoring implementation
Unmet or unrealistic recommendations (%) | Quality of monitoring design
Satisfaction with monitoring data and presentation | Usefulness of monitoring output
Monitoring data quality | Accuracy and completeness of collected data
Impact of event management errors | Consequences of misclassified or missed events
Event communication noise | Volume of irrelevant alerts
Incidents/problems from poor event management | Failures attributable to monitoring gaps
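Some of these metrics can be computed directly from operational records. The definitions below are one possible way to do so, not ITIL-prescribed formulas: event communication noise as the share of alerts that required no action, and monitoring gaps as the share of incidents that monitoring did not detect first. The record types and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AlertRecord:
    alert_id: str
    actioned: bool  # True if someone had to act on the alert


@dataclass(frozen=True)
class IncidentRecord:
    incident_id: str
    detected_by_monitoring: bool  # False if, e.g., a user reported it first


def noise_ratio(alerts: Sequence[AlertRecord]) -> float:
    """Share of alerts that required no action (event communication noise)."""
    if not alerts:
        return 0.0
    return sum(not a.actioned for a in alerts) / len(alerts)


def monitoring_gap_rate(incidents: Sequence[IncidentRecord]) -> float:
    """Share of incidents that monitoring did not detect first."""
    if not incidents:
        return 0.0
    return sum(not i.detected_by_monitoring for i in incidents) / len(incidents)


alerts = [AlertRecord("A1", True), AlertRecord("A2", False),
          AlertRecord("A3", False), AlertRecord("A4", False)]
incidents = [IncidentRecord("I1", True), IncidentRecord("I2", True),
             IncidentRecord("I3", False), IncidentRecord("I4", True)]
print(f"noise: {noise_ratio(alerts):.0%}, gaps: {monitoring_gap_rate(incidents):.0%}")
```

Tracked over time, a falling noise ratio suggests that filtering and correlation tuning is working, while a rising gap rate points to missing monitoring coverage.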

Key Roles


This practice does not define specific named roles. Monitoring and event management responsibilities are typically distributed across infrastructure, application, and service management teams.

Software Tools

  • Monitoring and event management tools (including native and add-on tools)
  • Workflow management and collaboration tools
  • Knowledge management and CMDB tools
  • Analysis and reporting tools

AIOps and Monitoring (ITIL v5)

Capability | Description
Anomaly detection | AI identifies unusual patterns that rule-based monitoring may miss
Event correlation | AI links related events across infrastructure layers to reduce noise
Predictive alerts | AI forecasts potential issues before thresholds are breached
Auto-remediation | AI triggers automated responses to known event patterns
Capacity prediction | AI forecasts resource needs based on usage trends
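AIOps capabilities such as anomaly detection and predictive alerts rely on statistical or machine-learning models. As a deliberately simple stand-in (not AI, and not how any particular AIOps product works), the sketch below shows the two underlying ideas: flagging points that deviate strongly from a rolling baseline (a z-score test), and extrapolating a least-squares linear trend to estimate when a threshold will be crossed. The function names, the window of 10 samples, and the 3-sigma cut-off are assumptions.

```python
from statistics import mean, stdev
from typing import List, Optional, Sequence


def zscore_anomalies(series: Sequence[float], window: int = 10, z: float = 3.0) -> List[int]:
    """Indices whose value deviates from the mean of the previous `window` points
    by more than `z` sample standard deviations. `window` must be at least 2."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        deviation = abs(series[i] - mu)
        # A perfectly flat baseline has sigma == 0, so any change counts as anomalous.
        if (sigma == 0 and deviation > 0) or (sigma > 0 and deviation > z * sigma):
            anomalies.append(i)
    return anomalies


def time_to_breach(series: Sequence[float], threshold: float) -> Optional[float]:
    """Fit a least-squares line to equally spaced samples and estimate how many sample
    intervals after the last sample the trend reaches `threshold`. Returns 0.0 if the
    fitted trend is already at or above it, and None if there are fewer than two
    samples or the trend is flat or falling."""
    n = len(series)
    if n < 2:
        return None
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(series)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series)) / sxx
    last_fitted = y_bar + slope * (n - 1 - x_bar)
    if last_fitted >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - last_fitted) / slope


latency_ms = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11, 50, 11]
print(zscore_anomalies(latency_ms))                  # [10]
disk_used_pct = [50, 55, 60, 65]                     # one hypothetical sample per day
print(time_to_breach(disk_used_pct, threshold=80))   # 3.0
```

With daily samples, a result of 3.0 means the trend would reach 80 % in about three days, which is the kind of lead time a predictive alert aims to provide before the threshold is actually breached.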