Platform Engineering


This page extends beyond the ITIL v5 Foundation curriculum. It integrates established industry models and professional frameworks (referenced where used) to provide practical leadership guidance for ITIL v5 adoption.

The evolution of IT operations

Platform engineering represents the maturation of DevOps: instead of expecting every product team to build and maintain its own operational tooling, a dedicated platform team provides a curated, self-service Internal Developer Platform (IDP) that abstracts infrastructure complexity.

Internal Developer Platforms in an ITIL v5 context

An IDP is a collection of tools, workflows, and self-service capabilities that enable product teams to build, deploy, and operate software independently, within governance guardrails.

How IDPs align with ITIL v5

IDP Capability	ITIL v5 Practice	Alignment
Self-service infrastructure provisioning	Service Request Management	Automated fulfilment of standard requests
CI/CD pipeline templates	Change Enablement, Deployment Management	Pre-approved standard changes executed by automation
Environment management	Infrastructure and Platform Management	Consistent, reproducible environments
Secrets and certificate management	Information Security Management	Automated credential rotation and access control
Cost visibility dashboards	Service Financial Management	Real-time spend tracking by team and service
Compliance scanning	Information Security Management, Service Validation and Testing	Automated policy enforcement
Service catalogues	Service Catalogue Management	Self-service discovery and provisioning

IDP design principles (aligned with ITIL guiding principles)

ITIL Guiding Principle	IDP Application
Focus on value	Build platform capabilities that solve real developer pain points, not theoretical ones
Start where you are	Evolve existing tools rather than replacing everything
Progress iteratively	Release platform features in small increments; gather feedback
Collaborate and promote visibility	Engage product teams as customers; measure platform adoption
Think and work holistically	Consider security, compliance, cost, and developer experience together
Keep it simple and practical	Reduce cognitive load; provide golden paths, not infinite flexibility
Optimize and automate	Automate everything that can be automated; measure toil reduction

Policy as Code

Policy as Code means expressing governance rules, compliance requirements, and security policies as machine-readable code that is automatically enforced in CI/CD pipelines and infrastructure provisioning.
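
To make the idea concrete, here is a minimal sketch in Python that expresses two rules as plain functions and evaluates a deployment manifest against them. The manifest fields (`env`, `encrypted_at_rest`), the rule names, the secret-name heuristic, and the `evaluate` helper are all hypothetical and chosen for illustration; dedicated policy engines use their own policy languages and richer input models.

```python
from typing import Callable, Optional

# A policy takes a deployment manifest and returns a violation message,
# or None when the manifest passes. (Hypothetical shape, for illustration.)
Policy = Callable[[dict], Optional[str]]

# Assumed naming heuristic for spotting inline secrets.
SECRET_MARKERS = ("password", "secret", "api_key", "token")


def no_plaintext_secrets(manifest: dict) -> Optional[str]:
    """Flag environment variables whose names suggest an inline secret."""
    for name in manifest.get("env", {}):
        if any(marker in name.lower() for marker in SECRET_MARKERS):
            return f"env var '{name}' looks like an inline secret"
    return None


def encryption_required(manifest: dict) -> Optional[str]:
    """Require storage to be encrypted at rest."""
    if not manifest.get("encrypted_at_rest", False):
        return "storage must be encrypted at rest"
    return None


POLICIES: list[Policy] = [no_plaintext_secrets, encryption_required]


def evaluate(manifest: dict, policies: list[Policy] = POLICIES) -> list[str]:
    """Run every policy and collect violations; an empty list means compliant."""
    return [msg for policy in policies if (msg := policy(manifest)) is not None]


compliant = {"env": {"LOG_LEVEL": "info"}, "encrypted_at_rest": True}
non_compliant = {"env": {"DB_PASSWORD": "hunter2"}, "encrypted_at_rest": False}

print(evaluate(compliant))      # no violations expected
print(evaluate(non_compliant))  # one message per violated policy
```

A pipeline step could fail the build whenever `evaluate` returns a non-empty list, which is what turns the rule from documentation into enforcement.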

How Policy as Code supports ITIL practices

ITIL Practice	Policy as Code Application
Change Enablement	Automated risk assessment: policies evaluate each change against predefined criteria and approve, flag, or reject based on risk level
Information Security Management	Security policies enforced at deployment: no secrets in code, mandatory encryption, network segmentation rules
Service Configuration Management	Configuration drift detection: policies compare running configuration against baseline and alert on deviations
Compliance	Regulatory requirements codified as policies: data residency, access control, audit logging
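
The Change Enablement row above can be sketched as a small decision function. This is an illustrative example only: the `ChangeRequest` fields, the risk weights, and the threshold of 5 are assumptions made for the sketch, not values prescribed by ITIL or by any policy engine.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    """Hypothetical change record; field names are assumptions."""
    service_tier: int              # 1 = business-critical ... 3 = low impact
    tests_passed: bool
    has_rollback_plan: bool
    touches_database_schema: bool
    in_change_freeze: bool


def risk_score(change: ChangeRequest) -> int:
    """Add up illustrative risk points (weights are assumed, not standard)."""
    score = 0
    if change.service_tier == 1:
        score += 3
    if not change.has_rollback_plan:
        score += 3
    if change.touches_database_schema:
        score += 2
    return score


def decide(change: ChangeRequest) -> str:
    """Return 'reject', 'flag', or 'approve' for a proposed change."""
    # Hard rules reject outright, regardless of the score.
    if not change.tests_passed or change.in_change_freeze:
        return "reject"
    if risk_score(change) >= 5:
        return "flag"     # route to a human change authority
    return "approve"      # handle as a pre-approved standard change


routine = ChangeRequest(3, True, True, False, False)
risky = ChangeRequest(1, True, True, True, False)
untested = ChangeRequest(2, False, True, False, False)

for change in (routine, risky, untested):
    print(decide(change))
```

Keeping the hard rules separate from the score makes it easy to audit which changes were rejected on principle and which were only escalated for review.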

Common policy engines

Tool	Primary Use Case
Open Policy Agent (OPA)	General-purpose policy engine for Kubernetes, CI/CD, APIs
HashiCorp Sentinel	Policy enforcement for Terraform, Vault, Consul
Kyverno	Kubernetes-native policy management
AWS Config Rules	Cloud resource compliance
Azure Policy	Azure resource governance

FinOps: financial operations for cloud

FinOps is the practice of bringing financial accountability to cloud spending. It aligns with ITIL v5's Service Financial Management practice.

FinOps lifecycle (aligned with ITIL Continual Improvement)

FinOps Phase	Activities	ITIL Alignment
Inform	Tag resources, allocate costs, create dashboards	Measurement and Reporting
Optimize	Right-size instances, eliminate waste, use reserved capacity	Capacity and Performance Management
Operate	Establish budgets, create governance policies, automate actions	Service Financial Management, Governance

Key FinOps metrics

Metric	Description	Target
Unit cost	Cost per transaction, per user, or per service	Decreasing trend
Cloud utilization	% of provisioned capacity actually used	> 70%
Waste rate	Spend on unused or underutilized resources	< 10%
Coverage ratio	% of spend covered by reservations or savings plans	> 60%
Cost allocation accuracy	% of spend tagged and attributed to a team/service	> 95%
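
As a rough sketch of how these metrics might be derived from billing and usage data, the Python below computes each one and compares it with the targets in the table. The `PeriodUsage` record, its field names, and the sample figures are hypothetical; real cost data would come from a cloud provider's billing export.

```python
from dataclasses import dataclass


@dataclass
class PeriodUsage:
    """Hypothetical summary of one billing period."""
    total_spend: float           # all cloud spend in the period
    tagged_spend: float          # spend attributed to a team or service
    idle_spend: float            # spend on unused or underutilized resources
    committed_spend: float       # spend covered by reservations or savings plans
    used_capacity: float         # e.g. vCPU-hours actually consumed
    provisioned_capacity: float  # e.g. vCPU-hours provisioned
    transactions: int            # business transactions served


def finops_metrics(u: PeriodUsage) -> dict[str, float]:
    """Compute the metrics from the table as plain ratios and percentages."""
    return {
        "unit_cost": u.total_spend / u.transactions,
        "cloud_utilization_pct": 100 * u.used_capacity / u.provisioned_capacity,
        "waste_rate_pct": 100 * u.idle_spend / u.total_spend,
        "coverage_ratio_pct": 100 * u.committed_spend / u.total_spend,
        "allocation_accuracy_pct": 100 * u.tagged_spend / u.total_spend,
    }


# (comparison, threshold) pairs mirroring the Target column.
# Unit cost is a trend target, so it is not checked against a fixed number.
TARGETS = {
    "cloud_utilization_pct": (">", 70.0),
    "waste_rate_pct": ("<", 10.0),
    "coverage_ratio_pct": (">", 60.0),
    "allocation_accuracy_pct": (">", 95.0),
}


def missed_targets(metrics: dict[str, float]) -> list[str]:
    """Return the names of metrics that fall short of their target."""
    missed = []
    for name, (op, threshold) in TARGETS.items():
        value = metrics[name]
        ok = value > threshold if op == ">" else value < threshold
        if not ok:
            missed.append(name)
    return missed


# Sample figures are invented for the example.
usage = PeriodUsage(
    total_spend=100_000.0, tagged_spend=97_000.0, idle_spend=12_000.0,
    committed_spend=65_000.0, used_capacity=750.0,
    provisioned_capacity=1_000.0, transactions=2_000_000,
)
metrics = finops_metrics(usage)
print(metrics)
print(missed_targets(metrics))
```

With these sample figures only the waste rate (12%) misses its target, which would point the Optimize phase at idle resources first.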

Observability and monitoring evolution

Modern observability goes beyond traditional monitoring to provide deep insight into complex, distributed systems.

From monitoring to observability

Traditional Monitoring	Modern Observability
Predefined checks and thresholds	Dynamic analysis of system behaviour
Known failure modes	Discovery of unknown failure modes
Dashboard alerts	Trace-based debugging
Siloed tools (network, application, infrastructure)	Unified observability platform
Reactive (alert when broken)	Proactive (predict before breaking)

The three pillars of observability

Pillar	Purpose	ITIL Practice
Metrics	Quantitative measures of system behaviour (CPU, latency, error rate)	Monitoring and Event Management
Logs	Detailed records of events (application logs, audit logs, security logs)	Monitoring and Event Management, Information Security Management
Traces	End-to-end request flow through distributed systems	Monitoring and Event Management, Incident Management

OpenTelemetry and ITIL integration

OpenTelemetry (OTel) is a vendor-neutral, open-source framework for instrumenting applications and for generating, collecting, and exporting telemetry data. Its capabilities map to ITIL practices as follows:

OTel Capability	ITIL Integration
Automatic instrumentation	Reduces effort to implement Monitoring and Event Management
Distributed tracing	Supports root cause analysis of incidents across microservices
Metric collection	Feeds SLI/SLO measurement for Service Level Management
Log correlation	Connects events across systems for Problem Management
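
To illustrate the "feeds SLI/SLO measurement" row, the following sketch turns request counts (the kind of data a metrics pipeline would supply) into an availability SLI and the share of error budget left. It uses only the Python standard library rather than the OpenTelemetry SDK; the function names, the 99.9% objective, and the sample counts are assumptions for the example.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Fraction of requests that met the success criterion."""
    if total_requests == 0:
        # No traffic: treated here as meeting the objective (an assumed convention).
        return 1.0
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo: float) -> float:
    """Share of the error budget still unspent.

    1.0 means untouched, 0.0 means exhausted, negative means the SLO is breached.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo
    actual_failure = 1.0 - sli
    return 1.0 - actual_failure / allowed_failure


# Sample counts are invented: 500 failed requests out of one million.
sli = availability_sli(good_requests=999_500, total_requests=1_000_000)
remaining = error_budget_remaining(sli, slo=0.999)

print(f"SLI: {sli:.4%}")
print(f"Error budget remaining: {remaining:.1%}")
```

In this example a 99.9% objective allows 1,000 failures per million requests, so 500 failures consume about half of the budget.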

Service mesh and ITIL

A service mesh (e.g., Istio, Linkerd) provides infrastructure-level capabilities that support several ITIL practices:

Service Mesh Feature	ITIL Practice Supported
Traffic management (canary, blue-green)	Deployment Management, Change Enablement
Mutual TLS	Information Security Management
Circuit breaking	Availability Management, Incident Management
Observability (automatic metrics, traces)	Monitoring and Event Management
Rate limiting	Capacity and Performance Management
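
A service mesh applies circuit breaking in its proxies rather than in application code, but the underlying logic is easier to see in a few lines of Python. The sketch below is a simplified, in-process illustration; the class name, thresholds, and three-state model (closed, open, half-open) are assumptions for the example and do not reproduce the configuration model of Istio or Linkerd.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is short-circuited without reaching the backend."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures  # consecutive failures before tripping
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_after:
            return "half-open"  # one trial call is allowed through
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("backend temporarily isolated")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Trip on reaching the threshold, or at once if a half-open trial fails.
            if self.failures >= self.max_failures or self.opened_at is not None:
                self.opened_at = self.clock()
            raise
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def flaky_backend():
    raise ConnectionError("backend unavailable")


breaker = CircuitBreaker(max_failures=2, reset_after=30.0)
for _ in range(2):
    try:
        breaker.call(flaky_backend)
    except ConnectionError:
        pass
print(breaker.state)
```

Failing fast while the circuit is open protects both the caller and the struggling backend, which is why the feature is listed against Availability Management and Incident Management.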

DevOps & SRE Integration (CI/CD, SLOs, error budgets)
AI Strategy for ITSM (AIOps and intelligent automation)
Operating Model Design (team topology)
Automation Tools (tool categories reference)

Last updated on April 2, 2026
