Platform Engineering
This page extends beyond the ITIL v5 Foundation curriculum. It integrates established industry models and professional frameworks (referenced where used) to provide practical leadership guidance for ITIL v5 adoption.
The evolution of IT operations
Platform engineering is a maturation of DevOps practice: instead of expecting every product team to build and maintain its own operational tooling, a dedicated platform team provides a curated, self-service Internal Developer Platform (IDP) that abstracts infrastructure complexity.
Internal Developer Platforms in an ITIL v5 context
An IDP is a collection of tools, workflows, and self-service capabilities that enable product teams to build, deploy, and operate software independently, within governance guardrails.
How IDPs align with ITIL v5
| IDP Capability | ITIL v5 Practice | Alignment |
|---|---|---|
| Self-service infrastructure provisioning | Service Request Management | Automated fulfilment of standard requests |
| CI/CD pipeline templates | Change Enablement, Deployment Management | Pre-approved standard changes executed by automation |
| Environment management | Infrastructure and Platform Management | Consistent, reproducible environments |
| Secrets and certificate management | Information Security Management | Automated credential rotation and access control |
| Cost visibility dashboards | Service Financial Management | Real-time spend tracking by team and service |
| Compliance scanning | Information Security Management, Service Validation and Testing | Automated policy enforcement |
| Service catalogues | Service Catalogue Management | Self-service discovery and provisioning |
IDP design principles (aligned with ITIL guiding principles)
| ITIL Guiding Principle | IDP Application |
|---|---|
| Focus on value | Build platform capabilities that solve real developer pain points, not theoretical ones |
| Start where you are | Evolve existing tools rather than replacing everything |
| Progress iteratively | Release platform features in small increments; gather feedback |
| Collaborate and promote visibility | Engage product teams as customers; measure platform adoption |
| Think and work holistically | Consider security, compliance, cost, and developer experience together |
| Keep it simple and practical | Reduce cognitive load; provide golden paths, not infinite flexibility |
| Optimize and automate | Automate everything that can be automated; measure toil reduction |
Policy as Code
Policy as Code means expressing governance rules, compliance requirements, and security policies as machine-readable code that is automatically enforced in CI/CD pipelines and infrastructure provisioning.
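As a minimal sketch of the idea, the Python below expresses three governance rules as data and evaluates a resource description against them. The rule names, the resource fields (`encrypted`, `tags`, `public`), and the `evaluate` helper are all hypothetical and chosen for illustration; a real deployment would more likely use one of the policy engines listed later on this page.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical resource description, e.g. parsed from an infrastructure plan.
Resource = dict


@dataclass(frozen=True)
class Policy:
    name: str
    check: Callable[[Resource], bool]  # returns True when the resource complies


# Illustrative rules only; the names and fields are assumptions.
POLICIES = [
    Policy("storage-encrypted", lambda r: r.get("encrypted") is True),
    Policy("owner-tag-present", lambda r: bool(r.get("tags", {}).get("owner"))),
    Policy("no-public-access", lambda r: not r.get("public", False)),
]


def evaluate(resource: Resource, policies=POLICIES) -> list[str]:
    """Return the names of all policies the resource violates."""
    return [p.name for p in policies if not p.check(resource)]


if __name__ == "__main__":
    bucket = {"type": "bucket", "encrypted": False, "tags": {"owner": "payments"}}
    violations = evaluate(bucket)
    # A pipeline step could fail the build whenever this list is non-empty.
    print(violations)
```

Because the rules are ordinary code, they can be version-controlled, reviewed, and tested like any other change, which is what makes automatic enforcement auditable.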
How Policy as Code supports ITIL practices
| ITIL Practice | Policy as Code Application |
|---|---|
| Change Enablement | Automated risk assessment: policies evaluate each change against predefined criteria and approve, flag, or reject based on risk level |
| Information Security Management | Security policies enforced at deployment: no secrets in code, mandatory encryption, network segmentation rules |
| Service Configuration Management | Configuration drift detection: policies compare running configuration against baseline and alert on deviations |
| Compliance | Regulatory requirements codified as policies: data residency, access control, audit logging |
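The Change Enablement row above describes policies that approve, flag, or reject a change based on risk. One way this could look is sketched below. The `Change` fields, the scoring weights, and the thresholds are assumptions made for the example, not values prescribed by ITIL or by any policy engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    service_tier: int              # assumed scale: 1 = business-critical, 3 = low impact
    has_rollback_plan: bool
    tests_passed: bool
    touches_production_data: bool


def assess(change: Change) -> str:
    """Return 'approve', 'flag', or 'reject' for a proposed change."""
    # Hard rule: a change with failing tests is never deployed.
    if not change.tests_passed:
        return "reject"

    # Illustrative additive risk score; the weights are assumptions.
    risk = 0
    if change.service_tier == 1:
        risk += 2
    if not change.has_rollback_plan:
        risk += 2
    if change.touches_production_data:
        risk += 1

    if risk == 0:
        return "approve"   # candidate for a pre-approved standard change
    if risk <= 2:
        return "flag"      # route to a human reviewer
    return "reject"
```

In this sketch a low-tier change with a rollback plan and passing tests is approved automatically, while the same change against a tier 1 service is flagged for review rather than blocked outright.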
Common policy engines
| Tool | Primary Use Case |
|---|---|
| Open Policy Agent (OPA) | General-purpose policy engine for Kubernetes, CI/CD, APIs |
| HashiCorp Sentinel | Policy enforcement for Terraform, Vault, Consul |
| Kyverno | Kubernetes-native policy management |
| AWS Config Rules | Cloud resource compliance |
| Azure Policy | Azure resource governance |
FinOps: financial operations for cloud
FinOps is the practice of bringing financial accountability to cloud spending. It aligns with ITIL v5's Service Financial Management practice.
FinOps lifecycle (aligned with ITIL Continual Improvement)
| FinOps Phase | Activities | ITIL Alignment |
|---|---|---|
| Inform | Tag resources, allocate costs, create dashboards | Measurement and Reporting |
| Optimize | Right-size instances, eliminate waste, use reserved capacity | Capacity and Performance Management |
| Operate | Establish budgets, create governance policies, automate actions | Service Financial Management, Governance |
Key FinOps metrics
| Metric | Description | Target |
|---|---|---|
| Unit cost | Cost per transaction, per user, or per service | Decreasing trend |
| Cloud utilization | % of provisioned capacity actually used | > 70% |
| Waste rate | Spend on unused or underutilized resources | < 10% |
| Coverage ratio | % of spend covered by reservations or savings plans | > 60% |
| Cost allocation accuracy | % of spend tagged and attributed to a team/service | > 95% |
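Several of these metrics are simple ratios over billing data. The sketch below computes them from a hypothetical `SpendSummary` record and checks them against the targets in the table. The field names and the sample figures are invented for illustration; cloud utilization is left out because it needs capacity data rather than spend data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpendSummary:
    total_spend: float       # total cloud spend for the period
    tagged_spend: float      # spend attributed to a team or service
    committed_spend: float   # spend covered by reservations or savings plans
    wasted_spend: float      # spend on unused or underutilized resources
    transactions: int        # business transactions served in the period


def pct(part: float, whole: float) -> float:
    """Percentage of whole represented by part (0.0 when whole is zero)."""
    return 100.0 * part / whole if whole else 0.0


def finops_metrics(s: SpendSummary) -> dict:
    return {
        "unit_cost": s.total_spend / s.transactions if s.transactions else None,
        "waste_rate_pct": pct(s.wasted_spend, s.total_spend),
        "coverage_ratio_pct": pct(s.committed_spend, s.total_spend),
        "allocation_accuracy_pct": pct(s.tagged_spend, s.total_spend),
    }


# Targets taken from the table above.
TARGETS = {
    "waste_rate_pct": lambda v: v < 10,
    "coverage_ratio_pct": lambda v: v > 60,
    "allocation_accuracy_pct": lambda v: v > 95,
}


def missed_targets(metrics: dict) -> list[str]:
    """Names of the metrics that do not meet their target."""
    return sorted(name for name, ok in TARGETS.items() if not ok(metrics[name]))


if __name__ == "__main__":
    summary = SpendSummary(10_000.0, 9_700.0, 5_500.0, 800.0, 200_000)
    print(missed_targets(finops_metrics(summary)))
```

Unit cost has no fixed threshold in the table; it is tracked as a trend, so it would be compared across periods rather than against a constant.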
Observability and monitoring evolution
Modern observability goes beyond traditional monitoring to provide deep insight into complex, distributed systems.
From monitoring to observability
| Traditional Monitoring | Modern Observability |
|---|---|
| Predefined checks and thresholds | Dynamic analysis of system behaviour |
| Known failure modes | Discovery of unknown failure modes |
| Dashboard alerts | Trace-based debugging |
| Siloed tools (network, application, infrastructure) | Unified observability platform |
| Reactive (alert when broken) | Proactive (predict before breaking) |
The three pillars of observability
| Pillar | Purpose | ITIL Practice |
|---|---|---|
| Metrics | Quantitative measures of system behaviour (CPU, latency, error rate) | Monitoring and Event Management |
| Logs | Detailed records of events (application logs, audit logs, security logs) | Monitoring and Event Management, Information Security Management |
| Traces | End-to-end request flow through distributed systems | Monitoring and Event Management, Incident Management |
OpenTelemetry and ITIL integration
OpenTelemetry (OTel) is a vendor-neutral, open-source framework, maintained under the Cloud Native Computing Foundation, for instrumenting, generating, and collecting telemetry data. It maps to ITIL practices:
| OTel Capability | ITIL Integration |
|---|---|
| Automatic instrumentation | Reduces effort to implement Monitoring and Event Management |
| Distributed tracing | Supports root cause analysis of incidents across microservices |
| Metric collection | Feeds SLI/SLO measurement for Service Level Management |
| Log correlation | Connects events across systems for Problem Management |
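To illustrate how collected metrics can feed SLI/SLO measurement, the sketch below derives an availability SLI from request counts and reports how much of the error budget remains. It deliberately uses plain Python rather than the OpenTelemetry SDK; the function names and the sample numbers are assumptions, and in practice the counts would come from whatever metrics backend receives the telemetry.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """Fraction of events that were good; 1.0 when there was no traffic."""
    if total_events == 0:
        return 1.0
    return good_events / total_events


def error_budget_remaining(sli: float, slo: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = 1.0 - slo          # allowed failure fraction
    if budget <= 0:
        raise ValueError("an SLO of 100% leaves no error budget")
    spent = 1.0 - sli           # observed failure fraction
    return 1.0 - spent / budget


if __name__ == "__main__":
    sli = availability_sli(good_events=999_500, total_events=1_000_000)
    remaining = error_budget_remaining(sli, slo=0.999)
    print(f"SLI={sli:.4%}, error budget remaining={remaining:.0%}")
```

With these sample figures, 500 failed requests against an allowance of roughly 1,000 leaves about half of the budget unspent.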
Service mesh and ITIL
A service mesh (e.g., Istio, Linkerd) provides infrastructure-level capabilities that support several ITIL practices:
| Service Mesh Feature | ITIL Practice Supported |
|---|---|
| Traffic management (canary, blue-green) | Deployment Management, Change Enablement |
| Mutual TLS | Information Security Management |
| Circuit breaking | Availability Management, Incident Management |
| Observability (automatic metrics, traces) | Monitoring and Event Management |
| Rate limiting | Capacity and Performance Management |
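A service mesh applies features such as circuit breaking in its proxies through configuration, not in application code. To show the underlying pattern only, the sketch below implements a small circuit breaker in Python. The class name, thresholds, and injectable `clock` parameter are illustrative assumptions and do not reflect how Istio or Linkerd implement or configure the feature.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so the timeout can be tested
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail fast instead of sending traffic to an unhealthy dependency.
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            half_open = self._opened_at is not None
            if half_open or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()   # open (or re-open) the circuit
            raise
        # A success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Failing fast while the circuit is open limits the spread of a failure to callers, which is why the table links this feature to Availability Management and Incident Management.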
Related pages
- DevOps & SRE Integration (CI/CD, SLOs, error budgets)
- AI Strategy for ITSM (AIOps and intelligent automation)
- Operating Model Design (team topology)
- Automation Tools (tool categories reference)
Last updated on April 2, 2026
ITIL® is a registered trademark of PeopleCert. © 2026 ITIL v5 Compass