Understanding Custom Events in PagerDuty: How user-defined alerts trigger incidents.

Custom Events in PagerDuty are user-defined alerts that trigger incidents. Teams tailor alerts from apps, monitoring tools, or scripts to meet specific conditions, speeding responses and creating incident workflows that feel flexible, intuitive, and in line with real-world operations.

Outline

  • Opening: The modern incident response world runs on more than boring alerts. Custom Events let you define what truly matters, so the right incidents get created at the right time.

  • What are Custom Events? Clear definition: user-defined alerts that trigger incidents, coming from apps, tools, or scripts you trust.

  • Why they matter: benefits in plain language—less noise, more relevant alerts, faster responses, better data for responders.

  • How they fit into PagerDuty: a simple mental model of sources, the Events API, routing, deduplication, and incident creation.

  • How to set them up in practice: a practical, approachable workflow with steps and a lightweight example.

  • Real-world examples: concrete scenarios from apps, infrastructure, and monitoring that show value.

  • Best practices and common pitfalls: keep criteria clear, test often, document payloads, and tune routing.

  • Quick tips to get started: practical pointers you can apply today.

  • Closing thought: Custom Events as a bridge between tools and a calmer, smarter incident response.

Custom Events in PagerDuty: what they are and why they matter

Let’s start with the core idea. Custom Events are user-defined alerts that trigger incidents when specific conditions are met. They aren’t limited to a single monitoring tool or a fixed set of notifications. Instead, you can pull in alerts from your applications, your favorite monitoring stack, or even homegrown scripts, and decide exactly when a situation deserves an incident. In other words, you get to decide what counts as a real problem and how it should be handled.

Why does this matter? Because every organization has its own quirks. One team might care about error rates above a threshold, another might care about a surge in latency, and yet another might react to a failed API call that logs an unusual error pattern. Custom Events give you a way to tailor incident creation to your reality. That translates into less noise, quicker triage, and more focused on-call efforts.

How Custom Events work in PagerDuty (the big picture)

Think of it as a funnel. It starts with a source—your app, a monitoring tool, or a script—that emits an event. The event travels to PagerDuty via an integration (often using the Events API). PagerDuty then routes that event to the right service, applies any deduplication logic so the same incident isn’t created multiple times, and, if criteria are met, creates an incident for the on-call team to handle.

Key elements you’ll encounter include:

  • The source and payload: what triggered the alert, where it came from, and any helpful details you attach.

  • Severity and routing: what level of urgency you assign and which service or team should handle it.

  • Correlation and deduplication: if the same issue shows up from different sources, PagerDuty can group them so responders aren’t overwhelmed with duplicates.

  • Incident creation: once the rules say “yes, this qualifies,” an incident is opened, the on-call schedule is notified, and the runbook or playbook can be surfaced.
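
To make the deduplication step a little more concrete, here is a minimal Python sketch. It is a toy in-memory model, not PagerDuty's actual implementation. It assumes each event carries a dedup_key (the Events API v2 field used to group related events) and an event_action; the AlertStore class and its handle method are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AlertStore:
    """Toy model of deduplication by key (illustrative only)."""

    # Maps dedup_key -> list of events folded into the same open alert.
    open_alerts: dict = field(default_factory=dict)

    def handle(self, event: dict) -> str:
        key = event["dedup_key"]
        action = event["event_action"]
        if action == "trigger":
            if key in self.open_alerts:
                # Same key, alert already open: group instead of duplicating.
                self.open_alerts[key].append(event)
                return "grouped"
            self.open_alerts[key] = [event]
            return "opened"
        if action == "resolve":
            # Resolving a key with no open alert does nothing.
            closed = self.open_alerts.pop(key, None)
            return "resolved" if closed is not None else "ignored"
        # Other actions (such as acknowledge) are not modelled in this sketch.
        return "ignored"


store = AlertStore()
trigger = {"dedup_key": "checkout/failure-rate", "event_action": "trigger"}
first = store.handle(trigger)
second = store.handle(trigger)
closed = store.handle({"dedup_key": "checkout/failure-rate", "event_action": "resolve"})
```

The point of the sketch is the choice of key: a stable key per problem (service plus check name, for instance) means repeated signals land on one alert rather than paging responders again.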

A practical setup mindset

If you’re ready to experiment, here’s a lightweight, sane approach:

  • Start with a service you care about, like “Payments” or “API Gateway.”

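  • Add an integration that accepts Custom Events. Many teams use the PagerDuty Events API (v2) to push events, but you can also connect via webhooks from your monitoring tools. Adding an Events API v2 integration to a service gives you an integration key, which your events then use as their routing key.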

  • Craft a clear event payload. At minimum, include a summary, a source (where the event originated), and a level of severity. Add a few bite-sized details—like a relevant metric, a link to a runbook, or a diagnostic URL.

  • Decide the criteria that should trigger an incident. For example, “When payment failure rate > 5% for 10 minutes and total requests exceed 1000 in that window.”

  • Route smartly. Map events to the right service and on-call team. If you have multiple components, you might route payment issues to Payments on-call while cache or DB issues go to Infra.

  • Test with real or simulated events. Start with a controlled test event to confirm it creates an incident and reaches the right people.

  • Refine. After a cycle of alerts, tune the thresholds, add needed fields to the payload, and adjust the runbook links.
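
The steps above can be sketched in Python using only the standard library. Treat this as a minimal sketch under stated assumptions rather than a definitive client: the field names follow the Events API v2 event format as commonly documented (routing_key, event_action, dedup_key, payload, links), while the ROUTING_KEYS mapping, the placeholder key values, the example.com runbook URL, and the helper names build_event and send_event are all hypothetical.

```python
import json
import urllib.request

EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"

# Hypothetical mapping from component to integration key (placeholders only).
ROUTING_KEYS = {
    "payments": "PAYMENTS_INTEGRATION_KEY",
    "infra": "INFRA_INTEGRATION_KEY",
}


def build_event(routing_key, summary, source, severity,
                dedup_key=None, custom_details=None, runbook_url=None):
    """Build a trigger event with the minimum fields plus optional context."""
    event = {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    }
    if dedup_key:
        event["dedup_key"] = dedup_key
    if custom_details:
        event["payload"]["custom_details"] = custom_details
    if runbook_url:
        event["links"] = [{"href": runbook_url, "text": "Runbook"}]
    return event


def send_event(event, timeout=10):
    """POST the event as JSON and return the decoded response body."""
    request = urllib.request.Request(
        EVENTS_API_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


event = build_event(
    routing_key=ROUTING_KEYS["payments"],
    summary="Payment failure rate above 5% for 10 minutes",
    source="payments-api",
    severity="critical",
    dedup_key="payments/failure-rate",
    custom_details={"failure_rate": 0.05, "window_minutes": 10},
    runbook_url="https://example.com/runbooks/payments",
)
```

Nothing is sent here; calling send_event(event) with a real integration key would be the controlled test described in the steps. Keeping build_event separate from send_event makes the payload easy to inspect and unit test before anything reaches an on-call engineer.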

A concrete example to visualize

Imagine you run an e-commerce site. You have a microservice that handles checkout. Sometimes, a hiccup in the payment gateway causes a small but persistent error rate increase. You don’t want every transient blip to wake the whole on-call team, but you do want a clear signal if the checkout flow appears to be failing.

You set up a Custom Event that fires when:

  • The checkout service sees a failure rate above 4% for 15 minutes,

  • Total requests exceed 2,000 in that window,

  • And the error payload includes an error code or identifier from the payment gateway so responders know the exact failing component.

The event payload carries a succinct summary like: “Checkout failures rising; gateway latency high; investigate payment service health.” It routes to the Payments service with a high severity. (In Events API v2, the severity field accepts critical, error, warning, or info, so “high” here would map to critical or error.) If the condition persists, PagerDuty creates an incident, surfaces the runbook link, and notifies the on-call engineer, who can quickly see the context and jump to the right diagnostic steps.
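
The trigger criteria in this example are evaluated on the sending side, before any event is emitted. Here is one way that check might look in Python. It assumes per-minute request and failure counts are available, and it interprets "above 4% for 15 minutes" as the aggregate failure rate over the last 15 samples; a stricter reading would require every minute to exceed the threshold. MinuteSample and should_trigger are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MinuteSample:
    requests: int
    failures: int


def should_trigger(samples, window_minutes=15, max_failure_rate=0.04,
                   min_requests=2000):
    """Return True when the most recent window meets both criteria."""
    if len(samples) < window_minutes:
        return False  # Not enough history to judge the full window yet.
    window = samples[-window_minutes:]
    total = sum(s.requests for s in window)
    failed = sum(s.failures for s in window)
    if total <= min_requests:
        return False  # Too little traffic for the rate to be meaningful.
    return failed / total > max_failure_rate


# 15 minutes at 200 requests/minute with 10 failures each: 5% of 3,000 requests.
busy_and_failing = [MinuteSample(200, 10) for _ in range(15)]
# Same failure count but low traffic: 1,500 requests in the window.
quiet_and_failing = [MinuteSample(100, 10) for _ in range(15)]
```

The request floor is what keeps a handful of failures during a quiet period from paging anyone, which is the "transient blip" case the example wants to avoid.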

Real-world flavors of Custom Events you might deploy

  • Application health signals: custom events that tally specific error codes and tie them to a feature or service, so you see problems at the feature level, not just at the host level.

  • Infrastructure anomalies: spikes in queue depth, sudden CPU contention, or unusual disk I/O—alerts that come from the exact component that matters, not from a generic monitor.

  • Business-impact signals: events triggered by critical business metrics—like a drop in checkout conversions or an abnormal payment failure rate—that align incident response with business impact.

  • Custom scripts and automation: scripts that run in CI/CD or on a schedule can emit events when a deployment fails or a release goes out of tolerance, offering a direct bridge from change events to incident response.
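
For the scripts-and-automation case, a small wrapper can turn a failed pipeline step into an event body. The sketch below is illustrative only: event_for_failed_command, the label parameter, and the choice to attach the tail of stderr are assumptions made for this example, and a routing key would still need to be added before the event is sent anywhere.

```python
import subprocess
import sys


def event_for_failed_command(command, label, source):
    """Run a command; return a trigger event body if it fails, else None."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        return None
    return {
        "event_action": "trigger",
        "payload": {
            "summary": f"Step '{label}' failed with exit code {result.returncode}",
            "source": source,
            "severity": "error",
            # Keep only the end of stderr so the payload stays small.
            "custom_details": {"stderr_tail": result.stderr[-500:]},
        },
    }


# Stand-in commands: a Python one-liner that exits non-zero, and one that succeeds.
failing = event_for_failed_command(
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    label="migrate",
    source="ci-pipeline",
)
passing = event_for_failed_command(
    [sys.executable, "-c", "pass"],
    label="lint",
    source="ci-pipeline",
)
```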

Best practices and common pitfalls (so you don’t shoot in the dark)

  • Keep the criteria tight but clear: overly broad rules create noise. Too strict rules miss outages. Start with a modest threshold, verify with a few real events, and adjust.

  • Attach meaningful details: a concise summary plus a handful of supportive fields (like service name, region, or runbook URL) helps responders get oriented fast.

  • Use stable, human-friendly names: incident titles should quickly convey the problem so a responder can skim and decide what to do.

  • Include a runbook link or reference: responders should land on actionable steps, not guesswork.

  • Test, then test again: simulate typical failures and ensure the right people get alerted, the correct service is on call, and the incident channel opens cleanly.

  • Map events to the right owners: route each event to the team that owns the affected component, and use basic deduplication rules to group related events so duplicates don’t pile up on responders.

  • Document payload changes: as you evolve your event definitions, keep a changelog so teammates understand why alerts changed and how to adapt.

  • Security matters: minimize sensitive data in event payloads; ensure access controls for who can send Custom Events and who can modify routing.
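
On the security point, one simple habit is to scrub event details before they leave your system. The following Python sketch masks values whose keys look sensitive, including inside nested dictionaries. The SENSITIVE_KEYS set and the redact function are hypothetical; a real list of keys would depend on your own data.

```python
# Hypothetical list of key names to mask; adjust to your own payloads.
SENSITIVE_KEYS = {"password", "token", "authorization", "card_number", "email"}


def redact(details, sensitive_keys=SENSITIVE_KEYS, mask="[REDACTED]"):
    """Return a copy of details with sensitive values masked."""
    cleaned = {}
    for key, value in details.items():
        if key.lower() in sensitive_keys:
            cleaned[key] = mask
        elif isinstance(value, dict):
            # Recurse so nested request or user objects are scrubbed too.
            cleaned[key] = redact(value, sensitive_keys, mask)
        else:
            cleaned[key] = value
    return cleaned


raw_details = {
    "region": "eu-west-1",
    "token": "abc123",
    "request": {"Authorization": "Bearer xyz", "path": "/pay"},
}
safe_details = redact(raw_details)
```

Key-based masking only catches fields you have named in advance, so it complements, rather than replaces, keeping sensitive data out of payloads in the first place.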

Tips to kick things off smoothly

  • Start simple: a single service, one or two straightforward rules, and one or two useful fields in your payload.

  • Use descriptive, compact summaries: something like “Checkout failures increasing; investigate gateway errors” is easier to scan than a long paragraph.

  • Link to runbooks and dashboards: responders appreciate quick access to the current status and the exact steps to take.

  • Watch the pulse: after deployment, monitor how these events land—do they wake the right people in a timely way? If not, adjust.

  • Keep a small, readable payload: a few well-chosen fields beat a dump of data that only confuses.
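
A quick local check can catch malformed payloads before they are sent. This Python sketch validates the three minimum fields mentioned earlier (summary, source, severity). The allowed severity values and the 1,024-character summary limit reflect the Events API v2 documentation as I understand it, so verify them against the current docs; validate_payload is a hypothetical helper name.

```python
# Values as understood from the Events API v2 docs; confirm before relying on them.
ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}
MAX_SUMMARY_LENGTH = 1024


def validate_payload(payload):
    """Return a list of human-readable problems; empty means it looks fine."""
    problems = []
    for name in ("summary", "source", "severity"):
        if not payload.get(name):
            problems.append(f"missing required field: {name}")
    severity = payload.get("severity")
    if severity and severity not in ALLOWED_SEVERITIES:
        problems.append(f"unknown severity: {severity}")
    if len(payload.get("summary") or "") > MAX_SUMMARY_LENGTH:
        problems.append(f"summary exceeds {MAX_SUMMARY_LENGTH} characters")
    return problems


good = validate_payload({
    "summary": "Checkout failures increasing; investigate gateway errors",
    "source": "checkout-service",
    "severity": "critical",
})
bad = validate_payload({"summary": "Checkout failures increasing", "severity": "high"})
```

Running a check like this in the script or CI job that emits events gives you a failure at send time instead of a silently rejected alert.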

Why Custom Events are a quiet but mighty upgrade

Custom Events turn incident response from a one-size-fits-all alerting system into a tailored workflow that fits how your teams actually work. They let you decide what’s critical, what to measure, and who should respond. The result isn’t just faster reaction times; it’s smarter triage, better context for responders, and a calmer on-call experience because you’re not chasing every minor blip.

If you’re building or refining your PagerDuty setup, think of Custom Events as a bridge. They connect the data flowing through your tools with the people who need it most, turning scattered signals into coordinated action. And like any good bridge, the real value shows up in the steps you’ll take after it’s built: the rules you write, the payloads you craft, and the playbooks you keep up to date.

A closing thought

Custom Events aren’t magic; they’re a structured way to transform how alerts become incidents. With thoughtful criteria, clear payloads, and careful routing, you can align incident response with both technical realities and business priorities. So, take a moment to map out a couple of meaningful events for your core services. You might be surprised by how much smoother, quicker, and more confident your team’s responses become.

If you’re curious to explore how these events can align with your stack, a quick audit of your current alerting—what triggers an incident today, and what could be improved—can be a revealing first step. And as you refine, you’ll likely discover new opportunities to tell a cleaner, more precise story with every incident you face.
