Sentinel

Production AI ops layer on AWS Bedrock — investigate-only Claude agents, Slack-gated escalation.

AI Infrastructure · Beta · Rev. 2026 · AWS Bedrock · Claude Sonnet 4.6 · Slack

What is Sentinel?

Sentinel is iSimplifyMe's production AI operations layer — a fleet of investigate-only Claude agents running on AWS Bedrock that monitor iSM's own infrastructure for regressions, anomalies, and operational incidents. Three workloads run in production: a Diagnostics Agent for tenant uptime forensics, a GH Triage Agent for CI failure classification, and a Pipeline Hang Detector for content-pipeline anomaly investigation. Every detection fires into Slack with structured context, a recommended action, and a human approval gate before any remediation runs.

Abstract

Sentinel is internal infrastructure, not a customer-facing product, and runs to keep the rest of the platform honest. The same architecture is offered to clients as a productized Sentinel-pattern monitoring retainer.

Problem

Production AI infrastructure has more silent failure modes than monitorable ones. A Bedrock model that responds with semantically wrong answers passes a 200 OK health check. A retrieval pipeline that surfaces stale data clears every uptime probe.

Manual log review does not scale across a multi-site network with thirty in-production engagements. Status pages tell you what is on; they do not tell you what is wrong.

Approach

The agent topology

Each Sentinel agent is an investigate-only Bedrock-hosted workload with a discrete surveillance scope and a defined cadence. The agent reads from a constrained set of operational signals (logs, recent error events, model-call traces), runs a Claude Sonnet 4.6 inference pass through a US-bounded inference profile (the "us." prefix) to classify the situation, and decides whether the finding warrants escalation. No agent writes to client systems; the architecture is investigate-and-notify only.
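The read, classify, escalate loop described above can be sketched roughly as follows. This is a minimal illustration, not Sentinel's actual code: the `Finding` type, the severity scale, and the `classify` and `notify` callables are all hypothetical, and the real inference call to Bedrock is replaced by an injected function so the sketch runs without AWS credentials.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Finding:
    """Result of one inference pass over a batch of signals (hypothetical shape)."""
    summary: str
    severity: str  # assumed scale: "info", "warning", "critical"


# Assumed policy: which severities are worth a Slack escalation.
ESCALATE_SEVERITIES = {"warning", "critical"}


def run_agent_pass(
    signals: Sequence[str],
    classify: Callable[[Sequence[str]], Finding],
    notify: Callable[[Finding], None],
) -> Finding:
    """Read signals, classify them, and notify only when escalation is warranted.

    The function is handed no handle to the monitored system, so its only
    possible side effect is the `notify` callback (investigate-and-notify).
    """
    finding = classify(signals)
    if finding.severity in ESCALATE_SEVERITIES:
        notify(finding)
    return finding
```

Passing the classifier in as a parameter keeps the investigate-only boundary visible in the signature: nothing in the agent pass can write to the system it is observing.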

Slack as the approval gate

When an agent identifies something worth escalating, it posts a Block Kit card to the appropriate channel with the diagnosis, the recommended remediation, and a small set of action buttons. A human reviewer clicks one. Only then does any remediation fire.
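As an illustration of what such a card might look like, the sketch below builds a Block Kit payload with a diagnosis, a recommended remediation, and two buttons. The button labels, the `sentinel_approve` and `sentinel_dismiss` action IDs, and the `incident_id` parameter are assumptions made for this example; only the Block Kit block and element types are Slack's own.

```python
def build_escalation_card(diagnosis: str, remediation: str, incident_id: str) -> dict:
    """Build a Slack message payload with two section blocks and one actions block."""
    return {
        # Plain-text fallback shown in notifications.
        "text": f"Sentinel finding: {diagnosis}",
        "blocks": [
            {
                "type": "section",
                "text": {"type": "mrkdwn", "text": f"*Diagnosis*\n{diagnosis}"},
            },
            {
                "type": "section",
                "text": {"type": "mrkdwn", "text": f"*Recommended remediation*\n{remediation}"},
            },
            {
                "type": "actions",
                "elements": [
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Approve"},
                        "style": "primary",
                        "action_id": "sentinel_approve",  # hypothetical ID
                        "value": incident_id,
                    },
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Dismiss"},
                        "style": "danger",
                        "action_id": "sentinel_dismiss",  # hypothetical ID
                        "value": incident_id,
                    },
                ],
            },
        ],
    }


def remediation_allowed(action_id: str) -> bool:
    """The gate: remediation may run only after an explicit human approval click."""
    return action_id == "sentinel_approve"
```

Keeping the gate as a separate, single-purpose check means any remediation path has to pass through one function that defaults to "no".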

The design rule came directly from a 2026 incident where an unguarded automated drip in the Retell Phone Bridge sent the same follow-up email 48 times to three leads. Sentinel's discipline since then: automated detection is fine, but automated remediation requires a human in the loop.

Workload #1: Diagnostics Agent

The Diagnostics Agent investigates client tenant sites that have failed three consecutive uptime checks — running curl, dig, and Cloudflare 5xx-breakdown probes via custom Bedrock tools — then files a markdown bug-report ticket with timeline, root cause, evidence, and recommended fix. Verified cost: $0.06 per incident on synthetic test cases (Claude Sonnet 4.6, ~50 seconds active runtime, ~28k tokens).
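The trigger condition, three consecutive failed uptime checks, can be expressed in a few lines. The sketch below assumes check results arrive as a list of booleans ordered oldest to newest; that representation and the function name are illustrative, not taken from Sentinel.

```python
from typing import Sequence

FAILURE_THRESHOLD = 3  # from the text: three consecutive failed uptime checks


def should_investigate(check_results: Sequence[bool], threshold: int = FAILURE_THRESHOLD) -> bool:
    """Return True when the most recent `threshold` checks all failed.

    `check_results` is ordered oldest to newest; True means the check passed.
    """
    if len(check_results) < threshold:
        return False  # not enough history to call it an outage
    return not any(check_results[-threshold:])
```

Only the tail of the history matters, so a single passing check resets the count and an intermittent blip does not open an investigation.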

Workload #2: GH Triage Agent

The GH Triage Agent polls iSimplifyMe org repository workflow runs every fifteen minutes, detects failures, and runs an inference pass classifying root cause across eight categories: test_flake, regression, infrastructure, auth, dependency, lint_or_typecheck, build_config, and unknown. Output is a structured ticket with markdown body covering Failure Summary, Classification, Failed Jobs, Recent Commits, and investigator Notes. Verified cost: $0.065 per run (Claude Sonnet 4.6, ~44 seconds active runtime).
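One way to keep the eight categories closed is to validate the model's reply against an enum and fall back to unknown for anything else. The sketch below assumes the model is prompted to reply with JSON of the form `{"category": "..."}`; that reply format is an assumption for illustration, while the category names come from the list above.

```python
import json
from enum import Enum


class RootCause(str, Enum):
    TEST_FLAKE = "test_flake"
    REGRESSION = "regression"
    INFRASTRUCTURE = "infrastructure"
    AUTH = "auth"
    DEPENDENCY = "dependency"
    LINT_OR_TYPECHECK = "lint_or_typecheck"
    BUILD_CONFIG = "build_config"
    UNKNOWN = "unknown"


def parse_classification(model_output: str) -> RootCause:
    """Map a model reply to one of the eight categories, defaulting to UNKNOWN."""
    try:
        payload = json.loads(model_output)
    except json.JSONDecodeError:
        return RootCause.UNKNOWN
    if not isinstance(payload, dict):
        return RootCause.UNKNOWN
    category = payload.get("category")
    if not isinstance(category, str):
        return RootCause.UNKNOWN
    try:
        return RootCause(category)
    except ValueError:
        # The model invented a label outside the allowed set.
        return RootCause.UNKNOWN
```

Treating malformed or off-list output as unknown means a bad reply still yields a valid ticket instead of an exception in the runner.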

The agent is idempotent: once a failed run is investigated, a 24-hour DynamoDB (DDB) lock prevents re-investigation, so flapping CI does not produce duplicate tickets.
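The lock semantics can be illustrated with an in-memory stand-in. This is a sketch of the behavior only: the class and method names are hypothetical, and a real deployment would presumably rely on a DynamoDB conditional write (for example a condition built on `attribute_not_exists`) plus an expiry attribute rather than a Python dict.

```python
LOCK_TTL_SECONDS = 24 * 60 * 60  # 24-hour lock window, per the text


class InvestigationLock:
    """In-memory stand-in for a conditional write with expiry (illustrative only)."""

    def __init__(self) -> None:
        self._expires_at: dict[str, float] = {}

    def try_acquire(self, run_id: str, now: float) -> bool:
        """Return True if this caller may investigate `run_id`, False if it is locked."""
        expires = self._expires_at.get(run_id)
        if expires is not None and expires > now:
            return False  # investigated within the last 24 hours
        self._expires_at[run_id] = now + LOCK_TTL_SECONDS
        return True
```

For example, a second detection of the same run an hour later is rejected, while the same run ID becomes eligible again once the 24 hours have elapsed.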

Workload #3: Pipeline Hang Detector

The Pipeline Hang Detector watches the iSM multi-site content pipeline for anomaly states — stuck topic-proposal runs, malformed MDX rejections, frontmatter envelope drift, write-post Lambda failures — and runs an inference pass to classify the cause and identify the affected tenants. Output uses the same structured ticket format as the other agents and routes to a content-pipeline-specific Slack channel.
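The text does not say how a run is judged to be stuck. A common approach, sketched here purely as an assumption, is an age threshold on in-flight runs; the 30-minute default, the function name, and the `started_at` mapping are all hypothetical.

```python
from typing import Dict, List


def find_stuck_runs(
    started_at: Dict[str, float],
    now: float,
    max_age_seconds: float = 1800.0,  # assumed 30-minute timeout
) -> List[str]:
    """Return IDs of in-flight runs older than `max_age_seconds`, oldest first."""
    stuck = [run_id for run_id, start in started_at.items() if now - start > max_age_seconds]
    return sorted(stuck, key=lambda run_id: started_at[run_id])
```

The list returned here would be the input to the inference pass, which then classifies the cause and identifies affected tenants.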

The three workloads share infrastructure: one generic SQS-triggered runner Lambda dispatches the right agent based on a SENTINEL_AGENT_SLUG kickoff message, an atomic conditional-write lock at INCIDENT#OPEN race-protects parallel detection paths, and the same file_ticket and notify_slack tools serve all three. Adding a new Sentinel workload is a registry entry plus a detector handler; everything else is shared.
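The "registry entry plus a detector handler" idea can be sketched as a dictionary dispatch inside an SQS-triggered Lambda handler. The slugs, the stub agent functions, and the assumption that the kickoff message is a JSON body carrying a `SENTINEL_AGENT_SLUG` key are illustrative; only the `Records` and `body` fields follow the standard SQS Lambda event shape.

```python
import json
from typing import Callable, Dict, List

AgentHandler = Callable[[dict], str]


def run_diagnostics(message: dict) -> str:
    return "diagnostics ran"  # stub for the Diagnostics Agent


def run_gh_triage(message: dict) -> str:
    return "gh-triage ran"  # stub for the GH Triage Agent


def run_pipeline_hang(message: dict) -> str:
    return "pipeline-hang ran"  # stub for the Pipeline Hang Detector


# Hypothetical slugs; adding a workload means adding one entry here.
AGENT_REGISTRY: Dict[str, AgentHandler] = {
    "diagnostics": run_diagnostics,
    "gh-triage": run_gh_triage,
    "pipeline-hang": run_pipeline_hang,
}


def handler(event: dict, context: object = None) -> List[str]:
    """Dispatch each SQS record to the agent named in its kickoff message."""
    results = []
    for record in event.get("Records", []):
        message = json.loads(record["body"])
        slug = message["SENTINEL_AGENT_SLUG"]
        agent = AGENT_REGISTRY.get(slug)
        if agent is None:
            raise ValueError(f"unknown Sentinel agent slug: {slug!r}")
        results.append(agent(message))
    return results
```

Raising on an unknown slug, rather than silently skipping it, lets the queue's retry and dead-letter handling surface a misrouted kickoff message.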

Eat-our-own-dogfood proof point

Sentinel runs on the same AWS Bedrock substrate (BedrockRuntimeClient, ConverseStreamCommand, a DynamoDB ticket store, EventBridge cron, an SQS queue, and IAM-scoped Bedrock permissions) that iSimplifyMe deploys for client validator-architecture engagements. iSM operates Sentinel as a production proof point of the architecture it proposes for regulated-industry clients: every workload type is in production at iSM before being offered to clients.

Status

  • Sentinel runs in production on AWS Bedrock as iSM's internal AI operations infrastructure. Three workloads are live: Diagnostics Agent, GH Triage Agent, and Pipeline Hang Detector.
  • Total platform cost: under $50/month across all three workloads at current activity volume.
  • Architecture is investigate-only by design — no agent writes to client systems, no agent fires remediation without human approval through the Slack gate.

Roadmap

Sentinel's roadmap continues across two tracks: additional workloads against iSM's own properties, and productization as a client-facing service line.

iSM property monitoring (internal expansion)

  • Lighthouse regression detector — nightly Lighthouse audits across the iSM editorial atlas network (Marque Cars, Subdial, Eldercare Atlas, RoofingTechPro) and client websites; threshold-based detection of performance regressions before they affect AEO rankings.
  • AEO drift and citation surveillance — schedule-driven probes against ChatGPT, Gemini, AI Overview, and Perplexity for the citation-protected substrate pages currently cited as authoritative sources; alerts on framing or citation drift.
  • Cost anomaly detector — CloudWatch billing and Cost Explorer probes for AWS spend spikes across the iSM project portfolio.
  • Weekly audit agent — cross-repo health rollups across the thirty-seven iSimplifyMe org repositories.
  • DNS watcher — Cloudflare zone monitoring for the brand-citation infrastructure across all iSM-managed domains.
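For the threshold-based detection mentioned in the Lighthouse item, a minimal sketch might compare each site's current score against a stored baseline. The 0 to 100 score scale, the 5-point default threshold, and the function name are assumptions for illustration; the roadmap item does not specify them.

```python
from typing import Dict


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    max_drop: float = 5.0,  # assumed tolerance, in score points
) -> Dict[str, float]:
    """Return {site: drop} for sites whose score fell by more than `max_drop`."""
    regressions = {}
    for site, base_score in baseline.items():
        if site not in current:
            continue  # no audit tonight; nothing to compare
        drop = base_score - current[site]
        if drop > max_drop:
            regressions[site] = drop
    return regressions
```

A fixed tolerance absorbs the run-to-run noise that Lighthouse scores typically show, so only sustained drops would reach the escalation path.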

Client engagements (productized)

The Sentinel architecture is available to client engagements as a productized retainer: Sentinel-pattern monitoring. iSM operates the same investigate-only agent topology on the buyer's AWS Bedrock infrastructure to provide continuous validator-gate hit/miss telemetry, drift detection, and incident response. The retainer pairs with the Validator Architecture build engagement — audit, architecture, then operate — and is priced as a custom monthly retainer sized to scope.

The pattern is repeatable per-client: discovery (which validator gates does the buyer deploy?), Sentinel deployment (investigate-only Claude agents on the buyer's Bedrock account), Slack-gated escalation (findings route to a buyer-designated channel; remediation requires human approval), and quarterly reviews against iSM's reference architecture. Mid-market and enterprise regulated industries only.
