Incident Copilot
AI-Assisted Incident Triage System
Designed and shipped in a weekend hackathon (36 hours).
Overview
Incident Copilot is an AI-assisted dashboard that aggregates operational signals - logs, alerts, support tickets, and deployment notes - and generates a structured triage report in seconds.
Built end-to-end during TartanHacks, it demonstrates how AI can compress incident chaos into actionable clarity.
Hackathon Constraint
Design and ship a functioning AI decision-support system over the weekend.
What the constraint forced:
- Clear scope
- System-first architecture
- Deterministic layers for reliability
- Strict output structure
The Problem
During production incidents, high cognitive load increases resolution time:
Chaos Factors
- Logs are noisy
- Alerts stack rapidly
- Customers report failures
Manual Triage Requires
- Cross-referencing multiple signals
- Guessing root causes
- Searching past incidents
What I Built
- Multi-signal ingestion interface
- Deterministic severity scoring engine
- Similar-incident retrieval system
- Schema-constrained AI triage generator
System Philosophy
Hybrid Intelligence
Rule-based scoring for baseline reliability paired with LLM reasoning for structured hypothesis generation.
Optimized For
Speed, Clarity, and Trust.
MVP Scope
The weekend build supports:
Designed as a decision-support system, not automation.
Outcomes
Shipped in 36 Hours
Architected and fully built both frontend and backend within the hackathon time limit.
System Reliability
Implemented schema-controlled LLM outputs and deterministic scoring for trust.
Context Awareness
Built a lightweight incident memory layer to surface similar past events.
System Architecture
End-to-end stack built for operational speed
Frontend
- Tailwind dashboard UI
- Structured triage layout
- Dynamic binding of AI output
Backend
- Express server orchestration
- OpenAI API integration
- Severity scoring function
- Similar incident lookup from dataset
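A minimal sketch of what a deterministic severity scoring function could look like, assuming simple keyword and volume rules. The `IncidentSignals` shape, the keywords, and every weight below are illustrative assumptions, not the values used in the hackathon build.

```typescript
// Hypothetical input shape: the four signal types named in the overview.
interface IncidentSignals {
  logs: string[];
  alerts: string[];
  tickets: string[];
  deployNotes: string[];
}

// Illustrative keyword rules; keywords and weights are assumptions.
const KEYWORD_RULES: { keyword: string; weight: number }[] = [
  { keyword: "outage", weight: 30 },
  { keyword: "5xx", weight: 20 },
  { keyword: "timeout", weight: 15 },
  { keyword: "rollback", weight: 10 },
];

// Returns a severity score from 0 to 100. There is no randomness and no
// model call, so identical signals always produce the same score.
function scoreSeverity(signals: IncidentSignals): number {
  const text = signals.logs
    .concat(signals.alerts, signals.tickets, signals.deployNotes)
    .join("\n")
    .toLowerCase();

  let score = 0;
  for (const rule of KEYWORD_RULES) {
    if (text.indexOf(rule.keyword) !== -1) {
      score += rule.weight;
    }
  }

  // Volume rules: more alerts and tickets raise severity, each up to a cap.
  score += Math.min(signals.alerts.length * 5, 25);
  score += Math.min(signals.tickets.length * 3, 15);

  return Math.min(score, 100);
}
```

Because the score comes from fixed rules, it can be shown next to the LLM hypothesis as a baseline that does not change between runs.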
Strict Output Strategy
The LLM output is strictly structured, not conversational. By stripping away chat interfaces, the system delivers deterministic-feeling JSON that binds directly to the dashboard, ensuring engineers read operational data, not prose.
AI Control Strategy
Engineering maturity beyond "call OpenAI and hope."
Schema Enforcement
Embedded a JSON schema directly in the system prompt to constrain the object shape before parsing.
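One way to place a schema inside the system prompt is sketched below. The schema fields (`summary`, `hypotheses`, `runbook`) and the prompt wording are hypothetical, chosen only to illustrate the approach; the actual prompt is not reproduced in this write-up.

```typescript
// Hypothetical report schema; the field names are illustrative assumptions.
const TRIAGE_SCHEMA = {
  type: "object",
  required: ["summary", "hypotheses", "runbook"],
  properties: {
    summary: { type: "string" },
    hypotheses: {
      type: "array",
      items: {
        type: "object",
        required: ["cause", "confidence"],
        properties: {
          cause: { type: "string" },
          confidence: { type: "number", minimum: 0, maximum: 1 },
        },
      },
    },
    runbook: { type: "array", items: { type: "string" } },
  },
};

// Builds the system prompt text. The schema is serialized into the prompt
// so the model sees the exact shape it is asked to return.
function buildSystemPrompt(schema: object): string {
  return [
    "You are an incident triage assistant.",
    "Respond with a single JSON object and no prose before or after it.",
    "The object must validate against this JSON Schema:",
    JSON.stringify(schema, null, 2),
  ].join("\n");
}
```

A schema in the prompt steers the model toward the right shape but cannot by itself rule out a malformed response, so the result still needs defensive parsing on the way in.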
Resilient Parsing
Try/catch parsing guardrails and a default fallback structure ensure the dashboard never crashes on a hallucinated key.
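The guardrail pattern can be sketched roughly as follows. The `TriageReport` fields and the fallback text are assumptions made for illustration; the idea is only that every field is checked and anything missing, mistyped, or unexpected degrades to a safe default.

```typescript
// Hypothetical report shape; the real schema is not given in this write-up.
interface TriageReport {
  summary: string;
  rootCauseHypothesis: string;
  confidence: number; // expected range 0 to 1
  runbook: string[];
}

// Returns a fresh default report so callers cannot mutate shared state.
function fallbackReport(): TriageReport {
  return {
    summary: "Triage report unavailable. Review the raw signals.",
    rootCauseHypothesis: "Unknown",
    confidence: 0,
    runbook: [],
  };
}

// Parses raw model output. Invalid JSON, wrong types, and missing keys all
// fall back to defaults; unknown keys are dropped instead of reaching the UI.
function parseTriageReport(raw: string): TriageReport {
  const fallback = fallbackReport();
  try {
    const data = JSON.parse(raw);
    if (typeof data !== "object" || data === null) {
      return fallback;
    }
    const confidence =
      typeof data.confidence === "number" && isFinite(data.confidence)
        ? Math.min(1, Math.max(0, data.confidence))
        : fallback.confidence;
    return {
      summary: typeof data.summary === "string" ? data.summary : fallback.summary,
      rootCauseHypothesis:
        typeof data.rootCauseHypothesis === "string"
          ? data.rootCauseHypothesis
          : fallback.rootCauseHypothesis,
      confidence,
      runbook: Array.isArray(data.runbook)
        ? data.runbook.filter((step: unknown) => typeof step === "string")
        : fallback.runbook,
    };
  } catch {
    // JSON.parse threw: the model returned prose or truncated JSON.
    return fallback;
  }
}
```

With this shape, the dashboard always binds to a complete `TriageReport`, whether the model returned valid JSON, a misspelled key, or plain prose.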
Trust Metrics
Confidence scoring visualization paired with a deterministic severity score overlay grounds the AI reasoning.
Tradeoffs & Decisions
Balancing the constraints of a 36-hour build with the strict demands of an operational dashboard.
No Vector Database
Relied on keyword-based similarity matching, local deployment, and a single-agent workflow. The priority was validating triage structure and reliability, not scaling infrastructure.
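Keyword-based matching without a vector store could look something like the sketch below, which ranks past incidents by Jaccard overlap of their keywords. Jaccard is used here as one plausible choice; the write-up does not say which measure the project used, and the `PastIncident` shape and stop-word list are assumptions.

```typescript
// Hypothetical record shape for the local incident dataset.
interface PastIncident {
  id: string;
  title: string;
  description: string;
}

// Small illustrative stop-word list (an assumption, not the project's list).
const STOP_WORDS = ["the", "a", "an", "on", "in", "of", "and", "to", "is"];

// Lowercases text and returns its unique keywords, minus stop words.
function tokenize(text: string): string[] {
  const words: string[] = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  const unique: string[] = [];
  for (const word of words) {
    if (STOP_WORDS.indexOf(word) === -1 && unique.indexOf(word) === -1) {
      unique.push(word);
    }
  }
  return unique;
}

// Jaccard similarity: shared keywords divided by total distinct keywords.
function jaccard(a: string[], b: string[]): number {
  const shared = a.filter((token) => b.indexOf(token) !== -1).length;
  const union = a.length + b.length - shared;
  return union === 0 ? 0 : shared / union;
}

// Returns the best-matching past incidents, highest score first.
function findSimilarIncidents(
  query: string,
  incidents: PastIncident[],
  limit: number = 3
): { incident: PastIncident; score: number }[] {
  const queryTokens = tokenize(query);
  return incidents
    .map((incident) => ({
      incident,
      score: jaccard(queryTokens, tokenize(incident.title + " " + incident.description)),
    }))
    .filter((match) => match.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

This scans the whole dataset on every lookup, which is fine for a small local file and keeps the weekend build free of extra infrastructure.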
Dashboard Over Chat
Operators don't have time to converse during an outage. The UI reconstructs timelines and visualizes insights immediately upon load.
Quantifiable Trust
Implemented numeric severity scores and visual confidence bars so engineers can calibrate how much to trust the generated hypothesis.
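As a rough illustration, a model-reported confidence can be mapped to a bar width and a coarse label before rendering. The `toConfidenceBar` helper, the 0 to 1 input range, and the 40 and 75 percent thresholds are all assumptions for this sketch.

```typescript
// Hypothetical view model for a confidence bar in the dashboard.
interface ConfidenceBar {
  widthPercent: number; // 0 to 100, usable as a CSS width
  label: "Low" | "Medium" | "High";
}

// Converts a confidence value (assumed 0 to 1) into a bar width and label.
// Out-of-range or non-numeric values are clamped so the bar never overflows.
function toConfidenceBar(confidence: number): ConfidenceBar {
  const safe = typeof confidence === "number" && isFinite(confidence) ? confidence : 0;
  const clamped = Math.min(1, Math.max(0, safe));
  const widthPercent = Math.round(clamped * 100);
  // Thresholds are illustrative assumptions.
  const label: ConfidenceBar["label"] =
    widthPercent >= 75 ? "High" : widthPercent >= 40 ? "Medium" : "Low";
  return { widthPercent, label };
}
```

Clamping at this layer means an out-of-range value from the model still renders as a valid bar.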
Structured Runbooks
Instead of paragraphs of advice, the system outputs explicit, step-by-step runbook actions to immediately begin mitigation.
Scope discipline matters more than feature volume.
Deterministic layers increase AI trust. Without strict parsing and scoring rules, the tool would be too volatile for incident response.
Structured outputs outperform conversational UI. In operational contexts, binding parsed AI output to a dashboard is far more useful than a chatbot interface.
Hybrid reasoning improves baseline reliability.