626Labs · Claude Code plugin

Audit the prompts your app ships.

Close the prompt loop: scan, audit, eval, grade, fix. vibe-prompt is the prompt-audit and behavioral-testing layer for vibe-coded apps that ship LLM features. It runs 13 structural smell checks across five dimensions, evaluates behavior against your production model with a calibrated LLM-judge, and routes confident fixes to ready-to-apply diffs.

v0.7.1 · 9 commands · F1–F13 smells · MIT

Includes 9 commands · 13 skills

vibe-prompt ~
$ /vibe-prompt:scan
[vibe-prompt] Scanning prompt sites... done
14 prompt sites found across src/
→ read-only — nothing changed
 
$ /vibe-prompt:audit
[vibe-prompt] Running F1–F13 checks...
→ schema tightness: 2 issues · persona consistency: clean
→ injection resistance: F11 flagged → handoff to vibe-sec
 
$ /vibe-prompt:grade
[vibe-prompt] Composite grade: B+ — regression vs baseline: none

01 · What it does

Nine commands across the prompt lifecycle.

From inventory to fix, each step in the loop is its own command. Everything is read-only by default; real model calls happen only on :eval, behind a cost gate.

/vibe-prompt:scan

Inventory every prompt site.

Reads the codebase and maps every place an LLM prompt is constructed or dispatched. Read-only — nothing is changed, nothing is written. The starting point for the rest of the loop.

Reach for it when you want to know what your app is actually sending to the model.

/vibe-prompt:audit

Run 13 structural smell checks.

Applies the F1–F13 taxonomy across five dimensions — schema tightness, persona consistency, instruction clarity, token efficiency, and injection resistance. No model calls, no cost.

Reach for it before an eval run, or any time you change a prompt.

/vibe-prompt:eval

Behavioral test against production.

Runs test cases against your real model with a calibrated LLM-judge. Cost-gated — you confirm spend before any call goes out. Surfaces behavioral drift that static checks can't catch.

Reach for it when you need ground truth on how the prompt behaves, not just how it reads.
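A cost gate like the one described for :eval can be sketched as: estimate spend from planned token counts, ask for confirmation, and dispatch nothing if the user declines. This is a hypothetical illustration, not vibe-prompt's code; `EvalPlan`, `estimate_cost`, `run_with_cost_gate`, and the per-1k-token prices are all assumptions, and the prices in any usage are illustrative rather than real vendor rates.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class EvalPlan:
    # Hypothetical shape of a planned eval run.
    cases: int
    input_tokens_per_case: int
    output_tokens_per_case: int


def estimate_cost(plan: EvalPlan, usd_per_1k_in: float, usd_per_1k_out: float) -> float:
    """Estimate spend in USD from token counts; prices are caller-supplied assumptions."""
    tokens_in = plan.cases * plan.input_tokens_per_case
    tokens_out = plan.cases * plan.output_tokens_per_case
    return tokens_in / 1000 * usd_per_1k_in + tokens_out / 1000 * usd_per_1k_out


def run_with_cost_gate(
    plan: EvalPlan,
    usd_per_1k_in: float,
    usd_per_1k_out: float,
    confirm: Callable[[float], bool],
    dispatch: Callable[[EvalPlan], Any],
) -> Optional[Any]:
    """Show the estimate to `confirm`; only call `dispatch` if the user agrees."""
    estimate = estimate_cost(plan, usd_per_1k_in, usd_per_1k_out)
    if not confirm(estimate):
        return None  # declined: no request leaves the machine
    return dispatch(plan)
```

A fuller estimate would also count the judge model's own calls, since the judge is a second model request per case.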

/vibe-prompt:grade

Composite grade + regression check.

Rolls scan, audit, and eval signals into a composite grade and compares it against your monotonic baseline — so regressions show up as deltas, not surprises.

Reach for it on every PR that touches a prompt.
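One plausible reading of "composite grade against a monotonic baseline" is a weighted score where the stored baseline only ratchets upward, so any drop shows up as a negative delta. The sketch below assumes that reading; the signal names, the `WEIGHTS` values, the `tolerance` parameter, and `check_regression` are hypothetical and are not vibe-prompt's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class GradeResult:
    composite: float
    delta: float         # composite minus the stored baseline
    regressed: bool
    new_baseline: float  # never lower than the previous baseline


# Hypothetical weights per signal source; each signal is assumed to be in [0, 1].
WEIGHTS = {"scan": 0.2, "audit": 0.4, "eval": 0.4}


def composite_score(signals: dict[str, float]) -> float:
    """Weighted mean of the signals present, renormalized if eval was skipped."""
    present = {k: w for k, w in WEIGHTS.items() if k in signals}
    total = sum(present.values())
    return sum(signals[k] * w for k, w in present.items()) / total


def check_regression(signals: dict[str, float], baseline: float,
                     tolerance: float = 0.01) -> GradeResult:
    score = composite_score(signals)
    delta = score - baseline
    return GradeResult(
        composite=score,
        delta=delta,
        regressed=delta < -tolerance,         # small noise below tolerance is ignored
        new_baseline=max(baseline, score),    # monotonic: only moves up
    )
```

Renormalizing when eval is absent lets a static-only run (scan plus audit, no model calls) still produce a comparable grade.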

/vibe-prompt:remediate

Confidence-routed fix diffs.

Routes high-confidence findings to ready-to-apply diffs and lower-confidence ones to annotated review files. Backs up every prompt before touching it and keeps rollback explicit.

Reach for it after audit surfaces issues you want to fix.
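Confidence routing with backup and explicit rollback can be sketched as a threshold split plus a copy-aside before any edit. This is a hypothetical illustration: the `Finding` shape, the 0.8 cutoff, the `.bak` naming, and the helper names are assumptions, not vibe-prompt's behavior.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Finding:
    smell: str         # smell ID, e.g. "F2"
    prompt_file: Path
    confidence: float  # hypothetical 0..1 scale


def route(findings: list[Finding], threshold: float = 0.8) -> tuple[list[Finding], list[Finding]]:
    """Split findings into (ready-to-apply diffs, annotated review) by confidence."""
    diffs = [f for f in findings if f.confidence >= threshold]
    review = [f for f in findings if f.confidence < threshold]
    return diffs, review


def backup(prompt_file: Path, backup_dir: Path) -> Path:
    """Copy the prompt aside before editing so rollback is a plain copy back."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / (prompt_file.name + ".bak")
    shutil.copy2(prompt_file, target)
    return target


def rollback(prompt_file: Path, backup_path: Path) -> None:
    """Restore the original prompt from its backup."""
    shutil.copy2(backup_path, prompt_file)
```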

/vibe-prompt:iterate

Propose new AI features.

Reads your product and audit history, then proposes concrete AI feature candidates grounded in what your current prompt surface already handles well.

Reach for it when you want to extend your LLM surface, not just maintain it.

/vibe-prompt:radar

Model-news digest.

Pulls a digest of model releases, capability changes, and pricing shifts relevant to your current stack. No eval calls — read-only.

Reach for it when a model update lands and you want to know if anything breaks.

/vibe-prompt:evolve-prompt

L3 self-improvement.

Reads vibe-prompt's own session and friction logs, then proposes concrete edits to its own skills and evaluation templates. Never auto-applies.

Reach for it when the audit keeps missing something you're catching manually.

/vibe-prompt

State-aware router.

Reads your project state — last scan age, open audit findings, baseline freshness — and recommends the right next command. Asks before launching anything.

Reach for it when you're not sure where in the loop you are.
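A state-aware router of this kind reduces to a few ordered rules over project state. The sketch below is hypothetical: the `ProjectState` fields mirror the three signals named above, but the rule order, the seven-day staleness cutoff, and the fallback suggestion are assumptions, not the router's real logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectState:
    # Hypothetical fields; the plugin's real state may be shaped differently.
    last_scan_age_days: Optional[float]  # None if never scanned
    open_audit_findings: int
    baseline_age_days: Optional[float]   # None if no baseline yet


def recommend(state: ProjectState, stale_after_days: float = 7.0) -> str:
    """Return a suggested next command; the caller still asks before running it."""
    if state.last_scan_age_days is None or state.last_scan_age_days > stale_after_days:
        return "/vibe-prompt:scan"
    if state.open_audit_findings > 0:
        return "/vibe-prompt:remediate"
    if state.baseline_age_days is None or state.baseline_age_days > stale_after_days:
        return "/vibe-prompt:grade"
    return "/vibe-prompt:radar"
```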

02 · How it's built

Scan, audit, eval, grade, fix — in that order.

The loop runs in order. Scan inventories every prompt site, read-only. Audit applies the F1–F13 taxonomy statically. Eval runs behavioral tests against your production model behind a cost gate. Grade rolls everything into a composite score and compares it against a monotonic baseline. Remediate routes confident fixes to diffs, with a backup and explicit rollback. Each step is a standalone command, so you can run the whole loop or drop in at any stage.

The F1–F13 taxonomy covers 13 structural smell classes across five scoring dimensions: schema tightness (are outputs constrained or freeform?), persona consistency (does the assistant's identity hold across turns?), instruction clarity (are directives specific enough to survive model updates?), token efficiency (is the prompt spending tokens on things that matter?), and injection resistance (is user input sanitized before it reaches the model?). F10–F12 are the injection-surface smells. When any of them triggers, vibe-prompt hands off directly to vibe-sec, which carries the injection findings through its own security posture loop. The two plugins compose; neither depends on the other.
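The handoff rule can be sketched as a filter on smell IDs. Only the F10–F12 set comes from the text above; the `AuditFinding` shape, the example site strings, and the plain-JSON payload are hypothetical, and vibe-sec's real intake format is not documented here.

```python
import json
from dataclasses import asdict, dataclass

# From the text: F10–F12 are the injection-surface smells.
INJECTION_SMELLS = {"F10", "F11", "F12"}


@dataclass
class AuditFinding:
    smell: str  # "F1" .. "F13"
    site: str   # prompt site, e.g. a hypothetical "src/chat.ts:42"
    note: str


def split_for_handoff(findings: list[AuditFinding]) -> tuple[list[AuditFinding], list[AuditFinding]]:
    """Return (kept by vibe-prompt, handed to vibe-sec)."""
    local = [f for f in findings if f.smell not in INJECTION_SMELLS]
    handoff = [f for f in findings if f.smell in INJECTION_SMELLS]
    return local, handoff


def handoff_payload(handoff: list[AuditFinding]) -> str:
    """Serialize injection findings; JSON is an assumed format."""
    return json.dumps([asdict(f) for f in handoff], indent=2)
```

Keeping the split a pure filter matches the "compose, don't depend" design: vibe-prompt only emits findings, and nothing here imports or calls vibe-sec.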

The eval step uses a calibrated LLM-judge: a secondary model call that scores the primary output against a rubric derived from your audit findings. Calibration means the judge's pass threshold is tuned against known-good and known-bad examples from your own baseline rather than taken from a generic rubric, so the score means something specific to your app. Every eval run costs real tokens; the cost gate ensures you confirm spend before anything goes out.
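The calibration idea can be sketched as a threshold search: score known-good and known-bad examples with the judge, then pick the cutoff that best separates them. This illustrates the general technique under stated assumptions (judge scores in [0, 1], a case passes when its score is at or above the threshold, balanced accuracy as the objective); `calibrate_threshold` is a hypothetical name, and vibe-prompt's own procedure may differ.

```python
def calibrate_threshold(good_scores: list[float], bad_scores: list[float]) -> float:
    """Pick the pass threshold that best separates known-good from known-bad outputs.

    Candidate thresholds are the observed scores. Balanced accuracy is used so an
    uneven number of good vs bad examples does not skew the cut. Ties keep the
    lowest threshold.
    """
    if not good_scores or not bad_scores:
        raise ValueError("need at least one known-good and one known-bad example")
    best_threshold, best_acc = 0.0, -1.0
    for t in sorted(set(good_scores + bad_scores)):
        tpr = sum(s >= t for s in good_scores) / len(good_scores)  # good cases that pass
        tnr = sum(s < t for s in bad_scores) / len(bad_scores)     # bad cases that fail
        acc = (tpr + tnr) / 2
        if acc > best_acc:
            best_threshold, best_acc = t, acc
    return best_threshold
```

With cleanly separated examples the cut lands at the lowest known-good score; when good and bad scores overlap, the search settles on the best available compromise instead of a fixed generic value.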

Read-only by default. Real model calls only on :eval, behind an explicit cost gate. Nothing leaves your machine except the eval request you confirm.

03 · Get it

Two channels.

Stable · marketplace

Tagged releases, promoted via the Vibe Plugins marketplace.

/plugin marketplace add estevanhernandez-stack-ed/vibe-plugins
/plugin install vibe-prompt@vibe-plugins

Canary · bleeding edge

Latest main from this repo.

/plugin marketplace add estevanhernandez-stack-ed/Vibe-Prompt
/plugin install vibe-prompt

Read-only by default; real model calls only on :eval, behind a cost gate. Composes with vibe-sec for injection handoff.