Prompt management: How to manage your prompt catalog
Prompt engineering is undergoing a transformation. As large language models (LLMs) become increasingly integrated into the technology stacks of startups and large companies, efficiently managing a base of prompts is becoming a necessity.
We must transition from "prompt engineering" to "prompt management".
Create a "Prompt Council"
When an organization relies on LLMs, clear governance is essential to avoid chaos, ensure consistency, and capitalize on collective intelligence.
For a team or a company, creating a governance entity, a Prompt Council or an AI Guild, is a key step. This cross-functional team, composed of engineers, product managers, and industry experts, has several responsibilities:
- Defining Standards: Establish naming conventions, prompt structure, and base templates. The council also acts as guardian of the GitHub repository.
- Validating "Core Prompts": The most critical prompts for the business must be validated by this council to ensure their alignment with strategic objectives and their robustness.
- Managing the "Prompt Library": Maintain a centralized library of validated and documented prompts. Each prompt must have an "owner", a clear description of its purpose, existing versions, and the results of performance testing.
- Sharing Knowledge: Organize sharing sessions and document learnings (for example, "we observed that the instruction 'Think step by step' increases accuracy by 15% on complex reasoning tasks").
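To make the library idea concrete, here is a minimal sketch of what a catalog entry could look like if the library were backed by Python tooling. The class and field names (`PromptEntry`, `owner`, `objective`, `versions`) are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class PromptVersion:
    tag: str            # e.g. "v1.2", mirrored as a Git tag
    text: str           # the prompt itself
    accuracy: float     # latest measured accuracy on the test set
    notes: str = ""     # learnings, regressions, caveats

@dataclass
class PromptEntry:
    name: str           # unique identifier in the library
    owner: str          # the person accountable for this prompt
    objective: str      # what the prompt is supposed to achieve
    versions: list[PromptVersion] = field(default_factory=list)

# Example catalog entry (illustrative values only)
summarizer = PromptEntry(
    name="ticket-summarizer",
    owner="jane.doe",
    objective="Summarize a support ticket in three bullet points",
    versions=[
        PromptVersion(tag="v1.0", text="Summarize the ticket below...", accuracy=0.85),
    ],
)
```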
Team Structuring
Prompt Committee (3-5 people)
- Define Standards
- Manage the “prompt library”
- Choose tools and processes
- Share knowledge
Prompt Owner (1 per category)
- Technical maintenance of prompts
- Continuous improvement
- Performance monitoring
Review Committee (3-5 people per category)
- Validation of evolutions
- Regular performance monitoring
Contributors (the entire team)
- Submission of new prompts
- Best-effort feedback on performance
- Suggestion of improvements
GitHub as the "Single Source of Truth" for Your Prompts
The era of the prompt "stored in a Google Doc" or "copied and pasted in Slack" is over. A prompt is a production asset as much as any code. It must be treated with the same rigor.
Specialized tools such as promptpanda are emerging. Until these tools mature, I recommend GitHub.
Why GitHub?
- Version Control: The most obvious reason. Every modification of a prompt, even a single comma, can radically alter the LLM's performance. Git lets you track each iteration, understand its evolution, identify regressions, and revert to a stable previous version if necessary. git blame becomes as relevant for a prompt as for a line of code.
- Collaboration and Review: Pull Requests (PRs) are an essential mechanism for quality control. A new version of a prompt or a new prompt must go through a review process. The team can comment, suggest improvements, and validate the logic before merging it into the main branch. This ensures that knowledge is not siloed and that best practices are shared.
- Structure and Organization: Organize your prompts in a dedicated repository. A clear tree structure is non-negotiable. Imagine a structure by use case, then by version:
/prompt-database/
├── /categories/
│ ├── /category1/
│ ├── /category2/
│ ├── /category3/
│ └── /category4/
├── /templates/
│ ├── base-template.md
│ └── testing-template.md
├── /metrics/
│ ├── performance-logs.json
│ └── ab-test-results.csv
└── /deprecated/
This structure, coupled with Git tags, provides full visibility on the evolution of your prompt base.
- Versioning and Templating: Each prompt follows a standardized template that includes metadata, objectives, context variables, and performance metrics, ensuring consistency and reproducibility.
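To make the template enforceable, a small check can run in CI on every pull request. The sketch below assumes each prompt file starts with a YAML front-matter block and uses the directory layout shown above; the required field names (`owner`, `objective`, `variables`, `metrics`) are illustrative, not part of any standard.

```python
import sys
from pathlib import Path

import yaml  # PyYAML

# Fields every prompt file is expected to declare in its front-matter
# (illustrative names; adapt to your own template).
REQUIRED_FIELDS = {"owner", "objective", "variables", "metrics"}

def check_prompt(path: Path) -> list[str]:
    """Return a list of problems found in one prompt file."""
    text = path.read_text(encoding="utf-8")
    if not text.startswith("---"):
        return ["missing YAML front-matter"]
    # Front-matter is everything between the first two '---' markers.
    front_matter = text.split("---", 2)[1]
    metadata = yaml.safe_load(front_matter) or {}
    missing = REQUIRED_FIELDS - set(metadata)
    return [f"missing field: {name}" for name in sorted(missing)]

if __name__ == "__main__":
    errors = 0
    for prompt_file in Path("prompt-database/categories").rglob("*.md"):
        for problem in check_prompt(prompt_file):
            print(f"{prompt_file}: {problem}")
            errors += 1
    sys.exit(1 if errors else 0)
```

Wiring this script into the repository's CI makes the review process in pull requests partly automatic: a prompt that does not follow the template cannot be merged.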
Data-Driven: Systematic Testing
The quality of a prompt is not judged by intuition. It requires testing.
To store prompt testing results and track their evolution across modifications, you can set up a structured prompt testing framework in your GitHub repo. Here's a practical, scalable method:
/prompts/
├── promptA.md
└── promptB.md
/tests/
├── promptA/
│   ├── 001-input.json
│   ├── 001-output.md
│   ├── 002-input.json
│   └── 002-output.md
└── promptB/
    └── ...
/results/
└── promptA/
    ├── history.csv
    └── snapshots/
        ├── 2024-06-01-output.md
        └── 2024-06-15-output.md
Create a history.csv per prompt to log the results:
| Date | Prompt Version | Test ID | Accuracy | Latency | Cost | Notes |
|---|---|---|---|---|---|---|
| 2024-06-01 | v1.0 | 001 | 85% | 2.3s | $0.04 | Baseline |
| 2024-06-15 | v1.1 | 001 | 78% | 1.8s | $0.03 | Quality Regression |
| 2024-06-20 | v1.2 | 001 | 92% | 2.1s | $0.04 | Successful Optimization |
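A minimal harness can automate this logging. The sketch below assumes the directory layout shown above; the `run_prompt` and `score` functions are hypothetical placeholders to be replaced with your own LLM client and evaluator.

```python
import csv
import json
from datetime import date
from pathlib import Path

PROMPT_VERSION = "v1.2"  # keep in sync with the Git tag of the prompt

def run_prompt(prompt: str, test_input: dict) -> dict:
    """Placeholder for the call to your LLM provider (OpenAI, Anthropic, local model, ...)."""
    # Dummy implementation so the harness runs end to end; replace with a real client.
    return {"output": "", "latency_s": 0.0, "cost_usd": 0.0}

def score(actual: str, expected: str) -> float:
    """Simplistic exact-match scoring; swap in your own evaluator."""
    return 1.0 if actual.strip() == expected.strip() else 0.0

def run_suite(prompt_name: str) -> None:
    prompt = Path(f"prompts/{prompt_name}.md").read_text(encoding="utf-8")
    results_dir = Path(f"results/{prompt_name}")
    results_dir.mkdir(parents=True, exist_ok=True)
    history = results_dir / "history.csv"

    # Write the header once, matching the columns of the table above.
    if not history.exists():
        with history.open("w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerow(
                ["Date", "Prompt Version", "Test ID", "Accuracy", "Latency", "Cost", "Notes"]
            )

    for input_file in sorted(Path(f"tests/{prompt_name}").glob("*-input.json")):
        test_id = input_file.name.split("-")[0]
        expected = input_file.with_name(f"{test_id}-output.md").read_text(encoding="utf-8")
        test_input = json.loads(input_file.read_text(encoding="utf-8"))

        result = run_prompt(prompt, test_input)
        row = [
            date.today().isoformat(),
            PROMPT_VERSION,
            test_id,
            f"{score(result['output'], expected):.0%}",
            f"{result['latency_s']:.1f}s",
            f"${result['cost_usd']:.2f}",
            "",  # notes, filled in by hand during review
        ]
        with history.open("a", newline="", encoding="utf-8") as f:
            csv.writer(f).writerow(row)

if __name__ == "__main__":
    run_suite("promptA")
```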
Testing Protocol:
- Defining metrics: Accuracy, relevance, cost-per-token, latency
- Sampling: A minimum of 100 test cases per version for statistical significance (a quick check is sketched after this list)
- Blind testing: Evaluation without knowledge of the tested version
- Observation period: A minimum of 7 days to capture variations
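To illustrate the sampling point, here is a minimal sketch of a two-proportion z-test comparing the accuracy of two prompt versions. The sample counts are placeholder values taken from the table above, and in practice you may prefer a ready-made statistics library.

```python
from math import erf, sqrt

def z_test_two_proportions(successes_a: int, n_a: int,
                           successes_b: int, n_b: int) -> float:
    """Return the two-sided p-value for the difference between two accuracy rates."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Placeholder numbers: v1.2 scored 92/100, v1.1 scored 78/100
p_value = z_test_two_proportions(92, 100, 78, 100)
print(f"p-value: {p_value:.4f}")  # below 0.05 => the difference is unlikely to be noise
```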
Managing prompts is not a peripheral technical problem; it is a central pillar of the performance and scalability of your AI strategy. Moving from a craft approach to industrialized management, built on robust tools like GitHub, analytical methodologies like A/B testing, and clear governance, is the only way to build a sustainable competitive advantage.