Prompt management: How to manage your prompt catalog
Prompt engineering is undergoing a transformation. As large language models (LLMs) become increasingly integrated into the technology stacks of startups and large companies, efficiently managing a base of prompts is becoming a necessity.
We must transition from "prompt engineering" to "prompt management".
Create a "Prompt Council"
When an organization relies on LLMs, clear governance is essential to avoid chaos, ensure consistency, and capitalize on collective intelligence.
For a team or a company, creating a governance entity, a Prompt Council or an AI Guild, is a key step. This cross-functional team, composed of engineers, product managers, and industry experts, has several responsibilities:
- Defining Standards: Establish naming conventions, prompt structure, and base templates. The council also acts as guardian of the GitHub repository.
- Validating "Core Prompts": The most critical prompts for the business must be validated by this council to ensure their alignment with strategic objectives and their robustness.
- Managing the "Prompt Library": Maintain a centralized library of validated and documented prompts. Each prompt must have an "owner", a clear description of its purpose, existing versions, and the results of performance testing.
- Sharing Knowledge: Organize sharing sessions and document learnings (for example, "we observed that the instruction 'Think step by step' increases accuracy by 15% on complex reasoning tasks").
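To make the library idea concrete, here is a minimal sketch of what a catalog entry could look like if the library were backed by Python tooling. The class and field names (`PromptEntry`, `owner`, `objective`, `versions`) are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class PromptVersion:
    tag: str            # e.g. "v1.2", mirrored as a Git tag
    text: str           # the prompt itself
    accuracy: float     # latest measured accuracy on the test set
    notes: str = ""     # learnings, regressions, caveats

@dataclass
class PromptEntry:
    name: str           # unique identifier in the library
    owner: str          # the person accountable for this prompt
    objective: str      # what the prompt is supposed to achieve
    versions: list[PromptVersion] = field(default_factory=list)

# Example catalog entry (illustrative values only)
summarizer = PromptEntry(
    name="ticket-summarizer",
    owner="jane.doe",
    objective="Summarize a support ticket in three bullet points",
    versions=[
        PromptVersion(tag="v1.0", text="Summarize the ticket below...", accuracy=0.85),
    ],
)
```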
Team Structuring
Prompt Committee (3-5 people)
- Define Standards
- Manage the “prompt library”
- Choose tools and processes
- Share knowledge
Prompt Owner (1 per category)
- Technical maintenance of prompts
- Continuous improvement
- Performance monitoring
Review Committee (3-5 people per category)
- Validation of evolutions
- Regular performance monitoring
Contributors (the entire team)
- Submission of new prompts
- Best-effort feedback on performance
- Suggestion of improvements
GitHub as the "Single Source of Truth" for Your Prompts
The era of the prompt "stored in a Google Doc" or "copied and pasted in Slack" is over. A prompt is a production asset as much as any code. It must be treated with the same rigor.
Specialized tools such as promptpanda are emerging. Until these tools mature, I recommend GitHub.
Why GitHub?
- Version Control: The most obvious reason. Every modification of a prompt, even a single comma, can radically alter the LLM's performance. Git lets you track each iteration, understand its evolution, identify regressions, and revert to a stable previous version if necessary. git blame becomes as relevant for a prompt as for a line of code.
- Collaboration and Review: Pull Requests (PRs) are an essential mechanism for quality control. A new version of a prompt or a new prompt must go through a review process. The team can comment, suggest improvements, and validate the logic before merging it into the main branch. This ensures that knowledge is not siloed and that best practices are shared.
- Structure and Organization: Organize your prompts in a dedicated repository. A clear tree structure is non-negotiable. Imagine a structure by use case, then by version:
/prompt-database/
├── /categories/
│ ├── /category1/
│ ├── /category2/
│ ├── /category3/
│ └── /category4/
├── /templates/
│ ├── base-template.md
│ └── testing-template.md
├── /metrics/
│ ├── performance-logs.json
│ └── ab-test-results.csv
└── /deprecated/
This structure, coupled with Git tags, provides full visibility on the evolution of your prompt base.
- Versioning and Templating: Each prompt follows a standardized template that includes metadata, objectives, context variables, and performance metrics, ensuring consistency and reproducibility.
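To make the template enforceable, a small check can run in CI on every pull request. The sketch below assumes each prompt file starts with a YAML front-matter block and uses the directory layout shown above; the required field names (`owner`, `objective`, `variables`, `metrics`) are illustrative, not part of any standard.

```python
import sys
from pathlib import Path

import yaml  # PyYAML

# Fields every prompt file is expected to declare in its front-matter
# (illustrative names; adapt to your own template).
REQUIRED_FIELDS = {"owner", "objective", "variables", "metrics"}

def check_prompt(path: Path) -> list[str]:
    """Return a list of problems found in one prompt file."""
    text = path.read_text(encoding="utf-8")
    if not text.startswith("---"):
        return ["missing YAML front-matter"]
    # Front-matter is everything between the first two '---' markers.
    front_matter = text.split("---", 2)[1]
    metadata = yaml.safe_load(front_matter) or {}
    missing = REQUIRED_FIELDS - set(metadata)
    return [f"missing field: {name}" for name in sorted(missing)]

if __name__ == "__main__":
    errors = 0
    for prompt_file in Path("prompt-database/categories").rglob("*.md"):
        for problem in check_prompt(prompt_file):
            print(f"{prompt_file}: {problem}")
            errors += 1
    sys.exit(1 if errors else 0)
```

Wiring this script into the repository's CI makes the review process in pull requests partly automatic: a prompt that does not follow the template cannot be merged.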
Data-Driven: Systematic Testing
The quality of a prompt is not judged by intuition. It requires testing.
To store prompt testing results and track their evolution across modifications, you can set up a structured prompt testing framework in your GitHub repo. Here's a practical, scalable method:
/prompts/
├── promptA.md
└── promptB.md
/tests/
├── promptA/
│   ├── 001-input.json
│   ├── 001-output.md
│   ├── 002-input.json
│   └── 002-output.md
└── promptB/
    └── ...
/results/
└── promptA/
    ├── history.csv
    └── snapshots/
        ├── 2024-06-01-output.md
        └── 2024-06-15-output.md
Create a history.csv per prompt to log the results:
| Date | Prompt Version | Test ID | Accuracy | Latency | Cost | Notes |
|---|---|---|---|---|---|---|
| 2024-06-01 | v1.0 | 001 | 85% | 2.3s | $0.04 | Baseline |
| 2024-06-15 | v1.1 | 001 | 78% | 1.8s | $0.03 | Quality Regression |
| 2024-06-20 | v1.2 | 001 | 92% | 2.1s | $0.04 | Successful Optimization |
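A minimal harness can automate this logging. The sketch below assumes the directory layout shown above; the `run_prompt` and `score` functions are hypothetical placeholders to be replaced with your own LLM client and evaluator.

```python
import csv
import json
from datetime import date
from pathlib import Path

PROMPT_VERSION = "v1.2"  # keep in sync with the Git tag of the prompt

def run_prompt(prompt: str, test_input: dict) -> dict:
    """Placeholder for the call to your LLM provider (OpenAI, Anthropic, local model, ...)."""
    # Dummy implementation so the harness runs end to end; replace with a real client.
    return {"output": "", "latency_s": 0.0, "cost_usd": 0.0}

def score(actual: str, expected: str) -> float:
    """Simplistic exact-match scoring; swap in your own evaluator."""
    return 1.0 if actual.strip() == expected.strip() else 0.0

def run_suite(prompt_name: str) -> None:
    prompt = Path(f"prompts/{prompt_name}.md").read_text(encoding="utf-8")
    results_dir = Path(f"results/{prompt_name}")
    results_dir.mkdir(parents=True, exist_ok=True)
    history = results_dir / "history.csv"

    # Write the header once, matching the columns of the table above.
    if not history.exists():
        with history.open("w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerow(
                ["Date", "Prompt Version", "Test ID", "Accuracy", "Latency", "Cost", "Notes"]
            )

    for input_file in sorted(Path(f"tests/{prompt_name}").glob("*-input.json")):
        test_id = input_file.name.split("-")[0]
        expected = input_file.with_name(f"{test_id}-output.md").read_text(encoding="utf-8")
        test_input = json.loads(input_file.read_text(encoding="utf-8"))

        result = run_prompt(prompt, test_input)
        row = [
            date.today().isoformat(),
            PROMPT_VERSION,
            test_id,
            f"{score(result['output'], expected):.0%}",
            f"{result['latency_s']:.1f}s",
            f"${result['cost_usd']:.2f}",
            "",  # notes, filled in by hand during review
        ]
        with history.open("a", newline="", encoding="utf-8") as f:
            csv.writer(f).writerow(row)

if __name__ == "__main__":
    run_suite("promptA")
```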
Testing Protocol:
- Defining metrics: Accuracy, relevance, cost-per-token, latency
- Sampling: A minimum of 100 test cases per version for statistical significance (a quick check is sketched after this list)
- Blind testing: Evaluation without knowledge of the tested version
- Observation period: A minimum of 7 days to capture variations
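To illustrate the sampling point, here is a minimal sketch of a two-proportion z-test comparing the accuracy of two prompt versions. The sample counts are placeholder values taken from the table above, and in practice you may prefer a ready-made statistics library.

```python
from math import erf, sqrt

def z_test_two_proportions(successes_a: int, n_a: int,
                           successes_b: int, n_b: int) -> float:
    """Return the two-sided p-value for the difference between two accuracy rates."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Placeholder numbers: v1.2 scored 92/100, v1.1 scored 78/100
p_value = z_test_two_proportions(92, 100, 78, 100)
print(f"p-value: {p_value:.4f}")  # below 0.05 => the difference is unlikely to be noise
```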
Managing prompts is not a peripheral technical problem; it is a central pillar of the performance and scalability of your AI strategy. Moving from a craft approach to industrialized management, built on robust tools like GitHub, analytical methodologies like A/B testing, and clear governance, is the only way to build a sustainable competitive advantage.