# Best Practices - Vector Store

## Best Practices for Preparing Files for Vector Store Ingestion

When building and maintaining a knowledge base for complex AI Agents, **file preparation and organization** are critical to ensure long-term scalability, searchability, and accuracy. Below are best practices we recommend for managing large file structures and consistently feeding high-quality data into your vector store.

***

### 1. File Naming Conventions

* Use a **clear and consistent naming structure** so files are easy to locate, replace, or delete later.
* Standard Format:

  ```
  File Name = %Category%-%Contents%.md
  ```

  **Examples:**

  * `SALES-Product_Descriptions.md`
  * `SUPPORT-Troubleshooting_Guide.md`
  * `OBJECTIONS-Pricing_Battlecards.md`

**Guidelines:**

* Keep names **short, unique, and descriptive**.
* Use only **dashes (-)** or **underscores (_)** as separators.
* **Avoid special characters** such as periods (other than the one before the `.md` or `.json` extension), commas, parentheses, or spaces.
* Maintain uniqueness: file names may be referenced directly in prompts, so they must not collide with other files.
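
To enforce the convention before upload, a small check can flag names that break it. The following is a minimal Python sketch; the regular expression is our own reading of `%Category%-%Contents%.md` (an uppercase category, a dash, then letters, digits, dashes, or underscores) and is an assumption, not an official validator. `is_valid_file_name` and `find_collisions` are hypothetical helper names.

```python
import re

# Assumed reading of %Category%-%Contents%.md (or .json):
#   CATEGORY: uppercase letters, optionally joined by underscores (e.g. LEGAL_COMPLIANCE)
#   Contents: letters/digits separated by single dashes or underscores
#   Exactly one period, the one before the extension.
FILE_NAME_PATTERN = re.compile(
    r"(?P<category>[A-Z]+(?:_[A-Z]+)*)"
    r"-(?P<contents>[A-Za-z0-9]+(?:[_-][A-Za-z0-9]+)*)"
    r"\.(?:md|json)"
)


def is_valid_file_name(name: str) -> bool:
    """True if the whole name matches the assumed naming convention."""
    return FILE_NAME_PATTERN.fullmatch(name) is not None


def find_collisions(names: list[str]) -> set[str]:
    """Return names that duplicate an earlier name, ignoring case."""
    seen: set[str] = set()
    duplicates: set[str] = set()
    for name in names:
        key = name.lower()
        if key in seen:
            duplicates.add(name)
        seen.add(key)
    return duplicates


if __name__ == "__main__":
    for candidate in ["SALES-Product_Descriptions.md", "Sales Guide (v2).pdf"]:
        print(candidate, is_valid_file_name(candidate))
```

The collision check is case-insensitive because two names that differ only in case are easy to confuse when referenced in a prompt.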

***

### 2. File Formats

* **Preferred formats:**
  * **Markdown (`.md`)** → Great for structured documentation, FAQs, sales collateral, objection handling.
  * **JSON (`.json`)** → Best for structured data, configurations, or mapping tables (e.g., intent taxonomies).
* Avoid binary formats like `.pdf` or `.docx` unless necessary. Convert them to Markdown or JSON to preserve structure and allow clean chunking.
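
Before a bulk upload, it can help to sort files by whether they are ready or still need conversion. A minimal Python sketch follows; the extension lists are assumptions based on the guidance above, and `audit_formats` is a hypothetical helper, not part of any platform API.

```python
from pathlib import Path

# Assumed buckets: preferred formats from this page, and common binary formats to convert.
PREFERRED = {".md", ".json"}
NEEDS_CONVERSION = {".pdf", ".docx", ".doc", ".pptx"}


def audit_formats(paths) -> dict[str, list[str]]:
    """Group file names into ready / convert / review buckets by extension."""
    report: dict[str, list[str]] = {"ready": [], "convert": [], "review": []}
    for path in map(Path, paths):
        ext = path.suffix.lower()  # case-insensitive: ".PDF" is treated like ".pdf"
        if ext in PREFERRED:
            report["ready"].append(path.name)
        elif ext in NEEDS_CONVERSION:
            report["convert"].append(path.name)
        else:
            report["review"].append(path.name)  # unknown format: decide manually
    return report
```

To audit a local folder (the name `knowledge_base` is illustrative), pass `(p for p in Path("knowledge_base").rglob("*") if p.is_file())`.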

***

### 3. Metadata & Context

* Each file should include **metadata headers** so the system (and humans) can quickly identify the file’s purpose.
* Recommended metadata fields:

  ```
  ---
  category: SALES
  purpose: Product descriptions for sales collateral
  last_updated: 2025-09-21
  owner: sales-team
  ---
  ```
* This metadata can be used for **search filtering, debugging, or selective retrieval**.
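
If your own tooling reads these headers (for example, to select only `SALES` files before indexing), the front matter has to be separated from the body. A minimal Python sketch using only the standard library: it handles flat `key: value` pairs like the example above and is not a full YAML parser (a library such as PyYAML would be needed for lists or nesting). `parse_front_matter` and `matches` are hypothetical helper names.

```python
def parse_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a Markdown document into (metadata, body).

    Only flat `key: value` lines between `---` fences are supported.
    If there is no complete front-matter block, returns ({}, text).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing fence: treat everything as body


def matches(meta: dict[str, str], **criteria: str) -> bool:
    """True if every given metadata field equals the requested value."""
    return all(meta.get(key) == value for key, value in criteria.items())
```

For selective retrieval, `matches(meta, category="SALES")` can gate which files are indexed or searched.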

***

### 4. Chunking Considerations

* Write content in **modular sections** (short paragraphs, bullet points, headings). This makes automated chunking more natural and effective.
* Use **descriptive headings (##)** in Markdown to create semantic breakpoints for chunking.
* Avoid long walls of text; aim for **500–800 tokens per section** to balance context depth and retrieval efficiency.
* Where possible, **add overlap naturally** in your writing (e.g., re-state context in each section) so that chunks are self-contained.
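
To check that sections stay near the suggested size, you can split each file at its `##` headings and estimate each section's length. The Python sketch below assumes roughly four characters per token, a common rule of thumb for English text and not the tokenizer your vector store actually uses; it also does not account for `## ` lines inside code blocks. `split_by_h2` and `oversized` are hypothetical helper names.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: ~4 characters per token)."""
    return max(1, len(text) // 4) if text.strip() else 0


def split_by_h2(markdown: str) -> list[dict]:
    """Split Markdown at each `## ` heading; text before the first heading gets title None."""
    sections: list[dict] = []
    title = None
    buf: list[str] = []

    def flush() -> None:
        body = "\n".join(buf).strip()
        if title is not None or body:
            sections.append({"title": title, "text": body, "tokens": estimate_tokens(body)})

    for line in markdown.splitlines():
        if line.startswith("## "):  # `###` headings do not match, so they stay inside a section
            flush()
            title, buf = line[3:].strip(), []
        else:
            buf.append(line)
    flush()
    return sections


def oversized(sections: list[dict], max_tokens: int = 800) -> list:
    """Titles of sections whose estimated size exceeds max_tokens."""
    return [s["title"] for s in sections if s["tokens"] > max_tokens]
```

Sections reported by `oversized` are candidates for splitting under additional `##` headings.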

***

### 5. File Categories & Structure

Organize files by **functional category** so your vector store remains intuitive and scalable:

* **SALES** → product descriptions, case studies, pricing guides, objection battle cards.
* **SUPPORT** → troubleshooting guides, FAQs, setup instructions.
* **LEGAL/COMPLIANCE** → policies, disclaimers, regulatory notes.
* **GENERAL** → company background, mission statements, leadership bios.

Store these in a logical folder structure locally (before upload) so the team can manage them easily.
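
Because each file name already starts with its category, the local folder can be derived from the name. A minimal Python sketch; the folder names and root directory are hypothetical, `LEGAL/COMPLIANCE` is written as `LEGAL_COMPLIANCE` because `/` cannot appear in a file name, and mapping `OBJECTIONS` to the sales folder is our assumption based on battle cards being listed under SALES.

```python
from pathlib import PurePosixPath

# Hypothetical prefix-to-folder mapping; adjust to your own categories.
CATEGORY_FOLDERS = {
    "SALES": "sales",
    "OBJECTIONS": "sales",  # assumption: battle cards live with SALES content
    "SUPPORT": "support",
    "LEGAL_COMPLIANCE": "legal_compliance",
    "GENERAL": "general",
}


def local_path_for(file_name: str, root: str = "knowledge_base") -> PurePosixPath:
    """Return root/<folder>/<file_name> based on the CATEGORY prefix."""
    category = file_name.split("-", 1)[0]
    folder = CATEGORY_FOLDERS.get(category)
    if folder is None:
        raise ValueError(f"Unknown category prefix {category!r} in {file_name!r}")
    return PurePosixPath(root, folder, file_name)
```

Raising an error on an unknown prefix surfaces misnamed files before they reach the vector store.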

***

### 6. Version Control & Updates

* Track versions in **metadata** (e.g., the `last_updated` field), not in file names.
* When replacing a file:
  * Remove the old version from the vector store.
  * Upload the new version with the same file name (for continuity).
* Keep a **changelog** (separate `.md` file) that tracks updates across all knowledge base documents.
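
The replacement steps above can be scripted so they always happen in the same order. The Python sketch below uses a hypothetical in-memory stand-in with `delete_file` and `upload_file` methods; these are not the methods of any real vector store client, so substitute the calls your platform provides. The changelog is modeled as a list of Markdown lines that you would write to your changelog file.

```python
from datetime import date


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector store client, for illustration only."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def delete_file(self, name: str) -> None:
        self.files.pop(name, None)  # deleting a missing file is a no-op

    def upload_file(self, name: str, content: str) -> None:
        if name in self.files:
            raise ValueError(f"{name} already exists; delete the old version first")
        self.files[name] = content


def replace_file(store, name, new_content, changelog, note, today=None):
    """Remove the old version, upload the new one under the same name, log the change."""
    today = today or date.today()
    store.delete_file(name)                # 1. remove the old version
    store.upload_file(name, new_content)   # 2. upload the new version, same file name
    changelog.append(f"- {today.isoformat()}: {name}: {note}")  # 3. record it
```

Deleting before uploading avoids two versions of the same file competing in retrieval results.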

***

### 7. Content Quality Guidelines

* **Be explicit & factual** → Avoid ambiguous language; retrieval is most accurate when content is clear.
* **Use a Q\&A format** where possible, especially for objection handling and FAQ content.
* **Cross-reference within files** using internal headings or bullet lists rather than links (since links may not resolve in vector retrieval).
* **Keep text plain** — avoid embedded images, tables with excessive formatting, or non-standard characters.
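
A quick lint pass can catch content that will not survive retrieval well. The Python sketch below flags Markdown or HTML image embeds, Markdown links, and invisible or non-standard characters (control, zero-width/format, private-use, and the U+FFFD replacement character). The patterns are heuristics we chose for illustration, not an exhaustive definition of "plain text"; `lint_plain_text` is a hypothetical helper name.

```python
import re
import unicodedata

IMAGE_EMBED = re.compile(r"!\[[^\]]*\]\([^)]*\)|<img\b", re.IGNORECASE)
MD_LINK = re.compile(r"(?<!!)\[[^\]]+\]\([^)]+\)")  # lookbehind skips image syntax
FLAGGED_CATEGORIES = {"Cc", "Cf", "Co"}  # control, format (e.g. zero-width), private use


def lint_plain_text(text: str) -> list[str]:
    """Return human-readable issues; an empty list means nothing was flagged."""
    issues = []
    if IMAGE_EMBED.search(text):
        issues.append("embedded image")
    if MD_LINK.search(text):
        issues.append("link (may not resolve at retrieval time)")
    for ch in sorted(set(text)):
        if ch in "\n\r\t":
            continue  # ordinary whitespace is fine
        if ch == "\ufffd" or unicodedata.category(ch) in FLAGGED_CATEGORIES:
            issues.append(f"non-standard character U+{ord(ch):04X}")
    return issues
```

Accented letters and ordinary punctuation pass, so legitimate non-English content is not flagged.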

***

### 8. Security & Compliance

* **Never include sensitive data** (PII, customer details, private contracts).
* Only upload **approved, external-facing content** for sales/collateral files.
* For compliance-heavy industries, maintain a separate **LEGAL/COMPLIANCE** category and instruct the AI to always defer/escalate when sensitive queries arise.
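
As a last line of defense before upload, files can be scanned for obvious personal data. The Python sketch below uses deliberately simple regular expressions for email addresses, US-style phone numbers, and US Social Security numbers; these patterns are our assumptions, will miss many forms of PII, and are not a substitute for a proper compliance review. `find_pii` is a hypothetical helper name.

```python
import re

# Simple, illustrative patterns only; they will miss many real-world formats.
PII_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "phone": re.compile(r"\b(?:\+1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Map each PII label to the matches found; an empty dict means no hits."""
    hits: dict[str, list[str]] = {}
    for label, pattern in PII_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits
```

Any file with a non-empty result should be held back until a person has reviewed it.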

***

### 9. Testing & Validation

* After upload, **query the vector store** with sample user prompts to confirm retrieval works as expected.
* Test edge cases such as:
  * “What’s your pricing?” → retrieves pricing file.
  * “Why should we choose you over Competitor X?” → retrieves objection-handling battle card.
  * “Tell me about troubleshooting login issues” → retrieves support guide.
* Regularly audit retrieval quality and re-chunk/reformat files if necessary.
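
The edge-case checks above can be kept as a small regression suite and re-run after every upload. In the Python sketch below, `retrieve(query, top_k)` is a hypothetical wrapper around your vector store's search that returns file names ranked by relevance; the keyword-overlap retriever exists only so the example runs on its own and is not how vector search works. `SALES-Pricing_Guide.md` is an illustrative file name.

```python
import re
from typing import Callable

# Sample prompts paired with the file we expect to be retrieved (illustrative names).
TEST_CASES = [
    ("What's your pricing?", "SALES-Pricing_Guide.md"),
    ("Why should we choose you over Competitor X?", "OBJECTIONS-Pricing_Battlecards.md"),
    ("Tell me about troubleshooting login issues", "SUPPORT-Troubleshooting_Guide.md"),
]


def run_retrieval_tests(retrieve: Callable[[str, int], list[str]], cases, top_k: int = 3):
    """Return (query, expected, results) for every case whose file is not in the top-k."""
    failures = []
    for query, expected in cases:
        results = retrieve(query, top_k)
        if expected not in results[:top_k]:
            failures.append((query, expected, results))
    return failures


def make_keyword_retriever(docs: dict[str, str]) -> Callable[[str, int], list[str]]:
    """Toy stand-in for vector search: rank files by words shared with the query."""
    def words(s: str) -> set[str]:
        return set(re.findall(r"[a-z]+", s.lower()))

    def retrieve(query: str, top_k: int) -> list[str]:
        q = words(query)
        ranked = sorted(docs, key=lambda name: len(q & words(docs[name])), reverse=True)
        return ranked[:top_k]

    return retrieve
```

An empty list from `run_retrieval_tests` means every prompt surfaced its expected file; any failure points to a file that may need re-chunking or clearer wording.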

***

### 10. Summary

By following these best practices, you’ll create a **scalable, maintainable, and reliable knowledge base** for your AI agents. Consistency in file naming, formatting, metadata, and structure ensures that retrieval is accurate, updates are easy, and objection-handling or discovery processes remain smooth.

