# Lesson 3.2 – Data Transformation with raia Academy

{% embed url="https://youtu.be/iEeL3Mq-eUc" %}

### 📌 Introduction

Training an AI Agent isn’t about writing code — it’s about giving the agent access to the **right knowledge**, in the **right format**, stored in the **right place**.

This lesson walks you through the process of **transforming raw business data** into optimized, AI-consumable formats using **raia Academy** — a no-code tool purpose-built for training agents effectively and efficiently.

Whether you’re working with messy PDFs, wikis, spreadsheets, or customer transcripts, your job is to **turn unstructured information into structured insight**. With raia Academy, the science becomes streamlined — and the art becomes intuitive.

***

### 🎨 Training AI: Art and Science

Training an AI Agent is a blend of:

* **Science**: using the right formats, structure, and chunking for optimal performance
* **Art**: shaping how the agent interprets information through metadata, tone, and formatting

Your goal isn’t just to store content — it’s to **optimize how the AI retrieves and reasons with it** during a conversation.

> 📘 “The difference between a great agent and a mediocre one is almost always the quality of the training data.”

***

### 📂 Choosing the Right Format: Markdown vs. JSON

<figure><img src="https://3805827895-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FSfECtcNwrIDQm7NrCIeB%2Fuploads%2FF4zs9DsOnk76VCE5SOhH%2Fmarkdown_vs_json_comparison.png?alt=media&#x26;token=1ffad0a3-6c22-4500-a934-ca1fbc4fffc6" alt=""><figcaption></figcaption></figure>

Different types of content call for different formats:

***

#### 📄 **Markdown** (Best for Unstructured Content)

Use Markdown when:

* You’re working with documents, wikis, PDFs, or web pages
* Content is primarily narrative or instructional (e.g. policies, FAQs, manuals)
* You want human-readable formatting (headers, bullets, emphasis)

**Examples**:

* Employee handbook
* Customer support policy
* Company overview page

📘 Markdown documents can also carry metadata (titles, tags, source attribution), for example in a front matter block at the top of the file. That metadata is essential for retrieval quality.
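One common convention for attaching that metadata is YAML front matter, a block between `---` lines at the top of the file. The sketch below assumes that convention. The field names (`title`, `tags`, `source`) and the sample values are hypothetical; raia Academy's metadata editor may store these fields differently.

```python
import json

def to_markdown_doc(title: str, tags: list[str], source: str, body: str) -> str:
    """Return a Markdown document with a YAML front matter block on top.

    The field names (title, tags, source) are hypothetical examples,
    not a schema required by raia Academy.
    """
    # JSON strings and arrays are also valid YAML, so json.dumps gives safe quoting.
    front_matter = "\n".join([
        "---",
        f"title: {json.dumps(title)}",
        f"tags: {json.dumps(tags)}",
        f"source: {json.dumps(source)}",
        "---",
    ])
    return front_matter + "\n\n" + body.strip() + "\n"


# Illustrative content only.
doc = to_markdown_doc(
    title="Refund Policy",
    tags=["support", "billing"],
    source="Customer Support Policy",
    body="## Refunds\n\nCustomers may request a refund within 30 days of purchase.",
)
print(doc)
```

Keeping the tag vocabulary small and reusing the same values across documents is what lets tags separate similar topics at retrieval time.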

***

#### 🧾 **JSON** (Best for Structured Data)

Use JSON when:

* Your content is already structured (tables, FAQs, configs)
* You want the agent to extract, filter, or format responses in a structured way
* You want to define specific input/output fields or schemas

**Examples**:

* API parameter references
* Price lists
* SOPs or how-to workflows
* Product comparison matrices

📘 JSON helps the AI preserve context and relationships between fields, which is useful for logic-based tasks.
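To illustrate, here is a small price list stored as JSON, where every record carries the same named fields. The SKUs, prices, and field names are invented for this sketch. The point is that explicit fields turn a question like "which plans cost under $50?" into a simple filter, and related fields (price and seats) stay together in each record.

```python
import json

# Hypothetical price list: every record uses the same explicit fields.
price_list_json = """
[
  {"sku": "PLAN-BASIC", "name": "Basic", "monthly_price_usd": 29, "seats": 1},
  {"sku": "PLAN-TEAM", "name": "Team", "monthly_price_usd": 99, "seats": 5},
  {"sku": "PLAN-PRO", "name": "Pro", "monthly_price_usd": 49, "seats": 2}
]
"""

plans = json.loads(price_list_json)

# A structured question becomes a filter over named fields.
affordable = [p["name"] for p in plans if p["monthly_price_usd"] < 50]
print(affordable)  # ['Basic', 'Pro']

# The relationship between fields (price per seat) is preserved in each record.
for p in plans:
    per_seat = p["monthly_price_usd"] / p["seats"]
    print(f'{p["name"]}: ${per_seat:.2f} per seat')
```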

***

### 🛠 How raia Academy Simplifies the Process

<figure><img src="https://3805827895-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FSfECtcNwrIDQm7NrCIeB%2Fuploads%2FtYWhdk4ZE6e4azOTjGzT%2Fimage.png?alt=media&#x26;token=405657c6-8953-4609-9a5b-9bbb60806800" alt=""><figcaption></figcaption></figure>

Without raia Academy, preparing data involves:

* Manual extraction
* File conversion
* Chunking and tagging
* Custom embedding scripts
* Vector store upload logic

With **raia Academy**, all of these steps happen in one no-code workflow.

***

#### 🧩 Features of raia Academy:

| Feature                                  | Benefit                                                                      |
| ---------------------------------------- | ---------------------------------------------------------------------------- |
| **Document Upload Interface**            | Drag-and-drop or bulk upload docs of any format (PDF, DOCX, HTML, etc.)      |
| **Auto-Transformation to Markdown/JSON** | Converts content into AI-readable formats with optional chunking             |
| **Semantic Metadata Editor**             | Add titles, tags, categories, and source notes to improve retrieval          |
| **Multi-format Export**                  | Export in Markdown, JSON, or structured training bundles                     |
| **Direct Upload to Vector Store**        | Native OpenAI vector store support + Retrieval Skill for Pinecone and others |
| **Derivative Content Generation**        | Summarize, extract FAQs, or reformat using prompt-powered AI workflows       |

> 📘 “raia Academy helps you turn chaos into clarity — it’s like a data refinery for AI training.”

***

### 🔗 Vector Store Integration: No Code, No Hassle

Your AI Agent can’t “learn” from documents until they’ve been converted to embeddings and indexed in a **vector store**: a specialized database that enables semantic search, matching by meaning rather than by exact keywords.
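Under the hood, semantic search compares embedding vectors. The sketch below uses tiny hand-made 3-dimensional vectors in place of real embeddings (which come from an embedding model and have hundreds or thousands of dimensions) and ranks chunks by cosine similarity. The chunk IDs and vectors are made up, and this is not how any particular vector store is implemented internally; it only shows the ranking idea.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "index": chunk IDs mapped to made-up vectors (not real embeddings).
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "office-hours": [0.0, 0.2, 0.9],
}

def search(query_vector: list[float], top_k: int = 2) -> list[tuple[str, float]]:
    """Return the top_k (chunk_id, score) pairs, most similar first."""
    scored = [(cid, cosine_similarity(query_vector, vec)) for cid, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Pretend this vector is the embedding of "How do I get my money back?"
results = search([0.8, 0.2, 0.1])
print([cid for cid, _ in results])  # ['refund-policy', 'shipping-times']
```

With real embeddings only the source of the vectors changes; the query still returns whichever stored chunks point in the most similar direction.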

raia Academy supports:

* ✅ **Native integration with OpenAI’s vector store**
* ✅ **raia’s own Retrieval Skill**, which works with:
  * Pinecone
  * Weaviate
  * Other vector stores that use OpenAI-compatible embeddings

**No custom code or integration scripts required.** Just:

**Upload → Transform → Tag → Push to vector store**

This allows both technical and non-technical team members to contribute to agent training.

***

### 📈 Training Optimization Tips

Here are some best practices, supported by raia Academy workflows:

| Tip                                         | Why It Matters                                             |
| ------------------------------------------- | ---------------------------------------------------------- |
| **Chunk by context, not by size**           | Retrieval matches on meaning, so each chunk should hold one complete idea rather than an arbitrary slice of text |
| **Use metadata tags consistently**          | Helps differentiate similar topics across documents        |
| **Remove redundant or outdated sections**   | Prevents conflicting answers and reduces hallucination risk |
| **Use AI to create summaries and examples** | Improves clarity, especially for long or technical content |
| **Preview retrieval before deploying**      | Use Copilot or simulator to test real use cases            |

***

### 🛤 Recommended Training Workflow

<figure><img src="https://3805827895-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FSfECtcNwrIDQm7NrCIeB%2Fuploads%2FROa5RFkdnmqNpWh0kRgI%2Ftraining_optimization_complete_final.png?alt=media&#x26;token=37db04e4-cc5e-4322-8160-031b199594e2" alt=""><figcaption></figcaption></figure>

1. **Gather** documents from internal and public sources
2. **Upload** to raia Academy in raw format
3. **Transform** to Markdown or JSON
4. **Tag** with metadata (topic, use case, date)
5. **Preview** retrieval for key use cases
6. **Push** to your chosen vector store
7. **Iterate** based on live testing feedback
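The Preview step can be run as a small regression test: list the questions your agent must handle, note which chunk should answer each one, and measure how often that chunk appears in the top results. In this sketch a naive word-overlap scorer stands in for a real vector store query, and all chunk IDs, texts, and questions are hypothetical. In practice the same check would go through Copilot, the simulator, or your vector store's search.

```python
import re

# Hypothetical chunks, keyed by ID.
chunks = {
    "refund-policy": "Customers can request a refund within 30 days of purchase.",
    "shipping-times": "Orders ship within 2 business days and arrive in 5 to 7 days.",
    "remote-work": "Remote work requires approval from your manager.",
}

def words(text: str) -> set[str]:
    """Lowercase word set, used by the naive stand-in scorer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, top_k: int = 1) -> list[str]:
    """Stand-in for a vector store query: rank chunk IDs by shared words."""
    q = words(question)
    ranked = sorted(chunks, key=lambda cid: len(q & words(chunks[cid])), reverse=True)
    return ranked[:top_k]

# Each test case pairs a realistic user question with the chunk that should answer it.
test_cases = [
    ("How do I get a refund?", "refund-policy"),
    ("When will my orders ship?", "shipping-times"),
    ("Do I need manager approval to work remote?", "remote-work"),
]

hits = [expected in retrieve(question) for question, expected in test_cases]
hit_rate = sum(hits) / len(test_cases)
print(f"hit rate: {hit_rate:.0%}")
for (question, expected), hit in zip(test_cases, hits):
    if not hit:
        print("MISS:", question, "-> expected", expected)
```

Each miss points to a chunk to rewrite, retag, or split before pushing to the vector store, which feeds directly into the Iterate step.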

***

### ✅ Key Takeaways

* AI-ready data must be structured for retrieval — **Markdown and JSON** are the gold standards
* **Markdown** = flexible for unstructured data; **JSON** = precise for structured knowledge
* **raia Academy** makes it easy to transform, tag, and train without code
* raia supports both **OpenAI native vector stores** and external options via **Retrieval Skill**
* Great training data = better retrieval, fewer hallucinations, faster time to value
