# Agent Scraper Skill

<figure><img src="https://1905631084-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEwGkQrKpAAj0jhmC3fkJ%2Fuploads%2FOY87O8gnSlBM1kjOECet%2Fimage.png?alt=media&#x26;token=f0b0f614-5a63-4b0f-8a37-bae20bf74fdb" alt=""><figcaption></figcaption></figure>

## Set Up Multiple Scrapers

You can set up multiple scrapers, each giving your AI Agent a separate source to scrape information from.

<figure><img src="https://1905631084-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEwGkQrKpAAj0jhmC3fkJ%2Fuploads%2F10esblJ3YzWGVqNr1Nxo%2Fimage.png?alt=media&#x26;token=5fd530a6-8be9-4923-ad05-a6b5210ba69d" alt=""><figcaption></figcaption></figure>

## Fully Customize Your Agent's Scrape Config

You can choose and customize each of your agent's scraping presets.

<figure><img src="https://1905631084-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEwGkQrKpAAj0jhmC3fkJ%2Fuploads%2FpSMnIcB1tL5NMX9DNv2Z%2Fimage.png?alt=media&#x26;token=6a3895c0-23f2-44d4-b071-527e3e76aa2b" alt=""><figcaption></figcaption></figure>

## Crawler Type

The crawler type determines how your AI or automation tool navigates and extracts data from websites, which affects speed, accuracy, and compatibility. Different crawlers, such as Apify or Firecrawl, offer varying capabilities for structured data extraction, handling dynamic content, authentication, or large-scale scraping tasks. Choosing the right one ensures reliable data collection while minimizing errors, load issues, or website blocking.

<figure><img src="https://1905631084-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEwGkQrKpAAj0jhmC3fkJ%2Fuploads%2FJudYLH8NSr4eFRXBTAVJ%2Fimage.png?alt=media&#x26;token=ab1ae5f7-b55f-43e4-a79b-6d2b0bd6428b" alt=""><figcaption></figcaption></figure>

## Crawl Format

The crawl format determines how scraped website data is structured and delivered: Markdown for readability, JSON for structured processing, or HTML for raw page content. Choosing the right format ensures the data is usable for your specific needs, whether that is analysis, display, or integration into other systems.

<figure><img src="https://1905631084-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEwGkQrKpAAj0jhmC3fkJ%2Fuploads%2FUBusK3O1OofpuC15kC8l%2Fimage.png?alt=media&#x26;token=24c6e4b2-ff39-45d8-906f-c0c387b0d848" alt=""><figcaption></figcaption></figure>
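To make the difference between the three formats concrete, here is a minimal sketch that renders the same scraped page in each one. The function `render_page` and its parameters are hypothetical and exist only for illustration; they are not part of the product, and the actual output produced by the selected crawler may be structured differently.

```python
import json
from html import escape


def render_page(title, text, crawl_format):
    """Render one scraped page in the requested format (illustrative only)."""
    if crawl_format == "markdown":
        # Readable plain text with a heading.
        return f"# {title}\n\n{text}\n"
    if crawl_format == "json":
        # Structured key/value data for downstream processing.
        return json.dumps({"title": title, "text": text})
    if crawl_format == "html":
        # Markup; special characters are escaped so the output stays valid.
        return f"<h1>{escape(title)}</h1>\n<p>{escape(text)}</p>\n"
    raise ValueError(f"unknown crawl format: {crawl_format!r}")


for fmt in ("markdown", "json", "html"):
    print(f"--- {fmt} ---")
    print(render_page("Pricing", "Plans & tiers", fmt))
```

Markdown is usually the easiest format for an assistant to read as a document, while JSON is the better fit when another system needs to parse individual fields.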

## Page Limit and Max Depth

Page Limit and Max Depth are key settings in web scraping that help control the scope and efficiency of a crawl. **Page Limit** restricts the total number of pages scraped, preventing overload or unnecessary data collection, while **Max Depth** controls how far the crawler follows links from the starting page, ensuring it doesn't go too deep into irrelevant or unrelated content.

<figure><img src="https://1905631084-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEwGkQrKpAAj0jhmC3fkJ%2Fuploads%2FkTIOhysYO8HGuzj5QfTq%2Fimage.png?alt=media&#x26;token=83fa832d-44a5-4ceb-a091-d4f47a850ae7" alt=""><figcaption></figcaption></figure>
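The interaction between the two settings can be sketched as a breadth-first crawl over a small in-memory site. This is an illustration under stated assumptions, not the crawler's actual implementation: the `crawl` function, the `SITE` link map, and the convention that the starting page has depth 0 are all hypothetical.

```python
from collections import deque

# Hypothetical site: each path maps to the links found on that page.
SITE = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/setup", "/docs/api"],
    "/blog": ["/blog/post-1"],
    "/docs/setup": ["/docs/setup/advanced"],
    "/docs/api": [],
    "/blog/post-1": [],
    "/docs/setup/advanced": [],
}


def crawl(start_url, get_links, page_limit, max_depth):
    """Breadth-first crawl bounded by a page limit and a maximum link depth."""
    scraped = []
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth); the start page is depth 0
    while queue and len(scraped) < page_limit:
        url, depth = queue.popleft()
        scraped.append(url)
        if depth < max_depth:  # only follow links while below the depth cap
            for link in get_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return scraped


def get_links(url):
    return SITE.get(url, [])


# Max Depth is the binding constraint: only pages one link from the start.
print(crawl("/", get_links, page_limit=100, max_depth=1))
# Page Limit is the binding constraint: stop after four pages.
print(crawl("/", get_links, page_limit=4, max_depth=5))
```

Whichever limit is reached first ends the crawl, so a generous Page Limit has no effect when Max Depth is small, and the reverse is also true.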

## Scraping Frequency

Scraping Frequency defines how often the web scraping process runs for a given website. You can set it to manual (run only when triggered), daily, weekly, or monthly, depending on how often the site’s content changes. Choosing the right frequency ensures you capture updates without overloading the crawler or collecting unnecessary duplicate data.

<figure><img src="https://1905631084-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEwGkQrKpAAj0jhmC3fkJ%2Fuploads%2FBQpLwdqdnj38ICjlapCY%2Fimage.png?alt=media&#x26;token=9f3556f1-6974-4758-86c0-ccb74a9cd299" alt="" width="548"><figcaption></figcaption></figure>
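As a rough illustration of what each option means for scheduling, the sketch below computes the next run time from the previous one. The function `next_run` and the lowercase option strings are assumptions made for this example; how the platform actually schedules runs (time of day, time zone, month-end handling) is not documented here.

```python
import calendar
from datetime import datetime, timedelta


def next_run(last_run, frequency):
    """Return the next scheduled run, or None for manual scrapers."""
    if frequency == "manual":
        return None  # runs only when triggered by the user
    if frequency == "daily":
        return last_run + timedelta(days=1)
    if frequency == "weekly":
        return last_run + timedelta(weeks=1)
    if frequency == "monthly":
        # Move to the same day next month, clamping to the month's last day.
        year = last_run.year + last_run.month // 12
        month = last_run.month % 12 + 1
        last_day = calendar.monthrange(year, month)[1]
        return last_run.replace(year=year, month=month,
                                day=min(last_run.day, last_day))
    raise ValueError(f"unknown frequency: {frequency!r}")


last = datetime(2024, 1, 31, 9, 0)
for freq in ("manual", "daily", "weekly", "monthly"):
    print(freq, "->", next_run(last, freq))
```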

## Auto Upload Document

When enabled, this feature automatically sends the scraped content to the assistant as a document as soon as the scraping process finishes. This ensures the data is instantly available for review, processing, or further actions without requiring a manual upload.

<figure><img src="https://1905631084-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEwGkQrKpAAj0jhmC3fkJ%2Fuploads%2F8eAyj0nojTZqdlsqPMry%2Fimage.png?alt=media&#x26;token=e22d6f15-0487-4772-84b4-46401ee12620" alt="" width="545"><figcaption></figcaption></figure>

## Website Exclusion While Scraping

Excluding specific URLs during web scraping is important to avoid collecting irrelevant, sensitive, or duplicate content, helping ensure cleaner and more targeted data. It also reduces load on the crawler, speeds up the scraping process, and minimizes the risk of violating site policies or scraping restricted areas.

<figure><img src="https://1905631084-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEwGkQrKpAAj0jhmC3fkJ%2Fuploads%2FPGOrwXl1NMTcD4SZR6Kp%2Fimage.png?alt=media&#x26;token=10a74b00-e1e3-4bd6-b2f7-554062f0c326" alt=""><figcaption></figcaption></figure>
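The effect of an exclusion list can be sketched as a filter applied to candidate URLs before they are scraped. The glob-style patterns matched against the URL path, and the helper names `is_excluded` and `filter_urls`, are assumptions for illustration; the exclusion syntax the scraper actually accepts may differ.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def is_excluded(url, patterns):
    """Return True if the URL's path matches any glob-style exclusion pattern."""
    path = urlsplit(url).path or "/"
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def filter_urls(urls, patterns):
    """Keep only the URLs that no exclusion pattern matches."""
    return [url for url in urls if not is_excluded(url, patterns)]


# Hypothetical exclusions: an admin area, a login page, and PDF downloads.
exclusions = ["/admin/*", "/login", "*.pdf"]
candidates = [
    "https://example.com/",
    "https://example.com/admin/users",
    "https://example.com/login",
    "https://example.com/docs/guide.pdf",
    "https://example.com/docs/guide",
]
print(filter_urls(candidates, exclusions))
```

Filtering before the request is made is what saves crawl time and keeps restricted areas untouched, since an excluded page is never fetched at all.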
