MCP Server: Web scraping automation with AI

Agenty AI exposes a Model Context Protocol (MCP) server that gives AI assistants like Claude, Cursor, and any MCP-compatible client direct access to browser automation tools. You can capture screenshots, generate PDFs, scrape web pages, extract structured data, convert pages to Markdown and more using LLM models

In this article, I will covers how to connect to the Agenty’s MCP server and how to use each available tool.

Prerequisites

Before connecting, you will need:

An Agenty AI account and API key
An MCP-compatible client such as Claude Desktop, Cursor, Windsurf, or any tool that supports the Model Context Protocol
Node.js 18 or later if you are using the CLI or a local configuration file

How to Connect

Add the following configuration to your MCP client. For Claude Desktop, this goes in claude_desktop_config.json. For Cursor, it goes in your .cursor/mcp.json file.

{
  "mcpServers": {
    "agenty": {
      "url": "https://api.agenty.ai/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_API_KEY"
      }
    }
  }
}

Replace YOUR_API_KEY with the key from your Agenty AI > Settings > API Key. Once added, restart your client and the Agenty tools will appear in the tool list.

For clients that support OAuth or token-based login, you can authenticate directly through the Agenty AI sign-in flow without manually pasting a key.

Available Tools

The Agenty MCP server provides eleven tools covering the most common browser automation needs. Each tool accepts a URL or raw HTML as input along with optional parameters for controlling the browser behavior.

Markdown MCP

The markdown tool loads a page, strips navigation, footers, scripts, ads, and other boilerplate, and converts the remaining readable content to clean Markdown. It can resolve relative links to absolute URLs and remove data images.

This is particularly useful for feeding web content into LLMs, building RAG pipelines, or saving readable versions of articles.

Example prompt:

“Convert Web scraping - Wikipedia to Markdown”

“Get the Markdown content of this blog post: https://example.com/article”

Screenshot MCP

The screenshot tool launches a real browser, loads the page, and captures it as a PNG or JPEG image. It supports full-page screenshots, clipped regions, custom viewports, device emulation, and image manipulation like resize and rotate.

This is useful when you want a visual snapshot of a page, need to document a UI, or want to verify how a site looks at a specific screen size.

Example prompt:

“Take a full-page screenshot of https://example.com and return it as a PNG”

“Capture a mobile screenshot of https://agenty.ai using an iPhone 14 viewport”

Web Scraping MCP

The web scrape tool loads a URL in a real browser with full JavaScript execution and returns the rendered HTML. Unlike simple HTTP requests, it waits for dynamic content to load, making it reliable for single-page applications and JavaScript-heavy pages.

Use this when you need the raw HTML of a page exactly as a user would see it after all scripts have run.

Example prompt:

“Scrape the HTML of https://news.ycombinator.com and return the full rendered content”

“Scrape this page and wait 3 seconds for dynamic content to load: https://example.com/dashboard”

Web Crawling MCP

The links crawling tool renders a page and extracts every hyperlink found in the DOM. It returns a flat list of all anchor hrefs, which you can use to build a crawl queue, audit internal linking, or discover content.

Example prompt:

“Find all links on https://agenty.ai”

“Extract every link from https://example.com/blog and give me a list of article URLs”

PDF MCP

The PDF tool renders a page in a real browser and exports it as a PDF. You can set the page format (A4, Letter, etc.), enable landscape mode, set margins, show or hide backgrounds, and provide custom headers and footers.

Example prompt:

“Generate a PDF of https://agenty.ai in A4 format with print backgrounds enabled”

“Create a landscape PDF of https://example.com/report”

Data Extraction MCP

The extract tool renders a page and extracts structured data using CSS selectors or XPath. It is suited for pulling specific fields like product prices, review counts, job titles, or any repeating element from a rendered page.

Example prompt:

“Extract the product names and prices from https://example.com/products”

“Get all the headlines from https://news.ycombinator.com”

Sitemap MCP

The sitemap crawler tool reads a website’s sitemap.xml, follows sitemap index files, and returns a complete list of all discovered URLs. It supports exclusion patterns to filter out certain paths.

Use this to quickly understand the full structure of a website or to build a crawl list for a large site.

Example prompt:

“Find all URLs in the sitemap for https://agenty.ai”

“Get the sitemap for https://example.com and exclude any URLs containing /blog/”

Redirect Tracing MCP

The redirects capture tool follows all HTTP redirects for a given URL and returns the full chain including each intermediate URL, status code, and final destination.

This is helpful for debugging broken links, verifying canonical URLs, and auditing affiliate or shortened links.

Example prompt:

“Trace all redirects for https://bit.ly/some-link”

“Show me the redirect chain for https://example.com/old-page”

Email Scraper MCP

The emails scraper tool loads one or more URLs (up to ten at a time) and extracts all email addresses found on the rendered page. It handles obfuscated addresses and dynamically loaded contact sections.

Example prompt:

“Find all email addresses on https://example.com/contact”

“Scrape emails from these three pages: https://site1.com, https://site2.com, https://site3.com”

Content MCP

The content tool returns the complete rendered HTML of a page after JavaScript execution. It is similar to scrape but optimized for use cases where you need the final DOM state rather than a scrape pipeline.

Example prompt:

“Get the rendered HTML content of https://example.com/app”

Snapshot MCP

The snapshot tool captures the accessibility tree of a page — a structured list of all interactive and readable elements including buttons, links, inputs, and headings. This is designed for AI agents that need to navigate or reason about a page without processing raw HTML.

Example prompt:

“Take an accessibility snapshot of https://example.com so I can understand the page structure”

Example Prompts

Here are a few multi-step examples that combine tools:

Audit a website’s content:
“Get the sitemap for https://example.com, then convert the top 5 pages to Markdown and summarize each one”

Generate a visual report:
“Take a screenshot of https://agenty.ai on mobile and desktop viewports and show me both”

Build a contact list:
“Find all links on https://example.com/team, then scrape email addresses from each team member page”

Verify link health:
“Extract all links from https://example.com/blog and trace the redirect chain for any that don’t return 200”

Archive a page:
“Convert https://example.com/article to Markdown and generate a PDF version, both with absolute links”

Supported Parameters

All navigation-based tools (screenshot, pdf, scrape, content, markdown, links, extract, redirects) share a common set of browser configuration options:

Parameter	Type	Description
`url`	string	The page URL to load. Required if `html` is not provided.
`html`	string	Raw HTML string to render instead of loading a URL.
`waitFor`	string or number	Wait for a CSS selector, a number of milliseconds, or a network event before capturing.
`viewport`	object	Set `width`, `height`, `deviceScaleFactor`, `isMobile`, `hasTouch`, and `isLandscape`.
`device`	string	Emulate a named device such as iPhone 14 or iPad Pro.
`userAgent`	string	Override the browser user agent string.
`cookies`	array	Inject cookies before the page loads.
`blockAds`	boolean	Block ad network requests. Enabled by default for scraping tools.
`blockTrackers`	boolean	Block tracking and analytics scripts.
`goToOptions`	object	Control navigation `timeout` (ms) and `waitUntil` strategy (`load`, `domcontentloaded`, `networkidle0`, `networkidle2`).
`setExtraHTTPHeaders`	object	Add custom request headers as key-value pairs.
`rejectRequestPattern`	array	Block requests matching these URL patterns.
`requestInterceptors`	array	Intercept requests matching a pattern and return a custom response.
`authenticate`	object	Set HTTP `username` and `password` for pages behind basic auth.
`anonymous`	object	Route through a proxy and skip resource types for anonymous browsing.

Screenshot-specific parameters:

Parameter	Type	Description
`options.fullPage`	boolean	Capture the full scrollable page. Defaults to `true`.
`options.type`	string	Image format: `png` or `jpeg`. Defaults to `png`.
`options.quality`	number	JPEG quality from 0 to 100.
`options.clip`	object	Capture a specific region with `x`, `y`, `width`, and `height`.
`options.omitBackground`	boolean	Make the background transparent (PNG only).
`manipulate`	object	Post-process the image with `resize`, `rotate`, `flip`, and `flop`.
`responseType`	string	Return the image as a `buffer` (default) or a hosted `url`.

When using the screenshot tool via MCP, the result is returned as a hosted URL by default. When calling the API directly, responseType defaults to buffer. Set responseType: 'url' to receive a hosted link instead.

PDF-specific parameters:

Parameter	Type	Description
`options.format`	string	Page size: `A4`, `Letter`, `Legal`, `Tabloid`, `Ledger`, `A0`–`A6`.
`options.landscape`	boolean	Render in landscape orientation.
`options.margin`	object	Set `top`, `bottom`, `left`, and `right` margins.
`options.printBackground`	boolean	Include CSS backgrounds in the output.
`options.displayHeaderFooter`	boolean	Show header and footer on each page.
`options.headerTemplate`	string	HTML template for the page header.
`options.footerTemplate`	string	HTML template for the page footer.
`options.pageRanges`	string	Print specific pages, e.g. `1-5`.
`options.scale`	number	Scale the page content.
`emulateMedia`	string	Render as `screen` or `print` media type.
`responseType`	string	Return the PDF as a `buffer` (default) or a hosted `url`.

When using the PDF tool via MCP, the result is returned as a hosted URL by default. When calling the API directly, responseType defaults to buffer. Set responseType: 'url' to receive a hosted link instead.

Rate Limits and Fair Use

Each API key has a monthly request limit based on your plan. Long-running pages with heavy JavaScript or large PDFs consume more resources. Use blockAds and blockTrackers to speed up requests and reduce bandwidth usage on scraping workloads.

The Agenty AI request and response schema is intentionally kept compatible with the most widely used browser automation APIs including Browserless, Firecrawl, Cloudflare Browser Rendering, and similar services. If you are already using one of these providers, you can switch to Agenty AI by updating your base URL and API key without rewriting your existing requests.

Provider	Migration
Browserless	Change base URL and API key. All parameters are compatible.
Firecrawl	Change base URL and API key. Scrape, markdown, and crawl endpoints map directly.
Cloudflare Browser Rendering	Change base URL and auth header. Screenshot and content parameters are compatible.
Puppeteer HTTP APIs	Most goToOptions and page options map 1:1.

MCP Server: Web scraping automation with AI

Prerequisites

How to Connect

Available Tools

Markdown MCP

Screenshot MCP

Web Scraping MCP

Web Crawling MCP

PDF MCP

Data Extraction MCP

Sitemap MCP

Redirect Tracing MCP

Email Scraper MCP

Content MCP

Snapshot MCP

Example Prompts

Supported Parameters

Rate Limits and Fair Use

Top 10 Apify Alternatives for Web Scraping in 2026

Web Scraping vs API: Which Is Right for Your Data Project?

Top 10 Web Scraping Practice Sites for 2026

Master Web Crawling Python: Tools & Best Practices