Agenty 2.0

Fast, scalable and no-code web scraping tool

  • 2x better scraping agent performance, with automatic ad and tracker blocking
  • Puppeteer and Playwright integration in the scraping agent, so you can run external code written in Node.js across platforms
  • Smart proxies to anonymize each request from thousands of running browsers and never get blocked
  • New /browser API for screenshots, PDFs and content extraction with simple GET/POST requests from any programming language

New Browser API

The new /browser API lets you capture screenshots, convert websites to PDF, and extract content with simple GET/POST requests.

Browser API URL

https://browser.agenty.com

Endpoints:

  1. /api/screenshot
  2. /api/content
  3. /api/scrape
  4. /api/pdf

See the complete documentation here

Or try it here: https://cloud.agenty.com/browser
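
As a rough illustration of calling these endpoints with a simple GET request, the sketch below only builds the request URL from the base URL and endpoint paths listed above. The helper name `buildBrowserApiUrl` and the query parameter name `url` are assumptions made for this example, not confirmed names; check the Browser API documentation for the actual parameters and authentication.

```javascript
// Base URL and endpoint paths are taken from the announcement above.
const BROWSER_API_BASE = 'https://browser.agenty.com';
const BROWSER_ENDPOINTS = ['screenshot', 'content', 'scrape', 'pdf'];

// Build a GET request URL for one of the /browser endpoints.
// ASSUMPTION: parameter names passed in `params` (for example `url`)
// are hypothetical placeholders, not confirmed API parameters.
function buildBrowserApiUrl(endpoint, params = {}) {
  if (!BROWSER_ENDPOINTS.includes(endpoint)) {
    throw new Error(`Unknown endpoint: ${endpoint}`);
  }
  const requestUrl = new URL(`/api/${endpoint}`, BROWSER_API_BASE);
  for (const [name, value] of Object.entries(params)) {
    // searchParams takes care of percent-encoding the values
    requestUrl.searchParams.set(name, value);
  }
  return requestUrl.toString();
}

console.log(buildBrowserApiUrl('screenshot', { url: 'https://example.com' }));
```

The resulting string can be passed to any HTTP client; building it with the `URL` class keeps the target page address correctly encoded.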

API Changes

The API routes have been upgraded from v1 to v2.

Old URL

https://api.agenty.com/v1

New URL

https://api.agenty.com/v2

There are some other changes in the agent and list level APIs. Please see all the changes in our new API documentation: https://www.agenty.com/docs/api-reference
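
If you have v1 URLs stored in scripts or configuration, a small helper like the one below can rewrite the base path. This is only a sketch: the function name `upgradeApiUrl` and the `/agents` path in the usage line are hypothetical, and because individual routes also changed, rewriting the prefix alone may not be enough for every call.

```javascript
// Rewrite an Agenty v1 API URL to the v2 base path.
// Only URLs on api.agenty.com whose path starts with /v1 are changed.
function upgradeApiUrl(v1Url) {
  const parsed = new URL(v1Url);
  const isV1Path = /^\/v1(\/|$)/.test(parsed.pathname);
  if (parsed.hostname === 'api.agenty.com' && isV1Path) {
    parsed.pathname = parsed.pathname.replace(/^\/v1/, '/v2');
  }
  return parsed.toString();
}

// `/agents` is a hypothetical path used only for illustration.
console.log(upgradeApiUrl('https://api.agenty.com/v1/agents?limit=10'));
```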

Scraping agent changes

System fields

The default field has a new option called System fields for selecting system-generated fields, for example to capture the current page URL, the statusCode, or a screenshot of the webpage.

You can also use the default field to set static values. For example, if I want to add a field with the static value “amazon”, I can do that with default fields.

JSON Paths

The JSON Path engine has been upgraded to support the latest JSON queries. For example, recursive descent and wildcard search are now supported for extracting data from JSON APIs.

JSONPath    Description
. or []     Child operator
..          Recursive descent. JSONPath borrows this syntax from E4X (ECMAScript for XML).
*           Wildcard. All objects/elements regardless of their names.
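
To make the recursive descent and wildcard operators concrete, here is a small hand-written sketch of what they select. It is not the Agenty JSON Path engine and does not parse path expressions; the sample JSON and the helper names `recursiveDescent` and `wildcard` are invented for illustration.

```javascript
// Sample JSON, invented for this example.
const sampleJson = {
  store: {
    book: [
      { title: 'A', price: 8.95 },
      { title: 'B', price: 12.99 }
    ],
    bicycle: { color: 'red', price: 19.95 }
  }
};

// Recursive descent (like $..name): collect every value stored
// under the key `name`, at any depth of the document.
function recursiveDescent(node, name, found = []) {
  if (node !== null && typeof node === 'object') {
    for (const [key, value] of Object.entries(node)) {
      if (key === name) found.push(value);
      recursiveDescent(value, name, found);
    }
  }
  return found;
}

// Wildcard (like [*] or .*): all direct children of an object
// or array, regardless of their names.
function wildcard(node) {
  return node !== null && typeof node === 'object' ? Object.values(node) : [];
}

// Roughly what $..price selects
console.log(recursiveDescent(sampleJson, 'price'));
// Roughly what $.store.book[*].title selects
console.log(wildcard(sampleJson.store.book).map(book => book.title));
```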

See complete documentation here

Wait for

We have introduced some new options in the wait for element settings of the scraping agent.

These options let you configure your agent to wait for an element to become visible, to become hidden, or for a fixed timeout. This helps with scraping dynamic websites, where the actual content is rendered a few seconds after page load by JavaScript and other client-side frameworks.

  • Visible - Agenty will wait for the element to be visible. For example, you can use this to wait until your products have been properly loaded from all network requests before starting the scraping.
  • Hidden - Agenty will wait for the element to be hidden. For example, you can use this to wait for spinner or loader elements to disappear before scraping the data.
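
If you write external Puppeteer code, the same two conditions can be expressed with `page.waitForSelector` and its `visible` and `hidden` options. The sketch below shows the equivalent Puppeteer calls, not how Agenty implements the setting; the helper name `waitForContent` and the selectors in the usage comment are hypothetical.

```javascript
// Wait until the content is ready before scraping.
// `page` is expected to be a Puppeteer Page (or any object that
// offers a compatible waitForSelector method).
async function waitForContent(page, contentSelector, loaderSelector, timeoutMs = 30000) {
  // Visible: resolves once the element is present and visible
  await page.waitForSelector(contentSelector, { visible: true, timeout: timeoutMs });
  // Hidden: resolves once the element is hidden or removed from the DOM
  await page.waitForSelector(loaderSelector, { hidden: true, timeout: timeoutMs });
}

// Usage with hypothetical selectors:
//   await waitForContent(page, '.product', '.spinner', 10000);
```

Note that the timeout is given in milliseconds, and both waits reject with a timeout error if the condition is not met in time.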

Pagination improvement

We have added a new smart change detection algorithm to the scraping agent. While paginating through infinite scroll, it scrapes only new items to avoid duplicates, and it exits the pagination loop if the data starts repeating.

A new container option in infinite scroll pagination lets you specify a particular section/div to scroll instead of the full page.

When no container is specified, Agenty will scroll the full page to try loading more items when available.
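
The idea of keeping only new items and stopping when the data repeats can be sketched as follows. This is a simplified illustration under stated assumptions, not Agenty's actual algorithm: `fetchPage` is a hypothetical callback that returns the items visible after the n-th scroll, and `keyOf` is a hypothetical function that returns a unique key for an item.

```javascript
// Collect only unseen items across scrolls and stop as soon as a
// scroll yields nothing new (the data has started repeating).
function scrapeNewItemsOnly(fetchPage, keyOf, maxPages = 50) {
  const seenKeys = new Set();
  const results = [];
  for (let pageIndex = 0; pageIndex < maxPages; pageIndex++) {
    let freshCount = 0;
    for (const item of fetchPage(pageIndex)) {
      const key = keyOf(item);
      if (seenKeys.has(key)) continue; // duplicate, skip it
      seenKeys.add(key);
      results.push(item);
      freshCount++;
    }
    if (freshCount === 0) break; // nothing new: exit the pagination loop
  }
  return results;
}

// Usage with invented data: the third scroll repeats an old item.
const scrolls = [[{ id: 1 }, { id: 2 }], [{ id: 2 }, { id: 3 }], [{ id: 3 }]];
const items = scrapeNewItemsOnly(n => scrolls[n] || [], item => item.id);
console.log(items.map(item => item.id));
```

The `maxPages` cap is a safety net so the loop also ends on pages that keep producing new items indefinitely.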

Lazy load

The new lazy load feature in the scraping agent scrolls over the page after it loads, to trigger lazy loading of images, videos, news, etc.

If you use Agenty to capture screenshots or to scrape below-the-fold content from dynamic websites, enable this lazy load option to scroll over the page automatically.

Iframe scraping

You can now specify an iframe selector to extract data from an iframe.

By default, Agenty scrapes the data from the mainFrame when no selector is specified.

Timeout

The timeout option is now in milliseconds. We’ve updated all existing agents to use the new timeout setting, and the Chrome extension has been configured to create new agents with the timeout in milliseconds.

Scripts

The scripts feature has been completely replaced with a new JavaScript engine to better integrate with our Browser API and improve testability.

You can use Post-processing.js to write a custom function that modifies the agent result. You can now also choose to run your post-processing script in parallel while the agent is running…

Here is a quick example showing an if/else use case.

Default scraping result

This is my default web scraping agent result from this sandbox page - Simple products list example

Now, let’s say I want to change the brand name to Apple wherever it’s Sony on the web page, or apply more complex logic, such as setting a value in empty cells where the brand name is not specified.

Script

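Here, I’ve written a custom function to change the brand name from Sony to Apple.

 // Receives the agent result as an array of rows and returns the modified rows
 module.exports = function modifyOutput(data) {
     data.forEach(row => {
         // Strict comparison avoids accidental type coercion
         if (row.brand === 'Sony') {
             row.brand = 'Apple';
         }
     });
     return data;
 }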

After script result
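
To illustrate the effect on a few rows, including the empty-cell case mentioned earlier, here is a sketch that extends the if with an else branch. The sample rows and the placeholder value 'Unknown' are invented for this example; only the `brand` field and the `modifyOutput` name come from the script above.

```javascript
// Same shape as the script above, with an extra branch for rows
// where the brand is missing or blank.
function modifyOutput(data) {
  data.forEach(row => {
    if (row.brand === 'Sony') {
      row.brand = 'Apple';
    } else if (typeof row.brand !== 'string' || row.brand.trim() === '') {
      // ASSUMPTION: 'Unknown' is a placeholder chosen for illustration
      row.brand = 'Unknown';
    }
  });
  return data;
}

module.exports = modifyOutput;

// Invented sample rows, not the actual sandbox page data
const sampleRows = [
  { name: 'Product 1', brand: 'Sony' },
  { name: 'Product 2', brand: '' },
  { name: 'Product 3', brand: 'Samsung' }
];
console.log(modifyOutput(sampleRows));
```

Rows with any other brand pass through unchanged, so the script is safe to run on mixed data.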

Logs and files

The logging feature has been improved to show logs in plain text with color coding: warnings are displayed in orange and errors in red.

Additionally, we’ve added a new Files section above the logs that lists the downloadable files created by Agenty while executing the job.

Gzip compressed download

Result downloads are now faster: Gzip compression has been enabled, so the job result is downloaded in compressed form.

You can still download your agent’s result in plain text CSV format by clicking the CSV download button under the Download drop-down.
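
If you handle a Gzip-compressed result in your own Node.js code, the built-in `zlib` module can decompress it. The sketch below assumes you already have the compressed bytes in a Buffer; how the bytes are delivered is not specified here, and many HTTP clients decompress gzip-encoded responses automatically. The helper name `gunzipToText` and the sample CSV are invented for illustration.

```javascript
const zlib = require('zlib');

// Decompress a gzip-compressed buffer into UTF-8 text.
function gunzipToText(compressedBuffer) {
  return zlib.gunzipSync(compressedBuffer).toString('utf8');
}

// Stand-in for a downloaded result: compress a small CSV locally,
// then decompress it again to show the round trip.
const sampleCsv = 'name,brand\nProduct 1,Sony\n';
const sampleCompressed = zlib.gzipSync(Buffer.from(sampleCsv, 'utf8'));
console.log(gunzipToText(sampleCompressed) === sampleCsv);
```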

Post processing functions

Some of the least-used post-processing functions have been removed to keep the list short and focused on quick data transforms, such as simple replace or append operations, that don’t require writing custom JavaScript functions.

Relative to absolute links

The scraping agent now extracts absolute links by default, so there is no need to use the Insert function to convert relative URLs into absolute links, as we did in Agenty v1.
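
For reference, this is the kind of conversion that resolving a relative link against the page URL performs. The sketch uses the standard `URL` class and is not Agenty's internal code; the helper name `toAbsoluteLinks` and the example.com addresses are invented for illustration.

```javascript
// Resolve each link against the URL of the page it was scraped from.
// Links that are already absolute are returned unchanged.
function toAbsoluteLinks(links, pageUrl) {
  return links.map(link => new URL(link, pageUrl).href);
}

const scrapedLinks = ['/products/1', 'item?id=2', 'https://cdn.example.com/a.png'];
console.log(toAbsoluteLinks(scrapedLinks, 'https://example.com/shop/list'));
```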

Change detection improvement

The change detection agent has been upgraded with a new form submit feature for adding custom actions, such as clicking a button, scrolling down, or logging in to a website, before selecting an element for change detection.

Additionally, the comparison view has been improved with a GitHub-style HTML comparison to find differences line by line.

The change detection software now also supports batch URL monitoring, so you can monitor bulk URLs in a single agent with a common selector/element to watch and get an alert for changes.

Native integrations

We now have native apps for Shopify, Dropbox and Google that connect an authorized Agenty with your account for ETL processes, to automatically extract, transform and load the data to your integrated location.

So, if you were using an integration on an older version of Agenty, make sure to re-connect it using the new Connection feature, and then select that connection on the integration attached to the agent.

Chrome extension - v2.9.3

The Chrome extension has been upgraded to version 2.9.3 with many improvements:

  • URL field is added by default on new agents
  • Fixed a bug that broke Preview/Export CSV for 1000+ rows
  • Improved CSS selector finder

Web Scraping with Chrome Extensions

Note: If you get an error saying “Unauthorized request”, please log out and log in again. We are working on a fix to increase the session timeout for the Chrome extension.

If you have any feedback or suggestions, open an issue on our public roadmap on GitHub or contact support.

Sign up now to get 100 page credits free

14-day free trial, no credit card required!