When you scrape a website with Agenty’s scraping tool, cookie consent prompts are often displayed. Sometimes we must dismiss or accept these prompts before the web scraping agent can continue to extract data or capture screenshots automatically.
Cookie consent prompts are a standard part of the e-commerce experience, but they have to be answered one at a time in the browser, which leads to a lot of tedious clicking. On some websites, especially those in the European region, accepting is mandatory: we must click the “Accept” button to continue, or the site won’t load the products or pages you want to scrape.
There are multiple ways in Agenty to click on a button to accept/reject cookie consent automatically.
- Native option to turn on/off
- Using the login commands to perform one time action
- Using JavaScript to click on button
- Using Playwright/Puppeteer code in developer mode
Native option to accept cookie consent
The web scraping and crawling agents have a native on/off option that specifies whether Agenty should click the accept cookies button when it visits a website for the first time.
Agent > configuration > browser settings
When enabled, Agenty looks for any active popup or modal with a button such as ‘accept cookie’ or ‘allow cookies’, clicks it to give consent, and then continues crawling.
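To make the idea of matching a consent button by its label more concrete, here is a minimal sketch. It is not Agenty’s actual implementation: the pattern list and the function names looksLikeConsentLabel and findConsentButton are hypothetical and chosen only for illustration.

```javascript
// Hypothetical label patterns, for illustration only (not Agenty's real list).
const CONSENT_PATTERNS = [
  /accept( all)? cookies?/i,
  /allow( all)? cookies?/i,
  /^accept$/i
];

// Returns true when a button label looks like a cookie consent action.
function looksLikeConsentLabel(text) {
  const label = (text || '').trim();
  return CONSENT_PATTERNS.some((pattern) => pattern.test(label));
}

// Scans clickable elements under `root` (e.g. `document`) and returns the
// first one whose text matches a consent pattern, or null if none does.
function findConsentButton(root) {
  const candidates = root.querySelectorAll('button, a, [role="button"]');
  return Array.from(candidates).find((el) => looksLikeConsentLabel(el.textContent)) || null;
}
```

In a browser, findConsentButton(document) would return the matching element (or null), which could then be clicked. Matching by text is more tolerant of different consent vendors than a fixed selector, but it can also match unrelated links, so the patterns should stay narrow.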
Accept cookies with commands
Cookie consent prompts usually appear only once and won’t show up on subsequent pages in the same session. That makes them a perfect candidate for automation through the login feature, because the login commands run only once, before scraping starts for the given input URLs.
The login feature lets us execute several commands in sequence to perform an interactive action on a website, e.g. log in or select a location, region, or zip code.
Follow these steps to simulate a click to accept or reject cookies.
- Add the navigate command to open the website home page or any other page.
- Add the click command to click the accept or reject button.
Accept cookies with JavaScript
We can also inject a small JavaScript function in the waitFor option in Agenty to click the accept cookies button, close a modal, etc.
This lets us check and execute our script after each page load, as opposed to the login feature, which runs only once.
Here is example code that clicks the accept cookie button if one is found after the page loads:
var cookieBtn = document.querySelector('#onetrust-accept-btn-handler');
if (cookieBtn) {
  cookieBtn.click();
}
Note that I use if to check whether the cookie button is present. querySelector returns null when nothing matches, and calling click() on null would throw a JavaScript error.
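Different websites use different consent banners, so a single selector may not be enough. The sketch below extends the same null-check idea to a list of selectors and clicks the first one that matches. Only '#onetrust-accept-btn-handler' comes from the example above; the other selectors and the clickFirstMatch name are hypothetical placeholders to replace with selectors from the sites you scrape.

```javascript
// Candidate selectors, tried in order. Only the first one comes from the
// example above; the others are hypothetical placeholders.
const CONSENT_SELECTORS = [
  '#onetrust-accept-btn-handler',
  '#accept-cookies',
  'button[data-consent="accept"]'
];

// Clicks the first element matched by any selector.
// Returns the selector that matched, or null when no button was found.
function clickFirstMatch(doc, selectors) {
  for (const selector of selectors) {
    const button = doc.querySelector(selector);
    if (button) {
      button.click();
      return selector;
    }
  }
  return null;
}

// Run only where a DOM is available (i.e. inside the page).
if (typeof document !== 'undefined') {
  clickFirstMatch(document, CONSENT_SELECTORS);
}
```

Returning the matched selector makes it easy to log which banner was found on each page.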
Accept cookies with Puppeteer and Playwright
If you are using Agenty’s developer mode, you can add this code in the page.evaluate function, called after page.goto, to accept or dismiss the cookie consent.
// Go to the `url` from input request, extract title and return the results
export default async ({ page, request }) => {
  const response = await page.goto(request.url);
  console.log(response.status());
  const pageTitle = await page.title();

  // Accept cookies
  await page.evaluate(() => {
    var cookieBtn = document.querySelector('#onetrust-accept-btn-handler');
    if (cookieBtn) {
      cookieBtn.click();
    }
  });

  return {
    data : { title : pageTitle },
    type : 'application/json'
  };
};
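Consent banners are often rendered a moment after the page loads, so an immediate querySelector can miss them. As an alternative sketch, the helper below waits briefly for the button and then clicks it, using page.waitForSelector and page.click, which exist in both Puppeteer and Playwright. The acceptCookies name and the 3000 ms default timeout are assumptions for illustration, not part of Agenty.

```javascript
// Waits up to `timeoutMs` for the consent button, then clicks it.
// Returns true if the button was clicked, false otherwise.
// `acceptCookies` and the default timeout are illustrative assumptions.
async function acceptCookies(page, selector, timeoutMs = 3000) {
  try {
    await page.waitForSelector(selector, { timeout: timeoutMs });
    await page.click(selector);
    return true;
  } catch (err) {
    // The banner did not appear in time (or the click failed): carry on.
    return false;
  }
}
```

Inside the developer-mode function, it could be called right after page.goto, for example: await acceptCookies(page, '#onetrust-accept-btn-handler'). Because the helper returns false instead of throwing, pages without a banner continue normally.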