Scraping iframes with Puppeteer

Scraping data from iframes can be tricky because you need to make sure the iframe has loaded completely in the webpage. If the iframe is not part of the initial HTML document but is injected by JavaScript or another technique, your Puppeteer script should wait for the iframe's dynamic content to render before running the scraping function.
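
When the iframe is injected by JavaScript after the initial load, the frame may not exist yet at the moment the script looks for it. One way to handle that, shown here only as a minimal sketch, is to poll page.frames() until a matching frame appears. The helper name waitForFrameMatching and its timeoutMs and intervalMs options are assumptions made for this illustration and are not part of Puppeteer; the frame name 'iframeResult' in the usage comment is the one used in the example later in this article.

```javascript
// Hypothetical helper (not a Puppeteer API): poll page.frames() until a frame
// satisfying `predicate` shows up, or return null once `timeoutMs` has elapsed.
async function waitForFrameMatching(page, predicate, { timeoutMs = 10000, intervalMs = 100 } = {}) {
    const deadline = Date.now() + timeoutMs;
    while (true) {
        // page.frames() returns the frames currently attached to the page
        const frame = page.frames().find(predicate);
        if (frame) return frame;
        if (Date.now() >= deadline) return null;
        // Sleep briefly before checking again
        await new Promise(resolve => setTimeout(resolve, intervalMs));
    }
}

// Usage inside a script that already has a Puppeteer `page`:
// const frame = await waitForFrameMatching(page, f => f.name() === 'iframeResult');
// if (frame) await frame.waitForSelector('h2');
```

Recent Puppeteer releases also ship a built-in page.waitForFrame() that may serve the same purpose; check the documentation for the version you run.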

In this article I will show you how to scrape data from iframes using Agenty’s Puppeteer API: wait for the iframe to load, then extract the data once a selector inside the iframe is present.

What is an iframe

An iframe is an HTML page embedded inside another page of a website. It is declared with the <iframe> tag in HTML and is mostly used to render external content on a website.

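For example, an iframe element in a page's HTML can look like <iframe name="iframeResult" src="https://example.com/embedded.html"></iframe> (the src value here is only a placeholder).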

Scraping iframe with Agenty

To scrape a website that contains an iframe with a scraping agent, find the iframe's CSS selector and use it to identify the iframe element:

  1. Find the CSS selector of the iframe using Agenty’s Chrome extension or by inspecting the website's HTML.
  2. Add this selector on the scraping agent page under the iframe option:
     • Go to your scraping agent page > Configuration tab
     • Click on Wait for in the sidebar
     • Enter the CSS selector in the ‘Iframe Selector’ option
  3. Then, create your CSS selectors from the main web page using Agenty’s Chrome extension.
  4. Add those CSS selectors to your scraping agent.
  5. Save the agent and run it.

Scraping iFrame using Puppeteer

If you are using Agenty developer mode, you can also write your own Puppeteer code to scrape data from a page that contains an iframe. The script needs to wait for the iframe to render on the page and then continue with extracting the data we want from it.

  • Navigate to the page using page.goto()
  • Find the iframe by name() or url()
  • Wait for a selector to ensure the iframe has loaded, using frame.waitForSelector('selector here')
  • Extract data, capture screenshot etc.

```javascript
// Read the `url` from the request, go to the page, wait for the iframe,
// capture a screenshot and return the results
module.exports = async ({ page, request }) => {
    const response = await page.goto(request.url, {
        waitUntil: 'networkidle2',
        timeout: 30000
    });

    // Find the iframe by name() or url(); frames() is synchronous, so no await is needed
    const frame = page.frames().find(f => f.name() === 'iframeResult');

    if (frame) {
        // Wait for an h2 inside the iframe
        await frame.waitForSelector('h2');
    } else {
        console.log('iFrame not found');
    }

    console.log(`statusCode : ${response.status()}`);

    // Capture the screenshot
    await page.screenshot({ path: 'iframe-screenshot.png' });

    return {
        data: {},
        type: 'application/json'
    };
};
```

Try the code here - https://cloud.agenty.com/browser
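
The script above returns an empty data object. As a rough sketch of the remaining "Extract data" step, the helpers below find a frame by name() or url() and read text out of it. The names findFrame, extractText and scrapeIframeHeadings, the urlPart option and the headings field are assumptions made for this illustration; only page.goto(), page.frames(), frame.name(), frame.url(), frame.waitForSelector() and frame.$$eval() are Puppeteer calls.

```javascript
// Find a frame by its exact name or by a substring of its URL.
// `name` and `urlPart` are illustrative options, not Puppeteer parameters.
function findFrame(page, { name, urlPart } = {}) {
    const match = page.frames().find(f =>
        (name !== undefined && f.name() === name) ||
        (urlPart !== undefined && f.url().includes(urlPart))
    );
    return match || null;
}

// Wait for `selector` inside the frame, then return the trimmed text of every match.
async function extractText(frame, selector, timeout = 30000) {
    await frame.waitForSelector(selector, { timeout });
    return frame.$$eval(selector, nodes => nodes.map(n => n.textContent.trim()));
}

// Same shape as the script above, but returning the h2 texts found in the iframe.
async function scrapeIframeHeadings({ page, request }) {
    await page.goto(request.url, { waitUntil: 'networkidle2', timeout: 30000 });

    const frame = findFrame(page, { name: 'iframeResult' });
    const headings = frame ? await extractText(frame, 'h2') : [];

    return {
        data: { headings },
        type: 'application/json'
    };
}
```

In Agenty developer mode the function would presumably be exported with module.exports = scrapeIframeHeadings, following the pattern of the script above.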


Find iFrame by CSS selector

Sometimes the HTML defines no iframe name or fixed URL that the page.frames().find() method could match. In that case, we can combine an elementHandle with contentFrame() to find the iframe using a CSS selector.

For example, this code works the same as above, but it uses the ID CSS selector #iframeResult to find the element instead of the frame name.

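```javascript
// Locate the <iframe> element by CSS selector, then get the frame it hosts
const elementHandle = await page.$('#iframeResult');
const frame = await elementHandle.contentFrame();
// Wait for an h2 inside the iframe
await frame.waitForSelector('h2');
```

[https://cloud.agenty.com/browser](https://cloud.agenty.com/browser)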
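
page.$() resolves to null when nothing matches the selector, and contentFrame() resolves to null when the matched element is not an iframe, so the short version can throw on a page that differs from what was expected. A defensive sketch follows; frameFromSelector is a hypothetical helper name chosen for this illustration, not a Puppeteer API.

```javascript
// Hypothetical helper: resolve the frame hosted by the <iframe> element that
// matches `selector`. Returns null when no element matches or when the
// matched element is not an iframe.
async function frameFromSelector(page, selector) {
    const elementHandle = await page.$(selector);   // null when nothing matches
    if (!elementHandle) return null;
    const frame = await elementHandle.contentFrame(); // null for non-iframe elements
    return frame || null;
}

// Usage inside a script that already has a Puppeteer `page`:
// const frame = await frameFromSelector(page, '#iframeResult');
// if (frame) {
//     await frame.waitForSelector('h2');
// } else {
//     console.log('iFrame not found');
// }
```

Any CSS selector can be passed the same way, including a class selector when the iframe has no ID.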
