Infinite Scrolling, Load More and Next Click Pagination in Web Scraping

Pagination is a common technique web developers use to display a large set of products or items across search or list pages, instead of loading the entire set in a single page load.

Setting up pagination in Agenty to click the next button (or to handle infinite scroll or load more) and scrape multiple pages is easy and, in most cases, doesn’t require any technical skills.

In this article, I will discuss the different types of pagination used on websites, along with some pro techniques, and show how to configure your web scraping agent to paginate automatically and scrape data from paginated websites.

Options

  • Enable pagination : True/False
  • Pagination type : Click, Infinite-Scroll or Load-More — The type of pagination you want to run in your scraping agent
  • Next page selector : The unique CSS selector of Next button — The agent will click on that button to paginate until that button is hidden or disabled
  • Script : Advanced JavaScript expression that lets developers write their own pagination code to handle complex sites.
  • Page limits : Maximum number of pages to paginate. The limit can be any number, such as 100 or 1000, but the web scraper exits pagination early if the “Next” button is not found, is hidden or disabled, or the end of the page is reached. In other words, scraping continues until it reaches the page limit you set or the next button becomes invisible or disabled on the web page.
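The stopping rule described under Page limits can be pictured as a simple loop. The sketch below only illustrates that rule and is not Agenty's actual implementation: `runPagination`, `hasNext` and `goNext` are hypothetical names, and it assumes the first page counts toward the limit.

```javascript
// Illustrative only: models the documented stopping rule, not Agenty's internals.
// hasNext() -> true while a visible, enabled "Next" button exists (hypothetical callback).
// goNext()  -> performs the click that loads the next page (hypothetical callback).
function runPagination(maxPages, hasNext, goNext) {
  let pagesScraped = 1; // assumption: the first page counts toward the limit
  while (pagesScraped < maxPages && hasNext()) {
    goNext();
    pagesScraped += 1;
  }
  return pagesScraped;
}

// Example: a site with 3 pages and a limit of 100 stops after 3 pages.
let current = 1;
const scraped = runPagination(100, () => current < 3, () => { current += 1; });
console.log(scraped); // 3
```

A high limit is therefore harmless on a short list: the loop ends as soon as there is no next page to go to.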

Next Button Pagination

Next button pagination is the most commonly used type of pagination on websites: a button (or hyperlink) labeled “Next” that you click to go to the next page. For example, this web page in the screenshot:

  • It has a next button right at the bottom of the page.
  • If you use the Agenty Chrome extension and click on the button, you can easily find its CSS selector, or you can view it in source/inspect element if you are comfortable with Chrome Developer Tools

Go to the web page you want to crawl and find the unique CSS selector of the next button, either with the Agenty Chrome extension or manually by inspecting the element in Chrome Developer Tools if you are a developer like me 🙂

For example, I am using the extension in the example below and found that a.next is the unique selector for the next button to click on this page.
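Before saving the selector, it can help to confirm that it matches exactly one element on the page. The following is a minimal sketch meant for the Chrome DevTools console; `countMatches` is a hypothetical helper name, and `a.next` is simply the selector from the example above.

```javascript
// Returns how many elements on the page match a CSS selector.
// `doc` is the page's document object (pass `document` in the DevTools console).
function countMatches(doc, selector) {
  return doc.querySelectorAll(selector).length;
}

// In the browser console:
//   countMatches(document, "a.next")
// A result of 1 means the selector is unique, 0 means it matches nothing,
// and anything higher means the selector should be made more specific.
```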

Configure Pagination

  • Go to your scraping agent page and click on the Configuration tab, which takes you to the advanced agent editor shown in the screenshot below.

  • Open the Pagination section and enable the pagination switch
  • Select Click as the pagination type
  • Enter the next page CSS selector in the “Next button selector” box
  • Then, enter the “Max pages” value to limit the maximum number of pages to scrape

  • Once the Pagination configuration is completed, save the agent (or scraper if you call it that) and re-run to scrape the data from multiple pages automatically.

Infinite Scrolling Pagination

Infinite scrolling is a web-designing technique to load the content on list pages continuously as the user scrolls down the page in the browser, eliminating the need for pagination with next-previous buttons.

This is mostly done with JavaScript and AJAX requests, often through front-end libraries and frameworks such as jQuery, AngularJS, ReactJS or Vue.js, and the responses to those requests are mostly in JSON or XML format.

So, a typical infinite scrolling page sends an HTTP GET or POST request to the server in the background to fetch the data. Then, the response handler function parses the response and appends it to the list/search container on the web page, showing more and more items as the user scrolls down the page.
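The two halves of that mechanism, deciding when to request more data and appending the parsed response, might look roughly like the sketch below. It is an illustration rather than any particular site's code: the function names, the 200-pixel threshold and the `items` field in the JSON response are all assumptions.

```javascript
// True when the viewport is within `threshold` pixels of the end of the content.
function nearBottom(scrollTop, viewportHeight, contentHeight, threshold = 200) {
  return scrollTop + viewportHeight >= contentHeight - threshold;
}

// Parses a JSON response body and returns the list with the new items appended.
// The `items` field is an assumed response shape, not a standard.
function appendItems(shownItems, responseBody) {
  const parsed = JSON.parse(responseBody);
  return shownItems.concat(parsed.items);
}

const shown = appendItems(["quote 1"], '{"items": ["quote 2", "quote 3"]}');
console.log(shown.length); // 3
console.log(nearBottom(1500, 800, 2400)); // true, because 2300 >= 2200
```

On a real page, `nearBottom` would be called from a scroll event handler, and a `true` result would trigger the background request.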

So scraping data from infinite scrolling pages is a bit different from the usual next-previous pagination we saw at the start of this post, where we just clicked on the Next button to load the next page and continued scraping until the button was no longer there.

Infinite Scrolling Website

To scrape infinite scrolling web pages, follow these steps:

  • Edit your scraping agent and enable Pagination
  • Select Infinite Scroll as the pagination type
  • In the next page CSS selector option, leave it blank if there is no selector to enter, or enter a particular element selector if you want Agenty to scroll/mouse over to somewhere specific instead of scrolling to the bottom of the page. By default, Agenty will go to the end of the page.
  • Max pages : Set the maximum number of scrolls to limit how many pages you want to scrape with infinite scrolling

  • Just save the agent and run it to scrape data from the infinite scrolling website.

  • If you want to try it out, the scraping agent is available in the demo agents under the name “Quotes- Infinite scrolling pagination”. Just clone it into your account and learn how to crawl an infinite scrolling AJAX website.
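For readers curious about how a scroll limit interacts with the end of a list, one common approach is to compare the number of loaded items before and after each scroll. The sketch below simulates that idea and is not necessarily how Agenty decides when to stop. `scrollUntilDone`, `countItems` and `scrollOnce` are hypothetical names; in a browser the callbacks would wrap something like `document.querySelectorAll(...).length` and `window.scrollTo(0, document.body.scrollHeight)`, with a wait between scrolls that is omitted here.

```javascript
// Scrolls up to maxScrolls times and stops early once a scroll loads nothing new.
function scrollUntilDone(maxScrolls, countItems, scrollOnce) {
  let scrolls = 0;
  let previous = countItems();
  while (scrolls < maxScrolls) {
    scrollOnce();
    scrolls += 1;
    const now = countItems();
    if (now === previous) break; // nothing new loaded: end of the list
    previous = now;
  }
  return { scrolls, items: previous };
}

// Simulated feed: 10 items per scroll, 25 items in total.
let loaded = 10;
const result = scrollUntilDone(50, () => loaded, () => { loaded = Math.min(loaded + 10, 25); });
console.log(result); // { scrolls: 3, items: 25 }
```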

Load More Pagination

Load more pagination is almost the same as infinite scroll; the only difference is that you will see a Load More or View More button at the end of the page.

So, instead of just scrolling down, we also need to click on the Load more button to load more items on the web page.

Load More Pagination Scraping

Follow the steps below to scrape data from pages with Load-more pagination:

  • Select Load More as the pagination type
  • Enter the button CSS selector, where Agenty will click to load more items
  • Set the max pages limit(n) to tell Agenty how many pages should be crawled at maximum
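To check a Load More selector by hand before entering it, a small helper like the one below can be run in the Chrome DevTools console. This is a sketch under stated assumptions: `canClick`, `clickLoadMore` and the `button.load-more` selector are hypothetical, and the check only covers the `disabled` and `hidden` attributes, so a button hidden purely through CSS would not be detected.

```javascript
// Returns true when the button exists and can still be clicked.
function canClick(button) {
  if (!button) return false;         // selector matched nothing
  if (button.disabled) return false; // disabled attribute is set
  if (button.hidden) return false;   // hidden attribute is set
  return true;
}

// Clicks the button matched by `selector` once.
// Returns false when there is nothing left to click, i.e. pagination should stop.
function clickLoadMore(doc, selector) {
  const button = doc.querySelector(selector);
  if (!canClick(button)) return false;
  button.click();
  return true;
}

// In the browser console:
//   clickLoadMore(document, "button.load-more")
```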

Pagination with JavaScript Injection

If you are a professional web scraper, you know the web is vast and that websites differ in the complexity and techniques required to scrape them. Sometimes you need to wait a few seconds before starting the pagination (clicking the next button) to look more realistic, and sometimes you need to wait for a particular element to be visible before scraping the pages behind the pagination.

So, having a JavaScript injection option in the scraping agent allows developers to write their own code and insert it into the page to take full control of pagination. Just bring your own code and logic to tell Agenty:

  • What element to wait or watch for
  • Where to click/hover for pagination
  • When to stop the pagination (or exit to continue on next input URL)

That’s it. Nothing gets in your way.
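As one illustration of the "what element to wait for" part, the sketch below polls the page until a selector matches or a timeout expires. It is a generic browser pattern, not an Agenty API: `waitForElement`, its default timings and the `.quote` selector are hypothetical, and whether the agent's script runner waits for a promise to settle is not covered here, so try it in a DevTools snippet first.

```javascript
// Resolves with the element once `selector` matches, or rejects after `timeoutMs`.
// `doc` is the page's document object (pass `document` in the browser).
function waitForElement(doc, selector, timeoutMs = 5000, intervalMs = 100) {
  return new Promise((resolve, reject) => {
    const started = Date.now();
    const check = () => {
      const element = doc.querySelector(selector);
      if (element) return resolve(element);
      if (Date.now() - started >= timeoutMs) {
        return reject(new Error("Timed out waiting for " + selector));
      }
      setTimeout(check, intervalMs); // poll again after a short pause
    };
    check();
  });
}

// In the browser:
//   waitForElement(document, ".quote").then(() => { /* click the next button */ });
```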

Test your Script

  • Go to the page you are crawling
  • Open Developer Tools in Chrome and go to the Sources tab
  • Click on the Snippets option and then New snippet to open the code editor
  • Here, you can write and debug your script. Press Ctrl + Enter to execute it, or click the Play icon in the bottom right corner of the code editor, as in this screenshot.

    var element = document.querySelector(".next a");
    // Continue only if the Next link exists and points to a real page
    if (element && element.getAttribute("href") != "#")
    {
        element.click(); // Click on Next button
    }
    else
    {
        throw "No more pages"; // Exit the pagination
    }

Apply script in agent

  • Go to the agent page
  • Enable pagination and select Script as the pagination type
  • Enter the JavaScript code to run your custom JS function instead of Agenty’s built-in pagination module

Remember: It’s important to select the right pagination type to tell Agenty whether more data will appear on the same page (infinite-scroll pagination) or on the next page (click pagination), so the data extraction is handled accordingly.

  • Save your agent configuration and re-run it.
