How to intercept HTTP requests and responses with Puppeteer

Puppeteer is a Node.js library for test automation and web scraping. It provides a high-level API to control headless Chromium. One of its key features is the ability to intercept HTTP requests and responses, which lets you inspect and modify network traffic.

Let’s consider an example ecommerce website that displays product prices based on the visitor’s zip code. To scrape product data with regional prices, we’ll load a webpage, input different zip codes, intercept the requests made to fetch prices, and manipulate them accordingly.

Here’s how you can achieve this using the request interceptor:

Request interception

To intercept HTTP requests in Puppeteer, first we need to enable the interceptor using the page.setRequestInterception method.

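const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Enable request interception
  await page.setRequestInterception(true);

  // Intercept every request and let it proceed unchanged
  page.on('request', (req) => {
    req.continue();
  });

  await page.goto('https://example.com');
  await browser.close();
})();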

Within the page.on('request') event listener, we can read and modify the intercepted request as needed. Remember to call req.continue() to let the original request proceed. Once interception is enabled, a request that is never continued, aborted, or responded to will hang.
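
As a minimal sketch of modifying a request before it continues, the handler below overrides a single header. The function name addHeaderHandler and the header value are assumptions for illustration, not something the example site requires. Besides headers, req.continue() also accepts url, method, and postData overrides.

```javascript
// Sketch: override one header on every intercepted request.
// "accept-language: en-US" is an illustrative value, not a requirement of any site.
function addHeaderHandler(req) {
  // Skip requests that another handler has already resolved
  if (req.interceptResolutionState().action === "already-handled") return;

  // Copy the original headers and override one of them
  const headers = { ...req.headers(), "accept-language": "en-US" };
  req.continue({ headers });
}

// Usage, once interception is enabled:
// page.on("request", addHeaderHandler);
```

Copying req.headers() first keeps the headers the browser already set and changes only the one named here.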

The same mechanism can be used to block ads and other unneeded resources, which improves web scraping performance.
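
A rough sketch of such a blocker is shown here. It aborts requests by resource type or by host name and lets everything else through. The list of blocked types and the host ads.example.com are hypothetical placeholders; a real blocklist depends on the site being scraped.

```javascript
// Sketch: abort requests for heavy or unwanted resources.
// Both lists below are illustrative assumptions; adjust them for your target site.
const BLOCKED_TYPES = new Set(["image", "media", "font"]);
const BLOCKED_HOSTS = ["ads.example.com"]; // hypothetical ad host

// Returns true when the request matches a blocked type or host
function shouldBlock(req) {
  if (BLOCKED_TYPES.has(req.resourceType())) return true;
  const host = new URL(req.url()).hostname;
  return BLOCKED_HOSTS.some(
    (blocked) => host === blocked || host.endsWith("." + blocked)
  );
}

function blockingHandler(req) {
  // Skip requests that another handler has already resolved
  if (req.interceptResolutionState().action === "already-handled") return;
  if (shouldBlock(req)) {
    req.abort();
  } else {
    req.continue();
  }
}

// Usage, once interception is enabled:
// page.on("request", blockingHandler);
```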

Manipulate request body

Coming back to our example, suppose the website fetches product prices based on a user-provided zip code. When we enter the zip code, the page sends an HTTP request to an API endpoint with the zip code in the request body.

We want to intercept this request using Puppeteer, replace the zip code with a different value, and then continue with the modified request. Here’s how to do this:

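// Example value (an assumption for illustration); use the zip code you want prices for
const zipCode = "90210";

page.on("request", (req) => {
  // Skip requests already resolved by another handler
  if (req.interceptResolutionState().action === "already-handled") return;

  const url = req.url();
  if (url.includes("/api/prices") && req.method() === "POST") {
    let postData = req.postData();
    if (postData) {
      // Swap the original zip code (11001) for the one we want
      postData = postData.replace(/11001/g, zipCode);
    }
    req.continue({ postData });
  } else {
    req.continue();
  }
});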
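
A global replace of 11001 would also rewrite any other value in the body that happens to contain those digits. If the endpoint takes a JSON body, one possible alternative is to parse it and set the field directly. The sketch below assumes a field named zipCode, which is hypothetical; check the real request payload in the browser's network tab before relying on it.

```javascript
// Sketch: set the zip code in a JSON request body.
// Assumes the API expects a JSON object with a "zipCode" field (hypothetical name).
function withZipCode(postData, zipCode) {
  if (!postData) return postData;
  try {
    const body = JSON.parse(postData);
    body.zipCode = zipCode;
    return JSON.stringify(body);
  } catch {
    // Body is not a JSON object: leave it unchanged
    return postData;
  }
}

// Usage inside the request handler:
// req.continue({ postData: withZipCode(req.postData(), "90210") });
```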

Response interception

Now that the modified request has been sent to the server, we can use a response listener to capture the JSON response.

  1. Listen for responses using page.on('response', ...).
  2. Check if the URL matches the API endpoint /api/prices and the request method is POST.
  3. Parse the JSON response and store it in the result variable.

let result = {};

// Intercept responses
page.on("response", async (response) => {
  const url = response.url();

  // Only parse the POST response from the prices endpoint;
  // this skips CORS preflight (OPTIONS) responses, which have no JSON body
  if (url.includes("/api/prices") && response.request().method() === "POST") {
    result = await response.json();
  }
});
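
A listener registered with page.on fires whenever a matching response arrives, so code that reads result straight away may still see the empty object. One possible alternative, sketched below, is page.waitForResponse, which returns a promise for the first matching response. The function name capturePrices and the trigger callback are assumptions for illustration; trigger stands for whatever action causes the request, such as submitting the zip code form.

```javascript
// Sketch: wait for the prices response instead of polling a shared variable.
// `trigger` is a hypothetical async callback that causes the page to call the API.
async function capturePrices(page, trigger) {
  // Start waiting before triggering, so a fast response is not missed
  const [response] = await Promise.all([
    page.waitForResponse(
      (res) =>
        res.url().includes("/api/prices") &&
        res.request().method() === "POST"
    ),
    trigger(),
  ]);
  return response.json();
}

// Usage:
// const prices = await capturePrices(page, () => page.goto("https://example.com"));
```

By default, page.waitForResponse rejects if no matching response arrives within the page's timeout, so a missing API call surfaces as an error instead of an empty result.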

Full code

Here is the complete code for Agenty’s developer mode, which lets you scrape data using Puppeteer or Playwright.

module.exports = async ({ page, request }) => {
  // Example value (an assumption for illustration); set the zip code you want prices for
  const zipCode = "90210";

  // Enable request interception
  await page.setRequestInterception(true);

  // Intercept requests
  page.on("request", (req) => {
    if (req.interceptResolutionState().action === "already-handled") return;
    const url = req.url();
    if (url.includes("/api/prices") && req.method() === "POST") {
      let postData = req.postData();
      if (postData) {
        // Swap the original zip code (11001) for the one we want
        postData = postData.replace(/11001/g, zipCode);
      }
      req.continue({ postData });
    } else {
      req.continue();
    }
  });

  // Intercept responses
  let result = {};
  page.on("response", async (response) => {
    const url = response.url();

    // Only parse the POST response from the prices endpoint
    if (url.includes("/api/prices") && response.request().method() === "POST") {
      result = await response.json();
    }
  });

  // Navigate to the web page and wait for network activity to settle,
  // so the prices response has a chance to arrive before we return
  await page.goto(request.url, { waitUntil: "networkidle0" });

  return {
    data: [result],
    type: "application/json",
  };
};
