The Join option in a scraping agent lets you combine multiple extracted values into one cell. It is especially helpful when you are scraping an element with multiple matches and want to combine them into a single delimited string.
For example, suppose you are scraping a product website and the product page displays multiple categories, sizes, images or color variants. By default, the scraping agent displays each match in a separate row. With the Join option, all matches are combined into a single cell, so the data table has one row per product.
In this product page screenshot, the product's category is shown as Home > Books > Poetry, followed by the book name. The .breadcrumb a selector extracted 3 matches into separate rows, while the price appears only on the 1st row.
- Edit the scraping agent by clicking on the Edit tab.
- Go to the field you want to join (in this case, Category) and enable the Join switch.
- Save the scraping agent configuration.
- Finally, re-run the scraping agent to apply the changes.
After running the scraping agent, you’ll see the field's results joined into a single cell, as in the Category column in the screenshot below.
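To make the before-and-after shape of the data concrete, here is a minimal Python sketch of what joining a field amounts to. It is only an illustration, not Agenty's implementation: the `join_field` helper, the row layout and the price value are hypothetical, and the bare comma delimiter (no trailing space) is an assumption.

```python
def join_field(rows, field, delimiter=","):
    """Collapse the multiple matches of `field` into one delimited cell.

    `rows` mimics the un-joined output: one row per match, with the other
    fields filled in only on the first row. (Hypothetical layout.)
    """
    # Collect every non-empty match for the joined field, in order.
    values = [row[field] for row in rows if row.get(field)]

    # Keep the first non-empty value seen for every other field.
    merged = {}
    for row in rows:
        for key, value in row.items():
            if key != field and value and key not in merged:
                merged[key] = value

    merged[field] = delimiter.join(values)
    return merged


# Un-joined result: 3 category matches in separate rows, price on the 1st row.
rows = [
    {"Category": "Home", "Price": "10.00"},   # price is a made-up sample value
    {"Category": "Books", "Price": ""},
    {"Category": "Poetry", "Price": ""},
]

joined = join_field(rows, "Category")
print(joined["Category"])  # Home,Books,Poetry
```

The three Category rows become one row whose Category cell holds all matches, which is the one-product, one-row layout described above.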
The default join delimiter is a comma (,). You can also set a custom delimiter with the JoinDelimiter post-processing function, which tells Agenty which delimiter to use when combining the values.
For example, to use a semicolon (;) as the delimiter, add a post-processing function to this field and provide the custom delimiter, as in this screenshot.
Re-running the web scraper then produces results joined with the custom delimiter.
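As a rough illustration of the difference between the default and a custom delimiter, the Python sketch below joins the same three category matches both ways. The `join_matches` function is a hypothetical stand-in for illustration only; in Agenty the delimiter is configured through the JoinDelimiter post-processing function in the agent editor, not through code like this.

```python
def join_matches(matches, delimiter=","):
    """Join extracted matches into one string (hypothetical helper).

    The default mirrors the comma delimiter described above.
    """
    return delimiter.join(matches)


categories = ["Home", "Books", "Poetry"]

default_cell = join_matches(categories)       # default comma delimiter
custom_cell = join_matches(categories, ";")   # custom semicolon delimiter

print(default_cell)  # Home,Books,Poetry
print(custom_cell)   # Home;Books;Poetry
```

A custom delimiter is worth considering when the extracted values may themselves contain commas, since the joined cell could not otherwise be split back reliably.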