I have a list of about a million websites. How can I extract email addresses in bulk from their about or contact pages automatically, without writing a web scraping agent for each website? Let us see how to scrape emails in bulk with email extractor software.
To extract email addresses, use this REGEX option with this expression: ([\w.-]+@(?=[a-z\d][^.]*\.)[a-z\d.-]*[^.])
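This will find and scrape emails in the common formats on any website URL you crawl. Here is my test on the Rubular site - Rubular: (^[\w.-]+@(?=[a-z\d][^.]*\.)[a-z\d.-]*[^.]$) - with the test string below, which extracted all 6 valid emails. Note that the ^ and $ anchors in the Rubular version only match a line that consists of an email alone, so use the unanchored expression above for emails that appear inside a sentence. The domain part also only matches lowercase letters.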
email@domain.com
my.email@domain.com
my_email@domain.com
first_middle_last@domain.com
If you want to send an email, please email us at info@domain.com
Contact us for business inquiry on business@domain.com
GitHub link with example HTML - https://agenty.github.io/Agenty.TestData/forum/forum-33.html
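If you want to check the expression outside the agent first, here is a minimal sketch in Python. It assumes the unanchored expression and the six-email test string from above; the names EMAIL_RE, TEST_STRING and extract_emails are made up for this illustration and are not part of any product.

```python
import re

# The unanchored expression from this post; group 1 is the whole email.
EMAIL_RE = re.compile(r"([\w.-]+@(?=[a-z\d][^.]*\.)[a-z\d.-]*[^.])")

# The test string from this post, one item per line.
TEST_STRING = """email@domain.com
my.email@domain.com
my_email@domain.com
first_middle_last@domain.com
If you want to send an email, please email us at info@domain.com
Contact us for business inquiry on business@domain.com"""


def extract_emails(text):
    """Return the group-1 matches in order of appearance."""
    # The final [^.] can consume the newline that follows an email,
    # so surrounding whitespace is stripped from each match.
    return [match.group(1).strip() for match in EMAIL_RE.finditer(text)]


emails = extract_emails(TEST_STRING)
print(emails)
```

On this test string the sketch returns the six addresses, including the two that sit inside sentences.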
Then create an agent (you may clone any sample agent) and follow these steps:
- Go to the Edit tab
- Add or edit a field and set its Type to REGEX
- Enter your regular expression and set Group to 1
- Save it, then enter (or upload) the URLs to extract emails from all the websites.
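For readers who want to see the idea behind these steps, here is a rough Python sketch of the same flow: take a list of URLs, download each page, and apply the one expression to all of them. This is not how the agent is implemented; fetch_html, extract_emails and extract_from_urls are hypothetical names for this illustration. One assumption to be aware of: because the final [^.] in the expression consumes one arbitrary character (for example a quote or < in HTML), the sketch trims trailing characters that cannot end a lowercase domain.

```python
import re
from urllib.request import Request, urlopen

# The unanchored expression from this post; group 1 is the whole email.
EMAIL_RE = re.compile(r"([\w.-]+@(?=[a-z\d][^.]*\.)[a-z\d.-]*[^.])")
# Assumption of this sketch: trailing characters outside [a-z0-9] are noise.
TRAILING_RE = re.compile(r"[^a-z\d]+$")


def fetch_html(url, timeout=10):
    """Download one page and return its body as text (hypothetical helper)."""
    request = Request(url, headers={"User-Agent": "email-extractor-sketch"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_emails(html):
    """Return unique group-1 matches in order of first appearance."""
    seen = {}
    for match in EMAIL_RE.finditer(html):
        email = TRAILING_RE.sub("", match.group(1))
        seen.setdefault(email, None)  # dict keys keep insertion order
    return list(seen)


def extract_from_urls(urls, fetch=fetch_html):
    """Map each URL to the emails found on it; failed downloads give []."""
    results = {}
    for url in urls:
        try:
            results[url] = extract_emails(fetch(url))
        except (OSError, ValueError):
            # URLError, HTTPError and timeouts are OSError subclasses;
            # ValueError covers malformed URLs.
            results[url] = []
    return results
```

A call such as extract_from_urls(["https://agenty.github.io/Agenty.TestData/forum/forum-33.html"]) would return a dictionary keyed by URL. The fetch parameter lets you swap in your own downloader. For a list of a million sites you would likely need to run the downloads concurrently, which this sketch leaves out.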