Overview
The Web Crawler integration allows you to extract information from websites. While the integration offers both live search and indexed search, we highly recommend indexing a website before you query it: larger websites can take several minutes to index, and live search is limited to the front page.
Authentication
The Web Crawler integration does not require any authentication and is available to all users.
Search Options
The following options are available for the options parameter of web_crawler:
The URL of the website to crawl. Trailing slashes are ignored.
The maximum depth of the website to crawl. 0 means only the root page will be queried, 1 means the root page and all pages linked from it will be queried, and so on.
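The depth rule can be illustrated with a small breadth-first traversal over an in-memory link graph. This is only a sketch of the semantics described above, not the crawler's actual implementation; the function name `pages_within_depth` and the sample site structure are hypothetical.

```python
from collections import deque


def pages_within_depth(links, root, max_depth):
    """Return the pages reachable from root within max_depth link hops."""
    seen = {root}
    order = [root]
    queue = deque([(root, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site structure: page -> pages it links to
site = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/api"],
    "/blog": ["/"],
}

print(pages_within_depth(site, "/", 0))  # ['/']
print(pages_within_depth(site, "/", 1))  # ['/', '/docs', '/blog']
print(pages_within_depth(site, "/", 2))  # ['/', '/docs', '/blog', '/docs/api']
```

With a depth of 0 only the root page is visited; each additional level adds the pages linked from the previous level, and pages already seen are not visited twice.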
Resources
The Web Crawler integration returns Website resources:
Additional Endpoints
Index a website before querying it
GET /integrations/web_crawler/index
Call this endpoint to index a website for indexed search. The website will be crawled recursively and added to the search index.
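As a rough sketch, the request URL for this endpoint might be assembled as follows. Only the path `/integrations/web_crawler/index` comes from this page; the API host `https://api.example.com`, the query parameter name `url`, and the helper `build_index_url` are assumptions made for illustration, so check the actual API reference before relying on them.

```python
from urllib.parse import urlencode

# Endpoint path as documented above
INDEX_PATH = "/integrations/web_crawler/index"


def build_index_url(api_base, website):
    """Build the GET URL for indexing a website.

    The query parameter name "url" is an assumption, not confirmed
    by the documentation.
    """
    query = urlencode({"url": website})
    return f"{api_base.rstrip('/')}{INDEX_PATH}?{query}"


# Hypothetical API host and target website
print(build_index_url("https://api.example.com", "https://example.org"))
```

The resulting URL can then be passed to any HTTP client as a GET request. Because larger websites can take several minutes to index, it may be worth allowing for a delay before running an indexed search.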