
Overview

The Web Crawler integration allows you to extract information from websites. While the integration offers both live search and indexed search, we strongly recommend indexing a website before querying it: larger websites can take several minutes to index, and live search is limited to the front page.

Authentication

The Web Crawler integration does not require any authentication and is available to all users.

Search Options

The following options are available for the options parameter of web_crawler:

url (string, required)

The URL of the website to crawl. Trailing slashes are ignored.

max_depth (number, default: 0)

The maximum depth of the website to crawl. 0 means only the root page will be queried, 1 means the root page and all pages linked from it will be queried, and so on.
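
For illustration, here is a minimal sketch of a query using these options. It assumes WebCrawlerClient exposes a search method that accepts a query string and an options dict; the method name and signature are assumptions, and only the url and max_depth options come from the documentation above.

client = WebCrawlerClient()

# Query https://example.com, including every page linked from the root page.
# The `search` method and its signature are assumed, not documented.
results = client.search(
    query="pricing",
    options={
        "url": "https://example.com",  # required; trailing slashes are ignored
        "max_depth": 1,                # default is 0 (root page only)
    },
)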

Resources

The Web Crawler integration returns Website resources.

Additional Endpoints

Index a website before querying it

GET /integrations/web_crawler/index

Call this endpoint to index a website for indexed search. The website will be crawled recursively and added to the search index.
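
For example, the following sketch triggers indexing over plain HTTP. The base URL, the query-parameter names, and the JSON response shape are assumptions; only the endpoint path and method are documented, and per the Authentication section no credentials are needed.

import requests

# The API host is an assumption; substitute your deployment's base URL.
BASE_URL = "https://api.example.com"

def index_website(url: str, max_depth: int = 0) -> dict:
    # Passing the documented web_crawler options as query parameters
    # is an assumption; only the endpoint path itself is documented.
    response = requests.get(
        f"{BASE_URL}/integrations/web_crawler/index",
        params={"url": url, "max_depth": max_depth},
        timeout=300,  # large sites can take several minutes to index
    )
    response.raise_for_status()
    return response.json()

# Index the root page and all pages linked from it.
index_website("https://example.com", max_depth=1)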
