Criteo crawler


What is Criteo crawler?

Criteo crawler is software that visits web pages and analyzes their content in order to serve relevant ads on them.

Criteo crawler is identified by the following user-agent:

CriteoBot/0.1 (+https://www.criteo.com/criteo-crawler/)
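
If you want to recognize these visits in your own server logs or request handlers, the user-agent string above can be matched with a short check. The following is a minimal sketch, not an official Criteo tool: the function name is_criteo_crawler and the version pattern are assumptions made for illustration. Keep in mind that a user-agent header can be set by any client, so a match is only an indication.

```python
import re

# Matches the "CriteoBot/<version>" product token from the user-agent above.
# Accepting any dotted version number is an assumption, not documented behavior.
CRITEO_BOT_PATTERN = re.compile(r"\bCriteoBot/\d+(?:\.\d+)*")


def is_criteo_crawler(user_agent: str) -> bool:
    """Return True if the user-agent header looks like Criteo crawler."""
    return bool(CRITEO_BOT_PATTERN.search(user_agent or ""))


if __name__ == "__main__":
    header = "CriteoBot/0.1 (+https://www.criteo.com/criteo-crawler/)"
    print(is_criteo_crawler(header))
```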

Why does Criteo crawler visit my site?

Criteo is a leading global technology company powering the world’s marketers with trusted and impactful advertising. Criteo empowers companies of all sizes with technology to better know and serve their customers. Criteo is in the process of building a contextual advertising offering to help its publisher partners better monetize their content and support advertisers by better aligning their ads to relevant web pages.

To support its contextual offering, Criteo analyzes public web content by crawling webpages. Criteo’s technology identifies the content categories of a given webpage.
For example, an article about sport and running shoes would be classified in the category “sport” and the sub-category “running”.

When does Criteo crawler visit my site?

Criteo crawler attempts to access URLs only when your website sends a request to Criteo to deliver an ad on your domain. Criteo crawler limits its visits to your website: it requests access only if the compiled categories for a page are no longer available or no longer up to date.
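
The rule described here (crawl only when categories are missing or out of date) can be illustrated with a small freshness check. This is a sketch of the general idea only: the function name needs_crawl and the seven-day window MAX_AGE are hypothetical, and Criteo's actual refresh logic is not described on this page.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical freshness window; the real value is not published on this page.
MAX_AGE = timedelta(days=7)


def needs_crawl(last_categorized: Optional[datetime], now: datetime) -> bool:
    """Return True when a page's categories are missing or older than MAX_AGE."""
    if last_categorized is None:
        return True  # categories are not available
    return now - last_categorized > MAX_AGE  # categories are out of date


if __name__ == "__main__":
    now = datetime(2021, 1, 15)
    print(needs_crawl(None, now))
    print(needs_crawl(now - timedelta(days=1), now))
```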

What data is crawled on my site?

The crawler does not extract or store any source code; it only provides data about the publicly available content of the page, such as the language and the categories of the content (e.g. sports > running).

Criteo crawler is a privacy-compliant system. The crawler does not access the data of users navigating your website. It only accesses published data that is publicly available on the internet.

How can I authorize the crawler? (coming soon: beginning 2021)

Many premium publishers explicitly allow Criteo crawler to access their sites. Publishers benefit from Criteo’s categorization of their inventory to optimize targeted campaigns.

To approve Criteo crawler, please add a separate paragraph to your robots.txt as follows:

User-agent: CriteoBot/0.1
Disallow:

How can I exclude the crawler? (coming soon: beginning 2021)

If you prefer to keep Criteo crawler from visiting specific sections of your site, please add a separate paragraph to your robots.txt and specify the path you’d like to exclude, as follows:

User-agent: CriteoBot/0.1
Disallow: /path/

If you prefer to keep Criteo crawler from visiting your site entirely, please add a separate paragraph to your robots.txt as follows:

User-agent: CriteoBot/0.1
Disallow: /
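
If your robots.txt is generated by a script, the paragraphs shown on this page can be produced with a small helper. This is a minimal sketch: the function name criteo_robots_paragraph is hypothetical, and it only reproduces the three forms shown on this page (allow everything, exclude specific paths, exclude the whole site).

```python
from typing import List

# User-agent value used in the robots.txt examples on this page.
CRITEO_AGENT = "CriteoBot/0.1"


def criteo_robots_paragraph(disallowed_paths: List[str]) -> str:
    """Build the robots.txt paragraph for Criteo crawler.

    An empty list produces an empty "Disallow:" line, which allows the
    crawler everywhere; ["/"] excludes it from the whole site.
    """
    lines = [f"User-agent: {CRITEO_AGENT}"]
    if not disallowed_paths:
        lines.append("Disallow:")
    else:
        lines.extend(f"Disallow: {path}" for path in disallowed_paths)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Exclude one section of the site, as in the "/path/" example.
    print(criteo_robots_paragraph(["/path/"]), end="")
```

The returned text is meant to be appended to your existing robots.txt as its own paragraph, separated from other rules by a blank line.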

Note
It is possible to exclude the crawler even while the robots.txt process is not yet available.

Publishers who are direct Criteo partners: please contact your Criteo representative. Crawling of your domains will stop within 24 hours.

Publishers who are not direct Criteo partners: please add the user agent “CriteoBot/0.1” to your robots.txt and email crawler@criteo.com with the list of domains that you would like to exclude from crawling. Criteo will stop crawling your website within 24 hours.

More Information

If you need to know more about the crawler, please contact your Criteo representative if you are a direct Criteo partner, or email us at crawler@criteo.com.