Bad Bots – Headless Chrome

There’s never a shortage of bad bots and unidentifiable applications that crawl websites. Are they scraping the content? Updating it for some unnamed organization’s news site? Storing an archive of it? It’s not clear, as they typically won’t identify themselves with a legitimate robot-type user agent.

One group of firewall log entries recently caught my eye for a few reasons. The first was that, similar to my issue with OVH Hosting in a previous blog post, numerous clients were connecting simultaneously with the same user agent. At any given time, 3 to 5 of these hosts would be crawling information, like tags and posts, from the site. Watching the visitors live, I saw that a high percentage of the IPs below were all using the same user agent:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/64.0.3282.119 Safari/537.36
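For anyone who wants to flag this user agent in their own logs, here is a minimal Python sketch. It only looks for the `HeadlessChrome/<version>` token shown above; the function name `headless_chrome_version` is my own and is not part of any firewall product. Keep in mind that a user agent is trivially spoofed, so this catches only the clients that don't bother to hide.

```python
import re
from typing import Optional

# Matches the "HeadlessChrome/<version>" product token in a user-agent string.
HEADLESS_CHROME = re.compile(r"HeadlessChrome/(\d+(?:\.\d+)*)")


def headless_chrome_version(user_agent: str) -> Optional[str]:
    """Return the Headless Chrome version if the token is present, else None.

    Hypothetical helper for illustration only.
    """
    match = HEADLESS_CHROME.search(user_agent)
    return match.group(1) if match else None


if __name__ == "__main__":
    ua = (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) HeadlessChrome/64.0.3282.119 Safari/537.36"
    )
    print(headless_chrome_version(ua))
```

A regular Chrome user agent contains `Chrome/` but not `HeadlessChrome/`, so it returns `None` here.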

Here’s a copy of the firewall log from a rule I set up to perform extended browser validation using JavaScript:

Does anybody know the purpose and source of these connections? Did you end up here by searching for one of the IPs? All of the subnets below belong to Amazon Technologies, so these hosts could well be linked behind the scenes on Amazon Web Services.
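If you want to check whether an address from your own logs falls inside a set of subnets, a rough sketch with Python's standard `ipaddress` module looks like this. The networks in `WATCHED_NETWORKS` are placeholder documentation ranges (RFC 5737), not the Amazon subnets from my logs, and `matching_network` is a name I made up; swap in whichever ranges you actually care about.

```python
import ipaddress

# Placeholder networks (RFC 5737 documentation ranges), NOT real Amazon subnets.
# Replace these with the ranges you want to watch.
WATCHED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def matching_network(ip: str):
    """Return the first watched network containing ip, or None if there is none.

    Hypothetical helper for illustration only.
    """
    addr = ipaddress.ip_address(ip)
    for network in WATCHED_NETWORKS:
        if addr in network:
            return network
    return None


if __name__ == "__main__":
    for candidate in ("192.0.2.17", "203.0.113.5"):
        print(candidate, matching_network(candidate))
```

Amazon publishes its current address ranges, so a list like this could be built from that data rather than maintained by hand.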

I’ve tested a solution called Kasada that I’d recommend for blocking these kinds of probes on a medium-to-large-scale network. Otherwise, the tool ‘Cerber’ is useful for defending platforms like WordPress.

100+ IP Addresses recorded in the month of July: