The Watcher

prologic

twtxt.net

17 May 24 10:16 UTC

View Thread

@movq Only found 3 results for "robots.txt" and OpenAI 😢 I seem to recall an effort (_I cannot find_) to build a standard for AI crawlers similar to robots.txt

prologic

twtxt.net

17 May 24 10:20 UTC

View Thread

@movq Found it!

ai.txt: A new way for websites to set permissions for AI

movq

www.uninformativ.de

17 May 24 11:41 UTC

View Thread

@prologic Ahhh, right, now I remember. That ai.txt boils down to this, I guess:

User-Agent: *
Disallow: /*
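
A rough way to see what a rule set like this tells a well-behaved crawler is Python's standard `urllib.robotparser`. This is only a sketch: `is_allowed`, the two rule strings, and the example.org URL are made up for illustration, and the rules use the plain prefix form `Disallow: /` rather than the `/*` above, since not every parser is guaranteed to treat the wildcard the same way. It only shows what the file asks for, not whether any bot obeys it.

```python
from urllib import robotparser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Assumed rule set: the plain prefix form of "block everything for everyone".
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Assumed variant: block only GPTBot and leave everyone else alone.
BLOCK_GPTBOT = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nDisallow:\n"

if __name__ == "__main__":
    url = "https://example.org/some/page.html"  # placeholder URL
    print(is_allowed(BLOCK_ALL, "GPTBot", url))
    print(is_allowed(BLOCK_GPTBOT, "GPTBot", url))
    print(is_allowed(BLOCK_GPTBOT, "SomeBrowser", url))
```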

aelaraji

aelaraji.com

17 May 24 18:39 UTC

View Thread

@movq I have this one as per some article I read some time ago... But just like with robots.txt, I don't think you have any guarantee that it will be honored. You might even have a better chance hunting for and blocking user agents.

movq

www.uninformativ.de

17 May 24 19:03 UTC

View Thread

@aelaraji Yeah, there is no guarantee with any of these things, it can all be faked or ignored. 🫤 I’m still going to do it in the hopes that *some* of those bots respect it.

aelaraji

aelaraji.com

25 May 24 03:11 UTC

View Thread

@movq It looks like this one actually reads the robots.txt ... it did a couple of times over the past few weeks.

> "GET /robots.txt HTTP/1.1" 304 0 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)"

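To check a whole access log for visits like the one quoted above, something along these lines might do. It is a sketch that assumes the user agent is the last double-quoted field on each line (as in the quoted snippet); `AI_BOT_TOKENS`, `user_agent_from_log` and `matching_bot` are made-up names, and only GPTBot in the token list comes from this thread.

```python
import re

# Illustrative token list; only GPTBot appears in the thread itself.
AI_BOT_TOKENS = ("GPTBot", "CCBot")


def user_agent_from_log(line: str) -> str:
    """Return the last double-quoted field of a log line, or "" if there is none."""
    fields = re.findall(r'"([^"]*)"', line)
    return fields[-1] if fields else ""


def matching_bot(line: str):
    """Return the first known bot token found in the line's user agent, else None."""
    user_agent = user_agent_from_log(line).lower()
    for token in AI_BOT_TOKENS:
        if token.lower() in user_agent:
            return token
    return None


if __name__ == "__main__":
    sample = (
        '"GET /robots.txt HTTP/1.1" 304 0 "-" "Mozilla/5.0 AppleWebKit/537.36 '
        '(KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)"'
    )
    print(matching_bot(sample))
```
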
aelaraji

aelaraji.com

20 Jun 24 02:17 UTC

View Thread

Hey @movq !! here's an article you might find interesting: Blocking Bots with Nginx ... this person is actually blocking AI Bots based on a list of User Agents in an interesting way. 👍

prologic

twtxt.net

20 Jun 24 02:25 UTC

View Thread

@aelaraji Hmmm, looks like the core idea is to intercept requests, inspect the User-Agent header and respond accordingly.
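
That idea can be sketched as a small WSGI middleware in Python. To be clear, this is not the Nginx configuration from the linked article (which isn't reproduced here), just the same intercept, inspect, respond pattern in another setting. `block_ai_bots`, `hello_app` and the `BLOCKED_TOKENS` list are hypothetical names chosen for the example, and the 403 response is one possible choice among several.

```python
# Assumed list of lowercase substrings to look for; only GPTBot comes from the thread.
BLOCKED_TOKENS = ("gptbot",)


def block_ai_bots(app, blocked_tokens=BLOCKED_TOKENS):
    """Wrap a WSGI app so requests from matching user agents get a 403."""

    def middleware(environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked_tokens):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else passes through to the wrapped application untouched.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in application used only to demonstrate the wrapper."""
    body = b"Hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


application = block_ai_bots(hello_app)
```

As noted earlier in the thread, the header can be faked, so this only stops bots that identify themselves honestly.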
