The Watcher

abucci

anthony.buc.ci

Google Says It'll Scrape Everything You Post Online for AI

> Google updated its privacy policy over the weekend, explicitly saying the company reserves the right to scrape just about everything you post online to build its AI tools.

Google can eat shit.

prologic

twtxt.net

04 Jul 23 13:04 UTC

@abucci Oh 😱 Hmmm 🤔

abucci

anthony.buc.ci

04 Jul 23 13:20 UTC

Time to add

<meta name="googlebot" content="noindex,nofollow">

to everything I guess.
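
For anyone who wants to check whether a page actually carries that tag, here is a minimal sketch using Python's standard-library `html.parser`. The class `RobotsMetaFinder` and the helper `googlebot_directives` are names made up for this illustration. It only reports which directives a page declares for `googlebot`; it says nothing about how a crawler acts on them, and this thread does not establish whether these directives affect what is collected for AI training.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of <meta name="googlebot" content="..."> tags.

    Hypothetical helper for illustration; not part of any crawler's tooling.
    """

    def __init__(self, bot_name="googlebot"):
        super().__init__()
        self.bot_name = bot_name.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != self.bot_name:
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def googlebot_directives(html):
    """Return the set of directives the page declares for googlebot."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


if __name__ == "__main__":
    good = '<meta name="googlebot" content="noindex,nofollow">'
    # Same tag after a rich-text editor swapped in typographic quotes.
    mangled = "<meta name=\u201dgooglebot\u201d content=\u201dnoindex,nofollow\u201d>"
    print(sorted(googlebot_directives(good)))
    print(sorted(googlebot_directives(mangled)))
```

With this parser, the straight-quoted tag yields both directives. The copy with typographic quotes yields none, because the curly quotes become part of the attribute values. When pasting the tag into templates, it is worth making sure the quotes are plain ASCII.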

abucci

anthony.buc.ci

04 Jul 23 13:21 UTC

@prologic They were almost certainly doing this already, but now they're codifying it in their policies, essentially claiming ownership over everyone's web pages.

@marado@ciberlandia.pt

tilde.pt

04 Jul 23 13:54 UTC

@abucci To be fair, it was already codified there. What is more interesting (to me) is how they're using a privacy policy (binding their users) in an attempt to get implicit licensing over materials outside the scope of those services, both from their users and from others (or from authors unknown). Not that it matters much, I bet they'd argue such a license is unneeded, but the fact that they decided to have that wording there makes me curious about the legal basis of such a clause. Yes, I know Google has an extensive and capable legal team, but I'd still love to see a legal analysis of the applicability of that under various jurisdictions.

abucci

anthony.buc.ci

04 Jul 23 14:16 UTC

@marado It can't possibly be defensible, which to me always signals an attempt at a power grab. They never explicitly said "we will use anything we scrape from the web to train our AI" before--that's new. There is growing pushback against that practice, with numerous legal cases winding through the legal system right now. Some day those cases will be heard and decided on by judges. So they're trying to get out ahead of that, in my opinion, and cement their claims to this data before there's a precedent set.

movq

www.uninformativ.de

04 Jul 23 17:29 UTC

This should work as a robots.txt, right?


User-agent: Googlebot
Disallow: /
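
One way to sanity-check those two lines is to run them through Python's standard-library `urllib.robotparser`. This is only a sketch: the helper `is_allowed`, the second agent name `SomeOtherBot`, and the `example.org` URL are made up for the illustration. The result shows how this particular parser reads the rules, not how any real crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post above, verbatim.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under `robots_txt`.

    Hypothetical helper: parses the rules from a string instead of
    downloading them, so it can be run offline.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/twtxt.txt"  # placeholder URL, an assumption
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

Read this way, the file disallows everything for `Googlebot` and leaves agents it does not name unrestricted. robots.txt is a request rather than an access control, so it only constrains crawlers that choose to honor it.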

marado

twtxt.net

04 Jul 23 21:59 UTC

@abucci Where they now say they use it to train their AI models, they used to say "for language models", which isn't all that different (possibly extending the scope from text to images, audio and video?).

abucci

anthony.buc.ci

05 Jul 23 09:29 UTC

@marado It's very different. Language models are part of traditional search engines and translation engines. The new policy mentions Cloud AI and Bard specifically. This is a weird change and probably a good preemptive move, as I said previously. I'm not sure why you're downplaying it.