# I am the Watcher. I am your guide through this vast new twtiverse.
#
# Usage:
# https://watcher.sour.is/api/plain/users View list of users and latest twt date.
# https://watcher.sour.is/api/plain/twt View all twts.
# https://watcher.sour.is/api/plain/mentions?uri=:uri View all mentions for uri.
# https://watcher.sour.is/api/plain/conv/:hash View all twts for a conversation subject.
#
# Options:
# uri Filter to show a specific user's twts.
# offset Start index for query.
# limit Count of items to return (going back in time).
#
# twt range = 1 194363
# self = https://watcher.sour.is?offset=194152
# next = https://watcher.sour.is?offset=194252
# prev = https://watcher.sour.is?offset=194052
@lyse I usually only have my GPS tracker with me. That trip yesterday was probably a one-time thing. 😅 It was fun, but I’d rather not carry so much stuff around. 🥴
I don't know whether it's entirely fair or not, but for the #musiquinta on "one hit wonders", here's the goat song, known by many more people than know it's by "The Farmlopez": https://youtu.be/g1BT-6koP7I
If I'm in the woods, I'd like to not waste my time with computers and focus on the beauty of nature. ;-) So, I'm not gonna participate in that event. But I'd read your articles on that subject anytime. :-)
@bender Oh, there’s an easy explanation. But maybe some mysteries are best left unexplained. 😃 If you want to solve this riddle: The solution is in the phlog! Somewhere! 😅
But which of these characteristics actually makes the difference?
Is it the fact that it's a public project? Is it because it's a research project? Because it's open source? Is it a combination of two or more of those factors? Which ones? Why? We've already established above that *starting* the work isn't forbidden (just bad practice). In what way do these other facts make the difference? And, while we're at it, what are the implications of that difference? Can an Amália trained this way be used by projects that aren't public? What about projects that aren't research projects? Which open-source licences will Amália carry, and what licensing obligations will its clients or derivatives have?
The questions weren't asked, and, of course, they also went unanswered. Nor is there any answer to the contact attempts by several of the copyright holders whose content was used for the training - perhaps, there it is, because the analysis that could provide those answers still isn't finished - in the very month of the launch!
Even so, we have guarantees: the "base version" of Amália will be made publicly available by the end of September. "From that moment on, any entity will be able to use the model." How they can give that guarantee without the legal analysis being done is what remains to be understood.
This one had me grumbling at the headline right away - and it didn't get any better.
The headline is: "Team of experts" is analysing "possible legal impacts" of Amália, including on copyright
It had me grumbling because the project was actually supposed to have launched already, now it has a launch date in September (yes, this month), and it turns out... they still haven't taken what should have been the very first step? So what if it now turns out the data can't be used? Or part of it - will they pull it from the initial databases and retrain the models?
The question is so obvious that even the journalists remembered to ask it. And that's when I really started grumbling. The person in charge of the project could have said "the legislation does not prevent the work from starting." Because of course it doesn't. But the problem is that starting the work with all the data, without knowing which of it will end up excluded, may even be counterproductive. And what if, in the end, none of it can be used? There go the €5.5M the whole exercise cost?
But the answer was worse; it was "Being a public project, developed in a research environment and following an open-source model, the legislation does not prevent the work from starting."
½ Screenshot from the article, quoting João Magalhães, coordinator of the development of the Amália project, saying "Being a public project, developed in a research environment and following an open-source model, the legislation does not prevent the work from starting."
@prologic haven't had too much time to really try it out yet ^^' i'm um too busy staring at code i wrote while sleep deprived and wondering why i did the things i did, while sleep deprived \@.@
Welp, my rent's gone out and my student loan won't be in for another week, so I'm not spending anything for a while. How's everyone else's September going?
Chances are the database they bought wasn't cheap at all and was sold by some scam company that probably ripped them off for six figures or more for a database that's full of rubbish. 🤣
That is obviously completely wrong. But I can explain it. Some *years* ago, I screwed up my nginx rewrite rules, and that’s how these broken URLs came to be.
It all redirects to /git now, which is why that endpoint sees so much traffic lately.
But what does that mean? Why do they start there? I can only speculate that this company bought an old database of web links and they use that to start crawling. And it was probably a cheap one, because these redirects have been fixed for quite a long time now.
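For the curious, a purely hypothetical sketch of the class of mistake I mean - this is NOT my actual config, the names are placeholders - where one over-broad fallback rule quietly funnels every unknown or legacy URL to a single endpoint instead of returning 404:

```
# Hypothetical sketch, not the real config: a catch-all fallback like this
# sends every path that doesn't match a real file to /git/ instead of 404,
# so old/broken deep links all end up on one endpoint.
server {
    listen 80;
    server_name example.org;            # placeholder name

    root /var/www/htdocs;

    location / {
        # too broad: unknown/legacy URLs silently land on /git/
        try_files $uri $uri/ /git/;
    }
}
```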
@prologic I’m doing that now as well, but I don’t think this is a good solution. This is going to hurt “self-hosting” in the long run: I cannot afford true self-hosting where I actually do host everything here at home – instead, I must use a cloud provider / VPS for that. It is only a matter of time until *my* provider starts doing AI shit as well (or rather, the customers do it) and then what? I get blocked, e.g. I can’t send email to (some) people anymore. This is already bad and it’s going to get worse.
@movq I heard about a defence against badly-behaved crawlers a while ago: an HTML zip bomb. This post explains how to do it. Essentially, web servers can serve compressed versions of webpages and, with a little trickery, one can replace the compressed page with a different file. After that, any bot that tries to crawl the page will instead download and unpack a zip bomb that will cause it to crash.
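A rough sketch of the idea in nginx terms - this is my own reading of the trick, and the file names, sizes and paths are made up, so treat it as a starting point rather than the recipe from that post:

```
# One-time preparation (shell): roughly 10 MB on disk, ~10 GB once inflated.
#   dd if=/dev/zero bs=1M count=10240 | gzip -9 > /var/www/bomb.html.gz

location = /bomb.html {
    gzip off;                          # don't let nginx compress it a second time
    default_type text/html;
    add_header Content-Encoding gzip;  # the "trickery": the body is already gzip data
    alias /var/www/bomb.html.gz;       # a naive crawler inflates ~10 GB of zeroes
}
```

Real browsers will try to inflate it too, of course, so you'd only wire something like this up on paths that no human visitor should ever hit (robots.txt-disallowed URLs, for example).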
@prologic Yeah, I’ve blocked some large subnets now (most likely overblocking a lot of stuff) and it has died down.
I’m not looking forward to doing this on a regular basis. This is supposed to be a fun hobby – and it was, for many years. Maybe that time is just over.
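(For the record, the blunt instrument here is nothing fancier than a pile of deny rules; the ranges below are RFC 5737 documentation addresses, not the subnets I actually blocked.)

```
# Sketch with documentation ranges only, not the real subnets.
# nginx evaluates allow/deny top to bottom within the enclosing context.
location / {
    deny  203.0.113.0/24;
    deny  198.51.100.0/24;
    allow all;              # everyone else gets through
}
```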
“But all your stuff is MIT licensed! They are allowed to do that!”
Haha. As if they would care. They crawl everything they get their hands on.
Besides, that’s not true: the license states that the copyright notice must be retained. “AI” breaks that. They incorporate my code and my articles in their product and make it appear as if it was their work.
1. The load will become a problem at some point.
2. These crawlers and the current “AI” in general are breaking the rules. *I* am supposed to be paying for every little thing, *I* get sued for “piracy”. But apparently, these rules only apply to me. If I had more money, I could break them. Fuck that.
3. I simply don’t want it. Period.
This probably means that I can no longer host my own website. I don’t want to deploy something like Anubis, because that ruins the whole thing: I want it to be accessible from ancient browsers, like OS/2 or Windows 3.11.
I’ll keep an eye on it for a while. Maybe try to block some IPs.
Sooner or later, I’ll take the website down and shift everything to Gopher.
The bots have begun to access my website way more often. I’m getting about 120k hits on https://www.uninformativ.de/git/ now in a couple of hours.
They don’t cache anything, probably on purpose.
It comes in waves. I get about 100 hits (all at once) on that /git endpoint, all from different IPs. Then it takes a moment until I get another wave of about 500-1000 requests (all at once) where they do HEAD requests on some of the paths below /git. I assume they did a GET earlier and are now checking if something has changed.
There's always something more urgent: I've known for a long time that sooner or later I'd feel prompted to switch from #github to somewhere else (since 2018 at least!), but I've been postponing and only very slowly flirting with the idea... That didn't work out too badly for me: if I had rushed into it, I would probably have migrated to #gitlab before knowing about its more objectionable sides. In the end, 2025 was the year I finally acted upon the urge to move. I did not do a very thorough analysis of the alternative hosts - what I have been reading about them over the years felt like enough, and I easily decided to choose #codeberg. Being hasty like that, alas, was a mistake: only now, during this slow and time-consuming process of deciding what and how to migrate, did I find that there is a low repository limit on codeberg: "The owner has already reached the limit of 100 repositories." I'm not complaining, mind you, and those "lucky 100" that are already there will stay - at least as a sort of backup. But this means that codeberg is not for me - and so this time I turn to you, the #mastodon community.
What github alternative, not self-hosted, should I move my >100 projects into?