2026-01-06 17:06:40
The text file that runs the internet
https://www.theverge.com/24067997/robots-txt-ai-text-file-web-crawlers-spiders
This robots.txt by Bernd Wunsch is lovely. https://www.wunsch.dk/robots.txt
"RSL 1.0 (Really Simple Licensing) instead of robots.txt: New standard for internet content:
A new standard for protecting content on the internet. RSL is backed by players such as publishers and the advertising industry."
I'm only hearing about this now. Let's see how useful it turns out to be, and whether it protects web content in general or, once again, only commercial data.
👉
RSL 1.0 instead of robots.txt: New standard for internet content | heise online
https://heise.de/-11111422
Imagine ChatGPT, but instead of predicting text it just linked you to the top 3 documents that most influenced the probabilities that would have been used to predict that text.
It could even generate some info about which parts of each document would have been combined, and how.
There would still be issues with how training data is sourced and filtered, but these could be solved by crawling normally respecting robots.txt and by paying filterers a fair wage with a more relaxed work schedule and mental health support.
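For what "crawling normally respecting robots.txt" could look like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, the example.com URLs, and the helper names `build_parser` and `allowed_urls` are all made up for illustration; a real crawler would fetch each site's own robots.txt and also handle rate limits and errors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Keep only the URLs this user agent is permitted to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    candidates = [
        "https://example.com/",
        "https://example.com/private/notes.html",
    ]
    # Only URLs that pass the robots.txt check would be crawled.
    for url in allowed_urls(parser, "ExampleBot", candidates):
        print("would crawl:", url)
```

The point of the sketch is only that the check is cheap: one `can_fetch` call per URL before any request is made.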
The energy issues are mainly about wild future investment and wasteful query spam, not optimized present-day per-query usage.
Is this "just search"?
Yes, but it would have some advantages for a lot of use cases, mainly in synthesizing results across multiple documents and in leveraging a language model more fully to find relevant stuff.
When we talk about the harms of current corporate LLMs, the opportunity cost of NOT building things like this is part of that.
The equivalent for art would have been so amazing too! "Here are some artists that can do what you want, with examples pulled from their portfolios."
It would be a really cool coding assistant that I'd actually encourage my students to use (with some guidelines).
#AI #GenAI #LLMs