Tootfinder

Opt-in global Mastodon full text search. Join the index!

@awinkler@openbiblio.social
2025-09-30 07:09:17
@frankel@mastodon.top
2025-10-27 16:08:02

The /llms.txt file
llmstxt.org/

@catsalad@infosec.exchange
2025-10-28 21:27:41

robots.txt (but really /dev/zero)

@knurd42@social.linux.pizza
2025-09-29 08:50:02

The release of #Linux 6.17[1] brought a few hours of calmness to the #kernel vanilla repos for #Fedora[2], as that means they just ship three different versions[3]. 😊
Soon the craziness will…

screenshot of https://www.leemhuis.info/files/kernel-vanilla/repostatus.txt
@timbray@cosocial.ca
2025-08-18 20:37:32

How about having a new /.aibots.txt (or maybe just /.ai.txt)?
All AI-focused crawlers expected to read it.
Start off with the exact same syntax as /.robots.txt - then there’s scope for adding genAI-specific stuff like IP claims and optimized paths and so on.
#genAI
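
As a minimal sketch of the existing robots.txt syntax that the post proposes reusing, Python's standard `urllib.robotparser` can evaluate per-crawler rules. The crawler names (`ExampleAIBot`, `SomeOtherBot`) and the paths are hypothetical, chosen only for illustration; nothing here reflects an actual /.ai.txt format, which the post leaves open.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: crawler names and paths are invented for illustration.
RULES = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt-style rules from a string instead of fetching a URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(RULES)

# The AI-focused crawler is shut out everywhere.
ai_bot_allowed = parser.can_fetch("ExampleAIBot", "/blog/post")
# Any other crawler falls back to the wildcard group.
other_bot_allowed = parser.can_fetch("SomeOtherBot", "/blog/post")
other_bot_private = parser.can_fetch("SomeOtherBot", "/private/notes")

print(ai_bot_allowed, other_bot_allowed, other_bot_private)
```

These rules are purely advisory: the parser only reports what a well-behaved crawler should do, and a crawler that never reads the file is unaffected.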

@jkmartindale@mastodon.social
2025-09-29 04:06:38

Refreshing to see a Twitter spam account straight up tell you what it is

Twitter message request from Molly xx (@mollykgmy)

Bio: 21 😏 spam acc 💛

Message: hey, have we met?? u oddly showed up on the featured people, n i thought you are really handsome, will u please txt me on my other account @glowmollyyy? 💛
@fgraver@hcommons.social
2025-09-10 19:22:24

Pay-per-output? AI firms blindsided by beefed up robots.txt instructions. arstechnica.com/tech-policy/20

@gatewayy@mastodon.gatewayy.net
2025-08-16 19:33:09

I found this simultaneously very informative and extremely obvious in hindsight. It reminded me of when everything seemed a lot more straightforward. I think that I might try this out.
al3rez.com/todo-txt-journey

@mela@zusammenkunft.net
2025-10-17 14:39:39

Quelle surprise: reputable websites are more likely to block access for AI training than sites whose purpose is disinformation.
arxiv.org/abs/2510.10315

@cosmos4u@scicomm.xyz
2025-08-19 19:37:46

A previously unknown regular satellite of #Uranus has been detected in a series of images obtained by the Near-Infrared Camera (NIRCam) onboard the James Webb Space Telescope on 2025 Feb. 2: cbat.eps.harvard.edu/iau/cbet/ -> science.nasa.gov/blogs/webb/20 - the object is located at a projected radial distance of 56250 +/- 250 km from Uranus' center in the planet's equatorial plane, initial astrometry is consistent with the moon orbiting on a nearly circular orbit with an orbital period of 0.402 days, and the observed IR flux from the object indicates a radius of 4 - 5 km, placing it well below the detection threshold of earlier images from Voyager and the Hubble Space Telescope.

@grahamperrin@bsd.cafe
2025-10-27 05:34:45

With a FreeBSD pkg repository configuration file set to use quarterly for the one and only repo:
― why is latest (not quarterly) used for bootstrapping?
gist.github.com/grahamperrin/1

@yaxu@post.lurk.org
2025-09-23 13:34:51

Any reading recommendations for a small collective looking to move from google dependency to self-hosting?
We're collecting some resources here:
doc.patternclub.org/s/QCwRlvO1

@mro@digitalcourage.social
2025-08-13 10:09:16

Hello @…,
what needs to be in the #robots.txt in order to still be indexed for you in September? Presumably not #Bing anymore, or is it?

@jake4480@c.im
2025-09-03 17:07:29

Calendar.txt by Tero Karvinen is a plain text file calendar that's versionable, supports all operating systems, is future-proof, easily syncs with Android, etc: #TextFiles

A screenshot of the page for Calendar.txt, a plain text calendar
@whitequark@mastodon.social
2025-09-18 19:04:08

catherine writes error messages
existence of challenge implies possibility of defeat

2025/09/18 19:00:54 pages err: unauthorized; defeated by DNS challenge: TXT record(s) at _git-pages-challenge.glasgow-embedded.org [13818d73986109c24ea135c8c367d5c78d88f14df3c259acf7aa57b5e93b293a] do not include 0c8f3e5b3f26cfac016b64d5d0d8160a509ef060169a29f925d2d2b60d0bf6b5; domain glasgow-embedded.org does not match wildcard *.grebedoc.dev
@arXiv_csCY_bot@mastoxiv.page
2025-10-14 09:47:18

Is Misinformation More Open? A Study of robots.txt Gatekeeping on the Web
Nicolas Steinacker-Olsztyn, Devashish Gosain, Ha Dao
arxiv.org/abs/2510.10315

@jdrm@social.linux.pizza
2025-09-15 14:29:47

The email exchange that explains how the UTF-8 encoding format was born, as told by its creators #utf8

@zachleat@zachleat.com
2025-08-04 13:59:19

@… @… looks like it found an error in your robots.txt though I don’t see that content in your robots.txt 👀 hmmmmm I wonder if that is a user-agent issue with CloudFlare blocking the lighthouse bot?

@metacurity@infosec.exchange
2025-08-05 13:37:18

As we head into a blizzard of infosec news, stay ahead of the curve by checking out today's Metacurity for the latest developments, including
--Ukraine claims major hack of Russian nuclear submarine,
--SonicWall is aware of flaw exploitation,
--Perplexity is stealthily evading robots.txt,
--FinCEN warns of crypto ATM crimes,
--Vietnamese hackers are targeting thousands,
--Informants' data stolen in a Louisiana sheriff's office ransomware attack, …

@grahamperrin@bsd.cafe
2025-10-27 05:40:25

Later, after I keyed n (to not continue), installation did continue.
This is partly understandable, because the y/n prompt was in response to a command that used
-y
Not really a bug, just slightly surprising.
gist.github.com/grahamperrin/1…

@mgorny@pol.social
2025-08-07 04:47:30

From the archive: how Jasiu tried to take his bike on the train from Katowice to the Bełsznica area.
tek.org.pl/psota-ic.txt
If anyone needs it in Warsaw-style Polish, there is an account of it below, for example:

@Techmeme@techhub.social
2025-08-04 14:50:39

Cloudflare says Perplexity uses stealth crawling techniques, like undeclared user agents and rotating IP addresses, to evade robots.txt rules and network blocks (Cloudflare)
blog.cloudflare.com/perplexity

@arXiv_csCL_bot@mastoxiv.page
2025-09-18 10:19:51

Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
Alejandro Hernández-Cano, Alexander Hägele, Allen Hao Huang, Angelika Romanou, Antoni-Joan Solergibert, Barna Pasztor, Bettina Messmer, Dhia Garbaya, Eduard Frank Ďurech, Ido Hakimi, Juan García Giraldo, Mete Ismayilzada, Negar Foroutan, Skander Moalla, Tiancheng Chen, Vinko Sabolčec, Yixuan Xu, Michael Aerni, Badr AlKhamissi, Ines Altemir Marinas, Mohammad Hossein Amani, Matin An…

@fanf@mendeddrum.org
2025-09-07 11:42:03

from my link log —
Nontransitive comparison functions lead to out-of-bounds read and write in glibc's qsort().
qualys.com/2024/01/30/qsort.tx
saved 2025-09-06
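
To illustrate what "nontransitive" means for a comparison function, here is a minimal sketch of a brute-force check. It does not reproduce the glibc issue (Python's sort is memory-safe); it only shows how a comparator can break the ordering contract that sort routines rely on. The helper names and the rock-paper-scissors comparator are hypothetical examples, not taken from the advisory.

```python
from itertools import permutations


def find_transitivity_violation(items, cmp):
    """Return a triple (a, b, c) with a < b and b < c but not a < c, else None."""
    for a, b, c in permutations(items, 3):
        if cmp(a, b) < 0 and cmp(b, c) < 0 and not cmp(a, c) < 0:
            return (a, b, c)
    return None


def int_cmp(a, b):
    """A well-behaved comparator: negative, zero, or positive like C's qsort expects."""
    return (a > b) - (a < b)


# Each key beats its value, so the "ordering" forms a cycle.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def rps_cmp(a, b):
    """A nontransitive comparator: rock < paper < scissors < rock."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1


ok = find_transitivity_violation([3, 1, 2], int_cmp)
bad = find_transitivity_violation(["rock", "paper", "scissors"], rps_cmp)
print(ok)
print(bad)
```

A check like this is cubic in the number of items, so it only suits small test sets, but it can catch a broken comparator before it reaches a sort routine that assumes a consistent order.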

@stargazer@woof.tech
2025-10-24 16:09:54

#WritersCoffeeClub
19. How do you keep track of dates and events in a WIP?
20. What rôle does death (or undeath!) play in your work?
21. What is your take on the adverb debate?
---
19. Any WIP that needs dates and events is probably already big enough to void memorization.
I went all the way from stickies to txt files to wikis to, eventually, Campfire. I love m…

@tinoeberl@mastodon.online
2025-08-06 08:10:49

AI abuse
Cloudflare accuses the AI provider #Perplexity of using undeclared crawlers to gain access to blocked websites.
Despite robots.txt prohibitions and IP blocks, Perplexity is said to read content covertly using changing user agents and IPs.
That would be a violation of established web standards and a disregard for website preferences.

@jtk@infosec.exchange
2025-10-07 18:45:41

There is an ActivityPub proposal that involves the #DNS.
I have only just discovered it and have not considered it deeply so I am reluctant to make any grand statements. It is not obvious to me why this is useful or better than alternative approaches. It appears to involve the use of TXT RRs, any new de facto use of which makes me skeptical.

@kubikpixel@chaos.social
2025-10-03 05:35:05

»Fewer and fewer real users: study shows massive rise in bot traffic on the web:
Human website visitors are becoming scarce: a new analysis shows how Google, ChatGPT and co. are changing the rules of the game on the net with their bots, and why publishers are sounding the alarm.«
Bots have existed since the early days of the internet, and those also deliberately ignore the robots.txt instructions.
🫥

@castarco@hachyderm.io
2025-09-12 10:11:22

I'm not sure if this is going to make a difference (#LLMs weren't able to read #licenses or terms & conditions before, when these were not formalized in a "machine-readable" way; plus, besides licenses, we already had the declarative robots.txt files, even if those were not as expressive as this new proposal).
So, is this extra work for web developers and maintainers? Are we going to operate under the new assumption that if we didn't do the work of implementing this, then we are granting permission to scraper bots to steal all our online creations?
Or can this be a net gain for creators in some specific way?

@iam_jfnklstrm@social.linux.pizza
2025-08-08 07:20:10

So, quick question: I got a .msg file from a colleague and I need to use the addresses in it to send a mailing to everyone on the list. But I can't find any way to create a new email from the file. If I open it as a txt file, it contains duplicates of all the names, with ' ' characters around each name/address.
How on earth do I avoid copying name by name into a new email?

@arXiv_csSE_bot@mastoxiv.page
2025-10-14 08:45:38

SLEAN: Simple Lightweight Ensemble Analysis Network for Multi-Provider LLM Coordination: Design, Implementation, and Vibe Coding Bug Investigation Case Study
Matheus J. T. Vargas
arxiv.org/abs/2510.10010

@jake4480@c.im
2025-08-04 16:49:27

"Today, over two and a half million websites have chosen to completely disallow AI training through our managed robots.txt feature or our managed rule blocking AI Crawlers. Every Cloudflare customer is now able to selectively decide which declared AI crawlers are able to access their content in accordance with their business objectives.
We expected a change in bot and crawler behavior based on these new features, and we expect that the techniques bot operators use to evade detecti…

@grahamperrin@bsd.cafe
2025-10-09 06:52:35

@… two precautions that may help, in lossy situations:
pkg prime-origins | sort -u > /var/tmp/pkg-prime-origins.txt
/usr/local/etc/periodic/daily/411.pkg-backup
If – following an issue – you predict the need to revert to the backup, you can:
service cron stop