Tootfinder

Opt-in global Mastodon full text search. Join the index!

@TFG@social.linux.pizza
2025-07-31 08:52:37

for i in {00000..00496}; do wget virusshare.com/hashfiles/Virus; done
echo MD5 > VS_full.txt
for i in {00000..00496}; do cat VirusShare_$i.md5 | grep -v '#' >> ./VS_full.txt; done
kthxbye :)
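
The merge step in the loop above (write an "MD5" header, then append every line that does not contain '#') could also be sketched in Python. This is only an illustration, not the poster's method: the function name merge_hash_files and its parameters are made up for this sketch, and the files are assumed to be downloaded already.

```python
from pathlib import Path
from typing import Iterable


def merge_hash_files(paths: Iterable[Path], output: Path, header: str = "MD5") -> int:
    """Write `header`, then every input line that contains no '#'.

    Mirrors `grep -v '#'`: any line containing '#' is dropped.
    Returns the number of lines copied (the header is not counted).
    """
    written = 0
    with output.open("w", encoding="utf-8") as out:
        out.write(header + "\n")
        for path in paths:
            with path.open(encoding="utf-8") as src:
                for line in src:
                    if "#" in line:
                        continue  # skip comment lines
                    # Make sure each copied line ends with a newline.
                    out.write(line if line.endswith("\n") else line + "\n")
                    written += 1
    return written


# Hypothetical usage, assuming the 497 files sit in the current directory:
# merge_hash_files(
#     (Path(f"VirusShare_{i:05d}.md5") for i in range(497)),
#     Path("VS_full.txt"),
# )
```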

@fanf@mendeddrum.org
2025-06-11 20:42:03

from my link log —
ai.robots.txt: A list of AI agents and robots to block.
github.com/ai-robots-txt/ai.ro
saved 2025-01-16

@timbray@cosocial.ca
2025-08-18 20:37:32

How about having a new /.aibots.txt (or maybe just /.ai.txt)?
All AI-focused crawlers expected to read it.
Start off with the exact same syntax as /.robots.txt - then there’s scope for adding genAI-specific stuff like IP claims and optimized paths and so on.
#genAI
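
If such a file really did reuse robots.txt syntax, existing parsers could read it unchanged. A minimal sketch using Python's standard urllib.robotparser follows; the file contents, the crawler name ExampleAIBot, and the helper may_fetch are all hypothetical and not part of any proposal or standard.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical contents of an AI-specific rules file in plain robots.txt syntax.
AIBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /private/

User-agent: *
Disallow: /
"""


def may_fetch(rules_text: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` is allowed to fetch `url` under `rules_text`."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAIBot", "https://example.org/public/page"),
        ("ExampleAIBot", "https://example.org/private/data"),
        ("OtherBot", "https://example.org/public/page"),
    ]:
        print(agent, url, may_fetch(AIBOTS_TXT, agent, url))
```

Like robots.txt itself, this only tells a cooperating crawler what it is asked to do; it does not enforce anything.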

@gatewayy@mastodon.gatewayy.net
2025-08-16 19:33:09

I found this simultaneously very informative and extremely obvious in hindsight. It reminded me of when everything seemed a lot more straightforward. I think that I might try this out.
al3rez.com/todo-txt-journey

@groupnebula563@mastodon.social
2025-07-21 22:50:14

slate.com/technology/2014/07/a
see this is wrong, the T-1000’s user agent would actually be
Mozilla/5.0 (T-1000; CPU like T-800) Skynet (KHTML, like Gecko) Terminator version 1000 (like Human)…

@mro@digitalcourage.social
2025-08-13 10:09:16

Hello @…,
what has to be in the #robots.txt to still be indexed by you in September? Presumably no longer #Bing, or is it?

@cosmos4u@scicomm.xyz
2025-08-19 19:37:46

A previously unknown regular satellite of #Uranus has been detected in a series of images obtained by the Near-Infrared Camera (NIRCam) onboard the James Webb Space Telescope on 2025 Feb. 2: cbat.eps.harvard.edu/iau/cbet/ -> science.nasa.gov/blogs/webb/20 - the object is located at a projected radial distance of 56250 ± 250 km from Uranus' center in the planet's equatorial plane, initial astrometry is consistent with the moon orbiting on a nearly circular orbit with an orbital period of 0.402 days, and the observed IR flux from the object indicates a radius of 4 - 5 km, placing it well below the detection threshold of earlier images from Voyager and the Hubble Space Telescope.

@karlauerbach@sfba.social
2025-07-24 17:37:16

The maga-regime has opened the floodgates to allow purported AI company 'bots' to steal everything they can find. (Goodbye robots.txt, goodbye terms-of-service, goodbye copyright.)
To me that suggests that anyone wanting to take information - even highly sensitive stuff such as medical, financial, or even classified data - now can raise a defense that they are just gathering data to feed their AI. (A smart criminal would prepare to use that defense by actually buying an Nvidia AI c…

@EgorKotov@datasci.social
2025-06-18 16:12:16

📝🗃️ 𝗿𝗱𝗼𝗰𝗱𝘂𝗺𝗽: Dump ‘R’ Package Source, Documentation, and Vignettes into One File for use in LLMs #rstats #LLM is on CRAN ekotov.pro/rdocdum…

rdocdump
Get fresh package docs to pass to LLM
library(rdocdump)
rdd_to_txt(
pkg = "aws.s3",
output_file = "aws.s3.txt",
force_fetch = TRUE)
github.com/e-kotov/rdocdump

@noellabo@fedibird.com
2025-06-20 04:58:16

"DOS users would be in trouble otherwise, so please use file names made up of 8 uppercase letters, _ and digits (8.3 format)." -- README~1.TXT

@zachleat@zachleat.com
2025-08-04 13:59:19

@… @… looks like it found an error in your robots.txt though I don’t see that content in your robots.txt 👀 hmmmmm I wonder if that is a user-agent issue with CloudFlare blocking the lighthouse bot?

@metacurity@infosec.exchange
2025-08-05 13:37:18

As we head into a blizzard of infosec news, stay ahead of the curve by checking out today's Metacurity for the latest developments, including
--Ukraine claims major hack of Russian nuclear submarine,
--SonicWall is aware of flaw exploitation,
--Perplexity is stealthily evading robots.txt,
--FinCEN warns of crypto ATM crimes,
--Vietnamese hackers are targeting thousands,
--Informants' data stolen in a Louisiana sheriff's office ransomware attack, …

@GroupNebula563@mastodon.social
2025-07-05 01:32:59

#AI #honeypots huh

@ripienaar@devco.social
2025-06-15 12:14:41

Been designing distributed counters for NATS. Pretty happy with this.
50k/second unoptimised and on a single counter - but we will support aggregation of regional to global etc.
Hard dist sys problems made trivial to use and operate 💪💪
gist.github.com/ripienaar/d95d

@mgorny@pol.social
2025-08-07 04:47:30

From the archive: how Jasiu tried to take his bike on the train from Katowice to the Bełsznica area.
tek.org.pl/psota-ic.txt
If anyone needs it in Warsaw-style (standard) Polish, there is an account below, for example:

@xtaran@chaos.social
2025-06-08 23:55:00

Fsck GMail!
@ IN TXT "v=spf1 all"

@andycarolan@social.lol
2025-06-09 08:36:36

I just discovered TXT... feels all kinds of uplifting for a Monday morning :)
#TomorrowXTogether #KPop

@Techmeme@techhub.social
2025-08-04 14:50:39

Cloudflare says Perplexity uses stealth crawling techniques, like undeclared user agents and rotating IP addresses, to evade robots.txt rules and network blocks (Cloudflare)
blog.cloudflare.com/perplexity

@chriscz@social.linux.pizza
2025-07-03 01:19:26

🥱
The day SHALL start.
Regards not given,
RFC2119
ietf.org/rfc/rfc2119.txt

@jake4480@c.im
2025-08-04 16:49:27

"Today, over two and a half million websites have chosen to completely disallow AI training through our managed robots.txt feature or our managed rule blocking AI Crawlers. Every Cloudflare customer is now able to selectively decide which declared AI crawlers are able to access their content in accordance with their business objectives.
We expected a change in bot and crawler behavior based on these new features, and we expect that the techniques bot operators use to evade detecti…

@tinoeberl@mastodon.online
2025-08-06 08:10:49

KIMissbrauch
Cloudflare accuses the AI provider #Perplexity of using undeclared crawlers to gain access to blocked websites.
Despite robots.txt prohibitions and IP blocks, Perplexity is said to covertly read content using rotating user agents and IPs.
That would be a violation of established web standards and a disregard of website preferences.

@timbray@cosocial.ca
2025-06-10 17:23:53

Is there anything I can put in robots.txt that will stop Scrapy?
Failing that, let’s take the ship up and nuke the site from orbit. It’s the only way to be sure.

@mgorny@social.treehouse.systems
2025-07-05 18:35:18

To whomever praises #Claude #LLM:
ClaudeBot has made 20k requests to bugs.gentoo.org today. 15k of them were repeatedly fetching robots.txt. That surely is a sign of great code quality.
#AI

@cosmos4u@scicomm.xyz
2025-07-03 00:55:22

There is now also a CBET about the new interstellar #comet 3I/ATLAS: cbat.eps.harvard.edu/iau/cbet/ - it comes with an even more precise orbit based on astrometry back to 5 June and predicts 13th magnitude with 60° elongation after perihelion in November. The current magnitude is about 17.7.

@iam_jfnklstrm@social.linux.pizza
2025-08-08 07:20:10

So, quick question: I got a .msg file from a colleague and I need to use the addresses in it to send a mailing to everyone on the list. But I cannot find any way to create a new mail from the file. If I open it as a txt file, it contains duplicates of all the names with ' ' characters around each name/address.
How on earth do I avoid copying name after name into a new mail?

@mgorny@pol.social
2025-07-05 18:36:35

If anyone is praising #Claude #LLM, let me mention:
ClaudeBot made 20 thousand requests to bugs.gentoo.org today. Of those, 15 thousand kept fetching the robots.txt file over and over. Truly high-quality code.
#AI

@GroupNebula563@mastodon.social
2025-07-08 01:41:19

new cool idea: whenever anything requests /llms.txt or *.md serve them 42.zip
#ai #noai
