Tootfinder

Opt-in global Mastodon full text search. Join the index!

@fanf@mendeddrum.org
2025-06-11 20:42:03

from my link log —
ai.robots.txt: A list of AI agents and robots to block.
github.com/ai-robots-txt/ai.ro
saved 2025-01-16

@tiotasram@kolektiva.social
2025-05-13 22:19:54

Writing code that ignores robots.txt is a professional ethics violation.
This is a toot about #AI

@lysander07@sigmoid.social
2025-05-13 16:25:32

Last week, our students learned how to conduct a proper evaluation for an NLP experiment. To this end, we introduced a small text corpus with sentences about Joseph Fourier, who counts as one of the discoverers of the greenhouse effect, responsible for global warming.

Slide of the Information Service Engineering lecture 03, Natural Language Processing 02, section 2.6: Evaluation, Precision, and Recall
Headline: Experiment
Let's consider the following text corpus (FOURIERCORPUS):
 1
In 1807, Fourier's work on heat transfer laid the foundation for understanding the greenhouse effect.
2
Joseph Fourier's energy balance analysis showed atmosphere's heat-trapping role.
3
Fourrier's calculations, though rudimentary, suggested that the atmosphere acts as an insulato…

@timbray@cosocial.ca
2025-06-10 17:23:53

Is there anything I can put in robots.txt that will stop Scrapy?
Failing that, let’s take the ship up and nuke the site from orbit. It’s the only way to be sure.

@groupnebula563@mastodon.social
2025-07-05 01:32:59

#AI #honeypots huh

@xtaran@chaos.social
2025-06-08 23:55:00

Fsck GMail!
@ IN TXT "v=spf1 all"

@andycarolan@social.lol
2025-06-09 08:36:36

I just discovered TXT... feels all kinds of uplifting for a Monday morning :)
#TomorrowXTogether #KPop

@kubikpixel@chaos.social
2025-05-26 06:00:07

From HOSTS.TXT to Modern Internet Infrastructure
🌐 #hoststxt

@cosmos4u@scicomm.xyz
2025-07-03 00:55:22

There is now also a CBET about the new interstellar #comet 3I/ATLAS: cbat.eps.harvard.edu/iau/cbet/ - it comes with an even more precise orbit based on astrometry back to 5 June and predicts 13th magnitude with 60° elongation after perihelion in November. The current magnitude is about 17.7.

@wfryer@mastodon.cloud
2025-07-02 03:03:31

Control How Your Content Is Used for AI Training With Cloudflare (Cloudflare Blog, 1 July 2024)
#MediaLit

@n8foo@macaw.social
2025-05-08 04:10:30

From the digital archives: #AWS #EC2 IP ranges from 14 years ago.

@chriscz@social.linux.pizza
2025-07-03 01:19:26

🥱
The day SHALL start.
Regards not given,
RFC2119
ietf.org/rfc/rfc2119.txt

@mgorny@social.treehouse.systems
2025-07-05 18:35:18

To whomever praises #Claude #LLM:
ClaudeBot has made 20k requests to bugs.gentoo.org today. 15k of them were repeatedly fetching robots.txt. That surely is a sign of great code quality.
#AI
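
A well-behaved crawler would normally fetch robots.txt once per host and reuse the parsed rules, rather than requesting it again for every page. The following is a minimal sketch of that idea using Python's standard `urllib.robotparser`. The `RobotsCache` class and its `fetch` hook are hypothetical names introduced here for illustration; nothing in it reflects how ClaudeBot or any other real crawler is implemented.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Fetch each origin's robots.txt at most once and reuse the parsed rules."""

    def __init__(self, fetch):
        # fetch: hypothetical hook, a callable that takes a robots.txt URL
        # and returns the file's text. Injected so the sketch needs no network.
        self._fetch = fetch
        self._parsers = {}

    def allowed(self, user_agent, url):
        parts = urlsplit(url)
        origin = f"{parts.scheme}://{parts.netloc}"
        parser = self._parsers.get(origin)
        if parser is None:
            # First request for this origin: download and parse robots.txt once.
            parser = RobotFileParser()
            parser.parse(self._fetch(origin + "/robots.txt").splitlines())
            self._parsers[origin] = parser
        # Later requests reuse the cached parser instead of refetching.
        return parser.can_fetch(user_agent, url)
```

A real crawler would also want to expire cached entries after some time and handle fetch errors; both are left out of this sketch.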

@mgorny@pol.social
2025-07-05 18:36:35

If anyone is praising #Claude #LLM, let me mention:
ClaudeBot made 20 thousand requests to bugs.gentoo.org today. Of those, 15 thousand were fetching the robots.txt file over and over. Truly high-quality code.
#AI

@arXiv_csNI_bot@mastoxiv.page
2025-05-29 07:21:03

Scrapers selectively respect robots.txt directives: evidence from a large-scale empirical study
Taein Kim, Karstan Bock, Claire Luo, Amanda Liswood, Emily Wenger
arxiv.org/abs/2505.21733

@groupnebula563@mastodon.social
2025-07-08 01:41:19

new cool idea: whenever anything requests /llms.txt or *.md serve them 42.zip
#ai #noai

@EgorKotov@datasci.social
2025-06-18 16:12:16

📝🗃️ 𝗿𝗱𝗼𝗰𝗱𝘂𝗺𝗽: Dump ‘R’ Package Source, Documentation, and Vignettes into One File for use in LLMs #rstats #LLM is on CRAN ekotov.pro/rdocdum…

rdocdump
Get fresh package docs to pass to LLM
library(rdocdump)
rdd_to_txt(
  pkg = "aws.s3",
  output_file = "aws.s3.txt",
  force_fetch = TRUE)
github.com/e-kotov/rdocdump

@noellabo@fedibird.com
2025-06-20 04:58:16

"Since this would cause trouble for DOS users, please keep file names to a combination of up to 8 uppercase letters, _ and digits (8.3 format)." -- README~1.TXT

@fluchtkapsel@nerdculture.de
2025-05-30 12:34:57
Content warning: tech, admin, dns

Today, I got notified about spamhaus not responding anymore to requests from our mailserver due to using an "open resolver".
Huh?
I found the command `dig +short test.openresolver.com TXT @<ip_of_dns_server_to_test>` to test if my DNS server is deemed an open resolver. And yes, the mailserver uses a DNS server that got recognized as an open resolver.
Out of curiosity, I tried the same in my local network where I have a dnsmasq serving DHCP and DNS for my cli…

@ripienaar@devco.social
2025-06-15 12:14:41

Been designing distributed counters for NATS. Pretty happy with this.
50k/second unoptimised and on a single counter - but we will support aggregation of regional to global etc.
Hard dist sys problems made trivial to use and operate 💪💪
gist.github.com/ripienaar/d95d