A look at the challenges AI developers face in building models that extract trillions of high-quality training tokens from PDFs, which are notoriously hard to parse (Josh Dzieza/The Verge)
https://www.theverge.com/ai-artificial-intelligence/882891/ai-pdf-parsing…
»PDF standard gets Brotli compression for 20 percent smaller files:
The PDF Association is introducing Brotli as a new compression filter for PDF 2.0. Tests show files that are on average 20 percent smaller than with Deflate.«
Until now I didn't really know what Brötli was good for, since it is very slow but compresses efficiently. Now I can see that Google's invention does make sense for PDF.
🥖
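A rough way to try the Deflate-vs-Brotli comparison on your own data, assuming Google's `brotli` Python package (pip name "Brotli") is installed; without it the sketch reports Deflate only. Python's `zlib` produces a zlib-wrapped Deflate stream, the format PDF's FlateDecode filter uses. The sample bytes and quality levels are arbitrary choices, not the PDF Association's test setup.

```python
import zlib

try:
    import brotli  # Google's Brotli bindings (pip package "Brotli"); optional
except ImportError:
    brotli = None


def compressed_sizes(data: bytes) -> dict:
    """Return compressed size in bytes for each available codec."""
    # zlib.compress emits a zlib-wrapped Deflate stream (what FlateDecode expects).
    sizes = {"deflate": len(zlib.compress(data, 9))}
    if brotli is not None:
        # quality=11 is Brotli's slowest, densest setting.
        sizes["brotli"] = len(brotli.compress(data, quality=11))
    return sizes


def percent_smaller(baseline: int, candidate: int) -> float:
    """How much smaller `candidate` is than `baseline`, in percent."""
    return 100.0 * (baseline - candidate) / baseline


if __name__ == "__main__":
    # Hypothetical sample: a repetitive, PDF-like content stream.
    sample = b"BT /F1 12 Tf 72 712 Td (Hello, PDF) Tj ET\n" * 500
    sizes = compressed_sizes(sample)
    print(sizes)
    if "brotli" in sizes:
        pct = percent_smaller(sizes["deflate"], sizes["brotli"])
        print(f"brotli output is {pct:.1f}% smaller than deflate")
```

Results on a toy sample say little about real PDFs; the slow compression mostly matters once, when the file is written, while readers only pay for decompression.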
from my link log —
Back to the future: the story of Squeak, a practical Smalltalk written in itself.
http://www.vpri.org/pdf/tr1997001_backto.pdf
saved 2021-05-24
finally got around to "move my archive of scanned documents out of google drive" with the help of a lovely program "ocrmypdf", which is basically a python wrapper around tesseract and various pdf tools, but it's a really well done wrapper.
the simple invocation:
`ocrmypdf input.pdf output.pdf`
does what I want. the defaults are sensible. and now I can pdfgrep when I need to find that thing from 20 years ago that I still have for questionable "I do…
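For a whole archive rather than one file, a small driver can walk a folder and run the same `ocrmypdf input.pdf output.pdf` invocation on every PDF, mirroring the folder layout. A sketch under assumptions: the `scans/` and `ocr/` folder names are hypothetical, and it only shells out to the `ocrmypdf` CLI with its default options, exactly as above.

```python
import subprocess
from pathlib import Path


def plan_jobs(src_dir: Path, dst_dir: Path) -> list:
    """Pair every *.pdf under src_dir with a mirrored output path under dst_dir."""
    jobs = []
    # Note: on most platforms this glob is case-sensitive, so "*.PDF" files are skipped.
    for src in sorted(src_dir.rglob("*.pdf")):
        dst = dst_dir / src.relative_to(src_dir)
        jobs.append((src, dst))
    return jobs


def ocr_command(src: Path, dst: Path) -> list:
    # Same invocation as in the post: ocrmypdf with default settings.
    return ["ocrmypdf", str(src), str(dst)]


def run_all(src_dir: Path, dst_dir: Path) -> None:
    for src, dst in plan_jobs(src_dir, dst_dir):
        if dst.exists():
            continue  # already processed on an earlier run
        dst.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(ocr_command(src, dst), check=True)


if __name__ == "__main__":
    run_all(Path("scans"), Path("ocr"))  # hypothetical folder names
```

Afterwards the OCR'd copies can be searched with something like `pdfgrep -r "invoice" ocr/`.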
Vanderbilt Policy Accelerator - Capping-Credit-Card-Rates.pdf https://cdn.vanderbilt.edu/vu-URL/wp-content/uploads/sites/412/2025/09/03183755/Capping-Credit-Card-Rates.pdf
Resurrected!!! The source of RFK's scientific knowledge, now, once again, available to all.....
"Science Made Stupid"
https://www.chrispennello.com/tweller/Science Made Stupid.pdf
(2015, PDF) The rehabilitative potential of auditory to visual sensory substitution devices for the blind https://las.touro.edu/media/schools-an
@… However, if I set
diagram:
  engine:
    mermaid:
      mime-type:
        application/pdf: true
        image/svg+xml: false
diagram.lua fails because Inkscape doesn't find pdf2svg.
But I don't see why it even tries to call Inkscape, as mmdc can directly output PDF. The mermaid function looks goo…
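To check that the mmdc-to-PDF path works on its own, outside diagram.lua, mermaid-cli's `mmdc` can be called directly; it picks the output format from the `-o` file extension when no format is given. A minimal sketch with hypothetical file names; it does not reflect how diagram.lua invokes mmdc internally.

```python
import shutil
import subprocess
from pathlib import Path


def mmdc_pdf_command(src: Path, dst: Path) -> list:
    # mmdc infers the output format (svg, png, pdf) from the -o extension.
    return ["mmdc", "-i", str(src), "-o", str(dst)]


def render_pdf(src: Path, dst: Path) -> None:
    if shutil.which("mmdc") is None:
        raise FileNotFoundError("mmdc (mermaid-cli) is not on PATH")
    subprocess.run(mmdc_pdf_command(src, dst), check=True)


if __name__ == "__main__":
    render_pdf(Path("diagram.mmd"), Path("diagram.pdf"))  # hypothetical files
```

If this succeeds, Inkscape and pdf2svg are not needed for the PDF output itself.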
from my link log —
Spotting fake face masks. (FFP2/N95/KN95)
https://bda.org/advice/Coronavirus/Documents/spotting-fake-face-masks.pdf
saved 2021-12-23
from my link log —
From collisions to chosen-prefix collisions, applied to full SHA-1.
https://eprint.iacr.org/2019/459.pdf
saved 2019-05-11