@axbom@axbom.me 2025-10-29 12:27:59
https://www.smartasaker.se/sv/ocr-lasare
@axbom@axbom.me DeepSeek releases DeepSeek-OCR, a vision language model designed for efficient vision-text compression, enabling longer contexts with less compute (Jonathan Kemper/The Decoder)
https://the-decoder.com/deepseeks-ocr-system-compr…
Is there a usable scanner app for Android without a subscription? It doesn't need OCR, just good multi-page scans to PDF.
Improving OCR using internal document redundancy
Diego Belzarena, Seginus Mowlavi, Aitor Artola, Camilo Mariño, Marina Gardella, Ignacio Ramírez, Antoine Tadros, Roy He, Natalia Bottaioli, Boshra Rajaei, Gregory Randall, Jean-Michel Morel
https://arxiv.org/abs/2508.14557
Extracting Information from Scientific Literature via Visual Table Question Answering Models
Dongyoun Kim, Hyung-do Choi, Youngsun Jang, John Kim
https://arxiv.org/abs/2508.18661
Paperwork does OCR on everything I scan. I've just scanned a document with my signature on it. It OCR-ed the signature (which is literally a scrawl on "Michał Górny") as "NBA".
Layout-Aware OCR for Black Digital Archives with Unsupervised Evaluation
Fitsum Sileshi Beyene, Christopher L. Dancy
https://arxiv.org/abs/2509.13236
E-ARMOR: Edge case Assessment and Review of Multilingual Optical Character Recognition
Aryan Gupta, Anupam Purwar
https://arxiv.org/abs/2509.03615
Some nice examples in the 'use cases' section of AI for Humanists https://aiforhumanists.com/guides/usecases/ - from OCR to annotation to identifying voices and styles
Reducto, which uses OCR with vision language models to convert complex documents into inputs for LLMs, raised a $75M Series B led by a16z at a $600M valuation (Stephanie Palazzolo/The Information)
https://www.theinformation.com/articles/startup-using-ai-tran…
{tesseract} lets you read text from images (https://docs.ropensci.org/tesseract/). It can also be combined with {magick}: https://ropen…
Logics-Parsing Technical Report
Xiangyang Chen, Shuzhao Li, Xiuwen Zhu, Yongfan Chen, Fan Yang, Cheng Fang, Lin Qu, Xiaoxiao Xu, Hu Wei, Minggang Wu
https://arxiv.org/abs/2509.19760
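Brain hemorrhage at Skatteverket (the Swedish Tax Agency). I paid my tax but accidentally entered the wrong OCR number (the payment reference), called them, and they said they would fix it. Now I have received a seizure order from the Enforcement Authority even though the tax was paid several months ago.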
SiDiaC: Sinhala Diachronic Corpus
Nevidu Jayatilleke, Nisansa de Silva
https://arxiv.org/abs/2509.17912 https://arxiv.org/pdf/2509.17912
Evaluating LLMs for Historical Document OCR: A Methodological Framework for Digital Humanities
Maria Levchenko
https://arxiv.org/abs/2510.06743
Integrating Generative AI into Cybersecurity Education: A Study of OCR and Multimodal LLM-assisted Instruction
Karan Patel, Yu-Zheng Lin, Gaurangi Raul, Bono Po-Jen Shih, Matthew W. Redondo, Banafsheh Saber Latibari, Jesus Pacheco, Soheil Salehi, Pratik Satam
https://arxiv.org/abs/2509.02998
Finally found a great ad-free and tracking-free #OpenSource document scanner for iOS, with OCR and multi-page PDF output:
https://openscanner.app/
33k one-page TIFFs is an OCR challenge, but it's not insurmountable. https://fed.brid.gy/r/https://bsky.app/profile/did:plc:gvda6fem6r7selm4gzjjww4a/post/3lxvbrbeabc2a
Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning
Haiyang Yu, Yuchuan Wu, Fan Shi, Lei Liao, Jinghui Lu, Xiaodong Ge, Han Wang, Minghan Zhuo, Xuecheng Wu, Xiang Fei, Hao Feng, Guozhi Tang, An-Lan Wang, Hanshen Zhu, Yangfan He, Quanhuan Liang, Liyuan Meng, Chao Feng, Can Huang, Jingqun Tang, Bin Li
Some of my most useful tools these days are the ones that take screenshots: Greenshot, a Windows tool with excellent usability, and PowerToys Text Extractor, which lets me OCR bits of text on the screen. Usability is important here: press one button and the text is copied to the clipboard.
Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR
Shashank Vempati, Nishit Anand, Gaurav Talebailkar, Arpan Garai, Chetan Arora
https://arxiv.org/abs/2508.21693
VARCO-VISION-2.0 Technical Report
Young-rok Cha, Jeongho Ju, SunYoung Park, Jong-Hyeon Lee, Younghyun Yu, Youngjune Kim
https://arxiv.org/abs/2509.10105
Detecting Legend Items on Historical Maps Using GPT-4o with In-Context Learning
Sofia Kirsanova, Yao-Yi Chiang, Weiwei Duan
https://arxiv.org/abs/2510.08385