Tootfinder

@axbom@axbom.me
2025-10-29 12:27:59

@… Yes, that does make things a bit easier. It reminded me of the “credit card” with transparent windows that made it easier to separate out parts of the OCR number when you copied it down. And then I found this gadget! 😄

https://www.smartasaker.se/sv/ocr-lasare

@heiseonline@social.heise.de
2025-10-21 16:02:00

DeepSeek-OCR: How images help chatbots hold long conversations
Chinese AI researchers want to use images to keep chatbots fast and cheap when handling long contexts. Optical context compression is meant to improve AI assistants.


@Techmeme@techhub.social
2025-10-20 18:15:46

DeepSeek releases DeepSeek-OCR, a vision language model designed for efficient vision-text compression, enabling longer contexts with less compute (Jonathan Kemper/The Decoder)
https://the-decoder.com/deepseeks-ocr-system-compr…

Deepseek's OCR system compresses image-based text so AI can handle much longer documents
Chinese AI company Deepseek has built an OCR system that compresses image-based text documents for language models, aiming to let AI handle much longer contexts without running into memory limits.

@mela@zusammenkunft.net
2025-08-27 01:00:26

Is there a decent scanner app for Android without a subscription? It doesn't need OCR, just good multi-page scan-to-PDF.

@arXiv_csCV_bot@mastoxiv.page
2025-08-21 10:04:30

Improving OCR using internal document redundancy
Diego Belzarena, Seginus Mowlavi, Aitor Artola, Camilo Mariño, Marina Gardella, Ignacio Ramírez, Antoine Tadros, Roy He, Natalia Bottaioli, Boshra Rajaei, Gregory Randall, Jean-Michel Morel
https://arxiv.org/abs/2508.14557

Improving OCR using internal document redundancy
Current OCR systems are based on deep learning models trained on large amounts of data. Although they have shown some ability to generalize to unseen data, especially in detection tasks, they can struggle with recognizing low-quality data. This is particularly evident for printed documents, where intra-domain data variability is typically low, but inter-domain data variability is high. In that context, current OCR methods do not fully exploit each document's redundancy. We propose an unsupervis…

@awinkler@openbiblio.social
2025-09-24 15:05:51

@… at "Digital Neo-Latin studies: ideas and perspectives" on efficient #OCR Post-Correction.
#neolatin

@arXiv_csIR_bot@mastoxiv.page
2025-08-27 08:26:03

Extracting Information from Scientific Literature via Visual Table Question Answering Models
Dongyoun Kim, Hyung-do Choi, Youngsun Jang, John Kim
https://arxiv.org/abs/2508.18661

Extracting Information from Scientific Literature via Visual Table Question Answering Models
This study explores three approaches to processing table data in scientific papers to enhance extractive question answering and develop a software tool for the systematic review process. The methods evaluated include: (1) Optical Character Recognition (OCR) for extracting information from documents, (2) Pre-trained models for document visual question answering, and (3) Table detection and structure recognition to extract and merge key information from tables with textual content to answer extra…

@mgorny@social.treehouse.systems
2025-08-14 19:06:21

Paperwork does OCR on everything I scan. I've just scanned a document with my signature on it. It OCR-ed the signature (which is literally a scrawl on "Michał Górny") as "NBA".

@arXiv_csDL_bot@mastoxiv.page
2025-09-17 07:56:49

Layout-Aware OCR for Black Digital Archives with Unsupervised Evaluation
Fitsum Sileshi Beyene, Christopher L. Dancy
https://arxiv.org/abs/2509.13236 https://

Layout-Aware OCR for Black Digital Archives with Unsupervised Evaluation
Despite their cultural and historical significance, Black digital archives continue to be a structurally underrepresented area in AI research and infrastructure. This is especially evident in efforts to digitize historical Black newspapers, where inconsistent typography, visual degradation, and limited annotated layout data hinder accurate transcription, despite the availability of various systems that claim to handle optical character recognition (OCR) well. In this short paper, we present a l…

@arXiv_csCL_bot@mastoxiv.page
2025-09-05 09:41:51

E-ARMOR: Edge case Assessment and Review of Multilingual Optical Character Recognition
Aryan Gupta, Anupam Purwar
https://arxiv.org/abs/2509.03615 https://…

E-ARMOR: Edge case Assessment and Review of Multilingual Optical Character Recognition
Optical Character Recognition (OCR) in multilingual, noisy, and diverse real-world images remains a significant challenge. With the rise of Large Vision-Language Models (LVLMs), there is growing interest in their ability to generalize and reason beyond fixed OCR pipelines. In this work, we introduce Sprinklr-Edge-OCR, a novel OCR system specifically optimized for edge deployment in resource-constrained environments. We present a large-scale compar…

@mia@hcommons.social
2025-09-19 14:22:23

Some nice examples in the 'use cases' section of AI for Humanists https://aiforhumanists.com/guides/usecases/ - from OCR to annotation to identifying voices and styles

Use Cases
The AI for Humanists project is developing resources to enable DH scholars to explore how large language models and AI technologies can be used in their research and teaching. Find an annotated bibliography of research papers and tools, a glossary of relevant terms, code tutorials, and information about our workshops.

@Techmeme@techhub.social
2025-10-15 03:20:53

Reducto, which uses OCR with vision language models to convert complex documents into inputs for LLMs, raised a $75M Series B led by a16z at a $600M valuation (Stephanie Palazzolo/The Information)
https://www.theinformation.com/articles/startup-using-ai-tran…

The Startup Using AI to Translate Documents Into Data
If you’ve ever uploaded a picture of a receipt to an expense report or read a PDF of a book online, you’ve likely used optical character recognition, a decades-old technique that converts images of typed, handwritten or printed text into text that’s editable on a computer. OCR might not sound ...

@datascience@genomic.social
2025-10-07 10:00:01

{tesseract} allows you to read text from images https://docs.ropensci.org/tesseract/ it can also be combined with {magick} https://ropen…

Open Source OCR Engine
Bindings to Tesseract: a powerful optical character recognition (OCR) engine that supports over 100 languages. The engine is highly configurable in order to tune the detection algorithms and obtain the best possible results.

@arXiv_csCV_bot@mastoxiv.page
2025-09-25 10:16:42

Logics-Parsing Technical Report
Xiangyang Chen, Shuzhao Li, Xiuwen Zhu, Yongfan Chen, Fan Yang, Cheng Fang, Lin Qu, Xiaoxiao Xu, Hu Wei, Minggang Wu
https://arxiv.org/abs/2509.19760

Logics-Parsing Technical Report
Recent advances in Large Vision-Language models (LVLM) have spurred significant progress in document parsing tasks. Compared to traditional pipeline-based methods, end-to-end paradigms have shown their excellence in converting PDF images into structured outputs through integrated Optical Character Recognition (OCR), table recognition, mathematical formula recognition and so on. However, the absence of explicit analytical stages for document layouts and reading orders limits the LVLM's capability…

@iam_jfnklstrm@social.linux.pizza
2025-10-13 07:13:36

Total brain meltdown at Skatteverket. I paid my tax but accidentally entered the wrong OCR number; I called and they said they would fix it. Now I have received a seizure decision from the enforcement officer, even though the tax was paid several months ago.

@arXiv_csCL_bot@mastoxiv.page
2025-09-23 12:54:10

SiDiaC: Sinhala Diachronic Corpus
Nevidu Jayatilleke, Nisansa de Silva
https://arxiv.org/abs/2509.17912 https://arxiv.org/pdf/2509.17912

SiDiaC: Sinhala Diachronic Corpus
SiDiaC, the first comprehensive Sinhala Diachronic Corpus, covers a historical span from the 5th to the 20th century CE. SiDiaC comprises 58k words across 46 literary works, annotated carefully based on the written date, after filtering based on availability, authorship, copyright compliance, and data attribution. Texts from the National Library of Sri Lanka were digitised using Google Document AI OCR, followed by post-processing to correct formatting and modernise the orthography. The construc…

@arXiv_csCV_bot@mastoxiv.page
2025-10-09 10:22:21

Evaluating LLMs for Historical Document OCR: A Methodological Framework for Digital Humanities
Maria Levchenko
https://arxiv.org/abs/2510.06743 https://arx…

Evaluating LLMs for Historical Document OCR: A Methodological Framework for Digital Humanities
Digital humanities scholars increasingly use Large Language Models for historical document digitization, yet lack appropriate evaluation frameworks for LLM-based OCR. Traditional metrics fail to capture temporal biases and period-specific errors crucial for historical corpus creation. We present an evaluation methodology for LLM-based historical OCR, addressing contamination risks and systematic biases in diplomatic transcription. Using 18th-century Russian Civil font texts, we introduce novel …

@arXiv_csCY_bot@mastoxiv.page
2025-09-04 08:22:51

Integrating Generative AI into Cybersecurity Education: A Study of OCR and Multimodal LLM-assisted Instruction
Karan Patel, Yu-Zheng Lin, Gaurangi Raul, Bono Po-Jen Shih, Matthew W. Redondo, Banafsheh Saber Latibari, Jesus Pacheco, Soheil Salehi, Pratik Satam
https://arxiv.org/abs/2509.02998

Integrating Generative AI into Cybersecurity Education: A Study of OCR and Multimodal LLM-assisted Instruction
This full paper describes an LLM-assisted instruction integrated with a virtual cybersecurity lab platform. The digital transformation of Fourth Industrial Revolution (4IR) systems is reshaping workforce needs, widening skill gaps, especially among older workers. With rising emphasis on robotics, automation, AI, and security, re-skilling and up-skilling are essential. Generative AI can help build this workforce by acting as an instructional assistant to support skill acquisition during experien…

@toxi@mastodon.thi.ng
2025-08-04 15:27:23

Finally found a great ad-free and tracking-free #OpenSource document scanner for iOS, with OCR and multi-page PDF output:
https://openscanner.app/

Open Scanner
Open Scanner is an open-source document scanning app for iPhone

@grumpybozo@toad.social
2025-09-03 14:49:04

33k one-page TIFFs is an OCR challenge, but it's not insurmountable. https://fed.brid.gy/r/https://bsky.app/profile/did:plc:gvda6fem6r7selm4gzjjww4a/post/3lxvbrbeabc2a

Leah McElrath (@leahmcelrath.bsky.social)
Looks like they purposefully made the released Epstein documents into a pile of hay to make finding any needles very challenging. They even made each page a separate file.

@arXiv_csCL_bot@mastoxiv.page
2025-09-15 09:43:41

Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning
Haiyang Yu, Yuchuan Wu, Fan Shi, Lei Liao, Jinghui Lu, Xiaodong Ge, Han Wang, Minghan Zhuo, Xuecheng Wu, Xiang Fei, Hao Feng, Guozhi Tang, An-Lan Wang, Hanshen Zhu, Yangfan He, Quanhuan Liang, Liyuan Meng, Chao Feng, Can Huang, Jingqun Tang, Bin Li
https://

Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning
Chinese ancient documents, invaluable carriers of millennia of Chinese history and culture, hold rich knowledge across diverse fields but face challenges in digitization and understanding, i.e., traditional methods only scan images, while current Vision-Language Models (VLMs) struggle with their visual and linguistic complexity. Existing document benchmarks focus on English printed texts or simplified Chinese, leaving a gap for evaluating VLMs on ancient Chinese documents. To address this, we p…

@nelson@tech.lgbt
2025-08-30 01:35:07

Some of my most useful tools these days are the ones that take screenshots: Greenshot, a Windows tool with excellent usability, and PowerToys Text Extractor, which lets me OCR bits of text on the screen. Usability is important here: press one button and stuff is copied to clipboard.

@arXiv_csCV_bot@mastoxiv.page
2025-09-01 09:52:02

Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR
Shashank Vempati, Nishit Anand, Gaurav Talebailkar, Arpan Garai, Chetan Arora
https://arxiv.org/abs/2508.21693

Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR
Conventional optical character recognition (OCR) techniques segmented each character and then recognized it. This made them prone to errors in character segmentation and left them without the context needed to exploit language models. Advances in sequence-to-sequence translation in the last decade led to modern techniques that first detect words and then input one word at a time to a model, which directly outputs full words as a sequence of characters. This allowed better utilization of language models and bypassed error-prone…

@arXiv_csCV_bot@mastoxiv.page
2025-09-15 09:58:31

VARCO-VISION-2.0 Technical Report
Young-rok Cha, Jeongho Ju, SunYoung Park, Jong-Hyeon Lee, Younghyun Yu, Youngjune Kim
https://arxiv.org/abs/2509.10105 https://

VARCO-VISION-2.0 Technical Report
We introduce VARCO-VISION-2.0, an open-weight bilingual vision-language model (VLM) for Korean and English with improved capabilities compared to the previous model VARCO-VISION-14B. The model supports multi-image understanding for complex inputs such as documents, charts, and tables, and delivers layout-aware OCR by predicting both textual content and its spatial location. Trained with a four-stage curriculum with memory-efficient techniques, the model achieves enhanced multimodal alignment, wh…

@arXiv_csCV_bot@mastoxiv.page
2025-10-10 11:03:49

Detecting Legend Items on Historical Maps Using GPT-4o with In-Context Learning
Sofia Kirsanova, Yao-Yi Chiang, Weiwei Duan
https://arxiv.org/abs/2510.08385 https://

Detecting Legend Items on Historical Maps Using GPT-4o with In-Context Learning
Historical map legends are critical for interpreting cartographic symbols. However, their inconsistent layouts and unstructured formats make automatic extraction challenging. Prior work focuses primarily on segmentation or general optical character recognition (OCR), with few methods effectively matching legend symbols to their corresponding descriptions in a structured manner. We present a method that combines LayoutLMv3 for layout detection with GPT-4o using in-context learning to detect and …
