
2025-10-01 12:04:48
We offer rolling admission for our online grad certificate in Digital Curation, so you can apply any time. Our spring courses will cover #DigitalPreservation #DigitalStorage #Metadata
Agent-centric learning: from external reward maximization to internal knowledge curation
Hanqi Zhou, Fryderyk Mantiuk, David G. Nagy, Charley M. Wu
https://arxiv.org/abs/2507.22255
ML-Asset Management: Curation, Discovery, and Utilization
Mengying Wang, Moming Duan, Yicong Huang, Chen Li, Bingsheng He, Yinghui Wu
https://arxiv.org/abs/2509.23577 https://…
🇺🇦 #NowPlaying on KEXP's #Roadhouse
Pacific Range:
🎵 Guiding the Mast
#PacificRange
https://curation-records.bandcamp.com/track/guiding-the-mast
https://open.spotify.com/track/2LPYNcHhhQtcblPpRXRYZ7
celegans_interactomes: C. elegans interactomes (2009)
Ten networks of protein-protein interactions in Caenorhabditis elegans (nematode), from yeast two-hybrid experiments, biological process maps, literature curation, orthologous interactions, and genetic interactions. The WI8 network combines WI2004, WI2007 and BPmaps, while the Integrated Network combines data from all sources.
This network has 431 nodes and 438 edges.
Tags: Biological, Protein interactions, Unweighted
LABELING COPILOT: A Deep Research Agent for Automated Data Curation in Computer Vision
Debargha Ganguly, Sumit Kumar, Ishwar Balappanawar, Weicong Chen, Shashank Kambhatla, Srinivasan Iyengar, Shivkumar Kalyanaraman, Ponnurangam Kumaraguru, Vipin Chaudhary
https://arxiv.org/abs/2509.22631
«The recent news concerning GigaScience’s owners, BGI, laying off the entire editorial, software and curation team in Hong Kong on short notice, has filled us with both surprise and deep disappointment.»
Indeed, GigaScience has always been a mainstay at BOSC and the larger intersection of life sciences & open source/science & citizen science.
https://www.open-bio.org/2025/09/30/2025-09-30-gigascience/
Crosslisted article(s) found for cs.AI. https://arxiv.org/list/cs.AI/new
[12/21]:
- Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
Li, Hu, Shang, Wu, Liu, Liu, Gao, Shi, Zhang, Zhang, Shi, YU, Wu, Wu, Jia, Xiang, He, Li
Deconstructing Self-Bias in LLM-generated Translation Benchmarks
Wenda Xu, Sweta Agrawal, Vilém Zouhar, Markus Freitag, Daniel Deutsch
https://arxiv.org/abs/2509.26600 htt…
CodableLLM: Automating Decompiled and Source Code Mapping for LLM Dataset Generation
Dylan Manuel, Paul Rad
https://arxiv.org/abs/2507.22066 https://arxiv.…
"Caring for Data’s Soul: The development of a Curation Impact Factor to pinpoint the effects of data curation activities on data quality" https://doi.org/10.2218/ijdc.v19i1.1030
Burst: Collaborative Curation in Connected Social Media Communities
Yutong Zhang, Taeuk Kang, Sydney Yeh, Anavi Baddepudi, Lindsay Popowski, Tiziano Piccardi, Michael S. Bernstein
https://arxiv.org/abs/2508.19768
Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning
Zinan Tang, Xin Gao, Qizhi Pei, Zhuoshi Pan, Mengzhang Cai, Jiang Wu, Conghui He, Lijun Wu
https://arxiv.org/abs/2508.21589
Experiversum: an Ecosystem for Curating and Enhancing Data-Driven Experimental Science
Genoveva Vargas-Solar (LIRIS), Umberto Costa (ERIC, UL2), Jérôme Darmont (ERIC, UL2), Javier Espinosa-Oviedo (ERIC, UCBL), Carmem Hara (ERIC, UL2), Sabine Loudcher (ERIC, UL2), Regina Motz, Martin A. Musicante, José-Luis Zechinelli-Martini
https://
SMECS: A Software Metadata Extraction and Curation Software
Stephan Ferenz, Aida Jafarbigloo, Oliver Werth, Astrid Nieße
https://arxiv.org/abs/2507.18159 https://
OpenGVL - Benchmarking Visual Temporal Progress for Data Curation
Paweł Budzianowski, Emilia Wiśnios, Gracjan Góral, Igor Kulakov, Viktor Petrenko, Krzysztof Walas
https://arxiv.org/abs/2509.17321
MetaCLIP 2: A Worldwide Scaling Recipe
Yung-Sung Chuang, Yang Li, Dong Wang, Ching-Feng Yeh, Kehan Lyu, Ramya Raghavendra, James Glass, Lifei Huang, Jason Weston, Luke Zettlemoyer, Xinlei Chen, Zhuang Liu, Saining Xie, Wen-tau Yih, Shang-Wen Li, Hu Xu
https://arxiv.org/abs/2507.22062
OLMoASR: Open Models and Data for Training Robust Speech Recognition Models
Huong Ngo, Matt Deitke, Martijn Bartelds, Sarah Pratt, Josh Gardner, Matt Jordan, Ludwig Schmidt
https://arxiv.org/abs/2508.20869
CaTE Data Curation for Trustworthy AI
Mary Versa Clemens-Sewall, Christopher Cervantes, Emma Rafkin, J. Neil Otte, Tom Magelinski, Libby Lewis, Michelle Liu, Dana Udwin, Monique Kirkman-Bey
https://arxiv.org/abs/2508.14741
Need to put your collection in a database and get it online? This fall UMaine's offering Digital Collections and Exhibitions (DIG 540) online with Craig Dietrich.
#Archives
Medium's CEO explains how the platform's AI policy has evolved as it adopts some AI tools while prioritizing human curation and inviting community feedback (Tony Stubblebine/The Medium Blog)
https://medium.com/blog/we-want-your-f
Culling Misinformation from Gen AI: Toward Ethical Curation and Refinement
Prerana Khatiwada, Grace Donaher, Jasymyn Navarro, Lokesh Bhatta
https://arxiv.org/abs/2507.14242
White House officials said they would review the Smithsonian’s exhibition text, curation, exhibition planning and collections,
starting with eight museums
“The Smithsonian’s work is grounded in a deep commitment to scholarly excellence, rigorous research, and the accurate, factual presentation of history,”
a Smithsonian spokesperson said in a statement Tuesday afternoon.
“We are reviewing the letter with this commitment in mind and will continue to collaborate constr…
Medium's CEO explains how the platform's AI policy has evolved as it adopts some AI tools while prioritizing human curation and seeking community feedback (Tony Stubblebine/The Medium Blog)
https://medium.com/blog/we-want-your-f…
Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs
Terry Jingchen Zhang, Wenyuan Jiang, Rongchuan Liu, Yisong Wang, Junran Yang, Ning Wang, Nicole Ni, Yinya Huang, Mrinmaya Sachan
https://arxiv.org/abs/2508.15878
Cascade! Human in the loop shortcomings can increase the risk of failures in recommender systems
Wm. Matthew Kennedy, Nishanshi Shukla, Cigdem Patlak, Blake Chambers, Theodora Skeadas, Tuesday, Kingsley Owadara, Aayush Dhanotiya
https://arxiv.org/abs/2509.20099
Bayesian Anomaly Detection for Ia Cosmology: Automating SALT3 Data Curation
S. A. K. Leeney, W. J. Handley, H. T. J. Bevins, E. de Lera Acedo
https://arxiv.org/abs/2509.13394 ht…
Identifying Constructive Conflict in Online Discussions through Controversial yet Toxicity Resilient Posts
Ozgur Can Seckin, Bao Tran Truong, Alessandro Flammini, Filippo Menczer
https://arxiv.org/abs/2509.18303
Herewith yet another curation of Long Links: https://www.tbray.org/ongoing/When/202x/2025/08/04/Long-Links
This month featuring new music from the New Eves, Coureurs des Bois, assembly mutation testing, and more flirting with Class Reductionism and Late Ca…
Feel like your collection website could be better but don't know the necessary tools? A few seats left in UMaine's online course in Digital Collections and Exhibitions starting 3 Sep, which teaches the fundamentals of getting your records in a database and putting them online https://DigitalCuration.UMaine.edu
"Development of an Integrated Lifecycle of #RDM #Tools: Looking Back and Forward at KU Leuven"
https://ijdc.net/index.…
Mining the Long Tail: A Comparative Study of Data-Centric Criticality Metrics for Robust Offline Reinforcement Learning in Autonomous Motion Planning
Antonio Guillen-Perez
https://arxiv.org/abs/2508.18397
Improving Deep Learning for Accelerated MRI With Data Filtering
Kang Lin, Anselm Krainovic, Kun Wang, Reinhard Heckel
https://arxiv.org/abs/2508.13822 https://
Building Data-Driven Occupation Taxonomies: A Bottom-Up Multi-Stage Approach via Semantic Clustering and Multi-Agent Collaboration
Nan Li, Bo Kang, Tijl De Bie
https://arxiv.org/abs/2509.15786
Best Practices and Considerations for Child Speech Corpus Collection and Curation in Educational, Clinical, and Forensic Scenarios
John Hansen, Satwik Dutta, Ellen Grand
https://arxiv.org/abs/2507.12870
@… I do like the thought.
I think that curation needs curators who curate …
PHP turned 30 last month. Why does this language that powers 75% of the web—including Facebook, Wikipedia, and WordPress—have such staying power? Find out by registering for this fall's DIG 540, UMaine's online course in building cultural databases.
https://DigitalCuration.UMaine.edu
Crosslisted article(s) found for astro-ph.CO. https://arxiv.org/list/astro-ph.CO/new
[1/1]:
- Bayesian Anomaly Detection for Ia Cosmology: Automating SALT3 Data Curation
S. A. K. Leeney, W. J. Handley, H. T. J. Bevins, E. de Lera Acedo
HypoGeneAgent: A Hypothesis Language Agent for Gene-Set Cluster Resolution Selection Using Perturb-seq Datasets
Ying Yuan, Xing-Yue Monica Ge, Aaron Archer Waterman, Tommaso Biancalani, David Richmond, Yogesh Pandit, Avtar Singh, Russell Littman, Jin Liu, Jan-Christian Huetter, Vladimir Ermakov
https://arxiv.org/abs/2509.09740
🇺🇦 #NowPlaying on #BBC6Music's #RileyAndCoe
Joe Harvey-Whyte And Bobby Lee:
🎵 Smoke Signals
#JoeHarveyWhyteAndBobbyLee
https://curation-records.bandcamp.com/track/smoke-signals
https://open.spotify.com/track/3W44JcdMkH5or7IJr7Y5Bp
zERExtractor: An Automated Platform for Enzyme-Catalyzed Reaction Data Extraction from Scientific Literature
Rui Zhou, Haohui Ma, Tianle Xin, Lixin Zou, Qiuyue Hu, Hongxi Cheng, Mingzhi Lin, Jingjing Guo, Sheng Wang, Guoqing Zhang, Yanjie Wei, Liangzhen Zheng
https://arxiv.org/abs/2508.09995
celegans_interactomes: C. elegans interactomes (2009)
Ten networks of protein-protein interactions in Caenorhabditis elegans (nematode), from yeast two-hybrid experiments, biological process maps, literature curation, orthologous interactions, and genetic interactions. The WI8 network combines WI2004, WI2007 and BPmaps, while the Integrated Network combines data from all sources.
This network has 2436 nodes and 136930 edges.
Tags: Biological, Protein interactions, Unweighte…
Replaced article(s) found for eess.AS. https://arxiv.org/list/eess.AS/new
[1/1]:
- Less is More: Data Curation Matters in Scaling Speech Enhancement
Li, Zhang, Wang, Scheibler, Saijo, Cornell, Fu, Sach, Ni, Kumar, Fingscheidt, Watanabe, Qian
SafetyFlow: An Agent-Flow System for Automated LLM Safety Benchmarking
Xiangyang Zhu, Yuan Tian, Chunyi Li, Kaiwei Zhang, Wei Sun, Guangtao Zhai
https://arxiv.org/abs/2508.15526
Beyond Policy Optimization: A Data Curation Flywheel for Sparse-Reward Long-Horizon Planning
Yutong Wang, Pengliang Ji, Kaixin Li, Baolong Bi, Tao Feng, Guillaume Sartoretti
https://arxiv.org/abs/2508.03018
Improving the FAIRness and Sustainability of the NHGRI Resources Ecosystem
Larry Babb, Carol Bult, Vincent J. Carey, Robert J. Carroll, Benjamin C. Hitz, Chris J. Mungall, Heidi L. Rehm, Michael C. Schatz, Alex Wagner, NHGRI Resource Workshop Community
https://arxiv.org/abs/2508.13498
Applying the Chinese Wall Reverse Engineering Technique to Large Language Model Code Editing
Manatsawin Hanmongkolchai
https://arxiv.org/abs/2507.15599 htt…
SAIL-VL2 Technical Report
Weijie Yin, Yongjie Ye, Fangxun Shu, Yue Liao, Zijian Kang, Hongyuan Dong, Haiyang Yu, Dingkang Yang, Jiacong Wang, Han Wang, Wenzhuo Liu, Xiao Liang, Shuicheng Yan, Chao Feng
https://arxiv.org/abs/2509.14033
An HTR-LLM Workflow for High-Accuracy Transcription and Analysis of Abbreviated Latin Court Hand
Joshua D. Isom
https://arxiv.org/abs/2507.04132 https://…
celegans_interactomes: C. elegans interactomes (2009)
Ten networks of protein-protein interactions in Caenorhabditis elegans (nematode), from yeast two-hybrid experiments, biological process maps, literature curation, orthologous interactions, and genetic interactions. The WI8 network combines WI2004, WI2007 and BPmaps, while the Integrated Network combines data from all sources.
This network has 1496 nodes and 1816 edges.
Tags: Biological, Protein interactions, Unweighted
Will AI rescue cultural heritage or replace it? In this IEEE video, learn how LLMs act as a compression algorithm for the Internet, enabling new forms of access while simultaneously threatening to replace ground truth with convincing facsimiles.
I also explain why mechanistic analogies, unlike animate metaphors like a child or parrot, can help us tune AI to get optimal results.
Online Homogeneity Can Emerge Without Filtering Algorithms or Homophily Preferences
Petter Törnberg
https://arxiv.org/abs/2508.10466 https://arxiv.o…
RepoForge: Training a SOTA Fast-thinking SWE Agent with an End-to-End Data Curation Pipeline Synergizing SFT and RL at Scale
Zhilong Chen, Chengzong Zhao, Boyuan Chen, Dayi Lin, Yihao Chen, Arthur Leung, Gopi Krishnan Rajbahadur, Gustavo A. Oliva, Ahmed E. Hassan
https://arxiv.org/abs/2508.01550
celegans_interactomes: C. elegans interactomes (2009)
Ten networks of protein-protein interactions in Caenorhabditis elegans (nematode), from yeast two-hybrid experiments, biological process maps, literature curation, orthologous interactions, and genetic interactions. The WI8 network combines WI2004, WI2007 and BPmaps, while the Integrated Network combines data from all sources.
This network has 912 nodes and 22738 edges.
Tags: Biological, Protein interactions, Unweighted
Campaigning through the lens of Google: A large-scale algorithm audit of Google searches in the run-up to the Swiss Federal Elections 2023
Tobias Rohrbach, Mykola Makhortykh, Maryna Sydorova
https://arxiv.org/abs/2507.06018
Towards Reliable Multi-Agent Systems for Marketing Applications via Reflection, Memory, and Planning
Lorenzo Jaime Yu Flores, Junyi Shen, Xiaoyuan Gu
https://arxiv.org/abs/2508.11120
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[4/5]:
- SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning
Yu Zhang, Yuqi Xie, Huihan Liu, Rutav Shah, Michael Wan, Linxi Fan, Yuke Zhu
Facilitating Personalized TTS for Dysarthric Speakers Using Knowledge Anchoring and Curriculum Learning
Yejin Jeon, Solee Im, Youngjae Kim, Gary Geunbae Lee
https://arxiv.org/abs/2508.10412
When Kids Mode Isn't For Kids: Investigating TikTok's "Under 13 Experience"
Olivia Figueira, Pranathi Chamarthi, Tu Le, Athina Markopoulou
https://arxiv.org/abs/2507.00299
celegans_interactomes: C. elegans interactomes (2009)
Ten networks of protein-protein interactions in Caenorhabditis elegans (nematode), from yeast two-hybrid experiments, biological process maps, literature curation, orthologous interactions, and genetic interactions. The WI8 network combines WI2004, WI2007 and BPmaps, while the Integrated Network combines data from all sources.
This network has 759 nodes and 1593 edges.
Tags: Biological, Protein interactions, Unweighted…
BOOST: Out-of-Distribution-Informed Adaptive Sampling for Bias Mitigation in Stylistic Convolutional Neural Networks
Mridula Vijendran, Shuang Chen, Jingjing Deng, Hubert P. H. Shum
https://arxiv.org/abs/2507.07134
Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions
Longfei Li, Zhiwen Fan, Wenyan Cong, Xinhang Liu, Yuyang Yin, Matt Foutter, Panwang Pan, Chenyu You, Yue Wang, Zhangyang Wang, Yao Zhao, Marco Pavone, Yunchao Wei
https://arxiv.org/abs/2507.07978…
CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance
Myeongsoo Kim, Shweta Garg, Baishakhi Ray, Varun Kumar, Anoop Deoras
https://arxiv.org/abs/2507.10646
🇺🇦 #NowPlaying on KEXP's #Roadhouse
Joe Harvey-Whyte & Bobby Lee:
🎵 Flatbed Alfalfa Run To Pueblo, Colorado, Fall 1972
#JoeHarveyWhyte #BobbyLee
https://curation-records.bandcamp.com/track/flatbed-alfalfa-run-to-pueblo-colorado-fall-1972
https://open.spotify.com/track/1Dha5E7nCLgGmYXXGHK0Mv
SynGen-Vision: Synthetic Data Generation for training industrial vision models
Alpana Dubey, Suma Mani Kuriakose, Nitish Bhardwaj
https://arxiv.org/abs/2509.04894 https://
Wearable Music2Emotion: Assessing Emotions Induced by AI-Generated Music through Portable EEG-fNIRS Fusion
Sha Zhao, Song Yi, Yangxuan Zhou, Jiadong Pan, Jiquan Wang, Jie Xia, Shijian Li, Shurong Dong, Gang Pan
https://arxiv.org/abs/2508.04723