
2025-08-15 10:25:22
Performance of GPT-5 in Brain Tumor MRI Reasoning
Mojtaba Safari, Shansong Wang, Mingzhe Hu, Zach Eidex, Qiang Li, Xiaofeng Yang
https://arxiv.org/abs/2508.10865 https://…
WALL: A Web Application for Automated Quality Assurance using Large Language Models
Seyed Moein Abtahi, Akramul Azim
https://arxiv.org/abs/2509.09918 https://
ROBOPSY PL[AI]: Using Role-Play to Investigate how LLMs Present Collective Memory
Margarete Jahrmann, Thomas Brandstetter, Stefan Glasauer
https://arxiv.org/abs/2510.09874 https…
Automated Classification of Tutors' Dialogue Acts Using Generative AI: A Case Study Using the CIMA Corpus
Liqun He, Jiaqi Xu
https://arxiv.org/abs/2509.09125 https://…
Source: GPT-5 improvements won't be comparable to the leaps in performance of earlier models, such as between GPT-3 in 2020 and GPT-4 in 2023 (The Information)
https://www.theinformation.com/articles/inside-openais-rocky-path-gpt-5
Design and Implementation of Code Completion System Based on LLM and CodeBERT Hybrid Subsystem
Bingbing Zhang, Ziyu Lin, Yingxin Su
https://arxiv.org/abs/2509.08215 https://
GPT-5 may be slightly disappointing, Genie 3 demo blew me away... Watch it.
#ai
We used to laugh that Reagan consulted a psychic about policy decisions during his presidency. Well, in Sweden they are on version 3.0 of consulting an oracle: https://www.theguardian.com/technology/2025/aug/05/chat-gpt-sw…
The Boiling-Frog Problem of Physics Education
Gerd Kortemeyer
https://arxiv.org/abs/2508.08842 https://arxiv.org/pdf/2508.08842
Retrieval Augmented Large Language Model System for Comprehensive Drug Contraindications
Byeonghun Bang, Jongsuk Yoon, Dong-Jin Chang, Seho Park, Yong Oh Lee
https://arxiv.org/abs/2508.06145
Resource-Efficient Fine-Tuning of LLaMA-3.2-3B for Medical Chain-of-Thought Reasoning
Imran Mansha
https://arxiv.org/abs/2510.05003 https://arxiv.org/pdf/2…
Replaced article(s) found for cs.PL. https://arxiv.org/list/cs.PL/new
[1/1]:
- RTLCoder: Outperforming GPT-3.5 in Design RTL Generation with Our Open-Source Dataset and Lightwe...
Shang Liu, Wenji Fang, Yao Lu, Qijun Zhang, Hongce Zhang, Zhiyao Xie
Can Large Language Models Bridge the Gap in Environmental Knowledge?
Linda Smail (College of Interdisciplinary Studies, Zayed University, UAE), David Santandreu Calonge (Department of Academic Development, Mohamed bin Zayed University of Artificial Intelligence, UAE), Firuz Kamalov (School of Engineering, Applied Science, Technology, Canadian University Dubai, UAE), Nur H. Orak (Department of Environmental Engineering, Marmara University, Türkiye)
Prompt Injection Vulnerability of Consensus Generating Applications in Digital Democracy
Jairo Gudiño-Rosero, Clément Contet, Umberto Grandi, César A. Hidalgo
https://arxiv.org/abs/2508.04281
GPT-OSS-20B: A Comprehensive Deployment-Centric Analysis of OpenAI's Open-Weight Mixture of Experts Model
Deepak Kumar, Divakar Yadav, Yash Patel
https://arxiv.org/abs/2508.16700
🧾 Multi-Modal Vision vs. Text-Based Parsing: Benchmarking LLM Strategies for Invoice Processing
#software
Early Approaches to Adversarial Fine-Tuning for Prompt Injection Defense: A 2022 Study of GPT-3 and Contemporary Models
Gustavo Sandoval, Denys Fenchenko, Junyao Chen
https://arxiv.org/abs/2509.14271
Large Language Model-Based Uncertainty-Adjusted Label Extraction for Artificial Intelligence Model Development in Upper Extremity Radiography
Hanna Kreutzer, Anne-Sophie Caselitz, Thomas Dratsch, Daniel Pinto dos Santos, Christiane Kuhl, Daniel Truhn, Sven Nebelung
https://arxiv.org/abs/2510.05664 …
Anthropic prices Claude Sonnet 4.5 at $3/1M input and $15/1M output tokens, same as Sonnet 4, cheaper than Opus at $15/$75 but higher than GPT-5 at $1.25/$10 (Simon Willison/Simon Willison's Weblog)
https://simonwillison.net/2025/Sep/29/claude-sonnet-4-5/
Feedback That Clicks: Introductory Physics Students' Valued Features in AI Feedback Generated From Self-Crafted and Engineered Prompts
Amogh Sirnoorkar, N. Sanjay Rebello
https://arxiv.org/abs/2509.08516
Charts-of-Thought: Enhancing LLM Visualization Literacy Through Structured Data Extraction
Amit Kumar Das, Mohammad Tarun, Klaus Mueller
https://arxiv.org/abs/2508.04842 https:/…
Do Large Language Models Favor Recent Content? A Study on Recency Bias in LLM-Based Reranking
Hanpei Fang, Sijie Tao, Nuo Chen, Kai-Xin Chang, Tetsuya Sakai
https://arxiv.org/abs/2509.11353
Replaced article(s) found for cs.SD. https://arxiv.org/list/cs.SD/new
[1/1]:
- M6(GPT)3: Generating Multitrack Modifiable Multi-Minute MIDI Music from Text using Genetic algori...
Jakub Poćwiardowski, Mateusz Modrzejewski, Marek S. Tatara
Should LLMs be WEIRD? Exploring WEIRDness and Human Rights in Large Language Models
Ke Zhou, Marios Constantinides, Daniele Quercia
https://arxiv.org/abs/2508.19269 https://
Exploring LLM-generated Culture-specific Affective Human-Robot Tactile Interaction
Qiaoqiao Ren, Tony Belpaeme
https://arxiv.org/abs/2507.22905 https://arx…
Synergies between Federated Foundation Models and Smart Power Grids
Seyyedali Hosseinalipour, Shimiao Li, Adedoyin Inaolaji, Filippo Malandra, Luis Herrera, Nicholas Mastronarde
https://arxiv.org/abs/2509.16496
Assessing the Quality and Security of AI-Generated Code: A Quantitative Analysis
Abbas Sabra, Olivier Schmitt, Joseph Tyler
https://arxiv.org/abs/2508.14727 https://
An Ensemble Classification Approach in A Multi-Layered Large Language Model Framework for Disease Prediction
Ali Hamdi, Malak Mohamed, Rokaia Emad, Khaled Shaban
https://arxiv.org/abs/2509.02446
Mitigating Trojanized Prompt Chains in Educational LLM Use Cases: Experimental Findings and Detection Tool Design
Richard M. Charles, James H. Curry, Richard B. Charles
https://arxiv.org/abs/2507.14207
The Carbon Cost of Conversation, Sustainability in the Age of Language Models
Sayed Mahbub Hasan Amiri, Prasun Goswami, Md. Mainul Islam, Mohammad Shakhawat Hossen, Sayed Majhab Hasan Amiri, Naznin Akter
https://arxiv.org/abs/2507.20018
Personality Matters: User Traits Predict LLM Preferences in Multi-Turn Collaborative Tasks
Sarfaroz Yunusov, Kaige Chen, Kazi Nishat Anwar, Ali Emami
https://arxiv.org/abs/2508.21628
Root Cause Analysis of Radiation Oncology Incidents Using Large Language Models
Yuntao Wang, Mariluz De Ornelas, Matthew T. Studenski, Elizabeth Bossart, Siamak P. Najad-Davarani, Yunze Yang
https://arxiv.org/abs/2508.17201
The Few-shot Dilemma: Over-prompting Large Language Models
Yongjian Tang, Doruk Tuncel, Christian Koerner, Thomas Runkler
https://arxiv.org/abs/2509.13196 https://
Does visualization help AI understand data?
Victoria R. Li, Johnathan Sun, Martin Wattenberg
https://arxiv.org/abs/2507.18022 https://arxiv.org/pdf/2507.18…
Replaced article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[3/3]:
- CRISPR-GPT for Agentic Automation of Gene-editing Experiments
Qu, Huang, Yin, Zhan, Liu, Yin, Cousins, Johnson, Wang, Shah, Altman, Zhou, Wang, Cong
Can Large Language Models Understand As Well As Apply Patent Regulations to Pass a Hands-On Patent Attorney Test?
Bhakti Khera, Rezvan Alamian, Pascal A. Scherz, Stephan M. Goetz
https://arxiv.org/abs/2507.10576
Replaced article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[1/3]:
- Comparison of pipeline, sequence-to-sequence, and GPT models for end-to-end relation extraction: ...
Shashank Gupta, Xuguang Ai, Ramakanth Kavuluru
A Retail-Corpus for Aspect-Based Sentiment Analysis with Large Language Models
Oleg Silcenco, Marcos R. Machad, Wallace C. Ugulino, Daniel Braun
https://arxiv.org/abs/2508.17994
A Comparative Evaluation of Large Language Models for Persian Sentiment Analysis and Emotion Detection in Social Media Texts
Kian Tohidi, Kia Dashtipour, Simone Rebora, Sevda Pourfaramarz
https://arxiv.org/abs/2509.14922
Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models
Badrinath Ramakrishnan, Akshaya Balaji
https://arxiv.org/abs/2508.14062 https://