Tootfinder

@arXiv_csSE_bot@mastoxiv.page
2025-08-11 07:36:39

Empirical Evaluation of AI-Assisted Software Package Selection: A Knowledge Graph Approach
Siamak Farshidi, Amir Saberhabibi, Behbod Eskafi, Niloofar Nikfarjam, Sadegh Eskandari, Slinger Jansen, Michel Chaudron, Bedir Tekinerdogan
https://arxiv.org/abs/2508.05693

Empirical Evaluation of AI-Assisted Software Package Selection: A Knowledge Graph Approach
Selecting third-party software packages in open-source ecosystems like Python is challenging due to the large number of alternatives and limited transparent evidence for comparison. Generative AI tools are increasingly used in development workflows, but their suggestions often overlook dependency evaluation, emphasize popularity over suitability, and lack reproducibility. This creates risks for projects that require transparency, long-term reliability, maintainability, and informed architectura…

@arXiv_csSE_bot@mastoxiv.page
2025-09-10 07:49:31

Aspect-Oriented Programming in Secure Software Development: A Case Study of Security Aspects in Web Applications
Mterorga Ukor
https://arxiv.org/abs/2509.07449 https://

Aspect-Oriented Programming in Secure Software Development: A Case Study of Security Aspects in Web Applications
Security remains a critical challenge in modern web applications, where threats such as unauthorized access, data breaches, and injection attacks continue to undermine trust and reliability. Traditional Object-Oriented Programming (OOP) often intertwines security logic with business functionality, leading to code tangling, scattering, and reduced maintainability. This study investigates the role of Aspect-Oriented Programming (AOP) in enhancing secure software development by modularizing cross-…

@arXiv_csSE_bot@mastoxiv.page
2025-07-02 09:57:00

Echoes of AI: Investigating the Downstream Effects of AI Assistants on Software Maintainability
Markus Borg, Dave Hewett, Nadim Hagatulah, Noric Couderc, Emma S\"oderberg, Donald Graham, Uttam Kini, Dave Farley
https://arxiv.org/abs/2507.00788

Echoes of AI: Investigating the Downstream Effects of AI Assistants on Software Maintainability
[Context] AI assistants, like GitHub Copilot and Cursor, are transforming software engineering. While several studies highlight productivity improvements, their impact on maintainability requires further investigation. [Objective] This study investigates whether co-development with AI assistants affects software maintainability, specifically how easily other developers can evolve the resulting source code. [Method] We conducted a two-phase controlled experiment involving 151 participants, 95% o…

@tiotasram@kolektiva.social
2025-07-31 16:25:48

LLM coding is the opposite of DRY
An important principle in software engineering is DRY: Don't Repeat Yourself. We recognize that having the same code copied in more than one place is bad for several reasons:
1. It makes the entire codebase harder to read.
2. It increases maintenance burden, since any problems in the duplicated code need to be solved in more than one place.
3. Because it becomes possible for the copies to drift apart if changes to one aren't transferred to the other (maybe the person making the change has forgotten there was a copy) it makes the code more error-prone and harder to debug.
All modern programming languages make it almost entirely unnecessary to repeat code: we can move the repeated code into a "function" or "module" and then reference it from all the different places it's needed. At a larger scale, someone might write an open-source "library" of such functions or modules and instead of re-implementing that functionality ourselves, we can use their code, with an acknowledgement. Using another person's library this way is complicated, because now you're dependent on them: if they stop maintaining it or introduce bugs, you've inherited a problem, but still, you could always copy their project and maintain your own version, and it would be not much more work than if you had implemented stuff yourself from the start. It's a little more complicated than this, but the basic principle holds, and it's a foundational one for software development in general and the open-source movement in particular. The network of "citations" as open-source software builds on other open-source software and people contribute patches to each others' projects is a lot of what makes the movement into a community, and it can lead to collaborations that drive further development. So the DRY principle is important at both small and large scales.
Unfortunately, the current crop of hyped-up LLM coding systems from the big players are antithetical to DRY at all scales:
- At the library scale, they train on open source software but then (with some unknown frequency) replicate parts of it line-for-line *without* any citation [1]. The person who was using the LLM has no way of knowing that this happened, or even any way to check for it. In theory the LLM company could build a system for this, but it's not likely to be profitable unless the courts actually start punishing these license violations, which doesn't seem likely based on results so far and the difficulty of finding out that the violations are happening. By creating these copies (and also mash-ups, along with lots of less-problematic stuff), the LLM users (enabled and encouraged by the LLM-peddlers) are directly undermining the DRY principle. If we see what the big AI companies claim to want, which is a massive shift towards machine-authored code, DRY at the library scale will effectively be dead, with each new project simply re-implementing the functionality it needs instead of every using a library. This might seem to have some upside, since dependency hell is a thing, but the downside in terms of comprehensibility and therefore maintainability, correctness, and security will be massive. The eventual lack of new high-quality DRY-respecting code to train the models on will only make this problem worse.
- At the module & function level, AI is probably prone to re-writing rather than re-using the functions or needs, especially with a workflow where a human prompts it for many independent completions. This part I don't have direct evidence for, since I don't use LLM coding models myself except in very specific circumstances because it's not generally ethical to do so. I do know that when it tries to call existing functions, it often guesses incorrectly about the parameters they need, which I'm sure is a headache and source of bugs for the vibe coders out there. An AI could be designed to take more context into account and use existing lookup tools to get accurate function signatures and use them when generating function calls, but even though that would probably significantly improve output quality, I suspect it's the kind of thing that would be seen as too-baroque and thus not a priority. Would love to hear I'm wrong about any of this, but I suspect the consequences are that any medium-or-larger sized codebase written with LLM tools will have significant bloat from duplicate functionality, and will have places where better use of existing libraries would have made the code simpler. At a fundamental level, a principle like DRY is not something that current LLM training techniques are able to learn, and while they can imitate it from their training sets to some degree when asked for large amounts of code, when prompted for many smaller chunks, they're asymptotically likely to violate it.
I think this is an important critique in part because it cuts against the argument that "LLMs are the modern compliers, if you reject them you're just like the people who wanted to keep hand-writing assembly code, and you'll be just as obsolete." Compilers actually represented a great win for abstraction, encapsulation, and DRY in general, and they supported and are integral to open source development, whereas LLMs are set to do the opposite.
[1] to see what this looks like in action in prose, see the example on page 30 of the NYTimes copyright complaint against OpenAI (#AI #GenAI #LLMs #VibeCoding

@arXiv_csSE_bot@mastoxiv.page
2025-06-24 11:17:30

The Impact of AI-Generated Solutions on Software Architecture and Productivity: Results from a Survey Study
Giorgio Amasanti, Jasmin Jahic
https://arxiv.org/abs/2506.17833

The Impact of AI-Generated Solutions on Software Architecture and Productivity: Results from a Survey Study
AI-powered software tools are widely used to assist software engineers. However, there is still a need to understand the productivity benefits of such tools for software engineers. In addition to short-term benefits, there is a question of how adopting AI-generated solutions affects the quality of software over time (e.g., maintainability and extendability). To provide some insight on these questions, we conducted a survey among software practitioners who use AI tools. Based on the data colle…

@arXiv_csSE_bot@mastoxiv.page
2025-09-04 08:39:41

Vision: An Extensible Methodology for Formal Software Verification in Microservice Systems
Connor Wojtak, Darek Gajewski, Tomas Cerny
https://arxiv.org/abs/2509.02860 https://…

Vision: An Extensible Methodology for Formal Software Verification in Microservice Systems
Microservice systems are becoming increasingly adopted due to their scalability, decentralized development, and support for continuous integration and delivery (CI/CD). However, this decentralized development by separate teams and continuous evolution can introduce miscommunication and incompatible implementations, undermining system maintainability and reliability across aspects from security policy to system architecture. We propose a novel methodology that statically reconstructs microservic…

@arXiv_csSE_bot@mastoxiv.page
2025-09-01 09:09:53

Human-Written vs. AI-Generated Code: A Large-Scale Study of Defects, Vulnerabilities, and Complexity
Domenico Cotroneo, Cristina Improta, Pietro Liguori
https://arxiv.org/abs/2508.21634

Human-Written vs. AI-Generated Code: A Large-Scale Study of Defects, Vulnerabilities, and Complexity
As AI code assistants become increasingly integrated into software development workflows, understanding how their code compares to human-written programs is critical for ensuring reliability, maintainability, and security. In this paper, we present a large-scale comparison of code authored by human developers and three state-of-the-art LLMs, i.e., ChatGPT, DeepSeek-Coder, and Qwen-Coder, on multiple dimensions of software quality: code defects, security vulnerabilities, and structural complexit…

@arXiv_csSE_bot@mastoxiv.page
2025-08-22 09:05:31

QUPER-MAn: Benchmark-Guided Target Setting for Maintainability Requirements
Markus Borg, Martin Larsson, Philip Breid, Nadim Hagatulah
https://arxiv.org/abs/2508.15512 https://

QUPER-MAn: Benchmark-Guided Target Setting for Maintainability Requirements
Maintainable source code is essential for sustainable development in any software organization. Unfortunately, many studies show that maintainability often receives less attention than its importance warrants. We argue that requirements engineering can address this gap the problem by fostering discussions and setting appropriate targets in a responsible manner. In this preliminary work, we conducted an exploratory study of industry practices related to requirements engineering for maintainabili…

@arXiv_csSE_bot@mastoxiv.page
2025-06-19 08:37:03

Large Language Models for Unit Testing: A Systematic Literature Review
Quanjun Zhang, Chunrong Fang, Siqi Gu, Ye Shang, Zhenyu Chen, Liang Xiao
https://arxiv.org/abs/2506.15227

Large Language Models for Unit Testing: A Systematic Literature Review
Unit testing is a fundamental practice in modern software engineering, with the aim of ensuring the correctness, maintainability, and reliability of individual software components. Very recently, with the advances in Large Language Models (LLMs), a rapidly growing body of research has leveraged LLMs to automate various unit testing tasks, demonstrating remarkable performance and significantly reducing manual effort. However, due to ongoing explorations in the LLM-based unit testing field, it is…

@arXiv_csSE_bot@mastoxiv.page
2025-06-17 11:03:37

Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers
Mohammed Mehedi Hasan, Hao Li, Emad Fallahzadeh, Bram Adams, Ahmed E. Hassan
https://arxiv.org/abs/2506.13538

Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers
Although Foundation Models (FMs), such as GPT-4, are increasingly used in domains like finance and software engineering, reliance on textual interfaces limits these models' real-world interaction. To address this, FM providers introduced tool calling-triggering a proliferation of frameworks with distinct tool interfaces. In late 2024, Anthropic introduced the Model Context Protocol (MCP) to standardize this tool ecosystem, which has become the de facto standard with over eight million weekly SD…

@arXiv_csSE_bot@mastoxiv.page
2025-06-16 10:09:39

CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval
Jiahui Geng, Fengyu Cai, Shaobo Cui, Qing Li, Liangwei Chen, Chenyang Lyu, Haonan Li, Derui Zhu, Walter Pretschner, Heinz Koeppl, Fakhri Karray
https://arxiv.org/abs/2506.11066

CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval
Code retrieval is essential in modern software development, as it boosts code reuse and accelerates debugging. However, current benchmarks primarily emphasize functional relevance while neglecting critical dimensions of software quality. Motivated by this gap, we introduce CoQuIR, the first large-scale, multilingual benchmark specifically designed to evaluate quality-aware code retrieval across four key dimensions: correctness, efficiency, security, and maintainability. CoQuIR provides fine-gra…

@arXiv_csSE_bot@mastoxiv.page
2025-08-12 10:27:53

Extracting Overlapping Microservices from Monolithic Code via Deep Semantic Embeddings and Graph Neural Network-Based Soft Clustering
Morteza Ziabakhsh, Kiyan Rezaee, Sadegh Eskandari, Seyed Amir Hossein Tabatabaei, Mohammad M. Ghassemi
https://arxiv.org/abs/2508.07486

Extracting Overlapping Microservices from Monolithic Code via Deep Semantic Embeddings and Graph Neural Network-Based Soft Clustering
Modern software systems are increasingly shifting from monolithic architectures to microservices to enhance scalability, maintainability, and deployment flexibility. Existing microservice extraction methods typically rely on hard clustering, assigning each software component to a single microservice. This approach often increases inter-service coupling and reduces intra-service cohesion. We propose Mo2oM (Monolithic to Overlapping Microservices), a framework that formulates microservice extract…

@arXiv_csSE_bot@mastoxiv.page
2025-07-18 09:20:02

ROSE: Transformer-Based Refactoring Recommendation for Architectural Smells
Samal Nursapa, Anastassiya Samuilova, Alessio Bucaioni. Phuong T. Nguyen
https://arxiv.org/abs/2507.12561

ROSE: Transformer-Based Refactoring Recommendation for Architectural Smells
Architectural smells such as God Class, Cyclic Dependency, and Hub-like Dependency degrade software quality and maintainability. Existing tools detect such smells but rarely suggest how to fix them. This paper explores the use of pre-trained transformer models--CodeBERT and CodeT5--for recommending suitable refactorings based on detected smells. We frame the task as a three-class classification problem and fine-tune both models on over 2 million refactoring instances mined from 11,149 open-sour…

@arXiv_csSE_bot@mastoxiv.page
2025-08-19 09:12:29

Clean Code, Better Models: Enhancing LLM Performance with Smell-Cleaned Dataset
Zhipeng Xue, Xiaoting Zhang, Zhipeng Gao, Xing Hu, Shan Gao, Xin Xia, Shanping Li
https://arxiv.org/abs/2508.11958

Clean Code, Better Models: Enhancing LLM Performance with Smell-Cleaned Dataset
The Large Language Models (LLMs) have demonstrated great potential in code-related tasks. However, most research focuses on improving the output quality of LLMs (e.g., correctness), and less attention has been paid to the LLM input (e.g., the training code quality). Given that code smells are widely existed in practice and can negatively impact software maintainability and readability, this study takes the first systematic research to assess and improve dataset quality in terms of code smells. …

@arXiv_csSE_bot@mastoxiv.page
2025-07-16 09:40:31

How Robust are LLM-Generated Library Imports? An Empirical Study using Stack Overflow
Jasmine Latendresse, SayedHassan Khatoonabadi, Emad Shihab
https://arxiv.org/abs/2507.10818

How Robust are LLM-Generated Library Imports? An Empirical Study using Stack Overflow
Software libraries are central to the functionality, security, and maintainability of modern code. As developers increasingly turn to Large Language Models (LLMs) to assist with programming tasks, understanding how these models recommend libraries is essential. In this paper, we conduct an empirical study of six state-of-the-art LLMs, both proprietary and open-source, by prompting them to solve real-world Python problems sourced from Stack Overflow. We analyze the types of libraries they import…

@arXiv_csSE_bot@mastoxiv.page
2025-07-14 09:11:22

LLMCup: Ranking-Enhanced Comment Updating with LLMs
Hua Ge, Juan Zhai, Minxue Pan, Fusen He, Ziyue Tan
https://arxiv.org/abs/2507.08671 https://

LLMCup: Ranking-Enhanced Comment Updating with LLMs
While comments are essential for enhancing code readability and maintainability in modern software projects, developers are often motivated to update code but not comments, leading to outdated or inconsistent documentation that hinders future understanding and maintenance. Recent approaches such as CUP and HebCup have attempted automatic comment updating using neural sequence-to-sequence models and heuristic rules, respectively. However, these methods can miss or misinterpret crucial informatio…

Tootfinder

Opt-in global Mastodon full text search. Join the index!