Tootfinder

@deprogrammaticaipsum@mas.to
2025-07-05 15:35:43

"The overall life expectancy of a programming language has dwindled in the past 56 years. A COBOL developer in the 1960s most probably retired in the 2000s, still writing COBOL. As a former professional VBScript, then C#, then Objective-C, later Swift, and finally Go developer, I can only see this trend accelerating. We should expect our favorite programming language to be replaced and removed from the market in a relatively shorter time every decade."

Alan Perlis And The Evolution Of Programming Languages
Alan Jay Perlis knew a thing or two about programming languages, both as an early pioneer of our industry and as one of the designers of ALGOL, the language that has inspired the one you, dear reader of this magazine, probably use every day to earn a living.

@Techmeme@techhub.social
2025-06-05 06:01:39

A former employee says fewer than 10,000 people use Ola Krutrim's LLM chatbot, which supports 10 Indian languages, and that over 60% of them are random testers (Swathi Moorthy/The Economic Times)
https://

Krutrim finds few takers for its LLMs and cloud products
Ola founder Aggarwal first announced Krutrim in December 2023, launching an LLM that could support 10 Indian languages, starting with Krutrim chatbot in February 2024, and cloud offering shortly after. Two founders told ET recently that repeated sign-in attempts were blocked by a persistent “logged out of session” error, making sign-up via mobile or Gmail difficult.

@arXiv_csPL_bot@mastoxiv.page
2025-06-05 09:41:19

This https://arxiv.org/abs/2410.05460 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csPL_…

It's Not Easy Being Green: On the Energy Efficiency of Programming Languages
Does the choice of programming language affect energy consumption? Previous highly visible studies have established associations between certain programming languages and energy consumption. A causal misinterpretation of this work has led academics and industry leaders to use or support certain languages based on their claimed impact on energy consumption. This paper tackles this causal question directly. It first corrects and improves the measurement methodology used by prior work. It then dev…

@arXiv_csLO_bot@mastoxiv.page
2025-06-06 07:19:23

Proceedings of the 19th International Workshop on Logical and Semantic Frameworks, with Applications
Cynthia Kop (Radboud Universiteit Nijmegen), Helida Salles Santos (Universidade Federal do Rio Grande)
https://arxiv.org/abs/2506.05219

Proceedings of the 19th International Workshop on Logical and Semantic Frameworks, with Applications
This volume contains the post-proceedings of the 19th LSFA, which was held in Goiânia, the capital of Goiás state in Brazil, from September 18 to September 20, 2024. Logical and semantic frameworks are formal languages used to represent logics, languages and systems. These frameworks provide foundations for the formal specification of systems and programming languages, supporting tool development and reasoning. The aim of this series is bringing together theoreticians and practitioners t…

@sperbsen@discuss.systems
2025-07-03 13:18:31

Dredging up recollections of my experience as R6RS editor for my PLSS talk tomorrow.
https://2025.ecoop.org/details/plss-2025-papers/8/Do-Programming-Languages-Fulfill-Requirements-Should-They-

Do Programming Languages Fulfill Requirements? Should They? (PLSS 2025 – Programming Language Standardization and Specification) - ECOOP 2025
Workshop on Programming Language Standardization and Specification The evolution of widely adopted programming languages is critical for ensuring their sustainability, interoperability, and adaptability to changing technological and societal needs. This workshop aims to advance the understanding of programming language standardization and foster collaborative solutions for its challenges. Participants will have the opportunity to share insights, case studies, and best practices to shape the fut…

@arXiv_eessAS_bot@mastoxiv.page
2025-06-05 07:23:58

Tone recognition in low-resource languages of North-East India: peeling the layers of SSL-based speech models
Parismita Gogoi, Sishir Kalita, Wendy Lalhminghlui, Viyazonuo Terhiija, Moakala Tzudir, Priyankoo Sarmah, S. R. M. Prasanna
https://arxiv.org/abs/2506.03606

Tone recognition in low-resource languages of North-East India: peeling the layers of SSL-based speech models
This study explores the use of self-supervised learning (SSL) models for tone recognition in three low-resource languages from North Eastern India: Angami, Ao, and Mizo. We evaluate four Wav2vec2.0 base models that were pre-trained on both tonal and non-tonal languages. We analyze tone-wise performance across the layers for all three languages and compare the different models. Our results show that tone recognition works best for Mizo and worst for Angami. The middle layers of the SSL models ar…

@arXiv_csCL_bot@mastoxiv.page
2025-07-03 09:56:10

Adapting Language Models to Indonesian Local Languages: An Empirical Study of Language Transferability on Zero-Shot Settings
Rifki Afina Putri
https://arxiv.org/abs/2507.01645

Adapting Language Models to Indonesian Local Languages: An Empirical Study of Language Transferability on Zero-Shot Settings
In this paper, we investigate the transferability of pre-trained language models to low-resource Indonesian local languages through the task of sentiment analysis. We evaluate both zero-shot performance and adapter-based transfer on ten local languages using models of different types: a monolingual Indonesian BERT, multilingual models such as mBERT and XLM-R, and a modular adapter-based approach called MAD-X. To better understand model behavior, we group the target languages into three categori…

@arXiv_csSE_bot@mastoxiv.page
2025-06-05 07:21:49

Beyond C/C++: Probabilistic and LLM Methods for Next-Generation Software Reverse Engineering
Zhuo Zhuo, Xiangyu Zhang
https://arxiv.org/abs/2506.03504 http…

Beyond C/C++: Probabilistic and LLM Methods for Next-Generation Software Reverse Engineering
This proposal discusses the growing challenges in reverse engineering modern software binaries, particularly those compiled from newer system programming languages such as Rust, Go, and Mojo. Traditional reverse engineering techniques, developed with a focus on C and C++, fall short when applied to these newer languages due to their reliance on outdated heuristics and failure to fully utilize the rich semantic information embedded in binary programs. These challenges are exacerbated by the limi…

@fanf@mendeddrum.org
2025-06-05 11:42:03

from my link log —
SKIM: The implementation of functional languages using custom hardware.
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-81.html
saved 2025-03-22

@muz4now@mastodon.world
2025-07-04 23:08:13

MSPs back new powers supporting Gaelic and Scots https://www.bbc.com/news/articles/clyzll5m2j1o?utm_source=dlvr.it&utm_content=muz4now/magazine/Signs of Hope&utm_medium=masto…

MSPs back new powers supporting Gaelic and Scots
The Scottish Languages Bill would see both recognised as official languages.

@Mediagazer@mstdn.social
2025-06-06 19:36:02

YouTube rolls out a tool to let some creators upload different thumbnails for each video dubbed into a different language, to help expand their global audience (Dan Whateley/Business Insider)
https://www.businessinsider.com/youtube-te

YouTube is testing a new feature to help videos travel globally
YouTube is testing a new tool to let creators switch out thumbnails for videos in different languages, helping their content spread globally.

@grumpybozo@toad.social
2025-07-05 16:41:34

Great news. Canada is offering government services (commercial driver’s license tests) in Ojibwe/Anishinaabemowin.
It’s a crime and a tragedy that indigenous languages are in danger of dying and everything that can be done to fight that is for the better, especially by governments using them.

Indigenous man's 'jaw nearly hit the counter' when told he could write driver's test in Ojibwe | CBC News
The former chief of the Chippewas of Kettle and Stony Point First Nation got an early birthday surprise when he was finally able to do his written driver’s test in Ojibwe.

@deprogrammaticaipsum@mas.to
2025-07-06 08:40:13

"Professor Cain uses various programming languages to explain these concepts: he starts with C (roughly from the second lecture to the 8th), then Assembly (lectures 9 to 11), C (lectures 12 to 18), Scheme (lectures 19 to 23), and Python (from 24 to 26). The last lecture (27) dives into some other functional programming languages like ML, Miranda, and even Haskell, as well as some advanced type design concepts, to round up your general programming knowledge."

Jerry Cain
The late 2000s were an interesting time for online education. The wider availability of faster and more reliable bandwidth led to an explosion of online video. This, in turn, led to the emergence of an ever-expanding number of providers of online learning services, and then to a wave of "Massive Open Online Courses" or MOOCs, many of which were offered by large universities and high schools all over the world. This month's Vidéothèque movie is a full playlist featuring one of the earliest (an…

@theDuesentrieb@social.linux.pizza
2025-06-05 18:19:51

One way to spend time instead of #doomscrolling has recently been #codewars
Bitesized problems of different difficulties for almost any language.
Not as big as #adventofcode

Codewars - Achieve mastery through coding practice and developer mentorship
A coding practice website for all programming levels – Join a community of over 3 million developers and improve your coding skills in over 55 programming languages!

@Techmeme@techhub.social
2025-07-05 19:30:53

A look at India's push to compete in the global AI race, as the country's vast linguistic diversity poses a core challenge to building foundational AI models (Shadma Shaikh/MIT Technology Review)
https://www.technologyreview.com/2025/07/0

Inside India’s scramble for AI independence
Structural challenges and the nation’s many languages have made it tough to develop foundational AI models. But the government is keen not to be left behind.

@fanf@mendeddrum.org
2025-07-06 08:42:03

from my link log —
Vectorized interpreters: mass rapid transit for programming languages.
http://venge.net/graydon/talks/VectorizedInterpretersTalk-2023-05-12.pdf
saved 2025-06-05

@arXiv_csFL_bot@mastoxiv.page
2025-06-06 07:17:45

[2025-06-06 Fri (UTC), 2 new articles found for cs.FL Formal Languages and Automata Theory]
#toXiv_bot_toot

@arXiv_csCY_bot@mastoxiv.page
2025-06-05 07:16:45

Facts are Harder Than Opinions -- A Multilingual, Comparative Analysis of LLM-Based Fact-Checking Reliability
Lorraine Saju, Arnim Bleier, Jana Lasser, Claudia Wagner
https://arxiv.org/abs/2506.03655

Facts are Harder Than Opinions -- A Multilingual, Comparative Analysis of LLM-Based Fact-Checking Reliability
The proliferation of misinformation necessitates scalable, automated fact-checking solutions. Yet, current benchmarks often overlook multilingual and topical diversity. This paper introduces a novel, dynamically extensible data set that includes 61,514 claims in multiple languages and topics, extending existing datasets up to 2024. Through a comprehensive evaluation of five prominent Large Language Models (LLMs), including GPT-4o, GPT-3.5 Turbo, LLaMA 3.1, and Mixtral 8x7B, we identify signific…

@arXiv_mathOC_bot@mastoxiv.page
2025-06-05 07:27:54

Learning Parametric Convex Functions
Maximilian Schaller, Alberto Bemporad, Stephen Boyd
https://arxiv.org/abs/2506.04183 https://arx…

Learning Parametric Convex Functions
A parametrized convex function depends on a variable and a parameter, and is convex in the variable for any valid value of the parameter. Such functions can be used to specify parametrized convex optimization problems, i.e., a convex optimization family, in domain specific languages for convex optimization. In this paper we address the problem of fitting a parametrized convex function that is compatible with disciplined programming, to some given data. This allows us to fit a function arising i…

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:20:15

Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar Books
Chen Zhang, Jiuheng Lin, Xiao Liu, Zekai Zhang, Yansong Feng
https://arxiv.org/abs/2506.01796

Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar Books
While large language models (LLMs) have shown promise in translating extremely low-resource languages using resources like dictionaries, the effectiveness of grammar books remains debated. This paper investigates the role of grammar books in translating extremely low-resource languages by decomposing it into two key steps: grammar rule retrieval and application. To facilitate the study, we introduce ZhuangRules, a modularized dataset of grammar rules and their corresponding test sentences. Our …

@arXiv_csSD_bot@mastoxiv.page
2025-06-05 09:41:52

This https://arxiv.org/abs/2502.14627 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSD_…

ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors
Multilingual audio-text retrieval (ML-ATR) is a challenging task that aims to retrieve audio clips or multilingual texts from databases. However, existing ML-ATR schemes suffer from inconsistencies for instance similarity matching across languages. We theoretically analyze the inconsistency in terms of both multilingual modal alignment direction error and weight error, and propose the theoretical weight error upper bound for quantifying the inconsistency. Based on the analysis of the weight err…

@arXiv_csAR_bot@mastoxiv.page
2025-06-06 07:15:26

hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation
Charles Hong, Brendan Roberts, Huijae An, Alex Um, Advay Ratan, Yakun Sophia Shao
https://arxiv.org/abs/2506.04544

hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation
Large language models (LLMs) are playing an increasingly large role in domains such as code generation, including hardware code generation, where Verilog is the key language. However, the amount of publicly available Verilog code pales in comparison to the amount of code available for software languages like Python. In this work, we present hdl2v ("HDL-to-Verilog"), a dataset which seeks to increase the amount of available human-written Verilog data by translating or compiling three other hardw…

@detondev@social.linux.pizza
2025-06-03 05:08:51

tap in yall
https://paralogue.org/viewtopic.php?p=559#p559

@arXiv_eessSY_bot@mastoxiv.page
2025-06-04 13:43:13

This https://arxiv.org/abs/2504.09642 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…

HBS -- Hardware Build System: A Tcl-based, minimal common abstraction approach for build system for hardware designs
Build systems become an indispensable part of the software implementation and deployment process. New programming languages are released with the build system integrated into the language tools, for example, Go, Rust, or Zig. However, in the hardware description domain, no official build systems have been released with the predominant Hardware Description Languages (HDL) such as VHDL or SystemVerilog. Moreover, hardware design projects are often multilanguage. The paper proposes a new build s…

@arXiv_csSE_bot@mastoxiv.page
2025-06-04 07:30:26

Textual-Based vs. Thinging Machines Conceptual Modeling
Sabah Al-Fedaghi
https://arxiv.org/abs/2506.02646 https://arxiv.org/pdf/2506.…

Textual-Based vs. Thinging Machines Conceptual Modeling
Software engineers typically interpret the domain description in natural language and translate it into a conceptual model. Three approaches are used in this domain modeling: textual languages, diagrammatic languages, and a mixed based of text and diagrams. According to some researchers, relying on a diagrammatic notation levies certain burdens for designing large models because visual languages are intended to depict everything diagrammatically during a development process but fail to do so fo…

@hex@kolektiva.social
2025-06-25 22:07:06

As I'm learning Dutch, I'm reminded that there are people who believe that the bible is to be taken literally. The idea that a several-hundred-year-old translation of a collection of texts in multiple languages, that were themselves translated multiple times between languages, before the whole thing was translated to Latin, then translated to English, could somehow perfectly reflect the original text... Yeah, it's only possible to believe that if you have no idea how languages work and have never learned another language.
Like, just from linguistic drift alone if the bible were written in King James English you're losing *so* much context. But Hebrew, Aramaic, and Greek translated to Latin, then to English, then to English again?
There are so many things that erg can't be translated, even as a beginner. Dutch and English are two of the closest languages that exist, they're both Germanic languages and they're the closest to each other (other than Friesian). You can't really be much closer, and yet, there are so many things you can't mutually represent. Hebrew and Latin, Aramaic and Latin, Latin and English, Greek and English, these aren't even the same families at all... They're extremely distant. There's absolutely no way to represent concepts from one to another without another book's worth of explanation.
And that ignores all the cultural context, which is mostly lost, and it takes a library and a decade of education to get the stuff that we *do* know.
Only monolingual Americans could come up with an idea so incredibly asinine.

@arXiv_qbiobm_bot@mastoxiv.page
2025-07-02 08:21:20

From Sentences to Sequences: Rethinking Languages in Biological System
Ke Liu, Shuanke Shen, Hao Chen
https://arxiv.org/abs/2507.00953 https://

From Sentences to Sequences: Rethinking Languages in Biological System
The paradigm of large language models in natural language processing (NLP) has also shown promise in modeling biological languages, including proteins, RNA, and DNA. Both the auto-regressive generation paradigm and evaluation metrics have been transferred from NLP to biological sequence modeling. However, the intrinsic structural correlations in natural and biological languages differ fundamentally. Therefore, we revisit the notion of language in biological systems to better understand how NLP …

@arXiv_csPL_bot@mastoxiv.page
2025-06-05 07:20:36

[2025-06-05 Thu (UTC), no new articles found for cs.PL Programming Languages]
#toXiv_bot_toot

@arXiv_csCV_bot@mastoxiv.page
2025-06-04 07:58:38

MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
Wei Chow, Yuan Gao, Linfeng Li, Xian Wang, Qi Xu, Hang Song, Lingdong Kong, Ran Zhou, Yi Zeng, Yidong Cai, Botian Jiang, Shilin Xu, Jiajun Zhang, Minghui Qiu, Xiangtai Li, Tianshu Yang, Siliang Tang, Juncheng Li
https://arxiv.org/abs/2506.03144

MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
Semantic retrieval is crucial for modern applications yet remains underexplored in current research. Existing datasets are limited to single languages, single images, or singular retrieval conditions, often failing to fully exploit the expressive capacity of visual information as evidenced by maintained performance when images are replaced with captions. However, practical retrieval scenarios frequently involve interleaved multi-condition queries with multiple images. Hence, this paper introduc…

@fanf@mendeddrum.org
2025-06-06 20:42:03

from my link log —
Wasm SpecTec has been adopted.
https://webassembly.org/news/2025-03-27-spectec/
saved 2025-03-28 https://

SpecTec has been adopted - WebAssembly
WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications.

@avstockhausen@fedihum.org
2025-06-29 20:35:02

Bookmarked: Talking About Muslims in Middle French: The Potential of Word-to-Vector Models for Studying Semantic Relationships in Medieval Languages – DH Lab #Digital_Humanities

Talking About Muslims in Middle French: The Potential of Word-to-Vector Models for Studying Semantic Relationships in Medieval Languages
by Kimberly Lifton Medieval vernaculars are notoriously tricky for digital humanists to work with because they lack standardized spelling. Especially when using out-of-the-box libraries and software, most Natural Language Processing (NLP) techniques simply do not work well for medieval languages. However, word-to-vector models have the capacity to handle noise like spelling variants when trained on … „Talking About Muslims in Middle French: The Potential of Word-to-Vector Models for Studyin…

@akosma@mastodon.online
2025-06-03 06:53:50

Hubert-Félix Thiéfaine and Nick Cave are two impersonators of the same djinn singing in two different languages at once

@arXiv_csLO_bot@mastoxiv.page
2025-06-05 09:41:03

This https://arxiv.org/abs/2505.15053 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLO_…

Owicki--Gries Logic for Timestamp Semantics
Whereas an extension with non-interference of Hoare logic for sequential programs Owicki--Gries logic ensures the correctness of concurrent programs on strict consistency, it is unsound to weak memory models adopted by modern computer architectures and specifications of programming languages. This paper proposes a novel non-interference notion and provides concurrent program logic sound to timestamp semantics corresponding to a weak memory model that allows delays in the effects of store instru…

@arXiv_csCL_bot@mastoxiv.page
2025-07-02 09:54:30

Natural language processing for African languages
David Ifeoluwa Adelani
https://arxiv.org/abs/2507.00297 https://arxiv.org/pdf/2507.…

Natural language processing for African languages
Recent advances in word embeddings and language models use large-scale, unlabelled data and self-supervised learning to boost NLP performance. Multilingual models, often trained on web-sourced data like Wikipedia, face challenges: few low-resource languages are included, their data is often noisy, and lack of labeled datasets makes it hard to evaluate performance outside high-resource languages like English. In this dissertation, we focus on languages spoken in Sub-Saharan Africa where all the …

@netzschleuder@social.skewed.de
2025-06-24 05:00:04

unicodelang: Languages spoken by country (2015)
A bipartite network of languages and the countries in which they are spoken, as estimated by Unicode. Edges are weighted by the proportion of the given country's population that is literate in a particular language.
This network has 868 nodes and 1255 edges.
Tags: Informational, Relatedness, Weighted

unicodelang: Languages spoken by country (2015). 868 nodes, 1255 edges. https://networks.skewed.de/net/unicodelang

unicodelang — Languages spoken by country (2015)
A bipartite network of languages and the countries in which they are spoken, as estimated by Unicode. Edges are weighted by the proportion of the given country's population that is literate in a particular language.
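
A minimal sketch of the structure described above: a bipartite graph with languages on one side, countries on the other, and each edge weighted by the share of a country's population literate in that language. The edge list here is hypothetical toy data chosen for illustration; it is not taken from the unicodelang dataset, and the names `EDGES` and `build_bipartite` are assumptions of this sketch.

```python
from collections import defaultdict

# Hypothetical toy edges: (language, country, share of population literate in it).
# These values are illustrative only and do not come from the dataset.
EDGES = [
    ("en", "AU", 0.96),
    ("en", "IN", 0.12),
    ("hi", "IN", 0.41),
    ("fr", "CA", 0.22),
    ("en", "CA", 0.86),
]


def build_bipartite(edges):
    """Return adjacency maps for both sides of a weighted bipartite graph."""
    by_language = defaultdict(dict)
    by_country = defaultdict(dict)
    for language, country, weight in edges:
        by_language[language][country] = weight
        by_country[country][language] = weight
    return dict(by_language), dict(by_country)


by_language, by_country = build_bipartite(EDGES)

# Nodes are counted across both sides; each edge is counted once.
node_count = len(by_language) + len(by_country)
edge_count = sum(len(countries) for countries in by_language.values())
```

Keeping two adjacency maps makes lookups cheap in either direction, for example "which languages are recorded for this country" via `by_country["IN"]`.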

@arXiv_csFL_bot@mastoxiv.page
2025-06-05 07:17:44

[2025-06-05 Thu (UTC), no new articles found for cs.FL Formal Languages and Automata Theory]
#toXiv_bot_toot

@arXiv_qbioNC_bot@mastoxiv.page
2025-06-26 09:02:10

Brains and language models converge on a shared conceptual space across different languages
Zaid Zada, Samuel A Nastase, Jixing Li, Uri Hasson
https://arxiv.org/abs/2506.20489

Brains and language models converge on a shared conceptual space across different languages
Human languages differ widely in their forms, each having distinct sounds, scripts, and syntax. Yet, they can all convey similar meaning. Do different languages converge on a shared neural substrate for conceptual meaning? We used language models (LMs) and naturalistic fMRI to identify neural representations of the shared conceptual meaning of the same story as heard by native speakers of three languages: English, Chinese, and French. We found that LMs trained on entirely different languages co…

@arXiv_csPL_bot@mastoxiv.page
2025-06-06 07:20:42

[2025-06-06 Fri (UTC), no new articles found for cs.PL Programming Languages]
#toXiv_bot_toot

@lysander07@sigmoid.social
2025-06-02 14:13:22

Excellent keynote in the #SemDH2025 workshop by Laura Hollink on Cultural Bias in Linked Open Data. Laura is addressing all bias-related aspects in the cultural heritage items themselves, in the data representing them, the data schemata, vocabularies, and ontologies on which the data are based, as well as in the knowledge representation languages used to create the schemata.

Laura Hollink presenting her keynote at Semantic Digital Humanities 2025 Workshop, standing in front of the projection screen.

@Techmeme@techhub.social
2025-06-04 06:40:58

How Morgan Stanley is using its DevGen.AI tool, built in-house on OpenAI's GPT models, to translate legacy code into modern coding languages (Isabelle Bousquette/Wall Street Journal)
https://www.wsj.com/article…

@cdarwin@c.im
2025-07-01 18:49:53

The Trump administration has accomplished something that Hitler, Stalin, Mao, and other dictators desired.
-- It destroyed the Voice of America.
Until mid-March, VOA had been on the air continuously for 83 years.
Starting in 1942 with shortwave broadcasts in German to counter Nazi propaganda,
America’s external voice had expanded to nearly 50 languages,
with a weekly combined audience of more than 350 million people worldwide, watching on TV, listening on radi…

I Fought To Keep VOA Independent. Now It’s Gone.
The Trump administration has accomplished something that Hitler, Stalin, Mao, and other dictators desired. It destroyed the Voice of America. Until mid-March, VOA had been on the air continuously for 83 years. Starting in 1942 with shortwave broadcasts in German to counter Nazi propaganda, America’s...

@arXiv_csSE_bot@mastoxiv.page
2025-06-05 09:44:57

This https://arxiv.org/abs/2506.02943 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…

A Multi-agent LLM-based JUit Test Generation with Strong Oracles
Unit testing plays a critical role in ensuring software correctness. However, writing unit tests manually is laborious, especially for strong typed languages like Java, motivating the need for automated approaches. Traditional methods primarily rely on search-based or randomized algorithms to generate tests that achieve high code coverage and produce regression oracles, which are derived from the program's current behavior rather than its intended functionality. Recent advances in large languag…

@arXiv_csIR_bot@mastoxiv.page
2025-07-01 09:04:23

Teaching a Language Model to Speak the Language of Tools
Simeon Emanuilov
https://arxiv.org/abs/2506.23394 https://arxiv.org/pdf/2506…

Teaching a Language Model to Speak the Language of Tools
External tool integration through function-calling is essential for practical language model applications, yet most multilingual models lack reliable tool-use capabilities in non-English languages. Even state-of-the-art multilingual models struggle with determining when to use tools and generating the structured outputs required for function calls, often exhibiting language confusion when prompted in lower-resource languages. This work presents a methodology for adapting existing language model…

@arXiv_mathCO_bot@mastoxiv.page
2025-06-03 17:43:02

This https://arxiv.org/abs/2505.22620 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_mat…

Counting big Ramsey degrees of the homogeneous and universal $K_4$-free graph
Big Ramsey degrees of Fraïssé limits of finitely constrained free amalgamation classes in finite binary languages have been recently fully characterised by Balko, Chodounský, Dobrinen, Hubička, Konečný, Vena, and Zucker. A special case of this characterisation is the universal homogeneous $K_4$-free graph. We give a self-contained and relatively compact presentation of this case and compute the actual big Ramsey degrees of small graphs.

@ginevra@hachyderm.io
2025-06-20 00:35:29

Language learning has been part of me since high school. I'm solid in 2 non-English languages, crappy but survivable in 2 others. I've played with & started learning others many times.
I'm real busy rn, but language learning could be a fun thing to do for myself & make me feel like I'm still me.
But I'm stumped about my language picks. I learnt the obvious European languages in school; later tried key Asian languages. What do I want to do now?
African languages? I won't be getting a chance to use them much in Aus, & I'm unlikely to get to a stage where I can read literature.
I tried Slovenian/Slovene on a whim & really love it, but I'll never go there. Is the practical but unfun answer grind out more kanji/hanzi? Or is whimsically learning a language spoken by only 2.5 million people reasonable? I will continue struggling through with Ukrainian, 'cause I think it's important.
#LanguageLearning

@arXiv_csSE_bot@mastoxiv.page
2025-06-05 07:22:07

Across Programming Language Silos: A Study on Cross-Lingual Retrieval-augmented Code Generation
Qiming Zhu, Jialun Cao, Xuanang Chen, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun, Shing-Chi Cheung
https://arxiv.org/abs/2506.03535

Across Programming Language Silos: A Study on Cross-Lingual Retrieval-augmented Code Generation
Current research on large language models (LLMs) with retrieval-augmented code generation (RACG) mainly focuses on single-language settings, leaving cross-lingual effectiveness and security unexplored. Multi-lingual RACG systems are valuable for migrating code-bases across programming languages (PLs), yet face risks from error (e.g. adversarial data corruption) propagation in cross-lingual transfer. We construct a dataset spanning 13 PLs with nearly 14k instances to explore utility and robustne…

@arXiv_csFL_bot@mastoxiv.page
2025-07-03 12:39:25

Replaced article(s) found for cs.FL. https://arxiv.org/list/cs.FL/new
[1/1]:
- Dynamic Membership for Regular Tree Languages
Antoine Amarilli, Corentin Barloy, Louis Jachiet, Charles Paperman

@arXiv_csCL_bot@mastoxiv.page
2025-07-02 10:08:30

NIRANTAR: Continual Learning with New Languages and Domains on Real-world Speech Data
Tahir Javed, Kaushal Bhogale, Mitesh M. Khapra
https://arxiv.org/abs/2507.00534

NIRANTAR: Continual Learning with New Languages and Domains on Real-world Speech Data
We introduce Nirantar, a comprehensive framework for evaluating continual learning (CL) in multilingual and multi-domain ASR. Designed to reflect real-world CL challenges, Nirantar leverages data collected incrementally across 22 languages and 208 districts in India through natural episodes. This enables evaluation across Language-Incremental (LIL), Domain-Incremental (DIL), and the novel Language-Incremental Domain-Incremental Learning (LIDIL) scenarios. Unlike prior work that relies on simula…

@juandesant@astrodon.social
2025-06-28 16:57:47

Looks like this headline was constructed to violate @…’s law…
https://apple.news/AxKolRBihRpy493vkr7KB2Q

Is being bilingual good for your brain? — The Economist
Perhaps. Learning languages offers other, more concrete benefits

@arXiv_eessAS_bot@mastoxiv.page
2025-06-04 07:29:31

Dhvani: A Weakly-supervised Phonemic Error Detection and Personalized Feedback System for Hindi
Arnav Rustagi, Satvik Bajpai, Nimrat Kaur, Siddharth Siddharth
https://arxiv.org/abs/2506.02166

Dhvani: A Weakly-supervised Phonemic Error Detection and Personalized Feedback System for Hindi
Computer-Assisted Pronunciation Training (CAPT) has been extensively studied for English. However, there remains a critical gap in its application to Indian languages with a base of 1.5 billion speakers. Pronunciation tools tailored to Indian languages are strikingly lacking despite the fact that millions learn them every year. With over 600 million speakers and being the fourth most-spoken language worldwide, improving Hindi pronunciation is a vital first step toward addressing this gap. This …

@simon_brooke@mastodon.scot
2025-06-27 12:14:58

"It's this brutal fragility of vector stacks — which are used by most modern computer languages — which makes software people so wary of fully exploiting the beauty and power of recursion, and I really think that's a shame"
#Lisp

The properties of the system, and their values
Lisp is the list processing language; that is what its name means. It processes data structures built of lists, which may be lists of lists, or lists of numbers, or lists of any other sort of data item provided for by the designers of the system. But how is a list, in a computer, actually implemented?

@arXiv_csSD_bot@mastoxiv.page
2025-06-04 07:33:42

Breaking the Barriers of Text-Hungry and Audio-Deficient AI
Hamidou Tembine, Issa Bamia, Massa NDong, Bakary Coulibaly, Oumar Issiaka Traore, Moussa Traore, Moussa Sanogo, Mamadou Eric Sangare, Salif Kante, Daryl Noupa Yongueng, Hafiz Tiomoko Ali, Malik Tiomoko, Frejus Laleye, Boualem Djehiche, Wesmanegda Elisee Dipama, Idris Baba Saje, Hammid Mohammed Ibrahim, Moumini Sanogo, Marie Coursel Nininahazwe, Abdul-Latif Siita, Haine Mhlongo, Teddy Nelvy Dieu Merci Kouka, Mariam Serine Jerid…

Breaking the Barriers of Text-Hungry and Audio-Deficient AI
While global linguistic diversity spans more than 7164 recognized languages, the current dominant architecture of machine intelligence remains fundamentally biased toward written text. This bias excludes over 700 million people particularly in rural and remote regions who are audio-literate. In this work, we introduce a fully textless, audio-to-audio machine intelligence framework designed to serve this underserved population, and all the people who prefer audio-efficiency. Our contributions in…

@arXiv_csPL_bot@mastoxiv.page
2025-06-03 07:23:40

Using Code Snippets to Teach Programming Languages
Joshua Akingbade, Jianhua Yang, Mir Seyedebrahimi
https://arxiv.org/abs/2506.00404 https://

Using Code Snippets to Teach Programming Languages
Coding is a fundamental skill required in the engineering discipline, and much work exists exploring better ways of teaching coding in the higher education context. In particular, Code Snippets (CSs) are approved to be an effective way of introducing programming language units to students. CSs are portions of source code of varying size and content. They can be used in a myriad of ways, one of which is to teach the code they contain as well as its function. To further explore the use of CSs, a …

@arXiv_csLO_bot@mastoxiv.page
2025-07-04 07:58:21

Decision algorithms for fragments of real analysis. III: A theory of differentiable functions with (semi-)open intervals
G. Buriola, D. Cantone, G. Cincotti, E. G. Omodeo, G. T. Spartà
https://arxiv.org/abs/2507.02742

Decision algorithms for fragments of real analysis. III: A theory of differentiable functions with (semi-)open intervals
This paper enriches preexisting satisfiability tests for unquantified languages, which in turn augment a fragment of Tarski's elementary algebra with unary real functions possessing a continuous first derivative. Two sorts of individual variables are available, one ranging over real numbers and the other one ranging over the functions of interest. Numerical terms are built from real variables through constructs designating the four basic arithmetic operations and through the function-applicat…

@arXiv_csSE_bot@mastoxiv.page
2025-06-06 07:22:14

A Multi-Dataset Evaluation of Models for Automated Vulnerability Repair
Zanis Ali Khan, Aayush Garg, Qiang Tang
https://arxiv.org/abs/2506.04987 https://…

A Multi-Dataset Evaluation of Models for Automated Vulnerability Repair
Software vulnerabilities pose significant security threats, requiring effective mitigation. While Automated Program Repair (APR) has advanced in fixing general bugs, vulnerability patching, a security-critical aspect of APR, remains underexplored. This study investigates pre-trained language models, CodeBERT and CodeT5, for automated vulnerability patching across six datasets and four languages. We evaluate their accuracy and generalization to unknown vulnerabilities. Results show that while bot…

@Techmeme@techhub.social
2025-07-03 05:55:52

A look at the ~$377 TranscribeGlass smart glasses that use AI to subtitle conversations in nearly real time, built for the deaf or hard-of-hearing (Boone Ashworth/Wired)
https://www.wired.com/story/these-translating-ai-glasses-put-subtitles-on-…

These Transcribing Eyeglasses Put Subtitles on the World
TranscribeGlass can subtitle conversations in nearly real time and will soon be able to translate languages and tell you when the person you’re talking to is feeling socially awkward.

@arXiv_csCL_bot@mastoxiv.page
2025-07-03 10:11:20

Eka-Eval : A Comprehensive Evaluation Framework for Large Language Models in Indian Languages
Samridhi Raj Sinha, Rajvee Sheth, Abhishek Upperwal, Mayank Singh
https://arxiv.org/abs/2507.01853

Eka-Eval : A Comprehensive Evaluation Framework for Large Language Models in Indian Languages
The rapid advancement of Large Language Models (LLMs) has intensified the need for evaluation frameworks that go beyond English centric benchmarks and address the requirements of linguistically diverse regions such as India. We present EKA-EVAL, a unified and production-ready evaluation framework that integrates over 35 benchmarks, including 10 Indic-specific datasets, spanning categories like reasoning, mathematics, tool use, long-context understanding, and reading comprehension. Compared to e…

@arXiv_csFL_bot@mastoxiv.page
2025-07-02 07:37:39

Eilenberg correspondence for Stone recognition
Jorge Almeida, Ondřej Klíma
https://arxiv.org/abs/2507.00409 https://arxiv.o…

Eilenberg correspondence for Stone recognition
We develop and explore the idea of recognition of languages (in the general sense of subsets of topological algebras) as preimages of clopen sets under continuous homomorphisms into Stone topological algebras. We obtain an Eilenberg correspondence between varieties of languages and varieties of ordered Stone topological algebras and a Birkhoff/Reiterman-type theorem showing that the latter may be defined by certain pseudo-inequalities. In the case of classical formal languages, of words over a …

@arXiv_csPL_bot@mastoxiv.page
2025-07-04 12:23:54

Replaced article(s) found for cs.PL. https://arxiv.org/list/cs.PL/new
[1/1]:
- A Lightweight Method for Generating Multi-Tier JIT Compilation Virtual Machine in a Meta-Tracing ...
Yusuke Izawa, Hidehiko Masuhara, Carl Friedrich Bolz-Tereick

@arXiv_csSE_bot@mastoxiv.page
2025-06-05 07:24:21

Generating Automotive Code: Large Language Models for Software Development and Verification in Safety-Critical Systems
Sven Kirchner, Alois C. Knoll
https://arxiv.org/abs/2506.04038

Generating Automotive Code: Large Language Models for Software Development and Verification in Safety-Critical Systems
Developing safety-critical automotive software presents significant challenges due to increasing system complexity and strict regulatory demands. This paper proposes a novel framework integrating Generative Artificial Intelligence (GenAI) into the Software Development Lifecycle (SDLC). The framework uses Large Language Models (LLMs) to automate code generation in languages such as C++, incorporating safety-focused practices such as static verification, test-driven development and iterative refi…

@fanf@mendeddrum.org
2025-06-29 17:42:03

from my link log —
Writing that can change how you think about programming languages.
https://bernsteinbear.com/blog/pl-writing/
saved 2025-05-13

Writing that changed how I think about PL
Every so often I come across a paper, blog post, or (occasionally) video that completely changes how I think about a topic in programming languages and compilers. For some of these posts, I can’t even remember how I thought about the idea before reading it—it was that impactful.

@netzschleuder@social.skewed.de
2025-06-13 05:00:04

unicodelang: Languages spoken by country (2015)
A bipartite network of languages and the countries in which they are spoken, as estimated by Unicode. Edges are weighted by the proportion of the given country's population that is literate in a particular language.
This network has 868 nodes and 1255 edges.
Tags: Informational, Relatedness, Weighted

unicodelang — Languages spoken by country (2015)
A bipartite network of languages and the countries in which they are spoken, as estimated by Unicode. Edges are weighted by the proportion of the given country's population that is literate in a particular language.

@arXiv_csFL_bot@mastoxiv.page
2025-07-04 12:17:40

Replaced article(s) found for cs.FL. https://arxiv.org/list/cs.FL/new
[1/1]:
- Universality Frontier for Asynchronous Cellular Automata
Ivan Baburin, Matthew Cook, Florian Grötschla, Andreas Plesner, Roger Wattenhofer

@arXiv_csPL_bot@mastoxiv.page
2025-06-03 16:12:25

This https://arxiv.org/abs/2505.16764 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csPL_…

Can a domain-specific language improve program structure comprehension of data pipelines? A mixed-methods study
In many application domains, domain-specific languages can allow domain experts to contribute to collaborative projects more correctly and efficiently. To do so, they must be able to understand program structure from reading existing source code. With high-quality data becoming an increasingly important resource, the creation of data pipelines is an important application domain for domain-specific languages. We execute a mixed-method study consisting of a controlled experiment and a follow-up…

@arXiv_csSE_bot@mastoxiv.page
2025-07-03 08:43:20

Combining Type Inference and Automated Unit Test Generation for Python
Lukas Krodinger, Stephan Lukasczyk, Gordon Fraser
https://arxiv.org/abs/2507.01477 h…

Combining Type Inference and Automated Unit Test Generation for Python
Automated unit test generation is an established research field that has so far focused on statically-typed programming languages. The lack of type information in dynamically-typed programming languages, such as Python, inhibits test generators, which heavily rely on information about parameter and return types of functions to select suitable arguments when constructing test cases. Since automated test generators inherently rely on frequent execution of candidate tests, we make use of these fre…

@arXiv_csCL_bot@mastoxiv.page
2025-07-04 13:15:43

Replaced article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[2/3]:
- Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs
Hao Wang, Pinzhi Huang, Jihan Yang, Saining Xie, Daisuke Kawahara

@arXiv_csPL_bot@mastoxiv.page
2025-06-04 07:23:27

[2025-06-04 Wed (UTC), 2 new articles found for cs.PL Programming Languages]
#toXiv_bot_toot

@arXiv_csFL_bot@mastoxiv.page
2025-07-04 07:35:41

[2025-07-04 Fri (UTC), 1 new article found for cs.FL Formal Languages and Automata Theory]
#toXiv_bot_toot

@arXiv_csSE_bot@mastoxiv.page
2025-07-01 10:18:03

What Challenges Do Developers Face When Using Verification-Aware Programming Languages?
Francisco Oliveira, Alexandra Mendes, Carolina Carreira
https://arxiv.org/abs/2506.23696

What Challenges Do Developers Face When Using Verification-Aware Programming Languages?
Software reliability is critical in ensuring that the digital systems we depend on function correctly. In software development, increasing software reliability often involves testing. However, for complex and critical systems, developers can use Design by Contract (DbC) methods to define precise specifications that software components must satisfy. Verification-Aware (VA) programming languages support DbC and formal verification at compile-time or run-time, offering stronger correctness guarant…

@netzschleuder@social.skewed.de
2025-06-11 15:00:03

unicodelang: Languages spoken by country (2015)
A bipartite network of languages and the countries in which they are spoken, as estimated by Unicode. Edges are weighted by the proportion of the given country's population that is literate in a particular language.
This network has 868 nodes and 1255 edges.
Tags: Informational, Relatedness, Weighted

unicodelang — Languages spoken by country (2015)
A bipartite network of languages and the countries in which they are spoken, as estimated by Unicode. Edges are weighted by the proportion of the given country's population that is literate in a particular language.

@arXiv_csCL_bot@mastoxiv.page
2025-07-02 09:34:40

EfficientXLang: Towards Improving Token Efficiency Through Cross-Lingual Reasoning
Sanchit Ahuja, Praneetha Vaddamanu, Barun Patra
https://arxiv.org/abs/2507.00246

EfficientXLang: Towards Improving Token Efficiency Through Cross-Lingual Reasoning
Despite recent advances in Language Reasoning Models (LRMs), most research focuses solely on English, even though many models are pretrained on multilingual data. In this work, we investigate: Is English the most token-efficient language for reasoning? We evaluate three open-source LRMs: DeepSeek R1, Qwen 2.5 and Qwen 3, across four math datasets and seven typologically diverse languages. We find that reasoning in non-English languages not only reduces token usage, but also preserves accuracy. …

@arXiv_eessAS_bot@mastoxiv.page
2025-07-04 08:18:51

Pronunciation Editing for Finnish Speech using Phonetic Posteriorgrams
Zirui Li, Lauri Juvela, Mikko Kurimo
https://arxiv.org/abs/2507.02115 https://

Pronunciation Editing for Finnish Speech using Phonetic Posteriorgrams
Synthesizing second-language (L2) speech is potentially highly valued for L2 language learning experience and feedback. However, due to the lack of L2 speech synthesis datasets, it is difficult to synthesize L2 speech for low-resourced languages. In this paper, we provide a practical solution for editing native speech to approximate L2 speech and present PPG2Speech, a diffusion-based multispeaker Phonetic-Posteriorgrams-to-Speech model that is capable of editing a single phoneme without text al…

@fanf@mendeddrum.org
2025-06-24 20:42:03

from my link log —
HAFLANG: hardware acceleration of functional languages.
https://haflang.github.io/
saved 2025-03-22 https://dotat.at…

@Techmeme@techhub.social
2025-06-29 05:35:53

San Diego-based Clearspeed, which offers AI-driven voice-based risk assessment tech for 60 languages, raised a $60M Series D, taking its total funding to $110M (Duncan Riley/SiliconANGLE)
https://siliconangle.com/2025/06/26/cl

Clearspeed raises $60M to expand voice-based risk assessment platform - SiliconANGLE

@arXiv_csFL_bot@mastoxiv.page
2025-06-04 07:19:59

[2025-06-04 Wed (UTC), no new articles found for cs.FL Formal Languages and Automata Theory]
#toXiv_bot_toot

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:56:59

Text2Cypher Across Languages: Evaluating Foundational Models Beyond English
Makbule Gulcin Ozsoy, William Tai
https://arxiv.org/abs/2506.21445 https://

Text2Cypher Across Languages: Evaluating Foundational Models Beyond English
Recent advances in large language models have enabled natural language interfaces that translate user questions into database queries, such as Text2SQL, Text2SPARQL, and Text2Cypher. While these interfaces enhance database accessibility, most research today focuses solely on English, with limited evaluation in other languages. This paper investigates the performance of foundational LLMs on the Text2Cypher task across multiple languages. We create and release a multilingual test set by translati…

@arXiv_csSE_bot@mastoxiv.page
2025-06-02 10:03:45

This https://arxiv.org/abs/2503.19217 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…

LLM Benchmarking with LLaMA2: Evaluating Code Development Performance Across Multiple Programming Languages
The rapid evolution of large language models (LLMs) has opened new possibilities for automating various tasks in software development. This paper evaluates the capabilities of the Llama 2-70B model in automating these tasks for scientific applications written in commonly used programming languages. Using representative test problems, we assess the model's capacity to generate code, documentation, and unit tests, as well as its ability to translate existing code between commonly used programming…

@arXiv_csPL_bot@mastoxiv.page
2025-06-03 07:23:09

[2025-06-03 Tue (UTC), 2 new articles found for cs.PL Programming Languages]
#toXiv_bot_toot

@arXiv_csSE_bot@mastoxiv.page
2025-06-04 07:38:55

A Multi-agent LLM-based JUit Test Generation with Strong Oracles
Qinghua Xu, Guancheng Wang, Lionel Briand, Kui Liu
https://arxiv.org/abs/2506.02943 https:…

A Multi-agent LLM-based JUit Test Generation with Strong Oracles
Unit testing plays a critical role in ensuring software correctness. However, writing unit tests manually is laborious, especially for strong typed languages like Java, motivating the need for automated approaches. Traditional methods primarily rely on search-based or randomized algorithms to generate tests that achieve high code coverage and produce regression oracles, which are derived from the program's current behavior rather than its intended functionality. Recent advances in large languag…

@arXiv_csFL_bot@mastoxiv.page
2025-07-01 07:34:53

Programmable Co-Transcriptional Splicing: Realizing Regular Languages via Hairpin Deletion
Da-Jung Cho, Szilárd Zsolt Fazekas, Shinnosuke Seki, Max Wiedenhöft
https://arxiv.org/abs/2506.23384

Programmable Co-Transcriptional Splicing: Realizing Regular Languages via Hairpin Deletion
RNA co-transcriptionality, where RNA is spliced or folded during transcription from DNA templates, offers promising potential for molecular programming. It enables programmable folding of nano-scale RNA structures and has recently been shown to be Turing universal. While post-transcriptional splicing is well studied, co-transcriptional splicing is gaining attention for its efficiency, though its unpredictability still remains a challenge. In this paper, we focus on engineering co-transcriptiona…

@arXiv_csSE_bot@mastoxiv.page
2025-06-04 07:26:19

Flow2Code: Evaluating Large Language Models for Flowchart-based Code Generation Capability
Mengliang He, Jiayi Zeng, Yankai Jiang, Wei Zhang, Zeming Liu, Xiaoming Shi, Aimin Zhou
https://arxiv.org/abs/2506.02073

Flow2Code: Evaluating Large Language Models for Flowchart-based Code Generation Capability
While large language models (LLMs) show promise in code generation, existing benchmarks neglect the flowchart-based code generation. To promote further research on flowchart-based code generation, this work presents Flow2Code, a novel benchmark for flowchart-based code generation evaluation. The evaluation dataset spans 15 programming languages and includes 5,622 code segments paired with 16,866 flowcharts of three types: code, UML, and pseudocode. Extensive experiments with 13 multimodal LLMs …

@arXiv_csCL_bot@mastoxiv.page
2025-07-03 10:11:50

DIY-MKG: An LLM-Based Polyglot Language Learning System
Kenan Tang, Yanhong Li, Yao Qin
https://arxiv.org/abs/2507.01872 https://arxi…

DIY-MKG: An LLM-Based Polyglot Language Learning System
Existing language learning tools, even those powered by Large Language Models (LLMs), often lack support for polyglot learners to build linguistic connections across vocabularies in multiple languages, provide limited customization for individual learning paces or needs, and suffer from detrimental cognitive offloading. To address these limitations, we design Do-It-Yourself Multilingual Knowledge Graph (DIY-MKG), an open-source system that supports polyglot language learning. DIY-MKG allows the…

@arXiv_csCL_bot@mastoxiv.page
2025-07-03 10:16:50

Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla
Md Sazzadul Islam Ridoy, Sumi Akter, Md. Aminur Rahman
https://arxiv.org/abs/2507.01931

Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla
In recent years, neural models trained on large multilingual text and speech datasets have shown great potential for supporting low-resource languages. This study investigates the performances of two state-of-the-art Automatic Speech Recognition (ASR) models, OpenAI's Whisper (Small & Large-V2) and Facebook's Wav2Vec-BERT on Bangla, a low-resource language. We have conducted experiments using two publicly available datasets: Mozilla Common Voice-17 and OpenSLR to evaluate model performances. Th…

@arXiv_csFL_bot@mastoxiv.page
2025-07-03 07:52:50

[2025-07-03 Thu (UTC), 1 new article found for cs.FL Formal Languages and Automata Theory]
#toXiv_bot_toot

@arXiv_csPL_bot@mastoxiv.page
2025-07-01 16:19:32

Replaced article(s) found for cs.PL. https://arxiv.org/list/cs.PL/new
[1/1]:
- The Cyan Language
José de Oliveira Guimarães
https://

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:20:38

Code-Switching and Syntax: A Large-Scale Experiment
Igor Sterner, Simone Teufel
https://arxiv.org/abs/2506.01846 https://arxiv.org/pd…

Code-Switching and Syntax: A Large-Scale Experiment
The theoretical code-switching (CS) literature provides numerous pointwise investigations that aim to explain patterns in CS, i.e. why bilinguals switch language in certain positions in a sentence more often than in others. A resulting consensus is that CS can be explained by the syntax of the contributing languages. There is however no large-scale, multi-language, cross-phenomena experiment that tests this claim. When designing such an experiment, we need to make sure that the system that is p…

@arXiv_csFL_bot@mastoxiv.page
2025-06-03 07:19:06

[2025-06-03 Tue (UTC), 1 new article found for cs.FL Formal Languages and Automata Theory]
#toXiv_bot_toot

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:20:21

NAVER LABS Europe Submission to the Instruction-following Track
Beomseok Lee, Marcely Zanon Boito, Laurent Besacier, Ioan Calapodescu
https://arxiv.org/abs/2506.01808

NAVER LABS Europe Submission to the Instruction-following Track
In this paper we describe NAVER LABS Europe submission to the instruction-following speech processing short track at IWSLT 2025. We participate in the constrained settings, developing systems that can simultaneously perform ASR, ST, and SQA tasks from English speech input into the following target languages: Chinese, Italian, and German. Our solution leverages two pretrained modules: (1) a speech-to-LLM embedding projector trained using representations from the SeamlessM4T-v2-large speech encod…

@arXiv_csPL_bot@mastoxiv.page
2025-06-02 07:20:34

[2025-06-02 Mon (UTC), 1 new article found for cs.PL Programming Languages]
#toXiv_bot_toot

@arXiv_csFL_bot@mastoxiv.page
2025-07-02 13:08:10

Replaced article(s) found for cs.FL. https://arxiv.org/list/cs.FL/new
[1/1]:
- Computing Threshold Budgets in Discrete-Bidding Games
Guy Avni, Suman Sadhukhan

@arXiv_csCL_bot@mastoxiv.page
2025-07-03 10:03:40

MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining
Zhixun Chen, Ping Guo, Wenhan Han, Yifan Zhang, Binbin Liu, Haobin Lin, Fengze Liu, Yan Zhao, Bingni Zhang, Taifeng Wang, Yin Zheng, Meng Fang
https://arxiv.org/abs/2507.01785

MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining
Data quality is a critical driver of large language model performance, yet existing model-based selection methods focus almost exclusively on English. We introduce MuRating, a scalable framework that transfers high-quality English data-quality signals into a single rater for 17 target languages. MuRating aggregates multiple English "raters" via pairwise comparisons to learn unified document-quality scores, then projects these judgments through translation to train a multilingual evaluator on mon…

@arXiv_csSE_bot@mastoxiv.page
2025-06-03 17:38:46

This https://arxiv.org/abs/2505.23671 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…

GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents
Developing high-performance software is a complex task that requires specialized expertise. We introduce GSO, a benchmark for evaluating language models' capabilities in developing high-performance software. We develop an automated pipeline that generates and executes performance tests to analyze repository commit histories to identify 102 challenging optimization tasks across 10 codebases, spanning diverse domains and programming languages. An agent is provided with a codebase and performance …

@arXiv_csCL_bot@mastoxiv.page
2025-06-30 07:55:49

Efficient Multilingual ASR Finetuning via LoRA Language Experts
Jiahong Li, Yiwen Shao, Jianheng Zhuo, Chenda Li, Liliang Tang, Dong Yu, Yanmin Qian
https://arxiv.org/abs/2506.21555

Efficient Multilingual ASR Finetuning via LoRA Language Experts
Recent advancements in deep learning have significantly enhanced multilingual automatic speech recognition (ASR) due to the development of advanced model architectures and available large-scale multilingual datasets. Despite that, multilingual ASR still suffers from the curse of multilinguality in that different languages tend to interfere with each other, making it difficult for the ASR model to identify multiple languages effectively while sharing model capacity across them. This paper propos…

@arXiv_csFL_bot@mastoxiv.page
2025-06-18 08:16:09

Measure-Theoretic Aspects of Star-Free and Group Languages
Ryoma Sin'ya, Takao Yuyama
https://arxiv.org/abs/2506.14134 https://ar…

Measure-Theoretic Aspects of Star-Free and Group Languages
A language $L$ is said to be ${\cal C}$-measurable, where ${\cal C}$ is a class of languages, if there is an infinite sequence of languages in ${\cal C}$ that "converges" to $L$. We investigate the properties of ${\cal C}$-measurability in the cases where ${\cal C}$ is SF, the class of all star-free languages, and G, the class of all group languages. It is shown that a language $L$ is SF-measurable if and only if $L$ is GD-measurable, where GD is the class of all generalised definite language…

@arXiv_csPL_bot@mastoxiv.page
2025-05-29 07:20:53

An instance of FreeCHR with refined operational semantics
Sascha Rechenberger, Thom Frühwirth
https://arxiv.org/abs/2505.22155 https://

An instance of FreeCHR with refined operational semantics
Constraint Handling Rules (CHR) is a rule-based programming language which is typically embedded into a general-purpose language. There exists a plethora of implementations of CHR for numerous host languages. However, the existing implementations often reinvent the way to embed CHR, which impedes maintenance and weakens assertions of correctness. To formalize and thereby unify the embedding of CHR into arbitrary host languages, we introduced the framework FreeCHR and proved it to be a valid rep…

@arXiv_csFL_bot@mastoxiv.page
2025-07-01 16:15:42

Replaced article(s) found for cs.FL. https://arxiv.org/list/cs.FL/new
[1/1]:
- Bridging Chaos Game Representations and $k$-mer Frequencies of DNA Sequences
Haoze He, Lila Kari, Pablo Millan Arias

@arXiv_csCL_bot@mastoxiv.page
2025-07-02 10:13:00

Transferable Modeling Strategies for Low-Resource LLM Tasks: A Prompt and Alignment-Based
Shuangquan Lyu, Yingnan Deng, Guiran Liu, Zhen Qi, Ruotong Wang
https://arxiv.org/abs/2507.00601

Transferable Modeling Strategies for Low-Resource LLM Tasks: A Prompt and Alignment-Based
This paper addresses the limited transfer and adaptation capabilities of large language models in low-resource language scenarios. It proposes a unified framework that combines a knowledge transfer module with parameter-efficient fine-tuning strategies. The method introduces knowledge alignment loss and soft prompt tuning to guide the model in effectively absorbing the structural features of target languages or tasks under minimal annotation. This enhances both generalization performance and tr…

@arXiv_csFL_bot@mastoxiv.page
2025-07-02 07:34:20

[2025-07-02 Wed (UTC), 1 new article found for cs.FL Formal Languages and Automata Theory]
#toXiv_bot_toot

@arXiv_csCL_bot@mastoxiv.page
2025-07-02 10:16:20

Contrasting Cognitive Styles in Vision-Language Models: Holistic Attention in Japanese Versus Analytical Focus in English
Ahmed Sabir, Azinović Gasper, Mengsay Loem, Rajesh Sharma
https://arxiv.org/abs/2507.00700

Contrasting Cognitive Styles in Vision-Language Models: Holistic Attention in Japanese Versus Analytical Focus in English
Cross-cultural research in perception and cognition has shown that individuals from different cultural backgrounds process visual information in distinct ways. East Asians, for example, tend to adopt a holistic perspective, attending to contextual relationships, whereas Westerners often employ an analytical approach, focusing on individual objects and their attributes. In this study, we investigate whether Vision-Language Models (VLMs) trained predominantly on different languages, specifically …

@arXiv_csFL_bot@mastoxiv.page
2025-06-02 07:17:43

[2025-06-02 Mon (UTC), no new articles found for cs.FL Formal Languages and Automata Theory]
#toXiv_bot_toot
