Tootfinder

@tante@tldr.nettime.org
2025-06-27 15:20:42

This comes down to the cyberlibertarian roots of most digital movements (think Archive.org, EFF, EDRI etc.): To them "open" is a value in itself and any political values are read as "restrictions" or "regulation" or "lack of freedom".
https://indieweb.social/@jaredw…

Jared White (🏳️‍⚧️ ally) (@jaredwhite@indieweb.social)
Goofy propaganda like this is why I consider the "pop" open source movement to be bereft of a true moral center. Ultimately, it doesn't really matter if the "code" is open if the training data is based on theft and the eventual use cases are unethical. 🤨 Imagine touting "open" weapons blueprints or "open" narcotics. Sure, the code/blueprint/recipe is open. So????? Don't make it right. https://social.lfx.dev/@linuxfoundation/114755559759681557

@tiotasram@kolektiva.social
2025-07-25 10:57:58

Just saw this:
#AI can mean a lot of things these days, but lots of the popular meanings imply a bevy of harms that I definitely wouldn't feel are worth a cute fish game. In fact, these harms are so acute that even "just" playing into the AI hype becomes its own kind of harm (it's similar to blockchain in that way).
@… noticed that the authors claim the code base is 80% AI generated, which is a red flag because people with sound moral compasses wouldn't be using AI to "help" write code in the first place. The authors aren't by some miracle people who couldn't build this app without help, in case that influences your thinking about it: they have the skills to write the code themselves, although it likely would have taken longer (but also been better).
I was more interested in the fish-classification AI, and how much it might be dependent on datacenters. Thankfully, a quick glance at the code confirms they're using ONNX and running a self-trained neural network on your device. While the exponentially-increasing energy & water demands of datacenters to support billion-parameter models are a real concern, this is not that. Even a non-AI game can burn a lot of cycles on someone's phone, and I don't think there's anything to complain about energy-wise if we're just using cycles on the end user's device as long as we're not having them keep it on for hours crunching numbers like blockchain stuff does. Running whatever stuff locally while the user is playing a game is a negligible environmental concern, unlike, say, calling out to ChatGPT where you're directly feeding datacenter demand. Since they claimed to have trained the network themselves, and since it's actually totally reasonable to make your own dataset for this and get good-enough-for-a-silly-game results with just a few hundred examples, I don't have any ethical objections to the data sourcing or training processes either. Hooray! This is finally an example of "ethical use of neural networks" that I can hold up as an example of what people should be doing instead of the BS they are doing.
But wait... Remember what I said about feeding the AI hype being its own form of harm? Yeah, between using AI tools for coding and calling their classifier "AI" in a way that makes it seem like the same kind of thing as ChatGPT et al., they're leaning into the hype rather than helping restrain it. And that means they're causing harm. Big AI companies can point to them and say "look AI enables cute things you like" when AI didn't actually enable it. So I'm feeling meh about this cute game and won't be sharing it aside from this post. If you love the cute fish, you don't really have to feel bad for playing with it, but I'd feel bad for advertising it without a disclaimer.

@arXiv_csDC_bot@mastoxiv.page
2025-06-27 09:19:29

Portable High-Performance Kernel Generation for a Computational Fluid Dynamics Code with DaCe
Måns I. Andersson, Martin Karp, Niclas Jansson, Stefano Markidis
https://arxiv.org/abs/2506.20994

Portable High-Performance Kernel Generation for a Computational Fluid Dynamics Code with DaCe
With the emergence of new high-performance computing (HPC) accelerators, such as Nvidia and AMD GPUs, efficiently targeting diverse hardware architectures has become a major challenge for HPC application developers. The increasing hardware diversity in HPC systems often necessitates the development of architecture-specific code, hindering the sustainability of large-scale scientific applications. In this work, we leverage DaCe, a data-centric parallel programming framework, to automate the gene…

@Techmeme@techhub.social
2025-06-23 12:01:17

Study: only 32 countries, mostly in the Northern Hemisphere, host AI data centers, with the US, China, and the EU controlling 50% of the world's top facilities (New York Times)
https://www.nytimes.com…

A.I. Computing Power Is Splitting the World Into Haves and Have-Nots
As countries race to power artificial intelligence, a yawning gap is opening around the world.

@mia@hcommons.social
2025-06-24 17:08:10

Noted while reading: 'a data structure or a block of code are things that make implicit and subjective arguments about how to see the world. This is possibly the single most important basic insight that Digital Humanities as a field needs to impart, because it affects so much of the world around us' - excellent post by @…

Meet a Historian: James Baillie on Digital Humanities and the Medieval Caucasus
Note from the Editor: I’m excited to feature another guest post with you all! This week we have James Baillie discussing how digital humanities and prosopographic methods can be used to bette…

@arXiv_csSE_bot@mastoxiv.page
2025-07-24 09:18:50

How Do Code Smells Affect Skill Growth in Scratch Novice Programmers?
Ricardo Hidalgo Aragón, Jesús M. González-Barahona, Gregorio Robles
https://arxiv.org/abs/2507.17314

How Do Code Smells Affect Skill Growth in Scratch Novice Programmers?
Context. Code smells, which are recurring anomalies in design or style, have been extensively researched in professional code. However, their significance in block-based projects created by novices is still largely unknown. Block-based environments such as Scratch offer a unique, data-rich setting to examine how emergent design problems intersect with the cultivation of computational-thinking (CT) skills. Objective. This research explores the connection between CT proficiency and design-level c…

@arXiv_csDL_bot@mastoxiv.page
2025-05-26 07:17:10

Towards Industrial Convergence : Understanding the evolution of scientific norms and practices in the field of AI
Antoine Houssard
https://arxiv.org/abs/2505.17945

Towards Industrial Convergence : Understanding the evolution of scientific norms and practices in the field of AI
In the field of artificial intelligence (AI) research, there seems to be a rapprochement between academics and industrial forces. The aim of this study is to assess whether and to what extent industrial domination in the field as well as the ever more frequent switch between academia and industry resulted in the adoption of industrial norms and practices by academics. Using bibliometric information and data on scientific code, we aimed to understand academic and industrial researchers' practice…

@toxi@mastodon.thi.ng
2025-07-17 15:15:51

Added a customizable 2D vector field plot function for #ThingUmbrella

20x20 vector field visualization with vectors visualized as small arrows and directions mapped to different hues. Vector lengths are normalized

20x20 vector field visualization with vectors visualized as small arrows and directions mapped to different hues. Vector lengths are varying.

20x20 vector field visualization with vectors visualized as small dials

20x20 vector field visualization with vectors visualized as small black lines with red arrow heads.

Declarative, functional & multi-format data visualization toolkit based around @thi.ng/hiccup
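
The image descriptions above mention two choices: mapping vector direction to hue, and normalizing vector lengths. As a rough illustration only, here is a minimal Python sketch of both ideas. It does not use the thi.ng API; the function names and the toy 20x20 rotation field are assumptions made up for this example.

```python
import colorsys
import math


def direction_to_rgb(vx, vy):
    """Map a vector's direction to an RGB colour by placing its angle on the hue wheel."""
    hue = (math.atan2(vy, vx) / (2 * math.pi)) % 1.0
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)


def normalize(vx, vy):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    length = math.hypot(vx, vy)
    return (0.0, 0.0) if length == 0 else (vx / length, vy / length)


# Hypothetical 20x20 field: a rotation around the grid centre (9.5, 9.5).
field = [[normalize(-(y - 9.5), x - 9.5) for x in range(20)] for y in range(20)]

# One colour per grid cell, ready to be handed to whatever draws the arrows.
colors = [[direction_to_rgb(vx, vy) for vx, vy in row] for row in field]
```

Skipping the `normalize` step would keep the varying vector lengths, which corresponds to the second plot described above.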

@arXiv_csAI_bot@mastoxiv.page
2025-07-16 10:09:51

Modeling Code: Is Text All You Need?
Daniel Nichols, Konstantinos Parasyris, Harshitha Menon, Brian R. Bartoldson, Giorgis Georgakoudis, Tal Ben-Nun, Abhinav Bhatele
https://arxiv.org/abs/2507.11467

Modeling Code: Is Text All You Need?
Code LLMs have become extremely popular recently for modeling source code across a variety of tasks, such as generation, translation, and summarization. However, transformer-based models are limited in their capabilities to reason through structured, analytical properties of code, such as control and data flow. Previous work has explored the modeling of these properties with structured data and graph neural networks. However, these approaches lack the generative capabilities and scale of modern…

@arXiv_csCL_bot@mastoxiv.page
2025-07-21 09:48:50

Optimizing ASR for Catalan-Spanish Code-Switching: A Comparative Analysis of Methodologies
Carlos Mena, Pol Serra, Jacobo Romero, Abir Messaoudi, Jose Giraldo, Carme Armentano-Oller, Rodolfo Zevallos, Ivan Meza, Javier Hernando
https://arxiv.org/abs/2507.13875

Optimizing ASR for Catalan-Spanish Code-Switching: A Comparative Analysis of Methodologies
Code-switching (CS), the alternating use of two or more languages, challenges automatic speech recognition (ASR) due to scarce training data and linguistic similarities. The lack of dedicated CS datasets limits ASR performance, as most models rely on monolingual or mixed-language corpora that fail to reflect real-world CS patterns. This issue is critical in multilingual societies where CS occurs in informal and formal settings. A key example is Catalan-Spanish CS, widely used in media and parli…

@arXiv_csLG_bot@mastoxiv.page
2025-07-24 10:13:39

EarthLink: Interpreting Climate Signals with Self-Evolving AI Agents
Zijie Guo, Jiong Wang, Xiaoyu Yue, Wangxu Wei, Zhe Jiang, Wanghan Xu, Ben Fei, Wenlong Zhang, Xinyu Gu, Lijing Cheng, Jing-Jia Luo, Chao Li, Yaqiang Wang, Tao Chen, Wanli Ouyang, Fenghua Ling, Lei Bai
https://arxiv.org/abs/2507.17311

EarthLink: Interpreting Climate Signals with Self-Evolving AI Agents
Modern Earth science is at an inflection point. The vast, fragmented, and complex nature of Earth system data, coupled with increasingly sophisticated analytical demands, creates a significant bottleneck for rapid scientific discovery. Here we introduce EarthLink, the first AI agent designed as an interactive copilot for Earth scientists. It automates the end-to-end research workflow, from planning and code generation to multi-scenario analysis. Unlike static diagnostic tools, EarthLink can lea…

@netzschleuder@social.skewed.de
2025-06-16 16:00:04

board_directors: Norwegian Boards of Directors (2002-2011). 1266 nodes, 3696 edges. https://networks.skewed.de/net/board_directors#net1m_2005-09-01

board_directors — Norwegian Boards of Directors (2002-2011)
224 networks of the affiliations among board directors due to sitting on common boards of Norwegian public limited companies (as of 5 August 2009), from May 2002 onward, in monthly snapshots through August 2011. Some metadata is included, such as director and company names, city and postal code for companies, and gender for directors. The 'net2m' data are bipartite company-director networks, while the 'net1m' are one-mode projections containing co-memberships among directors.
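
The description distinguishes bipartite company-director networks from their one-mode projections. A minimal sketch of that projection follows, assuming the bipartite data is available as a plain mapping from company to directors. The function name and the toy data are hypothetical and are not taken from the dataset or from any Netzschleuder tooling.

```python
from collections import defaultdict
from itertools import combinations


def one_mode_projection(memberships):
    """Project a company -> directors mapping onto director pairs.

    Returns a dict mapping each sorted director pair to the number
    of boards the two directors share.
    """
    weights = defaultdict(int)
    for directors in memberships.values():
        for a, b in combinations(sorted(set(directors)), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical toy data for illustration only.
boards = {
    "Company A": ["Dir 1", "Dir 2", "Dir 3"],
    "Company B": ["Dir 2", "Dir 3"],
    "Company C": ["Dir 4"],
}
projection = one_mode_projection(boards)
```

In this sketch a director who shares no board with anyone (here "Dir 4") produces no edge, so such a director would have to be tracked separately to appear as an isolated node.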

@thomasfuchs@hachyderm.io
2025-07-07 01:38:13

Even if “AI” worked (it doesn’t), there’s many reasons why you shouldn’t use it:
1. It’s destroying Internet sites that you love as you use chat bots instead of actually going to sources of information—this will cause them to be less active and eventually shut down.
2. Pollution and water use from server farms cause immediate harm; often, just like other heavy industry, these are built in underprivileged communities and harm poor people, without any benefits to them, as the big tech companies get tax breaks and don’t pay for power, while workers aren’t from the community but commute in.
3. The basic underlying models of any LLM rely on stolen data, even when specific extra data is obtained legally. Chatbots can’t learn to speak English just by reading open source code.
4. You’re fueling a speculation bubble that is costing many people their jobs—because the illusion of “efficiency” is kept up by firing people and counting that as profit.
5. Whenever you use the great cheat machine in the cloud you’re robbing yourself of doing real research, writing or coding, literally atrophying your brain and making you stupider.
It’s a grift, through and through.

@arXiv_csSE_bot@mastoxiv.page
2025-07-22 11:42:00

Applying the Chinese Wall Reverse Engineering Technique to Large Language Model Code Editing
Manatsawin Hanmongkolchai
https://arxiv.org/abs/2507.15599 htt…

Applying the Chinese Wall Reverse Engineering Technique to Large Language Model Code Editing
Large language models for code (Code LLMs) are increasingly utilized in programming environments. Despite their utility, the training datasets for top LLMs remain undisclosed, raising concerns about potential copyright violations. Some models, such as Pleias and Comma, put emphasis on data curation and licenses; however, with limited training data these models are not competitive and only serve as proofs of concept. To improve the utility of these models, we propose an application of the "Chinese …

@arXiv_csCR_bot@mastoxiv.page
2025-06-19 08:06:33

Fair Data Exchange with Constant-Time Proofs
Majid Khabbazian
https://arxiv.org/abs/2506.14944 https://arxiv.org/pdf/2506.14944

Fair Data Exchange with Constant-Time Proofs
The Fair Data Exchange (FDE) protocol introduced at CCS 2024 offers atomic pay-per-file transfers with constant-size proofs, but its prover and verifier runtimes still scale linearly with the file length n. We collapse these costs to essentially constant by viewing the file as a rate-1 Reed-Solomon (RS) codeword, extending it to a lower-rate RS code with constant redundancy, encrypting this extended vector, and then proving correctness for only a small random subset of the resulting ciphertexts…
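
To make the coding step concrete, here is a minimal sketch of treating a file as a rate-1 Reed-Solomon codeword and extending it to a lower-rate code. It is not the paper's protocol: encryption, proofs, and random subset sampling are left out, and the field size (257), the evaluation points (0, 1, 2, ...), and all function names are assumptions chosen for illustration.

```python
P = 257  # small prime field, large enough to hold one byte per symbol (assumption)


def lagrange_eval(points, x):
    """Evaluate at x, mod P, the unique polynomial of degree < len(points) through points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def rs_extend(data, m):
    """Treat data as evaluations at 0..n-1 and append evaluations at n..m-1 (requires m <= P)."""
    points = list(enumerate(data))
    return list(data) + [lagrange_eval(points, x) for x in range(len(data), m)]


def rs_recover(known, n):
    """Rebuild the n original symbols from any n known (position, symbol) pairs."""
    points = known[:n]
    return [lagrange_eval(points, x) for x in range(n)]


data = [72, 105, 33, 0]        # four hypothetical file symbols
extended = rs_extend(data, 8)  # doubling the length is one example of constant redundancy
```

Because any n symbols of the extended vector determine the polynomial, `rs_recover([(i, extended[i]) for i in range(4, 8)], 4)` returns the original four symbols even though none of them was supplied directly.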

@jeang3nie@social.linux.pizza
2025-05-19 20:37:00

This morning I null routed another dozen IP addresses for scraping my personal git server using repeated http requests. As per usual, a quick inspection reveals that at least some of them are scraping for LLM data. As always, I have not consented to this use of my non-maintained code, experiments, college coursework, and miscellaneous crap that I for whatever reason decided to self host rather than pushing it to Codeberg.
I mean, if you really want to feed your LLM on a diet that inclu…

@berlinbuzzwords@floss.social
2025-05-14 14:00:33

LLMs are now part of our daily work, making coding easier. Join Ivan Dolgov at this year's Berlin Buzzwords to learn how they built an in-house LLM for AI code completion in JetBrains products, covering design choices, data preparation, training and model evaluation.
Learn more: https://

Session title: How to train a fast LLM for coding tasks

Join us from June 15-17 in Berlin or participate online / berlinbuzzwords.de

How to train a fast LLM for coding tasks
In this talk, we present our approach to training a code completion model using Mellum, our new open-source model, as an example. Mellum powers in-file code completion in AI-enabled JetBrains IDEs. We'll walk through the entire process, from designing the model and preparing the dataset — with emphasis on the permissiveness of using data — to the training process and evaluation strategies. Attendees will gain insights into state-of-the-art techniques and the challenges we faced and discover…

@arXiv_csNE_bot@mastoxiv.page
2025-07-22 07:59:40

DHEvo: Data-Algorithm Based Heuristic Evolution for Generalizable MILP Solving
Zhihao Zhang, Siyuan Li, Chenxi Li, Feifan Liu, Mengjing Chen, Kai Li, Tao Zhong, Bo An, Peng Liu
https://arxiv.org/abs/2507.15615

DHEvo: Data-Algorithm Based Heuristic Evolution for Generalizable MILP Solving
Primal heuristics play a critical role in improving the efficiency of mixed integer programming (MILP) solvers. As large language models (LLMs) have demonstrated superior code generation abilities, recent MILP works are devoted to leveraging the evolutionary computation approaches with LLMs to generate effective primal heuristics. Although the generated heuristics have achieved better solving performance than the hand-crafted ones with little adaptability, the advantage of current LLM-based met…

@arXiv_physicsplasmph_bot@mastoxiv.page
2025-07-23 08:43:02

Efficient dataset construction using active learning and uncertainty-aware neural networks for plasma turbulent transport surrogate models
Aaron Ho (MIT Plasma Science and Fusion Center, Cambridge, USA), Lorenzo Zanisi (UKAEA Culham Centre for Fusion Energy, Abingdon, UK), Bram de Leeuw (Radboud University, Nijmegen, Netherlands), Vincent Galvan (MIT Plasma Science and Fusion Center, Cambridge, USA), Pablo Rodriguez-Fernandez (MIT Plasma Science and Fusion Center, Cambridge, USA), Nath…

Efficient dataset construction using active learning and uncertainty-aware neural networks for plasma turbulent transport surrogate models
This work demonstrates a proof-of-principle for using uncertainty-aware architectures, in combination with active learning techniques and an in-the-loop physics simulation code as a data labeller, to construct efficient datasets for data-driven surrogate model generation. Building off of a previous proof-of-principle successfully demonstrating training set reduction on static pre-labelled datasets, using the ADEPT framework, this strategy was applied again to the plasma turbulent transport prob…

@arXiv_csHC_bot@mastoxiv.page
2025-06-19 08:22:04

Building Blocks of a User Experience Research Point of View
Patricia Diaz
https://arxiv.org/abs/2506.15332 https://arxiv.org/pdf/2506…

Building Blocks of a User Experience Research Point of View
This paper presents three User Experience Research (UXR) perspectives based on data, evidence and insights - known as Point of View (POV) - showcasing how the strategies and methods of building a POV work in an enterprise setting. The POVs are: 1. Smart Visuals: Use AI to extract and translate text from visuals in videos (2019). 2. Assessable Code Editor: Focus on direct AI-feedback to the learner as it is the loop that requires the least effort for the highest impact (2023). 3. Opportunity Lands…

@mia@hcommons.social
2025-06-11 20:24:27

Very excited about this! Code to access GRIN will help lots of Google Books partners, and the example might open other doors, as well as the obvious benefits of access to data!
'Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability' https://arxiv.org/abs/2506…

Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability
Large language models (LLMs) use data to learn about the world in order to produce meaningful correlations and predictions. As such, the nature, scale, quality, and diversity of the datasets used to train these models, or to support their work at inference time, have a direct impact on their quality. The rapid development and adoption of LLMs of varying quality has brought into focus the scarcity of publicly available, high-quality training data and revealed an urgent need to ground the steward…

@Techmeme@techhub.social
2025-07-14 03:55:46

A look at WindBorne, which uses weather balloons and AI to improve forecasting, as potential budget cuts to NOAA threaten its access to public weather data (Tim Fernholz/New York Times)
https://www.

The Future of Weather Prediction Is Here. Maybe.
Thanks to A.I., companies like WindBorne hope to usher in a golden age of forecasting. But they rely in part on government data — and the agency that provides it is in turmoil.

@arXiv_csSE_bot@mastoxiv.page
2025-06-24 12:03:30

Your Token Becomes Worthless: Unveiling Rug Pull Schemes in Crypto Token via Code-and-Transaction Fusion Analysis
Hao Wu, Haijun Wang, Shangwang Li, Yin Wu, Ming Fan, Wuxia Jin, Yitao Zhao, Ting Liu
https://arxiv.org/abs/2506.18398

Your Token Becomes Worthless: Unveiling Rug Pull Schemes in Crypto Token via Code-and-Transaction Fusion Analysis
Rug pull scams have emerged as a persistent threat to cryptocurrency, causing significant financial losses. A typical scenario involves scammers deploying honeypot contracts to attract investments, restricting token sales, and draining the funds, which leaves investors with worthless tokens. Current methods either rely on predefined patterns to detect code risks or utilize statistical transaction data to train detection models. However, real-world Rug Pull schemes often involve a complex interp…

@cdarwin@c.im
2025-07-11 03:44:49

OSMnx is a Python package to retrieve, model, analyze, and visualize street networks from OpenStreetMap.
Users can download and model walkable, drivable, or bikeable urban networks with a single line of Python code
-- and then easily analyze and visualize them.
You can just as easily download and work with amenities/points of interest, building footprints, elevation data, street bearings/orientations, and network routing.
If you use OSMnx in your work, please downlo…

@arXiv_csAR_bot@mastoxiv.page
2025-06-06 07:15:26

hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation
Charles Hong, Brendan Roberts, Huijae An, Alex Um, Advay Ratan, Yakun Sophia Shao
https://arxiv.org/abs/2506.04544

hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation
Large language models (LLMs) are playing an increasingly large role in domains such as code generation, including hardware code generation, where Verilog is the key language. However, the amount of publicly available Verilog code pales in comparison to the amount of code available for software languages like Python. In this work, we present hdl2v ("HDL-to-Verilog"), a dataset which seeks to increase the amount of available human-written Verilog data by translating or compiling three other hardw…

@seav@en.osm.town
2025-06-26 15:22:15

Done mapping all 15 #barangays of Bien Unido, Bohol, #Philippines 🇵🇭 in #OpenStreetMap, updating their #Wikidata

Screenshot of an Overpass Turbo query result map showing the boundaries and the node label positions of the 15 barangays of Bien Unido, Bohol, Philippines, with the relation of Pinamgo selected showing its tags.

Screenshot of the Wikidata Query Service query results showing a table of the 15 barangays of Bien Unido, Bohol, Philippines, along with some of their properties such as population, coordinates, PSGC (Philippine Standard Geographic Code), and OpenStreetMap IDs.

Screenshot of the Wikidata Query Service query results switched to the bubble chart visualization depicting the relative population sizes of the 15 barangays of Bien Unido, Bohol, Philippines.

overpass turbo
A web based data mining tool for OpenStreetMap which runs any kind of Overpass API query and shows the results on an interactive map.

@arXiv_csSE_bot@mastoxiv.page
2025-07-16 09:02:01

A Code Comprehension Benchmark for Large Language Models for Code
Jayant Havare, Saurav Chaudhary, Ganesh Ramakrishnan, Kaushik Maharajan, Srikanth Tamilselvam
https://arxiv.org/abs/2507.10641

A Code Comprehension Benchmark for Large Language Models for Code
Large Language Models have shown impressive capabilities in coding tasks like code generation and code completion, as they have been trained on a large amount of code data. Also, since one of the core pretraining objectives is Next Token Prediction, these models tend to learn surface-level syntactic patterns in code. However, this does not guarantee code comprehension ability, i.e., the ability to capture the semantics of the code. In our opinion, this is the reason why these models often underp…

@shoppingtonz@mastodon.social
2025-07-21 06:40:18

I want the Micro Processor to order my Pulsar (Level 2 special operations unit) to mine coal and deposit it into the core...
I'll let you know about my progress...
So "mlog" is the "Mindustry Logic" 'language'.
mindustrygame.github.io/wiki/logic/0-introduction/
but I first started here:

Guide: Logic Basics
Logic is a mechanic introduced in Version 6.0 of Mindustry, which allows you to override the default behaviour of blocks and units through a customised block code programming language called MLog that can be edited in text editors when pasted into one. Logic is run through Processors in conjunction with accessory blocks such as the Memory Cell, Switch and Logic Display. It is recommended to have some form of prior programming experience in order to be familiar with data types. The way...

@netzschleuder@social.skewed.de
2025-07-09 06:00:03

board_directors: Norwegian Boards of Directors (2002-2011). 1544 nodes, 4429 edges. https://networks.skewed.de/net/board_directors#net1m_2007-06-01

board_directors — Norwegian Boards of Directors (2002-2011)
224 networks of the affiliations among board directors due to sitting on common boards of Norwegian public limited companies (as of 5 August 2009), from May 2002 onward, in monthly snapshots through August 2011. Some metadata is included, such as director and company names, city and postal code for companies, and gender for directors. The 'net2m' data are bipartite company-director networks, while the 'net1m' are one-mode projections containing co-memberships among directors.

@arXiv_qbioGN_bot@mastoxiv.page
2025-05-21 07:36:42

OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking
Heng Yang, Jack Cole, Yuan Li, Renzhi Chen, Geyong Min, Ke Li
https://arxiv.org/abs/2505.14402

OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking
The code of nature, embedded in DNA and RNA genomes since the origin of life, holds immense potential to impact both humans and ecosystems through genome modeling. Genomic Foundation Models (GFMs) have emerged as a transformative approach to decoding the genome. As GFMs scale up and reshape the landscape of AI-driven genomics, the field faces an urgent need for rigorous and reproducible evaluation. We present OmniGenBench, a modular benchmarking platform designed to unify the data, model, bench…

@arXiv_grqc_bot@mastoxiv.page
2025-06-10 09:17:12

MatBYIB: A Matlab-based code for Bayesian inference of extreme mass-ratio inspiral binary with arbitrary eccentricity
Gen-Liang Li, Shu-Jie Zhao, Huai-Ke Guo, Jing-Yu Su, Zhen-Heng Lin
https://arxiv.org/abs/2506.05954

MatBYIB: A Matlab-based code for Bayesian inference of extreme mass-ratio inspiral binary with arbitrary eccentricity
Accurate parameter estimation (PE) of gravitational waves (GW) is essential for GW data analysis. In extreme mass-ratio inspiral binary (EMRI) systems, orbital eccentricity is a critical parameter for PE. However, current software for PE of GW often neglects the direct estimation of orbital eccentricity. To fill this gap, we have developed MatBYIB, a MATLAB-based software package for PE of GW with arbitrary eccentricity. MatBYIB employs the Analytical Kludge (AK) waveform as a computat…

@arXiv_csSE_bot@mastoxiv.page
2025-07-17 07:59:00

MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization
Atharva Naik, Lawanya Baghel, Dhakshin Govindarajan, Darsh Agrawal, Daniel Fried, Carolyn Rose
https://arxiv.org/abs/2507.11687

MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization
Large Language Models, though successful in code generation, struggle with code quality analysis because they are limited by static training data and can't easily adapt to evolving best practices. We introduce MetaLint, a new instruction-following framework that formulates code quality analysis as the task of detecting and fixing problematic semantic code fragments or code idioms based on high-level specifications. Unlike conventional approaches that train models on static, rule-based data, Met…

@arXiv_csSD_bot@mastoxiv.page
2025-06-03 07:27:19

Improving Code Switching with Supervised Fine Tuning and GELU Adapters
Linh Pham
https://arxiv.org/abs/2506.00291 https://arxiv.org/p…

Improving Code Switching with Supervised Fine Tuning and GELU Adapters
There are few code switching datasets, labeled or unlabeled, that exist today. As a result, ASR requires new methods to utilize the vast monolingual data and models that exist. This paper uses OpenAI's open source ASR model, Whisper, which has been pre-trained on 680K hours of audio to perform monolingual ASR tasks. In Part 1, this paper examines how exploiting Whisper's monolingual ability to individually tokenize training text, called "Switching Tokenizers Method", improves transcription accur…

@arXiv_csDB_bot@mastoxiv.page
2025-06-02 07:16:48

Searching Clinical Data Using Generative AI
Karan Hanswadkar, Anika Kanchi, Shivani Tripathi, Shi Qiao, Rony Chatterjee, Alekh Jindal
https://arxiv.org/abs/2505.24090

Searching Clinical Data Using Generative AI
Artificial Intelligence (AI) is making a major impact on healthcare, particularly through its application in natural language processing (NLP) and predictive analytics. The healthcare sector has increasingly adopted AI for tasks such as clinical data analysis and medical code assignment. However, searching for clinical information in large and often unorganized datasets remains a manual and error-prone process. Assisting this process with automations can help physicians improve their operationa…

@arXiv_csCR_bot@mastoxiv.page
2025-06-09 07:36:22

Explainer-guided Targeted Adversarial Attacks against Binary Code Similarity Detection Models
Mingjie Chen (Zhejiang University), Tiancheng Zhu (Huazhong University of Science and Technology), Mingxue Zhang (The State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou High-Tech Zone), Yiling He (University College London), Minghao Lin (University of Southern California), Penghui Li (Columbia University), Kui Ren (The State Key Laboratory of Blockchain and Data Security, Z…

Explainer-guided Targeted Adversarial Attacks against Binary Code Similarity Detection Models
Binary code similarity detection (BCSD) serves as a fundamental technique for various software engineering tasks, e.g., vulnerability detection and classification. Attacks against such models have therefore drawn extensive attention, aiming at misleading the models to generate erroneous predictions. Prior works have explored various approaches to generating semantic-preserving variants, i.e., adversarial samples, to evaluate the robustness of the models against adversarial attacks. However, the…

@arXiv_csCL_bot@mastoxiv.page
2025-07-17 09:57:40

Toxicity-Aware Few-Shot Prompting for Low-Resource Singlish Translation
Ziyu Ge, Gabriel Chua, Leanne Tan, Roy Ka-Wei Lee
https://arxiv.org/abs/2507.11966 …

Toxicity-Aware Few-Shot Prompting for Low-Resource Singlish Translation
As online communication increasingly incorporates under-represented languages and colloquial dialects, standard translation systems often fail to preserve local slang, code-mixing, and culturally embedded markers of harmful speech. Translating toxic content between low-resource language pairs poses additional challenges due to scarce parallel data and safety filters that sanitize offensive expressions. In this work, we propose a reproducible, two-stage framework for toxicity-preserving translat…

@arXiv_csLG_bot@mastoxiv.page
2025-07-11 10:23:11

Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Sukjun Hwang, Brandon Wang, Albert Gu
https://arxiv.org/abs/2507.07955 https://arxiv.org/pdf/2507.07955 https://arxiv.org/html/2507.07955
arXiv:2507.07955v1 Announce Type: new
Abstract: Despite incredible progress in language models (LMs) in recent years, largely resulting from moving away from specialized models designed for specific tasks to general models based on powerful architectures (e.g. the Transformer) that learn everything from raw data, pre-processing steps such as tokenization remain a barrier to true end-to-end foundation models. We introduce a collection of new techniques that enable a dynamic chunking mechanism which automatically learns content -- and context -- dependent segmentation strategies learned jointly with the rest of the model. Incorporating this into an explicit hierarchical network (H-Net) allows replacing the (implicitly hierarchical) tokenization-LM-detokenization pipeline with a single model learned fully end-to-end. When compute- and data- matched, an H-Net with one stage of hierarchy operating at the byte level outperforms a strong Transformer language model operating over BPE tokens. Iterating the hierarchy to multiple stages further increases its performance by modeling multiple levels of abstraction, demonstrating significantly better scaling with data and matching a token-based Transformer of twice its size. H-Nets pretrained on English show significantly increased character-level robustness, and qualitatively learn meaningful data-dependent chunking strategies without any heuristics or explicit supervision. Finally, the H-Net's improvement over tokenized pipelines is further increased in languages and modalities with weaker tokenization heuristics, such as Chinese and code, or DNA sequences (nearly 4x improvement in data efficiency over baselines), showing the potential of true end-to-end models that learn and scale better from unprocessed data.

@arXiv_csDC_bot@mastoxiv.page
2025-06-12 07:30:41

Understanding the Performance and Power of LLM Inferencing on Edge Accelerators
Mayank Arya, Yogesh Simmhan
https://arxiv.org/abs/2506.09554 https://

Understanding the Performance and Power of LLM Inferencing on Edge Accelerators
Large Language Models (LLMs) have demonstrated exceptional benefits to a wide range of domains, for tasks as diverse as code generation and robot navigation. While LLMs are usually served from cloud data centers, mission-critical and privacy-sensitive applications may require local hosting of open LLM models. Given the large GPU memory footprint needed for LLMs, edge accelerators such as Nvidia Jetson Orin AGX with 64GB of shared GPU-CPU RAM are a compelling choice. However, the feasibility and…

@arXiv_mathQA_bot@mastoxiv.page
2025-06-12 08:32:32

A non-semisimple Kitaev lattice model
Sebastian Halbig, Ulrich Krähmer
https://arxiv.org/abs/2506.09249 https://arxiv.org/pdf/…

A non-semisimple Kitaev lattice model
The construction of the topologically protected code space of Kitaev's model for fault-tolerant quantum computation is extended from complex semisimple to arbitrary finite-dimensional Hopf algebras admitting pairs in involution. One input of the model are ribbon graphs, that is, the combinatorial data of cellular decompositions of oriented closed surfaces. The other input are certain Hopf bimodules that are closely related to the coefficients in Hopf-cyclic homology. As in previous generalisati…

@arXiv_csMS_bot@mastoxiv.page
2025-06-03 16:06:00

This https://arxiv.org/abs/2502.16517 has been replaced.
link: https://scholar.google.com/scholar?q=a

Annotation-guided AoS-to-SoA conversions and GPU offloading with data views in C++
The C++ programming language provides classes and structs as fundamental modeling entities. Consequently, C++ code tends to favour array-of-structs (AoS) for encoding data sequences, even though structure-of-arrays (SoA) yields better performance for some calculations. We propose a C++ language extension based on attributes that allows developers to guide the compiler in selecting memory arrangements, i.e. to select the optimal choice between AoS and SoA dynamically depending on both the execut…

@arXiv_csSE_bot@mastoxiv.page
2025-07-16 08:18:31

$\texttt{Droid}$: A Resource Suite for AI-Generated Code Detection
Daniil Orel, Indraneil Paul, Iryna Gurevych, Preslav Nakov
https://arxiv.org/abs/2507.10583

$\texttt{Droid}$: A Resource Suite for AI-Generated Code Detection
In this work, we compile $\textbf{$\texttt{DroidCollection}$}$, the most extensive open data suite for training and evaluating machine-generated code detectors, comprising over a million code samples, seven programming languages, outputs from 43 coding models, and over three real-world coding domains. Alongside fully AI-generated samples, our collection includes human-AI co-authored code, as well as adversarial samples explicitly crafted to evade detection. Subsequently, we develop $\textbf{$\t…

@arXiv_eessSP_bot@mastoxiv.page
2025-06-03 16:28:58

This https://arxiv.org/abs/2405.15927 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…

Application based Evaluation of an Efficient Spike-Encoder, "Spiketrum"
Spike-based encoders represent information as sequences of spikes or pulses, which are transmitted between neurons. A prevailing consensus suggests that spike-based approaches demonstrate exceptional capabilities in capturing the temporal dynamics of neural activity and have the potential to provide energy-efficient solutions for low-power applications. The Spiketrum encoder efficiently compresses input data using spike trains or code sets (for non-spiking applications) and is adaptable to both…

@arXiv_nuclex_bot@mastoxiv.page
2025-06-11 09:06:15

Measurement of radionuclide production probabilities in negative muon nuclear capture and validation of Monte Carlo simulation model
Y. Yamaguchi, M. Niikura, R. Mizuno, M. Tampo, M. Harada, N. Kawamura, I. Umegaki, S. Takeshita, K. Haga
https://arxiv.org/abs/2506.08301

Measurement of radionuclide production probabilities in negative muon nuclear capture and validation of Monte Carlo simulation model
As part of the development of a sample radioactivity calculation program, we have measured radionuclide production probabilities in negative muon nuclear capture to update experimental data and to validate a calculation dataset obtained by a Monte Carlo simulation code. The probabilities have been obtained by an activation experiment on $^{27}$Al, $^\mathrm{nat}$Si, $^{59}$Co, and $^\mathrm{nat}$Ta targets. The obtained probabilities expand the validation scope to the radionuclide production pr…

@arXiv_grqc_bot@mastoxiv.page
2025-06-09 08:51:32

MatBYIB: A Matlab-based code for Bayesian inference of extreme mass-ratio inspiral binary with arbitrary eccentricity
Gen-Liang Li, Shu-Jie Zhao, Huai-Ke Guo, Jing-Yu Su, Zhen-Heng Lin
https://arxiv.org/abs/2506.05954

MatBYIB: A Matlab-based code for Bayesian inference of extreme mass-ratio inspiral binary with arbitrary eccentricity
Accurate parameter estimation (PE) of gravitational waves (GW) is essential for GW data analysis. In extreme mass-ratio inspiral binary (EMRI) systems, orbital eccentricity is a critical parameter for PE. However, current software for PE of GW often neglects the direct estimation of orbital eccentricity. To fill this gap, we have developed MatBYIB, a MATLAB-based software package for PE of GW with arbitrary eccentricity. MatBYIB employs the Analytical Kludge (AK) waveform as a computat…

@arXiv_csSE_bot@mastoxiv.page
2025-07-22 11:51:40

Observing Fine-Grained Changes in Jupyter Notebooks During Development Time
Sergey Titov, Konstantin Grotov, Cristina Sarasua, Yaroslav Golubev, Dhivyabharathi Ramasamy, Alberto Bacchelli, Abraham Bernstein, Timofey Bryksin
https://arxiv.org/abs/2507.15831

Observing Fine-Grained Changes in Jupyter Notebooks During Development Time
In software engineering, numerous studies have focused on the analysis of fine-grained logs, leading to significant innovations in areas such as refactoring, security, and code completion. However, no similar studies have been conducted for computational notebooks in the context of data science. To help bridge this research gap, we make three scientific contributions: we (1) introduce a toolset for collecting code changes in Jupyter notebooks during development time; (2) use it to collect mor…

@arXiv_csCR_bot@mastoxiv.page
2025-07-08 10:34:01

Securing Mixed Rust with Hardware Capabilities
Jason Zhijingcheng Yu, Fangqi Han, Kaustab Choudhury, Trevor E. Carlson, Prateek Saxena
https://arxiv.org/abs/2507.03344

Securing Mixed Rust with Hardware Capabilities
The Rust programming language enforces three basic Rust principles, namely ownership, borrowing, and AXM (Aliasing Xor Mutability) to prevent security bugs such as memory safety violations and data races. However, Rust projects often have mixed code, i.e., code that also uses unsafe Rust, FFI (Foreign Function Interfaces), and inline assembly for low-level control. The Rust compiler is unable to statically enforce Rust principles in mixed Rust code which can lead to many security vulnerabilitie…

@arXiv_csGR_bot@mastoxiv.page
2025-06-02 09:56:59

This https://arxiv.org/abs/2505.19713 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csGR_…

CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward
In this work, we introduce CAD-Coder, a novel framework that reformulates text-to-CAD as the generation of CadQuery scripts - a Python-based, parametric CAD language. This representation enables direct geometric validation, a richer modeling vocabulary, and seamless integration with existing LLMs. To further enhance code validity and geometric fidelity, we propose a two-stage learning pipeline: (1) supervised fine-tuning on paired text-CadQuery data, and (2) reinforcement learning with Group Re…

@arXiv_csLG_bot@mastoxiv.page
2025-07-09 10:25:02

KnowIt: Deep Time Series Modeling and Interpretation
M. W. Theunissen, R. Rabe, M. H. Davel
https://arxiv.org/abs/2507.06009 https://…

KnowIt: Deep Time Series Modeling and Interpretation
KnowIt (Knowledge discovery in time series data) is a flexible framework for building deep time series models and interpreting them. It is implemented as a Python toolkit, with source code and documentation available from https://must-deep-learning.github.io/KnowIt. It imposes minimal assumptions about task specifications and decouples the definition of dataset, deep neural network architecture, and interpretability technique through well defined interfaces. This ensures the ease of importing n…

@arXiv_csSE_bot@mastoxiv.page
2025-06-23 10:32:50

LLMs in Coding and their Impact on the Commercial Software Engineering Landscape
Vladislav Belozerov, Peter J Barclay, Askhan Sami
https://arxiv.org/abs/2506.16653

LLMs in Coding and their Impact on the Commercial Software Engineering Landscape
Large-language-model coding tools are now mainstream in software engineering. But as these same tools move human effort up the development stack, they present fresh dangers: 10% of real prompts leak private data, 42% of generated snippets hide security flaws, and the models can even "agree" with wrong ideas, a trait called sycophancy. We argue that firms must tag and review every AI-generated line of code, keep prompts and outputs inside private or on-premises deployments, obey emerging safet…

@arXiv_csDC_bot@mastoxiv.page
2025-06-05 07:17:21

SLURM Heterogeneous Jobs for Hybrid Classical-Quantum Workflows
Aniello Esposito, Utz-Uwe Haus
https://arxiv.org/abs/2506.03846 https://

SLURM Heterogeneous Jobs for Hybrid Classical-Quantum Workflows
A method for efficient scheduling of hybrid classical-quantum workflows is presented, based on standard tools available on common supercomputer systems. Moderate interventions by the user are required, such as splitting a monolithic workflow into basic building blocks and ensuring the data flow. This bears the potential to significantly reduce idle time of the quantum resource as well as overall wall time of co-scheduled workflows. Relevant pseudo-code samples and scripts are provided to demon…

@arXiv_csSE_bot@mastoxiv.page
2025-06-18 08:44:58

Characterising Bugs in Jupyter Platform
Yutian Tang, Hongchen Cao, Yuxi Chen, David Lo
https://arxiv.org/abs/2506.14055 https://arxiv…

Characterising Bugs in Jupyter Platform
As a representative literate programming platform, Jupyter is widely adopted by developers, data analysts, and researchers for replication, data sharing, documentation, interactive data visualization, and more. Understanding the bugs in the Jupyter platform is essential for ensuring its correctness, security, and robustness. Previous studies focused on code reuse, restoration, and repair execution environment for Jupyter notebooks. However, the bugs in Jupyter notebooks' hosting platform Jupyte…

@arXiv_csCR_bot@mastoxiv.page
2025-06-03 16:55:16

This https://arxiv.org/abs/2408.16028 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…

ANVIL: Anomaly-based Vulnerability Identification without Labelled Training Data
Supervised-learning-based vulnerability detectors often fall short due to limited labelled training data. In contrast, Large Language Models (LLMs) like GPT-4 are trained on vast unlabelled code corpora, yet perform only marginally better than coin flips when directly prompted to detect vulnerabilities. In this paper, we reframe vulnerability detection as anomaly detection, based on the premise that vulnerable code is rare and thus anomalous relative to patterns learned by LLMs. We introduce AN…

@arXiv_csDC_bot@mastoxiv.page
2025-06-04 07:24:48

eACGM: Non-instrumented Performance Tracing and Anomaly Detection towards Machine Learning Systems
Ruilin Xu, Zongxuan Xie, Pengfei Chen
https://arxiv.org/abs/2506.02007

eACGM: Non-instrumented Performance Tracing and Anomaly Detection towards Machine Learning Systems
We present eACGM, a full-stack AI/ML system monitoring framework based on eBPF. eACGM collects real-time performance data from key hardware components, including the GPU and network communication layer, as well as from key software stacks such as CUDA, Python, and PyTorch, all without requiring any code instrumentation or modifications. Additionally, it leverages libnvml to gather process-level GPU resource usage information. By applying a Gaussian Mixture Model (GMM) to the collected multidime…

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:19:46

Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning
Fangyu Lei, Jinxiang Meng, Yiming Huang, Tinghong Chen, Yun Zhang, Shizhu He, Jun Zhao, Kang Liu
https://arxiv.org/abs/2506.01710

Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning
Table reasoning, encompassing tasks such as table question answering, fact verification, and text-to-SQL, requires precise understanding of structured tabular data, coupled with numerical computation and code manipulation for effective inference. Supervised fine-tuning (SFT) approaches have achieved notable success but often struggle with generalization and robustness due to biases inherent in imitative learning. We introduce Reasoning-Table, the first application of reinforcement learning (RL)…

@arXiv_csSE_bot@mastoxiv.page
2025-06-12 07:57:31

Reasoning as a Resource: Optimizing Fast and Slow Thinking in Code Generation Models
Zongjie Li, Shuai Wang
https://arxiv.org/abs/2506.09396 https://

Reasoning as a Resource: Optimizing Fast and Slow Thinking in Code Generation Models
This position paper proposes a fundamental shift in designing code generation models: treating reasoning depth as a controllable resource. Rather than being an incidental byproduct of prompting, we argue that the trade-off between rapid, direct answers ("fast thinking") and elaborate, chain-of-thought deliberation ("slow thinking") must be explicitly managed. We contend that optimizing reasoning budgets across the entire model lifecycle - from synthetic data creation and benchmarking to real-wo…

@arXiv_csSD_bot@mastoxiv.page
2025-06-03 07:25:22

Probing Audio-Generation Capabilities of Text-Based Language Models
Arjun Prasaath Anbazhagan, Parteek Kumar, Ujjwal Kaur, Aslihan Akalin, Kevin Zhu, Sean O'Brien
https://arxiv.org/abs/2506.00003

Probing Audio-Generation Capabilities of Text-Based Language Models
How does textual representation of audio relate to the Large Language Model's (LLMs) learning about the audio world? This research investigates the extent to which LLMs can be prompted to generate audio, despite their primary training in textual data. We employ a three-tier approach, progressively increasing the complexity of audio generation: 1) Musical Notes, 2) Environmental Sounds, and 3) Human Speech. To bridge the gap between text and audio, we leverage code as an intermediary, prompting …

@arXiv_csSE_bot@mastoxiv.page
2025-06-16 10:13:09

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation
Changxin Ke, Rui Zhang, Shuo Wang, Li Ding, Guangli Li, Yuanbo Wen, Shuoming Zhang, Ruiyuan Xu, Jin Qin, Jiaming Guo, Chenxi Wang, Ling Li, Qi Guo, Yunji Chen
https://arxiv.org/abs/2506.11153

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation
The rise of GPU-based high-performance computing (HPC) has driven the widespread adoption of parallel programming models such as CUDA. Yet the inherent complexity of parallel programming creates a demand for automated sequential-to-parallel approaches. However, data scarcity poses a significant challenge for machine learning-based sequential-to-parallel code translation. Although recent back-translation methods show promise, they still fail to ensure functional equivalence in the translate…

@lilmikesf@c.im
2025-07-05 16:31:02

Paper Of Record delves into how unchecked global warming #climate chaos created catastrophic conditions for deadly #floods in #TX. The #GuadalupeRiver

The Guadalupe River rose from three feet to 34 feet in about 90 minutes, according to data from a river gauge near the town of Comfort, Texas. The volume of water exploded from 95 cubic feet per second to 166,000 cubic feet per second.

And the warming climate is creating the conditions in Texas for more of these sharp, deadly deluges.

In the eastern part of the state, the number of days per year with at least two inches of rain or snow has increased by 20 percent since 1900, according to the …

o Methane SAT Is Lost: The satellite, launched to track planet-warming emissions from oil and gas sites, was just a year into its mission. It has lost power, the mission’s controllers said, and most likely cannot be recovered.

o Imported Trash: Malaysia banned all plastic waste shipments from nations that had not signed an agreement regulating hazardous waste. That includes the United States, which shipped more than 35,000 tons of it to the country in 2024.

o Saltier Seas, Less Ice: A study p…

As the World Warms, Extreme Rain Is Becoming Even More Extreme
Even in places, like Central Texas, with a long history of floods, human-caused warming is creating the conditions for more frequent and severe deluges.

@arXiv_csSE_bot@mastoxiv.page
2025-06-02 07:21:40

MGS3: A Multi-Granularity Self-Supervised Code Search Framework
Rui Li, Junfeng Kang, Qi Liu, Liyang He, Zheng Zhang, Yunhao Sha, Linbo Zhu, Zhenya Huang
https://arxiv.org/abs/2505.24274

MGS3: A Multi-Granularity Self-Supervised Code Search Framework
In the pursuit of enhancing software reusability and developer productivity, code search has emerged as a key area, aimed at retrieving code snippets relevant to functionalities based on natural language queries. Despite significant progress in self-supervised code pre-training utilizing the vast amount of code data in repositories, existing methods have primarily focused on leveraging contrastive learning to align natural language with function-level code snippets. These studies have overlooke…

@arXiv_csDC_bot@mastoxiv.page
2025-05-30 07:17:07

MemAscend: System Memory Optimization for SSD-Offloaded LLM Fine-Tuning
Yong-Cheng Liaw, Shuo-Han Chen
https://arxiv.org/abs/2505.23254 https://

MemAscend: System Memory Optimization for SSD-Offloaded LLM Fine-Tuning
Owing to the huge success of generative artificial intelligence (AI), large language models (LLMs) have emerged as a core subclass, underpinning applications such as question answering, text generation, and code completion. While fine-tuning these models on domain-specific data can yield significant performance gains, it also poses daunting computational challenges, especially for researchers and small organizations with limited hardware resources. Although SSD offloading (i.e., ZeRO-Infinity) …

@arXiv_csSE_bot@mastoxiv.page
2025-06-10 16:49:19

This https://arxiv.org/abs/2408.09568 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…

MergeRepair: An Exploratory Study on Merging Task-Specific Adapters in Code LLMs for Automated Program Repair
Large Language Models (LLMs) have shown high capabilities in several software development-related tasks such as program repair, documentation, code refactoring, debugging, and testing. However, training these models requires a massive amount of data and significant computational resources. Adapters are specialized, small modules designed for parameter-efficient fine-tuning of LLMs for specific tasks, domains, or applications without requiring extensive retraining of the entire model. These adapte…

@seav@en.osm.town
2025-06-01 09:58:17

Done mapping all 10 #barangays of Hadji Muhtamad, Basilan, #Philippines 🇵🇭 in #OpenStreetMap, creating their #Wikidata

Screenshot of an Overpass Turbo query result map showing the boundaries and the node label positions of the 10 barangays of Hadji Muhtamad, Basilan, with the relation of Luuk-Bungsod selected showing its tags.

Screenshot of the Wikidata Query Service query results showing a table of the 10 barangays of Hadji Muhtamad, Basilan, along with some of their properties such as population, coordinates, PSGC (Philippine Standard Geographic Code), and OpenStreetMap IDs.

overpass turbo
A web based data mining tool for OpenStreetMap which runs any kind of Overpass API query and shows the results on an interactive map.
