
2025-06-19 08:37:13
cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree
Yilin Zhang, Xinran Zhao, Zora Zhiruo Wang, Chenyang Yang, Jiayi Wei, Tongshuang Wu
https://arxiv.org/abs/2506.15655
from my link log —
Tools built on tree-sitter's concrete syntax trees.
https://www.scannedinavian.com/tools-built-on-tree-sitters-concrete-syntax-trees.html
saved 2025-06-01
Why Senior Developers Google Basic Syntax
https://faun.pub/why-senior-developers-google-basic-syntax-fa56445e355f
AST-Enhanced or AST-Overloaded? The Surprising Impact of Hybrid Graph Representations on Code Clone Detection
Zixian Zhang, Takfarinas Saber
https://arxiv.org/abs/2506.14470
Thanks Gemini. I did not know that about markdown.
"Important Consideration for Writing Markdown:
Even with this CSS, the most reliable way to ensure a line break within a list item in the PDF is to use the standard Markdown syntax for a hard line break: end the line with two or more spaces before hitting Enter."
#Gemini
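A minimal sketch of the hard-line-break rule quoted above, in Python. The helper name `hard_breaks_to_html` is hypothetical, and this is a toy illustration rather than a real Markdown renderer: it only checks whether a line ends in two or more spaces before a newline and, if so, marks it with `<br />`.

```python
def hard_breaks_to_html(text: str) -> str:
    """Toy illustration: a line ending in two or more spaces gets a hard break.

    Hypothetical helper, not a real Markdown parser.
    """
    lines = text.split("\n")
    out = []
    for i, line in enumerate(lines):
        is_last = i == len(lines) - 1
        # Only a line that is followed by another line can carry a hard break.
        if not is_last and line.endswith("  "):
            out.append(line.rstrip(" ") + "<br />")
        else:
            out.append(line.rstrip(" "))
    return "\n".join(out)


# The first line of the list item ends with two spaces, so it gets a break.
item = "- first line  \n  second line"
print(hard_breaks_to_html(item))
```

A single trailing space is not enough; in this sketch it is simply stripped and the lines stay joined as one paragraph.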
Recently I've combined various functions which I've been using in other projects (e.g. my personal PKM toolchain) and published them as a new library https://thi.ng/text-analysis for better re-use:
- customizable, composable & extensible tokenization (transducer based)
- ngram gene…
Substructural Abstract Syntax with Variable Binding and Single-Variable Substitution
Marcelo Fiore, Sanjiv Ranchod
https://arxiv.org/abs/2505.24812 https:/…
{nplyr} has helper functions to work on nested dataframes: #rstats #datascience
People keep making the same mistake, again and again and again and again forever, of thinking that it is syntax that makes software development hard.
Oh honey.
Re this from @mathaetaes:
https://infosec.exchange/@mathaetaes/114656764053846137
(P.S. Visual coding is actually really cool, and IMO an underexplored PL design space — but is very much coding, and very much tricky for the same reasons as any other kind of coding.)
Code-Switching and Syntax: A Large-Scale Experiment
Igor Sterner, Simone Teufel
https://arxiv.org/abs/2506.01846 https://arxiv.org/pd…
Uuuh, :ruby: #Ruby will likely get #Namespaces although the syntax will change because #GitLab already uses `Namespace` a lot and @…
There's something to be said for returning the whole syntax tree.
-- Larry Wall in <199710221833.LAA24741@wall.org>
It's publication day for 'Space Syntax - Selected papers by Bill Hillier', edited by my wonderful Bartlett School of Architecture colleague Laura Vaughan with John Peponis and Ruth Conroy Dalton.
The book brings together Hillier's groundbreaking work spanning half a century with current commentaries by international researchers.
It is available #openAccess by
…
Quantifying Azure RBAC Wildcard Overreach
Christophe Parisel
https://arxiv.org/abs/2506.10755 https://arxiv.org/pdf/2506.10755…
„Mother of Order“ by The Sarge
Platform: C64
Released: 27 November 2021
Note: 1st in the Syntax 2021 Pixel Graphics (8bit) competition
#GFX_The_Sarge #YR_2021 #PF_C64…
from my link log —
Exploring Typst, a new typesetting system similar to LaTeX.
https://blog.jreyesr.com/posts/typst/
saved 2024-10-12 https://
I think strong and weak typing in programming languages is actually a spectrum rather than a binary classification.
See terraform for example:
> All values have a type, which dictates where that value can be used and what transformations can be applied to it.
https://developer.hashicorp…
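A small sketch of the "spectrum" idea, using Python rather than Terraform (the choice of language and the helper name `try_add` are assumptions for illustration, not from the toot). The same `+` operator rejects one mixed-type pair and silently converts two others, so "strong" versus "weak" is a matter of degree.

```python
def try_add(a, b):
    """Return a + b, or the name of the exception Python raises instead."""
    try:
        return a + b
    except TypeError as exc:
        return type(exc).__name__


print(try_add("1", 1))   # str + int is rejected with a TypeError
print(try_add(1, 1.5))   # int is implicitly converted to float
print(try_add(True, 1))  # bool is a subclass of int, so this is allowed
```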
Is spreadsheet syntax better than numeric indexing for cell selection?
Philip Heltweg, Dirk Riehle, Georg-Daniel Schwarz
https://arxiv.org/abs/2505.23296 h…
In #verschlagwortung this week there were ebooks on syntax (https://hbz-ulbms.primo.exlibrisgroup.com/permalink/49HBZ…
I have a habit of making (too) many (small) packages for functionality that might be reused in different contexts. {box} might be an alternative by making scripts into modules that can be loaded: #RStats
Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction
Wenxuan Wu, Shuai Wang, Xixin Wu, Helen Meng, Haizhou Li
https://arxiv.org/abs/2506.09792
@… Oh, maybe, I’m alright with the syntax personally. If you know of something better that preserves the pipeline-first design, I’m interested.
Spark SQL pipe (|>) for Spark 4.0.0?!
https://issues.apache.org/jira/browse/SPARK-49555
https://
#numpy is being aware of the many syntax/shape gotchas. This looks like …
Design of a visual environment for programming by direct data manipulation
Michel Adam (UBS, IRISA), Patrice Frison (UBS, IRISA), Moncef Daoud (UBS), Sabine Letellier Zarshenas (UBS)
https://arxiv.org/abs/2506.03720
Dang it…is there a human-readable data format that is basically YAML syntax but the simple featureset of JSON (plus comments)?
I want something concise and Markdown-ish, made for human editing, like YAML, but without all of YAML’s…er, specialness.
EDIT: To be clear, this is for •primary content•, not configuration. It should •feel• like working with Markdown; it’s just that the output needs to be array-and-dict-shaped instead of HTML-shaped. (Lots of good suggestions in the replies already! TOML and KDL are clear crowd favorites.)
This https://arxiv.org/abs/2502.13033 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLO_…
This https://arxiv.org/abs/2505.16978 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
An Exploratory Framework for Future SETI Applications: Detecting Generative Reactivity via Language Models
Po-Chieh Yu
#toXiv_bot_toot
This https://arxiv.org/abs/2410.18042 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csPL_…