A few more of the #News items shared most often here recently:
IT attack affects the IT of the police evidence storage unit
The year starts with a new competition case concerning public EV charging prices in Italy. The competition authority has decided to conduct a further investigation into A2A Mobility, as prices charged to others were higher than in their own app.
https://www.agcm.it/pubblicazioni/bolletti<…
When cybercrime shows that truly no one is spared. 🫠 A ransomware attack on Werkstatt Bremen is also affecting the IT systems of the police evidence storage unit.
Read the article: https://heise.de/-11165825?wt_mc=sm.re
"A pair of US lawmakers are calling for an investigation into how easily spies can steal information based on devices’ electromagnetic and acoustic leaks—a spying trick the NSA once codenamed TEMPEST"
https://www.wired.com/story/how-vulnerable
"Christopher Bishop’s 2006 book “Pattern Recognition and Machine Learning,” arguably one of the triggers of the current popularity of machine learning, is quite literally a book about applied mathematics, diving into probabilities, linear algebra, neural networks, Markov models, and combinatorics. And rightfully so; if your objective is to find a job as an engineer at OpenAI, knowing a thing or two about eigenvalues and eigenvectors is definitely going to be useful."
Rebuilding public trust in AI requires meaningful citizen engagement, transparent governance, and robust legislation. Technology itself is not the problem. The issue is that few people trust institutions to deploy it wisely and for their benefit. So the first step is to answer the following question: What’s in it for me?
So, I have an answer to my previous question about GPU transfer efficiency.
Original code: write data to a staging buffer on the CPU, vkCmdCopyBuffer to GPU-local memory, then run the int-to-float32 conversion on the GPU out of that buffer. The copy operation shows 50% SM occupancy by compute warps, 50% unallocated warp slots in active SMs.
GPU memory write bandwidth is sitting around 2%, with about 1.9 ms of copy/shader run time.
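For anyone wanting to sanity-check a figure like that 2%: a rough sketch of the arithmetic, bytes moved divided by elapsed time, compared against peak bandwidth. Only the 1.9 ms comes from the post; the buffer size and peak bandwidth below are made-up placeholders, and `bandwidth_utilization` is a hypothetical helper, not part of any profiler API.

```python
def bandwidth_utilization(bytes_moved: int, elapsed_s: float, peak_bytes_per_s: float) -> float:
    """Return achieved bandwidth as a fraction of the peak bandwidth."""
    if elapsed_s <= 0 or peak_bytes_per_s <= 0:
        raise ValueError("elapsed time and peak bandwidth must be positive")
    achieved_bytes_per_s = bytes_moved / elapsed_s
    return achieved_bytes_per_s / peak_bytes_per_s


# ASSUMPTION: hypothetical buffer size, not stated in the post.
buffer_bytes = 32 * 1024 * 1024
# From the post: about 1.9 ms copy/shader run time.
elapsed_s = 1.9e-3
# ASSUMPTION: hypothetical peak memory bandwidth of the GPU.
peak_bytes_per_s = 900e9

utilization = bandwidth_utilization(buffer_bytes, elapsed_s, peak_bytes_per_s)
print(f"write bandwidth utilization: {utilization:.1%}")
```

Swap in the real buffer size and the card's documented peak bandwidth to see whether a profiler's percentage is plausible.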
🇺🇦 #NowPlaying on #KEXP's #Early
Confidence Man:
🎵 Angry Girl
#ConfidenceMan
https://confidenceman.bandcamp.com/track/angry-girl-chai-version
https://open.spotify.com/track/2PXULQ9Lo1AmU7eMnnBnxp
ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented Generation
Shihao Wang, Jiahao Chen, Yanqi Pan, Hao Huang, Yichen Hao, Xiangyu Zou, Wen Xia, Wentao Zhang, Haitao Wang, Junhong Li, Chongyang Qiu, Pengfei Wang
https://arxiv.org/abs/2602.02579 https://arxiv.org/pdf/2602.02579 https://arxiv.org/html/2602.02579
arXiv:2602.02579v1 Announce Type: new
Abstract: The prefill stage of long-context Retrieval-Augmented Generation (RAG) is severely bottlenecked by computational overhead. To mitigate this, recent methods assemble pre-calculated KV caches of retrieved RAG documents (by a user query) and reprocess selected tokens to recover cross-attention between these pre-calculated KV caches. However, we identify a fundamental "crowding-out effect" in current token selection criteria: globally salient but user-query-irrelevant tokens saturate the limited recomputation budget, displacing the tokens truly essential for answering the user query and degrading inference accuracy.
We propose ProphetKV, a user-query-driven KV Cache reuse method for RAG scenarios. ProphetKV dynamically prioritizes tokens based on their semantic relevance to the user query and employs a dual-stage recomputation pipeline to fuse layer-wise attention metrics into a high-utility set. By ensuring the recomputation budget is dedicated to bridging the informational gap between retrieved context and the user query, ProphetKV achieves high-fidelity attention recovery with minimal overhead. Our extensive evaluation results show that ProphetKV retains 96%-101% of full-prefill accuracy with only a 20% recomputation ratio, while achieving accuracy improvements of 8.8%-24.9% on RULER and 18.6%-50.9% on LongBench over the state-of-the-art approaches (e.g., CacheBlend, EPIC, and KVShare).
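To make the idea of a fixed recomputation budget concrete, here is a minimal sketch of query-relevance-based token selection. It is not ProphetKV's actual pipeline (the abstract does not give the scoring details): the dot-product relevance score, the toy vectors, and the function names `query_relevance` and `select_tokens_for_recompute` are all assumptions for illustration. Only the 20% ratio is taken from the abstract.

```python
import math


def query_relevance(token_vectors, query_vector):
    """Score each token by its dot product with the query vector (assumed metric)."""
    return [sum(t * q for t, q in zip(token, query_vector)) for token in token_vectors]


def select_tokens_for_recompute(relevance, ratio=0.2):
    """Return the sorted indices of the top `ratio` fraction of tokens by score."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    if not relevance:
        return []
    budget = max(1, math.ceil(len(relevance) * ratio))
    ranked = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    return sorted(ranked[:budget])


# Toy data (hypothetical): ten cached tokens with 2-dimensional representations.
token_vectors = [
    [0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.0, 1.0], [0.5, 0.5],
    [0.2, 0.1], [0.1, 0.0], [0.3, 0.3], [0.05, 0.95], [0.6, 0.1],
]
query_vector = [0.0, 1.0]  # hypothetical query that only cares about the second axis

scores = query_relevance(token_vectors, query_vector)
selected = select_tokens_for_recompute(scores, ratio=0.2)
print(selected)
```

Tokens that are large in the first dimension (globally "salient" in this toy setup) are ignored because the score depends only on the query, which is the intuition behind avoiding the crowding-out effect described above.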