
2025-05-29 14:02:20
Spark SQL pipe (|>) for Spark 4.0.0?!
https://issues.apache.org/jira/browse/SPARK-49555
https://
With Apache NiFi, a multimodal data pipelining tool, you can assemble existing and/or custom Java & Python processors into a variety of flows. Join Lester Martin at Berlin Buzzwords this year and watch a rich data pipeline be constructed from Kafka, stored using the Apache Iceberg table format and consumed from Trino.
Learn more:
The inner workings of Parquet/Arrow data in R: #rstats
StreamLink: Large-Language-Model Driven Distributed Data Engineering System
Dawei Feng, Di Mei, Huiri Tan, Lei Ren, Xianying Lou, Zhangxi Tan
https://arxiv.org/abs/2505.21575
Unveiling the Sagittarius Dwarf Spheroidal Galaxy Core with Gaia DR3
Ellie K. H. Toguchi-Tani, Daniel R. Hey, Thomas de Boer, Peter M. Frinchaboy, Daniel Huber
https://arxiv.org/abs/2507.20212
#Apache 2.4.64 is released! It fixes some vulnerabilities, listed here:
https://httpd.apache.org/security/vulnerabilities_24.html
Modern applications require search capabilities that go beyond basic text matching. They must be fast, accurate, personalised and context-aware. At this year's Berlin Buzzwords, Saurabh Singh will demonstrate how OpenSearch’s latest AI/ML enhancements and engine improvements enable organisations to build intelligent, scalable search experiences that meet these evolving needs.
Learn more:
Well, it doesn't look like much, but I just switched my infra from Apache to Caddy.
Sometimes doing some admin work is good for the soul.
(Also a preparation to install an AI scraper poison service)
Join Andrew Musselman and Trevor Grant as they present the latest developments in Mahout's new quantum compute layer, Qumat. They will provide an overview of the project, explain why Qumat was developed, and demonstrate its current capabilities. They will also present a demo of Qumat in action and conclude with calls to action for researchers and engineers who are interested in using and contributing to the project.
Learn more:
Palomar and Apache Point Spectrophotometry of Interstellar Comet 3I/ATLAS
Matthew Belyakov, Christoffer Fremling, Matthew J. Graham, Bryce T. Bolin, Mukremin Kilic, Gracyn Jewett, Carey M. Lisse, Carl Ingebretsen, M. Ryleigh Davis, Ian Wong
https://arxiv.org/abs/2507.11720
Join Kevin Liang at this year's Berlin Buzzwords, where he will discuss how Apache Solr/Lucene builds dense vector indexes and talk about how he and his team optimised their dense vector setup, sharing the challenges they faced and the best practices they learned along the way.
Learn more: https…
The people committed to DDoSing the #Apache #SpamAssassin RuleQA server seem to have substantial resources. I’ve blocked a lot of them, but they keep coming, asking about things like the May 7 2017 performance of a single rule in one contributor's stats. Not stuff real people want.
Of course…
Mistral launches its first reasoning models: Magistral Small, on Hugging Face under an Apache 2.0 license, and Magistral Medium, in preview on Mistral's Le Chat (Kyle Wiggers/TechCrunch)
https://techcrunch.com/2025/06/10/mistral-releases-a-pai…
Been wanting to build it, but I haven't had the time recently: has anyone integrated iocaine or a similar anti-AI-scraper tool into Apache?
Okee!…?
Some kind of schlager hip-hop
RAF Camora x Apache 207 - JUPITER
https://www.youtube.com/watch?v=3RvBj77W9sQ
> Stream “JUPITER” here: http…
What's the preferred easy-to-use benchmarking tool these days for testing full HTTP responses? I know ab (ApacheBench), but it's also very old, so I assume there's a new favorite.
This is for mostly informal tests, so ease of use > capability. Must run on Linux CLI.
#PHP
Big Data-Driven Fraud Detection Using Machine Learning and Real-Time Stream Processing
Chen Liu, Hengyu Tang, Zhixiao Yang, Ke Zhou, Sangwhan Cha
https://arxiv.org/abs/2506.02008 …
Apache Point Observatory follow-up of ACcelerating Candidate ExopLanet host Stars (APO ACCELS): Ages for 166 Accelerating Stars in the Northern Hemisphere
Anne E. Peck (Department of Astronomy, New Mexico State University), Eric L. Nielsen (Department of Astronomy, New Mexico State University), Robert J. De Rosa (European Southern Observatory), William Thompson (National Research Council, Herzberg Astronomy and Astrophysics), Bruce Macintosh (Department of Astronomy and Astrophysics, U…
A Galactic Self-Portrait: Density Structure and Integrated Properties of the Milky Way Disk
Julie Imig, Jon A. Holtzman, Gail Zasowski, Jianhui Lian, Nicholas F. Boardman, Alexander Stone-Martinez, J. Ted Mackereth, Moire K. M. Prescott, Rachael L. Beaton, Timothy C. Beers, Dmitry Bizyaev, Michael R. Blanton, Katia Cunha, José G. Fernández-Trincado, Catherine E. Fielder, Sten Hasselquist, Christian R. Hayes, Misha Haywood, Henrik Jönsson, Richard R. Lane, Steven R. M…
As data speeds increase, it has become crucial to detect problems as they happen. At this year's Berlin Buzzwords, Olena Kutsenko explained how to build a real-time anomaly detection system using Apache Kafka for streaming, Apache Flink for processing, and AI for pattern recognition, covering Apache Iceberg for storing historical data to improve models.
Watch the full session:
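For readers new to the idea, the core of "detecting problems as they happen" can be illustrated without Kafka or Flink. The sketch below is a hypothetical, single-process stand-in, not the architecture from the talk and not an AI model: it keeps a running mean and variance with Welford's algorithm and flags any value more than `threshold` standard deviations away from everything seen so far. The names `StreamingZScore` and `detect`, the `threshold=3.0` default and the `warmup` count are illustrative assumptions.

```python
import math
from typing import Iterable, Iterator, Tuple


class StreamingZScore:
    """Hypothetical online anomaly detector built on Welford's running mean/variance."""

    def __init__(self, threshold: float = 3.0, warmup: int = 5) -> None:
        self.threshold = threshold  # assumed cut-off, in standard deviations
        self.warmup = warmup        # assumed number of points to observe before flagging
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations from the running mean

    def is_anomaly(self, x: float) -> bool:
        """Score x against the history seen so far, then fold x into that history."""
        flagged = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                flagged = True
        # Welford update: numerically stable, single pass, O(1) memory per stream.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return flagged


def detect(stream: Iterable[float], threshold: float = 3.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for every value the detector flags as anomalous."""
    detector = StreamingZScore(threshold=threshold)
    for i, value in enumerate(stream):
        if detector.is_anomaly(value):
            yield i, value


if __name__ == "__main__":
    readings = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 42.0, 10.0]
    print(list(detect(readings)))  # → [(7, 42.0)]
```

In a real pipeline, the same per-key state would live inside a stream processor's operator, and the fixed threshold is exactly the part a learned model would aim to replace.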
This https://arxiv.org/abs/2504.06151 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csOS_…
Microservices and Real-Time Processing in Retail IT: A Review of Open-Source Toolchains and Deployment Strategies
Aaditaa Vashisht (Department of Information Science and Engineering, RV College of Engineering, India), Rekha B S (Department of Information Science and Engineering, RV College of Engineering, India)
https://arxiv.org/abs/25…
Apache Point rapid response characterization of primitive pre-impact detection asteroid 2024 RW₁
Carl Ingebretsen, Bryce T. Bolin, Robert Jedicke, Peter Vereš, Christine H. Chen, Carey M. Lisse, Russet McMillan, Torrie Sutherland, Amanda J. Townsend
https://arxiv.org/abs/2505.23736
Plug. Play. Persist. Inside a Ready-to-Go Havoc C2 Infrastructure
Alessio Di Santo
https://arxiv.org/abs/2507.00189 https://arxiv.org…
At Berlin Buzzwords 2025, Ved Prakash discussed how Siphon transformed their data pipeline using Apache Iceberg to successfully stream quality data into both Snowflake and ClickHouse simultaneously. In this short talk, you’ll learn about their battle-tested architecture, the performance improvements they’ve achieved, and their strategies for maintaining data consistency across two analytics engines.
Watch the full session:
Got slammed by an unidentified but certainly "#AI"-related #distributed #crawler this week, it drove one site's traffic to 10× average. Today I tired of playing Whac-a-Mole and blocked the two bigge…
CityPulse: Real-Time Traffic Data Analytics and Congestion Prediction
Idriss Djiofack Teledjieu, Irzum Shafique
https://arxiv.org/abs/2506.01971 https://…
Abundances of P, S, and K in 58 bulge spheroid stars from APOGEE
B. Barbuy, H. Ernandes, A. C. S. Friaça, M. S. Camargo, P. da Silva, S. O. Souza, T. Masseron, M. Brauner, D. A. Garcia-Hernandez, J. G. Fernandez-Trincado, K. Cunha, V. V. Smith, A. Pérez-Villegas, C. Chiappini, A. B. A. Queiroz, B. X. Santiago, T. C. Beers, F. Anders, R. P. Schiavon, M. Valentini, D. Minniti, D. Geisler, D. Souto, V. M. Placco, M. Zoccali, S. Feltzing, M. Schultheis, C. Nitschelm
Apache Flink is uniquely positioned to serve as the backbone for AI agents, equipping them with the powerful new tool of stream processing. Join Steffen Hoellinger at this year's Berlin Buzzwords to explore how Flink jobs can be transformed into “Agents”—autonomous, goal-driven entities that dynamically interact with data streams, trigger actions, and adapt their behaviour based on real-time insights.
Learn more:
We're thrilled to announce that @… has rejoined Berlin Buzzwords as a Platinum Partner!
Learn more about OpenSearch: https://opensearch.org/
🇺🇦 #NowPlaying on BBCRadio3's #ThisClassicalLife
David Raksin & Johnny Mercer:
🎵 Love Song From "Apache"
#DavidRaksin #JohnnyMercer
OWLS I: The Olin Wilson Legacy Survey
Brett M. Morris, Leslie Hebb, Suzanne L. Hawley, Kathryn Jones, Jake Romney
https://arxiv.org/abs/2507.07330 https://…
Analysis of Server Throughput For Managed Big Data Analytics Frameworks
Emmanouil Anagnostakis, Polyvios Pratikakis
https://arxiv.org/abs/2506.03854 https:…
At this year's Berlin Buzzwords, Michal Gancarski led a workshop demonstrating practical ways to deploy, configure, interact with, and utilise the advanced features of Apache Iceberg.
Watch the full session: https://youtu.be/v15EiNQt9R0?si=QDuAZ4NqqmlaUSUK
Join Adrien Grand and Luca Cavanna at this year's Berlin Buzzwords as they share the fascinating journey to the release of version 10.0 of the popular Java search engine Apache Lucene, discussing the ups and downs, the team effort it took to get there, and much more.
Learn more: https://
Apache Solr 9.8 introduces the LLM module, opening the doors to end-to-end natural language query support through vector-backed semantic search (K-Nearest Neighbors). At Berlin Buzzwords 2025, Alessandro Benedetti discussed the open-source contributions from both an indexing and query perspective, as well as what's next for Solr in terms of Large Language Model integration.
Watch the full session here:
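What the "K-Nearest Neighbors" step computes can be sketched in a few lines. The example below is a brute-force illustration in plain Python, not Solr's API: Solr and Lucene use approximate HNSW graph indexes rather than scanning every vector. The three-dimensional toy embeddings and the document ids are made up for the example; in practice an embedding model would turn both the documents and the natural-language query into vectors.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(query: Sequence[float], docs: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Exact (brute-force) top-k search: score every document, keep the k most similar."""
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in docs.items())
    best = heapq.nlargest(k, scored)  # highest similarity first
    return [(doc_id, score) for score, doc_id in best]


if __name__ == "__main__":
    # Hypothetical toy embeddings; a real system would get these from an embedding model.
    docs = {
        "doc-cats": [0.9, 0.1, 0.0],
        "doc-dogs": [0.8, 0.3, 0.1],
        "doc-tax": [0.0, 0.2, 0.9],
    }
    query = [1.0, 0.2, 0.0]
    for doc_id, score in knn(query, docs, k=2):
        print(f"{doc_id}: {score:.3f}")
```

The linear scan costs one similarity computation per indexed vector per query; approximate indexes such as HNSW trade a small chance of missing a true nearest neighbour for much lower query latency at scale.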
At Berlin Buzzwords 2025, Javier Ramirez shared the journey of developing QuestDB, an Apache 2.0-licensed open-source time-series database, into a much faster analytical database.
Watch the full session: https://youtu.be/SuxHP3_KOgQ?si=mGVdSKX5tHV81If7
At this year's Berlin Buzzwords, Ilaria Petreti, Anna Ruggero, and Edward Lambe presented an AI Filter Assistant for Statistical Data (SDMX). They demonstrated how large language models can suggest the most effective filters for your natural language queries and assist in refining your results in Apache Solr.
You can watch the full session here: …