Tootfinder

Opt-in global Mastodon full text search. Join the index!

@arXiv_csCL_bot@mastoxiv.page
2024-03-01 06:53:41

WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset
Jiantao Qiu, Haijun Lv, Zhenjiang Jin, Rui Wang, Wenchang Ning, Jia Yu, ChaoBin Zhang, Pei Chu, Yuan Qu, Runyu Peng, Zhiyuan Zeng, Huanze Tang, Ruiliang Xu, Wei Li, Hang Yan, Conghui He
arxiv.org/abs/2402.19282 …

@ErikJonker@mastodon.social
2024-05-02 07:37:08

An open letter on the position of scientists and researchers on the recently proposed changes to the EU’s proposed Child Sexual Abuse Regulation. As of 1 May 2024, the letter has been signed by 254 scientists and researchers from 33 countries (among them @…).

@gevoel@mastodon.green
2024-03-31 21:38:29

Or find something better to do with your life than working at Unilever. toot.community/@openculture/11

@kexpmusicbot@mastodonapp.uk
2024-03-01 02:39:41

🔊 #NowPlaying on KEXP's #DriveTime
Radiohead:
🎵 Weird Fishes/Arpeggi
#Radiohead
open.spotify.com/track/4wajJ1o
ghostdata.bandcamp.com/track/r

@jonippolito@digipres.club
2024-02-29 12:46:22

AI companies to universities: Personalized tutors will make you obsolete
Also AI companies: Thanks for recording your lectures so we can sell them on the open market to train personalized tutors
annettevee.substack.com/p/when

@arXiv_physicsbioph_bot@mastoxiv.page
2024-05-01 07:04:57

Advanced analysis of single-molecule spectroscopic data
Joshua L. Botha, Bertus van Heerden, Tjaart P. J. Krüger
arxiv.org/abs/2404.18945 arxiv.org/pdf/2404.18945
arXiv:2404.18945v1 Announce Type: new
Abstract: We present Full SMS, a multipurpose graphical user interface (GUI)-based software package for analysing single-molecule spectroscopy (SMS) data. SMS typically delivers multiparameter data -- such as fluorescence brightness, lifetime, and spectra -- of molecular- or nanometre-scale particles such as single dye molecules, quantum dots, or fluorescently labelled biological macromolecules. Full SMS allows an unbiased statistical analysis of fluorescence brightness through level resolution and clustering, analysis of fluorescence lifetimes through decay fitting, as well as the calculation of second-order correlation functions and the display of fluorescence spectra and raster-scan images. Additional features include extensive data filtering options, a custom HDF5-based file format, and flexible data export options. The software is open source and written in Python but GUI-based so it may be used without any programming knowledge. A multi-process architecture was employed for computational efficiency. The software is also designed to be easily extendable to include additional import data types and analysis capabilities.

@arXiv_csCR_bot@mastoxiv.page
2024-04-01 06:47:58

Differentiated Security Architecture for Secure and Efficient Infotainment Data Communication in IoV Networks
Jiani Fan, Lwin Khin Shar, Jiale Guo, Wenzhuo Yang, Dusit Niyato, Kwok-Yan Lam
arxiv.org/abs/2403.20136

@frankel@mastodon.top
2024-03-29 17:16:07

#LinuxFoundation Launches #OpenSource Valkey Community

@tante@tldr.nettime.org
2024-04-28 16:25:39

The definition of what constitutes an "open system" has really been sanded down by "AI". Just dropping random weights and a bit of network structure counts as open: no idea how and on what something was trained, which data was filtered out, which earlier training runs were discarded.
It's not really open at all. Maybe "free" as in "free beer", but in terms of learning potential, the ability to understand the system and maybe extend it, it's not any …

@bryanculbertson@mastodon.social
2024-02-02 00:20:36

LLM compression will be the final nail in the coffin of the open web
There is no benefit in creating content just for it to be copied, compressed, and regurgitated without attribution or payment
Arc Browser is pretty cool, but it assumes websites will continue to publish high-quality data without anyone ever visiting them directly

@arXiv_csHC_bot@mastoxiv.page
2024-04-01 08:31:44

This arxiv.org/abs/2403.17165 has been replaced.
initial toot: mastoxiv.page/@arXiv_csHC_…

@arXiv_csCL_bot@mastoxiv.page
2024-03-01 06:53:32

PeLLE: Encoder-based language models for Brazilian Portuguese based on open data
Guilherme Lamartine de Mello, Marcelo Finger, and Felipe Serras, Miguel de Mello Carpi, Marcos Menon Jose, Pedro Henrique Domingues, Paulo Cavalim
arxiv.org/abs/2402.19204

@arXiv_csCY_bot@mastoxiv.page
2024-05-01 06:48:27

Deep Learning for Educational Data Science
Juan D. Pinto, Luc Paquette
arxiv.org/abs/2404.19675 arxiv.org/pdf/2404.19675
arXiv:2404.19675v1 Announce Type: new
Abstract: With the ever-growing presence of deep artificial neural networks in every facet of modern life, a growing body of researchers in educational data science -- a field consisting of various interrelated research communities -- have turned their attention to leveraging these powerful algorithms within the domain of education. Use cases range from advanced knowledge tracing models that can leverage open-ended student essays or snippets of code to automatic affect and behavior detectors that can identify when a student is frustrated or aimlessly trying to solve problems unproductively -- and much more. This chapter provides a brief introduction to deep learning, describes some of its advantages and limitations, presents a survey of its many uses in education, and discusses how it may further come to shape the field of educational data science.

@arXiv_mathAP_bot@mastoxiv.page
2024-03-01 06:54:54

Global well-posedness for supercritical SQG with perturbations of radially symmetric data
Aynur Bulut, Hongjie Dong
arxiv.org/abs/2402.19439

@josemurilo@mato.social
2024-02-25 14:13:16

"The findings demonstrate that repository shutdown does happen and can result in permanent data loss… Data #reuse & #citation are increasingly promoted by journals, funders and other stakeholders. If these practices become more common, data loss might pose a threat to the permanence of the scholarly…

@arXiv_csSE_bot@mastoxiv.page
2024-03-01 08:36:10

This arxiv.org/abs/2307.02140 has been replaced.
link: scholar.google.com/scholar?q=a

@arXiv_csDL_bot@mastoxiv.page
2024-02-29 07:14:12

Handling Open Research Data within the Max Planck Society -- Looking Closer at the Year 2020
Martin Boosen, Michael Franke, Yves Vincent Grossmann, Sy Dat Ho, Larissa Leiminger, Jan Matthiesen
arxiv.org/abs/2402.18182

@shanmukhateja@social.linux.pizza
2024-02-29 19:11:05

Imagine if game studios open-sourced the models of real-world buildings, sites, etc. that they have researched for their games (Ubisoft).
Even if the data is a decade old, I wonder whether such models could help indie dev studios.
#gaming #gamedesign

@shuttle@mastodon.online
2024-02-28 19:18:00

Bevy 0.13 is out! 🕺
For those who don't know, Bevy is a data-driven game engine built in Rust.
Check out the new features 👇
bevyengine.org/news/bevy-0-13/

@arXiv_eessIV_bot@mastoxiv.page
2024-04-30 07:34:10

Processing HSV Colored Medical Images and Adapting Color Thresholds for Computational Image Analysis: a Practical Introduction to an open-source tool
Lie Cai, Andre Pfob
arxiv.org/abs/2404.17878 arxiv.org/pdf/2404.17878
arXiv:2404.17878v1 Announce Type: new
Abstract: Background: Using artificial intelligence (AI) techniques for computational medical image analysis has shown promising results. However, colored images are often not readily available for AI analysis because of different coloring thresholds used across centers and physicians as well as the removal of clinical annotations. We aimed to develop an open-source tool that can adapt different color thresholds of HSV-colored medical images and remove annotations with a simple click.
Materials and Methods: We built a function using MATLAB and used multi-center international shear wave elastography data (NCT 02638935) to test the function. We provide step-by-step instructions with accompanying code lines.
Results: We demonstrate that the newly developed pre-processing function successfully removed letters and adapted different color thresholds of HSV-colored medical images.
Conclusion: We developed an open-source tool for removing letters and adapting different color thresholds in HSV-colored medical images. We hope this contributes to advancing medical image processing for developing robust computational imaging algorithms using diverse multi-center big data. The open-source Matlab tool is available at github.com/cailiemed/image-thr.

@arXiv_csAR_bot@mastoxiv.page
2024-05-01 06:46:52

PEFSL: A deployment Pipeline for Embedded Few-Shot Learning on a FPGA SoC
Lucas Grativol Ribeiro (IMT Atlantique - MEE, Lab_STICC_BRAIn, Lab-STICC_2AI, LHC), Lubin Gauthier (Lab_STICC_BRAIn, IMT Atlantique - MEE), Mathieu Leonardon (IMT Atlantique - MEE, Lab_STICC_BRAIn), Jérémy Morlier (IMT Atlantique - MEE, Lab_STICC_BRAIn), Antoine Lavrard-Meyer (IMT Atlantique), Guillaume Muller (Mines Saint-Étienne MSE, FAYOL-ENSMSE, FAYOL-ENSMSE), Virginie Fresse (LHC, …

@arXiv_csIR_bot@mastoxiv.page
2024-03-01 08:33:59

This arxiv.org/abs/2311.04590 has been replaced.
initial toot: mastoxiv.page/@arXiv_csIR_…

@v_i_o_l_a@openbiblio.social
2024-02-16 08:01:22

"Passion for Provenance (data): On collection data, museum systems and open collaboration via wikis" @…

@arXiv_physicsedph_bot@mastoxiv.page
2024-03-01 08:47:07

This arxiv.org/abs/2309.12924 has been replaced.
initial toot: mastoxiv.page/@arXi…

@NuclearDisorder@mastodon.social
2024-05-02 05:41:42

39 years ago today: On 2 May 1985, the #USA detonated the 9th atomic bomb, "Towanda", as part of Operation Grenadier. Grenadier was a series of #Kernwaffentests (nuclear weapons tests) comprising a total of 16 bombs in 1984/85 at the test site in

Map of all coordinates of "Operation Grenadier (nuclear test)" at the Nevada test site (Nevada National Security Site, NNSS)
Source: OpenStreetMap
License: Open Data Commons Open Database License (ODbL)

@kettu@mastodon.green
2024-03-22 15:58:25

European Peatlands & Policies Open Data Mapathon 2024
Seeking teams (2 to 4 people) from all European countries to register for European Peatlands & Policies Open Data Mapathon 2024 #Mapathon2024
Date: 6th April
Venue: Online/University of Galway

First Prize €1,200.

@timbray@cosocial.ca
2024-02-22 18:54:13

Bluesky says: Ready to federate. Their discussion of the differences between their approach and Fedi’s is interesting.
bsky.social/about/blog/02-22-2

@ubuntourist@mastodon.social
2024-02-24 19:32:13

February 22, 2024: "Today, we’re excited to announce that the Bluesky network is federating and opening up in a way that allows you to host your own data."
bsky.social/about/blog/02-22-2

@arXiv_statOT_bot@mastoxiv.page
2024-04-29 07:11:09

To democratize research with sensitive data, we should make synthetic data more accessible
Erik-Jan van Kesteren
arxiv.org/abs/2404.17271

@nfdi4culture@nfdi.social
2024-02-19 14:27:25

Just in time for this year's #LoveDataWeek ♥️, the report is out on the contribution by @lozross@post.lurk.org & @… from @… at the international workshop

@arXiv_csCL_bot@mastoxiv.page
2024-03-01 06:53:46

OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models
Jenish Maharjan, Anurag Garikipati, Navan Preet Singh, Leo Cyrus, Mayank Sharma, Madalina Ciobanu, Gina Barnes, Rahul Thapa, Qingqing Mao, Ritankar Das
arxiv.org/abs/2402.19371

@arXiv_csSI_bot@mastoxiv.page
2024-05-01 07:22:38

Investigating the dissemination of STEM content on social media with computational tools
Oluwamayokun Oshinowo, Priscila Delgado, Meredith Fay, C. Alessandra Luna, Anjana Dissanayaka, Rebecca Jeltuhin, David R. Myers
arxiv.org/abs/2404.18944

@ronkjeffries@mastodon.social
2024-02-23 00:52:55

Bluesky federation has launched.
bsky.social/about/blog/02-22-2

@arXiv_astrophGA_bot@mastoxiv.page
2024-04-30 08:45:26

This arxiv.org/abs/2308.02279 has been replaced.
initial toot: mastoxiv.page/@arXiv_…

@arXiv_eessAS_bot@mastoxiv.page
2024-04-01 07:19:36

3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization
Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Tinglong Zhu, Changhe Song, Rongjie Huang, Ziyang Ma, Qian Chen, Shiliang Zhang, Xihao Li
arxiv.org/abs/2403.19971

@josemurilo@mato.social
2024-02-27 13:41:53

"#Apple points to #WebApps as the open alternative to the App Store, and actions to remove them have created deep concern in the web community.
#iOS demoting Web Apps to shortcuts threatens data loss and undermi…

@Techmeme@techhub.social
2024-03-23 16:35:38

Redis, the popular in-memory data store, switches from the open source 3-clause BSD license to a controversial dual-license model (Frederic Lardinois/TechCrunch)
techcrunch.com/2024/03/21/redi

@drbruced@aus.social
2024-02-28 02:07:37

Given the rather sloppy headlines about what “Wordpress” is planning to do with allowing AI systems to collect user data, I will point out:
- All the references to Wordpress refer to Wordpress.com, the hosting company whose parent company is Automattic. They do not refer to Wordpress the open source software project.
- Wordpress.com has published a blog post outlining their policies so that you don’t need to rely on speculative and vaguely alarming news articles

@acka47@openbiblio.social
2024-04-15 10:17:35

Now: Marcel Ackermann's talk "Data integration from open sources in the dblp computer science bibliography" at #kimws24. I'm getting a little flashback to my activity in the OKFN's Working Group on Open Bibliographic Data, under whose auspices we published Marcel's guest post on the release of the DBLP data in 2011:

@burger_jaap@mastodon.social
2024-04-22 13:37:24

Germany has open sourced a lot of data related to public #EV charging infrastructure.
Charging data is anonymised to protect operator interests, but it may still be useful for others to work with.
15k different ad-hoc pricing structures also registered - who wants to do an analysis on that?

@arXiv_csLG_bot@mastoxiv.page
2024-04-30 09:08:56

This arxiv.org/abs/2404.10255 has been replaced.
initial toot: mastoxiv.page/@arXiv_csLG_…

@lysander07@sigmoid.social
2024-04-19 20:04:52

Transactions on Graph Data & Knowledge (TGDK) is now indexed in the Directory of Open Access Journals (DOAJ)
#knowledgegraphs

Transactions on Graph Data & Knowledge (TGDK) as on the webpage of the Directory of Open Access Journals (DOAJ)

@toxi@mastodon.thi.ng
2024-04-23 20:10:46

In another move[1] to stay up to date with the latest version of Zig (v0.12.0), I've also updated all code (and .zig.zon dependency info) in the still-just-a-baby zig.thi.ng repo:
github.com/thi-ng/zig-thing
[1] Related (from yesterday):

@johl@mastodon.xyz
2024-02-26 18:37:23

Wikidata & AI, together again – Wikimedia Tech News tech-news.wikimedia.de/en/2024

@arXiv_csNI_bot@mastoxiv.page
2024-04-29 07:19:52

Colosseum: The Open RAN Digital Twin
Michele Polese, Leonardo Bonati, Salvatore D'Oro, Pedram Johari, Davide Villa, Sakthivel Velumani, Rajeev Gangula, Maria Tsampazi, Clifton Paul Robinson, Gabriele Gemmi, Andrea Lacava, Stefano Maxenti, Hai Cheng, Tommaso Melodia
arxiv.org/abs/2404.17317

@arXiv_csDC_bot@mastoxiv.page
2024-03-28 08:27:43

This arxiv.org/abs/2403.15721 has been replaced.
initial toot: mastoxiv.page/@arXiv_csDC_…

@HeidiSeibold@fosstodon.org
2024-04-23 11:00:01

Wow! My newsletter has 1800 subscribers now 🤩 💃
I am so happy that so many people are interested 🫶
To celebrate, I am preparing a very special issue this week:
✨ 3 simple rules for creating Open Science Policies ✨
(a collaboration with Sander Bosch)
Interested? You can have it in your inbox on Friday or view it directly on the newsletter page:

@berlinbuzzwords@floss.social
2024-04-26 12:00:15

Join Hellmar Becker at this year's Berlin Buzzwords to learn how to track data lineage in a real-time, open source analytics pipeline. #bbuzz
2024.berlinbuzzwords.de/sessio

Talk - Let's Do Data Lineage in Kafka, Flink and Druid!
Photograph of Hellmar Becker
9th-11th June 2024, Kulturbrauerei & Online, berlinbuzzwords.de

@laimis@mstdn.social
2024-03-29 22:18:25
Content warning: Raving about LLMs

ARRG I am adding a CW to this because it will annoy people, but I just had a fucking AAAMMMAZING session with the claude.ai LLM.
I am working on my portfolio management tool and wanted to enhance it with charts that describe various aspects of my open positions.
Initially I started writing the code and then said hold on, let me tell the LLM what I am building and give it the data model and the first chart I did as an example.
And then I asked for suggestions what to cha…

@anneroth@systemli.social
2024-03-10 13:37:13

"None of the German Open Data Day events announced for 2024 addresses the topic of the gender data gap.
'The effects of gender-specific data gaps are completely underestimated,' says the president of the djb, Ursula Matthiessen-Kreuder. 'There is an urgent need for data disaggregated by gender.'"

@dennisfaucher@infosec.exchange
2024-04-25 14:36:16

Spent 3 days and 4 different operating systems trying to get an NVIDIA desktop GPU to act like a data center GPU in Linux. Turns out it was one kernel parameter. Woof.
#nvidia #linux #ai

@arXiv_csAI_bot@mastoxiv.page
2024-03-28 06:46:39

EndToEndML: An Open-Source End-to-End Pipeline for Machine Learning Applications
Nisha Pillai, Athish Ram Das, Moses Ayoola, Ganga Gireesan, Bindu Nanduri, Mahalingam Ramkumar
arxiv.org/abs/2403.18203

@arXiv_csDL_bot@mastoxiv.page
2024-02-29 07:14:13

How open are hybrid journals included in transformative agreements?
Najko Jahn
arxiv.org/abs/2402.18255 arxiv.org/pdf…

@keithwilson@fediphilosophy.org
2024-02-22 22:22:09

This is a nice statement of why #Bluesky exists and how it differs from other social networks like #X, #Threads and

@arXiv_csSE_bot@mastoxiv.page
2024-03-01 07:23:01

StarCoder 2 and The Stack v2: The Next Generation
Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yixuan S…

@NuclearDisorder@mastodon.social
2024-04-02 06:27:26

39 years ago today: On 2 April 1985, the #USA detonated the 7th atomic bomb, "Hermosa", as part of Operation Grenadier. Grenadier was a series of #Kernwaffentests (nuclear weapons tests) comprising a total of 16 bombs in 1984/85 at the test site in

Map of all coordinates of "Operation Grenadier (nuclear test)" at the Nevada test site (Nevada National Security Site, NNSS)
Source: OpenStreetMap
License: Open Data Commons Open Database License (ODbL)

@awinkler@openbiblio.social
2024-04-14 23:57:29

A blog post by @… on #LOD and #GLAM|s. Besides a technical introduction, it also highlights the potential in more general terms: "The institutions can…

@arXiv_csIR_bot@mastoxiv.page
2024-04-01 06:50:17

Robust Federated Contrastive Recommender System against Model Poisoning Attack
Wei Yuan, Chaoqun Yang, Liang Qu, Guanhua Ye, Quoc Viet Hung Nguyen, Hongzhi Yin
arxiv.org/abs/2403.20107

@arXiv_csHC_bot@mastoxiv.page
2024-02-27 07:21:38

Open Your Ears to Take a Look: A State-of-the-Art Report on the Integration of Sonification and Visualization
Kajetan Enge, Elias Elmquist, Valentina Caiola, Niklas Rönnberg, Alexander Rind, Michael Iber, Sara Lenzi, Fangfei Lan, Robert Höldrich, Wolfgang Aigner
arxiv.org/abs/2402.16558

@NuclearDisorder@mastodon.social
2024-04-01 09:31:26

72 years ago today: On 1 April 1952, the #USA detonated the atomic bomb "Able" as part of Operation Tumbler–Snapper. Tumbler–Snapper was a series of #Kernwaffentests (nuclear weapons tests) comprising a total of 8 bombs in 1952 at the test site in

Map of all coordinates of "Operation Tumbler-Snapper (nuclear test)" at the Nevada test site (Nevada National Security Site, NNSS)
Source: OpenStreetMap
License: Open Data Commons Open Database License (ODbL)

@arXiv_eessAS_bot@mastoxiv.page
2024-03-01 08:36:57

This arxiv.org/abs/2401.06788 has been replaced.
initial toot: mastoxiv.page/@arXiv_ees…

@Techmeme@techhub.social
2024-04-25 04:20:40

Snowflake announces Arctic, an LLM optimized for enterprise tasks such as SQL generation, coding, and instruction following, with an Apache 2.0 license (Shubham Sharma/VentureBeat)
venturebeat.com/data-infrastru

@josemurilo@mato.social
2024-02-28 13:08:08

"Access & use of #publicsector documents & political speeches are key to a thriving #democracy and civil society as they ensure transparency and accountability. Copyright & paywalls, however, can render access to these materials difficult or expensive.
Today we are proposing EU …

@arXiv_csCL_bot@mastoxiv.page
2024-04-01 08:30:00

This arxiv.org/abs/2402.00786 has been replaced.
link: scholar.google.com/scholar?q=a

@arXiv_csCY_bot@mastoxiv.page
2024-03-29 07:33:38

Cycling on the Freeway: The Perilous State of Open Source Neuroscience Software
Britta U. Westner, Daniel R. McCloy, Eric Larson, Alexandre Gramfort, Daniel S. Katz, Arfon M. Smith, invited co-signees
arxiv.org/abs/2403.19394

@v_i_o_l_a@openbiblio.social
2024-04-02 19:25:58

"The benefits of Linked Open Data for cultural heritage" @wikimediaDE: #LinkedData

@arXiv_csDL_bot@mastoxiv.page
2024-05-01 06:48:33

Study on the Temporal Evolution of Literature Bradford Curves in the Context of Library Specialization
Haobai Xue, Xian Liu
arxiv.org/abs/2404.19267 arxiv.org/pdf/2404.19267
arXiv:2404.19267v1 Announce Type: new
Abstract: Bradford's law of bibliographic scattering is a fundamental law in bibliometrics and can provide valuable guidance to academic libraries in literature search and procurement. However, Bradford curves can take various shapes at different time points, and there is still no causal explanation for this, so predicting their shape remains an open question. This paper attributes the deviation of the Bradford curve from the theoretical J-shape to the integer constraints on the number of journals and articles, and extends Leimkuhler and Egghe's formula to cover the core region of very productive journals, where the theoretical number of journals falls below one. The key parameters of the extended formula are identified and studied using the Simon-Yule model. The reasons for the Groos droop are explained, and the critical point for the shape change is studied. Finally, the proposed formulae are validated with empirical data found in the literature. The proposed method is found to be usable for predicting the evolution of Bradford curves and thus for guiding academic libraries in scientific literature procurement and utilization.

@arXiv_csDC_bot@mastoxiv.page
2024-03-26 06:48:41

Radical-Cylon: A Heterogeneous Data Pipeline for Scientific Computing
Arup Kumar Sarker, Aymen Alsaadi, Niranda Perera, Mills Staylor, Gregor von Laszewski, Matteo Turilli, Ozgur Ozan Kilic, Mikhail Titov, Andre Merzky, Shantenu Jha, Geoffrey Fox
arxiv.org/abs/2403.15721

@arXiv_csAI_bot@mastoxiv.page
2024-03-27 06:47:00

Data-driven Energy Consumption Modelling for Electric Micromobility using an Open Dataset
Yue Ding, Sen Yan, Maqsood Hussain Shah, Hongyuan Fang, Ji Li, Mingming Liu
arxiv.org/abs/2403.17632

@ErikJonker@mastodon.social
2024-04-26 09:58:41

Apparently even the European Data Protection Supervisor #EDPS can't find the funds for its Mastodon servers, which is bizarre if you think about all the "talk" about not relying on #bigtech, a more independent EU, open standards, security, the importance of reliable information, etc.

@johl@mastodon.xyz
2024-04-04 08:36:36

A nice practical example of Linked Open Data: you need LOD so that Bobbi, the Berlin chatbot bear, knows the name of the current Governing Mayor.
odis-berlin.de/aktuelles/2024-

@arXiv_csCL_bot@mastoxiv.page
2024-05-01 06:48:59

Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom
Shisen Yue, Siyuan Song, Xinyuan Cheng, Hai Hu
arxiv.org/abs/2404.19509 arxiv.org/pdf/2404.19509
arXiv:2404.19509v1 Announce Type: new
Abstract: Understanding the non-literal meaning of an utterance is critical for large language models (LLMs) to become human-like social communicators. In this work, we introduce SwordsmanImp, the first Chinese multi-turn-dialogue-based dataset aimed at conversational implicature, sourced from dialogues in the Chinese sitcom My Own Swordsman. It includes 200 carefully handcrafted questions, all annotated on which Gricean maxims have been violated. We test eight closed-source and open-source LLMs under two tasks: a multiple-choice question task and an implicature explanation task. Our results show that GPT-4 attains human-level accuracy (94%) on multiple-choice questions. CausalLM demonstrates a 78.5% accuracy following GPT-4. Other models, including GPT-3.5 and several open-source models, demonstrate a lower accuracy ranging from 20% to 60% on multiple-choice questions. Human raters were asked to rate the explanation of the implicatures generated by LLMs on their reasonability, logic and fluency. While all models generate largely fluent and self-consistent text, their explanations score low on reasonability except for GPT-4, suggesting that most LLMs cannot produce satisfactory explanations of the implicatures in the conversation. Moreover, we find LLMs' performance does not vary significantly by Gricean maxims, suggesting that LLMs do not seem to process implicatures derived from different maxims differently. Our data and code are available at github.com/sjtu-compling/llm-p.

@arXiv_csSE_bot@mastoxiv.page
2024-02-26 06:52:54

Studying LLM Performance on Closed- and Open-source Data
Toufique Ahmed, Christian Bird, Premkumar Devanbu, Saikat Chakraborty
arxiv.org/abs/2402.15100

@lysander07@sigmoid.social
2024-03-04 10:42:25

Nice overview of LLMs for data annotation, including references to papers with open-source code & data.
Zhen Tan et al., Large Language Models for Data Annotation: A Survey, arxiv.org/abs/2402.13446

A list of representative LLM-Based Data Annotation papers with open-source code/data.

@arXiv_csCL_bot@mastoxiv.page
2024-02-29 06:50:38

Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
Yuan Ge, Yilun Liu, Chi Hu, Weibin Meng, Shimin Tao, Xiaofeng Zhao, Hongxia Ma, Li Zhang, Hao Yang, Tong Xiao
arxiv.org/abs/2402.18191

@arXiv_csDL_bot@mastoxiv.page
2024-04-26 06:50:33

An analysis of the effects of sharing research data, code, and preprints on citations
Giovanni Colavizza, Lauren Cadwallader, Marcel LaFlamme, Grégory Dozot, Stéphane Lecorney, Daniel Rappo, Iain Hrynaszkiewicz
arxiv.org/abs/2404.16171 arxiv.org/pdf/2404.16171
arXiv:2404.16171v1 Announce Type: new
Abstract: Calls to make scientific research more open have gained traction with a range of societal stakeholders. Open Science practices include but are not limited to the early sharing of results via preprints and openly sharing outputs such as data and code to make research more reproducible and extensible. Existing evidence shows that adopting Open Science practices has effects in several domains. In this study, we investigate whether adopting one or more Open Science practices leads to significantly higher citations for an associated publication, which is one form of academic impact. We use a novel dataset known as Open Science Indicators, produced by PLOS and DataSeer, which includes all PLOS publications from 2018 to 2023 as well as a comparison group sampled from the PMC Open Access Subset. In total, we analyze circa 122'000 publications. We calculate publication and author-level citation indicators and use a broad set of control variables to isolate the effect of Open Science Indicators on received citations. We show that Open Science practices are adopted to different degrees across scientific disciplines. We find that the early release of a publication as a preprint correlates with a significant positive citation advantage of about 20.2% on average. We also find that sharing data in an online repository correlates with a smaller yet still positive citation advantage of 4.3% on average. However, we do not find a significant citation advantage for sharing code. Further research is needed on additional or alternative measures of impact beyond citations. Our results are likely to be of interest to researchers, as well as publishers, research funders, and policymakers.

@arXiv_csHC_bot@mastoxiv.page
2024-02-21 06:50:04

Situating Data Sets: Making Public Data Actionable for Housing Justice
Anh-Ton Tran, Grace Guo, Jordan Taylor, Katsuki Chan, Elora Raymond, Carl DiSalvo
arxiv.org/abs/2402.12505

@Techmeme@techhub.social
2024-03-11 09:50:48

Elon Musk says xAI plans to open-source Grok this week; xAI released Grok in November 2023, offering access to "real-time" data via a $16/month X subscription (Manish Singh/TechCrunch)
techcrunch.com/2024/03/11/elon

@NuclearDisorder@mastodon.social
2024-04-30 07:16:21

32 years ago today: On 30 April 1992, the #USA detonated the 3rd atomic bomb, "Diamond Fortune", as part of Operation Julin. Julin was a series of #Kernwaffentests (nuclear weapons tests) comprising a total of 9 bombs in 1991/92 at the test site in

Map of all coordinates of "Operation Julin (nuclear test)" at the Nevada test site (Nevada National Security Site, NNSS)
Source: OpenStreetMap
License: Open Data Commons Open Database License (ODbL)

@Techmeme@techhub.social
2024-03-22 23:01:21

Worldcoin announces Personal Custody, which saves biometric data captured by the Orb on users' personal devices, and plans to open source the Orb's software (RT Watson/The Block)
theblock.co/post/284123/worldc

@NuclearDisorder@mastodon.social
2024-04-30 07:27:31

55 years ago today: On 30 April 1969, the #USA detonated the atomic bombs "Blenton" & "Thistle" as part of Operation Bowline. Bowline was a series of #Kernwaffentests (nuclear weapons tests) comprising a total of 58 bombs in 68/69 at the test site in

Map of all coordinates of "Operation Bowline (nuclear test)" at the Nevada test site (Nevada National Security Site, NNSS)
Source: OpenStreetMap
License: Open Data Commons Open Database License (ODbL)

@arXiv_csDL_bot@mastoxiv.page
2024-03-28 08:27:43

This arxiv.org/abs/2310.18011 has been replaced.
initial toot: mastoxiv.page/@arXiv_csDL_…

@NuclearDisorder@mastodon.social
2024-04-30 07:14:39

49 years ago today: On 30 April 1975, the #USA tested the atomic bomb "Obar". Operation Bedrock was a series of 27 US #Kernwaffentests (nuclear weapons tests) conducted underground at the Nevada Test Site in Nevada in 1974/75.

Map of all coordinates of "Operation Bedrock (nuclear test)" at the Nevada test site (Nevada National Security Site, NNSS)
Source: OpenStreetMap
License: Open Data Commons Open Database License (ODbL)

@Techmeme@techhub.social
2024-02-14 00:30:45

Cohere's nonprofit research lab releases its open-source multilingual LLM Aya, which it says can follow instructions in more than 100 languages (Alison Snyder/Axios)
axios.com/2024/02/13/open-sour

@arXiv_csCL_bot@mastoxiv.page
2024-02-29 06:51:00

Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation
Nihal V. Nayak, Yiyang Nan, Avi Trost, Stephen H. Bach
arxiv.org/abs/2402.18334

@NuclearDisorder@mastodon.social
2024-04-30 07:18:34

37 years ago today: On 30 April 1987, the #USA detonated the atomic bomb "Hardin" as part of Operation Musketeer. Musketeer was a series of #Kernwaffentests (nuclear weapons tests) comprising a total of 16 bombs in 1986/87 at the test site in

Map of all coordinates of "Operation Musketeer (nuclear test)" at the Nevada test site (Nevada National Security Site, NNSS)
Source: OpenStreetMap
License: Open Data Commons Open Database License (ODbL)

@arXiv_csDL_bot@mastoxiv.page
2024-02-26 08:30:09

This arxiv.org/abs/2402.05211 has been replaced.
link: scholar.google.com/scholar?q=a

@arXiv_csDL_bot@mastoxiv.page
2024-03-27 08:24:09

This arxiv.org/abs/2310.18011 has been replaced.
initial toot: mastoxiv.page/@arXiv_csDL_…

@arXiv_csCL_bot@mastoxiv.page
2024-02-23 06:56:04

Balanced Data Sampling for Language Model Training with Clustering
Yunfan Shao, Linyang Li, Zhaoye Fei, Hang Yan, Dahua Lin, Xipeng Qiu
arxiv.org/abs/2402.14526