Tootfinder

@gedankenstuecke@scholar.social
2025-06-09 09:02:06

«Worse, as the latest Apple paper shows, LLMs may well work on your easy test set (like Hanoi with 4 discs) and seduce you into thinking it has built a proper, generalizable solution when it does not.»
The technology that keeps on giving
https://garymarcus.substack.com/p/a-knockout-blow-for-llms
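
To make the quoted point concrete, here is a minimal Python sketch (not taken from the linked post or the Apple paper) of why passing a 4-disc Hanoi test says little about generality. The names `hanoi_moves` and `is_valid_solution` are hypothetical helpers introduced for illustration only. The checker replays a move list on `n` discs, so a candidate solution can be tested well beyond the easy case.

```python
def hanoi_moves(n, src="A", dst="C", aux="B"):
    """Return the standard recursive move list for n discs as (from_peg, to_peg) pairs."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, src, aux, dst)    # park the n-1 smaller discs on the spare peg
        + [(src, dst)]                       # move the largest disc
        + hanoi_moves(n - 1, aux, dst, src)  # bring the smaller discs back on top
    )


def is_valid_solution(n, moves):
    """Replay moves on n discs; every move must be legal and all discs must end on peg C."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for src, dst in moves:
        if not pegs[src]:
            return False  # nothing to move from this peg
        disc = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disc:
            return False  # a larger disc may not sit on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n, 0, -1))


if __name__ == "__main__":
    four_disc_answer = hanoi_moves(4)
    print(is_valid_solution(4, four_disc_answer))  # the easy test passes
    print(is_valid_solution(5, four_disc_answer))  # the memorized answer does not transfer
    print(all(is_valid_solution(n, hanoi_moves(n)) for n in range(1, 11)))
```

A move list that is correct for 4 discs fails the same checker on 5 discs, while the recursive procedure passes for every size tried; only the second kind of evidence speaks to generalization.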

@arXiv_csDB_bot@mastoxiv.page
2025-07-08 08:51:30

AKEGEN: A LLM-based Tabular Corpus Generator for Evaluating Dataset Discovery in Data Lakes
Zhenwei Dai, Chuan Lei, Asterios Katsifodimos, Xiao Qin, Christos Faloutsos, Huzefa Rangwala
https://arxiv.org/abs/2507.04687

AKEGEN: A LLM-based Tabular Corpus Generator for Evaluating Dataset Discovery in Data Lakes
How to generate a large, realistic set of tables along with joinability relationships, to stress-test dataset discovery methods? Dataset discovery methods aim to automatically identify related data assets in a data lake. The development and evaluation of such solutions for customers from a wide range of business domains relies on diverse, high-quality, and domain-specific tabular benchmarks. Large language models (LLMs) are trained on a wide variety of text data, which can provide a strong foun…

@cosmos4u@scicomm.xyz
2025-07-07 23:26:34

Exoplanet Atmospheric Refraction Effects in the #Kepler Sample: https://arxiv.org/abs/2507.02126 -> "We present an analysis on the detection viability of refraction effects in Kepler's exoplanet atmospheres using binning techniques for their light curves in order to compare against simulated refraction effects. We split the Kepler exoplanets into sub-populations according to orbital period and planetary radius, then search for out-of-transit changes in the relative flux associated with atmospheric refraction of starlight. The presence of refraction effects - or lack thereof - may be used to measure and set limits on the bulk properties of an atmosphere, including mean molecular weight or the presence of hazes.
In this work, we use the presence of refraction effects to test whether exoplanets above the period-radius valley have H/He atmospheres, which high levels of stellar radiation could evaporate away, in turn leaving rocky cores below the valley. We find strong observational evidence of refraction effects for exoplanets above the period-radius valley based on Kepler photometry, however those related to optically thin H/He atmospheres are not common in the observed planetary population. This result may be attributed to signal dampening caused by clouds and hazes, consistent with the optically thick and intrinsically hotter atmospheres of Kepler exoplanets caused by relatively close host star proximity."

@arXiv_grqc_bot@mastoxiv.page
2025-07-08 12:51:50

Towards a consistent Semiclassical Theory of Gravity
Francisco Pipa
https://arxiv.org/abs/2507.05237 https://arxiv.org/pdf/2507.05237…

Towards a consistent Semiclassical Theory of Gravity
We argue that semiclassical gravity can be rendered consistent by assuming that quantum systems only emit a gravitational field when they interact with a stable determination chain (SDCs), which are specific chains of interactions modeled via decoherence and test functions obeying a set of conditions. When systems are disconnected from SDCs, they do not emit a gravitational field. This denies the universality of gravity, while upholding a version of the equivalence principle. We argue that this…

@paulbusch@mstdn.ca
2025-07-04 22:42:36

But that is adjustable via the included Ecobee smart thermostat. If we set the backup heat to start at minus 5°C then our furnace could be running for 8 weeks or less. The key is to test and monitor electricity consumption at lower temperatures since the heat pump works harder when it's colder. Then compare our gas and electricity bills to determine the right set-up. Over time we should find the right balance point.

@netzschleuder@social.skewed.de
2025-06-04 10:00:05

hiv_transmission: HIV transmission network (1988-2001)
A set of networks of HIV transmissions between people through sexual, needle-sharing, or social connections, based on combining 8 datasets collected from 1988 to 2001. Metadata includes test results of several diseases, as well as demographic variables such as age, ethnicity, and gender. Networks come in two flavors: egodyads and altdyads. Egodyads are the network among study-participants and their direct partners. Altdyads are the…

hiv_transmission: HIV transmission network (1988-2001). 35229 nodes, 85890 edges. https://networks.skewed.de/net/hiv_transmission

hiv_transmission — HIV transmission network (1988-2001)
A set of networks of HIV transmissions between people through sexual, needle-sharing, or social connections, based on combining 8 datasets collected from 1988 to 2001. Metadata includes test results of several diseases, as well as demographic variables such as age, ethnicity, and gender. Networks come in two flavors: egodyads and altdyads. Egodyads are the network among study-participants and their direct partners. Altdyads are the network among people who are connected to the egodyad network. …

@mgorny@social.treehouse.systems
2025-07-01 16:20:14

One of the goals I've set for further development of #Python eclasses in #Gentoo was to avoid needless complexity. Unfortunately, the subject matter sometimes requires it. However, many of the functions added lately had already been done manually in ebuilds for years.
We started disabling plugin autoloading years ago. First we did that only for individual packages that caused issues. Then for those where tests ended up being really slow. Finally, pretty much anywhere `python_test()` was declared. Doing it all manually was particularly cumbersome; all I needed for `EPYTEST_PLUGINS` was a good idea of how to generalize it.
Similarly, `EPYTEST_XDIST` was added after we had been manually adding `epytest -p xdist -n "$(makeopts_jobs)" --dist=worksteal`, and while at it, I added `EPYTEST_JOBS` to override the job count.
Perhaps `EPYTEST_TIMEOUT` wasn't that common. However, it was meant to help CI systems that could otherwise get stuck on a hanging test.
Similarly, "standard library" version (like `3.9`) matching in `python_gen_cond_dep` was added after a long period of explicitly stating `python3_9 pypy3`. As an extra benefit, this also resolved the problem that at the time `pypy3` could mean different Python versions.

@jerome@jasette.facil.services
2025-06-04 13:46:46

No quote post yet
Remote reply fetching is marked as experimental and is also asynchronous; it is unclear whether it will be ready for the final release. https://mastodon.social/@MastodonEngineering/114625074809479231

Mastodon Engineering (@MastodonEngineering@mastodon.social)
The first beta release of Mastodon 4.4.0 is ready for testing! Many user-facing improvements: featured content, updates for lists, follower management, new emoji, and refreshed audio and media players. For server owners: new legal features, moderation tweaks, announcements updates, upgraded software stack, and some experimental feature options. Please test it out, and provide feedback 💬 https://github.com/mastodon/mastodon/releases/tag/v4.4.0-beta.1

@arXiv_statME_bot@mastoxiv.page
2025-06-03 17:27:08

This https://arxiv.org/abs/2410.12201 has been replaced.
link: https://scholar.google.com/scholar?q=a

Data-light Uncertainty Set Merging with Admissibility: Synthetics, Aggregation, and Test Inversion
This article introduces a Synthetics, Aggregation, and Test inversion (SAT) approach for merging diverse and potentially dependent uncertainty sets into a single unified set. The procedure is data-light, relying only on initial sets and control levels, and it adapts to any user-specified initial uncertainty sets, accommodating potentially varying coverage levels. SAT is motivated by the challenge of integrating uncertainty sets when only the initial sets and their control levels are available -…

@arXiv_econEM_bot@mastoxiv.page
2025-07-03 08:26:40

Uniform Validity of the Subset Anderson-Rubin Test under Heteroskedasticity and Nonlinearity
Atsushi Inoue, Òscar Jordà, Guido M. Kuersteiner
https://arxiv.org/abs/2507.01167 …

Uniform Validity of the Subset Anderson-Rubin Test under Heteroskedasticity and Nonlinearity
We consider the Anderson-Rubin (AR) statistic for a general set of nonlinear moment restrictions. The statistic is based on the criterion function of the continuous updating estimator (CUE) for a subset of parameters not constrained under the Null. We treat the data distribution nonparametrically with parametric moment restrictions imposed under the Null. We show that subset tests and confidence intervals based on the AR statistic are uniformly valid over a wide range of distributions that incl…

@jochenlingelba1@h-net.social
2025-06-30 08:35:48

WhichYear 6/30/25
3566 pts
9.0 avg. years off
3️⃣ ⚪ ⚪ ⚪ 1️⃣
https://whichyr.com

Which Year - Photo Year Guessing Game
Guess the year real-world photos were taken. Test your history knowledge with a daily challenge featuring a new set of photos each day.

@arXiv_eessAS_bot@mastoxiv.page
2025-06-06 09:39:10

This https://arxiv.org/abs/2505.22251 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…

Evaluation of LLMs in Speech is Often Flawed: Test Set Contamination in Large Language Models for Speech Recognition
Recent work suggests that large language models (LLMs) can improve performance of speech tasks compared to existing systems. To support their claims, results on LibriSpeech and Common Voice are often quoted. However, this work finds that a substantial amount of the LibriSpeech and Common Voice evaluation sets appear in public LLM pretraining corpora. This calls into question the reliability of findings drawn from these two datasets. To measure the impact of contamination, LLMs trained with or w…
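
As a rough illustration of what checking for test set contamination can look like, here is a minimal Python sketch. It is not the procedure used in the paper (the abstract above is truncated before the method is described). It assumes a simple word-level n-gram overlap criterion, and the function names `ngrams` and `contamination_rate` as well as the default `n=8` are hypothetical choices made for this example.

```python
def ngrams(text, n):
    """Return the set of word-level n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(eval_texts, corpus_texts, n=8):
    """Fraction of evaluation texts sharing at least one n-gram with the corpus.

    Assumption: one shared n-gram is enough to flag a text as contaminated.
    Texts shorter than n tokens have no n-grams and are never flagged.
    """
    corpus_grams = set()
    for doc in corpus_texts:
        corpus_grams |= ngrams(doc, n)
    flagged = [text for text in eval_texts if ngrams(text, n) & corpus_grams]
    return len(flagged) / len(eval_texts) if eval_texts else 0.0


if __name__ == "__main__":
    corpus = ["the quick brown fox jumps over the lazy dog"]
    eval_set = ["quick brown fox jumps", "a completely unrelated sentence here"]
    print(contamination_rate(eval_set, corpus, n=3))
```

Exact n-gram matching is a deliberately crude choice: it misses paraphrased or re-punctuated copies, so a rate computed this way is better read as a lower bound than as a measurement.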

@Xexyz@mastodon.me.uk
2025-05-15 14:23:32

Test Drive Unlimited: a quieter Horizon Festival
I found TDU in a box in the loft a couple of weeks ago. Since I now have a new Xbox 360 set up (replacing my old one, which didn't have HDMI output) and can play games that are not compatible with the Xbox One, I thought I'd revisit the game. Unfortunately my save wasn't in the cloud, and wasn't transferred while I had both consoles set up, so I needed to start from the beginning.

@netzschleuder@social.skewed.de
2025-05-31 13:00:05

hiv_transmission: HIV transmission network (1988-2001)
A set of networks of HIV transmissions between people through sexual, needle-sharing, or social connections, based on combining 8 datasets collected from 1988 to 2001. Metadata includes test results of several diseases, as well as demographic variables such as age, ethnicity, and gender. Networks come in two flavors: egodyads and altdyads. Egodyads are the network among study-participants and their direct partners. Altdyads are the…

hiv_transmission — HIV transmission network (1988-2001)
A set of networks of HIV transmissions between people through sexual, needle-sharing, or social connections, based on combining 8 datasets collected from 1988 to 2001. Metadata includes test results of several diseases, as well as demographic variables such as age, ethnicity, and gender. Networks come in two flavors: egodyads and altdyads. Egodyads are the network among study-participants and their direct partners. Altdyads are the network among people who are connected to the egodyad network. …

@arXiv_csCR_bot@mastoxiv.page
2025-07-01 07:40:43

In-context learning for the classification of manipulation techniques in phishing emails
Antony Dalmiere (LAAS-TRUST, LAAS), Guillaume Auriol (LAAS-TRUST, INSA Toulouse), Vincent Nicomette (LAAS-TSF, LAAS), Pascal Marchand (LERASS)
https://arxiv.org/abs/2506.22515

In-context learning for the classification of manipulation techniques in phishing emails
Traditional phishing detection often overlooks psychological manipulation. This study investigates using Large Language Model (LLM) In-Context Learning (ICL) for fine-grained classification of phishing emails based on a taxonomy of 40 manipulation techniques. Using few-shot examples with GPT-4o-mini on real-world French phishing emails (SignalSpam), we evaluated performance against a human-annotated test set (100 emails). The approach effectively identifies prevalent techniques (e.g., Baiting, …

@arXiv_mathST_bot@mastoxiv.page
2025-06-13 08:40:00

A note on the properties of the confidence set for the local average treatment effect obtained by inverting the score test
Ezequiel Smucler, Ludovico Lanni, David Masip
https://arxiv.org/abs/2506.10449

A note on the properties of the confidence set for the local average treatment effect obtained by inverting the score test
We study the properties of the score confidence set for the local average treatment effect in non and semiparametric instrumental variable models. This confidence set is constructed by inverting a score test based on an estimate of the nonparametric influence function for the estimand, and is known to be uniformly valid in models that allow for arbitrarily weak instruments; because of this, the confidence set can have infinite diameter at some laws. We characterize the six possible forms the sc…

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:56:59

Text2Cypher Across Languages: Evaluating Foundational Models Beyond English
Makbule Gulcin Ozsoy, William Tai
https://arxiv.org/abs/2506.21445 https://

Text2Cypher Across Languages: Evaluating Foundational Models Beyond English
Recent advances in large language models have enabled natural language interfaces that translate user questions into database queries, such as Text2SQL, Text2SPARQL, and Text2Cypher. While these interfaces enhance database accessibility, most research today focuses solely on English, with limited evaluation in other languages. This paper investigates the performance of foundational LLMs on the Text2Cypher task across multiple languages. We create and release a multilingual test set by translati…

@arXiv_econEM_bot@mastoxiv.page
2025-05-28 07:22:10

Conditional Method Confidence Set
Lukas Bauer, Ekaterina Kazak
https://arxiv.org/abs/2505.21278 https://arxiv.org/pdf/2505.21278

Conditional Method Confidence Set
This paper proposes a Conditional Method Confidence Set (CMCS) which allows selecting the best subset of forecasting methods with equal predictive ability conditional on a specific economic regime. The test resembles the Model Confidence Set by Hansen et al. (2011) and is adapted for conditional forecast evaluation. We show the asymptotic validity of the proposed test and illustrate its properties in a simulation study. The proposed testing procedure is particularly suitable for stress-testing …

@arXiv_csNE_bot@mastoxiv.page
2025-06-25 07:44:39

A Unified Platform to Evaluate STDP Learning Rule and Synapse Model using Pattern Recognition in a Spiking Neural Network
Jaskirat Singh Maskeen, Sandip Lashkare
https://arxiv.org/abs/2506.19377

A Unified Platform to Evaluate STDP Learning Rule and Synapse Model using Pattern Recognition in a Spiking Neural Network
We develop a unified platform to evaluate Ideal, Linear, and Non-linear $\text{Pr}_{0.7}\text{Ca}_{0.3}\text{MnO}_{3}$ memristor-based synapse models, each getting progressively closer to hardware realism, alongside four STDP learning rules in a two-layer SNN with LIF neurons and adaptive thresholds for five-class MNIST classification. On MNIST with small train set and large test set, our two-layer SNN with ideal, 25-state, and 12-state nonlinear memristor synapses achieves 92.73 %, 91.07 %, an…

@arXiv_csIT_bot@mastoxiv.page
2025-06-13 07:48:40

Optimal Non-Adaptive Group Testing with One-Sided Error Guarantees
Daniel McMorrow, Jonathan Scarlett
https://arxiv.org/abs/2506.10374 https://

Optimal Non-Adaptive Group Testing with One-Sided Error Guarantees
The group testing problem consists of determining a sparse subset of defective items from within a larger set of items via a series of tests, where each test outcome indicates whether at least one defective item is included in the test. We study the approximate recovery setting, where the recovery criterion of the defective set is relaxed to allow a small number of items to be misclassified. In particular, we consider one-sided approximate recovery criteria, where we allow either only false neg…
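
The test model described above (a test is positive exactly when it contains at least one defective item) can be sketched in a few lines of Python. The decoder shown is COMP, a standard textbook baseline and not the scheme proposed in the paper; it is used here only because it gives a simple instance of one-sided error: it can produce false positives but never false negatives. The names `run_tests`, `comp_decode`, and `random_design`, and the inclusion probability `p`, are assumptions of this sketch.

```python
import random


def run_tests(design, defectives):
    """Each test is a set of item indices; the outcome is True if it contains a defective item."""
    return [bool(test & defectives) for test in design]


def comp_decode(n_items, design, outcomes):
    """COMP decoding: every item in a negative test is cleared; all others are declared defective.

    A defective item can never appear in a negative test, so the estimate
    always contains the true defective set (no false negatives).
    """
    cleared = set()
    for test, positive in zip(design, outcomes):
        if not positive:
            cleared |= test
    return set(range(n_items)) - cleared


def random_design(n_items, n_tests, p, rng):
    """Non-adaptive design: every item joins every test independently with probability p."""
    return [{i for i in range(n_items) if rng.random() < p} for _ in range(n_tests)]


if __name__ == "__main__":
    rng = random.Random(0)
    defectives = {3, 17}
    design = random_design(n_items=50, n_tests=30, p=0.3, rng=rng)
    estimate = comp_decode(50, design, run_tests(design, defectives))
    print(defectives <= estimate, len(estimate - defectives))
```

The design is fixed before any outcome is seen, which is what makes it non-adaptive. The number of false positives printed at the end depends on the random design and shrinks as more tests are added.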

@arXiv_csCY_bot@mastoxiv.page
2025-06-11 07:24:43

Evaluation of Machine Learning Models in Student Academic Performance Prediction
A. G. R. Sandeepa, Sanka Mohottala
https://arxiv.org/abs/2506.08047 https:…

Evaluation of Machine Learning Models in Student Academic Performance Prediction
This research investigates the use of machine learning methods to forecast students' academic performance in a school setting. Students' data with behavioral, academic, and demographic details were used in implementations with standard classical machine learning models including multi-layer perceptron classifier (MLPC). MLPC obtained 86.46% maximum accuracy for test set across all implementations. Under 10-fold cross validation, MLPC obtained 79.58% average accuracy for test set while for train…
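
The gap between a best-case test accuracy and a 10-fold cross-validation average is easier to read with the procedure in view. Below is a minimal pure-Python sketch of k-fold cross-validation; it is a generic illustration, not the authors' pipeline. The helpers `k_fold_indices` and `cross_val_accuracy` and the `fit`/`predict` callables are hypothetical names, the data is assumed to be pre-shuffled, and `n_samples >= k` is assumed so that no fold is empty.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs over contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(fit, predict, X, y, k=10):
    """Average held-out accuracy over k folds.

    fit(X_train, y_train) returns a model; predict(model, x) returns a label.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy majority-class "model", used only to exercise the splitting logic.
    def fit(X_train, y_train):
        return max(set(y_train), key=y_train.count)

    def predict(model, x):
        return model

    X = list(range(20))
    y = [0] * 15 + [1] * 5
    print(cross_val_accuracy(fit, predict, X, y, k=10))
```

Because every sample is held out exactly once, the averaged score is less sensitive to a lucky split than a single maximum over runs, which is one plausible reading of why the two figures quoted above differ.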

@arXiv_condmatstatmech_bot@mastoxiv.page
2025-06-19 09:27:32

Benchmarks for protocol control in nonequilibrium statistical mechanics
Stephen Whitelam, Corneel Casert, Megan Engel, Isaac Tamblyn
https://arxiv.org/abs/2506.15122

Benchmarks for protocol control in nonequilibrium statistical mechanics
We present a set of computer codes designed to test methods for optimizing time-dependent control protocols in fluctuating nonequilibrium systems. Each problem consists of a stochastic model, an optimization objective, and C++ and Python implementations that can be run on Unix-like systems. These benchmark systems are simple enough to run on a laptop, but challenging enough to test the capabilities of modern optimization methods. This release includes five problems and a worked example. The pro…

@arXiv_csSE_bot@mastoxiv.page
2025-06-10 16:55:39

This https://arxiv.org/abs/2410.08911 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…

Test-driven Software Experimentation with LASSO: an LLM Prompt Benchmarking Example
Empirical software engineering faces a critical gap: the lack of standardized tools for rapid development and execution of Test-Driven Software Experiments (TDSEs) -- that is, experiments that involve the execution of software subjects and the observation and analysis of their "de facto" run-time behavior. In this paper we present a general-purpose analysis platform called LASSO that provides a minimal set of domain-specific languages and data structures to conduct TDSEs. By empowering users wi…

@arXiv_econEM_bot@mastoxiv.page
2025-06-26 08:07:30

A Sharp and Robust Test for Selective Reporting
Stefan Faridani
https://arxiv.org/abs/2506.20035 https://arxiv.org/pdf/2506.20035

A Sharp and Robust Test for Selective Reporting
This paper proposes a test that is consistent against every detectable form of selective reporting and remains interpretable even when the t-scores are not exactly normal. The test statistic is the distance between the smoothed empirical t-curve and the set of all t-curves that would be possible in the absence of any selective reporting. This novel projection test can only be evaded in large meta-samples by selective reporting that also evades all other valid tests of restrictions on the t-curv…

@arXiv_eessIV_bot@mastoxiv.page
2025-06-23 11:47:30

Proportional Sensitivity in Generative Adversarial Network (GAN)-Augmented Brain Tumor Classification Using Convolutional Neural Network
Mahin Montasir Afif, Abdullah Al Noman, K. M. Tahsin Kabir, Md. Mortuza Ahmmed, Md. Mostafizur Rahman, Mufti Mahmud, Md. Ashraful Babu
https://arxiv.org/abs/2506.17165

Proportional Sensitivity in Generative Adversarial Network (GAN)-Augmented Brain Tumor Classification Using Convolutional Neural Network
Generative Adversarial Networks (GAN) have shown potential in expanding limited medical imaging datasets. This study explores how different ratios of GAN-generated and real brain tumor MRI images impact the performance of a CNN in classifying healthy vs. tumorous scans. A DCGAN was used to create synthetic images which were mixed with real ones at various ratios to train a custom CNN. The CNN was then evaluated on a separate real-world test set. Our results indicate that the model maintains hig…

@arXiv_mathST_bot@mastoxiv.page
2025-06-03 07:37:24

Detecting non-uniform patterns on high-dimensional hyperspheres
Tiefeng Jiang, Tuan Pham
https://arxiv.org/abs/2506.00444 https://arx…

Detecting non-uniform patterns on high-dimensional hyperspheres
We propose a new probabilistic characterization of the uniform distribution on hyperspheres in terms of its inner product, extending the ideas of \cite{cuesta2009projection,cuesta2007sharp} in a data-driven manner. Using this characterization, we define a new distance that quantifies the deviation of an arbitrary distribution from uniformity. As an application, we construct a novel nonparametric test for the uniformity testing problem: determining whether a set of $n$ i.i.d. random points on …

@arXiv_quantph_bot@mastoxiv.page
2025-06-12 09:50:22

Effective criteria for entanglement witnesses in small dimensions
Łukasz Grzelka, Łukasz Skowronek, Karol Życzkowski
https://arxiv.org/abs/2506.09298

Effective criteria for entanglement witnesses in small dimensions
We present an effective set of necessary and sufficient criteria for block-positivity of matrices of order $4$ over $\mathbb{C}$. The approach is based on Sturm sequences and quartic polynomial positivity conditions presented in recent literature. The procedure allows us to test whether a given $4\times 4$ complex matrix corresponds to an entanglement witness, and it is exact when the matrix coefficients belong to the rationals, extended by $\mathrm{i}$. The method can be generalized to $\mathc…

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:59:49

Potemkin Understanding in Large Language Models
Marina Mancoridis, Bec Weeks, Keyon Vafa, Sendhil Mullainathan
https://arxiv.org/abs/2506.21521 https://arxiv.org/pdf/2506.21521 https://arxiv.org/html/2506.21521
arXiv:2506.21521v1 Announce Type: new
Abstract: Large language models (LLMs) are regularly evaluated using benchmark datasets. But what justifies making inferences about an LLM's capabilities based on its answers to a curated set of questions? This paper first introduces a formal framework to address this question. The key is to note that the benchmarks used to test LLMs -- such as AP exams -- are also those used to test people. However, this raises an implication: these benchmarks are only valid tests if LLMs misunderstand concepts in ways that mirror human misunderstandings. Otherwise, success on benchmarks only demonstrates potemkin understanding: the illusion of understanding driven by answers irreconcilable with how any human would interpret a concept. We present two procedures for quantifying the existence of potemkins: one using a specially designed benchmark in three domains, the other using a general procedure that provides a lower-bound on their prevalence. We find that potemkins are ubiquitous across models, tasks, and domains. We also find that these failures reflect not just incorrect understanding, but deeper internal incoherence in concept representations.

@arXiv_csIR_bot@mastoxiv.page
2025-06-13 07:38:00

Towards Understanding Bias in Synthetic Data for Evaluation
Hossein A. Rahmani, Varsha Ramineni, Nick Craswell, Bhaskar Mitra, Emine Yilmaz
https://arxiv.org/abs/2506.10301

Towards Understanding Bias in Synthetic Data for Evaluation
Test collections are crucial for evaluating Information Retrieval (IR) systems. Creating a diverse set of user queries for these collections can be challenging, and obtaining relevance judgments, which indicate how well retrieved documents match a query, is often costly and resource-intensive. Recently, generating synthetic datasets using Large Language Models (LLMs) has gained attention in various applications. While previous work has used LLMs to generate synthetic queries or documents to imp…

@arXiv_mathFA_bot@mastoxiv.page
2025-06-16 08:14:49

A Diestel-Faires type result for multimeasures
José Rodríguez
https://arxiv.org/abs/2506.11872 https://arxiv.org/pdf/2506…

A Diestel-Faires type result for multimeasures
Let $X$ be a real Banach space and let $Y \subseteq X^*$ be a linear subspace having the Orlicz-Thomas property, that is, for each $σ$-algebra $Σ$ and for each map $ν:Σ\to X$, the countable additivity of the composition $x^*\circ ν$ for all $x^*\in Y$ implies the countable additivity of $ν$. We show that the Orlicz-Thomas property allows one to test countable additivity of set-valued maps. Namely, if $M$ is a map defined on a $σ$-algebra $Σ$ whose values are convex, $σ(X,Y)$-compact, bound…

@arXiv_astrophGA_bot@mastoxiv.page
2025-06-12 08:41:31

Bar-driven dispersal of Galactic substructure
Adam M. Dillamore, Jason L. Sanders
https://arxiv.org/abs/2506.09117 https://arxiv.org/…

Bar-driven dispersal of Galactic substructure
Galactic archaeologists often assume that integrals of motion (IoMs) such as $L_z$ and $E$ are conserved, so substructure remains frozen in IoM space over many Gyr. However, this is not true in the Milky Way due in part to its rotating bar. In this study we quantify the effects of the bar on the dynamics of substructure. We employ three different theoretical models: an analytical toy model; a set of test particle simulations with steady and slowing bars; and a cosmological zoom-in simulation of…

@netzschleuder@social.skewed.de
2025-06-15 12:00:04

hiv_transmission: HIV transmission network (1988-2001)
A set of networks of HIV transmissions between people through sexual, needle-sharing, or social connections, based on combining 8 datasets collected from 1988 to 2001. Metadata includes test results of several diseases, as well as demographic variables such as age, ethnicity, and gender. Networks come in two flavors: egodyads and altdyads. Egodyads are the network among study-participants and their direct partners. Altdyads are the…

hiv_transmission — HIV transmission network (1988-2001)
A set of networks of HIV transmissions between people through sexual, needle-sharing, or social connections, based on combining 8 datasets collected from 1988 to 2001. Metadata includes test results of several diseases, as well as demographic variables such as age, ethnicity, and gender. Networks come in two flavors: egodyads and altdyads. Egodyads are the network among study-participants and their direct partners. Altdyads are the network among people who are connected to the egodyad network. …

@arXiv_quantph_bot@mastoxiv.page
2025-06-12 10:13:01

Certifying asymmetry in the configuration of three qubits
Abdelmalek Taoutioui, Gábor Drótos, Tamás Vértesi
https://arxiv.org/abs/2506.09939

Certifying asymmetry in the configuration of three qubits
We certify asymmetry in the configuration of the Bloch vectors of a set of three unknown qubit states within the dimensionally bounded prepare-and-measure scenario. To do this, we construct a linear witness from three simpler witnesses as building blocks, each featuring, along with two binary measurement settings, three preparations; two of them are associated with the certification task, while the third one serves as an auxiliary. The final witness is chosen to self-test some target configurat…

@arXiv_econEM_bot@mastoxiv.page
2025-05-30 07:21:50

Evaluating financial tail risk forecasts: Testing Equal Predictive Ability
Lukas Bauer
https://arxiv.org/abs/2505.23333 https://arxiv…

Evaluating financial tail risk forecasts: Testing Equal Predictive Ability
This paper provides comprehensive simulation results on the finite sample properties of the Diebold-Mariano (DM) test by Diebold and Mariano (1995) and the model confidence set (MCS) testing procedure by Hansen et al. (2011) applied to the asymmetric loss functions specific to financial tail risk forecasts, such as Value-at-Risk (VaR) and Expected Shortfall (ES). We focus on statistical loss functions that are strictly consistent in the sense of Gneiting (2011a). We find that the tests show lit…

@arXiv_csCL_bot@mastoxiv.page
2025-06-17 09:53:05

Refract ICL: Rethinking Example Selection in the Era of Million-Token Models
Arjun R. Akula, Kazuma Hashimoto, Krishna Srinivasan, Aditi Chaudhary, Karthik Raman, Michael Bendersky
https://arxiv.org/abs/2506.12346

Refract ICL: Rethinking Example Selection in the Era of Million-Token Models
The emergence of long-context large language models (LLMs) has enabled the use of hundreds, or even thousands, of demonstrations for in-context learning (ICL) - a previously impractical regime. This paper investigates whether traditional ICL selection strategies, which balance the similarity of ICL examples to the test input (using a text retriever) with diversity within the ICL set, remain effective when utilizing a large number of demonstrations. Our experiments demonstrate that, while longer…
