2024-02-02 18:43:34
Spain training Ukrainian military personnel on servicing Patriot missile system: https://benborges.xyz/2024/02/02/spain-training-ukrainian.html
How the EFF, Techdirt, MuckRock, and DDoSecrets are pushing back against legal threats aiming to censor reports on Appin's alleged hacker-for-hire past (Andy Greenberg/Wired)
https://www.wired.com/story/appin-training-centers-lawsuits-censorship/…
Fine Structure-Aware Sampling: A New Sampling Training Scheme for Pixel-Aligned Implicit Models in Single-View Human Reconstruction
Kennard Yanting Chan, Fayao Liu, Guosheng Lin, Chuan Sheng Foo, Weisi Lin
https://arxiv.org/abs/2402.19197
Wiley licenses content for training an #LLM. The company was not named, but I would suspect it's the one which has been signing a lot of licensing deals lately. Access to STM content could be a big differentiator, though I wouldn't expect it to be exclusive. Also, $23M sounds small.
This https://arxiv.org/abs/2403.13799 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
Sources: OpenAI discussed training GPT-5 on public YouTube video transcripts; AI companies struggle to find quality training data as publishers block access (Deepa Seetharaman/Wall Street Journal)
https://www.…
Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning
Lucas-Andreï Thil, Mirela Popa, Gerasimos Spanakis
https://arxiv.org/abs/2405.00516
🔊 #NowPlaying on BBCRadio2's #ClaudiaWinkleman
Dua Lipa:
🎵 Training Season
#DuaLipa
https://open.spotify.com/track/2UrKwpJwcoOlYZpooWD1hY
https://sielrecords.bandcamp.com/track/dua-lipa-training-season-2
A pretest-posttest pilot study for augmented reality-based physical-cognitive training in community-dwelling older adults at risk of mild cognitive impairment
Sirinun Chaipunko, Watthanaree Ammawat, Keerathi Oanmun, Wanvipha Hongnaphadol, Supatida Sorasak, Pattrawadee Makmee
https://arxiv.org/abs/2404.18970 https://arxiv.org/pdf/2404.18970
arXiv:2404.18970v1 Announce Type: new
Abstract: As cognitive interventions for older adults evolve, modern technologies are increasingly integrated into their development. This study investigates the efficacy of augmented reality (AR)-based physical-cognitive training using an interactive game with Kinect motion sensor technology on older individuals at risk of mild cognitive impairment. Utilizing a pretest-posttest experimental design, twenty participants (mean age 66.8 SD. = 4.6 years, age range 60-78 years) underwent eighteen individual training sessions, lasting 45 to 60 minutes each, conducted three times a week over a span of 1.5 months. The training modules from five activities, encompassing episodic and working memory, attention and inhibition, cognitive flexibility, and speed processing, were integrated with physical movement and culturally relevant Thai-context activities. Results revealed significant improvements in inhibition, cognitive flexibility, accuracy, and reaction time, with working memory demonstrating enhancements in accuracy albeit not in reaction time. These findings underscore the potential of AR interventions to bolster basic executive enhancement among community-dwelling older adults at risk of cognitive decline.
Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping
Chenyu Jiang, Ye Tian, Zhen Jia, Shuai Zheng, Chuan Wu, Yida Wang
https://arxiv.org/abs/2404.19429
Always be Pre-Training: Representation Learning for Network Intrusion Detection with GNNs
Zhengyao Gu, Diego Troy Lopez, Lilas Alrahis, Ozgur Sinanoglu
https://arxiv.org/abs/2402.18986
Update from Ukraine | Great news! Ruzzian Training base and Oil refinery Kapputed. ATACMS in action: https://benborges.xyz/2024/05/02/update-from-ukraine.html
I’m giving a (fairly exploratory) talk on the LLM training and inference stack and how to monitor it at Multicore World, Christchurch New Zealand on Feb 14th. We’re getting there and back by island hopping to explore some new places. Arrived in Tahiti today.
Yes, Your Sports Bra Really Can Restrict Your Breathing
https://www.outsideonline.com/health/training-performance/sports-bra-breathing-restriction-study/
Not new and not limited to cyber. I have seen it for decades. Do more with less money, fewer bodies and little to no training. It appears to work until ripe organic bovine waste product hits the fan and then the ones at the bottom get screwed over and fired.
https://www.…
Closed-loop training of static output feedback neural network controllers for large systems: A distillation case study
E. M. Turan, J. Jäschke
https://arxiv.org/abs/2402.19309
I just let Claude write a short article about :has() as a test and while it got browser support wrong (probably because of older training data) the code examples turned out far better than those in the latest piece by DigitalOcean on CSS-Tricks… 🫣
I am not a fan of having everything I create on every platform which I don’t completely own being auto-opt-in for them to monetize my labor for LLM (fake-#AI) training.
https://front-end.social/@benschwarz/1
Dual Dynamic Threshold Adjustment Strategy for Deep Metric Learning
Xiruo Jiang, Yazhou Yao, Sheng Liu, Fumin Shen, Liqiang Nie, Xiansheng Hua
https://arxiv.org/abs/2404.19282 https://arxiv.org/pdf/2404.19282
arXiv:2404.19282v1 Announce Type: new
Abstract: Loss functions and sample mining strategies are essential components in deep metric learning algorithms. However, the existing loss function or mining strategy often necessitate the incorporation of additional hyperparameters, notably the threshold, which defines whether the sample pair is informative. The threshold provides a stable numerical standard for determining whether to retain the pairs. It is a vital parameter to reduce the redundant sample pairs participating in training. Nonetheless, finding the optimal threshold can be a time-consuming endeavor, often requiring extensive grid searches. Because the threshold cannot be dynamically adjusted in the training stage, we should conduct plenty of repeated experiments to determine the threshold. Therefore, we introduce a novel approach for adjusting the thresholds associated with both the loss function and the sample mining strategy. We design a static Asymmetric Sample Mining Strategy (ASMS) and its dynamic version Adaptive Tolerance ASMS (AT-ASMS), tailored for sample mining methods. ASMS utilizes differentiated thresholds to address the problems (too few positive pairs and too many redundant negative pairs) caused by only applying a single threshold to filter samples. AT-ASMS can adaptively regulate the ratio of positive and negative pairs during training according to the ratio of the currently mined positive and negative pairs. This meta-learning-based threshold generation algorithm utilizes a single-step gradient descent to obtain new thresholds. We combine these two threshold adjustment algorithms to form the Dual Dynamic Threshold Adjustment Strategy (DDTAS). Experimental results show that our algorithm achieves competitive performance on CUB200, Cars196, and SOP datasets.
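The asymmetric-threshold idea in the abstract above can be sketched in a few lines. Everything here is an illustrative assumption (the threshold values, the distance scale, and the function name are not from the paper's ASMS implementation); it only shows why two thresholds mine pairs differently than one.

```python
import numpy as np

def asymmetric_mine(distances, labels_equal, pos_thr=0.7, neg_thr=0.3):
    """Toy asymmetric sample mining: keep only 'hard' pairs.

    Positives (same class) are informative when their distance is still
    large (> pos_thr); negatives when their distance is small (< neg_thr).
    Using separate thresholds avoids the single-threshold problem of too
    few positives / too many redundant negatives described above.
    distances and labels_equal are 1-D arrays over candidate pairs.
    """
    keep_pos = labels_equal & (distances > pos_thr)
    keep_neg = ~labels_equal & (distances < neg_thr)
    return keep_pos | keep_neg

d = np.array([0.9, 0.1, 0.5, 0.2, 0.8])
same = np.array([True, True, True, False, False])
mask = asymmetric_mine(d, same)
print(mask.tolist())  # -> [True, False, False, True, False]
```

The dynamic AT-ASMS variant would then adjust `pos_thr`/`neg_thr` during training based on the ratio of mined positives to negatives; this static sketch keeps them fixed.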
Years of training on Munich public transit and in its stations made it possible for me to nail the optimal transfer with the shortest walking route, pulling off a connection the DB Navigator considers impossible. Greetings from ICE 802 😁
If you liked our last year's short paper on #DeepLearning from #TrajectoryData, you'll love our new #preprint even more:
📝 "MobilityDL: A Review of Deep Learning From Trajecto…
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta, Anna Deichler, Jim O'Regan, Birger Moëll, Jonas Beskow, Gustav Eje Henter, Simon Alexanderson
https://arxiv.org/abs/2404.19622 https://arxiv.org/pdf/2404.19622
arXiv:2404.19622v1 Announce Type: new
Abstract: Although humans engaged in face-to-face conversation simultaneously communicate both verbally and non-verbally, methods for joint and unified synthesis of speech audio and co-speech 3D gesture motion from text are a new and emerging field. These technologies hold great promise for more human-like, efficient, expressive, and robust synthetic communication, but are currently held back by the lack of suitably large datasets, as existing methods are trained on parallel data from all constituent modalities. Inspired by student-teacher methods, we propose a straightforward solution to the data shortage, by simply synthesising additional training material. Specifically, we use unimodal synthesis models trained on large datasets to create multimodal (but synthetic) parallel training data, and then pre-train a joint synthesis model on that material. In addition, we propose a new synthesis architecture that adds better and more controllable prosody modelling to the state-of-the-art method in the field. Our results confirm that pre-training on large amounts of synthetic data improves the quality of both the speech and the motion synthesised by the multimodal model, with the proposed architecture yielding further benefits when pre-trained on the synthetic data. See https://shivammehta25.github.io/MAGI/ for example output.
Automatic Cardiac Pathology Recognition in Echocardiography Images Using Higher Order Dynamic Mode Decomposition and a Vision Transformer for Small Datasets
Andrés Bell-Navas, Nourelhouda Groun, María Villalba-Orero, Enrique Lara-Pezzi, Jesús Garicano-Mena, Soledad Le Clainche
https://arxiv.org/abs/2404.19579 https://arxiv.org/pdf/2404.19579
arXiv:2404.19579v1 Announce Type: new
Abstract: Heart diseases are the main international cause of human defunction. According to the WHO, nearly 18 million people decease each year because of heart diseases. Also considering the increase of medical data, much pressure is put on the health industry to develop systems for early and accurate heart disease recognition. In this work, an automatic cardiac pathology recognition system based on a novel deep learning framework is proposed, which analyses in real-time echocardiography video sequences. The system works in two stages. The first one transforms the data included in a database of echocardiography sequences into a machine-learning-compatible collection of annotated images which can be used in the training stage of any kind of machine learning-based framework, and more specifically with deep learning. This includes the use of the Higher Order Dynamic Mode Decomposition (HODMD) algorithm, for the first time to the authors' knowledge, for both data augmentation and feature extraction in the medical field. The second stage is focused on building and training a Vision Transformer (ViT), barely explored in the related literature. The ViT is adapted for an effective training from scratch, even with small datasets. The designed neural network analyses images from an echocardiography sequence to predict the heart state. The results obtained show the superiority of the proposed system and the efficacy of the HODMD algorithm, even outperforming pretrained Convolutional Neural Networks (CNNs), which are so far the method of choice in the literature.
This https://arxiv.org/abs/2402.08523 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_qu…
Fortunately, the #Kundenglücklichmachungsposteinlieferungsgeräusch (the make-the-customer-happy mail-delivery sound) finds its own way out...
Ms. #Atelierhütehund (studio guard dog) is indulging in her training for #faulsterborderderwelt (laziest Border Collie in the world).
Saints to hold 2024 training camp at University of California, Irvine https://www.nfl.com/news/saints-to-hold-2024-training-camp-at-university-of-california-irvine
This https://arxiv.org/abs/2308.14947 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…
This makes me happy. I've always liked Dom. I was at his very first spring training game for the Mets.
MLB Insider: Red Sox to sign former Mets, Cubs first baseman Dom Smith https://fansided.com/posts/mlb-insider-red-so…
I’m attending The Democratic National Committee’s event, “All Aboard: the Training Series - What's at stake with the extreme MAGA GOP agenda” – sign up now to join me! https://events.democrats.org/event/607046/?
"KI befindet sich auf einem technischen Plateau, #gpt5 wird nicht merklich besser sein als GPT-4, wenn es überhaupt jemals erscheint"
>> Es passiert nicht oft, dass man eine "Alchemie-Phase" einer Technologie live miterleben kann, wo die Anwendung die erklärende Kraft der dahinterliegenden Theorien (zeitweise) übersteigt.
Momentan werden immer noch wöchentlic…
EyeEm: delete your photos, or they become AI training material
Anyone who does not want their photos to become training material for AI has to delete them from the EyeEm platform. The terms of service now provide for exactly that.
https://www.h…
Socrates said:
“No man has the right to be an amateur in the matter of physical training. It is a shame for a man to grow old without seeing the beauty and strength of which his body is capable.”
The Greeks invented #calisthenics, the term coming from the words kállos (κάλλος), "beauty," and sthenos (σθένος), "strength." It is the art of using one's body …
Joint Training and Reflection Pattern Optimization for Non-Ideal RIS-Aided Multiuser Systems
Zhenyao He, Jindan Xu, Hong Shen, Wei Xu, Chau Yuen, Marco Di Renzo
https://arxiv.org/abs/2403.19955
#OH on slack:
#infosec #cybersecurity
Better & Faster Large Language Models via Multi-token Prediction
Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve
https://arxiv.org/abs/2404.19737 https://arxiv.org/pdf/2404.19737
arXiv:2404.19737v1 Announce Type: new
Abstract: Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency. More specifically, at each position in the training corpus, we ask the model to predict the following n tokens using n independent output heads, operating on top of a shared model trunk. Considering multi-token prediction as an auxiliary training task, we measure improved downstream capabilities with no overhead in training time for both code and natural language models. The method is increasingly useful for larger model sizes, and keeps its appeal when training for multiple epochs. Gains are especially pronounced on generative benchmarks like coding, where our models consistently outperform strong baselines by several percentage points. Our 13B parameter models solve 12% more problems on HumanEval and 17% more on MBPP than comparable next-token models. Experiments on small algorithmic tasks demonstrate that multi-token prediction is favorable for the development of induction heads and algorithmic reasoning capabilities. As an additional benefit, models trained with 4-token prediction are up to 3 times faster at inference, even with large batch sizes.
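A rough numpy sketch of the setup the abstract describes: n independent output heads on a shared trunk, with the training loss summing cross-entropies over the next n tokens at one position. The layer sizes, the tanh trunk, and the toy targets are assumptions of this sketch, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, n_heads = 16, 50, 4  # toy sizes (assumed)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Shared trunk producing one hidden representation per position.
W_trunk = rng.normal(scale=0.1, size=(d_model, d_model))
# n independent output heads, head k predicting token t+1+k.
W_heads = rng.normal(scale=0.1, size=(n_heads, d_model, vocab))

def multi_token_loss(h, future_tokens):
    """Summed cross-entropy over the next n tokens at one position.

    h: (d_model,) input representation; future_tokens: n target ids.
    """
    trunk = np.tanh(h @ W_trunk)          # shared computation
    loss = 0.0
    for k in range(n_heads):
        probs = softmax(trunk @ W_heads[k])   # head k's distribution
        loss += -np.log(probs[future_tokens[k]] + 1e-12)
    return loss

h = rng.normal(size=(d_model,))
targets = [3, 17, 8, 42]                  # next 4 token ids (toy data)
loss = multi_token_loss(h, targets)
```

At inference one can keep only the first head and recover an ordinary next-token model, which is why the extra heads add no deployment cost.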
Registration is open for the online course “Ministering to Those Impacted by Suicide.” The course meets 4 Mondays in a row, beginning April 15.
It is estimated that 50% of individuals in any congregation have been impacted by suicide in some way. This training will take a deep dive into the topic and expand the thinking of leaders about how churches and organizations can proactively address the hopelessness and isolation that contribute to suicidality.
One model to use them all: Training a segmentation model with complementary datasets
Alexander C. Jenke, Sebastian Bodenstedt, Fiona R. Kolbinger, Marius Distler, Jürgen Weitz, Stefanie Speidel
https://arxiv.org/abs/2402.19340
Scaling and renormalization in high-dimensional regression
Alexander B. Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan
https://arxiv.org/abs/2405.00592 …
Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition
Dongyuan Li, Ying Zhang, Yusong Wang, Kotaro Funakoshi, Manabu Okumura
https://arxiv.org/abs/2405.00307
📢 Ver.di calls Lufthansa ground staff out on a multi-day warning strike
In the pay dispute over Lufthansa's ground staff, the ver.di union has announced a multi-day warning strike starting Wednesday. Those called to the three-day nationwide walkout running until Friday include employees and trainees of Lufthansa Technik, Lufthansa Aviation Training, and Lufthansa Technical Training, ver.di said.
➡️
Employing Federated Learning for Training Autonomous HVAC Systems
Fredrik Hagstr\"om, Vikas Garg, Fabricio Oliveira
https://arxiv.org/abs/2405.00389 h…
“Isometric exercises can help reduce blood pressure even more effectively than aerobic activity, weight training, or high-intensity interval workouts.”
#Health #Fitness #Longevity
Sources: OpenAI discussed training GPT-5 on public YouTube video transcripts; AI industry's need for high-quality text data may outstrip supply within two years (Deepa Seetharaman/Wall Street Journal)
https://www.
I will give a research talk today (Friday, the 1st of March) at 11:00 (CET = UTC+1) in our local Functional Programming research group seminar.
There is a Zoom link for the stream which should hopefully work: https://chalmers.zoom.us/j/65586341322 (Password: f,p.t:a;l,k without the pun…
Early call to gauge interest: Who's seriously interested in attending a paid training on oscilloscope probing theory and practice, some time in the early summer at my lab just outside Seattle? It will be a one-day, in-person event including 4 hours of lecture and lots of hands-on lab time for everyone on some very nice equipment (16 GHz oscilloscope, 28 Gbps BERT, multiple VNAs, and more).
I've done a few test runs with friends and I think I've finally got it refined to th…
EEG classifier cross-task transfer to avoid training sessions in robot-assisted rehabilitation
Niklas Kueper, Su Kyoung Kim, Elsa Andrea Kirchner
https://arxiv.org/abs/2402.17790 …
Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods
Steven Shave, Richard Kasprowicz, Abdullah M. Athar, Denise Vlachou, Neil O. Carragher, Cuong Q. Nguyen
https://arxiv.org/abs/2404.18960
I'm delighted to be launching 'Mastodon for organisations' training on behalf of @…
https://afallen.cymru/2024/04/24/intro
Spyx: A Library for Just-In-Time Compiled Optimization of Spiking Neural Networks
Kade M. Heckel, Thomas Nowotny
https://arxiv.org/abs/2402.18994 https://<…
"Training Data for the Price of a Sandwich – Common Crawl’s Impact on Generative AI "
https://foundation.mozilla.org/de/research/library/generative-ai-training-data/common-crawl/
Can DC Reporters Overcome Their Trumper Shock-Training? https://talkingpointsmemo.com/edblog/can-dc-reporters-overcome-their-trumper-shock-training
Had to sign into facebook for first time in *years* for a kid's thing.
Man that site is *terrible* 🤢
Even worse than instagram.
Overloaded by adverts, "celebrity" stories and "AI" spam pretending to be real photos... I can only hope these count as training critical thinking.. But for goodness sake, WHY do people still insist on using it...?
Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality
Siyu Chen, Heejune Sheen, Tianhao Wang, Zhuoran Yang
https://arxiv.org/abs/2402.19442
This https://arxiv.org/abs/2401.17010 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models
Chen Qian, Jie Zhang, Wei Yao, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu, Jing Shao
https://arxiv.org/abs/2402.19465
Elliptic, MIT, and IBM release an experimental new AI detection model and its 200M-transaction training dataset to help identify Bitcoin money laundering (Andy Greenberg/Wired)
https://www.wired.com/story/ai-crypto-tracing-model-money-laundering/
This https://arxiv.org/abs/2310.09617 has been replaced.
link: https://scholar.google.com/scholar?q=a
Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts
Cansu Korkmaz, A. Murat Tekalp, Zafer Dogan
https://arxiv.org/abs/2402.19215
This https://arxiv.org/abs/2404.12538 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCV_…
Moldovan military holds JCET drills with US, Romania: https://benborges.xyz/2024/04/01/moldovan-military-holds.html
Saints moving camp due to renovation of 'F' caf https://www.espn.com/nfl/story/_/id/39833555/saints-moving-training-camp-due-renovation-f-cafeteria
This https://arxiv.org/abs/2310.13828 has been replaced.
link: https://scholar.google.com/scholar?q=a
A look at Project Maven, the US DOD's flagship AI effort which identifies battlefield targets, and at concerns, including adversaries poisoning training data (Katrina Manson/Bloomberg)
https://www.bloomberg.com/features/2024-ai-warfare-project-maven/…
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar
https://arxiv.org/abs/2402.19446
Training Classical Neural Networks by Quantum Machine Learning
Chen-Yu Liu, En-Jui Kuo, Chu-Hsuan Abraham Lin, Sean Chen, Jason Gemsun Young, Yeong-Jar Chang, Min-Hsiu Hsieh
https://arxiv.org/abs/2402.16465
Safe Training with Sensitive In-domain Data: Leveraging Data Fragmentation To Mitigate Linkage Attacks
Mariia Ignashina, Julia Ive
https://arxiv.org/abs/2404.19486 https://arxiv.org/pdf/2404.19486
arXiv:2404.19486v1 Announce Type: new
Abstract: Current text generation models are trained using real data which can potentially contain sensitive information, such as confidential patient information and the like. Under certain conditions output of the training data which they have memorised can be triggered, exposing sensitive data. To mitigate against this risk we propose a safer alternative which sees fragmented data in the form of domain-specific short phrases randomly grouped together shared instead of full texts. Thus, text fragments that could re-identify an individual cannot be reproduced by the model in one sequence, giving significant protection against linkage attacks. We fine-tune several state-of-the-art LLMs using meaningful syntactic chunks to explore their utility. In particular, we fine-tune BERT-based models to predict two cardiovascular diagnoses. Our results demonstrate the capacity of LLMs to benefit from the pre-trained knowledge and deliver classification results when fine-tuned with fragmented data comparable to fine-tuning with full training data.
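A minimal sketch of the fragmentation idea above, under the simplifying assumption that chunks are fixed-length word windows rather than the paper's meaningful syntactic chunks; the function name and toy records are hypothetical.

```python
import random

def fragment_corpus(texts, chunk_len=4, seed=0):
    """Toy fragmentation-based sharing: split each text into short
    word chunks, then shuffle chunks across the whole corpus so that
    no single training sequence can re-identify one source document
    (the linkage-attack mitigation described above).
    """
    chunks = []
    for t in texts:
        words = t.split()
        chunks += [" ".join(words[i:i + chunk_len])
                   for i in range(0, len(words), chunk_len)]
    random.Random(seed).shuffle(chunks)  # decouple chunks from sources
    return chunks

docs = ["patient a has hypertension and was prescribed beta blockers",
        "patient b reported chest pain radiating to the left arm"]
frags = fragment_corpus(docs)
```

The shuffled `frags` list would then replace the full documents as fine-tuning input; every word survives, but no sequence reproduces one patient's record end to end.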
Leveraging AI Predicted and Expert Revised Annotations in Interactive Segmentation: Continual Tuning or Full Training?
Tiezheng Zhang, Xiaoxi Chen, Chongyu Qu, Alan Yuille, Zongwei Zhou
https://arxiv.org/abs/2402.19423
This https://arxiv.org/abs/2304.01705 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…
Homomorphic WiSARDs: Efficient Weightless Neural Network training over encrypted data
Leonardo Neumann, Antonio Guimarães, Diego F. Aranha, Edson Borin
https://arxiv.org/abs/2403.20190
This https://arxiv.org/abs/2404.10255 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
This https://arxiv.org/abs/2311.08590 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning
Ahmed Agiza, Marina Neseem, Sherief Reda
https://arxiv.org/abs/2403.20320 https://…
Extending Llama-3's Context Ten-Fold Overnight
Peitian Zhang, Ninglu Shao, Zheng Liu, Shitao Xiao, Hongjin Qian, Qiwei Ye, Zhicheng Dou
https://arxiv.org/abs/2404.19553 https://arxiv.org/pdf/2404.19553
arXiv:2404.19553v1 Announce Type: new
Abstract: We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is super efficient, taking 8 hours on one 8xA800 (80G) GPU machine. The resulting model exhibits superior performance across a broad range of evaluation tasks, such as NIHS, topic retrieval, and long-context language understanding; meanwhile, it also well preserves the original capability over short contexts. The dramatic context extension is mainly attributed to merely 3.5K synthetic training samples generated by GPT-4, which indicates the LLMs' inherent (yet largely underestimated) potential to extend their original context length. In fact, the context length could be extended far beyond 80K with more computation resources. Therefore, the team will publicly release the entire resources (including data, model, data generation pipeline, training code) so as to facilitate the future research from the community: \url{https://github.com/FlagOpen/FlagEmbedding}.
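The LoRA mechanism underlying QLoRA fine-tuning can be illustrated with a toy forward pass. The dimensions, rank, and scaling below are assumptions for illustration, and QLoRA additionally quantizes the frozen base weights to 4 bits, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 16  # toy sizes (assumed)

W = rng.normal(size=(d_out, d_in))          # frozen base weight
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable, low rank
B = np.zeros((d_out, r))                    # zero init: start == base model

def lora_forward(x):
    """y = W x + (alpha/r) * B A x.

    Only A and B are trained, so fine-tuning touches
    d_out*r + r*d_in parameters instead of d_out*d_in --
    the efficiency behind LoRA/QLoRA-style adaptation.
    """
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B = 0 the adapter contributes nothing yet:
assert np.allclose(lora_forward(x), W @ x)
```

After training, `B @ A` can be merged into `W` once, so inference pays no extra cost.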
VIXEN: Visual Text Comparison Network for Image Difference Captioning
Alexander Black, Jing Shi, Yifei Fan, Tu Bui, John Collomosse
https://arxiv.org/abs/2402.19119
This https://arxiv.org/abs/2402.00786 has been replaced.
link: https://scholar.google.com/scholar?q=a
Distributed Stochastic Optimization of a Neural Representation Network for Time-Space Tomography Reconstruction
K. Aditya Mohan, Massimiliano Ferrucci, Chuck Divin, Garrett A. Stevenson, Hyojin Kim
https://arxiv.org/abs/2404.19075 https://arxiv.org/pdf/2404.19075
arXiv:2404.19075v1 Announce Type: new
Abstract: 4D time-space reconstruction of dynamic events or deforming objects using X-ray computed tomography (CT) is an extremely ill-posed inverse problem. Existing approaches assume that the object remains static for the duration of several tens or hundreds of X-ray projection measurement images (reconstruction of consecutive limited-angle CT scans). However, this is an unrealistic assumption for many in-situ experiments that causes spurious artifacts and inaccurate morphological reconstructions of the object. To solve this problem, we propose to perform a 4D time-space reconstruction using a distributed implicit neural representation (DINR) network that is trained using a novel distributed stochastic training algorithm. Our DINR network learns to reconstruct the object at its output by iterative optimization of its network parameters such that the measured projection images best match the output of the CT forward measurement model. We use a continuous time and space forward measurement model that is a function of the DINR outputs at a sparsely sampled set of continuous valued object coordinates. Unlike existing state-of-the-art neural representation architectures that forward and back propagate through dense voxel grids that sample the object's entire time-space coordinates, we only propagate through the DINR at a small subset of object coordinates in each iteration resulting in an order-of-magnitude reduction in memory and compute for training. DINR leverages distributed computation across several compute nodes and GPUs to produce high-fidelity 4D time-space reconstructions even for extremely large CT data sizes. We use both simulated parallel-beam and experimental cone-beam X-ray CT datasets to demonstrate the superior performance of our approach.
Alibaba is cutting prices for over 100 core cloud products in China by up to 55%, and 20% on average, including data storage and elastic computing products (Bloomberg)
https://www.bloomberg.com/news/articles/2024-02…
This https://arxiv.org/abs/2402.17532 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
Challenges in Pre-Training Graph Neural Networks for Context-Based Fake News Detection: An Evaluation of Current Strategies and Resource Limitations
Gregor Donabauer, Udo Kruschwitz
https://arxiv.org/abs/2402.18179
Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation
Shicheng Xu, Liang Pang, Mo Yu, Fandong Meng, Huawei Shen, Xueqi Cheng, Jie Zhou
https://arxiv.org/abs/2402.18150
Sõnajaht: Definition Embeddings and Semantic Search for Reverse Dictionary Creation
Aleksei Dorkin, Kairit Sirts
https://arxiv.org/abs/2404.19430 https://arxiv.org/pdf/2404.19430
arXiv:2404.19430v1 Announce Type: new
Abstract: We present an information retrieval based reverse dictionary system using modern pre-trained language models and approximate nearest neighbors search algorithms. The proposed approach is applied to an existing Estonian language lexicon resource, Sõnaveeb (word web), with the purpose of enhancing and enriching it by introducing cross-lingual reverse dictionary functionality powered by semantic search.
The performance of the system is evaluated using both an existing labeled English dataset of words and definitions that is extended to contain also Estonian and Russian translations, and a novel unlabeled evaluation approach that extracts the evaluation data from the lexicon resource itself using synonymy relations.
Evaluation results indicate that the information retrieval based semantic search approach without any model training is feasible, producing median rank of 1 in the monolingual setting and median rank of 2 in the cross-lingual setting using the unlabeled evaluation approach, with models trained for cross-lingual retrieval and including Estonian in their training data showing superior performance in our particular task.
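The retrieval core of such a reverse dictionary (embed the user's definition, rank vocabulary entries by cosine similarity) can be sketched as follows. The random vectors stand in for pre-trained definition embeddings and are an assumption of this sketch; a real system would use an approximate nearest-neighbour index rather than the brute-force scan shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["dog", "run", "happy", "rain"]
# Stand-in for pre-trained definition embeddings, L2-normalised
# so a dot product equals cosine similarity.
word_vecs = rng.normal(size=(len(vocab), 8))
word_vecs /= np.linalg.norm(word_vecs, axis=1, keepdims=True)

def reverse_lookup(query_vec, k=2):
    """Reverse dictionary as nearest-neighbour search: rank every
    vocabulary word by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    sims = word_vecs @ q
    top = np.argsort(-sims)[:k]
    return [(vocab[i], float(sims[i])) for i in top]

# A query embedding close to "dog" should rank "dog" first.
query = word_vecs[0] + 0.01 * rng.normal(size=8)
results = reverse_lookup(query)
```

Cross-lingual lookup then only requires that the embedding model place Estonian, English, and Russian definitions in the same vector space; the search code itself is unchanged.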