Training Differentially Private Ad Prediction Models with Semi-Sensitive Features
Lynn Chua, Qiliang Cui, Badih Ghazi, Charlie Harrison, Pritish Kamath, Walid Krichene, Ravi Kumar, Pasin Manurangsi, Krishna Giri Narra, Amer Sinha, Avinash Varadarajan, Chiyuan Zhang
https://arXiv.org/abs/2401.15246
MEA-Defender: A Robust Watermark against Model Extraction Attack
Peizhuo Lv, Hualong Ma, Kai Chen, Jiachen Zhou, Shengzhi Zhang, Ruigang Liang, Shenchen Zhu, Pan Li, Yingjun Zhang
https://arXiv.org/abs/2401.15239
Introducing cosmosGPT: Monolingual Training for Turkish Language Models
H. Toprak Kesgin, M. Kaan Yuce, Eren Dogan, M. Egemen Uzun, Atahan Uz, H. Emre Seyrek, Ahmed Zeer, M. Fatih Amasyali
https://arxiv.org/abs/2404.17336
Training exercise:
Try to replace all `$foo->isBeep()` / `$foo->canBar()` etc. calls with an instanceof check: `$foo instanceof Beepable`, `$foo instanceof Bars`, etc.
What does that do to your data model? If you leverage parameter types instead of manual instanceof checks, how does that simplify your logic flow?
I don't expect it to work for every use case, especially in PHP, but it would be a valuable exercise to try.
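The post is about PHP, but the same refactoring reads almost identically in Python; a minimal sketch, where `Beepable`, `Bars`, `beep()` and the classes using them are hypothetical stand-ins echoing the post, not a real API:

```python
from abc import ABC, abstractmethod

class Beepable(ABC):
    """Capability interface replacing a boolean isBeep() query."""
    @abstractmethod
    def beep(self) -> None: ...

class Bars(ABC):
    """Capability interface replacing a boolean canBar() query."""
    @abstractmethod
    def bar(self) -> None: ...

class AlarmClock(Beepable):
    def beep(self) -> None:
        print("beep")

def notify(thing: object) -> None:
    # Instead of `if thing.is_beep(): ...`, check the capability itself.
    if isinstance(thing, Beepable):
        thing.beep()

def notify_typed(thing: Beepable) -> None:
    # Leaning on the parameter type removes the runtime check entirely:
    # callers can only pass something that already knows how to beep.
    thing.beep()

notify(AlarmClock())        # beeps
notify("not beepable")      # skipped by the isinstance check
notify_typed(AlarmClock())  # the signature guarantees the capability
```

The second variant is where the data-model question bites: once the capability lives in the type, objects that lack it never reach that branch at all.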
ProLLaMA: A Protein Large Language Model for Multi-Task Protein Language Processing
Liuzhenghao Lv, Zongying Lin, Hao Li, Yuyang Liu, Jiaxi Cui, Calvin Yu-Chian Chen, Li Yuan, Yonghong Tian
https://arxiv.org/abs/2402.16445 https://arxiv.org/pdf/2402.16445
Abstract: Large Language Models (LLMs), including GPT-x and LLaMA2, have achieved remarkable performance in multiple Natural Language Processing (NLP) tasks. Under the premise that protein sequences constitute the protein language, Protein Large Language Models (ProLLMs) trained on protein corpora excel at de novo protein sequence generation. However, as of now, unlike LLMs in NLP, no ProLLM is capable of multiple tasks in the Protein Language Processing (PLP) field. This prompts us to delineate the inherent limitations in current ProLLMs: (i) the lack of natural language capabilities, (ii) insufficient instruction understanding, and (iii) high training resource demands. To address these challenges, we introduce a training framework to transform any general LLM into a ProLLM capable of handling multiple PLP tasks. Specifically, our framework utilizes low-rank adaptation and employs a two-stage training approach, and it is distinguished by its universality, low overhead, and scalability. Through training under this framework, we propose the ProLLaMA model, the first known ProLLM to handle multiple PLP tasks simultaneously. Experiments show that ProLLaMA achieves state-of-the-art results in the unconditional protein sequence generation task. In the controllable protein sequence generation task, ProLLaMA can design novel proteins with desired functionalities. In the protein property prediction task, ProLLaMA achieves nearly 100% accuracy across many categories. The latter two tasks are beyond the reach of other ProLLMs. Code is available at https://github.com/Lyu6PosHao/ProLLaMA.
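The abstract only names the ingredients ("low-rank adaptation and a two-stage training approach"), so here is a rough sketch of what LoRA-adapting a general LLM looks like in practice, using Hugging Face PEFT; the base checkpoint, rank, and target modules are illustrative assumptions, not the paper's configuration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Illustrative base model; the abstract mentions LLaMA2-style general LLMs.
base = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Low-rank adapters: only a small number of extra parameters are trained,
# which is what keeps the overhead low when repurposing a general LLM.
lora_cfg = LoraConfig(
    r=16,                                 # adapter rank (assumed)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()

# A two-stage recipe would then run two such fine-tuning passes, e.g. first
# on a protein-sequence corpus and then on instruction-style PLP tasks.
```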
Efficient Online Learning for Networks of Two-Compartment Spiking Neurons
Yujia Yin, Xinyi Chen, Chenxiang Ma, Jibin Wu, Kay Chen Tan
https://arxiv.org/abs/2402.15969 https://arxiv.org/pdf/2402.15969
Abstract: The brain-inspired Spiking Neural Networks (SNNs) have garnered considerable research interest due to their superior performance and energy efficiency in processing temporal signals. Recently, a novel multi-compartment spiking neuron model, namely the Two-Compartment LIF (TC-LIF) model, has been proposed and exhibited a remarkable capacity for sequential modelling. However, training the TC-LIF model presents challenges stemming from the large memory consumption and the issue of gradient vanishing associated with the Backpropagation Through Time (BPTT) algorithm. To address these challenges, online learning methodologies emerge as a promising solution. Yet, to date, the application of online learning methods in SNNs has been predominantly confined to simplified Leaky Integrate-and-Fire (LIF) neuron models. In this paper, we present a novel online learning method specifically tailored for networks of TC-LIF neurons. Additionally, we propose a refined TC-LIF neuron model called Adaptive TC-LIF, which is carefully designed to enhance temporal information integration in online learning scenarios. Extensive experiments, conducted on various sequential benchmarks, demonstrate that our approach successfully preserves the superior sequential modeling capabilities of the TC-LIF neuron while incorporating the training efficiency and hardware friendliness of online learning. As a result, it offers a multitude of opportunities to leverage neuromorphic solutions for processing temporal signals.
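For readers unfamiliar with multi-compartment neurons, a toy two-compartment leaky integrate-and-fire update is sketched below: a dendritic compartment integrates input and drives a spiking soma. The decay constants, coupling, and reset rule are generic illustrations, not the TC-LIF parameterisation from the paper.

```python
import numpy as np

def two_compartment_lif_step(v_d, v_s, x, beta_d=0.9, beta_s=0.9,
                             coupling=0.5, threshold=1.0):
    """One discrete-time step of a toy two-compartment LIF neuron."""
    v_d = beta_d * v_d + x                # dendrite leaks and integrates input
    v_s = beta_s * v_s + coupling * v_d   # soma leaks and integrates dendritic drive
    spike = (v_s >= threshold).astype(v_s.dtype)
    v_s = v_s * (1.0 - spike)             # hard reset on spike
    return v_d, v_s, spike

# Drive a single neuron with a short random input sequence.
rng = np.random.default_rng(0)
v_d = np.zeros(1)
v_s = np.zeros(1)
for t in range(50):
    v_d, v_s, spike = two_compartment_lif_step(v_d, v_s, rng.normal(0.3, 0.1, 1))
    if spike.any():
        print(f"spike at step {t}")
```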
OpenAI expands its Custom Model training program with "assisted fine-tuning", letting organizations set up data training pipelines, evaluation systems, and more (Kyle Wiggers/TechCrunch)
https://techcrunch.com/2024/04/04/openai-expands…
From the Rundown newsletter, some interesting highlights from the Nvidia keynote:
The Blackwell B200 GPU delivers 30x the performance of its H100 GPU predecessor while using 25x less energy.
Nvidia said the Blackwell innovations will allow training models of up to 10T parameters.
Huang also revealed that GPT-4 contains 1.8T parameters and that 2000 Blackwell chips could finish training the model in 90 days.
The last point illustrates the enormous training costs of a model l…
That's not... those are not the units you want for that.
Distributionally Robust Safe Screening
Hiroyuki Hanada, Satoshi Akahane, Tatsuya Aoyama, Tomonari Tanaka, Yoshito Okura, Yu Inatsu, Noriaki Hashimoto, Taro Murayama, Lee Hanju, Shinya Kojima, Ichiro Takeuchi
https://arxiv.org/abs/2404.16328
Evaluation of pseudo-healthy image reconstruction for anomaly detection with deep generative models: Application to brain FDG PET
Ravi Hassanaly, Camille Brianceau, Maëlys Solal, Olivier Colliot, Ninon Burgos
https://arXiv.org/abs/2401.16363
(continued from previous post) ...a Blackwell GPU will cost $30,000 (minimum), so training a GPT-4-scale model with 2000 GPUs would cost approximately $60 million? (in 90 days, and that's a minimum, because there are also other costs)
#training #GPT4
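The arithmetic behind that figure, just to make the assumptions explicit (the ~$30,000 list price is the post's assumption, and this covers the hardware alone):

```python
num_gpus = 2_000
price_per_gpu_usd = 30_000          # assumed minimum price from the post
hardware_cost_usd = num_gpus * price_per_gpu_usd
print(f"${hardware_cost_usd:,}")    # $60,000,000 -> roughly $60M before power,
                                    # networking, and everything else
```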
A Codesign of Scheduling and Parallelization for Large Model Training in Heterogeneous Clusters
Chunyu Xue, Weihao Cui, Han Zhao, Quan Chen, Shulai Zhang, Pengyu Yang, Jing Yang, Shaobo Li, Minyi Guo
https://arxiv.org/abs/2403.16125
GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model
Haocheng Liu (IP Paris, LTCI, IDS, S2A), Teysir Baoueb (IP Paris, LTCI, IDS, S2A), Mathieu Fontaine (IP Paris, LTCI, IDS, S2A), Jonathan Le Roux (MERL), Gael Richard (IP Paris, LTCI, IDS, S2A)
https://arxiv.org/abs/2402.15516
CT-3DFlow: Leveraging 3D Normalizing Flows for Unsupervised Detection of Pathological Pulmonary CT scans
Aissam Djahnine, Alexandre Popoff, Emilien Jupin-Delevaux, Vincent Cottin, Olivier Nempont, Loic Boussel
https://arxiv.org/abs/2403.18514
Leveraging power of deep learning for fast and efficient elite pixel selection in time series SAR interferometry
Ashutosh Tiwari, Nitheshnirmal Sadhashivam, Leonard O. Ohenhen, Manoochehr Shirzaei
https://arxiv.org/abs/2402.17069
Debiasing Cardiac Imaging with Controlled Latent Diffusion Models
Grzegorz Skorupko, Richard Osuala, Zuzanna Szafranowska, Kaisar Kushibar, Nay Aung, Steffen E Petersen, Karim Lekadir, Polyxeni Gkontra
https://arxiv.org/abs/2403.19508
Bayesian Learned Models Can Detect Adversarial Malware For Free
Bao Gia Doan, Dang Quang Nguyen, Paul Montague, Tamas Abraham, Olivier De Vel, Seyit Camtepe, Salil S. Kanhere, Ehsan Abbasnejad, Damith C. Ranasinghe
https://arxiv.org/abs/2403.18309
How we won BraTS 2023 Adult Glioma challenge? Just faking it! Enhanced Synthetic Data Augmentation and Model Ensemble for brain tumour segmentation
André Ferreira, Naida Solak, Jianning Li, Philipp Dammann, Jens Kleesiek, Victor Alves, Jan Egger
https://arxiv.org/abs/2402.17317
Photon-counting CT using a Conditional Diffusion Model for Super-resolution and Texture-preservation
Christopher Wiedeman, Chuang Niu, Mengzhou Li, Bruno De Man, Jonathan S Maltz, Ge Wang
https://arxiv.org/abs/2402.16212
Investigating the Robustness of Vision Transformers against Label Noise in Medical Image Classification
Bidur Khanal, Prashant Shrestha, Sanskar Amgain, Bishesh Khanal, Binod Bhattarai, Cristian A. Linte
https://arxiv.org/abs/2402.16734
MiniMol: A Parameter-Efficient Foundation Model for Molecular Learning
Kerstin Kläser, Błażej Banaszewski, Samuel Maddrell-Mander, Callum McLean, Luis Müller, Ali Parviz, Shenyang Huang, Andrew Fitzgibbon
https://arxiv.org/abs/2404.14986