Performance of Confidential Computing GPUs
Antonio Martínez Ibarra, Julian James Stephen, Aurora González Vidal, K. R. Jayaram, Antonio Fernando Skarmeta Gómez
https://arxiv.org/abs/2505.16501
This work examines latency, throughput, and other metrics for inference on confidential GPUs. Using a single virtual machine with one NVIDIA H100 GPU, we explore different traffic patterns and scheduling strategies for relaxed batch inference across multiple Large Language Models (LLMs). The models must be swapped in and out of memory, a constraint that necessitates efficient control. The experiments simulate diverse real-world scenarios by varying parameters such as …