Using these new TensorRT-LLM optimizations, NVIDIA has pulled out a huge 2.4x performance leap with its current H100 AI GPU in MLPerf Inference 3.1 to 4.0 with GPT-J tests using an offline scenario.
The AI chip giant says the open-source software library, TensorRT-LLM, will double the H100’s performance for running inference on leading large language models when it comes out next month. Nvidia ...