
A First Comprehensive Study of TurboQuant: Accuracy and Performance
12 min read
TurboQuant, a method for KV-cache quantization, has recently gained significant traction in the community because of the large GPU memory savings advertised for very low bit-width quantization of a...
