Quantization Python - Search News

GitHub

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

AWQ search for accurate quantization. Pre-computed AWQ model zoo for LLMs (LLaMA, OPT, Vicuna, LLaVA; load to generate quantized weights). Memory-efficient 4-bit Linear in PyTorch. Efficient CUDA ...

IEEE

SearchQ: Search-Based Fine-Grained Quantization for Data-Free Model Compression

Abstract: The huge memory and computing costs of deep neural networks (DNNs) greatly hinder their deployment on resource-constrained devices with high efficiency. Quantization has emerged as an ...

IEEE

RefQSR: Reference-Based Quantization for Image Super-Resolution Networks

Abstract: Single image super-resolution (SISR) aims to reconstruct a high-resolution image from its low-resolution observation. Recent deep learning-based SISR models show high performance at the ...
