You need to set `n_gpu_layers` when initializing `Llama()`, which offloads some of the layers to the GPU. If you have enough VRAM, just put an arbitrarily high number (or `-1` to offload everything), or decrease it until you stop running out of memory.
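A minimal sketch with llama-cpp-python, assuming a GPU-enabled build of the library; the model path is a placeholder:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.gguf",  # placeholder: path to your GGUF model
    n_gpu_layers=-1,  # -1 offloads all layers; lower this if you hit out-of-memory errors
)

output = llm("Q: What is the capital of France? A:", max_tokens=32)
print(output["choices"][0]["text"])
```

Note that offloading only works if llama-cpp-python was installed with GPU support compiled in; a CPU-only build silently ignores `n_gpu_layers`.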