llama.node

Another Node.js binding for llama.cpp, designed to keep its API as close as possible to that of llama.rn.

Platform Support

  • macOS
    • arm64: CPU and Metal GPU acceleration
    • x86_64: CPU only
  • Windows (x86_64 and arm64)
    • CPU
    • GPU acceleration via Vulkan
  • Linux (x86_64 and arm64)
    • CPU
    • GPU acceleration via Vulkan
    • GPU acceleration via CUDA

Installation

npm install @fugood/llama.node

Usage

import { loadModel } from '@fugood/llama.node'

// Initialize a Llama context with the model (may take a while)
const context = await loadModel({
  model: 'path/to/gguf/model',
  use_mlock: true,
  n_ctx: 2048,
  n_gpu_layers: 1, // > 0: enable GPU
  // embedding: true, // use embedding
  // lib_variant: 'vulkan', // Change backend (see Lib Variants)
})

// Run a completion
const { text } = await context.completion(
  {
    prompt: 'This is a conversation between user and llama, a friendly chatbot. respond in simple markdown.\n\nUser: Hello!\nLlama:',
    n_predict: 100,
    stop: ['</s>', 'Llama:', 'User:'],
    // n_threads: 4,
  },
  (data) => {
    // Partial completion callback, called as tokens are generated
    const { token } = data
  },
)
console.log('Result:', text)
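
The partial completion callback in the example above can also be used to collect tokens as they stream in. The following is a minimal sketch, not part of the llama.node API: completeWithStreaming and onToken are hypothetical names, and it assumes only what the usage example shows, namely that context.completion(params, callback) resolves to an object with a text field and that each callback payload has a token field.

```javascript
// Hypothetical helper (not provided by llama.node): runs a completion,
// records every streamed token, and forwards each one to an optional handler.
async function completeWithStreaming(context, params, onToken) {
  const tokens = []
  const { text } = await context.completion(params, (data) => {
    // Assumption from the usage example: each partial result has a `token` field.
    tokens.push(data.token)
    if (onToken) onToken(data.token)
  })
  // `text` is the final result; `tokens` holds the pieces in arrival order.
  return { text, tokens }
}
```

With a context returned by loadModel, this might be called as completeWithStreaming(context, { prompt, n_predict: 100 }, (t) => process.stdout.write(String(t))) to print output while it is being generated.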

Lib Variants

  • default: General usage; no GPU support except on macOS (Metal)
  • vulkan: GPU support via Vulkan (Windows/Linux); may be unstable in some scenarios
  • cuda: GPU support via CUDA (Linux), built only for a limited set of compute capabilities (x86_64: 8.9, arm64: 8.7)
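
One way to choose a value for lib_variant at runtime is to branch on the platform, following the list above. This is only a sketch under stated assumptions: pickLibVariant and its preferCuda option are hypothetical and not part of llama.node, and the platform strings are the values Node.js reports in process.platform.

```javascript
// Hypothetical helper (not provided by llama.node): maps a Node.js platform
// string to one of the lib variants listed above.
// `preferCuda` is an assumed opt-in flag for machines with a supported NVIDIA GPU.
function pickLibVariant(platform, { preferCuda = false } = {}) {
  // macOS gets Metal acceleration through the default variant.
  if (platform === 'darwin') return 'default'
  // The cuda variant is listed for Linux only.
  if (platform === 'linux' && preferCuda) return 'cuda'
  // The vulkan variant is listed for both Windows and Linux.
  if (platform === 'linux' || platform === 'win32') return 'vulkan'
  // Any other platform falls back to the CPU-only default.
  return 'default'
}
```

The result could then be passed along with the other options, for example lib_variant: pickLibVariant(process.platform). Because the vulkan variant is described as potentially unstable, falling back to 'default' when model loading fails may be worth considering.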

License

MIT


Built and maintained by BRICKS.