Update README.md
Add an example of using the /completions endpoint with Curl or Javascript - shows how to use Llamafile to do code completion. Could consider posting this on the main README.md as well.
heaversm authored Nov 22, 2024
1 parent b0b4613 commit 6f2aab2
Showing 1 changed file with 62 additions and 3 deletions.
65 changes: 62 additions & 3 deletions llama.cpp/server/README.md
@@ -189,6 +189,65 @@ node index.js

`system_prompt`: Change the system prompt (initial prompt of all slots), this is useful for chat applications. [See more](#change-system-prompt-on-runtime)

### Examples

**Code Completion**

You can use the `/completion` endpoint for code completion (fill-in-the-middle, or FIM, completion) with the following prompt syntax:

<details>
<summary>Curl API Client Example</summary>

```bash
curl 'http://127.0.0.1:8080/completion' \
  -X POST -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" --data-binary \
  '{
    "model": "LlaMA_CPP",
    "stream": false,
    "prompt": "<|fim_prefix|>[CODE_BEFORE_CURSOR]<|fim_suffix|>[CODE_AFTER_CURSOR]<|fim_middle|>",
    "temperature": 0.1,
    "n_predict": 512,
    "cache_prompt": true,
    "stop": ["<|fim_middle|>", "\n\n", "<|endoftext|>"]
  }'
```
</details>

<details>
<summary>TypeScript API Client Example</summary>

```typescript
const generateCompletion = async (prefix: string, suffix: string) => {
  try {
    const response = await fetch('http://127.0.0.1:8080/completion', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: 'Bearer no-key',
      },
      body: JSON.stringify({
        model: 'LlaMA_CPP',
        stream: false,
        prompt: `<|fim_prefix|>${prefix}<|fim_suffix|>${suffix}<|fim_middle|>`,
        temperature: 0.1,
        n_predict: 512,
        cache_prompt: true,
        stop: ['<|fim_middle|>', '\n\n', '<|endoftext|>'],
      }),
    });
    // Return the generated middle section from the server's JSON response.
    const data = await response.json();
    return data.content;
  } catch (error) {
    console.error('Completion error:', error);
    return null;
  }
};

const completionResult = await generateCompletion('[YOUR_PREFIX]', '[YOUR_SUFFIX]');
```
</details>
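The same request can also be sketched in plain Python using only the standard library. This is a minimal sketch mirroring the curl example above; `build_fim_payload` and `generate_completion` are illustrative helper names, not part of the server API:

```python
import json
import urllib.request


def build_fim_payload(prefix: str, suffix: str) -> dict:
    """Assemble the fill-in-the-middle request body (same fields as the curl example)."""
    return {
        "model": "LlaMA_CPP",
        "stream": False,
        "prompt": f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>",
        "temperature": 0.1,
        "n_predict": 512,
        "cache_prompt": True,
        "stop": ["<|fim_middle|>", "\n\n", "<|endoftext|>"],
    }


def generate_completion(prefix: str, suffix: str,
                        url: str = "http://127.0.0.1:8080/completion") -> str:
    """POST the FIM payload and return the generated middle section."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_fim_payload(prefix, suffix)).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer no-key"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

As in the other examples, the generated text is read from the `content` field of the response.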


### Result JSON

* Note: When using streaming mode (`stream`), only `content` and `stop` will be returned until the end of the completion.
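When `stream` is enabled, the response arrives incrementally as server-sent-events-style `data: {...}` lines. A minimal sketch of decoding one such chunk, assuming that framing (`parse_stream_chunk` is an illustrative name):

```python
import json


def parse_stream_chunk(line: str):
    """Decode one streamed line from /completion.

    Returns (content, stop) for a `data: {...}` line, or None for
    anything else (blank keep-alive lines, etc.). In streaming mode
    each chunk carries only `content` and `stop`.
    """
    line = line.strip()
    if not line.startswith("data: "):
        return None
    chunk = json.loads(line[len("data: "):])
    return chunk.get("content", ""), chunk.get("stop", False)
```

A client would call this on each line of the response body, appending `content` until a chunk reports `stop` as true.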
@@ -274,20 +333,20 @@ Notice that each `probs` is an array of length `n_probs`.

```python
import openai

client = openai.OpenAI(
base_url="http://localhost:8080/v1", # "http://<Your api-server IP>:port"
    api_key="sk-no-key-required"
)

completion = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{"role": "system", "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."},
{"role": "user", "content": "Write a limerick about python exceptions"}
]
)

print(completion.choices[0].message)
```
... or raw HTTP requests:
