Documentation Spelling/Grammar #52

Merged
4 changes: 2 additions & 2 deletions docs/Architecture.md
@@ -12,9 +12,9 @@ The figure below shows the core framework structure, which is separated to four

![structure_image](media/structure.jpg)

## Recommended usings
## Recommended Use

Since `LLamaModel` interact with native library, it's not recommended to use the methods of it directly unless you know what you are doing. So does the `NativeApi`, which is not included in the arcitecher figure above.
Since `LLamaModel` interacts with the native library, it's not recommended to use its methods directly unless you know what you are doing. The same goes for `NativeApi`, which is not included in the architecture figure above.

`ChatSession` is recommended when you want to build an application similar to ChatGPT or another chatbot, because it works best with `InteractiveExecutor`. Though other executors may also be passed as a parameter to initialize a `ChatSession`, it's not encouraged if you are new to LLamaSharp and LLM.
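In practice, that layering looks like the following sketch (the model path is a placeholder):

```cs
using LLama;
using LLama.Common;

// Recommended layering: wrap the model in an executor, and the executor in a
// ChatSession, instead of calling LLamaModel (or NativeApi) methods directly.
var model = new LLamaModel(new ModelParams("<path to your model>.bin"));
var executor = new InteractiveExecutor(model);
var session = new ChatSession(executor);
```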

4 changes: 2 additions & 2 deletions docs/ChatSession/basic-usages.md
@@ -1,10 +1,10 @@
# Basic usages of ChatSession

`ChatSession` is a higher-level absatrction than the executors. In the context of a chat application like ChatGPT, a "chat session" refers to an interactive conversation or exchange of messages between the user and the chatbot. It represents a continuous flow of communication where the user enters input or asks questions, and the chatbot responds accordingly. A chat session typically starts when the user initiates a conversation with the chatbot and continues until the interaction comes to a natural end or is explicitly terminated by either the user or the system. During a chat session, the chatbot maintains the context of the conversation, remembers previous messages, and generates appropriate responses based on the user's inputs and the ongoing dialogue.
`ChatSession` is a higher-level abstraction than the executors. In the context of a chat application like ChatGPT, a "chat session" refers to an interactive conversation or exchange of messages between the user and the chatbot. It represents a continuous flow of communication where the user enters input or asks questions, and the chatbot responds accordingly. A chat session typically starts when the user initiates a conversation with the chatbot and continues until the interaction comes to a natural end or is explicitly terminated by either the user or the system. During a chat session, the chatbot maintains the context of the conversation, remembers previous messages, and generates appropriate responses based on the user's inputs and the ongoing dialogue.

## Initialize a session

Currently, the only parameter that is accepted is an `ILLamaExecutor`, because this is the only parameter that we're sure to exist in all the future versions. Since it's the high-level absatrction, we're conservative to the API designs. In the future, there may be more kinds of constructors added.
Currently, the only parameter that is accepted is an `ILLamaExecutor`, because this is the only parameter that we're sure will exist in all future versions. Since it's the high-level abstraction, we're conservative about the API design. In the future, more kinds of constructors may be added.

```cs
InteractiveExecutor ex = new(new LLamaModel(new ModelParams(modelPath)));
ChatSession session = new ChatSession(ex); // the executor is the only required parameter
```
2 changes: 1 addition & 1 deletion docs/ChatSession/save-load-session.md
@@ -2,7 +2,7 @@

Generally, chat sessions may need to be switched, which requires the ability to load and save sessions.

When building a chat bot app, it's **NOT encouraged** to initialize many chat sessions and keep them in memory to wait for being switched, because the memory comsumption of both CPU and GPU is expensive. It's recommended to save the current session before switching to a new session, and load the file when switching back to the session.
When building a chat bot app, it's **NOT encouraged** to initialize many chat sessions and keep them in memory to wait for being switched, because the memory consumption of both CPU and GPU is expensive. It's recommended to save the current session before switching to a new session, and load the file when switching back to the session.

The API is also quite simple: the files will be saved into a directory you specify. If the path does not exist, a new directory will be created.
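A minimal sketch of how saving and loading could look, assuming `SaveSession`/`LoadSession` methods that take the directory path described above (both names and the path are assumptions here):

```cs
// Save the current session to disk before switching to another one.
// (Sketch only: the SaveSession/LoadSession names and the path are assumptions.)
session.SaveSession("./sessions/bob");

// ...later, restore it when switching back.
session.LoadSession("./sessions/bob");
```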

2 changes: 1 addition & 1 deletion docs/ChatSession/transforms.md
@@ -57,7 +57,7 @@ Different from the input transform pipeline, the output transform only supports
session.WithOutputTransform(new MyOutputTransform());
```

Here's an example of how to implement the interface. In this example, the transform detects wether there's some keywords in the response and removes them.
Here's an example of how to implement the interface. In this example, the transform detects whether certain keywords appear in the response and removes them.

```cs
/// <summary>
/// A transform that detects keywords in the response and removes them.
/// </summary>
```
10 changes: 5 additions & 5 deletions docs/ContributingGuide.md
@@ -21,10 +21,10 @@ After running `cmake --build . --config Release`, you could find the `llama.dll`

## Add a new feature to LLamaSharp

After refactoring the framework in `v0.4.0`, LLamaSharp will try to maintain the backward compatibility. However, in the following cases, break change is okay:
After refactoring the framework in `v0.4.0`, LLamaSharp will try to maintain backward compatibility. However, in the following cases a breaking change will be required:

1. Due to some break changes in [llama.cpp](https://github.com/ggerganov/llama.cpp), making a break change will help to maintain the good abstraction and friendly user APIs.
2. A very improtant feature cannot be implemented unless refactoring some parts.
1. Due to breaking changes in [llama.cpp](https://github.com/ggerganov/llama.cpp), making a breaking change will help to maintain a good abstraction and user-friendly APIs.
2. A very important feature cannot be implemented unless refactoring some parts.
3. After some discussions, an agreement was reached that making the breaking change is reasonable.

If a new feature could be added without introducing any breaking change, please **open a PR** rather than opening an issue first. We will never refuse the PR but will help to improve it, unless it's malicious.
@@ -58,8 +58,8 @@ Besides, for some other integrations, like `ASP.NET core`, `SQL`, `Blazor` and s
There're mainly two ways to add an example:

1. Add the example to `LLama.Examples` of the repository.
2. Put the example in another repositpry and add the link to the readme or docs of LLamaSharp.
2. Put the example in another repository and add the link to the readme or docs of LLamaSharp.

## Add documents

LLamaSharp uses [mkdocs](https://github.com/mkdocs/mkdocs) to build the documantation, please follow the tutorial of mkdocs to add or modify documents in LLamaSharp.
LLamaSharp uses [mkdocs](https://github.com/mkdocs/mkdocs) to build the documentation; please follow the mkdocs tutorial to add or modify documents in LLamaSharp.
2 changes: 1 addition & 1 deletion docs/Examples/LoadAndSaveState.md
@@ -1,4 +1,4 @@
# Load and save model/exeutor state
# Load and save model/executor state

```cs
using LLama.Common;
```
2 changes: 1 addition & 1 deletion docs/Examples/StatelessModeExecute.md
@@ -1,4 +1,4 @@
# Use stateless exeutor
# Use stateless executor

```cs
using LLama.Common;
```
6 changes: 3 additions & 3 deletions docs/LLamaExecutors/differences.md
@@ -37,11 +37,11 @@ Therefore, please modify the prompt correspondingly when switching from one mode

## Stateful mode and Stateless mode

Despite the differences between interactive mode and instruct mode, both of them are stateful mode. That is, your previous question/instruction will impact on the current response from LLM. On the contrary, the steteless executor does not have such a "memory". No matter how many times you talk to it, it will only concentrate on what you say in this time.
Despite the differences between interactive mode and instruct mode, both of them are stateful modes. That is, your previous questions/instructions will impact the current response from the LLM. On the contrary, the stateless executor does not have such a "memory". No matter how many times you talk to it, it will only concentrate on what you say this time.

Since the stateless executor has no memory of previous conversations, you need to input your question with the whole prompt to get a better answer.

For example, if you feed `Q: Who is Trump? A: ` to the steteless executor, it may give the following answer with the antiprompt `Q: `.
For example, if you feed `Q: Who is Trump? A: ` to the stateless executor, it may give the following answer with the antiprompt `Q: `.

```
Donald J. Trump, born June 14, 1946, is an American businessman, television personality, politician and the 45th President of the United States (2017-2021). # Anexo:Torneo de Hamburgo 2022 (individual masculino)
```
@@ -65,5 +65,5 @@ Then, I got the following answer with the anti-prompt `Q: `.
```
45th president of the United States.
```

At this time, by repeating the same mode of `Q: xxx? A: xxx.`, LLM outputs the anti-prompt we want to help to decide where to dtop the generation.
This time, by repeating the same pattern of `Q: xxx? A: xxx.`, the LLM outputs the anti-prompt we want, which helps to decide where to stop the generation.
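Putting this together, a call to the stateless executor could look like the following sketch; `StatelessExecutor` and `AntiPrompts` are assumed names based on the executors and parameters described in these docs, while `Infer` and `InferenceParams` are documented elsewhere in this diff:

```cs
using System;
using System.Collections.Generic;
using LLama;
using LLama.Common;

// The stateless executor keeps no memory, so pass the whole prompt every time.
// (Sketch only: StatelessExecutor and AntiPrompts are assumed names.)
var executor = new StatelessExecutor(new LLamaModel(new ModelParams("<model path>")));
var inferenceParams = new InferenceParams
{
    AntiPrompts = new List<string> { "Q: " } // stop once the model starts a new question
};
foreach (var token in executor.Infer("Q: Who is Trump? A: ", inferenceParams))
{
    Console.Write(token);
}
```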

4 changes: 2 additions & 2 deletions docs/LLamaExecutors/parameters.md
@@ -1,6 +1,6 @@
# Inference Parameters

Different from `LLamaModel`, when using an exeuctor, `InferenceParams` is passed to the `Infer` method instead of constructor. This is because executors only define the ways to run the model, therefore in each run, you can change the settings for this time inference.
Different from `LLamaModel`, when using an executor, `InferenceParams` is passed to the `Infer` method instead of the constructor. This is because executors only define the way to run the model, so you can change the settings for each inference run.
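For example, given the `executor` and `prompt` from the earlier examples, each run can use its own settings (a sketch; `MaxTokens` is one of the parameters documented below, and the values are arbitrary):

```cs
// Each call to Infer gets its own settings; the executor itself only
// defines how the model is run. (64/512 are arbitrary example values.)
var brief = executor.Infer(prompt, new InferenceParams { MaxTokens = 64 });
var detailed = executor.Infer(prompt, new InferenceParams { MaxTokens = 512 });
```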


# InferenceParams
@@ -29,7 +29,7 @@ public int TokensKeep { get; set; }

### **MaxTokens**

how many new tokens to predict (n_predict), set to -1 to inifinitely generate response
how many new tokens to predict (n_predict), set to -1 to infinitely generate response
until it completes.

```csharp
public int MaxTokens { get; set; } = -1; // assumed declaration; the original is truncated here
```
2 changes: 1 addition & 1 deletion docs/LLamaModel/parameters.md
@@ -4,7 +4,7 @@ When initializing a `LLamaModel` object, there're three parameters, `ModelParams

The usage of `logger` will be further introduced in [logger doc](../More/log.md). The `encoding` is the encoding you want to use when dealing with text via this model.

The most improtant of all, is the `ModelParams`, which is defined as below. We'll explain the parameters step by step in this document.
Most important of all is the `ModelParams`, which is defined below. We'll explain the parameters step by step in this document.

```cs
public class ModelParams
```
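For instance, a construction along these lines (a sketch: the path is a placeholder, the values are arbitrary, and it assumes settable `ContextSize`/`GpuLayerCount` properties):

```cs
// ContextSize and GpuLayerCount are examples of the parameters explained below.
var modelParams = new ModelParams("<model path>")
{
    ContextSize = 1024, // size of the prompt context
    GpuLayerCount = 20  // number of layers to offload to the GPU
};
var model = new LLamaModel(modelParams);
```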
2 changes: 1 addition & 1 deletion docs/More/log.md
@@ -26,7 +26,7 @@ The `message` is the log message itself.

The `level` is the level of the information in the log. As shown above, there're four levels, which are `info`, `debug`, `warning` and `error` respectively.

The following is a simple example of theb logger implementation:
The following is a simple example of the logger implementation:

```cs
public sealed class LLamaDefaultLogger : ILLamaLogger
```
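A usage sketch, assuming the default logger exposes the console/file switches its name suggests (`Default`, `EnableConsole` and `EnableFile` are assumptions, not taken from this page):

```cs
// Sketch only: the Default/EnableConsole/EnableFile member names are assumed.
var logger = LLamaDefaultLogger.Default.EnableConsole().EnableFile("llama.log");
var model = new LLamaModel(new ModelParams("<model path>"), logger: logger);
```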
4 changes: 2 additions & 2 deletions docs/Tricks.md
@@ -29,7 +29,7 @@ $$ len(prompt) + len(response) < len(context) $$

In this inequality, `len(response)` refers to the expected number of tokens for the LLM to generate.

## Try differenct executors with a prompt
## Try different executors with a prompt

Some prompts work well under interactive mode, such as `chat-with-bob`, while others may work well with instruct mode, such as `alpaca`. Besides, if your input is quite simple and a one-time job, such as "Q: what is the satellite of the earth? A: ", stateless mode will be a good choice.

@@ -41,4 +41,4 @@ The differences between modes may lead to much different behaviors under the sam

## Set the layer count you want to offload to GPU

Currently, the `GpuLayerCount` param, which decides the number of layer loaded into GPU, is set to 20 by default. However, if you have some efficient GPUs, setting it as a larger number will attain faster inference.
Currently, the `GpuLayerCount` parameter, which decides the number of layers loaded into the GPU, is set to 20 by default. However, if you have a powerful GPU, setting it to a larger number will attain faster inference.
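For example (a sketch; 40 is an arbitrary value and assumes the model's extra layers fit in VRAM):

```cs
// Offload more layers to the GPU for faster inference on capable hardware.
var modelParams = new ModelParams("<model path>") { GpuLayerCount = 40 };
```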
6 changes: 3 additions & 3 deletions docs/index.md
@@ -16,10 +16,10 @@ LLamaSharp is the C#/.NET binding of [llama.cpp](https://github.com/ggerganov/ll

## Essential insights for novice learners

If you are new to LLM, here're some tips for you to help you to get start with `LLamaSharp`. If you are experienced in this field, we'd still recommend you to take a few minutes to read it because somethings performs differently compared to cpp/python.
If you are new to LLM, here're some tips to help you get started with `LLamaSharp`. If you are experienced in this field, we'd still recommend you take a few minutes to read it, because some things perform differently compared to cpp/python.

1. Tha main ability of LLamaSharp is to provide an efficient way to run inference of LLM (Large Language Model) locally (and fine-tune model in the future). The model weights, however, needs to be downloaded from other resources, like [huggingface](https://huggingface.co).
2. Since LLamaSharp supports multiple platforms, The nuget package is splitted to `LLamaSharp` and `LLama.Backend`. After installing `LLamaSharp`, please install one of `LLama.Backend.Cpu`, `LLama.Backend.Cuda11` and `LLama.Backend.Cuda12`. If you use the source code, dynamic libraries could be found in `LLama/Runtimes`. Then rename the one you want to use to `libllama.dll`.
1. The main ability of LLamaSharp is to provide an efficient way to run inference of LLM (Large Language Model) locally (and fine-tune model in the future). The model weights, however, need to be downloaded from other resources such as [huggingface](https://huggingface.co).
2. Since LLamaSharp supports multiple platforms, the NuGet package is split into `LLamaSharp` and `LLama.Backend`. After installing `LLamaSharp`, please install one of `LLama.Backend.Cpu`, `LLama.Backend.Cuda11` or `LLama.Backend.Cuda12`. If you use the source code, dynamic libraries can be found in `LLama/Runtimes`. Rename the one you want to use to `libllama.dll`.
3. `LLaMa` originally refers to the weights released by Meta (Facebook Research). After that, many models were fine-tuned based on it, such as `Vicuna`, `GPT4All`, and `Pygmalion`. Though all of these models are supported by LLamaSharp, some steps are necessary for the different file formats. There're mainly three kinds of files: `.pth`, `.bin (ggml)` and `.bin (quantized)`. If you have the `.bin (quantized)` file, it can be used directly by LLamaSharp. If you have the `.bin (ggml)` file, you can use it directly, but will get higher inference speed after quantization. If you have the `.pth` file, you need to follow [the instructions in llama.cpp](https://github.com/ggerganov/llama.cpp#prepare-data--run) to convert it to a `.bin (ggml)` file first.
4. LLamaSharp supports GPU acceleration, but it requires CUDA installation. Please install CUDA 11 or CUDA 12 on your system before using LLamaSharp to enable GPU support. If you have another CUDA version, you could compile llama.cpp from source to get the dll. For building from source, please refer to [issue #5](https://github.com/SciSharp/LLamaSharp/issues/5).
