forked from huggingface/candle
Add more example readmes. (huggingface#828)
* Add more readmes.
* Add a readme for dinov2.
* Add some skeleton files for a couple more examples.
* More whisper details.
1 parent 805bf9f, commit e82fcf1
Showing 6 changed files with 113 additions and 1 deletion.

@@ -0,0 +1,44 @@
# candle-bert

Bert is a general large language model. In this example it can be used for two
different tasks:
- Compute sentence embeddings for a prompt.
- Compute similarities between a set of sentences.

## Sentence embeddings

Bert is used to compute the sentence embeddings for a prompt. The model weights
are downloaded from the hub on the first run.

```bash
cargo run --example bert --release -- --prompt "Here is a test sentence"

> [[[ 0.0798, -0.0665, -0.0247, ..., -0.1082, -0.1000, -0.2751],
> [ 0.4218, 0.2690, 0.2740, ..., 0.3889, 1.3503, 0.9908],
> [ 0.0466, 0.3041, -0.1143, ..., 0.4427, 0.6926, -0.1515],
> ...
> [ 0.3396, 0.4320, -0.4408, ..., 0.9212, 0.2331, -0.6777],
> [ 0.2789, 0.7539, 0.4306, ..., -0.0095, 0.3375, -1.7529],
> [ 0.6737, 0.7882, 0.0548, ..., 0.1836, 0.7299, -0.6617]]]
> Tensor[[1, 7, 384], f32]
```

## Similarities

In this example, Bert is used to compute the sentence embeddings for a set of
sentences (hardcoded in the example). Cosine similarities are then computed for
each sentence pair and reported in decreasing order of score, so the first
reported pair contains the two sentences with the highest similarity score.
The sentence embeddings are computed using average pooling over all the
sentence tokens, including any padding tokens.

```bash
cargo run --example bert --release

> score: 0.85 'The new movie is awesome' 'The new movie is so great'
> score: 0.61 'The cat sits outside' 'The cat plays in the garden'
> score: 0.52 'I love pasta' 'Do you like pizza?'
> score: 0.23 'The new movie is awesome' 'Do you like pizza?'
> score: 0.22 'I love pasta' 'The new movie is awesome'
```
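
The pooling and ranking steps described in this section can be sketched in plain Rust. This is only an illustration and not the code of the example itself: it uses `Vec<f32>` rather than candle tensors, and the helper names `mean_pool`, `cosine_similarity` and `ranked_pairs`, as well as the toy values, are hypothetical.

```rust
/// Average-pool per-token embeddings (n_tokens x dim) into one sentence vector.
/// Every token passed in is averaged, padding included, as described above.
fn mean_pool(tokens: &[Vec<f32>]) -> Vec<f32> {
    let dim = tokens.first().map_or(0, |t| t.len());
    let mut pooled = vec![0.0f32; dim];
    for tok in tokens {
        for (acc, v) in pooled.iter_mut().zip(tok) {
            *acc += *v;
        }
    }
    let n = tokens.len() as f32;
    pooled.iter_mut().for_each(|v| *v /= n);
    pooled
}

/// Cosine similarity between two vectors of the same length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

/// Score every sentence pair and sort by decreasing similarity.
fn ranked_pairs(embeddings: &[Vec<f32>]) -> Vec<(f32, usize, usize)> {
    let mut pairs = Vec::new();
    for i in 0..embeddings.len() {
        for j in (i + 1)..embeddings.len() {
            pairs.push((cosine_similarity(&embeddings[i], &embeddings[j]), i, j));
        }
    }
    pairs.sort_by(|a, b| b.0.total_cmp(&a.0));
    pairs
}

fn main() {
    // Toy per-token embeddings for three "sentences" (made-up values).
    let sentences = vec![
        vec![vec![1.0, 0.0], vec![1.0, 0.0]],
        vec![vec![1.0, 0.2], vec![1.0, 0.0]],
        vec![vec![0.0, 1.0]],
    ];
    let pooled: Vec<Vec<f32>> = sentences.iter().map(|s| mean_pool(s)).collect();
    for (score, i, j) in ranked_pairs(&pooled) {
        println!("score: {score:.2} sentence {i} / sentence {j}");
    }
}
```

Sorting with `total_cmp` keeps the ordering well defined even if a score turns out to be NaN, for instance when one embedding is all zeros.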

@@ -0,0 +1,7 @@
# candle-starcoder: code generation model

StarCoder/BigCode is an LLM specialized for code generation.

```bash
cargo run --example bigcode --release -- --prompt "fn fact(n: u64) -> u64 "
```

@@ -0,0 +1,19 @@
# candle-dinov2

[DINOv2](https://github.com/facebookresearch/dinov2) is a computer vision model.
In this example, it is used as an ImageNet classifier: the model returns the
probability that the image belongs to each of the 1000 ImageNet categories.

## Running an example

```bash
cargo run --example dinov2 --release -- --image candle-examples/examples/yolo-v8/assets/bike.jpg

> mountain bike, all-terrain bike, off-roader: 43.67%
> bicycle-built-for-two, tandem bicycle, tandem: 33.20%
> crash helmet : 13.23%
> unicycle, monocycle : 2.44%
> maillot : 2.42%
```

![Leading group, Giro d'Italia 2021](../yolo-v8/assets/bike.jpg)

@@ -0,0 +1,3 @@
# candle-falcon

Falcon is a general large language model.

@@ -0,0 +1,39 @@
# candle-whisper: speech recognition

An implementation of [OpenAI Whisper](https://github.com/openai/whisper) using
candle. Whisper is a general-purpose speech recognition model that can be used to
convert audio files (in the `.wav` format) to text. Supported features include
language detection as well as multilingual speech recognition.

## Running an example

If no audio file is passed as input, a [sample
file](https://huggingface.co/datasets/Narsil/candle-examples/resolve/main/samples_jfk.wav) is automatically downloaded
from the hub.

```bash
cargo run --example whisper --release

> No audio file submitted: Downloading https://huggingface.co/datasets/Narsil/candle_demo/blob/main/samples_jfk.wav
> loaded wav data: Header { audio_format: 1, channel_count: 1, sampling_rate: 16000, bytes_per_second: 32000, bytes_per_sample: 2, bits_per_sample: 16 }
> pcm data loaded 176000
> loaded mel: [1, 80, 3000]
> 0.0s -- 30.0s: And so my fellow Americans ask not what your country can do for you ask what you can do for your country
```

To use the multilingual mode, specify a multilingual model via the
`--model` flag; see the details below.
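
The numbers in the log above can be related with a little arithmetic: 176000 samples at 16000 Hz is 11 seconds of audio, while the mel tensor covers a full 30 second window. The sketch below reproduces this. The sample rate comes from the wav header in the log; the hop length of 160 samples and the 30 second window are assumptions taken from the reference OpenAI Whisper implementation, not from this README, and the function names are hypothetical.

```rust
const SAMPLE_RATE: usize = 16_000; // `sampling_rate` from the wav header in the log
const HOP_LENGTH: usize = 160; // assumption: hop size of the reference Whisper implementation
const CHUNK_SECONDS: usize = 30; // assumption: Whisper processes fixed 30 s windows

/// Duration of the audio clip in seconds.
fn duration_seconds(n_samples: usize) -> f64 {
    n_samples as f64 / SAMPLE_RATE as f64
}

/// Number of mel frames covering one 30 s window.
fn frames_per_chunk() -> usize {
    CHUNK_SECONDS * SAMPLE_RATE / HOP_LENGTH
}

fn main() {
    // 176000 is the `pcm data loaded` value reported in the log.
    println!("clip duration: {:.1}s", duration_seconds(176_000));
    println!("mel frames per window: {}", frames_per_chunk());
}
```

Under these assumptions the frame count matches the last dimension of the `[1, 80, 3000]` mel tensor in the log, which suggests the 11 second clip is padded to a full window before being passed to the model.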

## Command line flags

- `--input`: the audio file to be converted to text, in wav format.
- `--language`: force the language to a specific value rather than having it
  detected, e.g. `en`.
- `--task`: the task to be performed, can be `transcribe` (return the text data
  in the original language) or `translate` (translate the text to English).
- `--timestamps`: enable the timestamp mode, where timestamps are reported
  for each recognized audio segment.
- `--model`: the model to be used. Models that do not end with `.en` are
  multilingual models; the others are English-only models. The supported models
  are `tiny`, `tiny.en`, `base`, `base.en`, `small`, `small.en`, `medium`,
  `medium.en`, `large`, and `large-v2`.
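
The naming rule for the `--model` flag can be expressed as a small check. This is a minimal sketch, assuming the model is identified by the plain strings listed above; `is_multilingual` and `SUPPORTED_MODELS` are hypothetical names, not the example's actual API.

```rust
/// Model names accepted by `--model`, as listed in the flag description.
const SUPPORTED_MODELS: [&str; 10] = [
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large", "large-v2",
];

/// A model is multilingual unless its name ends with the `.en` suffix.
fn is_multilingual(model: &str) -> bool {
    !model.ends_with(".en")
}

fn main() {
    for model in SUPPORTED_MODELS {
        let kind = if is_multilingual(model) { "multilingual" } else { "English only" };
        println!("{model}: {kind}");
    }
}
```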