Add more example readmes. (huggingface#828)
* Add more readmes.

* Add a readme for dinov2.

* Add some skeleton files for a couple more examples.

* More whisper details.
LaurentMazare authored Sep 12, 2023
1 parent 805bf9f commit e82fcf1
Showing 6 changed files with 113 additions and 1 deletion.
44 changes: 44 additions & 0 deletions candle-examples/examples/bert/README.md
@@ -0,0 +1,44 @@
# candle-bert

Bert is a general-purpose language model. In this example, it can be used for two
different tasks:
- Compute sentence embeddings for a prompt.
- Compute similarities between a set of sentences.


## Sentence embeddings

Bert is used to compute the sentence embeddings for a prompt. The model weights
are downloaded from the hub on the first run.

```bash
cargo run --example bert --release -- --prompt "Here is a test sentence"

> [[[ 0.0798, -0.0665, -0.0247, ..., -0.1082, -0.1000, -0.2751],
> [ 0.4218, 0.2690, 0.2740, ..., 0.3889, 1.3503, 0.9908],
> [ 0.0466, 0.3041, -0.1143, ..., 0.4427, 0.6926, -0.1515],
> ...
> [ 0.3396, 0.4320, -0.4408, ..., 0.9212, 0.2331, -0.6777],
> [ 0.2789, 0.7539, 0.4306, ..., -0.0095, 0.3375, -1.7529],
> [ 0.6737, 0.7882, 0.0548, ..., 0.1836, 0.7299, -0.6617]]]
> Tensor[[1, 7, 384], f32]
```
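The printed tensor has shape `[1, 7, 384]`, which presumably reads as one batch, 7 tokens and 384 hidden features per token. A single sentence vector can be obtained by averaging over the token axis. The snippet below is a minimal, dependency-free sketch of that mean pooling on plain `Vec<f32>` rows rather than on candle tensors; `mean_pool` is a hypothetical helper name, not part of the example's code.

```rust
/// Averages per-token embeddings (rows) into one sentence embedding.
/// Every row counts equally, so padding rows, if any, are included.
fn mean_pool(tokens: &[Vec<f32>]) -> Vec<f32> {
    assert!(!tokens.is_empty(), "need at least one token");
    let hidden = tokens[0].len();
    let mut sum = vec![0.0f32; hidden];
    for tok in tokens {
        assert_eq!(tok.len(), hidden, "all tokens must share the hidden size");
        // Accumulate this token's features into the running sum.
        for (s, v) in sum.iter_mut().zip(tok) {
            *s += v;
        }
    }
    let n = tokens.len() as f32;
    sum.into_iter().map(|s| s / n).collect()
}

fn main() {
    // Toy input: 3 tokens with 4 hidden features each (made-up values).
    // For the tensor above this would be a 7 x 384 matrix.
    let tokens = vec![
        vec![1.0, 0.0, 2.0, -1.0],
        vec![3.0, 2.0, 0.0, 1.0],
        vec![2.0, 1.0, 1.0, 0.0],
    ];
    println!("{:?}", mean_pool(&tokens)); // [2.0, 1.0, 1.0, 0.0]
}
```

Because every row is weighted equally, padding tokens would shift the average; a masked mean that skips them is a common alternative.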

## Similarities

In this example, Bert is used to compute the sentence embeddings for a set of
sentences (hardcoded in the example). Cosine similarities are then computed for
each sentence pair and reported in decreasing order, so the first reported pair
contains the two sentences with the highest similarity score. The sentence
embeddings are computed by average pooling over all the sentence tokens,
including any padding tokens.

```bash
cargo run --example bert --release

> score: 0.85 'The new movie is awesome' 'The new movie is so great'
> score: 0.61 'The cat sits outside' 'The cat plays in the garden'
> score: 0.52 'I love pasta' 'Do you like pizza?'
> score: 0.23 'The new movie is awesome' 'Do you like pizza?'
> score: 0.22 'I love pasta' 'The new movie is awesome'
```
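The scoring step described above can be sketched as follows: compute the cosine similarity of every unordered pair of pooled embeddings, then sort the pairs by decreasing score. This is an illustrative sketch on plain vectors with made-up 2-dimensional embeddings, not the code of the `bert` example; `cosine` and `ranked_pairs` are hypothetical helper names.

```rust
/// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

/// Scores every unordered pair (i, j) with i < j, highest score first.
fn ranked_pairs(embeddings: &[Vec<f32>]) -> Vec<(f32, usize, usize)> {
    let mut pairs = Vec::new();
    for i in 0..embeddings.len() {
        for j in (i + 1)..embeddings.len() {
            pairs.push((cosine(&embeddings[i], &embeddings[j]), i, j));
        }
    }
    // Sort by decreasing score; total_cmp gives a total order on f32.
    pairs.sort_by(|a, b| b.0.total_cmp(&a.0));
    pairs
}

fn main() {
    let sentences = [
        "The new movie is awesome",
        "The new movie is so great",
        "Do you like pizza?",
    ];
    // Made-up 2-d unit vectors standing in for pooled Bert embeddings.
    let embeddings = vec![vec![1.0, 0.0], vec![0.8, 0.6], vec![0.0, 1.0]];
    for (score, i, j) in ranked_pairs(&embeddings) {
        println!("score: {:.2} '{}' '{}'", score, sentences[i], sentences[j]);
    }
}
```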
7 changes: 7 additions & 0 deletions candle-examples/examples/bigcode/README.md
@@ -0,0 +1,7 @@
# candle-starcoder: code generation model

StarCoder/BigCode is an LLM specialized in code generation.

```bash
cargo run --example bigcode --release -- --prompt "fn fact(n: u64) -> u64 "
```
19 changes: 19 additions & 0 deletions candle-examples/examples/dinov2/README.md
@@ -0,0 +1,19 @@
# candle-dinov2

[DINOv2](https://github.com/facebookresearch/dinov2) is a computer vision model.
In this example, it is used as an ImageNet classifier: the model returns the
probability that the image belongs to each of the 1000 ImageNet categories.

## Running an example

```bash
cargo run --example dinov2 --release -- --image candle-examples/examples/yolo-v8/assets/bike.jpg

> mountain bike, all-terrain bike, off-roader: 43.67%
> bicycle-built-for-two, tandem bicycle, tandem: 33.20%
> crash helmet : 13.23%
> unicycle, monocycle : 2.44%
> maillot : 2.42%
```

![Leading group, Giro d'Italia 2021](../yolo-v8/assets/bike.jpg)
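Assuming the classifier head produces one logit per ImageNet class, the percentages above would come from applying a softmax and keeping the highest values. The sketch below shows that final step on a toy 4-class problem with made-up logits; `softmax` and `top_k` are hypothetical helpers, not the example's actual implementation.

```rust
/// Converts raw logits into probabilities with a numerically stable softmax.
fn softmax(logits: &[f32]) -> Vec<f32> {
    // Subtracting the max avoids overflow in exp() without changing the result.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
    let total: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / total).collect()
}

/// Returns (class index, probability) for the `k` most likely classes.
fn top_k(probs: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut indexed: Vec<(usize, f32)> = probs.iter().cloned().enumerate().collect();
    indexed.sort_by(|a, b| b.1.total_cmp(&a.1));
    indexed.truncate(k);
    indexed
}

fn main() {
    // Made-up logits for a 4-class toy problem (ImageNet has 1000 classes).
    let labels = ["mountain bike", "tandem bicycle", "crash helmet", "unicycle"];
    let logits = [2.0f32, 1.5, 0.5, -1.0];
    for (idx, p) in top_k(&softmax(&logits), 3) {
        println!("{}: {:.2}%", labels[idx], 100.0 * p);
    }
}
```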
3 changes: 3 additions & 0 deletions candle-examples/examples/falcon/README.md
@@ -0,0 +1,3 @@
# candle-falcon

Falcon is a general-purpose large language model.
2 changes: 1 addition & 1 deletion candle-examples/examples/quantized/README.md
@@ -24,7 +24,7 @@ cargo run --example quantized --release -- --prompt "The best thing about coding
> The best thing about coding in rust is 1.) that I don’t need to worry about memory leaks, 2.) speed and 3.) my program will compile even on old machines.
```
-### Command-line flags
+## Command-line flags
Run with `--help` to see all options.
39 changes: 39 additions & 0 deletions candle-examples/examples/whisper/README.md
@@ -0,0 +1,39 @@
# candle-whisper: speech recognition

An implementation of [OpenAI Whisper](https://github.com/openai/whisper) using
candle. Whisper is a general-purpose speech recognition model that can convert
audio files (in the `.wav` format) to text. Supported features include language
detection and multilingual speech recognition.

## Running an example

If no audio file is passed as input, a [sample
file](https://huggingface.co/datasets/Narsil/candle-examples/resolve/main/samples_jfk.wav) is automatically downloaded
from the hub.

```bash
cargo run --example whisper --release

> No audio file submitted: Downloading https://huggingface.co/datasets/Narsil/candle_demo/blob/main/samples_jfk.wav
> loaded wav data: Header { audio_format: 1, channel_count: 1, sampling_rate: 16000, bytes_per_second: 32000, bytes_per_sample: 2, bits_per_sample: 16 }
> pcm data loaded 176000
> loaded mel: [1, 80, 3000]
> 0.0s -- 30.0s: And so my fellow Americans ask not what your country can do for you ask what you can do for your country
```
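The numbers in this log are consistent with Whisper's audio front-end: 16 kHz mono PCM, a hop of 160 samples between mel frames (100 frames per second) and 80 mel bins for the models listed below, so one 30-second window gives a `[1, 80, 3000]` mel spectrogram, and the 176000 loaded samples correspond to 11 seconds of audio. The sketch below reproduces that arithmetic; the constant and function names are illustrative assumptions, not candle's API.

```rust
// Audio front-end constants assumed for the models listed in this README.
const SAMPLE_RATE: usize = 16_000; // samples per second
const CHUNK_SECONDS: usize = 30; // one window covers 30 seconds
const HOP_LENGTH: usize = 160; // samples between consecutive mel frames
const N_MELS: usize = 80; // number of mel filter bank bins

/// Duration in seconds of `n_samples` mono PCM samples.
fn duration_secs(n_samples: usize) -> f32 {
    n_samples as f32 / SAMPLE_RATE as f32
}

/// Shape [batch, mel bins, frames] of the mel spectrogram for one window.
fn mel_shape() -> [usize; 3] {
    [1, N_MELS, SAMPLE_RATE * CHUNK_SECONDS / HOP_LENGTH]
}

/// Byte rate implied by a WAV header: rate * channels * bytes per sample.
fn bytes_per_second(sampling_rate: usize, channels: usize, bytes_per_sample: usize) -> usize {
    sampling_rate * channels * bytes_per_sample
}

fn main() {
    // Values taken from the sample log above.
    let pcm_len = 176_000;
    println!("audio duration: {:.1}s", duration_secs(pcm_len)); // 11.0s
    println!("mel shape: {:?}", mel_shape()); // [1, 80, 3000]
    println!("bytes/s: {}", bytes_per_second(16_000, 1, 2)); // 32000
}
```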

To use the multilingual mode, specify a multilingual model with the `--model`
flag (see the details below).

## Command-line flags
- `--input`: the audio file to be converted to text, in wav format.
- `--language`: force the language to some specific value rather than being
detected, e.g. `en`.
- `--task`: the task to be performed, can be `transcribe` (return the text data
in the original language) or `translate` (translate the text to English).
- `--timestamps`: enable the timestamp mode, in which timestamps are reported
for each recognized audio segment.
- `--model`: the model to be used. Models whose names end with `.en` are
English-only; all the others are multilingual. The supported models are `tiny`,
`tiny.en`, `base`, `base.en`, `small`, `small.en`, `medium`, `medium.en`,
`large`, and `large-v2`.
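The naming rule for `--model` can be captured in a few lines. This is a hypothetical validation helper written for illustration, not code from the `whisper` example; it only encodes the list of supported model names and the `.en` suffix convention given above.

```rust
/// Model names accepted by `--model`, as listed above.
const MODELS: [&str; 10] = [
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large", "large-v2",
];

/// English-only checkpoints carry a `.en` suffix; all others are multilingual.
fn is_multilingual(model: &str) -> bool {
    !model.ends_with(".en")
}

/// Hypothetical helper: validates a `--model` value and reports its mode.
fn describe(model: &str) -> Result<&'static str, String> {
    if !MODELS.iter().any(|&m| m == model) {
        return Err(format!("unsupported model: {model}"));
    }
    Ok(if is_multilingual(model) { "multilingual" } else { "English only" })
}

fn main() {
    for m in ["tiny.en", "large-v2", "huge"] {
        match describe(m) {
            Ok(mode) => println!("{m}: {mode}"),
            Err(e) => println!("error: {e}"),
        }
    }
}
```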
