[PY] Tokenizers #1066

Closed
Tracked by #397
aacebo opened this issue Dec 18, 2023 · 0 comments
Labels: P0, parity (JS → dotnet and/or JS → Python), Python, small (1-4 days)


aacebo commented Dec 18, 2023

Implement the tokenizers functionality of the JS SDK: https://github.com/microsoft/teams-ai/tree/main/js/packages/teams-ai/src/tokenizers

@aacebo added the Python, parity, and P0 labels Dec 18, 2023
@aacebo added the small label Dec 27, 2023
aacebo pushed a commit that referenced this issue Jan 17, 2024
## Linked issues

closes: #1066 

## Details

1. Implement tokenizers for Python, based on the JS SDK.
2. Change the underlying encoding to `cl100k_base`, which is used by GPT-4
and GPT-3.5. JS still uses `r50k_base`; I have created
#1171 to track this.
3. Rename `GPT3Tokenizer` to `GPTTokenizer`, which better reflects
its functionality, since both GPT-4 and GPT-3.5 can use this
tokenizer.
4. Add unit tests for the code.
5. Add docstrings for the code.
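
For readers who want a picture of the shape described in the list above, here is a minimal sketch, not the code from this commit. It assumes the Python tokenizer mirrors an `encode`/`decode` interface and wraps the `tiktoken` package's `cl100k_base` encoding. The `Encoding` protocol, the injectable `encoding` parameter, and the `ByteEncoding` stand-in are hypothetical names added for illustration so the sketch can run without `tiktoken` installed.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Protocol


class Encoding(Protocol):
    """Hypothetical minimal shape of an encoding object.

    tiktoken's Encoding class offers encode/decode methods of this form.
    """

    def encode(self, text: str) -> List[int]: ...

    def decode(self, tokens: List[int]) -> str: ...


class Tokenizer(ABC):
    """Assumed base interface: text to token ids and back."""

    @abstractmethod
    def encode(self, text: str) -> List[int]: ...

    @abstractmethod
    def decode(self, tokens: List[int]) -> str: ...


class GPTTokenizer(Tokenizer):
    """Sketch of a tokenizer backed by `cl100k_base` unless told otherwise."""

    def __init__(self, encoding: Optional[Encoding] = None) -> None:
        if encoding is None:
            # Imported lazily so the sketch also runs without tiktoken
            # when a stand-in encoding is supplied.
            import tiktoken

            encoding = tiktoken.get_encoding("cl100k_base")
        self._encoding = encoding

    def encode(self, text: str) -> List[int]:
        return self._encoding.encode(text)

    def decode(self, tokens: List[int]) -> str:
        return self._encoding.decode(tokens)


class ByteEncoding:
    """Hypothetical stand-in for tests: one token per UTF-8 byte."""

    def encode(self, text: str) -> List[int]:
        return list(text.encode("utf-8"))

    def decode(self, tokens: List[int]) -> str:
        return bytes(tokens).decode("utf-8")


# Round trip with the stand-in; GPTTokenizer() with no argument would
# load cl100k_base through tiktoken instead.
tokenizer = GPTTokenizer(ByteEncoding())
ids = tokenizer.encode("hello")
text = tokenizer.decode(ids)
```

Passing the encoding in, rather than hard-coding it, is one way to keep unit tests independent of the encoding data that `tiktoken` loads; whether the actual implementation does this is not stated here.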

## Attestation Checklist

- [x] My code follows the style guidelines of this project

- I have checked for/fixed spelling, linting, and other errors
- I have commented my code for clarity
- I have made corresponding changes to the documentation (we use
[TypeDoc](https://typedoc.org/) to document our code)
- My changes generate no new warnings
- I have added tests that validate my changes and provide sufficient
test coverage. I have tested with:
  - Local testing
  - E2E testing in Teams
- New and existing unit tests pass locally with my changes
@lilyydu lilyydu closed this as completed Jan 17, 2024