[TransformersPipeline] Add in and refactor TransformersPipeline args #1218

Merged: 6 commits into main on Sep 6, 2023

@dsikka dsikka commented Aug 29, 2023

For this ticket: https://app.asana.com/0/1201735099598270/1205276886236966/f

Summary:

  • Updates the TransformersPipeline constructor to add two arguments: config and tokenizer
  • For each of these, the user can provide a string, a path, or a transformers object, which is used instead of relying on a deployment directory containing the expected JSON files. Both default to None, in which case the normal deployment directory workflow is used.
  • The config argument may additionally be a dictionary
  • To support this functionality, get_onnx_path_and_configs is split into two functions: get_hugging_face_configs and get_onnx_path
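
The accepted argument forms listed above can be illustrated with a small dispatch sketch. This is not the code from this PR: describe_config_source is a hypothetical helper, and the config.json file name is an assumption based on the usual Hugging Face layout. It only classifies which workflow a given config value would select.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Tuple, Union

# Forms the summary says `config` may take: None, str, path, dict, or an object.
ConfigArg = Optional[Union[str, Path, Dict[str, Any], Any]]


def describe_config_source(config: ConfigArg, deployment_dir: str) -> Tuple[str, Any]:
    """Hypothetical helper: report where the config would come from."""
    if config is None:
        # Default workflow: fall back to the deployment directory.
        # The file name "config.json" is an assumption, not taken from the PR.
        return "deployment_dir", Path(deployment_dir) / "config.json"
    if isinstance(config, dict):
        # A plain dictionary of config fields (config only, not tokenizer).
        return "dict", config
    if isinstance(config, (str, Path)):
        # A model name or filesystem path, suitable for a from_pretrained-style loader.
        return "name_or_path", str(config)
    # Anything else is assumed to be an already-constructed transformers object.
    return "object", config


if __name__ == "__main__":
    print(describe_config_source(None, "deployment"))
    print(describe_config_source({"model_type": "llama"}, "deployment"))
    print(describe_config_source("hf-internal-testing/llama-tokenizer", "deployment"))
```

Checking for None first keeps the existing deployment directory behavior as the default path, so callers that pass neither argument see no change.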

Testing:

  • Tested locally using a variety of combinations for config and tokenizer

Example:

from deepsparse import Pipeline
from transformers import LlamaTokenizerFast

tokenizer = LlamaTokenizerFast.from_pretrained("hf-internal-testing/llama-tokenizer")
config = {
   "_name_or_path": None,
   "architectures": [
      "LlamaForCausalLM"
   ],
   "bos_token_id": 1,
   "eos_token_id": 2,
   "hidden_act": "silu",
   "hidden_size": 5120,
   "initializer_range": 0.02,
   "intermediate_size": 13824,
   "max_position_embeddings": 4096,
   "model_type": "llama",
   "num_attention_heads": 40,
   "num_hidden_layers": 40,
   "num_key_value_heads": 40,
   "pretraining_tp": 1,
   "rms_norm_eps": 1e-05,
   "rope_scaling": None,
   "tie_word_embeddings": False,  # boolean, not the string "false" (a non-empty string is truthy)
   "torch_dtype": "float16",
   "transformers_version": "4.31.0.dev0",
   "use_cache": True,  # boolean, not the string "true"
   "vocab_size": 32000
}

llama = Pipeline.create(
   task="text-generation",
   model_path="/home/dsikka/models_llama/deployment_13",
   engine_type="onnxruntime",
   deterministic=False,
   config=config,
   tokenizer=tokenizer
)

inference = llama(sequences=["Hello?"])
for s in inference.sequences:
   print(s)

@dsikka dsikka marked this pull request as ready for review August 29, 2023 22:56
@bfineran bfineran merged commit 0f0029a into main Sep 6, 2023
@bfineran bfineran deleted the new_args branch September 6, 2023 19:10
3 participants