Commit

improve and clean tokenizer
kuronosec committed Mar 25, 2024
1 parent f3b442d commit 42e770f
Showing 1 changed file with 2 additions and 4 deletions.
6 changes: 2 additions & 4 deletions analysis/ethereum_smart_contracts/GPT_tokenizer.ipynb
@@ -6,7 +6,7 @@
"id": "FPuIaYYJLjvJ"
},
"source": [
-"# Training a new tokenizer based on smart contract disassembled code"
+"# Train a new tokenizer based on smart contract opcodes"
]
},
{
@@ -90,9 +90,7 @@
"outputs": [],
"source": [
"tokenizer.save_pretrained(\"/data/forta/ethereum/tokenizer\")\n",
-"print(tokenizer)\n",
-"encode = old_tokenizer.tokenize(\"PUSH1 PUSH1 MSTORE PUSH1 CALLDATASIZE LT PUSH2\", return_tensors=\"pt\")\n",
-"print(type(encode))"
+"print(tokenizer)"
]
},
{

