Data format handling #108

suooous · 2025-02-08T14:06:47Z

Hi, the dataset you provide is in npy format, but your parameters are set up for tsv files. How should I handle this?

ImaiChika · 2025-02-23T08:55:02Z

Those should be the train_dataset.tsv, valid_dataset.tsv, and test_dataset.tsv files produced by data preprocessing; they are generated with data_process/main.py.

suooous · 2025-02-23T08:57:32Z

So do we generate them from the pcap files, or from these npy files?

ImaiChika · 2025-02-23T08:59:16Z

Line 27 of main.py has a pcap path, and lines 40 and 41 are the npy paths the author provides, although his file is named x_datagram_train.npy.
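For the npy route, a minimal sketch of dumping paired sample and label arrays to a tsv file might look like the following. Everything specific here is an assumption rather than what data_process/main.py actually does: the `label` and `text_a` column names, the idea that the samples array holds one token string per sample, and the existence of a separate labels file. The helper names `write_tsv` and `npy_to_tsv` are hypothetical.

```python
import csv
from typing import Iterable, Sequence


def write_tsv(path: str, labels: Iterable[int], texts: Iterable[str],
              header: Sequence[str] = ("label", "text_a")) -> int:
    """Write one tab-separated row per sample and return the row count.

    The default header is an assumed layout, not taken from the repository.
    """
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        for label, text in zip(labels, texts):
            writer.writerow([int(label), str(text)])
            count += 1
    return count


def npy_to_tsv(x_path: str, y_path: str, out_path: str) -> int:
    """Load paired sample/label arrays and dump them as a tsv file."""
    import numpy as np  # imported lazily so write_tsv works without NumPy

    # allow_pickle=True is needed when the arrays were saved as object arrays.
    texts = np.load(x_path, allow_pickle=True)
    labels = np.load(y_path, allow_pickle=True)
    if len(texts) != len(labels):
        raise ValueError("sample and label arrays differ in length")
    return write_tsv(out_path, labels, texts)
```

Comparing a few rows of the result against a tsv file written by main.py itself would show whether the assumed layout matches.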

suooous · 2025-02-23T09:01:14Z

So we can just use our own pcap files and convert them to tsv?

ImaiChika · 2025-02-23T09:02:12Z

That's what I did. The tsv files should be for fine-tuning, but I haven't produced a fine-tuned model yet.

suooous · 2025-02-23T09:03:43Z

Which step are you at now?

suooous · 2025-02-23T09:04:10Z

As I understand the author's intent, we just download his model and fine-tune it with our own data, right?

ImaiChika · 2025-02-23T09:05:27Z

pretrained_model.bin should be the file to download. My next step is to produce the fine-tuned model.

suooous · 2025-02-23T09:08:10Z

OK. I'm about to retrain as well. If I run into problems, could I ask you for advice?

ImaiChika · 2025-02-23T09:10:15Z

Sure. I'm just getting started with this myself, so we can compare notes.

suooous · 2025-02-23T09:17:43Z

OK, thanks a lot.

suooous · 2025-02-28T03:23:21Z

> Sure. I'm just getting started with this myself, so we can compare notes.

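How far have you gotten with the reproduction? I've now generated the tsv files from the author's npy files, using the author's vocabulary as well, and then tried to fine-tune the model, but I got this error: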

return forward_call(*args, **kwargs)

File "D:\Anaconda3\envs\bertf\lib\site-packages\torch\nn\modules\sparse.py", line 164, in forward
return F.embedding(
File "D:\Anaconda3\envs\bertf\lib\site-packages\torch\nn\functional.py", line 2267, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

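I looked it up, and the problem is an out-of-range index in the embedding layer. Specifically, a token ID in the input data falls outside the vocabulary size.
Have you run into this problem?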
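One way to narrow this down before the forward pass is to scan the encoded batches for IDs that an embedding table of a given size cannot look up. The sketch below is plain Python and not tied to the repository's code; `find_out_of_range_ids` is a hypothetical helper, and `vocab_size` is assumed to be the number of rows in the model's word embedding. The same IndexError can also come from a position index that exceeds the position-embedding table, so the sequence length may be worth checking too.

```python
from typing import Iterable, List, Tuple


def find_out_of_range_ids(
    batches: Iterable[List[int]], vocab_size: int
) -> List[Tuple[int, int, int]]:
    """Return (row, position, token_id) for every ID that an embedding
    table with `vocab_size` rows cannot look up."""
    bad = []
    for row, ids in enumerate(batches):
        for pos, token_id in enumerate(ids):
            # Valid indices are 0 .. vocab_size - 1.
            if token_id < 0 or token_id >= vocab_size:
                bad.append((row, pos, token_id))
    return bad


# Illustrative values only: a table with 10 rows accepts IDs 0..9.
problems = find_out_of_range_ids([[0, 5, 9], [10, 3]], vocab_size=10)
for row, pos, token_id in problems:
    print(f"row {row}, position {pos}: id {token_id} is out of range")
```

If this reports any rows, the vocabulary used to encode the tsv data and the vocabulary the pretrained model was built with probably differ in size.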

ImaiChika · 2025-02-28T04:47:41Z

I haven't run into that one yet.
