
Question about the dataset files #2

Open
handsomelada opened this issue Dec 16, 2024 · 6 comments

Comments

@handsomelada

Thanks for your excellent work! Can you describe how to generate the following files?
Could you also explain how to organize the Metaworld dataset and how to generate the saved VQVAE latent files?

  1. the 'metaworld_'+task+'.pkl' file in latendata.py
        for ind, task in enumerate(tasks):
            data = pickle.load(open(osp.join(folder_robot, 'metaworld_'+task+'.pkl'), 'rb'))
            #print(len(data))
            data = data[:20]

See the code in latendata.py#L1421

  2. the video_id.npy, robot_latents_{idr}.npz, video.npy, and result.txt files used in the following lines of latendata.py
        self.video_list = np.load('./video_id.npy')

        self.robot_datas = [np.load(f'./data_meta/robot_latents_{idr}.npz')['robot'] for idr in range(len(self.files_obs))]
        self.wild_obs = np.concatenate([np.load(f'./data_meta/wild_latents_{idw}.npz')['wild'] for idw in range(1,3)], axis=0)
        self.wild_len = self.wild_obs.shape[0]  
        self.video_dict = np.load('./data_meta/video.npy', allow_pickle=True).flatten()[0]
        self.video_dict_ = sorted(self.video_dict.items(), key=lambda x: x[1])
        self.cumu_idx = [self.video_dict_[i][1] for i in range(len(self.video_dict_))]
        #print(self.cumu_idx[-1],self.video_list[-1])
        self.wild_desc = []
        file = open('./data/result/result.txt','r')  #open prompts file

See the code in latendata.py#L1480-1490

@Liujian1997

The same question.

@wyl1253

wyl1253 commented Jan 14, 2025

The same question.

@tinnerhrhe
Owner

Thanks for your questions, and sorry for the late reply.

  1. The 'metaworld_'+task+'.pkl' file is the expert dataset collected with the scripted policies defined in the Metaworld benchmark. Unfortunately, since I have lost access to the server where these datasets were stored, I cannot open-source them. However, it is easy to collect expert demos for Metaworld tasks yourself; you can follow the instructions provided in the README. The pkl file is structured as a dict = {'observations':...,'actions':...}. Note that the observations are images. (See the sketch after the code block below for one possible way to collect such a file.)
  2. For the purpose of debugging and quick training, I first encode both the robot videos and the human videos into latent codes via the well-trained VQVAE. You can refer to the following code for an implementation example.
    batch = []
    mini_batch_size = 256
    for idx in range(self._clips.num_clips()):
        video, _, _, idx_ = self._clips.get_clip(idx)
        batch.append(preprocess(video, resolution))
        if len(batch) == mini_batch_size:
            # encode a mini-batch of preprocessed clips with the trained VQVAE
            embeds = self.vqvae.encode(torch.stack(batch))
            for item in embeds:
                # keep only latents with the expected code shape
                if item.shape == (4, 24, 24):
                    self.wild_obs.append(item.cpu().numpy())
                    self.wild_len += 1
            batch = []
            print(np.array(self.wild_obs).shape)
            #print("saved")
        if idx % 100000 == 0 and idx != 0:
            # periodically dump the accumulated latents to disk and free memory
            np.savez(f'./data/wild_latents_v1_{idx//100000}.npz', wild=np.array(self.wild_obs))
            self.wild_obs = []
        # alternative: encode each clip individually instead of in mini-batches
        # self.wild_obs.append(self.vqvae.encode(preprocess(video, resolution).unsqueeze(0)).squeeze(0))
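For the 'metaworld_'+task+'.pkl' files from point 1, here is a minimal, hypothetical sketch of collecting image-based expert demos with a Metaworld scripted policy. The task name, policy class, camera, resolution, number of demos, and the one-dict-per-trajectory layout are illustrative assumptions rather than the exact setup used for the paper, and the render signature varies across Metaworld versions.

    # Hypothetical sketch: collect expert demos for one Metaworld task and save
    # them as metaworld_<task>.pkl with dicts of {'observations', 'actions'}.
    import pickle
    import metaworld
    from metaworld.policies import SawyerDoorOpenV2Policy

    task_name = 'door-open-v2'                 # example task (assumption)
    mt1 = metaworld.MT1(task_name)
    env = mt1.train_classes[task_name]()
    policy = SawyerDoorOpenV2Policy()          # scripted expert for this task

    data = []                                  # one dict per trajectory (assumption)
    for task in mt1.train_tasks[:20]:          # 20 demos, matching data[:20] in latendata.py
        env.set_task(task)
        obs = env.reset()                      # (obs, info) on newer Gymnasium-based versions
        images, actions = [], []
        for _ in range(env.max_path_length):
            act = policy.get_action(obs)
            # image observation; the render call depends on the Metaworld version
            img = env.render(offscreen=True, camera_name='corner', resolution=(128, 128))
            images.append(img)
            actions.append(act)
            obs, _, done, info = env.step(act) # 5-tuple on newer Gymnasium-based versions
            if done or info.get('success', 0):
                break
        data.append({'observations': images, 'actions': actions})

    with open(f'metaworld_{task_name}.pkl', 'wb') as f:
        pickle.dump(data, f)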

@wyl1253

wyl1253 commented Jan 16, 2025

Thank you so much for your detailed response! I really appreciate it. Could you also provide a simple explanation of the contents of the following files: video_id.npy, robot_latents_{idr}.npz, video.npy, and result.txt?

@tinnerhrhe
Owner

tinnerhrhe commented Jan 16, 2025

robot_latents_{idr}.npz contains the latent codes encoded by the VQVAE, similar to wild_latents (a sketch is given after the code below). video_id.npy records the index of the last clip for each video. Here is a simple implementation:

        prev_idx = 0
        video_dict = {}
        for idx in range(self._clips.num_clips()):
            # (video index, clip index) of the current clip
            cur_id = self._clips.get_clip_location(idx)
            if cur_id[0] > prev_idx:
                # a new video started: record the boundary clip index for the previous video
                video_dict[self._clips.video_paths[prev_idx]] = idx
                prev_idx = cur_id[0]
        # record the boundary for the last video as well
        video_dict[self._clips.video_paths[cur_id[0]]] = idx
        print(video_dict)
        np.save('./data/video.npy', video_dict)
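For robot_latents_{idr}.npz, a similar hypothetical sketch is below: each Metaworld pkl is loaded, its image observations are preprocessed and encoded with the trained VQVAE, and the latents are saved under the 'robot' key that latendata.py loads. The per-task file split, the per-trajectory encoding, and the reuse of tasks, folder_robot, preprocess, resolution, and vqvae from the snippets above are assumptions.

    # Hypothetical sketch: encode Metaworld image observations into
    # robot_latents_{idr}.npz, mirroring the wild_latents pipeline above.
    # `tasks`, `folder_robot`, `preprocess`, `resolution`, and the trained
    # `vqvae` are assumed to come from the same codebase; one file per task
    # index is an assumption, not necessarily the split used for the paper.
    import os.path as osp
    import pickle
    import numpy as np
    import torch

    for idr, task in enumerate(tasks):
        data = pickle.load(open(osp.join(folder_robot, 'metaworld_' + task + '.pkl'), 'rb'))
        robot_latents = []
        for traj in data[:20]:                 # 20 demos per task, as in latendata.py
            video = torch.as_tensor(np.asarray(traj['observations']))       # (T, H, W, C) frames
            with torch.no_grad():
                latents = vqvae.encode(preprocess(video, resolution))       # e.g. (N, 4, 24, 24) codes
            robot_latents.append(latents.cpu().numpy())
        np.savez(f'./data_meta/robot_latents_{idr}.npz',
                 robot=np.concatenate(robot_latents, axis=0))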

When I was working on this paper, I used GPT-4o to generate descriptions for each human video, and I saved the generated captions in result.txt. However, I only used the annotations provided by Ego4D in the experiments of the paper. You do not need this file to train a discrete diffusion policy.

Btw, thanks for your questions, and I apologize for the confusion caused by the code.

@wyl1253

wyl1253 commented Jan 16, 2025

Thanks a lot!
