
Question about the dataset files #2

Open
handsomelada opened this issue Dec 16, 2024 · 6 comments

Comments

@handsomelada

Thanks for your excellent work! Can you describe how to generate the following files?
Could you also explain how to organize the Metaworld dataset and how to generate the saved VQVAE latent files?

  1. the 'metaworld_'+task+'.pkl' file in latendata.py
        for ind, task in enumerate(tasks):
            data = pickle.load(open(osp.join(folder_robot, 'metaworld_'+task+'.pkl'), 'rb'))
            #print(len(data))
            data = data[:20]

See the code in latendata.py#L1421

  2. the video_id.npy, robot_latents_{idr}.npz, video.npy, and result.txt files used in the following lines of latendata.py
        self.video_list = np.load('./video_id.npy')

        self.robot_datas = [np.load(f'./data_meta/robot_latents_{idr}.npz')['robot'] for idr in range(len(self.files_obs))]
        self.wild_obs = np.concatenate([np.load(f'./data_meta/wild_latents_{idw}.npz')['wild'] for idw in range(1,3)], axis=0)
        self.wild_len = self.wild_obs.shape[0]  
        self.video_dict = np.load('./data_meta/video.npy', allow_pickle=True).flatten()[0]
        self.video_dict_ = sorted(self.video_dict.items(), key=lambda x: x[1])
        self.cumu_idx = [self.video_dict_[i][1] for i in range(len(self.video_dict_))]
        #print(self.cumu_idx[-1],self.video_list[-1])
        self.wild_desc = []
        file = open('./data/result/result.txt','r')  #open prompts file

See the code in latendata.py#L1480-1490

@Liujian1997

The same question.

@wyl1253

wyl1253 commented Jan 14, 2025

The same question.

@tinnerhrhe
Owner

Thanks for your questions, and sorry for the late reply.

  1. The 'metaworld_'+task+'.pkl' file is the expert dataset collected with the scripted policies defined in the Metaworld benchmark. Unfortunately, since I have lost access to the server where these datasets were stored, I cannot open-source them. However, it is easy to collect expert demos for Metaworld tasks yourself; you can follow the instructions provided in the README. The pkl file is structured as a dict = {'observations':...,'actions':...}. Note that the observations are images. (See the sketch after the code block below for one possible way to collect such a file.)
  2. For the purpose of debugging and quick training, I first encode both the robot videos and the human videos into latent codes via the well-trained VQVAE. You can refer to the following code for an implementation example.
    batch = []
    mini_batch_size = 256
    for idx in range(self._clips.num_clips()):
        video, _, _, idx_ = self._clips.get_clip(idx)
        batch.append(preprocess(video, resolution))
        if len(batch) == mini_batch_size:
            # encode a mini-batch of preprocessed clips with the trained VQVAE
            embeds = self.vqvae.encode(torch.stack(batch))
            for item in embeds:
                # keep only latents with the expected code shape
                if item.shape == (4, 24, 24):
                    self.wild_obs.append(item.cpu().numpy())
                    self.wild_len += 1
            batch = []
            print(np.array(self.wild_obs).shape)
            #print("saved")
        if idx % 100000 == 0 and idx != 0:
            # periodically dump the accumulated latents to disk and free memory
            np.savez(f'./data/wild_latents_v1_{idx//100000}.npz', wild=np.array(self.wild_obs))
            self.wild_obs = []
        # alternative: encode each clip individually instead of in mini-batches
        # self.wild_obs.append(self.vqvae.encode(preprocess(video, resolution).unsqueeze(0)).squeeze(0))
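For the 'metaworld_'+task+'.pkl' files from point 1, here is a minimal, hypothetical sketch of collecting image-based expert demos with a Metaworld scripted policy. The task name, policy class, camera, resolution, number of demos, and the one-dict-per-trajectory layout are illustrative assumptions rather than the exact setup used for the paper, and the render signature varies across Metaworld versions.

    # Hypothetical sketch: collect expert demos for one Metaworld task and save
    # them as metaworld_<task>.pkl with dicts of {'observations', 'actions'}.
    import pickle
    import metaworld
    from metaworld.policies import SawyerDoorOpenV2Policy

    task_name = 'door-open-v2'                 # example task (assumption)
    mt1 = metaworld.MT1(task_name)
    env = mt1.train_classes[task_name]()
    policy = SawyerDoorOpenV2Policy()          # scripted expert for this task

    data = []                                  # one dict per trajectory (assumption)
    for task in mt1.train_tasks[:20]:          # 20 demos, matching data[:20] in latendata.py
        env.set_task(task)
        obs = env.reset()                      # (obs, info) on newer Gymnasium-based versions
        images, actions = [], []
        for _ in range(env.max_path_length):
            act = policy.get_action(obs)
            # image observation; the render call depends on the Metaworld version
            img = env.render(offscreen=True, camera_name='corner', resolution=(128, 128))
            images.append(img)
            actions.append(act)
            obs, _, done, info = env.step(act) # 5-tuple on newer Gymnasium-based versions
            if done or info.get('success', 0):
                break
        data.append({'observations': images, 'actions': actions})

    with open(f'metaworld_{task_name}.pkl', 'wb') as f:
        pickle.dump(data, f)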

@wyl1253

wyl1253 commented Jan 16, 2025

Thank you so much for your detailed response! I really appreciate it. Could you also provide a simple explanation of the contents of the following files: video_id.npy, robot_latents_{idr}.npz, video.npy, and result.txt?

@tinnerhrhe
Owner

tinnerhrhe commented Jan 16, 2025

robot_latents_{idr}.npz contains the latent codes encoded by the VQVAE, similar to wild_latents (a sketch is given after the code below). video_id.npy records the index of the last clip for each video. Here is a simple implementation:

        prev_idx = 0
        video_dict = {}
        for idx in range(self._clips.num_clips()):
            # (video index, clip index) of the current clip
            cur_id = self._clips.get_clip_location(idx)
            if cur_id[0] > prev_idx:
                # a new video started: record the boundary clip index for the previous video
                video_dict[self._clips.video_paths[prev_idx]] = idx
                prev_idx = cur_id[0]
        # record the boundary for the last video as well
        video_dict[self._clips.video_paths[cur_id[0]]] = idx
        print(video_dict)
        np.save('./data/video.npy', video_dict)
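For robot_latents_{idr}.npz, a similar hypothetical sketch is below: each Metaworld pkl is loaded, its image observations are preprocessed and encoded with the trained VQVAE, and the latents are saved under the 'robot' key that latendata.py loads. The per-task file split, the per-trajectory encoding, and the reuse of tasks, folder_robot, preprocess, resolution, and vqvae from the snippets above are assumptions.

    # Hypothetical sketch: encode Metaworld image observations into
    # robot_latents_{idr}.npz, mirroring the wild_latents pipeline above.
    # `tasks`, `folder_robot`, `preprocess`, `resolution`, and the trained
    # `vqvae` are assumed to come from the same codebase; one file per task
    # index is an assumption, not necessarily the split used for the paper.
    import os.path as osp
    import pickle
    import numpy as np
    import torch

    for idr, task in enumerate(tasks):
        data = pickle.load(open(osp.join(folder_robot, 'metaworld_' + task + '.pkl'), 'rb'))
        robot_latents = []
        for traj in data[:20]:                 # 20 demos per task, as in latendata.py
            video = torch.as_tensor(np.asarray(traj['observations']))       # (T, H, W, C) frames
            with torch.no_grad():
                latents = vqvae.encode(preprocess(video, resolution))       # e.g. (N, 4, 24, 24) codes
            robot_latents.append(latents.cpu().numpy())
        np.savez(f'./data_meta/robot_latents_{idr}.npz',
                 robot=np.concatenate(robot_latents, axis=0))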

When I was working on this paper, I used GPT-4o to generate descriptions for each human video, and I saved the generated captions in result.txt. However, I only used the annotations provided by Ego4D in the experiments of the paper. You do not need this file to train a discrete diffusion policy.

Btw, thanks for your questions, and I apologize for the confusion caused by the code.

@wyl1253

wyl1253 commented Jan 16, 2025

Thanks a lot!
