LerobotDataset pushable to HF from any folder #563

Merged
merged 1 commit into huggingface:main on Dec 9, 2024

Conversation

Contributor

@Raziel90 Raziel90 commented Dec 9, 2024

The path to card_template.md is now resolved relative to the installed lerobot package rather than the current working directory.

In the previous version the path was hard-coded relative to the current folder: `./lerobot/common/datasets/card_template.md`. This breaks LerobotDataset.push_to_hub() when it is launched from outside the package folder.

In the current version the path is resolved relative to the package itself: `importlib.resources.path("lerobot.common.datasets", "card_template.md")`. This allows the method `create_lerobot_dataset_card` to be executed from any folder, as long as LeRobot is installed.
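For context, a minimal sketch of the package-relative lookup (the exact wiring inside `create_lerobot_dataset_card` in utils.py may differ):

import importlib.resources

# Resolve the card template from the installed lerobot package instead of the
# current working directory; importlib.resources.path yields a context-managed Path.
with importlib.resources.path("lerobot.common.datasets", "card_template.md") as card_path:
    card_template = card_path.read_text()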

On branch fix--dataset_push_to_hub
Changes to be committed:
modified: lerobot/common/datasets/utils.py

What this does


| Fixes #561 | (🐛 Bug) |

How it was tested

  • Executed the dataset creation from both inside and outside the lerobot folder.
    It worked in both cases: https://huggingface.co/datasets/ccop/aloha_stationary_replay_test_v3.
    The script used to create the dataset will be the subject of another PR once refined. It converts a single-episode ALOHA HDF5 dataset into a LeRobot Dataset v2. A draft snippet is available below.
  • Tests executed with pytest, all passing (although I notice there is currently no test for push_to_hub in the suite; a hedged sketch of such a test follows this list).
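For reference, a hedged sketch of what such a regression test could look like (assuming `create_lerobot_dataset_card` can be called with its default arguments; the exact signature may differ):

from lerobot.common.datasets.utils import create_lerobot_dataset_card

def test_card_template_found_outside_repo(tmp_path, monkeypatch):
    # Run from an arbitrary directory to mimic calling push_to_hub() outside the repo folder.
    monkeypatch.chdir(tmp_path)
    # Should not raise FileNotFoundError now that the template path is package-relative.
    card = create_lerobot_dataset_card()
    assert card is not None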

How to checkout & try? (for the reviewer)

Run the conversion snippet below from a folder outside the lerobot repository and check that push_to_hub() completes successfully.

import cv2
import h5py
import torch
from pathlib import Path

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

data_path = Path('/home/ccop/code/aloha_data')


def get_features(hdf5_file):
    """Build the LeRobot `features` dict from the datasets found in an ALOHA HDF5 file."""
    topics = []
    features = {}
    # Collect the names of all HDF5 datasets (leaves) in the file.
    hdf5_file.visititems(lambda name, obj: topics.append(name) if isinstance(obj, h5py.Dataset) else None)
    for topic in topics:
        if 'images' in topic.split('/'):
            # Camera topics store JPEG-compressed frames; decode one to get the (C, H, W) shape.
            features[topic.replace('/', '.')] = {
                'dtype': "image",
                'shape': cv2.imdecode(hdf5_file[topic][0], 1).transpose(2, 0, 1).shape,
                'names': None,
            }
        elif 'compress_len' in topic.split('/'):
            # Bookkeeping topic for compressed image sizes, not a feature.
            continue
        else:
            features[topic.replace('/', '.')] = {
                'dtype': str(hdf5_file[topic][0].dtype),
                'shape': hdf5_file[topic][0].shape,
                'names': None,
            }
    return features


if __name__ == '__main__':
    episode_path = data_path.absolute() / 'aloha_stationary_replay_test/episode_0.hdf5'

    with h5py.File(episode_path, 'r') as file:
        print("Keys: %s" % file.keys())
        features = get_features(file)
        # Number of frames in the episode, read from the dataset shape without loading the data.
        n_frames = file['observations/images/cam_high'].shape[0]
        print(n_frames)

    dataset = LeRobotDataset.create(
        repo_id='ccop/aloha_stationary_replay_test_v3',
        fps=50,
        robot_type="aloha-stationary",
        features=features,
        image_writer_threads=4,
    )

    with h5py.File(episode_path, 'r') as file:
        for frame_idx in range(n_frames):
            frame = {}
            for feature in features:
                if 'images' in feature.split('.'):
                    # Decode the JPEG frame and convert HWC -> CHW.
                    frame[feature] = torch.from_numpy(
                        cv2.imdecode(file[feature.replace('.', '/')][frame_idx], 1).transpose(2, 0, 1))
                else:
                    frame[feature] = torch.from_numpy(file[feature.replace('.', '/')][frame_idx])
            dataset.add_frame(frame)

    print('save episode!')
    dataset.save_episode(task='move_cube')
    dataset.consolidate()
    dataset.push_to_hub()

@Cadene Cadene requested review from aliberts and Cadene December 9, 2024 01:27
Collaborator

@aliberts aliberts left a comment


Awesome, thank you @Raziel90!
We indeed haven't focused on packaging and releases of our code yet; that will come after refactoring, but this is a welcome fix for people already using LeRobot as a dependency.

Side notes on your conversion script:

  • The task argument for save_episode is supposed to be a prompt in natural language describing your task. I'll try to make this appear more clearly in the code/docs.
- dataset.save_episode(task='move_cube')
+ dataset.save_episode(task='Move the cube to this spot.')
  • I would suggest using the "video" mode for storing the images in your dataset, as your dataset would really benefit from it given their size (480x848):
if 'images' in topic.split('/'):
    features[topic.replace('/', '.')] = {
-       'dtype': "image",
+       'dtype': "video",
        'shape': cv2.imdecode(hdf5_file[topic][0], 1).transpose(2, 0, 1).shape,
        'names': None
    }

@aliberts aliberts merged commit 44f9b21 into huggingface:main Dec 9, 2024
5 checks passed
helper2424 pushed a commit to helper2424/lerobot that referenced this pull request Dec 17, 2024
villekuosmanen added a commit to villekuosmanen/lerobot that referenced this pull request Dec 30, 2024
villekuosmanen added a commit to villekuosmanen/lerobot that referenced this pull request Jan 10, 2025
chrisheninger pushed a commit to chrisheninger/lerobot that referenced this pull request Jan 26, 2025

Successfully merging this pull request may close these issues.

Can't push new datasets (v2) to hub unless running the script from the repository folder