LerobotDataset pushable to HF from any folder #563

Merged
merged 1 commit into huggingface:main on Dec 9, 2024

Conversation

Contributor

@Raziel90 Raziel90 commented Dec 9, 2024

The path to card_template.md is now resolved relative to the installed lerobot package rather than the current working directory.

In the previous version the path was hard-coded relative to the current folder: `./lerobot/common/datasets/card_template.md`. This breaks LerobotDataset.push_to_hub() when it is launched from outside the package folder.

In the current version the path is resolved relative to the package itself: `importlib.resources.path("lerobot.common.datasets", "card_template.md")`. This allows the method `create_lerobot_dataset_card` to be executed from any folder, as long as LeRobot is installed.
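For context, a minimal sketch of the package-relative lookup (the exact wiring inside `create_lerobot_dataset_card` in utils.py may differ):

import importlib.resources

# Resolve the card template from the installed lerobot package instead of the
# current working directory; importlib.resources.path yields a context-managed Path.
with importlib.resources.path("lerobot.common.datasets", "card_template.md") as card_path:
    card_template = card_path.read_text()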

On branch fix--dataset_push_to_hub
Changes to be committed:
modified: lerobot/common/datasets/utils.py

What this does


| Fixes #561 | (🐛 Bug) |

How it was tested

  • Executed the dataset creation from both inside and outside the lerobot folder.
    It worked in both cases: https://huggingface.co/datasets/ccop/aloha_stationary_replay_test_v3.
    The script used to create the dataset will be the subject of another PR once refined. It converts a single-episode ALOHA HDF5 dataset into a LeRobot Dataset v2. A draft snippet is available below.
  • Tests executed with pytest, all passing (although I notice there is currently no test for push_to_hub in the suite; a hedged sketch of such a test follows this list).
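For reference, a hedged sketch of what such a regression test could look like (assuming `create_lerobot_dataset_card` can be called with its default arguments; the exact signature may differ):

from lerobot.common.datasets.utils import create_lerobot_dataset_card

def test_card_template_found_outside_repo(tmp_path, monkeypatch):
    # Run from an arbitrary directory to mimic calling push_to_hub() outside the repo folder.
    monkeypatch.chdir(tmp_path)
    # Should not raise FileNotFoundError now that the template path is package-relative.
    card = create_lerobot_dataset_card()
    assert card is not None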

How to checkout & try? (for the reviewer)

Run the conversion snippet below from a folder outside the lerobot repository and check that push_to_hub() completes successfully.

import cv2
import h5py
import torch
from pathlib import Path

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

data_path = Path('/home/ccop/code/aloha_data')


def get_features(hdf5_file):
    """Build the LeRobot `features` dict from the datasets found in an ALOHA HDF5 file."""
    topics = []
    features = {}
    # Collect the names of all HDF5 datasets (leaves) in the file.
    hdf5_file.visititems(lambda name, obj: topics.append(name) if isinstance(obj, h5py.Dataset) else None)
    for topic in topics:
        if 'images' in topic.split('/'):
            # Camera topics store JPEG-compressed frames; decode one to get the (C, H, W) shape.
            features[topic.replace('/', '.')] = {
                'dtype': "image",
                'shape': cv2.imdecode(hdf5_file[topic][0], 1).transpose(2, 0, 1).shape,
                'names': None,
            }
        elif 'compress_len' in topic.split('/'):
            # Bookkeeping topic for compressed image sizes, not a feature.
            continue
        else:
            features[topic.replace('/', '.')] = {
                'dtype': str(hdf5_file[topic][0].dtype),
                'shape': hdf5_file[topic][0].shape,
                'names': None,
            }
    return features


if __name__ == '__main__':
    episode_path = data_path.absolute() / 'aloha_stationary_replay_test/episode_0.hdf5'

    with h5py.File(episode_path, 'r') as file:
        print("Keys: %s" % file.keys())
        features = get_features(file)
        # Number of frames in the episode, read from the dataset shape without loading the data.
        n_frames = file['observations/images/cam_high'].shape[0]
        print(n_frames)

    dataset = LeRobotDataset.create(
        repo_id='ccop/aloha_stationary_replay_test_v3',
        fps=50,
        robot_type="aloha-stationary",
        features=features,
        image_writer_threads=4,
    )

    with h5py.File(episode_path, 'r') as file:
        for frame_idx in range(n_frames):
            frame = {}
            for feature in features:
                if 'images' in feature.split('.'):
                    # Decode the JPEG frame and convert HWC -> CHW.
                    frame[feature] = torch.from_numpy(
                        cv2.imdecode(file[feature.replace('.', '/')][frame_idx], 1).transpose(2, 0, 1))
                else:
                    frame[feature] = torch.from_numpy(file[feature.replace('.', '/')][frame_idx])
            dataset.add_frame(frame)

    print('save episode!')
    dataset.save_episode(task='move_cube')
    dataset.consolidate()
    dataset.push_to_hub()

@Cadene Cadene requested review from aliberts and Cadene December 9, 2024 01:27
Collaborator

@aliberts aliberts left a comment


Awesome, thank you @Raziel90!
We indeed haven't focused on packaging and releases of our code yet; that will come after refactoring, but this is a welcome fix for people already using LeRobot as a dependency.

Side notes on your conversion script:

  • The task argument for save_episode is supposed to be a prompt in natural language describing your task. I'll try to make this appear more clearly in the code/docs.
- dataset.save_episode(task='move_cube')
+ dataset.save_episode(task='Move the cube to this spot.')
  • I would suggest using the "video" mode for storing the images in your dataset, as your dataset would really benefit from it given their size (480x848):
if 'images' in topic.split('/'):
    features[topic.replace('/', '.')] = {
-       'dtype': "image",
+       'dtype': "video",
        'shape': cv2.imdecode(hdf5_file[topic][0], 1).transpose(2, 0, 1).shape,
        'names': None
    }

@aliberts aliberts merged commit 44f9b21 into huggingface:main Dec 9, 2024
5 checks passed
helper2424 pushed a commit to helper2424/lerobot that referenced this pull request Dec 17, 2024
villekuosmanen added a commit to villekuosmanen/lerobot that referenced this pull request Dec 30, 2024
villekuosmanen added a commit to villekuosmanen/lerobot that referenced this pull request Jan 10, 2025
chrisheninger pushed a commit to chrisheninger/lerobot that referenced this pull request Jan 26, 2025

Successfully merging this pull request may close these issues.

Can't push new datasets (v2) to hub unless running the script from the repository folder