
Commit

Update STEVE-1 doc
muzhancun committed Jan 9, 2025
1 parent fdc71d0 commit ad44723
Showing 5 changed files with 92 additions and 7 deletions.
26 changes: 26 additions & 0 deletions .devcontainer/devcontainer.json
@@ -0,0 +1,26 @@
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
// README at: https://github.com/devcontainers/templates/tree/main/src/docker-existing-dockerfile
{
"name": "Existing Dockerfile",
"build": {
// Sets the run context to one level up instead of the .devcontainer folder.
"context": "..",
// Update the 'dockerFile' property if you aren't using the standard 'Dockerfile' filename.
"dockerfile": "../Dockerfile"
}

// Features to add to the dev container. More info: https://containers.dev/features.
// "features": {},

// Use 'forwardPorts' to make a list of ports inside the container available locally.
// "forwardPorts": [],

// Uncomment the next line to run commands after the container is created.
// "postCreateCommand": "cat /etc/os-release",

// Configure tool-specific properties.
// "customizations": {},

// Uncomment to connect as an existing user other than the container default. More info: https://aka.ms/dev-containers-non-root.
// "remoteUser": "devcontainer"
}
Binary file added docs/source/_static/image/steve.png
Binary file added docs/source/_static/image/steve_hindsight.png
60 changes: 60 additions & 0 deletions docs/source/models/baseline-steve1.rst
@@ -2,3 +2,63 @@ Built-in Models: STEVE-1
======================================================================
`STEVE-1: A Generative Model for Text-to-Behavior in Minecraft <https://arxiv.org/abs/2306.00937>`_

.. admonition:: Quick Facts

STEVE-1 [1]_ fine-tunes VPT to follow short-horizon, open-ended text and visual instructions without the need for costly human annotations.

Insights
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pre-trained foundation models demonstrate a surprising ability to be efficiently fine-tuned into instruction-following models.
In the sequential decision-making domain, two foundation models for Minecraft have been released: VPT [2]_ and MineCLIP [3]_, opening intriguing possibilities for fine-tuning instruction-aware decision-making agents.

The authors draw insights from unCLIP [4]_ to propose a two-stage learning framework for training STEVE-1, eliminating the need for laborious human annotations.

Method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. figure:: ../_static/image/steve.png
:width: 800
:align: center

STEVE-1 architecture. Image credit: [1]_

To create a policy in Minecraft conditioned on text instructions :math:`y`, they utilize a dataset of (partially) annotated trajectories :math:`[(\tau_1, y_1), (\tau_2, y_2), \dots, (\tau_n, \emptyset)]`.
They employ MineCLIP, which generates aligned latents :math:`z_{\tau_{t:t+16}}` and :math:`z_y`, where :math:`z_{\tau_{\text{goal}}} = z_{\tau_{t:t+16}}` is an embedding of 16 consecutive frames.
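
The sketch below illustrates only the shapes involved: two stand-in encoders map a 16-frame clip and a text instruction into a shared latent space. It is not the real MineCLIP implementation; the 512-dimensional latent size is taken from MineCLIP, while the 64x64 stand-in frame resolution and the module names are assumptions for illustration.

.. code-block:: python

    import torch
    import torch.nn as nn

    LATENT_DIM = 512  # MineCLIP embedding size

    class StandInVideoEncoder(nn.Module):
        """Placeholder for MineCLIP's video branch: 16 frames -> one latent z_tau."""
        def __init__(self):
            super().__init__()
            self.frame_proj = nn.Linear(3 * 64 * 64, LATENT_DIM)  # per-frame features

        def forward(self, frames: torch.Tensor) -> torch.Tensor:
            # frames: (batch, 16, 3, 64, 64); encode each frame, then average over time
            per_frame = self.frame_proj(frames.flatten(start_dim=2))  # (batch, 16, 512)
            return per_frame.mean(dim=1)

    class StandInTextEncoder(nn.Module):
        """Placeholder for MineCLIP's text branch: token ids -> one latent z_y."""
        def __init__(self, vocab_size: int = 32000):
            super().__init__()
            self.embed = nn.EmbeddingBag(vocab_size, LATENT_DIM)

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            return self.embed(token_ids)

    if __name__ == "__main__":
        video_enc, text_enc = StandInVideoEncoder(), StandInTextEncoder()
        z_tau = video_enc(torch.randn(2, 16, 3, 64, 64))   # visual (goal) embeddings
        z_y = text_enc(torch.randint(0, 32000, (2, 8)))    # text instruction embeddings
        print(z_tau.shape, z_y.shape)                       # both (2, 512)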

The instruction-following model is composed of a ``policy`` and a ``prior``:

.. math::

    p(\tau | y) = p(\tau, z_{\tau_{\text{goal}}} | y) = p(\tau | z_{\tau_{\text{goal}}}) p(z_{\tau_{\text{goal}}} | y),

where the policy generates a trajectory :math:`\tau` conditioned on the aligned latent :math:`z_{\tau_{\text{goal}}}` and the prior generates :math:`z_{\tau_{\text{goal}}}` conditioned on the instruction :math:`y`.
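
To make the factorization concrete, here is a minimal, hypothetical sketch of the two-stage inference it implies: the prior turns a text embedding into a visual goal embedding, and the goal-conditioned policy produces action logits from an observation feature and that goal. Both modules, their sizes, and the action-head dimension are stand-ins, not the fine-tuned VPT policy or the released STEVE-1 prior.

.. code-block:: python

    import torch
    import torch.nn as nn

    LATENT_DIM = 512   # assumed MineCLIP embedding size
    NUM_ACTIONS = 121  # hypothetical discretized action-head size

    class StandInPrior(nn.Module):
        """p(z_goal | y): maps a text embedding to a visual goal embedding."""
        def __init__(self):
            super().__init__()
            self.mean = nn.Linear(LATENT_DIM, LATENT_DIM)
            self.log_std = nn.Linear(LATENT_DIM, LATENT_DIM)

        def sample(self, z_y: torch.Tensor) -> torch.Tensor:
            mean, std = self.mean(z_y), self.log_std(z_y).exp()
            return mean + std * torch.randn_like(std)

    class StandInPolicy(nn.Module):
        """p(a_t | o_t, z_goal): action logits from observation features and the goal."""
        def __init__(self):
            super().__init__()
            self.head = nn.Linear(2 * LATENT_DIM, NUM_ACTIONS)

        def forward(self, obs_feat: torch.Tensor, z_goal: torch.Tensor) -> torch.Tensor:
            return self.head(torch.cat([obs_feat, z_goal], dim=-1))

    if __name__ == "__main__":
        prior, policy = StandInPrior(), StandInPolicy()
        z_y = torch.randn(1, LATENT_DIM)          # text embedding from MineCLIP
        z_goal = prior.sample(z_y)                # stage 1: text -> visual goal embedding
        obs_feat = torch.randn(1, LATENT_DIM)     # stand-in for the policy's visual features
        action_logits = policy(obs_feat, z_goal)  # stage 2: act toward that goal
        print(action_logits.shape)                # torch.Size([1, 121])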

To train the policy, they use a modification of hindsight relabeling to generate goals for each trajectory:

.. figure:: ../_static/image/steve_hindsight.png
:width: 800
:align: center

They randomly select timesteps from episodes and use hindsight relabeling to set the intermediate goals for the trajectory segments to those visual
MineCLIP embeddings. Image credit: [1]_

By fine-tuning VPT on this relabeled dataset, the policy learns to reach given goal states (visual goals).
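
The snippet below sketches this relabeling step under assumed shapes: given per-timestep MineCLIP clip embeddings for an episode, it picks random goal timesteps and pairs each preceding segment with the embedding of the clip ending there. The segment-length bounds and the number of sampled goals are illustrative choices, not the values used in the paper.

.. code-block:: python

    import random
    import torch

    def relabel_episode(clip_embeds: torch.Tensor, num_goals: int = 4, min_horizon: int = 32):
        """clip_embeds: (T, 512) precomputed MineCLIP embeddings, one per timestep,
        each summarizing the 16 frames ending at that timestep.
        Returns (start, end, z_goal) tuples for goal-conditioned training segments."""
        T = clip_embeds.shape[0]
        pairs = []
        for _ in range(num_goals):
            end = random.randint(min_horizon, T - 1)                 # hindsight goal timestep
            start = max(0, end - random.randint(min_horizon, 2 * min_horizon))
            z_goal = clip_embeds[end]                                # visual goal embedding
            pairs.append((start, end, z_goal))
        return pairs

    if __name__ == "__main__":
        episode = torch.randn(600, 512)  # stand-in for an episode's precomputed embeddings
        for start, end, z_goal in relabel_episode(episode):
            print(f"segment [{start}, {end}] -> goal embedding {tuple(z_goal.shape)}")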

To also learn to follow text instructions, they train a conditional variational autoencoder (CVAE) with Gaussian prior and posterior to translate a text embedding :math:`z_y` into a visual embedding :math:`z_{\tau_{\text{goal}}}`.
The training objective is a standard ELBO loss:

.. math::

    \mathcal{L}_{\text{prior}}(\phi) = \mathbb{E}_{(z_{\tau_{\text{goal}}}, z_y) \sim \mathcal{D}_{\text{labels}}} \left[ \text{KL}\left(q_{\phi}(c \mid z_{\tau_{\text{goal}}}, z_y) \,\|\, p(c)\right) - \mathbb{E}_{c \sim q_\phi(c \mid z_{\tau_{\text{goal}}}, z_y)}\left[\log p_\phi(z_{\tau_{\text{goal}}} \mid c, z_y)\right] \right],

where :math:`c` is the CVAE latent, :math:`q_\phi` is the Gaussian posterior (encoder), and :math:`p_\phi` is the decoder that reconstructs :math:`z_{\tau_{\text{goal}}}`.
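
A PyTorch-style sketch of this objective is given below, with a Gaussian encoder, a deterministic decoder, and a unit-variance Gaussian reconstruction term. The layer shapes, the CVAE latent size, and the single-linear-layer architecture are assumptions for illustration, not the released STEVE-1 prior.

.. code-block:: python

    import torch
    import torch.nn as nn

    LATENT_DIM, C_DIM = 512, 128  # assumed MineCLIP and CVAE latent sizes

    class CVAEPrior(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Linear(2 * LATENT_DIM, 2 * C_DIM)       # -> (mu, log_var) of q(c | z_goal, z_y)
            self.decoder = nn.Linear(C_DIM + LATENT_DIM, LATENT_DIM)  # (c, z_y) -> reconstructed z_goal

        def loss(self, z_goal: torch.Tensor, z_y: torch.Tensor) -> torch.Tensor:
            mu, log_var = self.encoder(torch.cat([z_goal, z_y], dim=-1)).chunk(2, dim=-1)
            c = mu + (0.5 * log_var).exp() * torch.randn_like(mu)     # reparameterized sample
            recon = self.decoder(torch.cat([c, z_y], dim=-1))
            # KL(q(c | z_goal, z_y) || N(0, I)) in closed form
            kl = 0.5 * (mu.pow(2) + log_var.exp() - 1.0 - log_var).sum(dim=-1)
            # negative log-likelihood of z_goal under a unit-variance Gaussian decoder
            nll = 0.5 * (recon - z_goal).pow(2).sum(dim=-1)
            return (kl + nll).mean()

    if __name__ == "__main__":
        prior = CVAEPrior()
        z_goal, z_y = torch.randn(8, LATENT_DIM), torch.randn(8, LATENT_DIM)
        print(prior.loss(z_goal, z_y).item())
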
They utilize classifier-free guidance when training the policy: the goal embedding is occasionally dropped out during training, so the model learns both a goal-conditioned and an unconditioned action distribution.
During inference, they combine the logits computed with and without the goal embedding to generate the final trajectory.
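
The following sketch shows one common way to combine the two sets of logits at inference time with a guidance scale; the function name, the scale value, and the action dimensionality are placeholders rather than the exact scheme or hyperparameters used in STEVE-1.

.. code-block:: python

    import torch

    def guided_logits(logits_cond: torch.Tensor, logits_uncond: torch.Tensor,
                      scale: float = 6.0) -> torch.Tensor:
        """Classifier-free guidance: extrapolate from the unconditioned logits toward
        the goal-conditioned ones. scale = 0 recovers the unconditioned policy,
        scale = 1 the purely conditioned one, and scale > 1 amplifies the goal."""
        return logits_uncond + scale * (logits_cond - logits_uncond)

    if __name__ == "__main__":
        logits_cond = torch.randn(1, 121)    # logits with the goal embedding
        logits_uncond = torch.randn(1, 121)  # logits with the goal embedding dropped out
        action_dist = torch.softmax(guided_logits(logits_cond, logits_uncond), dim=-1)
        print(action_dist.sum().item())      # ~1.0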


Citations
---------

.. [1] Lifshitz S, Paster K, Chan H, et al. Steve-1: A generative model for text-to-behavior in minecraft[J]. Advances in Neural Information Processing Systems, 2024, 36.
.. [2] Baker B, Akkaya I, Zhokov P, et al. Video pretraining (vpt): Learning to act by watching unlabeled online videos[J]. Advances in Neural Information Processing Systems, 2022, 35: 24639-24654.
.. [3] Fan L, Wang G, Jiang Y, et al. Minedojo: Building open-ended embodied agents with internet-scale knowledge[J]. Advances in Neural Information Processing Systems, 2022, 35: 18343-18362.
.. [4] Ramesh A, Dhariwal P, Nichol A, et al. Hierarchical text-conditional image generation with clip latents[J]. arXiv preprint arXiv:2204.06125, 2022, 1(2): 3.
13 changes: 6 additions & 7 deletions minestudio/tutorials/simulator/test_play.py
@@ -5,18 +5,17 @@
)
from minestudio.simulator.utils.gui import RecordDrawCall, CommandModeDrawCall, SegmentDrawCall
from functools import partial
from minestudio.models import load_openai_policy, load_rocket_policy
if __name__ == '__main__':
-    agent_generator = partial(
-        load_rocket_policy,
-        ckpt_path = 'YOUR CKPT PATH',
-    )
+    # agent_generator = partial(
+    #     load_rocket_policy,
+    #     ckpt_path = 'YOUR CKPT PATH',
+    # )
    sim = MinecraftSim(
        obs_size=(224, 224),
        action_type="env",
        callbacks=[
-            PlaySegmentCallback(sam_path='YOUR SAM PATH', sam_choice='small'),
-            PlayCallback(agent_generator=agent_generator, extra_draw_call=[RecordDrawCall, CommandModeDrawCall, SegmentDrawCall]),
+            # PlaySegmentCallback(sam_path='YOUR SAM PATH', sam_choice='small'),
+            PlayCallback(agent_generator=None, extra_draw_call=[RecordDrawCall, CommandModeDrawCall]),
            RecordCallback(record_path='./output', recording=False),
        ]
    )
