README > Tile Iterator Concepts
Note: CUTLASS 3.0 deprecates all tile access iterators in favour of CuTe's single
vocabulary type cute::Tensor
, which is parameterized on cute::Layout
.
cute::Tensor
s can therefore be manipulated with the same layout algebra as all CuTe layouts.
This removes the need for bespoke types that encapsulate iterator properties.
The following text thus only applies to legacy CUTLASS 2.x API and related types.
CUTLASS 2.x implements generic algorithms on tiles of matrix or tensors of constant size. These may be considered as partitions of tensors of infinite size, with a range of partitions accessible by tile iterators.
Various data structures may make operations such as random access to tiles inexpensive, while data structures may not offer random access at all. For example, iterating over a linked list of matrices requires sequential traversal. Algorithms implemented in terms of sequences of tiles should require only the minimum set of operators be defined for tile iterators.
This document describes a set of C++ concepts which may be used to define tile iterators used
by CUTLASS algorithms. ("Concept" here does not refer to a C++20 concept that uses the concept
keyword.
Rather, it refers to a set of requirements on a type.)
Each concept specifies members and type definitions that a tile iterator
must implement. Frequently, a tile iterator implements several concepts, and its members are
the union of the members from each individual concept. These definitions were inspired by
Boost "New style" iterator concepts.
The set of all possible combinations of these concepts is quite large, however most tile iterator templates can be described by one of several combinations. The section Frequently Used Tile Iterator Concepts describes several common interfaces used throughout CUTLASS.
Base Tile Iterator Concept. All tile iterators must describe an Element type as well as a Shape.
/// Base concept for all tile iterators
struct TileIteratorConcept {
using Element; ///< Element type composing tile (concept: numeric type or Array<>)
using Shape; ///< Shape type describing extent of tile. The shape concept depends
/// on iterator implementation.
};
Contiguous Memory Tile Iterator Concept. Iterators over tiles stored arbitrarily within a continuous block of data in memory. Linear offset in units of Element may be added to internally held pointers to 'move' the iterator in memory.
/// Tile iterator over partitions of a tensor in contiguous memory which may be referenced via a
/// TensorRef object.
struct ContiguousMemoryTileIterator : public TileIteratorConcept {
using Index; ///< index type used to add pointer offsets
/// Adds a linear offset in units of Element to internal pointer(s) into tensor
CUTLASS_DEVICE
void add_pointer_offset(Index pointer_offset);
};
Readable Tile Iterator Concept. Iterators that may be read from define a Fragment
type holding
each thread's part of the data to be loaded. An explicit load()
method reads the tile from memory,
and places each thread's part in its Fragment
object.
/// Tile iterator capable of loading tiles from memory into fragments
struct ReadableTileIteratorConcept {
using Fragment; ///< fragment object derived from cutlass::Array<Element, N>
CUTLASS_DEVICE
void load(Fragment &frag); ///< loads a fragment from memory
};
Readable Contiguous Tile Iterator Concept. Iterators reading from contiguous memory support an optional pointer offset that is added to any internally managed pointers before performing the load. This provides a convenient method to fold an offset in with load operations.
/// Union of the following tile iterator concepts:
///
/// - ReadableTileIteratorConcept
/// - ContiguousMemoryTileIterator
///
struct ReadableContiguousTileIteratorConcept :
public ReadableTileIteratorConcept,
public ContiguousMemoryTileIterator {
/// Loads a fragment from memory with additional logical offset
CUTLASS_DEVICE
void load_with_pointer_offset(
Fragment &frag, ///< fragment to load from the tensor
Index pointer_offset); ///< loads a tile with a linear offset
};
Writeable Tile Iterator Concept. Iterators that may write to memory define a Fragment
type holding
each thread's part of the data to be written. An explicit store()
method writes the tile to memory.
/// Tile iterator capable of storing tiles from memory
struct WriteableTileIteratorConcept {
using Fragment; ///< fragment object derived from cutlass::Array<Element, N>
/// Stores a fragment to memory
CUTLASS_DEVICE
void store(Fragment const &frag); ///< stores a fragment to memory
};
Writeable Contiguous Tile Iterator Concept. Iterators writing to contiguous memory support an optional pointer offset that is added to any internally managed pointers before performing the store operation. This provides a convenient method to fold an offset into the store.
/// Union of the following tile iterator concepts:
///
/// - WriteableTileIteratorConcept
/// - ContiguousMemoryTileIterator
///
struct WriteableContiguousTileIteratorConcept :
public WriteableTileIteratorConcept,
public ContiguousMemoryTileIterator {
/// Loads a fragment from memory with additional logical offset
CUTLASS_DEVICE
void store_with_pointer_offset(
Fragment const &frag, ///< fragment to store to the tensor
Index pointer_offset); ///< stores a tile with a linear offset
};
Forward Tile Iterator Concept. This concept offers traversal "forward" by one tile in a pre-defined sequence. Often, this sequence is relevant to the context in which the iterator was defined, such as along the K dimension of a GEMM operation. Equality operators are defined to determine whether two iterators point to the same tile.
/// Tile iterator that may be incremented along a traversal sequence.
struct ForwardTileIteratorConcept {
CUTLASS_DEVICE bool operator==(TileIterator const &it); ///< true if iterators point to same tile, false if otherwise
CUTLASS_DEVICE bool operator!=(TileIterator const &it); ///< false if iterators point to same tile, true if otherwise
CUTLASS_DEVICE ForwardTileIteratorConcept & operator++(); ///< pre-increment - advance to next tile in sequence
CUTLASS_DEVICE ForwardTileIteratorConcept operator++(int); ///< post-increment - advance to next tile in sequence
};
Bidirectional Tile Iterator Concept. This concept permits traversal both forward and backward.
/// Tile iterator which may be traverse in both directions along a defined sequence.
struct BidirectionalTileIteratorConcept : public ForwardTileIteratorConcept {
CUTLASS_DEVICE
BidirectionalTileIteratorConcept & operator--(); ///< pre-decrement - traverse to previous tile in sequence
CUTLASS_DEVICE
BidirectionalTileIteratorConcept operator--(int); ///< post-decrement - traverse to previous tile in sequence
};
Random Access Tile Iterator Concept. This iterator defines random access operations in the logical coordinate system of the underlying tensor. Thus, tensors must have a defined Layout with associated TensorCoord coordinate describing logical position within the tensor and TensorRef reference type. It may be advanced forward or backwards by an offset specified as units of whole tiles along each dimension.
/// Tile iterator offering random access to tiles in contiguous memory.
struct RandomAccessTileIteratorConcept :
public BidirectionalTileIteratorConcept,
public ContiguousMemoryTileIterator {
using Layout; ///< Layout object mapping
using TensorRef; ///< Tensor Reference object
using TensorCoord; ///< Logical coordinate in referenced tensor
///< advances in units of whole tiles along the logical coordinate space of the tensor
CUTLASS_DEVICE
RandomAccessTileIteratorConcept & add_tile_offset(TensorCoord const &tile_offset);
///< advances in units of whole tiles along the logical coordinate space of the tensor
CUTLASS_DEVICE
RandomAccessTileIteratorConcept & operator+=(TensorCoord const &tile_offset);
///< advances in units of whole tiles along the logical coordinate space of the tensor
CUTLASS_DEVICE
RandomAccessTileIteratorConcept & operator-=(TensorCoord const &tile_offset);
};
Readable Random Access Tile Iterator Concept. Readable random access iterators accept an additional tile offset in logical coordinate space when loading fragments.
/// Loads a fragment with a logical coordinate offset in units of whole tiles.
struct ReadableRandomAccessTileIteratorConcept :
public RandomAccessTileIteratorConcept,
public ReadableTileIteratorConcept {
/// Loads a fragment from memory with logical offset in units of whole tiles.
CUTLASS_DEVICE
void load(
Fragment &frag, ///< fragment to load from the tensor
TensorCoord const &tile_offset); ///< loads a tile with a logical offset in units of whole tiles
};
Readable Random Access Contiguous Tile Iterator Concept. Readable random access iterators accept an additional tile offset in logical coordinate space when loading fragments.
/// Loads a fragment with a logical coordinate offset in units of whole tiles.
struct ReadableRandomAccessContiguousTileIteratorConcept :
public ReadableRandomAccessTileIteratorConcept,
ReadableContiguousTileIteratorConcept {
/// Loads a fragment from memory with logical offset in units of whole tiles.
CUTLASS_DEVICE
void load(
Fragment &frag, ///< fragment to load from the tensor
TensorCoord const &tile_offset, ///< loads a tile with a logical offset in units of whole tiles
Index pointer_offset); ///< loads a tile with a logical offset AND a pointer offset
};
Writeable Random Access Tile Iterator Concept. Writeable random access iterators accept an additional tile offset in logical coordinate space when storing fragments.
/// Stores a fragment with a logical coordinate offset in units of whole tiles.
struct WriteableRandomAccessTileIteratorConcept :
public RandomAccessTileIteratorConcept,
public WriteableContiguousTileIteratorConcept {
/// Stores a fragment from memory with logical offset in units of whole tiles.
CUTLASS_DEVICE
void store(
Fragment const &frag, ///< fragment to store to the location pointed to by the tensor
TensorCoord const &tile_offset); ///< stores a tile with a given offset from the current iterator
};
Writeable Random Access Contiguous Tile Iterator Concept. Writeable random access iterators accept an additional tile offset in logical coordinate space when storing fragments.
/// Stores a fragment with a logical coordinate offset in units of whole tiles.
struct WriteableRandomAccessContiguousTileIteratorConcept :
public WriteableRandomAccessTileIteratorConcept,
public WriteableContiguousTileIteratorConcept {
/// Stores a fragment from memory with logical offset in units of whole tiles.
CUTLASS_DEVICE
void store(
Fragment const &frag, ///< fragment to store to the location pointed to by the tensor
TensorCoord const &tile_offset, ///< stores a tile with a logical offset in units of whole tiles
Index pointer_offset); ///< stores a tile witha logical offset AND a pointer offset
};
Masked Tile Iterator Concept. Matrix and tensors may not always be multiples of whole tiles.
Masked tile iterators define a Mask
type which may be used to guard accesses to memory. The
semantics and interface of this Mask
are implementation-defined details of each tile iterator,
but several convenience methods are defined for interacting with the mask such as efficiently
clearing or enabling all guarded memory accesses.
/// Supports iterating over tiles that are not 'whole' in memory. Iterator maintains a mask object
/// which guards against out-of-bounds access.
///
/// Note, this concept definition does not formally define operations on the mask or methods it
/// supports. These remain implementation-dependent details of iterators implementing this concept.
struct MaskedTileIteratorConcept {
using Mask; ///< mask object used to guard against acceses.
CUTLASS_DEVICE void clear_mask(); ///< efficiently disables all accesses guarded by mask
CUTLASS_DEVICE void enable_mask(); ///< efficiently enables all accesses guarded by mask
CUTLASS_DEVICE void get_mask(Mask &mask); ///< gets the mask
CUTLASS_DEVICE void set_mask(Mask const &mask); ///< sets the mask
};
This section describes several frequently used compositions of the basic tile iterator concepts. They are listed here as complete type declarations for convenience of the reader.
Writeable, Readable, Forward, Contiguous Memory Tile Iterator Concept. This combines several of the basic iterator concepts to yield a tile iterator capable of loading and storing tiles as well as advancing forward along a traversal sequence.
/// This tile iterator embodies several of the above:
///
/// - ForwardTileIteratorConcept
/// - ReadableContiguousTileIteratorConcept
/// - WriteableContiguousTileIteratorConcept
///
/// It is restated explicitly for convenience of the reader.
///
struct WriteableReadableForwardContiguousTileIteratorConcept {
//
// Data types
//
using Element; ///< Element type composing tile.
using Shape; ///< Shape type describing extent of tile. The shape concept depends
/// on iterator implementation
using Index; ///< index type used as base for TensorCoord
using Fragment; ///< fragment object derived from cutlass::Array<Element, N>
//
// Methods
//
/// Adds a linear offset in units of Element to internal pointer(s) into tensor
CUTLASS_DEVICE
void add_pointer_offset(Index offset);
/// true if iterators point to same tile, false if otherwise
CUTLASS_DEVICE bool operator==(WriteableReadableForwardContiguousTileIteratorConcept const &it);
///< false if iterators point to same tile, true if otherwise
CUTLASS_DEVICE bool operator!=(WriteableReadableForwardContiguousTileIteratorConcept const &it);
/// pre-increment - traverse to next tile in sequence
CUTLASS_DEVICE
WriteableReadableForwardContiguousTileIteratorConcept &
operator++();
///< post-increment - traverse to next tile in sequence
CUTLASS_DEVICE
WriteableReadableForwardContiguousTileIteratorConcept
operator++(int);
/// Loads a fragment from memory
CUTLASS_DEVICE
void load(Fragment &frag); ///< fragment to be loaded from memory
/// Loads a fragment from memory with additional logical offset
CUTLASS_DEVICE
void load_with_pointer_offset(
Fragment &frag, ///< fragment to be loaded from memory
Index pointer_offset); ///< linear offset (in units of Element) when loading
/// Stores a fragment to memory
CUTLASS_DEVICE
void store(Fragment const &frag); ///< fragment to store to memory
/// Stores a fragment from memory with additional logical offset
CUTLASS_DEVICE
void store_with_pointer_offset(
Fragment const &frag, ///< fragment to store to memory
Index pointer_offset); ///< linear offset (in units of Element) when storing
};
Writeable, Readable, Random Access, Contiguous Memory Tile Iterator Concept. This combines several of the basic iterator concepts to yield a tile iterator with random access suitable for loading matrix operands for GEMM.
/// This tile iterator embodies several of the above:
///
/// - ReadableRandomAccessContiguousTileIteratorConcept
/// - WriteableRandomAccessContiguousTileIteratorConcept
///
/// It is restated explicitly for convenience of the reader.
///
struct WriteableReadableRandomAccessContiguousTileIteratorConcept {
//
// Data types
//
using Element; ///< Element type composing tile.
using Shape; ///< Shape type describing extent of tile. The shape concept depends
/// on iterator implementation
using Layout; ///< Layout object mapping
using TensorRef; ///< Tensor Reference object
using TensorCoord; ///< Logical coordinate in referenced tensor
using Index; ///< index type used as base for TensorCoord
using Fragment; ///< fragment object derived from cutlass::Array<Element, N>
//
// Methods
//
/// Adds a linear offset in units of Element to internal pointer(s) into tensor
CUTLASS_DEVICE
void add_pointer_offset(Index pointer_offset);
/// true if iterators point to same tile, false if otherwise
CUTLASS_DEVICE bool operator==(WriteableReadableRandomAccessContiguousTileIteratorConcept const &it);
///< false if iterators point to same tile, true if otherwise
CUTLASS_DEVICE bool operator!=(WriteableReadableRandomAccessContiguousTileIteratorConcept const &it);
/// pre-increment - traverse to next tile in sequence
CUTLASS_DEVICE
WriteableReadableRandomAccessContiguousTileIteratorConcept &
operator++();
///< post-increment - traverse to next tile in sequence
CUTLASS_DEVICE
WriteableReadableRandomAccessContiguousTileIteratorConcept
operator++(int);
/// pre-decrement - traverse to previous tile in sequence
CUTLASS_DEVICE
WriteableReadableRandomAccessContiguousTileIteratorConcept &
operator--();
///< post-decrement - traverse to previous tile in sequence
CUTLASS_DEVICE
WriteableReadableRandomAccessContiguousTileIteratorConcept
operator--(int);
///< advances in units of whole tiles along the logical coordinate space of the tensor
CUTLASS_DEVICE
WriteableReadableRandomAccessContiguousTileIteratorConcept & operator+=(TensorCoord const &tile_offset);
///< advances in units of whole tiles along the logical coordinate space of the tensor
CUTLASS_DEVICE
WriteableReadableRandomAccessContiguousTileIteratorConcept & operator-=(TensorCoord const &tile_offset);
/// Loads a fragment from memory
CUTLASS_DEVICE
void load(Fragment &frag); ///< fragment to be loaded from memory
/// Loads a fragment from memory with additional logical offset
CUTLASS_DEVICE
void load_with_pointer_offset(
Fragment &frag, ///< fragment to be loaded from memory
Index pointer_offset); ///< linear offset (in units of Element) when loading
/// Loads a fragment from memory with logical offset in units of whole tiles.
CUTLASS_DEVICE
void load(
Fragment &frag, ///< fragment to be loaded from memory
TensorCoord const &tile_offset); ///< loads a tile with a logical offset in units of whole tiles
/// Loads a fragment from memory with logical offset in units of whole tiles.
CUTLASS_DEVICE
void load(
Fragment &frag, ///< fragment to be loaded from memory
TensorCoord const &tile_offset, ///< loads a tile with a logical offset in units of whole tiles
Index pointer_offset); ///< loads a tile with a logical offset AND a pointer offset
/// Stores a fragment to memory
CUTLASS_DEVICE
void store(Fragment const &frag); ///< fragment to store to memory
/// Loads a fragment from memory with additional logical offset
CUTLASS_DEVICE
void store_with_pointer_offset(
Fragment const &frag, ///< fragment to store to memory
Index pointer_offset); ///< linear offset (in units of Element) when loading
/// Stores a fragment from memory with logical offset in units of whole tiles.
CUTLASS_DEVICE
void store(
Fragment const &frag, ///< fragment to store to memory
TensorCoord const &tile_offset); ///< stores with logical offset in units of whole tiles
/// Stores a fragment from memory with logical offset in units of whole tiles.
CUTLASS_DEVICE
void store(
Fragment const &frag, ///< fragment to store to memory
TensorCoord const &tile_offset, ///< stores with logical offset in units of whole tiles
Index pointer_offset);
};
Copyright (c) 2017 - 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. SPDX-License-Identifier: BSD-3-Clause
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.