Commit

doc update
Former-commit-id: 2ba8442
dumerrill committed Aug 9, 2013
1 parent 9562c59 commit 1504496
Showing 221 changed files with 5,902 additions and 3,282 deletions.
10 changes: 5 additions & 5 deletions cub/block/block_discontinuity.cuh
@@ -60,7 +60,7 @@ namespace cub {
* \blockcollective{BlockDiscontinuity}
* \par
* The code snippet below illustrates the head flagging of 512 integer items that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
* where each thread owns 4 consecutive items.
* \par
* \code
@@ -233,7 +233,7 @@ public:
* \smemreuse
*
* The code snippet below illustrates the head-flagging of 512 integer items that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
* where each thread owns 4 consecutive items.
* \par
* \code
@@ -318,7 +318,7 @@ public:
* \smemreuse
*
* The code snippet below illustrates the head-flagging of 512 integer items that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
* where each thread owns 4 consecutive items.
* \par
* \code
@@ -418,7 +418,7 @@ public:
* \smemreuse
*
* The code snippet below illustrates the tail-flagging of 512 integer items that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
* where each thread owns 4 consecutive items.
* \par
* \code
@@ -504,7 +504,7 @@ public:
* \smemreuse
*
* The code snippet below illustrates the tail-flagging of 512 integer items that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
* where each thread owns 4 consecutive items.
* \par
* \code
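As a rough host-side illustration of what the head- and tail-flagging comments in block_discontinuity.cuh refer to, the sketch below computes flags sequentially over a small tile. It assumes (the diff does not spell this out) that a head flag marks an item that differs from its predecessor, that a tail flag marks an item that differs from its successor, and that the first and last items of the tile are always flagged. The names `head_flags`, `tail_flags`, and `head_tail_example` are hypothetical and not part of CUB; the real BlockDiscontinuity is a device-side collective.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, not a CUB API.
// Assumption: an item is a "head" when it differs from its predecessor;
// the first item has no predecessor and is always flagged.
std::vector<int> head_flags(const std::vector<int>& items) {
    std::vector<int> flags(items.size(), 0);
    for (std::size_t i = 0; i < items.size(); ++i) {
        flags[i] = (i == 0 || items[i] != items[i - 1]) ? 1 : 0;
    }
    return flags;
}

// Assumption: an item is a "tail" when it differs from its successor;
// the last item has no successor and is always flagged.
std::vector<int> tail_flags(const std::vector<int>& items) {
    std::vector<int> flags(items.size(), 0);
    for (std::size_t i = 0; i < items.size(); ++i) {
        const bool is_last = (i + 1 == items.size());
        flags[i] = (is_last || items[i] != items[i + 1]) ? 1 : 0;
    }
    return flags;
}

// Small usage check over one run-structured tile.
void head_tail_example() {
    const std::vector<int> items = {0, 0, 1, 1, 1, 2};
    const std::vector<int> heads = {1, 0, 1, 0, 0, 1};
    const std::vector<int> tails = {0, 1, 0, 0, 1, 1};
    assert(head_flags(items) == heads);
    assert(tail_flags(items) == tails);
}
```

In the documented 128-thread case, each thread would hold 4 consecutive entries of `items` and receive the 4 matching flags.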
8 changes: 4 additions & 4 deletions cub/block/block_exchange.cuh
@@ -59,10 +59,10 @@ namespace cub {
*
* \par
* BlockExchange supports the following types of data exchanges:
- * - Transposing between [<em>blocked</em>](index.html#sec3sec3) and [<em>striped</em>](index.html#sec3sec3) arrangements
- * - Transposing between [<em>blocked</em>](index.html#sec3sec3) and [<em>warp-striped</em>](index.html#sec3sec3) arrangements
- * - Scattering ranked items to a [<em>blocked arrangement</em>](index.html#sec3sec3)
- * - Scattering ranked items to a [<em>striped arrangement</em>](index.html#sec3sec3)
+ * - Transposing between [<em>blocked</em>](index.html#sec4sec4) and [<em>striped</em>](index.html#sec4sec4) arrangements
+ * - Transposing between [<em>blocked</em>](index.html#sec4sec4) and [<em>warp-striped</em>](index.html#sec4sec4) arrangements
+ * - Scattering ranked items to a [<em>blocked arrangement</em>](index.html#sec4sec4)
+ * - Scattering ranked items to a [<em>striped arrangement</em>](index.html#sec4sec4)
*
* \tparam T The data type to be exchanged.
* \tparam BLOCK_THREADS The thread block size in threads.
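The blocked and striped arrangements named in the BlockExchange comments can be made concrete with a small host-side sketch. It assumes the index conventions the surrounding comments suggest: in a blocked arrangement each thread owns consecutive items, and in a striped arrangement a thread's items are `BLOCK_THREADS` apart. The names `Tile`, `blocked_index`, `striped_index`, and `blocked_to_striped` are hypothetical; this is a sequential stand-in for illustration, not the cub::BlockExchange implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tile[tid][item]: the items held by each (simulated) thread.
using Tile = std::vector<std::vector<int>>;

// Blocked: thread `tid` owns items_per_thread consecutive tile positions.
inline std::size_t blocked_index(std::size_t tid, std::size_t item,
                                 std::size_t items_per_thread) {
    assert(item < items_per_thread);
    return tid * items_per_thread + item;
}

// Striped: consecutive items of one thread are block_threads apart.
inline std::size_t striped_index(std::size_t tid, std::size_t item,
                                 std::size_t block_threads) {
    assert(tid < block_threads);
    return item * block_threads + tid;
}

// Sequential stand-in for a blocked -> striped transposition.
Tile blocked_to_striped(const Tile& blocked) {
    const std::size_t block_threads = blocked.size();
    const std::size_t items_per_thread = blocked.empty() ? 0 : blocked[0].size();

    // Lay the tile out linearly according to the blocked ownership.
    std::vector<int> linear(block_threads * items_per_thread);
    for (std::size_t tid = 0; tid < block_threads; ++tid) {
        assert(blocked[tid].size() == items_per_thread);
        for (std::size_t item = 0; item < items_per_thread; ++item) {
            linear[blocked_index(tid, item, items_per_thread)] = blocked[tid][item];
        }
    }

    // Hand the same linear tile back out according to the striped ownership.
    Tile striped(block_threads, std::vector<int>(items_per_thread));
    for (std::size_t tid = 0; tid < block_threads; ++tid) {
        for (std::size_t item = 0; item < items_per_thread; ++item) {
            striped[tid][item] = linear[striped_index(tid, item, block_threads)];
        }
    }
    return striped;
}
```

With 128 threads and 4 items per thread, thread 1 owns tile positions 4 to 7 when blocked and positions 1, 129, 257, 385 when striped.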
30 changes: 15 additions & 15 deletions cub/block/block_load.cuh
@@ -476,7 +476,7 @@ enum BlockLoadAlgorithm
/**
* \par Overview
*
- * A [<em>blocked arrangement</em>](index.html#sec3sec3) of data is read
+ * A [<em>blocked arrangement</em>](index.html#sec4sec4) of data is read
* directly from memory. The thread block reads items in a parallel "raking" fashion: thread<sub><em>i</em></sub>
* reads the <em>i</em><sup>th</sup> segment of consecutive elements.
*
@@ -489,7 +489,7 @@ enum BlockLoadAlgorithm
/**
* \par Overview
*
- * A [<em>blocked arrangement</em>](index.html#sec3sec3) of data is read directly
+ * A [<em>blocked arrangement</em>](index.html#sec4sec4) of data is read directly
* from memory using CUDA's built-in vectorized loads as a coalescing optimization.
* The thread block reads items in a parallel "raking" fashion: thread<sub><em>i</em></sub> uses vector loads to
* read the <em>i</em><sup>th</sup> segment of consecutive elements.
@@ -511,13 +511,13 @@ enum BlockLoadAlgorithm
/**
* \par Overview
*
- * A [<em>striped arrangement</em>](index.html#sec3sec3) of data is read
+ * A [<em>striped arrangement</em>](index.html#sec4sec4) of data is read
* directly from memory and then is locally transposed into a
- * [<em>blocked arrangement</em>](index.html#sec3sec3). The thread block
+ * [<em>blocked arrangement</em>](index.html#sec4sec4). The thread block
* reads items in a parallel "strip-mining" fashion:
* thread<sub><em>i</em></sub> reads items having stride \p BLOCK_THREADS
* between them. cub::BlockExchange is then used to locally reorder the items
- * into a [<em>blocked arrangement</em>](index.html#sec3sec3).
+ * into a [<em>blocked arrangement</em>](index.html#sec4sec4).
*
* \par Performance Considerations
* - The utilization of memory transactions (coalescing) remains high regardless
@@ -531,13 +531,13 @@ enum BlockLoadAlgorithm
/**
* \par Overview
*
- * A [<em>warp-striped arrangement</em>](index.html#sec3sec3) of data is read
+ * A [<em>warp-striped arrangement</em>](index.html#sec4sec4) of data is read
* directly from memory and then is locally transposed into a
- * [<em>blocked arrangement</em>](index.html#sec3sec3). Each warp reads its own
+ * [<em>blocked arrangement</em>](index.html#sec4sec4). Each warp reads its own
* contiguous segment in a parallel "strip-mining" fashion: lane<sub><em>i</em></sub>
* reads items having stride \p WARP_THREADS between them. cub::BlockExchange
* is then used to locally reorder the items into a
- * [<em>blocked arrangement</em>](index.html#sec3sec3).
+ * [<em>blocked arrangement</em>](index.html#sec4sec4).
*
* \par Usage Considerations
* - BLOCK_THREADS must be a multiple of WARP_THREADS
@@ -553,7 +553,7 @@ enum BlockLoadAlgorithm


/**
- * \brief The BlockLoad class provides [<em>collective</em>](index.html#sec0) data movement methods for loading a linear segment of items from memory into a [<em>blocked arrangement</em>](index.html#sec3sec3) across a CUDA thread block. ![](block_load_logo.png)
+ * \brief The BlockLoad class provides [<em>collective</em>](index.html#sec0) data movement methods for loading a linear segment of items from memory into a [<em>blocked arrangement</em>](index.html#sec4sec4) across a CUDA thread block. ![](block_load_logo.png)
* \ingroup BlockModule
*
* \par Overview
@@ -563,17 +563,17 @@ enum BlockLoadAlgorithm
*
* \par
* Optionally, BlockLoad can be specialized by different data movement strategies:
- * -# <b>cub::BLOCK_LOAD_DIRECT</b>. A [<em>blocked arrangement</em>](index.html#sec3sec3)
+ * -# <b>cub::BLOCK_LOAD_DIRECT</b>. A [<em>blocked arrangement</em>](index.html#sec4sec4)
* of data is read directly from memory. [More...](\ref cub::BlockLoadAlgorithm)
- * -# <b>cub::BLOCK_LOAD_VECTORIZE</b>. A [<em>blocked arrangement</em>](index.html#sec3sec3)
+ * -# <b>cub::BLOCK_LOAD_VECTORIZE</b>. A [<em>blocked arrangement</em>](index.html#sec4sec4)
* of data is read directly from memory using CUDA's built-in vectorized loads as a
* coalescing optimization. [More...](\ref cub::BlockLoadAlgorithm)
- * -# <b>cub::BLOCK_LOAD_TRANSPOSE</b>. A [<em>striped arrangement</em>](index.html#sec3sec3)
+ * -# <b>cub::BLOCK_LOAD_TRANSPOSE</b>. A [<em>striped arrangement</em>](index.html#sec4sec4)
* of data is read directly from memory and is then locally transposed into a
- * [<em>blocked arrangement</em>](index.html#sec3sec3). [More...](\ref cub::BlockLoadAlgorithm)
- * -# <b>cub::BLOCK_LOAD_WARP_TRANSPOSE</b>. A [<em>warp-striped arrangement</em>](index.html#sec3sec3)
+ * [<em>blocked arrangement</em>](index.html#sec4sec4). [More...](\ref cub::BlockLoadAlgorithm)
+ * -# <b>cub::BLOCK_LOAD_WARP_TRANSPOSE</b>. A [<em>warp-striped arrangement</em>](index.html#sec4sec4)
* of data is read directly from memory and is then locally transposed into a
- * [<em>blocked arrangement</em>](index.html#sec3sec3). [More...](\ref cub::BlockLoadAlgorithm)
+ * [<em>blocked arrangement</em>](index.html#sec4sec4). [More...](\ref cub::BlockLoadAlgorithm)
*
* \tparam InputIteratorRA The input iterator type (may be a simple pointer type).
* \tparam BLOCK_THREADS The thread block size in threads.
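The warp-striped read described for BLOCK_LOAD_WARP_TRANSPOSE can be sketched on the host as an index calculation. The sketch assumes that warp *w* owns the *w*-th contiguous segment of `WARP_THREADS * ITEMS_PER_THREAD` items and that lane *i* reads items `WARP_THREADS` apart within that segment, as the comment states. The names `warp_striped_index` and `load_warp_striped` are hypothetical, and only the initial read is modelled; the subsequent cub::BlockExchange reordering into a blocked arrangement is not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Memory offset read by thread `tid` for its `item`-th element under a
// warp-striped arrangement (hypothetical helper, not a CUB API).
inline std::size_t warp_striped_index(std::size_t tid, std::size_t item,
                                      std::size_t warp_threads,
                                      std::size_t items_per_thread) {
    assert(warp_threads > 0 && item < items_per_thread);
    const std::size_t warp = tid / warp_threads;
    const std::size_t lane = tid % warp_threads;
    // Each warp owns one contiguous segment of the tile.
    const std::size_t warp_offset = warp * warp_threads * items_per_thread;
    return warp_offset + item * warp_threads + lane;
}

// Sequential simulation of the read step: result[tid][item] is what each
// thread holds immediately after the warp-striped read.
std::vector<std::vector<int>> load_warp_striped(const std::vector<int>& memory,
                                                std::size_t block_threads,
                                                std::size_t warp_threads,
                                                std::size_t items_per_thread) {
    // Stated usage consideration: BLOCK_THREADS is a multiple of WARP_THREADS.
    assert(warp_threads > 0 && block_threads % warp_threads == 0);
    assert(memory.size() >= block_threads * items_per_thread);

    std::vector<std::vector<int>> regs(block_threads,
                                       std::vector<int>(items_per_thread));
    for (std::size_t tid = 0; tid < block_threads; ++tid) {
        for (std::size_t item = 0; item < items_per_thread; ++item) {
            regs[tid][item] =
                memory[warp_striped_index(tid, item, warp_threads, items_per_thread)];
        }
    }
    return regs;
}
```

For a fixed `item`, the lanes of one warp touch adjacent memory locations, which is the access pattern the comment associates with this strategy.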
2 changes: 1 addition & 1 deletion cub/block/block_radix_rank.cuh
@@ -64,7 +64,7 @@ namespace cub {
*
* \par Usage Considerations
* - Keys must be in a form suitable for radix ranking (i.e., unsigned bits).
- * - Assumes a [<em>blocked arrangement</em>](index.html#sec3sec3) of elements across threads
+ * - Assumes a [<em>blocked arrangement</em>](index.html#sec4sec4) of elements across threads
* - \smemreuse{BlockRadixRank::TempStorage}
*
* \par Performance Considerations
18 changes: 9 additions & 9 deletions cub/block/block_radix_sort.cuh
@@ -80,7 +80,7 @@ namespace cub {
* \blockcollective{BlockRadixSort}
* \par
* The code snippet below illustrates a sort of 512 integer keys that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
* where each thread owns 4 consecutive items.
* \par
* \code
@@ -237,12 +237,12 @@ public:
//@{

/**
- * \brief Performs a block-wide radix sort over a [<em>blocked arrangement</em>](index.html#sec3sec3) of keys.
+ * \brief Performs a block-wide radix sort over a [<em>blocked arrangement</em>](index.html#sec4sec4) of keys.
*
* \smemreuse
*
* The code snippet below illustrates a sort of 512 integer keys that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
* where each thread owns 4 consecutive keys.
* \par
* \code
@@ -314,7 +314,7 @@ public:


/**
- * \brief Performs a block-wide radix sort across a [<em>blocked arrangement</em>](index.html#sec3sec3) of keys and values.
+ * \brief Performs a block-wide radix sort across a [<em>blocked arrangement</em>](index.html#sec4sec4) of keys and values.
*
* BlockRadixSort can only accommodate one associated tile of values. To "truck along"
* more than one tile of values, simply perform a key-value sort of the keys paired
@@ -325,7 +325,7 @@ public:
* \smemreuse
*
* The code snippet below illustrates a sort of 512 integer keys and values that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
* where each thread owns 4 consecutive pairs.
* \par
* \code
@@ -412,12 +412,12 @@ public:


/**
- * \brief Performs a radix sort across a [<em>blocked arrangement</em>](index.html#sec3sec3) of keys, leaving them in a [<em>striped arrangement</em>](index.html#sec3sec3).
+ * \brief Performs a radix sort across a [<em>blocked arrangement</em>](index.html#sec4sec4) of keys, leaving them in a [<em>striped arrangement</em>](index.html#sec4sec4).
*
* \smemreuse
*
* The code snippet below illustrates a sort of 512 integer keys that
- * are initially partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+ * are initially partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
* where each thread owns 4 consecutive keys. The final partitioning is striped.
* \par
* \code
@@ -497,7 +497,7 @@ public:


/**
- * \brief Performs a radix sort across a [<em>blocked arrangement</em>](index.html#sec3sec3) of keys and values, leaving them in a [<em>striped arrangement</em>](index.html#sec3sec3).
+ * \brief Performs a radix sort across a [<em>blocked arrangement</em>](index.html#sec4sec4) of keys and values, leaving them in a [<em>striped arrangement</em>](index.html#sec4sec4).
*
* BlockRadixSort can only accommodate one associated tile of values. To "truck along"
* more than one tile of values, simply perform a key-value sort of the keys paired
@@ -508,7 +508,7 @@ public:
* \smemreuse
*
* The code snippet below illustrates a sort of 512 integer keys and values that
- * are initially partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+ * are initially partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
* where each thread owns 4 consecutive pairs. The final partitioning is striped.
* \par
* \code
6 changes: 3 additions & 3 deletions cub/block/block_reduce.cuh
@@ -147,7 +147,7 @@ enum BlockReduceAlgorithm
* \blockcollective{BlockReduce}
* \par
* The code snippet below illustrates a sum reduction of 512 integer items that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
* where each thread owns 4 consecutive items.
* \par
* \code
@@ -333,7 +333,7 @@ public:
* \smemreuse
*
* The code snippet below illustrates a max reduction of 512 integer items that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
* where each thread owns 4 consecutive items.
* \par
* \code
@@ -478,7 +478,7 @@ public:
* \smemreuse
*
* The code snippet below illustrates a sum reduction of 512 integer items that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
* where each thread owns 4 consecutive items.
* \par
* \code
2 changes: 1 addition & 1 deletion cub/block/block_scan.cuh.REMOVED.git-id
@@ -1 +1 @@
- ea05cd3be059138b71ea9f1b98b51eb437edb991
+ 14d1cfaefc9268cbd8f94fab3806edb8d7d3259a
30 changes: 15 additions & 15 deletions cub/block/block_store.cuh
@@ -366,7 +366,7 @@ enum BlockStoreAlgorithm
/**
* \par Overview
*
- * A [<em>blocked arrangement</em>](index.html#sec3sec3) of data is written
+ * A [<em>blocked arrangement</em>](index.html#sec4sec4) of data is written
* directly to memory. The thread block writes items in a parallel "raking" fashion:
* thread<sub><em>i</em></sub> writes the <em>i</em><sup>th</sup> segment of consecutive elements.
*
@@ -379,7 +379,7 @@ enum BlockStoreAlgorithm
/**
* \par Overview
*
- * A [<em>blocked arrangement</em>](index.html#sec3sec3) of data is written directly
+ * A [<em>blocked arrangement</em>](index.html#sec4sec4) of data is written directly
* to memory using CUDA's built-in vectorized stores as a coalescing optimization.
* The thread block writes items in a parallel "raking" fashion: thread<sub><em>i</em></sub> uses vector stores to
* write the <em>i</em><sup>th</sup> segment of consecutive elements.
@@ -400,11 +400,11 @@ enum BlockStoreAlgorithm

/**
* \par Overview
- * A [<em>blocked arrangement</em>](index.html#sec3sec3) is locally
- * transposed into a [<em>striped arrangement</em>](index.html#sec3sec3)
+ * A [<em>blocked arrangement</em>](index.html#sec4sec4) is locally
+ * transposed into a [<em>striped arrangement</em>](index.html#sec4sec4)
* which is then written to memory. More specifically, cub::BlockExchange
* used to locally reorder the items into a
- * [<em>striped arrangement</em>](index.html#sec3sec3), after which the
+ * [<em>striped arrangement</em>](index.html#sec4sec4), after which the
* thread block writes items in a parallel "strip-mining" fashion: consecutive
* items owned by thread<sub><em>i</em></sub> are written to memory with
* stride \p BLOCK_THREADS between them.
@@ -419,11 +419,11 @@ enum BlockStoreAlgorithm

/**
* \par Overview
- * A [<em>blocked arrangement</em>](index.html#sec3sec3) is locally
- * transposed into a [<em>warp-striped arrangement</em>](index.html#sec3sec3)
+ * A [<em>blocked arrangement</em>](index.html#sec4sec4) is locally
+ * transposed into a [<em>warp-striped arrangement</em>](index.html#sec4sec4)
* which is then written to memory. More specifically, cub::BlockExchange used
* to locally reorder the items into a
- * [<em>warp-striped arrangement</em>](index.html#sec3sec3), after which
+ * [<em>warp-striped arrangement</em>](index.html#sec4sec4), after which
* each warp writes its own contiguous segment in a parallel "strip-mining" fashion:
* consecutive items owned by lane<sub><em>i</em></sub> are written to memory
* with stride \p WARP_THREADS between them.
@@ -446,24 +446,24 @@ enum BlockStoreAlgorithm


/**
- * \brief The BlockStore class provides [<em>collective</em>](index.html#sec0) data movement methods for writing a [<em>blocked arrangement</em>](index.html#sec3sec3) of items partitioned across a CUDA thread block to a linear segment of memory. ![](block_store_logo.png)
+ * \brief The BlockStore class provides [<em>collective</em>](index.html#sec0) data movement methods for writing a [<em>blocked arrangement</em>](index.html#sec4sec4) of items partitioned across a CUDA thread block to a linear segment of memory. ![](block_store_logo.png)
*
* \par Overview
* The BlockStore class provides a single data movement abstraction that can be specialized
* to implement different cub::BlockStoreAlgorithm strategies. This facilitates different
* performance policies for different architectures, data types, granularity sizes, etc.
*
* \par Optionally, BlockStore can be specialized by different data movement strategies:
- * -# <b>cub::BLOCK_STORE_DIRECT</b>. A [<em>blocked arrangement</em>](index.html#sec3sec3) of data is written
+ * -# <b>cub::BLOCK_STORE_DIRECT</b>. A [<em>blocked arrangement</em>](index.html#sec4sec4) of data is written
* directly to memory. [More...](\ref cub::BlockStoreAlgorithm)
- * -# <b>cub::BLOCK_STORE_VECTORIZE</b>. A [<em>blocked arrangement</em>](index.html#sec3sec3)
+ * -# <b>cub::BLOCK_STORE_VECTORIZE</b>. A [<em>blocked arrangement</em>](index.html#sec4sec4)
* of data is written directly to memory using CUDA's built-in vectorized stores as a
* coalescing optimization. [More...](\ref cub::BlockStoreAlgorithm)
- * -# <b>cub::BLOCK_STORE_TRANSPOSE</b>. A [<em>blocked arrangement</em>](index.html#sec3sec3)
- * is locally transposed into a [<em>striped arrangement</em>](index.html#sec3sec3) which is
+ * -# <b>cub::BLOCK_STORE_TRANSPOSE</b>. A [<em>blocked arrangement</em>](index.html#sec4sec4)
+ * is locally transposed into a [<em>striped arrangement</em>](index.html#sec4sec4) which is
* then written to memory. [More...](\ref cub::BlockStoreAlgorithm)
- * -# <b>cub::BLOCK_STORE_WARP_TRANSPOSE</b>. A [<em>blocked arrangement</em>](index.html#sec3sec3)
- * is locally transposed into a [<em>warp-striped arrangement</em>](index.html#sec3sec3) which is
+ * -# <b>cub::BLOCK_STORE_WARP_TRANSPOSE</b>. A [<em>blocked arrangement</em>](index.html#sec4sec4)
+ * is locally transposed into a [<em>warp-striped arrangement</em>](index.html#sec4sec4) which is
* then written to memory. [More...](\ref cub::BlockStoreAlgorithm)
*
* \tparam OutputIteratorRA The input iterator type (may be a simple pointer type).
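To see why BLOCK_STORE_DIRECT and BLOCK_STORE_TRANSPOSE leave the same data in memory, the host-side sketch below simulates both strategies sequentially. It assumes the conventions in the comments above: a blocked thread owns consecutive items, and a striped thread writes its items `BLOCK_THREADS` apart. The names `Tile`, `store_direct`, and `store_transpose` are hypothetical, and the first loop in `store_transpose` is only a stand-in for the cub::BlockExchange step, not its actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tile[tid][item]: items held by each (simulated) thread, blocked arrangement.
using Tile = std::vector<std::vector<int>>;

// Direct store: thread `tid` writes its consecutive segment.
std::vector<int> store_direct(const Tile& blocked) {
    const std::size_t threads = blocked.size();
    const std::size_t ipt = blocked.empty() ? 0 : blocked[0].size();
    std::vector<int> memory(threads * ipt);
    for (std::size_t tid = 0; tid < threads; ++tid) {
        assert(blocked[tid].size() == ipt);
        for (std::size_t item = 0; item < ipt; ++item) {
            memory[tid * ipt + item] = blocked[tid][item];
        }
    }
    return memory;
}

// Transposed store: reorder blocked -> striped, then write with stride `threads`.
std::vector<int> store_transpose(const Tile& blocked) {
    const std::size_t threads = blocked.size();
    const std::size_t ipt = blocked.empty() ? 0 : blocked[0].size();
    for (std::size_t tid = 0; tid < threads; ++tid) {
        assert(blocked[tid].size() == ipt);
    }

    // Step 1 (stand-in for the exchange): thread `tid` receives the items at
    // linear tile positions tid, tid + threads, tid + 2 * threads, ...
    Tile striped(threads, std::vector<int>(ipt));
    for (std::size_t tid = 0; tid < threads; ++tid) {
        for (std::size_t item = 0; item < ipt; ++item) {
            const std::size_t linear = item * threads + tid;
            striped[tid][item] = blocked[linear / ipt][linear % ipt];
        }
    }

    // Step 2: each thread writes its items with stride `threads`.
    std::vector<int> memory(threads * ipt);
    for (std::size_t tid = 0; tid < threads; ++tid) {
        for (std::size_t item = 0; item < ipt; ++item) {
            memory[item * threads + tid] = striped[tid][item];
        }
    }
    return memory;
}
```

Both functions produce the same linear segment; the strategies differ only in which thread touches which address at each step.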