doc update

Former-commit-id: 2ba8442
brycelelbach · Aug 9, 2013 · 1504496
1 parent 9562c59
commit 1504496

Showing 221 changed files with 5,902 additions and 3,282 deletions.
diff --git a/cub/block/block_discontinuity.cuh b/cub/block/block_discontinuity.cuh
@@ -60,7 +60,7 @@ namespace cub {
  * \blockcollective{BlockDiscontinuity}
  * \par
  * The code snippet below illustrates the head flagging of 512 integer items that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
  * where each thread owns 4 consecutive items.
  * \par
  * \code
@@ -233,7 +233,7 @@ public:
      * \smemreuse
      *
      * The code snippet below illustrates the head-flagging of 512 integer items that
-     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
      * where each thread owns 4 consecutive items.
      * \par
      * \code
@@ -318,7 +318,7 @@ public:
      * \smemreuse
      *
      * The code snippet below illustrates the head-flagging of 512 integer items that
-     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
      * where each thread owns 4 consecutive items.
      * \par
      * \code
@@ -418,7 +418,7 @@ public:
      * \smemreuse
      *
      * The code snippet below illustrates the tail-flagging of 512 integer items that
-     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
      * where each thread owns 4 consecutive items.
      * \par
      * \code
@@ -504,7 +504,7 @@ public:
      * \smemreuse
      *
      * The code snippet below illustrates the tail-flagging of 512 integer items that
-     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
      * where each thread owns 4 consecutive items.
      * \par
      * \code

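Note on the hunks above: the doc comments describe head-flagging 512 integers held in a blocked arrangement across 128 threads, 4 consecutive items per thread. The following is a minimal host-side sketch of that idea in plain C++, not the cub::BlockDiscontinuity implementation. The names `BlockedIndex` and `FlagHeads` are hypothetical, and the sketch assumes the first item of the tile is always flagged.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Illustrative host-side model only; these names are hypothetical, not CUB API.
constexpr std::size_t kBlockThreads   = 128;
constexpr std::size_t kItemsPerThread = 4;
constexpr std::size_t kTileItems      = kBlockThreads * kItemsPerThread;  // 512

// In a blocked arrangement, thread t owns tile positions [4t, 4t + 3].
inline std::size_t BlockedIndex(std::size_t thread, std::size_t item) {
    assert(thread < kBlockThreads && item < kItemsPerThread);
    return thread * kItemsPerThread + item;
}

// Sets flag = 1 where an item differs from its predecessor in tile order.
// Assumption: the very first item of the tile is always flagged.
inline std::array<int, kTileItems> FlagHeads(const std::array<int, kTileItems>& tile) {
    std::array<int, kTileItems> flags{};
    for (std::size_t thread = 0; thread < kBlockThreads; ++thread) {
        for (std::size_t item = 0; item < kItemsPerThread; ++item) {
            const std::size_t idx = BlockedIndex(thread, item);
            // The predecessor of a thread's first item belongs to the previous thread.
            flags[idx] = (idx == 0 || tile[idx] != tile[idx - 1]) ? 1 : 0;
        }
    }
    return flags;
}
```

For a tile beginning `0, 0, 0, 1, 1, 1, 2, 2`, this sketch yields flags beginning `1, 0, 0, 1, 0, 0, 1, 0`. On the GPU, the comparison at a thread boundary needs data exchanged between threads, which is why the documented collective takes temporary storage.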
diff --git a/cub/block/block_exchange.cuh b/cub/block/block_exchange.cuh
@@ -59,10 +59,10 @@ namespace cub {
  *
  * \par
  * BlockExchange supports the following types of data exchanges:
- * - Transposing between [<em>blocked</em>](index.html#sec3sec3) and [<em>striped</em>](index.html#sec3sec3) arrangements
- * - Transposing between [<em>blocked</em>](index.html#sec3sec3) and [<em>warp-striped</em>](index.html#sec3sec3) arrangements
- * - Scattering ranked items to a [<em>blocked arrangement</em>](index.html#sec3sec3)
- * - Scattering ranked items to a [<em>striped arrangement</em>](index.html#sec3sec3)
+ * - Transposing between [<em>blocked</em>](index.html#sec4sec4) and [<em>striped</em>](index.html#sec4sec4) arrangements
+ * - Transposing between [<em>blocked</em>](index.html#sec4sec4) and [<em>warp-striped</em>](index.html#sec4sec4) arrangements
+ * - Scattering ranked items to a [<em>blocked arrangement</em>](index.html#sec4sec4)
+ * - Scattering ranked items to a [<em>striped arrangement</em>](index.html#sec4sec4)
  *
  * \tparam T                    The data type to be exchanged.
  * \tparam BLOCK_THREADS        The thread block size in threads.

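Note on the hunk above: the three arrangements named in the BlockExchange comment differ only in which linear tile offset a given (thread, item) pair maps to. The sketch below spells out those mappings as understood from the doc comments in this diff. `TileShape` and the three helper functions are hypothetical names for illustration, not CUB API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper types and names; an illustrative model, not CUB API.
struct TileShape {
    std::size_t block_threads;     // threads per block
    std::size_t items_per_thread;  // items owned by each thread
    std::size_t warp_threads;      // threads per warp
};

// Blocked: each thread owns one contiguous run of items.
inline std::size_t BlockedOffset(const TileShape& s, std::size_t thread, std::size_t item) {
    assert(thread < s.block_threads && item < s.items_per_thread);
    return thread * s.items_per_thread + item;
}

// Striped: consecutive items of one thread are block_threads apart.
inline std::size_t StripedOffset(const TileShape& s, std::size_t thread, std::size_t item) {
    assert(thread < s.block_threads && item < s.items_per_thread);
    return item * s.block_threads + thread;
}

// Warp-striped: each warp owns a contiguous segment of the tile, and
// within that segment a lane's items are warp_threads apart.
inline std::size_t WarpStripedOffset(const TileShape& s, std::size_t thread, std::size_t item) {
    assert(thread < s.block_threads && item < s.items_per_thread);
    assert(s.block_threads % s.warp_threads == 0);
    const std::size_t warp      = thread / s.warp_threads;
    const std::size_t lane      = thread % s.warp_threads;
    const std::size_t warp_base = warp * s.warp_threads * s.items_per_thread;
    return warp_base + item * s.warp_threads + lane;
}
```

With 128 threads, 4 items per thread and 32-thread warps, item 1 of thread 33 sits at offset 133 when blocked, 161 when striped, and 161 when warp-striped (warp 1 starts at 128, lane 1, second stripe). The two striped values coincide here only because thread 33 is in warp 1; for thread 65 they are 193 and 289.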
diff --git a/cub/block/block_load.cuh b/cub/block/block_load.cuh
@@ -476,7 +476,7 @@ enum BlockLoadAlgorithm
     /**
      * \par Overview
      *
-     * A [<em>blocked arrangement</em>](index.html#sec3sec3) of data is read
+     * A [<em>blocked arrangement</em>](index.html#sec4sec4) of data is read
      * directly from memory.  The thread block reads items in a parallel "raking" fashion: thread<sub><em>i</em></sub>
      * reads the <em>i</em><sup>th</sup> segment of consecutive elements.
      *
@@ -489,7 +489,7 @@ enum BlockLoadAlgorithm
     /**
      * \par Overview
      *
-     * A [<em>blocked arrangement</em>](index.html#sec3sec3) of data is read directly
+     * A [<em>blocked arrangement</em>](index.html#sec4sec4) of data is read directly
      * from memory using CUDA's built-in vectorized loads as a coalescing optimization.
      * The thread block reads items in a parallel "raking" fashion: thread<sub><em>i</em></sub> uses vector loads to
      * read the <em>i</em><sup>th</sup> segment of consecutive elements.
@@ -511,13 +511,13 @@ enum BlockLoadAlgorithm
     /**
      * \par Overview
      *
-     * A [<em>striped arrangement</em>](index.html#sec3sec3) of data is read
+     * A [<em>striped arrangement</em>](index.html#sec4sec4) of data is read
      * directly from memory and then is locally transposed into a
-     * [<em>blocked arrangement</em>](index.html#sec3sec3). The thread block
+     * [<em>blocked arrangement</em>](index.html#sec4sec4). The thread block
      * reads items in a parallel "strip-mining" fashion:
      * thread<sub><em>i</em></sub> reads items having stride \p BLOCK_THREADS
      * between them. cub::BlockExchange is then used to locally reorder the items
-     * into a [<em>blocked arrangement</em>](index.html#sec3sec3).
+     * into a [<em>blocked arrangement</em>](index.html#sec4sec4).
      *
      * \par Performance Considerations
      * - The utilization of memory transactions (coalescing) remains high regardless
@@ -531,13 +531,13 @@ enum BlockLoadAlgorithm
     /**
      * \par Overview
      *
-     * A [<em>warp-striped arrangement</em>](index.html#sec3sec3) of data is read
+     * A [<em>warp-striped arrangement</em>](index.html#sec4sec4) of data is read
      * directly from memory and then is locally transposed into a
-     * [<em>blocked arrangement</em>](index.html#sec3sec3). Each warp reads its own
+     * [<em>blocked arrangement</em>](index.html#sec4sec4). Each warp reads its own
      * contiguous segment in a parallel "strip-mining" fashion: lane<sub><em>i</em></sub>
      * reads items having stride \p WARP_THREADS between them. cub::BlockExchange
      * is then used to locally reorder the items into a
-     * [<em>blocked arrangement</em>](index.html#sec3sec3).
+     * [<em>blocked arrangement</em>](index.html#sec4sec4).
      *
      * \par Usage Considerations
      * - BLOCK_THREADS must be a multiple of WARP_THREADS
@@ -553,7 +553,7 @@ enum BlockLoadAlgorithm
 
 
 /**
- * \brief The BlockLoad class provides [<em>collective</em>](index.html#sec0) data movement methods for loading a linear segment of items from memory into a [<em>blocked arrangement</em>](index.html#sec3sec3) across a CUDA thread block.  ![](block_load_logo.png)
+ * \brief The BlockLoad class provides [<em>collective</em>](index.html#sec0) data movement methods for loading a linear segment of items from memory into a [<em>blocked arrangement</em>](index.html#sec4sec4) across a CUDA thread block.  ![](block_load_logo.png)
  * \ingroup BlockModule
  *
  * \par Overview
@@ -563,17 +563,17 @@ enum BlockLoadAlgorithm
  *
  * \par
  * Optionally, BlockLoad can be specialized by different data movement strategies:
- *   -# <b>cub::BLOCK_LOAD_DIRECT</b>.  A [<em>blocked arrangement</em>](index.html#sec3sec3)
+ *   -# <b>cub::BLOCK_LOAD_DIRECT</b>.  A [<em>blocked arrangement</em>](index.html#sec4sec4)
  *      of data is read directly from memory.  [More...](\ref cub::BlockLoadAlgorithm)
- *   -# <b>cub::BLOCK_LOAD_VECTORIZE</b>.  A [<em>blocked arrangement</em>](index.html#sec3sec3)
+ *   -# <b>cub::BLOCK_LOAD_VECTORIZE</b>.  A [<em>blocked arrangement</em>](index.html#sec4sec4)
  *      of data is read directly from memory using CUDA's built-in vectorized loads as a
  *      coalescing optimization.    [More...](\ref cub::BlockLoadAlgorithm)
- *   -# <b>cub::BLOCK_LOAD_TRANSPOSE</b>.  A [<em>striped arrangement</em>](index.html#sec3sec3)
+ *   -# <b>cub::BLOCK_LOAD_TRANSPOSE</b>.  A [<em>striped arrangement</em>](index.html#sec4sec4)
  *      of data is read directly from memory and is then locally transposed into a
- *      [<em>blocked arrangement</em>](index.html#sec3sec3).  [More...](\ref cub::BlockLoadAlgorithm)
- *   -# <b>cub::BLOCK_LOAD_WARP_TRANSPOSE</b>.  A [<em>warp-striped arrangement</em>](index.html#sec3sec3)
+ *      [<em>blocked arrangement</em>](index.html#sec4sec4).  [More...](\ref cub::BlockLoadAlgorithm)
+ *   -# <b>cub::BLOCK_LOAD_WARP_TRANSPOSE</b>.  A [<em>warp-striped arrangement</em>](index.html#sec4sec4)
  *      of data is read directly from memory and is then locally transposed into a
- *      [<em>blocked arrangement</em>](index.html#sec3sec3).  [More...](\ref cub::BlockLoadAlgorithm)
+ *      [<em>blocked arrangement</em>](index.html#sec4sec4).  [More...](\ref cub::BlockLoadAlgorithm)
  *
  * \tparam InputIteratorRA      The input iterator type (may be a simple pointer type).
  * \tparam BLOCK_THREADS        The thread block size in threads.

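Note on the hunks above: the BLOCK_LOAD_TRANSPOSE comment describes a two-step pattern, a strided ("strip-mining") read followed by a local reorder into a blocked arrangement. The host-side sketch below models those two steps sequentially. `LoadStriped` and `StripedToBlocked` are hypothetical names, and the `staging` vector merely stands in for the shared memory that cub::BlockExchange would use on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names; a sequential model of the pattern, not CUB code.
using ThreadItems = std::vector<std::vector<int>>;  // [thread][item]

// Step 1: thread t reads items with stride block_threads between them.
inline ThreadItems LoadStriped(const std::vector<int>& input,
                               std::size_t block_threads,
                               std::size_t items_per_thread) {
    assert(input.size() >= block_threads * items_per_thread);
    ThreadItems regs(block_threads, std::vector<int>(items_per_thread));
    for (std::size_t t = 0; t < block_threads; ++t)
        for (std::size_t i = 0; i < items_per_thread; ++i)
            regs[t][i] = input[i * block_threads + t];
    return regs;
}

// Step 2: reorder striped per-thread items into a blocked arrangement
// by writing to a staging buffer and reading back contiguous runs.
inline ThreadItems StripedToBlocked(const ThreadItems& striped) {
    const std::size_t block_threads    = striped.size();
    const std::size_t items_per_thread = striped.empty() ? 0 : striped[0].size();
    std::vector<int> staging(block_threads * items_per_thread);
    for (std::size_t t = 0; t < block_threads; ++t)
        for (std::size_t i = 0; i < items_per_thread; ++i)
            staging[i * block_threads + t] = striped[t][i];
    ThreadItems blocked(block_threads, std::vector<int>(items_per_thread));
    for (std::size_t t = 0; t < block_threads; ++t)
        for (std::size_t i = 0; i < items_per_thread; ++i)
            blocked[t][i] = staging[t * items_per_thread + i];
    return blocked;
}
```

For an input of `0..511` with 128 threads and 4 items per thread, thread 1 holds `{1, 129, 257, 385}` after the striped read and `{4, 5, 6, 7}` after the reorder. The point of the detour is that neighbouring threads touch neighbouring addresses during the read, which is what keeps memory transactions coalesced.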
diff --git a/cub/block/block_radix_rank.cuh b/cub/block/block_radix_rank.cuh
@@ -64,7 +64,7 @@ namespace cub {
  *
  * \par Usage Considerations
  * - Keys must be in a form suitable for radix ranking (i.e., unsigned bits).
- * - Assumes a [<em>blocked arrangement</em>](index.html#sec3sec3) of elements across threads
+ * - Assumes a [<em>blocked arrangement</em>](index.html#sec4sec4) of elements across threads
  * - \smemreuse{BlockRadixRank::TempStorage}
  *
  * \par Performance Considerations

diff --git a/cub/block/block_radix_sort.cuh b/cub/block/block_radix_sort.cuh
@@ -80,7 +80,7 @@ namespace cub {
  * \blockcollective{BlockRadixSort}
  * \par
  * The code snippet below illustrates a sort of 512 integer keys that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
  * where each thread owns 4 consecutive items.
  * \par
  * \code
@@ -237,12 +237,12 @@ public:
     //@{
 
     /**
-     * \brief Performs a block-wide radix sort over a [<em>blocked arrangement</em>](index.html#sec3sec3) of keys.
+     * \brief Performs a block-wide radix sort over a [<em>blocked arrangement</em>](index.html#sec4sec4) of keys.
      *
      * \smemreuse
      *
      * The code snippet below illustrates a sort of 512 integer keys that
-     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
      * where each thread owns 4 consecutive keys.
      * \par
      * \code
@@ -314,7 +314,7 @@ public:
 
 
     /**
-     * \brief Performs a block-wide radix sort across a [<em>blocked arrangement</em>](index.html#sec3sec3) of keys and values.
+     * \brief Performs a block-wide radix sort across a [<em>blocked arrangement</em>](index.html#sec4sec4) of keys and values.
      *
      * BlockRadixSort can only accommodate one associated tile of values. To "truck along"
      * more than one tile of values, simply perform a key-value sort of the keys paired
@@ -325,7 +325,7 @@ public:
      * \smemreuse
      *
      * The code snippet below illustrates a sort of 512 integer keys and values that
-     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
      * where each thread owns 4 consecutive pairs.
      * \par
      * \code
@@ -412,12 +412,12 @@ public:
 
 
     /**
-     * \brief Performs a radix sort across a [<em>blocked arrangement</em>](index.html#sec3sec3) of keys, leaving them in a [<em>striped arrangement</em>](index.html#sec3sec3).
+     * \brief Performs a radix sort across a [<em>blocked arrangement</em>](index.html#sec4sec4) of keys, leaving them in a [<em>striped arrangement</em>](index.html#sec4sec4).
      *
      * \smemreuse
      *
      * The code snippet below illustrates a sort of 512 integer keys that
-     * are initially partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+     * are initially partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
      * where each thread owns 4 consecutive keys.  The final partitioning is striped.
      * \par
      * \code
@@ -497,7 +497,7 @@ public:
 
 
     /**
-     * \brief Performs a radix sort across a [<em>blocked arrangement</em>](index.html#sec3sec3) of keys and values, leaving them in a [<em>striped arrangement</em>](index.html#sec3sec3).
+     * \brief Performs a radix sort across a [<em>blocked arrangement</em>](index.html#sec4sec4) of keys and values, leaving them in a [<em>striped arrangement</em>](index.html#sec4sec4).
      *
      * BlockRadixSort can only accommodate one associated tile of values. To "truck along"
      * more than one tile of values, simply perform a key-value sort of the keys paired
@@ -508,7 +508,7 @@ public:
      * \smemreuse
      *
      * The code snippet below illustrates a sort of 512 integer keys and values that
-     * are initially partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+     * are initially partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
      * where each thread owns 4 consecutive pairs.  The final partitioning is striped.
      * \par
      * \code

diff --git a/cub/block/block_reduce.cuh b/cub/block/block_reduce.cuh
@@ -147,7 +147,7 @@ enum BlockReduceAlgorithm
  * \blockcollective{BlockReduce}
  * \par
  * The code snippet below illustrates a sum reduction of 512 integer items that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
  * where each thread owns 4 consecutive items.
  * \par
  * \code
@@ -333,7 +333,7 @@ public:
      * \smemreuse
      *
      * The code snippet below illustrates a max reduction of 512 integer items that
-     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
      * where each thread owns 4 consecutive items.
      * \par
      * \code
@@ -478,7 +478,7 @@ public:
      * \smemreuse
      *
      * The code snippet below illustrates a sum reduction of 512 integer items that
-     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec3sec3) across 128 threads
+     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec4) across 128 threads
      * where each thread owns 4 consecutive items.
      * \par
      * \code

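Note on the hunks above: the BlockReduce comments describe a sum reduction of 512 integers in a blocked arrangement across 128 threads. One common way to picture such a reduction is a per-thread partial sum followed by a pairwise tree across the partials. The sketch below models that on the host. It is an assumption for illustration and does not claim to match any particular cub::BlockReduceAlgorithm; `BlockSum` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical name; a sequential model of one possible reduction scheme.
inline long long BlockSum(const std::vector<int>& tile,
                          std::size_t block_threads,
                          std::size_t items_per_thread) {
    assert(tile.size() == block_threads * items_per_thread);
    // The tree step below assumes a non-zero power-of-two thread count.
    assert(block_threads != 0 && (block_threads & (block_threads - 1)) == 0);

    // Phase 1: each thread sums its own contiguous (blocked) items.
    std::vector<long long> partial(block_threads, 0);
    for (std::size_t t = 0; t < block_threads; ++t)
        for (std::size_t i = 0; i < items_per_thread; ++i)
            partial[t] += tile[t * items_per_thread + i];

    // Phase 2: pairwise tree; each pass halves the number of active threads.
    for (std::size_t stride = block_threads / 2; stride > 0; stride /= 2)
        for (std::size_t t = 0; t < stride; ++t)
            partial[t] += partial[t + stride];

    return partial[0];  // the aggregate ends up with "thread 0"
}
```

Summing the values `0..511` with 128 threads and 4 items per thread gives 130816. On the device, each pass of the tree would need a synchronization barrier between its writes and the next pass's reads.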
diff --git a/cub/block/block_scan.cuh.REMOVED.git-id b/cub/block/block_scan.cuh.REMOVED.git-id
@@ -1 +1 @@
-ea05cd3be059138b71ea9f1b98b51eb437edb991
+14d1cfaefc9268cbd8f94fab3806edb8d7d3259a
diff --git a/cub/block/block_store.cuh b/cub/block/block_store.cuh
@@ -366,7 +366,7 @@ enum BlockStoreAlgorithm
     /**
      * \par Overview
      *
-     * A [<em>blocked arrangement</em>](index.html#sec3sec3) of data is written
+     * A [<em>blocked arrangement</em>](index.html#sec4sec4) of data is written
      * directly to memory.  The thread block writes items in a parallel "raking" fashion:
      * thread<sub><em>i</em></sub> writes the <em>i</em><sup>th</sup> segment of consecutive elements.
      *
@@ -379,7 +379,7 @@ enum BlockStoreAlgorithm
     /**
      * \par Overview
      *
-     * A [<em>blocked arrangement</em>](index.html#sec3sec3) of data is written directly
+     * A [<em>blocked arrangement</em>](index.html#sec4sec4) of data is written directly
      * to memory using CUDA's built-in vectorized stores as a coalescing optimization.
      * The thread block writes items in a parallel "raking" fashion: thread<sub><em>i</em></sub> uses vector stores to
      * write the <em>i</em><sup>th</sup> segment of consecutive elements.
@@ -400,11 +400,11 @@ enum BlockStoreAlgorithm
 
     /**
      * \par Overview
-     * A [<em>blocked arrangement</em>](index.html#sec3sec3) is locally
-     * transposed into a [<em>striped arrangement</em>](index.html#sec3sec3)
+     * A [<em>blocked arrangement</em>](index.html#sec4sec4) is locally
+     * transposed into a [<em>striped arrangement</em>](index.html#sec4sec4)
      * which is then written to memory.  More specifically, cub::BlockExchange
      * used to locally reorder the items into a
-     * [<em>striped arrangement</em>](index.html#sec3sec3), after which the
+     * [<em>striped arrangement</em>](index.html#sec4sec4), after which the
      * thread block writes items in a parallel "strip-mining" fashion: consecutive
      * items owned by thread<sub><em>i</em></sub> are written to memory with
      * stride \p BLOCK_THREADS between them.
@@ -419,11 +419,11 @@ enum BlockStoreAlgorithm
 
     /**
      * \par Overview
-     * A [<em>blocked arrangement</em>](index.html#sec3sec3) is locally
-     * transposed into a [<em>warp-striped arrangement</em>](index.html#sec3sec3)
+     * A [<em>blocked arrangement</em>](index.html#sec4sec4) is locally
+     * transposed into a [<em>warp-striped arrangement</em>](index.html#sec4sec4)
      * which is then written to memory.  More specifically, cub::BlockExchange used
      * to locally reorder the items into a
-     * [<em>warp-striped arrangement</em>](index.html#sec3sec3), after which
+     * [<em>warp-striped arrangement</em>](index.html#sec4sec4), after which
      * each warp writes its own contiguous segment in a parallel "strip-mining" fashion:
      * consecutive items owned by lane<sub><em>i</em></sub> are written to memory
      * with stride \p WARP_THREADS between them.
@@ -446,24 +446,24 @@ enum BlockStoreAlgorithm
 
 
 /**
- * \brief The BlockStore class provides [<em>collective</em>](index.html#sec0) data movement methods for writing a [<em>blocked arrangement</em>](index.html#sec3sec3) of items partitioned across a CUDA thread block to a linear segment of memory.  ![](block_store_logo.png)
+ * \brief The BlockStore class provides [<em>collective</em>](index.html#sec0) data movement methods for writing a [<em>blocked arrangement</em>](index.html#sec4sec4) of items partitioned across a CUDA thread block to a linear segment of memory.  ![](block_store_logo.png)
  *
  * \par Overview
  * The BlockStore class provides a single data movement abstraction that can be specialized
  * to implement different cub::BlockStoreAlgorithm strategies.  This facilitates different
  * performance policies for different architectures, data types, granularity sizes, etc.
  *
  * \par Optionally, BlockStore can be specialized by different data movement strategies:
- *   -# <b>cub::BLOCK_STORE_DIRECT</b>.  A [<em>blocked arrangement</em>](index.html#sec3sec3) of data is written
+ *   -# <b>cub::BLOCK_STORE_DIRECT</b>.  A [<em>blocked arrangement</em>](index.html#sec4sec4) of data is written
  *      directly to memory. [More...](\ref cub::BlockStoreAlgorithm)
- *   -# <b>cub::BLOCK_STORE_VECTORIZE</b>.  A [<em>blocked arrangement</em>](index.html#sec3sec3)
+ *   -# <b>cub::BLOCK_STORE_VECTORIZE</b>.  A [<em>blocked arrangement</em>](index.html#sec4sec4)
  *      of data is written directly to memory using CUDA's built-in vectorized stores as a
  *      coalescing optimization.  [More...](\ref cub::BlockStoreAlgorithm)
- *   -# <b>cub::BLOCK_STORE_TRANSPOSE</b>.  A [<em>blocked arrangement</em>](index.html#sec3sec3)
- *      is locally transposed into a [<em>striped arrangement</em>](index.html#sec3sec3) which is
+ *   -# <b>cub::BLOCK_STORE_TRANSPOSE</b>.  A [<em>blocked arrangement</em>](index.html#sec4sec4)
+ *      is locally transposed into a [<em>striped arrangement</em>](index.html#sec4sec4) which is
  *      then written to memory.  [More...](\ref cub::BlockStoreAlgorithm)
- *   -# <b>cub::BLOCK_STORE_WARP_TRANSPOSE</b>.  A [<em>blocked arrangement</em>](index.html#sec3sec3)
- *      is locally transposed into a [<em>warp-striped arrangement</em>](index.html#sec3sec3) which is
+ *   -# <b>cub::BLOCK_STORE_WARP_TRANSPOSE</b>.  A [<em>blocked arrangement</em>](index.html#sec4sec4)
+ *      is locally transposed into a [<em>warp-striped arrangement</em>](index.html#sec4sec4) which is
  *      then written to memory.  [More...](\ref cub::BlockStoreAlgorithm)
  *
  * \tparam OutputIteratorRA     The input iterator type (may be a simple pointer type).