apache · fjy · Jan 18, 2017 · Jan 3, 2017 · Jan 17, 2017 · Jan 17, 2017
diff --git a/api/src/main/java/io/druid/data/input/Firehose.java b/api/src/main/java/io/druid/data/input/Firehose.java
@@ -53,7 +53,7 @@ public interface Firehose extends Closeable
    *
    * @return The next row
    */
-  public InputRow nextRow() ;
+  public InputRow nextRow();
 
   /**
    * Returns a runnable that will "commit" everything read up to the point at which commit() is called.  This is

diff --git a/docs/content/ingestion/index.md b/docs/content/ingestion/index.md
@@ -187,7 +187,7 @@ This spec is used to generated segments with uniform intervals.
 | segmentGranularity | string | The granularity to create segments at. | no (default == 'DAY') |
 | queryGranularity | string | The minimum granularity to be able to query results at and the granularity of the data inside the segment. E.g. a value of "minute" will mean that data is aggregated at minutely granularity. That is, if there are collisions in the tuple (minute(timestamp), dimensions), then it will aggregate values together using the aggregators instead of storing individual rows. | no (default == 'NONE') |
 | rollup | boolean | rollup or not | no (default == true) |
-| intervals | string | A list of intervals for the raw data being ingested. Ignored for real-time ingestion. | yes for batch, no for real-time |
+| intervals | string | A list of intervals for the raw data being ingested. Ignored for real-time ingestion. | yes for Hadoop ingestion, no otherwise |
 | timezone | string | The timezone to represent the interval offsets in. Only valid if intervals are explicitly specified for batch ingestion. Will not be valid for kafka based ingestion. | no (default == 'UTC')
 
 ### Arbitrary Granularity Spec
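
Note on the `tasks.md` change in this commit: the new `maxPendingPersists` row states that maximum heap memory usage for indexing scales with `maxRowsInMemory * (2 + maxPendingPersists)`. The sketch below works through that arithmetic for the documented defaults. It is a minimal illustration, not Druid code: the class name `IndexHeapEstimate` and the method `maxRowsOnHeap` are hypothetical, and the result is a row count that heap usage scales with, not a byte figure.

```java
// Hypothetical helper, not part of Druid: illustrates the scaling factor
// quoted in the tasks.md tuningConfig table.
public final class IndexHeapEstimate {
    private IndexHeapEstimate() {}

    /**
     * Returns maxRowsInMemory * (2 + maxPendingPersists), the row count that
     * the docs say indexing heap usage scales with.
     */
    public static long maxRowsOnHeap(int maxRowsInMemory, int maxPendingPersists) {
        if (maxRowsInMemory <= 0) {
            throw new IllegalArgumentException("maxRowsInMemory must be positive");
        }
        if (maxPendingPersists < 0) {
            throw new IllegalArgumentException("maxPendingPersists must not be negative");
        }
        // Widen to long before multiplying so large settings cannot overflow int.
        return (long) maxRowsInMemory * (2L + maxPendingPersists);
    }

    public static void main(String[] args) {
        // Documented defaults: maxRowsInMemory = 75000, maxPendingPersists = 0.
        System.out.println(maxRowsOnHeap(75000, 0)); // 150000
        // Allowing two queued persists doubles the bound.
        System.out.println(maxRowsOnHeap(75000, 2)); // 300000
    }
}
```

With the defaults the factor is 2, so raising `maxPendingPersists` trades extra heap for less blocking during ingestion.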

diff --git a/docs/content/ingestion/tasks.md b/docs/content/ingestion/tasks.md
@@ -76,9 +76,8 @@ The Index Task is a simpler variation of the Index Hadoop task that is designed
     },
     "tuningConfig" : {
       "type" : "index",
-      "targetPartitionSize" : -1,
-      "rowFlushBoundary" : 0,
-      "numShards": 1
+      "targetPartitionSize" : 5000000,
+      "maxRowsInMemory" : 75000
     }
   }
 }
@@ -100,21 +99,28 @@ See [Ingestion](../ingestion/index.html)
 
 #### IOConfig
 
-This field is required. You can specify a type of [Firehose](../ingestion/firehose.html) here.
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|type|The task type, this should always be "index".|none|yes|
+|firehose|Specify a [Firehose](../ingestion/firehose.html) here.|none|yes|
+|appendToExisting|Creates segments as additional shards of the latest version, effectively appending to the segment set instead of replacing it. This will only work if the existing segment set has extendable-type shardSpecs (which can be forced by setting 'forceExtendableShardSpecs' in the tuning config).|false|no|
+|skipFirehoseCaching|By default the IndexTask will fully read the supplied firehose to disk before processing the data. This prevents the task from doing multiple remote fetches and enforces determinism if more than one pass through the data is required. It also allows the task to retry fetching the data if the firehose throws an exception during reading. This requires sufficient disk space for the temporary cache.|false|no|
 
 #### TuningConfig
 
 The tuningConfig is optional and default parameters will be used if no tuningConfig is specified. See below for more details.
 
 |property|description|default|required?|
 |--------|-----------|-------|---------|
-|type|The task type, this should always be "index".|None.|yes|
-|targetPartitionSize|Used in sharding. Determines how many rows are in each segment. Set this to -1 to use numShards instead for sharding.|5000000|no|
-|rowFlushBoundary|Used in determining when intermediate persist should occur to disk.|75000|no|
-|numShards|Directly specify the number of shards to create. You can skip the intermediate persist step if you specify the number of shards you want and set targetPartitionSize=-1.|null|no|
+|type|The task type, this should always be "index".|none|yes|
+|targetPartitionSize|Used in sharding. Determines how many rows are in each segment.|5000000|no|
+|maxRowsInMemory|Used in determining when intermediate persists to disk should occur.|75000|no|
+|numShards|Directly specify the number of shards to create. If this is specified and 'intervals' is specified in the granularitySpec, the index task can skip the determine intervals/partitions pass through the data. numShards cannot be specified if targetPartitionSize is set.|null|no|
 |indexSpec|defines segment storage format options to be used at indexing time, see [IndexSpec](#indexspec)|null|no|
+|maxPendingPersists|Maximum number of persists that can be pending but not started. If this limit would be exceeded by a new intermediate persist, ingestion will block until the currently-running persist finishes. Maximum heap memory usage for indexing scales with maxRowsInMemory * (2 + maxPendingPersists).|0 (meaning one persist can be running concurrently with ingestion, and none can be queued up)|no|
 |buildV9Directly|Whether to build a v9 index directly instead of first building a v8 index and then converting it to v9 format.|true|no|
 |forceExtendableShardSpecs|Forces use of extendable shardSpecs. Experimental feature intended for use with the [Kafka indexing service extension](../development/extensions-core/kafka-ingestion.html).|false|no|
+|reportParseExceptions|If true, exceptions encountered during parsing will be thrown and will halt ingestion; if false, unparseable rows and fields will be skipped.|false|no|
 
 #### IndexSpec