diff --git a/format/spec.md b/format/spec.md
index 60c0f99c3f90..855db29f569b 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -167,29 +167,33 @@ A **`map`** is a collection of key-value pairs with a key type and a value type.
#### Primitive Types
-| Primitive type | Description | Requirements |
-|--------------------|--------------------------------------------------------------------------|--------------------------------------------------|
-| **`boolean`** | True or false | |
-| **`int`** | 32-bit signed integers | Can promote to `long` |
-| **`long`** | 64-bit signed integers | |
-| **`float`** | [32-bit IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) floating point | Can promote to double |
-| **`double`** | [64-bit IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) floating point | |
-| **`decimal(P,S)`** | Fixed-point decimal; precision P, scale S | Scale is fixed [1], precision must be 38 or less |
-| **`date`** | Calendar date without timezone or time | |
-| **`time`** | Time of day without date, timezone | Microsecond precision [2] |
-| **`timestamp`** | Timestamp without timezone | Microsecond precision [2] |
-| **`timestamptz`** | Timestamp with timezone | Stored as UTC [2] |
-| **`string`** | Arbitrary-length character sequences | Encoded with UTF-8 [3] |
-| **`uuid`** | Universally unique identifiers | Should use 16-byte fixed |
-| **`fixed(L)`** | Fixed-length byte array of length L | |
-| **`binary`** | Arbitrary-length byte array | |
+Supported primitive types are defined in the table below. Primitive types added after v1 have an "added by" version that is the first spec version in which the type is allowed. For example, nanosecond-precision timestamps are part of the v3 spec; using v3 types in v1 or v2 tables can break forward compatibility.
+
+| Added by version | Primitive type | Description | Requirements |
+|------------------|--------------------|--------------------------------------------------------------------------|--------------------------------------------------|
+| | **`boolean`** | True or false | |
+| | **`int`** | 32-bit signed integers | Can promote to `long` |
+| | **`long`** | 64-bit signed integers | |
+| | **`float`** | [32-bit IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) floating point | Can promote to double |
+| | **`double`** | [64-bit IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) floating point | |
+| | **`decimal(P,S)`** | Fixed-point decimal; precision P, scale S | Scale is fixed [1], precision must be 38 or less |
+| | **`date`** | Calendar date without timezone or time | |
+| | **`time`** | Time of day without date, timezone | Microsecond precision [2] |
+| | **`timestamp`** | Timestamp, microsecond precision, without timezone | [2] |
+| | **`timestamptz`** | Timestamp, microsecond precision, with timezone | [2] |
+| [v3](#version-3) | **`timestamp_ns`** | Timestamp, nanosecond precision, without timezone | [2] |
+| [v3](#version-3) | **`timestamptz_ns`** | Timestamp, nanosecond precision, with timezone | [2] |
+| | **`string`** | Arbitrary-length character sequences | Encoded with UTF-8 [3] |
+| | **`uuid`** | Universally unique identifiers | Should use 16-byte fixed |
+| | **`fixed(L)`** | Fixed-length byte array of length L | |
+| | **`binary`** | Arbitrary-length byte array | |
Notes:
1. Decimal scale is fixed and cannot be changed by schema evolution. Precision can only be widened.
-2. All time and timestamp values are stored with microsecond precision.
- - Timestamps _with time zone_ represent a point in time: values are stored as UTC and do not retain a source time zone (`2017-11-16 17:10:34 PST` is stored/retrieved as `2017-11-17 01:10:34 UTC` and these values are considered identical).
- - Timestamps _without time zone_ represent a date and time of day regardless of zone: the time value is independent of zone adjustments (`2017-11-16 17:10:34` is always retrieved as `2017-11-16 17:10:34`). Timestamp values are stored as a long that encodes microseconds from the unix epoch.
+2. `time`, `timestamp`, and `timestamptz` values are represented with _microsecond precision_. `timestamp_ns` and `timestamptz_ns` values are represented with _nanosecond precision_.
+ - Timestamp values _with time zone_ represent a point in time: values are stored as UTC and do not retain a source time zone (`2017-11-16 17:10:34 PST` is stored/retrieved as `2017-11-17 01:10:34 UTC` and these values are considered identical).
+ - Timestamp values _without time zone_ represent a date and time of day regardless of zone: the time value is independent of zone adjustments (`2017-11-16 17:10:34` is always retrieved as `2017-11-16 17:10:34`).
3. Character strings must be stored as UTF-8 encoded byte arrays.
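
As an illustration of note 2, the following minimal Python sketch (not part of the spec) normalizes a zoned timestamp to UTC and derives the microsecond and nanosecond values counted from 1970-01-01 00:00:00 UTC. The helper names `to_epoch_micros` and `to_epoch_nanos` are hypothetical, and `PST` is assumed to be a fixed UTC-08:00 offset.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
PST = timezone(timedelta(hours=-8), "PST")  # assumption: fixed UTC-08:00 offset


def to_epoch_micros(ts: datetime) -> int:
    """Microseconds from the Unix epoch for a timezone-aware datetime."""
    delta = ts.astimezone(timezone.utc) - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def to_epoch_nanos(ts: datetime, extra_nanos: int = 0) -> int:
    """Nanoseconds from the Unix epoch; `extra_nanos` carries the
    sub-microsecond digits that `datetime` cannot represent."""
    return to_epoch_micros(ts) * 1_000 + extra_nanos


# The source zone is not retained: both values map to the same point in time.
pst_value = datetime(2017, 11, 16, 17, 10, 34, tzinfo=PST)
utc_value = datetime(2017, 11, 17, 1, 10, 34, tzinfo=timezone.utc)
assert to_epoch_micros(pst_value) == to_epoch_micros(utc_value)
```

The same instant yields a value 1,000 times larger in the nanosecond representation, so a reader has to know which of the two types a column uses before interpreting the stored long.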
For details on how to serialize a schema to JSON, see Appendix C.
@@ -307,12 +311,12 @@ Partition specs capture the transform from table data to partition values. This
| Transform name | Description | Source types | Result type |
|-------------------|--------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-------------|
| **`identity`** | Source value, unmodified | Any | Source type |
-| **`bucket[N]`** | Hash of value, mod `N` (see below) | `int`, `long`, `decimal`, `date`, `time`, `timestamp`, `timestamptz`, `string`, `uuid`, `fixed`, `binary` | `int` |
+| **`bucket[N]`** | Hash of value, mod `N` (see below) | `int`, `long`, `decimal`, `date`, `time`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`, `string`, `uuid`, `fixed`, `binary` | `int` |
| **`truncate[W]`** | Value truncated to width `W` (see below) | `int`, `long`, `decimal`, `string` | Source type |
-| **`year`** | Extract a date or timestamp year, as years from 1970 | `date`, `timestamp`, `timestamptz` | `int` |
-| **`month`** | Extract a date or timestamp month, as months from 1970-01-01 | `date`, `timestamp`, `timestamptz` | `int` |
-| **`day`** | Extract a date or timestamp day, as days from 1970-01-01 | `date`, `timestamp`, `timestamptz` | `int` |
-| **`hour`** | Extract a timestamp hour, as hours from 1970-01-01 00:00:00 | `timestamp`, `timestamptz` | `int` |
+| **`year`** | Extract a date or timestamp year, as years from 1970 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
+| **`month`** | Extract a date or timestamp month, as months from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
+| **`day`** | Extract a date or timestamp day, as days from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
+| **`hour`** | Extract a timestamp hour, as hours from 1970-01-01 00:00:00 | `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
| **`void`** | Always produces `null` | Any | Source type or `int` |
All transforms must return `null` for a `null` input value.
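
A rough Python sketch of how the date and time transforms could be applied to a `timestamp_ns` value held as nanoseconds from the Unix epoch. The function names are hypothetical, and floor division for values before 1970 is an assumption of this sketch rather than something stated in the table above.

```python
from datetime import date, timedelta
from typing import Optional

NANOS_PER_HOUR = 3_600 * 1_000_000_000
NANOS_PER_DAY = 24 * NANOS_PER_HOUR
UNIX_EPOCH_DATE = date(1970, 1, 1)


def hour_transform(nanos: Optional[int]) -> Optional[int]:
    """Hours from 1970-01-01 00:00:00; null in, null out."""
    return None if nanos is None else nanos // NANOS_PER_HOUR


def day_transform(nanos: Optional[int]) -> Optional[int]:
    """Days from 1970-01-01; null in, null out."""
    return None if nanos is None else nanos // NANOS_PER_DAY


def month_transform(nanos: Optional[int]) -> Optional[int]:
    """Months from 1970-01-01; null in, null out."""
    if nanos is None:
        return None
    d = UNIX_EPOCH_DATE + timedelta(days=nanos // NANOS_PER_DAY)
    return (d.year - 1970) * 12 + (d.month - 1)


def year_transform(nanos: Optional[int]) -> Optional[int]:
    """Years from 1970; null in, null out."""
    if nanos is None:
        return None
    d = UNIX_EPOCH_DATE + timedelta(days=nanos // NANOS_PER_DAY)
    return d.year - 1970
```

Because the result counts whole hours, days, months, or years, a `timestamp_ns` value and the `timestamp` value for the same instant produce the same partition value under this sketch.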
@@ -862,10 +866,12 @@ Maps with non-string keys must use an array representation with the `map` logica
|**`float`**|`float`||
|**`double`**|`double`||
|**`decimal(P,S)`**|`{ "type": "fixed",`<br />`"size": minBytesRequired(P),`<br />`"logicalType": "decimal",`<br />`"precision": P,`<br />`"scale": S }`|Stored as fixed using the minimum number of bytes for the given precision.|
-|**`date`**|`{ "type": "int",`<br />`"logicalType": "date" }`|Stores days from the 1970-01-01.|
+|**`date`**|`{ "type": "int",`<br />`"logicalType": "date" }`|Stores days from 1970-01-01.|
|**`time`**|`{ "type": "long",`<br />`"logicalType": "time-micros" }`|Stores microseconds from midnight.|
-|**`timestamp`**|`{ "type": "long",`<br />`"logicalType": "timestamp-micros",`<br />`"adjust-to-utc": false }`|Stores microseconds from 1970-01-01 00:00:00.000000.|
-|**`timestamptz`**|`{ "type": "long",`<br />`"logicalType": "timestamp-micros",`<br />`"adjust-to-utc": true }`|Stores microseconds from 1970-01-01 00:00:00.000000 UTC.|
+|**`timestamp`** | `{ "type": "long",`<br />`"logicalType": "timestamp-micros",`<br />`"adjust-to-utc": false }` | Stores microseconds from 1970-01-01 00:00:00.000000. [1] |
+|**`timestamptz`** | `{ "type": "long",`<br />`"logicalType": "timestamp-micros",`<br />`"adjust-to-utc": true }` | Stores microseconds from 1970-01-01 00:00:00.000000 UTC. [1] |
+|**`timestamp_ns`** | `{ "type": "long",`<br />`"logicalType": "timestamp-nanos",`<br />`"adjust-to-utc": false }` | Stores nanoseconds from 1970-01-01 00:00:00.000000000. [1], [2] |
+|**`timestamptz_ns`** | `{ "type": "long",`<br />`"logicalType": "timestamp-nanos",`<br />`"adjust-to-utc": true }` | Stores nanoseconds from 1970-01-01 00:00:00.000000000 UTC. [1], [2] |
|**`string`**|`string`||
|**`uuid`**|`{ "type": "fixed",`<br />`"size": 16,`<br />`"logicalType": "uuid" }`||
|**`fixed(L)`**|`{ "type": "fixed",`<br />`"size": L }`||
@@ -874,6 +880,11 @@ Maps with non-string keys must use an array representation with the `map` logica
|**`list`**|`array`||
|**`map`**|`array` of key-value records, or `map` when keys are strings (optional).|Array storage must use logical type name `map` and must store elements that are 2-field records. The first field is a non-null key and the second field is the value.|
+Notes:
+
+1. Avro type annotation `adjust-to-utc` is an Iceberg convention; default value is `false` if not present.
+2. Avro logical type `timestamp-nanos` is an Iceberg convention; the Avro specification does not define this type.
+
**Field IDs**
@@ -908,10 +919,12 @@ Lists must use the [3-level representation](https://github.com/apache/parquet-fo
| **`float`** | `float` | | |
| **`double`** | `double` | | |
| **`decimal(P,S)`** | `P <= 9`: `int32`,<br />`P <= 18`: `int64`,<br />`fixed` otherwise | `DECIMAL(P,S)` | Fixed must use the minimum number of bytes that can store `P`. |
-| **`date`** | `int32` | `DATE` | Stores days from the 1970-01-01. |
+| **`date`** | `int32` | `DATE` | Stores days from 1970-01-01. |
| **`time`** | `int64` | `TIME_MICROS` with `adjustToUtc=false` | Stores microseconds from midnight. |
| **`timestamp`** | `int64` | `TIMESTAMP_MICROS` with `adjustToUtc=false` | Stores microseconds from 1970-01-01 00:00:00.000000. |
| **`timestamptz`** | `int64` | `TIMESTAMP_MICROS` with `adjustToUtc=true` | Stores microseconds from 1970-01-01 00:00:00.000000 UTC. |
+| **`timestamp_ns`** | `int64` | `TIMESTAMP_NANOS` with `adjustToUtc=false` | Stores nanoseconds from 1970-01-01 00:00:00.000000000. |
+| **`timestamptz_ns`** | `int64` | `TIMESTAMP_NANOS` with `adjustToUtc=true` | Stores nanoseconds from 1970-01-01 00:00:00.000000000 UTC. |
| **`string`** | `binary` | `UTF8` | Encoding must be UTF-8. |
| **`uuid`** | `fixed_len_byte_array[16]` | `UUID` | |
| **`fixed(L)`** | `fixed_len_byte_array[L]` | | |
@@ -935,8 +948,10 @@ Lists must use the [3-level representation](https://github.com/apache/parquet-fo
| **`decimal(P,S)`** | `decimal` | | |
| **`date`** | `date` | | |
| **`time`** | `long` | `iceberg.long-type`=`TIME` | Stores microseconds from midnight. |
-| **`timestamp`** | `timestamp` | | [1] |
-| **`timestamptz`** | `timestamp_instant` | | [1] |
+| **`timestamp`** | `timestamp` | | Stores microseconds from 2015-01-01 00:00:00.000000. [1], [2] |
+| **`timestamptz`** | `timestamp_instant` | | Stores microseconds from 2015-01-01 00:00:00.000000 UTC. [1], [2] |
+| **`timestamp_ns`** | `timestamp` | | Stores nanoseconds from 2015-01-01 00:00:00.000000000. [1] |
+| **`timestamptz_ns`** | `timestamp_instant` | | Stores nanoseconds from 2015-01-01 00:00:00.000000000 UTC. [1] |
| **`string`** | `string` | | ORC `varchar` and `char` would also map to **`string`**. |
| **`uuid`** | `binary` | `iceberg.binary-type`=`UUID` | |
| **`fixed(L)`** | `binary` | `iceberg.binary-type`=`FIXED` & `iceberg.length`=`L` | The length would not be checked by the ORC reader and should be checked by the adapter. |
@@ -948,6 +963,7 @@ Lists must use the [3-level representation](https://github.com/apache/parquet-fo
Notes:
1. ORC's [TimestampColumnVector](https://orc.apache.org/api/hive-storage-api/org/apache/hadoop/hive/ql/exec/vector/TimestampColumnVector.html) consists of a time field (milliseconds since epoch) and a nanos field (nanoseconds within the second). Hence the milliseconds within the second are reported twice; once in the time field and again in the nanos field. The read adapter should only use milliseconds within the second from one of these fields. The write adapter should also report milliseconds within the second twice; once in the time field and again in the nanos field. ORC writer is expected to correctly consider millis information from one of the fields. More details at https://issues.apache.org/jira/browse/ORC-546
+2. ORC `timestamp` and `timestamp_instant` values store nanosecond precision. Iceberg ORC writers for Iceberg types `timestamp` and `timestamptz` **must** truncate nanoseconds to microseconds.
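
A minimal sketch of the truncation required by note 2, assuming the value being written is the non-negative nanoseconds-within-the-second field described in note 1. `truncate_nanos_to_micros` is a hypothetical helper, not an ORC or Iceberg API.

```python
def truncate_nanos_to_micros(nanos_of_second: int) -> int:
    """Drop the sub-microsecond digits of a nanos-within-second value.

    The result is still expressed in nanoseconds, but its last three
    decimal digits are zero, so no precision beyond microseconds remains.
    """
    if not 0 <= nanos_of_second < 1_000_000_000:
        raise ValueError("expected nanoseconds within a single second")
    return nanos_of_second - (nanos_of_second % 1_000)
```

For `timestamp_ns` and `timestamptz_ns` the note imposes no such truncation, so a writer following this sketch would pass the nanos field through unchanged for those types.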
One of the interesting challenges with this is how to map Iceberg’s schema evolution (id based) on to ORC’s (name based). In theory, we could use Iceberg’s column ids as the column and field names, but that would be inconvenient.
@@ -971,8 +987,10 @@ The 32-bit hash implementation is 32-bit Murmur3 hash, x86 variant, seeded with
| **`decimal(P,S)`** | `hashBytes(minBigEndian(unscaled(v)))`[2] | `14.20` → `-500754589` |
| **`date`** | `hashInt(daysFromUnixEpoch(v))` | `2017-11-16` → `-653330422` |
| **`time`** | `hashLong(microsecsFromMidnight(v))` | `22:31:08` → `-662762989` |
-| **`timestamp`** | `hashLong(microsecsFromUnixEpoch(v))` | `2017-11-16T22:31:08` → `-2047944441` |
-| **`timestamptz`** | `hashLong(microsecsFromUnixEpoch(v))` | `2017-11-16T14:31:08-08:00`→ `-2047944441` |
+| **`timestamp`** | `hashLong(microsecsFromUnixEpoch(v))` | `2017-11-16T22:31:08` → `-2047944441`<br />`2017-11-16T22:31:08.000001` → `-1207196810` |
+| **`timestamptz`** | `hashLong(microsecsFromUnixEpoch(v))` | `2017-11-16T14:31:08-08:00` → `-2047944441`<br />`2017-11-16T14:31:08.000001-08:00` → `-1207196810` |
+| **`timestamp_ns`** | `hashLong(nanosecsFromUnixEpoch(v))` | `2017-11-16T22:31:08` → `-737750069`<br />`2017-11-16T22:31:08.000001` → `-976603392`<br />`2017-11-16T22:31:08.000000001` → `-160215926` |
+| **`timestamptz_ns`** | `hashLong(nanosecsFromUnixEpoch(v))` | `2017-11-16T14:31:08-08:00` → `-737750069`<br />`2017-11-16T14:31:08.000001-08:00` → `-976603392`<br />`2017-11-16T14:31:08.000000001-08:00` → `-160215926` |
| **`string`** | `hashBytes(utf8Bytes(v))` | `iceberg` → `1210000089` |
| **`uuid`** | `hashBytes(uuidBytes(v))` [3] | `f79c3e09-677c-4bbd-a479-3f349cb785e7` → `1488055340` |
| **`fixed(L)`** | `hashBytes(v)` | `00 01 02 03` → `-188683207` |
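
The `hashLong` rows above can be sketched in Python as follows. This is an illustrative implementation of the 32-bit Murmur3 hash (x86 variant), not the reference code. It assumes a seed of 0 and that `hashLong` hashes the value as 8 little-endian bytes; neither detail is visible in this hunk. The names `murmur3_32` and `hash_long` are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_k(k: int) -> int:
    """Scramble one 32-bit block before it is folded into the hash."""
    k = (k * 0xCC9E2D51) & MASK32
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 (x86 variant), returned as a signed 32-bit integer."""
    h = seed & MASK32
    n = len(data)
    body_len = n - (n % 4)

    # Body: process the input four bytes at a time, little-endian.
    for i in range(0, body_len, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        h ^= _mix_k(k)
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: fold in the remaining 1 to 3 bytes, if any.
    tail = data[body_len:]
    if tail:
        h ^= _mix_k(int.from_bytes(tail, "little"))

    # Finalization: mix in the length and avalanche the bits.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16

    return h - (1 << 32) if h >= (1 << 31) else h


def hash_long(value: int) -> int:
    """Assumed hashLong: hash the value as an 8-byte little-endian long."""
    return murmur3_32(struct.pack("<q", value))


# 2017-11-16T22:31:08 in each representation.
micros = 1_510_871_468_000_000
nanos = micros * 1_000
```

The hash is computed over the stored long, so `hash_long(micros)` and `hash_long(nanos)` differ even though both describe the same instant, which is why the table lists separate expected values for the microsecond and nanosecond types.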
@@ -1018,8 +1036,10 @@ Types are serialized according to this table:
|**`double`**|`JSON string: "double"`|`"double"`|
|**`date`**|`JSON string: "date"`|`"date"`|
|**`time`**|`JSON string: "time"`|`"time"`|
-|**`timestamp without zone`**|`JSON string: "timestamp"`|`"timestamp"`|
-|**`timestamp with zone`**|`JSON string: "timestamptz"`|`"timestamptz"`|
+|**`timestamp, microseconds, without zone`**|`JSON string: "timestamp"`|`"timestamp"`|
+|**`timestamp, microseconds, with zone`**|`JSON string: "timestamptz"`|`"timestamptz"`|
+|**`timestamp, nanoseconds, without zone`**|`JSON string: "timestamp_ns"`|`"timestamp_ns"`|
+|**`timestamp, nanoseconds, with zone`**|`JSON string: "timestamptz_ns"`|`"timestamptz_ns"`|
|**`string`**|`JSON string: "string"`|`"string"`|
|**`uuid`**|`JSON string: "uuid"`|`"uuid"`|
|**`fixed(L)`**|`JSON string: "fixed[]"`|`"fixed[16]"`|
@@ -1179,8 +1199,10 @@ This serialization scheme is for storing single values as individual binary valu
| **`double`** | Stored as 8-byte little-endian |
| **`date`** | Stores days from the 1970-01-01 in an 4-byte little-endian int |
| **`time`** | Stores microseconds from midnight in an 8-byte little-endian long |
-| **`timestamp without zone`** | Stores microseconds from 1970-01-01 00:00:00.000000 in an 8-byte little-endian long |
-| **`timestamp with zone`** | Stores microseconds from 1970-01-01 00:00:00.000000 UTC in an 8-byte little-endian long |
+| **`timestamp`** | Stores microseconds from 1970-01-01 00:00:00.000000 in an 8-byte little-endian long |
+| **`timestamptz`** | Stores microseconds from 1970-01-01 00:00:00.000000 UTC in an 8-byte little-endian long |
+| **`timestamp_ns`** | Stores nanoseconds from 1970-01-01 00:00:00.000000000 in an 8-byte little-endian long |
+| **`timestamptz_ns`** | Stores nanoseconds from 1970-01-01 00:00:00.000000000 UTC in an 8-byte little-endian long |
| **`string`** | UTF-8 bytes (without length) |
| **`uuid`** | 16-byte big-endian value, see example in Appendix B |
| **`fixed(L)`** | Binary value |
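
For the timestamp rows above, the binary form is an 8-byte little-endian long regardless of precision. A small Python sketch using the standard `struct` module (the function names are hypothetical):

```python
import struct


def serialize_timestamp_long(value: int) -> bytes:
    """Encode micros or nanos from the epoch as an 8-byte little-endian long."""
    return struct.pack("<q", value)


def deserialize_timestamp_long(buf: bytes) -> int:
    """Decode an 8-byte little-endian long back into an integer."""
    (value,) = struct.unpack("<q", buf)
    return value


# 2017-11-16T22:31:08.000000001 as nanoseconds from the epoch.
encoded = serialize_timestamp_long(1_510_871_468_000_000_001)
assert len(encoded) == 8
assert deserialize_timestamp_long(encoded) == 1_510_871_468_000_000_001
```

The bytes carry no indication of the unit, so the column type (`timestamp` versus `timestamp_ns`) has to be known to interpret the decoded value.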
@@ -1206,6 +1228,8 @@ This serialization scheme is for storing single values as individual binary valu
| **`time`** | **`JSON string`** | `"22:31:08.123456"` | Stores ISO-8601 standard time with microsecond precision |
| **`timestamp`** | **`JSON string`** | `"2017-11-16T22:31:08.123456"` | Stores ISO-8601 standard timestamp with microsecond precision; must not include a zone offset |
| **`timestamptz`** | **`JSON string`** | `"2017-11-16T22:31:08.123456+00:00"` | Stores ISO-8601 standard timestamp with microsecond precision; must include a zone offset and it must be '+00:00' |
+| **`timestamp_ns`** | **`JSON string`** | `"2017-11-16T22:31:08.123456789"` | Stores ISO-8601 standard timestamp with nanosecond precision; must not include a zone offset |
+| **`timestamptz_ns`** | **`JSON string`** | `"2017-11-16T22:31:08.123456789+00:00"` | Stores ISO-8601 standard timestamp with nanosecond precision; must include a zone offset and it must be '+00:00' |
| **`string`** | **`JSON string`** | `"iceberg"` | |
| **`uuid`** | **`JSON string`** | `"f79c3e09-677c-4bbd-a479-3f349cb785e7"` | Stores the lowercase uuid string |
| **`fixed(L)`** | **`JSON string`** | `"000102ff"` | Stored as a hexadecimal string |
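
The nanosecond rows above need nine fractional digits, which Python's `datetime` cannot hold directly. One possible sketch, with hypothetical function names, formats the whole seconds with `datetime` and appends the nanosecond fraction separately:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
NANOS_PER_SECOND = 1_000_000_000


def timestamp_ns_to_json(nanos: int) -> str:
    """ISO-8601 text with nanosecond precision and no zone offset."""
    # divmod floors, so the fraction is non-negative for pre-epoch values too.
    seconds, fraction = divmod(nanos, NANOS_PER_SECOND)
    whole = EPOCH + timedelta(seconds=seconds)
    return f"{whole:%Y-%m-%dT%H:%M:%S}.{fraction:09d}"


def timestamptz_ns_to_json(nanos: int) -> str:
    """Same text with the required '+00:00' offset appended."""
    return timestamp_ns_to_json(nanos) + "+00:00"
```

These helpers return the string contents only; a JSON encoder such as `json.dumps` would add the surrounding quotes.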
@@ -1223,6 +1247,8 @@ Default values are added to struct fields in v3.
* The `write-default` is a forward-compatible change because it is only used at write time. Old writers will fail because the field is missing.
* Tables with `initial-default` will be read correctly by older readers if `initial-default` is always null for optional fields. Otherwise, old readers will default optional columns with null. Old readers will fail to read required fields which are populated by `initial-default` because that default is not supported.
+Types `timestamp_ns` and `timestamptz_ns` are added in v3.
+
### Version 2
Writing v1 metadata: