Releases: apache/orc

v1.7.2

10 Feb 07:05

Milestone

Changelog

Bug Fixes

  • ORC-492: Avoid potential ArrayIndexOutOfBoundsException when getting WriterVersion (#961)
  • ORC-1041: Use memcpy during LZO decompression (#958)
  • ORC-1053: Fix the time zone offset precision used when the convert tool converts LocalDateTime to Timestamp, which was inconsistent with ORC's internal default precision (#967)
  • ORC-1059: Align findColumns behaviour between 1.6 and 1.7 release (#972)

Improvements (orc-tools)

  • ORC-1012: Support specifying columns in orc-scan (#921)
  • ORC-1017: Add sizes tool to determine and display the sizes of each column in a set of files (#925)
  • ORC-1023: Support writing bloom filters in ConvertTool (#933)

Tests

  • ORC-915: Remove io.netty.netty from Spark benchmark (#822)
  • ORC-938: Bump netty-all from 4.1.42.Final to 4.1.66.Final (#819)
  • ORC-948: Add hive benchmark integration tests (#860)
  • ORC-957: Bump netty-all from 4.1.66.Final to 4.1.67.Final (#870)
  • ORC-1021: Add -fno-omit-frame-pointer in DEBUG and RELWITHDEBINFO builds (#932)
  • ORC-1051: Update benchmark dependencies (#964)

v1.7.1

10 Feb 07:05

Milestone

Changelog

Bug Fixes

  • ORC-879 - Flaky Test for TestJsonReader
  • ORC-1008 - Overflow detection code is incorrect in IntegerColumnStatisticsImpl
  • ORC-1009 - [C++] Missing string include causes build failure with MSVC++
  • ORC-1015 - Update OrcFile.WriterOptions::memory javadoc
  • ORC-1016 - Use openssl@1.1 in GitHub Action MacOS CIs
  • ORC-1024 - BloomFilter hash computation is inconsistent between Java and C++ clients
  • ORC-1029 - Could not load 'org.apache.orc.DataMask.Provider' when using ORC encryption with a multi-core Spark executor
  • ORC-1030 - Java Tools Recover File command does not accurately find OrcFile.MAGIC
  • ORC-1034 - The search byte array algorithm is incorrectly implemented in FileDump.java
  • ORC-1035 - backupDataPath may be incorrect in recoverFile
  • ORC-1039 - Make FileDump.recoverFile handle side files only if they exist
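
Several of the fixes above (ORC-1030, ORC-1034) concern locating the OrcFile.MAGIC byte sequence inside a file's bytes. As an illustration only, here is a minimal sketch of a straightforward byte-array search. It is not the code in FileDump.java; the class name MagicSearchSketch and the method indexOf are hypothetical, and the sketch assumes the magic is the ASCII bytes "ORC".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not taken from the ORC code base.
final class MagicSearchSketch {
  /**
   * Returns the index of the first occurrence of pattern in data at or
   * after position from, or -1 if there is none.
   */
  static int indexOf(byte[] data, byte[] pattern, int from) {
    if (pattern.length == 0 || from < 0) {
      return -1;
    }
    // Try every candidate start position; restarting from start + 1 after a
    // partial match ensures overlapping prefixes such as "ORORC" are not missed.
    for (int start = from; start <= data.length - pattern.length; start++) {
      int matched = 0;
      while (matched < pattern.length && data[start + matched] == pattern[matched]) {
        matched++;
      }
      if (matched == pattern.length) {
        return start;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    byte[] magic = "ORC".getBytes(StandardCharsets.US_ASCII); // assumed magic bytes
    byte[] data = "xxORORCyy".getBytes(StandardCharsets.US_ASCII);
    System.out.println(indexOf(data, magic, 0)); // prints 4
  }
}
```

The naive scan is quadratic in the worst case, which is usually acceptable for a three-byte pattern; a tool scanning very large files might prefer a linear-time algorithm.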

Test

  • ORC-1000 - Use Java 17 in GitHub Action
  • ORC-1002 - Add java17 profile for Java17 unit testing
  • ORC-1010 - Bump tzdata from tzdata-2020e-1.tar.xz to tzdata-2021b-1.tar.xz
  • ORC-1011 - Activate java17 profile automatically
  • ORC-1032 - Bump parquet.version from 1.12.0 to 1.12.2
  • ORC-1036 - Due to tzdata upgrade, the fixed download links in CI are often not working
  • ORC-1037 - Bump spark.version from 3.1.2 to 3.2.0
  • ORC-1040 - Add Debian 11 docker test
  • ORC-1042 - Ignore unused-function C++ compile warning on CentOS 7
  • ORC-1043 - Fix C++ conversion compilation error in CentOS 7

v1.7.0

10 Feb 07:04

New Feature

  • [ORC-40] - [C++] Support building SearchArgument
  • [ORC-577] - Allow row-level filtering
  • [ORC-602] - Create adaptor for using FSDataInputStream for Java ORC reader
  • [ORC-716] - Build and test on Java 17-EA
  • [ORC-731] - Improve Java Tools
  • [ORC-747] - Abstract Dictionary interface and refactoring
  • [ORC-751] - [C++] Implement Predicate Pushdown for C++ Reader
  • [ORC-765] - Added build option to compile libraries with position independent code
  • [ORC-819] - Add GitHub labeler

Improvement

  • [ORC-377] - [C++] Adding writing with snappy compression to orc c++ writing lib
  • [ORC-480] - [C++] Deactivate WARN_FLAGS in release build
  • [ORC-566] - Add docker file for building site
  • [ORC-568] - Make the convert tool sort the old _col column names by number
  • [ORC-574] - Performance: Use const references for string statistics min and max to avoid copy construction
  • [ORC-588] - Static field or method should be directly referred by its class
  • [ORC-595] - Optimize Decimal64 scale calculation
  • [ORC-597] - Row-level filtering bench
  • [ORC-606] - Optimize Timestamp parseNanos calculation
  • [ORC-607] - Sync orc-benchmarks module to the others
  • [ORC-608] - Fix DecimalBench reader options
  • [ORC-609] - Upgrade aircompressor to 0.16
  • [ORC-614] - Implement efficient seek() in decompression streams
  • [ORC-615] - Refactor decompression streams into common base class
  • [ORC-622] - Refactoring of TreeReader into TypeReader and BatchReader
  • [ORC-638] - ORCMapredRecordWriter enlarge columnVector with factors when child array size is not large enough
  • [ORC-639] - Improve zstd compression performance
  • [ORC-646] - Add Ubuntu 20.04 docker file
  • [ORC-651] - Use GitHub Pull Request Template
  • [ORC-652] - Upgrade ZSTD to 1.4.5
  • [ORC-655] - Update bench to use Spark 2.4.6
  • [ORC-656] - Use gharchive.org instead of githubarchive.org
  • [ORC-657] - Remove com.netflix.iceberg dependency in java/bench module
  • [ORC-683] - PPD: Make Floating point NaN check more strict
  • [ORC-684] - [C++] Make Floating point NaN check more strict
  • [ORC-687] - Upgrade to JUnit5
  • [ORC-688] - Allow CHAR, VARCHAR to be promoted to STRING
  • [ORC-689] - Add GitHubAction job to publish snapshot
  • [ORC-693] - Update credential according to INFRA setup
  • [ORC-694] - Update docker files adding Java11 support
  • [ORC-696] - Consistent TypeDescription handling for quoted field names
  • [ORC-697] - Improve Scan tool to report where files are corrupted.
  • [ORC-699] - Minor improvements to the scan tool
  • [ORC-704] - Publish snapshots at only apache repo
  • [ORC-710] - Update maven plugins
  • [ORC-712] - Add USING IN SPARK to website
  • [ORC-722] - Improve code quality using static analysis.
  • [ORC-734] - Use org.apache.commons.lang3
  • [ORC-736] - Upgrade Hive to 3.1.2
  • [ORC-737] - Upgrade Spark to 3.1.0
  • [ORC-744] - LazyIO of non-filter columns
  • [ORC-745] - Migrate to travis-ci.com
  • [ORC-748] - Add separate writer implementation for Trino
  • [ORC-749] - Add checkstyle to -Panalyze
  • [ORC-750] - Fix benchmark to pass checkstyle:check
  • [ORC-757] - Add Hashtable implementation for dictionary
  • [ORC-760] - Update spark to 3.1.1
  • [ORC-761] - Replace MAINTAINER command with LABEL command in Dockerfile
  • [ORC-766] - Generalize the docker scripts to handle build-args
  • [ORC-767] - Add docker support for jdk 8 in debian 10
  • [ORC-768] - Update commons-csv to 1.8
  • [ORC-769] - Support ZSTD in ORC data benchmark
  • [ORC-770] - Support ZSTD in Avro data benchmark
  • [ORC-776] - Include source jars during publishing snapshot
  • [ORC-777] - Make the vectorized row batch size configurable in MR record readers and writers
  • [ORC-779] - Upgrade commons-cli to 1.4
  • [ORC-780] - Add LZ4 Compression to the C++ Writer
  • [ORC-791] - Upgrade guava test dependency to 30.1.1-jre
  • [ORC-792] - Upgrade commons-lang to 3.12.0
  • [ORC-796] - Upgrade apache parent pom version to the latest, 23
  • [ORC-797] - Allow writers to get the stripe information
  • [ORC-799] - Remove Ubuntu 16 docker test
  • [ORC-800] - [ORC] If map.value is selected, map.key should be selected automatically to prevent a segmentation fault
  • [ORC-801] - Clean up Logging
  • [ORC-802] - Document Maven Version and mvnw
  • [ORC-803] - MemoryManagerImpl Simplify removeWriter
  • [ORC-806] - Upgrade to Apache POM 23
  • [ORC-807] - Separate Jackson Versions in POM
  • [ORC-808] - Update Spark to 3.1.2
  • [ORC-812] - Simplify getClosestBufferSize in Writer
  • [ORC-813] - Upgrade ZSTD to 1.5.0
  • [ORC-818] - Build and test in Apple Silicon
  • [ORC-821] - Use mvnw instead of mvn
  • [ORC-823] - Upgrade maven-assembly-plugin to 3.3.0
  • [ORC-848] - Recycle Internal Buffer in StringHashTableDictionary
  • [ORC-849] - Core Benchmark Cleanup
  • [ORC-893] - Remove junit-vintage-engine from shims module.
  • [ORC-913] - Support data/format/compress options in Spark benchmark
  • [ORC-921] - Add an encrypted example file
  • [ORC-922] - Remove redundant conditional statements
  • [ORC-927] - Extract duplicate code in RowFilterBenchmark
  • [ORC-930] - Ignore unsupported JSON x ZSTD combination in bench
  • [ORC-931] - Optimize RunLengthIntegerWriterV2 code for better readability
  • [ORC-933] - Extend the example with an advanced reader
  • [ORC-941] - Move MacOS 10.15 and 11.5 test from Travis to GitHub Action
  • [ORC-943] - Add Intellij conf to support JIRA/PR autolinks
  • [ORC-945] - Add OUTPUT_QUIET, ERROR_QUIET to suppress Java8 addopen error messages
  • [ORC-970] - Reorder statements to improve readability in WriterImpl
  • [ORC-976] - Optimize compute zigZagLiterals
  • [ORC-984] - Save the software version that wrote each ORC file
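
For readers unfamiliar with the zigzag literals mentioned in ORC-976: zigzag encoding maps signed integers to unsigned ones so that values of small magnitude, positive or negative, get small codes. The following is a minimal sketch of the standard 64-bit transform. It is not the RunLengthIntegerWriterV2 code, and the class name ZigZagSketch is hypothetical.

```java
// Hypothetical illustration of standard zigzag encoding; not ORC source code.
final class ZigZagSketch {
  /** Maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ... */
  static long encode(long value) {
    // The arithmetic shift yields all ones for negatives and zero otherwise.
    return (value << 1) ^ (value >> 63);
  }

  /** Inverse of encode. */
  static long decode(long encoded) {
    // The logical shift drops the sign bit stored in the lowest position.
    return (encoded >>> 1) ^ -(encoded & 1);
  }

  public static void main(String[] args) {
    System.out.println(encode(-3));         // prints 5
    System.out.println(decode(encode(-3))); // prints -3
  }
}
```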

Sub-task

  • [ORC-599] - Bump guava version to 28.1-jre
  • [ORC-663] - [C++] Support nanosecond in timestamp column statistics
  • [ORC-713] - Add Java 15 test to github action
  • [ORC-714] - Remove MRUnit dependency and its usage
  • [ORC-715] - Add MapReduce test cases
  • [ORC-718] - Enable Checkstyle plugin and FileTabCharacter rule.
  • [ORC-719] - Enable UnusedImports.
  • [ORC-720] - Run mvn checkstyle:check in GitHub action.
  • [ORC-721] - Use org.junit.Assert instead of deprecated junit.framework.Assert.
  • [ORC-723] - Upgrade Mockito to 3.7.0.
  • [ORC-726] - Support Map type in orc-tools convert
  • [ORC-727] - Update Java Tools documentation
  • [ORC-728] - Support head command in Java Tools
  • [ORC-733] - Upgrade Zookeeper from 3.4.x to 3.6.2
  • [ORC-735] - ConvertTool should not fail at a single ORC file
  • [ORC-738] - Add date type conversion support in Java Tools
  • [ORC-741] - Schema Evolution missing column is not handled in the presence of filters
  • [ORC-742] - LazyIO of non-filter columns in the presence of filters
  • [ORC-743] - Conversion of SArg into Filters, to take advantage of LazyIO
  • [ORC-754] - Code cleanup
  • [ORC-755] - Introduce OrcFilterContext
  • [ORC-758] - Avoid decompressing compressed streams if already decompressed
  • [ORC-759] - StructBatchReader should always skip processing on the rootReader
  • [ORC-778] - Add "NewlineAtEndOfFile" checkstyle rule
  • [ORC-783] - Add a checkstyle rule to prevent trailing white spaces.
  • [ORC-795] - Add "LineLength" rule to checkstyle
  • [ORC-811] - Benchmarks for Filters
  • [ORC-814] - Build and test Java module on Apple Silicon
  • [ORC-815] - Build and test C++ module on Clang 12
  • [ORC-816] - Rename and enable aarch64 profile automatically
  • [ORC-820] - Add Java 16 to GitHub Action
  • [ORC-822] - Add Java 17-ea to GitHub Action
  • [ORC-839] - Fix head command for batch reader
  • [ORC-851] - Fix CNFE in ORC tools uber jar to include required classes.
  • [ORC-857] - Add OuterTypeFilename/UpperEll/ArrayTypeStyle checkstyle rules.
  • [ORC-858] - Add NoLineWrap/OneStatementPerLine/NeedBraces checkstyle rules
  • [ORC-859] - Update maven-checkstyle-plugin to 3.1.2.
  • [ORC-866] - Reduce LineLength from 125 to 120
  • [ORC-867] - Upgrade hive-storage-api to 2.8.1
  • [ORC-871] - orc-tools json-schema fails at empty json file with EOFException
  • [ORC-882] - Remove hamcrest-core test dependency
  • [ORC-886] - Add an integration test for ORC Java tools
  • [ORC-889] - Remove orc-mapreduce build warnings due to overlapping resources
  • [ORC-895] - Use snappy-java 1.1.8.4 in bench/core to support Apple Silicon
  • [ORC-901] - Remove junit-vintage-engine from mapreduce/tools module
  • [ORC-905] - Add an integration test for example
  • [ORC-907] - Remove junit-vintage-engine from core module
  • [ORC-909] - Remove commons-io 2.1 dependency
  • [ORC-910] - Enforce maven-dependency-plugin
  • [ORC-911] - Remove janino dependency in favor of Spark's transitive dependency
  • [ORC-912] - Exclude Spark transitive avro/parquet dependency from Spark benchmark
  • [ORC-917] - Bump mockito-core from 3.7.0 to 3.11.2
  • [ORC-919] - Spark bench objenesis should be the same as Spark.
  • [ORC-920] - Use junit.version and mockito.version property and bump junit to 5.7.2
  • [ORC-924] - Add redundant modifier/modifier order checkstyle rules.
  • [ORC-926] - Consolidate license header style in Java files.
  • [ORC-928] - Bump checkstyle from 8.44 to 8.45.1
  • [ORC-929] - Fix NaN at orc-tools 'meta' command
  • [ORC-934] - Add integration tests for Java bench
  • [ORC-939] - Remove threetenbp dependency
  • [ORC-942] - Remove javax.xml.bind:jaxb-api dependency
  • [ORC-944] - Add "RedundantImport" checkstyle rule
  • [ORC-947] - Update coding guide to max line length 100 and enforce it.
  • [ORC-950] - Bump aircompressor to 0.20
  • [ORC-951] - Add since tag to org.apache.orc.Reader interface
  • [ORC-952] - Add since tag to org.apache.orc.RecordReader interface
  • [ORC-953] - Add since tag to org.apache.orc.Writer interface
  • [ORC-959] - C++ reader crash in resolving nested List columns for SearchArgument
  • [ORC-960] - Create SearchArgument using column ids
  • [ORC-971] - LESS_THAN_EQUALS doesn't handle the case when min=max
  • [ORC-973] - [C++] Pr...