Commit

Merge branch 'master' of https://github.com/apache/carbondata
W1thOut committed Aug 14, 2021
2 parents 7fa6471 + 1ccf295 commit b2d9615
Showing 289 changed files with 8,036 additions and 2,063 deletions.
4 changes: 3 additions & 1 deletion LICENSE
Original file line number Diff line number Diff line change
@@ -210,4 +210,6 @@
BSD 2-Clause
------------

com.github.luben:zstd-jni
com.github.luben:zstd-jni

com.github.paul-hammant:paranamer
19 changes: 15 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
@@ -7,7 +7,7 @@
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -26,12 +26,13 @@ You can find the latest CarbonData document and learn more at:

## Status
Spark2.4:
[![Build Status](https://builds.apache.org/buildStatus/icon?job=carbondata-master-spark-2.4)](https://builds.apache.org/view/A-D/view/CarbonData/job/carbondata-master-spark-2.4/lastBuild/testReport)
[![Build Status](https://ci-builds.apache.org/job/carbondata/job/spark-2.4/badge/icon)](https://ci-builds.apache.org/job/carbondata/job/spark-2.4/)
[![Coverage Status](https://coveralls.io/repos/github/apache/carbondata/badge.svg?branch=master)](https://coveralls.io/github/apache/carbondata?branch=master)
<a href="https://scan.coverity.com/projects/carbondata">
<img alt="Coverity Scan Build Status"
src="https://scan.coverity.com/projects/13444/badge.svg"/>
</a>

## Features
CarbonData file format is a columnar store in HDFS, it has many features that a modern columnar format has, such as splittable, compression schema, complex data type etc, and CarbonData has following unique features:
* Stores data along with index: it can significantly accelerate query performance and reduces the I/O scans and CPU resources, where there are filters in the query. CarbonData index consists of multiple level of indices, a processing framework can leverage this index to reduce the task it needs to schedule and process, and it can also do skip scan in more finer grain unit (called blocklet) in task side scanning instead of scanning the whole file.
@@ -93,8 +94,18 @@ This guide document introduces [how to contribute to CarbonData](https://github.
## Contact us
To get involved in CarbonData:

* First join by emailing to [[email protected]](mailto:[email protected]),then you can discuss issues by emailing to [[email protected]](mailto:[email protected]) or visit http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com
* Report issues on [Apache Jira](https://issues.apache.org/jira/browse/CARBONDATA).
* First join by emailing to [[email protected]](mailto:[email protected]), then you can discuss issues by emailing to [[email protected]](mailto:[email protected]).
You can also directly visit [[email protected]](https://lists.apache.org/[email protected]).
Or you can visit [Apache CarbonData Dev Mailing List archive](http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/).

* Report issues on [Apache Jira](https://issues.apache.org/jira/browse/CARBONDATA). If you do not already have an Apache JIRA account, sign up [here](https://issues.apache.org/jira/).

* You can also slack to get in touch with the community. After we invite you, you can use this [Slack Link](https://carbondataworkspace.slack.com/) to sign in to CarbonData.

* Of course, you can scan the QR Code to join in our WeChat Group to get in touch.
![QRCode_WechatGroup](docs/images/QRCode_WechatGroup.png)



## About
Apache CarbonData is an open source project of The Apache Software Foundation (ASF).
2 changes: 1 addition & 1 deletion assembly/pom.xml
Original file line number Diff line number Diff line change
@@ -22,7 +22,7 @@
<parent>
<groupId>org.apache.carbondata</groupId>
<artifactId>carbondata-parent</artifactId>
<version>2.2.0-SNAPSHOT</version>
<version>2.3.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>

2 changes: 1 addition & 1 deletion common/pom.xml
Original file line number Diff line number Diff line change
@@ -22,7 +22,7 @@
<parent>
<groupId>org.apache.carbondata</groupId>
<artifactId>carbondata-parent</artifactId>
<version>2.2.0-SNAPSHOT</version>
<version>2.3.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>

3 changes: 0 additions & 3 deletions conf/carbon.properties.template
Original file line number Diff line number Diff line change
@@ -17,9 +17,6 @@
#

#################### System Configuration ##################
##Optional. Location where CarbonData will create the store, and write the data in its own format.
##If not specified then it takes spark.sql.warehouse.dir path.
#carbon.storelocation
#Base directory for Data files
#carbon.ddl.base.hdfs.url
#Path where the bad records are stored
4 changes: 0 additions & 4 deletions conf/dataload.properties.template
Original file line number Diff line number Diff line change
@@ -16,10 +16,6 @@
# limitations under the License.
#

#carbon store path
# you should change to the code path of your local machine
carbon.storelocation=/home/david/Documents/carbondata/examples/spark/target/store

#csv delimiter character
delimiter=,

2 changes: 1 addition & 1 deletion core/pom.xml
Original file line number Diff line number Diff line change
@@ -22,7 +22,7 @@
<parent>
<groupId>org.apache.carbondata</groupId>
<artifactId>carbondata-parent</artifactId>
<version>2.2.0-SNAPSHOT</version>
<version>2.3.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>

Original file line number Diff line number Diff line change
@@ -37,6 +37,7 @@ private CarbonCommonConstants() {
/**
* location of the carbon member, hierarchy and fact files
*/
@Deprecated
@CarbonProperty
public static final String STORE_LOCATION = "carbon.storelocation";

@@ -122,6 +123,16 @@ private CarbonCommonConstants() {
*/
public static final String CARBON_TIMESTAMP_MILLIS = "dd-MM-yyyy HH:mm:ss:SSS";

/**
* CARBON Default format - time segment
*/
public static final String CARBON_TIME_SEGMENT_DEFAULT_FORMAT = " HH:mm:ss";

/**
* CARBON Default data - time segment
*/
public static final String CARBON_TIME_SEGMENT_DATA_DEFAULT_FORMAT = " 00:00:00";

/**
* Property for specifying the format of DATE data type column.
* e.g. yyyy/MM/dd , or using default value
@@ -2648,4 +2659,26 @@ private CarbonCommonConstants() {

public static final String CARBON_SDK_EMPTY_METADATA_PATH = "emptyMetadataFolder";

/**
* Property to identify if the spark version is above 3.x version
*/
public static final String CARBON_SPARK_VERSION_SPARK3 = "carbon.spark.version.spark3";

public static final String CARBON_SPARK_VERSION_SPARK3_DEFAULT = "false";

/**
* Carbon Spark 3.x supported data file written version
*/
public static final String CARBON_SPARK3_VERSION = "2.2.0";

/**
* This property is to enable the min max pruning of target carbon table based on input/source
* data
*/
@CarbonProperty
public static final String CARBON_CDC_MINMAX_PRUNING_ENABLED =
"carbon.cdc.minmax.pruning.enabled";

public static final String CARBON_CDC_MINMAX_PRUNING_ENABLED_DEFAULT = "false";

}
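The new `carbon.cdc.minmax.pruning.enabled` key and its `"false"` default suggest a simple lookup-with-fallback at the point of use. The following is only a minimal sketch of that pattern using `java.util.Properties` as a stand-in; the class `CdcPruningFlagDemo` and the method `isMinMaxPruningEnabled` are hypothetical names for illustration, and CarbonData's own property-loading code is not shown in this diff.

```java
import java.util.Properties;

// Hypothetical demo class; not part of CarbonData.
class CdcPruningFlagDemo {
  // Key and default value copied from the constants added in this commit.
  static final String KEY = "carbon.cdc.minmax.pruning.enabled";
  static final String DEFAULT = "false";

  // Returns the flag value, falling back to the default when the key is absent.
  static boolean isMinMaxPruningEnabled(Properties props) {
    return Boolean.parseBoolean(props.getProperty(KEY, DEFAULT).trim());
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // Key not set: the default applies.
    System.out.println(isMinMaxPruningEnabled(props));
    // Key set explicitly: the configured value applies.
    props.setProperty(KEY, "true");
    System.out.println(isMinMaxPruningEnabled(props));
  }
}
```

Because the default is `"false"`, a deployment that never sets the key would keep pruning disabled under this reading.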
Original file line number Diff line number Diff line change
@@ -17,6 +17,7 @@

package org.apache.carbondata.core.datastore.block;

import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

@@ -51,6 +52,12 @@ public abstract class AbstractIndex implements Cacheable {
*/
private long deleteDeltaTimestamp;

public List<TableBlockInfo> getBlockInfos() {
return blockInfos;
}

protected List<TableBlockInfo> blockInfos;

/**
* map of blockletIdAndPageId to deleted rows
*/
Original file line number Diff line number Diff line change
@@ -87,6 +87,11 @@ public class TableBlockInfo implements Distributable, Serializable {

private transient DataFileFooter dataFileFooter;

/**
* Carbon Data file written version
*/
private String carbonDataFileWrittenVersion = null;

/**
* comparator to sort by block size in descending order.
* Since each line is not exactly the same, the size of a InputSplit may differs,
@@ -132,6 +137,7 @@ public TableBlockInfo copy() {
info.deletedDeltaFilePath = deletedDeltaFilePath;
info.detailInfo = detailInfo.copy();
info.indexWriterPath = indexWriterPath;
info.carbonDataFileWrittenVersion = carbonDataFileWrittenVersion;
return info;
}

@@ -353,4 +359,13 @@ public String toString() {
sb.append('}');
return sb.toString();
}

public String getCarbonDataFileWrittenVersion() {
return carbonDataFileWrittenVersion;
}

public void setCarbonDataFileWrittenVersion(String carbonDataFileWrittenVersion) {
this.carbonDataFileWrittenVersion = carbonDataFileWrittenVersion;
}

}
Original file line number Diff line number Diff line change
@@ -256,6 +256,7 @@ private ColumnPage decodeDimensionByMeta(DataChunk2 pageMetadata, ByteBuffer pag
if (vectorInfo != null) {
// set encodings of current page in the vectorInfo, used for decoding the complex child page
vectorInfo.encodings = encodings;
vectorInfo.vector.setCarbonDataFileWrittenVersion(vectorInfo.carbonDataFileWrittenVersion);
decoder
.decodeAndFillVector(pageData.array(), offset, pageMetadata.data_page_length, vectorInfo,
nullBitSet, isLocalDictEncodedPage, pageMetadata.numberOfRowsInpage,
Original file line number Diff line number Diff line change
@@ -245,6 +245,7 @@ protected ColumnPage decodeMeasure(DataChunk2 pageMetadata, ByteBuffer pageData,
ColumnPageDecoder codec =
encodingFactory.createDecoder(encodings, encoderMetas, compressorName, vectorInfo != null);
if (vectorInfo != null) {
vectorInfo.vector.setCarbonDataFileWrittenVersion(vectorInfo.carbonDataFileWrittenVersion);
codec.decodeAndFillVector(pageData.array(), offset, pageMetadata.data_page_length, vectorInfo,
nullBitSet, false, pageMetadata.numberOfRowsInpage, reusableDataBuffer);
return null;
Original file line number Diff line number Diff line change
@@ -36,6 +36,7 @@
import org.apache.carbondata.core.indexstore.ExtendedBlocklet;
import org.apache.carbondata.core.indexstore.PartitionSpec;
import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
import org.apache.carbondata.core.mutate.CdcVO;
import org.apache.carbondata.core.readcommitter.LatestFilesReadCommittedScope;
import org.apache.carbondata.core.readcommitter.ReadCommittedScope;
import org.apache.carbondata.core.readcommitter.TableStatusReadCommittedScope;
@@ -102,6 +103,8 @@ public class IndexInputFormat extends FileInputFormat<Void, ExtendedBlocklet>

private Set<String> missingSISegments;

private CdcVO cdcVO;

IndexInputFormat() {

}
@@ -275,6 +278,10 @@ public void write(DataOutput out) throws IOException {
out.writeUTF(segment);
}
}
out.writeBoolean(cdcVO != null);
if (cdcVO != null) {
cdcVO.write(out);
}
}

@Override
@@ -330,6 +337,11 @@ public void readFields(DataInput in) throws IOException {
missingSISegments.add(in.readUTF());
}
}
boolean isCDCJob = in.readBoolean();
if (isCDCJob) {
this.cdcVO = new CdcVO();
cdcVO.readFields(in);
}
}

private void initReadCommittedScope() throws IOException {
@@ -353,6 +365,14 @@ public boolean isFallbackJob() {
return isFallbackJob;
}

public CdcVO getCdcVO() {
return cdcVO;
}

public void setCdcVO(CdcVO cdcVO) {
this.cdcVO = cdcVO;
}

/**
* @return Whether asyncCall to the IndexServer.
*/
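The `write`/`readFields` hunks above serialize the optional `cdcVO` field by writing a boolean presence flag first and the payload only when it is non-null. A minimal, self-contained sketch of that pattern follows. `DemoPayload` and `OptionalFieldDemo` are hypothetical stand-ins (the real `CdcVO` fields are not shown in this diff), and plain `java.io` streams are used here in place of the Hadoop I/O path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for CdcVO; its single field is an assumption for the demo.
class DemoPayload {
  String columnName;

  void write(DataOutput out) throws IOException {
    out.writeUTF(columnName);
  }

  void readFields(DataInput in) throws IOException {
    columnName = in.readUTF();
  }
}

// Hypothetical holder with one optional field, serialized flag-first.
class OptionalFieldDemo {
  DemoPayload payload; // may be null

  void write(DataOutput out) throws IOException {
    // Presence flag first, so the reader knows whether a payload follows.
    out.writeBoolean(payload != null);
    if (payload != null) {
      payload.write(out);
    }
  }

  void readFields(DataInput in) throws IOException {
    // Reset first so a reused instance does not keep a stale payload.
    payload = null;
    if (in.readBoolean()) {
      payload = new DemoPayload();
      payload.readFields(in);
    }
  }

  // Serializes the holder to bytes and reads it back into a fresh instance.
  static OptionalFieldDemo roundTrip(OptionalFieldDemo source) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      source.write(new DataOutputStream(bytes));
      OptionalFieldDemo copy = new OptionalFieldDemo();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    OptionalFieldDemo withPayload = new OptionalFieldDemo();
    withPayload.payload = new DemoPayload();
    withPayload.payload.columnName = "id";
    System.out.println(roundTrip(withPayload).payload.columnName);
    System.out.println(roundTrip(new OptionalFieldDemo()).payload == null);
  }
}
```

The flag must be written and read at the same position in the stream on both sides; in the commit it is appended after the `missingSISegments` loop in both `write` and `readFields`.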
Original file line number Diff line number Diff line change
@@ -31,7 +31,7 @@
public class Blocklet implements Writable, Serializable {

/** file path of this blocklet */
private String filePath;
protected String filePath;

/** id to identify the blocklet inside the block (it is a sequential number) */
private String blockletId;