DRILL-8011: Add Dropbox File System to Drill (#2337)
Showing 10 changed files with 693 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
# Dropbox and Drill | ||
As of Drill 1.20.0, it is possible to connect Drill to a Dropbox account and query files stored there. Performance will be much better if the files are stored | ||
locally; however, if your data is located in Dropbox, Drill makes it easy to explore that data. | ||
|
||
## Creating an API Token | ||
The first step to enabling Drill to query Dropbox is creating an API token. | ||
1. Navigate to https://www.dropbox.com/developers/apps/create | ||
2. Choose `Scoped Access` under Choose an API. | ||
3. Depending on the access restrictions you need, select either full access or access limited to a particular folder. | ||
4. In the permissions tab, make sure all the permissions associated with reading data are enabled. | ||
|
||
Once you've done that and hit submit, you'll see a section in your newly created Dropbox App called `Generated Access Token`. Copy the value shown there; this is what you will | ||
use in your Drill configuration. | ||
|
||
## Configuring Drill | ||
Once you've created a Dropbox access token, you are ready to configure Drill to query Dropbox. To create a Dropbox connection, navigate to the Storage tab in Drill's UI, | ||
click on `Create New Storage Plugin`, and add the items below: | ||
|
||
```json | ||
{ | ||
"type": "file", | ||
"connection": "dropbox:///", | ||
"config": { | ||
"dropboxAccessToken": "<your access token here>" | ||
}, | ||
"workspaces": { | ||
"root": { | ||
"location": "/", | ||
"writable": false, | ||
"defaultInputFormat": null, | ||
"allowAccessOutsideWorkspace": false | ||
} | ||
} | ||
} | ||
``` | ||
Paste your access token into the `dropboxAccessToken` field; at that point you should be able to query Dropbox. Drill treats Dropbox like any other file system, so the instructions | ||
for the file system storage plugin (https://drill.apache.org/docs/file-system-storage-plugin/) and for workspaces (https://drill.apache.org/docs/workspaces/) | ||
on configuring a workspace and adding format plugins apply to Dropbox exactly as they do to any other file system in Drill. | ||
|
||
### Securing Dropbox Credentials | ||
As with any other storage plugin, you have a few options as to how to store the credentials. See [Drill Credentials Provider](./PluginCredentialsProvider.md) for more | ||
information about how you can store your credentials securely in Drill. | ||
|
||
## Running the Unit Tests | ||
Unfortunately, in order to run the unit tests, it is necessary to have an external API token. Therefore, the unit tests have to be run manually. To run the unit tests: | ||
|
||
1. Get your Dropbox access token as explained above and paste it into the `ACCESS_TOKEN` variable. | ||
2. In your Dropbox account, create a folder called `csv` and upload the file `hdf-test.csvh` into that folder. | ||
3. In your Dropbox account, upload the file `http-pcap.json` to the root directory. | ||
4. In the `testListFiles` test, you will have to update the modification dates. | ||
5. Run the tests. | ||
|
||
### Test Files | ||
Test files can be found in the `java-exec/src/test/resources/dropboxTestFiles` | ||
folder. Copy these files into your Dropbox account, preserving the folder structure. | ||
|
||
## Limitations | ||
1. It is not possible to save files to Dropbox from Drill, thus CTAS queries will fail. | ||
2. Dropbox does not expose directory metadata, so it is not possible to obtain the directory size, modification date or access dates. | ||
3. Dropbox does not maintain the last access date as distinct from the modification date of files. |
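Once the plugin is saved, files are referenced in queries the same way as on any other Drill file system: plugin name, workspace, then the path in backticks. The helper below is a minimal sketch of building such a query string. The class and method names are hypothetical, and the plugin name `dropbox` is an assumption, since the README does not say what name the plugin is stored under; the file path is the one from the unit test setup.

```java
/** Illustrative helper only; none of these names are part of Drill. */
final class DropboxQueryExample {
  private DropboxQueryExample() {
  }

  /**
   * Builds a SELECT statement for a file exposed by a file-system storage plugin.
   *
   * @param pluginName name the storage plugin was saved under (assumed, e.g. "dropbox")
   * @param workspace  workspace defined in the plugin configuration, e.g. "root"
   * @param filePath   path of the file relative to the workspace location
   * @param limit      maximum number of rows to return
   */
  static String selectAll(String pluginName, String workspace, String filePath, int limit) {
    // This sketch makes no attempt to escape backticks, so reject them outright.
    if (filePath.indexOf('`') >= 0) {
      throw new IllegalArgumentException("Backticks are not supported in this sketch: " + filePath);
    }
    if (limit < 0) {
      throw new IllegalArgumentException("limit must not be negative: " + limit);
    }
    return String.format("SELECT * FROM %s.%s.`%s` LIMIT %d", pluginName, workspace, filePath, limit);
  }
}
```

With the arguments `"dropbox"`, `"root"`, `"/csv/hdf-test.csvh"` and `10`, the method returns a statement that selects ten rows from the test file uploaded in the steps above. The resulting string can be pasted into Drill's query UI or sent through any Drill client.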
280 changes: 280 additions & 0 deletions
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DropboxFileSystem.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,280 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one | ||
* or more contributor license agreements. See the NOTICE file | ||
* distributed with this work for additional information | ||
* regarding copyright ownership. The ASF licenses this file | ||
* to you under the Apache License, Version 2.0 (the | ||
* "License"); you may not use this file except in compliance | ||
* with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
package org.apache.drill.exec.store.dfs; | ||
|
||
import com.dropbox.core.DbxException; | ||
import com.dropbox.core.DbxRequestConfig; | ||
import com.dropbox.core.v2.DbxClientV2; | ||
import com.dropbox.core.v2.files.FileMetadata; | ||
import com.dropbox.core.v2.files.FolderMetadata; | ||
import com.dropbox.core.v2.files.ListFolderResult; | ||
import com.dropbox.core.v2.files.Metadata; | ||
import org.apache.hadoop.fs.FSDataInputStream; | ||
import org.apache.hadoop.fs.FSDataOutputStream; | ||
import org.apache.hadoop.fs.FileStatus; | ||
import org.apache.hadoop.fs.FileSystem; | ||
import org.apache.hadoop.fs.Path; | ||
import org.apache.hadoop.fs.PositionedReadable; | ||
import org.apache.hadoop.fs.Seekable; | ||
import org.apache.hadoop.fs.permission.FsPermission; | ||
import org.apache.hadoop.util.Progressable; | ||
import org.slf4j.Logger; | ||
import org.slf4j.LoggerFactory; | ||
|
||
import java.io.ByteArrayInputStream; | ||
import java.io.ByteArrayOutputStream; | ||
import java.io.IOException; | ||
import java.net.URI; | ||
import java.net.URISyntaxException; | ||
import java.util.ArrayList; | ||
import java.util.HashMap; | ||
import java.util.List; | ||
import java.util.Map; | ||
|
||
public class DropboxFileSystem extends FileSystem { | ||
private static final Logger logger = LoggerFactory.getLogger(DropboxFileSystem.class); | ||
|
||
private static final String ERROR_MSG = "Dropbox is read only."; | ||
private Path workingDirectory; | ||
private DbxClientV2 client; | ||
private FileStatus[] fileStatuses; | ||
private final Map<String,FileStatus> fileStatusCache = new HashMap<>(); | ||
|
||
@Override | ||
public URI getUri() { | ||
try { | ||
return new URI("dropbox:///"); | ||
} catch (URISyntaxException e) { | ||
throw new RuntimeException(e); | ||
} | ||
} | ||
|
||
@Override | ||
public FSDataInputStream open(Path path, int bufferSize) throws IOException { | ||
FSDataInputStream fsDataInputStream; | ||
String filename = getFileName(path); | ||
client = getClient(); | ||
ByteArrayOutputStream out = new ByteArrayOutputStream(); | ||
try { | ||
client.files().download(filename).download(out); | ||
fsDataInputStream = new FSDataInputStream(new SeekableByteArrayInputStream(out.toByteArray())); | ||
} catch (DbxException e) { | ||
throw new IOException(e.getMessage(), e); | ||
} | ||
return fsDataInputStream; | ||
} | ||
|
||
@Override | ||
public FSDataOutputStream create(Path f, | ||
FsPermission permission, | ||
boolean overwrite, | ||
int bufferSize, | ||
short replication, | ||
long blockSize, | ||
Progressable progress) throws IOException { | ||
throw new IOException(ERROR_MSG); | ||
} | ||
|
||
@Override | ||
public FSDataOutputStream append(Path f, int bufferSize, Progressable progress) throws IOException { | ||
throw new IOException(ERROR_MSG); | ||
} | ||
|
||
@Override | ||
public boolean rename(Path src, Path dst) throws IOException { | ||
return false; | ||
} | ||
|
||
@Override | ||
public boolean delete(Path f, boolean recursive) throws IOException { | ||
throw new IOException(ERROR_MSG); | ||
} | ||
|
||
@Override | ||
public FileStatus[] listStatus(Path path) throws IOException { | ||
client = getClient(); | ||
List<FileStatus> fileStatusList = new ArrayList<>(); | ||
|
||
// Get files and folder metadata from Dropbox root directory | ||
try { | ||
ListFolderResult result = client.files().listFolder(""); | ||
while (true) { | ||
for (Metadata metadata : result.getEntries()) { | ||
fileStatusList.add(getFileInformation(metadata)); | ||
} | ||
if (!result.getHasMore()) { | ||
break; | ||
} | ||
result = client.files().listFolderContinue(result.getCursor()); | ||
} | ||
} catch (DbxException e) { | ||
throw new IOException(e.getMessage(), e); | ||
} | ||
|
||
// Convert to array | ||
fileStatuses = fileStatusList.toArray(new FileStatus[0]); | ||
|
||
return fileStatuses; | ||
} | ||
|
||
@Override | ||
public void setWorkingDirectory(Path new_dir) { | ||
logger.debug("Setting working directory to: {}", new_dir.getName()); | ||
workingDirectory = new_dir; | ||
} | ||
|
||
@Override | ||
public Path getWorkingDirectory() { | ||
return workingDirectory; | ||
} | ||
|
||
@Override | ||
public boolean mkdirs(Path f, FsPermission permission) throws IOException { | ||
throw new IOException(ERROR_MSG); | ||
} | ||
|
||
@Override | ||
public FileStatus getFileStatus(Path path) throws IOException { | ||
String filePath = Path.getPathWithoutSchemeAndAuthority(path).toString(); | ||
/* | ||
* Dropbox does not allow metadata calls on the root directory | ||
*/ | ||
if (filePath.equalsIgnoreCase("/")) { | ||
return new FileStatus(0, true, 1, 0, 0, new Path("/")); | ||
} | ||
client = getClient(); | ||
try { | ||
Metadata metadata = client.files().getMetadata(filePath); | ||
return getFileInformation(metadata); | ||
} catch (Exception e) { | ||
throw new IOException("Error accessing file " + filePath + "\n" + e.getMessage(), e); | ||
} | ||
} | ||
|
||
private FileStatus getFileInformation(Metadata metadata) { | ||
if (fileStatusCache.containsKey(metadata.getPathLower())){ | ||
return fileStatusCache.get(metadata.getPathLower()); | ||
} | ||
|
||
FileStatus result; | ||
if (isDirectory(metadata)) { | ||
// Note: At the time of implementation, Dropbox does not provide an efficient way of | ||
// getting the size and/or modification times for folders. | ||
result = new FileStatus(0, true, 1, 0, 0, new Path(metadata.getPathLower())); | ||
} else { | ||
FileMetadata fileMetadata = (FileMetadata) metadata; | ||
result = new FileStatus(fileMetadata.getSize(), false, 1, 0, fileMetadata.getClientModified().getTime(), new Path(metadata.getPathLower())); | ||
} | ||
|
||
fileStatusCache.put(metadata.getPathLower(), result); | ||
return result; | ||
} | ||
|
||
private DbxClientV2 getClient() { | ||
if (this.client != null) { | ||
return client; | ||
} | ||
|
||
// read preferred client identifier from config or use "Apache/Drill" | ||
String clientIdentifier = this.getConf().get("clientIdentifier", "Apache/Drill"); | ||
logger.info("Creating dropbox client with client identifier: {}", clientIdentifier); | ||
DbxRequestConfig config = DbxRequestConfig.newBuilder(clientIdentifier).build(); | ||
|
||
// read access token from config or credentials provider | ||
logger.info("Reading dropbox access token from configuration or credentials provider"); | ||
String accessToken = this.getConf().get("dropboxAccessToken", ""); | ||
|
||
this.client = new DbxClientV2(config, accessToken); | ||
return this.client; | ||
} | ||
|
||
private boolean isDirectory(Metadata metadata) { | ||
return metadata instanceof FolderMetadata; | ||
} | ||
|
||
private boolean isFile(Metadata metadata) { | ||
return metadata instanceof FileMetadata; | ||
} | ||
|
||
private String getFileName(Path path){ | ||
return path.toUri().getPath(); | ||
} | ||
|
||
static class SeekableByteArrayInputStream extends ByteArrayInputStream implements Seekable, PositionedReadable { | ||
|
||
public SeekableByteArrayInputStream(byte[] buf) | ||
{ | ||
super(buf); | ||
} | ||
@Override | ||
public long getPos() throws IOException{ | ||
return pos; | ||
} | ||
|
||
@Override | ||
public void seek(long pos) throws IOException { | ||
if (mark != 0) { | ||
throw new IllegalStateException(); | ||
} | ||
|
||
reset(); | ||
long skipped = skip(pos); | ||
|
||
if (skipped != pos) { | ||
throw new IOException(); | ||
} | ||
} | ||
|
||
@Override | ||
public boolean seekToNewSource(long targetPos) throws IOException { | ||
return false; | ||
} | ||
|
||
@Override | ||
public int read(long position, byte[] buffer, int offset, int length) throws IOException { | ||
|
||
if (position >= buf.length) { | ||
throw new IllegalArgumentException(); | ||
} | ||
if (position + length > buf.length) { | ||
throw new IllegalArgumentException(); | ||
} | ||
if (length > buffer.length) { | ||
throw new IllegalArgumentException(); | ||
} | ||
|
||
System.arraycopy(buf, (int) position, buffer, offset, length); | ||
return length; | ||
} | ||
|
||
@Override | ||
public void readFully(long position, byte[] buffer) throws IOException { | ||
read(position, buffer, 0, buffer.length); | ||
|
||
} | ||
|
||
@Override | ||
public void readFully(long position, byte[] buffer, int offset, int length) throws IOException { | ||
read(position, buffer, offset, length); | ||
} | ||
} | ||
} |
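The inner class above serves seeks and positioned reads from a byte array held fully in memory. As a point of comparison, here is a stand-alone sketch of the same technique written against the common stream convention: a positioned read returns `-1` at or past the end of the data, returns a short count when fewer bytes remain than requested, and leaves the cursor untouched, while `readFully` raises `EOFException` if the request cannot be satisfied. The class name is hypothetical and the sketch deliberately does not implement Hadoop's `Seekable` or `PositionedReadable` interfaces, so it is not a drop-in replacement for the code in this commit.

```java
import java.io.EOFException;
import java.io.IOException;

/** Stand-alone illustration; the class and its names are not part of Drill or Hadoop. */
final class InMemoryPositionedReader {
  private final byte[] data;
  private int pos; // cursor moved only by seek()

  InMemoryPositionedReader(byte[] data) {
    this.data = data;
  }

  long getPos() {
    return pos;
  }

  /** Moves the cursor; seeking exactly to the end of the data is allowed. */
  void seek(long target) throws IOException {
    if (target < 0 || target > data.length) {
      throw new EOFException("Cannot seek to position " + target);
    }
    pos = (int) target;
  }

  /** Positioned read: copies up to length bytes and does not move the cursor. */
  int read(long position, byte[] buffer, int offset, int length) {
    if (position < 0 || offset < 0 || length < 0 || length > buffer.length - offset) {
      throw new IndexOutOfBoundsException();
    }
    if (length == 0) {
      return 0;
    }
    if (position >= data.length) {
      return -1; // nothing left to read at this position
    }
    int count = (int) Math.min(length, data.length - position);
    System.arraycopy(data, (int) position, buffer, offset, count);
    return count;
  }

  /** Reads exactly length bytes or fails with EOFException. */
  void readFully(long position, byte[] buffer, int offset, int length) throws IOException {
    int done = 0;
    while (done < length) {
      int count = read(position + done, buffer, offset + done, length - done);
      if (count < 0) {
        throw new EOFException("Reached end of data before reading " + length + " bytes");
      }
      done += count;
    }
  }
}
```

Separating the lenient `read` from the strict `readFully` lets callers that probe near the end of a file get a short count, while callers that need a fixed-size record still get a clear failure.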