Skip to content

A cross-platform (Windows, MAC, Linux) desktop application to view common bigdata binary format like Parquet, ORC, AVRO, etc. Support local file system, HDFS, AWS S3, Azure Blob Storage ,etc.

License

Notifications You must be signed in to change notification settings

mdeenah/bigdata-file-viewer

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

64 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

bigdata-file-viewer

A cross-platform (Windows, MAC, Linux) desktop application to view common bigdata binary format like Parquet, ORC, Avro, etc. Support local file system, HDFS, AWS S3, etc.

Note, you're recommended to download release v1.1.1 to if you just want to view local bigdata binary files, it's lightweight without dependency to AWS SDK, Azure SDK, etc. Quite honestly, you can download data files from web portal of AWS, Azure ,etc. before viewing it with this tool. The reason why I integrted the cloud storage system's SDK into this tool is more like a demo of how to use Java to read files from specific storage system.

GitHub stars GitHub release GitHub license

Feature List

  • Open and view Parquet, ORC and Avro at local directory, HDFS, AWS S3, etc.
  • Convert binary format data to text format data like CSV
  • Support complex data type like array, map, struct, etc
  • Suport multiple platforms like Windows, MAC and Linux
  • Code is extensible to involve other data format

Usage

  • Download runnable jar from release page or follow Build section to build from source code.
  • Invoke it by java -jar BigdataFileViewer-1.2-SNAPSHOT-jar-with-dependencies.jar
  • Open binary format file by "File" -> "Open". Currently, it can open file with parquet suffix, orc suffix and avro suffix. If no suffix specified, the tool will try to extract it as Parquet file
  • Set the maximum rows of each page by "View" -> Input maximum row number -> "Go"
  • Set visible properties by "View" -> "Add/Remove Properties"
  • Convert to CSV file by "File" -> "Save as" -> "CSV"
  • Check schema information by unfolding "Schema Information" panel

Click here for live demo

Build

  • Use mvn package to build an all-in-one runnable jar
  • Java 1.8 or higher is required
  • Make sure the Java has javafx bound. For example, I installed openjdk 1.8 on Ubuntu 18.04 and it has no javafx bound, then I installed it following guide here.

Screenshots

Main page

About

A cross-platform (Windows, MAC, Linux) desktop application to view common bigdata binary format like Parquet, ORC, AVRO, etc. Support local file system, HDFS, AWS S3, Azure Blob Storage ,etc.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Java 100.0%