XpressAI/xai-spark

Xircuits Component Library for Apache Spark! Process and analyze big data with powerful, scalable components.


Xircuits Component Library for Spark

The Spark component library makes it easy to use Apache Spark within Xircuits workflows. You can process, analyze, and visualize large datasets efficiently.

Preview

SparkLinePlot Example

[Image: SparkLinePlot]

SparkLinePlot Result

[Image: SparkLinePlot result]

SparkLogisticRegressionSample Example

[Image: SparkLogisticRegressionSample]

SparkLogisticRegressionSample Result

[Image: SparkLogisticRegressionSample_result]

SparkSQLPlotBar Example

[Image: SparkSQLPlotBar]

SparkSQLPlotBar Result

[Image: SparkSQLPlotBar_result]

Main Xircuits Components

xSparkSession Component:

Initializes a Spark session to enable distributed data processing.

[Image: xSparkSession]
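
For readers who want to see what initializing a session looks like in plain PySpark, here is a minimal sketch. The function name `make_spark_session`, its parameters, and its defaults are illustrative assumptions and are not taken from the xSparkSession component itself; only the `SparkSession.builder` calls are standard PySpark.

```python
def make_spark_session(app_name="xircuits-spark", master="local[*]", conf=None):
    """Create (or reuse) a SparkSession.

    The name, parameters, and defaults of this helper are hypothetical;
    they do not describe the xSparkSession component's actual ports.
    """
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).master(master)
    # Apply any extra Spark configuration entries, e.g. {"spark.ui.enabled": "false"}.
    for key, value in (conf or {}).items():
        builder = builder.config(key, value)
    # getOrCreate() returns the already running session if there is one.
    return builder.getOrCreate()
```

Calling `make_spark_session()` with the defaults would start a local session that uses all available cores, which is usually enough for trying out the examples on a single machine.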

SparkReadFile Component:

Reads data from files in formats like csv, json, parquet, or orc into a Spark DataFrame.

[Image: SparkReadFile]
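
The equivalent operation in plain PySpark might be sketched as follows. The helper `read_file` and the `SUPPORTED_FORMATS` set are assumptions made for illustration (the set simply mirrors the formats listed above); `spark.read.format(...).options(...).load(...)` is the standard DataFrameReader API.

```python
# Formats named in the component description; this constant is illustrative.
SUPPORTED_FORMATS = {"csv", "json", "parquet", "orc"}


def read_file(spark, path, file_format="csv", **options):
    """Load `path` into a Spark DataFrame (hypothetical helper).

    `spark` is an existing SparkSession; `options` are passed through to the
    DataFrameReader, e.g. header=True or sep=";" for CSV files.
    """
    fmt = file_format.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported format {file_format!r}; expected one of {sorted(SUPPORTED_FORMATS)}"
        )
    return spark.read.format(fmt).options(**options).load(path)
```

For example, `read_file(spark, "data.csv", "csv", header=True, inferSchema=True)` would read a CSV file with a header row and let Spark infer column types; the file name here is a placeholder.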

SparkWriteFile Component:

Saves Spark DataFrames to files in formats such as csv, parquet, or orc.

SparkReadCSV Component:

Reads CSV files into a Spark DataFrame with options for custom delimiters and headers.

SparkSQL Component:

Executes SQL queries on a Spark DataFrame, returning query results as a new DataFrame.
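
In plain PySpark, querying a DataFrame with SQL usually means registering it as a temporary view first. The sketch below assumes that approach; the helper name `run_sql` and the default view name are hypothetical and may differ from how the SparkSQL component is wired internally.

```python
def run_sql(spark, df, query, view_name="df"):
    """Run `query` against `df` and return the result as a new DataFrame.

    Hypothetical helper: `df` is exposed to Spark SQL under `view_name`,
    so the query must refer to the DataFrame by that name.
    """
    # Register (or overwrite) a session-scoped temporary view for the DataFrame.
    df.createOrReplaceTempView(view_name)
    return spark.sql(query)
```

A call such as `run_sql(spark, penguins, "SELECT species, COUNT(*) AS n FROM penguins GROUP BY species", view_name="penguins")` would count rows per species, assuming the DataFrame has a `species` column.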

SparkVisualize Component:

Creates bar, scatter, or line plots from Spark DataFrames and saves them as images.
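
One way to produce such a plot outside Xircuits is to collect the two columns to the driver and hand them to matplotlib. This is a sketch under that assumption: the helper `plot_dataframe` and its parameters are hypothetical, and the component itself may use a different plotting path.

```python
def plot_dataframe(df, x_col, y_col, kind="line", output_path="plot.png"):
    """Plot two columns of a Spark DataFrame and save the figure (hypothetical helper)."""
    if kind not in ("bar", "scatter", "line"):
        raise ValueError(f"Unsupported plot kind {kind!r}")

    # Imported lazily so matplotlib is only required when a plot is requested.
    import matplotlib.pyplot as plt

    # collect() pulls every selected row to the driver, so this is only
    # suitable for small or already aggregated DataFrames.
    rows = df.select(x_col, y_col).collect()
    xs = [row[x_col] for row in rows]
    ys = [row[y_col] for row in rows]

    fig, ax = plt.subplots()
    if kind == "bar":
        ax.bar(xs, ys)
    elif kind == "scatter":
        ax.scatter(xs, ys)
    else:
        ax.plot(xs, ys)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    fig.savefig(output_path)
    plt.close(fig)  # release the figure's memory
    return output_path
```

For large datasets it is generally safer to aggregate or sample in Spark first and plot only the reduced result.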

SparkSplitDataFrame Component:

Splits a Spark DataFrame into training and testing sets based on a specified ratio.
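
A ratio-based split can be sketched with PySpark's `DataFrame.randomSplit`. The helper name `split_dataframe` and its default ratio and seed are assumptions for illustration, not the component's documented defaults.

```python
def split_dataframe(df, train_ratio=0.8, seed=42):
    """Split `df` into (train, test) DataFrames (hypothetical helper).

    randomSplit assigns rows randomly by weight, so the resulting sizes
    only approximate the requested ratio.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    train, test = df.randomSplit([train_ratio, 1.0 - train_ratio], seed=seed)
    return train, test
```

Fixing the seed makes the split repeatable across runs on the same data, which helps when comparing models.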

SparkLogisticRegression Component:

Trains a logistic regression model using a Spark DataFrame.

SparkPredict Component:

Uses a trained Spark model to make predictions on a test DataFrame.
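
Training and prediction, as described for SparkLogisticRegression and SparkPredict, might look like this in plain PySpark ML. The helper names, parameters, and defaults are hypothetical; the column names `features` and `label` are PySpark's own defaults, and the sketch assumes the input DataFrame already has a vector features column.

```python
def train_logistic_regression(train_df, features_col="features", label_col="label",
                              max_iter=100, family="auto"):
    """Fit a logistic regression model on `train_df` (hypothetical helper).

    `family` may be "auto", "binomial", or "multinomial".
    """
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.ml.classification import LogisticRegression

    lr = LogisticRegression(
        featuresCol=features_col,
        labelCol=label_col,
        maxIter=max_iter,
        family=family,
    )
    return lr.fit(train_df)


def predict(model, test_df):
    """Apply a fitted model; the result gains prediction columns."""
    return model.transform(test_df)
```

A typical flow would be `model = train_logistic_regression(train, family="multinomial")` followed by `predictions = predict(model, test)`, where `train` and `test` come from an earlier split.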

Try the Examples

We have provided example workflows to help you get started with the Spark component library. Give them a try and see how you can create custom Spark components for your applications.

SparkLinePlot

This example demonstrates creating a line plot from a CSV file using Spark. It reads the dataset, processes it into a Spark DataFrame, and visualizes the relationship between the Year and Wind columns as a line plot saved as Lineplot.png.

SparkLogisticRegressionSample

This example uses the SparkLoadLIBSVM component to load data, splits it for training and testing with SparkSplitDataFrame, and trains a multinomial logistic regression model to make predictions.

SparkSQLPlotBar

This example loads penguin data from a CSV file, queries species information using Spark SQL, and generates a bar plot visualizing species distribution, saving it as 'pinguin_distribution.png'.

Installation

To use this component library, ensure that you have an existing Xircuits setup. You can then install the Spark library using the component library interface, or through the CLI using:

xircuits install spark

You can also install it manually by cloning the repository and installing its requirements:

# run from the base Xircuits directory
git clone https://github.com/XpressAI/xai-spark xai_components/xai_spark
pip install -r xai_components/xai_spark/requirements.txt
