Xircuits Component Library for Apache Spark! Process and analyze big data with powerful, scalable components.
The Spark component library makes it easy to use Apache Spark within Xircuits workflows. You can process, analyze, and visualize large datasets efficiently.
Initializes a Spark session to enable distributed data processing.
Reads data from files in formats such as CSV, JSON, Parquet, or ORC into a Spark DataFrame.
Saves Spark DataFrames to files in formats such as CSV, Parquet, or ORC.
Reads CSV files into a Spark DataFrame with options for custom delimiters and headers.
Executes SQL queries on a Spark DataFrame, returning query results as a new DataFrame.
Creates bar, scatter, or line plots from Spark DataFrames and saves them as images.
Splits a Spark DataFrame into training and testing sets based on a specified ratio.
Trains a logistic regression model using a Spark DataFrame.
Uses a trained Spark model to make predictions on a test DataFrame.
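For orientation, here is a minimal sketch of what the session, CSV reading, SQL, and splitting components above roughly correspond to in plain PySpark. It is not the library's source code: the function names split_weights and run_sketch, the view name "data", and the default ratio and seed are hypothetical choices made for illustration, and the components themselves may be implemented differently.

```python
def split_weights(train_ratio):
    """Turn one train ratio into the [train, test] weights randomSplit expects."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    return [train_ratio, 1.0 - train_ratio]


def run_sketch(csv_path, query, train_ratio=0.8, seed=42):
    """Hypothetical helper: read a CSV, run a SQL query, split the result."""
    # Imported lazily so split_weights can be used without Spark installed.
    from pyspark.sql import SparkSession

    # Start (or reuse) a local Spark session.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("xircuits-spark-sketch")
        .getOrCreate()
    )

    # Read a CSV file with a header row and an explicit delimiter.
    df = spark.read.csv(csv_path, header=True, inferSchema=True, sep=",")

    # Register the DataFrame as a temporary view so the query can reference
    # it by the (assumed) name "data".
    df.createOrReplaceTempView("data")
    result = spark.sql(query)

    # Split the query result into training and testing DataFrames by ratio.
    train_df, test_df = result.randomSplit(split_weights(train_ratio), seed=seed)
    return train_df, test_df
```

A call such as run_sketch("data.csv", "SELECT * FROM data") would return the two DataFrames. Note that randomSplit treats the weights as approximate proportions, so the resulting sizes will not match the ratio exactly.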
We have provided example workflows to help you get started with the Spark component library. Give them a try and see how you can create custom Spark components for your applications.
This example demonstrates creating a line plot from a CSV file using Spark. It reads the dataset, processes it into a Spark DataFrame, and visualizes the relationship between the Year and Wind columns as a line plot saved as Lineplot.png.
This example uses the SparkLoadLIBSVM component to load data, splits it for training and testing with SparkSplitDataFrame, and trains a multinomial logistic regression model to make predictions.
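As a rough illustration of the LIBSVM workflow just described, the sketch below shows the equivalent steps in plain PySpark. It is an assumption about what happens underneath, not the components' actual code: to_libsvm_line and train_and_predict are hypothetical helper names, and the 0.7 split ratio, seed, and maxIter values are placeholders.

```python
def to_libsvm_line(label, features):
    """Format one sample as a LIBSVM text line: 'label index:value ...'."""
    # LIBSVM files use one-based feature indices in ascending order.
    pairs = " ".join(f"{i}:{v}" for i, v in sorted(features.items()))
    return f"{label} {pairs}"


def train_and_predict(libsvm_path, train_ratio=0.7, seed=42):
    """Hypothetical helper: load LIBSVM data, split, train, and predict."""
    # Imported lazily so to_libsvm_line works without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression

    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("libsvm-logreg-sketch")
        .getOrCreate()
    )

    # The libsvm data source yields a DataFrame with "label" and "features".
    data = spark.read.format("libsvm").load(libsvm_path)

    # Split into training and testing sets by ratio.
    train_df, test_df = data.randomSplit(
        [train_ratio, 1.0 - train_ratio], seed=seed
    )

    # family="multinomial" requests multinomial (softmax) logistic regression.
    lr = LogisticRegression(maxIter=10, family="multinomial")
    model = lr.fit(train_df)

    # transform() appends prediction columns to the test DataFrame.
    return model.transform(test_df)
```

The to_libsvm_line helper is only there to make a small sample file for experimenting; for instance, to_libsvm_line(1, {3: 1.0, 1: 0.5}) produces the line "1 1:0.5 3:1.0".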
This example loads penguin data from a CSV file, queries species information using Spark SQL, and generates a bar plot visualizing species distribution, saving it as 'pinguin_distribution.png'.
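The query-then-plot flow in this example could look roughly like the following in plain PySpark and matplotlib. This is a sketch under assumptions: the column name species, the view name penguins, and the helper names rows_to_series and plot_species_distribution are hypothetical, and the actual components may build the query and the plot differently.

```python
def rows_to_series(rows, label_key, value_key):
    """Split a list of row-like mappings into parallel label and value lists."""
    labels = [row[label_key] for row in rows]
    values = [row[value_key] for row in rows]
    return labels, values


def plot_species_distribution(csv_path, out_path="pinguin_distribution.png"):
    """Hypothetical helper: count rows per species and save a bar plot."""
    # Imported lazily so rows_to_series works without Spark or matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("penguin-sketch")
        .getOrCreate()
    )

    # Load the CSV and expose it to SQL under an assumed view name.
    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    df.createOrReplaceTempView("penguins")

    # Count rows per species; "species" is an assumed column name.
    counts = spark.sql(
        "SELECT species, COUNT(*) AS n FROM penguins "
        "WHERE species IS NOT NULL GROUP BY species ORDER BY species"
    )

    # The aggregated result is small, so collecting it to the driver is safe.
    labels, values = rows_to_series(counts.collect(), "species", "n")

    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_xlabel("species")
    ax.set_ylabel("count")
    fig.savefig(out_path)
    plt.close(fig)
    return out_path
```

Aggregating in Spark before collecting keeps the data sent to the driver down to one row per species, which is why the plot step can use ordinary Python lists.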
To use this component library, ensure that you have an existing Xircuits setup. You can then install the Spark library using the component library interface, or through the CLI using:
xircuits install spark
You can also install it manually by cloning the repository into your base Xircuits directory and installing its requirements:
# base Xircuits directory
git clone https://github.com/XpressAI/xai-spark xai_components/xai_spark
pip install -r xai_components/xai_spark/requirements.txt