Quick tip: Using Apache Spark with SingleStore Notebooks

Published at
3/18/2024
Categories
singlestoredb
apachespark
jupyternotebook
Author
veryfatboy

Abstract

SingleStore has been providing a cloud portal and a DBaaS offering for some time. Additionally, it has offered a Spark Connector for a while, but Apache Spark had to be run externally. The recent addition of notebooks to the cloud portal has significantly improved Data Science capabilities, including the ability to use Apache Spark. Spark can now be installed in the notebook environment in a few minutes. This article will show how.

The notebook file used in this article is available on GitHub.

Create a SingleStore Cloud account

A previous article showed the steps to create a free SingleStore Cloud account. We'll use the following settings:

  • Workspace Group Name: Spark Demo Group
  • Cloud Provider: AWS
  • Region: US East 1 (N. Virginia)
  • Workspace Name: spark-demo
  • Size: S-00

Create a new notebook

From the left navigation pane in the cloud portal, we'll select DEVELOP > Data Studio.

In the top right of the web page, we'll select New Notebook > New Notebook, as shown in Figure 1.

Figure 1. New Notebook.


We'll call the notebook spark_demo, select a Blank notebook template from the available options, and save it in the Personal location.

Fill out the notebook

Install Apache Spark

We can easily install Java:

!conda install -y --quiet -c conda-forge openjdk=8

and Spark:

!pip install pyspark --quiet

Once the installation has been completed, we can check the version of Java, as follows:

!java -version

Example output:

openjdk version "1.8.0_382"
OpenJDK Runtime Environment (Zulu 8.72.0.17-CA-linux64) (build 1.8.0_382-b05)
OpenJDK 64-Bit Server VM (Zulu 8.72.0.17-CA-linux64) (build 25.382-b05, mixed mode)

Next, let's check the version of PySpark:

import pyspark

print(pyspark.__version__)

Example output:

3.5.1

Finally, we'll check the version of Python:

import sys

print(sys.version)

Example output:

3.11.6 | packaged by conda-forge | (main, Oct  3 2023, 10:40:35) [GCC 12.3.0]

There is a useful Spark Python Supportability Matrix that shows the compatibility of Python with various Spark releases.
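The checks above can also be scripted. The following is a minimal sketch rather than part of the original notebook: the helper names `python_is_supported` and `java_is_on_path` are hypothetical, and the minimum Python version of 3.8 is an assumption for PySpark 3.5.x that should be confirmed against the supportability matrix.

```python
import shutil
import sys

# Assumption: PySpark 3.5.x needs Python 3.8 or later; confirm against the matrix.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def java_is_on_path():
    """Return True if a `java` executable can be found on PATH."""
    return shutil.which("java") is not None


print("Python supported:", python_is_supported())
print("Java on PATH:", java_is_on_path())
```

If either line prints `False`, it is probably worth revisiting the installation steps before creating a SparkSession.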

Test Apache Spark

Now, let's test the Apache Spark installation.

First, let's create a SparkSession:

from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder.appName("Spark Test").getOrCreate()

Next, let's create a DataFrame:

# Create a DataFrame
data = [("Peter", 27), ("Paul", 28), ("Mary", 29)]
df = spark.createDataFrame(data, ["Name", "Age"])

Now we'll show the DataFrame:

# Show the content of the DataFrame
df.show()

The output should be as follows:

+-----+---+
| Name|Age|
+-----+---+
|Peter| 27|
| Paul| 28|
| Mary| 29|
+-----+---+
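As a slightly fuller test, we could also run a simple query against the same data while the session is still active. This is a sketch only: `PEOPLE` and `names_older_than` are hypothetical names introduced for illustration, and the explicit schema string is an optional addition, not something the original notebook uses.

```python
# Hypothetical helper for illustration; not part of the original notebook.
PEOPLE = [("Peter", 27), ("Paul", 28), ("Mary", 29)]


def names_older_than(spark, min_age):
    """Return the sorted names of people whose Age exceeds min_age."""
    # Imported lazily so the helper can be defined before PySpark is installed.
    from pyspark.sql import functions as F

    # A DDL-style schema string makes the column types explicit.
    df = spark.createDataFrame(PEOPLE, "Name string, Age int")
    rows = df.filter(F.col("Age") > min_age).select("Name").collect()
    return sorted(row["Name"] for row in rows)
```

With the SparkSession created earlier, `names_older_than(spark, 27)` should return `['Mary', 'Paul']`.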

Finally, we'll stop the SparkSession:

# Stop the Spark session
spark.stop()

Summary

In this short article, we've seen how to install and use Apache Spark in the SingleStore notebook environment. In future articles, we'll explore Spark's capabilities more extensively and demonstrate how to integrate it with the SingleStore Data Platform for reading and writing data using a database.
