dev-resources.site
for different kinds of informations.
Quick tip: Using R with SingleStore Notebooks
Update: As of September 2024, running R is no longer possible on the SingleStore Portal. I will find another way to show the integration between SingleStore and R.
Abstract
SingleStore provides a Jupyter-based notebook environment with support for Python, SQL and Markdown. However, we can also install and use the R programming language. In this article, we'll see how.
The notebook file used in this article is available on GitHub.
Create a SingleStore Cloud account
A previous article showed the steps to create a free SingleStore Cloud account. We'll use the following settings:
- Workspace Group Name: R Demo Group
- Cloud Provider: AWS
- Region: US East 1 (N. Virginia)
- Workspace Name: r-demo
- Size: S-00
Create a new notebook
From the left navigation pane in the cloud portal, we'll select DEVELOP > Data Studio.
In the top right of the web page, we'll select New Notebook > New Notebook, as shown in Figure 1.
We'll call the notebook r_demo, select a Blank notebook template from the available options, and save it in the Personal location.
Create a database
In our SingleStore Cloud account, let's use the SQL Editor to create a new database. Call this iris_db
, as follows:
DROP DATABASE IF EXISTS iris_db;
CREATE DATABASE IF NOT EXISTS iris_db;
Fill out the notebook
First, let's install the R kernel and some other packages we need for this article:
!conda install -y --quiet -c conda-forge r-irkernel r-rjava r-rjdbc r-ggplot2
Next, we need to change the kernel. Refreshing the page will help the notebook detect any changes, including the installation of a new kernel.
In the top right, we can see that Python is currently selected, as shown in Figure 2.
Selecting Python 3 will present a box with a pull-down as shown in Figure 3.
Clicking the pull-down will show some options and R should be one of the options. We'll choose R, as shown in Figure 4.
Next, we'll click the Select button.
To connect to SingleStore, we'll use JDBC, as follows:
library(RJDBC)
This will also load DBI
and rJava
:
Loading required package: DBI
Loading required package: rJava
Next, we'll download the SingleStore JDBC Client and save it in a jars
directory:
# URL of the JDBC driver file
driver_url <- "https://repo1.maven.org/maven2/com/singlestore/singlestore-jdbc-client/1.2.1/singlestore-jdbc-client-1.2.1.jar"
# Set the JDBC driver class name
driver <- "com.singlestore.jdbc.Driver"
# Local directory to save the driver file
local_dir <- "jars"
dir.create(local_dir, showWarnings = FALSE, recursive = TRUE)
# Check if the driver file already exists
driver_file <- file.path(
local_dir,
"singlestore-jdbc-client-1.2.1.jar"
)
if (!file.exists(driver_file)) {
# Download the JDBC driver file if it doesn't exist
download.file(
driver_url,
destfile = driver_file,
mode = "wb",
quiet = TRUE
)
}
# Check if the driver file has been downloaded successfully
if (file.exists(driver_file)) {
print("Driver file downloaded successfully")
} else {
print("Failed to download the driver file")
}
Now we'll create the connection details to SingleStore:
host <- "<HOST>"
port <- 3306
database <- "iris_db"
user <- "admin"
password <- "<PASSWORD>"
url <- paste0("jdbc:singlestore://", host, ":", port, "/", database)
Replace <HOST>
and <PASSWORD>
with the values for your environment. These values can be obtained from the workspace using Connect > SQL IDE.
Next, let's load the iris
dataset, make some small adjustments to the column names and show the first few rows:
# Load the iris dataset
data(iris)
# Replace "." with "_" in column names
colnames(iris) <- gsub("\\.", "_", colnames(iris))
# Print the first few rows of the dataset
head(iris)
Example output:
Sepal_Length Sepal_Width Petal_Length Petal_Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
We'll now prepare the connection to SingleStore, write the iris
dataset to the database and read it back again.
# Establish the JDBC connection
conn <- dbConnect(
drv = JDBC(driver, driver_file),
url = url,
user = user,
password = password
)
# Write the iris dataset to the database
dbWriteTable(conn, "iris", iris, overwrite = TRUE)
# Read the iris dataset from the database
iris_from_db <- dbReadTable(conn, "iris")
# Print the first few rows of the dataset read from the database
head(iris_from_db)
# Close the JDBC connection
dbDisconnect(conn)
Example output:
Sepal_Length Sepal_Width Petal_Length Petal_Width Species
1 5.7 2.9 4.2 1.3 versicolor
2 5.1 3.8 1.5 0.3 setosa
3 5.4 3.0 4.5 1.5 versicolor
4 4.3 3.0 1.1 0.1 setosa
5 5.5 2.5 4.0 1.3 versicolor
6 6.4 2.9 4.3 1.3 versicolor
Bonus: Create visualisations
We can easily create some plots using ggplot
:
library(ggplot2)
First, a scatter plot of Sepal Length vs Sepal Width, as shown in Figure 5.
# Scatter plot of Sepal Length vs Sepal Width
ggplot(iris_from_db, aes(x = Sepal_Length, y = Sepal_Width, color = Species)) +
geom_point() +
labs(x = "Sepal Length", y = "Sepal Width", title = "Sepal Length vs Sepal Width")
Next, a box plot of Petal Length by Species, as shown in Figure 6.
# Box plot of Petal Length by Species
ggplot(iris_from_db, aes(x = Species, y = Petal_Length, fill = Species)) +
geom_boxplot() +
labs(x = "Species", y = "Petal Length", title = "Petal Length by Species")
Finally, a histogram of Petal Width, as shown in Figure 7.
# Histogram of Petal Width
ggplot(iris_from_db, aes(x = Petal_Width)) +
geom_histogram(binwidth = 0.1, fill = "skyblue", color = "black") +
labs(x = "Petal Width", y = "Frequency", title = "Petal Width")
Summary
In this short article, we've seen how to install R, how to connect to SingleStore from R, and how to write and read data using R and SingleStore. We've also quickly and easily created several powerful visualisations.
Featured ones: