dev-resources.site
for different kinds of informations.
Unlock 1 Billion NYC Taxi Rides: A Step-by-Step Guide
Unleashing Insights: Visualize NYC Taxi Data with Timescale and Grafana
Grafana, an open-source analytics and monitoring platform, is often used to visualize time-series data. In this tutorial, you will learn how to configure Grafana and Timescale Cloud, leverage Grafana to visualize metrics stored in TimescaleDB, and visualize geospatial data using Grafana.
Configuring Grafana and Timescale Cloud
To get started, set up Timescale Cloud. If you prefer to run your own instance of TimescaleDB, follow the installation instructions, and the remainder of the tutorial should be relatively straightforward to follow.
Once you've set up Timescale Cloud, you'll have a functional version of TimescaleDB with preloaded data. In this case, we'll use the New York City taxicab data found in the "Hello, Timescale!" tutorial. For background information on using TimescaleDB, be sure to follow the full tutorial.
If you're using Timescale Cloud, you can set up a Grafana Metrics Dashboard from the Create Service flow. Alternatively, you can set up Grafana Cloud and follow the rest of the instructions below. Note that Grafana Cloud is more feature-rich than the open-source version of Grafana included with Timescale Cloud, but requires a paid subscription from Grafana.
Finally, configure Grafana to connect to your Timescale Cloud instance (or your own installation of TimescaleDB). Begin by selecting 'Add Data Source' and choosing the 'PostgreSQL' option in the SQL group. In the configuration screen, provide the Host, Database, User, and Password for your Timescale Cloud instance (or TimescaleDB server). If you're a Timescale Cloud user, you can find this information in the Service Dashboard for your Timescale Cloud instance.
For more information on unleashing insights and visualizing NYC taxi data with Timescale and Grafana, check out this tutorial on t8tech.com.
When connecting to a TimescaleDB instance in Timescale Cloud for this tutorial, it is crucial to select the 'TimescaleDB' option in the 'PostgreSQL details' section of the PostgreSQL configuration screen.
We will also rename the database to NYC Taxi Cab Data
, which is optional but helps others who use our Grafana dashboard understand the nature of this data source.
Once complete, click 'Save & Test'. You should receive confirmation that your database connection is functioning correctly.
Designing a Grafana Dashboard and Panel
Grafana organizes information into 'Dashboards' and 'Panels', where a dashboard represents a view into the performance of a system, and each dashboard consists of one or more panels that convey details about a specific metric related to that system.
To begin, let's create a new dashboard. In the far left of the Grafana user interface, you'll find a '+' icon. Hover over it to reveal a 'Create' menu, containing a 'Dashboard' option. Select that 'Dashboard' option.
After creating a new dashboard, you'll see a 'New Panel' screen, with options for 'Add Query' and 'Choose Visualization'. If you already have a dashboard with panels, you can add a new panel by clicking on the '+' icon at the top of the Grafana user interface.
To continue with our tutorial, let's add a new visualization by selecting the 'Choose Visualization' option.
At this point, you'll have several options for different Grafana visualizations. We'll choose the first option, the 'Graph' visualization.
There are multiple ways to configure our panel, but we'll accept all the defaults and create a simple 'Lines' graph.
In the far left section of the Grafana user interface, select the 'Queries' tab.
Instead of using the Grafana query builder, we'll edit our query directly. In the view, click on the 'Edit SQL' button at the bottom.
Before crafting our query, we must set up the Query database to leverage the New York City taxi cab datasource we connected to earlier:
Unveiling metrics stored in TimescaleDB
Let's create a visualization that answers the question What is the daily ride frequency? as outlined in the Hello, Timescale! tutorial.
As demonstrated in the tutorial, the standard SQL syntax for our query is as follows:
SELECT date_trunc('day', pickup_datetime) AS day,
COUNT(*)
FROM rides
GROUP BY day
ORDER BY day;
We will need to alter this query to support Grafanaâs unique query syntax.
Modifying the SELECT statement
First, we will modify the date_trunc
function to use the TimescaleDB time_bucket
function. You can consult the TimescaleDB API Reference on time_bucket for more information on how to use it properly.
Letâs examine the SELECT
portion of this query. First, we will bucket our results into one day groupings using the time_bucket
function. If you set the âFormatâ of a Grafana panel to be âTime seriesâ, for use in Graph panel for example, then the query must return a column named time
that returns either a SQL datetime
or any numeric datatype representing a Unix epoch.
So, part 1 of this new query is modified so that the output of the time_bucket
grouping is labeled time
as Grafana requires, while part 2 is unchanged:
SELECT
--1--
time_bucket('1 day', pickup_datetime) AS "time",
--2--
COUNT(*)
FROM rides
The Grafana __timeFilter function
Grafana time-series panels include a tool that enables the end-user to filter on a given time range. A âtime filter,â if you will. Not surprisingly, Grafana has a way to link the user interface construct in a Grafana panel with the query itself. In this case, the $__timefilter()
function.
In the modified query below, we will use the $__timefilter()
function to set the pickup_datetime
column as the filtering range for our visualizations.
SELECT
--1--
time_bucket('1 day', pickup_datetime) AS "time",
--2--
COUNT(*)
FROM rides
WHERE $__timeFilter(pickup_datetime)
Defining Query Parameters
Our ultimate goal is to categorize our visualization based on the chosen time intervals and arrange the results in a sequential manner. Consequently, our GROUP BY
and ORDER BY
statements will rely on the time
parameter.
With these adjustments, our final Grafana query takes shape:
SELECT
--1--
time_bucket('1 day', pickup_datetime) AS time,
--2--
COUNT(*)
FROM rides
WHERE $__timeFilter(pickup_datetime)
GROUP BY time
ORDER BY time
Upon visualizing this query in Grafana, we obtain the following outcome:
IMPORTANT REMINDER: Don't overlook setting the time filter in the upper right corner of your Grafana dashboard. If you're utilizing the pre-built sample dataset for this example, you'll want to set your time filter around January 1st, 2016.
Currently, the data is aggregated into 1-day groupings. Let's modify the time_bucket
function to aggregate into 5-minute groupings instead and compare the resulting graphs:
SELECT
--1--
time_bucket('5m', pickup_datetime) AS time,
--2--
COUNT(*)
FROM rides
WHERE $__timeFilter(pickup_datetime)
GROUP BY time
ORDER BY time
When we bring this query to life, it will take the following form:
Unleash the Potential of Geospatial Insights in TimescaleDB
The NYC Taxi Cab dataset also includes the pickup location for each ride. In the Hello, Timescale! Tutorial, we explored rides originating near Times Square. Now, letâs take it a step further and chart rides with a distance traveled exceeding five miles within Manhattan.
We can achieve this in Grafana using the âWorldmap Panelâ. To begin, create a new panel, select âNew Visualizationâ, and choose the âWorldmap Panelâ.
Once again, weâll modify our query directly. In the Query screen, ensure youâve selected your NYC Taxicab Data as the data source. In the âFormat asâ dropdown, select âTableâ. Click on âEdit SQLâ and enter the following query in the text window:
SELECT time_bucket('5m', rides.pickup_datetime) AS time,
rides.trip_distance AS value,
rides.pickup_latitude AS latitude,
rides.pickup_longitude AS longitude
FROM rides
WHERE $__timeFilter(rides.pickup_datetime) AND
ST_Distance(pickup_geom,
ST_Transform(ST_SetSRID(ST_MakePoint(-73.9851,40.7589),4326),2163)
) < 2000
GROUP BY time,
rides.trip_distance,
rides.pickup_latitude,
rides.pickup_longitude
ORDER BY time
LIMIT 500;
Let's dissect this query. Our initial goal is to create a visual representation of rides, with markers that indicate the distance traveled. Trips covering longer distances will receive distinct visual treatments on our map, with the trip_distance
serving as the value for our plot, stored in the value
field.
In the second and third lines of the SELECT
statement, we're harnessing the pickup_longitude
and pickup_latitude
fields in the database, mapping them to variables longitude
and latitude
, respectively, to pinpoint exact locations.
In the WHERE
clause, we're applying a geospatial boundary to identify trips within a 2000m radius of Times Square, effectively narrowing down our search.
Finally, in the GROUP BY
clause, we're supplying the trip_distance
and location variables, enabling Grafana to plot data with precision.
WARNING: This query may take some time, depending on your Internet connection speed. This is why weâre using the
LIMIT
statement for demonstration purposes, to avoid overwhelming the system.
Now, letâs configure our Worldmap visualization. Select the âVisualizationâ tab, located at the far left of the Grafana user interface. Youâll find options for âMap Visual Optionsâ, âMap Data Optionsâ, and more, allowing for a high degree of customization.
First, ensure the âMap Data Optionsâ are set to âtableâ and âcurrentâ. Then, in the âField Mappingsâ section, set the âTable Query Formatâ to âTableâ. We can map the âLatitude Fieldâ to our latitude
variable, the âLongitude Fieldâ to our longitude
variable, and the âMetricâ field to our value
variable, creating a seamless connection between our data and visualization.
In the âMap Visual Optionsâ, set the âMin Circle Sizeâ to 1 and the âMax Circle Sizeâ to 5, allowing for a range of visual representations.
In the âThreshold Optionsâ, set the âThresholdsâ to â2,5,10â. This will automatically configure a set of colors, with any plot whose value
is below 2 assigned one color, any value
between 2 and 5 assigned another color, any value
between 5 and 10 assigned a third color, and any value
over 10 assigned a fourth color, providing a clear visual distinction.
Your configuration should resemble this:
At this point, data should be flowing into our Worldmap visualization, as shown, providing a comprehensive and interactive view of our data.
You can fine-tune the time filter at the top of your visualization to access trip pickup data across diverse time intervals.
Unlocking the Potential of Grafana Variables
Our goal is to create a variable that determines the type of ride displayed in the visual, based on the payment method used for the ride.
A wide range of payment options are available, which can be seen in the payment_types
table:
payment_type | description
--------------+-------------
1 | credit card
2 | cash
3 | no charge
4 | dispute
5 | unknown
6 | voided trip
(6 rows)
Grafana includes many types of variables, and variables in Grafana function just like variables in programming languages. We define a variable, and then we can reference it in our queries.
Define a new Grafana variable
To create a new variable, go to your Grafana dashboard settings, navigate to the âVariableâ option in the side-menu, and then click the âAdd variableâ button.
In this case, we use the âQueryâ type, where our variable will be defined as the results of SQL query.
Under the âGeneralâ section, weâll name our variable payment_type
and give it a type of Query
. Then, weâll assign it the label of âPayment Typeâ, which is how it will appear in a drop-down menu.
We will select our data source and supply the query:
SELECT payment_type FROM payment_types;
Enable the 'Multi-value' and 'Include All option' functionalities. This will grant users of your dashboard the ability to choose multiple payment methods. The resulting configuration should resemble the following:
Click 'Add' to save your variable.
Integrating the Variable into a Grafana Panel
Let's modify the WorldMap panel we created in the previous section. The first thing you'll notice is that now that we've defined a variable for this dashboard, a dropdown menu for that variable has appeared in the upper left-hand corner of the panel.
We can utilize this variable to filter the results of our query using the WHERE
clause in SQL. We will verify whether rides.payment_type
is included in the array of the variable, which we've denoted as $payment_type
.
Let's revise our earlier query as follows:
SELECT time_bucket('5m', rides.pickup_datetime) AS time,
rides.trip_distance AS value,
rides.pickup_latitude AS latitude,
rides.pickup_longitude AS longitude
FROM rides
WHERE $__timeFilter(rides.pickup_datetime) AND
ST_Distance(pickup_geom,
ST_Transform(ST_SetSRID(ST_MakePoint(-73.9851,40.7589),4326),2163)
) < 2000 AND
rides.payment_type IN ($payment_type)
GROUP BY time,
rides.trip_distance,
rides.pickup_latitude,
rides.pickup_longitude
ORDER BY time
LIMIT 500;
Use the variable in a Grafana panel
We can now leverage the drop-down menu to classify our rides based on the payment method utilized:
Enhancing the Grafana filter for better readability
However, this filter lacks visual appeal. The numerical values, such as â1â, are ambiguous. Fortunately, when we set up our NYC Taxi Cab dataset, we created a payment_types
table (which we queried earlier). The payment_types.description
field contains a more descriptive explanation of each payment code, for instance, âcredit cardâ, âcashâ, and so forth. It is these descriptive labels that we want to feature in our drop-down menu.
Click âDashboard settingsâ (the âgearâ icon in the upper-right of your Grafana visualizations). Select the âVariablesâ tab on the left, and click the $payment_types
variable. Modify your query to retrieve the description
and store it in the __text
field, and retrieve the payment_type
and store it in the __value
field, as follows:
SELECT description AS "__text", payment_type AS "__value" FROM payment_types
Your configuration should now mirror the following setup:
No modifications are required for the WorldMap visualization query itself. The database column designated as __text
is leveraged whenever the variable is displayed, whereas the value assigned to __value
serves as the actual value when Grafana executes a query.
As depicted, a variable can be utilized in a query in a manner akin to using a variable in any programming language.
Featured ones: