The Heart of DolphinScheduler: In-Depth Analysis of the Quartz Scheduling Framework

Published: 11/20/2024
Categories: apachedolphinscheduler, quartz, opensource, bigdata
Author: chen_debra_3060b21d12b1b0

Quartz is an open-source Java job scheduling framework with powerful scheduling capabilities. In DolphinScheduler, Quartz implements task scheduling and management: DolphinScheduler integrates with Quartz through the QuartzExecutorImpl class, combining workflow and schedule management operations with Quartz's scheduling framework to drive task execution.

This article provides a detailed analysis of Quartz's principles and implementation within DolphinScheduler.
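
For readers new to Quartz, here is a minimal standalone sketch of the three core concepts everything below builds on: a Job (the work), a Trigger (the firing rule), and the Scheduler that binds them. The class and identifiers are illustrative, not taken from the DolphinScheduler codebase:

import static org.quartz.CronScheduleBuilder.cronSchedule;
import static org.quartz.JobBuilder.newJob;
import static org.quartz.TriggerBuilder.newTrigger;

import org.quartz.Job;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.impl.StdSchedulerFactory;

public class QuartzQuickstart {

    // The unit of work; Quartz instantiates this class on every fire
    public static class HelloJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            System.out.println("fired at " + context.getFireTime());
        }
    }

    public static void main(String[] args) throws SchedulerException {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();

        JobDetail job = newJob(HelloJob.class)
                .withIdentity("helloJob", "demo")
                .build();

        // Fire every 10 seconds (Quartz cron fields: sec min hour day month weekday)
        Trigger trigger = newTrigger()
                .withIdentity("helloTrigger", "demo")
                .withSchedule(cronSchedule("0/10 * * * * ?"))
                .build();

        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}

DolphinScheduler performs these same steps (build a JobDetail, build a CronTrigger, hand both to the Scheduler) behind its own scheduling service, as the analysis below shows.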


Quartz Entity-Relationship Diagram

[Figure: Quartz entity-relationship diagram]

  1. QRTZ_JOB_DETAILS and QRTZ_TRIGGERS are the central tables, defining the relationship between jobs and triggers.
  2. The QRTZ_TRIGGERS table links with multiple trigger type tables, such as QRTZ_SIMPLE_TRIGGERS and QRTZ_CRON_TRIGGERS, to enable different trigger mechanisms (see the JDBC sketch after this list).
  3. QRTZ_FIRED_TRIGGERS records execution history, associating with both the job and trigger tables.
  4. QRTZ_CALENDARS defines calendar exclusion rules for triggers, while QRTZ_PAUSED_TRIGGER_GRPS manages the pause state of trigger groups.
  5. QRTZ_SCHEDULER_STATE and QRTZ_LOCKS are used for scheduling coordination in clustered environments, ensuring high availability.
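
To make these relationships concrete, here is a hedged JDBC sketch that joins the trigger base table with the cron trigger table over their shared key (SCHED_NAME, TRIGGER_NAME, TRIGGER_GROUP). The connection URL and credentials are placeholders; point them at the metadata database DolphinScheduler uses:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class QuartzTableInspector {
    public static void main(String[] args) throws SQLException {
        // Placeholder connection details for the DolphinScheduler metadata DB
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/dolphinscheduler", "user", "password");
             Statement stmt = conn.createStatement();
             // QRTZ_CRON_TRIGGERS references QRTZ_TRIGGERS via
             // (SCHED_NAME, TRIGGER_NAME, TRIGGER_GROUP)
             ResultSet rs = stmt.executeQuery(
                     "SELECT t.TRIGGER_NAME, t.TRIGGER_STATE, c.CRON_EXPRESSION "
                             + "FROM QRTZ_TRIGGERS t "
                             + "JOIN QRTZ_CRON_TRIGGERS c "
                             + "  ON t.SCHED_NAME = c.SCHED_NAME "
                             + " AND t.TRIGGER_NAME = c.TRIGGER_NAME "
                             + " AND t.TRIGGER_GROUP = c.TRIGGER_GROUP")) {
            while (rs.next()) {
                System.out.printf("%s [%s] -> %s%n",
                        rs.getString(1), rs.getString(2), rs.getString(3));
            }
        }
    }
}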

Quartz in DolphinScheduler

This article focuses on analyzing how Quartz works inside DolphinScheduler, so the steps for using Quartz in DolphinScheduler are only covered briefly. Generally speaking, there are four steps:

  • Creating a New SHELL Task

  • Defining and Configuring Workflow Scheduling

  • Scheduling Activation

  • Workflow Instance Execution

Principle Analysis

Creating a Schedule

org.apache.dolphinscheduler.api.controller.SchedulerController#createSchedule  
--org.apache.dolphinscheduler.api.service.impl.SchedulerServiceImpl#insertSchedule  
...  
Schedule scheduleObj = new Schedule();
Date now = new Date();
scheduleObj.setTenantCode(tenantCode);
scheduleObj.setProjectName(project.getName());
scheduleObj.setProcessDefinitionCode(processDefineCode);
scheduleObj.setProcessDefinitionName(processDefinition.getName());
// Parse the crontab and timezone from the request payload
ScheduleParam scheduleParam = JSONUtils.parseObject(schedule, ScheduleParam.class);
scheduleObj.setCrontab(scheduleParam.getCrontab());
scheduleObj.setTimezoneId(scheduleParam.getTimezoneId());
scheduleObj.setWarningType(warningType);
scheduleObj.setWarningGroupId(warningGroupId);
scheduleObj.setFailureStrategy(failureStrategy);
scheduleObj.setCreateTime(now);
scheduleObj.setUpdateTime(now);
scheduleObj.setUserId(loginUser.getId());
scheduleObj.setUserName(loginUser.getUserName());
// New schedules start OFFLINE; nothing is handed to Quartz until the schedule goes online
scheduleObj.setReleaseState(ReleaseState.OFFLINE);
scheduleObj.setProcessInstancePriority(processInstancePriority);
scheduleObj.setWorkerGroup(workerGroup);
scheduleObj.setEnvironmentCode(environmentCode);
// Persist the schedule record; no Quartz objects are created at this point
scheduleMapper.insert(scheduleObj);
...

At its core, this operation inserts a new entry into the schedule table, as shown below:

[Figure: the new row in the schedule table]

Activating a Schedule

org.apache.dolphinscheduler.api.controller.SchedulerController#publishScheduleOnline  
--org.apache.dolphinscheduler.api.service.impl.SchedulerServiceImpl#onlineScheduler  
----org.apache.dolphinscheduler.api.service.impl.SchedulerServiceImpl#doOnlineScheduler  
------org.apache.dolphinscheduler.scheduler.quartz.QuartzScheduler#insertOrUpdateScheduleTask  

// Simplified code (relies on Quartz's static builder imports:
// org.quartz.JobBuilder.newJob, org.quartz.TriggerBuilder.newTrigger,
// org.quartz.CronScheduleBuilder.cronSchedule):
JobKey jobKey = QuartzTaskUtils.getJobKey(schedule.getId(), projectId);
Map<String, Object> jobDataMap = QuartzTaskUtils.buildDataMap(projectId, schedule);
String cronExpression = schedule.getCrontab();
String timezoneId = schedule.getTimezoneId();

Date startDate = DateUtils.transformTimezoneDate(schedule.getStartTime(), timezoneId);
Date endDate = DateUtils.transformTimezoneDate(schedule.getEndTime(), timezoneId);

// Register the job: replace=false (do not overwrite an existing job),
// storeNonDurableWhileAwaitingScheduling=true (store it before its trigger exists)
JobDetail jobDetail = newJob(ProcessScheduleTask.class).withIdentity(jobKey).build();
jobDetail.getJobDataMap().putAll(jobDataMap);
scheduler.addJob(jobDetail, false, true);

// Build a cron trigger bounded by the schedule's start/end time; misfire
// handling is ignored, so missed fire times are fired as soon as possible
TriggerKey triggerKey = new TriggerKey(jobKey.getName(), jobKey.getGroup());
CronTrigger cronTrigger = newTrigger()
        .withIdentity(triggerKey)
        .startAt(startDate)
        .endAt(endDate)
        .withSchedule(
                cronSchedule(cronExpression)
                        .withMisfireHandlingInstructionIgnoreMisfires()
                        .inTimeZone(DateUtils.getTimezone(timezoneId)))
        .forJob(jobDetail)
        .build();

scheduler.scheduleJob(cronTrigger);
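
Once scheduleJob returns, the trigger is persisted and can be inspected either through the QRTZ_ tables or through the Quartz API. A small sketch, continuing the snippet above (scheduler and triggerKey as already defined):

// Read back the trigger that was just registered
CronTrigger stored = (CronTrigger) scheduler.getTrigger(triggerKey);
System.out.println("cron expression: " + stored.getCronExpression());
System.out.println("next fire time: " + stored.getNextFireTime());

The next fire time reported here is the same value Quartz persists in QRTZ_TRIGGERS.NEXT_FIRE_TIME, which is how the tables below stay in step with the live scheduler.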

Related Tables

Job Details Table (QRTZ_JOB_DETAILS): stores detailed information about each job.

[Figure: QRTZ_JOB_DETAILS table contents]

Trigger Base Table (QRTZ_TRIGGERS): stores basic information shared by all trigger types.

[Figure: QRTZ_TRIGGERS table contents]

Cron Trigger Table (QRTZ_CRON_TRIGGERS): stores information about cron expression triggers.

[Figure: QRTZ_CRON_TRIGGERS table contents]


Schedule Execution

org.apache.dolphinscheduler.scheduler.quartz.ProcessScheduleTask

protected void executeInternal(JobExecutionContext context) {
    JobDataMap dataMap = context.getJobDetail().getJobDataMap();
    int projectId = dataMap.getInt(QuartzTaskUtils.PROJECT_ID);
    int scheduleId = dataMap.getInt(QuartzTaskUtils.SCHEDULE_ID);

    // scheduledFireTime is when the trigger was supposed to fire;
    // fireTime is when it actually fired
    Date scheduledFireTime = context.getScheduledFireTime();
    Date fireTime = context.getFireTime();

    // The schedule and processDefinition referenced below are looked up
    // by scheduleId earlier in the method (omitted in this excerpt)
    Command command = new Command();
    command.setCommandType(CommandType.SCHEDULER);
    command.setExecutorId(schedule.getUserId());
    command.setFailureStrategy(schedule.getFailureStrategy());
    command.setProcessDefinitionCode(schedule.getProcessDefinitionCode());
    command.setScheduleTime(scheduledFireTime);
    command.setStartTime(fireTime);
    command.setWarningGroupId(schedule.getWarningGroupId());
    // Fall back to the default worker group if none was configured
    String workerGroup = StringUtils.isEmpty(schedule.getWorkerGroup()) ? Constants.DEFAULT_WORKER_GROUP
            : schedule.getWorkerGroup();
    command.setWorkerGroup(workerGroup);
    command.setTenantCode(schedule.getTenantCode());
    command.setEnvironmentCode(schedule.getEnvironmentCode());
    command.setWarningType(schedule.getWarningType());
    command.setProcessInstancePriority(schedule.getProcessInstancePriority());
    command.setProcessDefinitionVersion(processDefinition.getVersion());

    // Persist the command; the master server consumes it to launch a workflow instance
    commandService.createCommand(command);
}

Essentially, this is a callback function in Quartz that ultimately generates a Command.
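
The executeInternal signature matches Spring's QuartzJobBean template: Quartz invokes Job.execute(), and QuartzJobBean forwards the call to executeInternal() together with the firing context. A minimal sketch of the same pattern (the class below is illustrative, not part of DolphinScheduler):

import org.quartz.JobExecutionContext;
import org.springframework.scheduling.quartz.QuartzJobBean;

public class ExampleScheduleTask extends QuartzJobBean {
    @Override
    protected void executeInternal(JobExecutionContext context) {
        // Quartz calls this at every fire time of any trigger bound to this job
        System.out.println("fired at " + context.getFireTime()
                + ", scheduled for " + context.getScheduledFireTime());
    }
}

Each time the cron trigger fires, Quartz instantiates the job class registered in the JobDetail (ProcessScheduleTask) and runs this callback; the Command row it writes is what the master server later picks up to launch the workflow instance.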
