[Proposal][Task Plugin] Add Aliyun EMR Serverless Task Plugin #16127

EricGao888 · 2024-06-07T09:05:54Z

Purpose

Aliyun EMR Serverless Spark is a cloud-native, fully managed serverless product designed for large-scale data processing and analysis. It gives enterprises a one-stop data platform covering task development, debugging, scheduling, and operations and maintenance, which greatly simplifies the data processing lifecycle. With EMR Serverless Spark, enterprises can concentrate on data processing and analysis instead of managing infrastructure, which improves productivity.
This proposal aims to provide DolphinScheduler users with an out-of-the-box task plugin that lets them conveniently run jobs on Aliyun EMR Serverless Spark.

Aliyun EMR Serverless Spark is currently in public beta and free to use until the end of June 25, 2024. I have added detailed documentation with examples to the issue and the PR. Reviewers and users interested in this plugin can test it free of charge.

I will fix any known bugs and security vulnerabilities directly related to the Aliyun EMR Serverless Spark task plugin.
I will submit a test report for the plugin before every new DolphinScheduler release to make sure it works correctly.
The community may retire this plugin if no test report is received before three consecutive releases.

Parameters	Example Values / Operations
region id	cn-hangzhou
access key id
access key secret
resource queue id	root_queue
code type	JAR
job name	ds-emr-spark-jar
entry point	oss://datadev-oss-hdfs-test/spark-resource/examples/jars/spark-examples_2.12-3.3.1.jar
entry point arguments	100
spark submit parameters	--class org.apache.spark.examples.SparkPi --conf spark.executor.cores=4 --conf spark.executor.memory=20g --conf spark.driver.cores=4 --conf spark.driver.memory=8g --conf spark.executor.instances=1
engine release version	esr-2.1-native (Spark 3.3.1, Scala 2.12, Native Runtime)
is production	Enabled (turn the switch on)
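The spark submit parameters field is a single string of spark-submit style flags. As an illustration only, the hypothetical helpers below (not part of the plugin or any Aliyun SDK) extract the `--class` value and the `--conf key=value` pairs, then add up what the JAR example requests: one driver with 4 cores and 8g plus one executor with 4 cores and 20g, i.e. 8 cores and 28 GiB of heap in total. The sketch assumes whitespace-separated flags and memory sizes with a `g` suffix, and it ignores Spark's per-process memory overhead.

```python
import shlex


def parse_submit_params(params: str) -> dict:
    """Hypothetical helper (not part of the plugin): extract the --class value
    and the --conf key=value pairs from a spark-submit style string."""
    tokens = shlex.split(params)  # shell-like split; raises ValueError on an unbalanced quote
    parsed = {"class": None, "conf": {}}
    i = 0
    while i < len(tokens):
        if tokens[i] in ("--class", "--conf") and i + 1 < len(tokens):
            flag, value = tokens[i], tokens[i + 1]
            if flag == "--class":
                parsed["class"] = value
            else:
                key, _, val = value.partition("=")  # split on the first '=' only
                parsed["conf"][key] = val
            i += 2
        else:
            i += 1  # skip flags this sketch does not model
    return parsed


def _gib(size: str) -> int:
    """Convert a Spark size such as '20g' to GiB; only the 'g' suffix is handled."""
    if not size.lower().endswith("g"):
        raise ValueError(f"unsupported size in this sketch: {size}")
    return int(size[:-1])


def requested_resources(conf: dict) -> tuple[int, int]:
    """Total cores and heap GiB for one driver plus N executors
    (memory overhead is not included)."""
    n = int(conf["spark.executor.instances"])
    cores = int(conf["spark.driver.cores"]) + n * int(conf["spark.executor.cores"])
    mem = _gib(conf["spark.driver.memory"]) + n * _gib(conf["spark.executor.memory"])
    return cores, mem


params = ("--class org.apache.spark.examples.SparkPi "
          "--conf spark.executor.cores=4 --conf spark.executor.memory=20g "
          "--conf spark.driver.cores=4 --conf spark.driver.memory=8g "
          "--conf spark.executor.instances=1")
parsed = parse_submit_params(params)
print(parsed["class"])                      # org.apache.spark.examples.SparkPi
print(requested_resources(parsed["conf"]))  # (8, 28)
```

Using `shlex` rather than a plain `str.split` means a typo such as a stray quote fails loudly with a `ValueError` instead of silently producing a malformed value.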

Parameters	Example Values / Operations
region id	cn-hangzhou
access key id
access key secret
resource queue id	root_queue
code type	SQL
job name	ds-emr-spark-sql-1
entry point	Any non-empty string
entry point arguments	-e#show tables;show tables;
spark submit parameters	--class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver --conf spark.executor.cores=4 --conf spark.executor.memory=20g --conf spark.driver.cores=4 --conf spark.driver.memory=8g --conf spark.executor.instances=1
engine release version	esr-2.1-native (Spark 3.3.1, Scala 2.12, Native Runtime)
is production	Enabled (turn the switch on)

Parameters	Example Values / Operations
region id	cn-hangzhou
access key id
access key secret
resource queue id	root_queue
code type	SQL
job name	ds-emr-spark-sql-2
entry point	Any non-empty string
entry point arguments	-f#oss://datadev-oss-hdfs-test/spark-resource/examples/sql/show_db.sql
spark submit parameters	--class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver --conf spark.executor.cores=4 --conf spark.executor.memory=20g --conf spark.driver.cores=4 --conf spark.driver.memory=8g --conf spark.executor.instances=1
engine release version	esr-2.1-native (Spark 3.3.1, Scala 2.12, Native Runtime)
is production	Enabled (turn the switch on)
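In the SQL examples, the entry point arguments field appears to pack several command-line arguments into one string with `#` as the separator: `-e#show tables;show tables;` would become `-e` plus the SQL text, and the `-f` example would become `-f` plus the OSS path of the SQL script. These match the `-e` (run a quoted query string) and `-f` (run a SQL file) options of the Spark SQL CLI driven by `SparkSQLCLIDriver`, while `100` in the JAR and PYTHON examples stays a single argument. The helper below is a hypothetical sketch that assumes a plain split on `#`; the plugin's actual parsing may differ, for example when the SQL text itself contains `#`.

```python
def split_entry_point_args(raw: str) -> list[str]:
    """Hypothetical helper: split the '#'-delimited entry point arguments
    field into the argument list passed to the job's entry point."""
    if not raw:
        return []  # no arguments configured
    return raw.split("#")


print(split_entry_point_args("-e#show tables;show tables;"))
# ['-e', 'show tables;show tables;']
print(split_entry_point_args("100"))
# ['100']
```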

Parameters	Example Values / Operations
region id	cn-hangzhou
access key id
access key secret
resource queue id	root_queue
code type	PYTHON
job name	ds-emr-spark-python
entry point	oss://datadev-oss-hdfs-test/spark-resource/examples/src/main/python/pi.py
entry point arguments	100
spark submit parameters	--conf spark.executor.cores=4 --conf spark.executor.memory=20g --conf spark.driver.cores=4 --conf spark.driver.memory=8g --conf spark.executor.instances=1
engine release version	esr-2.1-native (Spark 3.3.1, Scala 2.12, Native Runtime)
is production	Enabled (turn the switch on)

EricGao888 added the "feature" (new feature) label Jun 7, 2024

EricGao888 self-assigned this Jun 7, 2024

EricGao888 mentioned this issue in [Feature] [Task Plugin] Support emr serverless spark #16126 (Open) Jun 7, 2024