[Proposal][Task Plugin] Add Aliyun EMR Serverless Task Plugin #16127

Open
3 tasks done
EricGao888 opened this issue Jun 7, 2024 · 0 comments
Labels
feature new feature
EricGao888 commented Jun 7, 2024

Search before asking

  • I have searched the issues and found no similar feature request.

Purpose

  • Aliyun EMR Serverless Spark is a cloud-native, fully managed serverless product designed for large-scale data processing and analysis. It provides enterprises with a one-stop data platform service covering task development, debugging, scheduling, and operations and maintenance, which simplifies the entire data processing lifecycle and lets enterprises focus on data processing and analysis.
  • This proposal aims to provide DolphinScheduler community users with an out-of-the-box plugin for conveniently interacting with Aliyun EMR Serverless Spark.

Environment for Manual Testing

  • Aliyun EMR Serverless Spark is currently in public beta and free to use through June 25, 2024. I have added detailed documentation with examples to this issue and the PR. Reviewers and users interested in this plugin can test it free of charge.

Maintenance

  • I will fix any known bugs or security vulnerabilities directly related to the Aliyun Serverless Spark task plugin.
  • I will submit a test report for the Aliyun Serverless Spark task plugin before every new DolphinScheduler release to make sure the plugin functions well.
  • The community may retire this plugin if no test report is received before three consecutive releases.

Detailed Examples

Submit Jar tasks

| Parameters | Example Values / Operations |
| --- | --- |
| region id | cn-hangzhou |
| access key id | |
| access key secret | |
| resource queue id | root_queue |
| code type | JAR |
| job name | ds-emr-spark-jar |
| entry point | oss://datadev-oss-hdfs-test/spark-resource/examples/jars/spark-examples_2.12-3.3.1.jar |
| entry point arguments | 100 |
| spark submit parameters | --class org.apache.spark.examples.SparkPi --conf spark.executor.cores=4 --conf spark.executor.memory=20g --conf spark.driver.cores=4 --conf spark.driver.memory=8g --conf spark.executor.instances=1 |
| engine release version | esr-2.1-native (Spark 3.3.1, Scala 2.12, Native Runtime) |
| is production | Please open the switch |
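
The spark submit parameters value above is a single string in spark-submit flag syntax. As a rough illustration of how that string decomposes (this is not the plugin's code, and `parse_spark_submit_params` is a hypothetical helper that only understands the `--class` and `--conf` flags used in this example), a Python sketch might look like this:

```python
import shlex


def parse_spark_submit_params(params):
    """Split a spark-submit style flag string into a main class and conf entries.

    Hypothetical helper for illustration only; it handles just the two flags
    that appear in the examples of this proposal.
    """
    tokens = shlex.split(params)
    result = {"main_class": None, "conf": {}}
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        if flag not in ("--class", "--conf"):
            raise ValueError("unsupported flag in this sketch: " + flag)
        if i + 1 >= len(tokens):
            raise ValueError(flag + " is missing its value")
        value = tokens[i + 1]
        if flag == "--class":
            result["main_class"] = value
        else:
            # --conf values have the form key=value
            key, sep, val = value.partition("=")
            if not sep:
                raise ValueError("--conf expects key=value, got " + repr(value))
            result["conf"][key] = val
        i += 2
    return result


# The value from the JAR example above.
JAR_PARAMS = (
    "--class org.apache.spark.examples.SparkPi "
    "--conf spark.executor.cores=4 --conf spark.executor.memory=20g "
    "--conf spark.driver.cores=4 --conf spark.driver.memory=8g "
    "--conf spark.executor.instances=1"
)
parsed = parse_spark_submit_params(JAR_PARAMS)
```

Here `parsed["main_class"]` holds the SparkPi class name and `parsed["conf"]` holds the five resource settings, which makes it easier to spot a mistyped key before submitting the task.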

Submit SQL tasks

| Parameters | Example Values / Operations |
| --- | --- |
| region id | cn-hangzhou |
| access key id | |
| access key secret | |
| resource queue id | root_queue |
| code type | SQL |
| job name | ds-emr-spark-sql-1 |
| entry point | Any non-empty string |
| entry point arguments | -e#show tables;show tables; |
| spark submit parameters | --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver --conf spark.executor.cores=4 --conf spark.executor.memory=20g --conf spark.driver.cores=4 --conf spark.driver.memory=8g --conf spark.executor.instances=1 |
| engine release version | esr-2.1-native (Spark 3.3.1, Scala 2.12, Native Runtime) |
| is production | Please open the switch |
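
In this example the entry point arguments value joins a flag and its payload with `#`, as in `-e#show tables;show tables;`. Assuming `#` acts purely as an argument separator (an inference from the example values, not something this proposal states), the split could be sketched as follows; `split_entry_point_arguments` is a hypothetical name:

```python
def split_entry_point_arguments(raw, delimiter="#"):
    """Split a delimiter-joined argument string into an argument list.

    Assumption: '#' separates individual arguments. Empty pieces are dropped
    so that a stray trailing delimiter does not yield an empty argument.
    """
    return [part for part in raw.split(delimiter) if part]


inline_sql_args = split_entry_point_arguments("-e#show tables;show tables;")
single_arg = split_entry_point_arguments("100")
```

Under that assumption, the SQL statements stay together as one argument after `-e`, because `;` is not treated as a separator.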

Submit SQL tasks located in OSS

| Parameters | Example Values / Operations |
| --- | --- |
| region id | cn-hangzhou |
| access key id | |
| access key secret | |
| resource queue id | root_queue |
| code type | SQL |
| job name | ds-emr-spark-sql-2 |
| entry point | Any non-empty string |
| entry point arguments | -f#oss://datadev-oss-hdfs-test/spark-resource/examples/sql/show_db.sql |
| spark submit parameters | --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver --conf spark.executor.cores=4 --conf spark.executor.memory=20g --conf spark.driver.cores=4 --conf spark.driver.memory=8g --conf spark.executor.instances=1 |
| engine release version | esr-2.1-native (Spark 3.3.1, Scala 2.12, Native Runtime) |
| is production | Please open the switch |

Submit PySpark Tasks

| Parameters | Example Values / Operations |
| --- | --- |
| region id | cn-hangzhou |
| access key id | |
| access key secret | |
| resource queue id | root_queue |
| code type | PYTHON |
| job name | ds-emr-spark-python |
| entry point | oss://datadev-oss-hdfs-test/spark-resource/examples/src/main/python/pi.py |
| entry point arguments | 100 |
| spark submit parameters | --conf spark.executor.cores=4 --conf spark.executor.memory=20g --conf spark.driver.cores=4 --conf spark.driver.memory=8g --conf spark.executor.instances=1 |
| engine release version | esr-2.1-native (Spark 3.3.1, Scala 2.12, Native Runtime) |
| is production | Please open the switch |
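
Across the examples in this proposal, the JAR and SQL tasks pass `--class` in the spark submit parameters while the PySpark task does not, and every task supplies a non-empty entry point. A pre-submission check based on those observations might be sketched as below. The rules are inferred from the example values only and may not match what the plugin or the service actually enforces; `check_task` is a hypothetical helper.

```python
KNOWN_CODE_TYPES = ("JAR", "SQL", "PYTHON")


def check_task(code_type, entry_point, spark_submit_params):
    """Return a list of problems found in a task definition (empty if none).

    Assumption: the rules below mirror the examples in this proposal and are
    not confirmed requirements of the plugin.
    """
    problems = []
    if code_type not in KNOWN_CODE_TYPES:
        problems.append("unknown code type: " + code_type)
    if not entry_point.strip():
        problems.append("entry point must be a non-empty string")
    has_class = "--class" in spark_submit_params.split()
    if code_type in ("JAR", "SQL") and not has_class:
        problems.append(code_type + " examples pass --class in spark submit parameters")
    return problems


pyspark_problems = check_task(
    "PYTHON",
    "oss://datadev-oss-hdfs-test/spark-resource/examples/src/main/python/pi.py",
    "--conf spark.executor.cores=4 --conf spark.executor.instances=1",
)
jar_problems = check_task("JAR", "", "--conf spark.executor.cores=4")
```

The PySpark definition passes with no findings, while the deliberately incomplete JAR definition reports both an empty entry point and a missing `--class`.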

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct
