Dynamic Allocation Issues On Spark 2.4.8 (Possible Issue with External Shuffle Service?)

Published at
10/14/2024
Categories
spark
Author
mr_boom_boom

Hey Team,

I am having an issue with dynamic allocation on Spark 2.4.8. I have set up a cluster using your Clemlab distribution (https://www.clemlab.com/), and Spark jobs now run fine. The issue appears when I try to use the dynamicAllocation options. I suspect the problem is related to the External Shuffle Service, but from what I can see it should be set up properly.

From the ResourceManager logs we can see that the container goes from ACQUIRED straight to RELEASED, which is weird. It never reaches the RUNNING state.

At this point I am out of ideas on how to make dynamic allocation work, so I am turning to you in the hope that you have some insight into the matter.

There are no issues if I do not use dynamic allocation, and Spark jobs work just fine, but I really want to make dynamic allocation work.

Thank you for the assistance, and apologies for the long message; I just wanted to supply all the details I could.

Here are the settings I have in Ambari related to it:

YARN:

[Screenshot of the YARN settings in Ambari; image not included]

Checking the directories, I can find the necessary jar on all NodeManager hosts in the right location:
/usr/odp/1.2.2.0-138/spark2/yarn/spark-2.4.8.1.2.2.0-138-yarn-shuffle.jar
/usr/odp/current/spark2-client/yarn/spark-2.4.8.1.2.2.0-138-yarn-shuffle.jar (I believe this is a symbolic link to the jar above)
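For anyone who wants to repeat this check on their own hosts, here is a minimal sketch in Python. It only looks for files matching the jar naming pattern shown above and prints where each one resolves to, so a symlink becomes visible. The function name and the command-line usage are hypothetical, and it says nothing about whether the NodeManager actually loads the jar.

```python
import glob
import os
import sys


def find_shuffle_jars(directory):
    """Return (path, resolved_path) pairs for YARN shuffle jars in `directory`.

    The file name pattern is taken from the jar names in the post; a
    different distribution may name the jar differently.
    """
    pattern = os.path.join(directory, "spark-*-yarn-shuffle.jar")
    return [(p, os.path.realpath(p)) for p in sorted(glob.glob(pattern))]


if __name__ == "__main__":
    # Directories to inspect are passed on the command line.
    for directory in sys.argv[1:]:
        jars = find_shuffle_jars(directory)
        if not jars:
            print(f"{directory}: no yarn-shuffle jar found")
        for path, target in jars:
            # Show the resolved target only when it differs (i.e. a symlink is involved).
            suffix = "" if path == target else f" -> {target}"
            print(f"{path}{suffix}")
```

Run it on each NodeManager host with the directories to inspect as arguments, for example `python find_shuffle_jars.py /usr/odp/1.2.2.0-138/spark2/yarn` (the script file name is just an example).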

Spark2:

[Screenshot of the Spark2 settings in Ambari; image not included]
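Since the screenshots may not be readable for everyone, here is a rough sketch of the settings that dynamic allocation on YARN usually depends on, written as a small Python check. The property names and the shuffle service class come from the upstream Spark documentation; the function name and the `aux_service` default are assumptions, and a distribution may register the service under a different name, so adjust it to whatever your yarn-site.xml uses.

```python
SHUFFLE_CLASS = "org.apache.spark.network.yarn.YarnShuffleService"


def check_dynamic_allocation(spark_conf, yarn_conf, aux_service="spark_shuffle"):
    """Return a list of problems; an empty list means the basics look right.

    `spark_conf` and `yarn_conf` are plain dicts of property name -> value.
    `aux_service` is an assumption: upstream Spark documents "spark_shuffle",
    but your distribution may use another name.
    """
    problems = []

    # Both flags must be true for dynamic allocation with an external shuffle service.
    for key in ("spark.dynamicAllocation.enabled", "spark.shuffle.service.enabled"):
        if str(spark_conf.get(key, "false")).lower() != "true":
            problems.append(f"{key} is not set to true")

    # The shuffle service must be listed as a NodeManager auxiliary service.
    raw = yarn_conf.get("yarn.nodemanager.aux-services", "")
    services = [s.strip() for s in raw.split(",") if s.strip()]
    if aux_service not in services:
        problems.append(f"{aux_service} missing from yarn.nodemanager.aux-services")

    # The auxiliary service must point at Spark's YARN shuffle service class.
    class_key = f"yarn.nodemanager.aux-services.{aux_service}.class"
    if yarn_conf.get(class_key) != SHUFFLE_CLASS:
        problems.append(f"{class_key} is not {SHUFFLE_CLASS}")

    return problems
```

Passing the values from the Spark2 and YARN configuration pages as two dicts returns the list of settings that do not match. It only checks the configuration on paper, not whether the NodeManagers were restarted after the change or whether the service is actually listening.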

In the Spark log I can see this message repeating continuously:

24/10/13 16:38:16 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:38:31 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:38:46 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:39:01 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:39:16 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:39:31 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
