[ISSUE] clusters.create_and_wait not accepting dict-input from configuration file. #690

Open
tseader opened this issue Jun 28, 2024 · 1 comment

Comments


tseader commented Jun 28, 2024

Description
I am attempting to create a compute cluster using the Python SDK while sourcing a cluster-create configuration JSON file, which is how it is done with the Databricks CLI (e.g. databricks clusters create --json @my/path/to/cluster-create.json) and what Databricks provides through the GUI. Reading in the JSON as a dict fails because the SDK assumes the arguments are of specific dataclass types, e.g.:

>       if autoscale is not None: body['autoscale'] = autoscale.as_dict()
E       AttributeError: 'dict' object has no attribute 'as_dict'
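The failure mode can be reproduced without the SDK at all. The sketch below is a stand-in (AutoScale and create here are hypothetical mimics, not the real databricks.sdk.service.compute classes): the SDK's serialization step calls .as_dict() on each nested argument, which a plain dict loaded from JSON does not have.

```python
from dataclasses import dataclass


@dataclass
class AutoScale:
    """Stand-in mimicking an SDK dataclass such as compute.AutoScale."""
    min_workers: int
    max_workers: int

    def as_dict(self) -> dict:
        return {"min_workers": self.min_workers, "max_workers": self.max_workers}


def create(autoscale=None) -> dict:
    """Stand-in mimicking the SDK's body-serialization step."""
    body = {}
    if autoscale is not None:
        body["autoscale"] = autoscale.as_dict()
    return body


create(autoscale=AutoScale(min_workers=1, max_workers=1))  # works
try:
    create(autoscale={"min_workers": 1, "max_workers": 1})  # plain dict from JSON
except AttributeError as e:
    print(e)  # 'dict' object has no attribute 'as_dict'
```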

Reproduction

Trimmed down cluster-create example JSON config:

{
    "cluster_name": "databricks-poc",
    "spark_version": "15.3.x-scala2.12",
    "spark_conf": {},
    "gcp_attributes": {
        "use_preemptible_executors": false,
        "availability": "ON_DEMAND_GCP",
        "zone_id": "auto"
    },
    "node_type_id": "e2-highmem-2",
    "spark_env_vars": {},
    "autotermination_minutes": 60,
    "enable_elastic_disk": true,
    "data_security_mode": "USER_ISOLATION",
    "runtime_engine": "STANDARD",
    "autoscale": {
        "min_workers": 1,
        "max_workers": 1
    }
}
import json

from databricks.sdk import WorkspaceClient

db_client = WorkspaceClient()
with open("my/path/to/cluster-create.json") as file:
    create_config = json.load(file)
db_client.clusters.create_and_wait(**create_config)

Expected behavior
I expect that, when passed a dict of the cluster configuration, the SDK would handle the casting to its dataclasses. Maybe not in this method, but perhaps in another method created to do something similar.

Is it a regression?
No

Debug Logs
N/A, I don't think they are relevant here.

Additional context
I can work around this by implementing my own casting to the appropriate data classes, but I'm hoping either that I'm just missing an existing pattern or that this pattern would be useful to more people than just me.


tseader commented Jun 28, 2024

Here's what I came up with to get around the situation:

from datetime import timedelta

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import ClusterDetails, CreateCluster

CLUSTER_UP_TIMEOUT = timedelta(minutes=20)

def create_compute_cluster(db_client: WorkspaceClient, cluster_conf: dict) -> ClusterDetails:
    # Cast the raw dict into the SDK dataclass, then expand its fields back
    # into keyword arguments that create_and_wait accepts.
    cc = CreateCluster.from_dict(cluster_conf)
    refactored_input = {field: getattr(cc, field) for field in cc.__dataclass_fields__}
    return db_client.clusters.create_and_wait(**refactored_input, timeout=CLUSTER_UP_TIMEOUT)

I could also see the function reading the json file more like this:

import json

def create_compute_cluster(db_client: WorkspaceClient, create_config_path: str) -> ClusterDetails:
    with open(create_config_path) as file:
        create_config = json.load(file)
    cc = CreateCluster.from_dict(create_config)
    refactored_input = {field: getattr(cc, field) for field in cc.__dataclass_fields__}
    return db_client.clusters.create_and_wait(**refactored_input, timeout=CLUSTER_UP_TIMEOUT)

What may make sense is adding some functions to the ClustersAPI class, unless overloading via multipledispatch is preferred. All of this assumes there's a need beyond my own for this type of pattern. 🤷
