Saving and loading a trained model #7

Open
emremrah opened this issue Jul 12, 2023 · 0 comments
I ran this project using run.sh and trained a classification model using Spark ML. After training, I wanted to save the model.

I tried `model.write().overwrite().save('spark-model')`. This creates a `spark-model` directory, but it contains only the "_SUCCESS" files; no actual model files were saved.
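For context on why this can happen: in cluster mode, `save()` is executed by the executors, so a bare relative path like `'spark-model'` resolves against each worker's local filesystem rather than shared storage. A quick sanity check before saving is to look at the path's URI scheme. A minimal sketch (the helper and the scheme list are my own illustration, not part of Spark's API):

```python
from urllib.parse import urlparse

# Schemes that point at storage every Spark node can reach.
# (Illustrative list; adjust for your cluster.)
SHARED_SCHEMES = {"hdfs", "s3a", "s3", "gs", "wasbs", "abfss"}

def is_shared_path(path: str) -> bool:
    """Return True if `path` looks like shared storage rather than a
    node-local filesystem path (hypothetical helper, for illustration)."""
    scheme = urlparse(path).scheme
    return scheme in SHARED_SCHEMES

# A bare path like 'spark-model' resolves to each executor's local
# filesystem, which is why the part files end up scattered on the workers:
print(is_shared_path("spark-model"))                     # False
print(is_shared_path("hdfs://namenode:9000/models/m1"))  # True
```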

Then I checked the workers' file systems and found the model files under /home/jovyan/work on each worker:
[screenshot: model files under /home/jovyan/work on a worker]

When I collected the files into one place and tried to load the model using `PipelineModel.load`, I got this error:

```
----> 3 pipeline_model = PipelineModel.load('spark-model')

File /usr/local/spark/python/pyspark/ml/util.py:332, in MLReadable.load(cls, path)
    329 @classmethod
    330 def load(cls, path):
    331     """Reads an ML instance from the input path, a shortcut of `read().load(path)`."""
--> 332     return cls.read().load(path)

File /usr/local/spark/python/pyspark/ml/pipeline.py:256, in PipelineModelReader.load(self, path)
    255 def load(self, path):
--> 256     metadata = DefaultParamsReader.loadMetadata(path, self.sc)
    257     if 'language' not in metadata['paramMap'] or metadata['paramMap']['language'] != 'Python':
    258         return JavaMLReader(self.cls).load(path)

File /usr/local/spark/python/pyspark/ml/util.py:525, in DefaultParamsReader.loadMetadata(path, sc, expectedClassName)
    514 """
    515 Load metadata saved using :py:meth:`DefaultParamsWriter.saveMetadata`
    516 
   (...)
    522     If non empty, this is checked against the loaded metadata.
    523 """
    524 metadataPath = os.path.join(path, "metadata")
--> 525 metadataStr = sc.textFile(metadataPath, 1).first()
    526 loadedVals = DefaultParamsReader._parseMetaData(metadataStr, expectedClassName)
    527 return loadedVals

File /usr/local/spark/python/pyspark/rdd.py:1591, in RDD.first(self)
   1589 if rs:
   1590     return rs[0]
-> 1591 raise ValueError("RDD is empty")

ValueError: RDD is empty
```

How can I save and load the models without issues? Thanks.
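To be concrete, by "collecting the files into one place" I mean merging each worker's copy of the `spark-model` directory into a single tree while preserving the relative layout (`metadata/`, `stages/`, and so on). Roughly like this (the worker directory names are illustrative):

```python
import os
import shutil

def merge_model_dirs(worker_copies, target):
    """Merge several partial copies of a Spark ML model directory into one
    tree, preserving relative paths (metadata/part-*, stages/..., etc.)."""
    for root in worker_copies:
        for dirpath, _dirnames, filenames in os.walk(root):
            rel = os.path.relpath(dirpath, root)
            dest_dir = os.path.join(target, rel)
            os.makedirs(dest_dir, exist_ok=True)
            for name in filenames:
                shutil.copy2(os.path.join(dirpath, name),
                             os.path.join(dest_dir, name))

# Illustrative usage: per-worker copies pulled from /home/jovyan/work
merge_model_dirs(
    ["worker1/spark-model", "worker2/spark-model"],
    "merged/spark-model",
)
```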
