Support for pathlib.Path in datasets 2.19.0 #6935

Open
lamyiowce opened this issue May 30, 2024 · 0 comments
Describe the bug

Since the update to datasets 2.19.0, Dataset.save_to_disk no longer accepts a pathlib.Path. This worked in 2.18.0 and earlier versions. Is the change intentional? Or was pathlib.Path only ever supported by accident, thanks to Python's duck typing?

Steps to reproduce the bug

from datasets import Dataset
import pathlib

path = pathlib.Path("./my_out_path")
Dataset.from_dict(
    {"text": ["hello world"], "label": [777], "split": ["train"]}
).save_to_disk(path)

This results in an error when using datasets 2.19:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/Users/jb/scratch/venv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 1515, in save_to_disk
    fs, _ = url_to_fs(dataset_path, **(storage_options or {}))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jb/scratch/venv/lib/python3.11/site-packages/fsspec/core.py", line 383, in url_to_fs
    chain = _un_chain(url, kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jb/scratch/venv/lib/python3.11/site-packages/fsspec/core.py", line 323, in _un_chain
    if "::" in path
       ^^^^^^^^^^^^
TypeError: argument of type 'PosixPath' is not iterable
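The last frame of the traceback can be reproduced without datasets or fsspec installed. The following is a minimal sketch that assumes the failing check is nothing more than the `"::" in path` membership test shown above; it uses `PurePosixPath` so it behaves the same on any platform, and the variable names are only illustrative.

```python
import pathlib

# Same relative path as in the report, as a pure (non-I/O) POSIX path.
path = pathlib.PurePosixPath("./my_out_path")

# Path objects define neither __contains__ nor __iter__, so a substring-style
# membership test against one raises TypeError instead of returning False.
try:
    "::" in path
    raised = False
except TypeError:
    raised = True

# The same test on the string form is an ordinary substring check.
contains_in_str = "::" in str(path)

print(raised, contains_in_str)
```

This suggests the failure comes from the path being treated as a string downstream, not from anything specific to the dataset contents.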

Converting to str works, however.

Dataset.from_dict(
    {"text": ["hello world"], "label": [777], "split": ["train"]}
).save_to_disk(str(path))
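Until this is resolved, the conversion can be kept in one place on the caller's side. The sketch below is only an illustration: `as_path_str` is a hypothetical helper name, not part of datasets or fsspec, and it relies solely on the standard library's `os.fspath`.

```python
import os
import pathlib


def as_path_str(path):
    """Hypothetical helper: return the filesystem representation of `path`.

    os.fspath returns str and bytes unchanged and calls __fspath__ on
    os.PathLike objects such as pathlib.Path.
    """
    return os.fspath(path)


local = as_path_str(pathlib.PurePosixPath("out/ds"))
remote = as_path_str("s3://bucket/ds")
print(local, remote)

# Assumed usage with the snippet from the report (not executed here):
# Dataset.from_dict({...}).save_to_disk(as_path_str(path))
```

Remote URLs are better kept as plain strings from the start, because pathlib collapses the double slash after the scheme when it parses them.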

Expected behavior

My dataset gets saved to disk without an error.

Environment info

aiohttp==3.9.5
aiosignal==1.3.1
attrs==23.2.0
certifi==2024.2.2
charset-normalizer==3.3.2
datasets==2.19.0
dill==0.3.8
filelock==3.14.0
frozenlist==1.4.1
fsspec==2024.3.1
huggingface-hub==0.23.2
idna==3.7
multidict==6.0.5
multiprocess==0.70.16
numpy==1.26.4
packaging==24.0
pandas==2.2.2
pyarrow==16.1.0
pyarrow-hotfix==0.6
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.1
requests==2.32.3
six==1.16.0
tqdm==4.66.4
typing_extensions==4.12.0
tzdata==2024.1
urllib3==2.2.1
xxhash==3.4.1
yarl==1.9.4
