Dali Loader using a NamedTuple datatype instead of an array #5539

Open
1 task done
rachelglenn opened this issue Jun 24, 2024 · 5 comments

@rachelglenn

Describe the question.

I am following the external input example for the DALI loader. The data type passed to my model is a NamedTuple. When I try to create the dataloader:
dataloader = DALIGenericIterator(pipeline, ["image"])
I get an error associated with my NamedTuple type:
TypeError: Illegal pipeline output type. The output 0 contains a nested DataNode. Missing list/tuple expansion (*) is the likely cause.

Can the DALI loader accept a NamedTuple type? If so, I am not sure what to pass as the second argument when creating the iterator (DALIGenericIterator).

Thanks for the help.

Check for duplicates

  • I have searched the open bugs/issues and have found no duplicates for this bug report

rachelglenn added the "question" label (Further information is requested) on Jun 24, 2024

@JanuszL
Contributor

JanuszL commented Jun 24, 2024

Hi @rachelglenn,

Thank you for reaching out. I'm afraid you may have hit a DALI limitation. However, before we rule out other issues, please share a simple code snippet that we can run on our end, one that illustrates your approach and reproduces the problem.

@rachelglenn
Author

rachelglenn commented Jun 25, 2024

Here is what I could put together as an example. I hope I didn't make any small typos.


import cupy as cp
import imageio


class model_data(NamedTuple):
    image: torch.Tensor
    lable: torch.Tensor
    filename: str


class ExternalInputGpuIterator(object):
    def __init__(self, batch_size):
        self.images_dir = "../../data/images/"
        self.batch_size = batch_size
        with open(self.images_dir + "file_list.txt", "r") as f:
            self.files = [line.rstrip() for line in f if line != ""]
        shuffle(self.files)

    def __iter__(self):
        self.i = 0
        self.n = len(self.files)
        return self

    def __next__(self):
        batch = []
        labels = []
        filenames = []
        for _ in range(self.batch_size):
            jpeg_filename, label = self.files[self.i].split(" ")
            im = imageio.imread(self.images_dir + jpeg_filename)
            im = cp.asarray(im)
            im = im * 0.6
    
            self.i = (self.i + 1) % self.n
   
           model_data(im.astype(cp.uint8), cp.array([label], dtype=np.uint8), self.files[self.i].split(" "))
           batch.append(model_data)
        return batch

eii_gpu = ExternalInputGpuIterator(batch_size)
pipe_gpu = Pipeline(batch_size=batch_size, num_threads=2, device_id=0)
with pipe_gpu:
    model_data = fn.external_source(source=eii_gpu, device="gpu",  )
    model_data.image= fn.brightness_contrast(model_data.image, contrast=2)
    pipe_gpu.set_outputs(model_data)
train_loader = DALIGenericIterator(pipeline, ["model_data"])

@JanuszL
Contributor

JanuszL commented Jun 25, 2024

Hi @rachelglenn,

Thank you for providing the code snippet. However, I get multiple errors running it. Can you please check it on your end?

@rachelglenn
Author

rachelglenn commented Jun 26, 2024

Yes, I am not surprised. I am not able to get it to work, which is why I am asking for help with using a NamedTuple as the data type for the pipeline. Can you provide an example using:

class model_data(NamedTuple):
    image: torch.Tensor
    lable: torch.Tensor
    filename: str
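
One possible direction, offered only as a minimal sketch and not as a confirmed DALI feature: keep the pipeline outputs flat (for example `pipe.set_outputs(image, label)`), build the iterator with one name per output (`DALIGenericIterator(pipe, ["image", "label"])`), and repack each batch into a NamedTuple on the Python side. The sketch assumes the iterator yields a list with one dict per pipeline, keyed by those names. `ModelData`, `as_named_tuples`, and `fake_loader` are hypothetical names used for illustration, and `fake_loader` stands in for the real iterator so the snippet runs without DALI or a GPU. The `filename` field is left out because passing strings through the pipeline is a separate question this sketch does not address.

```python
from typing import Any, Dict, Iterable, Iterator, List, NamedTuple


class ModelData(NamedTuple):
    image: Any  # would be a torch.Tensor when fed by the real iterator
    label: Any  # would be a torch.Tensor when fed by the real iterator


def as_named_tuples(loader: Iterable[List[Dict[str, Any]]]) -> Iterator[ModelData]:
    """Repack every dict yielded by the loader into a ModelData tuple."""
    for per_pipeline in loader:
        # Assumption: one dict per pipeline, keyed by the iterator's output names.
        for outputs in per_pipeline:
            yield ModelData(image=outputs["image"], label=outputs["label"])


# Hypothetical stand-in for DALIGenericIterator(pipe, ["image", "label"]).
fake_loader = [[{"image": [[1, 2], [3, 4]], "label": [0, 1]}]]

batches = list(as_named_tuples(fake_loader))
print(batches[0].label)
```

With this layout the model still receives a NamedTuple, while the pipeline itself only ever sees flat outputs.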

@JanuszL
Contributor

JanuszL commented Jun 26, 2024

@rachelglenn,

I get errors not related to the issue you raised, for example:

class model_data(NamedTuple):
NameError: name 'NamedTuple' is not defined

After adding:

from collections import namedtuple
import torch
I get

class model_data(namedtuple):
TypeError: function() argument 'code' must be code, not str

and I'm not sure I'm running the same code as you anymore. Please update the provided snippet so that it reproduces the error you mentioned.
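
For reference, the second error is consistent with subclassing `collections.namedtuple`, which is a factory function and not a base class. The class syntax used in the snippet needs `typing.NamedTuple` instead. A minimal sketch of both spellings follows; the field types are plain Python placeholders (not `torch.Tensor`) so that it runs without extra dependencies, and `ModelData` / `ModelDataAlt` are hypothetical names.

```python
from collections import namedtuple
from typing import NamedTuple


# Class syntax requires typing.NamedTuple as the base class.
class ModelData(NamedTuple):
    image: list  # placeholder for torch.Tensor
    label: int   # placeholder for torch.Tensor
    filename: str


# collections.namedtuple is a factory function: call it, do not subclass it.
ModelDataAlt = namedtuple("ModelDataAlt", ["image", "label", "filename"])

sample = ModelData(image=[0, 1, 2], label=3, filename="a.jpg")
alt = ModelDataAlt([0, 1, 2], 3, "a.jpg")
print(sample.filename, alt.label)
```

Both forms produce ordinary tuples with named fields, so either can be used once the import matches the syntax.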
