OSError: cannot identify image file <_io.BufferedReader>

I am porting neural network training code. I wrote this code as part of a Udacity project, and it runs fine in the Udacity environment.
I am now porting the code to an Nvidia Jetson Nano running Ubuntu 18.04 and Python 3.6.8.
While iterating over the training data, a "._" prefix somehow appears in front of the file name in the path, causing an error.
When I run the file, I get the following error message:
Traceback (most recent call last):
  File "train_rev6.py", line 427, in <module>
    main()
  File "train_rev6.py", line 419, in main
    train_model(in_args)
  File "train_rev6.py", line 221, in train_model
    for inputs, labels in trainloader:
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 560, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 560, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/usr/local/lib/python3.6/dist-packages/torchvision/datasets/folder.py", line 132, in __getitem__
    sample = self.loader(path)
  File "/usr/local/lib/python3.6/dist-packages/torchvision/datasets/folder.py", line 178, in default_loader
    return pil_loader(path)
  File "/usr/local/lib/python3.6/dist-packages/torchvision/datasets/folder.py", line 160, in pil_loader
    img = Image.open(f)
  File "/usr/local/lib/python3.6/dist-packages/PIL/Image.py", line 2705, in open
    % (filename if filename else fp))
OSError: cannot identify image file <_io.BufferedReader name='/home/mme/Documents/001_UdacityFinalProjectFlowersRev2/flowers/train/40/._image_04589.jpg'>

I suspect the error is caused by the "._" in front of the file name, since it is not part of the actual file name. When I search for the file with:
sudo find / -name image_00824.jpg

I get the correct path:

/home/mme/Documents/001_UdacityFinalProjectFlowersRev2/flowers/train/81/image_00824.jpg

i.e. without the "._" in front of the file name.

My problem seems to be the same as the one in OSError: cannot identify image file.

(Adapting and running `from PIL import Image; Image.open(open("path/to/file", 'rb'))` as suggested in the answer there produces no error message.)
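To find every file under the training directory that PIL cannot open, a quick scan like the following can help (a minimal sketch; the directory path is taken from the error message above, adjust as needed):

```python
import os
from PIL import Image

def find_unreadable(data_dir):
    """Walk data_dir and return paths that PIL cannot open as images."""
    bad = []
    for root, _, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                with Image.open(path) as img:
                    img.verify()  # cheap header/integrity check, no full decode
            except OSError:
                bad.append(path)
    return bad

for path in find_unreadable("/home/mme/Documents/001_UdacityFinalProjectFlowersRev2/flowers/train"):
    print("Unreadable:", path)
```

Any "._*" entries reported here are the files the DataLoader is tripping over.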

The file path is given on the command line:

python3 train_rev6.py --file_path "/home/mme/Documents/001_UdacityFinalProjectFlowersRev2/flowers" --arch "vgg16" --epochs 5 --gpu "gpu" --running_loss True --valid_loss True --valid_accuracy True --test True
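For reference, the `in_args` object used below could come from an argparse setup along these lines (a hypothetical reconstruction from the command line above; the actual definitions in train_rev6.py may differ). Note that argparse's `type=bool` does not parse the strings "True"/"False" as one might expect, since `bool("False")` is `True`, so string-valued boolean flags need explicit conversion:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical reconstruction of the CLI shown above; real code may differ.
    def str2bool(s):
        return s.lower() in ("true", "1", "yes")

    parser = argparse.ArgumentParser()
    parser.add_argument("--file_path", type=str, required=True)
    parser.add_argument("--arch", type=str, default="vgg16")
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--gpu", type=str, default="gpu")
    parser.add_argument("--running_loss", type=str2bool, default=False)
    parser.add_argument("--valid_loss", type=str2bool, default=False)
    parser.add_argument("--valid_accuracy", type=str2bool, default=False)
    parser.add_argument("--test", type=str2bool, default=False)
    return parser.parse_args(argv)
```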

The code below shows the two relevant functions.
Is there a way to get rid of this "._"?
def load_data(in_args):
    """
    Function to:
        - Specify directories for training, validation and test set.
        - Define the transforms for the training, validation and testing sets.
        - Load the datasets with ImageFolder.
        - Using the image datasets and the transforms, define the dataloaders.
        - Label mapping.
    """
    # Specify directories for training, validation and test set.
    data_dir = in_args.file_path
    train_dir = data_dir + "/train"
    valid_dir = data_dir + "/valid"
    test_dir = data_dir + "/test"

    # Define your transforms for the training, validation, and testing sets
    # Means: [0.485, 0.456, 0.406]. Standard deviations [0.229, 0.224, 0.225]. Calculated by ImageNet images.
    # Transformation on training set: random rotation, random resized crop to 224 x 224 pixels, random horizontal and vertical flip, transform to a tensor and normalize data.
    train_transforms = transforms.Compose([transforms.RandomRotation(23),
                                           transforms.RandomResizedCrop(224),
                                           transforms.RandomHorizontalFlip(),
                                           transforms.RandomVerticalFlip(),
                                           transforms.ToTensor(),
                                           transforms.Normalize([0.485, 0.456, 0.406],
                                                                [0.229, 0.224, 0.225])])

    # Transformation on validation set: resize and center crop to 224 x 224 pixels, transform to a tensor and normalize data.
    valid_transforms = transforms.Compose([transforms.Resize(255),
                                           transforms.CenterCrop(224),
                                           transforms.ToTensor(),
                                           transforms.Normalize([0.485, 0.456, 0.406],
                                                                [0.229, 0.224, 0.225])])

    # Transformation on test set: resize and center crop to 224 x 224 pixels, transform to a tensor and normalize data.
    test_transforms = transforms.Compose([transforms.Resize(255),
                                          transforms.CenterCrop(224),
                                          transforms.ToTensor(),
                                          transforms.Normalize([0.485, 0.456, 0.406],
                                                               [0.229, 0.224, 0.225])])

    # Load the datasets with ImageFolder
    global train_dataset
    global valid_dataset
    global test_dataset
    train_dataset = datasets.ImageFolder(data_dir + "/train", transform=train_transforms)
    valid_dataset = datasets.ImageFolder(data_dir + "/valid", transform=valid_transforms)
    test_dataset = datasets.ImageFolder(data_dir + "/test", transform=test_transforms)

    # Using the image datasets and the transforms, define the dataloaders, as global variables.
    global trainloader
    global validloader
    global testloader
    trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
    validloader = torch.utils.data.DataLoader(valid_dataset, batch_size=64)
    testloader = torch.utils.data.DataLoader(test_dataset, batch_size=64)

    # Label mapping.
    global cat_to_name
    with open("cat_to_name.json", "r") as f:
        cat_to_name = json.load(f)

    print("Done loading data...")

    return

def train_model(in_args):
    """
    Function to build and train model.
    """
    # Number of epochs.
    global epochs
    epochs = in_args.epochs
    # Set running_loss to 0
    running_loss = 0

    # Prepare lists to print losses and accuracies.
    global list_running_loss
    global list_valid_loss
    global list_valid_accuracy
    list_running_loss, list_valid_loss, list_valid_accuracy = [], [], []

    # If in testing mode, set loop counter to prematurely return to main().
    if in_args.test == True:
        loop_counter = 0

    # for loop to train model.
    for epoch in range(epochs):
        # for loop to iterate through training dataloader.
        for inputs, labels in trainloader:
            # If in testing mode, increase loop counter to prematurely return to main() after 5 loops.
            if in_args.test == True:
                loop_counter +=1
                if loop_counter == 5:
                    return

            # Move input and label tensors to the default device.
            inputs, labels = inputs.to(device), labels.to(device)

            # Set gradients to 0 to avoid accumulation
            optimizer.zero_grad()

            # Forward pass, back propagation, gradient descent and updating weights and bias.
            # Forward pass through model to get log of probabilities.
            log_ps = model.forward(inputs)
            # Calculate loss of model output based on model prediction and labels.
            loss = criterion(log_ps, labels)
            # Back propagation of loss through model / gradient descent.
            loss.backward()
            # Update weights / gradient descent.
            optimizer.step()

            # Accumulate loss for training image set for print out in terminal
            running_loss += loss.item()

            # Calculate loss for verification image set and accuracy for print out in terminal.
            # Validation pass and print out the validation accuracy.
            # Set loss of validation set and accuracy to 0.
            valid_loss = 0
            # test_loss = 0
            valid_accuracy = 0
            # test_accuracy = 0

            # Set model to evaluation mode to turn off dropout so all images in the validation & test set are passed through the model.
            model.eval()

            # Turn off gradients for validation, saves memory and computations.
            with torch.no_grad():
                # for loop to evaluate loss of validation image set and its accuracy.
                for valid_inputs, valid_labels in validloader:
                    # Move input and label tensors to the default device.
                    valid_inputs, valid_labels = valid_inputs.to(device), valid_labels.to(device)

                    # Run validation image set through model.
                    valid_log_ps = model.forward(valid_inputs)

                    # Calculate loss for validation image set.
                    valid_batch_loss = criterion(valid_log_ps, valid_labels)

                    # Accumulate loss for validation image set.
                    valid_loss += valid_batch_loss.item()

                    # Calculate probabilities
                    valid_ps = torch.exp(valid_log_ps)

                    # Get the most likely class using the ps.topk method.
                    valid_top_k, valid_top_class = valid_ps.topk(1, dim=1)

                    # Check if the predicted classes match the labels.
                    valid_equals = valid_top_class == valid_labels.view(*valid_top_class.shape)

                    # Calculate the percentage of correct predictions.
                    valid_accuracy += torch.mean(valid_equals.type(torch.FloatTensor)).item()

            # Print out losses and accuracies
            # Create string for running_loss.
            str1 = ["Train loss: {:.3f} ".format(running_loss) if in_args.running_loss == True else ""]
            str1 = "".join(str1)
            # Create string for valid_loss.
            str2 = ["Valid loss: {:.3f} ".format(valid_loss/len(validloader)) if in_args.valid_loss == True else ""]
            str2 = "".join(str2)
            # Create string for valid_accuracy.
            str3 = ["Valid accuracy: {:.3f} ".format(valid_accuracy/len(validloader)) if in_args.valid_accuracy == True else ""]
            str3 = "".join(str3)
            # Print strings
            print(f"{epoch+1}/{epochs} " + str1 + str2 + str3)

            # Append current losses and accuracy to lists to print losses and accuracies.
            list_running_loss.append(running_loss)
            list_valid_loss.append(valid_loss/len(validloader))
            list_valid_accuracy.append(valid_accuracy/len(validloader))

            # Set running_loss to 0.
            running_loss = 0

            # Set model back to train mode.
            model.train()

    print("Done training model...")

    return

What is the content of the file (._image_04589.jpg)? I am not familiar with this library, but it looks like it simply walks over all files in the training directory and tries to load them as images, which could be a problem if non-image files are present.
The model predicts flower types. It was trained on 102 classes, i.e. with flower images of 102 different flower types (class 40 in the example above, which is why the file path in the error message contains a 40). Each of the 102 folders contains different images of that particular flower type for the model to train on. The images do exist and I can open them, so I don't think the problem is related to the file contents.
FYI, I also copied the files into a folder higher up to shorten the file path, but that did not solve the problem.
FYI, I ran the same code on OS X without any problems.
1 Answer

A colleague at work pointed out that in Linux, files whose names start with a period are hidden. So I selected "Show hidden files" in the file browser, and there they were. I deleted them and the problem was solved (see the commands below).
Find and list all files starting with "._" (print the matched files first to make sure these are the ones you want to delete):
find test -name '._*' -print

Find and delete all files starting with "._" in all subfolders:
find test -name '._*' -delete
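These "._*" files are AppleDouble metadata files that macOS creates when copying files onto a non-Apple filesystem, which fits the observation above that the same code ran fine on OS X. If running `find` from the shell is inconvenient, the same cleanup can be done from Python (a minimal stdlib sketch mirroring the two `find` commands above):

```python
import os

def find_appledouble(root):
    """Return all files under root whose names start with "._",
    the equivalent of: find root -name '._*' -print"""
    hits = []
    for dirpath, _, files in os.walk(root):
        hits.extend(os.path.join(dirpath, f) for f in files if f.startswith("._"))
    return hits

def delete_appledouble(root):
    """Delete the matched files, the equivalent of: find root -name '._*' -delete"""
    for path in find_appledouble(root):
        os.remove(path)
```

Alternatively, newer torchvision releases accept an `is_valid_file` callable on `datasets.ImageFolder`, which could be used to skip such files without deleting them.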
