Why is my glob.glob loop not iterating through all the text files in the folder?

Question



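I'm trying to use Python 3 to read the contents of a folder of text documents, which is a modified version of the LingSpam email spam dataset. I expected my code to return the names of all 1893 text documents, but it only returns the first 420 file names. I don't understand why it isn't returning all of them. Any ideas?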

import glob
import os

if not os.path.exists('train'):  # download data only if the folder is missing
  from urllib.request import urlretrieve
  import tarfile
  urlretrieve('http://cs.iit.edu/~culotta/cs429/lingspam.tgz', 'lingspam.tgz')
  tar = tarfile.open('lingspam.tgz')
  tar.extractall()
  tar.close()
abc = []
for f in glob.glob("train/*.txt"):  # pattern is relative to the current working directory
  print(f)
  abc.append(f)
print(len(abc))

I have already tried changing the glob arguments, but still no luck.

Edit: apparently my code works for other people. Here is my output:

- Codarus
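One way to narrow this down is to compare what is physically in the folder with what the pattern matches. The sketch below is a hypothetical diagnostic helper (`summarize_folder` is not part of the original code); it assumes the data lives in a `train` folder resolved against the current working directory, as in the question. Note that `glob` skips names beginning with a dot while `os.listdir` does not, so the two counts can legitimately differ.

```python
import glob
import os
from collections import Counter


def summarize_folder(folder='train'):
    """Compare what is on disk in `folder` with what a '*.txt' glob matches.

    Hypothetical diagnostic helper; a relative `folder` is resolved
    against the current working directory, just as glob resolves it.
    """
    abs_folder = os.path.abspath(folder)
    entries = os.listdir(abs_folder)  # includes names that start with '.'
    # Count entries by extension; names without one are grouped as '<none>'.
    by_ext = Counter(os.path.splitext(name)[1] or '<none>' for name in entries)
    # glob skips dot-files by default, so this can be lower than the '.txt' count.
    matches = glob.glob(os.path.join(abs_folder, '*.txt'))
    return {
        'cwd': os.getcwd(),
        'folder': abs_folder,
        'total_entries': len(entries),
        'by_extension': dict(by_ext),
        'txt_matches': len(matches),
    }
```

Running `print(summarize_folder('train'))` from the same directory as the original script shows which folder is actually being read and how many files it really contains, which separates "glob is missing files" from "the files are not there".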


Your code runs fine for me: https://asciinema.org/a/39x9vuca48gd7fieugpkicbbt - larsks

Have you tried using an absolute path? - Chris Ghyzel
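The absolute-path suggestion matters because a relative pattern like `"train/*.txt"` is resolved against the current working directory, not against the script's location. A minimal sketch of building the pattern from an explicit base directory; `txt_files_under` is a hypothetical helper name, and the `train` folder name comes from the question.

```python
import glob
import os


def txt_files_under(base_dir, folder='train'):
    """Return sorted absolute paths of the *.txt files in base_dir/folder.

    Anchoring the pattern to an explicit base directory removes any
    dependence on where the interpreter happens to be started from.
    """
    target = os.path.abspath(os.path.join(base_dir, folder))
    return sorted(glob.glob(os.path.join(target, '*.txt')))
```

Inside a script, `txt_files_under(os.path.dirname(os.path.abspath(__file__)))` would look for `train` next to the script file regardless of the working directory.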

Hmm, that's really strange. Why would it behave differently for you than for me? - Codarus


"glob.glob gloop": upvoted for the fun title. - Winter

Have you tried a backslash, glob.glob("train\*.txt")? Or use glob.glob(os.path.join("train", "*.txt")) to make it cross-platform. - user707650
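For the cross-platform point, a sketch comparing the `os.path.join` pattern with the `pathlib` equivalent (standard library since Python 3.4), which handles separators itself. The helper names are hypothetical. Forward slashes also work on Windows, whereas a backslash is not a path separator on Linux or macOS, so `"train\*.txt"` would not look inside `train` there. One difference to keep in mind: per the `glob` module documentation, `Path.glob` does not skip dot-files the way `glob.glob` does, so the two can disagree on folders containing hidden files.

```python
import glob
import os
from pathlib import Path


def txt_files_glob(folder='train'):
    """Pattern built with os.path.join, as suggested in the comment."""
    return sorted(glob.glob(os.path.join(folder, '*.txt')))


def txt_files_pathlib(folder='train'):
    """Equivalent lookup with pathlib; Path inserts the right separator."""
    return sorted(str(p) for p in Path(folder).glob('*.txt'))
```

Both results are sorted because neither `glob.glob` nor `Path.glob` guarantees any particular order.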


1 Answer


- Codarus · Answer 1

Success! The problem was this line:

if not os.path.exists('train'):  # download data

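To check my output, I had already downloaded the files to my computer. Because this line only checks whether the folder exists, and it did, the download was skipped and glob only saw whatever was in my existing local copy. I deleted those files from my machine and now it works correctly, although I suspect running...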

  from urllib.request import urlretrieve
  import tarfile
  urlretrieve('http://cs.iit.edu/~culotta/cs429/lingspam.tgz', 'lingspam.tgz')
  tar = tarfile.open('lingspam.tgz')
  tar.extractall()
  tar.close()

without the if statement would give the same result.
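The fix above was to delete the stale local copy so the download branch runs again. A sketch of a guard that would catch this automatically: instead of only checking that `train` exists, it also counts the `*.txt` files and re-downloads when there are fewer than expected. `ensure_dataset`, `count_train_files` and the threshold of 1893 (the count quoted in the question, not verified against the archive) are assumptions for illustration. Like the original code, it assumes the archive extracts into a `train` folder in the current working directory.

```python
import glob
import os
import tarfile
from urllib.request import urlretrieve

DATA_URL = 'http://cs.iit.edu/~culotta/cs429/lingspam.tgz'
# Assumption: 1893 is the file count quoted in the question.
EXPECTED_TRAIN_FILES = 1893


def count_train_files(train_dir='train'):
    """Number of *.txt files currently in train_dir (relative to the cwd)."""
    return len(glob.glob(os.path.join(train_dir, '*.txt')))


def ensure_dataset(train_dir='train', archive='lingspam.tgz',
                   url=DATA_URL, expected=EXPECTED_TRAIN_FILES):
    """Download and extract unless train_dir already holds `expected` files.

    Hypothetical helper: unlike a bare os.path.exists check, it also
    re-fetches when the folder exists but is incomplete.
    Returns True if it downloaded, False if the local copy was kept.
    """
    if os.path.isdir(train_dir) and count_train_files(train_dir) >= expected:
        return False  # complete copy already on disk
    urlretrieve(url, archive)
    with tarfile.open(archive) as tar:
        # Use the safer 'data' extraction filter when this Python provides it.
        if hasattr(tarfile, 'data_filter'):
            tar.extractall(filter='data')
        else:
            tar.extractall()
    return True
```

Calling `ensure_dataset()` in place of the `if not os.path.exists('train'):` block keeps the "skip if already downloaded" behaviour while no longer trusting a partial copy.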