Fastest/safest way to find and parse files?

5
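On my C: drive I have thousands of *.foobar files. They are scattered across various locations (i.e. subdirectories). Each file is roughly 1-64 KB in size and is plain text.
I have a class Foobar(string fileContents) that strongly types these .foobar files.
My challenge is to get a list of all *.foobar files on the C: drive, represented as an array of Foobar objects. What is the fastest way to do this?
I would like to know whether there is a better approach (undoubtedly there is), and whether my first attempt below has any potential problems (for example, I/O concurrency issues that throw exceptions):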
var files = Directory.EnumerateFiles
                (rootPath, "*.foobar", SearchOption.AllDirectories);

Foobar[] foobars = 
(
    from filePath in files.AsParallel()
    let contents = File.ReadAllText(filePath)
    select new Foobar(contents)
)
.ToArray();

8
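There is probably not much to be gained by doing this in parallel; searching for files on a physical disk is bound to be an I/O-bound operation. - Daniel Mann
Stupid question: does a file search really require disk I/O? I thought the OS kernel caches the disk's file system structure in memory and only updates it when needed, since that structure is separate from the content on disk. Is that not the case? - user979672
If the search is I/O bound, then the only thing .AsParallel() can do is run the new Foobar() operations on multiple threads (which may take time; after all, each one has to parse a large string). Right? I wonder whether spinning up a new thread for each new Foobar() costs more than creating the Foobar objects serially on a single thread. - user979672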
1
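What exactly are you trying to do? If you want to search file contents, consider using an indexing service such as Windows Indexing Service or dtSearch. - Steve Danner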
1
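Whatever you do, it is going to be slow. The biggest problem with your current approach is that it is very "unreliable". Put a lock on just one of those thousands of files and, after running for a minute, all you will have to show for it is an exception message. - Hans Passant
@Hans: I just ran into exactly that. Because the IO exception is thrown inside the LINQ statement, there is no way to catch it and move on to the next file. The first exception I hit was access denied (the application was not running as administrator). Please expand your comment into an answer and I will most likely accept it. Thanks! :) - user979672
2 Answers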

8

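Because a permission error (or any other error) can evidently stop the enumeration dead in its tracks, you may want to implement your own enumerator, something like this: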

using System;
using System.Collections.Generic;
using System.Linq; // needed for errors.ToArray()

class SafeFileEnumerator : IEnumerable<string>
{
  private string root;
  private string pattern;
  private IList<Exception> errors;
  public SafeFileEnumerator(string root, string pattern)
  {
     this.root = root;
     this.pattern = pattern;
     this.errors = new List<Exception>();
  }

  public SafeFileEnumerator(string root, string pattern, IList<Exception> errors)
  {
     this.root = root;
     this.pattern = pattern;
     this.errors = errors;
  }

  public Exception[] Errors()
  {
     return errors.ToArray();
  }
  class Enumerator : IEnumerator<string>
  {
     IEnumerator<string> fileEnumerator;
     IEnumerator<string> directoryEnumerator;
     string root;
     string pattern;
     IList<Exception> errors;

     public Enumerator(string root, string pattern, IList<Exception> errors)
     {
        this.root = root;
        this.pattern = pattern;
        this.errors = errors;
        fileEnumerator = System.IO.Directory.EnumerateFiles(root, pattern).GetEnumerator();
        directoryEnumerator = System.IO.Directory.EnumerateDirectories(root).GetEnumerator();
     }
     public string Current
     {
        get
        {
           if (fileEnumerator == null) throw new ObjectDisposedException("FileEnumerator");
           return fileEnumerator.Current;
        }
     }

     public void Dispose()
     {
        if (fileEnumerator != null)
           fileEnumerator.Dispose();
        fileEnumerator = null;
        if (directoryEnumerator != null)
           directoryEnumerator.Dispose();
        directoryEnumerator = null;
     }

     object System.Collections.IEnumerator.Current
     {
        get { return Current; }
     }

     public bool MoveNext()
     {
        if ((fileEnumerator != null) && (fileEnumerator.MoveNext()))
           return true;
        while ((directoryEnumerator != null) && (directoryEnumerator.MoveNext()))
        {
           if (fileEnumerator != null)
              fileEnumerator.Dispose();
           try
           {
              fileEnumerator = new SafeFileEnumerator(directoryEnumerator.Current, pattern, errors).GetEnumerator();
           }
           catch (Exception ex)
           {
              errors.Add(ex);
              continue;
           }
           if (fileEnumerator.MoveNext())
              return true;
        }
        if (fileEnumerator != null)
           fileEnumerator.Dispose();
        fileEnumerator = null;
        if (directoryEnumerator != null)
           directoryEnumerator.Dispose();
        directoryEnumerator = null;
        return false;
     }

     public void Reset()
     {
        Dispose();
        fileEnumerator = System.IO.Directory.EnumerateFiles(root, pattern).GetEnumerator();
        directoryEnumerator = System.IO.Directory.EnumerateDirectories(root).GetEnumerator();
     }
  }
  public IEnumerator<string> GetEnumerator()
  {
     return new Enumerator(root, pattern, errors);
  }

  System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
  {
     return GetEnumerator();
  }
}
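The enumerator above only protects the directory walk; reading each file can still fail, for example when a file is locked. The following is a minimal sketch of the same idea written as an iterator method with an explicit stack instead of a hand-written IEnumerator, together with a per-file read that records failures rather than aborting. The names SafeFiles, Enumerate and ReadAll are hypothetical and are not part of the answer above; it assumes a compiler that supports exception filters (C# 6 or later).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class SafeFiles
{
    // Hypothetical helper: walks the tree under root and yields files matching
    // pattern, recording (not throwing) errors for directories it cannot list.
    public static IEnumerable<string> Enumerate(string root, string pattern, IList<Exception> errors)
    {
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            string[] files;
            string[] subDirs;
            try
            {
                // yield return is not allowed inside a try block that has a
                // catch clause, so the listing is materialized here first.
                files = Directory.GetFiles(dir, pattern);
                subDirs = Directory.GetDirectories(dir);
            }
            catch (Exception ex) when (ex is UnauthorizedAccessException || ex is IOException)
            {
                errors.Add(ex);
                continue; // skip this directory and carry on with the rest
            }

            foreach (string sub in subDirs) pending.Push(sub);
            foreach (string file in files) yield return file;
        }
    }

    // Hypothetical helper: reads every file, skipping the ones that fail.
    public static List<string> ReadAll(IEnumerable<string> paths, IList<Exception> errors)
    {
        var contents = new List<string>();
        foreach (string path in paths)
        {
            try
            {
                contents.Add(File.ReadAllText(path));
            }
            catch (Exception ex) when (ex is UnauthorizedAccessException || ex is IOException)
            {
                errors.Add(ex); // e.g. locked file or access denied
            }
        }
        return contents;
    }
}
```

With these helpers, the array from the question could be built along the lines of SafeFiles.ReadAll(SafeFiles.Enumerate(rootPath, "*.foobar", errors), errors).Select(c => new Foobar(c)).ToArray(), and the errors list inspected afterwards to see what was skipped.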

4

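Nicely done. Here is an extended version that returns FileSystemInfo instead of string paths. There are a few small changes along the way, such as adding SearchOption (like native .NET has) and trapping errors when getting the initial directory, in case access to the root folder is denied. Thanks again for the original post!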

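using System;
using System.Collections.Generic;
using System.IO;
using System.Linq; // needed for ToArray() and AsEnumerable()

public class SafeFileEnumerator : IEnumerable<FileSystemInfo>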
{
    /// <summary>
    /// Starting directory to search from
    /// </summary>
    private DirectoryInfo root;

    /// <summary>
    /// Filter pattern
    /// </summary>
    private string pattern;

    /// <summary>
    /// Indicator if search is recursive or not
    /// </summary>
    private SearchOption searchOption;

    /// <summary>
    /// Any errors captured
    /// </summary>
    private IList<Exception> errors;

    /// <summary>
    /// Create an Enumerator that will scan the file system, skipping directories where access is denied
    /// </summary>
    /// <param name="root">Starting Directory</param>
    /// <param name="pattern">Filter pattern</param>
    /// <param name="option">Recursive or not</param>
    public SafeFileEnumerator(string root, string pattern, SearchOption option)
        : this(new DirectoryInfo(root), pattern, option)
    {}

    /// <summary>
    /// Create an Enumerator that will scan the file system, skipping directories where access is denied
    /// </summary>
    /// <param name="root">Starting Directory</param>
    /// <param name="pattern">Filter pattern</param>
    /// <param name="option">Recursive or not</param>
    public SafeFileEnumerator(DirectoryInfo root, string pattern, SearchOption option)
        : this(root, pattern, option, new List<Exception>()) 
    {}

    // Internal constructor for recursive iterator
    private SafeFileEnumerator(DirectoryInfo root, string pattern, SearchOption option, IList<Exception> errors)
    {
        if (root == null || !root.Exists)
        {
            throw new ArgumentException("Root directory is not set or does not exist.", "root");
        }
        this.root = root;
        this.searchOption = option;
        this.pattern = String.IsNullOrEmpty(pattern)
            ? "*"
            : pattern;
        this.errors = errors;
    }

    /// <summary>
    /// Errors captured while parsing the file system.
    /// </summary>
    public Exception[] Errors
    {
        get
        {
            return errors.ToArray();
        }
    }

    /// <summary>
    /// Helper class to enumerate the file system.
    /// </summary>
    private class Enumerator : IEnumerator<FileSystemInfo>
    {
        // Core enumerator that we will be walking through
        private IEnumerator<FileSystemInfo> fileEnumerator;
        // Directory enumerator to capture access errors
        private IEnumerator<DirectoryInfo> directoryEnumerator;

        private DirectoryInfo root;
        private string pattern;
        private SearchOption searchOption;
        private IList<Exception> errors;

        public Enumerator(DirectoryInfo root, string pattern, SearchOption option, IList<Exception> errors)
        {
            this.root = root;
            this.pattern = pattern;
            this.errors = errors;
            this.searchOption = option;

            Reset();
        }

        /// <summary>
        /// Current item the primary iterator is pointing to
        /// </summary>
        public FileSystemInfo Current
        {
            get
            {
                //if (fileEnumerator == null) throw new ObjectDisposedException("FileEnumerator");
                return fileEnumerator.Current as FileSystemInfo;
            }
        }

        object System.Collections.IEnumerator.Current
        {
            get { return Current; }
        }

        public void Dispose()
        {
            Dispose(true, true);
        }

        private void Dispose(bool file, bool dir)
        {
            if (file)
            {
                if (fileEnumerator != null)
                    fileEnumerator.Dispose();

                fileEnumerator = null;
            }

            if (dir)
            {
                if (directoryEnumerator != null)
                    directoryEnumerator.Dispose();

                directoryEnumerator = null;
            }
        }

        public bool MoveNext()
        {
            // Enumerate the files in the current folder
            if ((fileEnumerator != null) && (fileEnumerator.MoveNext()))
                return true;

            // Don't go recursive...
            if (searchOption == SearchOption.TopDirectoryOnly) { return false; }

            while ((directoryEnumerator != null) && (directoryEnumerator.MoveNext()))
            {
                Dispose(true, false);

                try
                {
                    fileEnumerator = new SafeFileEnumerator(
                        directoryEnumerator.Current,
                        pattern,
                        SearchOption.AllDirectories,
                        errors
                        ).GetEnumerator();
                }
                catch (Exception ex)
                {
                    errors.Add(ex);
                    continue;
                }

                // Open up the current folder file enumerator
                if (fileEnumerator.MoveNext())
                    return true;
            }

            Dispose(true, true);

            return false;
        }

        public void Reset()
        {
            Dispose(true,true);

            // Safely get the enumerators, including in the case where the root is not accessible
            if (root != null)
            {
                try
                {
                    fileEnumerator = root.GetFileSystemInfos(pattern, SearchOption.TopDirectoryOnly).AsEnumerable<FileSystemInfo>().GetEnumerator();
                }
                catch (Exception ex)
                {
                    errors.Add(ex);
                    fileEnumerator = null;
                }

                try
                {
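                    // Use "*" rather than the file pattern here; otherwise only subdirectories
                    // whose names match the file pattern would ever be searched.
                    directoryEnumerator = root.GetDirectories("*", SearchOption.TopDirectoryOnly).AsEnumerable<DirectoryInfo>().GetEnumerator();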
                }
                catch (Exception ex)
                {
                    errors.Add(ex);
                    directoryEnumerator = null;
                }
            }
        }
    }
    public IEnumerator<FileSystemInfo> GetEnumerator()
    {
        return new Enumerator(root, pattern, searchOption, errors);
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
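For comparison, newer runtimes can skip inaccessible directories without a custom enumerator. The sketch below assumes .NET Core 2.1 or later, where System.IO.EnumerationOptions and the matching Directory.EnumerateFiles overload are available (they are not available in .NET Framework). FoobarFinder and Find are hypothetical names. Unlike the classes above, this approach does not collect the errors it skips.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class FoobarFinder
{
    // Hypothetical helper: lazily lists files under root that match pattern,
    // skipping subdirectories the process is not allowed to read.
    public static IEnumerable<string> Find(string root, string pattern)
    {
        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true, // descend into subdirectories
            IgnoreInaccessible = true,    // skip entries that raise access-denied errors
            AttributesToSkip = 0          // the default would skip hidden and system entries
        };
        return Directory.EnumerateFiles(root, pattern, options);
    }
}
```

Reading each file can still throw (for example on a locked file), so the File.ReadAllText call would still need its own try/catch.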
