Efficiently adding items to a list<class> in parallel in C#

Question

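I have code that splits the string held in one property of each item in a list of a class made up of (string, string, string).

Right now I declare an empty Dataframe2 (string, string[], string) collection and use the Add method to add items to it.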

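using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{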
    public static string[] SPString(string text)
    {
        string[] elements;
        elements = text.Split(' ');
        return elements;
    }

    //Structures
    public class Dataframe
    {

        public string Name { get; set; }
        public string Text { get; set; }
        public string Cat { get; set; }
    }

    public class Dataframe2
    {

        public string Name { get; set; }
        public string[] Text { get; set; }
        public string Cat { get; set; }
    }



    static void Main(string[] args)
    {

        List<Dataframe> doc = new List<Dataframe>{new Dataframe { Name = "Doc1", Text = "The quick brown cat", Cat = ""},
            new Dataframe { Name = "Doc2", Text = "The big fat cat", Cat = "Two"},
            new Dataframe { Name = "Doc4", Text = "The quick brown rat", Cat = "One"},
            new Dataframe { Name = "Doc3", Text = "Its the cat in the hat", Cat = "Two"},
            new Dataframe { Name = "Doc5", Text = "Mice and rats eat seeds", Cat = "One"},
        };

        // Can this be made more efficient?
        ConcurrentBag<Dataframe2> doc2 = new ConcurrentBag<Dataframe2>();
        Parallel.ForEach(doc, entry =>
        {
            string s = entry.Text;
            string[] splitter = SPString(s);
            doc2.Add(new Dataframe2 { Name = entry.Name, Text = splitter, Cat = entry.Cat });
        } );

    }
}

Is there a more efficient way to add to the list using Parallel LINQ, where Dataframe2 inherits the properties I do not modify?

- ccsv

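I have a hard time understanding what you are trying to achieve. Also, do not use List<T> in concurrent code; it will give unexpected results. Use ConcurrentBag<T> instead. - Patrick Hofman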

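@PatrickHofman I am trying to find out whether there is a more efficient way to add things to a list<T> than doc2.Add(new Dataframe2 {Name = entry.Name, Text = splitter, Cat =entry.Cat});, such as applying a mask or mapping over the things I do not use. I am also not very familiar with ConcurrentBag, but I assume it is a thread-safe list? - ccsv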

Indeed. It is thread-safe. List<T> is not. - Patrick Hofman

@PatrickHofman OK, I have changed it to "bags". Thanks. I am just getting started with parallel processing, so I did not know how to do it. - ccsv

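@Jodrell I am not going to post the full code, for the obvious reason that space is limited, and Parallel.ForEach does have a speed advantage; see the example in the answer here: https://dev59.com/RWct5IYBdhLWcg3wNq-v - ccsv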

1 Answer

- Dmitry Bychenko · Accepted Answer

You can try PLINQ to add parallelism and still end up with a List<T>:

// Do NOT create and then fill the List<T> (which is not thread-safe) in parallel manually,
// Let PLinq do it for you
List<Dataframe2> doc2 = doc
  .AsParallel()
  .Select(entry => {
     //TODO: make Dataframe2 from given Dataframe (entry)
     ...
     return new Dataframe2 {Name = entry.Name, Text = splitter, Cat = entry.Cat};
  }) 
  .ToList();
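
The snippet above leaves the body of the projection elided. A minimal, self-contained sketch of the same PLINQ idea, assuming the Dataframe and Dataframe2 classes from the question, might look like the following. The SplitDocs helper name and the AsOrdered() call are assumptions added for illustration and are not part of the original answer; PLINQ does not guarantee that results come back in source order unless AsOrdered() is used, so the call can be dropped when order does not matter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Dataframe
{
    public string Name { get; set; }
    public string Text { get; set; }
    public string Cat { get; set; }
}

public class Dataframe2
{
    public string Name { get; set; }
    public string[] Text { get; set; }
    public string Cat { get; set; }
}

public static class Program
{
    // Hypothetical helper (not in the original post): projects each Dataframe
    // to a Dataframe2 in parallel and lets PLINQ build the List<T>, so no
    // shared collection is mutated from multiple threads.
    public static List<Dataframe2> SplitDocs(IEnumerable<Dataframe> docs)
    {
        return docs
            .AsParallel()
            .AsOrdered() // assumption: keep the source order in the result
            .Select(entry => new Dataframe2
            {
                Name = entry.Name,            // copied unchanged
                Text = entry.Text.Split(' '), // the only property that is transformed
                Cat = entry.Cat               // copied unchanged
            })
            .ToList();
    }

    public static void Main()
    {
        var doc = new List<Dataframe>
        {
            new Dataframe { Name = "Doc1", Text = "The quick brown cat", Cat = "" },
            new Dataframe { Name = "Doc2", Text = "The big fat cat", Cat = "Two" },
        };

        List<Dataframe2> doc2 = SplitDocs(doc);

        foreach (Dataframe2 d in doc2)
        {
            Console.WriteLine($"{d.Name}: {d.Text.Length} words");
        }
    }
}
```

For a handful of short strings like the sample data, the cost of scheduling parallel work will likely outweigh the cost of Split itself, so it is worth measuring against a plain sequential Select before committing to either version.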