LINQ to objects 中的 LEFT OUTER JOIN

9
考虑以下代码。
City和CityPlace由CityCode连接。
我想要做的是在CityPlace和City之间执行一个LEFT OUTER JOIN。
City[] cities = new City[]{
 new City{CityCode="0771",CityName="Raipur",CityPopulation="BIG"},
 new City{CityCode="0751",CityName="Gwalior",CityPopulation="MEDIUM"},
 new City{CityCode="0755",CityName="Bhopal",CityPopulation="BIG"},
 new City{CityCode="022",CityName="Mumbai",CityPopulation="BIG"},
};


CityPlace[] places = new CityPlace[]{
 new CityPlace{CityCode="0771",Place="Shankar Nagar"},
 new CityPlace{CityCode="0771",Place="Pandari"},
 new CityPlace{CityCode="0771",Place="Energy Park"},

 new CityPlace{CityCode="0751",Place="Baadaa"},
 new CityPlace{CityCode="0751",Place="Nai Sadak"},
 new CityPlace{CityCode="0751",Place="Jayendraganj"},
 new CityPlace{CityCode="0751",Place="Vinay Nagar"},

 new CityPlace{CityCode="0755",Place="Idgah Hills"},

 new CityPlace{CityCode="022",Place="Parel"},
 new CityPlace{CityCode="022",Place="Haaji Ali"},
 new CityPlace{CityCode="022",Place="Girgaon Beach"},

 new CityPlace{CityCode="0783",Place="Railway Station"}};

我所做的是

var res = places.GroupJoin(cities,
                           p1=>p1.CityCode,
                           c1=>c1.CityCode,
                           (p2,c2s)=>new {Place=p2.Place,
                                CityName=c2s.Count()==0 ? "NO NAME" 
                                           : c2s.First().CityName });

foreach(var v in res)
 Console.WriteLine(v);

这是标准方案还是快速粗略的解决方案?
3个回答

15

你自己的答案没问题,但不够优雅。所以,是有点脏的。有一种标准的左连接方式,可以处理你的例子并且可以处理存在重复城市的情况。你的例子无法处理重复的城市,因为当你选择c2s.First()时,任何重复项都会被忽略。

标准左连接的步骤如下:

  1. 使用GroupJoin从数据创建一个层次结构。
  2. 使用SelectMany展开层次结构。

你的GroupJoin在一步中展开了层次结构,忽略了除第一个匹配城市之外的所有内容。这就是它脏的地方。如果你尝试将这段代码反过来,把城市和地点左连接起来,你只会得到每个城市一个地点!这显然是不好的。最好学习如何正确地进行左连接,这样它就总是有效的。

如果你只想看到你具体问题的答案,请直接滚动到下面的“城市和地点”标题,但首先,这里有一个使用两个简单字符串数组的完整示例。

完整解释的抽象示例

这是一个完整的示例,使用两个字母数组而不是你的代码。我想先展示一个更简单的例子。如果您愿意,您可以将其复制粘贴到LINQPad中,并将语言设置为“C#语句”,然后运行它。我强烈推荐LINQPad作为测试各种代码的工具,不仅仅是LINQ。或者,您也可以在Visual Studio中创建一个控制台应用程序。

以下是没有太多注释的代码。下面是一个有很多注释的版本。如果您想了解每个参数的确切含义,您可能需要跳到那里。

var leftLetters = new string[]{ "A", "B", "C" };
var rightLetters = new string[]{ "A", "B" };

//Create a hierarchical collection that includes every left item paired with a collection of matching right items (which may be empty if there are no matching right items.)
var groupJoin =
    leftLetters.GroupJoin(
        rightLetters, 
        leftLetter => leftLetter, 
        rightLetter => rightLetter, 
        ( leftLetter, matchingRightLetters ) => new { leftLetter, matchingRightLetters } 
    );

//Flatten the groupJoin hierarchical collection with a SelectMany
var selectMany = 
    groupJoin.SelectMany(           
        groupJoinItem => groupJoinItem.matchingRightLetters.DefaultIfEmpty( "MISSING" ),            
        ( groupJoinItem, rightLetter ) => new {
            LeftLetter = groupJoinItem.leftLetter, 
            RightLetter = rightLetter 
        }
    );

//You can think of the elements of selectMany as "rows" as if this had been a left outer join in SQL. But this analogy breaks down rapidly if you are selecting objects instead of scalar values.
foreach( var row in selectMany )
{
    Console.WriteLine( row.LeftLetter + ", " + row.RightLetter );
}

这是输出结果,因为我们都知道左连接应该做什么,所以它应该很明显。

A, A
B, B
C, MISSING

高度注释的版本:

var leftLetters = new string[]{ "A", "B", "C" };
var rightLetters = new string[]{ "A", "B" };

//Create a hierarchical collection that includes every left item paired with a collection of matching right items (which may be empty if there are no matching right items.)
var groupJoin =
    leftLetters.GroupJoin(
        rightLetters, //inner: the right hand collection in the join
        leftLetter => leftLetter, //outerKeySelector: There is no property to use as they join key, the letter *is* the key. So this lambda simply returns the parameter itself.
        rightLetter => rightLetter, //innerKeySelector: Same with the rightLetters
        ( leftLetter, matchingRightLetters ) => new { leftLetter, matchingRightLetters } //resultSelector: given an element from the left items, and its matching collection of right items, project them to some class. In this case we are using a new anonymous type. 
    );

//Flatten the groupJoin hierarchical collection with a SelectMany
var selectMany = 
    groupJoin.SelectMany(
        //collectionSelector: given a single element from our collection of group join items from above, provide a collection of its "right" items which we want to flatten out. In this case the right items are in a property of the groupJoinItem itself, but this does not need to be the case! We use DefaultIfEmpty to turn an empty collection into a new collection that has exactly one item instead: the string "MISSING".
        groupJoinItem => groupJoinItem.matchingRightLetters.DefaultIfEmpty( "MISSING" ), 
        //resultSelector: SelectMany does the flattening for us and this lambda gets invoked once for *each right item* in a given left item's collection of right items.
        ( 
            groupJoinItem, //The first parameter is one of the original group join item, including its entire collection of right items, but we will ignore that collection in the body of this lamda and just grab the leftLetter property.
            rightLetter //The second parameter is *one* of the matching right items from the collection of right items we selected in the first lambda we passed into SelectMany.
        )  
            => new {
                LeftLetter = groupJoinItem.leftLetter, //groupJoinItem is one of the original items from the GroupJoin above. We just want the left letter from it.
                RightLetter = rightLetter //This is one of the individual right letters, so just select it as-is.
            }
    );

//You can think of the elements of selectMany as "rows" as if this had been a left outer join in SQL. But this analogy breaks down rapidly if you are selecting objects instead of scalar values.
foreach( var row in selectMany )
{
    Console.WriteLine( row.LeftLetter + ", " + row.RightLetter );
}   

再次提供参考输出:

A, A
B, B
C, MISSING
使用 LINQ 的上述方法通常称为“方法链”,您将一些集合链接在一起,以获得所需结果。大多数时候,您不使用变量来保存单个表达式。您只需执行 GroupJoin(…)。SelectMany(…) ,因此被称为“方法链”。它非常冗长和明确,并需要很长时间编写。
相反,我们可以使用称为“推导式”、“查询推导式”或“LINQ 推导式”的东西。推导式是 1970 年代的一个旧计算机科学术语,对大多数人来说并没有太多意义。相反,人们称它们为“LINQ 查询”或“LINQ 表达式”,但这些术语在技术上也适用于方法链,因为在两种情况下都构建了表达式树。 (表达式树超出本教程范围) LINQ 推导式是一种类似于 SQL 的语法,用于编写 LINQ,但它与实际的 SQL 无关。以下是使用查询推导式编写的同一代码:
var leftLetters = new string[]{ "A", "B", "C" };
var rightLetters = new string[]{ "A", "B" };

var query = 
    from leftLetter in leftLetters
    join rightLetter in rightLetters
    on leftLetter equals rightLetter into matchingRightLetters
    from rightLetter in matchingRightLetters.DefaultIfEmpty( "MISSING" )
    select new
    {
        LeftLetter = leftLetter,
        RightLetter = rightLetter
    };

foreach( var row in query )
{
    Console.WriteLine( row.LeftLetter + ", " + row.RightLetter );
}   

这将编译成与上面示例完全相同的代码,只是在SelectMany中名为"groupJoinItem"的参数将被命名为类似"temp0"的东西,因为该参数在这段代码的推导版本中不存在。

我认为你可以欣赏这个版本的代码有多简单。在进行左外连接时,我总是使用这种语法。我从不使用GroupJoin和SelectMany。然而,乍一看它几乎没有任何意义。一个join后跟一个into创建了一个GroupJoin。你首先必须知道这个,以及为什么你想要这个。然后第二个from表示一个SelectMany,这是非显而易见的。当你有两个from关键字时,你实际上创建了一个交叉连接(笛卡尔积),这就是SelectMany正在做的事情。(有点像)。

例如,这个查询:

from leftLetter in leftLetters
from rightLetter in rightLetters
select new
{
    LeftLetter = leftLetter,
    RightLetter = rightLetter
}
将产生:
A, A
A, B
B, A
B, B
C, A
C, B

这是一个基本的交叉连接。

那么,回到我们最初的左连接LINQ查询:查询的第一个from是组连接,第二个from表示每个groupJoinItem与其匹配的右侧字母集合之间的交叉连接。就像这样:

from groupJoinItem in groupJoin
from rightLetter in groupJoinItem.matchingRightLetters
select new{...}

实际上,我们可以将其写成这样!

var groupJoin = 
    from leftLetter in leftLetters
    join rightLetter in rightLetters
    on leftLetter equals rightLetter into matchingRightLetters
    select new 
    {
        LeftLetter = leftLetter,
        MatchingRightLetters = matchingRightLetters
    };


var selectMany = 
    from groupJoinItem in groupJoin 
    from rightLetter in groupJoinItem.MatchingRightLetters.DefaultIfEmpty( "MISSING" )
    select new
    {
        LeftLetter = groupJoinItem.LeftLetter,
        RightLetter = rightLetter
    };
表达的含义是:对于groupJoin中的每个项,将其与其自己的MatchingRightLetters属性进行交叉连接,并将所有结果连接在一起。这与我们上面任何一个left join代码给出了完全相同的结果。
这可能是对于这个简单问题的过度解释,但我不喜欢盲目模仿编程(谷歌一下)。你应该确切地知道你的代码在做什么以及为什么,否则你将无法处理更困难的问题。

城市和地点

所以,这是您代码的方法链版本。这是一个完整的程序,因此人们可以运行它(在LINQPad中使用“C#程序”语言类型或使用Visual Studio或C#编译器创建控制台应用程序)。
void Main()
{
    City[] cities = new City[]{
        new City{CityCode="0771",CityName="Raipur",CityPopulation="BIG"},
        new City{CityCode="0751",CityName="Gwalior",CityPopulation="MEDIUM"},
        new City{CityCode="0755",CityName="Bhopal",CityPopulation="BIG"},
        new City{CityCode="022",CityName="Mumbai",CityPopulation="BIG"},
    };  

    CityPlace[] places = new CityPlace[]{
        new CityPlace{CityCode="0771",Place="Shankar Nagar"},
        new CityPlace{CityCode="0771",Place="Pandari"},
        new CityPlace{CityCode="0771",Place="Energy Park"},

        new CityPlace{CityCode="0751",Place="Baadaa"},
        new CityPlace{CityCode="0751",Place="Nai Sadak"},
        new CityPlace{CityCode="0751",Place="Jayendraganj"},
        new CityPlace{CityCode="0751",Place="Vinay Nagar"},

        new CityPlace{CityCode="0755",Place="Idgah Hills"},

        new CityPlace{CityCode="022",Place="Parel"},
        new CityPlace{CityCode="022",Place="Haaji Ali"},
        new CityPlace{CityCode="022",Place="Girgaon Beach"},

        new CityPlace{CityCode="0783",Place="Railway Station"}
    };

    var query = 
        places.GroupJoin(
            cities,
            place => place.CityCode,
            city => city.CityCode,
            ( place, matchingCities ) 
                => new {
                    place,
                    matchingCities
                }
        ).SelectMany(
            groupJoinItem => groupJoinItem.matchingCities.DefaultIfEmpty( new City{ CityName = "NO NAME" } ),
            ( groupJoinItem, city )
                => new {
                    Place = groupJoinItem.place,
                    City = city
                }
        );              

    foreach(var pair in query)
    {
        Console.WriteLine( pair.Place.Place + ": " + pair.City.CityName );
    }
}

class City
{
    public string CityCode;
    public string CityName;
    public string CityPopulation;
}

class CityPlace
{
    public string CityCode;
    public string Place;
}

这是输出结果:

Shankar Nagar: Raipur
Pandari: Raipur
Energy Park: Raipur
Baadaa: Gwalior
Nai Sadak: Gwalior
Jayendraganj: Gwalior
Vinay Nagar: Gwalior
Idgah Hills: Bhopal
Parel: Mumbai
Haaji Ali: Mumbai
Girgaon Beach: Mumbai
Railway Station: NO NAME

注意,DefaultIfEmpty将返回实际City类的新实例,而不只是字符串。这是因为我们将CityPlaces连接到实际的City对象,而不是字符串。您可以改用DefaultIfEmpty(),不带参数,然后您将获得一个"Railway Station"的null City,但在调用pair.City.CityName之前,您必须在foreach循环中检查null值。这是个人偏好的问题。

这是使用查询表达式的相同程序:

void Main()
{
    City[] cities = new City[]{
        new City{CityCode="0771",CityName="Raipur",CityPopulation="BIG"},
        new City{CityCode="0751",CityName="Gwalior",CityPopulation="MEDIUM"},
        new City{CityCode="0755",CityName="Bhopal",CityPopulation="BIG"},
        new City{CityCode="022",CityName="Mumbai",CityPopulation="BIG"},
    };  

    CityPlace[] places = new CityPlace[]{
        new CityPlace{CityCode="0771",Place="Shankar Nagar"},
        new CityPlace{CityCode="0771",Place="Pandari"},
        new CityPlace{CityCode="0771",Place="Energy Park"},

        new CityPlace{CityCode="0751",Place="Baadaa"},
        new CityPlace{CityCode="0751",Place="Nai Sadak"},
        new CityPlace{CityCode="0751",Place="Jayendraganj"},
        new CityPlace{CityCode="0751",Place="Vinay Nagar"},

        new CityPlace{CityCode="0755",Place="Idgah Hills"},

        new CityPlace{CityCode="022",Place="Parel"},
        new CityPlace{CityCode="022",Place="Haaji Ali"},
        new CityPlace{CityCode="022",Place="Girgaon Beach"},

        new CityPlace{CityCode="0783",Place="Railway Station"}
    };

    var query = 
        from place in places
        join city in cities
        on place.CityCode equals city.CityCode into matchingCities
        from city in matchingCities.DefaultIfEmpty( new City{ CityName = "NO NAME" } )
        select new {
            Place = place,
            City = city
        };      

    foreach(var pair in query)
    {
        Console.WriteLine( pair.Place.Place + ": " + pair.City.CityName );
    }
}

class City
{
    public string CityCode;
    public string CityName;
    public string CityPopulation;
}

class CityPlace
{
    public string CityCode;
    public string Place;
}

作为长期使用SQL的用户,我非常喜欢查询理解版。一旦你知道查询中各个部分的作用,那么对于其他人来阅读代码,它更容易表达出意图

愉快的编程!


8

这是一个使用linq查询的版本

var noCity = new City {CityName = "NO NAME"};
var anotherway = from p in places
                 join c in cities on p.CityCode equals c.CityCode into merge
                 from c in merge.DefaultIfEmpty(noCity)
                 select new { p.Place, c.CityName };

我认为使用DefaultIfEmpty()可以使它更清晰明了。

总的来说,我觉得linq中的outer join相当令人困惑。这是我发现SQL查询显著优越的少数几个地方之一。


是的,我也同意!但是在我的例子中我不能使用SQL查询。 - Akshay J

5

在您的情况下,您没有对记录进行分组,因此不要使用您的解决方案。您可以使用ScottS的解决方案或使用以下查询。

var res = from p in places
                       select new
                       {
                           Place = p.Place,
                           CityName = (from c in cities where p.CityCode == c.CityCode select c.CityName).DefaultIfEmpty("NO NAME").ElementAtOrDefault(0)
                       };

GroupJoin是使用LINQ方法进行左外连接的第一步。它与GroupBy不同。左连接的第二步是SelectMany。GroupJoin创建了一个分层连接。每个“左”(外部)项都与一组匹配的“右”(内部)项连接。然后,SelectMany将层次结构展平,这会为其集合中的每个右项创建重复的左项。如果其右项的集合为空,则使用DefaultIfEmpty将左项保留在结果中。因此,这是一个左外连接。请参见我的即将发布的答案,其中包含完整的示例。 - Glazed

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接