Equivalent of Class Loaders in .NET

Question

.net · compiler-construction · programming-languages · clr · language-features

Does anyone know whether it is possible to define the equivalent of a "Java custom class loader" in .NET?

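To give a little background: I am developing a new programming language that targets the CLR, called "Liberty". One feature of the language is its ability to define "type constructors", which are methods that the compiler executes at compile time and that generate types as output. They are a generalization of generics (the language also has normal generics) and allow code like the following to be written (in "Liberty" syntax):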

var t as tuple<i as int, j as int, k as int>;
t.i = 2;
t.j = 4;
t.k = 5;

where "tuple" is defined like so:

public type tuple(params variables as VariableDeclaration[]) as TypeDeclaration
{
   //...
}

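In this particular example, the type constructor tuple provides something similar to anonymous types in VB and C#. However, unlike anonymous types, "tuples" have names and can be used inside public method signatures. This means I need a way for the type that the compiler eventually emits to be shareable across multiple assemblies. For example, I want tuple<x as int> defined in assembly A to end up being the same type as tuple<x as int> defined in assembly B. The problem, of course, is that assembly A and assembly B will be compiled at different times, which means they would both end up emitting their own incompatible versions of the tuple type. I looked into using some sort of "type erasure" to solve this, so that I would have a shared library with a bunch of types like this (this is "Liberty" syntax):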

class tuple<T>
{
    public Field1 as T;
}

class tuple<T, R>
{
    public Field1 as T;
    public Field2 as R;
}

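and then just redirect access to the i, j, and k tuple fields to Field1, Field2, and Field3. However, that is not really a viable option. It would mean that at compile time tuple<x as int> and tuple<y as int> would be different types, while at runtime they would be treated as the same type. That would cause many problems for things like equality and type identity. It is too leaky an abstraction for my tastes.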
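The identity problem with erasure can be sketched in a few lines of C#. This is only an illustration: `ErasedTuple<T>` and `ErasureDemo` are hypothetical names standing in for the shared-library types described above, not part of any real library.

```csharp
using System;

// Hypothetical erased representation shared by every one-element tuple.
public class ErasedTuple<T>
{
    public T Field1;
}

public static class ErasureDemo
{
    public static bool SameRuntimeType()
    {
        // What a compiler using erasure would emit for tuple<x as int> ...
        Type tupleX = typeof(ErasedTuple<int>);
        // ... and for tuple<y as int>. The field names x and y are gone.
        Type tupleY = typeof(ErasedTuple<int>);

        // The two source-level types collapse into one runtime type.
        return tupleX == tupleY;
    }
}
```

Because both tuples map to the same closed generic type, runtime checks such as casts, `is` tests, and reflection cannot tell them apart, even though the compiler treats them as distinct.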

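Another possible option would be to use "state bag objects". However, using a state bag would defeat the whole purpose of supporting "type constructors" in the language. The idea is to enable "custom language extensions" that generate new types at compile time, which the compiler can then use for static type checking.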

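In Java, this could be done using custom class loaders. Basically, the code that uses tuple types could be emitted without the type ever being defined on disk. A custom "class loader" could then be defined that dynamically generates the tuple type at runtime. That would allow static type checking inside the compiler and would unify the tuple types across compilation boundaries.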

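Unfortunately, the CLR does not support custom class loading. All loading in the CLR is done at the assembly level. It would be possible to define a separate assembly for each "constructed type", but that would very quickly lead to performance problems (having many assemblies that each contain only one type would use too many resources).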

So, what I want to know is:

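Is it possible to simulate something like Java class loaders in .NET, where I can emit a reference to a non-existent type and then dynamically generate that type at runtime, before the code that needs to use it runs?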

NOTE:

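*I actually already know the answer to the question, and I provide it as an answer below. However, it took me about 3 days of research, and quite a bit of IL hacking, to come up with a solution. I figured it would be a good idea to document it here in case anyone else runs into the same problem.*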

- Scott Wisniewski

2 Answers

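I think this is the type of thing the DLR is supposed to provide in C# 4.0. Information is still hard to come by, but perhaps we will learn more at PDC08. Eagerly waiting to see your C# 3 solution though... I'm guessing it uses anonymous types.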

- Kevin Dostalek

- Scott Wisniewski · Accepted Answer

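The answer is yes, but the solution is a little tricky. The System.Reflection.Emit namespace defines types that allow assemblies to be generated dynamically. They also allow the generated assemblies to be defined incrementally. In other words, it is possible to add types to a dynamic assembly, execute the generated code, and then later add more types to the assembly.

The System.AppDomain class also defines an AssemblyResolve event that fires whenever the framework fails to load an assembly. By adding a handler for that event, it is possible to define a single "runtime" assembly into which all "constructed" types are placed. The code generated by the compiler that uses a constructed type would refer to a type in the runtime assembly. Because the runtime assembly does not actually exist on disk, the AssemblyResolve event fires the first time the compiled code tries to access a constructed type. The handler for the event then generates the dynamic assembly and returns it to the CLR.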
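The incremental definition described above can be sketched as follows. This is a minimal illustration, not the code from the solution itself: the names `IncrementalEmitDemo`, `DemoDynamic`, `First`, and `Second` are made up for the example, and it uses the static `AssemblyBuilder.DefineDynamicAssembly` factory on the assumption that it is available on the target runtime.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class IncrementalEmitDemo
{
    public static string[] Run()
    {
        // One run-only dynamic assembly with a single dynamic module.
        var assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("DemoDynamic"), AssemblyBuilderAccess.Run);
        var module = assembly.DefineDynamicModule("DemoDynamic");

        // Define a first type, bake it, and use it right away.
        Type first = module.DefineType("First", TypeAttributes.Public).CreateType();
        object firstInstance = Activator.CreateInstance(first);

        // Later, add a second type to the same module and use it too.
        Type second = module.DefineType("Second", TypeAttributes.Public).CreateType();
        object secondInstance = Activator.CreateInstance(second);

        return new[] { firstInstance.GetType().Name, secondInstance.GetType().Name };
    }
}
```

Neither type defines a constructor explicitly; the sketch relies on `TypeBuilder` supplying a parameterless constructor when none is defined.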

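Unfortunately, there are a few tricky points in getting this to work. The first problem is ensuring that the event handler is always installed before the compiled code runs. With a console application this is easy: the code that hooks up the event handler can be added to the Main method before the other code runs. Class libraries, however, have no Main method. A DLL may be loaded as part of an application written in another language, so it is not possible to assume there is always a Main method available for hooking up the event handler.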

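The second problem is ensuring that all referenced types are inserted into the dynamic assembly before any code that references them is used. The System.AppDomain class also defines a TypeResolve event that is raised whenever the CLR is unable to resolve a type in a dynamic assembly. It gives the event handler the opportunity to define the type inside the dynamic assembly before the code that uses it runs. However, that event does not work in this case. The CLR will not fire the event for assemblies that are "statically referenced" by other assemblies, even if the referenced assembly is defined dynamically. This means we need a way to run code before any other code in the compiled assembly runs, and have it dynamically inject the types it needs into the runtime assembly if they have not already been defined. Otherwise, when the CLR tries to load those types, it will notice that the dynamic assembly does not contain the types it needs and will throw a type load exception.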

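Fortunately, the CLR offers a solution to both problems: module initializers. A module initializer is the equivalent of a "static class constructor", except that it initializes an entire module rather than a single class. Basically, the CLR will: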

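1. Run the module constructor before any type inside the module is accessed.
2. Guarantee that only the types directly accessed by the module constructor are loaded while it is executing.
3. Not allow code outside the module to access any of its members until the constructor has finished.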

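It does this for all assemblies, including both class libraries and executables, and for EXEs it runs the module constructor before executing the Main method.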

See this blog post for more information about module constructors.

In any case, a complete solution to my problem requires several pieces:

The following class definition, defined inside a "language runtime dll", that is referenced by all assemblies produced by the compiler (this is C# code).

using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

namespace SharedLib
{
    public class Loader
    {
        private Loader(ModuleBuilder dynamicModule)
        {
            m_dynamicModule = dynamicModule;
            m_definedTypes = new HashSet<string>();
        }

        private static readonly Loader m_instance;
        private readonly ModuleBuilder m_dynamicModule;
        private readonly HashSet<string> m_definedTypes;

        static Loader()
        {
            var name = new AssemblyName("$Runtime");
            var assemblyBuilder = AppDomain.CurrentDomain.DefineDynamicAssembly(name, AssemblyBuilderAccess.Run);
            var module = assemblyBuilder.DefineDynamicModule("$Runtime");
            m_instance = new Loader(module);
            AppDomain.CurrentDomain.AssemblyResolve += new ResolveEventHandler(CurrentDomain_AssemblyResolve);
        }

        static Assembly CurrentDomain_AssemblyResolve(object sender, ResolveEventArgs args)
        {
            if (args.Name == Instance.m_dynamicModule.Assembly.FullName)
            {
                return Instance.m_dynamicModule.Assembly;
            }
            else
            {
                return null;
            }
        }

        public static Loader Instance
        {
            get
            {
                return m_instance;
            }
        }

        public bool IsDefined(string name)
        {
            return m_definedTypes.Contains(name);
        }

        public TypeBuilder DefineType(string name)
        {
            //in a real system we would not expose the type builder.
            //instead an AST for the type would be passed in, and we would just create it.
            var type = m_dynamicModule.DefineType(name, TypeAttributes.Public);
            m_definedTypes.Add(name);
            return type;
        }
    }
}

The class defines a singleton that holds a reference to the dynamic assembly in which the constructed types will be created. It also holds a hash set that records which types have already been dynamically generated, and finally defines a member that can be used to define a type. This example just returns a System.Reflection.Emit.TypeBuilder instance that can then be used to define the class being generated. In a real system, the method would probably take in an AST representation of the class and do the generation itself.

Compiled assemblies that emit the following two references (shown in ILASM syntax):
```
.assembly extern $Runtime
{
    .ver 0:0:0:0
}
.assembly extern SharedLib
{
    .ver 1:0:0:0
}
```
Here "SharedLib" is the language's predefined runtime library that includes the "Loader" class defined above, and "$Runtime" is the dynamic runtime assembly that the constructed types will be inserted into.

A "module constructor" inside every assembly compiled in the language.

As far as I know, there are no .NET languages that allow module constructors to be defined in source. The C++/CLI compiler is the only compiler I know of that generates them. In IL, they look like this, defined directly in the module and not inside any type definition:

.method privatescope specialname rtspecialname static 
        void  .cctor() cil managed
{
    //generate any constructed types dynamically here...
}

For me, it's not a problem that I have to write custom IL to get this to work. I'm writing a compiler, so code generation is not an issue.
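For readers who are not emitting IL themselves: newer C# compilers (C# 9.0 and later, which postdates the statement above) can generate a module initializer from source through System.Runtime.CompilerServices.ModuleInitializerAttribute. The sketch below only shows the shape of such a method; the `ModuleInit` class and its `Ran` flag are hypothetical names for the example, and the body is where the calls into the loader would go.

```csharp
using System;
using System.Runtime.CompilerServices;

internal static class ModuleInit
{
    // Hypothetical flag, used here only to observe that the initializer ran.
    internal static bool Ran;

    // The compiler calls this method from the module's initializer.
    // It must be static, parameterless, return void, and be accessible
    // from the containing module.
    [ModuleInitializer]
    internal static void Initialize()
    {
        // In the scheme described above, this is where constructed types
        // would be defined in the runtime assembly if not already present.
        Ran = true;
    }
}
```

This does not change the mechanism; it is simply another way to get the compiler to emit a module initializer without hand-written IL.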

In the case of an assembly that used the types tuple<i as int, j as int> and tuple<x as double, y as double, z as double> the module constructor would need to generate types like the following (here in C# syntax):

class Tuple_i_j<T, R>
{
    public T i;
    public R j;
}

class Tuple_x_y_z<T, R, S>
{
    public T x;
    public R y;
    public S z;
}

The tuple classes are generated as generic types to get around accessibility issues. That would allow code in the compiled assembly to use tuple<x as Foo>, where Foo was some non-public type.

The body of the module constructor that did this (here only showing one type, and written in C# syntax) would look like this:

var loader = SharedLib.Loader.Instance;
lock (loader)
{
    if (!loader.IsDefined("$Tuple_i_j"))
    {
        //create the type.
        var Tuple_i_j = loader.DefineType("$Tuple_i_j");
        //define the generic parameters <T,R>
        var genericParams = Tuple_i_j.DefineGenericParameters("T", "R");
        var T = genericParams[0];
        var R = genericParams[1];
        //define the field i
        var fieldI = Tuple_i_j.DefineField("i", T, FieldAttributes.Public);
        //define the field j
        var fieldJ = Tuple_i_j.DefineField("j", R, FieldAttributes.Public);
        //create the default constructor.
        var constructor = Tuple_i_j.DefineDefaultConstructor(MethodAttributes.Public);

        //"close" the type so that it can be used by executing code.
        Tuple_i_j.CreateType();
    }
}
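To show what the generated type looks like from the consuming side, here is a self-contained sketch that builds the same generic shape and then uses it through reflection. It is an illustration under assumptions, not the compiler's output: `TupleDemo` and `TupleDemoRuntime` are made-up names, it builds its own dynamic module instead of going through `SharedLib.Loader`, and real compiled code would access the fields directly in IL and not via reflection.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class TupleDemo
{
    public static object BuildAndUse()
    {
        var assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("TupleDemoRuntime"), AssemblyBuilderAccess.Run);
        var module = assembly.DefineDynamicModule("TupleDemoRuntime");

        // Emit the equivalent of: class Tuple_i_j<T, R> { public T i; public R j; }
        var builder = module.DefineType("Tuple_i_j", TypeAttributes.Public);
        var genericParams = builder.DefineGenericParameters("T", "R");
        builder.DefineField("i", genericParams[0], FieldAttributes.Public);
        builder.DefineField("j", genericParams[1], FieldAttributes.Public);
        builder.DefineDefaultConstructor(MethodAttributes.Public);
        Type open = builder.CreateType();

        // Close the generic type as Tuple_i_j<int, int> and create an instance.
        Type closed = open.MakeGenericType(typeof(int), typeof(int));
        object tuple = Activator.CreateInstance(closed);

        // Set the fields, mirroring t.i = 2; t.j = 4; from the question.
        closed.GetField("i").SetValue(tuple, 2);
        closed.GetField("j").SetValue(tuple, 4);
        return tuple;
    }
}
```

Because the field names are part of the generated type, a tuple with fields i and j and one with fields x and y remain distinct runtime types, which is the property the erasure approach could not provide.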

Anyway, this is the mechanism I was able to come up with for a rough equivalent of custom class loaders in the CLR.

Does anyone know of an easier way to do this?