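What is the correct or most robust way to tell from Python whether an imported module comes from a C extension rather than a pure Python module? This is useful, for example, if a Python package has a module with both a pure Python implementation and a C implementation, and you want to be able to tell at runtime which one is being used.
One idea is to examine the file extension of module.__file__, but I'm not sure which file extensions should be checked, or whether this approach is the most reliable.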
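The inspect.getsource() and inspect.getsourcefile() functions ambiguously return None for both C extensions (which understandably have no pure-Python source) and other types of modules (e.g., bytecode-only modules). Useless.
The importlib machinery only applies to modules loadable by PEP 302-compliant loaders and hence visible to the default importlib import algorithm. Useful, but hardly generally applicable. The assumption of PEP 302 compliance breaks down when the real world hits your package in the face repeatedly. For example, did you know that the __import__() built-in is actually overridable? This is how we used to customize Python's import mechanism, back when the Earth was still flat.
To the claim that "...there is no perfect answer": there is a perfect answer. Much like the Triforce of Hyrulean legend, a perfect answer exists for every imperfect question.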
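To make the ambiguity of the inspect helpers concrete, here is a minimal sketch that records what each helper does for a given module. The `probe` function is a hypothetical name introduced only for this illustration, and the exact results depend on the interpreter version and on how the standard library was installed.

```python
import inspect
import json
import sys


def probe(module):
    """Record what each inspect helper does for ``module`` (illustrative helper)."""
    results = {}
    for helper in (inspect.getfile, inspect.getsourcefile, inspect.getsource):
        try:
            value = helper(module)
        except (TypeError, OSError) as exc:
            # Built-in modules raise TypeError; unavailable source raises OSError.
            results[helper.__name__] = type(exc).__name__
        else:
            # getsource() returns the whole file, so store a placeholder instead.
            if helper is inspect.getsource:
                value = "<source text>"
            results[helper.__name__] = value
    return results


for mod in (sys, json):
    print(mod.__name__, probe(mod))
```

Running this against the modules you actually care about is a quick way to see which helper, if any, separates the cases you need to separate.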
import inspect, os
from importlib.machinery import ExtensionFileLoader, EXTENSION_SUFFIXES
from types import ModuleType

def is_c_extension(module: ModuleType) -> bool:
    '''
    `True` only if the passed module is a C extension implemented as a
    dynamically linked shared library specific to the current platform.

    Parameters
    ----------
    module : ModuleType
        Previously imported module object to be tested.

    Returns
    ----------
    bool
        `True` only if this module is a C extension.
    '''
    assert isinstance(module, ModuleType), '"{}" not a module.'.format(module)

    # If this module was loaded by a PEP 302-compliant CPython-specific loader
    # loading only C extensions, this module is a C extension.
    if isinstance(getattr(module, '__loader__', None), ExtensionFileLoader):
        return True

    # Else, fallback to filetype matching heuristics.
    #
    # Absolute path of the file defining this module.
    module_filename = inspect.getfile(module)

    # "."-prefixed filetype of this path if any or the empty string otherwise.
    module_filetype = os.path.splitext(module_filename)[1]

    # This module is only a C extension if this path's filetype is that of a
    # C extension specific to the current platform.
    return module_filetype in EXTENSION_SUFFIXES
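Let's unit test this function with four portably importable modules:
- The os.__init__ module. Hopefully not a C extension.
- The importlib.machinery submodule. Hopefully not a C extension.
- The _elementtree C extension.
- The numpy.core.multiarray C extension.
To wit: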
>>> import os
>>> import importlib.machinery as im
>>> import _elementtree as et
>>> import numpy.core.multiarray as ma
>>> for module in (os, im, et, ma):
...     print('Is "{}" a C extension? {}'.format(
...         module.__name__, is_c_extension(module)))
Is "os" a C extension? False
Is "importlib.machinery" a C extension? False
Is "_elementtree" a C extension? True
Is "numpy.core.multiarray" a C extension? True
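All's well that ends well.
The details of our code are quite inconsequential. Very well, where do we begin?
- If the passed module was loaded by a PEP 302-compliant loader, it has a __loader__ attribute whose value is the loader object that loaded this module. Hence, if this value is an instance of the importlib.machinery.ExtensionFileLoader class, this module is a C extension.
- Otherwise, [...] the default __import__() machinery has been overridden (e.g., by a low-level bootloader running this Python application as a platform-specific frozen binary). In either case, fall back to testing whether this module's filetype is that of a C extension specific to the current platform.
[...] __file__. (This is documented in a few places, e.g., the Types and members table in the inspect docs.) Note that if you are using something like py2app or cx_freeze, what counts as "built-in" may differ from a standalone installation.
[...] modules in packages installed [...] (common with easy_install, less so with pip) will have a blank or meaningless __file__.
In 3.1+, the import process has been massively cleaned up, mostly rewritten in Python, and mostly exposed to the Python layer.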
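So, you can use the importlib module to see the chain of loaders used to load a module, and ultimately you will get to BuiltinImporter (built-ins), ExtensionFileLoader (.so/.pyd/etc.), SourceFileLoader (.py), or SourcelessFileLoader (.pyc/.pyo).
You can also see the suffixes assigned to the file-based loaders, on the current target platform, as constants in importlib.machinery. So, you could check any(pathname.endswith(suffix) for suffix in importlib.machinery.EXTENSION_SUFFIXES), but that won't actually help in, e.g., the egg/zip case unless you have already traveled up the chain anyway.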
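As a rough illustration of the two checks just described, the sketch below names the kind of loader behind a module and tests a pathname against the platform's extension suffixes. Both `classify_module` and `has_extension_suffix` are hypothetical helper names, and the sketch assumes a CPython 3 interpreter where modules carry a `__spec__`.

```python
import json
import sys
from importlib import machinery


def classify_module(module):
    """Name the kind of loader that produced ``module`` (illustrative helper)."""
    spec = getattr(module, "__spec__", None)
    loader = getattr(spec, "loader", None) or getattr(module, "__loader__", None)
    # BuiltinImporter and FrozenImporter are typically stored as classes,
    # the file-based loaders as instances, so accept both forms.
    kinds = (
        (machinery.BuiltinImporter, "built-in"),
        (machinery.FrozenImporter, "frozen"),
        (machinery.ExtensionFileLoader, "extension"),
        (machinery.SourceFileLoader, "source"),
        (machinery.SourcelessFileLoader, "sourceless"),
    )
    for loader_type, label in kinds:
        if loader is loader_type or isinstance(loader, loader_type):
            return label
    return "unknown"


def has_extension_suffix(pathname):
    """True if ``pathname`` ends with a C-extension suffix for this platform."""
    return any(pathname.endswith(suffix) for suffix in machinery.EXTENSION_SUFFIXES)


print(machinery.EXTENSION_SUFFIXES)
print(classify_module(sys), classify_module(json))
```

Modules loaded from eggs or zip files, or through a custom importer, fall into the "unknown" branch here, which is exactly the limitation the paragraph above points out.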
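The best heuristics for this are the ones implemented in the inspect module, so the best thing to do is to use that. The best choice will be one or more of getsource, getsourcefile, and getfile; which is best depends on which heuristics you want.
A built-in module will raise a TypeError for any of the three.
An extension module ought to return an empty string for getsourcefile. This seems to work in all the 2.5-3.4 versions I have, but I don't have 2.4 around. For getsource, at least in some versions, it returns the actual bytes of the .so file, even though it should be returning an empty string or raising an IOError. (In 3.x, you will almost certainly get a UnicodeError or SyntaxError, but you probably don't want to rely on that...)
Pure Python modules may return an empty string for getsourcefile if they are inside an egg/zip/etc. They should always return a non-empty string for getsource if source is available, even inside an egg/zip/etc., but if they are sourceless bytecode (.pyc/etc.) they will return an empty string or raise an IOError.
The best bet is to experiment on the platform(s) you care about, in the distribution/setup(s) you care about.
[...] getsourcefile and/or getsourcelines? - abarnert
[...] UnicodeError or SyntaxError. However, in at least 2.7 and 3.4, getsource sometimes works for zipped source files where getsourcefile does not. So they each have advantages and disadvantages, and there is no perfect answer. - abarnert
[...] in sys.path. Strange, but it works. The problem comes with platform-specific libraries. Pure libraries can be copied over from whatever OS you are developing on, but OS-specific libraries need to be built on another Amazon Linux instance and copied into the vendor folder. That is a hassle, so you only want to do it for non-pure-Python libraries. - Chris Johnson
@Cecil Curry's function is excellent. Two minor comments: first, the _elementtree example raises a TypeError on my copy of Python 3.5.6. Second, as @crld points out, it is also helpful to know whether a module contains C extensions, but a more portable version might help. A more generic version (using Python 3.6+ f-string syntax) may therefore be: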
from importlib.machinery import ExtensionFileLoader, EXTENSION_SUFFIXES
import inspect
import logging
import os
import os.path
import pkgutil
from types import ModuleType
from typing import List

log = logging.getLogger(__name__)


def is_builtin_module(module: ModuleType) -> bool:
    """
    Is this module a built-in module, like ``sys``?
    Method is as per :func:`inspect.getfile`.
    """
    # Built-in modules are compiled into the interpreter and have no __file__.
    return not hasattr(module, "__file__")


def is_module_a_package(module: ModuleType) -> bool:
    assert inspect.ismodule(module)
    return os.path.basename(inspect.getfile(module)) == "__init__.py"
def is_c_extension(module: ModuleType) -> bool:
    """
    Modified from
    https://dev59.com/52Ij5IYBdhLWcg3wYUCQ

    ``True`` only if the passed module is a C extension implemented as a
    dynamically linked shared library specific to the current platform.

    Args:
        module: Previously imported module object to be tested.

    Returns:
        bool: ``True`` only if this module is a C extension.

    Examples:

    .. code-block:: python

        from cardinal_pythonlib.modules import is_c_extension

        import os
        import _elementtree as et
        import numpy
        import numpy.core.multiarray as numpy_multiarray

        is_c_extension(os)  # False
        is_c_extension(numpy)  # False
        is_c_extension(et)  # False on my system (Python 3.5.6). True in the original example.
        is_c_extension(numpy_multiarray)  # True

    """  # noqa
    assert inspect.ismodule(module), f'"{module}" not a module.'

    # If this module was loaded by a PEP 302-compliant CPython-specific loader
    # loading only C extensions, this module is a C extension.
    if isinstance(getattr(module, '__loader__', None), ExtensionFileLoader):
        return True

    # If it's built-in, it's not a C extension.
    if is_builtin_module(module):
        return False

    # Else, fallback to filetype matching heuristics.
    #
    # Absolute path of the file defining this module.
    module_filename = inspect.getfile(module)

    # "."-prefixed filetype of this path if any or the empty string otherwise.
    module_filetype = os.path.splitext(module_filename)[1]

    # This module is only a C extension if this path's filetype is that of a
    # C extension specific to the current platform.
    return module_filetype in EXTENSION_SUFFIXES
def contains_c_extension(module: ModuleType,
                         import_all_submodules: bool = True,
                         include_external_imports: bool = False,
                         seen: List[ModuleType] = None,
                         verbose: bool = False) -> bool:
    """
    Extends :func:`is_c_extension` by asking: is this module, or any of its
    submodules, a C extension?

    Args:
        module: Previously imported module object to be tested.
        import_all_submodules: explicitly import all submodules of this module?
        include_external_imports: check modules in other packages that this
            module imports?
        seen: used internally for recursion (to deal with recursive modules);
            should be ``None`` when called by users
        verbose: show working via log?

    Returns:
        bool: ``True`` only if this module or one of its submodules is a C
        extension.

    Examples:

    .. code-block:: python

        import logging

        import _elementtree as et
        import os

        import arrow
        import alembic
        import django
        import numpy
        import numpy.core.multiarray as numpy_multiarray

        log = logging.getLogger(__name__)
        logging.basicConfig(level=logging.DEBUG)  # be verbose

        contains_c_extension(os)  # False
        contains_c_extension(et)  # False

        contains_c_extension(numpy)  # True -- different from is_c_extension()
        contains_c_extension(numpy_multiarray)  # True

        contains_c_extension(arrow)  # False

        contains_c_extension(alembic)  # False
        contains_c_extension(alembic, include_external_imports=True)  # True
        # ... this example shows that Alembic imports hashlib, which can import
        #     _hashlib, which is a C extension; however, that doesn't stop us (for
        #     example) installing Alembic on a machine with no C compiler

        contains_c_extension(django)

    """  # noqa
    assert inspect.ismodule(module), f'"{module}" not a module.'

    if seen is None:  # only true for the top-level call
        seen = []  # type: List[ModuleType]
    if module in seen:  # modules can "contain" themselves
        # already inspected; avoid infinite loops
        return False
    seen.append(module)

    # Check the thing we were asked about
    is_c_ext = is_c_extension(module)
    if verbose:
        log.info(f"Is module {module!r} a C extension? {is_c_ext}")
    if is_c_ext:
        return True
    if is_builtin_module(module):
        # built-in, therefore we stop searching it
        return False

    # Now check any children, in a couple of ways

    top_level_module = seen[0]
    top_path = os.path.dirname(top_level_module.__file__)

    # Recurse using dir(). This picks up modules that are automatically
    # imported by our top-level model. But it won't pick up all submodules;
    # try e.g. for django.
    for candidate_name in dir(module):
        candidate = getattr(module, candidate_name)
        # noinspection PyBroadException
        try:
            if not inspect.ismodule(candidate):
                # not a module
                continue
        except Exception:
            # e.g. a Django module that won't import until we configure its
            # settings
            log.error(f"Failed to test ismodule() status of {candidate!r}")
            continue
        if is_builtin_module(candidate):
            # built-in, therefore we stop searching it
            continue

        candidate_fname = getattr(candidate, "__file__")
        if not include_external_imports:
            if os.path.commonpath([top_path, candidate_fname]) != top_path:
                if verbose:
                    log.debug(f"Skipping, not within the top-level module's "
                              f"directory: {candidate!r}")
                continue
        # Recurse:
        if contains_c_extension(
                module=candidate,
                import_all_submodules=False,  # only done at the top level, below  # noqa
                include_external_imports=include_external_imports,
                seen=seen):
            return True

    if import_all_submodules:
        if not is_module_a_package(module):
            if verbose:
                log.debug(f"Top-level module is not a package: {module!r}")
            return False

        # Otherwise, for things like Django, we need to recurse in a different
        # way to scan everything.
        # See https://dev59.com/0HA75IYBdhLWcg3wT3L8  # noqa
        log.debug(f"Walking path: {top_path!r}")
        try:
            for loader, module_name, is_pkg in pkgutil.walk_packages([top_path]):  # noqa
                if not is_pkg:
                    log.debug(f"Skipping, not a package: {module_name!r}")
                    continue
                log.debug(f"Manually importing: {module_name!r}")
                # noinspection PyBroadException
                try:
                    # NB: find_module() is a legacy finder API (removed from
                    # the standard finders in Python 3.12); on newer
                    # interpreters this call raises and is logged below.
                    candidate = loader.find_module(module_name)\
                        .load_module(module_name)  # noqa
                except Exception:
                    # e.g. Alembic "autogenerate" gives: "ValueError: attempted
                    # relative import beyond top-level package"; or Django
                    # "django.core.exceptions.ImproperlyConfigured"
                    log.error(f"Package failed to import: {module_name!r}")
                    continue
                if contains_c_extension(
                        module=candidate,
                        import_all_submodules=False,  # only done at the top level  # noqa
                        include_external_imports=include_external_imports,
                        seen=seen):
                    return True
        except Exception:
            log.error("Unable to walk packages further; no C extensions "
                      "detected so far!")
            raise

    return False
# noinspection PyUnresolvedReferences,PyTypeChecker
def test() -> None:
    import _elementtree as et

    import arrow
    import alembic
    import django
    import django.conf
    import numpy
    import numpy.core.multiarray as numpy_multiarray

    log.info(f"contains_c_extension(os): "
             f"{contains_c_extension(os)}")  # False
    log.info(f"contains_c_extension(et): "
             f"{contains_c_extension(et)}")  # False

    log.info(f"is_c_extension(numpy): "
             f"{is_c_extension(numpy)}")  # False
    log.info(f"contains_c_extension(numpy): "
             f"{contains_c_extension(numpy)}")  # True
    log.info(f"contains_c_extension(numpy_multiarray): "
             f"{contains_c_extension(numpy_multiarray)}")  # True  # noqa

    log.info(f"contains_c_extension(arrow): "
             f"{contains_c_extension(arrow)}")  # False

    log.info(f"contains_c_extension(alembic): "
             f"{contains_c_extension(alembic)}")  # False
    log.info(f"contains_c_extension(alembic, include_external_imports=True): "
             f"{contains_c_extension(alembic, include_external_imports=True)}")  # True  # noqa
    # ... this example shows that Alembic imports hashlib, which can import
    #     _hashlib, which is a C extension; however, that doesn't stop us (for
    #     example) installing Alembic on a machine with no C compiler

    django.conf.settings.configure()
    log.info(f"contains_c_extension(django): "
             f"{contains_c_extension(django)}")  # False


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)  # be verbose
    test()
import inspect
import os

def is_c(module):
    try:
        # Packages expose __path__: look for a shared object anywhere below it.
        for path, subdirs, files in os.walk(module.__path__[0]):
            for f in files:
                if f.split('.')[-1] == 'so':
                    return True
        # No shared object found anywhere in the package tree.
        return False
    except AttributeError:
        # Plain modules (e.g. os) have no __path__, so test their own file.
        path = inspect.getfile(module)
        return path.split('.')[-1] == 'so'

# os, im, et and ma are the modules imported in the earlier session;
# numpy must be imported as well before running this line.
is_c(os), is_c(im), is_c(et), is_c_extension(ma), is_c(numpy)
# (False, False, True, True, True)
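The function above only looks for the so suffix, which misses extensions on platforms that use other suffixes. A hedged variant of the same directory walk can take its suffixes from importlib.machinery.EXTENSION_SUFFIXES instead. `dir_contains_extension` is a hypothetical name used only for this sketch, and it is shown on a temporary directory rather than a real package.

```python
import os
import tempfile
from importlib.machinery import EXTENSION_SUFFIXES


def dir_contains_extension(directory):
    """True if any file below ``directory`` has a C-extension suffix."""
    suffixes = tuple(EXTENSION_SUFFIXES)
    for _root, _subdirs, files in os.walk(directory):
        for name in files:
            if name.endswith(suffixes):
                return True
    return False


# Demonstrate on a throwaway directory rather than a real package.
with tempfile.TemporaryDirectory() as tmp:
    print(dir_contains_extension(tmp))  # directory is still empty here
    fake = os.path.join(tmp, "fake" + EXTENSION_SUFFIXES[0])
    open(fake, "w").close()
    print(dir_contains_extension(tmp))  # directory now holds an extension-like file
```

For a real package, one could pass module.__path__[0] as the directory, with the same caveat as above that a file suffix is only a heuristic.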
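If, like me, you saw @Cecil Curry's great answer and wondered how to do this for an entire requirements file in a super lazy way, without @Rudolf Cardinal's complex child-library traversal, look no further.
First, dump all installed requirements (assuming you are in a virtual environment with nothing else in it) to a file with pip freeze > requirements.txt.
Then run the following script to check each of those requirements.
Note: this is super lazy and will not work for the many libraries whose import names do not match their pip names.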
import inspect, os
import importlib
from importlib.machinery import ExtensionFileLoader, EXTENSION_SUFFIXES
from types import ModuleType

# function from Cecil Curry's answer:

def is_c_extension(module: ModuleType) -> bool:
    '''
    `True` only if the passed module is a C extension implemented as a
    dynamically linked shared library specific to the current platform.

    Parameters
    ----------
    module : ModuleType
        Previously imported module object to be tested.

    Returns
    ----------
    bool
        `True` only if this module is a C extension.
    '''
    assert isinstance(module, ModuleType), '"{}" not a module.'.format(module)

    # If this module was loaded by a PEP 302-compliant CPython-specific loader
    # loading only C extensions, this module is a C extension.
    if isinstance(getattr(module, '__loader__', None), ExtensionFileLoader):
        return True

    # Else, fallback to filetype matching heuristics.
    #
    # Absolute path of the file defining this module.
    module_filename = inspect.getfile(module)

    # "."-prefixed filetype of this path if any or the empty string otherwise.
    module_filetype = os.path.splitext(module_filename)[1]

    # This module is only a C extension if this path's filetype is that of a
    # C extension specific to the current platform.
    return module_filetype in EXTENSION_SUFFIXES
with open('requirements.txt') as f:
    lines = f.readlines()
    for line in lines:
        # super lazy pip name to library name conversion
        # there is probably a better way to do this.
        lib = line.strip().split("=")[0].replace("python-","").replace("-","_").lower()
        if not lib:
            continue  # skip blank lines
        try:
            mod = importlib.import_module(lib)
            print(f"is {lib} a c extension? : {is_c_extension(mod)}")
        except Exception:
            # import failures and built-in modules (no file) both end up here
            print(f"could not check {lib}, perhaps the name for imports is different?")
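[...] dir / check the documentation for more information. - Marcin
[...] numpy is a pure-Python module, and pickle is pure Python regardless of whether _Pickle and friends come from the C accelerator or from pure Python. - abarnert
[...] the repr of cPickle, which has a pathname rather than the string "built-in". And the only official heuristic for distinguishing built-in modules is the absence of __file__, which is not true for cPickle either. - abarnert
[...] the __file__ attribute ends in .so, but I don't know whether that is always or usually the case. - cjerdonek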