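Note:
The following is written from the perspective of PowerShell Core (v6+) (see the next section for Windows PowerShell), leaves aside character-rendering issues (also covered in the next section), and concerns communication with external programs:
On Unix-like platforms, PowerShell Core uses UTF-8 by default (typically because modern Unix-like platforms use UTF-8-based locales).
On Windows, it is the legacy system locale, via its OEM code page, that determines the default encoding in all consoles, including both Windows PowerShell and PowerShell Core console windows, though recent versions of Windows 10 allow setting the system locale to code page 65001 (UTF-8); note that this feature is still in beta as of this writing and that using it has far-reaching consequences - see this answer.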
Making your Windows PowerShell console window Unicode (UTF-8) aware:
In Windows PowerShell, the following magic incantation achieves this (as stated, this implicitly performs chcp 65001):
$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding =
New-Object System.Text.UTF8Encoding
To make your future interactive PowerShell sessions UTF-8-aware by default, add the command above to your $PROFILE file so that these settings persist.
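To confirm that the settings took effect, you can apply them and then inspect the three encodings involved. This is a minimal sketch, not part of the original recipe; the $report variable and its property names are illustrative only, and the code assumes it runs in a session that is attached to a console.

```powershell
# Apply the settings (same command as above) ...
$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding =
  New-Object System.Text.UTF8Encoding

# ... then collect what is now in effect (property names are made up for this sketch).
$report = [pscustomobject]@{
  OutputEncodingVariable = $OutputEncoding.WebName
  ConsoleInputEncoding   = [console]::InputEncoding.WebName
  ConsoleOutputEncoding  = [console]::OutputEncoding.WebName
  ConsoleCodePage        = [console]::OutputEncoding.CodePage
}
$report
```

If everything worked, all three encodings should report utf-8 and the code page should be 65001.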
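Note: Recent Windows 10 versions now allow setting the system locale to code page 65001 (UTF-8) (the feature is still in beta as of Windows 10 version 1903), which makes all console windows default to UTF-8, including Windows PowerShell's. If you use that feature, setting [console]::InputEncoding / [console]::OutputEncoding is no longer strictly necessary, but you still have to set $OutputEncoding (which is not necessary in PowerShell Core, where $OutputEncoding already defaults to UTF-8).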
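Important:
These settings assume that any external utilities you communicate with expect UTF-8-encoded input and produce UTF-8 output.
- CLIs written in Node.js, for instance, fulfill that criterion.
- Python scripts can handle UTF-8 too, if they were written with UTF-8 support in mind.
By contrast, these settings can break (older) utilities that expect a single-byte encoding as implied by the system's legacy OEM code page.
- Up to Windows 8.1, this even included standard Windows utilities such as find.exe and findstr.exe, which has been fixed in Windows 10.
- See the bottom of this post for how to work around this problem by switching to UTF-8 temporarily, on demand, for calling a given utility.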
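These settings apply to external programs only and are unrelated to the encodings that PowerShell's cmdlets use on output:
- See this answer for the default character encodings used by PowerShell cmdlets; in short: if you want cmdlets in Windows PowerShell to default to UTF-8 (which PowerShell [Core] v6+ does anyway), add $PSDefaultParameterValues['*:Encoding'] = 'utf8' to your $PROFILE, but note that this affects all calls in your sessions to cmdlets that have an -Encoding parameter, unless that parameter is used explicitly; also note that in Windows PowerShell you will invariably get UTF-8 files with a BOM; conversely, in PowerShell [Core] v6+, which defaults to BOM-less UTF-8 (both in the absence of -Encoding and with -Encoding utf8), you would have to use 'utf8BOM' to get a BOM.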
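To see which flavor of UTF-8 your edition actually writes, you can set the default, write a small temporary file, and look for the BOM bytes (0xEF 0xBB 0xBF). This is only a sketch: the variable names are illustrative, and the cleanup at the end assumes you did not already have a '*:Encoding' entry that you want to keep.

```powershell
# Make cmdlets that have an -Encoding parameter default to UTF-8.
$PSDefaultParameterValues['*:Encoding'] = 'utf8'

# Write a euro sign to a temporary file; -Encoding utf8 is applied implicitly.
$euro = [string][char]0x20AC
$tmp  = [System.IO.Path]::GetTempFileName()
$euro | Set-Content -LiteralPath $tmp

# Inspect the raw bytes: a BOM, if present, is the sequence EF BB BF at the start.
$bytes  = [System.IO.File]::ReadAllBytes($tmp)
$hasBom = $bytes.Length -ge 3 -and
          $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
"File starts with a UTF-8 BOM: $hasBom"

# Clean up (assumption: no pre-existing '*:Encoding' default needs preserving).
Remove-Item -LiteralPath $tmp
$PSDefaultParameterValues.Remove('*:Encoding')
```

Per the description above, Windows PowerShell should report a BOM here, whereas PowerShell [Core] v6+ should not.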
Optional background information
Tip of the hat to eryksun for all his contributions.
While a TrueType font is active, the console-window buffer correctly preserves (non-ASCII) Unicode characters even if they don't render correctly; that is, even though they may appear generically as ?, so as to indicate lack of support by the current font, you can copy & paste such characters elsewhere without loss of information, as eryksun notes.
PowerShell is capable of outputting Unicode characters to the console even without having switched to code page 65001 first.
However, that by itself does not guarantee that other programs can handle such output correctly - see below.
When it comes to communicating with external programs via stdout (piping), PowerShell uses the character encoding specified in the $OutputEncoding preference variable, which defaults to ASCII(!) in Windows PowerShell, which means that any non-ASCII characters are transliterated to literal ? characters, resulting in information loss. (By contrast, commendably, PowerShell Core (v6+) now uses (BOM-less) UTF-8 as the default encoding, consistently.)
- By contrast, however, passing non-ASCII arguments (rather than stdout (piped) output) to external programs seems to require no special configuration (it is unclear to me why that works); e.g., the following Node.js command correctly returns €: 1 even with the default configuration:
node -pe "process.argv[1] + ': ' + process.argv[1].length" €
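The information loss caused by an ASCII $OutputEncoding can be reproduced without any external program by round-tripping a string through the two encodings directly. This is a minimal sketch with illustrative variable names; it shows what the encoding does to the string, not PowerShell's actual piping code.

```powershell
$euro  = [string][char]0x20AC            # the euro sign, a non-ASCII character
$ascii = [System.Text.Encoding]::ASCII   # Windows PowerShell's default $OutputEncoding
$utf8  = New-Object System.Text.UTF8Encoding

# Encode to bytes (what would be sent down the pipe), then decode again.
$asciiRoundTrip = $ascii.GetString($ascii.GetBytes($euro))   # becomes a literal '?'
$utf8RoundTrip  = $utf8.GetString($utf8.GetBytes($euro))     # survives intact

"ASCII: $asciiRoundTrip"
"UTF-8: $utf8RoundTrip"
```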
[Console]::OutputEncoding:
- controls what character encoding is assumed when the console translates program output into console display characters.
- also tells PowerShell what encoding to assume when capturing output from an external program.
The upshot is that if you need to capture output from a UTF-8-producing program, you need to set [Console]::OutputEncoding to UTF-8 as well; setting $OutputEncoding only covers the input (to the external program) aspect.
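What goes wrong when [Console]::OutputEncoding is left at the OEM code page can be simulated by decoding UTF-8 bytes with the wrong encoding. The sketch below assumes code page 437 (the US-English OEM code page) purely for illustration, and it assumes that your .NET runtime makes that code page available; your system's OEM code page may differ.

```powershell
$euro      = [string][char]0x20AC
$utf8Bytes = (New-Object System.Text.UTF8Encoding).GetBytes($euro)   # E2 82 AC

# Decode the UTF-8 bytes as if they were OEM code page 437 text:
# each of the three bytes is misread as a separate character.
$oem        = [System.Text.Encoding]::GetEncoding(437)
$misdecoded = $oem.GetString($utf8Bytes)

"Misdecoded: $misdecoded (length $($misdecoded.Length))"   # Γé¼, 3 characters instead of 1
```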
[Console]::InputEncoding sets the encoding for keyboard input into a console[2] and also determines how PowerShell's CLI interprets data it receives via stdin (standard input).
If switching the console to UTF-8 for the entire session is not an option, you can do so temporarily, for a given call:
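# Save the current settings, then temporarily switch to UTF-8.
$oldOutputEncoding = $OutputEncoding; $oldConsoleEncoding = [Console]::OutputEncoding
$OutputEncoding = [Console]::OutputEncoding = New-Object System.Text.Utf8Encoding
# Call a UTF-8 program (Node.js as an example), sending input to it and capturing its output.
$captured = '€' | node -pe "require('fs').readFileSync(0).toString().trim()"
# Echo the captured string and its length in characters.
$captured; $captured.Length
# Restore the previous settings.
$OutputEncoding = $oldOutputEncoding; [Console]::OutputEncoding = $oldConsoleEncoding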
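If you need this in more than one place, the save/switch/restore pattern can be wrapped in a helper that restores the previous settings even when the call fails. The function name Invoke-WithUtf8Console is hypothetical, and the sketch assumes that the global $OutputEncoding is the one in effect and that the session is attached to a console.

```powershell
# Hypothetical helper: run a script block with UTF-8 in effect, then restore.
function Invoke-WithUtf8Console {
  param(
    [Parameter(Mandatory)]
    [scriptblock] $ScriptBlock
  )
  # Save the current settings.
  $oldOutputEncoding  = $global:OutputEncoding
  $oldConsoleEncoding = [Console]::OutputEncoding
  try {
    # Switch both the pipe encoding and the console output encoding to UTF-8.
    $global:OutputEncoding = [Console]::OutputEncoding = New-Object System.Text.UTF8Encoding
    & $ScriptBlock
  }
  finally {
    # Restore the previous settings, even if the script block threw.
    $global:OutputEncoding     = $oldOutputEncoding
    [Console]::OutputEncoding  = $oldConsoleEncoding
  }
}
```

With that in place, the Node.js call from above could be written as Invoke-WithUtf8Console { '€' | node -pe "require('fs').readFileSync(0).toString().trim()" }.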
Problems on older versions of Windows (pre-W10):
- An active chcp value of 65001 breaking the console output of some external programs and even batch files in general in older versions of Windows may ultimately have stemmed from a bug in the WriteFile() Windows API function (as also used by the standard C library), which mistakenly reported the number of characters rather than bytes with code page 65001 in effect, as discussed in this blog post.
The resulting symptoms, according to a comment by bobince on this answer from 2008, are: "My understanding is that calls that return a number-of-bytes (such as fread/fwrite/etc) actually return a number-of-characters. This causes a wide variety of symptoms, such as incomplete input-reading, hangs in fflush, the broken batch files and so on."
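Good alternatives to the native Windows console (terminal), conhost.exe
eryksun suggests two alternatives to the native Windows console windows (conhost.exe), which provide better and faster Unicode character rendering because they use the modern, GPU-accelerated DirectWrite/DirectX API instead of the "old GDI implementation [that] cannot handle complex scripts, non-BMP characters, or automatic fallback fonts".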
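[1] Note that running chcp 65001 from inside a PowerShell session is not effective, because .NET caches the console's output encoding on startup and is unaware of later changes made with chcp (only changes made directly via [console]::OutputEncoding are picked up).
[2] I am unclear on how that manifests in practice; do tell us if you know.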