How can I display Unicode character names and their hexadecimal codes with PowerShell?

3

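Because the default Windows PowerShell console font does not support emojis, I want to display their surrogate-pair hexadecimal codes and, ideally, their Unicode character names as well, for debugging.

I know how to convert an emoji to a byte array, but I haven't figured out how to convert it to surrogate-pair hex codes and a Unicode character name.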

$ThumbsUp = "👍"
$Bytes = [system.Text.Encoding]::UTF8.GetBytes($ThumbsUp)
# output
#240
#159
#145
#141

What I need is the following output:
$Hex = 0x1F44D 
$CharName = "Thumbs Up Sign"

That is, the following command should convert the hex value back to the emoji:

[char]::ConvertFromUtf32($Hex)
# output
# 👍
3 Answers

5

Partial answer - I only know how to get the UTF-32 code point:

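$ThumbsUp = "👍"
$utf32bytes = [System.Text.Encoding]::UTF32.GetBytes( $ThumbsUp )
# Pass the start index explicitly: .NET Framework (Windows PowerShell 5.1) has no single-argument overload
$codePoint = [System.BitConverter]::ToUInt32( $utf32bytes, 0 )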
"0x{0:X}" -f $codePoint

Output:

0x1F44D

For the character name, you may find an answer here: Finding out Unicode character name in .NET
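The question also asks for the surrogate-pair hex codes, which the snippet above does not produce. A minimal sketch of one way to get them, assuming the input string holds exactly one emoji; the variable names `$surrogateHex` and `$codePointHex` are made up for illustration:

```powershell
# Build the emoji from its code point so the example does not depend on the script file's encoding
$ThumbsUp = [char]::ConvertFromUtf32(0x1F44D)

# A .NET string is UTF-16, so a character above U+FFFF is stored as two [char] values
$surrogateHex = $ThumbsUp.ToCharArray() | ForEach-Object { '0x{0:X4}' -f [int]$_ }

# ConvertToUtf32 combines the pair that starts at index 0 into a single code point
$codePointHex = '0x{0:X}' -f [char]::ConvertToUtf32($ThumbsUp, 0)

$surrogateHex -join ','   # 0xD83D,0xDC4D
$codePointHex             # 0x1F44D
```

The two surrogate values agree with the `(0xd83d,0xdc4d)` pair that the Get-CharInfo answer reports for the same character.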

5
Perhaps the following script (part of a larger project of mine) can help. It defines a fairly complex Get-CharInfo function.
Example: 'r Ř',0x1F44D|chr -OutUni -OutHex -OutStr -IgnoreWhiteSpace
r Ř👍
0x0072,0x0020,0x0158,0x0001F44D
\u0072\u0020\u0158\U0001F44D
Char CodePoint                              Category Description
---- ---------                              -------- -----------
   r {U+0072, 0x72}                  LowercaseLetter Latin Small Letter R
   Ř {U+0158, 0xC5,0x98}             UppercaseLetter Latin Capital Letter R With Caron
  👍 {U+1F44D, 0xF0,0x9F,0x91,0x8D}               So THUMBS UP SIGN (0xd83d,0xdc4d)
 #             ↑ UTF-8                               ↑ name          ↑ surrogates

Code (the comment-based help is at the end of the function body):

# Get-CharInfo function. Activate dot-sourced
# . .\_get-CharInfo_2.1.ps1
# Comment-based help at the end of the function body
# History notes at the very end of the script

if ( -not ('Microsofts.CharMap.UName' -as [type]) ) {
  Add-Type -Name UName -Namespace Microsofts.CharMap -MemberDefinition $(
    switch ("$([System.Environment]::SystemDirectory -replace 
                '\\', '\\')\\getuname.dll") {
    {Test-Path -LiteralPath $_ -PathType Leaf} {@"
[DllImport("${_}", ExactSpelling=true, SetLastError=true)]
private static extern int GetUName(ushort wCharCode, 
    [MarshalAs(UnmanagedType.LPWStr)] System.Text.StringBuilder buf);

public static string Get(char ch) {
    var sb = new System.Text.StringBuilder(300);
    UName.GetUName(ch, sb);
    return sb.ToString();
}
"@
    }
    default {'public static string Get(char ch) { return "???"; }'}
    })
}
function Get-CharInfo {
    [CmdletBinding()]
    [OutputType([System.Management.Automation.PSCustomObject],
                [System.Array])]
    param(
        # named or positional: a string or a number e.g. 'r Ř'
        # pipeline: an array of strings and numbers, e.g 'r Ř',0x1f44d
        [Parameter(Position=0, Mandatory, ValueFromPipeline)]
        $InputObject,
        # + Write-Host Python-like Unicode literal e.g. \u0072\u0020\u0158\U0001F44D
        [Parameter()]
        [switch]$OutUni,
        # + Write-Host array of hexadecimals e.g. 0x0072,0x0020,0x0158,0x0001F44D
        [Parameter()]
        [switch]$OutHex,
        # + Write-Host concatenated string e.g. r Ř
        [Parameter()]
        [switch]$OutStr,
        # choke down whitespaces ( $s -match '\s' ) from output
        [Parameter()]
        [switch]$IgnoreWhiteSpace,
        # from https://www.unicode.org/Public/UNIDATA/UnicodeData.txt
        [Parameter()]
        [string]$UnicodeData = 'D:\Utils\CodePages\UnicodeData.txt'
    )
    begin {
        Set-StrictMode -Version latest
        if ( [string]::IsNullOrEmpty( $UnicodeData) ) { $UnicodeData = '::' }
        Function ReadUnicodeRanges {
            if ($Script:UnicodeFirstLast.Count -eq 0) {
                $Script:UnicodeFirstLast = @'
                    First,Last,Category,Description
                    128,128,Cc-Control,Padding Character
                    129,129,Cc-Control,High Octet Preset
                    132,132,Cc-Control,Index
                    153,153,Cc-Control,Single Graphic Character Introducer
                    13312,19903,Lo-Other_Letter,CJK Ideograph Extension A
                    19968,40956,Lo-Other_Letter,CJK Ideograph
                    44032,55203,Lo-Other_Letter,Hangul Syllable
                    94208,100343,Lo-Other_Letter,Tangut Ideograph
                    101632,101640,Lo-Other_Letter,Tangut Ideograph Supplement
                    131072,173789,Lo-Other_Letter,CJK Ideograph Extension B
                    173824,177972,Lo-Other_Letter,CJK Ideograph Extension C
                    177984,178205,Lo-Other_Letter,CJK Ideograph Extension D
                    178208,183969,Lo-Other_Letter,CJK Ideograph Extension E
                    183984,191456,Lo-Other_Letter,CJK Ideograph Extension F
                    196608,201546,Lo-Other_Letter,CJK Ideograph Extension G
                    983040,1048573,Co-Private_Use,Plane 15 Private Use
                    1048576,1114109,Co-Private_Use,Plane 16 Private Use
'@ | ConvertFrom-Csv -Delimiter ',' |
                ForEach-Object {
                    [PSCustomObject]@{
                        First      = [int]$_.First
                        Last       = [int]$_.Last
                        Category   = $_.Category
                        Description= $_.Description
                    }
                }
            }
            foreach ( $FirstLast in $Script:UnicodeFirstLast) {
                if ( $FirstLast.First -le $ch -and $ch -le $FirstLast.Last ) {
                    $out.Category = $FirstLast.Category
                    $out.Description = $FirstLast.Description + $nil
                    break
                }
            }
        }
        $AuxHex = [System.Collections.ArrayList]::new()
        $AuxStr = [System.Collections.ArrayList]::new()
        $AuxUni = [System.Collections.ArrayList]::new()
        $Script:UnicodeFirstLast = @()
        $Script:UnicodeDataLines = @()
        function ReadUnicodeData {
            if ( $Script:UnicodeDataLines.Count -eq 0 -and (Test-Path $UnicodeData) ) {
                 $Script:UnicodeDataLines = @([System.IO.File]::ReadAllLines(
                        $UnicodeData, [System.Text.Encoding]::UTF8))
            }
            $DescrLine = $Script:UnicodeDataLines -match ('^{0:X4}\;' -f $ch)
            if ( $DescrLine.Count -gt 0) {
                $u0, $Descr, $Categ, $u3 = $DescrLine[0] -split ';'
                $out.Category = $Categ
                $out.Description = $Descr + $nil
            }
        }
        function out {
            param(
                [Parameter(Position=0, Mandatory=$true )] $ch,
                [Parameter(Position=1, Mandatory=$false)]$nil=''
                 )
            if (0 -le $ch -and 0xFFFF -ge $ch) {
                [void]$AuxHex.Add('0x{0:X4}' -f $ch)
                $s = [char]$ch
                [void]$AuxStr.Add($s)
                [void]$AuxUni.Add('\u{0:X4}' -f $ch)
                $out = [pscustomobject]@{
                    Char      = $s
                    CodePoint = ('U+{0:X4}' -f $ch),
                        (([System.Text.UTF32Encoding]::UTF8.GetBytes($s) |
                            ForEach-Object { '0x{0:X2}' -f $_ }) -join ',')
                    Category  = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($ch)
                    Description = [Microsofts.CharMap.UName]::Get($ch)
                }
                if ( $out.Description -eq 'Undefined' ) { ReadUnicodeRanges }
                if ( $out.Description -eq 'Undefined' ) { ReadUnicodeData }
            } elseif (0x10000 -le $ch -and 0x10FFFF -ge $ch) {
                [void]$AuxHex.Add('0x{0:X8}' -f $ch)
                $s = [char]::ConvertFromUtf32($ch)
                [void]$AuxStr.Add($s)
                [void]$AuxUni.Add('\U{0:X8}' -f $ch)
                $out = [pscustomobject]@{
                    Char        = $s
                    CodePoint   = ('U+{0:X}' -f $ch),
                        (([System.Text.UTF32Encoding]::UTF8.GetBytes($s) |
                            ForEach-Object { '0x{0:X2}' -f $_ }) -join ',')
                    Category    = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($s, 0)
                    Description = '???' + $nil
                }
                ReadUnicodeRanges 
                if ( $out.Description -eq ('???' + $nil) ) { ReadUnicodeData }
            } else {
                Write-Warning ('Character U+{0:X4} is out of range' -f $ch)
                $s = $null
            }
            if (( $null -eq $s ) -or
                ( $IgnoreWhiteSpace.IsPresent -and ( $s -match '\s' ))
               ) {
            } else {
                $out
            }
        }
    }
    process {
        #if ($PSBoundParameters['Verbose']) {
        #    Write-Warning "InputObject $InputObject, type = $($InputObject.GetType().Name)"
        #}
        if ( ($InputObject -as [int]) -gt 0xFFFF -and 
             ($InputObject -as [int]) -le 0x10ffff ) {
            $InputObject = [string][char]::ConvertFromUtf32($InputObject)
        }
        if ($null -cne ($InputObject -as [char])) {
            #Write-Verbose "A $([char]$InputObject) InputObject character"
            out $([int][char]$InputObject) ''
        } elseif (  $InputObject -isnot [string] -and 
                    $null -cne ($InputObject -as [int])) {
            #Write-Verbose "B $InputObject InputObject"
            out $([int]$InputObject) ''
        } else {
            $InputObject = [string]$InputObject
            #Write-Verbose "C $InputObject InputObject.Length $($InputObject.Length)"
            for ($i = 0; $i -lt $InputObject.Length; ++$i) {
                if (  [char]::IsHighSurrogate($InputObject[$i]) -and 
                      (1+$i) -lt $InputObject.Length -and 
                      [char]::IsLowSurrogate($InputObject[$i+1])) {
                    $aux = ' (0x{0:x4},0x{1:x4})' -f [int]$InputObject[$i], 
                                                   [int]$InputObject[$i+1]
                    # Write-Verbose "surrogate pair $aux at position $i" 
                    out $([char]::ConvertToUtf32($InputObject[$i], $InputObject[1+$i])) $aux
                    $i++
                } else {
                    out $([int][char]$InputObject[$i]) ''
                }
            }
        }
    }
    end {
        if ( $OutStr.IsPresent -or $PSBoundParameters['Verbose']) {
            Write-Host -ForegroundColor Magenta -Object $($AuxStr -join '')
        }
        if ( $OutHex.IsPresent -or $PSBoundParameters['Verbose']) {
            Write-Host -ForegroundColor Cyan -Object $($AuxHex -join ',')
        }
        if ( $OutUni.IsPresent -or $PSBoundParameters['Verbose']) {
            Write-Host -ForegroundColor Yellow -Object $($AuxUni -join '')
        }
    }
<#
.SYNOPSIS
Return basic information about supplied Unicode characters.

.DESCRIPTION
Return information about supplied Unicode characters:
    - as a PSCustomObject for programming purposes,
    - in a human-readable form, and
    - with optional additional output to the Information Stream.

Properties of the output PSCustomObject are as follows:

Char        The character itself (if renderable)
CodePoint   [string[]]Unicode CodePoint, its UTF-8 byte sequence
Category    General Category (long name or abbreviation)
Description Name (and surrogate pair in parentheses if apply).

.INPUTS
    An array of characters, strings and numbers (in any combination)
    can be piped to the function as parameter $InputObject, e.g as
    "ΧАB",[char]4301,191,0x1F3DE | Get-CharInfo
    or (the same in terms of decimal numbers) as
    935,1040,66,4301,191,127966 | Get-CharInfo

    On the other side, the $InputObject parameter supplied named
    or positionally must be of the only base type: either a number
    or a character or a string.
    The same input as a string:
    Get-CharInfo -InputObject 'ΧАBჍ¿'

    -Verbose implies all -OutUni, -OutHex and -OutStr

.OUTPUTS
    [System.Management.Automation.PSCustomObject]
    [Object[]]    (an array like [PSCustomObject[]])

.NOTES
    The UnicodeData.txt file (if used) must be saved locally
    from https://www.unicode.org/Public/UNIDATA/UnicodeData.txt
    (currently Unicode 13.0.0)

    The UnicodeData.txt file is not required; without it, however,
    the Get-CharInfo function could return inaccurate Category and
    Description properties for characters above the BMP, see Example-3.

.LINK
    Unicode® Standard Annex #44: Unicode Character Database (UCD)
.LINK
    https://www.unicode.org/reports/tr44/
.LINK
    https://www.unicode.org/reports/tr44/#General_Category_Values

.EXAMPLE
# full (first three lines are in the Information Stream)
'r Ř👍'|Get-CharInfo -OutUni -OutHex -OutStr -IgnoreWhiteSpace

r Ř👍
0x0072,0x0020,0x0158,0x0001F44D
\u0072\u0020\u0158\U0001F44D
Char CodePoint                             Category Description                
---- ---------                             -------- -----------                
   r {U+0072, 0x72}                 LowercaseLetter Latin Small Letter R       
   Ř {U+0158, 0xC5,0x98}            UppercaseLetter Latin Capital Letter R W...
  👍 {U+1F44D, 0xF0,0x9F,0x91,0x8D}              So THUMBS UP SIGN (0xd83d,0...


.EXAMPLE
# shortened version of above (output is the same)
'r Ř👍'|chr -Verbose -IgnoreWhiteSpace

.EXAMPLE
# inaccurate (inexact) output above BMP if missing UnicodeData.txt
'r Ř👍'|chr -Verbose -IgnoreWhiteSpace -UnicodeData .\foo.bar

r Ř👍
0x0072,0x0020,0x0158,0x0001F44D
\u0072\u0020\u0158\U0001F44D
Char CodePoint                             Category Description                
---- ---------                             -------- -----------                
   r {U+0072, 0x72}                 LowercaseLetter Latin Small Letter R       
   Ř {U+0158, 0xC5,0x98}            UppercaseLetter Latin Capital Letter R W...
  👍 {U+1F44D, 0xF0,0x9F,0x91,0x8D}     OtherSymbol ??? (0xd83d,0xdc4d)        


.FUNCTIONALITY
Tested: Windows 8.1/64bit, Powershell 4
        Windows 10 /64bit, Powershell 5
        Windows 10 /64bit, Powershell Core 6.2.0
        Windows 10 /64bit, Powershell Core 7.1.0
#>
}
Set-Alias -Name chr -Value Get-CharInfo

<#
HISTORY NOTES

Origin by: http://poshcode.org/5234
           http://fossil.include-once.org/poshcode/artifact/5757dbbd0bc26c84333e7cf4ccc330ab89447bf679e86ddd6fbd3589ca24027e

License: CC0
  https://creativecommons.org/publicdomain/zero/1.0/legalcode

Activate dot-sourced like this (apply a real path instead of .\):
. .\Get-CharInfo.ps1

Improved by: https://stackoverflow.com/users/3439404/josefz
             (to version 2)
#>

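My PowerShell version is 5.1. Why can't I run the Get-CharInfo command? ... Get-CharInfo: The term 'Get-CharInfo' is not recognized as the name of a cmdlet - Clamarc
@Clamarc Read and follow the four comment lines... (save the code as get_charinfo.ps1 and activate it with the dot-source operator .). - JosefZ
I downloaded Get-CharInfo.ps1, but I'm not sure! Is the dot-source operator a module that I have to download into my PowerShell in order to call Get-CharInfo.ps1 from my script, or is it already part of PowerShell? The comments in the Get-CharInfo script contain this information: "DOWNLOAD and DEFINITION OF THE FUNCTION irm https://gist.github.com/mklement0/25694cbb8e10a7044b36a310e1243959/raw/Get-CharInfo.ps1 | iex" The above directly defines the function... What would that be? ... Sorry, this is my first time using this! - Clamarc
There is another script with the same name at gist.github.com/mklement0. To dot-source it, use . Get-CharInfo.ps1 - JosefZ
OK, which folder should I put Get-CharInfo.ps1 in so that my script can call it? - Clamarc
@Clamarc As usual, you can save any script anywhere you like (keep in mind that _calling_ a script requires specifying a relative or fully qualified path to the file). - JosefZ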
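Dot-sourcing, as discussed in the comments above, is built into PowerShell and needs no download. A minimal sketch of how it behaves, using a throwaway script in the temp folder; the file name `Demo-DotSource.ps1` and the function `Get-DemoGreeting` are made up for illustration, and the sketch assumes the execution policy allows local scripts to run:

```powershell
# Write a tiny script file that only defines a function
$scriptPath = Join-Path ([System.IO.Path]::GetTempPath()) 'Demo-DotSource.ps1'
Set-Content -LiteralPath $scriptPath -Value 'function Get-DemoGreeting { "loaded via dot-sourcing" }'

# Dot-source it: a dot, a space, then the path. The script runs in the current scope,
# so the function it defines remains available after the script finishes.
. $scriptPath

Get-DemoGreeting

# Clean up the temporary file; the function stays defined in this session
Remove-Item -LiteralPath $scriptPath
```

The same pattern would apply to the saved Get-CharInfo script: dot-source it with its relative or full path, then call `Get-CharInfo` or its `chr` alias.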

3
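Here is a simple script to get the name. Note that an emoji consists of two surrogate characters. Using a hashtable is faster than using where-object, even for a single search through a 34,626-line file.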
# idChar.ps1

param($inputChar)

if ($inputchar[0] -ge 0xD800 -and $inputchar[0] -le 0xDBFF -and
$inputchar.length -eq 1) {
  write-warning 'paste emojis with control v not right click, half of surrogate pair missing'
}

if (! (test-path $psscriptroot\UnicodeData.txt)) { 
  Invoke-WebRequest http://www.unicode.org/Public/UNIDATA/UnicodeData.txt -OutFile $psscriptroot\UnicodeData.txt
}
$unicode = import-csv $psscriptroot\UnicodeData.txt -Delimiter ';' -Header hexcode,
  name
$unicode | % { $hash = @{} } { $hash[[int]('0x' + $_.hexcode)] = $_ }
$hash[[char]::ConvertToUtf32($inputChar,($index=0))]

Example (paste into the console with Ctrl+V rather than right-clicking, so that PSReadLine's paste handling is used):
.\idChar 👍

hexcode name
------- ----
1F44D   THUMBS UP SIGN


.\idChar —

hexcode name
------- ----
2014    EM DASH


.\idchar �

WARNING: paste emojis with control v not right click, half of surrogate pair missing
MethodInvocationException: C:\Users\ccfadmin\idChar.ps1:24
Line |
  24 |  $result = $hash[[char]::ConvertToUtf32($inputChar,$index)]
     |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | Exception calling "ConvertToUtf32" with "2" argument(s): "Found a high surrogate char without a following low
     | surrogate at index: 0. The input may not be in this encoding, or may not contain valid Unicode (UTF-16)
     | characters. (Parameter 's')"

inputchar hexcode name
--------- ------- ----
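The hashtable lookup used by the script can be tried without downloading anything. The following is a sketch under stated assumptions: the three sample lines are abbreviated stand-ins for UnicodeData.txt rows (only the first three semicolon-separated fields are kept), and the variable names are made up for illustration.

```powershell
# Abbreviated sample rows in UnicodeData.txt style: code point; name; general category
$sampleLines = @(
    '0072;LATIN SMALL LETTER R;Ll'
    '2014;EM DASH;Pd'
    '1F44D;THUMBS UP SIGN;So'
)
$rows = $sampleLines | ConvertFrom-Csv -Delimiter ';' -Header hexcode, name, category

# Index the rows by integer code point so each lookup is a single hashtable access
$byCodePoint = @{}
foreach ($row in $rows) {
    $byCodePoint[[Convert]::ToInt32($row.hexcode, 16)] = $row
}

# ConvertToUtf32 merges the surrogate pair at index 0 into the integer key
$emoji = [char]::ConvertFromUtf32(0x1F44D)
$byCodePoint[[char]::ConvertToUtf32($emoji, 0)].name   # THUMBS UP SIGN
```

Building the table costs one pass over the file; after that, every lookup avoids the full scan that `Where-Object` would perform.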
