Batch script: remove BOM (ï»¿) from a file

Question

batch-file byte-order-mark

4

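I have created a batch script that copies the SQL files from a folder into one big SQL script. The problem is that when I run this combined SQL script, I get the error: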

Incorrect syntax near ''

I copied the combined SQL script into Notepad++ and set the encoding to ANSI. At the places where the error occurs, I can see the symbols ï»¿ (the BOM).

Is there any way to remove it automatically in my batch script? I don't want to strip it by hand every time this task runs.

Here is my current batch script:

@echo off

set "path2work=C:\StoredProcedures"
cd /d "%path2work%"

echo. > C:\FinalScript\AllScripts.sql

for %%a in (*.sql) do (

    echo. >>"C:\FinalScript\AllScripts.sql"
    echo GO >>"C:\FinalScript\AllScripts.sql"
    type "%%a">>"C:\FinalScript\AllScripts.sql"
    echo. >>"C:\FinalScript\AllScripts.sql"
)

- user10127407

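1

"ANSI" doesn't have a BOM. You get "ï»¿" when you interpret a UTF-8 file with a BOM as ANSI. Even so, it should only occur at the very start of the file. But you say you see "ï»¿" on multiple lines, not just at the start of the first one. In that case it is not a BOM but a zero-width non-breaking space. - MSalters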

4 Answers

3

TypeWithoutBOM.bat

@echo off
set "RemoveUTF8BOM=(pause & pause & pause)>nul"
type %1|(%RemoveUTF8BOM% & findstr "^")

This batch file behaves like the type command, but it drops the first three bytes of the file it outputs.
Usage: TypeWithoutBOM UTF8-file.txt > newfile.txt
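
Note that the batch file above drops three bytes unconditionally, whether or not they are a BOM. Purely as an illustration (in Python rather than batch, and with `strip_utf8_bom` being a hypothetical helper name, not something from this answer), the same idea with a guard that leaves BOM-less input untouched might look like this:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; other input is returned as-is."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Sample inputs (made up for this sketch): one with a BOM, one without.
with_bom = codecs.BOM_UTF8 + b"SELECT 1;\r\n"
without_bom = b"SELECT 1;\r\n"

print(strip_utf8_bom(with_bom) == strip_utf8_bom(without_bom))
```

The check costs one comparison and avoids eating the first three legitimate characters of a file that was saved without a BOM.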

- Michael Hutter

2

As MSalters mentioned in his comment, and according to Wikipedia, ï»¿ is the ANSI representation of the UTF-8 BOM.
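
A quick way to check this is to decode the three BOM bytes under both encodings. The sketch below uses Python only for illustration, and it assumes that "ANSI" means Windows-1252, which holds on Western-locale systems but not everywhere:

```python
import codecs

# The UTF-8 BOM is the byte sequence EF BB BF.
bom = codecs.BOM_UTF8

# Decoded as Windows-1252 (assumed "ANSI" code page), each byte
# becomes a separate visible character.
as_ansi = bom.decode("cp1252")

# Decoded as UTF-8, the same three bytes form the single code point U+FEFF.
as_utf8 = bom.decode("utf-8")

print(bom.hex(), repr(as_ansi), repr(as_utf8))
```

U+FEFF is the same code point that is called a zero-width non-breaking space when it appears anywhere other than the start of a file, which is why both descriptions come up in this thread.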

PowerShell is better suited than batch for dealing with encodings.

## Q:\Test\2018\09\11\SO_522772705.ps1
Set-Location 'C:\StoredProcedures'
Get-ChildItem '*.sql' | ForEach-Object {
    "`nGO"
    Get-Content $_.FullName -Encoding UTF8
    ""
} | Set-Content 'C:\FinalScript\AllScripts.sql' -Encoding UTF8

To stay on topic with the batch-file tag, here is a batch file that invokes the essential parts of the PowerShell script:

:: Q:\Test\2018\09\11\SO_522772705..cmd
@echo off
set "path2work=C:\StoredProcedures"
cd /d "%path2work%"

powershell -NoProfile -Command "Get-ChildItem '*.sql'|ForEach-Object{\"`nGO\";Get-Content $_.FullName -Enc UTF8;\"\"}|Set-Content 'C:\FinalScript\AllScripts.sql' -Enc UTF8"

- user6811411

1

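You just need to change the encoding to UTF-8 (without BOM) and save the file.

Note that in older versions of Notepad++ the menu items are slightly different.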

- phuclv

1

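Is there an automated way to do this? This batch file runs as part of a build server process. - user10127407

Are you generating the batch file automatically? If so, configure the generator to stop emitting the BOM. In that case you would need to show the code for the generator. - phuclv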


- sst · Accepted Answer

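This happens because the type command preserves the UTF-8 BOM, so when you combine multiple files that each carry a BOM, the final file ends up with several BOMs in the middle of it.

If you are certain that all the SQL files you want to combine start with a BOM, you can use the following script to strip the BOM from each of them before actually combining them.

This is done by piping the output of type. The other side of the pipe consumes the first 3 bytes (the BOM) with the help of 3 pause commands; each pause consumes one byte. The rest of the stream is sent to the findstr command, whose output is appended to the final script.

Because the SQL files are encoded in UTF-8, they may contain any character in the Unicode range, and certain code pages will interfere with the operation and may corrupt the final SQL script.

This has been taken into account: the batch file restarts itself under code page 437, which is safe for accessing any binary sequence.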

@echo off
setlocal DisableDelayedExpansion


setlocal EnableDelayedExpansion
for /F "tokens=*" %%a in ('chcp') do for %%b in (%%a) do set "CP=%%~nb"
if  !CP! NEQ 437 if !CP! NEQ 65001 chcp 437 >nul && (

    REM For file operations, the script must be restarted in a new instance.
    "%COMSPEC%" /c "%~f0"

    REM Restoring previous code page
    chcp !CP! >nul
    exit /b
)
endlocal


set "RemoveUTF8BOM=(pause & pause & pause)>nul"
set "echoNL=echo("
set "FinalScript=C:\FinalScript\AllScripts.sql"

:: If you want the final script to start with UTF-8 BOM (This is optional)
:: Create an empty file in NotePad and save it as UTF8-BOM.txt with UTF-8 encoding.
:: Or Create a file in your HexEditor with this byte sequence: EF BB BF
:: and save it as UTF8-BOM.txt
:: The file must be exactly 3 bytes with the above sequence.
(
    type "UTF8-BOM.txt" 2>nul

    REM This assumes that all sql files start with UTF-8 BOM
    REM If not, then they will lose their first 3 otherwise legitimate characters.
    REM Resulting in a final corrupted script.
    for %%A in (*.sql) do (type "%%~A" & %echoNL%)|(%RemoveUTF8BOM% & findstr "^")

)>"%FinalScript%"
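
For comparison, here is the same merge logic as a minimal Python sketch. It is not part of the batch solution above and is offered only as an illustration. The function name `merge_sql_files` is hypothetical, and writing a GO line after each file is an assumption borrowed from the question's script. Unlike the batch version, it removes the BOM only when a file actually starts with one.

```python
import codecs
from pathlib import Path


def merge_sql_files(source_dir: Path, target: Path) -> int:
    """Concatenate all *.sql files in source_dir into target, byte for byte.

    A leading UTF-8 BOM is removed from each input file if present, and a
    line containing GO is written after each file. Returns the file count.
    """
    # Collect inputs first, skipping the output file if it lives in source_dir.
    inputs = sorted(
        p for p in source_dir.glob("*.sql") if p.resolve() != target.resolve()
    )
    with target.open("wb") as out:
        for sql_file in inputs:
            data = sql_file.read_bytes()
            if data.startswith(codecs.BOM_UTF8):
                data = data[len(codecs.BOM_UTF8):]
            out.write(data)
            # Ensure the batch separator starts on its own line.
            if not data.endswith(b"\n"):
                out.write(b"\r\n")
            out.write(b"GO\r\n")
    return len(inputs)
```

With the paths from the question, a call would be `merge_sql_files(Path(r"C:\StoredProcedures"), Path(r"C:\FinalScript\AllScripts.sql"))`. Working on raw bytes means the console code page never comes into play, so no restart under a different code page is needed in this variant.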