Removing the BOM character from a file

Question

Removing the BOM character from a file

35

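I have a BOM character in an HTML file and I want to remove it. I have searched a lot and tried many scripts, but none of them worked. I also downloaded Notepad++, but its Encoding menu has no "UTF8 without BOM" option. How can I remove that BOM character? Thanks.

Screenshot of my Notepad++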

- Meysam Valueian

3 Answers

6

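You can solve this with vim, which is easy to get through MinGW-w64 (it comes along if you have Git installed) or Cygwin.

So, the key is to use:

- Option "-s", which makes vim execute a script file containing vim commands.
- Option "-b", which opens your file in binary mode, where you will see those awkward BOM bytes.
- Option "-n", which is very important! This option disables the swap file, so everything runs in memory. It gives you a guarantee, because if the file is large, the swap file can mislead the process.

With that said, let's see the code!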

First, create a simple file, here named 'script', which will hold the vim commands:
```
echo 'gg"+gPggdtCZZ' > script
```
...this weird string tells vim: go to the beginning of the file (`gg`), put the contents of the `+` register before the cursor (`"+gP`), go back to the beginning (`gg`), delete everything up to, but not including, the character 'C' (`dtC`), then save the file and quit (`ZZ`).

Note: If your file starts with a character other than 'C', you have to specify that character instead. If your files have different first characters, you can follow the same logic and write a bash script that reads the first character and substitutes it into the snippet above.

Run the vim command:
```
vim -n -b <the_file> -s script
```

- Leandro Ferreira Fernandes

5

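If you want to use Vim, this command is much simpler: vim <filename> "+set nobomb" "+wq". That way, you don't have to know the first visible character of the file. - Neal Gokli

Can you elaborate on what problem the swap file caused for your script? Shouldn't it be transparent? - Neal Gokli

On Windows, you can download the Vim installer directly from https://vim.sourceforge.io/download.php. There is no need for MinGW-w64 or Cygwin. - Neal Gokli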

1

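Your suggestion works great! vim <filename> "+set nobomb" "+wq". About the swap file: when you process a lot of files larger than 10 MB, vim works on the .swap file in the background instead of the original file, so after running over all the files you often end up with corrupted files. The solution is therefore to use the -n option to load the file directly into memory. - Leandro Ferreira Fernandes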

0

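I don't think this should be seen as a problem. When it is one, the BOM is just the 3 bytes EF BB BF. Can't we simply delete it? Or change it to something else and close the file again?

Anyway, the code below can solve the problem: if a BOM is present, it changes it to "***". Run it as:

x file

where file is the name of the file.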

```
#define _CRT_SECURE_NO_WARNINGS   /* silence MSVC warnings about fopen */
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv)
{
    const unsigned char BOM[3] = { 0xEF, 0xBB, 0xBF };
    /* Use the first argument as the file name, or fall back to a default. */
    const char* file_name = (argc > 1) ? argv[1] : "target.csv";
    FILE* one = fopen(file_name, "r+b");
    if (!one) return -1;
    unsigned char buffer[3];
    size_t n = fread(buffer, 1, 3, one);
    if (n != 3)
    {   /* file is shorter than a BOM */
        fclose(one);
        return -2;
    }
    if (memcmp(buffer, BOM, 3) != 0)
    {   printf("file '%s' has no BOM\n", file_name);
        fclose(one);
        return 0;
    }
    /* Rewind and overwrite the three BOM bytes in place. */
    if (fseek(one, 0, SEEK_SET) != 0)
    {
        fclose(one);
        return -3;
    }
    buffer[0] = buffer[1] = buffer[2] = '*';
    n = fwrite(buffer, 1, 3, one);
    if (n == 3)
        printf("Byte Order Mark changed to '***'\n");
    else
        printf("Error writing to file\n");
    fclose(one);
    return 0;
}
```
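
The program above overwrites the BOM with '***', so three visible characters stay at the start of the file. If the bytes should really be gone, one option is to copy the file and skip the first three bytes when they are EF BB BF. What follows is only a minimal sketch under a few assumptions: the function name `strip_bom`, its return codes, and the two-path interface are made up for illustration, and the source and destination must be different files.

```c
#include <stdio.h>
#include <string.h>

/* Copy src to dst, skipping a leading UTF-8 BOM (EF BB BF) if present.
 * Hypothetical helper: returns 1 if a BOM was dropped, 0 if the file
 * had none, and -1 on any I/O error. src and dst must differ. */
int strip_bom(const char *src, const char *dst)
{
    static const unsigned char BOM[3] = { 0xEF, 0xBB, 0xBF };
    unsigned char buf[4096];
    int had_bom = 0;
    int ok = 1;
    size_t n;

    FILE *in = fopen(src, "rb");
    if (!in) return -1;
    FILE *out = fopen(dst, "wb");
    if (!out) { fclose(in); return -1; }

    /* Inspect only the first three bytes. */
    n = fread(buf, 1, sizeof BOM, in);
    if (n == sizeof BOM && memcmp(buf, BOM, sizeof BOM) == 0) {
        had_bom = 1;                      /* BOM found: do not copy it */
    } else if (n > 0 && fwrite(buf, 1, n, out) != n) {
        ok = 0;                           /* no BOM: these bytes are data */
    }

    /* Copy the rest of the file unchanged. */
    while (ok && (n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) ok = 0;
    }
    if (ferror(in)) ok = 0;

    fclose(in);
    if (fclose(out) != 0) ok = 0;
    return ok ? had_bom : -1;
}
```

A caller could run something like strip_bom("page.html", "page.nobom.html") and, when the result is not -1, replace the original with the new file. Writing to a second file keeps the original intact if anything fails halfway.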

- arfneto

- WalterM · Accepted Answer

If you look in that same menu, click "Convert to UTF-8".