How to convert a Windows-format text file to Unix format

3

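Converting from Unix format to Windows format gives me the correct output; going from Windows format to Unix format, however, gives me some strange output. I thought all I had to do was strip the carriage return '\r', but that doesn't work. If you run the code and then open the text file, you see odd results: the first line is correct, and after that all sorts of things go wrong.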

   int main( )
{
   bool windows = false;
   char source[256];
   char destination[256]; // Allocate the max amount of space for the filenames.

   cout << "Please enter the name of the source file: ";
   cin >> source;

   ifstream fin( source, ios::binary );
   if ( !fin )          // Check to make sure the source file exists.
   {
      cerr << "File " << source << " not found!";
      getch();
      return 1;
   }//endif

   cout << "Please enter the name of the destination file: ";
   cin >> destination;

   ifstream fest( destination );
   if ( fest )          // Check to see if the destination file already exists.
   {
      cout << "The file " << destination << " already exists!" << endl;
      cout << "If you would like to truncate the data, please enter 'Y', "
           << "otherwise enter 'N' to quit: ";
      char answer = char( getch() );
      if ( answer == 'n' || answer == 'N' )
      {
         return 1;
      }//endif
   }//endif
   clrscr();            // Clear screen for neatness.

   ofstream fout( destination, ios::binary );
   if ( !fout.good() )  // Check to see if the destination file can be edited.
   {
      cout << destination << "could not be opened!" << endl;
      getch();
      return 1;
   }//endif
                        // Open the destination file in binary mode.
   fout.open( destination, ios::binary );
   char ch = fin.get(); // Set ch to the first char in the source file.
   while ( !fin.eof() )
   {
      if ( ch == '\x0D' ) // If ch is a carriage return, then the source file
      {                   // must be in a windows format.
         windows = true;
      }//endif
      if ( windows == true )
      {
         ch = fin.get();  // Advance ch, so that at the bottom of the loop, the
      }//endif            // carriage return is not coppied into the new file.
      if ( windows == false )
      {
         if ( ch == '\x0A' )    // If the file is in the Unix format..
         {
            fout.put( '\x0D' ); // When a new line is found, output a carriage
         }//endif               // return.
      }//endif

      fout.put( ch );
      ch = fin.get();
   }//endwh
   if ( windows == true )
   {
      fout.put( '\x0A' );
   }//endif
   fout.close();
   fin.close();                 // Close yer files.

   if ( windows == true )       // A little output for user-friendly-ness.
   {
      cout << "The contents of " << source << " have been coppied to "
           << destination << " and converted to Unix format." << endl;
   }else{
      cout << "The contents of " << source << " have been coppied to "
           << destination << " and converted to Windows format." << endl;
   }//endif
   cout << "Enter any key to quit.." << endl;
   getch();
   return 0;
}//endmn

So what happens when you go the other way? - BЈовић
If you are able to use a script, you can also use dos2unix on Linux/Unix platforms. - weima
1
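I doubt this is your problem in this case, but you really should call fest.close() to close the input stream on the destination file before opening that file as an output stream. - inspector-g
I'm voting to move this to CodeReview. The vote is partly because I want CodeReview to thrive. - std''OrgnlDave
4 Answers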

4
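If you only need to convert simple ASCII (and perhaps UTF-8) text files, you can open the source file in text (translated) mode, read it line by line with the non-member function getline(), and write each line to the output file with \n or \r\n inserted between lines (translated mode handles the newlines well enough for this case).
You can then delete the original file and rename the temporary file to the original name. Or, if you prefer, you can push the lines into a vector. You can then close the input handle to the file, open an output handle with ofstream out("filename", ios_base::trunc), and write the elements of the vector to the file, separated by whichever newline you want.
It all depends on your requirements.
Below is an example with minimal error handling. What I really want to show, though, is just the FOR loop and reading line by line as a different way of doing it.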
convert_file.exe "test.txt" "linux"
convert_file.exe "test.txt" "win"
#include <iostream>
#include <string>
#include <fstream>
#include <ostream>
#include <cstdlib>
#include <cstdio>
using namespace std;

int main(int argc, char* argv[]) {
    if (argc != 3) {
        cerr << "Usage: this.exe file_to_convert newline_format(\"linux\" or \"win\")" << endl;
        return EXIT_FAILURE;
    }
    string fmt(argv[2]);
    if (fmt != "linux" && fmt != "win") {
        cerr << "Invalid newline format specified" << endl;
        return EXIT_FAILURE;
    }
    ifstream in(argv[1]);
    if (!in) {
        cerr << "Error reading " << argv[1] << endl;
        return EXIT_FAILURE;
    }
    string tmp(argv[1]);
    tmp += "converted";
    ofstream out(tmp.c_str(), ios_base::binary);
    if (!out) {
        cerr << "Error writing " << tmp << endl;
        return EXIT_FAILURE;
    }
    bool first = true;
    for (string line; getline(in, line); ) {
        if (!first) {
            if (fmt == "linux") {
                out << "\n";
            } else {
                out << "\r\n";
            }
        }
        out << line;
        first = false;
    }
    in.close();
    out.close();
    if (remove(argv[1]) != 0) {
        cerr << "Error deleting " << argv[1] << endl;
        return EXIT_FAILURE;
    }
    if (rename(tmp.c_str(), argv[1]) != 0) {
        cerr << "Error renaming " << tmp << " to " << argv[1] << endl;
        return EXIT_FAILURE;
    }
}
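
The vector-based alternative mentioned in this answer could be sketched roughly as follows. This is only an illustration under a few assumptions: the helper names read_lines, write_lines and check_line_rewrite are made up here and are not from the answer, and the sketch works on generic streams so it can be tried on in-memory data. It also strips a trailing '\r' from each line, because text mode only translates "\r\n" on Windows; on other platforms getline() would otherwise leave the '\r' in place.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read every line into memory. A trailing '\r' is
// stripped so the result is the same whether or not the stream already
// translated "\r\n" on the way in.
std::vector<std::string> read_lines(std::istream& in) {
    std::vector<std::string> lines;
    for (std::string line; std::getline(in, line); ) {
        if (!line.empty() && line[line.size() - 1] == '\r') {
            line.erase(line.size() - 1);
        }
        lines.push_back(line);
    }
    return lines;
}

// Hypothetical helper: write the lines back, separated (not terminated)
// by `newline`, which mirrors the `first` flag in the example above.
void write_lines(std::ostream& out, const std::vector<std::string>& lines,
                 const std::string& newline) {
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (i != 0) out << newline;
        out << lines[i];
    }
}

// Self-check on in-memory streams.
void check_line_rewrite() {
    std::istringstream in("one\r\ntwo\r\nthree");
    std::vector<std::string> lines = read_lines(in);
    assert(lines.size() == 3);

    std::ostringstream unix_out, win_out;
    write_lines(unix_out, lines, "\n");
    write_lines(win_out, lines, "\r\n");
    assert(unix_out.str() == "one\ntwo\nthree");
    assert(win_out.str() == "one\r\ntwo\r\nthree");
}
```

With real files, one would presumably pass an ifstream to read_lines, close it, and then pass an ofstream opened with ios_base::binary | ios_base::trunc to write_lines, so that the chosen newline is written exactly as given.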

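As others have said, there are already utilities (including text editors such as Notepad++) that will do the newline conversion for you. So unless you are doing this for some other reason (which you haven't stated), you don't need to implement anything yourself.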


I guess what I want to know is whether there is something wrong with my logic or my algorithm. - Jerrod

2
I have re-edited your code and it works fine for me. Hope this helps!
#include <iostream>
#include <fstream>
#include <stdio.h>
using namespace std;

int main( )
{
    bool windows = false;
    char source[256];
    char destination[256]; // Allocate the max amount of space for the filenames.

    cout << "Please enter the name of the source file: ";
    cin >> source;

    ifstream fin( source, ios::binary );
    if ( !fin )          // Check to make sure the source file exists.
    {
        cerr << "File " << source << " not found!";
        return 1;
    }//endif

    cout << "Please enter the name of the destination file: ";
    cin >> destination;

    ifstream fest( destination );
    if ( fest )          // Check to see if the destination file already exists.
    {
        cout << "The file " << destination << " already exists!" << endl;
        cout << "If you would like to truncate the data, please enter 'Y', "
        << "otherwise enter 'N' to quit: ";
        char answer;
        cin >> answer;
        if ( answer == 'n' || answer == 'N' )
        {
            return 1;
        }
    }
    //clrscr();

    ofstream fout( destination);
    if ( !fout.good() )
    {
        cout << destination << "could not be opened!" << endl;
        return 1;
    }
    char ch = fin.get();
    while (!fin.eof())
    {
        if ( ch == '\r' ) 
        {                   
            windows = true;
        }
        if ( ch == '\n' && windows == false )    // If the file is in the Unix format..
        {
            // Don't do anything here
        }
        fout.put( ch );
        cout << ch; // For Debugging purpose
        ch = fin.get();
    }
    fout.close();
    fin.close();

    if ( windows == true )       // A little output for user-friendly-ness.
    {
        cout<<endl;
        cout << "The contents of " << source << " have been coppied to "
        << destination << " and converted to Unix format." << endl;
    }else{
        cout << "The contents of " << source << " have been coppied to "
        << destination << " and converted to Windows format." << endl;
    }//endif
    cout << "Enter any key to quit.." << endl;
    return 0;
}

1
This seems to just copy one file to another; the format isn't converted in either direction. - Jerrod

2
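Don't worry about checking for Windows inside the loop. Just check for carriage returns. Keep a variable "carriage_return". On the next iteration, if "carriage_return" is set and ch != linefeed, simply insert a linefeed. Then reset the carriage_return variable to false. It's a very simple, basic rule that won't steer you wrong.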
// fin and fout are the streams from the question, both opened with ios::binary.
bool carriage_return = false;   // True if the previous character was a carriage return.
const char linefeed = '\n';     // LF, 0x0A
const char cr = '\r';           // CR, 0x0D
char ch;
while (fin.get(ch)) {           // Stops cleanly at end of file.
  if (carriage_return && ch != linefeed) {
    fout.put(linefeed);         // The pending CR was not followed by LF, so supply the linefeed ourselves.
  }
  carriage_return = (ch == cr); // Remember a CR, but never copy it to the output.
  if (!carriage_return) {
    fout.put(ch);               // Everything else is copied through, including the LF of a CR LF pair.
  }
}
if (carriage_return) {
  fout.put(linefeed);           // The file ended on a bare CR.
}
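
The carriage-return rule described in this answer, together with the reverse direction the question also needs, could be sketched on in-memory strings like this. It is a minimal sketch, not the answerer's code: the names to_unix, to_windows and check_newline_conversion are invented for illustration, and the sketch assumes single-byte text (ASCII or UTF-8) that was read in binary mode.

```cpp
#include <cassert>
#include <string>

// Windows (or bare-CR) text -> Unix: "\r\n" and a lone '\r' both become '\n'.
std::string to_unix(const std::string& in) {
    std::string out;
    bool pending_cr = false;                        // previous character was '\r'
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        const char ch = in[i];
        if (pending_cr && ch != '\n') out += '\n';  // lone CR: supply the LF
        pending_cr = (ch == '\r');                  // remember a CR, do not copy it
        if (!pending_cr) out += ch;
    }
    if (pending_cr) out += '\n';                    // input ended on a bare CR
    return out;
}

// Unix text -> Windows: put '\r' before every '\n' that does not already have one.
std::string to_windows(const std::string& in) {
    std::string out;
    char prev = '\0';
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        const char ch = in[i];
        if (ch == '\n' && prev != '\r') out += '\r';
        out += ch;
        prev = ch;
    }
    return out;
}

// Self-check showing the intended behaviour.
void check_newline_conversion() {
    assert(to_unix("a\r\nb\r\n") == "a\nb\n");
    assert(to_unix("a\rb") == "a\nb");
    assert(to_unix("a\nb") == "a\nb");
    assert(to_windows("a\nb\n") == "a\r\nb\r\n");
    assert(to_windows("a\r\nb") == "a\r\nb");
}
```

Keeping the two directions as separate functions avoids guessing the source format from the first carriage return seen, which is the flag the question's loop relies on. To use them on files, both the ifstream and the ofstream would need ios::binary so the runtime does not translate newlines a second time.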

1
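Are you sure you are reading and saving the data in the correct format?
Taking a file in a different character encoding and just "reading" it leads to very bad results :|
Then you also need to take into account the different replacements that have to be made.
This may help: link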

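My C++ knowledge is fairly limited. I read the data in as binary, so at that level there is no difference in format between Windows and Unix. As far as I know, a Windows newline is read as "\r\n" and a Unix newline as "\n". So if you open a Unix text file in Windows that contains three words, each on its own line, they show up as three words on one line with no spaces. That is because Windows does not recognize "\n" as a newline unless it is preceded by "\r". To me that means all that needs to be done is to add the "\r", and that is what I tried. - Jerrod
Yes, but if the file is encoded as UTF-16, you would have to replace \r\0\n\0 with \n\0. Something like that. - Mr Lister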
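
Mr Lister's UTF-16 point could be sketched as below. This is a rough illustration under stated assumptions: the text is UTF-16LE held as raw bytes in a std::string, any byte order mark is simply copied through, and only CR LF pairs are rewritten. The names utf16le_to_unix and check_utf16le are hypothetical. The loop steps through 2-byte code units instead of searching for the byte pattern, so a 0x0D byte that belongs to another character (for example U+010D, stored as 0D 01) is left alone.

```cpp
#include <cassert>
#include <string>

// Sketch: CR LF -> LF for UTF-16LE text held as raw bytes.
std::string utf16le_to_unix(const std::string& in) {
    std::string out;
    std::string::size_type i = 0;
    while (i + 1 < in.size()) {
        // A CR code unit is the byte pair 0D 00; an LF code unit is 0A 00.
        const bool is_cr = in[i] == '\r' && in[i + 1] == '\0';
        const bool lf_follows = i + 3 < in.size() &&
                                in[i + 2] == '\n' && in[i + 3] == '\0';
        if (is_cr && lf_follows) {
            i += 2;              // drop the CR; the LF is copied on the next pass
            continue;
        }
        out.append(in, i, 2);    // copy this code unit unchanged
        i += 2;
    }
    if (i < in.size()) out += in[i];  // odd trailing byte: copy it as-is
    return out;
}

// Self-check: "a\r\nb" encoded as UTF-16LE without a byte order mark.
void check_utf16le() {
    const std::string win("a\0\r\0\n\0b\0", 8);
    const std::string nix("a\0\n\0b\0", 6);
    assert(utf16le_to_unix(win) == nix);
}
```

The string literals are built with an explicit length because they contain embedded zero bytes.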
