Newlines in XDocument text nodes

4

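I'm trying to insert newlines into a text node using XText from the LINQ to XML (System.Xml.Linq) namespace.

I have a string that contains newline characters, but I need to work out how to turn them into the character entity (i.e. &#10;) rather than just having them appear as literal new lines in the XML.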

XElement element = new XElement( "NodeName" );
...

string example = "This is a string\nWith new lines in it\n";

element.Add( new XText( example ) );

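I then write the XElement out with an XmlTextWriter, and the resulting file contains the newline characters themselves rather than the entity replacement. Has anyone run into this and found a solution?
EDIT: The problem shows up when I load the XML into Excel, which does not seem to accept raw newline characters but does accept the entity replacement. As a result, no line breaks appear in Excel unless I replace the newlines with &#10;.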
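A minimal sketch for reproducing the behaviour described above in isolation. It assumes XmlWriter.Create over a StringWriter instead of the question's XmlTextWriter and file; NewlineRepro and SerializeWithDefaults are hypothetical names. With default writer settings the text comes out with real line breaks and no &#10;:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper reproducing the question's setup with default writer settings.
public static class NewlineRepro
{
    public static string SerializeWithDefaults(string text)
    {
        var element = new XElement("NodeName");
        element.Add(new XText(text));

        var sw = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sw, settings))
        {
            element.WriteTo(writer);
        }
        // With the default NewLineHandling.Replace, each "\n" is written as
        // Environment.NewLine, i.e. a real line break, never as &#10;.
        return sw.ToString();
    }
}
```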

Have you tried using XCData instead of XText? - sellmeadog
No, I haven't. I'll give it a try, but I predict Excel may not like it! - Nick
3 answers

4
A cheat:
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.CheckCharacters = false;
        settings.NewLineChars = "&#10;";
        XmlWriter writer = XmlWriter.Create(..., settings);
        element.WriteTo(writer);
        writer.Flush();

Update:

Full program:

using System;
using System.Xml;
using System.Xml.Linq;


namespace ConsoleApplication1
{
class Program
{
    static void Main(string[] args)
    {
        XElement element = new XElement( "NodeName" );
        string example = "This is a string\nWith new lines in it\n";
        element.Add( new XText( example ) );

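        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        // the writer rejects these NewLineChars unless character checking is off
        settings.CheckCharacters = false;
        // with the default NewLineHandling.Replace, newlines in text (and indentation) are emitted as this string
        settings.NewLineChars = "&#10;";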
        XmlWriter writer = XmlWriter.Create(Console.Out, settings);
        element.WriteTo(writer);
        writer.Flush();
    }
}
}

Output:

C:\Users\...\\ConsoleApplication1\bin\Release>ConsoleApplication1.exe
<?xml version="1.0" encoding="ibm850"?>&#10;<NodeName>This is a string&#10;With new lines in it&#10;</NodeName>
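The same trick can be packaged as a helper that returns a string, which makes the output easy to check. This is a sketch that additionally sets Indent = false and OmitXmlDeclaration = true (assumptions, made to avoid the extra &#10; that the output above shows after the XML declaration); the class name EntityNewlineWriter is hypothetical:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper wrapping the NewLineChars trick from the answer above.
public static class EntityNewlineWriter
{
    public static string Serialize(XElement element)
    {
        var settings = new XmlWriterSettings
        {
            // Needed so the writer accepts "&#10;" as NewLineChars.
            CheckCharacters = false,
            // Replace mode writes every "\n" in text content as NewLineChars, verbatim.
            NewLineHandling = NewLineHandling.Replace,
            NewLineChars = "&#10;",
            // No indentation and no declaration, so no other newlines are produced.
            Indent = false,
            OmitXmlDeclaration = true
        };

        var sw = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(sw, settings))
        {
            element.WriteTo(writer);
        }
        return sw.ToString();
    }
}
```

Turning off CheckCharacters is what makes the trick possible, but it also stops the writer from rejecting characters that are not legal in XML, so invalid input will no longer raise an error.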

It doesn't seem to replace the newlines in the text node. - Nick
Perfect. That solved the problem at hand. - Nick

1
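To any standard XML parser there is no difference between the character reference &#10; and a newline character: they are the same thing.
To illustrate, the following code shows that they parse identically: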
string s1 = "<root>Test&#10;Test2</root>";
string s2 = "<root>Test\nTest2</root>";

XDocument doc1 = XDocument.Parse(s1);
XDocument doc2 = XDocument.Parse(s2);

Console.WriteLine(doc1.ToString());
Console.WriteLine(doc2.ToString());
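The same comparison can be made as a check rather than by reading console output. A sketch using XNode.DeepEquals; the class name EntityEquivalence is hypothetical:

```csharp
using System.Xml.Linq;

public static class EntityEquivalence
{
    // True when the character reference and the literal newline
    // parse to the same tree and the same text value.
    public static bool SameTree()
    {
        XDocument doc1 = XDocument.Parse("<root>Test&#10;Test2</root>");
        XDocument doc2 = XDocument.Parse("<root>Test\nTest2</root>");
        return XNode.DeepEquals(doc1, doc2)
            && doc1.Root.Value == "Test\nTest2";
    }
}
```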

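OK, my problem is that Excel doesn't recognize the newline characters, but it does recognize the entity replacement. - Nick
Ah, I didn't know that. What format are you saving the files in before opening them in Excel? - samjudson
It's an XML format, described here: http://msdn.microsoft.com/en-us/library/aa140062(v=office.10).aspx - Nick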

1

It is the XmlTextWriter that escapes entities on output. So if you do this:

        using (XmlTextWriter w = new XmlTextWriter("test.xml", Encoding.UTF8))
        {
            w.WriteString("&#10;");
        }

then test.xml ends up containing an escaped ampersand, &amp;#10;, which is not what you want: you want the &#10; sequence to be written as is.
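The escaping is easy to check in memory. A sketch that writes to a StringWriter instead of test.xml and wraps the text in a root element (both assumptions, made so the output is self-contained); EscapingDemo is a hypothetical name:

```csharp
using System.IO;
using System.Xml;

public static class EscapingDemo
{
    public static string WriteEntityAsText()
    {
        var sw = new StringWriter();
        using (var w = new XmlTextWriter(sw))
        {
            w.WriteStartElement("root");
            // WriteString treats its argument as text, so '&' becomes "&amp;".
            w.WriteString("&#10;");
            w.WriteEndElement();
        }
        return sw.ToString();
    }
}
```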

My solution is a custom StreamWriter that detects escaped sequences such as &amp;#10; and writes them back unescaped:

    using System.IO;

    // A StreamWriter that does not escape &#10; characters
    public class NonXmlEscapingStreamWriter : StreamWriter
    {
        private const string AmpToken = "amp";
        private int _bufferState = 0; // used to keep state

        // add other ctors overloads if needed
        public NonXmlEscapingStreamWriter(string path)
            : base(path)
        {
        }

        // NOTE: this code assumes that StreamWriter only overrides these 4 Write methods,
        // which was true when it was written but could change (newer .NET versions add
        // further Write overloads, e.g. Write(ReadOnlySpan<char>)), and that XmlTextWriter
        // writes escaped values through a specific sequence of WriteXX calls
        public override void Write(char value)
        {
            if (value == '&')
            {
                if (_bufferState == 0)
                {
                    _bufferState++;
                    return; // hold it
                }
                else
                {
                    _bufferState = 0;
                }
            }
            else if (value == ';')
            {
                if (_bufferState > 1)
                {
                    _bufferState++;
                    return;
                }
                else
                {
                    Write('&'); // release what's been held
                    Write(AmpToken);
                    _bufferState = 0;
                }
            }
            else if (value == '\n') // detect non escaped \n
            {
                base.Write("&#10;");
                return;
            }
            base.Write(value);
        }

        public override void Write(string value)
        {
            if (_bufferState > 0)
            {
                if (value == AmpToken)
                {
                    _bufferState++;
                    return; // hold it
                }
                else
                {
                    Write('&'); // release what's been held
                    _bufferState = 0;
                }
            }
            base.Write(value);
        }

        public override void Write(char[] buffer, int index, int count)
        {
            if (_bufferState > 2)
            {
                _bufferState = 0;
                base.Write('&'); // release this anyway
                string replace;
                if ((buffer != null) && ((replace = GetReplaceLength(buffer, index, count)) != null))
                {
                    base.Write(replace);
                    base.Write(buffer, index + replace.Length, count - replace.Length);
                    return;
                }
                else
                {
                    base.Write(AmpToken); // release this
                    base.Write(';'); // release this
                }
            }
            base.Write(buffer, index, count);
        }

        public override void Write(char[] buffer)
        {
            Write(buffer, 0, buffer != null ? buffer.Length : 0);
        }

        private string GetReplaceLength(char[] buffer, int index, int count)
        {
            // this is specific to the 10 character but could be adapted
            const string token = "#10;";
            // the window [index, index + count) must hold the whole token
            if (count < token.Length)
                return null;

            // we test the char array to avoid string allocations
            for(int i = 0; i < token.Length; i++)
            {
                if (buffer[index + i] != token[i])
                    return null;
            }
            return token;
        }
    }

You can use it like this:

    using (XmlTextWriter w = new XmlTextWriter(new NonXmlEscapingStreamWriter("test.xml")))
    {
        element.WriteTo(w);
    }

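Note: although the writer also catches a bare \n written through Write(char), I recommend making sure every \n in the source text is already escaped, i.e. replace \n with &#10; before you write the XML, like this:

string example = "This is a string&#10;With new lines in it&#10;";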
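That pre-processing step is a plain string replacement. A sketch, assuming you also want Windows-style \r\n pairs and lone \r collapsed first so each line break yields exactly one entity; NewlinePreprocessing and EscapeNewlines are hypothetical names:

```csharp
public static class NewlinePreprocessing
{
    public static string EscapeNewlines(string text)
    {
        // Normalize CRLF and lone CR to LF, then turn every LF into the entity text.
        return text.Replace("\r\n", "\n")
                   .Replace('\r', '\n')
                   .Replace("\n", "&#10;");
    }
}
```

This only produces the literal text &#10;; written through an ordinary XmlTextWriter it would still come out as &amp;#10;, which is what the NonXmlEscapingStreamWriter above is meant to undo.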

Page content provided by Stack Overflow.