Parsing a text file with literals in C#

I'm having trouble parsing a text file that contains literals.

The literals I'm having trouble with are:

"\(" is an opening parenthesis "(" and "\)" is a closing parenthesis ")"

Here is a sample of the text file I'm parsing:

BT /F1 9 Tf 53.8646616541353 441 Td ( Voucher  AADA    Trans.      Prods               CDE                TRX                                   Payment) Tj ET
BT /F1 9 Tf 53.8646616541353 432 Td ( Number    Num    Date    WH   ID   Name     Name  #  Year  Inv. #    CD  Due Date    Qty   Price  Disct      %     Amount Due) Tj ET
BT /F1 9 Tf 53.8646616541353 423 Td (--------- ---- ---------- -- ------ -------- ----- -- ---- ---------- -- ---------- ------ ------- ------- ------- ------------) Tj ET
BT /F1 9 Tf 53.8646616541353 414 Td ( 21812539      09/30/2015 NA  29264 Symante  SUMME 52 2015 1735247    RM 09/30/2015      2  $15.00 50.0000  100.0%       15.00 ) Tj ET
BT /F1 9 Tf 53.8646616541353 405 Td ( 21827266      10/01/2015 NA  29264 Symante  SUMME 52 2015 1735966    RE 10/01/2015      1  $15.00 50.0000  100.0%       \(7.50\)) Tj ET
BT /F1 9 Tf 53.8646616541353 396 Td ( 21832628      10/02/2015 NA  29264 Symante  SUMME 52 2015 1736174    RM 10/02/2015      1  $15.00 50.0000  100.0%        7.50 ) Tj ET
BT /F1 9 Tf 53.8646616541353 387 Td ( 21838251      10/02/2015 NA  29264 Symante  SUMME 52 2015 1736429    RE 10/02/2015      1  $15.00 50.0000  100.0%       \(7.50\)) Tj ET
BT /F1 9 Tf 53.8646616541353 378 Td ( 21841821      10/03/2015 NA  29264 Symante  SUMME 52 2015 1736583    RM 10/03/2015      1  $15.00 50.0000  100.0%        7.50 ) Tj ET
BT /F1 9 Tf 53.8646616541353 369 Td ( 21874851      10/08/2015 NA  29264 Symante  SUMME 52 2015 1738192    RE 10/08/2015      1  $15.00 50.0000  100.0%       \(7.50\)) Tj ET
BT /F1 9 Tf 53.8646616541353 360 Td ( 21879328      10/09/2015 NA  29264 Symante  SUMME 52 2015 1738389    RM 10/09/2015      1  $15.00 50.0000  100.0%        7.50 ) Tj ET
BT /F1 9 Tf 53.8646616541353 351 Td ( 21933007      10/16/2015 NA  29264 Symante  SUMME 52 2015 0000531968 SK 10/16/2015      1  $15.00 50.0000  100.0%       \(7.50\)) Tj ET
BT /F1 9 Tf 53.8646616541353 342 Td (                                                                                                                  -------------) Tj ET
BT /F1 9 Tf 53.8646616541353 333 Td (                                                                                           Sub Total:               \($1,650.00\)) Tj ET
BT /F1 9 Tf 53.8646616541353 324 Td (                                                                                                                  -------------) Tj ET
BT /F1 9 Tf 53.8646616541353 315 Td ( 21827466      10/02/2015 NA  57629                        0000531284 PO 10/02/2015      0                  100.0%    \(1500.00\)) Tj ET
BT /F1 9 Tf 53.8646616541353 306 Td (                                                                                                                  -------------) Tj ET
BT /F1 9 Tf 53.8646616541353 297 Td (                                                                                           Sub Total:               \($1,500.00\)) Tj ET
BT /F1 9 Tf 53.8646616541353 288 Td (                                                                                                                  -------------) Tj ET
BT /F1 9 Tf 53.8646616541353 279 Td ( 21663952      09/02/2015 SN  57629 Zeal \(I\) 61-SE 61 2015 0000529704 IN 11/01/2015   2443  $14.95 50.0000  100.0%    11111.43 ) Tj ET
BT /F1 9 Tf 53.8646616541353 270 Td ( 21663953      09/02/2015 SN  57629 Zeal \(I\) 61-SE 61 2015 0000529704 SP 11/01/2015   2443  $14.95 50.0000  100.0%     \(200.33\)) Tj ET
BT /F1 9 Tf 53.8646616541353 261 Td ( 21699656      09/09/2015 S2  57629 Zeal \(I\) 61-SE 61 2015 0000530025 IN 11/08/2015    449  $14.95 50.0000  100.0%     1156.28 ) Tj ET
BT /F1 9 Tf 53.8646616541353 252 Td ( 21699657      09/09/2015 S2  57629 Zeal \(I\) 61-SE 61 2015 0000530025 SP 11/08/2015    449  $14.95 50.0000  100.0%      \(36.82\)) Tj ET
BT /F1 9 Tf 53.8646616541353 243 Td ( 21699658      09/09/2015 SL  57629 Zeal \(I\) 61-SE 61 2015 0000530025 IN 11/08/2015   1320  $14.95 50.0000  100.0%     1111.00 ) Tj ET
BT /F1 9 Tf 53.8646616541353 234 Td ( 21699659      09/09/2015 SL  57629 Zeal \(I\) 61-SE 61 2015 0000530025 SP 11/08/2015   1320  $14.95 50.0000  100.0%     \(108.24\)) Tj ET
BT /F1 9 Tf 53.8646616541353 225 Td ( 21736996      09/16/2015 S1  57629 Zeal \(I\) 61-SE 61 2015 0000530390 IN 11/15/2015   1016  $14.95 50.0000  100.0%     1111.60 ) Tj ET
BT /F1 9 Tf 53.8646616541353 216 Td ( 21736997      09/16/2015 S1  57629 Zeal \(I\) 61-SE 61 2015 0000530390 SP 11/15/2015   1016  $14.95 50.0000  100.0%      \(83.31\)) Tj ET
BT /F1 9 Tf 53.8646616541353 207 Td ( 21808378      09/29/2015 NA  57629 Zeal \(I\) 61-SE 61 2015 1735086    RE 09/29/2015      8  $14.95 50.0000  100.0%      \(59.80\)) Tj ET
BT /F1 9 Tf 53.8646616541353 198 Td ( 21838252      10/02/2015 NA  57629 Zeal \(I\) 61-SE 61 2015 1736429    RE 10/02/2015      1  $14.95 50.0000  100.0%       \(7.48\)) Tj ET
BT /F1 9 Tf 53.8646616541353 189 Td ( 21874852      10/08/2015 NA  57629 Zeal \(I\) 61-SE 61 2015 1738192    RE 10/08/2015      4  $14.95 50.0000  100.0%      \(29.90\)) Tj ET
BT /F1 9 Tf 53.8646616541353 180 Td (  

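If you look at line 20, the product name is Zeal (I), which appears in the file as Zeal \(I\). Negative amounts in the last column (Amount Due) are also enclosed in parentheses.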
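Since the negative amounts use accounting-style parentheses, they can be parsed directly rather than stripped. A minimal sketch using .NET's NumberStyles.Currency, which permits parentheses (read as a minus sign), a currency symbol, thousands separators and surrounding white space. It assumes US-style formatting ("$", "," for thousands, "." for decimals) and input that has already been unescaped, i.e. "(7.50)" rather than "\(7.50\)". Amounts and ParseAmount are hypothetical names.

```csharp
using System.Globalization;

public static class Amounts
{
    // Invariant number format with "$" as the currency symbol, so parsing does
    // not depend on the culture of the machine running the code (assumption:
    // the report always uses US-style amounts).
    private static readonly NumberFormatInfo Format = CreateFormat();

    private static NumberFormatInfo CreateFormat()
    {
        var nfi = (NumberFormatInfo)NumberFormatInfo.InvariantInfo.Clone();
        nfi.CurrencySymbol = "$";
        return nfi;
    }

    // NumberStyles.Currency allows parentheses (meaning negative), a currency
    // symbol, thousands separators and leading/trailing white space.
    public static decimal ParseAmount(string text) =>
        decimal.Parse(text, NumberStyles.Currency, Format);

    // Non-throwing variant for columns that may be blank or malformed.
    public static bool TryParseAmount(string text, out decimal value) =>
        decimal.TryParse(text, NumberStyles.Currency, Format, out value);
}
```

For example, ParseAmount("($1,650.00)") gives -1650.00 and ParseAmount("      15.00 ") gives 15.00, so the sign no longer has to be reconstructed by hand.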
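I'm parsing the text file line by line, but when I try
line.Replace(@"\(", "");

it doesn't seem to work. I've never run into these literals in a file before, so I'm not sure how to handle them. Apart from this, I'm almost done with the parsing.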
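The escapes come from the file format: every line is a PDF text-showing sequence (BT ... Td (...) Tj ET), and inside a PDF literal string a backslash escapes "(", ")" and "\" itself. A minimal sketch, assuming those three escapes are the only ones that matter for this file (the PDF format defines others, such as "\n" and octal "\ddd", which this sketch keeps verbatim): it returns the text after "Td (" up to the string's closing parenthesis, turning "\(" into "(" and "\)" into ")". PdfText and ExtractLiteral are hypothetical names.

```csharp
using System;
using System.Text;

public static class PdfText
{
    // Hypothetical helper: returns the contents of the first literal string
    // after "Td (", with \( \) and \\ unescaped, or null if there is no "Td (".
    public static string ExtractLiteral(string line)
    {
        int start = line.IndexOf("Td (", StringComparison.Ordinal);
        if (start < 0) return null;

        var sb = new StringBuilder();
        int depth = 0; // PDF also allows balanced, unescaped parentheses
        for (int i = start + 4; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '\\' && i + 1 < line.Length)
            {
                char next = line[++i];
                if (next == '(' || next == ')' || next == '\\')
                    sb.Append(next);              // \( -> (, \) -> ), \\ -> \
                else
                    sb.Append('\\').Append(next); // other escapes kept verbatim
            }
            else if (c == '(')
            {
                depth++;
                sb.Append(c);
            }
            else if (c == ')')
            {
                if (depth == 0) return sb.ToString(); // end of the literal string
                depth--;
                sb.Append(c);
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString(); // unterminated string, e.g. a truncated last line
    }
}
```

Because the escapes are replaced by the characters they stand for (not deleted), the result is the text the PDF actually displays, e.g. "Zeal (I) 61-SE" and "(7.50)".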

The way I'm doing it is very straightforward:

                string line;
                int count = 0; // to be removed. Used in testing to cap count.
                while ((line = reader.ReadLine()) != null)
                {
                    if (count <= 10)
                    {
                        if (line.Length > 170 && line.Length < 200)
                        {
                            if (!ContainsAny(line))
                            {

                                line.Replace(@"\(", "");

                                indexStart = line.IndexOf("Td (") + 4;

                                col0 = line.Substring(indexStart, 9);
                                col1 = line.Substring(indexStart + 10, 4);
                                col2 = line.Substring(indexStart + 15, 10);
                                col3 = line.Substring(indexStart + 26, 2);
                                col4 = line.Substring(indexStart + 29, 6);
                                col5 = line.Substring(indexStart + 36, 8);
                                col6 = line.Substring(indexStart + 45, 5);
                                col7 = line.Substring(indexStart + 51, 2);
                                col8 = line.Substring(indexStart + 54, 4);
                                col9 = line.Substring(indexStart + 59, 10);
                                col10 = line.Substring(indexStart + 70, 2);
                                col11 = line.Substring(indexStart + 73, 10);
                                col12 = line.Substring(indexStart + 84, 6);
                                col13 = line.Substring(indexStart + 91, 7).Replace("$", "");
                                col14 = line.Substring(indexStart + 99, 7);
                                col15 = line.Substring(indexStart + 107, 7).Replace("%", "");
                                col16 = line.Substring(indexStart + 115, 12);

                                MessageBox.Show(string.Format("{0}; {1}; {2}; {3}; {4}; {5}; {6}; {7}; {8}; {9}; {10}; {11}; {12}; {13}; {14}; {15}; {16};", col0, col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11, col12, col13, col14, col15, col16));


                                //writer.WriteLine(lineOut);


                                count += 1; // to be removed. Used in testing to cap count.
                            }
                        }
                    }
                }
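The 17 hard-coded Substring calls above can be driven from a table instead. A minimal sketch, assuming each row has already been unescaped (so Zeal \(I\) reads Zeal (I), which in the sample rows exactly fills the 8-character product-name column at offset 36). The offsets are copied from the question and are relative to the first character after "Td ("; FixedWidth and VoucherColumns are hypothetical names.

```csharp
using System;
using System.Linq;

public static class FixedWidth
{
    // (start, length) pairs taken from the Substring calls in the question.
    public static readonly (int Start, int Length)[] VoucherColumns =
    {
        (0, 9), (10, 4), (15, 10), (26, 2), (29, 6), (36, 8), (45, 5), (51, 2),
        (54, 4), (59, 10), (70, 2), (73, 10), (84, 6), (91, 7), (99, 7),
        (107, 7), (115, 12)
    };

    // Slices a row into trimmed fields. The row is padded first so a short row
    // yields empty fields instead of throwing ArgumentOutOfRangeException.
    public static string[] Split(string row, (int Start, int Length)[] columns)
    {
        int width = columns.Max(c => c.Start + c.Length);
        string padded = row.PadRight(width);
        return columns
            .Select(c => padded.Substring(c.Start, c.Length).Trim())
            .ToArray();
    }
}
```

Each field comes back trimmed, so the amount column can be handed straight to a decimal parser; adding or moving a column only means editing the table.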

This is what I get when I write the results to a file:
21841821             10/03/2015  NA 29264   Symante  SUMME  52  2015     1736583     RM  10/03/2015 1   15  50  100 7.5
21874851             10/08/2015  NA 29264   Symante  SUMME  52  2015    1738192  RE  10/08/2015 1   15  50  100 -7.5
21879328             10/09/2015  NA 29264   Symante  SUMME  52  2015    1738389  RM  10/09/2015 1   15  50  100 7.5
21933007             10/16/2015  NA 29264   Symante  SUMME  52  2015    531968   SK  10/16/2015 1   15  50  100 -7.5
21827466             10/02/2015  NA 57629                                   531284   PO  10/02/2015 0                           100 -4500
21663952             09/02/2015  SN 57629    Zeal \(I    ) 61-   E   1 20    5 00005297 4    N 11/01/20  5   24  3  14.  5 50.00     0  100.    18261.40%
21663953             09/02/2015  SN 57629    Zeal \(I    ) 61-   E   1 20    5 00005297 4    P 11/01/20  5   24  3  14.  5 50.00     0  100.    -200.00%
21699656             09/09/2015  S2 57629    Zeal \(I    ) 61-   E   1 20    5 00005300 5    N 11/08/20  5    4  9  14.  5 50.00     0  100.    3356.20%

Try this: line = line.Substring(line.IndexOf(@"Td (") + 4).Replace(@"\"," ").Replace(@")","").Replace(@",","").Replace(@"(","").Replace(@"%"," ").Replace(@"$"," "); and you'll get a formatted string that you can parse by position. - bansi
3 Answers

line.Replace(@"\(", ""); does not modify the original string; it only returns a new, modified string. The correct way to write it is:

line = line.Replace(@"\(", "");

From the String.Replace documentation:

Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string.
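A short illustration of the point above: because .NET strings are immutable, the bare call changes nothing and only the assigned result carries the replacement. The sketch also replaces "\(" with "(" rather than with an empty string; in the sample rows this reproduces the text the PDF actually displays, which appears to be what the fixed column offsets were laid out against. ReplaceDemo is a hypothetical name.

```csharp
public static class ReplaceDemo
{
    // Returns the line after the discarded call and after the assigned call.
    public static (string Before, string After) Run()
    {
        string line = @"Zeal \(I\) 61-SE";

        line.Replace(@"\(", "(");          // result discarded: line is unchanged
        string before = line;

        line = line.Replace(@"\(", "(")    // assign the returned string
                   .Replace(@"\)", ")");   // unescape instead of deleting
        return (before, line);
    }
}
```

Run() returns the untouched escaped text as Before and "Zeal (I) 61-SE" as After.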


Oops. Can't believe I missed that. Silly me. - Saif Khan

You need to use:

line=line.Replace(@"\(", "");

It looks like you're writing a lot more code than you actually need.

        using System.IO;
        using System.Linq;

        var allLines = File.ReadAllLines(@"C:\myfile.text");
        // Strip the escaped parentheses from every line
        var correctedLines = allLines.Select(l => l.Replace(@"\(", "").Replace(@"\)", ""));
        //now use corrected lines in your code
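A variant of the same idea, assuming the file follows the layout in the question: File.ReadLines streams lines lazily instead of loading the whole file like File.ReadAllLines, and mapping "\(" and "\)" to "(" and ")" (rather than deleting them) keeps the column positions intact. VoucherFile and ReadUnescaped are hypothetical names. The chained Replace ignores the "\\" escape, which does not occur in the sample, so a line containing "\\)" would be unescaped incorrectly.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class VoucherFile
{
    // Streams the file line by line (File.ReadLines is lazy) and unescapes
    // \( and \) so each line matches the text the PDF displays.
    public static IEnumerable<string> ReadUnescaped(string path) =>
        File.ReadLines(path)
            .Select(l => l.Replace(@"\(", "(").Replace(@"\)", ")"));
}
```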
