How can I speed up Word Interop processing?

5
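I'm very new to C#, and the code I write is fairly clumsy. I've taken a lot of online courses, and many of them say there are several ways to solve a problem. I've now written a program that loads a .doc Word file and uses if statements to find the relevant information.
My problem is that the program takes a very long time: the code below needs roughly 30 minutes to 1 hour to finish.
Any good ideas for making my little program less clunky? I hope the solutions will greatly increase my knowledge, so thanks in advance, everyone!
Regards, Chris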
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace WindowsFormsApplication3
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
        public int id = 0;
        public int[] iD = new int[100];
        public string[] timeOn = new string[100];
        public string[] timeOff = new string[100];
        public string[] dutyNo = new string[100];
        public string[] day = new string[100];

        private void button1_Click(object sender, EventArgs e)
        {



            Microsoft.Office.Interop.Word.Application application = new Microsoft.Office.Interop.Word.Application();
            Microsoft.Office.Interop.Word.Document document = application.Documents.Open("c:\\Users\\Alien\\Desktop\\TESTJOBS.doc");
            //the following for will loop for all words

            int count = document.Words.Count;
            for (int i = 1; i <= count; i++)
            {
                // the following if statement looks for the word "On",
                // which in the file is followed by a time such as 04:00 (thus words i+2/3/4)
                if (document.Words[i].Text == "On")
                {
                    iD[id] = id;
                   // Console.WriteLine("ID Number ={0}", iD[id]);
                    dutyNo[id] = document.Words[i - 14].Text;
                   // Console.WriteLine("duty No set to:{0}", dutyNo[id]);
                    timeOn[id] = document.Words[i + 2].Text + document.Words[i + 3].Text + document.Words[i + 4].Text;
                   // Console.WriteLine("on time set to:{0}", timeOn[id]);
                    // the else-if below runs when the current word is not "On" and looks for "Off", which follows "On" in the file format;
                    // "Off" is likewise followed by a time such as 04:00 (thus words i+2/3/4)
                }
                else if (document.Words[i].Text == "Off")
                {
                    timeOff[id] = document.Words[i + 2].Text + document.Words[i + 3].Text + document.Words[i + 4].Text;
                    //Console.WriteLine("off time set to:{0}", timeOff[id]);
                    // the else-if below runs when the current word is neither "On" nor "Off" and looks for "Days", which follows "Off" in the file format;
                    // the occurrence where "Type" sits at i+3 is skipped, and the day itself is the word at i+2
                }
                else if (document.Words[i].Text == "Days" && !(document.Words[i + 3].Text == "Type"))
                {

                    day[id] = document.Words[i + 2].Text;
                    //Console.WriteLine("day set to:{0}", day[id]);
                    //we then print the whole new duty out to ListBox1
                    listBox1.Items.Add(string.Format("new duty ID:{0} Time on:{1} Time off:{2} Duty No:{3} Day:{4}", iD[id], timeOn[id], timeOff[id], dutyNo[id], day[id]));
                    id++;
                }


            }

            // Print only the duties that were actually filled in (indices 0..id-1), indexing by i rather than id
            for (int i = 0; i < id; i++)
            {
                Console.WriteLine("new duty ID:{0} Time on:{1} Time off:{2} Duty No:{3} Day:{4}", iD[i], timeOn[i], timeOff[i], dutyNo[i], day[i]);
            }


        }
    }
}

2
That's what happens when you open documents from aliens... (Open("c:\Users\Alien\Desktop\TESTJOBS.doc")) - Mihai-Daniel Virna
1
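Have you added timing diagnostics (the Stopwatch class) to your code to see which parts take the time? - ChrisF
I haven't, and I'll definitely look into it, though I can guess roughly what the cause is. As I see it, the program loops over every word (including spaces) and checks each one against the if statements, so a full cycle to one successful word takes a long time, about 20-30 seconds. - Chris_livermore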
1
First learn how to profile your program; that will tell you why it runs slowly. Without that, how could you possibly know what to fix? - Lex Li
1
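Apart from the fragility of the code (no error handling, expecting data in a specific format, and so on), you only make a single pass over all the words in the document. That should be very fast. It looks like something hidden (and evidently very expensive) is going on in the interop layer. For reference, you can perform millions of string comparisons/ifs per second; the structure of the code itself is not why it's slow. - Cameron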
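The timing diagnostics suggested in these comments can be sketched with the Stopwatch class. This is only a minimal sketch, not the asker's code: `TimeIt` and the dummy workload are hypothetical names introduced for illustration, and in the real program the timed action would be one phase, such as the loop that reads `document.Words[i].Text`.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper: runs an action and returns how long it took.
    public static TimeSpan TimeIt(Action work)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        work();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        // Stand-in workload (an assumption); replace it with the phase to measure.
        TimeSpan elapsed = TimeIt(() =>
        {
            long sum = 0;
            for (int i = 0; i < 1000000; i++) sum += i;
        });
        Console.WriteLine("Phase took {0} ms", elapsed.TotalMilliseconds);
    }
}
```

Timing the interop reads and the string comparisons separately would show which of the two dominates.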
4 Answers

3

Office Interop is rather slow.

OpenXML would probably be faster, but the file is a .doc, so it probably can't handle it.


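But just as with Excel in this question, there is a way to improve performance: don't access each word in the Range by index, because as far as I know each access creates a separate Range instance wrapped in an RCW, which is a prime candidate for the application's performance bottleneck.

That means your best bet is to load all the words (.Text) into some indexable collection of String before the actual processing, and then use only that collection to produce the output.

How to do it the fastest way? I'm not sure, but you can try getting all the words from the _Document.Words enumerator (it may or may not be more efficient, but at least you will see how long it takes just to retrieve the required words):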
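// Requires: using System.Linq; using Microsoft.Office.Interop.Word;
var words = document.Words   // enumerate the Words collection, not the document itself
    .Cast<Range>()
    .Select(r => 
        r.Text)
    .ToList();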

Or you can try using the Text of the _Document.Content range, but then you'll need to separate the individual words yourself.
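The second option, reading the whole text once and splitting it in memory, can be sketched as follows. This is a sketch under stated assumptions rather than a tested Interop solution: the sample string stands in for `document.Content.Text`, the helpers `Tokenize` and `TokensAfter` are hypothetical names, and the separator list (including `'\a'` as a table-cell marker) is an assumption about how Word delimits text.

```csharp
using System;
using System.Collections.Generic;

public static class ContentSplitSketch
{
    // Characters treated as word separators. '\r' ends a paragraph in Word text;
    // '\a' is assumed here to mark the end of a table cell.
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n', '\a' };

    // Hypothetical helper: split the whole document text into tokens once.
    public static string[] Tokenize(string contentText)
    {
        return contentText.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }

    // Hypothetical helper: collect the token that follows each occurrence of a keyword.
    public static List<string> TokensAfter(string[] tokens, string keyword)
    {
        var result = new List<string>();
        for (int i = 0; i + 1 < tokens.Length; i++)
        {
            if (tokens[i] == keyword)
            {
                result.Add(tokens[i + 1]);
            }
        }
        return result;
    }

    public static void Main()
    {
        // Made-up sample standing in for document.Content.Text.
        string contentText = "TG - 101\tOn 04:00 Off 12:30\rHrs 08:30 Days Mon\r";
        string[] tokens = Tokenize(contentText);
        foreach (string time in TokensAfter(tokens, "On"))
        {
            Console.WriteLine("On time: " + time);
        }
    }
}
```

Because the tokens live in an ordinary array, looking at neighbours such as `tokens[i + 1]` costs no further COM calls. Note that the offsets differ from those in `document.Words`, which also splits punctuation such as the colon in a time into separate words.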


Thank you, Eugene. I'll start working on these changes tomorrow. - Chris_livermore

1

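OK, this is now done: we process all the information as before and still import the whole document. The total run time is 02:09.8 for 2,780 sentences and about 44,000 words, including spaces. Below is my (imperfect) code; considering I only started learning C# two weeks ago, it's not bad. I hope it helps someone in the future.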

    public Form1()
    {
        InitializeComponent();
    }
    public int id = 0;
    public int[] iD = new int[400];
    public string[] timeOn = new string[400];
    public string[] timeOff = new string[400];
    public string[] dutyNo = new string[400];
    public string[] day = new string[400];
    public string[] hours = new string[400];

    //Create File Location Var
    public string fileLocation = null;

    // On click of Add Duties
    private void button1_Click(object sender, EventArgs e)
    {
        //Sets Progress Bar visible and prepares to increment
        pBar1.Visible = true;
        pBar1.Minimum = 1;
        pBar1.Value = 1;
        pBar1.Step = 1;


        //Stopwatch test Declared
        Stopwatch stopWatch = new Stopwatch();

        try {
            //Self Test to see if a File Location has been set for Duty document.
            if (fileLocation == null) {
                //If not set prompts user with message box and brings up file explorer
                MessageBox.Show("It Appears that a file location has not yet been set, Please Select one now.");
                Stream myStream = null;
                OpenFileDialog openFileDialog1 = new OpenFileDialog();
                //Sets default Location and Default File type as .doc
                openFileDialog1.InitialDirectory = "c:\\";
                openFileDialog1.Filter = "All files (*.*)|*.*|Word Files (*.doc)|*.doc";
                openFileDialog1.FilterIndex = 2;
                openFileDialog1.RestoreDirectory = true;
                //Waits for User to Click ok in File explorer and then Sets file location to var
                if (openFileDialog1.ShowDialog() == DialogResult.OK)
                {
                    try
                    {
                        //Checks to make sure a file location is set
                        if ((myStream = openFileDialog1.OpenFile()) != null)
                        {
                            using (myStream)
                            {
                                //This is where we set location to var
                                fileLocation = openFileDialog1.FileName;
                            }
                            //Prompts user to click a file before OK
                        }else { MessageBox.Show("Please Select a file location before clicking ok"); }
                    }
                    catch (Exception ex)
                    {
                        MessageBox.Show("Error: Could not read file from disk: " + ex.Message);
                    }
                }
            }

           //Loads New Duty file 
            Microsoft.Office.Interop.Word.Application application = new Microsoft.Office.Interop.Word.Application();
            Microsoft.Office.Interop.Word.Document document = application.Documents.Open(fileLocation);
            //Begin stop watch (COPY TIME)
            stopWatch.Start();

            //Sets count to the number of sentences and then sizes the array accordingly
            //**This reduces processing time by reading everything into the program first and only then working on it.
            int count = document.Sentences.Count;
            string[] sents = new string[count];
            //Then sets the Progress bar to the Number of sentences that will be Copied to our array
            pBar1.Maximum = count;

            try {
                //For loop runs through every sentence and adds it to the array.
                for (int i = 0; i < count; i++) {
                    sents[i] = document.Sentences[i+1].Text;
                    //increment Progress bar by 1 for every sentence(Parse made)
                    pBar1.PerformStep();
                }
                //Closes our instance of word
                application.Quit();
                try {

                    for (int i = 0; i < count; i++)
                    {
                        //Sets our Split criteria 
                        char[] delimiterChars = { ' ','\t' };
                        string[] test = (sents[i].Split(delimiterChars));
                        //we then enter a for loop that runs once per word found by the split
                        for (int a = 0; a < test.Length; a++)
                        {  
                            //Only test tokens that are not empty and contain no tab, hyphen, or space, since such tokens do come through into our test array
                            if (!(test[a] == "" || test[a].Contains("\t")|| test[a].Contains("-") || test[a].Contains(" ")))
                            {
                                //If tests find the duty number, hours, and on/off times, and assign an ID number for easy indexing.
                                //##THIS ASSUMES THAT EACH VALUE IS EXACTLY ONE TOKEN AFTER ITS IDENTIFIER (TWO FOR "TG").
                                if (test[a] == "TG")
                                {
                                    dutyNo[id] = test[a + 2]; 
                                }
                                else if (test[a] == "On")
                                {
                                    iD[id] = id;
                                    timeOn[id] = test[a + 1];
                                }
                                else if (test[a] == "Off")
                                {
                                    timeOff[id] = test[a + 1];
                                }
                                else if (test[a] == "Hrs")
                                {
                                    hours[id] = test[a + 1];
                                }
                                else if (test[a] == "Days")
                                {
                                    day[id] = test[a + 1];
                                    //PRINTS ALL THE DUTIES ADDED TO THE USER VIA THE LIST BOX.
                                    listBox1.Items.Add(string.Format("ADDED:Duty No:{3} Time on:{1} Time off:{2} Hours:{5} Day:{4} ID:{0}", iD[id], timeOn[id], timeOff[id], dutyNo[id], day[id], hours[id]));
                                    id++;
                                }

                            }
                        }
                    }
                }
                catch(Exception ex) { MessageBox.Show("Error in split:" + ex.Message); }
            }
            catch(Exception ex) { MessageBox.Show("error setting string to Document:" + ex.Message); }
            //Stopwatch Is then printed for testing purposes.
            TimeSpan ts = stopWatch.Elapsed;
            string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}", ts.Hours, ts.Minutes, ts.Seconds,
            ts.Milliseconds / 10);
            Console.WriteLine("RunTime (total):" + elapsedTime);

            stopWatch.Reset();

        }
        catch(Exception ex) { MessageBox.Show("Error in reading/finding file: "+ ex.Message); }

    }


}

}

I use this code with a fairly large list box (ListBox1), a button (Button1), and a progress bar (pBar1) that is invisible at startup.


Your hope is coming true. Thanks. - Kanat

1
You can load the entire .Content range using OpenXml, process it, and then re-import it.

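But wouldn't that require parsing the XML itself? And in doing so you'd also need a deep understanding of the XML structure to get at the cells and content you need, so you couldn't use all the nice features of Word Interop, right? Am I missing something? - Matthew Kligerman
I once had the same problem. Writing one table with Word Interop took 8 seconds, whereas with OpenXML I could write 300 tables in 17 seconds! Thanks for the suggestion! - Fil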

0

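Instead of using:

document.Words[i].Text

multiple times, use:

String Text = document.Words[i].Text;

at the top of the for loop, and use "Text" (or whatever name you like) in its place. Eugene Podskal's suggestion seems very helpful, but this simple improvement (which I thought of before seeing Eugene's reply) is very easy to implement and can bring a substantial improvement.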
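The effect of reading the word once per iteration can be illustrated without Word at all. In this sketch `CountingWords` is a hypothetical stand-in for `document.Words` that merely counts fetches; in the real program each fetch would be a `document.Words[i].Text` call across the COM boundary. It only covers repeated reads of the same index, not lookups at other offsets such as `i + 2`.

```csharp
using System;

public static class CacheLocalSketch
{
    // Hypothetical stand-in for document.Words: counts how often a word is fetched.
    public sealed class CountingWords
    {
        private readonly string[] words;
        public int Fetches { get; private set; }
        public CountingWords(string[] words) { this.words = words; }
        public int Count { get { return words.Length; } }
        public string TextAt(int index) { Fetches++; return words[index]; }
    }

    // Fetches the current word once per iteration and compares the local copy.
    public static int CountKeywords(CountingWords source)
    {
        int hits = 0;
        for (int i = 0; i < source.Count; i++)
        {
            string text = source.TextAt(i); // single fetch per iteration
            if (text == "On") hits++;
            else if (text == "Off") hits++;
            else if (text == "Days") hits++;
        }
        return hits;
    }

    public static void Main()
    {
        var source = new CountingWords(new[] { "On", "04:00", "Off", "12:30", "Days", "Mon" });
        int hits = CountKeywords(source);
        Console.WriteLine("{0} keywords found with {1} fetches", hits, source.Fetches);
    }
}
```

With the local variable the loop fetches each word once, whereas the original chain of `if`/`else if` conditions fetches a non-matching word up to three times.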

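I don't think this will work, because I'm using Words[i] and stepping up and down from it inside the if statements to find the specific positions of the information I need. Correct me if I'm wrong! - Chris_livermore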

Content provided by Stack Overflow.