快速将Excel导入DataTable

34

我正在尝试将一个Excel文件读入到Data.DataTable的列表中,但是使用我的当前方法可能需要很长时间。我基本上是逐个Worksheet地进行处理,逐个单元格进行处理,这往往需要很长时间。有没有更快的方法来做到这一点?这是我的代码:

    List<DataTable> List = new List<DataTable>();

    // Counting sheets
    for (int count = 1; count < WB.Worksheets.Count; ++count)
    {
        // Create a new DataTable for every Worksheet
        DATA.DataTable DT = new DataTable();

        WS = (EXCEL.Worksheet)WB.Worksheets.get_Item(count);

        textBox1.Text = count.ToString();

        // Get range of the worksheet
        Range = WS.UsedRange;


        // Create new Column in DataTable
        for (cCnt = 1; cCnt <= Range.Columns.Count; cCnt++)
        {
            textBox3.Text = cCnt.ToString();


                Column = new DataColumn();
                Column.DataType = System.Type.GetType("System.String");
                Column.ColumnName = cCnt.ToString();
                DT.Columns.Add(Column);

            // Create row for Data Table
            for (rCnt = 0; rCnt <= Range.Rows.Count; rCnt++)
            {
                textBox2.Text = rCnt.ToString();

                try
                {
                    cellVal = (string)(Range.Cells[rCnt, cCnt] as EXCEL.Range).Value2;
                }
                catch (Microsoft.CSharp.RuntimeBinder.RuntimeBinderException)
                {
                    ConvertVal = (double)(Range.Cells[rCnt, cCnt] as EXCEL.Range).Value2;
                    cellVal = ConvertVal.ToString();
                }

                // Add to the DataTable
                if (cCnt == 1)
                {

                    Row = DT.NewRow();
                    Row[cCnt.ToString()] = cellVal;
                    DT.Rows.Add(Row);
                }
                else
                {

                    Row = DT.Rows[rCnt];
                    Row[cCnt.ToString()] = cellVal;

                }
            }
        }
        // Add DT to the list. Then go to the next sheet in the Excel Workbook
        List.Add(DT);
    }

1
有没有更快的方法?不幸的是没有。这段代码为每个Excel单元格值创建(并错误地未处理)一个COM对象。这是最慢的方法!最好的方法是一次性将整个工作表读入数组中,然后迭代该数组中的项。 - Mike Gledhill
7个回答

27

调用 .Value2 是一项昂贵的操作,因为它涉及到 COM-interop 调用。相反,我建议将整个范围读入数组中,然后循环遍历数组:

object[,] data = Range.Value2;

// Create new Column in DataTable
for (int cCnt = 1; cCnt <= Range.Columns.Count; cCnt++)
{
    textBox3.Text = cCnt.ToString();

    var Column = new DataColumn();
    Column.DataType = System.Type.GetType("System.String");
    Column.ColumnName = cCnt.ToString();
    DT.Columns.Add(Column);

    // Create row for Data Table
    for (int rCnt = 1; rCnt <= Range.Rows.Count; rCnt++)
    {
        textBox2.Text = rCnt.ToString();

        string CellVal = String.Empty;
        try
        {
            cellVal = (string)(data[rCnt, cCnt]);
        }
        catch (Microsoft.CSharp.RuntimeBinder.RuntimeBinderException)
        {
            ConvertVal = (double)(data[rCnt, cCnt]);
            cellVal = ConvertVal.ToString();
        }

        DataRow Row;

        // Add to the DataTable
        if (cCnt == 1)
        {

            Row = DT.NewRow();
            Row[cCnt.ToString()] = cellVal;
            DT.Rows.Add(Row);
        }
        else
        {

            Row = DT.Rows[rCnt + 1];
            Row[cCnt.ToString()] = cellVal;

        }
    }
} 

3
这仍然完美地运作着。我有40,000条记录,处理时间从大约2分钟减少到了约2秒钟。 - Henrique Feijo
2
我对答案中变量的使用感到非常困惑,它似乎不太适合新手。
  1. 我无法在那个位置使用Range.Value2,它显示错误为“无法隐式将对象[]转换为对象[,]”。
  2. 我不确定Convertval变量是什么意思。
- parkourkarthik
@parkourkarthik,“ConvertVal”来自于原始问题,所以我无法确定它的目的。然而,它似乎与原始问题不相关。 - D Stanley
我似乎找不到所有使用的变量的声明。 - ThEpRoGrAmMiNgNoOb
@DStanley,我同意你的观点;答案的根源正如你所建议的那样,并不是要打COM。你的答案非常高效,我已经在自己的项目中实现了它。感谢你分享这个伟大的知识片段,也感谢你容纳其他小语法细节。 - Leo Gurdian
显示剩余7条评论

8

如果有人正在使用EPPlus,这个实现方法相当幼稚,但有评论引起了这种注意。如果您在其上再添加一个方法GetWorkbookAsDataSet(),它将完成OP所要求的操作。

    /// <summary>
    /// Assumption: Worksheet is in table format with no weird padding or blank column headers.
    /// 
    /// Assertion: Duplicate column names will be aliased by appending a sequence number (eg. Column, Column1, Column2)
    /// </summary>
    /// <param name="worksheet"></param>
    /// <returns></returns>
    public static DataTable GetWorksheetAsDataTable(ExcelWorksheet worksheet)
    {
        var dt = new DataTable(worksheet.Name);
        dt.Columns.AddRange(GetDataColumns(worksheet).ToArray());
        var headerOffset = 1; //have to skip header row
        var width = dt.Columns.Count;
        var depth = GetTableDepth(worksheet, headerOffset);
        for (var i = 1; i <= depth; i++)
        {
            var row = dt.NewRow();
            for (var j = 1; j <= width; j++)
            {
                var currentValue = worksheet.Cells[i + headerOffset, j].Value;

                //have to decrement b/c excel is 1 based and datatable is 0 based.
                row[j - 1] = currentValue == null ? null : currentValue.ToString();
            }

            dt.Rows.Add(row);
        }

        return dt;
    }

    /// <summary>
    /// Assumption: There are no null or empty cells in the first column
    /// </summary>
    /// <param name="worksheet"></param>
    /// <returns></returns>
    private static int GetTableDepth(ExcelWorksheet worksheet, int headerOffset)
    {
        var i = 1;
        var j = 1;
        var cellValue = worksheet.Cells[i + headerOffset, j].Value;
        while (cellValue != null)
        {
            i++;
            cellValue = worksheet.Cells[i + headerOffset, j].Value;
        }

        return i - 1; //subtract one because we're going from rownumber (1 based) to depth (0 based)
    }

    private static IEnumerable<DataColumn> GetDataColumns(ExcelWorksheet worksheet)
    {
        return GatherColumnNames(worksheet).Select(x => new DataColumn(x));
    }

    private static IEnumerable<string> GatherColumnNames(ExcelWorksheet worksheet)
    {
        var columns = new List<string>();

        var i = 1;
        var j = 1;
        var columnName = worksheet.Cells[i, j].Value;
        while (columnName != null)
        {
            columns.Add(GetUniqueColumnName(columns, columnName.ToString()));
            j++;
            columnName = worksheet.Cells[i, j].Value;
        }

        return columns;
    }

    private static string GetUniqueColumnName(IEnumerable<string> columnNames, string columnName)
    {
        var colName = columnName;
        var i = 1;
        while (columnNames.Contains(colName))
        {
            colName = columnName + i.ToString();
            i++;
        }

        return colName;
    }


3

MS Office Interop很慢,Microsoft甚至不推荐在服务器端使用Interop并且不能用于导入大型Excel文件。有关更多详细信息,请参见Microsoft的为什么不使用OLE自动化

相反,您可以使用任何Excel库,例如EasyXLS。这是一个代码示例,展示了如何读取Excel文件:

ExcelDocument workbook = new ExcelDocument();
DataSet ds = workbook.easy_ReadXLSActiveSheet_AsDataSet("excel.xls");
DataTable dataTable = ds.Tables[0];

如果您的Excel文件有多个工作表,或者仅导入单元格范围(以获得更好的性能),请查看更多代码样本,了解如何使用EasyXLS在C#中将Excel导入到DataTable中。详见如何使用EasyXLS导入Excel到DataTable


2
Dim sSheetName As String
Dim sConnection As String
Dim dtTablesList As DataTable
Dim oleExcelCommand As OleDbCommand
Dim oleExcelReader As OleDbDataReader
Dim oleExcelConnection As OleDbConnection

sConnection = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Test.xls;Extended Properties=""Excel 12.0;HDR=No;IMEX=1"""

oleExcelConnection = New OleDbConnection(sConnection)
oleExcelConnection.Open()

dtTablesList = oleExcelConnection.GetSchema("Tables")

If dtTablesList.Rows.Count > 0 Then
    sSheetName = dtTablesList.Rows(0)("TABLE_NAME").ToString
End If

dtTablesList.Clear()
dtTablesList.Dispose()

If sSheetName <> "" Then

    oleExcelCommand = oleExcelConnection.CreateCommand()
    oleExcelCommand.CommandText = "Select * From [" & sSheetName & "]"
    oleExcelCommand.CommandType = CommandType.Text

    oleExcelReader = oleExcelCommand.ExecuteReader

    nOutputRow = 0

    While oleExcelReader.Read

    End While

    oleExcelReader.Close()

End If

oleExcelConnection.Close()

1
 class DataReader
    {
        Excel.Application xlApp;
        Excel.Workbook xlBook;
        Excel.Range xlRange;
        Excel.Worksheet xlSheet;
        public DataTable GetSheetDataAsDataTable(String filePath, String sheetName)
        {
            DataTable dt = new DataTable();
            try
            {
                xlApp = new Excel.Application();
                xlBook = xlApp.Workbooks.Open(filePath);
                xlSheet = xlBook.Worksheets[sheetName];
                xlRange = xlSheet.UsedRange;
                DataRow row=null;
                for (int i = 1; i <= xlRange.Rows.Count; i++)
                {
                    if (i != 1)
                        row = dt.NewRow();
                    for (int j = 1; j <= xlRange.Columns.Count; j++)
                    {
                        if (i == 1)
                            dt.Columns.Add(xlRange.Cells[1, j].value);
                        else
                            row[j-1] = xlRange.Cells[i, j].value;
                    }
                    if(row !=null)
                        dt.Rows.Add(row);
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
            finally
            {
                xlBook.Close();
                xlApp.Quit();
            }
            return dt;
        }
    }

0

我知道这是一个老话题,但是有一种简单快捷的方法可以通过MiniExcel库来实现,就像以下代码一样

    var filePath = @".....demo.xlsx";
    
    List<DataTable> list = new List<DataTable>();
    var sheets = MiniExcel.GetSheetNames(filePath);
    foreach (var sheetName in MiniExcel.GetSheetNames(filePath))
        list.Add(MiniExcel.QueryAsDataTable(filePath,true,sheetName:sheetName));

此致敬礼

enter image description here


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接