读取Excel Open XML时忽略空单元格

51

我正在使用这里的被接受的解决方案将Excel表转换为datatable。如果我的数据是"完美的",那么它可以正常工作,但是如果我在数据中间有一个空单元格,它似乎会将错误的数据放入每一列。

我认为这是因为在以下代码中:

row.Descendants<Cell>().Count()

是填充单元格的数量(不是所有列)且:

GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));

似乎是找到下一个有数据的单元格(不一定是该索引处的数据),所以如果第一列为空,而我调用ElementAt(0),它会返回第二列中的值。

以下是完整的解析代码。

DataRow tempRow = dt.NewRow();

for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
    tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
    if (tempRow[i].ToString().IndexOf("Latency issues in") > -1)
    {
        Console.Write(tempRow[i].ToString());
    }
}

请查看您提到的同一主题中的此答案。它解决了空单元格的问题。 - RBT
16个回答

47
这是有道理的,因为Excel不会存储空单元格的值。如果您使用Open XML SDK 2.0 Productivity Tool打开文件并遍历XML到单元格级别,您将看到只有包含数据的单元格才会出现在该文件中。
您可以选择在要遍历的单元格范围内插入空白数据,或编程式地找出被跳过的单元格并相应地调整索引。
我创建了一个示例Excel文档,在单元格引用A1和C1中放置了一个字符串。然后我在Open XML Productivity Tool中打开了Excel文档,以下是存储的XML:
<x:row r="1" spans="1:3" 
   xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:c r="A1" t="s">
    <x:v>0</x:v>
  </x:c>
  <x:c r="C1" t="s">
    <x:v>1</x:v>
  </x:c>
</x:row>

在这里,您将看到数据对应于第一行,并且仅为该行保存了两个单元格的数据。保存的数据对应于A1和C1,并且没有保存任何空值单元格。
要获取所需的功能,您可以像上面那样遍历单元格,但需要检查单元格引用的值并确定是否跳过了任何单元格。为此,您需要两个实用程序函数,以从单元格引用中获取列名,然后将该列名转换为基于零的索引:
    private static List<char> Letters = new List<char>() { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', ' ' };

    /// <summary>
    /// Given a cell name, parses the specified cell to get the column name.
    /// </summary>
    /// <param name="cellReference">Address of the cell (ie. B2)</param>
    /// <returns>Column Name (ie. B)</returns>
    public static string GetColumnName(string cellReference)
    {
        // Create a regular expression to match the column name portion of the cell name.
        Regex regex = new Regex("[A-Za-z]+");
        Match match = regex.Match(cellReference);

        return match.Value;
    }

    /// <summary>
    /// Given just the column name (no row index), it will return the zero based column index.
    /// Note: This method will only handle columns with a length of up to two (ie. A to Z and AA to ZZ). 
    /// A length of three can be implemented when needed.
    /// </summary>
    /// <param name="columnName">Column Name (ie. A or AB)</param>
    /// <returns>Zero based index if the conversion was successful; otherwise null</returns>
    public static int? GetColumnIndexFromName(string columnName)
    {
        int? columnIndex = null;

        string[] colLetters = Regex.Split(columnName, "([A-Z]+)");
        colLetters = colLetters.Where(s => !string.IsNullOrEmpty(s)).ToArray();

        if (colLetters.Count() <= 2)
        {
            int index = 0;
            foreach (string col in colLetters)
            {
                List<char> col1 = colLetters.ElementAt(index).ToCharArray().ToList();
                int? indexValue = Letters.IndexOf(col1.ElementAt(index));

                if (indexValue != -1)
                {
                    // The first letter of a two digit column needs some extra calculations
                    if (index == 0 && colLetters.Count() == 2)
                    {
                        columnIndex = columnIndex == null ? (indexValue + 1) * 26 : columnIndex + ((indexValue + 1) * 26);
                    }
                    else
                    {
                        columnIndex = columnIndex == null ? indexValue : columnIndex + indexValue;
                    }
                }

                index++;
            }
        }

        return columnIndex;
    }

然后您可以迭代单元格并检查单元格引用与列索引的比较。 如果它小于,则将空数据添加到tempRow中,否则只需读取单元格中包含的值。 (注意:我没有测试下面的代码,但一般思路应该是有帮助的):

DataRow tempRow = dt.NewRow();

int columnIndex = 0;
foreach (Cell cell in row.Descendants<Cell>())
{
   // Gets the column index of the cell with data
   int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference));

   if (columnIndex < cellColumnIndex)
   {
      do
      {
         tempRow[columnIndex] = //Insert blank data here;
         columnIndex++;
      }
      while(columnIndex < cellColumnIndex);
    }
    tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell);

    if (tempRow[i].ToString().IndexOf("Latency issues in") > -1)
    {
       Console.Write(tempRow[i].ToString());
    }
    columnIndex++;
}

你知道如何检测空单元格吗?这是我的问题。我想要一个可以准确读取工作表内容(包括空白)的解决方案。 - leora
@ooo - 唯一检测它的方法是如果单元格引用在子后代列表中不存在。 - amurra
4
请查看@amurra在此处提供的答案,以了解“Letters”列表的定义。(注:链接已经无法访问) - Andy Thomas
2
如果我有直到AH的列呢? - captainsac
2
有时候cell.CellReference为空。 - captainsac

22

下面是一个实现了 IEnumerable 接口的代码,经过编译和单元测试,应该可以满足你的需求。

    ///<summary>returns an empty cell when a blank cell is encountered
    ///</summary>
    public IEnumerator<Cell> GetEnumerator()
    {
        int currentCount = 0;

        // row is a class level variable representing the current
        // DocumentFormat.OpenXml.Spreadsheet.Row
        foreach (DocumentFormat.OpenXml.Spreadsheet.Cell cell in
            row.Descendants<DocumentFormat.OpenXml.Spreadsheet.Cell>())
        {
            string columnName = GetColumnName(cell.CellReference);

            int currentColumnIndex = ConvertColumnNameToNumber(columnName);

            for ( ; currentCount < currentColumnIndex; currentCount++)
            {
                yield return new DocumentFormat.OpenXml.Spreadsheet.Cell();
            }

            yield return cell;
            currentCount++;
        }
    }

以下是它所依赖的函数:

    /// <summary>
    /// Given a cell name, parses the specified cell to get the column name.
    /// </summary>
    /// <param name="cellReference">Address of the cell (ie. B2)</param>
    /// <returns>Column Name (ie. B)</returns>
    public static string GetColumnName(string cellReference)
    {
        // Match the column name portion of the cell name.
        Regex regex = new Regex("[A-Za-z]+");
        Match match = regex.Match(cellReference);

        return match.Value;
    }

    /// <summary>
    /// Given just the column name (no row index),
    /// it will return the zero based column index.
    /// </summary>
    /// <param name="columnName">Column Name (ie. A or AB)</param>
    /// <returns>Zero based index if the conversion was successful</returns>
    /// <exception cref="ArgumentException">thrown if the given string
    /// contains characters other than uppercase letters</exception>
    public static int ConvertColumnNameToNumber(string columnName)
    {
        Regex alpha = new Regex("^[A-Z]+$");
        if (!alpha.IsMatch(columnName)) throw new ArgumentException();

        char[] colLetters = columnName.ToCharArray();
        Array.Reverse(colLetters);

        int convertedValue = 0;
        for (int i = 0; i < colLetters.Length; i++)
        {
            char letter = colLetters[i];
            int current = i == 0 ? letter - 65 : letter - 64; // ASCII 'A' = 65
            convertedValue += current * (int)Math.Pow(26, i);
        }

        return convertedValue;
    }

把它放在一个类里,试试看。


2
有人可以展示一下如何明确地实现这个吗?谢谢! - shabos
2
为枚举示例提供一些背景信息会更好。 - Asheh
我本想将此作为评论添加,但没有足够的空间来解释。 如果你的行中的最后一个单元格为空,则GetEnumerator()方法将不会返回所需数量的单元格。要纠正这个问题,请参阅我的帖子:链接 - Gerhard Liebenberg

22

这是 Waylon 的回答稍作修改,并参考了其他答案。它将他的方法封装在一个类中。

我做了一些改动。

IEnumerator<Cell> GetEnumerator()

IEnumerable<Cell> GetRowCells(Row row)

这是一个类,你不需要实例化它,它只作为一个实用工具类:

public class SpreedsheetHelper
{
    ///<summary>returns an empty cell when a blank cell is encountered
    ///</summary>
    public static IEnumerable<Cell> GetRowCells(Row row)
    {
        int currentCount = 0;

        foreach (DocumentFormat.OpenXml.Spreadsheet.Cell cell in
            row.Descendants<DocumentFormat.OpenXml.Spreadsheet.Cell>())
        {
            string columnName = GetColumnName(cell.CellReference);

            int currentColumnIndex = ConvertColumnNameToNumber(columnName);

            for (; currentCount < currentColumnIndex; currentCount++)
            {
                yield return new DocumentFormat.OpenXml.Spreadsheet.Cell();
            }

            yield return cell;
            currentCount++;
        }
    }

    /// <summary>
    /// Given a cell name, parses the specified cell to get the column name.
    /// </summary>
    /// <param name="cellReference">Address of the cell (ie. B2)</param>
    /// <returns>Column Name (ie. B)</returns>
    public static string GetColumnName(string cellReference)
    {
        // Match the column name portion of the cell name.
        var regex = new System.Text.RegularExpressions.Regex("[A-Za-z]+");
        var match = regex.Match(cellReference);

        return match.Value;
    }

    /// <summary>
    /// Given just the column name (no row index),
    /// it will return the zero based column index.
    /// </summary>
    /// <param name="columnName">Column Name (ie. A or AB)</param>
    /// <returns>Zero based index if the conversion was successful</returns>
    /// <exception cref="ArgumentException">thrown if the given string
    /// contains characters other than uppercase letters</exception>
    public static int ConvertColumnNameToNumber(string columnName)
    {
        var alpha = new System.Text.RegularExpressions.Regex("^[A-Z]+$");
        if (!alpha.IsMatch(columnName)) throw new ArgumentException();

        char[] colLetters = columnName.ToCharArray();
        Array.Reverse(colLetters);

        int convertedValue = 0;
        for (int i = 0; i < colLetters.Length; i++)
        {
            char letter = colLetters[i];
            int current = i == 0 ? letter - 65 : letter - 64; // ASCII 'A' = 65
            convertedValue += current * (int)Math.Pow(26, i);
        }

        return convertedValue;
    }
}

现在您可以通过以下方式获取所有行的单元格:

// skip the part that retrieves the worksheet sheetData
IEnumerable<Row> rows = sheetData.Descendants<Row>();
foreach(Row row in rows)
{
    IEnumerable<Cell> cells = SpreedsheetHelper.GetRowCells(row);
    foreach (Cell cell in cells)
    {
         // skip part that reads the text according to the cell-type
    }
}

即使单元格为空,它也将包含所有单元格。


我本想将此作为评论添加,但没有足够的空间来解释。 如果你的行中的最后一个单元格为空,则GetRowCells()方法将不会返回所需数量的单元格。要纠正这个问题,请参阅我的帖子:链接 - Gerhard Liebenberg

5
看看我的实现:

  Row[] rows = worksheet.GetFirstChild<SheetData>()
                .Elements<Row>()
                .ToArray();

  string[] columnNames = rows.First()
                .Elements<Cell>()
                .Select(cell => GetCellValue(cell, document))
                .ToArray();

  HeaderLetters = ExcelHeaderHelper.GetHeaderLetters((uint)columnNames.Count());

  if (columnNames.Count() != HeaderLetters.Count())
  {
       throw new ArgumentException("HeaderLetters");
  }

  IEnumerable<List<string>> cellValues = GetCellValues(rows.Skip(1), columnNames.Count(), document);

//Here you can enumerate through the cell values, based on the cell index the column names can be retrieved.

使用这个类可以收集HeaderLetters:

    private static class ExcelHeaderHelper
    {
        public static string[] GetHeaderLetters(uint max)
        {
            var result = new List<string>();
            int i = 0;
            var columnPrefix = new Queue<string>();
            string prefix = null;
            int prevRoundNo = 0;
            uint maxPrefix = max / 26;

            while (i < max)
            {
                int roundNo = i / 26;
                if (prevRoundNo < roundNo)
                {
                    prefix = columnPrefix.Dequeue();
                    prevRoundNo = roundNo;
                }
                string item = prefix + ((char)(65 + (i % 26))).ToString(CultureInfo.InvariantCulture);
                if (i <= maxPrefix)
                {
                    columnPrefix.Enqueue(item);
                }
                result.Add(item);
                i++;
            }
            return result.ToArray();
        }
    }

还有一些辅助方法:

    private static IEnumerable<List<string>> GetCellValues(IEnumerable<Row> rows, int columnCount, SpreadsheetDocument document)
    {
        var result = new List<List<string>>();
        foreach (var row in rows)
        {
            List<string> cellValues = new List<string>();
            var actualCells = row.Elements<Cell>().ToArray();

            int j = 0;
            for (int i = 0; i < columnCount; i++)
            {
                if (actualCells.Count() <= j || !actualCells[j].CellReference.ToString().StartsWith(HeaderLetters[i]))
                {
                    cellValues.Add(null);
                }
                else
                {
                    cellValues.Add(GetCellValue(actualCells[j], document));
                    j++;
                }
            }
            result.Add(cellValues);
        }
        return result;
    }


private static string GetCellValue(Cell cell, SpreadsheetDocument document)
{
    bool sstIndexedcell = GetCellType(cell);
    return sstIndexedcell
        ? GetSharedStringItemById(document.WorkbookPart, Convert.ToInt32(cell.InnerText))
        : cell.InnerText;
}

private static bool GetCellType(Cell cell)
{
    return cell.DataType != null && cell.DataType == CellValues.SharedString;
}

private static string GetSharedStringItemById(WorkbookPart workbookPart, int id)
{
    return workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(id).InnerText;
}

解决方案涉及共享单元格项目(SST索引单元格)。

2

字母编码是一种基于26进制的编码,因此可以将其转换为偏移量。

// Converts letter code (i.e. AA) to an offset
public int offset( string code)
{
    var offset = 0;
    var byte_array = Encoding.ASCII.GetBytes( code ).Reverse().ToArray();
    for( var i = 0; i < byte_array.Length; i++ )
    {
        offset += (byte_array[i] - 65 + 1) * Convert.ToInt32(Math.Pow(26.0, Convert.ToDouble(i)));
    }
    return offset - 1;
}

2

这些都是很好的例子。以下是我使用的方法,因为我需要跟踪所有行、单元格、值和标题以进行相关性分析。

ReadSpreadsheet 方法打开 xlxs 文件并遍历每个工作表、行和列。由于值存储在引用字符串表中,因此我还明确地对每个工作表使用了它。还使用了其他类:DSFunction 和 StaticVariables。后者保存常用的参数值,例如引用的 'quotdouble'(quotdouble = "\u0022";)和 'crlf'(crlf = "\u000D" + "\u000A";)。

下面包含了相关的 DSFunction 方法 GetIntColIndexForLetter。它返回与字母名称(如 A、B、AA、ADE 等)相对应的列索引的整数值。这与参数 'ncellcolref' 一起使用,以确定是否已跳过任何列,并为每个缺失的列输入空字符串值。

在将值临时存储到 List 对象中之前,我还会对其进行一些清理(使用 Replace 方法)。

随后,我使用列名称的哈希表(Dictionary)从不同的工作表中提取值,将它们相关联,创建规范化的值,然后创建一个对象,该对象用于我们的产品,然后将其存储为 XML 文件。虽然没有展示这些步骤,但这就是为什么使用这种方法的原因。

    public static class DSFunction {

    /// <summary>
    /// Creates an integer value for a column letter name starting at 1 for 'a'
    /// </summary>
    /// <param name="lettstr">Column name as letters</param>
    /// <returns>int value</returns>
    public static int GetIntColIndexForLetter(string lettstr) {
        string txt = "", txt1="";
        int n1, result = 0, nbeg=-1, nitem=0;
        try {
            nbeg = (int)("a".ToCharArray()[0]) - 1; //1 based
            txt = lettstr;
            if (txt != "") txt = txt.ToLower().Trim();
            while (txt != "") {
                if (txt.Length > 1) {
                    txt1 = txt.Substring(0, 1);
                    txt = txt.Substring(1);
                }
                else {
                    txt1 = txt;
                    txt = "";
                }
                if (!DSFunction.IsNumberString(txt1, "real")) {
                    nitem++;
                    n1 = (int)(txt1.ToCharArray()[0]) - nbeg;
                    result += n1 + (nitem - 1) * 26;
                }
                else {
                    break;
                }
            }
        }
        catch (Exception ex) {
            txt = ex.Message;
        }
        return result;
    }


}


    public static class Extractor {

    public static string ReadSpreadsheet(string fileUri) {
        string msg = "", txt = "", txt1 = "";
        int i, n1, n2, nrow = -1, ncell = -1, ncellcolref = -1;
        Boolean haveheader = true;
        Dictionary<string, int> hashcolnames = new Dictionary<string, int>();
        List<string> colvalues = new List<string>();
        try {
            if (!File.Exists(fileUri)) { throw new Exception("file does not exist"); }
            using (SpreadsheetDocument ssdoc = SpreadsheetDocument.Open(fileUri, true)) {
                var stringTable = ssdoc.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
                foreach (Sheet sht in ssdoc.WorkbookPart.Workbook.Descendants<Sheet>()) {
                    nrow = 0;
                    foreach (Row ssrow in ((WorksheetPart)(ssdoc.WorkbookPart.GetPartById(sht.Id))).Worksheet.Descendants<Row>()) {
                        ncell = 0;
                        ncellcolref = 0;
                        nrow++;
                        colvalues.Clear();
                        foreach (Cell sscell in ssrow.Elements<Cell>()) {
                            ncell++;
                            n1 = DSFunction.GetIntColIndexForLetter(sscell.CellReference);
                            for (i = 0; i < (n1 - ncellcolref - 1); i++) {
                                if (nrow == 1 && haveheader) {
                                    txt1 = "-missing" + (ncellcolref + 1 + i).ToString() + "-";
                                    if (!hashcolnames.TryGetValue(txt1, out n2)) {
                                        hashcolnames.Add(txt1, ncell - 1);
                                    }
                                }
                                else {
                                    colvalues.Add("");
                                }
                            }
                            ncellcolref = n1;
                            if (sscell.DataType != null) {
                                if (sscell.DataType.Value == CellValues.SharedString && stringTable != null) {
                                    txt = stringTable.SharedStringTable.ElementAt(int.Parse(sscell.InnerText)).InnerText;
                                }
                                else if (sscell.DataType.Value == CellValues.String) {
                                    txt = sscell.InnerText;
                                }
                                else txt = sscell.InnerText.ToString();
                            }
                            else txt = sscell.InnerText;
                            if (txt != "") txt1 = txt.ToLower().Trim(); else txt1 = "";
                            if (nrow == 1 && haveheader) {
                                txt1 = txt1.Replace(" ", "");
                                if (txt1 == "table/viewname") txt1 = "tablename";
                                else if (txt1 == "schemaownername") txt1 = "schemaowner";
                                else if (txt1 == "subjectareaname") txt1 = "subjectarea";
                                else if (txt1.StartsWith("column")) {
                                    txt1 = txt1.Substring("column".Length);
                                }
                                if (!hashcolnames.TryGetValue(txt1, out n1)) {
                                    hashcolnames.Add(txt1, ncell - 1);
                                }
                            }
                            else {
                                txt = txt.Replace(((char)8220).ToString(), "'");  //special "
                                txt = txt.Replace(((char)8221).ToString(), "'"); //special "
                                txt = txt.Replace(StaticVariables.quotdouble, "'");
                                txt = txt.Replace(StaticVariables.crlf, " ");
                                txt = txt.Replace("  ", " ");
                                txt = txt.Replace("<", "");
                                txt = txt.Replace(">", "");
                                colvalues.Add(txt);
                            }
                        }
                    }
                }
            }
        }
        catch (Exception ex) {
            msg = "notok:" + ex.Message;
        }
        return msg;
    }





}

1

使用ClosedXML.Excel代替OpenXML:


    public DataTable ImportTable(DataTable dt, string FileName)
    {
        Statics.currentProgressValue = 0;
        Statics.maxProgressValue = 100;
        Statics.cancelProgress = false;
        try
        {
            bool fileExist = File.Exists(FileName);
            if (fileExist)
            {
                using (XLWorkbook workBook = new XLWorkbook(FileName))
                {
                    IXLWorksheet workSheet = workBook.Worksheet(1);

                    var rowCount = workSheet.RangeUsed().RowCount();
                    if (rowCount > 0)
                    {
                        var colCount = workSheet.Row(1).CellsUsed().Count();
                        if (dt.Columns.Count < colCount)
                            throw new Exception($"Expects at least {dt.Columns.Count} columns.");

                        //Loop through the Worksheet rows.
                        Statics.maxProgressValue = rowCount;
                        for (int i = 1; i < rowCount; i++)
                        {
                            Statics.currentProgressValue += 1;
                            dt.Rows.Add();
                            for (int j = 2; j < dt.Columns.Count; j++)
                            {
                                var cell = (workSheet.Rows().ElementAt(i).Cell(j));
                                if (!string.IsNullOrEmpty(cell.Value.ToString()))
                                    dt.Rows[i - 1][j] = cell.Value.ToString().Trim();
                                else
                                    dt.Rows[i - 1][j] = "";
                            }
                            if (Statics.cancelProgress == true)
                                break;
                        }
                    }

                    return dt;
                }
            }
        }
        catch (Exception ex)
        {
            Statics.cancelProgress = true;
            throw new Exception("Error exporting data." +
                Environment.NewLine + ex.Message);
        }
        return dt;
    }

1
您可以使用此功能从行中提取单元格,通过传递标题索引:
public static Cell GetCellFromRow(Row r ,int headerIdx) {
        string cellname = GetNthColumnName(headerIdx) + r.RowIndex.ToString();
        IEnumerable<Cell> cells = r.Elements<Cell>().Where(x=> x.CellReference == cellname);
        if (cells.Count() > 0)
        {
            return cells.First();
        }
        else {
            return null;
        }
}
public static string GetNthColumnName(int n)
    {
        string name = "";
        while (n > 0)
        {
            n--;
            name = (char)('A' + n % 26) + name;
            n /= 26;
        }
        return name;
    }

1

好的,我并不是这方面的专家,但其他答案似乎对我来说有些过度了,所以这是我的解决方案:

// Loop through each row in the spreadsheet, skipping the header row
foreach (var row in sheetData.Elements<Row>().Skip(1))
{
    var i = 0;
    string[] letters = new string[15] {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O" };

    List<String> cellsList = new List<string>();
    foreach (var cell in row.Elements<Cell>().ToArray())
    {
        while (cell.CellReference.ToString()[0] != Convert.ToChar(letters[i]))
        {//accounts for multiple consecutive blank cells
            cellsList.Add("");
            i++;
        }
        cellsList.Add(cell.CellValue.Text);
        i++;
    }

    string[] cells = cellsList.ToArray();

    foreach(var cell in cellsList)
    {
        //display contents of cell, depending on the datatype you may need to call each of the cells manually
    }
}

希望有人会发现这个有用!

1

非常抱歉又要回答这个问题,以下是我使用的代码。

如果工作表顶部有空行,则OpenXML无法正常工作会导致问题。它有时只返回一个具有0行和0列的DataTable。下面的代码处理了这个问题以及所有其他工作表。

以下是调用我的代码的方式。只需传入要读取的文件名和工作表的名称:

DataTable dt = OpenXMLHelper.ExcelWorksheetToDataTable("C:\\SQL Server\\SomeExcelFile.xlsx", "Mikes Worksheet");

这里是代码本身:

    public class OpenXMLHelper
    {
        //  A helper function to open an Excel file using OpenXML, and return a DataTable containing all the data from one
        //  of the worksheets.
        //
        //  We've had lots of problems reading in Excel data using OLEDB (eg the ACE drivers no longer being present on new servers,
        //  OLEDB not working due to security issues, and blatantly ignoring blank rows at the top of worksheets), so this is a more 
        //  stable method of reading in the data.
        //
        public static DataTable ExcelWorksheetToDataTable(string pathFilename, string worksheetName)
        {
            DataTable dt = new DataTable(worksheetName);

            using (SpreadsheetDocument document = SpreadsheetDocument.Open(pathFilename, false))
            {
                // Find the sheet with the supplied name, and then use that 
                // Sheet object to retrieve a reference to the first worksheet.
                Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == worksheetName).FirstOrDefault();
                if (theSheet == null)
                    throw new Exception("Couldn't find the worksheet: " + worksheetName);

                // Retrieve a reference to the worksheet part.
                WorksheetPart wsPart = (WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
                Worksheet workSheet = wsPart.Worksheet;

                string dimensions = workSheet.SheetDimension.Reference.InnerText;       //  Get the dimensions of this worksheet, eg "B2:F4"

                int numOfColumns = 0;
                int numOfRows = 0;
                CalculateDataTableSize(dimensions, ref numOfColumns, ref numOfRows);
                System.Diagnostics.Trace.WriteLine(string.Format("The worksheet \"{0}\" has dimensions \"{1}\", so we need a DataTable of size {2}x{3}.", worksheetName, dimensions, numOfColumns, numOfRows));

                SheetData sheetData = workSheet.GetFirstChild<SheetData>();
                IEnumerable<Row> rows = sheetData.Descendants<Row>();

                string[,] cellValues = new string[numOfColumns, numOfRows];

                int colInx = 0;
                int rowInx = 0;
                string value = "";
                SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;

                //  Iterate through each row of OpenXML data, and store each cell's value in the appropriate slot in our [,] string array.
                foreach (Row row in rows)
                {
                    for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
                    {
                        //  *DON'T* assume there's going to be one XML element for each column in each row...
                        Cell cell = row.Descendants<Cell>().ElementAt(i);
                        if (cell.CellValue == null || cell.CellReference == null)
                            continue;                       //  eg when an Excel cell contains a blank string

                        //  Convert this Excel cell's CellAddress into a 0-based offset into our array (eg "G13" -> [6, 12])
                        colInx = GetColumnIndexByName(cell.CellReference);             //  eg "C" -> 2  (0-based)
                        rowInx = GetRowIndexFromCellAddress(cell.CellReference)-1;     //  Needs to be 0-based

                        //  Fetch the value in this cell
                        value = cell.CellValue.InnerXml;
                        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
                        {
                            value = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
                        }

                        cellValues[colInx, rowInx] = value;
                    }
                }

                //  Copy the array of strings into a DataTable.
                //  We don't (currently) make any attempt to work out which columns should be numeric, rather than string.
                for (int col = 0; col < numOfColumns; col++)
                    dt.Columns.Add("Column_" + col.ToString());

                for (int row = 0; row < numOfRows; row++)
                {
                    DataRow dataRow = dt.NewRow();
                    for (int col = 0; col < numOfColumns; col++)
                    {
                        dataRow.SetField(col, cellValues[col, row]);
                    }
                    dt.Rows.Add(dataRow);
                }

#if DEBUG
                //  Write out the contents of our DataTable to the Output window (for debugging)
                string str = "";
                for (rowInx = 0; rowInx < maxNumOfRows; rowInx++)
                {
                    for (colInx = 0; colInx < maxNumOfColumns; colInx++)
                    {
                        object val = dt.Rows[rowInx].ItemArray[colInx];
                        str += (val == null) ? "" : val.ToString();
                        str += "\t";
                    }
                    str += "\n";
                }
                System.Diagnostics.Trace.WriteLine(str);
#endif
                return dt;
            }
        }

        private static void CalculateDataTableSize(string dimensions, ref int numOfColumns, ref int numOfRows)
        {
            //  How many columns & rows of data does this Worksheet contain ?  
            //  We'll read in the Dimensions string from the Excel file, and calculate the size based on that.
            //      eg "B1:F4" -> we'll need 6 columns and 4 rows.
            //
            //  (We deliberately ignore the top-left cell address, and just use the bottom-right cell address.)
            try
            {
                string[] parts = dimensions.Split(':');     // eg "B1:F4" 
                if (parts.Length != 2)
                    throw new Exception("Couldn't find exactly *two* CellAddresses in the dimension");

                numOfColumns = 1 + GetColumnIndexByName(parts[1]);     //  A=1, B=2, C=3  (1-based value), so F4 would return 6 columns
                numOfRows = GetRowIndexFromCellAddress(parts[1]);
            }
            catch
            {
                throw new Exception("Could not calculate maximum DataTable size from the worksheet dimension: " + dimensions);
            }
        }

        public static int GetRowIndexFromCellAddress(string cellAddress)
        {
            //  Convert an Excel CellReference column into a 1-based row index
            //  eg "D42"  ->  42
            //     "F123" ->  123
            string rowNumber = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^0-9 _]", "");
            return int.Parse(rowNumber);
        }

        public static int GetColumnIndexByName(string cellAddress)
        {
            //  Convert an Excel CellReference column into a 0-based column index
            //  eg "D42" ->  3
            //     "F123" -> 5
            var columnName = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^A-Z_]", "");
            int number = 0, pow = 1;
            for (int i = columnName.Length - 1; i >= 0; i--)
            {
                number += (columnName[i] - 'A' + 1) * pow;
                pow *= 26;
            }
            return number - 1;
        }
    }

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接