How do I read a single column of data from an Excel spreadsheet?

4

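I'm trying to read a single column of data from an Excel document. I want to read the entire column, but obviously store only the cells that contain data. I'd also like to handle the case where some cells in the column are empty but more values follow further down, so that the later values are still read. For example: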

| Column1 |
|---------|
|bob      |
|tom      |
|randy    |
|travis   |
|joe      |
|         |
|jennifer |
|sam      |
|debby    |

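Given that column, I don't mind getting an entry with the value "" for the row after joe, but I do want it to keep picking up values after the empty cell. However, I don't want it to carry on for another 35,000 rows past debby, assuming debby is the last value in the column.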

It's also safe to assume that this will always be the first column.

So far, I have:

Excel.Application myApplication = new Excel.Application();
myApplication.Visible = true;
Excel.Workbook myWorkbook = myApplication.Workbooks.Open("C:\\aFileISelect.xlsx");
Excel.Worksheet myWorksheet = myWorkbook.Sheets["aSheet"] as Excel.Worksheet;
Excel.Range myRange = myWorksheet.get_Range("A:A", Type.Missing);

foreach (Excel.Range r in myRange)
{
    MessageBox.Show(r.Text);
}

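I've found many examples written for earlier versions of .NET that do something similar, but not exactly this, and I want to make sure what I'm doing is reasonably modern (assuming the approach has changed since then).
My current code reads the entire column, including the empty cells after the last value.
Edit 1: I like Isedlacek's answer below, but I do have a problem with it, and I'm not sure whether it is specific to his code. If I use it like this: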
Excel.Application myApplication = new Excel.Application();
myApplication.Visible = true;
Excel.Workbook myWorkbook = myApplication.Workbooks.Open("C:\\aFileISelect.xlsx");
Excel.Worksheet myWorksheet = myWorkbook.Sheets["aSheet"] as Excel.Worksheet;
Excel.Range myRange = myWorksheet.get_Range("A:A", Type.Missing);

var nonEmptyRanges = myRange.Cast<Excel.Range>()
.Where(r => !string.IsNullOrEmpty(r.Text));

foreach (var r in nonEmptyRanges)
{
    MessageBox.Show(r.Text);
}

MessageBox.Show("Finished!");
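The Finished! message box never appears. I'm not sure why, but it never seems to actually finish searching. I tried adding a counter to the loop to see whether it was just endlessly searching the column, but that doesn't seem to be the case... it just seems to stop.
Where the Finished! message box is, I tried simply closing the workbook and the spreadsheet instead, but that code never runs (which is expected, since the Finished! message box never runs either).
If I close the Excel spreadsheet manually, I get a COMException: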

COMException was unhandled by user code
Additional information: Exception from HRESULT: 0x803A09A2

Any ideas?

1
Haha, that was the goal, thanks! - trueCamelType
Did my answer help you? I made a few updates to address the efficiency problems you ran into. - Michael Gunter
1
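My requirements changed and I ended up using a different approach, but I think your answer is great for anyone who runs into the same problem. Thanks for the excellent answer! - trueCamelType
3 Answers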

3
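The answer depends on whether you want the bounding range of the used cells or the non-null values from the column.
Here is how to efficiently get the non-null values from a column. Note that reading the entire tempRange.Value2 property at once is much faster than reading cell by cell, at the cost that the resulting array may use a lot of memory.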
private static IEnumerable<object> GetNonNullValuesInColumn(_Application application, _Worksheet worksheet, string columnName)
{
    // get the intersection of the column and the used range on the sheet (this is a superset of the non-null cells)
    var tempRange = application.Intersect(worksheet.UsedRange, (Range) worksheet.Columns[columnName]);

    // if there is no intersection, there are no values in the column
    if (tempRange == null)
        yield break;

    // get complete set of values from the temp range (potentially memory-intensive)
    var value = tempRange.Value2;

    // if value is NULL, it's a single cell with no value
    if (value == null)
        yield break;

    // if value is not an array, the temp range was a single cell with a value
    if (!(value is Array))
    {
        yield return value;
        yield break;
    }

    // otherwise, the value is a 2-D array
    var value2 = (object[,]) value;
    var rowCount = value2.GetLength(0);
    for (var row = 1; row <= rowCount; ++row)
    {
        var v = value2[row, 1];
        if (v != null)
            yield return v;
    }
}
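
The filtering loop above can be tried without Excel. The following is a minimal sketch, assuming (as the answer's loop implies) that a multi-cell, single-column range hands back a 1-based two-dimensional object array. The class and the `MakeColumn` helper are hypothetical names used only for this illustration; they are not part of the interop library.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnValuesSketch
{
    // Hypothetical helper: builds a 1-based object[rows, 1] array, the shape the
    // answer's loop expects from a multi-cell, single-column range.
    public static object[,] MakeColumn(params object[] cells)
    {
        var column = (object[,])Array.CreateInstance(
            typeof(object), new[] { cells.Length, 1 }, new[] { 1, 1 });
        for (var i = 0; i < cells.Length; i++)
            column[i + 1, 1] = cells[i];
        return column;
    }

    // Same filtering as the loop in GetNonNullValuesInColumn, minus the COM calls.
    public static List<object> NonNullValues(object[,] column)
    {
        var result = new List<object>();
        var col = column.GetLowerBound(1);
        for (var row = column.GetLowerBound(0); row <= column.GetUpperBound(0); row++)
        {
            var v = column[row, col];
            if (v != null)
                result.Add(v);
        }
        return result;
    }

    public static void Main()
    {
        // Empty cells in the middle and at the end are represented by null.
        var column = MakeColumn("joe", null, "jennifer", "sam", "debby", null, null);
        Console.WriteLine(string.Join(", ", NonNullValues(column)));
    }
}
```

Reading the bounds with `GetLowerBound` and `GetUpperBound` rather than hard-coding 1 keeps the loop correct for ordinary 0-based arrays as well.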

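Here is an efficient way to get the smallest range that contains the non-empty cells in the column. Note that I still read the entire set of tempRange values at once, then use the resulting array (if it is a multi-cell range) to determine which cells contain the first and last values. Then, after working out which rows have data, I construct the bounding range.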

private static Range GetNonEmptyRangeInColumn(_Application application, _Worksheet worksheet, string columnName)
{
    // get the intersection of the column and the used range on the sheet (this is a superset of the non-null cells)
    var tempRange = application.Intersect(worksheet.UsedRange, (Range) worksheet.Columns[columnName]);

    // if there is no intersection, there are no values in the column
    if (tempRange == null)
        return null;

    // get complete set of values from the temp range (potentially memory-intensive)
    var value = tempRange.Value2;

    // if value is NULL, it's a single cell with no value
    if (value == null)
        return null;

    // if value is not an array, the temp range was a single cell with a value
    if (!(value is Array))
        return tempRange;

    // otherwise, the temp range is a 2D array which may have leading or trailing empty cells
    var value2 = (object[,]) value;

    // get the first and last rows that contain values
    var rowCount = value2.GetLength(0);
    int firstRowIndex;
    for (firstRowIndex = 1; firstRowIndex <= rowCount; ++firstRowIndex)
    {
        if (value2[firstRowIndex, 1] != null)
            break;
    }
    int lastRowIndex;
    for (lastRowIndex = rowCount; lastRowIndex >= firstRowIndex; --lastRowIndex)
    {
        if (value2[lastRowIndex, 1] != null)
            break;
    }

    // if there are no first and last used row, there is no used range in the column
    if (firstRowIndex > lastRowIndex)
        return null;

    // return the range
    return worksheet.Range[tempRange[firstRowIndex, 1], tempRange[lastRowIndex, 1]];
}
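
The first-row/last-row scan in that method is plain array logic, so it can be checked in isolation. This is a sketch under the assumption that empty cells appear as null in the array; `UsedRowsSketch` and `FindUsedRows` are hypothetical names for illustration. The demo uses an ordinary 0-based array, whereas the array Excel returns is 1-based, so the reported indices here are 0-based.

```csharp
using System;

public static class UsedRowsSketch
{
    // Returns the first and last row indices that hold a value, or null if the
    // whole column is empty. Works for any lower bound (0-based or 1-based).
    public static (int First, int Last)? FindUsedRows(object[,] column)
    {
        var lower = column.GetLowerBound(0);
        var upper = column.GetUpperBound(0);
        var col = column.GetLowerBound(1);

        // Skip leading empty cells.
        var first = lower;
        while (first <= upper && column[first, col] == null)
            first++;
        if (first > upper)
            return null; // no values at all

        // Skip trailing empty cells; this stops at 'first' at the latest.
        var last = upper;
        while (column[last, col] == null)
            last--;

        return (first, last);
    }

    public static void Main()
    {
        var column = new object[,] { { null }, { "bob" }, { null }, { "debby" }, { null }, { null } };
        var rows = FindUsedRows(column);
        Console.WriteLine(rows.HasValue ? $"{rows.Value.First}..{rows.Value.Last}" : "empty");
    }
}
```

Note that an empty cell between the first and last value stays inside the bounds, which matches the question's requirement to keep reading past a gap.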

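Thanks. If I have time, I'll port it to C# and add it as an edit to my question. I think something like this would be very helpful to people who find this question. - trueCamelType
I just made a few edits. I believe I'm done editing. Final answer! - Michael Gunter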

1
If you don't mind dropping the empty rows entirely:
var nonEmptyRanges = myRange.Cast<Excel.Range>()
    .Where(r => !string.IsNullOrEmpty(r.Text));
foreach (var r in nonEmptyRanges)
{
    // handle the r
    MessageBox.Show(r.Text);
}

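That answer is perfect, thank you. Keeping the empty rows is not a hard requirement. - trueCamelType
I have a problem with this code. If I use it, it never seems to finish. I'll add an edit to my question to explain what I mean. - trueCamelType
The reason this never finishes is that it evaluates every cell in the worksheet (or, depending on how myRange is set up, every cell in each column). In Excel 2007+, each column has 1,048,576 cells. Excel interop is notoriously slow. That's why you need the techniques in my answer to limit the number of cells you evaluate. - Michael Gunter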
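
To make the scale in that comment concrete, here is a rough simulation that needs no Excel. It only counts how many per-cell reads the Where filter triggers when it walks a whole column compared with only the rows that hold data. `ReadCells` is a hypothetical stand-in for enumerating a Range cell by cell; the row count is the figure quoted in the comment above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CellReadCountSketch
{
    // Rows per column in Excel 2007+, as quoted in the comment above.
    public const int RowsPerColumn = 1_048_576;

    // Hypothetical stand-in for enumerating a Range: every yielded item
    // represents one per-cell read (one r.Text call in the original code).
    public static IEnumerable<string> ReadCells(IReadOnlyList<string> data, int rowCount, Action onRead)
    {
        for (var row = 0; row < rowCount; row++)
        {
            onRead();
            yield return row < data.Count ? data[row] : "";
        }
    }

    // Runs the same Where filter as the answer and reports how many cells were read.
    public static int CountReads(IReadOnlyList<string> data, int rowCount)
    {
        var reads = 0;
        _ = ReadCells(data, rowCount, () => reads++)
            .Where(text => !string.IsNullOrEmpty(text))
            .Count();
        return reads;
    }

    public static void Main()
    {
        var data = new[] { "bob", "tom", "randy", "travis", "joe", "", "jennifer", "sam", "debby" };
        Console.WriteLine($"Whole column: {CountReads(data, RowsPerColumn)} reads");
        Console.WriteLine($"Used rows only: {CountReads(data, data.Length)} reads");
    }
}
```

In the simulation each read is essentially free; with interop every read is a separate COM call, which is why restricting the range first matters so much.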

0
    /// <summary>
    /// Generic method which reads a column from the <paramref name="workSheetToReadFrom"/> sheet provided.<para />
    /// The <paramref name="dumpVariable"/> is the variable upon which the column to be read is going to be dumped.<para />
    /// The <paramref name="workSheetToReadFrom"/> is the sheet from which the column is going to be read.<para />
    /// The <paramref name="initialCellRowIndex"/>, <paramref name="finalCellRowIndex"/> and <paramref name="columnIndex"/> specify the length of the list to be read and the concrete column of the file from which to perform the reading. <para />
    /// Note that the type of data which is going to be read needs to be specified as a generic type argument. The method constrains the generic type arguments which can be passed to it to the types which implement the IConvertible interface provided by the framework (e.g. int, double, string, etc.).
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="dumpVariable"></param>
    /// <param name="workSheetToReadFrom"></param>
    /// <param name="initialCellRowIndex"></param>
    /// <param name="finalCellRowIndex"></param>
    /// <param name="columnIndex"></param>
    static void ReadExcelColumn<T>(ref List<T> dumpVariable, Excel._Worksheet workSheetToReadFrom, int initialCellRowIndex, int finalCellRowIndex, int columnIndex) where T: IConvertible
    {
        dumpVariable = ((object[,])workSheetToReadFrom.Range[workSheetToReadFrom.Cells[initialCellRowIndex, columnIndex], workSheetToReadFrom.Cells[finalCellRowIndex, columnIndex]].Value2).Cast<object>().ToList().ConvertAll(e => (T)Convert.ChangeType(e, typeof(T)));
    }
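
One caveat with the one-liner above: `Convert.ChangeType` throws an InvalidCastException when it is given null and the target is a value type such as int, so an empty cell in the range would break the conversion, and a single-cell range does not produce an array to cast. The following is a hedged variant of the conversion step only, written against plain objects so it runs without Excel. `ConvertCells` is a hypothetical name, and skipping empty cells is an assumption about the desired behaviour, not something the answer specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class ColumnConversionSketch
{
    // rawValue plays the role of the value read from the range: null for one empty
    // cell, a scalar for one filled cell, or a 2-D array for a multi-cell range.
    public static List<T> ConvertCells<T>(object rawValue) where T : IConvertible
    {
        IEnumerable<object> cells;
        if (rawValue is object[,] grid)
            cells = grid.Cast<object>();   // flatten the 2-D array
        else
            cells = new[] { rawValue };    // single cell (possibly null)

        return cells
            .Where(cell => cell != null)   // assumption: empty cells are skipped
            .Select(cell => (T)Convert.ChangeType(cell, typeof(T), CultureInfo.InvariantCulture))
            .ToList();
    }

    public static void Main()
    {
        // Numeric cells are represented as doubles here; the middle cell is empty.
        var raw = new object[,] { { 1.0 }, { null }, { 3.0 } };
        Console.WriteLine(string.Join(", ", ConvertCells<int>(raw)));
    }
}
```

Returning the list instead of assigning through a ref parameter is a design choice of this sketch; the original signature would work equally well with the same conversion body.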
