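I took the liberty of creating some data similar to what you provided so I could test my solution. Also, instead of reading an input CSV file, I used a DataFrame. Here is my solution: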
import pandas as pd

data = {
    'time': [0, 292.5669, 620.8469, 0, 832.3269, 5633.9419, 20795.0950, 21395.6879, 0, 230.5678, 456.8468, 0, 784.3265, 5445.9452, 20345.0980, 21095.6898],
    'magnitude': [13517, 370, 528, 377, 50187, 3088, 2922, 2498, 13000, 369, 527, 376, 50100, 3087, 2921, 2497]
}
df = pd.DataFrame(data)

def split_dataframe_by_pattern(df, output_prefix):
    file_count = 1
    current_group = pd.DataFrame(columns=df.columns)
    for index, row in df.iterrows():
        # A row with time == 0 starts a new group, so flush the previous one first.
        if row['time'] == 0 and not current_group.empty:
            output_file = f'{output_prefix}_{file_count}.csv'
            current_group.to_csv(output_file, index=False)
            current_group = pd.DataFrame(columns=df.columns)
            file_count += 1
        # Append the current row to the group being built.
        current_group = pd.concat([current_group, row.to_frame().T], ignore_index=True)
    # Write the last group, which has no following time == 0 row to trigger the flush.
    current_group.to_csv(f'{output_prefix}_{file_count}.csv', index=False)

output_prefix = 'output_file'
split_dataframe_by_pattern(df, output_prefix)
My output is four CSV files:
output_file_1.csv
time,magnitude
0.0,13517.0
292.5669,370.0
620.8469,528.0
output_file_2.csv
time,magnitude
0.0,377.0
832.3269,50187.0
5633.9419,3088.0
20795.095,2922.0
21395.6879,2498.0
output_file_3.csv
time,magnitude
0.0,13000.0
230.5678,369.0
456.8468,527.0
output_file_4.csv
time,magnitude
0.0,376.0
784.3265,50100.0
5445.9452,3087.0
20345.098,2921.0
21095.6898,2497.0
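For larger inputs, the row-by-row loop can be replaced by labelling the groups with a cumulative sum and splitting with `groupby`. The following is only a sketch of that alternative, not the code that produced the files above; the names `split_by_zero_time` and `write_parts` are hypothetical helpers introduced for illustration, and it assumes, as above, that every new block starts at a row where `time` is exactly 0.

```python
import pandas as pd

data = {
    'time': [0, 292.5669, 620.8469, 0, 832.3269, 5633.9419, 20795.0950, 21395.6879,
             0, 230.5678, 456.8468, 0, 784.3265, 5445.9452, 20345.0980, 21095.6898],
    'magnitude': [13517, 370, 528, 377, 50187, 3088, 2922, 2498,
                  13000, 369, 527, 376, 50100, 3087, 2921, 2497],
}
df = pd.DataFrame(data)


def split_by_zero_time(df):
    """Return a list of sub-frames, each beginning at a row where time == 0."""
    # Every time == 0 row bumps the running count by one, so all rows up to
    # the next zero share the same label.
    group_ids = (df['time'] == 0).cumsum().rename('group')
    return [part.reset_index(drop=True) for _, part in df.groupby(group_ids)]


def write_parts(parts, output_prefix):
    """Write each sub-frame to <output_prefix>_<n>.csv, numbering from 1."""
    for n, part in enumerate(parts, start=1):
        part.to_csv(f'{output_prefix}_{n}.csv', index=False)


parts = split_by_zero_time(df)
print([len(part) for part in parts])
```

Calling `write_parts(parts, 'output_file')` would then write one file per group. One difference from the loop version: because the rows are never passed through `iterrows`, the `magnitude` column keeps its integer dtype, so the values are written as `13517` rather than `13517.0`.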
awk solution. - dodrg