I am running a fairly simple test:
- There is a large file of random binary data, about 6 GB in size
- The algorithm loops "SeekCount" times
- Each iteration does the following:
  - compute a random offset within the range of the file size
  - seek to that offset
  - read a small block of data
C#:
public static void Test()
{
    string fileName = @"c:\Test\big_data.dat";
    int NumberOfSeeks = 1000;
    int MaxNumberOfBytes = 1;
    long fileLength = new FileInfo(fileName).Length;

    FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read, 65536, FileOptions.RandomAccess);
    Console.WriteLine("Processing file \"{0}\"", fileName);

    Random random = new Random();
    DateTime start = DateTime.Now;
    byte[] byteArray = new byte[MaxNumberOfBytes];

    for (int index = 0; index < NumberOfSeeks; ++index)
    {
        long offset = (long)(random.NextDouble() * (fileLength - MaxNumberOfBytes - 2));
        stream.Seek(offset, SeekOrigin.Begin);
        stream.Read(byteArray, 0, MaxNumberOfBytes);
    }

    Console.WriteLine(
        "Total processing time {0} ms, speed {1} seeks/sec\r\n",
        DateTime.Now.Subtract(start).TotalMilliseconds,
        NumberOfSeeks / (DateTime.Now.Subtract(start).TotalMilliseconds / 1000.0));

    stream.Close();
}
And then the same test in C++:
#include <cstdio>
#include <cstdlib>
#include <windows.h>

// Number of seeks; assumed to be 1000 to match the C# test (not defined in the original snippet).
const int kTimes = 1000;

void test()
{
    FILE* file = fopen("c:\\Test\\big_data.dat", "rb");
    char buf = 0;
    __int64 fileSize = 6216672671; //ftell(file);
    __int64 pos;

    DWORD dwStart = GetTickCount();
    for (int i = 0; i < kTimes; ++i)
    {
        pos = (rand() % 100) * 0.01 * fileSize;
        _fseeki64(file, pos, SEEK_SET);
        fread((void*)&buf, 1, 1, file);
    }
    DWORD dwEnd = GetTickCount() - dwStart; // GetTickCount() returns milliseconds

    printf(" - Raw Reading: %d times reading took %d ticks, e.g. %d sec. Speed: %d items/sec\n",
        kTimes, dwEnd, dwEnd / 1000, kTimes / (dwEnd / 1000));

    fclose(file);
}
Execution times:
- C#: 100-200 reads/sec
- C++: 250,000 reads/sec

The question: why is C++ thousands of times faster than C# at an operation as trivial as reading from a file?
Additional information:
- I played with the stream buffers and set them to the same size (4 KB).
- The disk is defragmented (0% fragmentation).
- OS configuration: Windows 7, NTFS, a fairly recent modern 500 GB hard drive (a WD, if I remember correctly), 8 GB of RAM (barely used), 4-core CPU (utilization practically zero).
Comments:
- What is the value of MaxNumberOfBytes? – Some programmer dude
- rand()/double(RAND_MAX)*fileSize, to have the same functionality as Java. – Mooing Duck