查询速度非常快(仅使用分区和行键时非常快,当还要搜索特定匹配的属性时稍慢,但仍然可接受)。
然而,插入和删除行都非常缓慢。
澄清:
我想澄清的是,即使插入单个批次的100个项目也需要几秒钟的时间。这不仅是在处理成千上万行的总吞吐量时的问题。即使只插入100个项目,也会影响我。
以下是我用于批量插入表格的代码示例:
static async Task BatchInsert( CloudTable table, List<ITableEntity> entities )
{
int rowOffset = 0;
while ( rowOffset < entities.Count )
{
Stopwatch sw = Stopwatch.StartNew();
var batch = new TableBatchOperation();
// next batch
var rows = entities.Skip( rowOffset ).Take( 100 ).ToList();
foreach ( var row in rows )
batch.Insert( row );
// submit
await table.ExecuteBatchAsync( batch );
rowOffset += rows.Count;
Trace.TraceInformation( "Elapsed time to batch insert " + rows.Count + " rows: " + sw.Elapsed.ToString( "g" ) );
}
}
我正在使用批量操作,以下是一个调试输出的示例:
Microsoft.WindowsAzure.Storage Information: 3 : b08a07da-fceb-4bec-af34-3beaa340239b: Starting asynchronous request to http://127.0.0.1:10002/devstoreaccount1.
Microsoft.WindowsAzure.Storage Verbose: 4 : b08a07da-fceb-4bec-af34-3beaa340239b: StringToSign = POST..multipart/mixed; boundary=batch_6d86d34c-5e0e-4c0c-8135-f9788ae41748.Tue, 30 Jul 2013 18:48:38 GMT./devstoreaccount1/devstoreaccount1/$batch.
Microsoft.WindowsAzure.Storage Information: 3 : b08a07da-fceb-4bec-af34-3beaa340239b: Preparing to write request data.
Microsoft.WindowsAzure.Storage Information: 3 : b08a07da-fceb-4bec-af34-3beaa340239b: Writing request data.
Microsoft.WindowsAzure.Storage Information: 3 : b08a07da-fceb-4bec-af34-3beaa340239b: Waiting for response.
Microsoft.WindowsAzure.Storage Information: 3 : b08a07da-fceb-4bec-af34-3beaa340239b: Response received. Status code = 202, Request ID = , Content-MD5 = , ETag = .
Microsoft.WindowsAzure.Storage Information: 3 : b08a07da-fceb-4bec-af34-3beaa340239b: Response headers were processed successfully, proceeding with the rest of the operation.
Microsoft.WindowsAzure.Storage Information: 3 : b08a07da-fceb-4bec-af34-3beaa340239b: Processing response body.
Microsoft.WindowsAzure.Storage Information: 3 : b08a07da-fceb-4bec-af34-3beaa340239b: Operation completed successfully.
iisexpress.exe Information: 0 : Elapsed time to batch insert 100 rows: 0:00:00.9351871
如您所见,这个例子插入100行几乎需要1秒钟。在我的开发机器上(3.4 GHz四核),平均时间似乎约为0.8秒。
这看起来很荒谬。
以下是批量删除操作的示例:
Microsoft.WindowsAzure.Storage Information: 3 : 4c271cb5-7463-44b1-b2e5-848b8fb10a93: Starting asynchronous request to http://127.0.0.1:10002/devstoreaccount1.
Microsoft.WindowsAzure.Storage Verbose: 4 : 4c271cb5-7463-44b1-b2e5-848b8fb10a93: StringToSign = POST..multipart/mixed; boundary=batch_7e3d229f-f8ac-4aa0-8ce9-ed00cb0ba321.Tue, 30 Jul 2013 18:47:41 GMT./devstoreaccount1/devstoreaccount1/$batch.
Microsoft.WindowsAzure.Storage Information: 3 : 4c271cb5-7463-44b1-b2e5-848b8fb10a93: Preparing to write request data.
Microsoft.WindowsAzure.Storage Information: 3 : 4c271cb5-7463-44b1-b2e5-848b8fb10a93: Writing request data.
Microsoft.WindowsAzure.Storage Information: 3 : 4c271cb5-7463-44b1-b2e5-848b8fb10a93: Waiting for response.
Microsoft.WindowsAzure.Storage Information: 3 : 4c271cb5-7463-44b1-b2e5-848b8fb10a93: Response received. Status code = 202, Request ID = , Content-MD5 = , ETag = .
Microsoft.WindowsAzure.Storage Information: 3 : 4c271cb5-7463-44b1-b2e5-848b8fb10a93: Response headers were processed successfully, proceeding with the rest of the operation.
Microsoft.WindowsAzure.Storage Information: 3 : 4c271cb5-7463-44b1-b2e5-848b8fb10a93: Processing response body.
Microsoft.WindowsAzure.Storage Information: 3 : 4c271cb5-7463-44b1-b2e5-848b8fb10a93: Operation completed successfully.
iisexpress.exe Information: 0 : Elapsed time to batch delete 100 rows: 0:00:00.6524402
持续时间超过0.5秒。
我在Azure上运行了这个程序(小型实例),记录下插入28000行的时间为20分钟。
我目前正在使用Storage Client Library 2.1 RC版本:MSDN Blog
我一定做错了什么。有什么想法吗?
更新
我尝试了并行处理,总体速度得到了提高(使用了8个逻辑处理器),但是在我的开发机上每秒插入150行数据。
整体效果没有明显改善,甚至在Azure上部署时可能更差。
我已经按照此建议增加了线程池,并增加了WebRole的最大HTTP连接数。
我仍然感觉缺少某些基本的东西,导致插入/删除操作受到限制,只能达到每秒150次的速度。
更新2
通过分析部署在Azure上的小型实例的一些诊断日志(使用2.1 RC Storage Client中内置的新日志记录功能),我获得了更多信息。
批量插入的第一个Storage Client日志在635109046781264034
时刻记录:
caf06fca-1857-4875-9923-98979d850df3: Starting synchronous request to https://?.table.core.windows.net/.; TraceSource 'Microsoft.WindowsAzure.Storage' event
接着大约3秒钟后,我在 635109046810104314
个刻度处看到了这个日志:
caf06fca-1857-4875-9923-98979d850df3: Preparing to write request data.; TraceSource 'Microsoft.WindowsAzure.Storage' event
接下来还有几条日志,总共耗时0.15秒,最后一条是在635109046811645418
个tick处结束的,这条日志记录了插入操作的情况:
caf06fca-1857-4875-9923-98979d850df3: Operation completed successfully.; TraceSource 'Microsoft.WindowsAzure.Storage' event
我不确定如何解释这个,但是在我查看的批处理插入日志中非常一致。
更新3:
以下是用于并行批量插入的代码。在此代码中,仅供测试,我确保将每个100批次插入到唯一的分区中。
static async Task BatchInsert( CloudTable table, List<ITableEntity> entities )
{
int rowOffset = 0;
var tasks = new List<Task>();
while ( rowOffset < entities.Count )
{
// next batch
var rows = entities.Skip( rowOffset ).Take( 100 ).ToList();
rowOffset += rows.Count;
string partition = "$" + rowOffset.ToString();
var task = Task.Factory.StartNew( () =>
{
Stopwatch sw = Stopwatch.StartNew();
var batch = new TableBatchOperation();
foreach ( var row in rows )
{
row.PartitionKey = row.PartitionKey + partition;
batch.InsertOrReplace( row );
}
// submit
table.ExecuteBatch( batch );
Trace.TraceInformation( "Elapsed time to batch insert " + rows.Count + " rows: " + sw.Elapsed.ToString( "F2" ) );
} );
tasks.Add( task );
}
await Task.WhenAll( tasks );
}
如上所述,这确实有助于提高插入数千行的总体时间,但是每个100行的批处理仍然需要几秒钟。
更新4:
因此,我创建了一个全新的Azure云服务项目,使用VS2012.2,Web角色作为单页模板(其中包含TODO示例)。
这是直接从盒子里拿出来的,没有新的NuGet包或任何东西。它默认使用Storage客户端库v2和EDM及相关库v5.2。
我只需修改HomeController代码如下(使用一些随机数据模拟我想要存储在真实应用程序中的列):
public ActionResult Index( string returnUrl )
{
ViewBag.ReturnUrl = returnUrl;
Task.Factory.StartNew( () =>
{
TableTest();
} );
return View();
}
static Random random = new Random();
static double RandomDouble( double maxValue )
{
// the Random class is not thread safe!
lock ( random ) return random.NextDouble() * maxValue;
}
void TableTest()
{
// Retrieve storage account from connection-string
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
CloudConfigurationManager.GetSetting( "CloudStorageConnectionString" ) );
// create the table client
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
// retrieve the table
CloudTable table = tableClient.GetTableReference( "test" );
// create it if it doesn't already exist
if ( table.CreateIfNotExists() )
{
// the container is new and was just created
Trace.TraceInformation( "Created table named " + "test" );
}
Stopwatch sw = Stopwatch.StartNew();
// create a bunch of objects
int count = 28000;
List<DynamicTableEntity> entities = new List<DynamicTableEntity>( count );
for ( int i = 0; i < count; i++ )
{
var row = new DynamicTableEntity()
{
PartitionKey = "filename.txt",
RowKey = string.Format( "$item{0:D10}", i ),
};
row.Properties.Add( "Name", EntityProperty.GeneratePropertyForString( i.ToString() ) );
row.Properties.Add( "Data", EntityProperty.GeneratePropertyForString( string.Format( "data{0}", i ) ) );
row.Properties.Add( "Value1", EntityProperty.GeneratePropertyForDouble( RandomDouble( 10000 ) ) );
row.Properties.Add( "Value2", EntityProperty.GeneratePropertyForDouble( RandomDouble( 10000 ) ) );
row.Properties.Add( "Value3", EntityProperty.GeneratePropertyForDouble( RandomDouble( 1000 ) ) );
row.Properties.Add( "Value4", EntityProperty.GeneratePropertyForDouble( RandomDouble( 90 ) ) );
row.Properties.Add( "Value5", EntityProperty.GeneratePropertyForDouble( RandomDouble( 180 ) ) );
row.Properties.Add( "Value6", EntityProperty.GeneratePropertyForDouble( RandomDouble( 1000 ) ) );
entities.Add( row );
}
Trace.TraceInformation( "Elapsed time to create record rows: " + sw.Elapsed.ToString() );
sw = Stopwatch.StartNew();
Trace.TraceInformation( "Inserting rows" );
// batch our inserts (100 max)
BatchInsert( table, entities ).Wait();
Trace.TraceInformation( "Successfully inserted " + entities.Count + " rows into table " + table.Name );
Trace.TraceInformation( "Elapsed time: " + sw.Elapsed.ToString() );
Trace.TraceInformation( "Done" );
}
static async Task BatchInsert( CloudTable table, List<DynamicTableEntity> entities )
{
int rowOffset = 0;
var tasks = new List<Task>();
while ( rowOffset < entities.Count )
{
// next batch
var rows = entities.Skip( rowOffset ).Take( 100 ).ToList();
rowOffset += rows.Count;
string partition = "$" + rowOffset.ToString();
var task = Task.Factory.StartNew( () =>
{
var batch = new TableBatchOperation();
foreach ( var row in rows )
{
row.PartitionKey = row.PartitionKey + partition;
batch.InsertOrReplace( row );
}
// submit
table.ExecuteBatch( batch );
Trace.TraceInformation( "Inserted batch for partition " + partition );
} );
tasks.Add( task );
}
await Task.WhenAll( tasks );
}
这是我得到的输出:
iisexpress.exe Information: 0 : Elapsed time to create record rows: 00:00:00.0719448
iisexpress.exe Information: 0 : Inserting rows
iisexpress.exe Information: 0 : Inserted batch for partition $100
...
iisexpress.exe Information: 0 : Successfully inserted 28000 rows into table test
iisexpress.exe Information: 0 : Elapsed time: 00:01:07.1398928
这比我的其他应用程序快一点,达到了超过460 ROPS的水平。但这仍然无法接受。在这个测试中,我的CPU(8个逻辑处理器)几乎达到了最大值,磁盘访问则几乎空闲。
我不知道问题出在哪里。
更新5
反复调整和微调已经取得了一些改进,但是批量InsertOrReplace操作的速度最多只能达到500-700 ROPS左右(每批100个)。此测试在Azure云上使用一个或两个小实例进行。
根据下面的评论,我接受本地测试至多会很慢的事实。
以下是几个示例。每个示例都有自己的PartitionKey:
Successfully inserted 904 rows into table org1; TraceSource 'w3wp.exe' event
Elapsed time: 00:00:01.3401031; TraceSource 'w3wp.exe' event
Successfully inserted 4130 rows into table org1; TraceSource 'w3wp.exe' event
Elapsed time: 00:00:07.3522871; TraceSource 'w3wp.exe' event
Successfully inserted 28020 rows into table org1; TraceSource 'w3wp.exe' event
Elapsed time: 00:00:51.9319217; TraceSource 'w3wp.exe' event
也许是我的MSDN Azure账户有一些性能限制?我不确定。
目前我认为我已经完成了这个问题。也许对于我的目的来说它足够快,或者我会选择不同的路径。
结论:
以下所有答案都很好!
对于我的具体问题,我已经能够在一个小的Azure实例上看到高达2k ROPS的速度,更典型的情况是1k左右。由于我需要控制成本(因此需要控制实例大小),这定义了我将能够使用表格的范围。
感谢大家的帮助。