Titan index updates take too long

Question

database titan tinkerpop gremlin-server

3

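Even on an empty database, creating an index in Titan 1.0 takes several minutes. The time appears to be the same on every run, which suggests an unnecessary delay.

My question: how can I shorten or eliminate the time Titan takes to reindex? Conceptually, since no work is being done, the time should be minimal, and certainly not 4 minutes.

(Note: I was previously pointed to a solution that simply makes Titan wait out the full delay without timing out. That is the wrong solution. I want to eliminate the delay entirely.)

The code I use to set up the database from scratch is: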

graph = ... a local cassandra instance ...
graph.tx().rollback()

// 1. Check if the index already exists
mgmt = graph.openManagement()
i = mgmt.getGraphIndex('byIdent')
if(! i) {
  // 1a. If the index does not exist, add it
  idKey = mgmt.getPropertyKey('ident')
  idKey = idKey ? idKey : mgmt.makePropertyKey('ident').dataType(String.class).make()
  mgmt.buildIndex('byIdent', Vertex.class).addKey(idKey).buildCompositeIndex()
  mgmt.commit()
  graph.tx().commit()

  mgmt  = graph.openManagement()
  idKey = mgmt.getPropertyKey('ident')
  idx   = mgmt.getGraphIndex('byIdent')
  // 1b. Wait for index availability
  if ( idx.getIndexStatus(idKey).equals(SchemaStatus.INSTALLED) ) {
    mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.REGISTERED).call()
  }
  // 1c. Now reindex, even though the DB is usually empty.
  mgmt.updateIndex(mgmt.getGraphIndex('byIdent'), SchemaAction.REINDEX).get()
  mgmt.commit()
  mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.ENABLED).call()
} else { mgmt.commit() }

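The delay appears to come from the updateIndex...REINDEX call blocking until it times out. Is this a known issue, or a works-for-me / won't-fix? Am I doing something wrong?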

EDIT: Disabling the REINDEX, as discussed in the comments, is not actually a fix, because the index does not seem to become active. I now see:

WARN  com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx  - Query requires iterating over all vertices [(myindexedkey = somevalue)]. For better performance, use indexes

- Thomas M. DuBuisson

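Possible duplicate of "Titan index status never changes to ENABLED with the Amazon DynamoDB backend". - Mohamed Taher Alrefaie

If there is no existing data, for example when the property key and index are being created for the first time, drop the call to REINDEX. - Jason Plurad

@JasonPlurad That is a fine strategy for most of my uses. But what if the database is small at index creation time? Say I have a tiny but non-zero number of vertices: must I reindex and suffer this seemingly pointless delay (at least until a pull request is submitted)? - Thomas M. DuBuisson

Yes, if you have data in there, you need to REINDEX in that case. Best practice is to define your schema and indexes up front and lock them down. - Jason Plurad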

1 Answer

- Thomas M. DuBuisson · Accepted Answer

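The delay was caused entirely by my misuse of Titan (although this pattern does appear in chapter 28 of the Titan 1.0.0 documentation).

Don't block inside a transaction!

Instead of: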

  mgmt  = graph.openManagement()
  idKey = mgmt.getPropertyKey('ident')
  idx   = mgmt.getGraphIndex('byIdent')
  // 1b. Wait for index availability
  if ( idx.getIndexStatus(idKey).equals(SchemaStatus.INSTALLED) ) {
    mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.REGISTERED).call()
  }

Consider:

  mgmt  = graph.openManagement()
  idKey = mgmt.getPropertyKey('ident')
  idx   = mgmt.getGraphIndex('byIdent')
  // Wait for index availability
  if ( idx.getIndexStatus(idKey).equals(SchemaStatus.INSTALLED) ) {
    mgmt.commit()
    mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.REGISTERED).call()
  } else { mgmt.commit() }

Use ENABLE_INDEX

Not: mgmt.updateIndex(mgmt.getGraphIndex('byIdent'), SchemaAction.REINDEX).get()

Rather: mgmt.updateIndex(mgmt.getGraphIndex('byIdent'), SchemaAction.ENABLE_INDEX).get()
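
Putting the two changes together, the setup script from the question might end up looking roughly like the sketch below. It only rearranges the calls already shown in this thread. It assumes `graph` is an already-open Titan 1.0.0 graph, that it runs in the Gremlin console where `Vertex`, `SchemaStatus` and `SchemaAction` are already imported, and that a fresh management transaction has to be opened before `updateIndex` because the previous one was committed. It has not been checked against a database that already holds data, where a REINDEX is still needed as noted in the comments.

```groovy
// Sketch only. Assumption: `graph` is an open Titan 1.0.0 graph and the
// Gremlin console has already imported Vertex, SchemaStatus and SchemaAction.
graph.tx().rollback()

mgmt = graph.openManagement()
if (mgmt.getGraphIndex('byIdent')) {
  // Index is already defined: nothing to do, just close the transaction.
  mgmt.commit()
} else {
  // 1a. Define the property key (if missing) and the composite index.
  idKey = mgmt.getPropertyKey('ident')
  idKey = idKey ? idKey : mgmt.makePropertyKey('ident').dataType(String.class).make()
  mgmt.buildIndex('byIdent', Vertex.class).addKey(idKey).buildCompositeIndex()
  mgmt.commit()

  // 1b. Read the status, then commit BEFORE blocking on the status change,
  //     so that no management transaction is open while waiting.
  mgmt  = graph.openManagement()
  idKey = mgmt.getPropertyKey('ident')
  idx   = mgmt.getGraphIndex('byIdent')
  if ( idx.getIndexStatus(idKey).equals(SchemaStatus.INSTALLED) ) {
    mgmt.commit()
    mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.REGISTERED).call()
  } else { mgmt.commit() }

  // 1c. Enable the index instead of reindexing (the database holds no data yet).
  mgmt = graph.openManagement()
  mgmt.updateIndex(mgmt.getGraphIndex('byIdent'), SchemaAction.ENABLE_INDEX).get()
  mgmt.commit()

  // 1d. Again wait with no management transaction open.
  mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.ENABLED).call()
}
```

Every wait in this version happens after the corresponding `mgmt.commit()`, which is the point of the "don't block inside a transaction" advice.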