Sync before updating/inserting data into dependent databases
    router.put('/account/edit', function(req, res) {
      syncElasticWithDatabase().then(() => {
        elastiClient.update({...});      // client for elasticsearch
        cassandraClient.execute({...});  // client for cassandra
        res.end();
      });
    });
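The syncElasticWithDatabase() method uses data from the updates table (in the postgres database). This approach may be slow, because some users have to wait for syncElasticWithDatabase() to finish. I like it because it takes advantage of sequential_ids (see the article for details). Data is synced before new data arrives, which lets the dependents catch up, and only the missed data gets synced. Unlike option 2 below, this avoids reindexing/reinserting.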
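To make the idea concrete, here is a minimal sketch of what a cursor-based syncElasticWithDatabase() could look like. Everything in it is an assumption for illustration: `createSyncer`, `fetchUpdatesAfter` and `applyToIndex` are hypothetical names, the two callbacks stand in for a query against the updates table and a write to elasticsearch, and the cursor is kept in memory where a real setup would probably persist it.

```javascript
// Sketch only: createSyncer, fetchUpdatesAfter and applyToIndex are
// hypothetical helpers, not part of any library.
function createSyncer({ fetchUpdatesAfter, applyToIndex, initialCursor = 0 }) {
  let lastSyncedId = initialCursor; // highest sequential id already applied
  let inFlight = null;              // shared promise for concurrent callers

  async function run() {
    // Expected to return rows with id > lastSyncedId, ordered by id ascending.
    const rows = await fetchUpdatesAfter(lastSyncedId);
    for (const row of rows) {
      await applyToIndex(row);
      lastSyncedId = row.id; // advance only after a successful write
    }
    return lastSyncedId;
  }

  // Concurrent requests share one sync pass instead of each starting their own.
  return function sync() {
    if (!inFlight) {
      inFlight = run().finally(() => { inFlight = null; });
    }
    return inFlight;
  };
}

// Usage with in-memory stand-ins for the updates table and the index.
const updatesTable = [{ id: 1 }, { id: 2 }, { id: 3 }];
const indexedIds = [];
const syncElasticWithDatabase = createSyncer({
  fetchUpdatesAfter: async (id) => updatesTable.filter((row) => row.id > id),
  applyToIndex: async (row) => { indexedIds.push(row.id); },
});
```

Because the cursor only moves forward after a successful write, a pass that finds nothing new costs a single query, which is the case most requests would hit.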
Using a background process (e.g. running every 24 hours), I could sync data by selecting the "missed" data from the update_error table, which records data whenever elasticsearch or cassandra fails. Here's a rough example:

    router.put('/account/edit', function(req, res) {
      psqlClient.query('UPDATE....').then(() => {
        // Return the promise so a failed dependent write reaches catch()
        return Promise.all([
          elastiClient.update({...}),      // client for elasticsearch
          cassandraClient.execute({...}),  // client for cassandra
        ]);
      }).catch(err => {
        psqlClient.query('INSERT INTO update_error ....');
      });
    });
However, this method would require reindexing or reinserting data, because in some cases elasticsearch could succeed while cassandra fails, or vice versa. Because of this, I would need a separate column that records which type of database failed. That way I can select the data that failed since the last synchronization time for each type of database (elasticsearch or cassandra).
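A rough sketch of that per-database bookkeeping, under stated assumptions: `updateErrors` is an in-memory array standing in for the update_error table, the `target` field plays the role of the extra column, and `writers` is a hypothetical map from database name to a function that performs the write. Only the store that failed is retried, so the one that succeeded is not reindexed.

```javascript
// Sketch only: in-memory stand-in for the update_error table.
// Each row looks like { target, payload }, where target names the failed store.
const updateErrors = [];

// Write to every dependent store and record which ones failed.
async function writeToDependents(payload, writers) {
  const targets = Object.keys(writers);
  const results = await Promise.allSettled(
    // The async wrapper turns a synchronous throw into a rejection as well.
    targets.map(async (target) => writers[target](payload))
  );
  const failed = targets.filter((_, i) => results[i].status === 'rejected');
  for (const target of failed) {
    updateErrors.push({ target, payload }); // would be INSERT INTO update_error
  }
  return failed;
}

// Background job: retry each recorded failure against its own store only.
async function retryFailedWrites(writers) {
  const pending = updateErrors.splice(0, updateErrors.length);
  for (const row of pending) {
    try {
      await writers[row.target](row.payload);
    } catch (err) {
      updateErrors.push(row); // still failing, keep it for the next run
    }
  }
  return updateErrors.length; // number of rows left to retry
}
```

The background process would then call `retryFailedWrites(writers)` on its schedule (for example once every 24 hours, as described above).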
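Questions:
Option 1 looks perfect, but it means that some people will have to wait longer to update their account because of syncElasticWithDatabase(). However, the article above does exactly this (look at their charts), or did I misunderstand something? Because of that delay (if I understand it correctly), I came up with the second option. However, it seems like too much work just for syncing. I have spent a lot of time thinking about this... so is there a simpler or better approach than 1 and 2?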
Would Apache ZooKeeper help in my case?
Thanks :)
Other references
Sync elasticsearch on connection with database - nodeJS
https://gocardless.com/blog/syncing-postgres-to-elasticsearch-lessons-learned/