Welcome to the knowledge gaps that I’ve been screaming about for many moons!
Elasticsearch - How to add a new field to existing documents with update by query. The goal: back-fill old documents with an email field, on an index of around 1M+ records.
You can use the update by query API in order to add a new field to all your existing documents:
POST index/_update_by_query?conflicts=proceed&scroll_size=500
{
  "script": {
    "source": "ctx._source.email = 'rohitpatel0105@gmail.com'",
    "lang": "painless"
  },
  "query": {
    "bool": {
      "must_not": [
        { "exists": { "field": "email" } }
      ]
    }
  }
}
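The same request can also be assembled in code before it is sent. The following is a minimal sketch, not a definitive implementation: the function name `build_email_backfill` and the placeholder address are hypothetical, and the optional `requests_per_second` and `wait_for_completion` query parameters are an addition that the original request did not use. They are included on the assumption that throttling and running the job as a background task may be gentler on a cluster of this size.

```python
from urllib.parse import urlencode


def build_email_backfill(index, email, scroll_size=500,
                         requests_per_second=None, wait_for_completion=True):
    """Return (path, body) for an _update_by_query call that sets `email`
    on every document that does not have the field yet.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    params = {"conflicts": "proceed", "scroll_size": scroll_size}
    if requests_per_second is not None:
        # Assumption: throttle the job; not part of the original request.
        params["requests_per_second"] = requests_per_second
    if not wait_for_completion:
        # Run as a background task so it can be tracked via the _tasks API.
        params["wait_for_completion"] = "false"

    path = f"/{index}/_update_by_query?{urlencode(params)}"
    body = {
        "script": {
            # Pass the value as a script parameter instead of inlining it.
            "source": "ctx._source.email = params.email",
            "lang": "painless",
            "params": {"email": email},
        },
        # Only touch documents where the field is still missing.
        "query": {"bool": {"must_not": [{"exists": {"field": "email"}}]}},
    }
    return path, body
```

Calling `build_email_backfill("index", "user@example.com")` reproduces the query string of the request above; passing `requests_per_second=500, wait_for_completion=False` appends the two extra parameters. Any throttle value is a guess that would need tuning against the actual cluster.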
While the update by query was running, around 700k records had been updated when CPU utilization suddenly hit its maximum and nodes started to go down, leaving the cluster with status "Red".

The most likely root cause is exactly that - my UBQ … which hit too many documents too fast and destroyed the cluster. On top of that, UBQ makes it SO easy to over-match and update docs you didn’t mean to. (Memory and GC graphs are towards the bottom of those dashboards.)

I would suggest that you use the _tasks API to first find and cancel any remaining, running UBQ tasks (which will keep running independent of whether you are still connected) … or just wait out the storm and see if the cluster ever comes back.

If deleting a problematic index isn't feasible, you can restore a snapshot, delete documents from the index, change the index settings, reduce the number of replicas, or delete other indices to free up disk space. The important step is to resolve the red cluster status before re-configuring your Amazon ES domain. Re-configuring a domain with a red cluster status can compound the problem and lead to the domain being stuck in a configuration state of Processing until you resolve the status.

Conclusion: Don't run the update by query API over a large number of docs directly against the cluster without closing the index.
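The advice about finding and cancelling leftover UBQ tasks can be sketched as follows. This assumes the task list was fetched with `GET _tasks?actions=*byquery&detailed=true` and parsed into a dict; the function names and the sample response are hypothetical, and the sketch only picks out task IDs and builds the cancel path (`POST _tasks/<task_id>/_cancel`), it does not talk to a cluster.

```python
def running_update_by_query_tasks(tasks_response):
    """Return the sorted IDs of update-by-query tasks found in a parsed
    response from GET _tasks?actions=*byquery&detailed=true."""
    found = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            # Update by query runs under this transport action name.
            if task.get("action") == "indices:data/write/update/byquery":
                found.append(task_id)
    return sorted(found)


def cancel_path(task_id):
    """Build the path for POST _tasks/<task_id>/_cancel."""
    return f"/_tasks/{task_id}/_cancel"


# Made-up sample response, for illustration only.
sample = {
    "nodes": {
        "nodeA": {
            "tasks": {
                "nodeA:101": {"action": "indices:data/write/update/byquery"},
                "nodeA:102": {"action": "indices:data/write/delete/byquery"},
            }
        }
    }
}

for task_id in running_update_by_query_tasks(sample):
    print("POST", cancel_path(task_id))
```

Filtering on the exact action name keeps delete-by-query and reindex tasks out of the cancel list, since `*byquery` matches more than update by query.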

