Search service

Incident Report for Tipser

Postmortem

At 17:09 UTC we were alerted that the production Elasticsearch cluster, hosted by an external vendor, was marked as unhealthy. Investigation showed that one of the nodes was marked as down.
At 17:20 UTC we decided to restart the nodes to clear the status, but the attempt was unsuccessful. The unhealthy node did not restart, and restarting the healthy nodes took the whole cluster down.
At 17:35 UTC all nodes except the faulty one were back in operation, and the whole service was available again.
The external vendor informed us that the faulty node was replaced at 19:05 UTC.

As a result, the following endpoints were unavailable between 17:20 UTC and 17:35 UTC:

/v3/products/*
/v4/products/*

Posted May 13, 2022 - 13:13 CEST

Resolved

One of the nodes in the Elasticsearch cluster was marked as down by the monitoring system.

Posted Aug 18, 2021 - 23:30 CEST