In a previous blog post, one of my colleagues described how we built a semantic search engine for our NetEye User Guide. The solution uses Elasticsearch together with machine learning models such as ELSER to index and query our documentation. While the proof of concept (POC) worked well, one challenge had to be tackled before putting it into production: keeping the search results consistent with the deployed user guide, all while maintaining zero downtime.
In this post, we’ll dive deeper into how we adapted the POC and solved the challenge of keeping the search engine consistent and operational during a user guide deployment. More specifically, we’ll talk about how we borrowed an idea from smartphone OS updates and applied it to Elasticsearch indices for seamless updates and zero-downtime operations.
As you may know, Elasticsearch is a powerful tool for full-text search, but it does not offer multi-document transactions like those you’d find in traditional relational databases: each single document write is atomic, but a batch of writes is not. This posed a significant challenge for our use case.
When the content of our user guide changes, we need to index the new documents in Elasticsearch so that search results reflect the updated content. However, we wanted to guarantee that even in the middle of an update, the search engine would always return consistent results. This means that if a user searches the guide while an update is in progress, the results should still be valid and match the version of the guide that is deployed at that moment.
Re-indexing a whole guide is not an atomic operation: there is always a window of time during which the indexed documents may be inconsistent with the live content of the user guide. During an update, some users might see outdated or mixed search results because the documents are still being indexed. Our goal was clear: ensure consistency and eliminate downtime during the update process.
In my search for a solution, I was reminded of a technique I had encountered in the past when experimenting with custom smartphone ROMs (and basically messing up both my smartphone warranty and core functionalities). Android OS updates typically use a feature known as the A/B partition system.
The idea behind A/B partitions is simple yet powerful: there are two separate partitions (A and B) that each contain a copy of the system. When an update is released, it is installed on the inactive partition, while the active partition continues to run the current version of the system. Once the update is finished, the device switches to the updated partition on the next reboot, minimizing downtime and ensuring that the user can always interact with a stable version of the OS.
I realized this exact same approach could be applied to our Elasticsearch indices. Instead of having a single index that is constantly updated, we could use two indices to mirror the state of the user guide content.
Here’s how we implemented this solution in production:
We created two Elasticsearch indices: user-guide-v1 (the active index) and user-guide-v2 (the staging index).
The active index (user-guide-v1) always contains the live content, while the staging index (user-guide-v2) is used to index the new documents when there’s an update to the guide.
When a new version of the guide is deployed, its documents are indexed into the staging index (user-guide-v2).
Meanwhile, the active index (user-guide-v1) continues to serve search requests without disruption, ensuring consistency in the search results.
Once indexing is complete, we switch the search alias from user-guide-v1 to user-guide-v2.
If anything goes wrong, we can point the alias back to user-guide-v1, thus switching back to the previous version without any downtime.
By implementing this A/B partitioning approach, we were able to update the user guide content without interrupting the search functionality. Even when the content is being updated in Elasticsearch, users can still perform searches and get consistent results based on the currently active index.
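To make the cycle above more concrete, here is a minimal sketch in Python. It does not talk to Elasticsearch at all: InMemorySearch is a hypothetical toy stand-in that keeps two lists of documents and a pointer to the active one, and the sample page titles are made up. It only illustrates the invariant we care about, namely that searches read from the active index while the staging index is being rebuilt.

```python
from dataclasses import dataclass, field

INDICES = ("user-guide-v1", "user-guide-v2")  # index names used in this post


@dataclass
class InMemorySearch:
    """Toy stand-in for a search cluster (an assumption, not a real client)."""
    docs: dict = field(default_factory=lambda: {name: [] for name in INDICES})
    active: str = INDICES[0]  # the index that currently serves searches

    def staging(self) -> str:
        # The staging index is whichever index is not serving searches.
        return INDICES[1] if self.active == INDICES[0] else INDICES[0]

    def search(self, term: str) -> list:
        # Searches only ever read from the active index.
        return [d for d in self.docs[self.active] if term in d]

    def reindex_staging(self, new_docs: list) -> None:
        # Rebuild the staging index from scratch; the active one is untouched.
        self.docs[self.staging()] = list(new_docs)

    def switch(self) -> None:
        # Promote staging to active (in Elasticsearch: an alias update).
        self.active = self.staging()


cluster = InMemorySearch()
cluster.docs[cluster.active] = ["install guide (old)"]

cluster.reindex_staging(["install guide (new)", "upgrade guide (new)"])
during_update = cluster.search("guide")  # still served from the old index

cluster.switch()
after_switch = cluster.search("guide")   # now served from the new index

cluster.switch()                         # rollback: point back to the old index
after_rollback = cluster.search("guide")

print(during_update, after_switch, after_rollback)
```

Note that after a switch the roles are reversed: the index that used to be active becomes the staging target for the next deployment, just like the A and B partitions on a phone.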
Because an Elasticsearch alias can be moved from one index to another in a single atomic operation, the search system remains up and running throughout the update process. Users always get search results that correspond to the currently deployed version of the content, with zero downtime.
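As an illustration of the alias switch, the sketch below builds the JSON body for Elasticsearch’s aliases API (POST /_aliases), which applies all actions of one request atomically. The alias name user-guide and the helper alias_swap_body are assumptions made for this example; the post does not state which alias name or client code we use in production. The snippet only builds and prints the request body, it does not send it.

```python
import json

ALIAS = "user-guide"  # hypothetical alias name, chosen for this example


def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Build a POST /_aliases body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            # Both actions travel in one request, so the alias never
            # points to no index or to both indices at once.
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Promote the freshly indexed staging index.
promote = alias_swap_body(ALIAS, "user-guide-v1", "user-guide-v2")

# A rollback is the same request with the two indices swapped.
rollback = alias_swap_body(ALIAS, "user-guide-v2", "user-guide-v1")

print(json.dumps(promote, indent=2))
```

The search frontend would then query the alias rather than a concrete index name, so it needs no change when the indices trade places.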
In conclusion, by applying the A/B partitioning strategy from Android OS updates to our Elasticsearch-based search system, we solved the challenge of maintaining consistency between the user guide content and search results. This solution allowed us to deploy updates without causing any downtime or inconsistency, ensuring a seamless experience for our users.
This strategy not only improved our search functionality but also gave us the confidence to continue evolving and deploying new content without worrying about the stability of the search engine. It’s a great example of how taking inspiration from different fields can lead to innovative solutions in software engineering.
Feel free to try out the updated search functionality in our NetEye User Guide and let us know how it works for you!