Before deploying El Proxy in production, we on the R&D Team ran numerous benchmarks and reproduced real-life scenarios to make sure that the real-time log signing performed by El Proxy would not become a bottleneck in environments where logs subject to the Italian “Garante Privacy” regulations are generated at rates of around 2,000 logs per second.
Our first installations of El Proxy in production environments confirmed that real-time log signing was indeed not an issue.
What we had not considered, however, was that the performance of verifying the El Proxy blockchains might be a problem. While verification time is not as critical as the time needed for real-time log signing, a blockchain still needs to be verified in a reasonable amount of time. We soon realized that verifying the integrity of blockchains containing hundreds of millions of logs would take days.
After realizing this, we started investigating where the bottleneck in the blockchain verification was. We first tried to get an overview of where most of the time was spent by using execution-time flamegraphs, but these turned out to be unusable: the Rust async runtime caused repeated calls to the same function to appear as separate calls in the flamegraph.
So we proceeded by isolating the different parts of the verification process and benchmarking them separately. What we discovered is that the time spent on actually verifying the log signatures was just a small fraction of the whole execution time, while the largest part (around 90% of the total verification time) was spent simply retrieving the logs from the blockchain stored in Elasticsearch.
In fact, El Proxy verification retrieves the logs via the Elasticsearch REST APIs, and these calls take a substantial amount of time to complete. The time is mainly due to Elasticsearch itself, which needs to retrieve the documents that match the query performed by El Proxy and sort them.
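As a rough illustration of this kind of per-phase measurement, here is a minimal Rust sketch that accumulates wall-clock time separately for fetching and for verifying. It is not El Proxy's actual benchmarking code: `PhaseTimes`, `timed`, `profile_verification` and the closure signatures are hypothetical names introduced only for this example, and logs are modeled as plain strings.

```rust
use std::time::{Duration, Instant};

/// Wall-clock time accumulated per phase (hypothetical helper type).
#[derive(Debug, Default)]
pub struct PhaseTimes {
    pub fetch: Duration,
    pub verify: Duration,
}

impl PhaseTimes {
    /// Fraction of the measured time spent fetching, between 0.0 and 1.0.
    pub fn fetch_share(&self) -> f64 {
        let total = (self.fetch + self.verify).as_secs_f64();
        if total == 0.0 {
            0.0
        } else {
            self.fetch.as_secs_f64() / total
        }
    }
}

/// Runs `f`, adds its elapsed time to `slot`, and returns its result.
pub fn timed<T>(slot: &mut Duration, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let out = f();
    *slot += start.elapsed();
    out
}

/// Times the two phases separately over `batches` rounds.
/// `fetch_batch` stands in for the retrieval of logs from storage and
/// `verify_batch` for the signature check; both are supplied by the caller.
pub fn profile_verification<F, V>(
    batches: usize,
    mut fetch_batch: F,
    mut verify_batch: V,
) -> PhaseTimes
where
    F: FnMut(usize) -> Vec<String>,
    V: FnMut(&[String]),
{
    let mut times = PhaseTimes::default();
    for i in 0..batches {
        let logs = timed(&mut times.fetch, || fetch_batch(i));
        timed(&mut times.verify, || verify_batch(&logs));
    }
    times
}
```

Calling `profile_verification` with the real fetch and verify routines and then reading `fetch_share()` would give the kind of split described above, without depending on a flamegraph.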
Given the bottleneck that we had discovered, we had two plans to speed up El Proxy verification: tuning the Elasticsearch queries, and refactoring the verification so that it runs concurrently.
To tune the Elasticsearch queries, we worked together with Elasticsearch support and tried, for example, the Elasticsearch point in time (PIT) API, but this did not speed up our queries.
So we jumped into refactoring the El Proxy verification as a concurrent process. The idea is that the blockchain is split into batches and each batch is assigned to one worker, which verifies it. To guarantee that each log is correctly chained to the next one even across batch boundaries, the last log of each batch is also included in the following batch, so that by the end of the verification every log has been verified together with the one before it.
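The batching scheme can be sketched as follows. This is a simplified illustration under several assumptions, not the actual El Proxy code: the `Log` type, the function names and the `seed` value are hypothetical, a toy hash from the standard library stands in for the real signature check, the logs are already in memory rather than fetched from Elasticsearch, and plain scoped threads are used instead of an async runtime.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::ops::Range;
use std::thread;

/// Hypothetical log entry: `digest` covers the payload and the previous digest.
#[derive(Clone, Debug)]
pub struct Log {
    pub payload: String,
    pub digest: u64,
}

/// Toy stand-in for the real signature: hash of (previous digest, payload).
fn chain_digest(prev: u64, payload: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev.hash(&mut h);
    payload.hash(&mut h);
    h.finish()
}

/// Builds a valid chain whose first log is linked to `seed`.
pub fn build_chain(payloads: &[&str], seed: u64) -> Vec<Log> {
    let mut prev = seed;
    payloads
        .iter()
        .map(|p| {
            let digest = chain_digest(prev, p);
            prev = digest;
            Log { payload: p.to_string(), digest }
        })
        .collect()
}

/// Splits `0..len` into ranges of at most `batch_size` new logs each. Every
/// range except the first starts one index earlier, so it also contains the
/// last log of the previous batch.
pub fn overlapping_batches(len: usize, batch_size: usize) -> Vec<Range<usize>> {
    assert!(batch_size > 0, "batch_size must be positive");
    (0..len)
        .step_by(batch_size)
        .map(|start| start.saturating_sub(1)..(start + batch_size).min(len))
        .collect()
}

/// Checks every link inside one batch. In later batches the first log is the
/// overlap log: it only supplies the previous digest, since its own link is
/// checked by the batch before.
fn verify_batch(logs: &[Log], range: Range<usize>, seed: u64) -> bool {
    let is_first_batch = range.start == 0;
    let batch = &logs[range];
    if batch.is_empty() {
        return true;
    }
    if is_first_batch && batch[0].digest != chain_digest(seed, &batch[0].payload) {
        return false;
    }
    batch
        .windows(2)
        .all(|w| w[1].digest == chain_digest(w[0].digest, &w[1].payload))
}

/// Distributes the batches round-robin over `workers` threads and returns
/// true only if every batch verifies.
pub fn verify_concurrently(logs: &[Log], seed: u64, batch_size: usize, workers: usize) -> bool {
    assert!(workers > 0, "at least one worker is required");
    let batches = overlapping_batches(logs.len(), batch_size);
    let batches = &batches;
    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                s.spawn(move || {
                    batches
                        .iter()
                        .skip(w)
                        .step_by(workers)
                        .all(|r| verify_batch(logs, r.clone(), seed))
                })
            })
            .collect();
        handles.into_iter().all(|h| h.join().expect("worker panicked"))
    })
}
```

With 10 logs and a batch size of 4, `overlapping_batches` yields the ranges `0..4`, `3..8` and `7..10`: logs 3 and 7 each appear in two batches, which is what lets the link across each batch boundary be checked without any coordination between workers.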
The results of the concurrent implementation of the verification were satisfying. To give some numbers, the concurrent implementation, with 4 workers running concurrently, achieved a 250% speed-up in terms of logs verified per second, which means that it now takes less than half the time needed by the sequential version of the verification.
In the following graph you can see how changing the number of workers affects the execution time of the verification.