![]() Here is what IACA says about the innermost loop: It is somewhat tricky to use with Visual C++ 64-bit compiler: you have to copy-paste the piece of code from assembly listing, surround it with markers (which are a bit wrong in iacaMarks.h by the way), then compile it with ml64.exe, and pass the resulting object file into iaca.exe. It assumes that the code piece is run in the infinite loop, that all branches are well-predicted and not taken, and that all memory accesses hit L1 cache (plus maybe some other assumptions). We can take closer look by using Intel Architecture Code Analyzer: IACA is a small static analysis tool, which analyzes a snippet of code at instruction level. Data access is scalar (point 3): you cannot vectorize it even if several keys are searched simultaneously. But it is inherently sequental (point 2): you cannot know which element to load on the next step until the element on the current step has been loaded and compared completely. And it really happens (unless your compiler thinks that you compile for 486), according to assembly listing of the innermost lea rax, QWORD PTR cmp DWORD PTR, r8d cmovl rdx, rax sar rcx, 1 test rcx, rcx jg SHORT good is this binary search implementation? It has no branches (point 1 from the above list), which is great. The comparison itself is included in the ternary operator, which is supposed to compile into cmovXX instruction. The best approach for it is described in this blog post from Paul Khuong, which is: make the very first iteration special by using step = n+1 - 2^logstep instead of just 2^logstep, so that regardless of comparison result the next search interval would have length 2^logstep, including either the beginning of array or the end of array (note that two such possible intervals overlap, and this is not a problem). In order to support arbitrary size of input array, some sort of modification is required. This code only works properly when N+1 is power of two. ![]() Int binary_search_branchless ( const int * arr, int n, int key ) Here is a branchless implementation that we will use in the comparison later: You can read about it for instance in this demofox's blog post. Of course, not all implementations of binary search are created equal: for small arrays branchless implementation is preferred. The problem is typically solved with binary search in O(log N) time, but it might easily happen so that for small N simple linear algorithm would be faster. And this investigation soon developed into a full-length standalone article. I decided to start investigating a simpler problem first, which is solved by std::lower_bound: given a sorted array of elements and a key, find index of the first array element greater or equal than the key. Avoid complicated algorithms: they almost always fail on one of the previous points, and they sometimes do too much work for small inputs.Prefer simple data access and manipulation patterns: this allows to vectorize the algorithm. ![]() Reduce data dependency: this allows to fully utilize processing units in CPU pipeline.Avoid branches whenever possible: unpredictable ones are very slow.What exactly should we strive for to get an algorithm efficient for small N? Here is the list of things to look for: std::sort) or mergesort includes some simple quadratic algorithm which is run for sufficiently small subarrays like N <= 32. That's why every well-optimized sorting algorithm based on quicksort (e.g. Indeed, applying some fancy O(N log N) algorithm is not a good idea: although it has optimal asymptotic performance, its logic is too complicated to outperform simple bubble-sort-like algorithms which take O(N^2) time instead. Suppose that we have a very small array and we want to sort it as fast as possible. It is a simple strategy for sorting or doing comparison-based tasks, which works wonderfully when input data is small enough. While working on an implementation of merge sort promised in the previous article, I realized that I'd like to use one neat little thing, which is worth its own post. 25 August 2017 | in High performance | tags: binary search cmov iaca simd
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |