"Topological" sorting graph with cycles

"Topological" sorting graph with cycles

The full title of the article should have sounded like "Sustainable" topological "graph sorting with cycles for O (| V | + | e | log | e |) by time and O (| V |) from memory without recursion, ”but I was told that this is a brute force.

Disclaimer: I am a programmer, not a mathematician, therefore inaccurate wording is possible in places where you can and should be kicked.

The essence of the problem

I will analyze the problem statement, the solution of which I want to share, in parts.

Topological Sorting is the ordering of the vertices of a directed acyclic graph, in which each of the vertices from which edge goes, goes earlier than the top, in which this edge is included. There are two important nuances here: there can be more than one such orderings and it applies only to acyclic graphs. Mathematicians are not worried, but programmers sometimes want determinism and a little more than “sorry, you have a cycle here, there will be no sorting for you”.

Therefore, we add the requirement stability : a pair of vertices, the order between which is not specified by the edges of the graph, should be determined by the order in which these same vertices arrived at the input of the algorithm.

With the absence of recursion , everything is simple, the computer is significantly weaker than the Turing machine, and memory (and especially the stack) is limited. Therefore, in application programming, usually iterative algorithms are preferable to recursive ones.

And finally, I’ll define what I call a “topological” sort if there are cycles in the graph. This is the ordering of the vertices, which coincides with the true topological sorting, if each of the cycles is replaced by one vertex, and the vertices of the cycle itself, in accordance with the requirement of stability, are located relative to each other in their original order.
And now with all this garbage, we will try to take off I’ll do this within the limits of time and memory specified at the beginning of the post.

Finding a solution

If you look at existing topological sorting algorithms ( Kahn algorithm , depth search ), it turns out that they all, if there is a cycle, say "I can not" and stop working.

Therefore, let us go on the other hand, with algorithms that can do something intelligible with cycles. For example, find them. Among those listed in Wikipedia algorithms for searching cycles in graphs , attention attracted Tarjan algorithm . It contains an interesting remark that, as a by-product, the algorithm gives reverse topological graph sorting:
It’s not a problem. Therefore, it is a resourceful and incomprehensible algorithm that it has a resilience. , but this is clearly a movement in the right direction. When reading Wikipedia more attentively, it reveals a reference to the article A is clearly connected components for the authorship of Comrade David Pierce, in which there is not only an imperative algorithm, but also with reduced memory requirements compared to the classical Tarjan algorithm. The bonus goes implementation of the algorithm in Java . We must take!

PEA_FIND_SCC3 Algorithm (V, E) from Pierce’s article

So, we have a list of vertices at the input and (thanks to Pierce) a certain index of the strong connected component to which this vertex belongs to the output. The next step is to stably sort the vertices according to the sequence number of their component. There is an algorithm for this task, it is called sorting by calculation that performs this O (n) time.

In the process of collecting the algorithm into a heap, it turned out that the fact that it is natural for him to give back the topological sorting reverse is really hard to get out of Tarjan - then the neighboring branches of the graph (which do not have order relations) will number it backwards in advance, the pieces of the graph, which have no connections between themselves, turn out to be in the reverse order ...


So, the final decision:

  1. Number the vertices of the source list. O (| V |)
  2. Sort the edges of each of the vertices according to the number of the vertex where the edge goes. O (| e | log | e |)
  3. Using the Pierce algorithm, we find and number the components of strong connectivity. O (| V |)
  4. Using sorting by counting, sort the vertices on the basis of the numbers of the components of strong connectivity they received. O (| V |)

GitHub, Java code, public domain . You can note that in order to ensure the stability of sorting, the Pierce algorithm is slightly modified and goes around the vertices in the reverse order.

But why ???

And now the background, why it all took. When loading/unloading dynamic libraries (.so), glibc, you need to decide in which order to initialize the static variables. Variables depend on each other, depend on various functions, etc. In general, all this forms the same graph in which there are cycles and which must be sorted.

Once upon a time, this task was handled by a rather non-optimal code that performed the task for O (n ^ 2) . And in general, it didn’t really bother anyone until, in 2012, it wasn’t it turned out that The code does not work correctly and in some cases makes a mistake.

Severe men from RedHat thought-thought and screwed on top some more cycles. Problem cases have been fixed, but the algorithm has already begun to work for O (n ^ 3) , and this has already become noticeable and in some applications it has taken a few tens of seconds, which was started in 2013 bug . Also, the author of the bug found cases in which the algorithm with O (n ^ 3) was also wrong . He suggested using Tarjan’s algorithm, although the patch with corrections was never issued.

As time went on, glibc braked mercilessly and in 2015, another attempt to fix the algorithm occurs. Unfortunately unsuccessful, the algorithm was chosen O (n ^ 2) , besides it was confused by the branches of the graph, between which no order was defined.

Today is the year 2019, and glibc is still slowing down.Judging by how much time it took for me to correct the task, the chances that I will complete it are significantly lower than 100%. Everything is aggravated by the fact that it happens on C, without IDE support, in the code with the GNU coding style, the insane test runner (“and if you want to run the test again, just delete the corresponding .out file”). And in order for glibc to take a look at your patch at all, you must go through the procedure copyright assignment'a , correctly patch and damn what else. Therefore, in order to at least remove the question of inventing an algorithm that solves the problem, this post was written.

Source text: "Topological" sorting graph with cycles