Counting the "Auditor" agents

It is no secret that the automated system "Auditor" monitors how blocking of resources on Russia's list of prohibited information is enforced. How it works is described well in this article on Habr; the picture is from there:

[Figure: AS "Auditor"]

The "Agent Revizor" module is installed directly at the provider:
The “Agent Revizor” module is a structural element of the automated system “Auditor” (AS “Auditor”). This system is intended to monitor whether telecom operators fulfill the access restriction requirements established by Articles 15.1-15.4 of the Federal Law of July 27, 2006 No. 149-FZ “On Information, Information Technologies and Information Protection.”

The main purpose of creating the Auditor is to monitor telecom operators' compliance with the requirements established by Articles 15.1-15.4 of the Federal Law No. 149-FZ dated July 27, 2006 “On Information, Information Technologies and Information Protection” by detecting cases where prohibited information is accessible and collecting supporting materials (data) on violations of the access restrictions.

Given that many providers, if not all, have installed this device, the result should be a large network of probe beacons like RIPE Atlas, or even larger, but with closed access. However, a beacon is a beacon precisely because it sends signals in all directions, so what if we catch them and see what we caught and how many?

Before counting, let's see why this is possible at all.

A bit of theory


Agents check resource availability, among other ways, through HTTP(S) requests like this one:

 TCP, 14678 > 80, "[SYN] Seq=0"
 TCP, 80 > 14678, "[SYN, ACK] Seq=0 Ack=1"
 TCP, 14678 > 80, "[ACK] Seq=1 Ack=1"

 HTTP, "GET /somepage HTTP/1.1"
 TCP, 80 > 14678, "[ACK] Seq=1 Ack=71"
 HTTP, "HTTP/1.1 302 Found"

 TCP, 14678 > 80, "[FIN, ACK] Seq=71 Ack=479"
 TCP, 80 > 14678, "[FIN, ACK] Seq=479 Ack=72"
 TCP, 14678 > 80, "[ACK] Seq=72 Ack=480"
  

In addition to the payload, the request includes a connection setup phase, the SYN and SYN-ACK exchange, and a connection teardown phase, FIN-ACK.

The registry of prohibited information contains several types of blocking. Obviously, if a resource is blocked by IP address or by domain name, we will not see any requests at all. These are the most destructive types of blocking: they make every resource on an IP address, or all the information on a domain, unavailable. There is also blocking "by URL". In this case the filtering system must parse the HTTP request header to determine exactly what to block. And, as can be seen above, the connection setup phase has to happen before that, and it is this phase that we can try to track, since the filter will most likely let it through.

To do this, we need to pick a suitable free domain that is blocked "by URL" and served over HTTP, to make the filtering system's job easier. Preferably it should be long abandoned, to minimize extraneous traffic from anyone other than Agents. This turned out to be not at all difficult: there are plenty of free domains in the registry of prohibited information, for every taste. So the domain was purchased and pointed at the IP addresses of a VPS running tcpdump, and the counting began.

Audit of the "Auditors"


I expected to see periodic bursts of requests, which in my opinion would indicate controlled activity. I can't say that I didn't see them at all, but there was definitely no clear picture:

[Figure: raw dump]

This is not surprising: even an unwanted domain on a never-used IP address will simply receive a mass of unsolicited traffic; such is the modern Internet. Fortunately, I needed only requests for a specific URL, so all the scanners and password brute-forcers were quickly weeded out. It was also easy enough to recognize a flood by its mass of identical requests. Then I compiled the frequency of occurrence of IP addresses and went through the top of the list manually, separating out those that had slipped through the previous stages. Additionally, I cut out all the sources that had sent only a single packet; there were not many of them. And this is what came out:

[Figure: "Auditor" requests]
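
The cleanup described above (frequency of source addresses, then dropping single-packet sources) can be sketched roughly as follows. This is only an illustration in Python, not the script actually used: the function names `count_sources` and `drop_single_packet_sources` are hypothetical, and the sample addresses come from documentation ranges rather than from the real dump.

```python
from collections import Counter

def count_sources(packets):
    """Count packets per source IP; `packets` is an iterable of (src_ip, dst_port)."""
    return Counter(src for src, _port in packets)

def drop_single_packet_sources(counts):
    """Keep only the sources that sent more than one packet."""
    return {ip: n for ip, n in counts.items() if n > 1}

# Hypothetical sample, not real capture data.
packets = [
    ("192.0.2.10", 80), ("192.0.2.10", 80), ("192.0.2.10", 80),
    ("198.51.100.7", 80),
    ("203.0.113.5", 80), ("203.0.113.5", 80),
]

counts = count_sources(packets)
kept = drop_single_packet_sources(counts)
top = counts.most_common(2)  # the top of the list, to be reviewed by hand
```

The manual review of the top of the list is deliberately left out: that step relied on judgment and cannot be reduced to a filter.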

A small lyrical digression. A little more than a day later, my hosting provider sent a rather smoothly worded letter saying that there was a resource from the RKN's prohibited list on my hosting, and so it was being blocked. At first I thought my account had been blocked; it had not. Then I thought I was simply being warned about something I already knew. But it turned out that the hoster had turned on its own filter in front of my domain, and as a result I ended up under double filtering: by the providers and by the hoster. The filter let through only the ends of requests, FIN-ACK and RST, cutting off all HTTP to the forbidden URL. As you can see from the graph above, after the first day I began to receive less data, but I still received it, which was quite enough for the task of counting the sources of requests.

Now to the point. In my opinion, two bursts are clearly visible every day: the first, smaller one after midnight Moscow time, and the second closer to 6 am, with a tail lasting until 12 noon. The peak does not occur at exactly the same time each day. At first I wanted to single out the IP addresses that appeared only within these periods, and in every one of them, on the assumption that the Agents perform their checks periodically. But on careful inspection I rather quickly found periodic requests that fell into other intervals and had other frequencies, up to one request every hour. Then I thought about time zones and that they might be the cause, and then that the system as a whole might not be globally synchronized. In addition, NAT will certainly play its role, so the same Agent can make requests from different public IPs.
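
One simple way to look for such daily bursts is to bucket request timestamps by hour of day in Moscow time. The sketch below assumes Unix timestamps taken from the capture and a fixed UTC+3 offset; the function name `hourly_histogram` and the sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

MSK = timezone(timedelta(hours=3))  # Moscow time as a fixed UTC+3 offset

def hourly_histogram(timestamps):
    """Count events per hour of day (Moscow time) from Unix timestamps."""
    hours = Counter()
    for ts in timestamps:
        hours[datetime.fromtimestamp(ts, tz=MSK).hour] += 1
    return hours

# Hypothetical timestamps: two events in the 03:00 hour, one in the 04:00 hour.
hist = hourly_histogram([0, 3600, 86400])
peak_hour = hist.most_common(1)[0][0]
```

A histogram like this only shows aggregate bursts; it says nothing about whether an individual address is periodic, which is exactly where the picture got blurry.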

Since accuracy was not my initial goal, I counted all the addresses that I encountered in a week and got 2791. The number of TCP sessions established from one address is 4 on average, with a median of 2. The top counts of sessions per address are 464, 231, 149, 83, 77. The 95th percentile of the sample is 8 sessions per address. The median is not very high; let me remind you that the graph shows a clear daily periodicity, so one could expect something around 4 to 8 in 7 days. If we throw out all the addresses that had only one session, we get a median of exactly 5. But I could not exclude them by any clear criterion. On the contrary, spot checks showed that they are related to requests for the prohibited resource.
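
For reference, the statistics above (mean, median, 95th percentile, and the median without one-off addresses) can be computed along these lines. The sample list is invented for illustration and is not the real data, and the nearest-rank percentile is just one of several common definitions; I am not claiming it is the one that produced the figures above.

```python
import math
import statistics

def percentile_nearest_rank(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the sample at or below it."""
    ordered = sorted(values)
    rank = max(math.ceil(p / 100 * len(ordered)), 1)
    return ordered[rank - 1]

# Hypothetical sessions-per-address sample with one heavy outlier.
sessions_per_address = [1, 1, 2, 2, 2, 3, 5, 8, 12, 464]

mean = statistics.mean(sessions_per_address)
median = statistics.median(sessions_per_address)
p95 = percentile_nearest_rank(sessions_per_address, 95)

# Dropping addresses seen only once shifts the median upward.
repeated = [n for n in sessions_per_address if n > 1]
median_repeated = statistics.median(repeated)
```

Even in this toy sample a single busy address drags the mean far above the median, which is why the median and percentile are the more telling numbers here.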

Addresses are addresses, but on the Internet autonomous systems (AS) matter more, and there turned out to be 1510 of them, with an average of 2 addresses per AS and a median of 1. The top counts of addresses per AS are 288, 77, 66, 39, 27. The 95th percentile of the sample is 4 addresses per AS. The median here is as expected: one Agent per provider. The top is also as expected: it consists of the big players. In a large network, Agents should probably be located in each region where the operator is present, and let's not forget about NAT.

By country, the largest counts are: 1409 - RU, 42 - UA, 23 - CZ, and 36 from other regions outside the RIPE NCC. Requests from outside Russia attract attention. Probably this can be explained by geolocation errors or by registrar errors when filling in the data. Or by the fact that a Russian company may have non-Russian roots, or a foreign office because it is easier that way, which is natural when dealing with the foreign organization RIPE NCC. Some part is undoubtedly extraneous, but it is difficult to separate it reliably, since the resource is blocked, and from the second day doubly blocked, so most of the sessions consist only of an exchange of a few service packets. Let us agree that this is a small part.
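
Grouping addresses by autonomous system is a simple aggregation once each IP has been mapped to an AS number. The sketch below assumes that mapping already exists (for example, from a routing table dump or a whois lookup, which is not shown); the dictionary, the function name `addresses_per_as`, and the AS numbers from the documentation range are all hypothetical.

```python
from collections import defaultdict

def addresses_per_as(ip_to_asn):
    """Return {asn: number of distinct addresses seen from that AS}."""
    groups = defaultdict(set)
    for ip, asn in ip_to_asn.items():
        groups[asn].add(ip)
    return {asn: len(ips) for asn, ips in groups.items()}

# Hypothetical mapping; AS64496-AS64511 are reserved for documentation.
ip_to_asn = {
    "192.0.2.10": 64496, "192.0.2.11": 64496, "192.0.2.12": 64496,
    "198.51.100.7": 64497,
    "203.0.113.5": 64498,
}

per_as = addresses_per_as(ip_to_asn)
top = sorted(per_as.items(), key=lambda kv: kv[1], reverse=True)
```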

These numbers can already be compared with the number of providers in Russia. According to the RKN, there are 6387 licenses for “Communication services for data transmission, except for voice”, but this is a heavily inflated upper bound: not all of these licenses belong to Internet providers who need to install an Agent. In the RIPE NCC zone, a similar number of AS are registered in Russia, 6230, and not all of them are providers. UserSide made a more rigorous calculation and got 3940 companies in 2017, and this is rather an upper estimate as well. In any case, the number of AS we detected is two and a half times smaller. But it should be understood here that an AS is not strictly equal to a provider. Some providers do not have their own AS, and some have more than one. If we assume that everyone nonetheless has an Agent installed, then some filter more strongly than others, so that their requests are indistinguishable from garbage, if they reach me at all. But for a rough estimate this is quite tolerable, even if something was lost through my own mistakes.

About DPI


Although my hosting provider turned on its filter from the second day, the data for the first day is enough to conclude that the blocking works successfully. Only 4 sources managed to break through and had fully completed HTTP and TCP sessions (as in the example above). Another 460 were able to send a GET, but the session was instantly terminated by an RST. Note the TTL:

 TTL 50, TCP, 14678 > 80, "[SYN] Seq=0"
 TTL 64, TCP, 80 > 14678, "[SYN, ACK] Seq=0 Ack=1"
 TTL 50, TCP, 14678 > 80, "[ACK] Seq=1 Ack=1"

 HTTP, "GET /filteredpage HTTP/1.1"
 TTL 64, TCP, 80 > 14678, "[ACK] Seq=1 Ack=294"

 # Sent by the filter
 TTL 53, TCP, 14678 > 80, "[RST] Seq=3458729893"
 TTL 53, TCP, 14678 > 80, "[RST] Seq=3458729893"

 HTTP, "HTTP/1.1 302 Found"

 # The source node tries to recover what was lost
 TTL 50, TCP ACKed unseen segment, 14678 > 80, "[ACK] Seq=294 Ack=145"

 TTL 50, TCP, 14678 > 80, "[FIN, ACK] Seq=294 Ack=145"
 TTL 64, TCP, 80 > 14678, "[FIN, ACK] Seq=171 Ack=295"

 TTL 50, TCP Dup ACK, 14678 > 80, "[ACK] Seq=295 Ack=145"

 # The source node understands that the session is destroyed
 TTL 50, TCP, 14678 > 80, "[RST] Seq=294"
 TTL 50, TCP, 14678 > 80, "[RST] Seq=295"
  

There can be variations: fewer RSTs or more retransmissions, which also depends on what the filter sends to the source node. In any case, this is the most reliable pattern, from which it is clear that it was the forbidden resource that was requested. In addition, the session always contains a packet whose TTL is larger than that of the preceding and following packets from the same side.
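
The TTL mismatch can be checked mechanically. The sketch below takes the packets of one session that claim to come from the client and flags those whose TTL differs from the TTL of the first packet, on the assumption that the first packet is the client's genuine SYN. The function name `find_injected` and the tuple layout are hypothetical; the sample values are taken from the listing above.

```python
def find_injected(packets):
    """Return indexes of client-side packets whose TTL differs from the initial SYN.

    `packets` is a list of (ttl, flags) tuples in capture order; the first
    entry is assumed to be the client's own SYN.
    """
    if not packets:
        return []
    client_ttl = packets[0][0]
    return [i for i, (ttl, _flags) in enumerate(packets) if ttl != client_ttl]

# Client-to-server packets from the example session above.
session = [
    (50, "SYN"), (50, "ACK"),
    (53, "RST"), (53, "RST"),           # TTL differs: sent by the filter
    (50, "ACK"), (50, "FIN,ACK"), (50, "ACK"),
    (50, "RST"), (50, "RST"),           # the client's own resets
]

injected = find_injected(session)
```

Routes can change during a capture, so a TTL difference is a strong hint rather than proof; it works best together with the flag pattern itself.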

From the rest of the sources, not even the GET is visible:

 TTL 50, TCP, 14678 > 80, "[SYN] Seq=0"
 TTL 64, TCP, 80 > 14678, "[SYN, ACK] Seq=0 Ack=1"

 # Sent by the filter
 TTL 53, TCP, 14678 > 80, "[RST] Seq=1"
  

Or like this:

 TTL 50, TCP, 14678 > 80, "[SYN] Seq=0"
 TTL 64, TCP, 80 > 14678, "[SYN, ACK] Seq=0 Ack=1"
 TTL 50, TCP, 14678 > 80, "[ACK] Seq=1 Ack=1"

 # Sent by the filter
 TTL 53, TCP, 14678 > 80, "[RST, PSH] Seq=1"

 TTL 50, TCP ACKed unseen segment, 14678 > 80, "[FIN, ACK] Seq=89 Ack=172"
 TTL 50, TCP ACKed unseen segment, 14678 > 80, "[FIN, ACK] Seq=89 Ack=172"

 # The filter again, many times
 TTL 53, TCP, 14678 > 80, "[RST, PSH] Seq=1"
 ...
  

The difference in TTL is always visible when something comes from the filter. But often nothing arrives at all:

 TCP, 14678 > 80, "[SYN] Seq=0"
 TCP, 80 > 14678, "[SYN, ACK] Seq=0 Ack=1"
 TCP Retransmission, 80 > 14678, "[SYN, ACK] Seq=0 Ack=1"
 ...
  

Or like this:

 TCP, 14678 > 80, "[SYN] Seq=0"
 TCP, 80 > 14678, "[SYN, ACK] Seq=0 Ack=1"
 TCP, 14678 > 80, "[ACK] Seq=1 Ack=1"

 # A few seconds pass without traffic

 TCP, 80 > 14678, "[FIN, ACK] Seq=1 Ack=1"
 TCP Retransmission, 80 > 14678, "[FIN, ACK] Seq=1 Ack=1"
 ...
  

And all this repeats again and again, as can be seen on the graph, more than once every day.
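
The session patterns shown above can be told apart with a few rules. The following is a rough sketch under my own assumptions, not a complete classifier: each packet is a dict with hypothetical keys `dir` ("in" for packets arriving at the server, "out" for replies), `flags`, `ttl` and `payload`, and the category names are invented for illustration.

```python
def classify_session(packets):
    """Assign one session to a coarse category based on what reached the server."""
    inbound = [p for p in packets if p["dir"] == "in"]
    if not inbound:
        return "empty"
    client_ttl = inbound[0]["ttl"]  # TTL of the initial SYN
    # An RST with a foreign TTL is treated as injected by a filter.
    if any("RST" in p["flags"] and p["ttl"] != client_ttl for p in inbound):
        return "reset_by_filter"
    if any(p["payload"] for p in inbound):
        return "request_seen"
    if any(p["flags"] == {"ACK"} for p in inbound):
        return "handshake_only"  # handshake completed, then silence
    return "syn_only"            # the SYN-ACK was never acknowledged

def pkt(direction, flags, ttl, payload=False):
    return {"dir": direction, "flags": set(flags), "ttl": ttl, "payload": payload}

reset = [pkt("in", ["SYN"], 50), pkt("out", ["SYN", "ACK"], 64), pkt("in", ["RST"], 53)]
silent = [pkt("in", ["SYN"], 50), pkt("out", ["SYN", "ACK"], 64), pkt("out", ["SYN", "ACK"], 64)]
idle = [pkt("in", ["SYN"], 50), pkt("out", ["SYN", "ACK"], 64), pkt("in", ["ACK"], 50),
        pkt("out", ["FIN", "ACK"], 64)]
full = idle[:3] + [pkt("in", ["PSH", "ACK"], 50, payload=True)]

labels = [classify_session(s) for s in (reset, silent, idle, full)]
```

Counting sessions per category and per source address would then show which sources are filtered before the GET, after it, or not at all.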

About IPv6


The good news is that it exists. I can say for sure that periodic requests to the forbidden resource come from 5 different IPv6 addresses, which is exactly the behavior I expected from the Agents. Moreover, one of these IPv6 addresses is not filtered at all, and I see a full session. From two more addresses I saw only one unfinished session each: one was interrupted by an RST from the filter, the other by a timeout. That makes 7 in total.

Since there are few addresses, I studied all of them in detail, and it turned out that only 3 of them are providers; they deserve a standing ovation! Another address is a cloud hosting service in Russia (it does not filter), and another is a research center in Germany (there is a filter, but where?). Why they check the availability of prohibited resources is a good question. The remaining two made a single request each and are located outside Russia, and one of them is filtered (in transit, after all?).

Blocking and Agents are a big brake on IPv6, whose adoption is not moving very fast as it is. It is sad. Those who have solved this task can be truly proud of themselves.

In conclusion


I did not aim for 100% accuracy, so please forgive me for that; I hope someone will want to repeat this work more precisely. It was important for me to understand whether such an approach would work in principle. The answer is: it will. The figures obtained are, I think, quite reliable as a first approximation.

What else could have been done, and what I was too lazy to do, is to count the DNS queries. They are not filtered, but they do not give much accuracy either, since they work only for the domain and not for the entire URL. The periodicity should be visible. Combined with what is seen directly in the requests, this would make it possible to separate out the extraneous traffic and get more information. It is even possible to determine which DNS resolvers the providers use, and much more.

I did not expect the hoster to turn on its own filter for my VPS as well. Maybe this is common practice. After all, the RKN sends the hoster a request to delete the resource. But it did not surprise me, and in some ways it even worked to my advantage. The filter worked very effectively, cutting off all the correct HTTP requests to the forbidden URL, but not the incorrect ones that had previously passed through the providers' filters, even if only in the form of endings, FIN-ACK and RST: a minus times a minus almost made a plus. By the way, the hoster does not filter IPv6. Of course, this affected the quality of the collected material, but it still made it possible to see the periodicity. This turned out to be an important point when choosing a hosting site: do not forget to ask how they handle the list of prohibited sites and requests from the RKN.

At the beginning, I compared AS “Auditor” with RIPE Atlas. This comparison is justified, and a large network of Agents can be useful, for example for determining the quality of resource availability from different providers in different parts of the country. You can calculate delays, build graphs, analyze it all and see the changes that occur both locally and globally. This is not the most direct way, but astronomers use “standard candles”, so why not use Agents? Knowing (having found) their standard behavior, one can determine the changes that occur around them and how these affect the quality of the services provided. And at the same time there is no need to place probes on the network yourself: Roskomnadzor has already done that.

One more thing I want to touch on: every tool can be a weapon. AS “Auditor” is a closed network, but the Agents give everything away by sending requests to all the resources on the prohibited list. Obtaining such a resource poses no problem at all. In sum, through the Agents providers unwittingly reveal much more about their networks than they probably should: the types of DPI and DNS, the location of the Agent (central hub or service network?), network markers of delays and losses, and these are only the most obvious. Just as someone can monitor the actions of Agents to improve the availability of their own resources, someone else can do it for other purposes, and nothing prevents this. The result is a double-edged and very versatile tool, as anyone can see.
