[From the sandbox] The story of a little study of legacy code


It's good to have someone more experienced on the team: someone who will show you what to do and how, point out the rakes lying in the grass ahead, and tell you where to download the best bicycle blueprints of 2007 on DVD. This story is about how the wished-for was passed off as the real thing, what came of it, and how the crisis was overcome.

It happened back when I, with what seemed to me rather mediocre development experience, was looking for a place where I could evolve (or mutate) from a not-quite-junior into at least a confident junior. By the inscrutable ways of the Lord such a place was found; a project came attached to it, along with an "old school" programmer who over his career had written more systems than he'd had girlfriends. "Great! There's a project, which means there's salary money; a mentor comes attached; we're set!" I thought. But then, as in any stock horror story, the heroes in the dark darkness came face to face with a terrible horror...

First things first:

1. Size matters


  Development had been started by my predecessor on a self-written PHP engine, and for data storage (here you might expect MySQL \ PostgreSQL \ SQLite \ MongoDB \ something-or-other-but-surely-with-a-DB-suffix, and you would be wrong) an api-gateway was used.

"Why, if you're already using PHP, would you bolt an api-gateway onto it and store the data there? Isn't it simpler to talk to the api directly from JS code? Or just use PHP plus a DBMS?" a well-meaning reader will ask. And he will be right. But back then I, who had not yet seen much of anything, didn't think so: who knows, maybe that's how the cool guys do it, and old school programmers know best.

As I was further explained:

  1. Gateway = security: no one gets in or out just like that
  2. Gateway = safe data storage: you can't just reach it, plus there are backups
  3. Gateway = speed: it works fast and without failures, time-tested
  4. The authoritative opinion of "old school" programmers says your PHP is full of holes, any web application gets hacked by default, so data has no business being stored next to it

A characteristic feature of the api-gateway was that JSON data was transmitted in a GET request. Yes, yes, those JSON objects so dear to the heart were url-encoded and stuffed into the query string. And everything would have been fine, until one day... the length of the GET request stopped being enough. The url-encoded JSON simply would not fit, the scoundrel! The "old school" programmer, scratching his head, asked:
"So what are we going to do? Our JSON grew, and we didn't notice..."
"Well, uh, can we send it in a POST then? That would be more correct anyway," I suggested.
"OK, pass it in a POST."
That was red flag number one.
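The limit in question is easy to run into. A minimal sketch (payload and field names are invented, not from the real system) of how url-encoding JSON into a query string blows past a typical server's request-line cap:

```python
import json
from urllib.parse import urlencode

# Hypothetical payload of the kind the system sent via GET.
payload = {"action": "update_user", "data": {"notes": "x" * 10000}}

# URL-encode the JSON and stuff it into the query string, as the gateway did.
query = urlencode({"json": json.dumps(payload)})
url = "http://gateway.local/api?" + query

# Apache's default LimitRequestLine is 8190 bytes, so a growing JSON blob
# eventually stops fitting in a GET request; a POST body has no such cap.
print(len(url) > 8190)  # True
```

Moving the same JSON into a POST body sidesteps the limit entirely, which is why that fix worked.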

2. Time and backup management


To bolt new functionality onto the project, the corresponding CRUD requests had to be implemented on the gateway, which our "old school" comrade duly did. The problem was that he did this about once every three days, announcing "Done, check it." Checks regularly showed that not everything worked: getting the list, OK; adding a new item, not OK. Fixes and rework took yet more time, after which the functionality could finally be released to the public.

My offer to implement the gateway requests myself, if only because it would be faster, was rejected: "it's complicated in there, you won't figure it out." The result of this approach was that all the work closed in on one person. If, for example, something had to be fixed en masse in the database, then choosing between a three-day wait and doing the fix myself through the existing queries, I chose the latter. Customers don't particularly like to wait, and new requirements kept arriving steadily. One of them, setting a certain flag on users en masse, fell to me to implement; I had an hour for everything, and the bosses were expecting a beautiful report. Here red flag number two was waiting for us, sir.

The thing is, the JSON format transmitted in the requests assumed only a few required fields; all the rest were arbitrary, with no clear and definitive structure. For example, to add a user, you passed JSON like this:

  POST /api/users
  {
    "Email": "ivanov@mail.ru",
    "Password": "myEmailIsVeryBig",
    "Name_last": "Ivanov",
    "Name_first": "Ivan",
    "Name_middle": "Ivanych",
    "Birth": "01.01.1961",      // from here on the fields are optional; more on them below
    "Living_at": "Susanin str., 3 k.4 kv.24",
    "Phone_num": "+70000000000"
  }
  

The optional part, transmitted in add/update requests, was saved and returned in full (how that was implemented, I'll tell you below). To the point: time does not wait, the task had to be solved, update the users and put the flag on them. But surely I don't have to send the whole structure every time? Needed checking! I tested it on myself: I passed only one field in an update request and checked that the field appeared while the rest of the data stayed in place. Easy then: loop over the rest and update them.

The script quietly chugged along, accepting and returning data, and everything seemed to be going well... when suddenly the phone rang. "We can't see users' names in the system!" came the report from the other end of the line. "Oh no. But it worked!" An unpleasant chill ran down my spine. Further investigation showed that indeed, the full names had become empty strings, although all the other data was in place. What do you do in this situation? Roll back from a backup!
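The story never pins down the exact server-side cause, but the general hazard here is the difference between merge and replace semantics on a partial update. A minimal sketch (the `is_marked` flag name is invented):

```python
# What the database held for a user (abridged from the example above).
stored = {"Email": "ivanov@mail.ru", "Name_last": "Ivanov", "Name_first": "Ivan"}

# What my update request sent: only the new flag.
patch = {"is_marked": "1"}

# Merge semantics: keys not mentioned in the patch survive.
merged = {**stored, **patch}

# Replace semantics: everything not re-sent is lost, which is one way
# users end up with empty names after a "harmless" partial update.
replaced = dict(patch)

print("Name_last" in merged, "Name_last" in replaced)  # True False
```

With an undocumented API, one successful manual test doesn't tell you which of the two semantics you actually got for every field.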

"Comrade 'old school' programmer, we khev a problem khere! We need a backup! When was the last one actually taken?" I ask.

"Uh-uh... Let me see... No, there is no backup."

The situation was saved by the fact that a couple of hours earlier I had been polishing and testing the reports module, so I had a CSV with all the necessary data; within another hour, order was restored.

The lack of clear documentation, of descriptions of the algorithms, of input validation, and, most importantly, of database backups: that was red flag number two, sir.

Since then, backups have been taken every day.

3. Deep dive


Shaky as it was, the work moved along and problems got solved, some faster, some slower, when suddenly... the customers realized that the system was not hosted on their own servers, and that for such an attitude toward personal data (PDn) and the organization of a personal data information system (ISPDn) nobody would pat them on the head. The server had to be moved to their premises.

Why hadn't the system been handed over in the first place? The leadership had one passion: centralization. Management dreamed of a system that would do everything! Need to enroll a child in school? You go into the system, into a special section, and submit an application there. Need, say, to order a pizza? You go into the system, into another special section, and submit an application for pizza. Maybe you wanted to chat with beautiful ladies/gentlemen? A third special section at your service: you submit an application there too. And so on, to infinity.

The benefits: one login and password for everything, and the data is stored safely and securely on the gateway. Even the backups are there. And note that no one can take this system away from us! And even if they do take it, so what? They'll never figure out a protection system built by "old school" programmers anyway: everything is complicated in there.

We exported a VDS image with the system and delivered it to the customers; they deployed it, everything sang and danced, beautiful!
And here a wave of curiosity, and certain suspicions, came over me.

If our web application is full of holes, then where is the data? Did it really stay on other servers? And if they decide to cut the system off from the outside world, will everything collapse?

A simple check showed that the data, like the gateway handlers themselves, lived on that very same server. And no, it had not been moved there for the handover: it had always been there.
Now I had at my disposal that very secret "old school" development, and I began to investigate it. Of course, there was no cool reverse engineering in the style of Hacker magazine articles, with ollydbg, offsets and other fun stuff, so I'll share what I have.
The development turned out to be written in Python; only the .pyc files remained, and they decompiled easily back into readable code. Frankly, it took a long time, a whole 25 minutes, to figure out how it all works.

So, the complex system developed by an "old school" programmer, the one few people could figure out, consisted of:

  1. A script served by Apache, which actually received the request. What did this script do? It opened a connection to a certain localhost port and forwarded the request there with all its data. That's all. The interesting part comes next.
  2. The server side, processing requests from the script. The logic of its behavior was quite interesting. First, there was no data manipulation or database querying in the code at all; instead, database functions written in PL/SQL were called. All the logic, checks and so on were baked into those database functions. Half of the script was a dictionary mapping a request name to its associated function and the names of that function's parameters, which had to match the data passed in the GET query string. JSON data, when needed, was passed as a separate parameter. A peculiarity of the server side was how it reserved connections during user authentication: if the login and password were found in the database, a session ID was generated and an open connection instance was added to a dictionary under that session ID as the key (and killed by a 10-minute timeout; to keep it from being killed, there was a special method for extending the session's life). But how exactly was the session ID associated with the user? After all, there were data requests in which no user ID was passed at all. It works, which means something fishy is going on.
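The two pieces described above, the dispatch dictionary and the per-session connection pool, can be sketched roughly like this (all function and parameter names are illustrative, not the real ones):

```python
import threading
import uuid

# Sketch of the dispatch table: request name mapped to the PL/SQL
# function to call and the query-string parameters it expects.
DISPATCH = {
    "get_user":    ("fn_get_user",    ["session_id", "user_id"]),
    "add_user":    ("fn_add_user",    ["session_id", "json_data"]),
    "update_user": ("fn_update_user", ["session_id", "json_data"]),
}

# One open DB connection per authenticated session, keyed by session ID,
# dropped by a timer unless a special "prolong" request resets it.
SESSIONS = {}
SESSION_TTL = 600  # the 10-minute timeout from the text

def login(user, password, connect):
    """Authenticate and reserve a dedicated connection for this session."""
    conn = connect(user, password)  # assumed to raise on bad credentials
    session_id = uuid.uuid4().hex
    SESSIONS[session_id] = conn
    timer = threading.Timer(SESSION_TTL, SESSIONS.pop, args=(session_id, None))
    timer.daemon = True
    timer.start()
    return session_id
```

Note the design consequence: every live session pins a real DBMS connection, which matters for the vulnerability list below.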

The very complex development yielded to comprehension with difficulty and was in no hurry to reveal the long-lost secrets of the masters of the past.

Incredibly (Go to > Definition; thanks to PhpStorm for understanding PL/SQL), the True Knowledge of the Lost Civilization of Old School Programmers, incomprehensible to the mind of the layman, was nevertheless acquired. It turned out that on connection, the authentication-check function created a temporary table in which the user ID was stored.

This was just the beginning; here is a sample list of the serious vulnerabilities found:

  • denial of service via mass authentication (connections were held open per session and therefore ran into the DBMS connection limit; given the existing ability to extend a session's lifetime, this allowed memory to be clogged with connections entirely, making the system unusable for new users);
  • no protection against brute force (the number of failed login attempts was not detected, stored or checked);
  • no access control on entities (for example, the list of documents returned to a user was based on the organization the user belonged to, but knowing a document's ID you could successfully execute a request to update or delete it; you could even fetch the full list of users, and if only it came without passwords, which, by the way, were stored in the database in the clear, without hashing, for anyone to take).
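For the second item, the missing check is small enough to sketch in a few lines. This is my own minimal in-memory version of a brute-force lockout, not anything from the original system; thresholds and names are invented:

```python
import time

MAX_FAILS = 5       # failed attempts allowed before lockout
LOCK_SECONDS = 300  # how long the lockout lasts
_fails = {}         # login -> (failure count, timestamp of first failure)

def may_attempt(login, now=None):
    """Allow the attempt unless the account is currently locked out."""
    now = time.time() if now is None else now
    count, since = _fails.get(login, (0, now))
    return count < MAX_FAILS or now - since >= LOCK_SECONDS

def record_failure(login, now=None):
    """Count a failed login against the account."""
    now = time.time() if now is None else now
    count, since = _fails.get(login, (0, now))
    _fails[login] = (count + 1, since)
```

A production version would persist the counters and reset them on success, but even this much would have blocked unlimited password guessing.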

And one more serious problem: the absence of a formalized data storage schema. As promised earlier, here is how the "arbitrary fields" from the JSON were stored. No, they were not stored as a row in the table. They were split into key-value pairs and stored in a separate table. For example, for users there were two tables: users, and users_data (string key, string value), where the data itself lived. The price of this approach was growing query times on any complex selection from the database.
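This layout is the classic entity-attribute-value pattern, and the cost shows up as soon as you try to reassemble or filter records. A sketch using SQLite (table and column names follow the text; the data is the example user from above):

```python
import sqlite3

# A narrow users table plus a users_data table holding every optional field.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE users_data (user_id INTEGER, key TEXT, value TEXT);
""")
db.execute("INSERT INTO users VALUES (1, 'ivanov@mail.ru')")
db.executemany(
    "INSERT INTO users_data VALUES (1, ?, ?)",
    [("Name_last", "Ivanov"), ("Name_first", "Ivan"), ("Phone_num", "+70000000000")],
)

# Reassembling one record costs a self-join per attribute (or a pivot);
# filtering on several attributes at once is where such schemas get slow.
row = db.execute("""
    SELECT u.email, ln.value, fn.value
    FROM users u
    JOIN users_data ln ON ln.user_id = u.id AND ln.key = 'Name_last'
    JOIN users_data fn ON fn.user_id = u.id AND fn.key = 'Name_first'
    WHERE u.id = 1
""").fetchone()
print(row)  # ('ivanov@mail.ru', 'Ivanov', 'Ivan')
```

Two extra joins for two fields; a report touching a dozen optional fields needs a dozen, which is exactly the slowdown described.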

Actually, this was enough to make, and push through, the decision to migrate the system to a new API: understandable, documented and maintained.

The moral


Perhaps this system really is "Legacy", and the "old school" programmer who created it is its very guardian.

Nevertheless, the conclusions are as follows:

  1. If you are told "it's complicated in there, you won't figure it out", it means things are a complete mess in there
  2. If you are being crushed with authority, something unclean is going on
  3. Trust, but verify: security is not a state, security is a process, and a continuous one at that, so it is better to make sure the declared qualities match reality than to find out one day that all your users have suddenly become "Ivanov Ivan Ivanych" and there are no backups.
