[From the sandbox] Investigation into one unknown archive

Moving. A new town. Job hunting. Even for an IT specialist this can take a long time. A series of interviews, all of them much alike. And, as usually happens, once you have already found a job, some time later an interesting company turns up.

It was hard to work out what exactly the company did, but its area of interest was the study of other people's software. That sounds intriguing, although once you realize that this is not a vendor of cybersecurity software, you pause for a second and start scratching your head.

Briefly: they sent me an archive as a test task and asked me to compute a certain signature from the given input data. I had very little experience with this kind of work, which is probably why my first attempt lasted only a couple of hours before the motivation ran out. And yes, of course the first thing I did was try to run it on a phone and in an emulator: the application was reported as invalid.

What we have: an archive with the extension ".apk". I have put the task under a spoiler so that search engines do not index it: what if the guys do not like that I posted the solution on Habr?

The task itself
The APK contains the functionality for generating a signature for an associative array.
Try to get a signature for the following data set:

  "user": "LeetD3vM4st3R",
  "password": "__s33cr $$ tV4lu3__",
  "hash": "34765983265937875692356935636464"

Rolling up the sleeves

The task says that the archive contains functionality for signing an associative array. From the file extension it is immediately clear that we are dealing with an Android application. First, unpack the archive. It is in fact a regular ZIP archive, and any archiver handles it easily. I used the apktool utility and, as it turned out, unknowingly sidestepped a couple of rakes. Yes, that happens (usually it is the other way around, right?). The incantation is quite simple:

  apktool d task.zip  

It turns out that the code and the resources inside an APK are themselves stored in separate binary files, and extracting them requires additional tools. apktool did this implicitly: it pulled out the class bytecode and the resources and laid everything out in a natural file hierarchy. We can proceed.
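To see the "regular ZIP archive" part for yourself, here is a minimal Java sketch that lists the entries of such an archive with the standard java.util.zip classes. The class and method names are made up for illustration, and the sample archive it writes is a stand-in with empty placeholders (classes.dex and resources.arsc are the usual names of the bytecode and resource table in an APK), not the archive from the task.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class ApkListing {
    /** Returns the entry names of a ZIP-format archive, such as an APK. */
    static List<String> entryNames(Path archive) {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            zip.stream().forEach(entry -> names.add(entry.getName()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Writes a tiny stand-in archive so the sketch runs without a real APK. */
    static Path writeSample() {
        try {
            Path target = Files.createTempFile("sample", ".apk");
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
                for (String name : new String[] {"AndroidManifest.xml", "classes.dex", "resources.arsc"}) {
                    out.putNextEntry(new ZipEntry(name));
                    out.write(new byte[] {0}); // placeholder content only
                    out.closeEntry();
                }
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass a path to a real APK, or fall back to the stand-in archive.
        Path archive = args.length > 0 ? Path.of(args[0]) : writeSample();
        entryNames(archive).forEach(System.out::println);
    }
}
```

A plain listing like this shows the binary files as they are; turning them into readable smali and XML is the extra work that apktool does.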

  ├── AndroidManifest.xml
  ├── apktool.yml
  ├── lib
  │   └── arm64-v8a
  ├── original
  │   ├── AndroidManifest.xml
  │   └── META-INF
  ├── res
  │   ├── anim
  │   ├── color
  │   ├── drawable
  │   ├── layout
  │   ├── layout-watch-v20
  │   ├── mipmap-anydpi-v26
  │   ├── values
  │   └── values-af
  ├── smali
  │   ├── android
  │   ├── butterknife
  │   ├── com
  │   ├── net
  │   └── org
  └── unknown
      └── org

We get a hierarchy like this (I have left a simplified version) and try to figure out where to start. I should note that I once wrote a couple of small Android applications, so I have some idea of what part of these directories is for and of how Android works in general.

For a start, I decide to just walk through the files. I open AndroidManifest.xml and start reading it attentively. My attention is drawn to a strange attribute:

  android:supportsRtl="true"

It turns out that it is responsible for supporting right-to-left languages in the application. I am starting to tense up. Not a good sign.

Then my eye catches on the folder "unknown". Under it is a hierarchy of the form org.apache.commons.codec.language.bm and a huge number of text files with obscure content. Googling the full package name shows that it holds something related to an algorithm for finding words that are phonetically similar to a given one. Frankly, at this point I tensed up even more. After poking around the directories a little, I found the code itself, and then the most interesting part began. What greeted me was not the usual Java bytecode, which I had once had a chance to play with, but something else. Very similar, but different.

As it turned out, Android has its own virtual machine, Dalvik. And, like every self-respecting virtual machine, it has its own bytecode. It seems that in my first attempt at this task it was on this sad note that I announced an intermission, bowed, lowered the curtain and dropped the whole thing for about four months, until curiosity finally got the better of me.

Rolling up the sleeves [2]

“Can't this be done more simply?” was the question I asked myself when I took up the task a second time. I started searching the Internet for a decompiler from smali to Java. All I found was that this conversion cannot be done unambiguously. Frowning a little, I went to GitHub and typed a couple of key phrases into the search box. The first hit was smali2java.

  git clone
 gradle build
 java -jar smali2java.jar ..  

Errors. I see a huge wall of output and several pages of errors in the terminal. Having read into the output a little (and restraining my emotions about its size), I find that this tool works from a formally described grammar, and the bytecode it encountered clearly does not conform to it. I open the smali bytecode and see annotations, synthetic methods and other strange constructs in it. There was nothing like this in Java bytecode! How long can this go on? Delete!

The Dalvik virtual machine (like the JVM), as it turned out, knows nothing about concepts such as inner and outer classes (that is, nested classes), so the compiler generates so-called “synthetic” methods, for example to give a nested class access to the fields of its outer class.

As an example:

If the outer class (OuterClass) has a private field

  public class OuterClass {
      private List a;
  }

then, for a nested class to be able to read that field, the compiler implicitly generates a method like the following (shown as decompiler-style pseudo-Java; "synthetic" is a bytecode flag, not a Java keyword):

  static synthetic java.util.List getList(OuterClass p1) {
      return p1.a;
  }

The same under-the-hood machinery is also what makes some other language features work.

You can read more about this topic here.
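For readers who want to see such methods in their own build, here is a small Java sketch that uses reflection to list the synthetic methods of a class. Treat it as an illustration under assumptions: whether anything is printed depends on the compiler. Older javac versions and the Android toolchains of that era generate accessors with names like access$000, while javac for Java 11 and later uses nest-based access control and normally generates none, so an empty list is a perfectly valid result.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OuterClass {
    private List<String> a = new ArrayList<>(Arrays.asList("value"));

    final class Inner {
        /** Reads the outer class's private field from the nested class. */
        int size() {
            return a.size();
        }
    }

    /** Names of compiler-generated (synthetic) methods on OuterClass; may be empty. */
    static List<String> syntheticMethodNames() {
        List<String> names = new ArrayList<>();
        for (Method method : OuterClass.class.getDeclaredMethods()) {
            if (method.isSynthetic()) {
                names.add(method.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        OuterClass outer = new OuterClass();
        System.out.println("inner sees " + outer.new Inner().size() + " element(s)");
        System.out.println("synthetic methods: " + syntheticMethodNames());
    }
}
```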

It does not help. The tool complains even about bytecode that looks entirely unsuspicious. I open the source code of the decompiler, read it and see something very strange: nobody would write code like this by hand. A thought creeps in: isn't this generated code? I set the thought aside for about 30 minutes while trying to understand what the error is. Too complex. I open GitHub again and, sure enough, the parser is generated from a grammar. And here is the generator itself. I put it all away and try to approach from the other side.

It is worth noting that a little later I did try changing the grammar, and sometimes even the bytecode itself, so that the decompiler could digest it. But even when the bytecode became valid with respect to the decompiler's grammar, the program simply returned nothing. Open source ...

I leaf through the bytecode and stumble upon unknown constants. Googling, I find the same ones in a book on reverse engineering Android applications. I recall that these are just the IDs that the build tools assign to the application's resources (when you write the code, they are the R.* constants). For the next half hour to an hour I briefly study which registers are responsible for what and in what order arguments are passed, and generally delve into the syntax.

What does it look like?

I found the layout of the main window, and from it the overall behavior of the application is already clear: the main screen (Activity) holds a RecyclerView (roughly, a View that can reuse UI objects that are not currently displayed, to save memory) with fields for entering a key/value pair, a couple of buttons responsible for adding a new key/value pair to some abstract container, and a button that generates the signature for this container.

Looking at the annotations and at a certain amount of code that looks suspiciously generated, I start googling. The project uses the ButterKnife library, which relies on annotations to do the inflate() -> bind() of UI elements automatically. If a class has such annotations, the ButterKnife annotation processor implicitly creates another binder class named <original_class>_ViewBinding, which does all the dirty work under the hood. In fact, I got all this information from the single MainActivity file, after manually recreating an approximation of its Java source. Half an hour later I realized that this library's annotations can also attach callbacks to button actions, and I found the key functions that were actually responsible for adding a key/value pair to the container and for generating the signature.

Of course, in the course of the study I had to crawl into the “guts” of various libraries and plugins, because even beautiful landing pages with docs do not cover all use cases and details, which I think is common practice for any reverse engineer.

Laziness is a programmer's friend

Having spent some time on the second source file, I finally got tired and realized that I would get nowhere this way. I go to GitHub again, and this time I look more carefully. I find the project Smali2PsuedoJava, a decompiler into “pseudo-Java code”. If this utility can bring at least something into human-readable form, then I owe the author a mug of his favorite beer (or at least a star on GitHub, for starters).

And it really works! The effect is plain to see.

Meet Cipher.so

A little later, studying the project's Java pseudocode and distrustfully comparing it with the smali bytecode, I find a strange library in the code: Cipher.so. Googling, I find out that it is a library for encrypting a set of compile-time values inside the APK archive. This is usually needed when the application uses constants such as IP addresses, credentials for an external database, authorization tokens and so on: anything that could be obtained by reverse engineering the application. True, the author clearly writes that the project is abandoned, as if to say, go away. This is getting interesting.

The library exposes the values through a Java class in which each key of interest corresponds to a specific method. This only fuels my interest, and I start digging deeper.

In short, what Cipher.so does and how it works:

  • the keys and their corresponding values are written in the project's Gradle file
  • all key/value pairs are automatically packed into a separate dynamic library (.so) that is generated at compile time. Yes, it really WILL be generated.
  • the values can then be read through Java methods generated by Cipher.so
  • when the APK is built, the key names are hashed with MD5 (for greater security, of course)
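The last step of the list is easy to reproduce. A minimal sketch, assuming the key name is hashed as plain UTF-8 text and written as lowercase hex (the helper name md5Hex is invented here, and this is not code from Cipher.so):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class KeyNameHash {
    /** Returns the MD5 digest of the text as 32 lowercase hex characters. */
    static String md5Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A common word as a key name: exactly what lookup tables already contain.
        System.out.println(md5Hex("password")); // 5f4dcc3b5aa765d61d8327deb882cf99
    }
}
```

An unsalted MD5 of a short dictionary word hides very little, since precomputed tables cover such inputs.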

Having found the dynamic library I need in the unpacked archive folder, I start picking at it. To begin with, as an experienced reverser (not really), I try something simple: I decide to look at the section with constants and search the ELF binary for interesting strings. Unfortunately, Mac users have no readelf out of the box, so before starting we utter the cherished words:

  brew install binutils

And don't forget to add the path under /usr/local to PATH, because brew gently protects you from everything ...

  greadelf -p .rodata lib/arm64-v8a/libcipher-lib.so | head -n 15

We limit the output to the first 15 lines; otherwise it could shock an unprepared engineer.

At the lower addresses we notice suspicious strings. As I found out from the Cipher.so source code, the keys and values are stored in an ordinary std::map. This tells us little, but we do know that the binary itself contains the obfuscated keys alongside the encrypted values.

How are the values encrypted? Studying the sources, I found that encryption uses AES, a standard symmetric cipher. So if the encrypted values are here, the key should be somewhere nearby ... Without digging far, I came across an issue in the same project with the provocative title "Insecure key storage: secrets are very easy to retreive". From it I learned that the key is stored in the clear in the binary, and found the decryption procedure. In the issue's example the key was at address zero, and although I understood that the compiler could place it elsewhere in the binary's .rodata section, I decided that the suspicious string at address zero was the key.
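Why a key stored next to the data defeats the encryption can be shown with a short round trip. This is only a sketch under loud assumptions: the cipher mode and padding (AES/ECB/PKCS5Padding) and the way a text key is stretched to 16 bytes (an MD5 digest) are chosen here for illustration and are not taken from Cipher.so, and the key string is invented.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

final class AesSketch {
    // ASSUMPTION: mode and padding are illustrative, not Cipher.so's actual settings.
    private static final String TRANSFORMATION = "AES/ECB/PKCS5Padding";

    /** Derives a 16-byte AES key from text (hypothetical derivation, for the demo only). */
    private static SecretKeySpec keyFrom(String keyText) throws GeneralSecurityException {
        byte[] sixteenBytes = MessageDigest.getInstance("MD5")
                .digest(keyText.getBytes(StandardCharsets.UTF_8));
        return new SecretKeySpec(sixteenBytes, "AES");
    }

    /** Encrypts the text and returns the ciphertext as Base64. */
    static String encrypt(String plain, String keyText) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, keyFrom(keyText));
            byte[] encrypted = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(encrypted);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Decodes the Base64 ciphertext and decrypts it with the same key. */
    static String decrypt(String base64, String keyText) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, keyFrom(keyText));
            byte[] plain = cipher.doFinal(Base64.getDecoder().decode(base64));
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            // A wrong key usually ends up here as a padding error.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String stored = encrypt("value hidden in the APK", "key-found-in-rodata");
        System.out.println(decrypt(stored, "key-found-in-rodata"));
    }
}
```

Anyone who can read the key out of the binary can run the decrypt half; a wrong candidate key typically shows up as a padding error rather than as garbage output.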

Attempt #1: I start decrypting the values, assuming that this very string is the encryption key. Error. OpenSSL hints that something is wrong. After reading a bit more of the Cipher.so source, I understand that if the user does not specify a key at build time, the default key Cipher.so@DEFAULT is used.

Attempt #2: an error again. Hmm ... Has that constant really been overridden? It is easy to make a mistake here: the Gradle code is confusing and its formatting is mangled. I check again. Everything seems to be as I thought.

Instead of the key names the binary holds their MD5 hashes, so I try my luck and open a rainbow-table service. Voila: one of the keys is the word "password". The second one is not found. This, of course, does not give us much. The two keys sit at addresses 240 and 2a2 respectively. In principle they are easy to recognize at a glance: 32 characters (MD5).
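Spotting such strings can also be automated. A rough sketch, assuming the hashes are stored as lowercase hex text (the class name and the sample buffer are invented; this is not a replacement for readelf): it scans a byte buffer for standalone runs of exactly 32 hex characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Md5Scanner {
    // Exactly 32 lowercase hex characters, not embedded in a longer hex run.
    private static final Pattern MD5_HEX =
            Pattern.compile("(?<![0-9a-f])[0-9a-f]{32}(?![0-9a-f])");

    /** Returns every MD5-looking string found in the raw bytes, in order of appearance. */
    static List<String> findMd5Like(byte[] section) {
        // ISO-8859-1 maps each byte to one char, so char offsets equal byte offsets.
        String text = new String(section, StandardCharsets.ISO_8859_1);
        List<String> hits = new ArrayList<>();
        Matcher matcher = MD5_HEX.matcher(text);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    /** A made-up buffer of NUL-separated strings standing in for a .rodata dump. */
    static byte[] sample() {
        String data = "hello\0" + "5f4dcc3b5aa765d61d8327deb882cf99" + "\0not-a-hash\0";
        return data.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        findMd5Like(sample()).forEach(System.out::println);
    }
}
```

Each hit is only a candidate: any 32-character hex string matches, whether or not it is really an MD5 digest.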

I checked everything again and tried decrypting with each of the other strings (those at the lower addresses) as the key: all in vain.
So there is some other secret key, while the sequence of steps seems to be correct. I put this part aside for now and try not to get buried in it.

After rummaging a little in the container-signing algorithm, I again see calls to the Cipher.so library, plus code that uses the cryptographic functions of the Java standard library.

Riddle (which I never guessed)

In the function that is responsible for encryption, at the very beginning there is a check for the keys in the container.

  public byte[] a(java/util/Map p1) {
      v0 = p1.size()
      v1 = 0x0;
      if (v0 != 0) goto :cond_0
      p1 = new byte[v1];      // empty container: empty signature
      return p1;
  :cond_0
      v0 = "user";
      v0 = p1.containsKey(v0)
      if (v0 == 0) goto :cond_1
      p1 = new byte[v1];      // key "user" present: empty signature
      return p1;

Literally: if the container has the key “user”, it is not signed (an empty signature is returned). A strange feeling: the problem seems to be solved, but somehow suspiciously simply. Then why invent everything else? To lead people away from the easy path? And why hadn't I skimmed this code earlier? Hmm ...

No, wrong. I checked the answer with a certain user in the blue messenger, whose contact details I had been given along with the task. We dig further. Perhaps the input key/value set somehow changes as it is added to the container? I read the code more closely.

Note that the decompiler stripped the annotations from the smali code. What if it removed something important? I check the main files: apparently nothing substantial. Everything important is in place and the meaning is not lost. I check the callback functions responsible for writing a key/value pair from the text fields into the internal containers. I find nothing suspicious.

I have become skeptical of every line of code: I can no longer trust anyone.

Simple solution #2: I noticed that the signing procedure begins by checking whether a certain value is present (as a substring) in the signature of the certificate with which the application was signed.

  @OnClick // signature generation
  protected void huvot324yo873yvo837yvo() {
      String signature = "no data";
      boolean result = some_packages.isKeyInSignature(this);
      if (result) {
          Map map = new HashMap();

The value itself, of course, lies encrypted in that same ill-fated binary. And if this value is not in the signature, the algorithm signs nothing and simply returns the string “no data” as the signature ... So, back to Cipher.so ...

The final battle with key decryption

To convey the scale of the tragedy, this is how confused I was:

I made a hex dump of this section and looked at the first two strings, which I had suspected from the very beginning.

If you look closely, the character separating the strings here is 0x00. It is also what the standard C library uses as the terminator in its string functions. No less interesting: what is the space character doing in the middle of the first string? Then the frantic attempts begin, where the key is:

  • the whole first string
  • the first string up to the space
  • the first string from the space to the end
  • ...

The degree of paranoia can already be judged. When you do not know how hard and devious the task is supposed to be, you start going around in circles. And still, no luck. Then a thought occurs to me: “Does the procedure from the issue actually work correctly on my machine?” In general, the sequence of steps is logical and raised no questions, but do the commands on my machine actually do what is expected of them? So what do you think?

Checking every step by hand, it turned out that

  echo "some_base64_input" | openssl base64 -d

on some inputs suddenly returns an empty string. Hmm.

After replacing it with the first other base64 decoder I found on the machine and going through the main candidates, a suitable key turned up immediately, and the values were decrypted accordingly.
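The post does not establish why openssl printed nothing. One commonly reported cause, stated here only as an assumption, is that `openssl base64 -d` expects line-wrapped input unless the `-A` flag is given, so a long single-line string can silently decode to nothing. Base64 decoders do disagree about line structure, which the standard Java decoders illustrate from the opposite direction: the basic decoder rejects line breaks, while the MIME decoder skips them. The class and method names below are invented for the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64Check {
    /** True if the strict (basic) decoder accepts the input as-is. */
    static boolean strictDecodes(String input) {
        try {
            Base64.getDecoder().decode(input);
            return true;
        } catch (IllegalArgumentException e) {
            return false; // e.g. a line break inside the data
        }
    }

    /** Decodes with the lenient MIME decoder, which ignores line breaks. */
    static String lenientDecode(String input) {
        return new String(Base64.getMimeDecoder().decode(input), StandardCharsets.UTF_8);
    }

    /** Splits a one-line Base64 string into two lines, as a wrapping encoder might. */
    static String wrap(String oneLine) {
        return oneLine.substring(0, 8) + "\n" + oneLine.substring(8) + "\n";
    }

    public static void main(String[] args) {
        String oneLine = Base64.getEncoder()
                .encodeToString("hello, signature".getBytes(StandardCharsets.UTF_8));
        String wrapped = wrap(oneLine);
        System.out.println("strict, one line: " + strictDecodes(oneLine));
        System.out.println("strict, wrapped:  " + strictDecodes(wrapped));
        System.out.println("lenient, wrapped: " + lenientDecode(wrapped));
    }
}
```

The practical lesson matches the one in the text: when a pipeline step returns something unexpected, check that step on its own with a second tool before doubting the whole approach.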

Getting a signature from a certificate

  class a {
      public static boolean isKeyInSignature(android.content.Context p1) {
          v0 = 0x0;
          try TRY_0 {
              v1 = p1.getPackageManager()
              p0 = p1.getPackageName()
              v2 = 0x40; // GET_SIGNATURES
              PackageInfo p0 = v1.getPackageInfo(p0, v2)
              android.content.pm.Signature[] p0 = p0.signatures;
              // Order is not guaranteed
              v1 = p0.length;
              v2 = 0x0;
          :goto_0
              if (v2 >= v1) goto :cond_1
              v3 = p0[v2];
              String v3 = v3.toCharsString()
              String v4 = net.idik.lib.cipher.so.CipherClient.a()
              v3 = v3.contains(v4)
          } TRY_0
          catch TRY_0 (android/content/pm/PackageManager$NameNotFoundException) goto :catch_0;
          if (v3 == 0) goto :cond_0
          p1 = 0x1;
          return p1;
      :cond_0
          v2 = v2 + 0x1;
          goto :goto_0
      :catch_0
          p0 = Thrown Exception
          p1.printStackTrace()
      :cond_1
          return v0;
      }
  }

This is what the generated pseudocode looks like after my minor edits. A few things confused me:

  • my poor knowledge of cryptography and of the inner workings of certificates
  • according to the documentation, this method does not guarantee the order of the certificates in the returned collection, so they cannot be relied on to be traversed in the same order every time: what if the application was signed with more than one certificate?
  • not knowing how to extract the certificate from the APK, given that it is unclear what the Android Runtime does in this case

I had to dig into all of these questions, and the result was the following:

  • The certificate itself is at original/META-INF/CERT.RSA

    there is only one file with this extension in that directory, which means the application is signed with just one certificate
  • on a site about reverse engineering Android applications I found a listing that extracts the signature we need the way Android itself does. According to its author's assurances, at least.

By running this code I obtain the signature, and the key we need is indeed a substring of it. Moving on. Simple solution #2 is ruled out.

And indeed, the key is in the certificate. It only remains to understand what comes next, because if the key “user” is present we still get an empty signature, and, as we learned above, that is the wrong answer.

Write the documentation carefully!

Further attempts to show that the data entered in the text fields gets modified are dropped for lack of evidence. Paranoia surges with new force: maybe the code that pulled the signature from the certificate is wrong, or is it an implementation for old Android releases? I open the documentation again and see the following (https://developer.android.com/reference/android/content/pm/Signature.html#toChars()):

Attention: the function encodes the signature as ASCII text. In the output I had obtained above, the data was in hex representation. The API seemed strange to me, but if the documentation is to be believed, I had once again reached a dead end, and the encrypted key is not a substring of the signature. After sitting over the code thoughtfully for a while, I gave in and opened the source code of this class: https://android.googlesource.com/platform/frameworks/base/+/e639da7/core/java/android/content/pm/Signature.java

The answer was not long in coming. Right there in the code, plain as day: the output format is an ordinary hex string. So either I do not understand something, or the documentation is written “slightly” incorrectly. After cursing into the void, I got back to work.
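The check can be reproduced off-device with a few lines of plain Java. This is a sketch of the idea, not the Android implementation: it assumes that each certificate byte becomes two lowercase hex digits, which is how I read the linked source, and the byte values and key fragments in main are invented.

```java
final class SignatureHex {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    /** Encodes raw certificate bytes as a lowercase hex string, two chars per byte. */
    static String toHex(byte[] certificate) {
        char[] out = new char[certificate.length * 2];
        for (int i = 0; i < certificate.length; i++) {
            int value = certificate[i] & 0xff;
            out[2 * i] = DIGITS[value >>> 4];
            out[2 * i + 1] = DIGITS[value & 0x0f];
        }
        return new String(out);
    }

    /** Mirrors the decompiled check: is the key a substring of the hex-encoded signature? */
    static boolean isKeyInSignature(byte[] certificate, String key) {
        return toHex(certificate).contains(key);
    }

    public static void main(String[] args) {
        byte[] fakeCertificate = {0x30, (byte) 0x82, 0x01, (byte) 0xff}; // made-up bytes
        System.out.println(toHex(fakeCertificate));                    // 308201ff
        System.out.println(isKeyInSignature(fakeCertificate, "8201")); // true
        System.out.println(isKeyInSignature(fakeCertificate, "8202")); // false
    }
}
```

Because the comparison runs on hex text, the decrypted key only has to be a fragment of that hex string for the check to pass.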


The next n hours were spent on:

  • checking that the code works with RecyclerView correctly and, once again, working out its behavior from the source code, since not every detail is covered in the docs or even on StackOverflow
  • manually decompiling the code fragment responsible for signing the collection into compilable Java. I worked from the assumption that I had missed something after all and that the first key in the container (“user”) is implicitly removed from the collection. I decided to run the rest of the data through the code.

In the end, this code refused to sign even the remaining arguments (further on, in the part of the code that works with cryptography, these arguments implicitly made it bail out).

No. It turned out that these inputs cannot be signed. Unfortunately, there is no way to submit this work and find out whether that is really so. It's a pity. It occupied my mind for a while, but I reassured myself that I had done everything I could.

In fact, I spent a lot of time on this task, and along the way on filling gaps in my knowledge. It was really useful. You can trace the whole path and notice how at first I clung to completely unrelated details. Perhaps this will help someone understand how beginners approach problems of this kind, because usually we read “success stories” in which every step is logical, consistent and leads to the right result.

If someone wants to dig into this problem a little more or ask a question, write to me in the blue messenger: arturbrsg.

Stay tuned.
