Categories instead of directories, or a semantic file system for Linux


Data classification is an interesting research topic in its own right. I love collecting information that seems useful, and I have always tried to build logical directory hierarchies for my files. Then one night I dreamed of a beautiful, convenient program for assigning tags to files, and decided that I could not go on in the old way.

The problem of hierarchical file systems

Users often face two problems: choosing where to save each new file, and finding their own files later (file names are sometimes not meant to be memorized by people).

One solution is a semantic file system, usually built as an add-on to a traditional file system. Directories are replaced by semantic attributes, also called tags, categories, or metadata. I will mostly use the term "category", because in the context of file systems the word "tag" can sound odd, especially once "subtags" and "tag aliases" appear.

Assigning categories to files largely eliminates the problem of where to store a file and how to find it: if you remember (or guess) at least one of the categories assigned to a file, the file will never be lost from sight.

This topic has been raised before more than once (one, two, three, four, and others); here I describe my own solution.

Path to implementation

Right after that dream, I sketched in my notebook a command-line interface that covers the necessary work with categories. I figured I could write a prototype in Python or Bash in a week or two, and then move on to a graphical shell in Qt or GTK. Reality, as always, turned out to be much harsher, and development dragged on.

The original idea was to start with a program that has a convenient, concise command-line interface for creating and deleting categories, assigning categories to files, and removing categories from files. I called the program vitis.

The first attempt to create vitis came to nothing, because work and university took up most of my time. The second attempt got somewhere: as part of my master's thesis I managed to complete the project as conceived and even built a prototype GTK shell. But that version turned out to be so unreliable and inconvenient that I had to rethink a lot.

I used the third version myself for a very long time and moved several thousand of my files into categories. The bash autocompletion I had implemented helped a great deal. But some problems remained, such as the lack of automatic categories and the inability to store files with the same name, and the program was already buckling under its own complexity. That is how I came to face the usual problems of developing complex software: writing detailed requirements, building a functional testing system, studying packaging guidelines, and much more. Now I have reached what I set out to do, so this modest creation can be presented to the free software community. Managing files through categories raises unexpected questions and problems, and in solving them vitis has spawned five more projects around it, some of which are mentioned in this article. vitis still has no graphical shell, but for me the convenience of using file categories from the command line already outweighs any advantages of a regular graphical file manager.

Examples of usage

Let's start simple and create a category:

  vitis create Music  

Add some composition to it as an example:

  vitis assign Music -f "The Ink Spots -  

You can view the contents of the "Music" category with the "show" subcommand:

  vitis show Music

You can play it using the "open" subcommand:

  vitis open Music

Since the "Music" category contains only one file, only that file will be played. To open files with their default programs I wrote a separate utility, vts-fs-open (standard tools such as xdg-open or mimeopen did not suit me for a number of reasons; if need be, you can specify another utility for opening files in the settings). The utility works well on different distributions and desktop environments, so I recommend installing it along with vitis.

You can directly specify the program for opening files:

  vitis open Music --app qmmp  

Let's create more categories and add files with "assign". If files are assigned to categories that do not exist yet, vitis asks whether to create them. The prompt can be skipped with the --yes flag.

  vitis assign Programming R -f "Introduction to R.pdf" "Statistical package R: probability theory and mathematical statistics.pdf" --yes  

Now we want to add the file "Statistical package R: probability theory and mathematical statistics.pdf" to the "Mathematics" category. We know that this file already has the category "R", so we can refer to it by its category path inside the vitis system:

  vitis assign Mathematics -v "R/Statistical package R: probability theory and mathematical statistics.pdf"

Fortunately, bash autocompletion makes this easy.

Let's look at the result, using the --categories flag to list the categories of each file:

  vitis show R --categories  

Notice that the files were also assigned automatic categories by format, type (which groups formats), and file extension. These categories can optionally be disabled. I will definitely localize their names later.

Add something else to "Mathematics":

  vitis assign Mathematics -f "Mathematical analysis - 1984.pdf" Perelman_Activating_mathematics_1927.djvu  

Now the interesting part begins. Instead of a single category you can write an expression with union, intersection, and subtraction, that is, use set operations. For example, the intersection of "Mathematics" with "R" yields one file:

  vitis show R i: Mathematics

Subtract from "Mathematics" everything that mentions the R language:

  vitis show Mathematics \\ R # or vitis show Mathematics c: R  

We can, for no particular reason, combine music and the R language:

  vitis show Music u: R  
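
The three operators correspond to ordinary set operations. A minimal Python sketch of that correspondence follows; the dictionary and the file names in it are invented for illustration and say nothing about how vitis actually stores or evaluates expressions.

```python
# Hypothetical category contents: category name -> set of file names.
categories = {
    "Music": {"song.mp3"},
    "R": {"intro_to_r.pdf", "statistics_in_r.pdf"},
    "Mathematics": {"statistics_in_r.pdf", "analysis_1984.pdf"},
}

# "R i: Mathematics" corresponds to an intersection.
both = categories["R"] & categories["Mathematics"]

# "Mathematics \\ R" corresponds to a difference.
math_without_r = categories["Mathematics"] - categories["R"]

# "Music u: R" corresponds to a union.
music_or_r = categories["Music"] | categories["R"]

print(sorted(both))            # ['statistics_in_r.pdf']
print(sorted(math_without_r))  # ['analysis_1984.pdf']
print(sorted(music_or_r))      # ['intro_to_r.pdf', 'song.mp3', 'statistics_in_r.pdf']
```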

The -n flag lets you pick files from the query result by number and/or range, for example -n 3-7, or something more complex: -n 1,5,8-10,13. It is often useful with the open subcommand, to open just the files you need from the list.

Although we are moving away from the usual directory hierarchy, nested categories are often useful. Let's create a subcategory "Statistics" under "Mathematics" and assign it to the appropriate file:
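
A selection such as 1,5,8-10,13 is easy to expand into a list of positions. The sketch below shows one possible way to do it in Python; the function names are hypothetical, and skipping positions past the end of the list is an assumption, not documented vitis behavior.

```python
def parse_selection(spec: str) -> list[int]:
    """Expand a spec such as "1,5,8-10,13" into a list of 1-based positions."""
    numbers = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            numbers.extend(range(first, last + 1))
        else:
            numbers.append(int(part))
    return numbers


def select(items: list[str], spec: str) -> list[str]:
    """Pick items by 1-based position; positions past the end are skipped."""
    return [items[n - 1] for n in parse_selection(spec) if 1 <= n <= len(items)]


print(parse_selection("1,5,8-10,13"))  # [1, 5, 8, 9, 10, 13]
```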

  vitis create Mathematics/Statistics
  vitis assign Mathematics/Statistics -v "R/Introduction to R.pdf"
  vitis show Mathematics --categories

We can see that this file now has the category "Mathematics/Statistics" instead of "Mathematics" (redundant links are tracked).

Typing the full path every time can be inconvenient, so let's create a "global" alias:
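
The article does not describe how redundant links are detected. One plausible rule, shown here purely as an assumption, is to drop any category that is an ancestor of another category already assigned to the same file:

```python
def drop_redundant(categories: set[str]) -> set[str]:
    """Remove every category that is an ancestor of another one in the set."""
    return {
        c for c in categories
        if not any(other.startswith(c + "/") for other in categories)
    }


assigned = {"Mathematics", "Mathematics/Statistics", "R"}
print(sorted(drop_redundant(assigned)))  # ['Mathematics/Statistics', 'R']
```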

  vitis assign Mathematics/Statistics -a Statistics
  vitis show Statistics

Not just regular files

Internet links

To unify the storage of all kinds of information, it would be useful, at a minimum, to categorize links to Internet resources. That is possible too:

  vitis assign Habr "Color Anomaly" -i https://habr.com/ru/company/sfe_ru/blog/437304/ --yes

A file named after the HTML page title, with a .desktop extension, is created in a special location. This is the traditional shortcut format in GNU/Linux. Such shortcuts get the automatic category NetworkBookmarks.
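
For reference, a .desktop entry of type Link needs only a few keys. The Python sketch below builds such an entry as text; which keys vitis actually writes is not stated in the article, so the exact set here is an assumption, and the function name is hypothetical.

```python
def desktop_link(title: str, url: str) -> str:
    """Return the text of a minimal .desktop entry of type Link."""
    # Key values must stay on one line, so collapse any line breaks in the title.
    safe_title = " ".join(title.split())
    return (
        "[Desktop Entry]\n"
        "Type=Link\n"
        f"Name={safe_title}\n"
        f"URL={url}\n"
    )


print(desktop_link("Example page", "https://example.org/"))
```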

Naturally, shortcuts are created in order to be used:

  vitis open "Color Anomaly"

Executing the command opens the newly saved link in the browser. Categorized shortcuts to Internet resources can replace browser bookmarks.

File Fragments

It is also useful to have categories for individual fragments of files. A nice application, isn't it? The current implementation, however, covers only plain text, audio, and video files. Say you need to mark a particular piece of a concert or a funny moment in a movie: when using assign, you can add the flags --fragname, --start, and --finish. Let's save the opening sequence of "DuckTales":

  vitis assign -c Screensavers -f Duck_Tales/s01s01.avi --finish 00:00:59 --fragname "Duck Tales intro"
  vitis open Screensavers

In reality, nothing is cut out of the file. Instead, a pointer to the fragment is created, which records the file type, the path to the file, and the beginning and end of the fragment. Creating and opening fragment pointers is delegated to two utilities I wrote for this purpose, mediafragmenter and fragplayer: the first creates pointers, the second opens them. For audio and video, playback from one position to another is done with the VLC player, so VLC must be installed as well. At first I wanted to build this on mplayer, but for some reason it was very inaccurate when seeking to the right moment.

In our example, the file "Duck Tales intro.fragpointer" is created (in a special location), and then the fragment is played from the beginning of the file (since --start was not specified at creation) to the 59-second mark, after which VLC closes.
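
To make the idea of a fragment pointer concrete, here is a Python sketch of a record with the fields listed above and of a VLC command line built from it. The field names and the class are hypothetical (the real .fragpointer layout and the way fragplayer calls VLC are not described in this article); the VLC options --start-time and --stop-time and the vlc://quit playlist item are real.

```python
from dataclasses import dataclass
from typing import Optional


def to_seconds(timestamp: str) -> int:
    """Convert "HH:MM:SS" into a number of seconds."""
    hours, minutes, seconds = (int(x) for x in timestamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class FragmentPointer:
    # Hypothetical fields, mirroring what the pointer is said to record.
    media_type: str
    path: str
    start: Optional[str] = None   # None means "from the beginning"
    finish: Optional[str] = None  # None means "to the end"

    def vlc_command(self) -> list[str]:
        """Build a VLC command line that plays only this fragment."""
        command = ["vlc"]
        if self.start is not None:
            command.append(f"--start-time={to_seconds(self.start)}")
        if self.finish is not None:
            command.append(f"--stop-time={to_seconds(self.finish)}")
        # "vlc://quit" is a special playlist item that makes VLC exit.
        return command + [self.path, "vlc://quit"]


intro = FragmentPointer("video", "Duck_Tales/s01s01.avi", finish="00:00:59")
print(intro.vlc_command())
# ['vlc', '--stop-time=59', 'Duck_Tales/s01s01.avi', 'vlc://quit']
```

The resulting list could be passed to subprocess.run to start playback.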

Another example: we want to categorize a single performance from a concert by a famous singer:

  vitis assign Leps "Save our souls" -f Gregory\ Leps\ -\ Concert\ Sail\ -\ Songs\ Vladimir\ Vysotsky.mp4 --fragname "Save our souls" --start 00:32:18 --finish 00:36:51
  vitis open "Save our souls"

When opened, the file starts playing at the desired position and closes after about four and a half minutes.

How it all works + additional features

Category storage

At the very beginning, while thinking through how to organize the semantic file system, I came up with three approaches: storing symbolic links, using a database, or describing everything in XML. I chose the first, because it is simple to implement and because the user can look at categories directly in the file system (which is convenient and important). On first use of vitis, the directory "Vitis" and the configuration file ".config/vitis/vitis.conf" are created in the user's home directory. Inside ~/Vitis, directories corresponding to categories are created, and symbolic links to the original files are placed in those category directories. Category aliases are likewise just links to categories. Of course, not everyone will be happy with a "Vitis" directory in their home directory. It can be moved to any other place:

  vitis service set path /mnt/MyFavoriteDisk/Vitis/
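
The storage scheme described above (one directory per category, one symbolic link per file) can be sketched in a few lines of Python. This is only an illustration of the idea; the function names are hypothetical and vitis itself certainly does more, such as handling files with the same name.

```python
import os
import tempfile


def assign(vitis_root: str, category: str, file_path: str) -> str:
    """Create <root>/<category>/<file name> as a symlink to file_path."""
    category_dir = os.path.join(vitis_root, category)
    os.makedirs(category_dir, exist_ok=True)
    link = os.path.join(category_dir, os.path.basename(file_path))
    if not os.path.islink(link):
        os.symlink(os.path.abspath(file_path), link)
    return link


def show(vitis_root: str, category: str) -> list[str]:
    """List the names of the links stored in a category directory."""
    return sorted(os.listdir(os.path.join(vitis_root, category)))


# Demonstration in a throwaway directory, so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "Introduction to R.pdf")
    open(original, "w").close()
    root = os.path.join(tmp, "Vitis")
    link = assign(root, "R", original)
    assign(root, "Programming", original)
    target = os.readlink(link)   # where the link in "R" points
    listing = show(root, "R")    # what "R" contains
    print(listing)
```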

At some point it becomes clear that files scattered across different places are hard to categorize, because their locations may change. So at first I made a directory for myself, simply dumped everything into it, and assigned categories to all of it. Then I decided it would be nice to support this at the program level, and that is how the concept of a "file space" appeared. When you start using vitis, it is a good idea to set up such a place right away (all the files we need will be stored there) and enable autosave:

  vitis service add filespace /mnt/MyFavoriteDisk/Filespace/
  vitis service set autosave yes

Without autosave, the "assign" subcommand needs the --save flag whenever you want the file being added to be saved into the file space.

Moreover, you can add several file spaces and change their priorities, which is useful when there are many files stored on different media. I will not cover this here; details can be found in the program's help.

Migrating the semantic file system

The Vitis directory and the file spaces may, at least in theory, occasionally move from place to place. To handle this, I wrote a separate utility, link-editor, that edits links in bulk, replacing parts of their paths with others:

  cp -r /mnt/MyFavoriteDisk/Vitis/ ~/Vitis
  link-editor -d ~/Vitis/ -f /mnt/MyFavoriteDisk/Vitis/ -r ~/Vitis/ -R
  cp -r /mnt/MyFavoriteDisk/Filespace/ ~/MyFiles
  link-editor -d ~/Vitis/ -f /mnt/MyFavoriteDisk/Filespace/ -r ~/MyFiles -R

In the first case, after moving from /mnt/MyFavoriteDisk/Vitis/ to the home directory, the symbolic links that implement aliases are edited. In the second case, after the file space has moved, all links in Vitis are rewritten according to the request to replace part of their path.
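
Bulk rewriting of link targets boils down to reading each symbolic link, replacing a path prefix, and recreating the link. The Python sketch below illustrates that idea only; it is not the link-editor source, the function name is hypothetical, and the flags of link-editor are not modeled.

```python
import os
import tempfile


def retarget_links(directory: str, old_prefix: str, new_prefix: str) -> int:
    """Rewrite symlink targets under directory, replacing old_prefix with new_prefix.

    Returns the number of links that were changed.
    """
    changed = 0
    # os.walk does not descend into symlinked directories by default,
    # so links to directories (such as aliases) show up in dirnames.
    for current, dirnames, filenames in os.walk(directory):
        for name in dirnames + filenames:
            link = os.path.join(current, name)
            if not os.path.islink(link):
                continue
            target = os.readlink(link)
            if target.startswith(old_prefix):
                os.remove(link)
                os.symlink(new_prefix + target[len(old_prefix):], link)
                changed += 1
    return changed
```

A call such as retarget_links("/home/user/Vitis", "/mnt/old/Filespace/", "/home/user/MyFiles/") would then repoint every matching link; the paths in this call are placeholders.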

Automatic categories

If you run the command vitis service get autocategorization, you can see that the default setting is to assign automatic categories by format (Format and Type) and by file extension (Extension).

This is useful when, for example, you need to find something among your PDFs or see what you have stored in MOBI and FB2. You can simply run a query:

  vitis show Format/MOBI u: Format/FB2  

As it happened, standard GNU/Linux tools such as file and mimetype did not suit me, precisely because they do not always identify the format correctly, so I had to write my own implementation based on file signatures and extensions. Identifying file formats is an interesting research topic in itself and deserves a separate article. For now I can say that recognition is probably not correct for every format in the world, but on the whole it already works well. That said, EPUB is currently identified as ZIP (which is technically justified, but in practice should not be considered normal behavior). For the time being, treat this feature as experimental and report bugs. In odd cases you can always fall back on categories by file extension, for example Extension/epub.

If automatic categories by format are enabled, automatic categories that group formats by type are enabled too: "Archives", "Pictures", "Video", "Audio", and "Documents". Localized names will be made for these subcategories as well.
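
Signature-based detection means comparing the first bytes of a file with known "magic" values. The Python sketch below is not the vitis implementation; it only uses a few well-known signatures and shows one way an EPUB can be told apart from a plain ZIP, relying on the EPUB container rule that the first archive entry is an uncompressed file named mimetype.

```python
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"AT&TFORM", "DjVu"),
    (b"PK\x03\x04", "ZIP"),
]

# In an EPUB the entry name "mimetype" starts at offset 30 (right after the
# 30-byte local file header) and is followed directly by its content.
EPUB_MARKER = b"mimetypeapplication/epub+zip"


def detect_format(data: bytes) -> str:
    """Guess a format name from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            if name == "ZIP" and data[30:30 + len(EPUB_MARKER)] == EPUB_MARKER:
                return "EPUB"
            return name
    return "unknown"


def detect_file(path: str) -> str:
    """Read the first 64 bytes of a file and guess its format."""
    with open(path, "rb") as handle:
        return detect_format(handle.read(64))


print(detect_format(b"%PDF-1.4\n"))  # PDF
```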

What was left unsaid

vitis turned out to be a very versatile tool, and it is hard to cover everything at once. Briefly, here is what else you can do:

  • categories can be deleted, and removed from files;
  • the results of expression queries can be copied to a specified directory;
  • files can be run as programs;
  • the show command has many options, for example sorting by name, modification or access date, size, or extension, showing file properties and paths to the originals, displaying hidden files, and so on;
  • when saving links to web sources, you can also save local copies of the HTML pages.

All details can be found in the user help.


Skeptics often say that "nobody will bother assigning all these tags by hand." My own example proves otherwise: I have already categorized over six thousand files and created over a thousand categories and aliases, and it was worth it. Once you can open your to-do list with a single vitis open Plan command, or Stolyarov's book on the LaTeX typesetting system with a single vitis open LaTeX command, it becomes psychologically hard to use the file system "the old way".

A number of ideas follow from this. For example, one could build an automatic radio that plays themed music according to the current weather, holiday, day of the week, time of day, or season. A related idea is a music player that knows about categories and can play music selected by an expression with set operations on categories. A daemon that monitors the "Downloads" directory and offers to categorize new files would be useful. And, of course, a proper graphical semantic file manager should be written. I once even built a web-based file sharing service for an enterprise; it reached a fairly usable state, but it was not a priority and lost its relevance. (Because of major changes in vitis, it is no longer usable.)

Here is a small demonstration.


Vitis is not the first attempt to radically change the way we work with data, but I thought it important to implement my ideas and release the implementation publicly under the GNU GPL license. For convenience there is a deb package for x86-64; it should work on all modern Debian-based distributions. There were some minor issues on ARM (all the other vitis-related programs work fine there), but a working package for this platform (armhf) will be built later. I have stopped building RPM packages for now because of problems on Fedora 30 and the difficulty of covering the many RPM-based distributions, but packages for at least a couple of them will be made eventually. In the meantime, you can use make && make install or checkinstall.

Thank you all for your attention! I hope this article and this project can be useful.

Link to the project repository
