[Translation] Why Data Science teams need generalists, not specialists

In “The Wealth of Nations,” Adam Smith shows how the division of labor becomes the chief source of productivity gains. His example is the assembly line of a pin factory: “One worker draws out the wire, another straightens it, a third cuts it, a fourth points it, a fifth grinds the other end to receive the head.” Thanks to specialization on particular functions, each worker becomes highly skilled at his narrow task, which makes the process more efficient. Output per worker rises many times over, and the factory becomes far more efficient at producing pins.

This functional division of labor is so ingrained in our thinking even today that we have quickly organized our teams accordingly. Data Science is no exception. End-to-end algorithmic business capabilities require many job functions, so companies usually build teams of specialists: research scientists, data engineers, machine learning engineers, causal inference scientists, and so on. The specialists' work is coordinated by a product manager, with handoffs between functions in a way that resembles the pin factory: “one person gets the data, another models it, a third deploys it, a fourth measures it,” and so on.

Alas, we should not be optimizing our Data Science teams for process efficiency. You do that only when you already know exactly what you are producing, pins or anything else, and are merely striving to produce it more efficiently. The goal of an assembly line is execution. We know exactly what we want: pins (as in Smith's example), or any product or service whose requirements fully describe every aspect of the product and its behavior. The workers' role is to fulfill those requirements as efficiently as possible.

But the goal of Data Science is not to execute tasks. Rather, the goal is to explore and develop strong new business capabilities. Algorithmic products and services such as recommendation systems, customer engagement, style preference classification, size matching, clothing design, logistics optimization, seasonal trend detection, and more cannot be designed up front. They must be learned. There are no blueprints to reproduce; these are new capabilities with inherent uncertainty. Coefficients, models, model types, hyperparameters: every necessary element must be learned through experimentation, trial and error, and iteration. With pins, the learning and design happen in advance, before production. With Data Science, you learn as you go, not beforehand.

At the pin factory, where the learning comes first, we neither expect nor want workers to improvise on any property of the product beyond improving production efficiency. Specializing by task makes sense because it yields process efficiency and production consistency (without changes to the final product).

But when the product is still evolving and the goal is learning, specialization hinders us in the following ways:

1. It increases coordination costs.

That is, the costs that accrue in the time spent communicating, discussing, justifying, and prioritizing the work to be done. These costs scale superlinearly with the number of people involved. (As J. Richard Hackman taught us, the number of relationships r grows as a function of the number of members n according to the equation r = (n^2 − n) / 2, and each relationship carries some amount of coordination cost.) When data scientists are organized by function, many specialists are required at every stage, for every change, for every handoff, and so on, which drives up coordination costs. For example, statisticians who want to experiment with new features must coordinate with the data engineers who extend the data sets every time they want to try something new. Likewise, every newly trained model means the modeler needs someone to coordinate with in order to put it into production. Coordination costs act as a tax on iteration, making it slower and more expensive and more likely that you abandon the exploration altogether. This can impede learning.
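Hackman's formula can be checked with a few lines of Python (the helper name is my own, for illustration); it shows how doubling a team more than quadruples the number of pairwise relationships that must be coordinated:

```python
def relationships(n: int) -> int:
    """Pairwise communication links in a team of n members: r = (n^2 - n) / 2."""
    return (n * n - n) // 2

# Each doubling of the team roughly quadruples the coordination surface:
for n in (3, 6, 12, 24):
    print(f"{n:>2} members -> {relationships(n):>3} relationships")
# 3 -> 3, 6 -> 15, 12 -> 66, 24 -> 276
```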

2. It increases wait time.

Even more daunting than coordination costs is the time lost between handoffs. While coordination costs are usually measured in hours (the time it takes to hold meetings, discussions, and design reviews), wait time is usually measured in days, weeks, or even months! The schedules of functional specialists are hard to align, since each specialist is allocated across several projects. A one-hour meeting to discuss a change may take weeks to fit into everyone's calendar. And once the change is agreed upon, the actual work itself must be scheduled amid the many other projects occupying the specialists' working hours. Work such as a code fix or a piece of research that requires only a few hours or days to complete may wait far longer for resources to become available. Until then, iteration and learning are stalled.

3. It narrows the context.

The division of labor can artificially limit learning by rewarding people for staying within their specialization. For example, a research scientist who must stay within his job function will focus his energy on experimenting with different types of algorithms: regression, neural networks, random forests, and so on. Of course, a good algorithm choice can bring incremental improvements, but as a rule far more can be gained from other activities, such as integrating new data sources. Likewise, he may develop a model that exploits every bit of explanatory power inherent in the data, yet the biggest gains may lie in changing the objective function or relaxing certain constraints. That is hard to see, and hard to do, when his job function is limited. Because the research scientist specializes in optimizing algorithms, he is far less likely to do anything else, even when it would bring substantial benefit.

You can spot the symptoms of data science teams operating as pin factories simply by listening to status updates: “waiting on data pipeline changes” and “waiting on ML Eng resources” are common blockers. However, I believe the more dangerous effect is the one you do not notice, because you cannot regret what you cannot see. Flawless execution of requirements, and the complacency that comes from achieving process efficiency, can mask the truth: organizations are unaware of the learning benefits they are missing out on.

The solution to this problem is, of course, to get rid of the pin-factory approach. To encourage learning and iteration, data science roles should be kept general, with broad responsibilities independent of technical function; that is, organize data scientists so that they are optimized for learning. This means hiring “full-stack specialists”: generalists who can perform diverse functions, from concept to modeling, from implementation to measurement. It is important to note that I am not suggesting that hiring full-stack specialists should reduce headcount. Rather, I am simply suggesting that when they are organized differently, their incentives align better with the benefits of learning than with efficiency. For example, suppose you have a team of three people supporting three business capabilities. In the pin factory, each specialist devotes a third of his time to each capability, since no one else can do his job. In the full-stack model, each generalist is fully dedicated to one entire business capability, to scaling it and to learning from it.

With fewer people supporting each capability, coordination is reduced. The generalist moves fluidly between functions: extending the data pipeline to add more data, trying new features in models, deploying new versions to production for causal measurement, and repeating the steps as quickly as new ideas arrive. Of course, the generalist performs the different functions sequentially rather than in parallel. After all, it is just one person. Still, doing the work usually takes only a small fraction of the time needed to gain access to another specialized resource. So iteration time goes down.

Our generalist may not be as skilled as a specialist in any particular job function, but we are not seeking functional excellence or small incremental improvements. Rather, we seek to explore and discover new business capabilities with step-change impact. With the holistic context of a complete solution, he sees opportunities that a narrow specialist would miss. He has more ideas and tries more things. He fails more, too. However, the cost of failure is low, and the benefits of learning are high. This asymmetry favors rapid iteration and rewards learning.

It is important to note that the degree of autonomy and the diversity of skills granted to full-stack data scientists depend heavily on the robustness of the data platform they work on. A well-designed data platform abstracts data scientists away from the complexities of containerization, distributed processing, automatic failover, and other advanced computing concepts. Beyond abstraction, a robust data platform can provide seamless connectivity to experimental infrastructure, automated monitoring and alerting, automatic scaling, and visualization and debugging of algorithmic results. These components are designed and built by data platform engineers; that is, they are not handed off from the Data Science specialist to a data platform team. It is the Data Science specialist who remains responsible for all the code that runs on the platform.

I, too, was once drawn to the functional division of labor for its process efficiency, but through trial and error (there is no better way to learn) I found that generalist roles better foster learning and innovation and deliver the right metric: discovering and building far more business capabilities than the specialized approach does. (A more efficient way to learn about this approach to organization than the trial and error I went through is to read Amy Edmondson's book Teaming: How Organizations Learn, Innovate, and Compete in the Knowledge Economy.)

There are some important assumptions that can make this approach to organization more or less viable at particular companies. Iteration presumes a low cost of trial and error. If the cost of an error is high, you may want to reconsider (this approach is not recommended for medical applications or manufacturing). In addition, if you are dealing with petabytes or exabytes of data, specialization in data engineering may be required. Similarly, if keeping business capabilities online and available matters more than improving them, functional excellence may beat learning. Finally, the full-stack model relies on great people who understand it. They are not unicorns; you can find them or grow them yourself. But they are in high demand, and attracting and retaining them will require competitive compensation, durable corporate values, and interesting work. Make sure your corporate culture can provide these conditions.

Even with all that said, I believe the full-stack model provides the best starting conditions. Start with it, and move consciously toward a functional division of labor only when it is absolutely necessary.

There are other drawbacks to functional specialization. It can lead to a loss of responsibility and to disengagement on the part of workers. Smith himself criticizes the division of labor, suggesting that it dulls talent: workers become ignorant and insular because their roles are confined to a few repetitive tasks. While specialization can deliver process efficiency, it is less likely to inspire employees.

Generalist roles, in turn, provide everything that drives job satisfaction: autonomy, mastery, and purpose. Autonomy, in that they do not depend on others for success. Mastery, in the strong end-to-end competence they build. And purpose, in their ability to influence the business they are creating. If we can get people genuinely engaged in their work and making a big impact on the company, everything else falls into place.

Source text: [Translation] Why Data Science teams need generalists, not specialists