Handling complexity

Data scientists build complex products. A complex product is one that consists of multiple interconnected parts. To design, build and maintain such a product, a data scientist needs to know not only how these parts work internally, but also how they work together. For example, they need to know:

  • the entities and relationships among business objects such as customers, products and transactions

  • the structure of the input data

  • how the data is filtered to a relevant subset

  • what transformations are applied

  • what features are being calculated and how they are being calculated

  • how the features are processed before plugging them into the model

  • what the model does, what it is trying to predict and how it works

  • in what type of environment the model is being developed and deployed

  • how the model output is processed or stored or passed between environments

  • how users interact with the product

  • how to monitor the model performance

  • how to detect and fix errors or problems

and so on. Being so involved in every part of the product makes it hard to explain to people outside the development team what you are currently working on. People may have an idea of what your product does, but most of the time they are not as immersed in its details and inner workings as the development team is. Often, even members of the development team have trouble summarizing the different components, and documentation is usually missing. This becomes apparent when a new member joins the team and struggles to grasp the structure of the product - an experience that I'm sure most readers of this book are familiar with.

The solution I would like to propose is to draw something like a 'world map' of your product. For a project that I am currently working on, I have literally taken a fictional map with mountains, forests, rivers and roads, and added captions that correspond to different parts of the product we're building. A mountain range signifies the border between two different computer systems while a road stands for a data pipeline that transports data from some data source to another system. Different landmarks represent the different components of the product.

Whenever we are asked what we are currently working on, we can now simply point to a location on the map. This visual cue helps anyone quickly grasp not only what we're working on, but also where it fits into the overall project. Instead of saying that you're working on feature generation, you can point at the town labeled 'feature generation'. Even people without a detailed understanding of your product will see that this town is located near a river called 'input data' that originates in a mountainous area called 'data warehouse'. They will notice an outgoing road leading to a valley named 'Hadoop cluster' with another town called 'prediction model'.

This helps to put things into context, because you get both an overview of the entire project and a close-up of the neighborhood and immediate connections of the component you are currently working on. It also creates an image in the mind of everyone you present your project to. A bunch of slides full of bullet points will be forgotten by your CEO within minutes, but an image like this won't be. People will remember your project as the one with the weird world map. Trust me on that.
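
If you would like to keep a version of the map next to your code, the same picture can be sketched as a tiny data structure. The following is only an illustration under my own assumptions: the class name WorldMap, its methods and the link labels are hypothetical and not taken from any library. Only the landmark names come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class WorldMap:
    """Hypothetical, minimal text version of a product 'world map'."""

    # landmark name -> kind of terrain (town, river, valley, ...)
    landmarks: dict = field(default_factory=dict)
    # directed links stored as (source, target, label) tuples
    links: list = field(default_factory=list)

    def add_landmark(self, name, kind):
        self.landmarks[name] = kind

    def link(self, source, target, label):
        # Refuse links to places that are not on the map yet.
        for name in (source, target):
            if name not in self.landmarks:
                raise KeyError(f"unknown landmark: {name}")
        self.links.append((source, target, label))

    def neighborhood(self, name):
        """Return the immediate incoming and outgoing links of a landmark."""
        return {
            "incoming": [(s, lbl) for s, t, lbl in self.links if t == name],
            "outgoing": [(t, lbl) for s, t, lbl in self.links if s == name],
        }


# Landmarks from the example map; the link labels are assumptions.
product_map = WorldMap()
for name, kind in [
    ("data warehouse", "mountainous area"),
    ("input data", "river"),
    ("feature generation", "town"),
    ("Hadoop cluster", "valley"),
    ("prediction model", "town"),
]:
    product_map.add_landmark(name, kind)

product_map.link("data warehouse", "input data", "source")
product_map.link("input data", "feature generation", "river")
product_map.link("feature generation", "Hadoop cluster", "road")
product_map.link("Hadoop cluster", "prediction model", "contains")

# "Pointing at a town": what flows in, and where the road leads.
nearby = product_map.neighborhood("feature generation")
print(nearby)
```

A structure like this does not replace the drawing. It merely keeps the list of landmarks and their connections in one place, so the picture can be redrawn when the product changes.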


  • Your project or product is a black box to most people, sometimes even to the people working on it.

  • Draw a 'world map' of your project or product to visualize it and make it easier to understand its parts.
