Data Science Team Topologies


How data product development diverges from software

Today, we will discuss data science team topologies and how they differ from your typical software teams. How accurate is our thesis? Stick with me, and let's see if we can make a solid argument for it.

Building a data science team differs from building a traditional software development team in a few key ways. From the roles and responsibilities of team members to the tools and processes used, data science teams require a distinct approach to be set up for success. We'll explore some things to consider when building a data science team, including the importance of cross-functional collaboration, specialized skills, and experimentation and iteration in the data science process. If we've convinced ourselves by the end, we'll have uncovered some valuable insights and best practices to help you navigate the complexities of building a data science team.

Data-Driven, not Code-Driven

In data science, the problems rarely revolve around a specific technology or programming language; more often, they revolve around the data itself. Consequently, the makeup of a data science team can look quite different from that of a traditional software team. A diverse set of skills and perspectives is a boon in working with data. That's why interdisciplinary teams, which mix experts from varying backgrounds and domains, are often successful. In addition, a group of generalists, who can approach problems with a fresh perspective, can bring valuable insights and help drive innovation.

In their analytics pursuits and design of data products, data science teams also often take a more exploratory, iterative approach to problem-solving. Data products and analytics projects are often characterized by uncertainty and a lack of clear-cut solutions. Data scientists must constantly sift through large amounts of data to identify patterns and insights, and they may need to try several approaches before arriving at a final solution. This iterative process can be time-consuming and makes it difficult for scientists or leaders to predict precisely when a team will complete a given project. As a result, data science teams may require more flexibility and autonomy than traditional software teams.

To Know is to Know You Know Nothing

Data science team topologies fundamentally differ from typical software team topologies: they are built on experimentation and exploration. Unlike some software teams, which may have a clear set of requirements and a defined roadmap, data scientists are often tasked with uncovering insights and identifying new opportunities within vast amounts of data. This model requires a different approach to teamwork, emphasizing flexibility, adaptability, and a willingness to try new things.

In a high-functioning data science team, individual contributors and leadership should be comfortable with uncertainty and ambiguity. They should be able to pivot quickly when an experiment doesn't yield the desired results, and be willing to take risks and try new approaches. In contrast to traditional feature development teams, whose goals may include delivering a specific product on a particular schedule, data science teams focused on uncovering insights and making discoveries work on unpredictable time horizons. A vital aspect of a successful data science team is therefore the ability to work in an iterative and exploratory manner, continuously looking for new opportunities and insights within the data.

The Map is Not the Territory

Data science teams often differ from typical software teams in their topologies because they require a deep understanding of the domain in which they operate. This expertise is critical to building impactful data products that genuinely solve problems and deliver value to the end user. With domain expertise, data scientists can avoid falling into the trap of optimizing for vanity metrics or building models that perpetuate bias in the data.


Developing a holistic understanding of the problem space means understanding not just the technical aspects of the problem but also building products with the business context, user needs, and ethical implications in mind. Using a holistic approach, data science teams can align their work with the organization's overall goals and optimize for the desired outcomes. This approach not only helps to mitigate bias in data products but also shifts attention away from narrowly optimizing experiment outcomes or model performance and toward long-term business objectives and user behavior.

To drive more significant impact and create value through data products, data science teams (especially those developing user-facing data products) should be "stream-aligned" and accountable for the ultimate outcomes and user impact of their work. Such teams have a deep understanding of the problem space and the desired outcomes, are closely connected to the end user, and are accountable for the impact and performance of their products. Furthermore, in collaboration with supporting teams and product stakeholders, they are responsible for ensuring their data products evolve to meet the shifting needs of the end users. By taking a holistic approach to data science and focusing on outcomes (rather than simply delivering data or models), data science teams can develop a stronger sense of purpose and create products that adapt to changing marketplaces and evolving user needs.

Different Ops Models

The structure and skills required for traditional DevOps functions differ from those needed for DataOps or MLOps, and that difference shapes data science organizations. While classic DevOps teams focus on the infrastructure and operations required to deploy and maintain software, DataOps and MLOps teams must also consider the unique needs of managing and deploying machine learning models. This nuance often calls for close collaboration between data scientists, engineers, and operations professionals to ensure that models are not only deployed but also continuously monitored, tested, and updated to meet the evolving needs of the user. Operations supporting production data products also require an understanding of the problem domain to ensure the deployed products operate with the right level of accuracy and fairness while still addressing the business problems they are intended to solve.

Conway's Law for Data Products

Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure. — Melvin E. Conway

Creating a new data product involves not just data scientists but also stakeholders from other parts of the organization. Such collaboration means that the communication architecture for data science teams must be designed to accommodate these stakeholders and their different perspectives. This can be quite different from the communication architecture of some software teams, where the focus may be on communication within the development team. Data scientists must also be able to communicate the results of their work to non-technical stakeholders, such as business leaders, and help them understand the implications of the developed data products. The need for both a deep understanding of the problem space and strong communication skills ultimately shapes how data scientists or teams are embedded within the organization.

Service-Oriented Structures

In the spirit of robustness, let's acknowledge that life is nuanced and try to argue against our own prompt. Then, taking a step back, let's ask ourselves, "When do data science team structures map most closely to typical software development?"

One example is a team that delivers a data science function as a service. A platform for experimentation, or a model used by multiple downstream products, may require a structure and approach similar to those of a typical software development team. In such cases, the team may focus on building and maintaining the infrastructure, tooling, and processes that support experimentation and model development. They may also be responsible for ensuring the quality and performance of the models, as well as providing documentation and support to other teams using the platform or models.

However, while the structure and approach to developing or sustaining the service may be similar in such cases, the required skill sets may differ. Data science teams delivering a function as a service will still need a deep understanding of the specific domain, often complemented by strong collaboration and communication skills to work effectively with downstream teams and stakeholders. Supporting data products as a service requires providing guidance and support to ensure that the developed data products align with the desired outcomes, lest metrics or models be misinterpreted or misused.

A Different Kind of Team Topology

What have we learned? In data science, the product's architecture (such as an ML model) is often only fully understood once the data scientists have had a chance to experiment and explore. Consequently, data science team structures need to be flexible and adaptable, able to pivot as the team’s understanding of the problem and potential solutions evolves. Furthermore, it means that the communication architecture within and between the data science team and the rest of the organization needs to reflect this adaptability and handle a higher degree of uncertainty. This outsized focus on collaboration, exploration, and end-to-end ownership characterizes the topology of high-impact data science functions.

What do you think — did we convince ourselves? Where is this argument weak, and where does it resonate?

The views expressed within are my personal opinions and do not represent the opinions of any organizations, their affiliates, or employees.

Data Science Team Topologies was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

from Towards Data Science - Medium
https://towardsdatascience.com/data-science-team-topologies-b6844d4e2fa4?source=rss----7f60cf5620c9---4