To hire the right people for the right roles and organize the data science department, it’s important to distinguish between different types of data scientist.
One type of data scientist creates output for human decision makers, in the form of product and strategy recommendations; these are decision scientists. The other creates output for machines to consume, such as models, training data, and algorithms; these are modeling scientists.
Five key areas are required for data science operations. In small organizations, one person may do several of these things. In slightly bigger teams, each of these may be a role staffed by one or more individuals. In larger operations, each may be a team unto itself. These roles cover the creation, maintenance, and use of data.
Data infrastructure: data ingestion, availability, operations, access, and the running environments that support data scientists' workflows, for example running a Hadoop cluster.
Data engineering: determining the data schemas needed to support measurement and modeling, plus data cleansing, aggregation, ETL, and dataset management.
Data quality and data governance: tools, processes, and guidelines to ensure that data is correct, gated and monitored, documented, and standardized. This includes tools for data lineage and data security.
Data analytics engineering: analytics software libraries, productizing workflows, and analytic microservices.
Data-product product manager: building products that internal customers use within their workflows to incorporate the measurements and outputs created by data scientists. Examples include a portal for reading out A/B test results, a failure analysis tool, or a dashboard that enables self-serve data access and root-cause diagnosis of changes in metrics or model performance.
How to Organize a Data Science Department
The Kinds of Data Scientist
HBR, NOVEMBER 2018