Who should be doing machine learning at your company?

The key question for infusing your organization with ML is “who” not “how”.

The big decision you have to make as a leader on ML is who in your organization does each part of the ML workflow. Who does a task influences how it gets done, and not considering this question at the design phase is a key reason why many organizations find it difficult to operationalize ML.

The who influences the how

The answer to the “who” question influences the “how”. For example, take model creation/training. What types of ML models will be built by data scientists? Which ones are better built by domain experts? Which ones by practitioners in the field?

For ML models that will be built by data scientists, you are better off using a code-first ML framework such as Keras/TensorFlow or PyTorch and operationalizing them with SageMaker, Vertex AI, Databricks, etc.
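
For a sense of what the code-first path looks like, here is a minimal sketch of a data scientist defining and training a small Keras model before handing it off for deployment. The data, feature count, and file name are stand-ins, not a prescription.

```python
# Minimal sketch of the code-first path: a data scientist defines and trains
# a small Keras model. The data and shapes below are stand-ins.
import numpy as np
import tensorflow as tf

# Stand-in training data: 1,000 examples with 10 numeric features.
X = np.random.rand(1000, 10).astype("float32")
y = np.random.rand(1000, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=64)

# From here, the saved model would typically be handed to an ML engineer to
# deploy on SageMaker, Vertex AI, or similar.
model.save("pricing_model.keras")
```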

Provide domain experts with a low-code ML framework such as BigQuery ML or AI Builder. A data scientist may be unhappy with prebuilt models (and the corresponding lack of flexibility), but the ease of use and data-wrangling capabilities that BigQuery ML provides are perfect for domain experts.
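
For contrast, here is a hedged sketch of the low-code path: in BigQuery ML, training a model is a single SQL statement. A domain expert would typically run it directly in the BigQuery console; it is wrapped in the Python client here only to keep the examples in one language, and the dataset, table, and column names are hypothetical.

```python
# Hedged sketch of the low-code path: in BigQuery ML, training a model is one
# SQL statement. Dataset, table, and column names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

# A domain expert would normally run this SQL in the BigQuery console itself.
sql = """
CREATE OR REPLACE MODEL `mydataset.churn_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT plan, tenure_months, monthly_spend, churned
FROM `mydataset.customers`
"""
client.query(sql).result()  # blocks until the training job finishes
```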

Practitioners in the field will need a no-code ML framework such as DataRobot or Dataiku to create ML models.

Different organizations will answer differently

The answer for the same ML problem will vary from organization to organization. For example, consider using ML for pricing optimization. As a leader, you have 3 choices of who does the work:

  1. Have a data science team build dynamic pricing models. They will probably start with controlled variance pricing methods and then incorporate factors such as demand shocks. The point is that this is a sophisticated, specialized team using sophisticated methods. They’ll need to do it with code.
  2. Have domain experts build tiered pricing models. For example, a product manager might use historical data to craft the pricing for different tiers and adapt that pricing to new market segments based on each segment’s characteristics. This involves analytics or a regression model and is very doable in SQL (see the sketch after this list).
  3. Have practitioners determine the optimum price. For example, you could have store managers set the price of products in their store based on historical data and factors they find important. This requires a no-code framework (e.g. DataRobot to price insurance).
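
To make option 2 concrete, here is a hedged sketch of what the product manager’s regression could look like in BigQuery ML, again submitted through the Python client to keep the examples in one language. Every table, column, and model name below is hypothetical.

```python
# Hedged sketch of option 2: a tiered-pricing regression in BigQuery ML.
# All table, column, and model names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

# Train a linear regression that predicts the price that worked historically,
# given a few characteristics of the market segment.
client.query("""
CREATE OR REPLACE MODEL `pricing.tier_price_model`
OPTIONS (model_type = 'linear_reg', input_label_cols = ['realized_price']) AS
SELECT segment_size, median_income, competitor_price, tier, realized_price
FROM `pricing.historical_sales`
""").result()

# Score a new market segment to get a suggested price for each tier.
suggested = client.query("""
SELECT tier, predicted_realized_price AS suggested_price
FROM ML.PREDICT(
  MODEL `pricing.tier_price_model`,
  (SELECT segment_size, median_income, competitor_price, tier
   FROM `pricing.new_segment_candidates`))
""").to_dataframe()
print(suggested)
```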

The correct answer depends on your business and the ROI you can expect to get from each of the above approaches.

Multiple tasks

It’s also important to recognize that the ML workflow is not just model training and deployment. Engineering teams commonly understand that model training needs a data science skillset (stats, Python, notebooks), that deployment needs an ML engineering skillset (software engineering, cloud infrastructure, ops), and that invoking the model can be done by any developer.

However, the end-to-end ML workflow consists of many more tasks than these. I suggest that you consider each task separately: 1) training, 2) deployment, 3) evaluation, 4) invocation, 5) monitoring, 6) experimentation, 7) explanation, 8) troubleshooting, and 9) auditing. Then, ask who at your company will be doing each task for each ML problem. Very rarely will these other steps be carried out by an engineer (indeed, it’s a red flag if only engineers can do these things).

For example, a model created by a data scientist may have to be invoked as part of reports run within Salesforce by a practitioner (a salesperson, in this case) and audited by a sales manager.

If you build your models in Spark, productionize them as notebooks, and deploy them as APIs, how will the sales manager do their audit? You now need a software engineering team that builds custom applications to enable this part of the ML workflow! That is a serious waste of time, money, and effort, and a problem you could have avoided by deploying the model into a data warehouse that readily supports dashboards.
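
As a sketch of that alternative: if predictions are generated or logged in the data warehouse, the audit reduces to an ordinary SQL query that any dashboard tool can visualize. The table and columns below are hypothetical.

```python
# Hedged sketch: when the model runs in the data warehouse, an audit is just a
# query. Here a sales manager compares predicted and actual outcomes by month.
# The `sales.prediction_log` table (with a DATE column prediction_date) and its
# other columns are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()
audit = client.query("""
SELECT
  DATE_TRUNC(prediction_date, MONTH) AS month,
  AVG(predicted_deal_size) AS avg_predicted,
  AVG(actual_deal_size)    AS avg_actual,
  COUNT(*)                 AS num_deals
FROM `sales.prediction_log`
GROUP BY month
ORDER BY month
""").to_dataframe()
print(audit)  # or point a dashboard tool at the same query
```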

Summary

It sounds obvious, but I see so many organizations making this mistake that it is worth calling out: organizations that try to standardize on a single ML framework while disregarding the skillsets of the people who need to carry out each task will fail. Because skillsets vary within your organization, you will end up with different ML frameworks and tools.

Make sure to choose open, interoperable tools so that you don’t end up with silos. It is also helpful to go with solutions that address different levels of sophistication — this way, the vendor takes care of interoperability.
