In the first part of ‘Data science, not for dummies’, I hope I succeeded in removing the mystique surrounding data science. Data science is simply a (complicated) profession that requires craftsmanship to get the maximum value out of your data. I ended that article with the following conclusion:
‘Theory and tooling are in adequate supply if you want to get more value out of data. However, introducing these into actual practice in a sustainable way, and getting data science to thrive in your organisation in a structured way, is a major challenge that requires a solid approach and lots of patience. Smart technology and capable people alone will not be enough, not by a long shot.’
Okay, but how do you achieve this? How do you get data science to thrive? To help you on your way, I have listed a few best practices:
1. Test, test, test
The examination of data and the resulting insights form the basis for important future decisions and actions. Be aware, however, that data science is not a ‘game’. The potential gains are substantial, but a lot can also go wrong. Imagine, for example, that clients receive entirely inappropriate offers because your algorithm isn’t doing what it’s supposed to do. Or that you base your purchasing decisions on a faulty inventory forecast.
The models that you develop must be robust, stable and sustainable, so that they continue to produce reliable results in the future. Don’t settle for correlations you think you see in the data; test your model over and over and check the results under a range of conditions. This means having business experts validate the results, in addition to checking your model statistically. Are the results logical and useful? And don’t do this just once: do it on a regular basis!
Testing data science models is an art in itself, requiring mathematics, statistics and probability theory. Models are abstractions of the real world that let you say something, with a certain degree of reliability, about large groups of observations (clients, products, orders). Only with a proper understanding of how this works will you know how and when your model can be used.
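To make ‘check the results under a range of conditions’ a little more concrete, here is a minimal Python sketch, not a prescribed method. It assumes you already have the model’s predictions side by side with the actual outcomes, each labelled with a segment (a region, a month or a customer group, all hypothetical examples). It flags segments where accuracy falls noticeably below the overall figure. The function names and the 5-percentage-point tolerance are illustrative assumptions.

```python
from collections import defaultdict


def accuracy(pairs):
    """Fraction of (predicted, actual) pairs where the prediction was correct."""
    if not pairs:
        raise ValueError("no observations")
    return sum(predicted == actual for predicted, actual in pairs) / len(pairs)


def check_stability(records, tolerance=0.05):
    """Compare per-segment accuracy with overall accuracy.

    records: iterable of (segment, predicted, actual) tuples.
    tolerance: hypothetical threshold; how far a segment may fall below
    the overall accuracy before it is flagged.
    Returns (overall_accuracy, {segment: accuracy} for flagged segments).
    """
    by_segment = defaultdict(list)
    everything = []
    for segment, predicted, actual in records:
        by_segment[segment].append((predicted, actual))
        everything.append((predicted, actual))

    overall = accuracy(everything)
    per_segment = {seg: accuracy(pairs) for seg, pairs in by_segment.items()}
    # Keep only the segments where the model does clearly worse than average.
    weak = {seg: acc for seg, acc in per_segment.items() if acc < overall - tolerance}
    return overall, weak


if __name__ == "__main__":
    # Hypothetical example data: (segment, predicted, actual)
    sample = [
        ("north", 1, 1), ("north", 0, 0), ("north", 1, 1), ("north", 0, 0),
        ("south", 1, 0), ("south", 0, 0), ("south", 1, 1), ("south", 0, 1),
    ]
    print(check_stability(sample))
```

A model that scores well overall but poorly for one segment is exactly the kind of result to put in front of business experts before anyone relies on it.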
2. Work iteratively
Data science means exploration and experimentation. You are pioneering to discover opportunities and threats. When you start out, you don’t always know where it will take you, and because you are innovating, there is always a possibility of ‘mistakes’. It is therefore important to create a culture in which it is okay for projects to fail and in which mistakes are treated as learning experiences. Don’t chase the perfect solution; know when to stop, and focus on a cycle of ongoing learning and improvement.
‘Every outcome, positive or negative, is valuable.’
Don’t start with your most comprehensive data science project; work iteratively, in small steps. Use each outcome to keep refining and improving the process. And be sure to stop in time if the outcome is not what you are looking for!
3. Implement gradually
No matter how closely your models approximate reality, they will behave a little differently once they are ‘in production’. Fine-tune your models in the real world, therefore. Start with a small-scale rollout and evaluate how it works and what the results are. Going live then becomes a controlled process, and you can intervene in time if problems arise.
‘What is fragile should break early, while it is still small. Nothing should ever become too big to fail.’
(Nassim Nicholas Taleb)
This principle also applies to data science. Start with cases where little is at stake if things go wrong. This way, you and the rest of the organisation become familiar with the process of model development without excessive risk. You learn about the pitfalls, how best to divide the process into steps, whom to involve to achieve your goals, and so on. You then apply the lessons learned to larger, more important projects. Moreover, this approach creates a ripple effect that helps anchor data science more firmly within your organisation.
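One common way to run a small-scale rollout is to route a fixed share of customers to the new model while everyone else stays on the existing process. The Python sketch below assumes each customer has a stable string identifier; `in_rollout` and its `percentage` parameter are hypothetical names, and hashing the identifier is just one possible assignment scheme, not the only sound one.

```python
import hashlib


def in_rollout(customer_id: str, percentage: float) -> bool:
    """Deterministically decide whether a customer belongs to the pilot group.

    Hashing the (hypothetical) customer identifier gives every customer a
    fixed bucket between 0 and 9999; customers whose bucket falls below the
    threshold get the new model.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0 .. 9999
    return bucket < percentage * 100


if __name__ == "__main__":
    customers = [f"customer-{i}" for i in range(1_000)]
    pilot = [c for c in customers if in_rollout(c, 5)]
    print(f"{len(pilot)} of {len(customers)} customers in the 5% pilot")
```

Because the assignment depends only on the identifier, a customer gets the same treatment every time, and raising the percentage (say from 5 to 20) only adds customers to the pilot group; nobody already in it drops out. That keeps the before-and-after comparison clean while you scale up.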
4. Monitor the model
The work is not done once the model or algorithm you developed has been implemented or applied. If it was a one-off analysis that led to decisions and actions, you can evaluate the impact afterwards. Was the result positive? Then the way is clear for further research.
Is the model used regularly, or perhaps integrated into the operational systems? In that case, keep monitoring whether its performance is up to par. Is the model still working well? Is there room for optimisation? What has its impact been to date? Data science is a process in which you continuously evaluate and look for opportunities to improve. After all, the world around us changes daily. Continuously monitoring the models you’ve developed ensures sustainable solutions instead of a string of temporary quick wins.
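What ‘continue to monitor if the performance is up to par’ might look like in its simplest form is sketched below in Python. It assumes the actual outcome of each prediction becomes known at some point and that an acceptable accuracy was measured at go-live. The class name, window size and tolerated drop are hypothetical choices; in practice you would agree on the metrics and thresholds with the business.

```python
from collections import deque


class PerformanceMonitor:
    """Track a model's hit rate over a sliding window of recent outcomes."""

    def __init__(self, baseline, window=100, max_drop=0.05):
        self.baseline = baseline      # accuracy measured at go-live (assumed known)
        self.max_drop = max_drop      # hypothetical tolerated drop before flagging
        self.recent = deque(maxlen=window)  # oldest outcomes fall off automatically

    def record(self, predicted, actual):
        """Store whether a single prediction turned out to be correct."""
        self.recent.append(predicted == actual)

    def current_accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded yet."""
        if not self.recent:
            return None
        return sum(self.recent) / len(self.recent)

    def needs_attention(self):
        """True once a full window performs clearly worse than the baseline."""
        if len(self.recent) < self.recent.maxlen:
            return False  # too little evidence yet to raise an alarm
        return self.current_accuracy() < self.baseline - self.max_drop


if __name__ == "__main__":
    monitor = PerformanceMonitor(baseline=0.9, window=10)
    for predicted, actual in [(1, 1)] * 8 + [(1, 0)] * 2:
        monitor.record(predicted, actual)
    print(monitor.current_accuracy(), monitor.needs_attention())
```

The flag only triggers once the window is full, so a handful of early misses does not cause a false alarm; a raised flag is a prompt to investigate together with the business, not an automatic verdict on the model.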
5. Work across disciplines
Data science often has a (logical) data warehouse as its foundation, in which data from various business processes is integrated. This integration creates significant value for your company. Is it then not equally logical to have people from different business units working together on your data science projects? Naturally, that also means reconciling different interests and opinions, but the added value can be significant. If everyone contributes domain knowledge and knowledge of the underlying data (definitions), the project results will improve considerably. So don’t turn data science into a one-department or one-expert ‘show’; assemble multidisciplinary teams and invest in collaboration.
Collaboration within a data science team does not mean that everyone does the same work, but that the unique knowledge and experience of each team member are put to the best possible use and that team members fill each other’s gaps in expertise. Make sure the division of roles is clear to everyone involved and designate an owner for each sub-area of the project. Also, encourage collaboration through consultation and coordination.
6. Ask questions
Most people in your company don’t know their way around data science, analytics and statistics. You might therefore make the mistake of not engaging these ‘laymen’ with questions at all; after all, they have no idea what you’re talking about. The consequence is that so much reliance is placed on the data scientist that this person becomes a single point of failure. Or the results, seemingly ‘conjured up’ by magic, are distrusted to such an extent that nobody uses them. There is also a risk that the data scientist gets stuck in certain ideas and convictions, because he or she is never challenged to look at things from a different perspective.
In other words, make sure that questions keep being asked until everyone ‘understands what you are doing’. This applies to questions that business experts ask the data scientist, and vice versa. Good results are only achieved when the data scientist has a good understanding of the business goals, how the processes are structured and where the opportunities are. Conversely, business experts can only validate the results if they understand what the analysis entails, what the starting points were and which choices were made.
‘If you can’t explain it simply, you don’t understand it well enough’
Sometimes you must help things along to get communication going. Check that everyone has a clear understanding and clarify where necessary. Don’t assume; verify. Before you know it, time and money will have been spent on projects that won’t bear fruit because they started from incorrect premises.
7. Focus on action
It seems obvious, but things still go wrong in practice now and then. Why are we examining the data? What do we hope to achieve with the results of this analysis? How can we implement a model for this, and what will it bring the organisation? Have the business experts and data scientists in your team answer these questions together, before the actual analysis work starts. Don’t search aimlessly for hidden correlations that eventually prove unactionable. At the same time, make sure team members do not step on each other’s toes, and let everyone contribute, from their own expertise, to a sound business case and an unambiguous step-by-step plan.
No loose ends
Have you noticed? As a data scientist, I hardly ever talk about algorithms, statistics and reliability. For the most part, I focus on organisation, processes, collaboration and communication. With good reason, too, because that is exactly where things go wrong in practice for many organisations. Their picture of data science often extends no further than ‘having fun tinkering with data and magically achieving higher profits’, a picture reinforced by software suppliers and consultants who promise riches.
However, as explained in part 1, data science means conducting a solid investigation with a structured approach: defining a use case; formulating, testing and evaluating a hypothesis; and then returning to step 1. The definition of ‘done’ is then determined by the specific use case, not by a ‘lovely outcome’.
Data science might be hip, but to apply it successfully you must also get some mundane aspects right, such as structure, planning and organisation. You truly put data at the beating heart of your organisation by creating the right organisational setting for conducting data science professionally. This also helps you avoid loose ends that serve no real purpose. Everyone who contributes to data science in your company will then work more efficiently and more effectively.
What would it mean to your organisation if you could implement data science in a much more structured and successful way?