For a year now, I have been affiliated with the Edison project, funded by the European Commission. The goal of the project is to increase the number of data scientists and to expand their knowledge through well-substantiated research into the right competences, training, education and accreditation for the profession. This should help dispel the mystery that still surrounds the ‘data scientist’ concept.
When I talk to companies about their (fruitless) search for data scientists, it turns out time and again that they don’t know precisely what they are looking for. While the usefulness and necessity of data scientists were still questioned in many discussions a year ago, everyone is now convinced that they are needed. At the same time, the definition of data science is highly susceptible to inflation and seems to be applied to anything involving data analysis. The data scientist is certainly one of the most hyped professions around.
But why? What type of people are data scientists? What competences must they have? Does your organisation really and truly need a data scientist? If so, where can you find one or how can you train someone (or have them trained) to fulfil the role? The Edison project continues to provide more answers to these questions!
A foresighted outlook
In 2009, the New York Times published an article entitled For Today’s Graduate, Just One Word: Statistics. In many respects an important eye-opener, and one that still seems relevant today. A quote by Hal Varian, former Chief Economist at Google, summarises the article well: ‘I keep saying that the sexy job in the next 10 years will be statisticians.’ Three years later, Harvard Business Review gave the quote a new twist with the famous article Data Scientist: The Sexiest Job of the 21st Century.
The New York Times article gives a good idea of why the data science profession is so ‘sexy’. The volume of data within organisations is exploding, and the major bottleneck is the knowledge and expertise needed to use and analyse all this data and to gain meaningful insights from it. We need people who possess the golden combination of statistical knowledge, programming skills and a feeling for numbers.
Usefulness and necessity
You might ask yourself: do we really and truly need these types of people? All I can answer is a definite ‘YES’. The two main reasons?
- The volume of data available for analysis will only continue to grow due to developments such as the Internet of Things, smart sensors and social networks. Also think of virtual and augmented reality, where every direction we look, every event and every movement in a VR application, game or website is recorded. Client insight at its best!
- Advanced algorithms are still needed to distil usable insights from the increasing volumes of data and to convert these insights to actions or new products and services.
Research by Forrester sums it up nicely: businesses are drowning in data but starving for insights. Of course, a proper data architecture and the right hardware and software are essential for data science. Software tools, in particular, have developed significantly in recent years, making the realisation of self-learning algorithms based on machine learning, for instance, far more approachable. Still, people remain the most important factor if an organisation is to become truly successful with data science. The MIT Sloan Management Review survey indicates that the organisations that perform best in the field of data science are also very good at guiding and developing analytical talent. The importance of this human factor was confirmed once again during Gartner’s BI Summit in London last March: 25% of the presentations involved organisational change and change management. The phrase ‘culture eats strategy for breakfast’ was heard frequently during the conference.
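To illustrate just how approachable such self-learning algorithms have become, here is a minimal sketch in plain Python: a model that learns a linear trend from observations by gradient descent. The data points and learning rate are invented purely for illustration.

```python
# Minimal sketch of a "self-learning" algorithm: fit a linear trend
# y = w * x + b to observations by gradient descent.
# The data points are made up purely for illustration.

data = [(1, 3), (2, 5), (3, 7), (4, 9)]  # follows y = 2x + 1

w, b = 0.0, 0.0           # start with no knowledge at all
learning_rate = 0.01

for _ in range(5000):     # repeatedly nudge w and b to reduce the error
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))  # converges towards 2.0 and 1.0
```

A handful of lines, no specialist software required; the point is not this toy model but that the underlying mechanics of machine learning are no longer locked away in expensive tooling.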
That organisations ‘must take action’ is a given and has been said for years now. What is new, however, is the focus on what must be done with that data. Data might be the oil of the 21st century, but merely having data is not sufficient. According to Gartner, Algorithmic Business is the key to the future: converting data into action in order to improve decisions, optimise (operational) processes and thereby get ahead of the competition. The need for real-time action and the enormous volumes of data have made smarter algorithms indispensable.
Full of hot air
The search for data scientists is therefore on! This brings us to a major problem, however: there is no unequivocal definition of the data scientist, so the term is used and abused all over the place. For example, I recently heard a large software supplier say the following: ‘Adjustments are always difficult. We will simply fly in a Data Scientist who will make the adjustments possible.’ What was the conversation about? The adjustment of a standard report! A perfect example of what a data scientist is NOT needed for.
The lack of a definition also leads to ambiguous training and education, with insufficient transparency, standards and quality. New York University, on the other hand, does have a great definition of data science: using automated methods to analyse massive amounts of data and to extract knowledge from them. The long-standing method of the data analyst, searching for relevant insights by way of visualisation, is no longer effective with today’s enormous volumes of data. The data scientist uses different methods and techniques. There are two important differences between the two roles:
- The process: big data calls for a different way of collecting, cleaning, analysing and validating the results. The statistical component, in particular, is relevant.
- The application: results achieved with data science are not just aimed at improving decision-making (based on analyses); they are also integrated directly into websites, processes and systems (based on algorithms).
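To make that second difference concrete, here is a minimal sketch of one piece of predictive logic serving both purposes. The scoring rule, field names and thresholds are all invented for illustration.

```python
# Sketch: one piece of predictive logic, two applications.
# The scoring rule, fields and thresholds are invented for illustration.

def churn_risk(customer):
    """Toy model: estimate churn risk from two behavioural signals."""
    score = 0.6 * customer["complaints"] / 10 + 0.4 * (1 - customer["logins"] / 30)
    return min(max(score, 0.0), 1.0)

customers = [
    {"id": "A", "complaints": 8, "logins": 2},
    {"id": "B", "complaints": 1, "logins": 25},
]

# Application 1: decision support -- a ranked report for managers.
report = sorted(customers, key=churn_risk, reverse=True)

# Application 2: integration -- the same model embedded in a live
# process, deciding in real time whether to add a retention offer.
def handle_order(customer):
    if churn_risk(customer) > 0.5:
        return "order accepted + retention offer"
    return "order accepted"
```

The analyst stops at the report; the data scientist also builds the model into the order flow itself.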
These differences obviously call for a different set of competences, but where does one get hold of the knowledge and expertise? Training institutes are throwing themselves en masse into the data science hype. Due to a lack of ‘formal’ university bachelor programmes, numerous websites and entirely new data institutes have been founded, all promising a great deal (such as Data Science Retreat, School of Data Science and Silicon Valley Data Academy). ‘Renowned’ educational institutes also contribute to the uncertainty with their ad hoc master programmes (see, amongst others, the article Data Science: what’s the half-life of a buzzword?). Not to mention that anyone can call themselves a data scientist on their LinkedIn profile after completing a mere two-week course.
The key to success
How to proceed? How do you ensure that your organisation finds capable individuals with the right knowledge and competences to help you become an algorithmic business? A great deal depends on the specific characteristics of your organisation, such as the market in which you operate, the corporate culture and the level of data maturity. What you do already know at this stage is that you must be ready for a future in which data will influence your organisation in a dramatically different way. A future for which you will need not just technology, but also people. People who are truly able to discover new insights with advanced analyses and algorithms.
The European Edison project has produced its first positive result to help you attract or train the right people: an all-encompassing competence and skills framework for data scientists, based on independent scientific research. The following five competences are absolutely essential for any data scientist:
- Data analytics: the ability to use the appropriate statistical techniques and predictive models on available data to discover significant new insights and relations.
- Data science engineering: having the necessary programming knowledge to research, design, develop and implement new applications and instruments.
- Domain expertise: the ability to convert organisational characteristics and specific business problems into relevant data analysis applications and methods.
- Data management: the ability to develop and implement a data management strategy for data collection, storage, preservation and availability for further processing.
- Scientific methods: creating new understanding and possibilities by using research methods (hypothesis, test / artefact and validation).
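That last competence is what separates data science from ad hoc analysis. A minimal sketch of the hypothesise–test–validate cycle, with all data and the rule itself invented for illustration: the hypothesis is a simple predictive rule, and validation checks it against observations the rule has not seen.

```python
# Sketch of the scientific method applied to data: hypothesise,
# test, then validate on unseen data. All numbers are invented.

# Observations: (hours of product use per week, renewed subscription?)
history = [(1, False), (2, False), (3, False), (6, True), (8, True), (9, True),
           (2, False), (7, True), (4, True), (8, True)]

train, holdout = history[:6], history[6:]

# Hypothesis: customers using the product more than 4 hours/week renew.
def predict(hours):
    return hours > 4

def accuracy(dataset):
    """Fraction of observations the rule predicts correctly."""
    return sum(predict(h) == renewed for h, renewed in dataset) / len(dataset)

train_acc = accuracy(train)      # perfect on the data the rule was built on
holdout_acc = accuracy(holdout)  # lower on unseen data -- honest validation
```

A rule that looks flawless on the data it was derived from can still miss on held-out observations; without this validation step, an ‘insight’ is just a guess.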
For further details, please refer to the Edison project website. The website also provides an overview of the skills per competence (such as knowledge of neural networks, Markov models and game theory). A must-have for you and your organisation! I am delighted with the announcement last week that the Ministry of Education, Culture and Science has decided that this framework should form the basis for all data scientist training and education programmes in the Netherlands!
The data scientist family
The Edison project has only just come to an end, yet new developments are constantly taking place. The latest interesting addition is that of a Data Science Profession Family. This builds on the discussion of whether the knowledge and competences required for data science can be spread across several people within a company. The ‘family approach’ also elaborates on roles such as managers, professionals and clerical positions. This offers many useful starting points for the way in which data science teams can be put together.
Public versus private
I have learnt something else through my participation in the Edison project: when it comes to data science, there is a massive (knowledge) gap between the scientific / public sector and the private sector. This while all parties face the same problems and all develop a tremendous amount of data science knowledge. Perhaps you have heard of the Digital Single Market European Cloud Initiative, aimed at a European Open Science Cloud: an open data infrastructure with user-friendly access. Or what about the European Grid Infrastructure? Both EU projects are aimed at facilitating the storage, processing and analysis of large volumes of data. The project members include many (leading) scientists from organisations such as Cern, the Max Planck Institutes and universities from across Europe. Projects that the private sector can also benefit from!
The public and private sectors face the same challenges, certainly when it comes to data science. Why then are we looking for solutions separately, each on our own? Would it not be much better to join forces? The first step: training appropriate and relevant data scientists, so that your organisation too can end its fruitless search for analytical talent!