Data has made quite the promotion in recent decades: from a residual product of IT systems, useful for reporting purposes, to the ultimate means to make business processes smarter, improve decision-making and innovate products and services. Data must be available ever faster, in larger quantities and in more places, for increasingly complex analyses, and that places high demands on the underlying data architecture. Many companies notice that their data warehouse solution has reached its limits and are looking for a way to migrate to an integrated data platform that can grow flexibly with their corporate ambitions and with technological developments, in line with their chosen cloud strategy, with maximum scalability and manageable costs. In this article, we introduce a reference architecture that lets you specify a modern, flexible, cloud-based data architecture in just three workshops.
Has so much changed? Yes and no. It’s still about making data available in the right location, at the right time, and in the right format. But data volumes have grown explosively, data is scattered across an increasingly complex infrastructure, and by now everyone in the company is asking for data, often in real time. User demands are much higher with the emergence of self-service BI and data science, and data must be made available in many more formats: APIs, apps, files, etc. What’s more, companies increasingly have to share data with clients, chain partners and other external parties, and between systems. The data architecture effectively becomes the junction for application integration.
The problem might still seem very much the same, but the complexity has grown at high speed in recent years. On top of that, the requirements regarding data governance, security and privacy have been tightened significantly. It must be possible to answer questions about the origin and use of data quickly and reliably, and the ability to anonymise data on the fly for specific user groups is a hard requirement.
These challenges require you to let go of a few long-held principles and procedures, and to look at your data architecture with fresh eyes.
Technology is evolving faster than ever
New technologies for processing, storing and publishing data appear faster than ever. Cloud environments like Microsoft Azure and AWS introduce new functions on what seems like a daily basis, often with such interesting possibilities to simplify, accelerate or otherwise improve your data architecture that it’s worthwhile to adapt its design to follow suit. You therefore can’t set your new data architecture in stone, but must ensure that it keeps evolving along with technological developments. This means that the architects in your organisation must adopt a different approach and possess in-depth knowledge of cloud platforms, concepts like data virtualisation, and technologies like Hadoop, NoSQL, NewSQL and Snowflake.
Data integration 3.0
A modern data platform sets itself apart from the traditional data warehouse on two important points. Firstly, such a platform is ‘designed for the cloud’ and makes optimal use of all functional and technical possibilities of the relevant cloud platform. Secondly, data virtualisation is used to minimise data replication and to decouple data and logic as much as possible. The combination of a cloud platform and data virtualisation makes it possible to connect new data sources and establish new data services much faster, including real-time data. Via data virtualisation, you can offer data in all kinds of formats without constantly replicating it. The underlying cloud platform ensures the required performance level, even for large data volumes: you can scale resources up or down at any moment.
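The core idea of data virtualisation can be sketched in a few lines: a logical view holds a reference to its source, not a copy of the data, and can publish the same data in several formats on demand. The sketch below is purely illustrative — the class and method names are our own, not the API of any virtualisation product.

```python
# Minimal sketch of the data-virtualisation idea: one logical view over a
# source, published in several formats without replicating the data.
# All names here are illustrative, not part of any specific product.

import csv
import io
import json


class VirtualView:
    """A logical view: holds a query function, not a copy of the data."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable that reads the source on demand

    def as_rows(self):
        # Evaluated lazily: the source is read only when data is requested.
        return list(self._fetch())

    def as_json(self):
        # The same logical data, published as JSON (e.g. for an API).
        return json.dumps(self.as_rows())

    def as_csv(self):
        # ... or as a CSV file, again without a stored copy.
        rows = self.as_rows()
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()


# A stand-in for any source system (database, API, file share).
source = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
view = VirtualView(lambda: source)
print(view.as_json())
```

Because the view only stores *how* to reach the data, a change in the source is immediately visible in every published format — which is exactly the decoupling the article describes.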
The data warehouse built for the cloud
Perhaps your organisation already has a cloud strategy, and the next logical step is to migrate your existing data warehouse to the cloud too. That’s the ideal opportunity to let go of old ways of thinking and to implement a radical architectural change. Cloud platforms open up possibilities we could previously only dream of. A good example is Snowflake, an analytical database developed for the cloud, with implementations on Microsoft Azure and AWS. Snowflake gives you access to a highly scalable MPP database that supports ANSI SQL and requires hardly any management. Combined with the functionality of, for example, Microsoft Azure, it’s an extremely powerful proposition that scales from small implementations to company-wide, distributed data architectures.
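The up- and downscaling mentioned above is done in Snowflake with a single SQL statement; the `ALTER WAREHOUSE … SET WAREHOUSE_SIZE` syntax below follows Snowflake’s documentation, while the warehouse name and the small helper around it are our own illustration.

```python
# Sketch of how Snowflake compute is scaled with plain SQL.
# The ALTER WAREHOUSE syntax follows Snowflake's documentation;
# the warehouse name and this helper function are illustrative.

VALID_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE", "XXLARGE"}


def resize_statement(warehouse: str, size: str) -> str:
    """Build the SQL to scale a Snowflake virtual warehouse up or down."""
    size = size.upper()
    if size not in VALID_SIZES:
        raise ValueError(f"unknown warehouse size: {size}")
    return f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = '{size}';"


# Scale up before a heavy nightly load, back down afterwards:
print(resize_statement("ETL_WH", "xlarge"))
print(resize_statement("ETL_WH", "xsmall"))
```

Because compute and storage are separate in Snowflake, such a resize takes effect without moving any data — which is what makes scaling “at any moment” practical.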
Decoupling with data virtualisation
When migrating to the cloud, however, you must take care to avoid the old pitfalls. Don’t recreate multiple architectural layers in the new data platform and replicate data between them; by now you know that this only makes the development process complex and inflexible. It’s also important to properly separate data and logic, in view of flexibility and data governance requirements. That’s where data virtualisation comes into play. By steering clear of physical data integration and opting for virtual integration, developing and testing data services, in any format you desire, becomes much simpler: from data sets for data science and self-service BI to complete API libraries. Moreover, a data virtualisation platform offers excellent possibilities for data lineage, authorisation and on-the-fly data anonymisation.
Data virtualisation brings one other advantage. Migration to the cloud will often happen gradually, and perhaps not all existing data sources and BI solutions will migrate. Moreover, you’ll likely also use data stored in (internal or external) systems that you cannot or may not copy to the cloud, for example big data sources or privacy-sensitive information. Data virtualisation transparently combines on-premise data with data in the cloud in one uniform data model, so data availability is never interrupted during the migration.
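Combining an on-premise source with a cloud source in one uniform model can be sketched as a virtual join that is evaluated only when queried. Both “sources” below are simulated with in-memory data; in practice they would be live connections, and all names are hypothetical.

```python
# Sketch: a virtual join combining an on-premise source with a cloud source
# into one uniform model, without copying either data set. Both "sources"
# are simulated in memory; names and fields are illustrative.

def on_premise_customers():
    # e.g. a legacy CRM that cannot or may not move to the cloud
    yield from [{"cust_id": 1, "name": "Acme"},
                {"cust_id": 2, "name": "Globex"}]


def cloud_orders():
    # e.g. an order system already running in the cloud
    yield from [{"cust_id": 1, "total": 120.0},
                {"cust_id": 1, "total": 80.0},
                {"cust_id": 2, "total": 45.0}]


def customer_order_totals():
    """One uniform view over both sources, evaluated on request."""
    totals = {}
    for order in cloud_orders():
        totals[order["cust_id"]] = totals.get(order["cust_id"], 0.0) + order["total"]
    for cust in on_premise_customers():
        yield {"name": cust["name"], "order_total": totals.get(cust["cust_id"], 0.0)}


print(list(customer_order_totals()))
```

The consumer of `customer_order_totals` never sees where each part of the data lives, so sources can migrate to the cloud one by one without breaking anything downstream.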
The data platform of the future
What does a data platform like the one we’ve described look like? In the illustration below, we provide a reference architecture for Microsoft Azure in combination with Snowflake and the Denodo data virtualisation platform.
We created this reference architecture so that it can evolve with the rapid technological developments. We always seek the best technological fit for each functional architecture component, preferably as a ‘service’ that one merely has to configure. When new functionalities become available, you determine how the technology might improve or enrich the data platform.
Like a traditional data warehouse, this data architecture consists of two or more layers, each with its own functionality. However, not all of these layers must always be in place, and specific components or entire layers might be logical, containing no physical data. Nearly every component of the reference architecture is optional, which makes the architecture incredibly flexible and scalable: you can add or remove functionalities depending on company-specific requirements. The reference architecture distinguishes between the following layers:
Collection and processing
In this layer, the data is accessed, enriched, validated and corrected, so that it has the right format and content. This layer may also contain micro-services to generate or transform specific data. The architecture always processes data in a ‘model-driven’ way, so a metadata repository is an important part of this layer.
Storage and movement
This layer contains all functionalities for data storage, transformation and movement. Depending on the type of data, you’ll find here a place to store all raw data, databases for processing large volumes of structured data, and functionalities to process more complex data.
Virtual integration
This is where data from the different sources is integrated, for example the real-time combination of structured and unstructured data from on-premise systems and the cloud. Applying business logic only in this layer, without physically replicating data, results in substantial flexibility and a much simpler architecture. Management becomes much easier, and new data can be added and supplied much faster.
Publication via the data counter
Via the data counter, you make data available for any type of use. This layer contains tools that let you supply data as (virtual) tables or cubes, or via files and APIs for applications and systems to use. It also contains the data catalogue, which quickly gives end users insight into and an overview of all available data.
Consumption
This layer contains the functionalities for actual use of the data supplied via the data counter: for example apps, dashboards and visualisation tools, but also a data science lab to develop and test analytical models.
Security and access control
This important architectural layer contains functionalities to secure data, applications and infrastructure, and to quickly identify continuously changing risks. In addition, various tools can be used here to control user access down to the last detail, integrated across the entire data architecture.
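The on-the-fly anonymisation for specific user groups that this layer enables can be sketched as a masking rule applied at query time, per role. The roles, sensitive fields and masking strategy below are all illustrative; real platforms configure this declaratively rather than in code.

```python
# Sketch of on-the-fly anonymisation per user group.
# Roles, sensitive fields and the masking rule are illustrative.

import hashlib

SENSITIVE = {"email", "phone"}


def anonymise(row: dict, role: str) -> dict:
    """Return the row as the given role is allowed to see it."""
    if role == "data_steward":
        return dict(row)  # this role sees the full record
    masked = {}
    for key, value in row.items():
        if key in SENSITIVE:
            # Deterministic pseudonym: the same input always yields the
            # same token, so joins and counts still work on masked data.
            masked[key] = hashlib.sha256(str(value).encode()).hexdigest()[:8]
        else:
            masked[key] = value
    return masked


row = {"name": "Alice", "email": "alice@example.com"}
print(anonymise(row, "analyst"))
```

Because the masking happens when the data is served rather than when it is stored, no anonymised copies need to be maintained — the same virtual view serves every user group.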
Seeing is believing
The described reference architecture provides endless possibilities and functions that you can activate and try out at the push of a button. What’s more, scaling new applications up and down is easy, so you can discover at an early stage whether an application or technology is really usable. First see, then believe!
In a representative proof of concept, we can easily try out the different architectural components with you, and, in the future, new possibilities too. The cloud platform gives us all the leeway, at low cost, to determine whether new functionality adds value to your architecture. Of course, we give careful prior thought to what you want to achieve with the addition of a component; new functionality is released almost daily, and it’s not sensible to try everything. By properly guarding the architectural context, we prevent initiatives from getting bogged down in detached experiments without a common thread or clear goal.
Figured out in three workshops
Designing a new data architecture doesn’t have to take weeks or months. Thanks to the reference architecture, we can work out with your team, in just three workshops, a data platform that suits your organisation, ambition and application landscape, and an approach that’s supported by your entire organisation:
1. In the first workshop, with a carefully selected core team, we identify the data consumption processes, the needs and requirements, and the possible risks and challenges.
2. In the second workshop, we specify all components of the target architecture, both functional and technical.
3. In the last workshop, we work out the main use cases, to then arrive at a step-by-step plan for implementation.
With the flexibility of the reference architecture, the cloud platform and data virtualisation, we can then help you work out the use cases in short, quick iterations and start building your data platform in the cloud.
How long will your data warehouse keep up with the growing demand to make data available faster and more flexibly?