Learn more
From our Data, Machine Learning and AI experts
Facilitating Data Science with a Machine Learning Platform
Machine Learning: a platform perspective
Businesses are making increasing use of Machine Learning. Predicting behavior and automating decisions, two common applications of Machine Learning, have great value for organizations. Often these projects begin as separate innovation initiatives, but how do you ensure that the models built remain available and accurate? How do you keep an overview of which models are in development? More importantly, how do you enable Data Scientists to work quickly and securely? A mature Machine Learning Platform plays a crucial role in this.
More and more companies are discovering the added value of Machine Learning, and are employing one or more Data Scientists to develop models. A Data Scientist, like any software developer, must be supported with the right tools.
A Machine Learning Platform aims to organize and facilitate the data science process. Machine Learning processes need support in three areas: security, computing power and efficient model development (version control). We group these areas together under the heading 'organizing'.
'Facilitation' looks primarily at the process. Machine learning has fixed steps that must be supported within the organization, including data acquisition, training and deployment.
Organizing
What is needed in order to organize?
Who has access to which code, models and data? How do we keep this data within a secure environment? Data science tasks performed in work environments that are not (fully) managed, such as local laptops, pose a major risk of losing personal or privacy-sensitive data. A Machine Learning Platform provides a secure working environment that shields data from unwanted access.
Computing power is needed for two purposes: training models and hosting models. The requirements for the two environments are often different. A training environment is used for model learning. This process is characterized by a period of intensive computation that can often be parallelized: the training work is split up and run simultaneously on multiple computing cores.
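To make the idea of parallelized training concrete, here is a minimal sketch in Python. It assumes a toy dataset and a small gradient-descent model, neither of which comes from a real project; the names `train_one`, `train_all` and `best_run` are illustrative only. Each candidate learning rate is an independent training run, so the runs can be handed to a pool of worker processes, one per core.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Toy dataset following y = 2x + 1 (illustrative assumption, not real data).
DATA = [(float(x), 2.0 * x + 1.0) for x in range(10)]


def train_one(learning_rate, epochs=200):
    """Fit y = w*x + b by gradient descent; return (learning_rate, mean squared error)."""
    w, b = 0.0, 0.0
    n = len(DATA)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in DATA) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in DATA) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    mse = sum((w * x + b - y) ** 2 for x, y in DATA) / n
    return learning_rate, mse


def train_all(learning_rates, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run one independent training job per learning rate on a worker pool.

    ProcessPoolExecutor spreads the jobs over CPU cores; ThreadPoolExecutor
    can be passed instead where spawning processes is not possible.
    """
    with executor_cls(max_workers=max_workers) as pool:
        # map() keeps the results in the same order as the inputs.
        return list(pool.map(train_one, learning_rates))


def best_run(results):
    """Pick the run with the lowest training error."""
    return min(results, key=lambda run: run[1])
```

Calling `best_run(train_all([0.001, 0.005, 0.01]))` returns the learning rate with the lowest error together with that error. On a platform, the same pattern typically scales from cores on one machine to several machines, but that depends on the platform in use.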
Hosting models involves making the trained model available so that it is ready for use. There are two categories of hosting: batch inferencing and real-time inferencing.
A data science process is iterative. It is not a linear process in which you plan the details in advance and then work them out. Instead, it is a process of discovery, adjustment, adaptation and testing. Version tracking is essential in this context, and this tracking is done in two areas:
Facilitate
Now that we've seen how a Machine Learning Platform organizes, it's time to look at how the data science process is facilitated. What phases should be supported?
Correct data is essential for a well-functioning model. Data acquisition includes obtaining this data and making it available for Machine Learning purposes. A Machine Learning Platform plays an enabling role in this. Datasets must be discoverable and of good quality to be used directly in Machine Learning processes. A data platform, the role of Data Engineers and good cooperation between different disciplines are also important in this process.
Model Training is the central step in the Machine Learning process. This step results in a model that can be validated, tested and published. To carry this out, a Data Scientist needs a wide range of tools and frameworks. These can include development environments in Python or R and deep learning frameworks such as PyTorch. A Machine Learning Platform provides this environment. These tools and frameworks can be offered in a GUI, in a managed environment or in a linked, proprietary environment.
In practice, Data Scientists prefer environments that can be set up flexibly, with open source tools and frameworks for Model Training. The platform, in turn, enables the model training process to be recorded and monitored.
Once a model has been trained and validated, the time has come to put it into action. Deployment results in a model that can be used repeatedly. The computation required for this repeated use differs between Batch Inferencing and Real-Time Inferencing.
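The difference between the two modes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the "trained model" is a hypothetical linear scorer with made-up weights, and the JSON request format is invented for the example. Batch inferencing scores a whole dataset in one scheduled job, while real-time inferencing answers one request at a time.

```python
import json
from typing import Dict, List

# Hypothetical trained model: fixed linear weights (assumed for illustration).
WEIGHTS: Dict[str, float] = {"visits": 0.5, "purchases": 2.0}
BIAS = -1.0


def predict(features: Dict[str, float]) -> float:
    """Score a single record; missing features count as 0."""
    return BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())


def batch_inference(rows: List[Dict[str, float]]) -> List[float]:
    """Batch mode: score a complete dataset in one run, e.g. a nightly job."""
    return [predict(row) for row in rows]


def realtime_inference(request_body: str) -> str:
    """Real-time mode: score one JSON request and answer immediately.

    The {"features": {...}} request shape is an assumption for this sketch.
    """
    features = json.loads(request_body)["features"]
    return json.dumps({"score": predict(features)})
```

Both functions reuse the same `predict` call; what changes is the environment around it. A batch job needs throughput for a limited period, whereas a real-time endpoint must be continuously available and respond with low latency.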
Both environments require a process to track versions and roll out new versions. A Machine Learning Platform helps track and automate this process so that new versions can be delivered at the push of a button. After deployment, the deployed versions can be monitored for performance (is the model online?) and for model-fit (do the results match the truth?). Either can be a reason to make adjustments. This makes the above steps an iterative and repetitive process.
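As a rough sketch of what tracking versions and monitoring them can look like, the Python below keeps an in-memory list of model versions and checks the two signals mentioned above. Everything here is hypothetical: `ModelRegistry`, `model_fit`, `needs_attention` and the 0.8 threshold are illustrative names and values, not the API of any particular platform.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelRegistry:
    """In-memory stand-in for a platform's model registry (hypothetical)."""
    versions: Dict[int, str] = field(default_factory=dict)  # version -> artifact label
    live: Optional[int] = None                              # version currently deployed

    def register(self, artifact: str) -> int:
        """Store a new model artifact and return its version number."""
        version = len(self.versions) + 1
        self.versions[version] = artifact
        return version

    def deploy(self, version: int) -> None:
        """Roll out a registered version; also usable to roll back."""
        if version not in self.versions:
            raise KeyError(f"unknown model version {version}")
        self.live = version


def model_fit(predictions: List[int], truth: List[int]) -> float:
    """Share of predictions that match the observed outcomes (accuracy)."""
    if len(predictions) != len(truth) or not truth:
        raise ValueError("predictions and truth must be non-empty and equally long")
    hits = sum(p == t for p, t in zip(predictions, truth))
    return hits / len(truth)


def needs_attention(is_online: bool, fit: float, min_fit: float = 0.8) -> bool:
    """Flag a deployment that is offline or whose fit dropped below the threshold."""
    return (not is_online) or fit < min_fit
```

For example, a model that is online but matches the truth in only 3 out of 4 cases has a fit of 0.75 and is flagged with the assumed threshold of 0.8. That flag would be the trigger to retrain and deploy a new version, which closes the iterative loop.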
"With a Machine Learning Platform, the power of your Data Scientists and Engineers is fully leveraged."
Without using such a platform, your organization runs risks in the area of data security and the transparency of your Machine Learning processes.
Wondering about the added value of Machine Learning for your organization, or what innovative solutions data science can offer you? We will be happy to help you!