Ontology-based Meta AutoML

Abstract

Automated machine learning (AutoML) supports ML engineers and data scientists by automating single tasks like model selection and hyperparameter optimization, or even by automatically generating entire ML pipelines. This article presents a survey of 20 state-of-the-art AutoML solutions, both open source and commercial. They vary widely in functionality, targeted user groups, supported ML libraries, and degree of maturity. Depending on the AutoML solution, a user may be locked into one specific ML library or one product ecosystem. Additionally, the user may require some expertise in data science and programming to use the AutoML solution.

We propose a concept called OMA-ML (Ontology-based Meta AutoML) that combines the features of existing AutoML solutions by integrating them (Meta AutoML). OMA-ML can incorporate any AutoML solution, allowing various user groups to generate ML pipelines with the ML library of their choice. An ontology is the information backbone of OMA-ML. OMA-ML is being implemented as an open source solution, currently integrating 7 third-party AutoML solutions.

Keywords

Machine learning, ontology, AutoML, Meta AutoML, OMA-ML

1. Introduction

Machine learning (ML) is an important sub-domain of artificial intelligence, enabling predictions using models based on previous observations [1]. ML is used to solve a multitude of problems, like classification, clustering, or anomaly detection, in all kinds of business domains, like life sciences [2], manufacturing [3, 4], or the public sector [5, 6]. Engineering ML applications for practical use requires sound experience on the part of ML engineers or data scientists. Tasks to be performed include data analysis, data preparation, feature engineering, model selection, validation, learning curve analysis and hyperparameter optimization. To support data scientists and also to enable application domain experts to create ML pipelines, the field of automated ML (AutoML) [7] has emerged. AutoML aims at automating model selection and hyperparameter optimization, leading to higher efficiency and, potentially, better results. More progressive AutoML solutions also perform data preparation, feature engineering and validation, allowing entire ML pipelines to be created automatically [8, 9]. Currently, AutoML is focused on supervised ML [7]. A growing number of AutoML solutions is available, both academic and commercial. Current state-of-the-art AutoML solutions target one major ML library and compute an ML pipeline for this library only, e.g., Autosklearn [10] for Scikit-learn [11], Auto-Keras [12] for Keras [13], and Google AutoML [14] for Tensorflow [15]. While most AutoML solutions are expanding to include secondary ML libraries that support a single ML approach (e.g. Catboost, LightGBM), only AutoGluon [16] supports multiple redundant ML libraries (MXNet, PyTorch).

The targeted user groups of AutoML solutions differ. Commercial solutions like RapidMiner Auto Model [17] or Google AutoML [14] offer a graphical user interface (GUI) usable by application domain experts who may lack programming skills (e.g., biologists), and they provide a workflow and deployment within their own ecosystem. Auto-WEKA [18] is an open source solution which also provides a GUI. Other open source solutions like Autosklearn [10], Auto-Keras [12], Auto-PyTorch [19], or TPOT [8] are libraries that require programming skills.

The contribution of this article is three-fold: (a) we present an extended survey of state-of-the-art AutoML solutions; (b) based on the survey results, we present a novel concept and (c) an implementation called OMA-ML (Ontology-based Meta AutoML) which combines the features of existing AutoML solutions.

This article is an extended version of [20]. Sections 3, 4.4 and 5 including Figs 1, 3, 5 and 6 are based on [20] and have been extended with current findings where needed. The survey of AutoML solutions, the description of the implementation and its evaluation are new.

This article is structured as follows: Section 2 presents related work. In Section 3, we introduce the basics of AutoML. Section 4 presents our survey on 20 AutoML solutions. Section 5 introduces the concepts of Meta AutoML and OMA-ML. In Section 6 the implementation of OMA-ML is presented, which is evaluated in Section 7. Section 8 concludes the article and indicates future work.

2. Related work

Six peer-reviewed surveys of AutoML have been published recently: [21, 7, 22, 23, 24] and [25]. The survey [21] provides a detailed overview of the state-of-the-art in AutoML. The authors describe tasks in AutoML and algorithms to solve these tasks. However, they do not analyze specific AutoML solutions. The survey [7] presents AutoML concepts and algorithms and additionally reviews some AutoML solutions. However, the list of solutions is incomplete. It does not cover ADANET, AlphaD3M, AutoCVE, AutoGluon, Auto-Keras, Auto-Pytorch, Auto-WEKA, AWS Sagemaker Autopilot, Azure AutoML, EvalML, FLAML, Google AutoML, MLBOX, MLJAR, RapidMiner Auto Model and TransmogrifAI. The survey [22] gives an overview of AutoML steps, presents 4 AutoML solutions and compares the characteristics of 5 additional AutoML solutions. In [23] the authors propose a new concept to benchmark AutoML solutions; their benchmark is applied to 4 different AutoML solutions using 39 different datasets.

In [26], an extensive benchmark suite for AutoML classification, containing 72 classification datasets, is presented. The classification dataset used for the benchmark in our survey is part of this benchmark suite. The survey [24] briefly describes 4 different AutoML solutions and their internal pipeline building concepts. Additionally, the authors present benchmark results for the 4 analysed AutoML solutions collected from different sources, including [23]. Finally, [25] presents the progress made in the different areas covered by AutoML and a survey of 9 different AutoML solutions. This survey focuses on the methods used by each AutoML solution, e.g. the model selection method or whether meta learning is used.

Figure 1.

AutoML input and output.

The concept of Meta AutoML is novel; it was first published in our conference paper [20]. We are aware of only one article [27] which presents a similar concept called Ensemble Squared, but this preprint is not a peer-reviewed publication. Like OMA-ML, Ensemble Squared uses third-party AutoML solutions which are invoked in parallel. In this respect, both OMA-ML and Ensemble Squared are Meta AutoML approaches. Our approach differs in its use of the ML ontology to guide various components of OMA-ML. We see considerable benefits in this approach regarding extensibility.

3. Basics of AutoML

In this section we briefly introduce AutoML by describing its input/output behaviour and illustrating it by means of an example.

3.1 Input and output

Figure 1 shows the input and output of AutoML as a BPMN diagram (Business Process Model and Notation) [28]. AutoML requires the following inputs from the domain expert or the data scientist:

1.
Dataset: the dataset for the ML task, e.g. a CSV file for classification on tabular data;
2.
AutoML configuration:

(a)
ML task, e.g., classification or regression on tabular data, images, videos, or textual data.
(b)
ML target: e.g. label column in classification or regression tasks;
(c)
Optional configuration parameters, e.g. maximum run-time, model performance or hardware restrictions.

AutoML produces the following outputs for the domain expert or the data scientist:

1.
ML pipeline: The ML pipeline generated by AutoML is a piece of source code which can be executed to perform the ML task specified. An ML pipeline implements data preparation (e.g. feature selection, encoding or missing values imputation), the selected ML approach and its hyperparameter configuration [7].
2.
Report: A textual or graphical explanation of the AutoML result, including a listing of ML configurations and their respective performance measures.
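To make this input/output contract concrete, the following minimal Python sketch models the inputs and outputs listed above as data classes. The class and field names (AutoMLConfig, AutoMLResult, dataset_path, etc.) are illustrative assumptions, not the interface of any particular AutoML solution; the leaderboard is assumed to hold (configuration, score) pairs where a higher score is better.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AutoMLConfig:
    """AutoML inputs (hypothetical field names)."""
    dataset_path: str                     # e.g. a CSV file with tabular data
    task: str                             # e.g. "classification" or "regression"
    target: str                           # label column to predict
    max_runtime_s: Optional[int] = None   # optional run-time restriction
    metric: Optional[str] = None          # optional optimisation target


@dataclass
class AutoMLResult:
    """AutoML outputs: an executable ML pipeline plus a report."""
    pipeline_path: str                    # generated pipeline source code / files
    leaderboard: List[Tuple[str, float]] = field(default_factory=list)

    def report(self) -> str:
        # One line per evaluated ML configuration, best score first
        # (assumes higher is better, as for F1 or accuracy).
        ranked = sorted(self.leaderboard, key=lambda row: row[1], reverse=True)
        return "\n".join(f"{name}: {score:.3f}" for name, score in ranked)


config = AutoMLConfig("data.csv", "classification", "label", max_runtime_s=600)
result = AutoMLResult("pipeline.py", [("svm", 0.93), ("random_forest", 0.95)])
print(result.report())
```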

AutoML solves the Combined Algorithm Selection and Hyperparameter optimization (CASH) problem [29]. Algorithms that solve the CASH problem search for the best ML approach and hyperparameter setting for a given ML task [7]. Different AutoML solutions use different algorithms to solve the CASH problem, e.g.:

1.
Auto-Sklearn: SMAC [30, 31];
2.
TPOT: Evolutionary algorithm [8];
3.
H2O AutoML: Grid Search [32];
4.
ATM: A combination of multi-armed bandit learning with Gaussian processes [33].
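As an illustration of what solving CASH means, the sketch below treats the choice of algorithm and its hyperparameters as one joint search space and explores it with plain random search. Random search is deliberately swapped in for simplicity; it is not the strategy of any of the tools listed above. The search space and the toy_evaluate function are hypothetical stand-ins for training and validating real models.

```python
import random

# Joint (hypothetical) search space: each algorithm has its own hyperparameters.
SEARCH_SPACE = {
    "svm": {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]},
    "random_forest": {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]},
}


def cash_random_search(space, evaluate, n_iter=20, seed=0):
    """Sample (algorithm, hyperparameters) jointly and keep the best-scoring pair."""
    rng = random.Random(seed)
    best = (None, None, float("-inf"))
    for _ in range(n_iter):
        algo = rng.choice(sorted(space))
        params = {name: rng.choice(values) for name, values in space[algo].items()}
        score = evaluate(algo, params)  # e.g. a cross-validated score in practice
        if score > best[2]:
            best = (algo, params, score)
    return best


def toy_evaluate(algo, params):
    # Stand-in for training + validation; it favours an unbounded random forest.
    if algo == "random_forest":
        return 0.8 + (0.1 if params["max_depth"] is None else 0.0)
    return 0.7 + (0.05 if params["kernel"] == "rbf" else 0.0)


print(cash_random_search(SEARCH_SPACE, toy_evaluate, n_iter=50))
```

Model-based approaches such as SMAC or Gaussian processes replace the uniform sampling step with a surrogate model that proposes promising configurations.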

3.2 Example: Auto-sklearn

Measured by citations, Auto-sklearn [30] is one of the more popular AutoML solutions. It offers pipeline generation for classification and regression of tabular data. Listing 1 displays a simple Auto-sklearn implementation. In Auto-sklearn, the ML task is specified by the Python class used, e.g., AutoSklearnClassifier for classification of tabular datasets. The AutoML process is triggered by executing the fit function.

Listing 1
Auto-sklearn example

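from autosklearn.classification import AutoSklearnClassifier
from sklearn.datasets import load_breast_cancer  # example data only
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

cls = AutoSklearnClassifier()      # default configuration
cls.fit(X_train, y_train)          # triggers the AutoML process
predictions = cls.predict(X_test)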

Without custom parameterization, Auto-sklearn will use its default configuration. Advanced users may customize the Auto-sklearn process [34] with a multitude of parameters, e.g.

1.
Hardware usage, e.g. memory_limit;
2.
Pipeline size or generation constraints, e.g. ensemble_size;
3.
Pipeline scoring/metrics, e.g. metric;
4.
Pre-processing constraints, e.g. exclude_preprocessors;
5.
Runtime constraints, e.g. time_left_for_this_task;
6.
Meta configuration (logging, save folder location, etc.), e.g. output_folder;

The Auto-sklearn result is a pipeline that can be used to make predictions using the predict function (see Listing 1). The sprint_statistics function displays statistics about the found ML pipelines [34], e.g. metric used, best validation score, and number of target algorithm runs.
4. A survey of AutoML solutions

In this section we compare all AutoML solutions we are aware of, in total 20 (16 open source and 4 commercial).

4.1 Methodology

We evaluate the AutoML solutions using the following criteria:

1.
Type: this includes information about (a) the licensing model:

–
Open source (OS): The code of the AutoML solution is publicly available under an open source licence.
–
Commercial (C): The AutoML solution is available as a commercial solution only.

(b) the way of accessing the AutoML solution:

–
Software library (SL): The AutoML solution is a software library implemented in a programming language like Python.
–
Local application (LA): The AutoML solution is a desktop application that can be executed on a computer.
–
Web service (WS): The AutoML solution is hosted as a web service and can be accessed via a web browser.

2.
Target user group: The intended user group of the AutoML solution:

–
Domain expert: An expert in an application domain (e.g., biology) who may not have programming expertise.
–
Data/computer scientist: A person with programming/data science expertise.

3.
ML tasks: An overview of supported dataset types and their ML tasks which the AutoML solution can process and generate models on (e.g. classification or regression for tabular data).
4.
Model result: This includes information about (a) the model type returned by the AutoML solution:

–
Single model (SM): Only one ML model is returned (e.g. one neural network).
–
Multiple models (MM): Several ML models are returned (e.g. a neural network and a decision tree).
–
Model ensemble (ME): An ensemble pipeline is returned combining several ML models.

(b) the export type generated by the AutoML solution:

–
Model instance (MI): The model is available as a runtime instance only; the AutoML solution provides no built-in export functionality.
–
Model as file (MF): The model is automatically exported or the AutoML solution provides an export functionality to save the ML model as a file.
–
Files (script and model) (F (S+M)): The AutoML solution generates an execution script, containing code to import the exported ML model and perform a prediction on a new dataset.

5.
Reporting: a characterization of the reporting functionality.

–
Basic: Only a minimum of information is shared with the user, e.g. only the metric of the model.
–
Detailed: Various information about the produced model is shared with the user, e.g. pipeline structure, hyperparameter configuration, etc.

6.
ML library: The libraries that are used by the AutoML solution, e.g. Keras, Tensorflow, etc.
7.
Maturity: This includes information about (a) the release status of the AutoML solution:

–
Released (R): The published version number is at least 1.0.
–
Pre-release (PR): The published version number is below 1.0, e.g. 0.1.
–
Unknown (UK): No release version number is available.

(b) the development status:

–
Actively developed (AD): On-going development with the last release being less than 6 months old, as can be observed in the release notes.
–
Not actively developed (NAD): Sporadic or no on-going development, with the most recent release being more than 6 months old, as can be observed in the release notes.
–
Unknown (UK): No activity or release information is available.

Additionally, an AutoML solution is classified as not executable (NE) if we could not execute any AutoML benchmark due to crashes, errors, or other reasons.
8.
Benchmark: Two benchmarks were performed with each AutoML solution:

(a)
Tabular binary classification using the PhishingWebsites dataset.1 The goal is to predict whether a website is malicious. We selected this dataset because it is included in the OpenML Benchmarking Suites [26]. The evaluation metric used is F1 score.
(b)
Tabular regression using the colleges dataset.2 The goal is to predict, for each university, the percentage of students receiving a Pell Grant. We chose this dataset because it is intended for benchmarking AutoML solutions. The evaluation metric used is RMSE.

For each benchmark, three experiments were performed; the final benchmark score is the mean value of the three experiments for each task. 70% of the dataset was used to train an ML model and the remaining 30% to validate it. Each experiment had a time limit of 10 minutes. For AutoML solutions that do not offer a time limit parameter (e.g. AutoKeras), the search space or the number of retries was limited instead (e.g. for AutoKeras: max epoch = 10). While most AutoML solutions support a time limit, the web services from Google, Azure and AWS have some restrictions:

•
AWS Sagemaker Autopilot: No time or training limit parameter can be entered using the Web GUI.
•
Google AutoML: The shortest time that can be entered is 1 hour.
•
Azure AutoML: No time or training limit parameter can be entered using the Web GUI.

Each experiment was performed on an AWS EC2 virtual machine with instance type m5dn.xlarge (4 cores and 16 GB RAM). Local applications were tested on a device with similar computation power (4 cores and 16 GB RAM). The web services did not offer an identical hardware configuration; if a hardware configuration option was offered, the closest available option was selected:

•
AWS Sagemaker Autopilot: 2 EC2 VMs with instance type ml.m5.4xlarge.
•
Google AutoML: No hardware configuration can be selected.
•
Azure AutoML: 1 VM with instance type Standard_DS3_v2.
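The scoring protocol described above (70/30 split, three experiments, mean score, F1 for classification and RMSE for regression) can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the split helper, the per-experiment seeds and the run_experiment callback are hypothetical, and the AutoML solutions themselves were invoked through their own interfaces.

```python
import math
import random
from statistics import mean


def split_70_30(rows, seed):
    """Shuffle a copy of the rows and split them 70 % / 30 %."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * 0.7)
    return rows[:cut], rows[cut:]


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2TP / (2TP + FP + FN) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def benchmark(run_experiment, rows, n_experiments=3):
    """Final benchmark score = mean over n_experiments independent splits."""
    scores = []
    for seed in range(n_experiments):
        train, test = split_70_30(rows, seed)
        scores.append(run_experiment(train, test))  # train, then score on test
    return mean(scores)


# Usage with a trivial regression baseline that predicts the training mean.
data = [(x, 2.0 * x) for x in range(20)]


def mean_baseline(train, test):
    guess = mean(y for _, y in train)
    return rmse([y for _, y in test], [guess] * len(test))


print(benchmark(mean_baseline, data))
```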

4.2 AutoML solutions

For this survey a total of 20 AutoML solutions have been examined:

1.
ADANET [35]: A Python software library using Tensorflow. It adaptively searches for the best ensemble of neural networks.
2.
AlphaD3M [36]: A Python software library using scikit-learn. Several model types are evaluated.
3.
ATM [33]: A Python software library using scikit-learn. Different Machine Learning approaches are evaluated to return the best model.
4.
AutoCVE [37]: A Python software library using scikit-learn and XGBoost. The final result is an ensemble model.
5.
AutoGluon [38]: A Python software library using a wide range of different ML libraries. Several ML approaches are computed to finally deliver a wide range of models.
6.
AutoKeras [12]: A Python software library using Keras. The best combination of hyperparameters and neural network architecture is selected.
7.
AutoSklearn [10]: A Python software library using scikit-learn. The best ensemble is selected during the AutoML process.
8.
Auto-Pytorch [19]: A Python software library using several ML libraries; the main libraries are Pytorch and scikit-learn, but Catboost and LightGBM are also supported. At the end of the training the best ensemble is selected.
9.
Auto-WEKA [18]: A local application using WEKA. Several ML approaches are used to find the best model.
10.
AWS Sagemaker Autopilot:3 A cloud-based application by Amazon using scikit-learn to generate multiple models.
11.
Azure AutoML:4 A cloud-based application by Microsoft using Microsoft’s Azure MachineLearning library in Python. Several ML approaches are optimized to find the best model.
12.
EvalML:5 A Python software library using several ML libraries to train multiple models. The main library is scikit-learn.
13.
FLAML [39]: A Python software library with scikit-learn as the main ML library.
14.
Google AutoML:6 A cloud-based application by Google, using Tensorflow to compute one ML model.
15.
H2O AutoML [32]: A Python library based on the Java H2O Framework. It searches for the best model ensemble.
16.
MLBOX:7 A Python software library using Keras and scikit-learn to search for the best model.
17.
MLJAR [40]: A Python software library using scikit-learn. It trains multiple models and an ensemble to find the best solution.
18.
RapidMiner [17]: A desktop application using ML approaches from several major ML libraries (e.g. H2O and WEKA).
19.
TPOT [41]: A Python software library based on scikit-learn and Torch. It returns the best model found.
20.
TransmogrifAI:8 A Scala library using Spark ML as its base ML library. The best model is selected.

4.3 Survey results

The AutoML solutions vary considerably in target user, maturity and produced ML pipeline. An overview can be found in Fig. 2. For details see [42, 43, 44, 45, 46, 47].

We were unable to execute the local application Auto-WEKA for our benchmark. Any attempt at executing the AutoML process led to an error, using our benchmark datasets as well as the datasets made available by the developers.

Almost every open source AutoML solution requires programming knowledge of the user; only Auto-WEKA is targeted towards domain experts requiring no programming skills. While all evaluated commercial AutoML solutions target domain experts, AWS Sagemaker Autopilot and Azure AutoML offer a programmable interface for computer/data scientists to execute more detailed experiments.

Figure 2.

AutoML survey results.

The most commonly supported ML tasks are classification and regression on tabular data. ATM and AutoCVE are the only AutoML solutions that support only classification on tabular data. While some AutoML solutions offer a richer variety of supported tasks and/or input data types (e.g. AlphaD3M, Google AutoML), they are by no means the majority:

•
7 AutoML solutions support only classification and regression on tabular data.
•
8 AutoML solutions support up to 5 additional tasks on tabular or other data.
•
3 AutoML solutions support more than 5 additional tasks on tabular or other data.

Of the three AutoML solutions with the most options on different tasks, one is a commercial solution (Google AutoML) and two are open source libraries (AlphaD3M, AutoGluon).

The majority of the AutoML solutions train various ML approaches to either generate an ensemble or find the best model during the AutoML process; a few AutoML solutions can use only one ML approach (e.g. neural networks with AutoKeras) or return only the winning model (e.g. FLAML).

In Section 3 we identified the output produced by the AutoML process as an ML pipeline and a report. The ML pipeline consists of a model and a script to execute a new prediction using the generated model. Of all surveyed AutoML solutions, only 3 (Azure AutoML, AWS Sagemaker Autopilot and TPOT) produce an ML pipeline as previously defined. The majority of the solutions offer functionality for exporting the generated ML model; only 5 AutoML solutions do not have a default way to save the ML model.

Almost all AutoML solutions produce a detailed report after concluding the AutoML process, describing the parametrization of the found model or even generating graphs with various information about the model and features (e.g. MLJAR).

Regarding maturity, the majority (14) of AutoML solutions is considered pre-release; most of those (8) are still actively being worked on by their development team/community. Of the remaining 6 AutoML solutions, 3 have reached release status; only Auto-WEKA is not being actively worked on, and it is the only AutoML solution that could not be executed. All web services are classified as unknown for both release status and development status, since their current version number and development status are not displayed. We assume that Google, Microsoft and AWS only publish new products after they have reached maturity; therefore all web service AutoML solutions can be considered released and continuously worked on until their services are discontinued.
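The maturity criteria from Section 4.1 are simple enough to state as code. The sketch below is our own illustrative encoding, not a tool used in the survey: it parses only plain numeric version strings such as "0.14.2" or "2.1.0", and it approximates "6 months" as 183 days, which is an assumption.

```python
from datetime import date
from typing import Optional


def release_status(version: Optional[str]) -> str:
    """R: version >= 1.0, PR: version below 1.0, UK: no version number available."""
    if not version:
        return "UK"
    major = int(version.split(".")[0])
    return "R" if major >= 1 else "PR"


def development_status(last_release: Optional[date], today: date) -> str:
    """AD: last release under ~6 months (183 days) old, NAD: older, UK: unknown."""
    if last_release is None:
        return "UK"
    return "AD" if (today - last_release).days < 183 else "NAD"


print(release_status("0.14.2"), development_status(date(2021, 1, 15), date(2021, 3, 1)))
```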

MLJAR generated the best model in both the classification and the regression benchmark experiments.

4.4 Discussion of existing AutoML solutions

AutoML solutions are implemented on top of specific ML libraries. They produce pipelines using software from those ML libraries that can be exported and imported into those ML libraries. Deciding on an AutoML solution therefore results in a technology lock-in to the corresponding ML library or libraries. Within a single AutoML solution, comparing the performance of pipelines across different ML libraries is not possible.

ONNX [48] is an open format, originally designed for artificial neural networks (ANN), to enable interoperability between ML libraries. However, not every ML library supports ONNX, and its coverage of ML model types other than ANN (via the ONNX-ML operator set) is limited.

AutoML solutions target specific user groups. Most open source AutoML solutions target users with programming skills, e.g. in Python. Commercial AutoML solutions provide a GUI which also addresses users without programming skills, e.g., domain experts.

All existing AutoML solutions have their individual features. They all solve the CASH problem, support specific ML tasks, target specific user groups and generate ML pipelines for specific ML libraries.

Meta AutoML allows combining the strengths of individual AutoML solutions, while alleviating their limitations: supporting various ML tasks and user groups while being technology-independent.

In the next section, we introduce OMA-ML, our concept for Meta AutoML.

5. An ontology-based concept for Meta AutoML

Before describing the concept of OMA-ML in detail, we define its goals.

5.1 Goals for OMA-ML

By combining the features of individual AutoML solutions, we pursue the following goals for OMA-ML:

1.
AutoML: OMA-ML shall perform AutoML, i.e. generate an executable ML pipeline and a report based on a configuration and a dataset.
2.
User groups: OMA-ML shall target user groups with and without programming skills. It shall provide a GUI which allows intuitive configuration of AutoML and interactive reporting. Additionally, it shall provide an API to be used by application programmers.
3.
Technology-independent: OMA-ML shall support any number of ML libraries.
4.
ML tasks: A wide range of ML tasks shall be supported.

5.2 Meta AutoML

Meta AutoML is a novel concept of AutoML. Figure 3 shows the concept as a BPMN diagram. Similar to other AutoML solutions, the user enters the required input (dataset and AutoML configuration). The Meta AutoML solution then prepares various AutoML solutions to be executed in parallel. The results of the AutoML solutions are collected and the results of Meta AutoML (ML pipeline and report) are finalized.
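A minimal sketch of this workflow in Python, assuming each AutoML solution is wrapped in a callable that returns a (pipeline, score) pair: the solutions run concurrently, a failing solution is recorded rather than aborting the run, and the collected results are ranked into a leaderboard. The function names and the fake solutions are hypothetical; threads are used only for brevity, and a production system would more likely isolate each AutoML solution in its own process or container.

```python
from concurrent.futures import ThreadPoolExecutor


def run_meta_automl(dataset, config, solutions):
    """Run all AutoML solutions in parallel, collect their results, finalize."""
    with ThreadPoolExecutor(max_workers=len(solutions)) as pool:
        futures = {name: pool.submit(run, dataset, config)
                   for name, run in solutions.items()}
    # Leaving the with-block waits until every submitted run has finished.
    results = {}
    for name, future in futures.items():
        try:
            results[name] = future.result()        # (pipeline, score)
        except Exception:                          # one crash must not stop the run
            results[name] = (None, float("-inf"))
    leaderboard = sorted(results.items(), key=lambda kv: kv[1][1], reverse=True)
    best_pipeline = leaderboard[0][1][0]
    report = [f"{name}: {score}" for name, (_, score) in leaderboard]
    return best_pipeline, report


def fake_solution(score):
    # Hypothetical stand-in for a complete AutoML run.
    return lambda dataset, config: (f"pipeline-{score}", score)


def crashing_solution(dataset, config):
    raise RuntimeError("AutoML run failed")


best, report = run_meta_automl("data.csv", {"task": "binary_classification"},
                               {"a": fake_solution(0.91), "b": fake_solution(0.94),
                                "c": crashing_solution})
print(best)    # pipeline-0.94
print(report)  # ['b: 0.94', 'a: 0.91', 'c: -inf']
```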

Figure 3.

Meta AutoML workflow.

5.3 ML ontology

Figure 4.

Schema of the ML ontology.

An ontology is a formal, explicit specification of a shared conceptualization of a problem domain [49]. We are developing an ontology for the domain of ML [50]. One of the use cases of this ML ontology is to guide the Meta AutoML process. The ML ontology is modelled in RDF [51] using SKOS [52]. It currently consists of over 1800 RDF triples, specifying 104 ML approaches, 42 ML tasks, 55 metrics, 21 AutoML solutions, 16 ML libraries, their configuration items, interrelationships, and more. The ML ontology is open source and can be accessed from the GitHub repository for OMA-ML.9

Figure 4 shows the classes and relationships of the ML ontology. In the upper part of the diagram, classes representing general ML concepts are depicted. The class ML area represents the major ML areas, in particular supervised learning, unsupervised learning and reinforcement learning. The class ML task lists problems that can be solved using ML, e.g., classification or regression. Each task belongs to an ML area, e.g. classification belongs to supervised learning. With the class ML approach, algorithmic ML technologies are represented, e.g., neural networks, support vector machines, or decision trees; each ML approach is associated with one or several ML tasks. Finally, the class Metric formalizes prediction performance metrics used in ML, e.g., F1-score for classification tasks or RMSE for regression tasks. Metrics are used for ML tasks.

Figure 5.

OMA-ML software architecture and technology stack.

In the lower part of the diagram, implementations of ML concepts are depicted. The class ML library collects available ML libraries like Tensorflow or scikit-learn. The class AutoML solution contains instances like Autosklearn or Google AutoML. Each AutoML solution is used for one or more ML libraries and can perform one or more ML tasks. Finally, the class Configuration item represents the knowledge about which configuration parameters are available for each AutoML solution and which ML approaches and metrics can be parameterized, e.g. the AutoML solution Autosklearn allows configuring the ML task classification.

For the classes of the ML ontology, a broader relationship can be used for representing hierarchies within a class, e.g., the ML task classification is a broader concept than the ML task binary classification (not explicitly depicted in Fig. 4).
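To show how such an ontology can be queried, the sketch below stores a tiny, hypothetical excerpt as (subject, predicate, object) triples in plain Python and resolves the broader hierarchy transitively. The identifiers and the predicate names performs, usedFor and belongsTo are illustrative, not the IRIs of the actual OMA-ML ontology; against the real RDF/SKOS data one would typically use an RDF library and SPARQL (e.g. a property path over skos:broader) instead.

```python
# Hypothetical ontology excerpt as (subject, predicate, object) triples.
TRIPLES = {
    ("autosklearn", "type", "AutoMLSolution"),
    ("autosklearn", "performs", "binary_classification"),
    ("autosklearn", "usedFor", "scikit-learn"),
    ("autokeras", "type", "AutoMLSolution"),
    ("autokeras", "performs", "image_classification"),
    ("autokeras", "usedFor", "keras"),
    ("binary_classification", "skos:broader", "classification"),
    ("image_classification", "skos:broader", "classification"),
    ("classification", "belongsTo", "supervised_learning"),
}


def objects(subject, predicate):
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def narrower_closure(concept):
    """The concept plus all concepts that are (transitively) narrower than it."""
    result, frontier = {concept}, [concept]
    while frontier:
        for child in subjects("skos:broader", frontier.pop()):
            if child not in result:
                result.add(child)
                frontier.append(child)
    return result


def solutions_for_task(task):
    """AutoML solutions that perform the task or any narrower task."""
    tasks = narrower_closure(task)
    return {s for s, p, o in TRIPLES if p == "performs" and o in tasks}


print(sorted(solutions_for_task("classification")))  # ['autokeras', 'autosklearn']
```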

The ML ontology is the information backbone of OMA-ML and is used in several components of OMA-ML, as shown in the next section.

5.4 OMA-ML software architecture

Figure 5 shows the software architecture of OMA-ML as a UML (Unified Modeling Language [53]) component diagram.

OMA-ML is designed as a 3-layer architecture.

1.
Presentation layer: This is the user interface of OMA-ML. A GUI allows interaction and visualization. An ontology-guided wizard supports configuring OMA-ML. Additionally, an API provides batch access to OMA-ML.
2.
Logic layer: This implements the control logic of OMA-ML, designed as a blackboard architecture [54]. The OMA-ML controller invokes individual AutoML libraries via the adapter pattern [55], thus providing a plug-in architecture for multiple AutoML solutions.
3.
Data layer: This layer provides access to the ML ontology (read access), the ML model store and AutoML logs (write access).
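The adapter pattern of the logic layer can be sketched as follows in Python: the controller programs against one abstract interface, and each AutoML solution is plugged in by registering an adapter class. All names (AutoMLAdapter, register, DemoAdapter, start_all) are hypothetical and only illustrate the plug-in idea, not OMA-ML's actual code.

```python
from abc import ABC, abstractmethod

ADAPTERS = {}  # plug-in registry: solution name -> adapter class


def register(adapter_cls):
    """Class decorator: integrating a new AutoML solution = registering its adapter."""
    ADAPTERS[adapter_cls.name] = adapter_cls
    return adapter_cls


class AutoMLAdapter(ABC):
    """Uniform interface the controller uses for every AutoML solution."""
    name = "abstract"

    @abstractmethod
    def start(self, dataset_path: str, config: dict) -> None:
        """Translate the OMA-ML configuration and launch the AutoML run."""

    @abstractmethod
    def status(self) -> dict:
        """Report progress in a solution-independent format."""


@register
class DemoAdapter(AutoMLAdapter):
    # Hypothetical adapter; a real one would map `config` onto the native
    # parameters of the wrapped library (e.g. its time-limit argument).
    name = "demo"

    def start(self, dataset_path, config):
        self.runtime_limit_s = config.get("runtime_limit_s", 3600)
        self.state = "running"

    def status(self):
        return {"solution": self.name, "state": self.state,
                "runtime_limit_s": self.runtime_limit_s}


def start_all(dataset_path, config):
    """Controller side: it only knows the adapter interface, not the libraries."""
    adapters = [cls() for cls in ADAPTERS.values()]
    for adapter in adapters:
        adapter.start(dataset_path, config)
    return [adapter.status() for adapter in adapters]


print(start_all("data.csv", {"runtime_limit_s": 600}))
```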

5.5 User interface

In the GUI, a wizard guides the user to enter mandatory and optional AutoML configuration parameters. The wizard is based on the ML ontology, providing plausible configuration options only. For example, if the user selects AutoML solutions that produce ANN pipelines only, the wizard will only display configuration options for ANN.
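A minimal sketch of this filtering logic, assuming the relevant ontology knowledge has already been loaded into two Python dictionaries: which ML approaches each AutoML solution produces, and which configuration items belong to each ML approach. The dictionary contents are a hypothetical, incomplete excerpt chosen for illustration.

```python
# Hypothetical ontology excerpt (not complete capability lists).
PRODUCES = {
    "autokeras": {"ann"},
    "autopytorch": {"ann"},
    "autosklearn": {"svm", "decision_tree"},
}
CONFIG_ITEMS = {
    "ann": ["max_hidden_layers", "max_epochs"],
    "svm": ["kernel"],
    "decision_tree": ["max_depth"],
}


def wizard_options(selected_solutions):
    """Configuration items that are plausible for the selected solutions only."""
    approaches = set().union(*(PRODUCES[s] for s in selected_solutions))
    return sorted({item for a in approaches for item in CONFIG_ITEMS[a]})


print(wizard_options(["autokeras", "autopytorch"]))  # ['max_epochs', 'max_hidden_layers']
```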

Mandatory configuration parameters are as follows:

1.
Dataset: The dataset with labeled training data;
2.
ML task: The task the user wants to perform on the dataset, e.g. classification on tabular data (options from the ML ontology);
3.
ML target: The name of the label column in the dataset.

Optional configuration parameters are:

Figure 6.
OMA-ML control logic.

1.
Dataset schema: Schema information on dataset columns including data types (e.g. int, float, string, date) and categories (e.g., numerical, categorical, textual) (options from the ML ontology);
2.
Scoring: The prediction performance measure to be used as optimization target, e.g. accuracy (options from the ML ontology);
3.
AutoML solutions: Usage restrictions on particular AutoML solutions or ML libraries, e.g. AutoSklearn (options from the ML ontology);
4.
ML model constraints: Restrictions on ML approaches and custom configuration of ML approaches, e.g. ANN with maximum 10 hidden layers (options from the ML ontology);
5.
AutoML runtime constraints: General Meta AutoML constraints (monetary, time, hardware restriction) to influence the execution time, e.g. runtime limit 1 hour (options from the ML ontology);
6.
Training type: Training strategy for Meta AutoML, e.g. using a subset of the dataset only (options from the ML ontology);
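For batch use, the mandatory and optional parameters above could be supplied as a JSON document and checked before a run starts. The sketch below is a hypothetical validator: the key names and the allowed values (which in OMA-ML would be taken from the ML ontology) are assumptions for illustration.

```python
import json

MANDATORY = {"dataset", "task", "target"}
OPTIONAL = {"schema", "metric", "solutions", "model_constraints",
            "runtime_limit_s", "training_type"}
# Allowed values would be queried from the ML ontology; hard-coded here.
ALLOWED = {"task": {"binary_classification", "regression"},
           "metric": {"f1", "accuracy", "rmse"}}


def validate_config(text):
    """Parse a JSON configuration and return (config, list of error messages)."""
    cfg = json.loads(text)
    keys = set(cfg)
    errors = [f"missing: {k}" for k in sorted(MANDATORY - keys)]
    errors += [f"unknown: {k}" for k in sorted(keys - MANDATORY - OPTIONAL)]
    for key, allowed in ALLOWED.items():
        if key in cfg and cfg[key] not in allowed:
            errors.append(f"invalid {key}: {cfg[key]}")
    return cfg, errors


cfg, errors = validate_config(
    '{"dataset": "https://example.org/data.csv", "task": "regression",'
    ' "target": "y", "runtime_limit_s": 3600}')
print(errors)  # []
```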

After starting the OMA-ML process, the user interface is updated regularly with the current status of the AutoML processes which are executed in parallel. After termination of the OMA-ML process, the following output is provided:

1.
ML pipeline: The user can download the successfully generated ML pipelines as Python scripts and files specifying the pipeline structure. The Python scripts provide the following functionality:

(a)
Import the file specifying the pipeline structure;
(b)
Make predictions for a new, unlabeled dataset;
(c)
Save the prediction result.

2.
Report:

(a)
Description of the used AutoML solutions, their produced ML pipelines and respective performance evaluations;
(b)
ML pipeline leaderboard with scores.
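The three functions (a) to (c) of a generated Python script can be illustrated with the standard library alone. In this sketch the file specifying the pipeline structure is assumed to be a small JSON document describing a single threshold rule; this format and the helper names are hypothetical, and real exported pipelines depend on the AutoML solution that produced them.

```python
import csv
import json
import os
import tempfile


def load_pipeline(spec_path):
    """(a) Import the file specifying the pipeline structure (toy JSON format)."""
    with open(spec_path) as f:
        spec = json.load(f)
    feature, threshold = spec["feature"], spec["threshold"]
    # Toy "pipeline": predict 1 if the feature value exceeds the threshold.
    return lambda row: int(float(row[feature]) > threshold)


def run_prediction(spec_path, input_csv, output_csv):
    predict = load_pipeline(spec_path)
    # (b) Make predictions for a new, unlabeled dataset.
    with open(input_csv, newline="") as f:
        predictions = [predict(row) for row in csv.DictReader(f)]
    # (c) Save the prediction result.
    with open(output_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["prediction"])
        writer.writerows([p] for p in predictions)
    return predictions


# Usage: write a toy pipeline file and an unlabeled CSV, then predict.
tmp = tempfile.mkdtemp()
spec_path = os.path.join(tmp, "pipeline.json")
input_csv = os.path.join(tmp, "new_data.csv")
output_csv = os.path.join(tmp, "predictions.csv")
with open(spec_path, "w") as f:
    json.dump({"feature": "x", "threshold": 0.5}, f)
with open(input_csv, "w", newline="") as f:
    f.write("x\n0.2\n0.9\n")
print(run_prediction(spec_path, input_csv, output_csv))  # [0, 1]
```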

When using OMA-ML in batch mode, the configuration file, including a link to the dataset, can be passed to an API. The runtime state and output can be retrieved from the API. As in the online mode, the output consists of ML pipelines and reports.
5.6 Control logic

The OMA-ML control logic is designed using the blackboard pattern. Figure 6 shows an overview of the OMA-ML control logic as a BPMN diagram. When a new run of OMA-ML is triggered, the dataset is analyzed first, extracting the following metadata:

1.
Number of rows and columns;
2.
Data types of columns;
3.
Missing values.
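The three metadata items can be extracted from a CSV file with the Python standard library, as in the minimal sketch below. The type inference is deliberately coarse (int, float or string only; date detection is omitted) and the function names are hypothetical.

```python
import csv
import io


def infer_type(values):
    """Coarse data type of a column from its non-missing string values."""
    if not values:
        return "unknown"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in values:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def analyze_dataset(f):
    """Extract number of rows/columns, column data types and missing values."""
    reader = csv.DictReader(f)
    rows = list(reader)
    columns = reader.fieldnames or []
    meta = {"n_rows": len(rows), "n_columns": len(columns), "types": {}, "missing": {}}
    for col in columns:
        present = [r[col] for r in rows if r[col] not in ("", None)]
        meta["missing"][col] = len(rows) - len(present)
        meta["types"][col] = infer_type(present)
    return meta


sample = io.StringIO("age,income,city\n34,52000.5,Berlin\n,61000,\n29,48000.0,Munich\n")
print(analyze_dataset(sample))
```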

These metadata are needed to decide whether the dataset must be pre-processed for individual AutoML solutions. The OMA-ML strategy selection is based on the ML ontology, taking into account the configuration and the dataset analysis result. It selects AutoML solutions which perform the ML tasks specified in the configuration. The dataset is pre-processed if needed. For example, if an AutoML solution requires numeric features only, but the dataset contains textual features, then the textual features are encoded. If the dataset is very large (e.g. 100 million rows) and a small runtime limit is specified (e.g. 1 hour), then approaches with fast training times are selected or the dataset is downsized.
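The strategy selection and pre-processing steps can be sketched as follows. Everything here is hypothetical: the capability table stands in for ontology knowledge, ordinal encoding is just one of several possible encodings for textual features, and the rows-per-hour budget used to decide on downsizing is an invented heuristic for illustration only.

```python
import random

# Hypothetical capability table; in OMA-ML this knowledge comes from the ontology.
CAPABILITIES = {
    "solution_a": {"tasks": {"binary_classification", "regression"}, "numeric_only": False},
    "solution_b": {"tasks": {"binary_classification"}, "numeric_only": True},
    "solution_c": {"tasks": {"image_classification"}, "numeric_only": False},
}


def select_solutions(task):
    """Strategy selection: keep the AutoML solutions that perform the task."""
    return sorted(s for s, cap in CAPABILITIES.items() if task in cap["tasks"])


def ordinal_encode(rows, column):
    """Replace textual values by integer codes."""
    codes = {v: i for i, v in enumerate(sorted({r[column] for r in rows}))}
    return [{**r, column: codes[r[column]]} for r in rows]


def prepare_for(solution, rows, text_columns, runtime_limit_s,
                max_rows_per_hour=1_000_000, seed=0):
    """Pre-process the dataset for one selected solution if needed."""
    if CAPABILITIES[solution]["numeric_only"]:
        for col in text_columns:
            rows = ordinal_encode(rows, col)
    # Hypothetical heuristic: downsize if the dataset exceeds the row budget
    # that fits into the runtime limit.
    budget = int(max_rows_per_hour * runtime_limit_s / 3600)
    if len(rows) > budget:
        rows = random.Random(seed).sample(rows, budget)
    return rows


rows = [{"x": 1.0, "city": "b"}, {"x": 2.0, "city": "a"}]
for name in select_solutions("binary_classification"):
    print(name, prepare_for(name, rows, ["city"], runtime_limit_s=3600))
```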

The selected AutoML solutions are invoked in parallel via their adapters by the OMA-ML controller. While executing AutoML, they continuously report their progress to the blackboard, which the OMA-ML controller monitors. After reaching the termination criteria (e.g. the required accuracy is met, or the run time limit is reached), the OMA-ML controller finalizes the OMA-ML run: it saves the best performing executable ML pipelines to the ML pipeline store, generates a report, and stores it in the report store. As long as the termination criteria are not met, the strategy may be altered, or alternative AutoML solutions may be triggered.
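A minimal blackboard sketch in Python, under the following assumptions: each AutoML run is simulated by a thread that posts one score per search iteration, and the controller polls the blackboard until a target score is met, the time limit is reached, or all runs have finished. Class and function names are hypothetical, and strategy changes are not modelled.

```python
import threading
import time


class Blackboard:
    """Shared store: adapters post progress, the controller reads it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._best = {}  # solution name -> best score reported so far

    def post(self, solution, score):
        with self._lock:
            self._best[solution] = max(score, self._best.get(solution, float("-inf")))

    def best(self):
        with self._lock:
            if not self._best:
                return None, float("-inf")
            return max(self._best.items(), key=lambda kv: kv[1])


def fake_automl(board, name, scores, stop):
    """Hypothetical AutoML run that reports one score per search iteration."""
    for score in scores:
        if stop.is_set():
            return
        board.post(name, score)
        time.sleep(0.01)


def controller(board, workers, target_score, time_limit_s):
    stop = threading.Event()
    threads = [threading.Thread(target=fake_automl, args=(board, name, scores, stop))
               for name, scores in workers.items()]
    for t in threads:
        t.start()
    deadline = time.monotonic() + time_limit_s
    # Termination criteria: target score met, time limit reached, or all runs done.
    while time.monotonic() < deadline and any(t.is_alive() for t in threads):
        if board.best()[1] >= target_score:
            break
        time.sleep(0.005)
    stop.set()
    for t in threads:
        t.join()
    return board.best()


print(controller(Blackboard(), {"a": [0.70, 0.80, 0.85], "b": [0.75, 0.92, 0.95]},
                 target_score=0.9, time_limit_s=5.0))
```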

Figure 7.
OMA-ML component workflow.

5.7 Logging

All OMA-ML runs are logged in a structured format, including the following data:

1.
AutoML configuration;
2.
Dataset analysis result;
3.
OMA-ML strategy;
4.
Hardware configuration (CPU cores, memory, processor, etc.);
5.
AutoML actual run time (time spent);
6.
Generated ML pipelines characteristics (accuracy, size, etc.).

With many OMA-ML runs, we expect the log data to become a valuable source of information. Data mining techniques may be used to gain insights for improving the OMA-ML controller’s strategy selection. Additionally using this log data for supervised ML in the OMA-ML controller is subject to future work.
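A structured log of the kind listed above could, for example, be written as JSON Lines, one record per AutoML solution and run. The dataclass fields below are hypothetical names mirroring the six logged items, and the mining step is deliberately simple (average accuracy per solution); it only illustrates how such a log might feed the strategy selection. The example values are toy numbers, not results.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass
class RunLogEntry:
    """One structured record per AutoML solution in an OMA-ML run (hypothetical fields)."""
    configuration: dict      # AutoML configuration
    dataset_analysis: dict   # dataset analysis result
    strategy: str            # OMA-ML strategy
    hardware: dict           # CPU cores, memory, processor, ...
    runtime_s: float         # actual run time spent
    pipeline: dict           # generated ML pipeline characteristics


def to_json_line(entry):
    """Serialize one record as a JSON Lines entry."""
    return json.dumps(asdict(entry), sort_keys=True)


def mean_accuracy_by_solution(lines):
    """A very simple mining step: average pipeline accuracy per AutoML solution."""
    sums, counts = defaultdict(float), defaultdict(int)
    for line in lines:
        pipeline = json.loads(line)["pipeline"]
        sums[pipeline["solution"]] += pipeline["accuracy"]
        counts[pipeline["solution"]] += 1
    return {name: sums[name] / counts[name] for name in sums}


def toy_entry(solution, accuracy):
    return RunLogEntry({"task": "classification"}, {"n_rows": 1000}, "parallel",
                       {"cores": 8}, 120.0, {"solution": solution, "accuracy": accuracy})


log = [to_json_line(toy_entry("solution_a", 0.5)),
       to_json_line(toy_entry("solution_a", 1.0)),
       to_json_line(toy_entry("solution_b", 0.75))]
print(mean_accuracy_by_solution(log))  # {'solution_a': 0.75, 'solution_b': 0.75}
```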
6. Implementation

OMA-ML is developed as an open source project and is available as a GitHub repository.10 OMA-ML is under active development. At the time of writing, a minimum viable product is available with an initial set of integrated AutoML solutions, providing classification and regression tasks for tabular datasets. Support for additional dataset types and ML tasks is continuously being added by integrating more AutoML solutions. An overview of the technology used for the implementation of OMA-ML can be seen in Fig. 5.

6.1 Component interaction

Figure 7 shows the interaction between a user and the OMA-ML components as a UML sequence diagram. When interacting with OMA-ML, a user must first upload a dataset. During the upload, the dataset is sent to the Logic Layer, where it is persisted on the OMA-ML server. After uploading a dataset, the user can configure a new OMA-ML run by selecting a dataset and performing a (minimal) configuration: the ML task to perform (e.g. binary classification), the target, and the time limit. The options available to the user are displayed dynamically by querying the ML ontology, so that only sensible configurations are offered; e.g., if the user wants to use PyTorch-based AutoML solutions only, AutoKeras will not be displayed as an option.

After finalizing the configuration, the user can start OMA-ML. The controller performs automatic preprocessing if required, retrieving the preprocessing workflow from the ontology. For example, if an AutoML solution requires numeric features only but the dataset contains textual features, the controller adjusts the dataset for this AutoML solution. When the dataset is prepared, the AutoML adapters start their AutoML processes. Each adapter continuously streams process updates to the controller until its AutoML process terminates. These process updates are in turn forwarded to the Presentation Layer.
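One way to adjust textual features for a numeric-only AutoML solution is ordinal encoding, sketched below. Ordinal encoding is an assumption chosen for brevity; the preprocessing workflow stored in the ontology may prescribe a different encoding (e.g. one-hot encoding). The function returns the mapping as well, so that the same codes could be applied again when predicting on new data.

```python
def encode_text_columns(rows, text_columns):
    """Ordinal-encode textual columns so numeric-only AutoML solutions can use them.

    Each distinct string gets an integer code in order of first appearance.
    Missing values (None) are left unchanged. The input rows are not modified.
    """
    mappings = {column: {} for column in text_columns}
    encoded = []
    for row in rows:
        new_row = dict(row)
        for column in text_columns:
            value = row.get(column)
            if value is None:
                continue
            codes = mappings[column]
            # len(codes) is evaluated before insertion, so codes start at 0.
            new_row[column] = codes.setdefault(value, len(codes))
        encoded.append(new_row)
    return encoded, mappings


rows = [{"city": "Berlin", "age": 34}, {"city": "Paris", "age": 29},
        {"city": "Berlin", "age": 51}]
encoded, mappings = encode_text_columns(rows, ["city"])
print([row["city"] for row in encoded])  # [0, 1, 0]
```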

When all AutoML adapters have terminated their execution, a resulting ZIP file containing the ML model and a Python script for local execution is sent to the controller. The user is notified that the file is available for download and can then send a download request to download the file to their local computer.
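Packaging a model together with its prediction script can be done with Python's standard `zipfile` module. The sketch below builds the archive in memory; the file names `model.bin` and `predict.py` are hypothetical, as the actual archive layout is defined by each OMA-ML adapter.

```python
import io
import zipfile


def package_pipeline(model_bytes, script_source, model_name="model.bin"):
    """Bundle a serialized model and a prediction script into an in-memory ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(model_name, model_bytes)      # serialized ML model
        archive.writestr("predict.py", script_source)  # script for local execution
    return buffer.getvalue()


archive_bytes = package_pipeline(b"\x00\x01", "print('predict')\n")
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    print(archive.namelist())  # ['model.bin', 'predict.py']
```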

6.2 Presentation layer

The GUI is implemented as a web application in C# using the Blazor Framework,11 and provides the user with the following pages:

1.
Configuration: Providing an ontology-based wizard to configure a new execution of OMA-ML.
2.
Reporting: Presenting the leaderboard of the OMA-ML run, including the ML models generated, their metric scores and runtimes (see Fig. 8). Additionally, the user can download a ZIP file with the selected ML pipeline to perform predictions on new datasets.
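A leaderboard like the one on the reporting page amounts to sorting the generated models by their metric score. In the following sketch, breaking ties by shorter runtime is an assumption, not a documented OMA-ML rule; the `higher_is_better` flag covers error metrics where lower values are better. The entries are toy values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaderboardEntry:
    solution: str      # AutoML solution that produced the model
    model: str
    score: float       # e.g. F1-score on a validation split
    runtime_s: float


def leaderboard(entries, higher_is_better=True):
    """Sort entries by metric score, breaking ties by shorter runtime."""
    sign = -1 if higher_is_better else 1
    return sorted(entries, key=lambda e: (sign * e.score, e.runtime_s))


entries = [LeaderboardEntry("solution_a", "m1", 0.90, 100.0),
           LeaderboardEntry("solution_b", "m2", 0.95, 300.0),
           LeaderboardEntry("solution_c", "m3", 0.90, 50.0)]
print([e.model for e in leaderboard(entries)])  # ['m2', 'm3', 'm1']
```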

Figure 8.
OMA-ML leaderboard.

The GUI is modeled on commercial AutoML solutions such as RapidMiner AutoModel, which also provide wizard-based configuration and a leaderboard of solutions.
6.3 Logic layer

So far, a simple AutoML controller and seven AutoML adapters have been implemented. All components are implemented in Python and containerized using Docker. The controller is based on the OMA-ML control logic (Fig. 6) and offers a gRPC12 interface for communication between the GUI and the controller. The functionality provided by the gRPC interface can be grouped into the following categories: dataset manipulation (upload, preprocessing, configuration), ontology queries, and AutoML sessions (start, information, results).

Each AutoML adapter implements another gRPC interface. The controller uses these adapter interfaces to start new AutoML sessions and to receive updates from them. The updates sent to the controller during an ongoing AutoML run comprise the console output produced by the underlying AutoML libraries. After an AutoML library has successfully concluded its search for the best-performing model, the adapter uses the templating language Jinja213 to generate the Python script as defined in Section 5.5.
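Template-based script generation can be illustrated without third-party dependencies. The following sketch uses Python's standard-library `string.Template` as a stand-in for Jinja2 (`$name` placeholders instead of `{{ name }}`). The generated script assumes a pickled model exposing a `predict()` method; the placeholder names and the script body are hypothetical and do not reproduce the actual OMA-ML templates.

```python
from string import Template

# Stand-in for a Jinja2 template ($name placeholders instead of {{ name }}).
SCRIPT_TEMPLATE = Template('''\
"""Prediction script generated for a $library pipeline."""
import csv
import pickle
import sys

MODEL_PATH = $model_path
FEATURES = $features


def main(csv_path):
    # Assumes the model was pickled and exposes a predict() method.
    with open(MODEL_PATH, "rb") as handle:
        model = pickle.load(handle)
    # Feature values are passed as raw strings; a real script would apply
    # the pipeline's preprocessing first.
    with open(csv_path, newline="") as handle:
        rows = [[row[name] for name in FEATURES] for row in csv.DictReader(handle)]
    print(model.predict(rows))


if __name__ == "__main__":
    main(sys.argv[1])
''')


def render_script(library, model_path, features):
    """Fill the template with pipeline-specific values (repr keeps them valid literals)."""
    return SCRIPT_TEMPLATE.substitute(
        library=library, model_path=repr(model_path), features=repr(list(features)))


script = render_script("scikit-learn", "model.pkl", ["age", "city"])
compile(script, "predict.py", "exec")  # the generated code is syntactically valid
```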

6.4 AutoML libraries

At the time of writing, the OMA-ML system supports seven AutoML libraries:

1.
AutoPytorch
2.
AutoSklearn
3.
AutoKeras
4.
FLAML
5.
AutoGluon
6.
AutoCVE
7.
MLJAR

Currently, only classification and regression on tabular data are supported.
6.5 Data layer

The ML ontology (Section 5.3) is loaded into the AutoML controller using the Python library RDFlib.14 SPARQL is used for querying the ML ontology.

7. Evaluation

In this section, we evaluate the concept and implementation of OMA-ML against the goals specified in Section 5.1.

1.
AutoML: OMA-ML performs AutoML. A user can upload datasets; at the current stage of implementation, this is restricted to tabular data. The user may configure the AutoML execution, guided by an ontology-based wizard. After OMA-ML has been executed, a report in the form of a leaderboard is presented, and the user may download the resulting ML pipeline of choice as a ZIP file. Applied to the phishing dataset used for the classification benchmarks in our survey, OMA-ML achieves an F1-score of 0.949 (see the screenshot of the OMA-ML leaderboard in Fig. 8).
2.
User groups: Users with or without programming skills can interact with OMA-ML: A GUI provides access to OMA-ML functionality for users without programming skills. To interact with the AutoML Controller, users with programming skills can access OMA-ML programmatically via the gRPC API.
3.
Technology-independent: Any AutoML solution can be integrated into OMA-ML. For every additional AutoML solution, a new AutoML adapter must be implemented, and the ML ontology must be updated with metadata about the configuration options of the AutoML solution. In the current state of the OMA-ML system, seven different AutoML solutions are integrated. Implementation experience so far indicates that adding a new AutoML solution requires an implementation effort of about 1-4 person-weeks, depending on the complexity of the AutoML solution. About 1-2 person-days are required to analyze the AutoML solution and extend the ontology accordingly. The integration of additional AutoML solutions is subject to future work.
4.
ML task: Any ML task that is provided by an existing AutoML solution may be offered by OMA-ML by integrating this AutoML solution. In the current state of the OMA-ML system, only the tasks classification and regression on tabular data are supported. Several AutoML solutions (e.g. AutoKeras, Autosklearn, AlphaD3M) offer a wider range of ML tasks for other dataset types (e.g. texts, images, video, audio, and graphs). The extension of OMA-ML to support those tasks is subject to future work.
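The plug-in idea behind the technology-independence goal can be sketched as a common adapter interface plus a registry. The class and method names below (`AutoMLAdapter`, `start`, `result`, `register`) are hypothetical; in OMA-ML, each adapter is a separate service behind a gRPC interface, and the registry here only illustrates how a new solution could be added without changing the controller.

```python
from abc import ABC, abstractmethod

ADAPTERS = {}  # adapter name -> adapter class (hypothetical plug-in registry)


def register(name):
    """Class decorator that adds an adapter class to the plug-in registry."""
    def decorator(cls):
        ADAPTERS[name] = cls
        return cls
    return decorator


class AutoMLAdapter(ABC):
    """Hypothetical common interface every AutoML adapter implements."""

    @abstractmethod
    def start(self, dataset_path, configuration):
        """Start an AutoML session and yield progress messages."""

    @abstractmethod
    def result(self):
        """Return the location of the packaged ML pipeline."""


@register("dummy")
class DummyAdapter(AutoMLAdapter):
    """Placeholder adapter standing in for a real AutoML solution."""

    def start(self, dataset_path, configuration):
        yield f"training on {dataset_path}"
        yield "done"

    def result(self):
        return "dummy_pipeline.zip"


adapter = ADAPTERS["dummy"]()
print(list(adapter.start("data.csv", {})))  # ['training on data.csv', 'done']
```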

7.1 Limitations

OMA-ML offers users the possibility to compute ML pipelines for any supported ML task. However, this may come at a high cost in computing power, and thus energy consumption. This is a potential problem of all AutoML solutions, and OMA-ML multiplies it by executing various AutoML solutions in parallel. In recent publications about sustainability in AI systems, the term red AI, in contrast to green AI, is used [56]. If individual AutoML solutions can be rated red AI, executing several of them in parallel in OMA-ML could be rated “deep red AI”. The authors of [57] use the field of natural language processing (NLP) to illustrate the enormous energy consumption of modern approaches in pursuit of ever better results. They suggest that, instead of focusing on prediction performance only, energy consumption should be considered as well, in order to encourage more energy-efficient algorithms.

In the OMA-ML concept, we envisage a process step that may address this issue: the strategy selection. Using knowledge from the ML ontology, it may be possible to substantially reduce the computing power required for OMA-ML execution. Firstly, only the most promising AutoML solutions for a given task may be suggested. Secondly, only the most efficient configurations for those AutoML solutions may be provided. Thirdly, the execution of AutoML solutions that perform badly compared to others may be terminated early. The more intelligent the strategy selection, the more energy-efficient the solution may be. Applying artificial intelligence to the strategy selection is subject to future work.
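The third measure, early termination, could in its simplest form compare each solution's best intermediate score with the current leader. The `margin` parameter below is a hypothetical tuning knob, and this threshold rule ignores how far each learning curve might still improve; it is a sketch of the idea, not part of the current OMA-ML controller.

```python
def select_to_terminate(intermediate_scores, margin=0.05):
    """Return solutions whose best score trails the leader by more than `margin`."""
    if not intermediate_scores:
        return []
    leader = max(intermediate_scores.values())
    return sorted(name for name, score in intermediate_scores.items()
                  if leader - score > margin)


print(select_to_terminate({"solution_a": 0.91, "solution_b": 0.89, "solution_c": 0.70}))
# ['solution_c']
```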

8. Conclusion and future work

AutoML continues to be an active field of research, with many open source and commercial solutions being actively developed and released. The contribution of this article is threefold. Firstly, we presented a survey of 20 existing AutoML solutions, evaluating their functionality, targeted user groups, maturity and performance. The open source AutoML solutions almost exclusively target data/computer scientists. The commercial AutoML solutions, in contrast, are marketed towards domain experts and offer an easy-to-use GUI; some of them additionally offer an optional API for data/computer scientists. While the commercial AutoML solutions can be considered released and mature software, most open source AutoML solutions are in a pre-release state and continue to be actively enhanced. Most AutoML solutions support one ML library only. Therefore, choosing an AutoML solution results in a technology lock-in regarding the ML library.

The second contribution of this article addresses this issue. We presented OMA-ML (Ontology-based Meta AutoML), a novel ontology-based concept for Meta AutoML. OMA-ML combines the features of different AutoML solutions. It supports multiple ML libraries by employing multiple AutoML solutions via a plug-in architecture. This allows OMA-ML to be extended with future third-party AutoML developments by extending the underlying ontology and implementing an adapter. OMA-ML supports multiple user groups, with and without programming skills. By combining the strengths of several AutoML solutions, it supports a wide range of ML tasks and ML libraries.

Thirdly, we presented an implementation of OMA-ML. A minimum viable product is available, consisting of a GUI with a configuration wizard and a simple controller component that currently integrates seven third-party AutoML solutions. An ontology is the information backbone of the implementation, guiding the configuration wizard. The OMA-ML system, including the ontology, is available as open source on GitHub.

The implementation of the OMA-ML system is work in progress. We plan the following next development steps:

Integration of more AutoML solutions;

Support for additional ML tasks for additional dataset types, e.g., text, image, audio, video and graph;

Implementation of an ontology-guided control logic including pre-processing and strategy selection using a blackboard approach;

Improvements in user experience;

Persistence of OMA-ML execution data.

Future research work includes the use of supervised ML on OMA-ML log data to improve the strategy selection in the OMA-ML controller. Furthermore, we plan to use learning curve analysis in the OMA-ML controller and to use transfer learning. We also plan to thoroughly analyze the OMA-ML system regarding user experience and the quality of the generated ML pipelines.

Footnotes

OpenML phishing dataset: https://www.openml.org/d/4534.

OpenML colleges dataset: https://www.openml.org/d/42727.

AWS Sagemaker Autopilot website: https://aws.amazon.com/sagemaker/autopilot/.

Azure AutoML product page: https://azure.microsoft.com/en-us/services/machine-learning/automatedml/#features.

EvalML GitHub: https://github.com/alteryx/evalml.

Google AutoML website: https://cloud.google.com/automl/.

MLBOX GitHub: https://github.com/AxeldeRomblay/MLBox.

TransmogrifAI GitHub: https://github.com/salesforce/TransmogrifAI.

ML ontology GitHub: https://github.com/hochschule-darmstadt/MetaAutoML/tree/main/controller/managers/ontology.

OMA-ML GitHub repository: https://github.com/hochschule-darmstadt/MetaAutoML.

Blazor website: https://dotnet.microsoft.com/apps/aspnet/web-apps/blazor.

gRPC website: https://grpc.io/.

Jinja documentation: https://jinja2docs.readthedocs.io.

RDFlib website: https://rdflib.dev/.

Acknowledgments

This work is funded by the German Federal Ministry of Education and Research (BMBF) in the program Zukunft der Wertschöpfung (funding code 02L19C157), and supported by Projektträger Karlsruhe (PTKA). The responsibility for the content of this publication lies with the authors.

We thank our graduate students Andre Brücke, Lukas Jansen, Luciano Jung, Daniel Kraft, Shamil Nabiyev, Sven Nawrat, Thanh Loan Nguyen, Tim Pachmann, Patrick Reckeweg, Gerrit Derk Scheppat, and Andre Wohnsland for contributing to the survey and Alex Becker, Dong Hung Pham, David Reyer, Fabio Burillo Ruiz, Lars Stockum, and Jonas Weßner for additionally contributing to the implementation of OMA-ML.

References

1. Russell, Norvig. Artificial intelligence: A modern approach. 3rd ed. Prentice Hall Series in Artificial Intelligence. Upper Saddle River: Pearson; 2016.

2. Mukhin, Kilbas, Paringer, Ilyasova, Kupriyanov. A method for balancing a multi-labeled biomedical dataset. Integrated Computer-Aided Engineering. 2022; 29(2): 209-225.

3. Zotov, Tiwari, Kadirkamanathan. Conditional StyleGAN modelling and analysis for a machining digital twin. Integrated Computer-Aided Engineering. 2021; 28(4): 399-415.

4. Schwan, Schenck. A three-step model for the detection of stable grasp points with machine learning. Integrated Computer-Aided Engineering. 2021; 28(4): 349-367.

5. Gąsienica-Józkowy, Knapik, Cyganek. An ensemble deep learning method with optimized weights for drone-based water rescue and surveillance. Integrated Computer-Aided Engineering. 2021; 28(3): 221-235.

6. Buendia-Buendia, Buendia, Andina. Determining geostrophic wind direction in a rainfall forecast expert system. Integrated Computer-Aided Engineering. 2018; 26(1): 111-121.

7. Zöller, Huber. Benchmark and Survey of Automated Machine Learning Frameworks. Journal of Artificial Intelligence Research. 2021; 70: 409-472.

8. Moore. Scaling tree-based automated machine learning to biomedical big data with a feature set selector. Bioinformatics (Oxford, England). 2020; 36(1): 250-256.

9. Charte, Rivera, Martínez, del Jesus. EvoAAA: An evolutionary methodology for automated neural autoencoder architecture search. Integrated Computer-Aided Engineering. 2020; 27(3): 211-231.

10. Feurer, Eggensperger, Falkner, Lindauer, Hutter. Auto-Sklearn 2.0: The Next Generation. https://arxiv.org/pdf/2007.04074.

11. Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel, et al. Scikit-Learn: Machine Learning in Python. J Mach Learn Res. 2011; 12: 2825-2830. doi: 10.5555/1953048.2078195.

12. Jin, Song. Auto-Keras: An Efficient Neural Architecture Search System. In: Teredesai, editor. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM Digital Library. New York, NY, United States: Association for Computing Machinery; 2019. pp. 1946-1956.

13. Chollet. Keras; 25. 03. 2021. https://keras.io.

14. Google Cloud. Cloud AutoML; 14. 01. 2021. https://cloud.google.com/automl?hl=de.

15. Abadi, Agarwal, Barham, Brevdo, Chen, Citro, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. https://arxiv.org/pdf/1603.04467.

16. Erickson, Mueller, Shirkov, Zhang, Larroy, et al. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. arXiv preprint arXiv:2003.06505. 2020.

17. Lanio. RapidMiner Auto Model; 09. 03. 2018. https://rapidminer.com/products/auto-model/.

18. Kotthoff, Thornton, Hoos, Hutter, Leyton-Brown. Auto-WEKA: Automatic Model Selection and Hyperparameter Optimization in WEKA. In: Hutter, Kotthoff, Vanschoren, editors. Automated machine learning. The Springer Series on Challenges in Machine Learning. Cham: Springer International Publishing; 2019. pp. 81-95.

19. Mendoza, Klein, Feurer, Springenberg, Urban, Burkart, et al. Towards Automatically-Tuned Deep Neural Networks. In: Hutter, Kotthoff, Vanschoren, editors. Automated machine learning. The Springer Series on Challenges in Machine Learning. Cham: Springer International Publishing; 2019. pp. 135-149.

20. Humm, Zender. An Ontology-Based Concept for Meta AutoML. In: Maglogiannis, Macintyre, Iliadis, editors. Artificial Intelligence Applications and Innovations. vol. 627 of Springer eBook Collection. Cham: Springer International Publishing and Imprint Springer; 2021. pp. 117-128.

21. Zhao, Chu. AutoML: A survey of the state-of-the-art. Knowledge-Based Systems. 2021; 212: 106622.

22. Doke, Gaikwad. Survey on Automated Machine Learning (AutoML) and Meta learning. In: 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT); 2021. pp. 1-5.

23. Gijsbers, LeDell, Thomas, Poirier, Bischl, Vanschoren. An Open Source AutoML Benchmark. https://arxiv.org/pdf/1907.00909.

24. Analysis on Approaches and Structures of Automated Machine Learning Frameworks. In: 2020 International Conference on Communications, Information System and Computer Engineering. Piscataway, NJ: IEEE; 2020. pp. 474-477.

25. Chauhan, Jani, Thakkar, Dave, Bhatia, Tanwar, et al. Automated Machine Learning: The New Wave of Machine Learning. In: 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA 2020). Piscataway, NJ: IEEE; 2020. pp. 205-212.

26. Bischl, Casalicchio, Feurer, Gijsbers, Hutter, Lang, et al. OpenML Benchmarking Suites. https://arxiv.org/pdf/1708.03731.

27. Yoo, Joseph, Yung, Nasseri, Wood. Ensemble Squared: A Meta AutoML System. https://arxiv.org/pdf/2012.05390.

28. ISO/IEC 19510:2013(en), Information technology – Object Management Group Business Process Model and Notation; 31. 03. 2022. https://www.iso.org/obp/ui/#iso:std:iso-iec:19510:ed-1:v1:en.

29. Thornton, Hutter, Hoos, Leyton-Brown. Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms. In: Dhillon, editor. KDD’13 the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: August 11-14, 2013, Chicago, Illinois, USA. ACM; 2013. pp. 847-855.

30. Feurer, Klein, Eggensperger, Springenberg, Blum, Hutter, editors. Efficient and Robust Automated Machine Learning. MIT Press; 2015.

31. Hutter, Hoos, Leyton-Brown. Sequential Model-Based Optimization for General Algorithm Configuration. In: Coello CAC, editor. Learning and Intelligent Optimization. vol. 6683 of Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Nature; 2011. pp. 507-523.

32. LeDell, Poirier. H2O AutoML: Scalable Automatic Machine Learning. 7th ICML Workshop on Automated Machine Learning (AutoML). 2020. https://www.automl.org/wp-content/uploads/2020/07/AutoML_2020_paper_61.pdf.

33. Swearingen, Drevo, Cyphers, Cuesta-Infante, Ross, Veeramachaneni. ATM: A distributed, collaborative, scalable system for automated machine learning. In: 2017 IEEE International Conference on Big Data (Big Data). IEEE; 12. 2017. pp. 151-162.

34. Machine Learning Professorship Freiburg. AutoSklearn documentation; 16. 03. 2021. https://automl.github.io/auto-sklearn/master/api.html.

35. Corinna, Xavier, Vitaly, Mehryar, Scott. AdaNet: Adaptive Structural Learning of Artificial Neural Networks. In: Precup, Teh, editors. Proceedings of the 34th International Conference on Machine Learning. vol. 70 of Proceedings of Machine Learning Research. PMLR; 2017. pp. 874-883. https://proceedings.mlr.press/v70/cortes17a.html.

36. Drori, Krishnamurthy, Lourenco, Rampin, Cho, Silva, et al. Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar. https://arxiv.org/pdf/1905.10345.

37. Larcher CHN, Barbosa HJC. Auto-CVE. In: López-Ibáñez, editor. Proceedings of the Genetic and Evolutionary Computation Conference. ACM Digital Library. New York, NY, United States: Association for Computing Machinery; 2019. pp. 392-400.

38. Fakoor, Mueller, Erickson, Chaudhari, Smola. Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation. In: Larochelle, Ranzato, Hadsell, Balcan, Lin, editors. Advances in Neural Information Processing Systems. vol. 33. Curran Associates, Inc; 2020. pp. 8671-8681. https://proceedings.neurips.cc/paper/2020/file/62d75fb2e3075506e8837d8f55021ab1-Paper.pdf.

39. Wang, Weimer, Zhu. FLAML: A Fast and Lightweight AutoML Library. In: Smola, Dimakis, Stoica, editors. Proceedings of Machine Learning and Systems. vol. 3; 2021. pp. 434-447. https://proceedings.mlsys.org/paper/2021/file/92cc227532d17e56e07902b254dfad10-Paper.pdf.

40. Aleksandra, Piotr. MLJAR: State-of-the-art Automated Machine Learning Framework for Tabular Data. Version 0.10.3. Łapy, Poland: MLJAR; 2021. https://github.com/mljar/mljar-supervised.

41. Laadan, Vainshtein, Curiel, Katz, Rokach. MetaTPOT. In: d’Aquin, editor. Proceedings of the 29th ACM International Conference on Information & Knowledge Management. ACM Digital Library. New York, NY, United States: Association for Computing Machinery; 2020. pp. 2097-2100.

42. Jansen. Evaluating AutoML performance of Sagemaker Autopilot and TransmogrifAI. Technical Report, Hochschule Darmstadt – University of Applied Sciences, Darmstadt, Germany; 2022. doi: 10.48444/h_docs-pub-281.

43. Scheppat. A comparison of AutoML solutions ATM and AWS Sagemaker Autopilot. Technical Report, Hochschule Darmstadt – University of Applied Sciences, Darmstadt, Germany; 2022. doi: 10.48444/h_docs-pub-266.

44. Reyer. Comparison of AutoML solutions Auto-PyTorch and AutoCVE. Technical Report, Hochschule Darmstadt – University of Applied Sciences, Darmstadt, Germany; 2022. doi: 10.48444/h_docs-pub-270.

45. Wohnsland. Comparison of AutoML solutions MLBox and auto-sklearn. Technical Report, Hochschule Darmstadt – University of Applied Sciences, Darmstadt, Germany; 2022. doi: 10.48444/h_docs-pub-273.

46. Pachmann. An evaluation and comparison of AutoML solutions: Azure AutoML and EvalML. Technical Report, Hochschule Darmstadt – University of Applied Sciences, Darmstadt, Germany; 2022. doi: 10.48444/h_docs-pub-271.

47. Nabiyev. Vergleichende Analyse der AutoML-Lösungen: AutoGluon und TPOT [Comparative analysis of the AutoML solutions AutoGluon and TPOT]. Technical Report, Hochschule Darmstadt – University of Applied Sciences, Darmstadt, Germany; 2022. doi: 10.48444/h_docs-pub-265.

48. The Linux Foundation. ONNX; 17. 03. 2021. https://onnx.ai/.

49. Studer, Benjamins, Fensel. Knowledge engineering: Principles and methods. Data & Knowledge Engineering. 1998; 25(1-2): 161-197.

50. Humm, Bense, Fuchs, Gernhardt, Hemmje, Hoppe, et al. Machine intelligence today: applications, methodology, and technology. Informatik Spektrum. 2021; pp. 1-11. doi: 10.1007/s00287-021-01343-1.

51. Cyganiak, Wood, Lanthaler. RDF 1.1 Concepts and Abstract Syntax; 26. 03. 2021. https://www.w3.org/TR/rdf11-concepts/.

52. Miles, Bechhofer. SKOS Simple Knowledge Organization System Namespace Document; 06. 08. 2011. https://www.w3.org/2009/08/skos-reference/skos.html.

53. ISO/IEC 19505-2:2012(en), Information technology – Object Management Group Unified Modeling Language (OMG UML) – Part 2: Superstructure; 31. 03. 2022. https://www.iso.org/obp/ui/#iso:std:iso-iec:19505:-2:ed-1:v1:en.

54. Buschmann, Meunier, Rohnert, Sommerlad, Stal. Pattern-Oriented Software Architecture, A System of Patterns. 1st ed. Wiley Software Patterns Series. s.l.: Wiley; 2013.

55. Gamma. Design patterns: Elements of reusable object-oriented software. 39th ed. Addison-Wesley professional computing series. Boston: Addison-Wesley; 2011.

56. Schwartz, Dodge, Smith, Etzioni. Green AI. Communications of the ACM. 2020; 63(12): 54-63. doi: 10.1145/3381831.

57. Strubell, Ganesh, McCallum. Energy and Policy Considerations for Deep Learning in NLP. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019; pp. 3645-3650. doi: 10.18653/v1/P19-1355. https://aclanthology.org/P19-1355/.
