Constructing reusable knowledge for machine learning projects based on project practices

Abstract

Recently, machine learning (ML) techniques have been introduced into various domains. This study focuses on projects that develop ML-based service systems, in which ML techniques are applied to enterprise functions. To conduct such projects effectively, reusable knowledge about them must be constructed. Here, we consider the collection of insights and the development of architecture and design patterns for ML-based service systems. We propose a method for collecting insights by referring to a development model based on project practices and for developing patterns for ML projects as enterprise architecture models. In a practical application, we use the proposed method to collect insights as best practices and to construct design patterns for ML projects.

Keywords

Machine learning, ML service system, best practice, pattern

1. Introduction

Currently, numerous machine learning (ML) techniques are available as application programming interfaces (APIs). Therefore, ML techniques can be used for practical business applications. Accordingly, enterprises have begun to implement such techniques in their business functions. Here, we consider projects for developing service systems in enterprises using ML APIs. New features have emerged in service systems that use ML techniques (ML-based service systems). Furthermore, when ML techniques are applied to these business functions, acquiring training data on the target business domain is important. Thus, sufficient knowledge of, or prior experience in, the target business domain is essential. Therefore, representatives of the IT division and relevant business divisions are required to participate in the project. Consequently, numerous challenges arise in ML service system development projects (ML projects) in terms of requirements, design, implementation, and test phases [1, 2].

Therefore, collecting insights from project practices and creating reusable knowledge is necessary for the effective undertaking of ML projects. Here, we focus on the patterns of ML projects as reusable knowledge. Practitioners obtain insights through ML project practices and share them within their organizations. However, such insights are described in different formats and are not shared beyond organizations.

We propose a method for constructing reusable knowledge for ML projects as patterns based on project practice. To this end, we implement a development model for ML-based service systems, such that practitioners can provide insights from projects by referring to the model. Moreover, we define steps according to which practitioners construct patterns from the collected insights, using enterprise architecture (EA)-based generic ML architecture and a design pattern model. By applying the proposed method, we attempted to confirm that practitioners can effectively collect insights on ML projects and construct patterns as reusable knowledge to conduct ML projects.

The remainder of this paper is organized as follows. In Section 2, we describe related studies. We introduce the ML-based service system and architecture and design patterns for ML service systems in Section 3. Moreover, we define the research hypothesis of this study. In Section 4, we present our method for constructing the reusable knowledge for ML projects from project practices. In Section 5, we demonstrate the practice in which we apply the proposed method. Finally, Section 6 discusses our results, and Section 7 summarizes key points and presents directions for future studies.

2. Related work

By reviewing the literature and surveying several actual projects, the authors of [1, 2] identified many software engineering challenges that arise in projects for developing ML systems and emphasized the extraction of knowledge for conducting ML projects.

The best practices in ML projects for constructing knowledge were collected through a literature survey [3]. Furthermore, [4] introduced a general workflow for the development of ML systems as reusable knowledge for ML projects. The role of data scientists in ML system development projects was discussed in [5]. By combining these findings, a project model that represents the relationships among project activities, stakeholders, and project goals was subsequently proposed [6]. An architecture for representing an entire system is required for knowledge in practical projects in which big data analytics or ML techniques are applied [7]. In addition, reference architectures for development teams have been proposed [8, 9]. In [10], a reference architecture for intelligent systems was presented, which combines digital strategies and architectures [11] with artificial intelligence.

The development of architecture and design patterns has been considered a knowledge resource for software system architecture, and several studies have addressed this. In [12], several patterns focusing on the operational stability of ML systems were introduced. Moreover, through a systematic literature review, software engineering patterns for ML systems were identified and formalized in [13, 14]. These patterns are typically described as itemized documents and primarily target data scientists and ML application developers. Some surveys have been conducted to clarify how ML developers perceive these patterns [15]. The representation of ML patterns as EA-based models was investigated in [16]. In addition, code smells [17] and data smells [18] for ML systems were collected through literature reviews.

3. Research subject and hypothesis

3.1 ML-based service system

In our research, we considered developing a system using ML techniques for business functions that either support or are substitutes for human activities.

For given input data, ML techniques predict and output the optimal option from a predefined set. In a workplace, these techniques are used in routine activities such as replying to service queries or conducting business assessments based on customer-provided information. To develop systems using ML techniques, the options for the target business domain are first defined; subsequently, example inputs related to each option are collected. An ML model is generated (trained) from a training dataset containing such pairs of options and examples. This model is then deployed into a runtime ML engine, which obtains the input data and provides the output data using this prediction model.
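The workflow described above (define options, collect examples, train, deploy, predict) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word-count scoring stands in for a real ML technique, and the names `train`, `RuntimeEngine`, `OPTIONS`, and `TRAINING_DATA`, as well as the example queries, are hypothetical and do not come from the systems discussed in this paper.

```python
from collections import Counter

# Options defined for a hypothetical business domain (service queries).
OPTIONS = ["billing", "delivery"]

# Training dataset: pairs of (example input, option).
TRAINING_DATA = [
    ("where is my invoice", "billing"),
    ("refund the payment on my invoice", "billing"),
    ("when will my parcel arrive", "delivery"),
    ("the parcel is late", "delivery"),
]


def train(options, dataset):
    """Generate a toy model: word counts per option (not a real ML algorithm)."""
    model = {option: Counter() for option in options}
    for text, option in dataset:
        model[option].update(text.lower().split())
    return model


class RuntimeEngine:
    """Toy runtime ML engine that holds a deployed prediction model."""

    def __init__(self):
        self.model = None

    def deploy(self, model):
        self.model = model

    def predict(self, text):
        words = text.lower().split()
        # Score each option by how often its examples used the input words.
        scores = {opt: sum(counts[w] for w in words)
                  for opt, counts in self.model.items()}
        return max(scores, key=scores.get)


engine = RuntimeEngine()
engine.deploy(train(OPTIONS, TRAINING_DATA))
print(engine.predict("my parcel has not arrived"))  # → delivery
```

The separation between `train` and `RuntimeEngine` mirrors the distinction in the text between generating the model and deploying it into a runtime engine.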

In this study, we represent ML-based service systems and their development projects with an EA modeling approach, using ArchiMate [19] as the EA modeling language. Fig. 1 illustrates an ML-based service system modeled with the three business layer concepts and three application layer concepts listed below.

Figure 1.

ML-based service system represented by ArchiMate.

• Business layer concepts:

(a) Business service: An explicitly defined and executed business activity.

(b) Business process: A sequence of business activities that produce a planned outcome.

(c) Business object: A set of concepts used within a particular business domain.

• Application layer concepts:

(a) Application service: An explicitly defined and exhibited application behavior.

(b) Application component: An element of application functionality aligned to the implementation structure.

(c) Data object: Data structured for automated processing.

3.2 Architecture design pattern for ML service systems

Software design patterns are a form of reusable knowledge in software engineering. In software design patterns, best practices are formalized such that engineers can use them to solve typical problems that occur when designing an application or system. Although no standard format exists for many patterns, the following items are defined to describe software design patterns:

• Intent: Objective of the pattern;
• Problem: Forces that the pattern seeks to resolve;
• Solution: Suggested activities to solve the problem;
• Context: Environmental information on the system; and
• Discussion: Pre-conditions or limitations for applying the pattern.
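As a minimal sketch, the five items above can be held in a small data structure so that incomplete pattern descriptions are easy to detect. The class name `PatternDocument`, the method `missing_items`, and the placeholder content are assumptions made for illustration; no standard format for patterns is implied.

```python
from dataclasses import dataclass, field, fields


@dataclass
class PatternDocument:
    """One software design pattern, using the five items listed above."""
    name: str
    intent: str                      # objective of the pattern
    problem: str                     # forces the pattern seeks to resolve
    solution: list[str]              # suggested activities, in order
    context: list[str] = field(default_factory=list)     # environmental information
    discussion: list[str] = field(default_factory=list)  # pre-conditions or limitations

    def missing_items(self) -> list[str]:
        """Names of items that are still empty in this document."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder content only; a real pattern would fill every item.
draft = PatternDocument(
    name="Example pattern (placeholder)",
    intent="Placeholder objective.",
    problem="Placeholder problem statement.",
    solution=["Placeholder activity."],
)
print(draft.missing_items())  # → ['context', 'discussion']
```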

Several security patterns [12], and architecture and design patterns [13, 14] have been introduced for ML service systems. For example, [14] describes an ML system architecture pattern known as the “data flows up, model flows down with federated learning” pattern. Table 1 lists the elements of this pattern. Figure 2 shows the solution proposed by the “data flows up, model flows down” pattern.

Table 1
Data flows up, model flows down with federated learning pattern

Intent	Improve the response time for the input query and prediction performance based on the local users’ queries and output results.
Problem	The ML application cannot return its prediction results in real time if the ML model is deployed to the cloud environment. The prediction performance depends on the users’ queries. If the data collected from the local device are stored in the cloud environment for retraining, the user’s privacy and data confidentiality must be preserved.
Solution	(1) Deploy the ML model to the local device. (2) In each device, the ML model is retrained by the data locally collected. (3) The difference models provided by local devices are averaged and updated into the ML model on the cloud.
Context	• The application runs on a personal local device such as a smart phone. • This problem should be considered in the system design phase.
Discussion	• The local device has sufficient computing resources. • Retraining based on each user’s log data does not have any negative impact. • Prediction models can be combined.

Figure 2.
Solution proposed in “data flows up, model flows down with federated learning” pattern.

As indicated in the context field, the “data flows up, model flows down” pattern is implemented in ML systems that use mobile/edge devices such as mobile phones, cameras, and IoT devices. A major retail company in the USA introduced an ML system to identify traffic problems in the parking lots of its stores.¹

¹ https://www.ibm.com/products/maximo/remote-monitoring.

They trained an ML model from the image data collected by parking lot security cameras and deployed it at each store (model flows down). The model was applied to real-time images from security cameras in stores, and notifications were sent to staff when traffic jams occurred. When a store encountered an adversarial situation, e.g., a false positive or negative, the application sent the corresponding set of images to the cloud for model retraining (data flows up).

Federated learning is a special example of this pattern. It was implemented in Google Keyboard for Android, known as Gboard.²

² https://ai.googleblog.com/2017/04/federated-learning-collaborative.html.

When users type texts, their phones store information regarding the current context and whether they select the suggestions provided by Gboard. Gboard uses an ML model locally and retrains the model using the data stored. Thus, Gboard reflects the specific behavior of users, and the suggestions are improved. The difference between the original and retrained models is uploaded to Google Cloud. The base ML model in the cloud is updated at fixed intervals using these uploaded difference models and then re-deployed.
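The update cycle described above, in which devices upload difference models that are averaged into the base model, can be sketched as follows. This is a simplified illustration, not Gboard's implementation: models are reduced to plain lists of weights, the average is unweighted (practical federated averaging typically weights each device by its amount of local data), and the function names `difference_model` and `aggregate` are hypothetical.

```python
def difference_model(base, retrained):
    """Difference between a locally retrained model and the base model."""
    return [r - b for r, b in zip(retrained, base)]


def aggregate(base, diffs):
    """Average the uploaded difference models and apply them to the base model."""
    n = len(diffs)
    averaged = [sum(d[i] for d in diffs) / n for i in range(len(base))]
    return [b + a for b, a in zip(base, averaged)]


# Base model in the cloud and two locally retrained copies (toy weights).
base = [0.0, 1.0]
device_models = [[0.5, 1.0], [0.0, 1.5]]

# Only the differences leave the devices; the local data stay local.
diffs = [difference_model(base, m) for m in device_models]
new_base = aggregate(base, diffs)
print(new_base)  # → [0.25, 1.25]
```

Uploading differences rather than raw data is what allows the pattern to address the privacy concern stated in its problem field.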

3.3 Research hypothesis

As previously mentioned, the best practices, architecture, and design patterns of ML-based service systems are based on literature surveys. This implies that experts with sufficient knowledge of software engineering and ML techniques must analyze the literature and construct patterns. Moreover, insights obtained through project practices are not systematized as best practices or patterns unless published. A large organization conducting various types of ML projects has published its insights as patterns [20]; however, how the insights were systematized into such patterns is unclear.

Figure 3.

Overview of proposed method.

Figure 4.

Generic agile development model.

Figure 5.

Workflow for using ML techniques.

Within this context, we considered the following research question (RQ):

How can practitioners construct reusable knowledge for ML projects from real project practices?

For this RQ, we propose a method for collecting insights from project practices and constructing reusable knowledge as patterns from the insights. Furthermore, we confirm the effectiveness of the proposed method in practice.

4. Proposed method

4.1 Overview

In this study, we consider knowledge construction based on ML project practices. Figure 3 presents an overview of the proposed method.

The proposed method consists of the following steps:

1. Prepare a development model based on ML project practices.
2. Derive insights from ML projects by referencing the development model.
3. Construct patterns from the collected insights.

4.2 Reference development model and collection of insights

In the proposed method, a development model for collecting insights is prepared as the first step. In this study, we used the agile development model for ML service systems proposed in [21]. This model was extended from the general agile development model and ML workflow model based on actual ML project practices.

A generic agile development model was proposed in [22] and is represented as shown in Fig. 4.

A workflow model for ML-based service systems was proposed in [4] and is represented as shown in Fig. 5.

Figure 4 shows that the work items are defined by specifying requirements, and iteration backlogs are specified from the work items. In Fig. 5, detailed activities for specifying the requirements for ML-based service systems, work items, and iteration backlogs are not represented. In ML projects, practitioners in the user segments are assigned to development activities and must clearly understand the items they are responsible for in each activity. Therefore, in [21], a reference agile development model for ML projects was extended from existing models based on the project practice data.

As project data, we used data on 23 ML projects collected in [23]. In these data, each ML project is represented as an ordered list of project activities defined in the ML project canvas [24]. To identify the activities common to these projects, we compared the projects and analyzed the activities conducted before data collection. Table 2 lists the analysis results. For example, the purpose of the project was considered before data collection in 20 out of 23 projects (87.0%).

Table 2
Activities conducted before data collection

Activity	# of projects	Ratio (%)
Consider the purpose or goal	20	87.0
Consider the action based on the prediction	14	60.9
Determine the user segments	12	52.2
Define the metrics of success	10	43.5
Determine the algorithms and infrastructure	6	26.1
Refer to past knowledge or know-how	6	26.1
Discuss the UI and system for end users	4	17.4
Identify the expansion or secondary goal	3	13.0
Consider the data enhancement	1	4.3
Consider the model updates or maintenance	1	4.3
Consider the open strategy	1	4.3
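The ratios in Table 2 follow directly from the project counts and the total of 23 projects. The short check below recomputes them; the variable names are illustrative, and the counts are copied from Table 2.

```python
TOTAL_PROJECTS = 23

# Number of projects in which each activity preceded data collection (Table 2).
COUNTS = {
    "Consider the purpose or goal": 20,
    "Consider the action based on the prediction": 14,
    "Determine the user segments": 12,
    "Define the metrics of success": 10,
    "Determine the algorithms and infrastructure": 6,
    "Refer to past knowledge or know-how": 6,
    "Discuss the UI and system for end users": 4,
    "Identify the expansion or secondary goal": 3,
    "Consider the data enhancement": 1,
    "Consider the model updates or maintenance": 1,
    "Consider the open strategy": 1,
}

# Ratio (%) rounded to one decimal place, as reported in the table.
ratios = {activity: round(100 * n / TOTAL_PROJECTS, 1)
          for activity, n in COUNTS.items()}

for activity, ratio in ratios.items():
    print(f"{activity}: {ratio}")
```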

From Table 2, the following common activities are derived to specify requirements:

• Consider the purpose or goal.

• Consider the action based on the prediction.

• Determine the user segments.

• Define the metrics of success.

• Determine the algorithms and infrastructure.

By combining the first and third activities, we define a new activity: “Consider the user segments and their goals.” The requirements for the entire system and ML model were specified as work items based on the four resulting activities. To execute the development tasks, we defined the metrics for the ML model and entire system as iteration backlogs from these work items. Using these derived activities, work items, and iteration backlogs, we extended the existing models described in Figs. 4 and 5. Consequently, using ArchiMate [19], we represented a practice-based reference model for the agile development of ML-based service systems, as shown in Fig. 6.

In the second step, practitioners provide insights derived from ML projects by referencing this development model. In this study, we arranged a workshop in which practitioners participated and shared their insights.

Table 3

Subdivided pattern elements

Pattern element	Subdivided pattern element	Element in ArchiMate
Intent	Objective of the pattern	Outcome
Problem	Situation to be improved	Driver
	Assessment result of the situation to be improved	Assessment
	Goal achieved by applying the pattern	Goal
Solution	Suggested activities to solve the problem	Principle
Context	Phase in which the pattern is applied	Business process
	Device where system is running	Device
	System user	Actor
Discussion	Pre-condition or limitation in the pattern	Constraint

Figure 6.

Agile development model for ML service systems.

Figure 7.

Generic ML architecture and design pattern represented using ArchiMate.

4.3 Construction of patterns from collected insights

Patterns were constructed from the insights collected in the second step. The insights provided by practitioners describe recommended activities for conducting ML projects effectively, as well as the project phases during which the activities are conducted. For example, the insight “We should consider metrics for reliability, safety, or fairness, as well as accuracy, when defining the metrics of project success” is recommended for the project planning phase. In this section, we outline the steps for constructing patterns from insights given in this form.

We used an EA model for the ML architecture and design patterns [16]. The relationships among key elements are sometimes not clearly described in the pattern documents, which hinders the common understanding of the patterns among stakeholders. By contrast, the EA model using ArchiMate represents the relationships among the pattern elements. To represent the ML design patterns as EA models, we first attempted to identify the common elements described in each field in the ML design pattern documents. By observing existing ML design patterns, we determined the following elements in the problem field.

• Situation to be improved
• Assessment result of the situation to be improved
• Expected goal achieved by applying the pattern

In addition, the following elements are typically described in the context field.

• Phase in which the pattern is applied
• Device where the system is running
• System user

For these subdivided elements and other existing elements, such as intent, solution, and discussion, we can assign the model elements defined in ArchiMate. Table 3 lists the mapping between the pattern elements and EA model elements.

Next, we attempted to represent the relationships between the elements in Table 3. For example, the solution represented as a principle is considered to realize the objective of the pattern, which is represented as an outcome. This relationship can be represented using the realization notation defined in ArchiMate. Using relationships such as realization, composition, access, flow, serving, assignment, and association, the pattern elements in the ML architecture and design patterns can be connected. Consequently, as shown in Fig. 7, we can represent a generic ML architecture and design pattern using ArchiMate.
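To make the idea of typed elements connected by typed relationships concrete, the sketch below encodes the one relationship stated explicitly in the text: the solution (a principle) realizes the objective of the pattern (an outcome). The classes `Element` and `Relationship` are hypothetical helpers, not part of ArchiMate or of any ArchiMate tool, and the remaining relationships of Fig. 7 are deliberately not reproduced here.

```python
from dataclasses import dataclass

# The seven ArchiMate relationship kinds named in the text.
RELATIONSHIP_KINDS = {
    "realization", "composition", "access", "flow",
    "serving", "assignment", "association",
}


@dataclass(frozen=True)
class Element:
    archimate_type: str   # e.g. "Principle", "Outcome" (see Table 3)
    pattern_element: str  # the subdivided pattern element it represents


@dataclass(frozen=True)
class Relationship:
    kind: str
    source: Element
    target: Element

    def __post_init__(self):
        # Reject relationship kinds outside the set used for pattern models.
        if self.kind not in RELATIONSHIP_KINDS:
            raise ValueError(f"unsupported relationship kind: {self.kind}")


solution = Element("Principle", "Suggested activities to solve the problem")
objective = Element("Outcome", "Objective of the pattern")

# The solution realizes the objective of the pattern.
realizes = Relationship("realization", source=solution, target=objective)
print(f"{realizes.source.archimate_type} --{realizes.kind}--> "
      f"{realizes.target.archimate_type}")
```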

Table 4
Summary of collected insights

	Plan and design	Data	Training	App. development	Deployment and maintenance	Organization	Governance	Sum
Results of workshop	7	4	8	2	3	4	0	28
Results of Serban et al.	0	5	11	3	6	3	1	29

Table 5
Collected insights

	Stage	Best practice
S01	Plan and design	Maintain analysis infrastructures as common as possible.
S02	Plan and design	Return to review the project goal if required.
S03	Plan and design	Agree with the goals and available computing resources.
S04	Plan and design	Provide multiple candidate algorithms.
S05	Plan and design	Present samples such that stakeholders are aware of the issues.
S06	Plan and design	Agree with the business division on the business goal and goal to be achieved by the ML-based service system.
S07	Plan and design	Confirm both potential users of the ML-based service systems and the number of such users.
S08	Data	Visualize and observe the data.
S09	Data	Confirm the processes and methods by which data are collected.
S10	Data	At an early stage, discuss the case in which there is insufficient data.
S11	Data	Understand the customer’s system.
S12	Training	Sprint and spike should be used properly in the Scrum development process.
S13	Training	Align with customers on the relationship between ML metrics and success measure.
S14	Training	Compare the accuracy with a simple mechanism.
S15	Training	Confirm the validity of prediction results.
S16	Training	Visualize evaluation results.
S17	Training	Interpret the ML model.
S18	Training	Evaluate with stakeholders at an early stage.
S19	Training	Work with customers on the metrics that can be interpreted in a business context.
S20	App. development	Discuss non-functional requirements other than accuracy.
S21	App. development	Consider the presentation of prediction results.
S22	Deployment and maintenance	Design logic to monitor data and model.
S23	Deployment and maintenance	Share the risks of model updates with users.
S24	Deployment and maintenance	Discuss how to respond when the model shows anomalies.
S25	Organization	Ensure personnel with business domain knowledge.
S26	Organization	PMs should create a relationship between ML engineers and app developers at an early stage.
S27	Organization	PMs should establish communication between the operations and development teams.
S28	Organization	Check the decision makers’ level of understanding and experience with ML.

In this model, the pattern elements in the ML architecture and design patterns can be connected with relationships, such as realization, composition, access, flow, serving, assignment, and association.

The descriptions in the collected insights correspond to the “Solution in the pattern” and “Phase in which the pattern is applied” in the generic model. From the generic model, we construct specific pattern models according to the following steps:

1. Obtain the “Objective of the pattern” by analyzing the “Solution in the pattern” described in the insight.
2. Analyze the issue that the solution addresses, together with the current situation, and derive the “Goal achieved by applying the pattern,” “Situation to be improved,” and “Assessment result of the situation to be improved.”
3. Analyze the exceptions in the solution and identify the pre-condition or limitation in the pattern.
4. Assess whether the solution can be applied only to a specific ML service system and derive the “System user” or “Device where system is running” if necessary.

Each model element corresponds to a description of the pattern documents. Therefore, the constructed pattern model can be converted into a pattern document.
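The conversion from a pattern model to a pattern document can be sketched as a grouping of model elements into the five document fields, following the mapping in Table 3. The element keys (`objective`, `situation`, and so on), the function `to_document`, and the way the problem field is assembled by concatenation are assumptions made for illustration; the example content is taken from insight S09 and the pattern constructed from it in Section 5.

```python
# Document field -> model elements that feed it, following Table 3.
FIELD_ELEMENTS = {
    "Intent": ["objective"],
    "Problem": ["situation", "assessment", "goal"],
    "Solution": ["solution"],
    "Context": ["phase", "device", "user"],
    "Discussion": ["constraint"],
}
OPTIONAL = {"device", "user"}  # step 4 derives these only if necessary


def to_document(model):
    """Convert a pattern model (element name -> text) into a pattern document."""
    missing = [e for elems in FIELD_ELEMENTS.values() for e in elems
               if e not in OPTIONAL and not model.get(e)]
    if missing:
        raise ValueError(f"model is missing elements: {missing}")
    return {doc_field: " ".join(model[e] for e in elems if model.get(e))
            for doc_field, elems in FIELD_ELEMENTS.items()}


model = {
    "solution": "Confirm the processes and methods by which data are collected.",
    "phase": "This pattern is applied when designing the overall ML system.",
    "objective": "Avoid target leakage.",                        # step 1
    "situation": "The runtime input data for the prediction are "
                 "sometimes not clearly defined when training the model.",
    "assessment": "The accuracy of the prediction cannot be obtained as expected.",
    "goal": "Improve the accuracy.",                             # step 2
    "constraint": "A mechanism for collecting data must exist.",  # step 3
}

for doc_field, text in to_document(model).items():
    print(f"{doc_field}: {text}")
```

Raising an error on absent elements reflects the aim of obtaining a pattern description without missing elements.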

5. Practice

We hosted an online workshop at the Working Conference on Machine Learning Software Engineering (MLSE2021) in Japan on July 2, 2021. A total of 12 practitioners with some experience in ML projects participated in the workshop. All participants accessed an online canvas where the reference ML development model was presented, and posted their insights that were obtained through the ML projects on the canvas. As a result, we collected 28 insights and categorized these based on the viewpoints extended from those in [3]. Table 4 displays a summary of the collected insights.
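The workshop row of Table 4 can be reproduced by tallying the stage assigned to each insight in Table 5. The sketch below does this from the ID ranges per stage as read off Table 5; the variable names are illustrative.

```python
from collections import Counter

# Column order of Table 4.
STAGES = [
    "Plan and design", "Data", "Training", "App. development",
    "Deployment and maintenance", "Organization", "Governance",
]

# (first ID, last ID, stage) ranges as listed in Table 5.
RANGES = [
    (1, 7, "Plan and design"),
    (8, 11, "Data"),
    (12, 19, "Training"),
    (20, 21, "App. development"),
    (22, 24, "Deployment and maintenance"),
    (25, 28, "Organization"),
]

insight_stage = {f"S{i:02d}": stage
                 for first, last, stage in RANGES
                 for i in range(first, last + 1)}

counts = Counter(insight_stage.values())
row = [counts[stage] for stage in STAGES]  # stages without insights count as 0
print(row, sum(row))  # → [7, 4, 8, 2, 3, 4, 0] 28
```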

Table 6
Constructed pattern (“Check the origin of the data”)

Intent	Avoid target leakage, whereby the ML model is trained by data unavailable at the prediction runtime.
Problem	The runtime input data for the prediction are sometimes not clearly defined when training the model. Consequently, the accuracy of the prediction cannot be obtained as expected.
Solution	Confirm the processes and methods by which data are collected.
Context	This pattern is applied when designing the overall ML system.
Discussion	A mechanism for collecting data must exist.

Figure 8.

Pattern model constructed from collected insights.

In Table 5, we show the collected insights as best practices.

From Table 5, we selected S09 “Confirm the processes and methods by which data are collected” as an example and applied the proposed pattern construction method. This insight corresponded to the description in the solution element in the pattern. Analyzing the purpose of the activity described in this insight showed that certain fields in the training data could not be used for the prediction because the runtime input for the prediction was not clearly defined when collecting the training data. This issue is known as “target leakage.” Therefore, the intent of the pattern was to avoid target leakage, and the goal of the pattern was to improve the accuracy. As a result, we obtained the pattern model for this insight, as illustrated in Fig. 8, and assigned the pattern name “Check the origin of the data.”
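One simple way to act on this insight is to compare the fields used for training with the fields that will actually be available as runtime input. The sketch below shows such a check under stated assumptions: the function `unavailable_at_runtime` and all field names are hypothetical, and a real project would derive both field lists from its data collection processes.

```python
def unavailable_at_runtime(training_fields, runtime_fields):
    """Training fields that will not be present in the runtime input."""
    return sorted(set(training_fields) - set(runtime_fields))


# Hypothetical fields for a business assessment model.
training_fields = ["customer_age", "claim_amount", "approval_date"]
runtime_fields = ["customer_age", "claim_amount"]

suspects = unavailable_at_runtime(training_fields, runtime_fields)
print(suspects)  # → ['approval_date']
```

Fields reported by such a check are candidates for removal from the training data, since a model trained on them cannot reproduce its accuracy at prediction time.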

The pattern model was then converted into a pattern description. Table 6 presents the constructed pattern. This example analysis indicates that the RQ can be addressed by the proposed method.

6. Discussion

In the proposed method, practitioners provide insights on their ML projects by referencing the development model. This reference model is based on the project practice, and insights are collected as best practices. In the implemented practice, 12 practitioners discussed their ML projects for two hours, based on which we obtained almost as many insights as those in the literature survey-based methods [3]. Therefore, project insights can be expected to be effectively collected by practitioners using the proposed method.

Table 5 demonstrates that we can collect insights that differ from those obtained using the literature survey-based method. This is because the detailed activities during the project planning stage are represented in the reference model. For example, we obtained the following insights in the project-planning stage:

• Agree with the business division on the business goal and goal to be achieved by the ML-based service system.
• Agree with the goals and available computing resources.
• Confirm both potential users of the ML-based service systems and the number of such users.

The proposed method is expected to be usable in conjunction with the literature survey-based method. However, whether insights from ML project practices can be exhaustively collected using the proposed method is unclear. Furthermore, whether the number or quality of the collected insights depends on the experience or skill of the practitioners remains unconfirmed. These items must be investigated through continuous insight collection, and we plan to address this in future studies.

Through the implemented practice, we confirmed that we can successfully construct the patterns of ML projects using insights collected using the proposed method. Moreover, we can obtain a pattern description without missing any elements by converting the constructed model. This implies that practitioners can systematize the reusable knowledge of ML projects as patterns from the collected data without the support of experts with strong software engineering and ML skills. However, when constructing the pattern models of ML projects, knowing the quality characteristics required for ML-based service systems is necessary. Thus, determining the typical issues or risks in ML-based service systems and systematizing them as knowledge should be investigated in future studies.

7. Conclusions

In this study, we focused on projects for the development of ML service systems in which ML techniques are applied to enterprise functions. We considered a method for collecting insights on ML projects from project practices and constructing reusable knowledge as patterns from the collected insights. To this end, we proposed a reference development model for ML projects by extending a generic agile development model and an ML project workflow model. We also presented the steps for constructing the patterns as models from the EA-based generic ML architecture and design patterns. We collected 28 insights as best practices using the proposed method and confirmed that practitioners could collect insights effectively and that the patterns of ML projects could be successfully constructed based on the insights collected. Future studies should focus on investigating the quality or coverage of the collected insights and on systematizing the typical issues or risks in ML-based service systems as knowledge required for pattern development.

Acknowledgments

This work was supported by a JSPS Grant-in-Aid for Scientific Research (KAKENHI), Grant No. JP19K20416, and the JST-Mirai Project (Engineerable AI Techniques for Practical Applications of High-Quality Machine Learning-based Systems), Grant No. JPMJMI20B8.

References

1. Kumeno. Software Engineering Challenges for Machine Learning Applications: A Literature Review. Intelligent Decision Technologies. 2019; 13: 463-476.
2. Lwakatare, Raj, Bosch, Olsson, Crnkovic. A Taxonomy of Software Engineering Challenges for Machine Learning Systems: An Empirical Investigation. In: Proceedings of the 20th International Conference on Agile Software Development (XP); 2019. pp. 227-243.
3. Serban, van der Blom, Hoos, Visser. Adoption and Effects of Software Engineering Best Practices in Machine Learning. In: Proceedings of the ACM/IEEE International Symposium on Empirical Software Engineering and Measurement; 2020. pp. 3:1-3:12.
4. Amershi, Begel, Bird, DeLine, Gall, Kamar, et al. Software Engineering for Machine Learning: A Case Study. In: Proceedings of the 41st International Conference on Software Engineering; 2019. pp. 291-300.
5. Kim, Zimmermann, DeLine, Begel. The Emerging Role of Data Scientists on Software Development Teams. In: Proceedings of the 38th International Conference on Software Engineering; 2016. pp. 96-107.
6. Takeuchi, Yamamoto. AI Service System Development Using Enterprise Architecture Modeling. In: Proceedings of the 23rd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (Procedia Computer Science vol. 159); 2019. pp. 923-932.
7. Earley. Analytics, Machine Learning, and the Internet of Things. IEEE ITPro. 2015; 17(1): 10-13.
8. Demchenko, de Laat, Membrey. Defining Architecture Components of the Big Data Ecosystem. In: Proceedings of the International Conference on Collaboration Technologies and Systems (CTS); 2014. pp. 104-112.
9. Heit, Liu, Shah. An Architecture for the Deployment of Statistical Models for the Big Data Era. In: Proceedings of IEEE International Conference on Big Data; 2016. pp. 1377-1384.
10. Zimmermann, Schmidt, Jugel, Möhring. Evolution of enterprise architecture for intelligent digital systems. In: Proceedings of the 14th International Conference on Research Challenges on Information Science; 2020. pp. 145-153.
11. Zimmermann, Schmidt, Sandkuhl, Jugel, Bogner, Möhring. Evolution of Enterprise Architecture for Digital Transformation. In: Proceedings of the IEEE 22nd International Enterprise Distributed Object Computing Workshop; 2018. pp. 87-96.
12. Yokoyama. Machine Learning System Architectural Pattern for Improving Operational Stability. In: Proceedings of IEEE International Conference on Software Architecture Companion; 2019. pp. 267-274.
13. Washizaki, Uchida, Khomh, Guéhéneuc. Software Engineering Patterns for Machine Learning Applications (SEP4MLA). In: Proceedings of the 9th Asian Conference on Pattern Languages of Programs (AsianPLoP 2020); 2020.
14. Washizaki, Khomh, Guéhéneuc, Takeuchi, Okuda, Natori, et al. Software Engineering Patterns for Machine Learning Applications (SEP4MLA) – Part 2. In: Proceedings of the 27th Conference on Pattern Languages of Programs (PLoP 2020); 2020.
15. Washizaki, Khomh, Guéhéneuc, Takeuchi, Natori, Doi, et al. Software-Engineering Design Patterns for Machine Learning Applications. IEEE Computer. 2022; 55(3): 30-39.
16. Takeuchi, Doi, Washizaki, Okuda, Yoshioka. Enterprise Architecture based Representation of Architecture and Design Patterns for Machine Learning Systems. In: Proceedings of the 13th Workshop on Service oriented Enterprise Architecture for Enterprise Engineering (IEEE 25th EDOC Workshop); 2021. pp. 246-250.
17. Zhang, Cruz, van Deursen. Code Smells for Machine Learning Applications. In: Proceedings of the IEEE/ACM 1st International Conference on AI Engineering – Software Engineering (CAIN); 2022. pp. 217-228.
18. Foidl, Felderer, Ramler. Data Smells: Categories, Causes and Consequences, and Detection of Suspicious Data in AI-based Systems. In: Proceedings of the IEEE/ACM 1st International Conference on AI Engineering – Software Engineering (CAIN); 2022. pp. 229-239.
19. The Open Group. ArchiMate 3.1 – A Pocket Guide. Van Haren Publishing; 2019.
20. Lakshmanan, Robinson, Mann. Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps. O’Reilly; 2020.
21. Takeuchi, Kaiya, Nakagawa, Ogata. Reference Model for Agile Development of Machine Learning-based Service Systems. In: Proceedings of the 3rd International Workshop on Machine Learning Systems Engineering (Companion Proceedings of the 28th Asia-Pacific Software Engineering Conference); 2021. pp. 115-118.
22. Ambler, Lines. Disciplined Agile Delivery: A Practitioner’s Guide to Agile Software Delivery in the Enterprise. IBM Press; 2012.
23. Takeuchi, Doi, Kuno, Motohashi. Collecting Data of Machine Learning Projects for Deriving Insights. In: Proceedings of the 2nd International Workshop on Machine Learning Systems Engineering; 2020.
24. Mitsubishi Chemical Holdings Corporation. Machine Learning Project Canvas. Available from: https://www.mitsubishichem-hd.co.jp/news_release/pdf/190718.pdf.
