Abstract
In light of recent trends toward introducing Artificial Intelligence (AI) to enhance the Human Machine Interface (HMI), companies need to identify the key issues in communication between operators and production machines. Although industrial companies have started to introduce chatbots to assist communication between humans and machines, virtual assistants (or digital assistants) that use human natural language are still widely needed in the manufacturing domain. In this paper, we introduce an AI-based virtual assistant, Bot-X, that handles a variety of complex services for the manufacturing industry, e.g., order processing and production execution. This work develops the idea in three directions. First, we introduce the design motivation of Bot-X, e.g., the knowledge boundary in the manufacturing context. Second, we present the design principles of Bot-X, including the framework, system architecture, model architecture, and core algorithm. Third, we present three scenarios to test the usability and flexibility of Bot-X in a manufacturing environment.
Introduction
In the context of Industry 4.0, intelligent manufacturing plays a key role in monitoring and controlling manufacturing processes, diagnosing the runtime status of systems, and making intelligent decisions through real-time communication with sensors, devices, and human operators [1, 2, 3, 4]. As the most flexible entities in manufacturing systems, operators need to handle a wide range of tasks, e.g., checking the production process, identifying product defects, reconfiguring production systems, and making decisions. It is therefore crucial to design flexible and adaptable human-machine interfaces (HMIs) that enhance the interaction between machines and operators and help operators solve problems and make decisions quickly and easily [5, 6].
In general, there are two main challenges in designing a flexible and adaptable HMI. The first is creating an HMI that works on different tablets and mobile devices. With the development of IT, high-performance, low-cost portable devices such as tablets and mobile phones are becoming popular as handheld HMI devices in manufacturing. However, due to compatibility issues, not all handheld devices behave the same way, so it is hard to design a unified HMI that suits all of them. The second is supporting efficient and natural communication. Mainstream HMIs are rooted in physical interaction, e.g., keyboards, touch screens, and control panels, whereas HMIs enabled by voice commands are still a new area to be explored.
In this paper, we propose a virtual assistant, Bot-X, that supports goal-oriented communication through natural dialogue. The first phase of building Bot-X is to outline its knowledge boundary. Most manufacturing knowledge comes from three sources: domain experts (including experienced operators and managers), industry standards, and work documents. In the second phase, we propose the Bot-X framework and list the key components that enable it. In the third phase, we present our model architecture, a neural network with two hidden layers and Softmax regression that identifies human intents. For simplicity and efficiency, Bag-of-Words (BOW) [7] is used to extract features from the sentences transcribed from human speech.
To evaluate the performance of Bot-X, we integrated it with the Aalborg University Manufacturing Execution System (AAU-MES) and an ERP system based on the open-source platform Odoo [8]. Experimental results demonstrate that Bot-X creates a more user-friendly and efficient production environment for both managers and shop floor operators. Figure 1 gives an overview of Bot-X based on the three phases above.
Overview of Bot-X.
This work makes three main contributions. 1) We propose a virtual assistant, Bot-X, for building future intelligent manufacturing. Unlike most previous works, which focus on chatbots, we focus on natural spoken dialogue. 2) We collaborated with experts from the manufacturing field to design the domain-specific knowledge database of Bot-X. 3) We developed a deep learning model that identifies human intents from speech by learning key features from natural conversation, allowing it to predict human intents accurately.
The rest of this paper is organized as follows. Section 2 presents related work. Section 3 briefly introduces the motivation for Bot-X and the manufacturing cases we currently focus on, and describes the Bot-X framework, architectures, and core learning algorithm. Section 4 defines three scenarios to test Bot-X. Section 5 discusses the benefits and limitations of Bot-X. Section 6 concludes the paper and proposes lines of future research.
In recent years, Robotic Process Automation (RPA) tools for configuring task automation, e.g., Blue Prism [9], Automation Anywhere [10], and UiPath [11], have received increasing attention for automating manufacturing tasks. Although such tools enable the rapid development and deployment of automated processes and manufacturing tasks, they still depend tightly on the running environment, e.g., the operating system, programming language, and device type.
Another research stream, leveraging virtual assistants, can be traced back to the well-known chatbot ELIZA in 1966 [12]. ELIZA, the most famous chatbot in history, follows a rule-based architecture and is able to simulate the conversation of a psychotherapist. In 1971, PARRY [13], the first known chatbot to pass the Turing test, was also developed under the rule-based architecture with a focus on clinical psychology. Although rule-based chatbots provide an efficient and natural HMI that is easy to customize and control, they have two limitations. First, they rely on hand-built rules, which take a lot of time to write. Second, it is hard to pre-define all the rules before deploying the chatbot, especially in open-domain scenarios.
Corpus-based chatbots, on the other hand, learn communication patterns from human-to-human conversations. Such chatbots are trained on corpora of billions of words, e.g., from Twitter [14], Reddit, and movie dialogue [15]. Therefore, corpus-based chatbots can provide humanlike responses without hand-built rules. However, they do not consider the conversation context because they use only the immediately preceding user utterance as input. To overcome this issue, many approaches use reinforcement learning to generate appropriate responses that fit the whole conversation context [16, 17].
To enhance automated communication with human operators, many companies have started to introduce chatbots [18, 19, 20] on the shop floor to handle tasks in different scenarios, e.g., information querying, system diagnosis, and production control. Nevertheless, the major problems of chatbots are that they are time-consuming to use and fail to handle natural dialogue through voice control. Because chatbots are text-based, a conversation between an operator and a chatbot has to go through typing. An average professional typist types at 50 to 80 words per minute, while speech can reach 150–160 words per minute [21, 22]. Moreover, a chatbot is programmed to deal with structured dialogue and cannot understand the slang used in everyday conversation or analyze the sentiment of sentences. Therefore, several voice-enabled virtual assistants, e.g., Alexa from Amazon [23], Google Assistant [24], Cortana from Microsoft [25], and Siri from Apple [26], have been developed and are widely used in entertainment and personal contexts [27]. The key feature of these virtual assistants is their robust natural language processing (NLP) capability, which lets them handle continuous natural dialogue [28, 29].
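A short worked calculation makes the typing-versus-speech gap concrete. The 30-word command length is an assumption chosen only for illustration; the rates are the ones quoted above. Under these assumptions, speaking a command is roughly two to three times faster than typing it.

```python
def seconds_to_produce(words: int, words_per_minute: float) -> float:
    """Time in seconds needed to type or speak `words` words at a given rate."""
    return words / words_per_minute * 60.0


COMMAND_WORDS = 30  # hypothetical length of one operator command

# Rates quoted above: typing 50-80 words/min, speaking 150-160 words/min.
typing_fast, typing_slow = (seconds_to_produce(COMMAND_WORDS, r) for r in (80, 50))
speech_fast, speech_slow = (seconds_to_produce(COMMAND_WORDS, r) for r in (160, 150))

print(f"typing: {typing_fast:.2f}-{typing_slow:.2f} s")    # typing: 22.50-36.00 s
print(f"speaking: {speech_fast:.2f}-{speech_slow:.2f} s")  # speaking: 11.25-12.00 s
```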
Proposed Bot-X
In this section, we first present the motivation behind the development of Bot-X through manufacturing cases based on the Smart Production Line at Aalborg University [30]. The motivation answers four research questions: 1) what is the background for building such an advanced HMI, 2) how to define the knowledge boundary, 3) what value it can create for the manufacturing domain, and 4) how it can be promoted to different industrial cases. Then, we describe the design principles of Bot-X.
Motivation
The AAU Smart Production Lab is an interdisciplinary platform for research and teaching in the Industry 4.0 and smart factory domain. It is located at Aalborg University in Denmark, as described in [31]. At the center of the lab is a smart production line, the AAU Smart Production Line (AAU SPL). Figure 2 shows the digital model of the AAU SPL. The cases involving the development of Bot-X that are introduced later are based on the AAU SPL.
Aalborg University smart production line.
The Robotics & Automation group of Aalborg University had been considering introducing a voice control service to the current SPL using leading techniques, e.g., deep learning and TensorFlow [32]. The motivation is to speed up intelligent manufacturing processes by creating a smart and flexible interaction channel between machines (e.g., production equipment, ERP systems) and human operators (e.g., salespersons, shop floor workers). Therefore, the group decided to conduct a proof-of-concept study.
The study began with an investigation of the smart production environment used in the case study, hosted at Aalborg University. A three-week series of meetings was held with business experts, production engineers, mechanical engineers, and computer scientists. The goal was to outline the key features directly related to the virtual assistant, Bot-X. The business experts identified the critical manufacturing processes, order generation, and material availability checking. The mechanical engineers defined the interfaces, communication protocols, and parameters of the production line, including the production equipment and network. To monitor the status of the production system, the team identified the possible states of the production modules and classified them accordingly, which allowed the production engineers to select the relevant production processes for production scheduling. To integrate Bot-X with the production system, the computer scientists identified the core functions of Bot-X and designed the corresponding architecture.
As a basis for this research project, the domain knowledge of experienced engineers, research articles, and corporate reports obtained from empirical case studies were used to formulate refined requirements based on the explicated problem.
For the manufacturing usage scenarios, we selected a general production workflow consisting of sales order generation, material checking, production scheduling, and production execution. The knowledge involved spans different automation levels. First, a salesperson needs basic information, including the customer's contact information and the product to be produced, to create a sales order. Second, the production manager checks the inventory to make sure the current materials are sufficient for production and sets the production deadline for the order. Finally, the shop floor worker focuses on the actual production process, which may involve switching machines on or off and checking the production status.
Since the knowledge needed to fulfill a general production process may involve different partners, the virtual assistant must understand the underlying knowledge and respond to users quickly and accurately. Table 1 shows examples of the Bot-X knowledge boundary, goals, key vocabulary, and the corresponding functions that perform the related actions.
Example Bot-X knowledge repository
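The contents of Table 1 are not reproduced here, but one plausible way to store such a knowledge repository is shown below, using a tag/patterns/responses layout that is common for intent-classification chatbots. Every tag, pattern, response template, and action name in this sketch is hypothetical and chosen only to mirror the three scenarios described in this paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IntentEntry:
    """One hypothetical knowledge-repository entry for a task-oriented intent."""
    tag: str                      # intent label the classifier predicts
    patterns: List[str]           # sample utterances used as training data
    responses: List[str]          # response templates; {slots} are filled from entities
    required_entities: List[str] = field(default_factory=list)
    action: Optional[str] = None  # name of the application call to dispatch


KNOWLEDGE_REPOSITORY = [
    IntentEntry(
        tag="create_sales_order",
        patterns=["produce 6 flowers for John", "create a sales order", "I need a new order"],
        responses=["Creating a sales order of {quantity} {product} for {customer}."],
        required_entities=["customer", "product", "quantity"],
        action="generate_sales_order",
    ),
    IntentEntry(
        tag="check_materials",
        patterns=["check the materials", "are there enough bricks", "check the inventory"],
        responses=["Checking the bill of materials for order {order}."],
        required_entities=["order"],
        action="create_manufacturing_order",
    ),
    IntentEntry(
        tag="start_production",
        patterns=["please start the production now", "start production"],
        responses=["Starting the planned production."],
        action="start_production",
    ),
]


def find_intent(tag: str) -> Optional[IntentEntry]:
    """Look up an entry by its intent tag; returns None for unknown tags."""
    return next((entry for entry in KNOWLEDGE_REPOSITORY if entry.tag == tag), None)
```

For example, `find_intent("create_sales_order").responses[0].format(quantity=6, product="flowers", customer="John")` yields the reply "Creating a sales order of 6 flowers for John."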
Such a virtual assistant creates value by helping to identify system information at different layers, from business to manufacturing; diagnosing the runtime status of systems, e.g., system errors and anomalous behavior; reducing the time spent typing commands; assisting operators with routine work; and providing work instructions.
In our case, the team decided that the purpose of introducing a virtual assistant is to improve the level of service for existing users. The virtual assistant is integrated with the open-source business platform Odoo. It is intended to 1) support the salesperson in generating sales orders, 2) help the production manager check the materials and generate work orders automatically, and 3) assist shop floor operators in monitoring equipment states and send a notification when production is done.
The following section describes the design principle in detail.
In general, most virtual assistants can be categorized as social-oriented or task-oriented. Social-oriented virtual assistants, e.g., Amazon Alexa and Google Assistant, focus on daily human life and are designed to carry on conversations about everyday topics, e.g., weather forecasts and news. Task-oriented virtual assistants, on the other hand, are designed to help humans complete specific domain-related tasks, e.g., booking taxis or providing customer service. Since the goal of Bot-X is to serve the manufacturing domain, it is designed as a task-oriented virtual assistant whose domain-specific knowledge focuses on product production processes.
The Bot-X design consists of three processes: speech-to-text conversion, task-oriented response model generation, and text-to-speech conversion.
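The three processes form a simple chain, which can be sketched as follows. This assumes each stage is a plain function supplied by the caller; the class name `BotXPipeline` and its field names are hypothetical, and in a real deployment the stand-ins would be replaced by a speech-to-text API wrapper, the trained response model, and a text-to-speech engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BotXPipeline:
    """Hypothetical three-stage pipeline: speech-to-text, response model, text-to-speech."""
    speech_to_text: Callable[[bytes], str]  # e.g. a wrapper around a cloud STT service
    respond: Callable[[str], str]           # task-oriented response model
    text_to_speech: Callable[[str], bytes]  # e.g. a wrapper around an offline or online TTS engine

    def handle(self, audio: bytes) -> bytes:
        """Run one user turn through all three stages and return synthesized audio."""
        text = self.speech_to_text(audio)
        reply = self.respond(text)
        return self.text_to_speech(reply)


# Usage with trivial stand-ins (bytes are used as a fake "audio" format here).
pipeline = BotXPipeline(lambda audio: audio.decode(), str.upper, str.encode)
print(pipeline.handle(b"start production"))  # b'START PRODUCTION'
```

Keeping the stages as injected functions lets each backend be swapped (for example, an offline and an online voice) without touching the dialogue logic.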
Speech-to-text service
The first step of Bot-X development is implementing voice recognition; that is, Bot-X should be able to recognize the human voice and translate it into text for later processing.
Numerous speech-to-text web APIs can be used to power Bot-X. Three popular services are the following. Google Speech-to-Text API: driven by machine learning models, it transcribes audio into text and currently supports 120 languages. IBM Watson API: it supports customized speech recognition models that fit users' specific requirements; seven languages are supported so far. SpeechAPI: it supports noise suppression from a variety of sources and speech classification.
In our work, we chose the Google Speech-to-Text API for Bot-X because of its broad language support.
Task-oriented response model generation
Since Bot-X is developed to serve the industrial domain, its dialogues are mostly specific to manufacturing rather than open-domain conversations. Therefore, a task-oriented virtual assistant is chosen, which can take control of the dialogue or direct the user to respond with a keyword. For this purpose, we focus on potential problems that could occur during the manufacturing processes.
Data collection. To generate the response model, the first task is to collect data, which should come from departments directly related to manufacturing. We outline three questions for data collection: 1) Which departments might need to interact with Bot-X directly? 2) What key information (e.g., order information, customer information, production status) should we focus on? 3) What are the common question-response pairs? Answering these questions allows us to narrow down the domain knowledge and pre-process the data (e.g., building the database and generating question-response pairs).
Natural language processing workflow.
Model development. To carry on natural conversations with humans precisely and efficiently, the virtual assistant is designed according to the natural language processing workflow shown in Fig. 3. The main focus is on the four processes of response model generation, each of which contains corresponding algorithms working in parallel. Figure 4 describes these four core processes: Parser, Analyzer, Executor, and Generator. The unstructured raw data of the pre-defined scenarios, which cover the business, manufacturing, and control levels, are extracted as the data source. These data are then pre-processed to reduce the vocabulary and remove interference words (e.g., punctuation). The semantic Parser takes as input the sequence of words produced by the Speech-to-Text service, normalizes it, and extracts features from the natural language text. It infers a semantic representation by identifying the key named entities in the word sequence. Clarification questions are used to confirm the user's commands; for example, when performing a sales order generation task, Bot-X asks "Who is the customer?" to confirm the customer name. The Analyzer identifies and scores human intents, using the pre-defined question-answer pairs and user profiles to train and update the model. The Generator produces the final response from the response templates according to the scored intents, and the Executor updates the models with new data samples.
Most recent works build such learning models with deep learning techniques. For Bot-X, model building is treated as identifying the user's intent in order to choose the correct answer. For this purpose, we use a fully connected feed-forward network.
Two hidden layers are chosen [33]. The feed-forward network can learn specific relationships between tokens, such as “production” and “sales order”, that appear together in a sample but in different positions. Using the feed-forward network, Bot-X can extract question patterns from the data and identify the human intent.
Four processes of response model generation.
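The pre-processing step (reducing the vocabulary and removing interference words such as punctuation) and the Bag-of-Words feature extraction can be sketched with the standard library alone. The stop-word list and the regular-expression tokenizer are assumptions, since the paper does not specify them; stemming, which many BoW pipelines add, is omitted.

```python
import re
from typing import Iterable, List

# Hypothetical stop-word list; the paper does not specify which words it removes.
STOP_WORDS = {"a", "an", "the", "please", "to", "for", "of"}


def tokenize(sentence: str) -> List[str]:
    """Lowercase, drop punctuation, split into words, and remove stop words."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_vocabulary(patterns: Iterable[str]) -> List[str]:
    """Sorted list of every distinct token in the training patterns."""
    return sorted({w for p in patterns for w in tokenize(p)})


def bag_of_words(sentence: str, vocabulary: List[str]) -> List[int]:
    """Binary BoW vector: 1 if the vocabulary word occurs in the sentence, else 0."""
    tokens = set(tokenize(sentence))
    return [1 if w in tokens else 0 for w in vocabulary]


patterns = ["Create a sales order for John.", "Please start the production now!",
            "Check the materials, please."]
vocab = build_vocabulary(patterns)
print(vocab)  # ['check', 'create', 'john', 'materials', 'now', 'order', 'production', 'sales', 'start']
print(bag_of_words("Start production!", vocab))  # [0, 0, 0, 0, 0, 0, 1, 0, 1]
```

The length of the vocabulary fixes the width of the network's input layer, so adding training patterns can change the number of input features.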
The last step of Bot-X is to convert the generated text response into spoken audio quickly and easily.
Several text-to-speech APIs are available; four popular ones are the following. Pyttsx, a popular text-to-speech wrapper, is widely used in both academia and industry and provides different speech engines for macOS, Windows, and Linux. gTTS, on the other hand, is a library that uses Google's text-to-speech service; it supports many languages and provides a near-natural human sound. Amazon Polly uses deep learning technologies to synthesize human-like speech from text. It supports a variety of languages and allows customers to create speech-enabled applications by calling its APIs [34]. IBM Watson Text to Speech provides a user experience similar to Polly: it converts text into natural-sounding audio and supports various languages [35].
To enhance the user experience, Bot-X supports the Pyttsx library for offline scenarios and also uses the Amazon Polly service to deliver high-quality, natural-sounding voice output for online end users.
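The online/offline split can be sketched as a fallback: try Amazon Polly when the network is reachable and drop back to pyttsx3 otherwise. The boto3 `synthesize_speech` call and the pyttsx3 `init`/`say`/`runAndWait` calls are the libraries' documented entry points, but the voice ID, the connectivity probe (a TCP connection to a public DNS server), and the function names are assumptions of this sketch, and playback of Polly's MP3 output is left out.

```python
import socket
from typing import Callable


def polly_synthesize(text: str, voice_id: str = "Joanna") -> bytes:
    """Online backend: Amazon Polly via boto3 (needs AWS credentials and network)."""
    import boto3  # imported lazily so the offline path works without boto3 installed
    client = boto3.client("polly")
    response = client.synthesize_speech(Text=text, OutputFormat="mp3", VoiceId=voice_id)
    return response["AudioStream"].read()  # MP3 bytes; playing them is not shown here


def pyttsx_speak(text: str) -> None:
    """Offline backend: speak through the local engine selected by pyttsx3."""
    import pyttsx3
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


def is_online(host: str = "8.8.8.8", port: int = 53, timeout: float = 1.0) -> bool:
    """Crude connectivity probe: can we open a TCP connection to a public DNS server?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def speak(text: str,
          online_backend: Callable[[str], object] = polly_synthesize,
          offline_backend: Callable[[str], object] = pyttsx_speak,
          online_check: Callable[[], bool] = is_online) -> str:
    """Prefer the online voice, fall back to the offline engine; return the backend used."""
    if online_check():
        try:
            online_backend(text)
            return "online"
        except Exception:
            pass  # e.g. missing credentials or a network error: fall through to offline
    offline_backend(text)
    return "offline"
```

Injecting the backends and the connectivity check keeps the fallback logic testable without audio hardware or cloud credentials.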
System architecture
Bot-X is a task-oriented virtual assistant powered by three key layers: the library layer, the dialog layer, and the application layer. Figure 5 shows the framework design of Bot-X.
Library layer. This layer includes all core libraries, e.g., tflearn, pyttsx3, boto3, and TensorFlow, which provide the functions and APIs that Bot-X needs to understand text, train models, and generate responses.
Dialog layer. The dialog layer implements two key modules: the intent identification module and the named-entity recognition (NER) module. We use a two-hidden-layer feed-forward neural network so that the intent identification module can understand human utterances precisely and efficiently. Key information is extracted by the NER module using a trained spaCy model or Wit.ai, provided by Facebook. In addition, this layer holds a knowledge repository containing the labeled sample dataset.
Application layer. To support the manufacturing domain, this layer provides several application interfaces, e.g., sales order generation, work order generation, and production control. In our case, these interfaces are integrated with the popular open-source ERP system Odoo and with the AAU Manufacturing Execution System. More details of the integration and experiments are given in Section 4. Bot-X also supports some extra functions, e.g., checking calendar events and detecting human emotions, which help track the daily production schedule and identify workers' mental conditions (e.g., tired, happy) by analyzing their facial expressions.
Framework of Bot-X.
In this section, we first describe the software architecture of Bot-X and then present the model architecture.
Since Bot-X is a Python-based application, it can run on different platforms, e.g., Windows and Linux. Bot-X has four core components: dialogue I/O, natural language understanding (NLU), the dialog manager, and the natural language generation (NLG) manager. The dialogue I/O component translates human voice into text and vice versa using the Google speech recognition service and the Amazon Polly service, respectively. The NLU component parses the user's utterance and identifies the human intent. The dialog manager tracks the dialog state and chooses the pre-defined action policy for application calls. The NLG manager generates the most reasonable response.
The software architecture of Bot-X.
The scoring model is a two-hidden-layer feed-forward neural network. The input layer consists of 79 features extracted from the question-response pairs. Each of the two hidden layers contains 16 hidden units. The fourth layer is the output layer, which contains 9 output units representing the probabilities of the intents; these are computed by applying a Softmax activation function. The model is illustrated in Fig. 7.
Computational graph for scoring model.
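The forward pass of the 79-16-16-9 scoring model can be written out in plain Python to make the layer shapes explicit. This is an untrained sketch: the weights are random, and the ReLU hidden activation and the initialization scheme are assumptions, since the paper specifies only the layer sizes and the Softmax output.

```python
import math
import random

LAYER_SIZES = [79, 16, 16, 9]  # input features, two hidden layers, intent probabilities


def init_params(sizes, seed=0):
    """Random weights (scaled by 1/sqrt(fan-in)) and zero biases for each layer."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, params):
    """Fully connected layers; ReLU on hidden layers, Softmax on the output layer."""
    h = x
    for i, (weights, biases) in enumerate(params):
        z = [sum(w * a for w, a in zip(row, h)) + b for row, b in zip(weights, biases)]
        h = softmax(z) if i == len(params) - 1 else relu(z)
    return h


params = init_params(LAYER_SIZES)
probs = forward([1.0] * 79, params)  # stand-in for a 79-dimensional BoW vector
best_intent = max(range(len(probs)), key=probs.__getitem__)
```

In tflearn, which the library layer lists, the same shape is commonly built from `tflearn.input_data`, two `tflearn.fully_connected` layers, a Softmax `fully_connected` output layer, and `tflearn.regression` for training.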
As described in the previous section, the proposed approach uses the Bag-of-Words model to handle questions in the manufacturing domain. The model helps to identify entity names and tags (e.g., product name and quantity) in the user's utterance for known and unknown entities and tags. If the user's intent is identified but the entities are not found, Bot-X asks clarification questions to confirm the entities. If both entities and tags are present, Bot-X dispatches an API call to perform the action(s) and returns responses that follow pre-defined templates. The following is the pseudocode of our main algorithms.
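The decision logic described above (re-prompt, ask a clarification question, or dispatch an action) can be sketched as a single function. This is not the authors' pseudocode, which is not reproduced here; the confidence threshold of 0.7, the prompt texts, and the example wiring are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical clarification prompts, one per entity that can be missing.
CLARIFICATION_QUESTIONS = {
    "customer": "Who is the customer?",
    "product": "Which product should be produced?",
    "quantity": "How many units are needed?",
}


def next_turn(intent: Optional[str], confidence: float, entities: Dict[str, str],
              required: Dict[str, List[str]], actions: Dict[str, Callable[..., str]],
              threshold: float = 0.7) -> str:
    """Decide the assistant's next utterance: re-prompt, clarify, or act."""
    # 1) Intent unknown or classifier not confident enough: ask the user to rephrase.
    if intent is None or confidence < threshold:
        return "Sorry, I did not understand. Could you rephrase that?"
    # 2) Intent known but a required entity is missing: ask about the first one.
    needed = required.get(intent, [])
    missing = [name for name in needed if name not in entities]
    if missing:
        return CLARIFICATION_QUESTIONS.get(missing[0], f"Please specify the {missing[0]}.")
    # 3) All entities present: dispatch the application call with the required entities.
    if intent not in actions:
        return "I cannot perform that action yet."
    return actions[intent](**{name: entities[name] for name in needed})


# Example wiring for the sales-order intent (the lambda stands in for the ERP call).
REQUIRED = {"create_sales_order": ["customer", "product", "quantity"]}
ACTIONS = {"create_sales_order":
           lambda customer, product, quantity: f"Sales order created: {quantity} {product} for {customer}."}
```

Asking for one missing entity per turn keeps each clarification question short, which suits spoken interaction.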
Case study
Manufacturing environments are typically highly organized, structured, controlled, and rule-based. The related business operations and processes are normally located upstream, and salespersons and shop floor workers/robots do not share the same workspace. Therefore, it is important to have a unified platform through which different types of users in a manufacturing company can share and monitor the whole process. As a testbed, we chose the open-source business platform Odoo, which provides modules and interfaces for both business and manufacturing purposes. By integrating Bot-X into this platform, the entire production process, i.e., sales order generation, manufacturing order creation, and production execution, can be combined into an integrated solution that helps small and medium-sized enterprises (SMEs) build more efficient and productive intelligent factories. An example of the current work environment is shown in Fig. 8.
Bot-X was tested in different scenarios focusing on assisted sales order generation, assisted manufacturing order creation, and assisted production control. The demo is available online.1
In the first scenario, Bot-X extracts sales information, consisting of the customer name, product name, and product quantity, from a natural conversation with the user. Usually, this information is entered into the ERP system manually by operators. Here, an operator initiates a dialog to command Bot-X to perform a task, e.g., produce 6 flowers (the flower is made of Lego bricks) for John. Bot-X uses its trained intent-classification model to predict the human intent and infers the named entities from the user utterance. If Bot-X fails to identify an entity in the user utterance, a clarification dialog is used to update the confidence from the user's response. Table 2 gives some examples of the clarification dialog policy. Based on the responses to the confirmation questions, Bot-X stores the identified entities and maps them to the input parameters of the Selenium API calls for sales order generation. A conversation concludes when the operator has confirmed the entities. Then, Bot-X invokes the Selenium API to open the ERP system and automatically perform the sales order generation process, i.e., searching for and choosing the customer and product in the system and assigning the right quantity to the sales order.
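Bot-X's NER module uses a trained spaCy model or Wit.ai; the sketch below is a deliberately simpler, rule-based stand-in that shows what the extraction step must produce for an utterance such as "produce 6 flowers for John". It assumes the lists of known customers and products are available (for example, exported from the ERP); the function name and number-word table are hypothetical. Any entity it cannot find is simply absent from the result, which is what triggers the clarification dialog.

```python
import re
from typing import Dict, Iterable, Union

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def extract_sales_entities(utterance: str, customers: Iterable[str],
                           products: Iterable[str]) -> Dict[str, Union[str, int]]:
    """Rule-based stand-in for NER: find customer, product, and quantity in an utterance."""
    text = utterance.lower()
    entities: Dict[str, Union[str, int]] = {}
    for customer in customers:  # known customer names, e.g. as listed in the ERP
        if re.search(rf"\b{re.escape(customer.lower())}\b", text):
            entities["customer"] = customer
            break
    for product in products:  # also accept a simple plural ("flowers" for "flower")
        if re.search(rf"\b{re.escape(product.lower())}s?\b", text):
            entities["product"] = product
            break
    for word in re.findall(r"[a-z0-9]+", text):  # first numeral or number word
        if word.isdigit() or word in NUMBER_WORDS:
            entities["quantity"] = int(word) if word.isdigit() else NUMBER_WORDS[word]
            break
    return entities
```

For "Produce 6 flowers for John" with customers `["John", "Mary"]` and products `["flower", "car"]`, this returns `{"customer": "John", "product": "flower", "quantity": 6}`.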
Samples of Bot-X dialog policy of sales order generation
Experiment environments.
The second scenario concerns creating a manufacturing order: Bot-X checks the inventory to verify whether there are enough materials to produce the product. This task is often carried out in the warehouse: once the sales order is specified, it is automatically sent to the warehouse, where an employee checks the bill of materials (BOM) in the database or inventory system. The manufacturing order is created if the reserved quantity of the required materials is sufficient for production. Although this task does not require highly skilled workers, manually confirming and verifying a massive number of BOMs can lead to low efficiency and accuracy. Bot-X can perform this task precisely and reliably. After receiving the voice commands, Bot-X directly invokes the Selenium API to track the materials in the inventory system and verify whether the stored materials meet the requirements of the sales order. The clarification dialog is also used to confirm the human intent. Table 3 shows some examples of the clarification dialog.
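The material check itself reduces to simple arithmetic on the BOM: multiply each per-unit quantity by the order quantity and compare the result with the available stock. In Bot-X these values are read from Odoo through Selenium; here they are plain dictionaries, and the BOM for one Lego flower and the stock levels are invented for illustration.

```python
from typing import Dict


def material_shortages(bom: Dict[str, int], order_quantity: int,
                       available: Dict[str, int]) -> Dict[str, int]:
    """Return {material: missing units}; an empty dict means the order can be produced."""
    shortages = {}
    for material, per_unit in bom.items():
        required = per_unit * order_quantity
        missing = required - available.get(material, 0)
        if missing > 0:
            shortages[material] = missing
    return shortages


# Hypothetical BOM for one Lego flower and a hypothetical stock level.
FLOWER_BOM = {"green_brick_2x4": 1, "red_brick_2x2": 4, "yellow_brick_1x1": 1}
stock = {"green_brick_2x4": 10, "red_brick_2x2": 20, "yellow_brick_1x1": 3}

print(material_shortages(FLOWER_BOM, 6, stock))
# {'red_brick_2x2': 4, 'yellow_brick_1x1': 3}
```

A non-empty result is the natural point for the assistant to report the shortage instead of creating the manufacturing order.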
Samples of Bot-X dialog policy of manufacturing order generation
The third scenario we tested is production control. Normally, production execution is started by shop floor operators, which may require an operator to manually operate a physical switch or press a button on a touch screen through an HMI. Although some physical operations have to be done manually, others (e.g., sending commands to the programmable logic controller (PLC) of a production module) can be performed through a virtual channel. Bot-X is then expected to retrieve information on the manufacturing order, production scheduling, and equipment status from the AAU-MES. In our case, Bot-X acts as a middle agent between the production modules and the Odoo ERP/AAU-MES system. The user invokes it with a simple command, e.g., “please start the production now”. After identifying the human intent, Bot-X checks the production schedule in the AAU-MES to see whether any production is planned and needs to be executed. Bot-X uses a clarification dialog to confirm with the operator whether the production is ready, i.e., the planned production has been found and all production modules are ready to work. After receiving the commands and production parameters from Bot-X, the PLC of each production module controls the module to perform the production. Table 4 shows some examples of the dialog policy for production control.
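The readiness check that precedes the clarification dialog can be sketched as follows. The `PlannedJob` structure, the module identifiers, and the `"ready"` state string are hypothetical stand-ins for what the AAU-MES would report; sending the actual commands to the PLCs is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PlannedJob:
    order_id: str
    modules: List[str]  # production modules the job needs


def check_ready(schedule: List[PlannedJob],
                module_states: Dict[str, str]) -> Tuple[bool, str, Optional[PlannedJob]]:
    """Return (ready, message for the clarification dialog, job to start)."""
    if not schedule:
        return False, "No planned production was found.", None
    job = schedule[0]  # assumes the schedule is already ordered by start time
    not_ready = [m for m in job.modules if module_states.get(m) != "ready"]
    if not_ready:
        return False, f"These modules are not ready: {', '.join(not_ready)}.", job
    return True, f"Order {job.order_id} is ready. Shall I start the production?", job
```

Only when the first element is `True` and the operator confirms would the assistant forward the commands and production parameters to the modules.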
Samples of Bot-X dialog policy of production control
To overcome the shortcomings of RPA, we developed Bot-X to support different platforms with a smooth learning curve. Python, which is widely used as an integration language to glue together existing components written in other languages (e.g., C/C++, Java), was chosen as the back-end programming language. Thus, the “brain” of Bot-X can be deployed on any server regardless of hardware/software compatibility issues. The handheld devices only need to provide a user-friendly front-end interface that receives and forwards operators' requests, leaving the computation to the back-end server. Combining this front-end interface with the manufacturing domain knowledge of Bot-X facilitates human-machine communication for building intelligent manufacturing.
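The thin-client split can be sketched with the standard library: handheld devices POST the transcribed text as JSON, and a Python back end returns the reply. The JSON shape (`{"text": ...}` in, `{"reply": ...}` out), the port, and the function names are assumptions; `wsgiref` is suitable only for development, not production deployment.

```python
import json
from typing import Callable
from wsgiref.simple_server import make_server


def handle_request(body: bytes, respond: Callable[[str], str]) -> bytes:
    """Decode {"text": ...} sent by a handheld client and return a JSON reply."""
    try:
        text = json.loads(body.decode("utf-8"))["text"]
    except (ValueError, KeyError, TypeError):  # bad UTF-8/JSON, missing field, wrong shape
        return json.dumps({"error": "expected JSON with a 'text' field"}).encode("utf-8")
    return json.dumps({"reply": respond(text)}).encode("utf-8")


def make_app(respond: Callable[[str], str]):
    """Wrap handle_request as a WSGI application."""
    def app(environ, start_response):
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = handle_request(environ["wsgi.input"].read(length), respond)
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(payload)))])
        return [payload]
    return app


def serve(respond: Callable[[str], str], port: int = 8000) -> None:
    """Blocking development server; call explicitly, e.g. serve(my_response_model)."""
    with make_server("0.0.0.0", port, make_app(respond)) as server:
        server.serve_forever()
```

Because the client only sends and receives JSON, any device with an HTTP stack can act as the front end, which is the compatibility benefit described above.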
Through natural dialogue, Bot-X understands user utterances grounded in the manufacturing environment, and natural language systems can extract valuable information for later operations based on semantic analysis. However, several challenges still need to be addressed before the virtual assistant is deployed on the production line, for example, ambient noise. The experiments above show that the microphone picked up a lot of ambient noise during the conversation, and this noise affects the accuracy of identifying entities in user utterances. Currently, Bot-X sets the dynamic energy threshold to automatic, which adjusts the microphone sensitivity to filter out ambient noise automatically. However, the adjustment process is also time-consuming, which may reduce efficiency.
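Why automatic adjustment costs time can be seen from a sketch of an exponentially smoothed energy threshold, modeled on the update rule used by the open-source SpeechRecognition package (whose `dynamic_energy_threshold` option appears to be the setting referred to above). The constants 0.15 (damping) and 1.5 (ratio) mirror that package's defaults and, like the buffer size, are assumptions here.

```python
import math
from typing import Sequence


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square energy of one audio buffer (e.g. 16-bit PCM sample values)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def update_threshold(threshold: float, buffer_energy: float, seconds_per_buffer: float,
                     damping: float = 0.15, ratio: float = 1.5) -> float:
    """One step: move the threshold toward ratio * ambient energy (exponential smoothing)."""
    decay = damping ** seconds_per_buffer  # fraction of the old threshold kept per buffer
    target = buffer_energy * ratio
    return threshold * decay + target * (1 - decay)


def settle_time(initial: float, ambient_rms: float, seconds_per_buffer: float,
                tolerance: float = 0.05, damping: float = 0.15, ratio: float = 1.5) -> float:
    """Seconds of audio needed before the threshold is within `tolerance` of its target."""
    if seconds_per_buffer <= 0 or not 0 < damping < 1:
        raise ValueError("need seconds_per_buffer > 0 and 0 < damping < 1")
    target = ambient_rms * ratio
    threshold, elapsed = initial, 0.0
    while abs(threshold - target) > tolerance * target:
        threshold = update_threshold(threshold, ambient_rms, seconds_per_buffer, damping, ratio)
        elapsed += seconds_per_buffer
    return elapsed
```

With a starting threshold of 300, a steady ambient RMS of 1000, and 1024-sample buffers at 16 kHz (64 ms each), `settle_time(300, 1000, 1024 / 16000)` returns about 1.47 s (23 buffers) before the threshold is within 5% of its target; a louder or changing environment stretches this further.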
Conclusions
In this work, a voice-controlled virtual assistant, Bot-X, is developed and applied to manufacturing environments. In particular, Bot-X is tested in three scenarios to demonstrate its flexibility and ease of use by introducing natural language interaction between humans and machines.
Unlike the entertainment domain, the manufacturing environment mainly involves limited, fixed, and atomic actions, e.g., powering on a machine, picking up materials, and creating a sales order. Introducing higher levels of interactive dialogue reduces the complexity of operating such systems (e.g., ERP systems, production systems), reduces repetitive operations, and improves productivity. The most important aspect of this kind of interaction is that the virtual assistant learns to understand human intents and executes commands without human intervention.
Future work will focus on reducing the influence of ambient noise on Bot-X. In addition, several interviews are planned to collect feedback from our industrial partners on designing user-friendly dialogue templates.
Footnotes
Acknowledgments
This research work is partially funded by the Manufacturing Academy of Denmark.
Authors’ Bios
Chen Li holds a PhD in computer science and technology from Shanghai Jiao Tong University. He currently works as an Assistant Professor in the Department of Materials and Production at Aalborg University. His research areas are System Modelling, Big Data analysis, Smart Production, Predictive Maintenance, and Human Machine Interface. His goal is to apply the experience he has gained to help build an AI-based platform for managing, monitoring, and optimizing the manufacturing systems of Industry 4.0. Contact him at cl@mp.aau.dk.
Hongji Yang is a professor in the School of Informatics at the University of Leicester, UK. He received his B.Sc. and M.Sc. degrees in computer science from Jilin University, China, in 1982 and 1985, respectively, and his Ph.D. degree in computing from Durham University, UK, in 1994. His research interests cover Software Engineering, Creative Computing, and Web and Distributed Computing. Contact him at hongji.yang@leicester.ac.uk.
