Abstract
Manufacturers perpetually adapt their systems to meet unforeseen events, new objectives, competition, and improved understanding of processes. In that human-directed work, models mediate an enduring relationship between production resources and engineers. Accommodating new understanding in the models controlling production can lead to more effective manufacturing. That work has previously been the province of quality programs such as Six Sigma, but is now fertile ground to study human-computer interaction about that enduring relationship mediated by models. Can AI augment human capability in the arcane work of formulating and refining models? This question is relevant to complex system engineering generally, not just manufacturing. In answering this question, this paper adapts Klein’s flexecution for use in adaptable manufacturing systems. Theory flexecution, the methodical refinement of models, points to human-computer interactions that emphasize the roles of models, explanation, and machine agents that recognize the engineer’s goals. This perspective article illustrates these ideas with an example of formulating models for production scheduling.
Introduction
To improve system performance, engineers continually adapt manufacturing systems while the systems are in service; this has been true of manufacturing for several decades. (See, for example, Weckenborg et al. (2023).) The performance of other system types might be improved were they also adaptable by skilled users. One can imagine, for example, the value of a surgeon perfecting the operation of a robotic-assisted surgery system. But what characteristics of the system would enable this? Increasingly, the answer to this question is the ability for joint (AI/human) work with models of the system. Broadly speaking, today automation drives much of production, and models drive both automation and the routine manual activities of production. Today, the fitness of system models in the face of impending change is a principal element of manufacturing and industrial engineers’ situation awareness. The paper describes an opportunity to re-examine, through the lens of management science, the ways we use models.
The notion of model intended here is a broad one, encompassing “actionable” mental models, programs, and (sometimes executable) formal models of decision processes, plans, and digital twins. Models, whether they are mental or formal, serve to routinize production; models are what make automation tick.
The act of conceiving models, variously known as modeling or formulation, is a hallmark of intelligent behavior and situation awareness (see, for example, Endsley (1995)). It is a perennial topic of the philosophy of science. One may argue that machine agents are also capable of conceiving models; for example, unsupervised learning and plan recognition (e.g., Sukthankar et al. (2014)) are abstractive processes essential to formulation. But, of course, these processes are not self-acting.
Models can be shared and thereby help stakeholders establish common ground about the subject the models concern. Models can serve the role of public object in constructionist learning (see, for example, Stager (2005), Papert and Harel (1991), and Psenka et al. (2017)). This perspective article asks whether predictable patterns and conventions used in modeling in specific settings can be recognized by AI agents, so that the agents can assist in modeling tasks. Specifically, is there a role for machine agents in the tasks of formulating and refining the models driving production processes?
Joint cognitive work (JCW) is cognitive work (e.g., planning, perceiving, deliberating) in which human and AI agents interact on cognitive tasks. This article builds upon existing work in JCW including (i) Checkland’s notion of a learning system described in the Soft System Methodology (Checkland (2000)), (ii) works about establishing common ground in joint work (e.g., Wray et al. (2022) and Klein et al. (2004)), and (iii) cognitive systems engineering (e.g., Woods and Hollnagel (2006)). Rather than seeing the future of manufacturing simply as isolated attempts to increase automation, it is useful to envision a future in which certain cognitive tasks are executed through a partnership between human and AI. The frontier of the effort to automate might then be pursued as JCW to methodically refine models and implement new sensing. That is the perspective on human-centered AI (HCAI) (see Riedl (2019)) and management science presented in this article. Many cognitive tasks could be performed as JCW with models. This is becoming increasingly obvious with the emergence of large language models (LLMs; see Zhao et al. (2023)) that perform programming tasks. The article discusses some foundational concepts using as an example the joint formulation of production scheduling solutions.
The cognitive work of manufacturing has evolved over many years. Manufacturing and similar complex system engineering could benefit from a new paradigm about how these systems are to be adapted; this is discussed in the following section. Following that discussion, the article considers (enumerated by section number) (ii) the roles of situation awareness (see Endsley (1995)), scientific explanation (see Giere (1988) and Khalifa (2017)), models, and metamodels in complex systems, (iii) a framework for manufacturing HCAI called theory flexecution, (iv) an example use of the framework to formulate solutions for manufacturing production scheduling, (v) the challenge of developing a measurement science for explanation suitable to the new paradigm, and (vi) assessing where JCW is most feasible.
What has changed? What hasn’t?
The way we formulate models for manufacturing has changed greatly; the fact that we do has not. In 1832, Charles Babbage, known best for his work on the Analytical Engine, wrote an insightful book on manufacturing (Babbage (1832)) in which he said “When each process, by which any article is produced, is the sole occupation of one individual, his whole attention being devoted to a very limited and simple operation, improvements in the form of his tools, or in the mode of using them, are much more likely to occur to his mind, than if it were distracted by a great variety of circumstances.” As this suggests, it was recognized early that the division of labor, making work more routine, made “working smarter, not harder” increasingly possible. What was missing at the time were the abilities to communicate, promulgate, and systematize advancements in practice. About one hundred years after Babbage, Walter Shewhart’s book Economic Control of Quality of Manufactured Product (Shewhart (1931)) answered this need by formalizing an actionable, mathematical language for production quality. Increasingly, from the time of Shewhart’s work forward, the means to competitiveness has been the formalization and refinement of the models driving manufacturing. Figure 1 depicts a viewpoint on the use of models of a routine process of manufacturing that is as relevant today as it was in Shewhart’s time.

Some models describe causes in causal chains. Two such chains are depicted, one leading to product, the other to waste. The goal of analysts is to discover assignable causes (as described by Shewhart (1931)) and encode them in models to control production, thus minimizing the flow to waste. In practice, the system under study may concern a single link as depicted, or a network of such links. The latter might represent a supply chain or manufacturing process chain. The dotted-line rectangle represents a system boundary and the isolation of the analysts’ awareness from the environment, its knowledge, and exogenous phenomena.
Manufacturing is currently in the midst of a transformative period known as the Fourth Industrial Revolution or Industry 4.0, described by Schwab (2016). This era is characterized by ubiquitous sensing, interconnectedness, and computational power. The capabilities that sparked the revolution have allowed many more concerns to be taken into account in decision making. Further, sophisticated software engineering techniques have eased the cognitive burden of modeling and allowed the composition of large-scale simulations from elementary component models. However, what has not yet emerged is model- and software-orientation in quality programs and management science. The Six Sigma¹ quality program (see De Mast and Lokkerbol (2012)), a popular management initiative in manufacturing, is a case in point (see Harry and Schroeder (1999) and Does and Trip (2001)). The cornerstone of Six Sigma is problem solving by a process described as define, measure, analyze, improve, and control (DMAIC). DMAIC does not take advantage of the fact that the models driving manufacturing are increasingly formal, rather than mental, and of increasingly greater scale. Software-orientation in modeling, the fact that models are now complex composite structures, adds evolutionary and cognitive dimensions to problem solving lacking in the quality programs.
The evolutionary dimension is lacking in the quality programs in the sense that those programs are insensitive to (1) the need for flexibility in problem-solving method and goal-setting (described by Linderman et al. (2003)), (2) the system’s technology readiness level (see European Association of Research and Technology Organizations (2014)), (3) improvements in decision support technology such as simulation, and (4) changes in corporate culture and capability, for example, those owing to mergers and acquisitions. From the cognitive perspective, the quality programs do not take advantage of opportunities afforded by (a) widely prevalent, web-assisted communities of practice, and (b) the model as a public “knowledge artifact”, facilitating communities of practice and constructionist learning.
Further, Six Sigma provides virtually no support for efficient system diagnosis (see De Mast and Lokkerbol (2012)). This may be owing to Six Sigma treating systems as black boxes understood only through exploratory data analysis where causal inference (see Pearl et al. (2016)) might be more effective (see Hopp and Spearman (2008)). In contrast, simulations and digital twins provide domain-specific knowledge facilitating diagnosis. Work in immersive analytics suggests that the tasks of foraging data and generating hypotheses can be shared among human and machine agents (see Skarbez et al. (2019)).
Dynamic capability, an established concept in management science described by several authors including Winter (2003), Sniukas (2020), and Helfat et al. (2007), provides a foundation for studying methodical improvement more comprehensively. Winter (see Winter (2003)) describes dynamic capability as the ability to extend, modify, or create the ordinary capabilities of routine activities (activities that allow the firm to “make a living” in the short term). Zollo and Winter (see Zollo and Winter (2002)) note three learning mechanisms used in a firm’s effort to create dynamic capability. (1) Experience accumulation occurs through participation in routine activities. (2) Knowledge articulation is macrocognitive sense-making in which participants express their opinions, engage in constructive confrontations, and challenge each other’s viewpoints. (3) Knowledge encoding is the effort to formalize the best of what has been articulated, including its representation in standards, manuals, and models. Further, they argue that articulation and encoding are increasingly effective relative to tacit experience accumulation (a) among tasks that are less frequently executed, (b) when task experiences are more diverse, and (c) where uncertainty exists about the causality operating in the task.
Romme et al. (see Romme et al. (2010)) describe the effects of deliberate learning and dynamism of the business environment on the ability to change operating routines. Using a system dynamics model, they note how knowledge articulation and knowledge encoding affect the ability to change operating routines. They argue that repeated knowledge articulation unveils causal ambiguity and raises the level of awareness of the effectiveness of the firm’s processes. However, both knowledge acquisition and knowledge encoding require time that otherwise could be used to interact with operating routines (experience accumulation). Further, though tools created through knowledge encoding may have utility, they can, especially in dynamic environments, result in detrimental inertia.
Engineers refine manufacturing models to improve production outcomes. There is a limit, however, to how much a model can be refined given only the parameters from which it was originally formulated (see Smith and Browne (1993)). New sensing opportunities and better understanding of causality in processes may spur re-evaluation of the model’s relation to the real world it represents. To broaden the approach, we start with a working definition of theory used by Giere and others (see Giere (1988), van Fraassen (2009), and Hindriks (2008)): A theory is a hypothesis about the relationship between a collection of models and reality. Dissatisfaction with the operation of one’s production facility, for example, stems from one’s theory about it (what could be called the theory gap); it is very much a matter for human agency.
Among technical domains today, there are hundreds of formal languages, models, and protocols used for myriad communication and storage purposes. Among the more technical models are metamodels, models of modeling languages themselves. (This definition is from Seidewitz (2003).) A structural metamodel is a model of a modeling language providing sufficient detail to serve as a storage form for the subject models. Structural metamodels provide schematic information about the represented models. The most prevalent examples of structural metamodels are the abstract syntax trees (ASTs) of programming languages. These are not unlike the sentence diagrams used in teaching the grammar of natural languages. However, designed for use in compilers, ASTs may not organize model content effectively for other purposes. In contrast, metamodels purpose-built for declarative domain-specific languages (DSLs) can highlight the modeling language’s relationship to the user’s domain problem (see Denno (1996)). Declarative languages describe what is required, not how what is required is achieved. Example declarative DSLs include Modelica (see Fritzson (2011)) and MiniZinc (see Stuckey et al. (2018)).
Structural metamodels provide an opportunity for joint human/machine work in formulating and refining models. Joint work, whether it includes machine agents or not, requires that the parties establish common ground (see Stefik (2021)), such as agreement on the meaning of terms. Illustrative of establishing common ground is the skill professors might exercise during office hours. In office hours, students visit to discuss disparate problems; the professor needs to quickly grasp the essence of the student’s question. Let’s suppose that the professor teaches a class in production scheduling using a combinatorial optimization solver like MiniZinc (see Stuckey et al. (2018)). In establishing common ground on a student’s MiniZinc problem, the professor would do well to start with the index sets the student defined in her model (See Lines 1 and 2 of Fig. 2).
In program comprehension, features of a program useful to understanding its plan and organization, such as index sets in MiniZinc, are called beacons (see Stapleton et al. (2020)). Index sets can serve as beacons because they typically enumerate things relevant to the real-world problem domain. In production scheduling, the index sets often enumerate production resources, input materials, and work to be performed. Beacons provide a strategy for machine agents to flesh out the pattern provided by the structural metamodel with information about the human’s effort to formulate a model, enabling a “conversation” about that effort.
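As a rough illustration of how index sets might be picked out as beacons, the following Python sketch scans MiniZinc text with regular expressions. It is not the machine agent used in the field study: the model fragment, the patterns, and the function name find_beacons are all assumptions made for this example, and a real agent would work from the structural metamodel rather than from raw text.

```python
import re

# Hypothetical MiniZinc fragment, written only for this illustration.
MODEL_TEXT = """
int: n;
set of int: Workers = 1..n;   % people available to do the work
set of int: Tasks = 1..n;     % work to be performed
array[Workers, Tasks] of int: cost;
array[Workers] of var Tasks: task_of;
"""

# Declarations such as `set of int: Workers = 1..n;` or `enum Product;`.
INDEX_SET = re.compile(
    r"^\s*(?:set\s+of\s+int\s*:\s*(\w+)\s*=|enum\s+(\w+))", re.MULTILINE)

# Array declarations: capture the bracketed index list and the array name.
ARRAY_DECL = re.compile(
    r"array\s*\[([^\]]+)\]\s*of\s+(?:var\s+)?[\w.]+\s*:\s*(\w+)")


def find_beacons(text):
    """Return (index_sets, usage); usage maps each array to the index sets it spans."""
    index_sets = [a or b for a, b in INDEX_SET.findall(text)]
    usage = {}
    for indices, name in ARRAY_DECL.findall(text):
        usage[name] = [i.strip() for i in indices.split(",")
                       if i.strip() in index_sets]
    return index_sets, usage


index_sets, usage = find_beacons(MODEL_TEXT)
print(index_sets)  # names that likely enumerate domain things
print(usage)       # which parameters and variables are organized by them
```

The mapping from arrays to index sets is the kind of information an agent could use to open a “conversation”, for example by asking whether cost is meant to relate every worker to every task.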

A MiniZinc formulation of a simple resource allocation problem (not the apparel scheduling problem), assigning n tasks to n workers minimizing the cost of getting all the tasks done. Lines 1 and 2 are index sets. Line 3, backed up by data not shown, defines the cost of each worker to do each task. Line 4 describes the form of solutions. The simple declarative syntax, the choice of meaningful variable names, and the use of natural language annotations increase the likelihood that a machine agent could recognize what the analysts are seeking to achieve and how they are going about it.
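The structure described in the caption (two index sets, a cost parameter, and a solution form) can be sketched outside MiniZinc as well. The Python example below is a minimal stand-in that solves a three-worker instance by exhaustive search; the cost values and the function name best_assignment are hypothetical, and exhaustive search is used only for clarity, not because it resembles how a constraint solver works.

```python
from itertools import permutations

# Index sets (assumed size n = 3 for illustration).
n = 3
WORKERS = range(n)
TASKS = range(n)

# cost[w][t]: hypothetical cost of worker w doing task t.
cost = [
    [9, 2, 7],
    [6, 4, 3],
    [5, 8, 1],
]


def best_assignment(cost):
    """Try every one-to-one assignment of tasks to workers; keep the cheapest.

    The solution form is a list task_of, where task_of[w] is the task
    assigned to worker w.
    """
    size = len(cost)

    def total(task_of):
        return sum(cost[w][task_of[w]] for w in range(size))

    best = min(permutations(range(size)), key=total)
    return list(best), total(best)


task_of, total_cost = best_assignment(cost)
print(task_of, total_cost)  # [1, 0, 2] 9
```

Exhaustive search grows factorially with n, which is one reason declarative formulations are handed to a solver in practice.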
Metamodels, with the help of program comprehension, provide the opportunity for joint human/machine work on models. Figure 3 depicts an architecture for joint model refinement. The bottom region of the figure represents the requirement for expert knowledge of analytical tools used in decision processes. The top region represents knowledge of situation awareness (SA). The technical constraints of model formulation include one’s ability to sense the environment and to apply the language of the model’s encoding. The middle region of the figure depicts Klein’s Flexecution (see Klein (2007)) adapted for use with formal models. Flexecution describes a cognitive process by which goals can change based on discoveries made while executing a plan (see Klein (2007)). Klein suggests that in complex settings flexecution is the norm and plans conceived ahead of time and still usable when needed are the exception. Theory flexecution is long-running flexecution where the goal is to improve a theory, the relationship between models and reality.

Theory flexecution involves refinement of models in consideration of the system environment, theory, and tool capabilities. This center portion of the diagram is adapted from Klein’s Flexecution to emphasize the roles of models, metamodels, and theory.
Some of the ideas of joint cognitive work with models outlined above were explored in an effort to formulate a production scheduling system for apparel manufacturing. The solution used a MiniZinc solver embedded in Python Jupyter notebooks (see Project Jupyter (2017)). In a field study, two analysts skilled in production scheduling and the firm’s enterprise resource planning (ERP) system worked with the author to transition from a spreadsheet-based process to one that uses the notebook. Typical of apparel manufacturing, sewing is the bottleneck process. Schedules determine the number of sewers of various skill levels that should populate each of the production lines each week to best meet customer orders.
To establish common ground in joint work, it is useful to start with things familiar to most participants. In this case, spreadsheets of sales orders contain concepts all of which are familiar to the analysts. The machine agent attached to a Jupyter notebook studied spreadsheets loaded into the notebook and made predictions about what is intended by each column or row of data. Bayesian abductive logic programming (BALP), described by Sukthankar et al. (2014), was applied to perform plan recognition on the notebook. Specifically, the BALP system’s capabilities were used to hypothesize how data and variables are being used and, ultimately, about the general plan and level of completion of the notebook. As the machine agent calculates the most probable explanation (MPE) of how a plan goal is being achieved, it can verify its interpretation with the user. For example, the machine agent might not be able to find direct reference to the notion of job, but it might find three columns in a spreadsheet that look like, respectively, a date, a quantity, and a product type. Bayesian abductive inference might infer that the most probable notion of job is provided by 3-tuples of those elements.
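The job example can be made concrete with a much simpler stand-in for BALP. The Python sketch below scores each spreadsheet column against three hand-written tests and, if one column fits each role, proposes 3-tuples as candidate jobs. This is a plain heuristic, not Bayesian abduction; the column headers, the date format, and all function names are assumptions for illustration, and cell values are assumed to be non-empty lists of strings.

```python
from datetime import datetime


def _is_date(value):
    """True if the string parses as an ISO-style date (an assumed format)."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def _is_quantity(value):
    return value.isdigit()


def _is_product(value):
    return value.strip() != "" and not _is_date(value) and not _is_quantity(value)


ROLE_TESTS = {"date": _is_date, "quantity": _is_quantity, "product": _is_product}


def guess_roles(columns):
    """Map each column header to (best-fitting role, share of values that fit)."""
    guesses = {}
    for header, values in columns.items():
        scores = {role: sum(map(test, values)) / len(values)
                  for role, test in ROLE_TESTS.items()}
        role = max(scores, key=scores.get)
        guesses[header] = (role, scores[role])
    return guesses


def propose_jobs(columns):
    """Return candidate job 3-tuples, or None if some role has no column."""
    by_role = {role: header
               for header, (role, _) in guess_roles(columns).items()}
    if set(by_role) != set(ROLE_TESTS):
        return None
    return list(zip(columns[by_role["date"]],
                    columns[by_role["quantity"]],
                    columns[by_role["product"]]))


# Hypothetical sales-order sheet with uninformative headers.
sheet = {
    "Col A": ["2024-03-04", "2024-03-11"],
    "Col B": ["1200", "450"],
    "Col C": ["polo shirt", "hoodie"],
}
print(propose_jobs(sheet))
```

The fit scores returned by guess_roles are what an agent could surface when verifying its interpretation with the user, asking about low-scoring columns first.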
The range of what need be anticipated in joint work with models can be significantly reduced in some applications; production scheduling is one such application. First, production scheduling involves a predictable set of activities; every production scheduling problem involves describing (1) work to be performed, (2) resources and their fitness to perform various types of work, and (3) preferences regarding how the resources should be applied to the work. Second, most scheduling problems can be described using constraints and goals. MiniZinc, with declarative constructs for constraints and goals, was designed to accommodate such descriptions (see Marriott and Stuckey (2018)). Though the field study machine agent examined both Python language and MiniZinc text, the declarative nature of MiniZinc made plan recognition from that more tractable than from the Python code.
To foster common ground about constraint-based optimization, analysts can be taught a sentence-based cognitive design pattern (see Hale and Schmidt (2008)). Specifically, combinations of columns in spreadsheets can be viewed as providing sentences describing the scheduling context. Similarly, certain sentence patterns represent the scheduling actions that are part of the resulting schedule. For example, in the apparel example there are three scheduling action sentence types:
Production on product ?p begins on week ?wstart.
Production on product ?p stops on week ?wstop.
While product ?p is being sewn, ?n sewers are on its production line.
The set of all possible scheduling action sentence sets (a powerset) can be conceived as sets of sentences completed with these variables filled by data. The problem of formulation can then be viewed as one where, first, sentences about input data are related to sentences about the scheduling actions, and then the sentence sets are winnowed by removing contradictions. For example, an entire sentence set is invalid if it contains sentences implying that a job finishes before it starts. Identified contradictions spur the specification of MiniZinc constraints. The notebook can be viewed as an expeditious approach to some of the same characteristics sought in low-code environments (see Maiya (2020) and Daniel et al. (2020)).
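The winnowing step can be sketched for a single product as follows. The Python example fills the three sentence patterns with values from small, assumed ranges of weeks and crew sizes, then discards any sentence set that implies production stops before it starts. The ranges, the product name, and the function names are hypothetical; in the approach described above the same contradiction would instead prompt a MiniZinc constraint.

```python
from itertools import product

# Assumed data ranges for illustration only.
WEEKS = range(1, 5)   # weeks 1..4
CREW_SIZES = (5, 10)  # possible numbers of sewers on a line


def candidate_sentence_sets(p):
    """Yield every way of filling ?wstart, ?wstop, and ?n for product p."""
    for wstart, wstop, n in product(WEEKS, WEEKS, CREW_SIZES):
        yield {"product": p, "wstart": wstart, "wstop": wstop, "n": n}


def is_contradictory(s):
    """A sentence set is invalid if production stops before it begins."""
    return s["wstop"] < s["wstart"]


def winnow(candidates):
    return [s for s in candidates if not is_contradictory(s)]


def as_sentences(s):
    """Render one sentence set using the three patterns from the text."""
    return [
        f"Production on product {s['product']} begins on week {s['wstart']}.",
        f"Production on product {s['product']} stops on week {s['wstop']}.",
        f"While product {s['product']} is being sewn, {s['n']} sewers are on its production line.",
    ]


all_sets = list(candidate_sentence_sets("jacket"))
valid = winnow(all_sets)
print(len(all_sets), len(valid))  # 32 20
print("\n".join(as_sentences(valid[0])))
```

Each further contradiction that analysts identify would correspond to another predicate like is_contradictory, mirroring the way identified contradictions spur constraint specification.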
Beneficial socio-technical effects can be credited to the use of the notebook in the field study; the notebook served as a tool for constructionist learning. Further, having learned to be effective with computational notebooks, the analysts went on to develop a complementary notebook for forecasting the size of the workforce, a capability particularly useful during the pandemic. The notebook also proved to be useful in conversations with other stakeholders, such as plant managers. The notebook, as a public object, served to integrate system elements.
The roles of explanation
The previous section describes a simple example of JCW. In that, the sentence-based cognitive design pattern explains a viewpoint on how the formulated problem will be solved. Further, explanations provided by program comprehension using the MiniZinc metamodel focus on goals and are mechanistic in describing means. “Mechanisms are entities and activities organized such that they are productive of regular changes from start or set-up to finish or termination conditions” (see Machamer et al. (2000)). Mechanistic explanations explain why by explaining how (see Bechtel (2017)). However, in as far as the individual steps of how (the causes) are uncertain, mechanistic explanation is unconvincing. Not all the explanation that would be useful in joint work is mechanistic.
JCW needs additional modes of explaining. Manufacturability problems, for example, often involve a search for causes (not just correlation) through experimentation. The method of explanation most relevant to such investigations is interventionism through experimentation and counterfactual reasoning (see Pearl et al. (2016), Woodward (2003), and Spirtes et al. (2001)). The essential idea behind interventionism, and what distinguishes it from quality programs such as Six Sigma, is that under certain conditions, intervention in the process under study reveals a cause. Roh et al. (Roh et al. (2016)) and Witherell et al. (Witherell et al. (2014)) define metamodels for experimental exploration of an advanced manufacturing process. These metamodels federate submodels of the individual physical processes (heating, melting, thermal conductivity) of the subject additive manufacturing process. Fitting to this purpose, these are not structural metamodels but knowledge graphs built from the Web Ontology Language (OWL). Using knowledge from the manufacturer’s own experimentation with the process, as well as that of the literature, such metamodels help with the tasks “Question the theory” and “Refine the model” as depicted in Fig. 3. These are metamodels for applying an interventionist mindset in theory flexecution.
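The difference between observing a process and intervening in it can be illustrated with a small simulation. In the Python sketch below, a hidden condition z influences both a machine setting x and a defect outcome y, while x itself has no effect on y. The variable names, probabilities, and sample size are assumptions chosen for this example and do not come from any of the cited studies.

```python
import random


def simulate(n, intervene, seed=0):
    """Generate n (x, y) observations from a toy process.

    z is a hidden common cause. When intervene is False, x usually follows z;
    when intervene is True, the experimenter sets x by a coin flip.
    y depends on z only, never on x.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5
        if intervene:
            x = rng.random() < 0.5
        else:
            x = z if rng.random() < 0.9 else not z
        y = rng.random() < (0.8 if z else 0.2)
        rows.append((x, y))
    return rows


def rate_difference(rows):
    """Defect rate when x is on minus defect rate when x is off."""
    on = [y for x, y in rows if x]
    off = [y for x, y in rows if not x]
    return sum(on) / len(on) - sum(off) / len(off)


observed = rate_difference(simulate(20000, intervene=False))
experimental = rate_difference(simulate(20000, intervene=True))
print(f"observational difference: {observed:.2f}")
print(f"interventional difference: {experimental:.2f}")
```

Under these assumptions the observational data show a large difference in defect rate between the two settings, while the experiment shows almost none, indicating that the setting is correlated with defects but is not their cause.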
A third viewpoint on explanation, unificationism, is of less obvious value to engineering than it is to science. Unification highlights connections and common patterns in a body of knowledge (or “explanatory store”); it seeks the best tradeoff between the number of patterns of derivation used and the number of conclusions that can be generated (see Kitcher (1989)). There is obvious value in science to finding common patterns and confirming ever more general hypotheses; that could be taken as the purpose of science. The unificationist view is also valuable to JCW owing to its inherent need to weed out irrelevant elements of an argument (Kitcher, who first described unificationism, contrasts the differences between explanations of (1) how salt dissolves in water, and (2) how salt “hexed by a magician” does it (see Kitcher (1989))). Unification is a key activity in sense-making and immersive analytics (see, for example, Pirolli and Card (2005), and Skarbez et al. (2019)).
Explanation has a role in organizing activity towards dynamic capability. The three modes of explaining discussed each contribute differently to flexecution. Mechanistic, interventionist, and unificationist explanations reveal or argue for, respectively, a mechanism, a cause, and a uniting principle.
There is also the matter of classifying and organizing the individual propositions of an explanation. This can help identify the nature of the theory gap, the difference between the model and the observed reality. One method for classifying scientific explanations, demonstrated by Overton (2012), involved classifying sentences of a collection of journal papers according to how each sentence bridges five notions: theory (meaning here the “laws” of a domain of investigation), model, kind, entity, and observation. Each sentence is associated with a class defined by its participation in two of the five notions (thus the 25 classes: theory-theory, theory-model, model-kind, etc.). Recent work in natural language processing such as participant, intervention, comparator, and outcome (PICO) extraction automates a similar kind of classification (see Liu et al. (2021)).
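The count of 25 classes follows if the two notions are treated as an ordered pair in which a notion may be paired with itself (5 × 5 = 25). That reading is an assumption made here to match the stated count; unordered pairs would give only 15. A short Python sketch enumerates the classes under that assumption:

```python
from itertools import product

NOTIONS = ("theory", "model", "kind", "entity", "observation")

# Ordered pairs, with a notion allowed to pair with itself (assumed reading).
CLASSES = [f"{a}-{b}" for a, b in product(NOTIONS, repeat=2)]

print(len(CLASSES))  # 25
print(CLASSES[:3])   # ['theory-theory', 'theory-model', 'theory-kind']
```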
Ultimately, a measurement science for explanation in joint work will be needed. How do explanations facilitate joint work? Theory flexecution is a methodical way to discover and bridge the gap between models and observed reality. Discovery, however, is not necessarily what motivates explanation in areas outside of joint work with models. For example, efforts towards explainable AI (XAI) (see, for example, Stefik (2021)) typically focus on trustworthiness and accountability to the end user of a model (see Dosilovic et al. (2018) and Hoffman et al. (2018)). Compared to JCW with models, XAI is more a situation in which experts are developing models for a disjoint population of users. Despite these differences, there are certainly areas where the goals of XAI and explanation for JCW intersect. For example, some measurement techniques for XAI employ a teacher/student paradigm; this may be useful for assessing explanation in JCW also (see Pruthi et al. (2021)).
Finally, there are broad implications for systems engineering were it to accommodate explanation as a distinct kind of communication. Communication of information serves the ordinary goal of eliciting behavior from other agents; communication of information allows components to act as a system (see Miller (1966)). In JCW with models, explanations are communicated to further understanding (see Achinstein (1983)). Presumably, understanding changes behavior in some enduring yet hard-to-predict ways. A possibly useful distinction between communicating information and communicating to further understanding is that the latter might be viewed as concerning type causality (see Halpern (2016)) whereas the former only concerns actual causality.
Where is joint cognitive work most feasible?
We have discussed a broad range of topics including metamodels, problem solving, dynamic capability, compositionality, declarative formulation, and the constructionist notion of public object. Design for joint formulation may draw on many disciplines, including computer science, systems engineering, cognitive science, management science, philosophy of science, and cultural anthropology. These needs may also hold true for similar work in the emerging age of intelligent machines. We have argued that explanation plays key roles in enabling joint work with models. To summarize this discussion, the following five characteristics are cited as contributing to the likelihood that a given investigation is fit for joint human/machine theory flexecution:
Conclusion
Increasingly, models mediate the relationship between humans and the resources of complex systems. As models (and technology generally) become more complex, we risk losing touch with the rationale for their complexity, and therefore the ability to adapt them. At its highest level, the paper describes an opportunity to re-examine the roles of models through the lens of management science. Theory flexecution is offered as a means to mitigate the shortcomings discussed and provide a road map for the methodical use of models, metamodels, and analytical tools. In theory flexecution, analysts negotiate goals mindful of model shortcomings, measurement uncertainty, and the capabilities of the analytical tools used. In a constructionist learning environment, engineers are introduced to a sophisticated tool little by little, and a link from the engineers’ requirements to a solution is established.
A key role of the machine agent in theory flexecution is to help produce a model in the system’s declarative domain-specific language. This invites comparison to emerging generative AI methods, which can also produce such models. Compared to state-of-the-art generative AI for direct formulation of models, theory flexecution emphasizes a long-running relationship with engineers in which the rationale for the current formulation is provided. In the foreseeable future, LLMs may subsume the lowest-level work of formulating models. However, several important management tasks of the enduring relationship with the model require uniquely human awareness and methodical systems engineering. As discussed in the paper, these include (1) recognizing new sensing opportunities, (2) eliciting, recording, and maintaining system requirements, (3) validating the model to those requirements, and (4) guiding and on-boarding engineers in a constructionist learning process.
Consensus is firm that manufacturing’s future involves increasing levels of automation. To some, increased automation may suggest diminished roles for humans. It is with some irony then that we recognize that human roles in the evolution of manufacturing systems seem assured; the challenge lies in finding roles for machine agents in system evolution. The possible future described is a human-centered one, where human agency continues to direct the processes of discovery. Shneiderman (Shneiderman (2021)) notes that “Human curiosity and desire to understand the world means that humans are devoted to causal explanations, even when there is a complex set of distant and proximate causes for events.” Explanation’s role in theory flexecution could keep human agency at the center of manufacturing’s future. Theory flexecution is directed at some of the same goals as manufacturing’s quality programs. The quality programs have been refined over many years, but they have yet to take advantage of advances in simulation, modeling languages, and software engineering. Theory flexecution is a means to address these shortcomings and provide the evolutionary and cognitive dimensions of dynamic capability. The potential benefits of theory flexecution include continuous improvement, knowledge retention, training in analytical skills, and resilience.
Our near-term plans include exploring the capabilities of large language models to analyze analysts’ descriptions of scheduling problems. Technology trends such as low-code tools and executable notebooks are providing some of the building blocks for near-term progress. Long-term goals include developing a measurement science for the use of explanation in joint work.
Footnotes
Certain commercial equipment, instruments, or materials are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.
