A semi-conditional grammar, introduced by Gheorghe Păun, is a form of regulated rewriting system where each rule consists of a context-free core rule along with two strings , . The rule is applicable if (the positive condition) occurs as a substring of the current sentential form, but (the negative condition) does not. The maximum lengths i, j of the positive or negative conditional strings, respectively, give a natural measure of descriptional complexity, known as the degree of such grammars. Employing several normal form results on phrase-structure grammars as derived in particular by Viliam Geffert, we improve on previously obtained results by reducing the number of nonterminals of semi-conditional grammars of a given degree while maintaining computational completeness of the said mechanisms.
One of the key areas in modern formal language theory is descriptional complexity. In a nutshell, this area investigates how succinct can a grammatical device, automaton, or rewriting system represent a formal language class (with respect to a certain complexity measure). You can also see this as applying the parsimony principle (or Occam’s razor) to mechanisms that describe formal language classes. This is particularly interesting for devices that are able to describe all recursively enumerable languages, as they are then as powerful as any computational device might ever be according to the famous hypothesis of Church and Turing. For more details on this part of formal language theory (or computability theory) and on applying the parsimony principle, we refer to [6].
In this paper, we focus on the nonterminal complexity of semi-conditional grammars of a certain degree so that they still achieve computational completeness, i.e., they still characterize the class of recursively enumerable languages (denoted by RE). This includes questions like: Does there exist a semi-conditional grammar for any RE language L such that has degree and uses only six nonterminals? We will answer this question affirmatively in this paper, while it is left as an open question if five nonterminals suffice for this degree.
To arrive at such results, it is often helpful to make use of other similar results and to simulate the corresponding devices. Among the most often used ones are the normal forms for type-0 grammars provided by Geffert in [12], because they also use very few nonterminals to characterize RE.
Descriptional complexity measures as studied in this paper were first introduced by Gruska concerning context-free grammars in [13,14] under the umbrella of classification of context-free languages. Clearly, this study was focussing on subclasses of the context-free languages, and in contrast to what can be observed when describing recursively enumerable languages, especially bounding the number of nonterminals in context-free grammars yields an infinite hierarchy of language classes. To underline the importance and history of nonterminal complexity in the context of regulated rewriting, we refer to [2,7,22].
A semi-conditional grammar (in short, SCG) is an extension of context-free grammar (introduced by Gh. Păun in [21]) in which each rule is associated with two strings called the permitting and forbidden1 strings. A rule can be applied to a sentential form w only if w contains the permitting string (i.e., positive context) and does not contain the forbidden string (i.e., negative context) as a subword. The maximal length of the permitting string (say i) and the forbidden string (say j) constitutes the degree of the grammar and is denoted by . A SCG is termed simple if for each rule at most either the permitting string or the forbidden string is present; refer to [19]. If both these control strings are absent in a rule, then the rule is said to be an unconditional. Otherwise the rule is termed conditional. In the field of SCGs, we would like to point to [15–17,20,22], mainly contributed by Meduna, Masopust, Vaszil and Okubo. It is known that even SCGs of degree are computationally complete, i.e., they characterize the language family RE; see [16]. However, the published constructions do not provide a bound on the number of nonterminals for these grammars.2 This is why we have put a * in the corresponding entries of the table, as it is usual in such a situation, and in fact, one could replace the expressions involving also by *, but we refrained from doing so, as we consider the present formulation to be more informative.
Survey on nonterminal complexity results for semi-conditional grammars.
For SCGs of degree , we improve on the previous result (due to [20]) of seven nonterminals, bringing it down to six. For semi-conditional grammars of degree , also nothing better than the sufficiency of seven nonterminals was known. We reduce this bound to five by a non-trivial use of another normal form result of Geffert. We survey the previously known results and the new results obtained in this paper in Table 1, along with indicating the number of conditional rules. In Table 1, T refers to the terminal alphabet and to a subset of the context-free rules of the given type-0 grammar of a special form; see Remark 2. Notice that if we neglect the number of conditional rules as a descriptional complexity aspect, our results improve on all previously published ones, sometimes trading off degree parameters and number of nonterminals.
Observe that even when we put the size of into the account of the number of conditional rules, this does not mean that we necessarily have arbitrarily large numbers of them to get, for instance, non-recursive (undecidable) languages. To the contrary, recall the famous undecidable halting problem language of Turing machines: as this is a single language, there is clearly some bound on the number of conditional rules that is necessary to generate this with a semi-conditional grammar, say, with degree and six nonterminals. As there do exist unary non-recursive languages, also the term does not matter too much if we are concerned about describing undecidable languages with small resources.
Survey on nonterminal complexity records for simple SCGs.
Table 2 lists the current nonterminal complexity records for the simple SCG variation, showing that this syntactical restriction gives rise to further resource requirements to represent all RE languages. from SCG requirements to represent RE languages. This table also raises the interesting question if one could find a way to represent RE with less than five nonterminals in general SCG when increasing the degree to or bigger.
In order to show our results, we make use of several normal forms for type-0 grammars that were presented in [12]. This is somehow special, as previously (mostly) the normal form with non-context-free deletion rules and was used. We also derive several useful properties of the said normal forms and we use them later while proving our results.
This paper is an enhanced version of the previous conference paper that appeared in the proceedings of CiE 2018; see [8]. In comparison with the conference version, here we present full proofs as well as further discussions of our results that had to be suppressed in the conference version due to page length constraints.
Preliminaries and definitions
It is assumed in this paper that the reader is familiar with the fundamentals of language theory and mathematics in general. Let denote the set of non-negative integers. Let denote the free monoid generated by a finite set Σ (called alphabet) under an operation termed concatenation, where λ denotes the unit of , also called the empty string. Any element x of is called a word or string (over Σ) and is the reversal of the string x, reading it in reversed order. Any subset of is called a language. A word v is a subword of if there are words u, w such that . Let denote the set of all subwords of . Clearly, is a finite language.
Semi-conditional grammars
Following [21], a semi-conditional grammar is a quadruple , where
N is the nonterminal alphabet,
T is the terminal alphabet , so that defines the total alphabet,
is the start symbol,
P is a finite set of rules of the form with
,
,
, where is a special symbol, intuitively meaning that the condition is missing.
The rule (with l being the label of the rule) is said to be conditional if or . String α is permitting, while β is forbidden, as formally explained now. The rule labeled l can be applied on a string if and only if and ( or ) and ( or ). Under these conditions, will be transformed into , which is denoted by .
When no confusion exists on the rule being applied we avoid mentioning the label and we simply write . The language of G is defined as
where denotes the reflexive and transitive closure of ⇒.
A semi-conditional grammar is said to be of degree, where , if in every rule of P we have and whenever . It is termed simple if every rule has at most one non-empty condition. Formally, a semi-conditional grammar G is simple if implies .
Geffert normal forms
Following traditional notation, a type-0 grammar is a quadruple , where N is the nonterminal alphabet, T is the terminal alphabet, P is the finite set of rules of the form , where and , and .
In [12], quite a number of normal forms for type-0 grammars have been derived. They all differ by the number of nonterminals that are used and also by the number of non-context-free rules. We will hence speak of -GNF to refer to a Geffert normal form with n nonterminals and r non-context-free rules. However, all these normal forms characterize the class RE.
The best known such normal form is the -GNF with nonterminals S (the start symbol) and A, B, C, D that uses context-free rules with S as its left-hand side in phase one; in a second phase, non-context-free deletion rules and are applied to finally derive a terminal string. More precisely, in the first phase, in stage one, terminal symbols are produced to the right of the nonterminal S (which is the only nonterminal used in left-hand sides of rules in the first phase) and codifications thereof (using the alphabet ) are produced to the left of S at the same time. Then, in stage two of the first phase, strings over are produced to the left of S while strings over are produced to the right of S, in a sense coding a correct derivation process. Phase 1 ends by replacing S by λ or by . Phase 2 then checks the correctness of all guesses from the first phase. More details on the behavior of Geffert normal form grammars can be found in the next section.
In [12], it was also proved that every recursively enumerable language is generated by a type-0 grammar with three nonterminals only, i.e., is the set of nonterminals, T is the set of terminal symbols, S is the start symbol and P contains the context-free rules of the form and the only non-context-free rule of the form . We therefore call this -GNF for short. Another variant of GNF uses four nonterminals S, A, B, C and two non-context-free rules , and will be called -GNF in this paper.
Properties of -GNF and -GNF
We need the following properties of -GNF that follow from its construction from the better-known -GNF with nonterminals S, A, B, C, D; the rules of this normal form are transformed by applying the morphism , , and (otherwise behaving like the identity) to obtain a grammar in -GNF. So, let be a type-0 grammar in -GNF. Then, the following properties are true, as they mostly follow from what is well known about -GNF.
No sentential form w that is derivable in G contains a substring .
If , then w contains at most one of the three substrings or or , called the central part. In the latter two cases, the derivation is stuck and only the former case will yield a fruitful derivation. The central parts , or are necessarily in the position where (in phase one) the start symbol S was sitting.
If and w contains no substring , then either or there is no way to terminate this derivation.
A derivation with works in two phases. Phase 1 actually splits in two stages. In the first stage, rules of the form are applied, with and , while in the second stage, rules of the form are applied, where and , with . The first phase ends with applying a rule of the form , where u, v are as in the previous phrase. In phase two, the deletion rule is applied.
In view of Property 1, we can assume that in any sentential form that is derivable according to G, there is a symbol X occurring immediately to the right of a substring and X will satisfy .
We collect all rules with plus the only rule within , and we can do this for any form of Geffert normal form grammar that we discuss. This set of rules is also contained in Table 1. The reason of this definition will become clear in the proofs of the correctness of the simulations that we introduce next. But the intuition behind this definition of can be understood already at this point. We have to stop malicious derivations that are possible when ‘restarting’ the simulation of Phase 1 within the already started simulation of Phase 2, which is potentially possible, as we like to re-use the nonterminal S in the simulation of Phase 2. But even if this unintended, malicious simulation could start, at some point rules from must be applied in order to be able to derive a terminal string. It is therefore sufficient to block this malicious derivation by disabling the simulation of any rule from . This also explains why the cardinality of enters Table 1.
To give an example how most properties easily follow from what is known about the Geffert normal form using and as deletion rules or can be shown by a simple direct induction argument for -GNF, consider the first property. As it is easily observed by the rules or with and , any sentential form in such a grammar is element of , and looking a bit closer, even of . The morphism described above yields the claim for -GNF.
The last property of -GNF needs some further explanation. If , then the claim follows with Property 1. If , then there might be no X (immediately) to the right of at all. We fix this as follows. If , we consider a -GNF grammar for (with start symbol S) and add the rule to it. Notice that the grammar constructed by Geffert contains rules of the form if and only if it contains rules of the form , and the mentioned rules are the only ones involving S, so that this additional rule does not harm Geffert’s construction.
We shall now consider the properties of another normal form -GNF whose non-context-free rules are , . This normal form can be obtained from -GNF with the morphism , , and .
No sentential form derivable in the grammar contains any of the substrings , , , . Also substrings of the form are not possible (in other words, after the occurrence of B, then A is not possible to occur in the sentential form).
If , then w contains at most one of the two substrings , (called the central part) at most once. This also rules out and as substrings of sentential forms.
Again, the derivation proceeds in two phases, the first one split into two stages. Only in phase two, the central part (either or ) will appear.
If at some point in the second phase, the intended central part (either or ) disappears and instead either or shows up in the center (notice that these strings can always show up to the left or to the right of the central part), then the derivation is blocked.
All these claims can be seen by reviewing corresponding results on the basic -GNF and observing the effect of the described codings. Let us explain how to see these properties (that we will make use of in the constructions and simulations throughout this paper) further.
We carefully walk through the construction described in [12]. Geffert did not start with a characterization of RE by grammars or Turing machines, but rather by pairs of morphisms. Namely, as a first step (Theorem 1), it is shown (referring to [11]) that each RE language can be represented by morphisms , where and are some carefully chosen alphabets satisfying , with
Looking at the proof, it becomes clear that is not erasing, while is erasing, i.e., there is some such that .
This theorem is used to prove that each RE language can be represented by some Extended Post Correspondence Problem (EPC), yielding Theorem 2. An instance P of EPC is given by a set of r pairs of binary strings , with , and some binary string for each symbol . Then,
The proof of Theorem 2 makes use of Theorem 1 by using unambiguous binary block encodings c of the letters in . Then, many pairs are created by setting and for the ith letter . Similarly, for each , we set . Notice that none of the constructed in this way is empty, while exactly one equals λ. Also, none of the is empty.
Such an EPC P is then used to show the well-known GNF results. For instance, for the normal form having rules and , one uses the morphisms with and , as well as with and . Then, add rules for and rules , for . Whenever one starts using any of the rules (and none of the non-context-free deletion rules), some non-empty word over will be present, and if S is in the sentential form, this word will be to the left of S. Moreover, there is exactly one rule of the form for some . Returning to Footnote 3, it is clear that the sub-phases applying rules of the form and of the form can get mixed, but in that case, Geffert could prove that a resulting sentential form cannot derive a terminal string. Similar remarks are true for the other GNFs, as they only differ in the encodings.
For instance, we obtain the -GNF by re-defining as and , as well as by setting and . Again, whenever one starts using any of the rules (and none of the non-context-free deletion rules), some non-empty word over will be present, and if S is in the sentential form, this word will be to the left of S. Moreover, there is exactly one rule of the form for some .
For -GNF, we take as and , as well as with and . Whenever one starts using any of the rules (and none of the non-context-free deletion rules), some word from will be present, and if S is in the sentential form, this word will be to the left of S. Moreover, there is exactly one rule of the form for some .
Semi-conditional rules with 6 nonterminals simulating and and .
Main results
In this section, the main results of this paper (as collected in Table 1) are presented and proved. Recall that all these results show that even under rather strict bounds on the descriptional complexity of semi-conditional grammars, all recursively enumerable languages can be generated. The proofs are all based on simulating type-0 grammars that are obeying some variants of Geffert normal forms, which are known to characterize the recursively enumerable languages as explained in the previous section.
In all simulations, it could happen that sentential forms are derived in the simulating grammar by using context-free rules only that consist of a mixture of nonterminals and terminals (not necessarily the terminals are at the end of the derived string), see Footnote 3. However, one can easily observe for each of the simulating grammars that the derivation will finally be stuck. Therefore, we do not discuss this case in details for each simulation separately.
In the next theorem, we employ the daring and novel idea of reusing the nonterminal S back in the derivation while simulating Phase 2 of the GNF grammar derivation with the aim to further reduce the number of nonterminals, but at the cost of more conditional rules, because (having S is present again) we need to avoid unintended ‘restarts’ of the simulation, see Remark 2.
For each recursively enumerable language L, there is a semi-conditional grammar of degree with only six nonterminals that describes L.
Consider some recursively enumerable language L represented by some type-0 grammar
in -GNF. G is simulated by a semi-conditional grammar of degree , where .
The particular non-context-free deletion rules and of G are simulated by the semi-conditional rules given in Fig. 1 where the last rule is considered for each context-free rule of G.
The intended derivations of phase one of G can be simulated by literally the same rules (where only the forbidden context $ was added that has no influence here as $ is not present until end of phase one). Observe that it is not possible to start such a simulation when still being in the phase one simulation, as or must be present as a substring, which can only happen after finishing with the simulation of phase one. The transition from phase one to phase two is done by (simulating) a rule of the form with .
In phase two, we consider a sentential form , where , and with . If u ends with A and v starts with B, i.e., and , can be applied. In the simulating semi-conditional grammar, we find
If instead is a substring of the current sentential form , this means that and , and now is simulated as follows:
By induction, this argument shows that .
Conversely, consider a sentential form w derivable by the simulating grammar that is also derivable by the -GNF grammar G. Assume first that w contains some occurrence of S. As w is derivable in G, neither $ nor # occur in w only rules of the type apply, . In particular, Rule 3 is not applicable. Hence, for the string derivable from w in one step we also find . This means that we are simulating phase one of G faithfully with .
We are left with the case that w does not contain S. As w is also derivable in G, we are in (the simulation of) phase two. Still, w does not contain any occurrences of $ and # either. If w is a terminal string, nothing remains to be shown. So, w contains some occurrences from and satisfies Property 1 of -GNF. Hence, . All rules but Rule 1 and Rule 4 require that . If the central part is a substring of w, then Rule 1 applies, starting an intended simulation cycle as discussed above. We will look into this intended derivation later.
If is a substring of w, then , with , and (by induction), so that only Rule 1 is applicable. This leads us to , replacing any occurrence of A by . As , only Rule 2 is applicable on . The resulting string is obtained from by replacing some occurrence of B by #. As , only Rules 3 or 6 apply. As will have S occurring to the right of the only occurrence of $, Rule 6 cannot be applied, leaving us with Rule 3 as the only possibility. In order to have , is enforced due to Properties 2 and 3 of -GNF, as there are no other occurrences of the substring in w but the one that we singled out. Hence, with , we know that . Due to the forbidden contexts and due to the absence of subword or , only Rules 3 or 6 might apply, with Rule 3 being disabled, as . In order to apply Rule 6, is required. Hence, follows, as the arguments on the intermediate sentential form are the same. As , Rules 1, 2, 3 are not applicable (and of course, no rule from phase one). As , Rules 5 and 6 do not apply, either. Hence, we are left with either applying Rule 4 or Rule 7 next. However, applying Rule 4 would require to appear as a substring in , which would mean that appears twice as a substring in w, contradicting Properties 2 and 3 of -GNF. Confer our definitions of and at the beginning of this paragraph. Rule 7 would be the intended rule to be applied on , leading us to some that satisfies by applying the rule from G, this way finishing our argument in this case.
The argument for the case that is a substring of w is along similar lines. If is a substring of w, then , with , and (by induction), so that only Rule 4 is applicable. We arrive at a string where some occurrence of C has been replaced by . By the structure of , only Rule 5 is applicable now, replacing some other occurrence of C by #, leading to a string . Due to the forbidden contexts and due to the absence of subword or , only Rules 3 or 6 might apply, with Rule 3 being disabled, as . In order to apply Rule 6, is required. This situation is only possible if and . So, with , we are forced to . The argument continues as in the previous case, as the u, v discussed here are also possible as , discussed previously.
We can also rule out using rules . We might attempt to use some rules with when facing (obtained from by replacing one occurrence of A by ) or . In both cases, $ occurs, ruling out the application of rules . But this means by definition of that for all rules that are applicable to or to .
However, if we attempt to err from , then some string from will be created to the left of S. This prevents Rule 2 from being (ever) applied and hence there is no way to get rid of $ again by applying Rule 6. But this is the prerequesite to (ever) delete S, because surely there is no # to the right of S, disabling Rule 3. Hence, this derivation can never terminate.
If we try to start using the context-free rule simulations from on, we might be tempted to create a non-empty string from before applying Rule 3. This would give us (to mention the shortest possibility) , which would actually allow us to have an unwanted terminating derivation. This case is therefore stopped by having the only rule of the form in G (that is responsible for this unwanted derivation) to be guarded in by putting into its rule set, as .
By induction, we see that .
By replacing the use of symbol S by a new nonterminal , we can avoid using an unbounded number of conditional rules. This would yield a result nearly matching the one of Okubo [20].
It is open if the number of nonterminals can be further reduced for semi-conditional grammars of degree . In the following result, we show that if we increase the (maximum) forbidden context length from one to two, then with 6 conditional rules and 6 nonterminals we can obtain a computational completeness result. Notice that we are not aware of any previous similar results for semi-conditional grammars of degree for any i.
Semi-conditional rules with 6 nonterminals for simulating and .
For each recursively enumerable language L, there is a semi-conditional grammar of degreewith only six nonterminals and six conditional rules that describes L.
Consider some RE language L represented by some type-0 grammar
in -GNF. G is simulated by a semi-conditional grammar of degree , where .
The particular non-context-free deletion rules and of G are simulated by the semi-conditional rules given in Fig. 2. The idea behind the construction of the conditional rules of as follows. Recall that a typical sentential form in phase two of a grammar in -GNF consists of a prefix from , followed by either or , followed in turn by an infix from ; the string terminates with a suffix from . The central parts and are encoded as and , respectively. More specifically, the A in is encoded by $ and the B is encoded by #, using Rules 1 and 2. The negative conditions of these rules avoid applying them repeatedly. Similarly, in the central part , the first C is changed to and the second C is changed into , using Rules 3 and 4. When a subword is present in the string, then it means that or are encoded properly and therefore the markers $ and # can be deleted, using Rules 5 and 6. Malicious derivations are avoided with the following strategies. (a) As Rule 3 introduces , its application cannot be followed by applying Rule 2 due to the forbidding condition of Rule 3. (b) The positive condition in Rule 4 avoids applying Rule 3 on the second C of the central part or anywhere in the non-central part and Rule 4 afterwards. (c) One other possibility is that a substring can also be encoded to when a substring is also present in the string (by Rules 1 and 4). However, in that case, the central part is untouched and the negative condition in Rule 5 stops the encoded to vanish.
We now explain how the simulation of G by should work. The intended derivations of phase one of G can be simulated by literally the same rules of G, as these rules appear in , as well. In phase two, we consider a sentential form , where , and with . If u ends with A and v starts with B, i.e., and , can be applied. In the simulating semi-conditional grammar, we find
If instead is a substring of the current sentential form , this means that and , and now is simulated as follows:
By induction, this argument shows that .
To prove the converse part, by induction, we can assume that we consider a string w derivable from S in that is also derivable from S in G. In particular, none of the nonterminals $ and # occurs in w. We are going to argue that after some further derivation steps of (starting from w), either the derivation is stuck, or some string is produced that is also derivable from S in G. By induction, this shows .
Assume that in , using one of the unconditional rules in . As these unconditional rules correspond to the context-free rules of G, is also valid in G.
Hence, we are left with a situation where we apply a conditional rule to w. By the nature of the positive conditions, only Rules 1 and 3 are possibly applicable. Hence, . As in G, w belongs to phase two of the -GNF grammar G. According to Property 1 of -GNF discussed above, the interesting cases that remain to be discussed is when (i) , or (ii) , with , and .
Case (i): Clearly, if , we are ready, as a terminal string is already derived. Assume with and . Now, only Rule 1 is applicable. This leads us to , replacing some occurrence of A by $ and Rule 1 cannot be applied again until $ is deleted (and is present). As , only Rule 2 and Rule 4 are applicable on . If Rule 4 is applied, this means that the A that was changed to $ was immediate left of some C and therefore the A in the substring is still untouched. In that case, the derivation stucks-up as no rule is applicable anymore including Rule 5 (as is a forbidden context and is still present in the string). So, only Rule 2 is possible to apply to . If Rule 2 is applied, then the string results. As , only Rule 4 and Rule 5 are possibly applicable. As , Rule 4 is not applicable. So, after applying Rule 5, we get the resultant string . As , neither Rule 1 nor Rule 3 are applicable and the only applicable rule is Rule 6 which derives the desired string .
Case (ii): Assume with , and . As is not present in the string, Rule 1 cannot be applied. The only applicable rule is Rule 3. Consider . Comparing with w, some occurrence of C has been replaced by . We discuss three sub-cases in the following.
(a) If Rule 3 has been applied to the second C of the substring , then . As # is present in the string , Rules 2 and 3 cannot be applied. As $ is present in the string, Rules 1 and 6 also cannot be applied. By the second property of -GNF, w (and hence ) does not contain as a substring. Hence, the permitting context of Rule 4 is not a substring of , so that Rule 4 cannot be applied. Neither is the permitting context of Rule 5 available in . As none of the rules can be applied, the derivation gets stuck.
(b) If any C not belonging to the substring is replaced by , then . By the forbidden contexts, Rules 1, 2, 3, 6 cannot be applied. As before, the presence of the substring in would presuppose the presence of a second substring in w, which is impossible by the structure of w, so that Rule 4 is not applicable. Finally, , disabling the application of Rule 5.
(c) Hence, the only possibility is that the first C in the substring was changed to (by Rule 3). Hence, must hold. A case analysis similar to the previous ones proves that now, the only applicable rule is Rule 4. On applying it to the string , some occurrence of C is replaced by , yielding . Two further sub-cases arise. (c1) Assume that some C that is not next to is changed. According to the second property of -GNF, is not present in the string. Then, remains as a substring in . Though is a substring of and is a permitting context of Rule 4, we cannot apply Rule 4 as . (c2) Therefore, Rule 4 is applied only to the second C of . Thus, . Now the only applicable rule to is Rule 5 and applying it we get . As is still not a substring of , Rule 1 cannot be applied and the only applicable rule is Rule 6. Applying it three times produces the desired string .
By an induction argument, we see that .
Aiming at further trade-off results, trading the number of nonterminals for degree conditions, we consider semi-conditional grammars of degree . In [20], Okubo has shown that seven nonterminals with degree suffice to describe any RE language. No genuine results for degree are given in the literature. Finally, we improve on the nonterminal number by reusing the nonterminal S again, but with degree and of more conditional rules.
For each recursively enumerable language L, there is a semi-conditional grammar of degreewith only five nonterminals that describes L.
We start out with a representation of a recursively enumerable language by a type-0 grammar in -GNF that has only three nonterminals S, A, B. We are going to construct an equivalent a semi-conditional grammar of degree with additional nonterminals $ and #. G has context-free rules of the form that we simply also incorporate in the semi-conditional grammar that we produce without any conditions (apart from the rules that replace S by a terminal string), plus one non-context-free deletion rule, namely . This particular rule is simulated by the semi-conditional rules listed in Fig. 3. The intended simulation is as follows, basically executing the given rules in order.
Semi-conditional rules with 5 nonterminals for simulating , with , and for simulating .
Notice that we might assume that in any sentential form that is derivable according to G, a symbol X occurs to the right of the substring that satisfies according to the last -GNF property. So, we can assume that the symbol X that we are testing in Rules and does exist. This shows that . Before we continue with proving the converse inclusion , we observe the following.
If and if in , then there is only at most one applicable rule in , which is Rule 1. Furthermore, this requires that .
This claim is easily understood by inspecting the rules listed in Fig. 3, as all other rules require that . This nicely fits with Property 4 of -GNF grammars.
Now we prove the converse inclusion . Consider some sentential form w of . By induction, we can assume that w is also a sentential form of G. We are going to argue that after some further derivation steps of (starting from w), either the derivation is stuck, or some sentential form is produced that is also a valid sentential form of G. By induction, this shows .
As assumed above, w is a valid sentential form in G also, so it is possibly part of a derivation taking place in phase one in this -GNF grammar only. Hence, only context-free rules have been used, and this could be also assumed with , i.e., the simulation was the intended one. In particular, none of the nonterminals S, $, # occurs in the sentential form w at the end of the simulation of phase one.
Hence, we are now in phase two of the -GNF grammar. According to the properties of -GNF listed above, the interesting cases that remain to be discussed is when (i) or (ii) or (iii) with , , where in the latter two cases (ii) and (iii), there is no way to terminate the derivation in G according to Properties 3 and 4 of -GNF grammars. According to Property 5 of -GNF grammars, the only rule applicable in G would be the non-context-free deletion rule. In , we also find that at most some rules from Figure 3 might apply. By our Observation, Case (iii) is ruled out. In , as can be easily checked, the only applicable rule would then be the first rule . This results in a string with where one occurrence of B is replaced by $. Now, due to the fact that $ is a forbidden context in the first rule, it cannot be applied again. We can check that rules numbered 3, 4, 5, 6, , are not applicable as # is not contained in . As , neither Rule 7 is applicable. So, we have to employ Rule 2. In order to continue, we can conclude that (1) or (2) or (3) , as in the given normal form grammar G, no strings with a sequence of more than four B’s can be produced, see Property 2 of -GNF grammars, and there is only one position where one could find a sequence of three or four B’s according to Property 3 of -GNF. It is then clear that for we have that (a) or (b) (both from ), or (c) or (d) or (e) (all three from ), or (f) or (g) or (h) (all three from ), or (j) any other occurrence of B within u or v is replaced, leading to where the subword between the u- and the v-part is identical to .
In each case, both $ and # are present in the string , which immediately rules out all but Rules 3, 6, due to the forbidden strings in the other rules. Rule 6 requires two occurrences of $, which is not satisfied with any of the variations of . We are now discussing possibilities for with the remaining Rules and 3.
If we try to apply Rule , this requires the substring for , which leaves us with Case (d). In that case, . Now, Rule 1 is excluded, as substring is missing; Rule 2 is excluded by the presence of #; Rules 3 and are not possible, as $ is not present; Rule 4 is not applicable, as the substring is not present; similarly, Rule 5 is not applicable, as the substring is missing; Rules 6 and 7 require two occurrences of $ in the string, which is not the case here. Consider now applying Rule , which leads us to . By our Observation, there is no possible continuation of this derivation in .
If we want to apply Rule 3 instead to the string , which in fact has the same effect as Rule , namely, to delete the only occurrence of $ in , this means that for or . This leaves us with Cases (a) or (c) or (g).
In Case (a), we find . By carefully looking through all permitting conditions, it becomes clear that only Rules 4 or 9 are applicable. If Rule 9 is applied, we derive , and this derivation is stuck due to our Observation. So, emerges from by replacing one occurrence of B by . Apart from arriving at , which is in fact the intended simulation, any other occurrence of B might be also replaced. Anyways, as contains both and #, due to the forbidden conditions, we have to delete S next by using . This way, we check if A is to the right of . Hence, we either have , which is in fact the intended simulation, or where either and is obtained from v by replacing one occurrence of B left of some occurrence of A by $, or vice versa, interchanging the roles of u and v. In any case, as both # and $ are present, we have to apply one of the Rules 3, 6 or 8 next. Let us look a bit closer. As the symbol left to # is A and as this is the only occurrence of # in , Rules 6 and 8 cannot be applied due to the permitting conditions. Rule 3 requires $ sitting next right to #, which enforces , so that Rule 3 applies, leading to as intended.
In Case (c), , the same argument as in the previous analysis applies, i.e., only Rules 4 or 9 are applicable, with the latter rule leading nowhere due to our Observation. Hence, we arrive at some obtained from by replacing one occurrence of B by . This leads us to three possibilities for : , , or , with either or , plus replacing one occurrence of B by in to get . As contains both and #, due to the forbidden conditions, we have to delete S next by using , but this requires , leaving us with Case , which then necessitates . Now, the derivation is finally stuck, as can be verified: as , one of Rules 3, 6, 8 should be applied, but none of the permitting subwords is present in .
In Case (g), . By inspection, it becomes clear that only Rule 9 is applicable, but the resulting string cannot lead to any further sentential form due to our Observation.
Let us therefore return to the only remaining case, which is . As , the only applicable rule is Rule 5. Hence, (A) , or (B) , or (C) , where or and exactly one occurrence of A in was replaced by in order to get . As , only Rules 3, 6, or might be applicable. In Case (A), only Rule 6 is applicable by checking the permitting conditions, i.e., . In Case (B), Rule 3 is not applicable, as the symbol to the right of is not A or B. The permitting conditions of Rules 6 and require the substring , which is not present in this case. The same reasoning shows that Case (C) knows no continuation. To summarize, we are forced to . As $ is present in , we cannot apply Rules 1, 4, 5, or . As , the only applicable rule is Rule 7. This leads us to or , where or , and either is obtained from v by replacing one occurrence of A by #, or is derived from u in a similar fashion. Anyways, the presence of $ and # in the new string restricts us to using Rules 3, 6, in the next derivation step, which always requires $ and # to be neighboring symbols, leaving us with as the only possibility, as u cannot end with A (and hence cannot end with #) and v cannot start with A (and hence cannot start with #).
As , only Rules 3, 6 or might be applicable to . The permitting condition of Rule 3 is not satisfied. If we apply Rule 6, we arrive at . As can be checked by a case-by-case analysis, there is no rule that applies to transform any further in . If we use Rule instead, we get . Again, only Rules 3, 6 or might be applicable. The required permitting strings of Rules 3 and 6 are not subwords of . Hence, we have to apply Rule once more, leading to . Now, recall that u ends with a B and v starts with a B, unless they are empty. Hence, Rules 4 and 5 do not apply. The presence of # prevents applying Rules 2 and 7. The absence of $ does not allow us to apply Rules 3, 6, 7 or , either. As , Rule 1 cannot be applied. Hence, only Rule is applicable, yielding . As in G is valid (recall that we are still considering Case (i) from the beginning of this analysis), the induction step is shown.
There is one issue that needs to be finally addressed. As S is introduced as an auxiliary symbol in the conditional rules in Fig. 3, within , a new simulation of a derivation in G might start. This possibility is stopped by the fact that rules of G that are terminating the first phase, i.e., rules of the form for , are now guarded by the condition checking if $ is absent. More precisely, we have to prevent rule from being applicable after an attempted re-start of a simulation of phase one. We can first assume that the only way to generate the empty word within G is by using a rule , which would be put into anyways. This means that, starting with S, some string from might be created. Now, in order to keep Rule from being applicable, must be empty. Hence, only rules of the form have been applied, but there is exactly one rule of this form, and this belongs to and is hence guarded by the forbidden condition $ in . So, if such a restart is attempted, then this could never lead to a terminal string.
As the number of rules listed under and in Fig. 3 can grow based on the size of the terminal alphabet, the number of conditional rules can only be bounded by , where collects all rules of the simulated grammar G that are used for terminating the second stage of the first phase in a derivation of a Geffert normal form grammar. We also refer to the discussions of Remark 1.
We can get a better bound on the number of conditional rules at the expense of one additional nonterminal as shown in Fig. 4.
Semi-conditional rules simulating and .
For each recursively enumerable language L, there is a semi-conditional grammar of degreewith only six nonterminals and 13 conditional rules that describes L.
Instead of giving a full proof, we only list the rules simulating the given -GNF grammar in Fig. 4 and make the following remarks.
As S is no longer (ab)used as an auxiliary symbol, there is no need to shield the applications of S any more.
Also, we allow deleting $ and # when no A’s are present instead of testing for terminal symbols to the right of # (see Rules and ). The reason is that A’s can be absent only at the very end of the simulation of Phase 2 of G. Hence, both tests are equivalent.
The remaining (rather obvious) details are left to the reader. In particular, an intended derivation of the suggested simulating grammar follows closely the one presented in Eq. (1) for the previous simulation.
Future research questions
The interested reader might have wondered why we restricted our attention to (simple) semi-conditional grammars of degree with , although even degree is known to suffice for computational completeness. However, the published construction of [16] (simulating matrix grammars, based on [18]) is not helpful to get any bounds on the nonterminal complexity of such systems; even if we start out with a matrix grammar with a few nonterminals (as done in the construction of Mayer), and if we notice that few nonterminals suffice in matrix grammars, as shown in [7] for the currently best bounds, then we still arrive at a semi-conditional grammar with quite a number of nonterminals, as the matrix grammars considered in [18] are binary5 to start with, while in [7], the matrix grammars that are constructed contain arbitrarily many rule in their matrices. The standard construction to show that binary normal form matrix grammars characterize RE introduces an unbounded number of nonterminals. The new construction indicated in Footnote 2 uses a finite number but not few nonterminals. We pose it as an open question to construct a (simple) semi-conditional grammars of degree using only, say, ten nonterminals.
Clearly, other open questions concern the optimality of our nonterminal complexity results for degrees , or . It is also still somewhat unclear how to use longer forbidding conditions in order to lower the number of nonterminals or the number of conditional rules necessary to achieve computational completeness. Recall that our result for degree is the only one ever obtained in this direction. Intuitively speaking, bigger degrees are particularly interesting when dealing with -GNF or similar normal forms. They could help further reduce the number of nonterminals.
If one has another look at Tables 1 and 2, one observes a certain discrepancy. For instance, one observes that with degree , only seven nonterminals and six conditional rules suffice to characterize RE according to the result of Okubo [20], while nine nonterminals and eight conditional rules seem to be necessary (at least, they are sufficient) to characterize RE according to a result of Fernau et al. [9]. The difference is that with simple SCG, in all rules, at least one context string is absent. Conversely, let us call a rule that has both a permitting and a forbidden string attached to it strongly conditional. Hence, we can rephrase [9, Theorem 3.1] as: RE can be characterized by SCG of degreewith nine nonterminals and eight conditional rules, none of which are strongly conditional. By way of contrast, Okubo’s result (looking at his proof) gives: RE can be characterized by SCG of degreewith seven nonterminals and six conditional rules, four of which are strongly conditional. Hence, in a sense, counting strongly conditional rules (on top) offers another view on simple versus non-simple semi-conditional grammars. We could now ask questions like: Can we find results that characterize RE by semi-conditional grammars of degree that have, say, eight nonterminals but use only two strongly conditional rules? This opens up another research venue. Actually, some results in this research line have been published in a paper presented at CiE 2024 [10].
Finally, let us mention that – on top of trying to achieve computational completeness results with as little resources as possible – one could also strive to understand if few(er) resources are needed to describe non-recursive languages. Here, we refer to Remark 1. In particular, one could try to further lower the nonterminal complexity to achieve this goal compared to the more ambitious target to describe all recursively enumerable languages. To achieve this more modest goal, one could also try to simulate mechanisms that are unknown to describe all of the recursively enumerable languages, although they contain undecidable languages, as exemplified by limited Lindenmayer systems or regulated grammars with unconditional transfer, see [1,3–5].
Footnotes
Acknowledgements
The research leading to this paper was in large parts undertaken when Rufus Oladele and Lakshmanan Kuppusamy visited the University of Trier in September and October, 2017. The final writing of this paper was initiated during another research visit of Lakshmanan Kuppusamy to Trier during September 2022. All authors gratefully acknowledge the possibility to make use of third party overhead money provided by the University of Trier to support the mentioned travels. The first and second authors would like to thank the Indo-German (DST-DAAD) research collaboration project “Succinct Representation of Grammars: Theory and Applications” (No. DST/INT/DAAD/P-01-2024(G), DAAD No. 57724085) as the paper was finished under the scope and objective of the project. We are also grateful to the comments of the reviewers that helped improve our manuscript.
Notes
References
1.
DassowJ., A remark on limited 0L systems, Journal of Information Processing and Cybernetics EIK24(6) (1988), 287–291.
2.
DassowJ.PăunG., Regulated Rewriting in Formal Language Theory, EATCS Monographs in Theoretical Computer Science, Vol. 18, Springer, 1989.
3.
FernauH., Membership for 1-limited ET0L languages is not decidable, Journal of Information Processing and Cybernetics EIK30(4) (1994), 191–211.
4.
FernauH., Membership for k-limited ET0L languages is not decidable, Journal of Automata, Languages and Combinatorics1 (1996), 243–245.
5.
FernauH., Unconditional transfer in regulated rewriting, Acta Informatica34 (1997), 837–857. doi:10.1007/s002360050108.
6.
FernauH., Parsimonious computational completeness, in: Developments in Language Theory – 25th International Conference, DLT, MoreiraN.ReisR., eds, LNCS, Vol. 12811, Springer, 2021, pp. 12–26. doi:10.1007/978-3-030-81508-0_2.
7.
FernauH.FreundR.OswaldM.ReinhardtK., Refining the nonterminal complexity of graph-controlled, programmed, and matrix grammars, Journal of Automata, Languages and Combinatorics12(1/2) (2007), 117–138.
8.
FernauH.KuppusamyL.OladeleR.O., New nonterminal complexity results for semi-conditional grammars, in: Sailing Routes in the World of Computation, 14th Conference on Computability in Europe, CiE, ManeaF.MillerR.G.NowotkaD., eds, LNCS, Vol. 10936, Springer, 2018, pp. 172–182. doi:10.1007/978-3-319-94418-0_18.
FernauH.KuppusamyL.RamanI., Counting simple rules in semi-conditional grammars is not simple, in: Twenty Years of Theoretical and Practical Synergies, 20th Conference on Computability in Europe, CiE, L.L. Patey and E. Pimentel, eds, LNCS, Vol. 14773, Springer, 2024, pp. 192–204. doi:10.1007/978-3-031-64309-5_16.
11.
GeffertV., A representation of recursively enumerable languages by two homomorphisms and a quotient, Theoretical Computer Science62 (1988), 235–249. doi:10.1016/0304-3975(88)90068-0.
12.
GeffertV., Normal forms for phrase-structure grammars, RAIRO Informatique théorique et Applications/Theoretical Informatics and Applications25 (1991), 473–498. doi:10.1051/ita/1991250504731.
13.
GruskaJ., On a classification of context-free languages, Kybernetika3(1) (1967), 22–29.
14.
GruskaJ., A few remarks on the index of context-free grammars and languages, Information and Control (now Information and Computation)19 (1971), 216–223. doi:10.1016/S0019-9958(71)90095-7.
15.
MasopustT., Formal Models: Regulation and Reduction, PhD thesis, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic, 2007.
16.
MasopustT., A note on the generative power of some simple variants of context-free grammars regulated by context conditions, in: Language and Automata Theory and Applications, LATA, DediuA.-H.IonescuA.-M.Martín-VideC., eds, LNCS, Vol. 5457, Springer, 2009, pp. 554–565. doi:10.1007/978-3-642-00982-2_47.
17.
MasopustT.MedunaA., Descriptional complexity of semi-conditional grammars, Information Processing Letters104(1) (2007), 29–31. doi:10.1016/j.ipl.2007.05.002.
18.
MayerO., Some restrictive devices for context-free languages, Information and Control (now Information and Computation)20 (1972), 69–92. doi:10.1016/S0019-9958(72)90287-2.
19.
MedunaA.GopalaratnamM., On semi-conditional grammars with productions having either forbidding or permitting conditions, Acta Cybernetica11 (1994), 309–323.
20.
OkuboF., A note on the descriptional complexity of semi-conditional grammars, Information Processing Letters110(1) (2009), 36–40. doi:10.1016/j.ipl.2009.10.002.
21.
PăunG., A variant of random context grammars: Semi-conditional grammars, Theoretical Computer Science41 (1985), 1–17. doi:10.1016/0304-3975(85)90056-8.
22.
VaszilG., On the descriptional complexity of some rewriting mechanisms regulated by context conditions, Theoretical Computer Science330 (2005), 361–373. doi:10.1016/j.tcs.2004.06.032.