Abstract
The conventional approach to event extraction requires predefined event types and their corresponding annotations to train event extractors. However, these prerequisites are often difficult to satisfy in real-world applications. To induce event types automatically, most prior work clusters event triggers and represents each cluster of event triggers as an event type. Some works use trigger semantics, while others use co-occurrence relationships to cluster triggers. However, the resulting event trigger clusters do not describe event types in sufficient detail, making it difficult to determine the corresponding event types manually. This paper proposes ETGen-Clus, an open-domain event type induction framework that automatically discovers a set of event types from a given corpus. Unlike previous work on event trigger clustering, this paper takes into consideration the hierarchical relationship of event types and partitions the event trigger clusters into event main types and subtypes. The framework employs a latent variable-based neural generation module and a semantic-based clustering module. The former obtains event trigger clusters representing event main types by jointly projecting the co-occurrence and semantic information of event triggers into a latent space in which event type latent variables are mined; the latter further divides these event trigger clusters into event subtypes based on semantic information. Experimental results show that, compared with the benchmark model, ETGen-Clus improves event type quality scores by 6.23% and 3.11% on the two datasets, respectively.
Introduction
Event extraction from text has received considerable research attention. Traditional event extraction tasks [6,11,30] typically require a predefined event schema consisting of event types and argument roles; events are then extracted with supervised machine learning models, such as deep neural networks, trained on manually annotated data. In practice, however, the annotation process is recognized as being very resource-intensive in terms of time and money. In addition, the coverage of manually constructed schemas is usually very low, making it difficult to generalize them to new scenarios.
For the purpose of automatically inducing event types and argument roles from raw text, researchers have investigated algorithms based on ad-hoc clustering [4,14,26,29] and probabilistic generative methods [3,8,23,34] to discover a set of event types and argument roles. Generally, these approaches employ manually designed representations and rely on strong statistical assumptions. Some studies remove those restrictions using techniques such as transfer learning [7,16], GAN [33] and VAE [15] without the need to explicitly derive patterns for new event types to extract events. However, these methods still require significant annotation of a known set of types.
Previous researchers have also used unsupervised learning methods to alleviate the cost of data annotation. For example, references [12] and [14] use the semantic information of event triggers to cluster them and represent event types through the resulting event trigger clusters. More recently, reference [27] used a pre-trained model to obtain semantic features of event triggers and their object heads and then projected the two semantic features onto the same spherical vector space for clustering. Reference [19] observed that highly coherent word clusters carry significant topic features, and that event trigger clusters with both significant topic features and high semantic similarity help to identify specific event types. However, the above methods mainly focus on the semantic information of event triggers to generalize event types and do not mine the latent event types of the documents themselves, resulting in low coherence of the resulting event trigger clusters.
To address the above problems, this paper introduces a novel open-domain event type induction framework. Unlike traditional event type induction frameworks, it classifies event types into main types and subtypes and divides the task into two steps: generation of event main types and clustering of event subtypes. When generating event main types, the framework feeds more feature information, namely the co-occurrence and semantic information of event triggers, to the generation module in order to better mine the latent event types in the text and improve the coherence of event trigger clusters. Clustering based on the semantic information of event triggers is then performed to obtain event trigger clusters that represent event subtypes. Compared with event type induction models that focus on semantic information, the model in this paper greatly improves the coherence of event trigger clusters with little loss of semantic similarity. The main contributions of this work are summarized as follows:
A neural variational inference model for generating event types that combines semantic information of event triggers with text-trigger co-occurrence fully exploits the event types in the hidden space, thereby improving the coherence of event trigger clustering.
The framework exploits the potential relationship between event type hierarchies and event trigger clusters to induce more coherent and more semantically similar event trigger clusters, which represent high-quality event types.
This paper conducted extensive experiments on real-world datasets and demonstrated that our model can induce high-quality event types.
The structure of this paper is as follows: Section 2 gives an overview of related work. Section 3 introduces the ETGen-Clus framework. Section 4 details the experimental setup. Section 5 reports the experimental results and provides a thorough analysis. Section 6 concludes the research.
Related work
Event schema induction
The seminal work in event schema induction uses coreference chains [4] for template induction. Currently, mainstream methods include probability-based generative models [3,8,23,34] that jointly model predicates and arguments, as well as ad-hoc clustering algorithms [14,26,29]. Reference [20] adopts an entity-centric approach to event schema induction, where entities are grouped into semantic slots and the predicate for entity clusters is identified in a post-processing step. These methods rely on discrete, handcrafted features and impose strong statistical assumptions. For event type induction, reference [15] designs a semi-supervised vector quantization variational autoencoder framework that utilizes knowledge learned from existing annotations to discover new event types. Reference [27] reduces the complexity of feature generation using pre-trained language models and relaxes strict statistical assumptions through latent spherical space clustering. These methods do not take into account the hierarchical relationship between event types.
Event extraction
Event extraction is considered a fundamental task in the field of information extraction [5]; it assumes a pre-defined event schema and attempts to identify event triggers and their associated arguments. Existing event extraction approaches have exploited two levels: sentence-level [21] and document-level [9]. Some also utilize RNNs [17], CNNs [6] and GANs [33] for extracting new event types. Recently, for zero-shot event extraction, reference [16] proposed transfer learning by applying a model trained on seen event types to unseen ones. Reference [35] used deep learning models to represent the semantics of labels and then developed an encoding model that aligns unrelated contexts with their respective definitions. This facilitates the mapping of candidate event mentions and definitions into a common embedding space, resulting in superior accuracy and efficiency in extracting arbitrarily defined event types.
Framework
As shown in Fig. 1, ETGen-Clus consists of three parts: 1) extraction of salient event triggers using dependency-based syntactic structures; 2) generation of latent space event main types; and 3) event trigger clustering based on semantic features to induce event subtypes.

Architecture overview. (Circles are features of event triggers obtained from the pre-trained BERT model, squares are bag-of-words (BoW) representations of event triggers, and triangles are event triggers; different colours indicate semantically different triggers.)
Similar to previous research [14,27], this paper employs a lexical approach to identify candidate event triggers. Specifically, given a sentence, this paper uses the dependency parser spaCy to extract linguistic features such as tokens, lemmas, and part-of-speech (POS) tags. spaCy is chosen for its speed and accuracy, which matter given the relatively large corpus used in this paper. Subsequently, all non-auxiliary verb tokens in the input sentence are selected as candidate verb tokens. However, indiscriminately treating all verb tokens as candidate event triggers has drawbacks: it yields many overly general or uninformative event triggers as well as excessively specific triggers that are difficult to generalize. To mitigate these issues, this paper adopts a saliency-based approach to identify more informative verb tokens that satisfy two criteria: first, a candidate event trigger should appear frequently in the corpus, and second, it should not be overly frequent in a large general-domain background corpus. Verb tokens whose saliency scores [27] rank in the top 80% are kept as candidate event triggers. The saliency score calculation formula is as follows:
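The two criteria and the top-80% cut can be illustrated with a small sketch. It assumes a TF-IDF-style score in the spirit of [27], salience(w) = (1 + log(freq(w))^2) * log(N_bs / bsf(w)), where freq(w) is the corpus frequency of verb lemma w, N_bs is the number of background sentences, and bsf(w) is the number of background sentences containing w. The function names, the add-one smoothing, and the toy counts are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import Counter


def salience(freq, bsf, n_background):
    """Assumed TF-IDF-style salience: (1 + log(freq)^2) * log(N_bs / bsf)."""
    return (1.0 + math.log(freq) ** 2) * math.log(n_background / bsf)


def select_triggers(corpus_verbs, background_sentences, keep_ratio=0.8):
    """Rank verb lemmas by salience and keep the top `keep_ratio` share."""
    freq = Counter(corpus_verbs)
    # Add-one smoothing (an assumption) keeps the ratio finite for verbs
    # that never occur in the background corpus.
    n_bs = len(background_sentences) + 1
    scores = {}
    for verb, count in freq.items():
        bsf = 1 + sum(1 for sent in background_sentences if verb in sent)
        scores[verb] = salience(count, bsf, n_bs)
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = max(1, math.ceil(keep_ratio * len(ranked)))
    return ranked[:n_keep], scores


# Toy data invented for illustration: verb lemmas from the target corpus and
# a background corpus given as one set of lemmas per sentence.
corpus_verbs = (["arrest"] * 5 + ["sentence"] * 3 + ["say"] * 5
                + ["attack"] * 4 + ["make"] * 2)
background = [{"say", "make"}] * 8 + [{"say", "arrest"}, {"attack"}]

kept, scores = select_triggers(corpus_verbs, background)
```

In this toy setting, verbs that are frequent in the corpus but rare in the background (such as "arrest") rank highest, while generic verbs that dominate the background (such as "make") fall to the bottom and are cut.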
Event main type generative model

Plate notations for models (T – event triggers; d – documents; the observed variables are denoted by gray circles, while hidden variables are represented by white circles.)
Notation used in this paper
Given a corpus
Pre-trained language models, in particular BERT [10] and GPTs [24], have demonstrated impressive capabilities in various natural language processing tasks by providing rich representations of language. In the framework, this paper uses the pre-trained BERT model to extract semantic features of event triggers. Because BERT is pre-trained with masked language modeling and next-sentence prediction, it must be given sentences as input in order to capture the semantics of individual words in context. Accordingly, this paper feeds documents to BERT to generate a contextual semantic embedding for each event trigger within its sentence.
The joint probability of a document is
The model has two main goals. First, it needs to learn the optimal parameters. Second, it needs to perform inference to estimate the posterior distribution of the latent variable e given a document d. To accomplish both tasks, the model uses a neural inference network to obtain the variational parameters and implements the amortized variational inference method [28]. Following reference [28], the model collapses the discrete latent variable e to obtain an Evidence Lower Bound (ELBO) [18] on the log marginal likelihood.

High-level schema of the architecture for our inference network.
In Equation (3), the first component denotes the reconstruction error, and the KL divergence term is computed in closed form. To maximize the ELBO, this paper uses the ADAM optimizer. Following [28], this paper alleviates the problem of component collapse by using a high moment weight (>0.8) and learning rate (in [0.001, 0.1]) in the ADAM optimizer while applying batch normalization and dropout. After model training, this paper assigns an event main type to each event trigger feature by maximizing the likelihood.
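To make the closed-form KL term concrete, the following minimal sketch computes the KL divergence between two diagonal Gaussians. It assumes that both the variational posterior and the prior are diagonal Gaussians; the paper's exact prior is not restated here, and the function name is illustrative.

```python
import math


def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between diagonal Gaussians, summed over dimensions.

    Per dimension:
    0.5 * (var_q / var_p + (mu_p - mu_q)^2 / var_p - 1 + log(var_p / var_q))
    """
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        total += 0.5 * (vq / vp + (mp - mq) ** 2 / vp - 1.0 + math.log(vp / vq))
    return total


# Identical distributions have zero divergence; shifting the posterior mean
# by one standard deviation costs 0.5 nats per dimension.
kl_same = kl_diag_gaussians([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
kl_shift = kl_diag_gaussians([1.0], [1.0], [0.0], [1.0])
```

Because this term has an analytic form, only the reconstruction term of the ELBO needs to be estimated by sampling during training.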

ETGen
The parameter learning is shown in Algorithm 1, where α, β and θ are model parameters. The notation
Upon convergence of generative model training, given a corpus C, the learned parameter α is first used to sample the logarithmic probabilities of the K event main types. Then, for each document d in the corpus, the parameter θ is used to sample the logarithmic probabilities of the K event main types corresponding to each event trigger in the text. Simultaneously, the parameter β is used to obtain the logarithmic probabilities of each event trigger’s semantic feature
The generating function for each event main type corresponding to each event trigger is defined as follows:

The schematic diagram of event main types generated by our model, where each box represents an event main type and gray circles indicate the event trigger.
In the generated event trigger clusters, the event triggers exhibit a certain degree of semantic similarity and high coherence, and each cluster represents an event main type. Subsequently, based on the view that event triggers of the same event subtype have highly similar semantic features, this module takes the semantic features of event triggers as input and performs further clustering within each event main type to obtain event subtypes. Three classical clustering algorithms, namely Kmeans, Ordering Points To Identify the Clustering Structure (OPTICS) [1] and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) [2], are applied as the final module of the framework for comparison. To reduce the sparsity of event trigger features and the computational cost, this paper uses the Uniform Manifold Approximation and Projection (UMAP) [22] algorithm for dimensionality reduction. UMAP was chosen over t-distributed Stochastic Neighbor Embedding (t-SNE) [13] because it is known to better preserve both local and global structure of high-dimensional data in low-dimensional projections.

The schematic diagram of event subtypes generated by clustering the event triggers.
By applying further clustering, this module locates areas of semantic density among event triggers, which represent event subtypes. As shown in Fig. 5, the event main type is commit, and after clustering, each event trigger is assigned to its respective cluster, with purple representing the hurt event, blue representing the justice event, and orange representing the hearing event. The clustering results vary across clustering algorithms. Figure 6 displays the event trigger clusters obtained by applying the Kmeans and OPTICS clustering algorithms to semantic features. The distance-based Kmeans algorithm can only determine the number of event subtypes from a pre-defined K value and forces every event trigger into some cluster, while the density-based OPTICS algorithm can identify more densely populated clusters of event triggers in the semantic space.
In summary, this module first applies UMAP to reduce the dimensionality of event trigger semantic feature vectors, making them less sparse and cheaper to process, and then uses classical clustering to identify the dense areas of event trigger semantic features. In this way, the framework identifies fine-grained event subtypes by locating clusters of semantically similar event triggers within dense regions of each event main type.
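The idea of locating dense regions within one event main type can be sketched with a small density-based procedure. To stay self-contained, this sketch swaps in a plain DBSCAN-style algorithm for OPTICS and HDBSCAN and starts from toy 2-D points that stand in for UMAP-reduced trigger embeddings; the trigger names, coordinates, `eps`, and `min_samples` are all illustrative assumptions. It requires Python 3.8 or later for `math.dist`.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i)."""
    return [j for j, p in enumerate(points) if math.dist(points[i], p) <= eps]


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: label each point with a cluster id or NOISE."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be re-labelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing
        cluster += 1
    return labels


# Toy stand-ins for reduced embeddings of triggers under one main type.
triggers = ["kill", "injure", "wound", "convict", "sentence", "acquit", "commit"]
points = [(0.0, 0.0), (0.1, 0.1), (0.0, 0.2),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.1),
          (10.0, 0.0)]

labels = dbscan(points, eps=0.5, min_samples=3)
subtypes = dict(zip(triggers, labels))
```

Unlike Kmeans, the density-based procedure needs no preset number of subtypes and leaves the isolated trigger unassigned instead of forcing it into a cluster.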

UMAP visualization for Kmeans and OPTICS.
Datasets
Comparison with existing datasets
This paper performs experiments on two datasets: MAssive eVENt (MAVEN) and GNBusiness-All (GNB). MAVEN was collected and constructed by reference [31]. To ensure consistency, this paper uses identical pre-processing procedures to extract event triggers from both datasets. The corpora are introduced as follows:
MAVEN is a large-scale general-domain event detection dataset that utilizes English Wikipedia as the data source, consisting of 4,480 documents, 49,873 sentences, and 1,276k tokens. The content of the dataset covers a wide range of subject areas, including history, science, culture, technology, society, and sports, among others. Each article aims to provide comprehensive information on a particular topic, including its history, background, related concepts, important figures and events, and developmental trends, among other aspects.
GNB was obtained by reference [20] by crawling Google’s business news sites.
News demonstration:
A comparison of MAVEN and GNB is shown in Table 2.
Hyper-parameters setting
This paper runs all experiments on a desktop computer with an Intel Core i7-13900HX CPU, one NVIDIA GeForce RTX 4060 GPU, and 32 GB of RAM, using the PyTorch deep learning platform. Most of the hyper-parameters of the generative model are shown in Table 3. The number of event main types is chosen according to the event type discovery experiments.
Evaluation metrics
Event type coherence
Several qualitative metrics have been proposed to assess topic coherence. Reference [25] showed a framework in which various existing topic coherence measures are combined together.
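As an illustration of what a coherence score captures, the sketch below computes an NPMI-based coherence from document-level co-occurrence of the top triggers in a cluster. NPMI is used here only as one widely used topic coherence measure; it is an assumption and not necessarily the measure adopted in this paper, and the function name and toy documents are hypothetical.

```python
import math
from itertools import combinations


def npmi_coherence(top_triggers, documents):
    """Average NPMI over all pairs of triggers, using document co-occurrence.

    `documents` is a list of sets of trigger lemmas, one set per document.
    """
    n_docs = len(documents)

    def prob(*words):
        hits = sum(1 for doc in documents if all(w in doc for w in words))
        return hits / n_docs

    scores = []
    for w1, w2 in combinations(top_triggers, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # never co-occur: lower bound of NPMI
        elif p12 == 1.0:
            scores.append(1.0)   # always co-occur: upper bound of NPMI
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Toy documents invented for illustration.
documents = [
    {"arrest", "charge", "convict"},
    {"arrest", "charge"},
    {"score", "win"},
    {"win", "score", "defeat"},
]

coherent = npmi_coherence(["arrest", "charge"], documents)
incoherent = npmi_coherence(["arrest", "win"], documents)
```

Triggers that tend to appear in the same documents receive a score near 1, while triggers that never share a document receive -1, which is the behaviour a co-occurrence-based coherence metric is meant to reward.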
Formally, the event coherence
Event trigger semantic similarity
Because triggers within an event type are semantically similar, this paper adopts semantic similarity as another measure of event type quality. It is computed as the cosine similarity among the top 10 event triggers in each event cluster. Values close to 1 indicate that the triggers of an event type are highly similar; values close to −1 indicate poor similarity.
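A minimal sketch of this measure follows. It assumes that the per-cluster score is the average pairwise cosine similarity over the embeddings of the top triggers; the averaging scheme, the function names, and the toy vectors are assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_similarity(embeddings, top_n=10):
    """Average pairwise cosine similarity over the first `top_n` embeddings."""
    pairs = list(combinations(embeddings[:top_n], 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy 2-D embeddings: two aligned triggers and one orthogonal trigger.
tight_cluster = [[1.0, 0.0], [2.0, 0.0]]
mixed_cluster = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]

tight_score = cluster_similarity(tight_cluster)
mixed_score = cluster_similarity(mixed_cluster)
```

The embeddings are assumed to be ordered by trigger rank within the cluster, so slicing the first `top_n` entries selects the top triggers.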
Event type quality
This paper defines the overall metric for the quality of an event type as the weighted sum of its event type coherence and event trigger semantic similarity.
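The weighted sum can be written down in a few lines. The sketch assumes a single weight in [0, 1] shared between the two terms, with 0.5 as a placeholder default because the paper's weights are not restated here; it also shows how such a score could be used to compare candidate settings. All values are invented for illustration.

```python
def event_type_quality(coherence, similarity, weight=0.5):
    """Weighted sum of coherence and semantic similarity (weight is assumed)."""
    return weight * coherence + (1.0 - weight) * similarity


def best_setting(quality_by_setting):
    """Return the setting (for example a number of event types) with the top score."""
    return max(quality_by_setting, key=quality_by_setting.get)


# Toy (coherence, similarity) pairs for three candidate numbers of event types.
toy_metrics = {5: (0.40, 0.50), 10: (0.55, 0.60), 15: (0.50, 0.58)}
toy_quality = {k: event_type_quality(c, s) for k, (c, s) in toy_metrics.items()}
chosen = best_setting(toy_quality)
```

Combining the two terms prevents a method from scoring well by optimizing only co-occurrence coherence or only semantic similarity.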
Compared methods
Experimental results
Event type discovery

Averaged event type quality scores of the two datasets with different numbers of event main types K.
To determine the optimal number of event main types, this paper first applies the generative model to MAVEN and GNB and searches for the number of event main types K by grid search over [5, 100] with a step of 5. Figure 7 shows the averaged event type quality score on the two datasets. According to this score, the optimal number of event main types is 10 for MAVEN and 20 for GNB. This paper chose
Some clustering examples are presented in Table 4 of this paper. From the observations, it can be noted that event trigger clusters with higher event quality scores are more likely to determine event types compared to clusters with lower scores. This finding indirectly suggests that the metric proposed in this paper is effective in evaluating the quality of event trigger clusters.
Clustering of event triggers in the top five and bottom two event quality scores found on the MAVEN dataset. Event types have been manually annotated, and italics indicate subtypes of events. The event main type is indicated above the subtypes
The induction results for event types, with each method being run five times, and the average results for each metric
Table 5 presents the results of all methods on two datasets. This paper used 3 metrics to evaluate the event trigger clustering results for each dataset, resulting in 6 metrics for both datasets combined. This paper determined the number of event types induced by each method based on their event type quality scores. Event type induction methods are divided into three groups: induction based on semantic information, induction based on co-occurrence information, and induction in the ETGen-Clus framework.
Compared to the baseline model, the model in this paper improves the event type quality score by 6.23% and 3.11% on the two datasets, respectively. Since the first six models in Table 5 perform clustering on the semantic features of event triggers, they all achieve higher semantic similarity. However, their coherence is lower than that of the ProdLDA models. This is mainly because ProdLDA is a topic model that performs event type induction using the co-occurrence of event triggers in documents. Although ProdLDA yields more coherent sets of event triggers, it performs poorly in terms of the semantic similarity of event triggers. In contrast, ETGen-Clus improves both the coherence and the semantic similarity of the induced event trigger clusters by incorporating co-occurrence and semantic information. In terms of coherence in particular, ETGen-Clus outperforms models that induce event types based solely on semantic information, with increases of 22.01% and 28.08% on the two datasets, respectively. Considering the two metrics together, this paper’s model outperforms the existing baseline model.
As shown in Table 6, among the clustering modules used in the framework, the use of OPTICS and HDBSCAN resulted in the best performance, indicating that density-based clustering models can better group significant event triggers together.
Results of event type induction obtained by selecting different clustering algorithms
For the GNB dataset, all of the models showed a decrease in coherence metric scores. Each text in GNB consists of a news headline and first paragraph, which is relatively shorter in content compared to the MAVEN dataset and has a narrower range of event types covered, limiting the utilization of co-occurrence information of event trigger words. Therefore, except for the similarity metric, all other metrics showed a decrease.

Comparison between ETypeClus and ETGen-Clus.

Event type hierarchical relationship structure.
Event type induction models based on semantic information take only the semantic information of event triggers as input; without the co-occurrence information of event triggers, they mine latent event types insufficiently. In contrast, the generative model in this paper is based on a neural variational autoencoder: it fuses the semantic and co-occurrence information of event triggers to mine event type latent variables and provides highly coherent event trigger clusters for the semantic partitioning of event triggers in the next step. As shown in Fig. 8, for numbers of event types in the range [10, 100], the proposed generative model improves the event quality scores over those obtained by the semantic information-based event trigger induction represented by ETypeClus, indicating that the model is effective at mining the implicit event type information of event triggers.
Event type hierarchical relationship analysis
An event type can be defined as a class or concept, and event types can be organized hierarchically through parent-child or inclusion relationships. As shown in Fig. 9, Legal can be considered an event type and can be further divided into subtypes such as Violent Crime, Theft Crime, and Judicial, thus forming a hierarchical structure. This paper models the event type hierarchy by using event trigger clusters to represent event types, so that the event trigger clusters of main types and subtypes exhibit different properties. Specifically, an article describing a Legal event may include event types such as Violent Crime, Theft Crime and Judicial. This paper therefore assumes that the event trigger cluster denoting the Legal event type contains event triggers that denote event types such as Violent Crime, Theft Crime and Judicial, so the event triggers in this cluster are more likely to co-occur in the same article. On the other hand, since the Violent Crime, Theft Crime and Judicial event types are more specific than the Legal event type, the event triggers within each of these subtype clusters have higher semantic similarity.
As shown in Table 7, the generative module fully integrates co-occurrence information and semantic information of event triggers, resulting in better coherence of generated event triggers that conform to the properties of event main types. Similarly, in the event trigger word clustering module, the coherence of event triggers is reduced, but the similarity is significantly increased, conforming to the properties of event subtypes. Therefore, the framework fully utilizes the hierarchical relationship between event types to induce high-quality event types.
The results of event type induction on two datasets by two modules
Conclusion
This paper proposes an event type induction framework oriented towards event type hierarchies. Experimental results show that the ETGen-Clus framework outperforms event type induction models that use only semantic or only co-occurrence information of event triggers. Specifically, this paper analyzes the relationship between the event type hierarchy and event trigger clusters and divides the induction task into two steps: generating event trigger clusters with higher coherence that represent event main types, and inducing event trigger clusters with higher semantic similarity that represent event subtypes. The framework emphasizes the importance of the event type hierarchy in event type induction. Experiments on two datasets demonstrate that ETGen-Clus is able to infer salient event types that are consistent with human judgment. In the future, we will explore the corresponding argument roles for each event type: using the event trigger clusters that denote an event type, we will analyze the entities on which each trigger depends in the original text and finally obtain the argument roles under a specific event type to form a complete event schema.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (NSFC) under grants (No.72204261, No.72371245).
