Abstract
Most code search tools seem to yield semantically correct matches, but the search results rarely meet users' demands perfectly, and they still have to be modified manually. One major reason is that existing tools lack intent prediction: they cannot anticipate what else a user might do after obtaining the results. In this paper, we propose an intent-enforced code search approach (IECS) that predicts the potential intents for a query before performing code retrieval. It then expands the query with the intents and applies the Extended Boolean Model to retrieve relevant results that need no subsequent modification. We implement SnippetGen, a code search tool that performs IECS. We run 70 queries with SnippetGen, CodeHow and Google Code Search (CS) on a codebase of 27K projects downloaded from GitHub. SnippetGen outperforms the other tools by 28.5%, with a precision score of 0.846 (i.e., 84.6% of the first returned results are relevant). We also perform a controlled experiment in which 20 participants complete 3 tasks with SnippetGen and CodeHow. The results confirm the effectiveness of SnippetGen in programming practice.
Introduction
Many code search tools have been proposed to help developers reuse existing methods with the same functionality as much as possible. Early code search engines, e.g., Google 1 , Krugle 2 and Koders 3 , offer only keyword-based search with low precision [23, 25]. Later work used semantic search, e.g., signature matching and type matching, to improve the accuracy of the search results [23, 13]. But these approaches are impractical because they require too little or too much specification. Current work supports free-text queries via query expansion to improve usability. For example, the latest code search tool, CodeHow [2], expands the query with APIs and considers the impact of both APIs and text similarity. Although these existing code search tools seem to yield semantically correct matches [3], the search results might be too complex or too slow to meet user demands, and they still need to be modified [4]. One major reason is that these tools lack intent prediction: they cannot anticipate what else a user might do after obtaining the search results.
In this paper, we propose an intent-enforced code search approach (IECS) that uses intents to enhance the search itself, rather than transforming the results of an initial search to meet the user's needs. Specifically, we propose an intent extraction algorithm that extracts intents from the past records of modifications. In this process, we refine the commonly used intents and enrich them with the method's description, which consists of the method signature, FQN 4 and remarks. Given a query, we predict the potential intents for the query before performing code retrieval. Then we expand the query with the intents and apply the Extended Boolean Model [25] to retrieve relevant code that needs no subsequent modification.
We implement SnippetGen, an IECS-supported code search tool. The front-end is a Visual Studio 2010 extension. The backend is supported by Sourcerer [17]. We evaluate SnippetGen against CodeHow [2] and CS [24] by performing 70 real-world queries on a codebase consisting of 27K projects downloaded from GitHub. The results show that SnippetGen achieves a precision score of 0.846, which outperforms them by 28.5% when the top 1 results are inspected. We also perform a controlled experiment in which 20 participants complete 3 tasks with SnippetGen and CodeHow. The results confirm the effectiveness of SnippetGen in programming practice.
The contributions of this paper are as follows: (1) An intent-enforced code search approach (IECS) is proposed. It predicts potential intents for a query, expands the query with the intents and applies the Extended Boolean Model to retrieve relevant code that needs no subsequent modification. (2) A refinement strategy is applied to the intents to ensure that only appropriate intents are used and that they benefit the code search. (3) SnippetGen, which performs IECS, is implemented. The experimental results show that SnippetGen outperforms CodeHow and CS by 28.5% with a precision score of 0.846.
The rest of this paper is organized as follows. We present the overall structure in Section 2, describe the intent-enriched and intent-enforced components in Sections 3 and 4. Then we describe in-house and controlled experiments in Sections 5 and 6. Finally, we discuss the work, introduce the related work and conclude the paper in Sections 7, 8 and 9.
Approach overview
Figure 1 presents the overall structure of SnippetGen, which consists of an intent-enriched component and an intent-enforced component. SnippetGen constructs a codebase by collecting projects from GitHub (open source repositories) and indexes the code at the method level using Sourcerer [18]. In the intent-enriched component (as shown in Fig. 1a), SnippetGen records every code modification through the code version tracking service in the background, and extracts the intents from the past records of modifications. Given a query, SnippetGen feeds it to the intent-enforced component (as shown in Fig. 1b) and predicts the potential intents for the query by computing the semantic similarity between the intents and the query. Then SnippetGen expands the query with the potential intents and, by considering the impact of both intents and text similarity, retrieves relevant methods that need no subsequent modification.

SnippetGen: Overall structure.
Figure 2 gives a screenshot of the SnippetGen user interface. After writing some methods, a user can trigger the SnippetGen search facility by right-clicking on the written methods and selecting the SnippetGen menu option (as shown in Fig. 2a). Then SnippetGen sends the query to Sourcerer which, in turn, returns relevant results in the second-level menu. When a result is selected, the code snippet is presented in the Snippet Viewer (as shown in Fig. 2b). If the result matches the user's needs, the user can click the accept button (as shown in Fig. 2d) to integrate the result into the IDE; otherwise, the user can click the page-turn button (as shown in Fig. 2c) to navigate to the other results.

A screenshot of SnippetGen user interface.
Our personal experience supports an underlying assumption that people are most likely to do what they have done before. In this section, we propose the intent extraction algorithm, which extracts the intents from each method's past records of modifications. However, each method may have many intents. If SnippetGen incorporated all intent elements for expansion, it might produce even worse results than not expanding the query. SnippetGen therefore applies a refinement strategy to ensure that the intents benefit the code search. Algorithm 1 describes this process in three steps.
/* step 1: identify modifications */
foreach m i in M
∣ Δ i = m i - o; // obtain the i-th modification (Δ i )
∣ add Δ i to Δ s ; // obtain all modifications (Δ s )
end
/* step 2: refine modifications */
// identify the common modifications
Δ c = common modifications among Δ s
foreach Δ c i in Δ c
∣ // obtain the node operation pairs (nop) from Δ c i
∣ Δ c i → nop;
∣ // obtain the concrete instances (ci) of types, methods, variables and constants from nop
∣ nop → ci;
∣ if abstractMatch(ci) is true
∣ ∣ // ci’s edit type or inheriting type is equivalent
∣ ∣ if ci is inconsistent with the mapper
∣ ∣ ∣ omit nop;
∣ ∣ ∣ continue;
∣ ∣ end
∣ ∣ ci = ai; // substitute the abstract identifiers (ai) for ci
∣ ∣ build the mapper (ai, ci);
∣ end
end
/* step 3: extract intents */
mapper → intents; // extract the intents from the mapper
method = method + intents; // enrich the method with the intents
return intents

The process of intent extraction.
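The three steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not SnippetGen's implementation: step 1 (the diff between each version and the original) is assumed to be precomputed, each edit operation is represented as a hypothetical (edit type, kind, concrete instance) tuple, and "common" is simplified to mean that the (edit type, kind) pair occurs in every modification. The function names and the numbering of abstract identifiers are likewise invented for illustration.

```python
# Illustrative sketch only: data shapes and helper names are hypothetical.
ABSTRACT_PREFIX = {"type": "$t", "method": "$m", "variable": "$v", "constant": "$c"}


def common_operations(modifications):
    """Step 2 (first half): keep operations whose (edit type, kind) pair
    occurs in every recorded modification of the method."""
    if not modifications:
        return []
    shared = {(edit, kind) for edit, kind, _ in modifications[0]}
    for mod in modifications[1:]:
        shared &= {(edit, kind) for edit, kind, _ in mod}
    return [op for mod in modifications for op in mod if (op[0], op[1]) in shared]


def extract_intents(modifications):
    """Step 2 (second half) and step 3: abstract the concrete instances,
    build the mapper, and return the concrete instances as intents."""
    mapper = {}    # concrete instance -> abstract identifier, e.g. "$m1"
    counters = {}  # kind -> number of abstract identifiers issued so far
    for _edit, kind, concrete in common_operations(modifications):
        if kind not in ABSTRACT_PREFIX:
            continue  # no abstract identifier exists for this kind
        if concrete in mapper:
            continue  # already mapped; a conflicting re-mapping is omitted
        counters[kind] = counters.get(kind, 0) + 1
        mapper[concrete] = f"{ABSTRACT_PREFIX[kind]}{counters[kind]}"
    return list(mapper), mapper


# Two invented modification records of the same method.
mods = [
    [("insert", "method", "OleDbConnection.Open"),
     ("insert", "constant", "HDR=NO"),
     ("delete", "variable", "tmp")],
    [("insert", "method", "OleDbConnection.Open"),
     ("insert", "constant", "Excel 12.0")],
]
intents, mapper = extract_intents(mods)
```

In this example the variable deletion appears in only one modification, so it is filtered out before abstraction, while the inserted method call and the two constants survive as intents.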
Given a query, SnippetGen predicts the potential intent for the query and incorporates the impact of both intent and textual description on code retrieval.
Predicting intents
For each method m_i, SnippetGen uses standard tokenization, stop word removal, identifier splitting and stemming to convert the query and the intents into a bag of words [25]. Note that the intent here contains the intent itself, the method signature, the FQN and remarks. Then SnippetGen computes textual semantic similarity scores using the standard Vector Space Model (VSM) [14], and returns the top i potentially semantic intent scores denoted as:
In VSM, the query and each intent are represented as vectors. The term frequency (tf) of a word is the number of times the word appears in the query or the intent, normalized by the total number of words in the query or the intent. The inverse document frequency (idf) of a word is the logarithm of the total number of documents (i.e., the number of intents in the codebase) divided by the number of documents that contain the word.
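The preprocessing and scoring described above can be sketched in Python as follows. This is a minimal illustration, not SnippetGen's code: the stop-word list is a tiny invented sample, stemming is omitted for brevity, cosine similarity is assumed as the VSM similarity measure, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "the", "to", "from", "of", "in", "how"}  # tiny illustrative list


def to_bag_of_words(text):
    """Tokenize, split identifiers on camelCase boundaries, lowercase, drop stop words."""
    words = []
    for chunk in re.findall(r"[A-Za-z0-9]+", text):
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk))
    return [w.lower() for w in words if w.lower() not in STOP_WORDS]


def tfidf(tokens, documents):
    """tf = count / total words; idf = log(N / number of documents containing the word)."""
    vector = {}
    for word, count in Counter(tokens).items():
        containing = sum(1 for doc in documents if word in doc)
        if containing:
            vector[word] = (count / len(tokens)) * math.log(len(documents) / containing)
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def top_intents(query, intents, k):
    """Return the k best (score, intent index) pairs for the query."""
    docs = [to_bag_of_words(text) for text in intents]
    q = tfidf(to_bag_of_words(query), docs)
    scored = [(cosine(q, tfidf(doc, docs)), i) for i, doc in enumerate(docs)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]


# Invented intents; in the paper each intent also carries signature, FQN and remarks.
intents = ["access excel data", "convert utc time", "parse json data"]
ranked = top_intents("how to access data from excel", intents, 2)
```

Here the first intent shares all of the query's content words and ranks first, while the third intent shares only the common word "data" and receives a much lower score.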
Expanding query
A query Q_t containing n terms is defined as: Q_t = {t_1, t_2, ..., t_n}.
For a code snippet, we define the following three features: F_t = {f_1, f_2, f_3}.
f_1 stands for the intent. f_2 stands for the FQN, which gives a brief summary of the method's functionality. f_3 stands for the method body, which contains the source code statements that implement a functionality.
A query can be expressed in terms of f_i:t_i where t_i ∈ Q_t and f_i ∈ F_t. It means to search a field f_i that contains the term t_i. SnippetGen constructs a Boolean query expression for retrieving code snippets that match the query in terms of text similarity:
This query expression retrieves methods that contain the terms t_1, t_2, ..., t_n in fields f_2 (FQN) and f_3 (method body).
Having obtained k potentially relevant intents, SnippetGen tokenizes each intent and gets a keyword list A_i. Then it constructs Boolean query expressions as follows:
A method may be retrieved by more than one of the query expressions defined above. SnippetGen combines the query expressions into an expanded query for retrieving methods:
Retrieving code snippets
To retrieve relevant methods for a given query, we adopt the Extended Boolean Model (EBM) [9], which combines the characteristics of the VSM and the Boolean model. In EBM, a document d is represented as a vector. Given a query expression q_expand = (q_intent_1, q_intent_2, ..., q_intent_k, q_text), a generalized disjunctive and a generalized conjunctive query are defined as follows:
The similarity between the expanded query q_expand and a document d is computed using the p-norm [25], and documents are ranked according to Equations (1) and (2):
In Equation (3), if the term is an intent (such as “Microsoft,Ace,OleDb,12.0,Excel12.0, HDR=NO”), the weight is the intent score. If the term is not an intent (such as “access”), its weight is measured by the normalized term frequency. f_{t,d} is the frequency of the term t in method d, and idf_t is the inverse document frequency of the term t. The constant 0.5 is used for balancing the term weight.
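To illustrate how the p-norm of the Extended Boolean Model behaves, the following Python sketch implements the widely used unweighted form of the generalized disjunctive and conjunctive similarities. It is a sketch under stated assumptions: term weights are taken to lie in [0, 1], query terms are treated as equally important, and the function names and sample weights are hypothetical. The equations in this paper may additionally weight individual query terms.

```python
def pnorm_or(weights, p=2.0):
    """Generalized disjunction: high if at least some term weights are high."""
    if not weights:
        return 0.0
    return (sum(w ** p for w in weights) / len(weights)) ** (1.0 / p)


def pnorm_and(weights, p=2.0):
    """Generalized conjunction: penalizes every term weight that is far from 1."""
    if not weights:
        return 0.0
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / len(weights)) ** (1.0 / p)


# Invented term weights of two methods for a two-term query.
docs = {"ReadExcel": [0.9, 0.8], "ParseJson": [0.1, 0.0]}
ranking = sorted(docs, key=lambda name: pnorm_or(docs[name]), reverse=True)
```

With p = 1 both operators reduce to the average of the term weights, which is the vector-space end of the spectrum; as p grows, the disjunction approaches the maximum weight and the conjunction approaches the minimum weight, which is the strict Boolean end.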
To evaluate SnippetGen, we present our experimental setting, evaluation metrics and experimental results in this section.
Experimental setting
First, we construct a codebase consisting of 27K projects downloaded from GitHub, containing about 11.7 million source code files and 11.4 million functions written in C#. Second, we collect 70 real-world queries created by Portfolio’s author [12]. Third, we implement the front-end of SnippetGen as a Microsoft Visual Studio 2010 extension and deploy the backend of SnippetGen as a Microsoft Azure cloud service, which is powered by Sourcerer running on five Azure virtual machines (1 master node and 4 workers). Fourth, we recruit 20 participants. 6 participants are graduate students who have at least two years of C# programming experience. The others are PhD students who have 3-6 years of C# programming experience. Each participant is asked to use SnippetGen to address a number of queries reported in Portfolio’s user study and to examine the top 20 search results for each query.
Research questions
To investigate the overall effectiveness of our approach and the benefit of distinctive features of our approach, we have identified the following research questions:
RQ1: How effective is SnippetGen?
This RQ evaluates the precision of SnippetGen in retrieving relevant code. To answer this question, each participant runs SnippetGen, CodeHow and Google Code Search (CS) on 3-4 queries and manually inspects the top 20 results returned for each query to judge whether they are relevant or not.
CodeHow is the latest query-expansion-based code search tool and performs better than all Lucene-based code search tools [24]. Participants use CodeHow by entering the query conditions directly to retrieve the results. CS represents conventional keyword-based code search web applications. Participants go to the website, look for implementations and extract them by copying and pasting results into the workspace.
RQ2: Is the refinement strategy effective?
One distinctive feature of our approach compared with existing code search tools is the refinement strategy (Section 3), which refines the intents. Answers to this research question help us evaluate whether this feature ensures that the refined intents benefit the code search. To answer this question, we compare two versions of SnippetGen, one with the refinement strategy and the other without it (referred to as SnippetGen_noRS). In the implementation of SnippetGen_noRS, we omit the refinement strategy and expand the query with all intent elements directly, regardless of whether an intent is appropriate.
RQ3: Is the Extended Boolean model effective?
Another distinctive feature of our approach compared with existing code search tools is the Extended Boolean Model used in code retrieval (Section 4). Answers to this research question shed light on whether this feature is useful for code search. To answer this question, we implement a variant of SnippetGen (referred to as SnippetGen_noEB), which uses Apache Lucene to retrieve methods. We compare SnippetGen (using the EBM) and SnippetGen_noEB (using Lucene).
Evaluation metrics
To evaluate the effectiveness of SnippetGen, we make use of the Precision@k metric:
We also make use of Normalized Discounted Cumulative Gain (NDCG) [17] to measure the precision of a code search based on the graded relevance of the results of a set of queries. It varies from 0.0 to 1.0, with 1.0 representing the ideal ranking of the results. The higher the NDCG value, the higher the accuracy is.
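Both metrics can be sketched in a few lines of Python. The definitions below are common textbook formulations and are assumptions here: Precision@k is taken as the fraction of relevant results among the top k, and NDCG uses a base-2 logarithmic discount on graded relevance. The function names and the sample judgments are hypothetical, and the exact formulas used in this paper may differ in detail.

```python
import math


def precision_at_k(relevance, k):
    """Fraction of the top-k results that are relevant (relevance holds 0/1 flags in rank order)."""
    return sum(relevance[:k]) / k


def dcg(gains, k):
    """Discounted cumulative gain: the gain at rank r is divided by log2(r + 1)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg(gains, k):
    """DCG normalized by the DCG of the ideal (descending) ordering, so 1.0 is a perfect ranking."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


def mean_over_queries(metric, judgments, k):
    """Average a per-query metric over all queries."""
    return sum(metric(j, k) for j in judgments) / len(judgments)


# Invented relevance judgments for two queries, in rank order.
judgments = [[1, 0, 1, 0, 0], [0, 1, 1, 1, 0]]
p_at_1 = mean_over_queries(precision_at_k, judgments, 1)
p_at_5 = mean_over_queries(precision_at_k, judgments, 5)
```

Averaging over queries is what turns the per-query values into a single score such as the Precision@1 reported for each tool.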
RQ1: The Overall Effectiveness of SnippetGen
We compare SnippetGen with CodeHow and CS by performing the 70 queries. As Table 1 shows, when the top 1 results are inspected, SnippetGen achieves a precision score of 0.846, which means that 84.6% of the first returned results are relevant methods that need no subsequent modification. When the top 5 results are inspected, SnippetGen achieves a precision score of 0.861. These results are considered satisfactory. Note that only results that both receive relevant feedback and need no subsequent modification are labeled as relevant. Thus the precision of CodeHow and CS is lower than that reported in previous papers [2, 10].
The comparison among SnippetGen, CodeHow and CS
Next, we single out CodeHow for comparative analysis, because CodeHow is the latest code search tool, proposed in 2015. As Table 1 shows, CodeHow achieves a score of 0.658 when the top 1 results are inspected. SnippetGen achieves 28.5%, 66.2%, 70.3%, and 89.5% improvements in terms of Precision@1, Precision@5, Precision@10, and Precision@20, respectively. In terms of NDCG, SnippetGen obtains a score of 0.873, which also outperforms CodeHow (0.712) by 22.6%. Similarly, SnippetGen performs better than CS.
Figure 4 shows the percentage of queries for which SnippetGen performs better or worse than CodeHow. When the top 1 returned results are examined, SnippetGen wins in 36% of the queries and loses in 18% of the queries. In terms of the top 5 results, SnippetGen wins in 61% of the queries and loses in only 7% of the queries. The results confirm that the improvement achieved by SnippetGen is statistically significant.

The percentage of queries for which SnippetGen performs better/worse than CodeHow.
To analyze the reasons for the lost cases, we explore which factors are correlated with the accuracy of SnippetGen. Fig. 5(a) depicts two lost cases (i.e., T1 “Convert utc time to local time” and T2 “How to get Color from Hexadecimal color code”). It shows that the more past modifications of a method are provided, the smaller the common subset shared among these modifications is likely to be, which lowers the accuracy. For the draws, the methods are never modified. In this case, SnippetGen performs similarly to other tools. In contrast, Fig. 5(b) depicts two promising cases (i.e., T3 “access data from excel” and T4 “find regular LINQ expressions”), in which the accuracy increases as more past modifications of a method are provided. This illustrates that when methods are similar, adding modifications may not decrease the number of common modifications, but may induce more identifier abstraction and produce more sufficient intents.

The accuracy of SnippetGen.
The above results illustrate that the accuracy of SnippetGen varies with the similarity and number of past modifications, and that similarity takes precedence over number. Too many or too few modifications are not good for accuracy. The more similar the modifications are, the higher the accuracy is.
RQ2: The Effectiveness of the refinement strategy
To answer this question, we compare SnippetGen with SnippetGen_noRS. As Fig. 5 shows, SnippetGen_noRS achieves precision scores of 0.44, 0.41, 0.43, and 0.41 when the top 1, 5, 10, 20 results are inspected, respectively. SnippetGen achieves 84%, 85%, 83%, and 81% improvements in terms of Precision@1, Precision@5, Precision@10, and Precision@20, respectively. In terms of NDCG, SnippetGen achieves a 43.7% improvement compared with SnippetGen_noRS. The result indicates that the improvement achieved by the refinement strategy is statistically significant.
RQ3: The Effectiveness of the Extended Boolean model
To answer this question, we compare SnippetGen with SnippetGen_noEB. As Fig. 6 shows, SnippetGen_noEB achieves precision scores of 0.72, 0.69, 0.61 and 0.51 when the top 1, 5, 10, 20 returned results are inspected, respectively. SnippetGen outperforms SnippetGen_noEB in terms of Precision@1, Precision@5, Precision@10, and Precision@20. In terms of NDCG, SnippetGen achieves a 24.2% improvement compared with SnippetGen_noEB. The result confirms that the improvement achieved by the Extended Boolean Model is statistically significant.

The comparison between SnippetGen and SnippetGen_noRS.

The comparison between SnippetGen and SnippetGen_noEB.
To evaluate the effectiveness of SnippetGen in practice, we assign the 20 participants from the in-house experiment to 5 groups and design 3 programming tasks as follows:
Task 1: LogCat. Implement a useful log tool for Android. Main functions: line number printing, function calls, JSON parsing, XML parsing and log saving. 5
Task 2: ExcelDataReader. Implement a lightweight library written in C# that reads Microsoft Excel files and offers multiple data access and database admin commands. 6
Task 3: Network sockets. Implement the endpoints of internet connections between an Android client device and a Java server app over a local network. 7
Each team chooses a task by lottery and completes it using SnippetGen and the latest code search tool, CodeHow. During the experiment, participants used both code search tools, entered a series of queries and searched for the methods needed to complete the tasks. To evaluate the accuracy of the two tools in retrieving methods based on user queries, we recorded the number of queries the participants used, as well as the methods they examined for completing the tasks. We also asked them to examine the top 10 returned results for each query and mark those that need no manual modification and are helpful for completing the tasks.
Our results show that the participants entered a total of 95 queries into each code search tool during the experiment. Table 2 shows that SnippetGen returns around 13%–24% more relevant results than CodeHow across the 3 tasks. SnippetGen also outperforms CodeHow in terms of NDCG, which measures the ranking quality of the returned results.
The user study results
In addition, we conducted a survey asking the 20 participants whether SnippetGen is helpful for their programming tasks. 16 (80%) participants stated that the more they use SnippetGen, the quicker they write code. 6 (30%) graduate students stated that SnippetGen helps them get up to speed and perform difficult tasks far beyond their technical capabilities, just as experienced team members have done before. Overall, the user study confirms the usefulness and effectiveness of SnippetGen in programming practice.
The incorrect return results
Although SnippetGen is effective, it still cannot locate relevant code for all queries. One unsuccessful query example is “Convert utc time to local time”. We find that some returned results are actually about “Convert local time to utc time”. Similar examples include “How to change RGB color to HSV”. These failures occur because our approach cannot distinguish the semantic meanings of different orderings of the intent. These examples show the importance of understanding the semantic meaning of the intent of potentially relevant code. In the future, we will perform deeper natural language analysis to achieve a better understanding of the intent of both the query and the code.
The quality of the intent
Through the analysis of the intent-enriched component, we have obtained the following insights.
Our experiments illustrate that the intent strictly depends on the similarity of the method’s past modifications. If the modifications are diverse, SnippetGen extracts fewer common modifications and obtains an insufficient intent. If the modifications are similar, SnippetGen extracts more common modifications and obtains a sufficient intent.
To maximize similarity, SnippetGen uses two techniques. First, it applies a heuristic algorithm to dynamically pick out the most similar modifications from all modifications. No matter how many times the method is modified, the intent remains stable or becomes more sufficient, even when some modifications differ greatly. Second, SnippetGen uses the threshold t_s = 0.6 in LCEOS (as shown in the process of identifying common intent) to tolerate inexact matches between modifications. If SnippetGen fails to find any common edit operation between two modifications, it generalizes all concrete instances of types, methods, variables and constants with the abstract identifiers $t, $m, $v and $c, to match the edit type or inheriting type.
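The idea of tolerating inexact matches with a threshold can be sketched in Python as a longest-common-subsequence computation over two sequences of edit operations, where two operations count as equal when their textual similarity reaches the threshold. This is only an illustration under stated assumptions: edit operations are represented as plain strings, similarity is measured with the standard library's difflib.SequenceMatcher, and the function names are hypothetical. It is not a description of how LCEOS is actually implemented.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.6):
    """Two edit operations match if their string similarity reaches the threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def common_edit_operations(x, y, threshold=0.6):
    """Longest common subsequence of two edit-operation lists under inexact matching."""
    n, m = len(x), len(y)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best = max(table[i + 1][j], table[i][j + 1])
            if similar(x[i], y[j], threshold):
                best = max(best, 1 + table[i + 1][j + 1])
            table[i][j] = best
    # Walk the table to recover the matched pairs in order.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if similar(x[i], y[j], threshold) and table[i][j] == 1 + table[i + 1][j + 1]:
            pairs.append((x[i], y[j]))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


# Invented edit operations from two modifications of the same method.
first = ["insert OleDbConnection.Open()", "delete tmp"]
second = ["insert OleDbConnection.Open(path)", "update counter"]
shared = common_edit_operations(first, second)
```

In this example the two insert operations differ only in an argument, so they are treated as the same edit, while the remaining operations are too dissimilar to pair up.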
Threats to validity
We have identified the following threats to validity:
Threats to internal validity: our empirical study involves human subjects. The limited number and the programming capabilities of the human subjects may bias the results. The process of determining the relevance of a method could also be subjective. We will conduct experiments and user studies involving more subjects, API methods and programming tasks to further reduce this threat.
Threats to external validity: we have used 70 queries in our in-house experiment. Although these queries are real-world queries collected from Portfolio’s author [16], admittedly they do not cover all types of queries that a developer may ask. Although our codebase consists of 27K projects and 11.4 million methods, it is just a tiny sample of all available source code. We will reduce the threats to external validity by investigating more queries over a much larger codebase.
Related work
Code search
Early code search engines rely on keyword-based information retrieval techniques [5]. For example, Google (codesearch.google.com), Koders (koders.com) and Krugle (krugle.org) offer only keyword-based search and file-level retrieval. They generally have limited utility in finding appropriate code fragments for a particular application. Tools such as Blueprint [15] and Assisme [20] use general search engines to search for results and automatically extract the relevant code snippets from the returned results. However, Stylos and Myers [16] observed that the search results of these code search engines seem inaccurate, because they are not designed specifically to support programming tasks.
To improve accuracy, later work used semantic search. Originally, the work by Wing looked at matching function signatures [24], and was later extended to match more complete formal semantics using Prolog- and Larch-based approaches [10]. However, these techniques were impractical and did not really succeed, because they attempted to do either too little or too much. Other work in this area, such as specification, signature or type matching, does not really find items of interest.
To improve usability, recent work supports free-text queries via query expansion, using an appropriate ontology [7, 8], natural language [22], or collaborative feedback [21]. These existing code search tools seem to yield semantically correct matches, but the search results might be too complex or too slow to meet user needs, and they still have to be modified [4]. In contrast, SnippetGen can retrieve more relevant methods that need no subsequent modification, because it predicts intents and can anticipate what else the user might do after obtaining the results.
Query expansion
Query expansion is a commonly used technique to support free-text queries. Because user input is often vague, adding one or more synonyms of the words appearing in the user query can enhance the precision of the search results. However, it has been observed that automatically expanding a query with inappropriate synonyms may produce even worse results than not expanding the query [18]. The key factor in query expansion is the choice of appropriate synonyms. Recently, several approaches have been proposed to improve the effectiveness of code search via query expansion.
McMillan et al. propose Portfolio, which takes natural language descriptions as “synonyms” and outputs a list of functions or code fragments along with the corresponding call graphs [7, 11]. Shaowei Wang recommends code fragments from a short textual query as “synonyms” and proposes a new, specialized query expansion algorithm (instead of Rocchio) that incorporates structural information and employs parameter tuning. Wang et al. [6] proposed an active code search approach that incorporates user feedback as “synonyms” to refine the query. Dietrich et al. [8] proposed an approach that utilizes feedback and traceability links to improve the efficacy of future queries. Most recently, Fei et al. [2] proposed a code search technique that understands the APIs a user query refers to and considers both text similarity and potential APIs.
Our work can be viewed as a form of query expansion, but it differs from the others: we are the first to incorporate intents as “synonyms” for expansion and to consider the impact of both potential intents and text similarity on code search. In addition, we propose a refinement strategy in the intent extraction algorithm. This strategy picks out the appropriate intents that benefit the code search.
Conclusion
In this paper, an intent-enforced code search approach (IECS) is proposed. It predicts the intent before performing code retrieval and applies EBM to retrieve the relevant code without any subsequent modification.
In the future, we plan to address the issues discussed in Section 7. In the intent-enriched component, we plan either to introduce a similarity-choosing heuristic algorithm to ensure a sufficient intent, or to employ a deep learning approach to make the intent self-improving. In the intent-enforced component, we plan to adopt a deep learning approach for the large-scale codebase to improve search performance. Moreover, we plan to introduce structured call sequences to capture code-usage patterns from method call sequences at a fine-grained level, in order to synthesize the relevant code with higher precision.
Footnotes
codesearch.google.com
krugle.org
koders.com
Fully Qualified Name.
https://github.com/codepath/android_guides/wiki/Sending-and-Receiving-Data-with-Sockets
Acknowledgments
This work was supported by the key scientific research projects of Henan Province in 2015 under Grant No. 15A520022 and the High Talent Scientific Research Project. We thank Qing Huang for his contribution to this project. We thank all participants in the experiments, who provided us with helpful comments.
