Abstract
Although code review is an essential step for ensuring the quality of software, it is surprising that current code review systems do not have mechanisms to protect the integrity of the code review process. We uncover multiple attacks against the code review infrastructure which are easy to execute, stealthy in nature, and can have a significant impact, such as allowing malicious or buggy code to be merged and propagated to future releases. To improve this status quo, in this work we lay the foundations for securing the code review process. Towards this end, we first identify a set of key design principles necessary to secure the code review process. We then use these principles to propose
Introduction
Code review is an integral part of the software development cycle [48,60]. It plays an essential role for software quality assurance by ensuring that new code changes are integrated into the codebase only after being vetted by a number of reviewers [16,52]. Modern code review systems such as Gerrit [19], Collaborator [11], Crucible [15], and ReviewBoard [54] have become very popular as they facilitate the code review process by providing an interactive web UI to discuss and review the code changes. Gerrit, for example, is used by Google for code review in open-source projects like Go, Chromium, and Android [8]. Also, GitHub hosts over 240 million repositories and is used by Microsoft to host thousands of projects and perform many development tasks [7,12,43]. Our analysis of the top 50 most starred GitHub projects shows that 45 out of 50 projects are using code review features, demonstrating their value to software engineers.
Despite providing a rich set of features, modern code review systems do not incorporate safeguards against tampering with the code review process. Even if a project owner defines a flawless code review policy, there are no guarantees that the intended code review process was indeed followed. As a result, end users and any independent auditor are left no choice but to assume that the code review server faithfully followed the code review policy. The lack of a secure code review step is the cause behind a significant percentage of attacks [44]. A compromised code review system can cause great damage [39]. Moreover, several important threats are associated with the code review system [53,58]. In the current threat landscape, attacks against the software supply chain have become a common occurrence with devastating impacts [6,13,65]. As such, it is paramount to ensure that the various steps of the software development chain have adequate protection.
Code review systems are susceptible to attacks that can manipulate the review process without being detected. For example, consider a project in which a reviewer finds a security bug in a proposed code change and then gives the change a negative review score which should block it from being merged into the project’s codebase. A malicious server, however, can hide or manipulate the negative review in order to make the change mergeable. As another example, a malicious server can bypass or tamper with the minimum number of approving reviews required by the review policy before a change can be merged.
Such attacks are mainly possible due to two major shortcomings of code review systems: (1) There is no or limited access to the code review information in subsequent steps of the development chain (e.g., build) as the review history is stored in a local database on the code review server. As such, neither the end user nor anyone else can verify what occurred in the code review step. (2) There is no reliable code review history as the reviews and the review policy are not protected and can be tampered with. Signing code commits provides protection against tampering with the code but does not protect against manipulation of the code review process itself.
To improve this status quo, we start by identifying a set of key design principles necessary to secure the code review process. We then use these principles to propose
Our goal in this work is not to improve the quality of a code review system (i.e., design better code review policy rules and best practices). Instead, we seek to lay the foundations for securing a given code review process. In other words, when a code review policy is in place, our goal is to ensure that the policy is actually respected, i.e., to ensure the integrity of the code review process. This basic guarantee must be provided if we are to hope the code review step is secure.
Based on a thorough understanding of the code review process subtleties on different code review platforms, we perform a comprehensive analysis of the attack surface in relation to the components of a code review system. As a result, we are the first to uncover a large attack surface by identifying attacks that tamper with the integrity of the code review process. These attacks are mostly related to specific details of the code review systems, easy to execute, stealthy in nature, and can have a significant impact. For instance, the attacks can lead to introducing a vulnerable or backdoored piece of code into the software product. This analysis enables us to identify a set of key design principles necessary to secure the code review process. We apply these principles to design We implement We analyze the security guarantees provided by
Background
This section provides background on the code review process. After a general overview, we look at two popular code review systems, GitHub [27] and Gerrit [19].
The code review process
Code review is common practice with the purpose of improving the quality, readability, and maintainability of the source code as well as the knowledge sharing [49]. Code review is usually done in a peer process in which new pieces of code are reviewed by developers other than the author of the code. Over the years, different approaches were used to review code, from offline code inspection meetings to an asynchronous tool-based code review process. Over the past decade, a modern code review process has been adopted by both open source and industrial communities [52,55]. For instance, Microsoft, Google, Facebook, and VMware perform the review process using CodeFlow [9], Gerrit [19], Phabricator [50], and ReviewBoard [54], respectively.
After a code author submits a new code change, reviewers check differences between the proposed change and the codebase (i.e., the stable version of the source code). Reviewers may accept, reject, or ask for further changes. This process is repeated until either the reviewers are satisfied and the code change can be integrated into the codebase, or they reach the conclusion that the code change cannot be integrated into the codebase.
The “pull-based development model” is a specific form of modern code review, in which a developer forks a repository and makes changes in the fork. Then, she submits the changes as a

A typical code review workflow.
As shown in Fig. 1, a typical code review workflow has four steps:
We provide next an overview of the code review workflow on GitHub and Gerrit, pointing out which steps in these code review systems correspond to the four steps of the generic code review workflow presented in Section 2.2.
: A code review policy is created by the project owner. The policy defines the rules that govern the code review process and the conditions that must be satisfied before proposed changes can be integrated into the codebase. For example, a minimum number of positive reviews may be required before merging new changes.
: Developers who want to modify the codebase propose changes and request to merge the proposed changes into the codebase. We refer to this request as a “merge request” (though different code management systems have specific names for it: “pull request” in GitHub, “merge request” in GitLab and “change” in Gerrit). A merge request is then created, and one or more reviewers are assigned to review the proposed changes.
: After reviewing the proposed changes, reviewers provide feedback, either positive (e.g., approve changes) or negative (e.g., request new modifications). In the latter case, developers propose new changes to address the reviewers’ concerns The review-change cycle continues until reviewers are satisfied with and approve the changes.
: When the merge policy is satisfied, the approved change is merged to the codebase by an authorized user (e.g., by the project owner).
The code review on GitHub
GitHub, the most popular web-based hosting service for open source projects, features integrated code review capabilities for those who want to collaboratively develop code. A merge request, referred to as “pull request”, allows developers to ask for code review and to receive feedback about their proposed code changes before those changes can be integrated into the codebase. This is a highly popular feature: In 2019 alone, developers made over 87 million pull requests on GitHub [28], which represents a growth of 28% compared to 2018. We provide next an overview of the GitHub’s code review process.
GitHub permissions
GitHub defines several permission levels for contributing to a repository, but four are relevant for the code review process:1
We focus on organization repositories, which are typical for collaborative projects. However, our work can also cover user account repositories, which have a more limited set of permissions.
The owner of a GitHub project can define a code review policy which describes, on a per branch basis, the rules pertaining to the code review process for changes to that branch (Step
). By default, code review is disabled, but can be enabled from a configuration option called “Branch Protection Rules” [29]. The remainder of this section refers to the case when code review is enabled.
With each review, a reviewer provides a rating for the proposed changes and text feedback. The rating is mandatory and can be one of three values: Approve (reviewer approves merging the proposed changes), Request changes (reviewer’s feedback must be addressed before the changes can be merged), and Comment (general feedback without explicitly approving the changes or requesting additional changes). The text feedback consists of comments about the proposed changes and is optional if the Approve rating is chosen.
One of the most common code review policy rules defines the required number of approving reviews before changes can be merged.2
By default, the number of required approving reviews is set to 1.
The review policy may contain additional optional rules. One such rule is to dismiss existing approving reviews when a code-modifying commit is pushed to the pull request branch. In other words, the code review process is reset and any existing approving reviews before this new commit will not be counted towards satisfying the required number of approving reviews.
The review policy may have a rule that requires approving reviews from “code owners”, which is a set of designated individuals that are responsible for code in a repository. In this case, the required number of approving reviews must be from these specific individuals. Finally, we note that the project owner can bypass the code review policy rules, based on her admin permission. For instance, the project owner can approve her own pull requests, and can merge a pull request even though there are not enough approving reviews.

The typical lifecycle of a pull request on GitHub.
In a typical workflow for a pull request on GitHub, the developer first clones the repository. Then she updates the code locally and submits the new code change to the GitHub repository. Next, the developer creates a pull request which will be reviewed by a number of reviewers (Step
). If reviewers request for changes or the developer herself decides to submit additional changes, she can update the pull request by submitting new commits (Step
). Finally, when the pull request is approved (per the code review policy), it can be merged into the codebase (Step
).
Although initially only the project owner has the ability to merge a pull request (based on her admin and permission), she can extend the admin or maintain permissions to other trusted users in order to manage day-to-day operations such as merging pull requests.
For example, consider the pull request shown in Fig. 2. The developer creates a local feature branch (“dev”) in which she places her changes and creates a new commit C1. Receiving feedback from reviewers, she improves her pull request and submits two new commits (C2 and C3). When C3 is approved by reviewers, the pull request is merged into the base branch (“master”), which results in commit C5.
The code review on Gerrit
Gerrit, a popular code review system used in big open-source projects like Go, Chromium, and Android [8], is a highly configurable tool that was designed specifically to support the code review process. A merge request, referred to as a “change”, allows code changes to be reviewed before being integrated into the codebase.
A Gerrit server manages the source code using two locations: an “authoritative repository” which contains the stable version of the codebase and a “pending changes” location which is a staging area for new code changes. According to the Gerrit workflow, developers fetch code from the authoritative repository and push their new changes to the staging area. The proposed changes are being reviewed, possibly updated, and eventually are getting merged (i.e., submitted) into the authoritative repository.
Gerrit users’ activity is centered around two types of actions. First, Gerrit users can perform reviews on a change. All review information is stored by Gerrit in a dedicated database, which is stored separately from the underlying Git source code repository. Second, Gerrit users can update the source code in the change (i.e., adding, modifying, or deleting files). Any modification to a Gerrit change is referred to as a “patch set” and results in a new Git commit object. We note that users can update the commit message of the change through the web UI, which also results in a new patch set being added to the change.
Gerrit permissions
Compared to GitHub, Gerrit has a more complex framework for defining permissions. Access rights are defined based on groups. Every user is a member of one or more groups, and the users’ access rights are inherited from the groups to which they belong. The following access rights are the most relevant for the code review process and can be defined on a per branch basis:
Read: allows to read any data of a project, including any proposed code changes. Code-Review[−2 .. +2]: allows to submit a code review with a rating score that can range between Upload to Code Review: allows to create a new change for code review. We also refer to this as a “Create Change” access right. Add Patch Set: allows to upload a new patch set to existing changes. Submit: allows to merge a change into the destination branch. In addition to needing this access right, a user can merge a change only if the change also satisfies the existing code review policy. Rebase: allows to rebase changes via the web UI. Owner: allows to modify a project’s configuration, which includes granting/revoking any access rights.
Gerrit user groups
Gerrit comes predefined with the following user groups which are relevant for the code review process:
Registered Users are signed-in users who have the Code-Review[−1 .. +1] permission, meaning they can provide feedback on a change (score between
Change Owners are users who have created a change. They have the Code-Review[−1 .. +1] permission for that change, meaning they can review and rate the change but cannot cause it to become approved or rejected. By default, users from this group also have the following permissions: Read, Add Patch Set, Rebase, and Submit. We note that a Change Owner will be able to submit their change only when it meets the rules of the code review policy (i.e., it gets approved by other reviewers).
Project Owners are users who have the Owner permission and as such can grant/revoke themselves (or others) any access rights. For example, they can change the permissions of the default groups (e.g., allowing Registered Users to block a change). By default, users from this group also have all the permissions that are relevant for code reviewing (Read, Code-Review[−2 .. +2], Create Change, Add Patch Set, Submit, and Rebase). Many Gerrit instances use the name Maintainers for this group.
Administrators are the most powerful users in Gerrit. In addition to all permissions of Project Owners, users in this group also have capabilities needed to administer a Gerrit instance and typically have direct filesystem and database access. It is noteworthy that, in a typical Gerrit’s workflow, Administrators are not involved in the code review process.
In addition to these predefined groups, many Gerrit instances also commonly define the Developers group. This group is created by Project Owners, who can define new custom groups. Users in this group are typically core developers, who have all the permissions of Registered Users, plus the ability to Code-Review[−2 .. +2] and Submit changes.
Gerrit code review policies
The owner of a Gerrit project can define a code review policy using different components (Step
). The main component is the submit rule, a logic that evaluates code changes through a rating process to determine if the proposed changes can be merged into the codebase. Each code review consists of a rating (i.e., a score) and text feedback. The text feedback is optional and consists of comments about the proposed changes. The rating is mandatory and is an integer that ranges from
The default submit rule in Gerrit is that a change can be merged if it has received at least a review with the highest score (
Creating changes
In a typical code review workflow on Gerrit, developers create a change by uploading a new commit to the Gerrit server which is automatically sent to the
). During the course of the review process, reviewers discuss the change and might ask for improvements (Step
).
Any modification to an existing change is called a “patch set”. At the repository level, a change is always represented by a single commit object. When a patch set is added, this commit object is amended (i.e., a developer amends the previous commit using “
We note that patch sets are normally created by developers to either update the code or the commit message. However, anybody (including the reviewer) who has the Add Patch Set permission can create a new patch set.
Merging changes

The typical lifecycle of a code change in Gerrit under the default “merge if necessary” strategy.
When enough reviewers approve the change, the last patch set will be merged into the
). For example, consider the change shown in Fig. 3. A developer creates the change by sending commit C1 to the Gerrit server. Reviewers ask for improvements twice and, the developer submits two amended commits (C2 and C3) to get the change approved.
Different strategies may be employed for merging a change into the codebase. In Gerrit terminology, the merge strategy is referred to as the “submit type”. Gerrit supports six submit types, such as Fast Forward Only (the head of the codebase repository is fast-forwarded to the change commit), and Merge Always (equivalent to “
Consider the repository shown in Fig. 3, in which a developer forks the
We note that the project owner may change most default review policies through the Gerrit web UI. For example, the project owner can create new groups or can change the permissions of existing groups.
The owner can also change other default global project settings, such as the review workflow, the submit rule, and the submit type. These settings are defined in three files,
Threat model
We assume a threat model in which the attacker seeks to violate the integrity of the code review process in a web-based code review system. This can have severe negative consequences for the code repository, such as merging into the production codebase code that has not been properly vetted and contains vulnerabilities. Such dangerous code may consist for example of experimental features that were rejected, debugging code, or code in which security features were removed intentionally for testing purposes.
To tamper with the integrity of the code review process, the attacker has two main avenues: Manipulate the code review policy and/or the steps of the actual code review process (i.e., the individual code reviews and their sequence). The former attack can be carried out through a variety of approaches. A malicious server may deem a code change mergeable even though the minimum number of approving reviews is not met. It can also add unauthorized users to the code review process or disable some rules (e.g., mandatory reviews from specific users). Moreover, an attacker may manipulate the code review policy by counting outdated reviews or changes from unauthorized users. The latter attack can be executed by modifying or removing existing code reviews (e.g., removing a review with a low score, or changing the score in a review), or by adding illegitimate code reviews. Specific attacks are described in Section 4.
We assume the attacker is able to tamper with the repository (e.g., modify data stored on the Git repository) and can modify the internal database that stores the code review information. The attacker can also tamper with the configuration files that define the code review policy and other important settings that can affect the code review process. Finally, the attacker may manipulate the information displayed to users, for example, by hiding some of the reviews and making a change look like it is ready to be merged when in fact it is not. This scenario may happen directly through a compromised or malicious code review server. Such attacks have been on the rise recently [17,40,47,51,53,57,64]. It can also happen indirectly through MITM attacks, such as government attacks against GitHub [36,42].
We assume that all Git commits are signed by the clients who create the commits and digital signatures provide adequate security (i.e., the attacker cannot compromise a digital signature scheme). The ability to sign commits on the client side is available either through Git’s command line tool or by using a tool such as le-git-imate [1,2] for commits created using a web-based UI such as GitHub/GitLab. In practice, this assumption is supported by the fact that most code review systems allow a rule in the code review policy to require that all commits need to be signed. Even though the attacker can bypass the code review process and write to the Git repository, signed commits combined with Git’s commit hash chaining mechanism greatly limit the attacker’s ability to arbitrarily tamper with the repository while remaining undetected. As such, an unscrupulous server or a malicious reviewer cannot simply manipulate individual commits (e.g., inject arbitrary new commits or modify existing commits) without self incriminating themselves. Removing an existing commit from the end of the commit chain, or entirely discarding a commit submitted via the web UI are actions that have a high probability of being noticed by developers. Otherwise, our solutions cannot detect such attacks, and a more comprehensive solution should be used, such as a reference state log [62].
We focus on attacks that tamper with the integrity of the code review process (specific attacks are described in Section 4). A limitation of our work is that attacks that tamper with code changes (i.e., commits) performed by the user via the web UI are not addressed. For such attacks, we refer the reader to a comprehensive list of attacks and defenses [2].
We trust the users who are authorized to update the source code repository associated with the code review system. That includes users who are authorized to propose code changes (i.e., developers) or to merge code changes (e.g., the project owner or a maintainer). We assume that the attacker does not have the signing key of these trusted individuals. As such, the attacker cannot tamper with the signed commits.
In addition, we assume that reviewers are not fully trusted and they may attempt to perform attacks that bypass the code review policies, as long as they do not self incriminate (i.e., remain undetected). The code review policy is calibrated so that the minimum number of approving reviews reflects the trustworthiness of the reviewers (i.e., there are enough honest reviewers to avoid a situation where reviewers give misleading scores in order to merge dangerous code).
Finally, we assume that addressing the following problems is outside the scope of this paper:
A reviewer who always approves a merge request regardless of the quality of the proposed change. In other words, we do not address human carelessness. A weakness or bug that is not caught by a regular code review system. Our goal is not to improve the code review system’s ability to catch subtle code bugs/vulnerabilities that may have been introduced by malicious developers. Instead, we seek to protect the integrity of the code review process (i.e., ensure that the prescribed review policy is being followed correctly). A flaw in the code review policy that allows dangerous code to become part of the codebase. For example, consider a review policy that requires three approving reviews for code changes, but does not dismiss existing code reviews upon receiving a new code update. Assume that a merge request gets approved by two honest reviewers, and then the third reviewer maliciously injects a backdoor to the merge request as well as approves it. This results in merging dangerous code to the codebase without violating the code review policy. This attack could be avoided if the review policy enforces dismissing stale approving reviews. We do not seek to detect such review policy flaws.
Security guarantees
To address this threat model, the goal of a secure code review system should be to enforce the following security guarantees:
Attacks
In this section, we identify new attacks against a typical code review workflow. Common to these attacks is the fact that an unscrupulous code review server can manipulate arbitrarily code review information. Two main attack avenues can be used to manipulate the code review process: (1) Manipulate the steps of the actual code review process (i.e., the individual code reviews and their sequence); (2) Manipulate the code review policy. The goal of these attacks is twofold:
The attacks described in this section are easy to execute, as the server simply has to manipulate data in the code review database and/or configuration files describing the code review policy. Nevertheless, the attacks’ impact can be significant. Many of the attacks are stealthy in nature due to the lack of protections for code review metadata and also because subtle violations of the code review policy are difficult to detect manually.
Code review manipulation attacks
An attacker can achieve the aforementioned goals by modifying, deleting, or adding to the existing code reviews. To achieve AG1, the attacker may decide to improve the rating of existing code reviews, or delete negative reviews, or add new illegitimate positive reviews. To achieve AG2, the server may lower the rating of existing reviews, remove positive reviews, or add new negative reviews. For example, assume that the scenario presented in Fig. 3 uses a review policy in which a change is mergeable if there is at least one review with the highest score (

Review manipulation attack.
When the code review server determines the review policy is satisfied for a change, it notifies the merger to proceed with the merging operation (e.g., by rendering a green “Merge” button on the GitHub pull request page or a blue “Submit” button on the Gerrit change page). The assumption is that the server correctly assessed that the review policy is satisfied based on the existing reviews. However, an unscrupulous server may automatically merge the code change or mislead the merger to merge the change even if the review policy has not been satisfied.
One way the server can execute a code review manipulation attack is to tamper with the code review policy (i.e., the rules defining the policy or the configuration files that are relevant for the policy). This may lead to merging of dangerous code, for example, if the minimum number of approving reviews is reduced.
Another way to execute a code review manipulation attack is to falsely declare that a change is mergeable even when it does not satisfy the review policy. The server counts that such an attack will go unnoticed based on two facts: (1) The merger will not notice that the review policy is not satisfied, especially when a small detail is not respected. For example, if the review policy requires at least two approving reviews, the merger may not notice if out of two approving reviews, one review is performed by a user who is not authorized to provide reviews; (2) After the change is merged, there is no record of the reviews in the repository, so an auditor cloning the repository cannot verify the correctness of the code review process. Even when the auditor has access to the code review server which allows access to the code reviews database, there is no mechanism for independent validation of the code review process.
We introduce next a variety of code review policy manipulation attacks that can be used by a malicious server to achieve AG1 and AG2.
Bypassing a minimum number of approving reviews
A popular code review policy rule is that a code change can be merged if it receives a minimum number of approving reviews. A malicious server can bypass this rule by either changing the minimum number of approvals or by completely ignoring this rule. For example, consider a GitHub code review policy that requires that at least 3 reviewers provide approving reviews. The server can tamper with the review process by declaring a change is mergeable after only 2 approving reviews. As a result, a change that has not received enough scrutiny and may still contain security vulnerabilities will be merged.
Counting reviews from unauthorized reviewers
A server can upgrade or downgrade the reviewers’ permissions in order to achieve AG1 and AG2, respectively. The server can make it look like a user who submitted an approving review has write permissions, when in fact the user has only read permissions and the review should not be counted towards the minimum required number of approvals. The server may also count reviews from the author of the code change. Most code review systems allow receiving reviews from the user who owns code changes, however, their approval does not count toward the minimum number of approvals. A malicious server can ignore this policy and the attack can go undetected easily. Finally, the malicious server can add an unauthorized reviewer to the code review process.
These attacks are simple to execute, either by manipulating the configuration files that define user permissions, or by temporarily misrepresenting the user permissions when the merger is merging the change. In all these cases, the merger can be deceived to prematurely merge a change that has not received enough reviewing scrutiny. Similarly, an auditor who wants to later verify if the code review policy was correctly enforced, has no way to reliably do so.
Excluding required reviews from specific users
Code review systems may enforce a rule which requires that code changes be reviewed by specific users. GitHub, for example, allows requiring approving reviews from users in a group called “code owners”. Gerrit also allows defining a rule called “Master and Apprentice”, according to which all code changes introduced by a user (i.e., Apprentice) must be approved by another user called Master.
However, a malicious server may deem a code change mergeable even though there are not enough approving reviews from specific users. The server can also manipulate the composition of this group of users from which reviews are required. The attack can not be detected easily since neither the merger (before merging) nor a verifier (later, after merging) cannot reliably verify if the server enforced the policy correctly.
Counting outdated reviews
One common practice is to reset the code review process when a code-modifying commit (i.e., a patch set) is pushed in a change. That means existing approving reviews are dismissed and should not be counted towards the require number of approving reviews. This policy helps to catch bugs introduced by a patch set. A malicious server, however, may disable/ignore this policy to take advantage of stale approving reviews and deem a change as mergeable with insufficient review scrutiny.
Merging changes without addressing comments
A code review system may enforce addressing all comments before allowing the code change to be part of the codebase. GitHub by default does not allow to merge a change unless requests for changes are addressed. Gerrit also allows defining such policy – make a change submittable if all comments have been resolved. Thus, if any reviewer rejects a code change, it cannot be merged unless the same reviewer approves it. A malicious server, however, can bypass this policy to prevent discovered code defects from being patched.
Misusing the project owner’s authority
In most popular code review systems, the project owner has the ability to bypass the code review policy rules – this includes merging code changes even if the review policy is not satisfied. A malicious server can take advantage of this authority to merge vulnerable code changes. The impact of this attack could be severe since during the verification step, the auditor has no way to verify if the project owner made the merge or a malicious server impersonates the project owner’s role.
Accepting changes from unauthorized users
A code review policy rule may enforce accepting changes only from a specific group of authors. For example, Gerrit allows setting a rule making a code change submittable if it has a specific commit author. GitHub also allows to define a similar policy – specify users that are not allowed to push to matching branches. This rule prevents unintentionally accepting code changes from inexperienced developers even though their code changes get approved by reviewers. A malicious server may ignore this policy.
Modifying project settings
A malicious server can modify different project settings to achieve AG1 and AG2. The server can simply disable the commit signing requirement. This would weaken significantly the project’s security because then code could be arbitrarily manipulated in an undetectable fashion. The server can manipulate user permissions, for example by modifying project configuration files that control access rights and composition of user groups in Gerrit. Consequently, unauthorized users will be able to participate in the review process. In a similar scenario, the server may modify the rating score associated with a group of users. That could result in approving or rejecting a code change maliciously. Moreover, an unscrupulous GitHub server may alter the
Solution: Securing the code review process with SecureReview
In this section, we introduce
Design principles
Popular code review systems such as GitHub and Gerrit do not provide verifiable guarantees about the integrity of the review process due to the following shortcomings:
To address the aforementioned shortcomings, we identify a set of design principles that should be satisfied by any solution that seeks to add integrity to a code review system.
Strawman solutions
Current code review systems do not have mechanisms to ensure the integrity of the code review policy. As a result, the integrity of the code review process can be violated by simply tampering with this policy. It may be tempting to assume that we can address this issue with straightforward approaches.
SecureReview design
In a typical code review workflow (as described in Fig. 1 of Section 2.2), a
Creating and storing a signed code review policy
Popular code review systems allow users to define a code review policy, e.g., GitHub’s Branch Protection Rules [29], GitLab’s Merge Request Approvals [34], or Gerrit’s Submit Rules [22].
Common to these code review systems is that a code review policy consists of a set of rules by which the owner of a software project can define the requirements that must be fulfilled before a proposed change can be merged into the codebase. GitHub, for instance, provides a small set of rules such as the minimum number of required approving reviews and dismissing stale pull request approvals. Gerrit, on the other hand, has a more complex set of rules and also allows authorized users to add new customized rules to the code review policy.
Unfortunately, current code review systems, do not protect the code review policy against a malicious server or against malicious reviewers. To address this issue,
Reviewers: A list of users (i.e., public keys) who can review the code. This ensures that only authorized reviewers are allowed to participate in the code review process.
Signature: The review policy must be signed with the project owner’s private key. This protects the review policy from being tampered with.
To protect the code review policy, we are faced with several challenges. First, we need to determine what should be included as part of the code review policy. In addition to the explicit code review policy rules, there is additional information that must be protected to ensure the integrity of the code review process, such as the
Second, the project owner needs a mechanism to sign the code review policy. Third, the signature should be stored on the server in a way that is accessible, so that it can be retrieved later, whenever verification is performed. Finally, most existing popular code review systems have a fixed set of policy rules and do not allow us to define new fields, e.g., a field to store a signature over the review policy.
Creating and storing signed reviews

The review unit’s format. Each review unit is computed by the reviewer who performed the review.

The data over which the signature field in a review unit is computed. Note that the signature for the first review in a
The signatures will prevent the reviews from being tampered with, thus addressing one of the limitations of current code review systems. To address the other limitation, (i.e., reviews are not accessible after the code review phase), we are faced with a challenge: How to make the reviews accessible in a way that requires no (or minimal) server-side changes in current code review systems? To tackle this challenge,

Create signed code reviews.
When the
With
Finally, the Merger embeds the review units into the commit message of the merge commit. For this,
Compact integration: the Merger integrates the review units from the commits of the Merge Request, without the signature field in each review unit.
Full integration: the Merger integrates the entire review units from the commits of Merge Request, including the signature field in each review unit.
From a security perspective, in the Full integration option, the Merger is trusted to include the review units, but cannot tamper with or remove review units from the middle of the chain of review units. However, the Merger may omit review units from the end of the chain. In general, since the Merger is usually a trusted entity (such as the project owner), this should not be a major concern. Still, the Full integration provides some accountability for the Merger, as it allows a verifier to check that no information was tampered in the review units that are included in the merge commit, thus reducing the trust in the Merger. As opposed to that, the Compact integration option fully trusts the Merger to correctly integrate the review units.
On the other hand, the Full integration option has the drawback that the commit message of the merge commit may increase significantly because of the extra signatures that are included. For example, if a change on the Gerrit server receives 10 reviews,3
We note that at Google, each code change receives a peak of 12.5 reviews for changes of 1250 lines [55].
When cloning or pulling changes from a repository, an auditor can verify the integrity of the code review process for each branch in the repository by executing the Validate_Branch procedure. The procedure first checks if the code review policy (lines 1–2) and the commits in the branch (lines 7–9) have valid signatures. It then traverses the commit tree in the branch and extracts the commits corresponding to a merge request by calling the Extract_Merge_Request_Commit procedure (line 13). The Validate_Branch procedure then calls the Validate_Reviews procedure to check whether the merge request was properly approved according to the intended code review policy (line 16).
At a high level, Validate_Reviews contains three steps: (1) extract the review units from the commits of the merge request and validate the signature over each review unit; (2) check the chaining between review units (i.e., that each review includes the signature field of the previous review unit); (3) check whether the sequence of reviews indicated by the signed review units leads to a code change that respects the intended code review policy. For step (3) of this procedure, we check the most common policy rules such as if the minimum number of approvals is met and if the reviewers are authorized to provide approving reviews. However, since Gerrit allows a project owner to customize the review policy arbitrarily, a complete treatment of step (3) would require a more complex check that is outside the scope of this paper.
The Validate_Reviews procedure receives as input a valid signed review policy and a set of commits corresponding to a merge request and checks whether the merge request was approved according to the intended review policy. After validating the signatures on the review units and checking the chaining between review units (lines 1–4), it checks if a direct push (i.e., no reviews) is detected (lines 5–10). Then if the merger is authorized (lines 11–15), the most common code review policy rules are checked (lines 16–18) such as if the merge request’s creator was allowed to submit a code change; if reviews are created by authorized reviewers; if the sequence of review units meets the minimum number of approving reviews defined by the policy; if only those approvals that satisfy the review policy are counted; if there are required reviews from specific users; and if outdated approving reviews are dismissed correctly.
Versioning the code review policy
Although it may happen seldom over the lifetime of a project, changing the code review policy is allowed in code review systems, including in GitHub and in Gerrit.
Deployments
In this section, we show how to integrate our design into two popular web-based code review systems, GitHub and Gerrit.
We implement
To capture and store reviews,
The content script runs in the browser and can read or modify the
The background script cannot access the content of the
For signing and storing reviews: retrieve parent commit object (i.e., parent commit, tree hash, commit message) create a review unit, embed it in the commit message, create a signed commit object and push it to the server;
For merging the change: check if the code review policy is being followed, create a signed merge commit object that includes the review units in the commit message and push it to the server.
To maintain efficiency and satisfy design principle DP6 (Minimum impact on the user’s experience),
With
In the following, we highlight
SecureReview for GitHub
In this section, we describe how
Since GitHub provides no mechanism to protect this information,
For each new review,
To illustrate this procedure, we consider the example shown in Fig. 2. The reviewer makes three reviews before the
For the first three merge methods,
SecureReview for Gerrit
Gerrit’s code review policy is defined in three configuration files ( i.e.,
When a reviewer performs a new review,
Security analysis
In this section, we first show how
Achieving the security guarantees
When dealing with a malicious server, these security checks allow
Mitigating the attacks
In this section, we analyze
Experimental evaluation
In this section, we explore the different costs of deploying
Popularity is based on the “star” ranking by GitHub users, which reflects their level of interest in a project.
Repositories chosen for the evaluation. We show the size of the working directory of the default branch, the file count, the average file size, and the total number of commits
We conducted our experiments on a client system with an Intel Corei7 CPU at 2.70 GHz and 16 GB RAM. The client software consisted of Linux 5.0.16-100.fc28.x86_64 with git 2.19 and GnuPG 2-2.7 for 2048-bit RSA signatures. On the server side, we used GitHub.com itself and a self-hosted Gerrit server on an Amazon EC2 instance with two vCPUs (Intel XeonE5-2686 at 2.30 GHz), 8 GB RAM, Linux version 5.3.0-1023-aws, Gerrit server 3.2.2, and git 2.17. We also note that: (1) the time to push the Git commit object to the server is not included in the measurements, (2) when running
We focused our experiments on three performance metrics: execution time, storage overhead, and verification time. We benchmarked both integration options of
Execution time for storing signed reviews and merging changes (in seconds)
To perform a merge commit,
On Gerrit,
We note that increasing the number of reviews per merge request will not affect execution time significantly because we use REST API [23,30] to get all merge request commits (i.e., all review units embedded in commits) at once. More review units could affect the time to verify signatures over review units. However, in the merge operation, signature verification has a very small portion of the overall execution time (for a merge request with four review units, signature verification takes 0.023 seconds out of the 0.40–1.94 seconds needed to execute the merge request).
Storage overhead per merge commit
To measure that overhead, we first computed the size of merge commit objects created during our execution timing tests (which included four review units per merge commit) and contrasted it to the same merge commits without
When Compact integration is in place, each review unit added about 58 bytes to the merge commit object (and a total of 232 bytes for four review units). For Full integration, the storage overhead for one and four review units was 572 bytes and 2288 bytes, respectively. The Compact integration adds less than 8% overhead per review unit. Full integration, however, has an overhead of 25% to 73% for different repositories. To put these values in context, the largest storage overhead in Table 3 is approximately 2 KB per commit for full integration of reviews, which represents less than 0.0006 of the repository size even for a small repository like gitignore. Therefore, we argue that the storage overhead is negligible and does not affect the user experience. Of note, the storage overhead depends on the size of the original commit message and the integration option – the larger the commit message is, the less the overhead caused by
There is plenty of work on improving code review quality. Work by Bosu [4] describes a methodology to evaluate the impact of peer code review on security vulnerabilities in Open Source Software (OSS) communities. For instance, it suggests that analyzing a set of keywords used in the code review comments could lead to identifying security vulnerabilities. This approach is extended in another study [5] by finding the code changes that are more prone to contain security vulnerabilities by asking the code reviewers to pay more attention to such code snippets. Work by Ogale [46] proposes a tool to identify the scope of mandatory code review which is part of the source code that should be manually reviewed. Ibrahim et al. [37] give a checklist for the secure code review process that should be done before releasing the software or even before committing the code to the codebase. While all these tools improve the security stance of a project, they do not address that the review process and policy are followed. In this case,
Kalyan et al. [38] explore the shortcomings of existing code review tools and introduce Fistbump, a tool to improve the code review process on GitHub. Fistbump manipulates the existing pull request interface on GitHub and allows to manage the discussion between the author of a merge request and the selected reviewers. Fistbump also applies best practices in web security, such as cross site scripting (XSS) prevention, and HTTPS communication. Using GitHub APIs, CodeReviewHub [10] creates a task list per pull request on GitHub and adds a pending task for each comment. The main goal is to improve the pull request management by keeping track of unaddressed comments and open issues. RevRec [66] tries to improve the code review process on pull request based systems such as GitHub by recommending the best code reviewers. These systems are improving the developer experience and workflow, and their improvements can accommodate
In addition, there are studies that improved the security of the VCS which relate to the security of the code review process as well. Wheeler [63] provides an overview of security issues related to software configuration management tools and provides a set of security principles, threat models, and solutions to address these threats. Gerwitz [25] provides a detailed description of creating and verifying Git signed commits.
Work by Torres-Arias et al. [62] covers some attack vectors against VCS where a malicious server tampers with Git metadata to trick users into performing unintended operations. The attacks are mitigated by maintaining a cryptographically-signed log of user actions. This solution may be used to protect against complex attacks that seek to tamper with the sequence of reviews (e.g., removing reviews from the end of the review chain). However, such solutions only focus on the consistency of the VCS itself and target attacks that cause developers to have inconsistent views of the branches in a Git repository. In particular, they were not designed to account for the intricacies of the code review policies and cannot defend against the specific attacks identified in this work. Whereas they can complement the solutions proposed in our work, they can not fully ensure the integrity and consistency of the entire process by which code changes are integrated into the VCS.
Perhaps most closely related work to
Code review is one step of the software development cycle. Prior work seeks to secure the integrity of the entire software supply chain [61].
Conclusion
In this paper, we introduced
Footnotes
Acknowledgments
This research was supported by the US NSF (National Science Foundation) under Grants No. CNS 1801430, DGE 1565478, and DGE 2043104.
