The Importance of Code Privacy

For many businesses, their code is their intellectual property (IP), and granting third parties access to it requires thorough consideration and review of the third party and, in some cases, contracts, certifications and more to ensure trust.

Even if the third-party vendor is a good actor and keeps up with the best security standards, bad things can and do still happen, which can put your business at risk.

This, among other reasons, is why Code Privacy should be an important metric when picking software vendors. Tools you use today might be able to access your source code: automated code style fixers, error monitoring software and more. Anything with a Git integration might have been given permission to access your private repositories.

OtterWise is built on a foundation that prioritises simplicity and privacy. By reducing the information that OtterWise can access, we lower the risk for our clients and for ourselves.

Tracking Code Coverage, Without Accessing Code

Building a code coverage tool that does not have access to code is no easy feat, especially when the API permissions that Git providers offer are broader than what is needed.

So, to avoid ever coming into contact with source code, some measures had to be taken. First, let's inspect where code could have been accessed:

Git Diff (for Patch Coverage)
Coverage Files (for example coverage XML clover)
File viewer (inside OtterWise)

Let us tackle each of these, one by one:

Making the Git Diff Private

A unified git diff usually looks something like this:

diff --git a/resources/assets/js/orders/order-manage.vue b/resources/assets/js/orders/order-manage.vue
index 1592c4dba3..607fe5a208 100644
--- a/resources/assets/js/orders/order-manage.vue
+++ b/resources/assets/js/orders/order-manage.vue
@@ -554,6 +554,7 @@
 								<col width="90" />
 								<col width="140" />
 								<col width="100" />
+								<col width="100" />
 								<col width="80" />
 							</colgroup>
 							<thead>
@@ -573,6 +574,9 @@
 										({{ item.currency || global_setting("user_account.settings.system.currency") }})
 									</th>
 									<th class="text-right">{{ trans("misc.discount") }}</th>
+									<th class="text-right">
+										{{ trans("misc.profit") + " %" }}
+									</th>
 									<th class="text-right">
 										{{ trans("misc.total") }}
 										({{ item.currency || global_setting("user_account.settings.system.currency") }})

This diff is full of source code, which we don't want. The diff is necessary to figure out the patch coverage, but the code itself is not.

What is interesting to us are the file names and the diff markings (+/- etc.).

So to make it private, all we have to do is trim out the code. Our open-source uploader script handles this; here is a rough example:

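// $diff holds the raw unified diff as a string (str_starts_with requires PHP 8)

// Split diff into an array of lines
$diffLines = explode("\n", $diff);

// True while we are inside a hunk, where "--- " or "+++ " could be code
$inHunk = false;

// Strip code!
foreach ($diffLines as $index => $line) {
    // Skip everything we want to keep

    if (str_starts_with($line, 'diff --git a/')) {
        $inHunk = false;
        continue;
    }

    if (preg_match('/^(new file mode [0-9]{6})/', $line)) {
        continue;
    }

    if (preg_match('/^(deleted file mode [0-9]{6})/', $line)) {
        continue;
    }

    // Abbreviated hashes can be longer than 7 characters
    if (preg_match('/^(index ([0-9a-zA-Z]{7,})\.\.([0-9a-zA-Z]{7,}))/', $line)) {
        continue;
    }

    // File headers only appear before the first hunk of a file
    if (!$inHunk && (str_starts_with($line, '--- ') || str_starts_with($line, '+++ '))) {
        continue;
    }

    // Keep the hunk header, but drop any code that trails it
    if (preg_match('/^(@@ -[0-9]{1,}(,[0-9]{1,}){0,1} \+[0-9]{1,}(,[0-9]{1,}){0,1} @@)/', $line, $matches)) {
        $inHunk = true;
        $diffLines[$index] = $matches[1];
        continue;
    }

    // Otherwise keep only the first character (+, - or a space for context lines)
    $diffLines[$index] = substr($line, 0, 1);
}

// Put back together the diff for later usage
$diff = implode("\n", $diffLines);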

The output will now be:

diff --git a/resources/assets/js/orders/order-manage.vue b/resources/assets/js/orders/order-manage.vue
index 1592c4dba3..607fe5a208 100644
--- a/resources/assets/js/orders/order-manage.vue
+++ b/resources/assets/js/orders/order-manage.vue
@@ -554,6 +554,7 @@



+



@@ -573,6 +574,9 @@



+
+
+

Notice how it now consists mostly of empty lines, except where a -/+ marks a removed or added line. Filenames are kept for reference and to track the history of each file. This process happens in the CI workflow, so OtterWise never sees the original diff.
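To illustrate why the code itself is not needed, here is a minimal sketch of how the added line numbers could be recovered from a stripped diff. It is not taken from the OtterWise codebase: the function name `addedLinesFromStrippedDiff` and the file path in the example are hypothetical, and it assumes PHP 8 and a diff stripped as described above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: lists the line numbers (in the new file version)
 * that a code-free diff marks as added, grouped by file path.
 */
function addedLinesFromStrippedDiff(string $strippedDiff): array
{
    $added = [];
    $file = null;     // path of the file currently being read
    $inHunk = false;  // true once a hunk header has been seen for $file
    $lineNo = 0;      // line number of the next line in the new file

    foreach (explode("\n", $strippedDiff) as $line) {
        if (str_starts_with($line, 'diff --git ')) {
            // A new file section starts; wait for its "+++ b/..." header
            $file = null;
            $inHunk = false;
        } elseif (!$inHunk && str_starts_with($line, '+++ b/')) {
            $file = substr($line, 6);
        } elseif (preg_match('/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/', $line, $m)) {
            // The hunk header tells us where the hunk starts in the new file
            $inHunk = true;
            $lineNo = (int) $m[1];
        } elseif ($inHunk && $file !== null) {
            if ($line === '+') {
                $added[$file][] = $lineNo++;
            } elseif ($line !== '-' && $line !== '\\') {
                $lineNo++; // context line: present in both versions
            }
        }
    }

    return $added;
}

// Usage with a small stripped diff (hypothetical file path)
$strippedDiff = implode("\n", [
    'diff --git a/src/Example.php b/src/Example.php',
    'index 1592c4dba3..607fe5a208 100644',
    '--- a/src/Example.php',
    '+++ b/src/Example.php',
    '@@ -10,3 +10,4 @@',
    ' ',
    '+',
    '-',
    '+',
    ' ',
]);

$added = addedLinesFromStrippedDiff($strippedDiff);
// $added is ['src/Example.php' => [11, 12]]
```

Matching these line numbers against the per-line hit counts in the coverage file is enough to calculate patch coverage.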

Coverage Files

Another point of contact with code is the coverage files (Clover, etc.), which can contain class and method names that some might deem sensitive. Therefore, as with the Git diff, we strip away any data that is not needed. A Clover coverage file might look like this (trimmed for brevity):

<?xml version="1.0" encoding="UTF-8"?>
<coverage generated="1672008447">
    <project timestamp="1672008447">
        <package name="LasseRafn\CsvReader">
            <file name="/home/runner/work/csv-reader/csv-reader/src/Reader.php">
                <class name="LasseRafn\CsvReader\Reader" namespace="LasseRafn\CsvReader">
                    <metrics .../>
                </class>
                <line num="53" type="method" name="__construct" visibility="public" complexity="4" crap="4.02" count="17"/>
                <line num="54" type="stmt" count="17"/>
                <line num="55" type="stmt" count="1"/>
                <line num="58" type="stmt" count="17"/>

But after stripping values, the file being sent to OtterWise looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<coverage>
    <project>
        <package>
            <file name="/csv-reader/csv-reader/src/Reader.php">
                <class>
                    <metrics .../>
                </class>
                <line num="53" type="method" complexity="4" crap="4.02" count="17"/>
                <line num="54" type="stmt" count="17"/>
                <line num="55" type="stmt" count="1"/>
                <line num="58" type="stmt" count="17"/>

Notice how class names, method names and namespaces have been removed, along with timestamps, method visibility and the CI runner's path prefix. We have to keep the line "type", as it indicates whether the line is a statement, a method, or something else, which is important in coverage tracking; however, the actual line content is never retrieved. Filenames are kept for reference and to keep a history of coverage across files.
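The stripping shown above could be sketched roughly as follows, using PHP's built-in DOM extension. This is an illustration rather than the actual uploader code: the function name `stripCloverXml` and its `$pathPrefix` parameter are hypothetical, and the list of dropped attributes is simply read off the before/after example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: removes identifying attributes from a Clover report.
 * $pathPrefix is an assumed parameter: the CI checkout prefix to trim from file paths.
 * Expects well-formed XML.
 */
function stripCloverXml(string $xml, string $pathPrefix = ''): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xpath = new DOMXPath($doc);

    // Attributes to drop, per element name (taken from the example above)
    $drop = [
        'coverage' => ['generated'],
        'project'  => ['timestamp'],
        'package'  => ['name'],
        'class'    => ['name', 'namespace'],
        'line'     => ['name', 'visibility'],
    ];

    foreach ($drop as $element => $attributes) {
        foreach ($xpath->query('//' . $element) as $node) {
            foreach ($attributes as $attribute) {
                $node->removeAttribute($attribute);
            }
        }
    }

    // Keep file names, but trim the CI runner's directory prefix
    foreach ($xpath->query('//file[@name]') as $file) {
        $name = $file->getAttribute('name');
        if ($pathPrefix !== '' && str_starts_with($name, $pathPrefix)) {
            $file->setAttribute('name', substr($name, strlen($pathPrefix)));
        }
    }

    return $doc->saveXML();
}

// Usage with a shortened, well-formed version of the report above
$report = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<coverage generated="1672008447">
  <project timestamp="1672008447">
    <package name="LasseRafn\CsvReader">
      <file name="/home/runner/work/csv-reader/csv-reader/src/Reader.php">
        <class name="LasseRafn\CsvReader\Reader" namespace="LasseRafn\CsvReader"/>
        <line num="53" type="method" name="__construct" visibility="public" complexity="4" crap="4.02" count="17"/>
        <line num="54" type="stmt" count="17"/>
      </file>
    </package>
  </project>
</coverage>
XML;

$stripped = stripCloverXml($report, '/home/runner/work');
```

Line numbers, line types and hit counts survive untouched, which is all that coverage tracking needs.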

File Viewer inside OtterWise

Note, for public repositories, we permit the File Viewer without any extra opt-in or extensions, as the code is publicly available. This section primarily focuses on Private Repositories, although the options are available for public ones too.

This one is a bit trickier, since it requires actually displaying code in the browser, inside OtterWise. I decided to go with multiple solutions to the problem, to avoid the user having a worse experience from using OtterWise.

Let us look at the 3 options that will be provided to users.

Option 1: Annotations

We can create GitHub check annotations on a per-line basis without ever viewing or having access to code. This lets us provide a decent experience and quick overview, with no additional effort for the users. Enabling this can be done through the repository settings page.
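As a rough illustration, annotations can be built from nothing more than file names and uncovered line numbers. The sketch below is an assumption about how this might look, not OtterWise's implementation: the function name `buildCoverageAnnotations`, the message text and the choice of the `warning` level are hypothetical, while the array keys follow the annotation object of the GitHub Checks API.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: groups uncovered line numbers into ranges and returns
 * arrays shaped like GitHub Checks API annotation objects.
 *
 * @param array<string, int[]> $uncoveredLinesByFile file path => uncovered line numbers
 */
function buildCoverageAnnotations(array $uncoveredLinesByFile): array
{
    $annotations = [];

    foreach ($uncoveredLinesByFile as $path => $lines) {
        sort($lines);

        // Merge consecutive line numbers into [start, end] ranges
        $ranges = [];
        foreach ($lines as $line) {
            $last = count($ranges) - 1;
            if ($last >= 0 && $line <= $ranges[$last][1] + 1) {
                $ranges[$last][1] = $line; // extend the current range
            } else {
                $ranges[] = [$line, $line]; // start a new range
            }
        }

        foreach ($ranges as [$start, $end]) {
            $annotations[] = [
                'path' => $path,
                'start_line' => $start,
                'end_line' => $end,
                'annotation_level' => 'warning',
                'message' => 'Not covered by tests',
            ];
        }
    }

    return $annotations;
}

// Usage: two annotations, one for lines 54-55 and one for line 60
$annotations = buildCoverageAnnotations(['src/Reader.php' => [55, 54, 60]]);

// The Checks API accepts at most 50 annotations per request, so send in batches
$batches = array_chunk($annotations, 50);
```

Nothing in these payloads requires reading the file contents; GitHub renders each annotation next to the code on its own side.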

Option 2: Downloadable coverage files

You can download the coverage file generated during CI and import it into a supported IDE or code editor, such as PhpStorm, which can then render coverage on a per-line basis.

Option 3: Opt-in code access

Organisation admins can opt in to granting access to code; team members cannot opt in to the feature without admin access. Only team members with access to the repository will be able to access the code through OtterWise once the feature is enabled.

When a user attempts to view a file inside OtterWise, we will notify them that it is not possible without granting additional permissions through a separate GitHub OAuth flow, used specifically for code viewing at the user level. (Note that this requires your organisation to also approve code access in the GitHub "OAuth Application policy" tab.) Once the user approves the app, we will generate a GitHub API token that lasts for 2 hours and is revoked afterward, with no ability for us to regenerate it without explicit user action (the OAuth flow).
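The 2-hour window can be pictured with a small value object like the one below. This is purely a sketch of the idea: the class `CodeViewToken` and its methods are hypothetical, it assumes PHP 8.1 for readonly properties, and it only models the local expiry check, not the revocation on GitHub's side.

```php
<?php
declare(strict_types=1);

/** Hypothetical value object for a time-limited code-viewing token. */
final class CodeViewToken
{
    public function __construct(
        public readonly string $value,
        public readonly DateTimeImmutable $issuedAt,
    ) {
    }

    /** The token stops being usable 2 hours after it was issued. */
    public function expiresAt(): DateTimeImmutable
    {
        return $this->issuedAt->add(new DateInterval('PT2H'));
    }

    public function isUsable(DateTimeImmutable $now): bool
    {
        return $now < $this->expiresAt();
    }
}

// Usage: a token issued at 10:00 is usable at 11:59 but not at 12:00
$token = new CodeViewToken('example-token', new DateTimeImmutable('2023-01-01 10:00:00'));
$stillValid = $token->isUsable(new DateTimeImmutable('2023-01-01 11:59:00'));
$expired = !$token->isUsable(new DateTimeImmutable('2023-01-01 12:00:00'));
```

Once the check fails, the only way forward is for the user to go through the OAuth flow again.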

We believe this solution is a good compromise between security and User Experience. We are of course open to feedback regarding extending or shortening the 2-hour window.

The Outcome

Implementing these changes means that we can remove the GitHub scopes that previously let us view code, and also simplify our ingress API so that it no longer has to strip away code, as code is never accessed or sent.

The only code-related data that reaches OtterWise servers is:

Line numbers
File names
Repository names
File coverage numbers

All uploader code, which runs in your CI environment, is open source and pulled directly from GitHub rather than from our servers. This lets you inspect the code that is executed and, optionally, pin your CI to a specific uploader commit SHA to ensure it never changes.

Final Thoughts

While I see arguments for entirely ditching the File Viewer (viewing code inside OtterWise), I do believe some users will appreciate the simplicity of the option. It also lets us show public repository code for contributors, since the code is publicly available.

A good compromise was found by letting organisation admins decide, while letting users still take advantage of code coverage tracking and per-line coverage information.