Git-assisted Jekyll Workflow in Multiuser Environments

TL;DR
Motivation
Workflow Overview
Authoring and Reviewing Content
Your Git Management System
Automated Build Environment
Further (Security-related) Consideration

TL;DR

We use the continuous integration features of Git management systems such as Gitlab to trigger a git pull of a preview and a production branch of our blog content. The preview branch is where blog posts are written; its build can be viewed under a sub-URI protected with Basic Auth. To publish content, the author creates a Pull Request to the production branch, which is merged after an admin/moderator has reviewed it. Both actions, push and merge, trigger the pull and build on the web server. The illustration in the section Workflow Overview describes the interactions between all entities during the process. All discussed config files can be pulled from Github.

Motivation

I like the concept behind Jekyll and static content generators in general. The fact that the resulting files, exposed by the web server, are static is particularly appealing. Since the sources are plain-text files, keeping track of and managing blog posts in a Git repository seems natural. This enables collaborative work on blog posts, which is convenient for long articles, a use case common in companies or in organizations sharing a platform, like hackerspaces. As the author base gets bigger, you want to enforce some kind of access control, or simply have somebody look over articles for quality assurance, e.g., fixing typos.

We had this exact scenario with our CTF team at the university. We migrated from Wordpress, since we usually work together through Gitlab anyway and the separate account management was a burden. The blog is used for writeups of CTF challenges, which are written by multiple people, often one-time contributors. The active users are a small group: a couple of students, but mostly employees of the university. However, we realized this could change quickly. An automated, secure and future-proof publishing workflow was needed. Besides, we proofread each other's articles before publishing.

To sum up, this is our environment:

A potentially unmanageable user base with a certain fluctuation - students come and go.
Technical users (at least to some degree) who can handle Git and markup.
A Git management system (Git-MS), in our case Gitlab, with different access rights: employees can push and merge into the “master” branch, whereas students can only push to a “preview” branch.
Our server is a separate machine, which runs GNU/Linux and an Apache web server.
Users have no access to the server, only to Git.

Workflow Overview

The illustration below shows the workflow, starting with the authoring of an article and going through all steps until it is published. At first glance it looks complex, but the complexity is not exposed to the users, as you will see at the end of the post. We have two groups of users: the admins, who can push and merge into the master branch, and the developers, who author blog posts and are allowed to push into the preview branch only. Admins can push to the preview branch as well; when they write a blog post themselves, they “should” push it there.

[Illustration: blog workflow]

We will go through the general process, from creating the first commit to the final publishing, and later discuss how to implement the individual steps.

1. Create the actual content inside the preview branch. Commit locally.
2. Push the changes to the repository.
3. The push triggers a “Deploy Hook”, previously created in the Git-MS. It basically makes an HTTP request (Gitlab sends a POST) to a defined URL on your web server.
4. Behind the triggered URL is a simple CGI script, which executes a git pull and builds the static content of the preview branch.
5. The static content is copied into a folder that the server publishes under a specific sub-URI, which is optionally password-protected. Consider this your staging environment.
6. When the previous steps have gone through smoothly and the blog post in the staging environment looks as expected, the author creates a Merge Request (in Github-speak it’s called a Pull Request) to the master branch.
7. A user with the admin role reviews the Merge Request and eventually accepts it.
8. The third step is repeated. The Deploy Hook, this time triggered by a merge, calls the same URL on the web server again.
9. The CGI script pulls the master branch and generates the static content again. (It actually pulls both branches and builds whichever one has changes.)
10. The final, reviewed content is published on your blog page.
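
From the author's point of view, the first two steps boil down to a handful of Git commands. The following is only a sketch: it assumes the remote is called origin, the staging branch is called preview, and posts live in Jekyll's default _posts/ folder. The function name, the file name and the commit message are made up for illustration. The Merge Request of step six is then created in the web interface of the Git-MS.

```shell
#!/usr/bin/env bash
# Sketch of the author's side (steps 1 and 2). Assumptions: the remote is
# named "origin", the staging branch is named "preview", and posts live in
# Jekyll's default _posts/ folder. The function name is hypothetical.
publish_to_preview() {
  local post_file="$1" message="$2"
  git checkout preview &&
  git add -- "$post_file" &&
  git commit -m "$message" &&
  git push origin preview
}

# Usage, inside a clone of the blog repository:
#   publish_to_preview _posts/2017-09-16-my-writeup.md "Add CTF writeup"
```

The steps are chained with &&, so a failed checkout or commit never results in a push.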

Authoring and Reviewing Content

This section considers the first and the seventh step. For the most part, this depends on the internal culture of your company or group. The first step is kept simple: tell the users where to place the .md files and insist that they adhere to the folder structure for assets. Otherwise it will get really messy over time. Part of the process is that your admins also push their blog post content to the preview branch. Even though you can’t force them by technical means, this is an important part of the feedback loop.

The seventh step requires your admins to briefly review the file structure and the article. Creating Jekyll posts gives an author the ability to publish malicious script code on your web server. A quick look into the .md file(s) of the Merge Request and a git diff are usually enough. Better still, look at the staging environment and check whether it builds as expected; it is also easier to proofread there. The staging environment, as the name says, is made for this. Remember, we assume a potentially untrusted group of authors. This sounds like we have serious trust issues with our users (who are at the moment actually only a couple of people), and I really don’t want to oversell this point. However, my experience is that environments change and the number of users can grow beyond what is manageable. The process was made with that in mind.

How you handle feedback and QA is up to you. We, meaning our admins and other users, fix typos directly in the preview branch. The pushed commits are automatically added to the currently open Merge Request. The Merge Request itself, like an issue or ticket, is perfect for discussing the content and allows you to quote particular lines of text to make your point. This is a big bonus of a Git-ish environment.

Your Git Management System

Your Git-MS has to support Pull/Merge Requests and Deploy Hooks. I will go through the setup in Gitlab, since I know it well, but pretty much every Git-MS I know of has these features: Github (obviously), Gitbucket and Gogs.

First, add a Deploy Hook for push and Merge Request events. On a Gitlab instance this is done in the “Settings” -> “Integrations” menu of your project.

[Screenshot: Deploy Hook settings]

Choose a URL that leads to your build script.

To be able to git pull from your Git-MS, a valid account with read access to your repository is required. Either create one or instead add a “Deployment Key”, a feature perfect for this use case. Your user or Deployment Key needs read access only. In case somebody messes with your server, she shouldn’t be able to mess with your repository.

[Screenshot: Deployment Key settings]
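
A Deployment Key is an ordinary SSH public key. As a rough sketch, a dedicated passphrase-less key pair can be created with ssh-keygen. The function name, the key comment jekyll-deploy and the directory in the usage example are assumptions for illustration. The private key has to be readable by the user the build script runs as.

```shell
#!/usr/bin/env bash
# Sketch: create a passphrase-less RSA key pair to be registered as a
# read-only Deployment Key. The function name and the key comment are
# hypothetical; pass the directory that should hold the key.
make_deploy_key() {
  local ssh_dir="$1"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" &&
  ssh-keygen -q -t rsa -b 4096 -N "" -C "jekyll-deploy" -f "$ssh_dir/id_rsa"
}

# Usage (paste the printed public key into the Git-MS as Deployment Key):
#   make_deploy_key /var/www/jekyll_root/.ssh
#   cat /var/www/jekyll_root/.ssh/id_rsa.pub
```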

Both features are intended for continuous integration, but we will (mis-)use them for the automated building of our Jekyll blog.

Automated Build Environment

At this point it is important that you define a base folder which is not exposed by the web server. In our case, let it be JEKYLL_ROOT=/var/www/jekyll_root. (Usually only the sub-folder html is published. Again, consider your setup and make sure the base folder is not publicly accessible.)

Before doing anything else, we have to consider the SSH setup. I assume your private SSH key and known hosts file are inside the $JEKYLL_ROOT/.ssh/ folder. The git command has no elegant way to pass these files via the command line, but workarounds exist. Since Git version 2.3 you can pass the SSH command, including its parameters, as an environment variable:

GIT_SSH_COMMAND="ssh -i $JEKYLL_ROOT/.ssh/id_rsa -o UserKnownHostsFile=$JEKYLL_ROOT/.ssh/known_hosts" git pull

If you are stuck with an older version, the solution is to create a shell script containing your SSH command.

cat > "$JEKYLL_ROOT/ssh.sh" <<EOF
#!/bin/sh
exec ssh -o UserKnownHostsFile=$JEKYLL_ROOT/.ssh/known_hosts -i $JEKYLL_ROOT/.ssh/id_rsa "\$@"
EOF
chmod 750 "$JEKYLL_ROOT/ssh.sh"
GIT_SSH="$JEKYLL_ROOT/ssh.sh" git pull

The first version of the Git command is more elegant, but we will use the latter one, since it works on every system. At this point, you have created a Deploy Hook; let its URL be $URI/path-to-web-hooks/jekyll-update-script.cgi.

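Before writing the build script, we take care of some prerequisites. Clone the repository twice into separate folders and check out one branch in each. In the following example the clones are in jekyll-repo (master) and jekyll-repo-preview (preview), and the generated sites end up in blog and blog-preview respectively. It’s time to create the corresponding build script on the web server.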

#!/usr/bin/env bash

echo -e "Content-type: text/html\r\n\r"
# Prefix of Git's "nothing to do" message; the exact wording
# ("Already up-to-date." / "Already up to date.") depends on the Git version
no_pull="Already up"
CONFIGS="../_config.yml,../_config_preview.yml"   # Additional config, overrides base paths
JEKYLL_ROOT="/var/www/your-jekyll-blog-env"
BLOG_OUT="$JEKYLL_ROOT/blog-preview"
BLOG_PATH="$JEKYLL_ROOT/jekyll-repo-preview" # This repo is checked out to the "preview" branch

function update_blog {
  cd "$BLOG_PATH" || return 1
  git reset --hard HEAD # a build can create junk files you want to get rid of
  pull_result=$(GIT_SSH="$JEKYLL_ROOT/ssh.sh" git pull 2> /dev/null)

  if [[ $pull_result != "$no_pull"* ]]
  then
    echo -e "Building new version of the Jekyll blog.\r\n\r"
    rm -rf "$BLOG_PATH/.bundle" > /dev/null 2>&1  # remove, users shouldn't push bundle configs
    # The variable assignments must end before the command, otherwise nothing is executed
    BUNDLE_GEMFILE="$JEKYLL_ROOT/Gemfile" bundle install --path /var/www/jekyllblog/jekyll_gems > /dev/null 2>&1
    BUNDLE_GEMFILE="$JEKYLL_ROOT/Gemfile" JEKYLL_ENV=production bundle exec jekyll build -s "$BLOG_PATH" -d "$BLOG_OUT" --config "$CONFIGS" > /dev/null 2>&1
  else
    echo -e "No changes to the blog were made.\r\n\r"
  fi
}

update_blog # execute for staging env

CONFIGS="../_config.yml"        # default
BLOG_OUT="$JEKYLL_ROOT/blog"
BLOG_PATH="$JEKYLL_ROOT/jekyll-repo"

update_blog # execute for production env

For this script to work, it has to produce a valid HTTP response. Hence the echo with the Content-Type header at the top, which must be present. The further messages are only useful for debugging. Both branches are built one after another.
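
As a minimal sketch of what a valid response means for a CGI script: at least one header line, then an empty line, then the body. The function name cgi_response and the message text are made up for illustration, and text/plain is used here because the body is not HTML.

```shell
#!/usr/bin/env bash
# Sketch of the smallest useful CGI response: one header line, an empty
# line (both terminated with CRLF), then the body. printf is used instead
# of echo -e because its escape handling is the same in every shell.
cgi_response() {
  printf 'Content-Type: text/plain\r\n\r\n'
  printf '%s\n' "$1"
}

cgi_response "Build triggered."
```

If the header or the empty line is missing, Apache typically answers the hook with a 500 error instead of running silently.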

First, we set the path to Jekyll’s config files. Usually, you can pass sub-URIs via the command line with jekyll build --baseurl /preview, but depending on your design template, further adjustments are needed which can’t be passed as build arguments. I found creating a second config for the preview environment to be the more solid solution (have a look at the sample code).
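
A second config can be very small, because Jekyll reads the files given to --config from left to right and later files override earlier ones. The following sketch writes such an override file; it is not the author's sample config. The function name is hypothetical, and the only key shown is baseurl with the /preview sub-URI from the text. Anything template-specific would have to be added by you.

```shell
#!/usr/bin/env bash
# Sketch: write a small override config for the preview build. Keys in
# this file override the same keys in _config.yml when both are passed as
# --config _config.yml,_config_preview.yml. The function name is made up.
write_preview_config() {
  local target_dir="$1"
  cat > "$target_dir/_config_preview.yml" <<'EOF'
# Overrides for the staging environment only
baseurl: "/preview"
EOF
}

# Usage: write_preview_config /var/www/jekyll_root
```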

Before pulling the corresponding branch, we reset the repository to the last known commit. This is optional, but I found that previous build artifacts sometimes mess up the pulls so that manual intervention is needed - we want the more carefree version.

Then we pull. If no changes are available from origin, skip the build and continue. Otherwise, install the dependencies via bundle install, then build. This is done twice, changing the path variables (in an ugly manner): first for the preview, then for the production environment.
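
Comparing against the text that git pull prints is sensitive to wording: newer Git versions print “Already up to date.” instead of “Already up-to-date.”. An alternative sketch, which is not what the script itself does, compares the commit hash before and after the pull. The function name pull_brought_changes is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a build is needed by comparing commit hashes
# instead of matching the text "git pull" prints.
# Returns 0 if the pull brought new commits, 1 if not (or if it failed).
pull_brought_changes() {
  local repo="$1" before after
  before=$(git -C "$repo" rev-parse HEAD) || return 1
  git -C "$repo" pull --quiet > /dev/null 2>&1 || return 1
  after=$(git -C "$repo" rev-parse HEAD) || return 1
  [ "$before" != "$after" ]
}

# Usage inside update_blog (GIT_SSH would be set as in the script):
#   if pull_brought_changes "$BLOG_PATH"; then echo "build"; fi
```

Note the difference in behavior: this variant skips the build when the pull fails, whereas the string comparison builds in that case.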

We are done, for the most part. You should take care of the Apache config, which exposes the CGI script, the preview URI, and the actual static blog content. I have included a sample config file in the provided code.

Further (Security-related) Consideration

We keep the destination of the local Gems outside of the users’ reach; that’s an obvious one. We also keep the Gemfile outside the repository. Just to be sure, Jekyll’s config files are also moved out of it. It is wise to have an additional copy of the Gemfile inside the repository, so your users have the same Gem versions as your server, but it is not wise to use that copy for the automated build environment. You have some control over the master branch, but the preview branch is open to everyone. Building from a Gemfile in the repository would basically mean arbitrary code execution on the build environment. It would be nice to be able to update Jekyll via Git, but this would force you to jail the build process in a chroot. Building within a Docker container would also be overkill, but would spare you the work of replicating all Ruby libraries inside the chroot. Introducing a build server, a third entity in this workflow, adds unnecessary complexity, imho.

Be sure to execute rm -rf $BLOG_PATH/.bundle after pulling from the repo. Otherwise, the code execution issue will bite you. Updating the Gems with bundle update would re-introduce the issue we navigated around before, since the file .bundle/config contains the path to the Gem folder.

The build environment runs with the permissions of your web server. Running the whole script as a separate user would have been convenient, but the Linux kernel ignores the setuid flag on scripts beginning with a hashbang, for security reasons :-).

Also, consider protecting the staging environment with Digest or Basic Auth, so the preliminary content is not accessible to the public. It is also wise to limit access to jekyll-update-script.cgi to your Git-MS. You could use a simple firewall rule, or just the built-in access controls of your web server.
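
Both protections can be expressed with standard Apache 2.4 directives. The sketch below only prints a config fragment; it is not the sample config mentioned earlier. All paths, the realm name and the IP address (203.0.113.10, an address reserved for documentation) are placeholders, and the function name is made up. The password file would be created separately, for example with htpasswd.

```shell
#!/usr/bin/env bash
# Sketch: emit an Apache 2.4 fragment that puts the preview behind Basic
# Auth and restricts the hook script to one client address. All arguments
# are placeholders, not values from the original setup.
print_apache_fragment() {
  local preview_dir="$1" htpasswd_file="$2" hook_file="$3" gitms_ip="$4"
  cat <<EOF
<Directory "$preview_dir">
    AuthType Basic
    AuthName "Blog preview"
    AuthUserFile "$htpasswd_file"
    Require valid-user
</Directory>

<Files "$hook_file">
    Require ip $gitms_ip
</Files>
EOF
}

print_apache_fragment /var/www/html/preview /var/www/jekyll_root/.htpasswd \
  jekyll-update-script.cgi 203.0.113.10
```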

I would be happy to hear if you found a way to make the process more secure and/or simpler.

16 Sep 2017
