#207 Syntax Highlighting (revised)
We have an example application for posting code snippets online. To create a snippet we give it a name, select its language, enter the code then submit the form. When we do this and view the snippet the code isn’t syntax-highlighted. In this episode we’ll show you how to add this feature.
Comparing The Options
One of the easiest ways to add syntax highlighting is on the client through JavaScript as this way the server doesn’t have to do the work. There are a number of libraries available that will do this including Rainbow and SyntaxHighlighter. Both of these are fairly limited in the languages they support and make more work for the browser so instead we’ll focus on a solution that parses the code on the server.
There are several server-side solutions available. CodeRay is one of the easiest to use in a Rails application as it’s written entirely in Ruby and has no other dependencies. It’s very fast at parsing code too, so we don’t need to worry about adding caching. Its main drawback is that it doesn’t support a large number of languages, only 18 in the current release. One of the most popular server-side solutions is Pygments. GitHub uses this and it supports almost any language you can throw at it. For Rails users its main issue is that it’s written in Python, so we need to use something else in order to run it. The pygments.rb gem provides a wrapper around Pygments so that we can use it directly in Ruby code. Ultraviolet is another option. This uses TextMate syntax files, which means that it can handle a large variety of languages and that we can customize these if we want to. It can be slow to parse files, however.
To compare these solutions we’ll run a benchmark to see how quickly they each parse the same file.
require "rubygems"
require "benchmark"
require "coderay"
require "pygments"
require "uv"

repeat = 50
content = File.read(__FILE__)

# run it once to initialize
CodeRay.scan(content, "ruby").div(css: :class)
Pygments.highlight(content, lexer: "ruby")
Uv.parse(content, "xhtml", "ruby", true, "amy")

Benchmark.bm(11) do |b|
  b.report("coderay") do
    repeat.times { CodeRay.scan(content, "ruby").div(css: :class) }
  end

  b.report("pygments") do
    repeat.times { Pygments.highlight(content, lexer: "ruby") }
  end

  b.report("ultraviolet") do
    repeat.times { Uv.parse(content, "xhtml", "ruby", true, "amy") }
  end
end
This also serves as a nice way of demonstrating how to do syntax highlighting with each of the three libraries. To run this file we’ll need to install each of the gems.
$ gem install coderay pygments.rb uv
When these finish installing we can run our benchmark script.
$ ruby syntax_benchmarks.rb
                  user     system      total        real
coderay       0.200000   0.000000   0.200000 (  0.205535)
pygments      0.700000   0.000000   0.700000 (  0.697237)
ultraviolet   3.740000   0.020000   3.760000 (  3.771054)
We can see from the results that CodeRay and Pygments ran quickly while Ultraviolet took a lot longer: roughly 0.2 and 0.7 seconds for 50 runs, against almost 3.8 seconds. On these numbers we could use CodeRay, and possibly Pygments, to do real-time highlighting but not Ultraviolet.
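To put those totals in per-request terms, the short sketch below divides each "real" time by the 50 repetitions. The figures are copied from the output above; the `per_run_ms` helper and the constant names are ours, used only for illustration.

```ruby
# "real" (wall-clock) seconds copied from the benchmark output above.
REPEAT = 50
REAL_SECONDS = {
  "coderay"     => 0.205535,
  "pygments"    => 0.697237,
  "ultraviolet" => 3.771054
}

# Average milliseconds for a single highlighting pass (hypothetical helper).
def per_run_ms(total_seconds, runs = REPEAT)
  (total_seconds / runs * 1000).round(2)
end

REAL_SECONDS.each do |name, seconds|
  puts format("%-11s %6.2f ms per run", name, per_run_ms(seconds))
end
```

That works out at roughly 4 ms per run for CodeRay, 14 ms for Pygments and 75 ms for Ultraviolet on this particular file and machine, so the absolute values should be treated as indicative only.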
Using Pygments
Let’s apply one of these solutions to our Rails application so that it highlights the code snippets. We’ll use Pygments but if you prefer one of the other solutions the approach is much the same. We’ll need to add the pygments.rb gem to the Gemfile and run bundle to add it to our application.
gem 'pygments.rb'
Next we’ll modify the template that displays each snippet. This currently renders the snippet wrapped in pre tags.
<pre><%= @snippet.code %></pre>
We’ll remove the pre tags, as Pygments adds its own, and pass the snippet’s code into Pygments.highlight.
<%= raw Pygments.highlight(@snippet.code, lexer: @snippet.language) %>
As we know what language the snippet is in we can use the lexer option to tell Pygments this. If we don’t have this information we can omit this option and Pygments will try to work out the language based on the code content. Alternatively, if we know the filename or MIME type we can pass these in. We need to pass the output from Pygments.highlight into the raw method as it generates HTML and we don’t want this to be auto-escaped.
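To see why the call to raw matters, the sketch below runs a fragment of highlighter-style markup through `ERB::Util.html_escape` from Ruby's standard library, which is roughly what auto-escaping does to a plain string. The sample markup is a made-up fragment rather than captured Pygments output, and the `escaped` helper is ours.

```ruby
require "erb"

# A fragment of the kind of markup a highlighter emits (hypothetical sample,
# not captured from Pygments).
HIGHLIGHTED = '<span class="k">def</span>'

# Roughly what auto-escaping does to a plain string before it reaches the page.
def escaped(html)
  ERB::Util.html_escape(html)
end

puts escaped(HIGHLIGHTED)
# prints: &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt;
```

Escaped like this, the browser would show the span tags as literal text instead of applying them, which is why the highlighted output has to be marked as safe.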
When we reload the page it looks just like it did before. Pygments is adding the correct markup but as we haven’t added any CSS the tokens aren’t coloured. Pygments comes with a css method to generate styling. We could just run this and copy the output into a CSS file or we could take advantage of the asset pipeline and use that here. We can create a new stylesheet with an erb extension and add the line to generate the CSS here.
<%= Pygments.css %>
When we reload the snippet’s page now the code will be highlighted.
Pygments comes with several different styles built in. If we run Pygments.styles from the Rails console we’ll see a list of the styles we can use.
1.9.3-p125 :001 > Pygments.styles
 => ["monokai", "manni", "rrt", "perldoc", "borland", "colorful", "default", "murphy", "vs", "trac", "tango", "fruity", "autumn", "bw", "emacs", "vim", "pastie", "friendly", "native"]
We can take any one of these and pass it as a style option like this:
<%= Pygments.css(style: "colorful") %>
When we reload the page now we’ll see that style applied.
Highlighting Snippets Embedded in Markdown Documents
We now have our snippets highlighted without having had to write much code. What, though, if our code isn’t contained in a separate attribute but is embedded in a Markdown document instead? Let’s say that we have a blogging application with an Article model and that we want to be able to embed code snippets inside the content of an article, as we can on GitHub, by wrapping the code in three backticks and specifying the language after the opening backticks.
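For illustration, the sketch below builds a piece of article content of that shape and reads the language name off the opening fence with a regular expression. The content, the pattern and the `fenced_blocks` helper are all hypothetical and only show the format; in the application itself the Markdown library does this parsing.

```ruby
# Build the fence from single backticks so this example can itself be shown
# inside a code block.
FENCE = "`" * 3

# Hypothetical article content with one fenced Ruby snippet.
CONTENT = [
  "Here is a method:",
  "",
  "#{FENCE}ruby",
  "def greet",
  "  puts 'hi'",
  "end",
  FENCE,
  "",
  "That is all."
].join("\n")

# Opening fence plus language name, then the code, then the closing fence.
FENCED_BLOCK = /^#{FENCE}(\w+)\n(.*?)^#{FENCE}$/m

# Return each fenced block as a hash of its language and code.
def fenced_blocks(text)
  text.scan(FENCED_BLOCK).map do |language, code|
    { language: language, code: code }
  end
end

fenced_blocks(CONTENT).each do |block|
  puts "#{block[:language]}: #{block[:code].lines.length} lines"
end
# prints: ruby: 3 lines
```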
If we view this article after submitting the form the code snippet will be displayed as we entered it with the backticks and language name visible. This is because to display the article’s content we currently just call simple_format on the content attribute so that the line breaks are displayed properly.
<h1><%= @article.name %></h1>
<%= simple_format @article.content %>
<%= link_to 'Edit', edit_article_path(@article) %>
Before we can syntax-highlight our code snippets we’ll need to be able to handle Markdown so that we can add code sections within articles. We’ll use the Redcarpet gem to do this and, as ever, we’ll need to add this gem to the Gemfile and run bundle to install it.
gem 'redcarpet'
Now in our template we’ll use Markdown instead of calling simple_format. We’ll use a helper method called markdown that we’ll write shortly.
<%= markdown @article.content %>
We’ll write the helper method in the ApplicationHelper module.
module ApplicationHelper
  def markdown(text)
    renderer = Redcarpet::Render::HTML.new(hard_wrap: true, filter_html: true)
    options = {
      autolink: true,
      no_intra_emphasis: true,
      fenced_code_blocks: true,
      lax_html_blocks: true,
      strikethrough: true,
      superscript: true
    }
    Redcarpet::Markdown.new(renderer, options).render(text).html_safe
  end
end
What we do here is a little different from what we did in episode 272 as Redcarpet has had some significant changes since then. We create a new Redcarpet renderer then we render the Markdown text, passing in a number of options to determine how it will be rendered. We call html_safe on the output from this so that Rails doesn’t try to HTML-escape it.
When we reload our article now the code snippet is interpreted properly, although the code isn’t syntax-highlighted as it’s not being passed through Pygments. This is easy to do with Redcarpet. All we need to do is define a new renderer class that inherits from Redcarpet::Render::HTML and override its block_code method. This takes a code snippet and a language and we can pass these to Pygments so that it can render the code. We can then use this renderer instead of the default one in our markdown method.
module ApplicationHelper
  class HTMLwithPygments < Redcarpet::Render::HTML
    def block_code(code, language)
      Pygments.highlight(code, lexer: language)
    end
  end

  def markdown(text)
    renderer = HTMLwithPygments.new(hard_wrap: true, filter_html: true)
    options = {
      autolink: true,
      no_intra_emphasis: true,
      fenced_code_blocks: true,
      lax_html_blocks: true,
      strikethrough: true,
      superscript: true
    }
    Redcarpet::Markdown.new(renderer, options).render(text).html_safe
  end
end
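An author may leave the language off a fenced block, in which case we would probably want Pygments to guess rather than receive an empty lexer name. The sketch below is one possible way to handle that, assuming the language argument can arrive as nil or blank; `highlight_options` is a hypothetical helper of ours, not part of Redcarpet or pygments.rb.

```ruby
# Hypothetical helper: build the options hash for Pygments.highlight,
# leaving out :lexer when no language was given so that Pygments can guess.
def highlight_options(language)
  name = language.to_s.strip
  name.empty? ? {} : { lexer: name }
end

p highlight_options("ruby") # a hash containing only the lexer name
p highlight_options(nil)    # an empty hash, so no lexer is forced
```

Inside block_code we could then pass highlight_options(language) as the second argument to Pygments.highlight in place of the literal lexer option.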
When we reload the page now the code is highlighted.
Caching Highlighted Code
We’ll finish this episode by talking about caching. How might we cache the output from the Pygments.highlight method if we needed to? One option is to take a SHA of the code and use that as a cache key so that it auto-expires. We can do this by using SHA1.hexdigest to create a hash to use as a key (along with the word “code” and the code snippet’s language) then using Rails.cache.fetch and passing in that key. If a cache item with that key is found its value will be returned; if not the block will be executed and the output from it stored in the cache.
class HTMLwithPygments < Redcarpet::Render::HTML
  def block_code(code, language)
    sha = Digest::SHA1.hexdigest(code)
    Rails.cache.fetch ["code", language, sha].join('-') do
      Pygments.highlight(code, lexer: language)
    end
  end
end
As we’re using Markdown it would make sense to cache the output of the entire markdown method as well, but here we’ll just cache the syntax-highlighted code. Caching is disabled by default in development mode so we’d need to enable that to test this. Its effectiveness will depend on the cache store we decide to use.
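The auto-expiring behaviour of the SHA-based key can be tried outside Rails with a small sketch. Here `TinyCache` is a hypothetical in-memory stand-in for Rails.cache and the highlighting step is faked with a string, so only the key handling matches the code above.

```ruby
require "digest/sha1"

# Hypothetical in-memory stand-in for Rails.cache, just enough to show fetch.
class TinyCache
  def initialize
    @store = {}
  end

  # Return the stored value for key, or run the block and store its result.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

# Same key format as the block_code method above: "code-<language>-<sha>".
def code_cache_key(code, language)
  ["code", language, Digest::SHA1.hexdigest(code)].join("-")
end

CACHE = TinyCache.new
CALLS = []

# Stand-in for the expensive highlighting call; records each real run.
def highlight_with_cache(code, language)
  CACHE.fetch(code_cache_key(code, language)) do
    CALLS << code
    "<pre>#{code}</pre>"
  end
end

highlight_with_cache("puts 1", "ruby")
highlight_with_cache("puts 1", "ruby") # same code, served from the cache
highlight_with_cache("puts 2", "ruby") # edited code, so a new key
puts CALLS.length
# prints: 2
```

Because the key is derived from the code itself, editing a snippet simply produces a new key and nothing needs to be expired by hand. Old entries linger until the cache store evicts them.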