subspace (/ˈsʌbspɛɪs/)
A Jekyll playground site

Lunr.js as search engine for a static site

1896 words
02:02 on Wednesday, 01. September 2021 | by Admin in Jekyll

Static sites offer many advantages, but dynamic content is naturally difficult for them: a search feature usually needs a database back end and server-side processing. That is only one side of the coin, though. Search can be implemented without any server-side content generation. As in many other areas, modern JavaScript comes to the rescue with powerful and efficient local search solutions that run in the visitor’s browser.

For such a solution, the basic workflow looks as follows, assuming a static site generator like Jekyll or Hugo. The approach works for any site that is based on static content (pure HTML, CSS and JavaScript).

  • When building the site, generate a static page containing the search index. This can be an HTML page or a JSON document; it really does not matter. Most JavaScript search engines expect the index itself to be in JSON format.
  • Use a client-side script that reads this index and presents the results.

lunr.js is such a script. It runs in your browser and searches a JSON-formatted search index for keywords. Ideally, the index should contain various metadata and some of the article’s real content. For my solution, an index entry for a single document contains the following:

  • The title of the post or article
  • The name of its author
  • The list of categories it belongs to. On this site, an article can be a member of multiple categories.
  • The modification date. Used for sorting the results.
  • An excerpt for the article. This is typically a short summary or abstract (a few sentences at most). It must be part of the front matter. If you do not want to maintain an excerpt for all your content, you can also use the first couple of sentences from the actual content or instruct Jekyll to generate excerpts automatically using an excerpt separator.
  • The URL for the article. Needed to provide a link in the search result.
  • Keywords: A list of words describing the main subject the article is about.
  • Tags: The list of tags for the article or post.

This is what a single index entry looks like:

"en-2021-09-01-lunrjs-for-jekyll": {
        "title": "Lunr.js as search engine for a static site",
        "author": "Admin",
        "category": "",
        "modified": "02:02 on Wednesday, 01.September 2021",
        "excerpt": "How to use [lunr.js](https://lunrjs.org) to implement a site search for a static web site built with Jekyll or similar static page generators.",
        "url": "/en/2021/09/01/lunrjs-for-jekyll/",
        "keywords": "[\"lunrjs\", \"lunr\", \"search\", \"jekyll\"]",
        "tags": "[\"\"]",
        "words": "280"
      }
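
Note that keywords and tags arrive as strings that contain a JSON array, not as real arrays. If a script needs the individual words, one possible approach is to decode them first. The following is a minimal sketch; the helper name parseList is an assumption made for illustration and is not part of lunr.js or of this site's scripts.

```javascript
// Hypothetical helper: turn a field like "[\"lunrjs\", \"lunr\"]" into an array.
// Values that are not a JSON array are treated as one plain-text keyword.
function parseList(value) {
  if (Array.isArray(value)) return value;
  if (typeof value !== 'string' || value.trim() === '') return [];
  try {
    var parsed = JSON.parse(value);
    if (Array.isArray(parsed)) {
      // Drop empty strings such as the one in "[\"\"]"
      return parsed.filter(function (item) { return item !== ''; });
    }
  } catch (e) {
    // Not JSON, so fall through and treat it as plain text
  }
  return [value];
}

// Field values copied from the index entry shown above
var entry = {
  keywords: '["lunrjs", "lunr", "search", "jekyll"]',
  tags: '[""]'
};
console.log(parseList(entry.keywords));
console.log(parseList(entry.tags));
```

For lunr itself this step is optional, because the tokenizer also finds the words inside the raw string.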


This is the Liquid template code fragment for building the index on this site:

This code fragment iterates over all articles in all collections to generate one search index entry per document. The code should be easy enough to modify for your own personal needs.

Note: lines 5-7 and 9-11 (the two if blocks that end in continue) show how you can exclude whole collections or use tags to exclude specific posts from the index. I exclude the sys collection because it does not contain real posts, just support documents. I also exclude all posts tagged with drafts, because they are normally unpublished.

<script>
      {% assign nr_items = 0 %}
  window.store = {
      {% for collection in site.collections %}
      {% if collection.label == "sys" or site[collection.label].size < 1 %}
          {% continue %}
      {% endif %}
      {% for post in site[collection.label] %}
      {% if post.tag contains 'drafts' %}
          {% continue %}
      {% endif %}
      "{{ post.url | slugify }}": {
        "title": "{{ post.title | xml_escape }}",
        "author": "{{ post.author | xml_escape }}",
        "category": {{ post.category | default: "" | strip_html | strip_newlines | jsonify }},
        "modified": "{{ post.modified | date: "%H:%M on %A, %d.%B %Y" |xml_escape }}",
        "excerpt": {{ post.excerpt | default: "No content excerpt available" | strip_html | strip_newlines | jsonify }},
        "url": "{{ site.baseurl }}{{ post.url | xml_escape }}",
        "keywords": {{ post.keywords | default: "" | strip_html | strip_newlines | jsonify }},
        "words": "{{ post.content | number_of_words }}"
      }
      {% assign nr_items = nr_items | plus:1 %}
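      {% comment %} Always emit a comma: a trailing comma is valid in a JavaScript object literal, and this avoids ",," when a skipped draft is the last post {% endcomment %},
      {% endfor %}
    {% comment %} No separator needed between collections, every entry already ends with a comma {% endcomment %}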
    {% endfor %}
  };
  var nr_items = {{ nr_items }};
</script>
<script src="{{ site.baseurl }}/assets/js/lunr.min.js"></script>
<script src="{{ site.baseurl }}/assets/js/search.js"></script>
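
To make the selection logic easier to follow, here is a rough JavaScript equivalent of what the template does at build time. It is only a sketch: the collections array and its docs property are hypothetical stand-ins for Jekyll's site.collections, and the slugify function merely approximates Jekyll's filter of the same name.

```javascript
// Approximation of Jekyll's slugify filter (assumption: default mode)
function slugify(text) {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');
}

// Build the store object from a list of collections, skipping what the
// template skips: the "sys" collection, empty collections and drafts.
function buildStore(collections) {
  var store = {};
  collections.forEach(function (collection) {
    if (collection.label === 'sys' || collection.docs.length < 1) return;
    collection.docs.forEach(function (post) {
      if ((post.tag || []).indexOf('drafts') !== -1) return;
      store[slugify(post.url)] = { title: post.title, url: post.url };
    });
  });
  return store;
}

// Made-up sample data for demonstration
var demoStore = buildStore([
  { label: 'sys', docs: [{ url: '/sys/404/', title: 'Not found' }] },
  { label: 'posts', docs: [
    { url: '/en/2021/09/01/lunrjs-for-jekyll/', title: 'Lunr.js as search engine for a static site' },
    { url: '/en/2021/09/02/unfinished/', title: 'Unfinished', tag: ['drafts'] }
  ] }
]);
console.log(Object.keys(demoStore));
```

Only the published post from the posts collection ends up in the store; the sys document and the draft are left out.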

In our example, the JSON object is attached to the window object. While this is a good solution, it’s not mandatory. You could use an independent object for it, but the window object has the simple advantage that you do not need to pass it to your search script. It’s always available and shared by all scripts running in the current browser tab.

So, in our example, the search index is part of the /search.html page, which also offers a search box and a container for the search results. The index could be put into a separate JSON file, but then your scripts would need to load it, which adds complexity that is not needed here. It is also possible to pass a search query to /search.html using the query parameter. For example, /search.html?query=fun directly performs a search. This is how the search box in the top right corner works: it simply calls search.html, passing whatever string you typed into the search box as the parameter.
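
As an aside, current browsers offer the standard URLSearchParams interface for reading such a parameter. The sketch below shows how the query value could be extracted with it; the function name readQuery is an assumption for illustration, and the search script on this site uses its own getQueryVariable() function instead.

```javascript
// Read the "query" parameter from a query string such as "?query=fun".
// In a browser, pass window.location.search.
function readQuery(search) {
  var params = new URLSearchParams(search);
  // get() returns null when the parameter is absent; normalise to ''
  return (params.get('query') || '').trim();
}

console.log(readQuery('?query=fun'));         // fun
console.log(readQuery('?query=static+site')); // static site
console.log(readQuery('?page=2') === '');     // true
```

URLSearchParams decodes percent escapes and turns a plus sign into a space, so no manual decoding is needed.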

Search index size considerations

Since the whole index will be part of the search.html page on your site, wouldn’t this page grow very big?

It depends on the site, obviously. On my small site with fewer than 100 documents, search.html isn’t even 100 kB in size. I’ve done the math and found that the average size per index entry is about 450 to 500 bytes. To make it simpler, let’s just assume half a kB per document, so 1,000 documents would result in about 500 kB (half a MB), which is still nothing given today’s average bandwidth. With 10,000 documents (already a fairly large site), the JSON index would be about 5 MB in size, still very manageable.
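
The estimate is simple enough to put into a few lines of code. This sketch assumes a flat 500 bytes per entry and decimal units (1 kB = 1000 bytes); both the constant and the function name are illustrative, and the real average on another site may differ.

```javascript
// Rough index size estimate, assuming about 500 bytes per index entry
// (the rounded average measured on this site; other sites will differ).
var BYTES_PER_ENTRY = 500;

function estimateIndexSize(documentCount) {
  var bytes = documentCount * BYTES_PER_ENTRY;
  if (bytes >= 1e6) return (bytes / 1e6).toFixed(1) + ' MB';
  return Math.round(bytes / 1e3) + ' kB';
}

[100, 1000, 10000].forEach(function (n) {
  console.log(n + ' documents: about ' + estimateIndexSize(n));
});
```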

And search performance?

Nothing you should worry about unless you get into high five-digit numbers of documents. JavaScript in modern browsers is fast enough to perform such searches in less than a second. The memory requirements per tab would be more of a concern, since the JSON index must be read into and kept in memory (as part of the window object). Loading the search.html page could therefore cause performance problems with really large indexes (tens of thousands of entries).

On the average site, performance problems wouldn’t be a concern. Expect a local JavaScript search to be faster than a remote server-side database search on a server running under high load. The hardware found in devices not older than 5-10 years should normally be good enough.
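
If you would rather measure than guess, the search call can be timed in the browser. The following sketch wraps any search function in a timer; timeSearch and the naive stand-in search are assumptions made so the example runs without lunr. With a real lunr index you could pass something like function (t) { return idx.search(t); } instead.

```javascript
// Time a single search call. `searchFn` is any function that takes a
// search term and returns an array of results.
function timeSearch(searchFn, term) {
  var start = performance.now();
  var results = searchFn(term);
  var elapsedMs = performance.now() - start;
  return { results: results, elapsedMs: elapsedMs };
}

// Stand-in search over a tiny in-memory list, used only for this demo
var sampleTitles = ['Lunr.js as search engine', 'Jekyll collections', 'Static sites'];
function naiveSearch(term) {
  return sampleTitles.filter(function (title) {
    return title.toLowerCase().indexOf(term.toLowerCase()) !== -1;
  });
}

var timed = timeSearch(naiveSearch, 'jekyll');
console.log(timed.results.length + ' result(s) in ' + timed.elapsedMs.toFixed(2) + ' ms');
```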

Above, we did our homework, that is, building the search index. Now focus on lines 30 and 31 of the template fragment, the last two script tags. They include two scripts. lunr.min.js is the search engine, which you download from the lunr website. It’s also possible to include it from an online resource (CDN) that always gives you the most recent version. Please see the docs on the lunr website for instructions on how to include the script on your site. I prefer to host it locally, but that’s only one of many options.

The second script (search.js) is the interface script that initializes lunr with the search index, performs the search and inserts the results into the DOM.

 (function() {
  function displaySearchResults(results, store) {
    var searchResults = document.getElementById('search-results');

    if (results.length) { // Are there any results?
      var appendString = '';
        appendString += '<h4>The query found ' + results.length + ' results.</h4>';
      for (var i = 0; i < results.length; i++) {  // Iterate over the results
        var item = store[results[i].ref];
          appendString += '<li class="lunr-searchresult"><a href="' + item.url + '">' + item.title + '</a>';
          appendString += '<br>' + item.excerpt;
          appendString += '<br><span class="author subheader" style="float:right;">' + item.words + ' Words</span><span class="author subheader" style="float:left;"><span class="time_symbol"></span><span class="time" >' + item.modified + '</span></span>';
          appendString += '<div class="clearfix"></div></li>'; // close the <li> opened above
          //appendString += '<p>' + item.content.substring(0, 150) + '...</p>';
      }

      searchResults.innerHTML = appendString;
    } else {
      searchResults.innerHTML = '<h4>Nothing found.</h4>';
    }
  }

  function getQueryVariable(variable) {
    var query = window.location.search.substring(1);
    var vars = query.split('&');

    for (var i = 0; i < vars.length; i++) {
      var pair = vars[i].split('=');

      if (pair[0] === variable) {
        return decodeURIComponent(pair[1].replace(/\+/g, '%20'));
      }
    }
  }

  var searchTerm = getQueryVariable('query');

  if (searchTerm) {
    document.getElementById('search-box').setAttribute("value", searchTerm);

    // Initialize lunr with the fields it will be searching on. I've given title
    // a boost of 10 to indicate matches on this field are more important.
    var idx = lunr(function () {
      this.field('id');
      this.field('title', { boost: 10 });
      this.field('author');
      this.field('category');
      this.field('keywords');
      this.field('excerpt');

      for (var key in window.store) { // Add the data to lunr
        this.add({
        'id': key,
        'title': window.store[key].title,
        'author': window.store[key].author,
        'category': window.store[key].category,
        'excerpt': window.store[key].excerpt,
        'keywords': window.store[key].keywords
        });

      }
    });
    var results = idx.search(searchTerm); // Get lunr to perform a search
    displaySearchResults(results, window.store); // Render the results with the function defined above
  }
})();

As you can see, it’s quite straightforward. Lines 2-20 (the displaySearchResults() function) present the results of the search and build the response. This is then inserted into the search-results element, which on this site is a simple <ul id="search-results"></ul>.

Beginning in line 43, lunr.js is initialized with data from the window.store object (our JSON index). Line 63 executes the search and line 64 hands the results to displaySearchResults().
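
One thing the script does not handle: lunr.js interprets some characters in the search string as query syntax (a colon, for example, restricts a term to a field), so, as far as I know, input typed by a visitor can make idx.search() throw instead of returning results. A small wrapper can turn that into an empty result. This is a sketch only; safeSearch is a hypothetical helper, and the stand-in index below exists just so the example runs without lunr.

```javascript
// Run a search and return an empty result list instead of throwing.
// `idx` is expected to be a lunr index, i.e. an object with a search() method.
function safeSearch(idx, term) {
  try {
    return idx.search(term);
  } catch (err) {
    // Typically a query parse error; log it and behave like "nothing found"
    console.warn('Search failed for "' + term + '": ' + err.message);
    return [];
  }
}

// Stand-in index for demonstration only (assumption, not the lunr API itself)
var fakeIdx = {
  search: function (term) {
    if (term.indexOf(':') !== -1) throw new Error('unrecognised field');
    return [{ ref: 'en-2021-09-01-lunrjs-for-jekyll', score: 1 }];
  }
};

console.log(safeSearch(fakeIdx, 'jekyll').length);  // 1
console.log(safeSearch(fakeIdx, 'foo:bar').length); // 0
```

In search.js, the call in line 63 could then become safeSearch(idx, searchTerm), so a malformed query simply shows the "Nothing found." message.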

The search form on the search page

A simple form to enter a search string. The page will then call itself passing the string as parameter. The JavaScript function getQueryVariable() will detect the query parameter and perform the search.

<form id="searchform" style="margin:0;padding: 0 20px 20px 20px;" action="/search.html" method="get">
  <input type="text" id="search-box" name="query">
  <input type="submit" value="Search">&nbsp;&nbsp;
  <span id="total_items"></span>
</form>
<!-- here goes the results -->
<ul id="search-results"></ul>

The code for a search box anywhere on your site

This can go wherever you want. The site header would be a good place for it.

<form id="lunr-searchform" action="/search.html" method="get">
 <div class="lunr-wrap">
   <div class="lunr-search">
    <input type="text" class="lunr-searchTerm" name="query" placeholder="Search this site...">
    <button type="submit" value="search" class="lunr-searchButton">
        <span class="search_symbol"></span>
    </button>
   </div>
 </div>
</form>

The CSS for the search form(s)

li.lunr-searchresult a {
    font-weight: bold;
    font-size: 130%;
}
li.lunr-searchresult {
    margin-bottom: 20px;
    font-family: $sans-font !important;
    font-size: 90% !important;
}

div.lunr-search {
  width: 100%;
  position: relative;
  display: flex;
}

input.lunr-searchTerm {
  width: 100%;
  border: 0;
  border-right: none;
  padding: 5px;
  height: 18px;
  border-radius: 0;
  outline: none;
  color: $field_color;
  font-weight:bold;
  background-color: $field_bg;
}

button.lunr-searchButton {
  width: 40px;
  height: 28px;
  border: 0;
  background: #00B4CC;
  text-align: center;
  color: #000;
  border-radius: 0;
  cursor: pointer;
  font-size: 18px;
}

/*Resize the wrap to see the search bar change!*/
div.lunr-wrap{
  width:20%;
  position: absolute;
  top: 50%;
  right: 20px;
  transform: translate(0, -50%);
}

form#lunr-searchform, form#searchform {
    background-color: $background_color;
}

form#lunr-searchform {
    margin: 0;
    padding: 0;
    float: right;
    margin-right: 20px;
    width: 20%;
    min-width:40%;
}

form#searchform input {
    color: $field_color;
    background-color: $field_bg;
    border: 1px solid $accent_color;
    padding: 4px;
}

span.search_symbol {
    font: normal normal normal 15px/1 FontAwesome !important;
    color: black;
    margin: 0 4px 0 0;
}

span.search_symbol:before {
    content: "\f002";
}