4 Steps To Setting Up The Perfect Elasticsearch Test Environment

Reading time: 6 minutes

When using Elasticsearch, you often need to do some experimenting. Especially when it comes to the more exotic or dangerous queries, like boolean search queries, update queries, and delete queries, you’ll want an Elasticsearch test environment that you can safely use instead of abusing your production deployment.

In this quick guide, you’ll create an Elasticsearch test environment that you can fire up and preload with some test data with minimal effort.

A word of warning: I assume you're working in a Unix-like environment, e.g. Linux or macOS.

1. Install Elasticsearch

Let’s start by firing up a terminal (is yours awesome yet?). Create a directory that will serve as the base of your test environment. For me, that’s: ~/dev/estest/

The next step is to download Elasticsearch into this directory. You can download it from https://www.elastic.co/downloads/elasticsearch

I recommend the “Linux” or “Mac” .tar.gz links, not the .deb or .rpm versions. This way you can isolate your environment to the directory we just created and be sure you won’t have to mess with system-wide configs and data directories.

At the time of writing this article, the latest version is 7.4.2, so that's what you'll see in these examples. Just download whatever the latest version is.

Now untar the .tar.gz file, with something like:
$ tar -xf elasticsearch-7.4.2-darwin-x86_64.tar.gz

If all went well, you now have a directory called elasticsearch-7.4.2. The beauty of this approach is that you can easily install multiple versions side by side, each in its own versioned directory.

Elasticsearch has a terrific out-of-the-box setup. You don’t need to edit config files.

To run Elasticsearch, start it with:
$ ./elasticsearch-7.4.2/bin/elasticsearch

You can verify that Elasticsearch is running by visiting the REST API through your browser: http://localhost:9200/

You should see something like:

{
  "name" : "mbp.local",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "F5VboRxcTuu7MSEPGH-LXQ",
  "version" : {
    "number" : "7.4.2",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "2f90bbf7b93631e52bafb59b3b049cb44ec25e96",
    "build_date" : "2019-10-28T20:40:44.881551Z",
    "build_snapshot" : false,
    "lucene_version" : "8.2.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Once you start experimenting with queries, it might be nice to know that you can tail the log files in the logs directory inside your Elasticsearch installation.
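
For example (the main log file is named after the cluster name, which defaults to elasticsearch):

$ tail -f elasticsearch-7.4.2/logs/elasticsearch.log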

For more installation options, check out the official documentation.

2. Single node installation advice

You are running on a single node, which requires some extra care.

First of all, always create your indices with a single shard and no replicas.

A single shard saves you resources, and for testing purposes it is usually enough. Using no replicas prevents the cluster from going into a 'yellow' state because Elasticsearch wants to assign replicas to other nodes that don't exist.
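
Concretely, creating an index with one shard and no replicas looks like this (a minimal sketch in the request notation used in the Elasticsearch docs; testindex is just an example name, and the next section shows several ways to actually send the request):

PUT /testindex
{
  "settings" : {
    "number_of_shards" : 1,
    "number_of_replicas" : 0
  }
}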

If you are low on disk space (I, for one, am always low on disk space), you might want to add the following setting to config/elasticsearch.yml:

cluster.routing.allocation.disk.threshold_enabled: false

This disables the safety thresholds that are very useful on production deployments, but unneeded on a small test setup.
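
One way to add it, assuming the directory layout from step 1, is to append it to the config file and restart Elasticsearch:

$ echo 'cluster.routing.allocation.disk.threshold_enabled: false' >> elasticsearch-7.4.2/config/elasticsearch.yml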

3. Running queries: pick your poison

There are lots of ways to fire a query at a REST API. It comes down to personal preference and the situation at hand. I will list a few convenient ways so you can pick your poison.

Command-line

Using the command line to work with JSON and REST APIs is not the most user-friendly option. However, sometimes it's all you have at hand. In those cases, you'll be glad you know your way around tools like curl and jq.

I find curl the easiest tool to use since it comes preinstalled on most Linux distributions. To get the same page we saw before, but now on the command line, you can use curl -s localhost:9200. The -s means 'silent'; it keeps curl from filling your terminal with useless progress bars and such.

If you've never heard of jq, it's time to fire up apt-get, yum, brew, or whatever your OS uses to install packages. jq is a powerful command-line JSON processor. I use it mainly for syntax highlighting, but it can do a lot more. If you ever need to do some bash scripting in combination with JSON, you should definitely read up on its features.

Using curl and jq together to create a beautiful JSON output
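
For example, piping curl's output through jq's default '.' filter gives you pretty-printed, syntax-highlighted JSON:

$ curl -s localhost:9200 | jq .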

Curl can perform all the HTTP operations with the -X parameter. To demonstrate, let’s create an index with a PUT request:

Creating an index with curl -X PUT
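
In case you want to type along, the command looks something like this (reusing the single-shard settings from step 2; testindex is again just an example name):

$ curl -s -X PUT 'localhost:9200/testindex' \
       -H 'Content-Type: application/json' \
       -d '{ "settings" : { "number_of_shards" : 1, "number_of_replicas" : 0 } }'
{"acknowledged":true,"shards_acknowledged":true,"index":"testindex"}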

Web interfaces

If you have more at hand than the command line, you may be better off with a nice GUI. Here are just two favorites of mine. If you know of anything better, please share it with all of us in the comments section!

Kibana
Elastic, the company behind Elasticsearch, has created many tools for the Elasticsearch ecosystem. One of the more prominent ones is Kibana, a powerful web application that lets you create visualizations and dashboards and explore your data.

You can download Kibana here: https://www.elastic.co/downloads/kibana

My advice, again, is to download the tarball (tar.gz) version for your operating system and install it in its own directory in your test environment. Kibana’s version number follows that of Elasticsearch, so in my case, I ended up with a directory called kibana-7.4.2-darwin-x86_64. To start it, run:

$ kibana-7.4.2-darwin-x86_64/bin/kibana

After it has booted, which can take a while, you can visit the webpage at http://localhost:5601

Kibana will offer to load some sample data. Go ahead and let it; the sample data lets you experiment with all the features Kibana offers.

In this example screenshot, you can see the e-commerce dashboard:

A Kibana Dashboard showing insights into the e-commerce sample data

If you go into Kibana’s ‘Dev Tools’ section, you can fire JSON requests manually:

Indexing a document using Kibana’s Dev Tools
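
An indexing request in the Dev Tools console looks something like this (a sketch; the index name testindex and the fields are made up for illustration):

PUT testindex/_doc/1
{
  "title" : "My first test document",
  "created" : "2019-11-01T12:00:00Z"
}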

Cerebro
This is my personal favorite. Cerebro is lightweight and gives a nice overview of the running cluster nodes, the indices, server load, disk usage, et cetera. On top of that, it allows you to create and delete indices, manage aliases, manage index templates, and much more.

Above all, it has a nice REST interface in which you can send arbitrary requests of all types (POST, PUT, GET, DELETE). This interface includes a JSON syntax checker, a curl command generator, and a convenient query history.

Using Cerebro's REST interface to PUT some data in a test index

To install Cerebro: go to the downloads page, download the .tgz file and extract it just like we did with the Elasticsearch tarball. You can then run it with the command:

$ ./cerebro-x.x/bin/cerebro

Visit http://localhost:9000 to use the interface.

4. A good set of test data

What does good test data entail? Let’s consider two situations.

First situation: If you have specific data on which you want to perform specific queries, it’s best to just load a (partial) copy of that production data into your test environment.
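
One way to load such a copy, sketched here under the assumption that you have exported your documents to a newline-delimited JSON file (documents.ndjson is a made-up name), is the _bulk API. Each document is preceded by an action line, and the file must end with a newline:

$ cat documents.ndjson
{ "index" : {} }
{ "title" : "First document", "price" : 9.95 }
{ "index" : {} }
{ "title" : "Second document", "price" : 19.95 }

$ curl -s -X POST 'localhost:9200/testindex/_bulk' \
       -H 'Content-Type: application/x-ndjson' \
       --data-binary @documents.ndjson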

Second situation: For general fiddling around, you want a data set at hand that supports all types of queries, so you don't have to spend time creating one each time.

Here's a non-exhaustive list of stuff you might want to fiddle with (a mapping sketch covering several of these follows the list):

  • Inner documents vs. nested documents
  • Parent-child relationships
  • Arrays
  • Numbers
  • Dates / times
  • Text and keywords
  • Geo data (geo points, geohashes)
  • Geo shapes
  • IP addresses
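
As a sketch of what such a playground index could look like, here is an example mapping that covers several of the field types above (the index and field names are made up; adjust to taste):

$ curl -s -X PUT 'localhost:9200/playground' \
       -H 'Content-Type: application/json' -d '
{
  "settings" : { "number_of_shards" : 1, "number_of_replicas" : 0 },
  "mappings" : {
    "properties" : {
      "title"     : { "type" : "text" },
      "tags"      : { "type" : "keyword" },
      "price"     : { "type" : "float" },
      "created"   : { "type" : "date" },
      "location"  : { "type" : "geo_point" },
      "area"      : { "type" : "geo_shape" },
      "client_ip" : { "type" : "ip" },
      "comments"  : { "type" : "nested" }
    }
  }
}'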

I could try to craft a very nifty test set that exploits all these features. Instead, I plan on writing more articles, with test data and example queries, that explain the more advanced Elasticsearch features in more detail. So for now, I would like to refer you to the sample data sets we loaded earlier when we installed Kibana.

If you inspect them closely, you'll see that they contain all kinds of data. The e-commerce test set contains numbers (including floats), geo points, inner documents, and timestamps. In addition, the web logs test set contains IP addresses.

There’s more than enough to start experimenting!

Final notes

You now have a working Elasticsearch test environment, with test data at hand. One that you can fire up whenever you need it, and that you can easily move around. You could even run it from a USB stick.

I deliberately didn’t mention Docker in this article, since it is one extra dependency you’d need to install and learn about. If you already use Docker, this is definitely a nice alternative to create a test environment. There is a thorough guide on running Elasticsearch on Docker here. By using Docker, it’s relatively easy to set up a multi-node cluster too.
