As organizations grow, having access to searchable, indexed data becomes necessary. This often means delivering a client-facing search engine on websites and applications. Other times, companies need search capabilities for business intelligence and analysis of logs or other metrics. While numerous search solutions exist, Elasticsearch and the ELK stack (Elasticsearch, Logstash, and Kibana) are known as industry leaders in providing a flexible, scalable solution to many search-related problems.
This article digs into Elasticsearch’s capabilities and use cases, focusing on the Amazon Elasticsearch Service (Amazon ES). Amazon ES is a managed solution for deploying and running an Elasticsearch cluster.
What is Amazon Elasticsearch Service?
Elasticsearch is an open-source search engine written in Java and designed for distributed or multi-tenant environments. It’s built for scalability while still offering speed and flexibility for indexing and searching data. Given this flexibility, Elasticsearch has a wide variety of use cases, from storing analytics data and logs to more general search purposes, and from inventory data to full-text document search. We’ll explore some specific use cases in more depth later in the article.
Amazon ES is a fully managed solution for hosting Elasticsearch. Amazon ES handles deploying and running Elasticsearch and gives us access to tools to quickly scale our instance up and down based on our software or business needs. Amazon ES also provides monitoring, performance, and security that we rely on across all AWS services. Rather than maintaining the servers and instances ourselves, using a service like Amazon ES takes care of a lot of the low-level configuration automatically.
How to Get Started
We can get started with Amazon ES by heading to the Amazon ES Dashboard on the AWS Console and clicking Create a new domain. This opens a multi-step form to configure the Elasticsearch domain.
First, we’ll select a deployment type. Amazon ES supports “production” domains spread across multiple AWS availability zones, “development” domains in a single zone, or custom options.
Amazon ES also provides a set of configurable options to apply to the Elasticsearch cluster itself. These include enabling or disabling automatic performance tuning (Auto-Tune), choosing the size and number of data nodes, selecting the data storage type, and creating dedicated master nodes for our cluster.
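The same choices can also be expressed in code rather than in the console form. The following is a minimal sketch, not a recommended configuration: it only builds a dictionary shaped like the keyword arguments of boto3's `create_elasticsearch_domain` call. The helper name `build_domain_config`, the version, the instance types, the node counts, and the volume sizes are all illustrative assumptions.

```python
def build_domain_config(domain_name, production=True):
    """Build keyword arguments shaped for boto3's create_elasticsearch_domain.

    All concrete values (version, instance types, sizes) are illustrative
    assumptions, not recommendations.
    """
    cluster = {
        # Size and number of data nodes.
        "InstanceType": "r5.large.elasticsearch" if production else "t3.small.elasticsearch",
        "InstanceCount": 3 if production else 1,
        # Production domains spread nodes across availability zones.
        "ZoneAwarenessEnabled": production,
        "DedicatedMasterEnabled": production,
    }
    if production:
        cluster["ZoneAwarenessConfig"] = {"AvailabilityZoneCount": 3}
        cluster["DedicatedMasterType"] = "c5.large.elasticsearch"
        cluster["DedicatedMasterCount"] = 3

    return {
        "DomainName": domain_name,
        "ElasticsearchVersion": "7.10",  # assumed version
        "ElasticsearchClusterConfig": cluster,
        # Data storage type and size (GiB) per data node.
        "EBSOptions": {
            "EBSEnabled": True,
            "VolumeType": "gp2",
            "VolumeSize": 100 if production else 10,
        },
        # Auto-Tune on or off.
        "AutoTuneOptions": {"DesiredState": "ENABLED" if production else "DISABLED"},
    }


if __name__ == "__main__":
    config = build_domain_config("example-search", production=True)
    print(config["ElasticsearchClusterConfig"])
    # With AWS credentials configured, the dictionary could be passed on as:
    #   import boto3
    #   boto3.client("es").create_elasticsearch_domain(**config)
```

Keeping the configuration in a plain function like this makes it easy to review the production and development variants side by side before anything is created.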
This step can seem intimidating for a first-time Elasticsearch user, as we need to understand some of the underlying concepts that power Elasticsearch. That said, this level of customization is a significant benefit of using Amazon ES rather than building and managing the cluster ourselves. Manually tuning a cluster’s performance can involve trial-and-error tweaking of JVM settings as we monitor our cluster. On the other hand, Amazon ES tracks and analyzes usage, and we can automatically apply recommended settings to the cluster.
This automation allows us to keep our focus on the actual end-user functionality of the cluster. Similarly, knowing the type and number of data nodes needed is highly dependent on the needs of the application using Elasticsearch. Some use cases require much larger and longer-term index storage than others. Scaling our instances with Amazon ES becomes a much simpler task than manual scaling and deployment of new nodes.
Last, setting up Amazon ES involves configuring access control and security features. Amazon ES offers fine-grained control over every cluster layer, including network configuration, authentication, and access policies. For example, network configuration allows us to decide if the cluster is accessible from the Internet or only from applications within an AWS Virtual Private Cloud.
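As an illustration of the access policy part of this step, the sketch below builds a resource-based policy document that allows a single IAM role to issue HTTP requests against a domain. The helper name `build_access_policy`, the account ID, the role, and the domain name are hypothetical placeholders, and the list of allowed actions is just one possible choice.

```python
import json


def build_access_policy(region, account_id, domain_name, role_arn):
    """Return a JSON access policy allowing one IAM role to call the domain.

    The arguments are placeholders; substitute real values for a real domain.
    """
    # The trailing /* covers every index and API path under the domain.
    resource = f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                # Allow read and write HTTP calls, but not DELETE.
                "Action": ["es:ESHttpGet", "es:ESHttpPost", "es:ESHttpPut"],
                "Resource": resource,
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(
        build_access_policy(
            region="us-east-1",
            account_id="123456789012",  # placeholder account ID
            domain_name="example-search",
            role_arn="arn:aws:iam::123456789012:role/example-app-role",
        )
    )
```

The resulting JSON string is the kind of document that can be pasted into the access policy field of the domain form or supplied when creating the domain programmatically.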
Again, this configuration layer is a key differentiator when choosing Amazon ES over a self-managed cluster. Manually integrating authentication and security measures can take up valuable development cycles that we could otherwise spend building the features that use Elasticsearch. Integration with AWS IAM and Cognito also allows for straightforward user management and integration into pre-existing AWS ecosystems.
Once we finish configuring and deploying our Amazon ES domain, the next task is integrating it into our applications. One common approach to indexing data in Amazon ES is by using Elasticsearch’s built-in REST API. The Elasticsearch documentation for index APIs outlines the basics of this approach. The Amazon ES Developer Guide also provides code samples for signing HTTP requests to work with Amazon ES’s fine-grained access control. This approach is beneficial if we need to build indexing into our application directly for full-text content search or eCommerce.
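To make the REST approach more concrete, here is a small sketch of preparing a request body for Elasticsearch's `_bulk` endpoint, which takes newline-delimited JSON: one action line followed by one document line per item. The helper name `build_bulk_body`, the index name, and the sample documents are assumptions for illustration. The sketch deliberately stops short of sending the request, since a real call to Amazon ES would also need to be signed as described in the Developer Guide.

```python
import json


def build_bulk_body(index_name, documents):
    """Build a newline-delimited JSON body for the _bulk API.

    `documents` is an iterable of (doc_id, doc_dict) pairs.
    """
    lines = []
    for doc_id, doc in documents:
        # Action line: index this document under the given ID.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    products = [
        ("1", {"name": "Trail running shoe", "price": 89.0}),
        ("2", {"name": "Road running shoe", "price": 99.0}),
    ]
    body = build_bulk_body("products", products)
    print(body)
    # The body would be sent as POST <domain-endpoint>/_bulk with the header
    # Content-Type: application/x-ndjson.
```

Batching documents this way usually means fewer round trips than indexing one document per request.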
The other common approach is to load streaming data sources directly into Amazon ES. Streaming data can come from multiple sources within the AWS ecosystem, such as S3, DynamoDB, or CloudWatch logs. Documentation for all of these can be found in the Developer Guide. We can also use Logstash to consume data streams, transform them, and send them to Amazon ES. Amazon ES supports standard Logstash plugins, allowing for easy integration regardless of where Logstash is running.
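As one example of the streaming path, CloudWatch Logs can deliver batches of log events to a subscriber such as a Lambda function, where the payload arrives base64-encoded and gzip-compressed. The sketch below shows how such a payload might be decoded into documents ready for indexing. The function name `decode_cloudwatch_event` and the output field names are assumptions; a full pipeline would still need to send the documents on to the domain.

```python
import base64
import gzip
import json


def decode_cloudwatch_event(event):
    """Turn a CloudWatch Logs subscription event into a list of documents."""
    # The payload is base64-encoded, gzip-compressed JSON.
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))

    documents = []
    for log_event in payload["logEvents"]:
        documents.append(
            {
                # Timestamp as delivered, in epoch milliseconds.
                "@timestamp": log_event["timestamp"],
                "message": log_event["message"],
                "log_group": payload["logGroup"],
                "log_stream": payload["logStream"],
            }
        )
    return documents


if __name__ == "__main__":
    # Build a sample event in the same shape to exercise the decoder.
    sample = {
        "logGroup": "/example/app",
        "logStream": "instance-1",
        "logEvents": [{"id": "1", "timestamp": 1600000000000, "message": "started"}],
    }
    encoded = base64.b64encode(gzip.compress(json.dumps(sample).encode("utf-8")))
    print(decode_cloudwatch_event({"awslogs": {"data": encoded.decode("ascii")}}))
```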
Amazon ES provides several tools for accessing or querying data, from Elasticsearch’s URI-based search to supporting SQL queries provided in a POST request body. Additional tools like asynchronous search enable custom searching to match the needs of our application.
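The two query styles mentioned here can be sketched as follows. The first helper builds a URI-based search using the `q` parameter, and the second prepares a SQL query as a POST body. The helper names, the endpoint, the index, and the field are hypothetical, and the `_opendistro/_sql` path is an assumption based on the Open Distro SQL plugin; the correct path for a given domain version should be checked against the Developer Guide.

```python
import json
from urllib.parse import quote, urlencode


def uri_search_url(endpoint, index_name, field, value, size=10):
    """Build a URI search URL of the form /<index>/_search?q=field:value."""
    query = urlencode({"q": f"{field}:{value}", "size": size})
    return f"{endpoint.rstrip('/')}/{quote(index_name)}/_search?{query}"


def sql_request(endpoint, sql):
    """Return the (url, body) pair for a SQL query sent in a POST body.

    The _opendistro/_sql path is an assumption; verify it for your domain.
    """
    url = f"{endpoint.rstrip('/')}/_opendistro/_sql"
    return url, json.dumps({"query": sql})


if __name__ == "__main__":
    endpoint = "https://search-example.us-east-1.es.amazonaws.com"  # placeholder
    print(uri_search_url(endpoint, "products", "name", "shoe"))
    print(sql_request(endpoint, "SELECT name, price FROM products LIMIT 5"))
```

URI search is convenient for quick lookups, while the SQL form may feel more familiar to analysts who already work with relational databases.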
For data visualization and analysis, Amazon ES automatically provides a running Kibana instance with every Elasticsearch domain. Importantly, this Kibana instance uses the same authentication and access control settings that we chose when creating the Amazon ES domain.
Examples of Ideal Use Cases
Given the flexibility of Amazon ES’s feature set, the actual use cases for Elasticsearch can vary. In addition to the examples mentioned above, here are some more detailed use cases of how Amazon ES can integrate with products and applications:
- Elasticsearch can operate as the backend for an app to generate roll-up reports on sales metrics. An app like this could input data from Salesforce (or other CRM software) into Logstash and then stream it into Elasticsearch for indexing. We can use Kibana to generate scheduled reports and automatically share them with stakeholders. The Elastic blog contains details of how to build such a pipeline.
- Amazon ES is excellent for indexing analytics data, allowing business analysts to perform ad-hoc queries with the data set. We can perform such an analysis directly in Kibana, with user management handled through AWS Cognito or a SAML-based authentication provider. It’s also straightforward to build custom analysis tools on top of Elasticsearch using its REST APIs.
- A classic example of using Elasticsearch is powering search on eCommerce sites. A company can index its product information within Amazon ES, and facets (counts of matching products per brand, category, or other attribute) help shoppers refine searches to find products quickly, even in large data sets.
- Centralized log aggregation is an increasingly popular application for the ELK stack. In the age of microservice architecture, it’s practically essential to have the ability to query your logs and drill down into a specific time window in near real-time.
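The last two use cases above map onto fairly small query bodies. The sketch below builds one query for a faceted product search (a full-text match plus a terms aggregation per facet) and one for drilling into recent logs (a time-range filter on `@timestamp`). The function names and the field names `name`, `brand`, `category`, and `service` are hypothetical, and the facet fields are assumed to be mapped as keyword fields.

```python
import json


def faceted_product_query(text, facet_fields, selected=None):
    """Full-text product search with a terms aggregation for each facet."""
    # Facet values the shopper has already picked become exact-match filters.
    filters = [{"term": {field: value}} for field, value in (selected or {}).items()]
    return {
        "query": {"bool": {"must": [{"match": {"name": text}}], "filter": filters}},
        # One bucket list per facet, e.g. product counts per brand.
        "aggs": {
            f"by_{field}": {"terms": {"field": field, "size": 10}}
            for field in facet_fields
        },
    }


def recent_logs_query(service, minutes=15):
    """Logs for one service within the last `minutes`, newest first."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"service": service}},
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m", "lte": "now"}}},
                ]
            }
        },
        "sort": [{"@timestamp": {"order": "desc"}}],
    }


if __name__ == "__main__":
    print(json.dumps(faceted_product_query("running shoe", ["brand", "category"], {"brand": "acme"}), indent=2))
    print(json.dumps(recent_logs_query("checkout", minutes=5), indent=2))
```

Either dictionary can be serialized to JSON and sent as the body of a `_search` request against the relevant index.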
How a Partner Can Make It Easier
Though there are many benefits of choosing a managed Elasticsearch provider like Amazon ES to deploy and manage your cluster, there’s still a learning curve to gain specific expertise in Elasticsearch. This expertise gap is where a cloud service provider like Mission can benefit organizations looking to adopt Amazon ES quickly.
Mission helps flatten the learning curve for starting with Elasticsearch and Amazon ES. As shown above, there are a lot of configuration options to consider when setting up a domain. A team can need weeks or months to get up to speed on the ins and outs of Elasticsearch configuration. Partnering with Mission can drastically shorten that ramp-up time.
Mission also helps plan data migration into Amazon ES. Many organizations looking at Elasticsearch aren’t starting from the ground up. They may have numerous data sources that they must load and index into their cluster. Mission helps with the migration itself and creates a plan to properly index the data and set your organization up for success once the data is ready.
It’s essential to consider the details of how your organization would implement and use Elasticsearch. As the above section on use cases shows, it’s not always easy to know the best way to integrate Amazon ES into your existing applications and microservices. This uncertainty is another area where Mission can help, guiding your organization towards the integration approaches that make the most sense for your specific use cases.
In this article, we’ve explored the various uses of Elasticsearch and the ELK stack and how data-heavy organizations can put their flexible approach to indexed data to work in many valuable ways. We’ve also seen how a managed solution like Amazon ES can help bypass some of the low-level manual configuration required to spin up and tune an Elasticsearch cluster to our organization’s needs.
Mission’s managed AWS services offering allows enterprise organizations to accelerate their use of Amazon ES and take advantage of the service’s benefits sooner.