GraphQL vs Elasticsearch

GraphQL Search Indexing

To enable our marketing stakeholders to manage these creatives, we need to pull together data that is spread across many services — GraphQL makes this aggregation easy.

As an example, our data is centered around a creative service to keep track of the creatives we build. Each creative is enhanced with more information on the show it promotes, and the show is further enhanced with its ranking across the world. Also, our marketing team can comment on the creative when adjustments are needed. There are many more relationships that we maintain, but we will focus on these few for the post.

Displaying the data for one creative is helpful, but we have a lot of creatives to search across. If we produced only a few variations for each of the shows, languages, and countries Netflix supports, that would result in over 50 million total creatives. We needed a proper search solution.

The problem stems from the fact that we are trying to search data across multiple independent services that are loosely coupled. No single service has complete context into how the system works. Each service could potentially implement its own search database, but then we would still need an aggregator. This aggregator would need to perform more complex operations, such as searching for creatives by ranking even though the ranking data is stored two hops away in another service.

If we had a single database with all of the information in it, search would be easy: we could write a couple of join statements and where clauses, problem solved. However, a single database has its own drawbacks, mainly limited flexibility for teams to work independently and performance limitations at scale.

Another option would be to use a custom aggregation service that builds its own index of the data. This service would understand where each piece of data comes from, know how all of the data is connected, and be able to combine the data in a variety of ways. Apart from the indexing part, these characteristics perfectly describe the entity relationships in GraphQL.

Since we already use GraphQL, how can we leverage it to index our data? We can update our GraphQL query slightly to retrieve a single creative and all of its related data, then call that query once for each of the creatives in our database, indexing the results into Elasticsearch. By batching and parallelizing the requests to retrieve many creatives via a single query to the GraphQL server, we can optimize the index building process.
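For illustration, a page of creatives returned by such a batched GraphQL query can be turned into an Elasticsearch _bulk payload. A minimal sketch, assuming hypothetical field names (the post does not show the real creative schema):

```typescript
// Sketch: turning one page of GraphQL results into an Elasticsearch
// _bulk payload. Field names (id, title, show) are hypothetical.
interface Creative {
  id: string;
  title: string;
  show: { name: string; ranking?: number };
}

// The _bulk API expects newline-delimited JSON: an action line
// followed by the document itself, one pair per creative.
function buildBulkBody(index: string, creatives: Creative[]): string {
  return (
    creatives
      .flatMap((c) => [
        JSON.stringify({ index: { _index: index, _id: c.id } }),
        JSON.stringify(c),
      ])
      .join("\n") + "\n"
  );
}

const page: Creative[] = [
  { id: "c1", title: "Hero banner", show: { name: "Stranger Things", ranking: 1 } },
  { id: "c2", title: "Billboard", show: { name: "The Crown" } },
];

const body = buildBulkBody("creatives-v1", page);
```

The same loop would page through all creatives until the index is fully built.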

Elasticsearch has a lot of customization options when indexing data, but in many cases the default settings give pretty good results. At a minimum, we extract all of the type definitions from the GraphQL query and map them to a schema for Elasticsearch to use.
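A sketch of that extraction step, assuming a simplified flat set of GraphQL scalar fields (real schemas nest, which would need recursive handling):

```typescript
// Sketch: derive an Elasticsearch mapping from GraphQL scalar types.
// The type table is a simplification of what a real converter needs.
const scalarToEs: Record<string, string> = {
  String: "text",
  ID: "keyword",
  Int: "integer",
  Float: "double",
  Boolean: "boolean",
};

function toEsMapping(fields: Record<string, string>) {
  const properties: Record<string, { type: string }> = {};
  for (const [name, gqlType] of Object.entries(fields)) {
    // Fall back to keyword for anything unrecognized.
    properties[name] = { type: scalarToEs[gqlType] ?? "keyword" };
  }
  return { mappings: { properties } };
}

const mapping = toEsMapping({ id: "ID", title: "String", ranking: "Int" });
```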

The nice part about using a GraphQL query to generate the schema is that any existing clients relying on this data will get the same shape of data regardless of whether it comes from the GraphQL server or the search index directly.

Once our data is indexed, we can sort, group, and filter on arbitrary fields; provide typeahead suggestions to our users; display facets for quick filtering; and progressively load data to provide an infinite scroll experience. Best of all, our page can load much faster since everything is cached in Elasticsearch.

Indexing the data once isn’t enough. We need to make sure that the index is always up to date. Our data changes constantly — marketing users make edits to creatives, our recommendation algorithm refreshes to give the latest title popularity rankings and so on. Luckily, we have Kafka events that are emitted each time a piece of data changes. The first step is to listen to those events and act accordingly.

When our indexer hears a change event it needs to find all the creatives that are affected and reindex them. For example, if a title ranking changes, we need to find the related show, then its corresponding creative, and reindex it. We could hardcode all of these rules, but we would need to keep these rules up to date as our data evolves and for each new index we build.

Fortunately, we can rely on GraphQL’s entity relationships to find exactly what needs to be reindexed. Our search indexer understands these relationships by accessing a shared GraphQL schema or using an introspection query to retrieve the schema.

In our earlier example, the indexer can fan out one level from title ranking to show by automatically generating a query to GraphQL to find shows that are related to the changed title ranking. After that, it queries Elasticsearch using the show and title ranking data to find creatives that reference these values. It can reindex those creatives using the same pipeline used to index them in the first place. What makes this method so great is that after defining GraphQL schemas and resolvers once, there is no additional work to do. The graph has enough data to keep the search index up to date.

Let’s look a bit deeper into the three steps the search indexer conducts: fan out, search, and index. As an example, if the algorithm starts recommending show 80186799 in Finland, the indexer would generate a GraphQL query to find the immediate parent: the show that the algorithm data is referring to. Once it finds that this recommendation is for Stranger Things, it would use Elasticsearch’s inverted index to find all creatives with show Stranger Things or with the algorithm recommendation data. The creatives are updated via another call to GraphQL and reindexed back to Elasticsearch.
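The three steps can be sketched as follows; the in-memory tables stand in for the GraphQL fan-out call and the Elasticsearch inverted-index search, and all IDs besides the one quoted above are illustrative:

```typescript
// Sketch of the indexer's fan out -> search -> index steps for a
// title-ranking change.
type RankingChange = { showId: string; country: string };

// Fan out, one level: ranking -> parent show (via GraphQL in reality).
const showsById: Record<string, { name: string }> = {
  "80186799": { name: "Stranger Things" },
};

// Search: simulated inverted index from show name -> creative IDs.
const creativesByShow: Record<string, string[]> = {
  "Stranger Things": ["c1", "c9"],
};

function creativesToReindex(change: RankingChange): string[] {
  const show = showsById[change.showId];
  if (!show) return []; // nothing references this vertex yet
  // Index: the returned IDs are re-fetched via GraphQL and reindexed
  // with the same pipeline used for the initial build.
  return creativesByShow[show.name] ?? [];
}

const ids = creativesToReindex({ showId: "80186799", country: "FI" });
```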

The fan out step is needed in cases where the vertex update causes new edges to be created. If our algorithm previously didn’t have enough data to rank Stranger Things in Finland, the search step alone would never find this data in our index. Also, the fan out step does not need to perform a full graph search. Since GraphQL resolvers are written to only rely on data from the immediate parent, any vertex change can only impact its own edges. The combination of the single graph traversal and searching via an inverted index allows us to greatly increase performance for more complex graphs.

The indexer currently reruns the same GraphQL query that we used to first build our index, but we can optimize this step by only retrieving changes from the parent of the changed vertex and below. We can also optimize by putting a queue in front of both the change listener and the reindexing step. These queues debounce, dedupe, and throttle tasks to better handle spikes in workload.
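A minimal sketch of the dedupe behavior such a queue provides (debouncing and throttling are omitted for brevity):

```typescript
// Sketch: a queue that dedupes reindex tasks by document ID, so a
// burst of change events for one creative triggers a single reindex.
class DedupeQueue {
  private pending = new Map<string, () => void>();

  enqueue(id: string, task: () => void): void {
    this.pending.set(id, task); // later events replace earlier ones
  }

  drain(): number {
    const n = this.pending.size;
    for (const task of this.pending.values()) task();
    this.pending.clear();
    return n;
  }
}

const q = new DedupeQueue();
let runs = 0;
q.enqueue("c1", () => { runs += 1; });
q.enqueue("c1", () => { runs += 1; }); // duplicate: replaced, not appended
q.enqueue("c2", () => { runs += 1; });
const executed = q.drain();
```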

The overall performance of the search indexer is fairly good as well. Listening to Kafka events adds little latency, our fan out operations are really quick since we store foreign keys to identify the edges, and looking up data in an inverted index is fast as well. Even with minimal performance optimizations, we have seen median delays under 500ms. The great thing is that the search indexer runs in close to constant time after a change, and won’t slow down as the amount of data grows.

We run a full indexing job when we define a new index or make breaking schema changes to an existing index.

In the latter case, we don’t want to entirely wipe out the old index until after verifying that the newly indexed data is correct. For this reason, we use aliases. Whenever we start an indexing job, the indexer always writes the data to a new index that is properly versioned. Additionally, the change events need to be dual written to the new index as it is being built, otherwise, some data will be lost. Once all documents have been indexed with no errors, we swap the alias from the currently active index to the newly built index.
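The final swap maps to a single atomic call to Elasticsearch's _aliases API. A sketch of the request body, with hypothetical index and alias names:

```typescript
// Sketch: the atomic alias-swap request sent to Elasticsearch's
// _aliases API once the new versioned index is verified.
type AliasAction = Record<string, { index: string; alias: string }>;

function aliasSwap(alias: string, oldIndex: string, newIndex: string): { actions: AliasAction[] } {
  return {
    actions: [
      { remove: { index: oldIndex, alias } }, // detach the active index
      { add: { index: newIndex, alias } },    // point the alias at the new one
    ],
  };
}

const swap = aliasSwap("creatives", "creatives-v3", "creatives-v4");
```

Because both actions run in one request, readers of the alias never observe an intermediate state.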

In cases where we can’t fully rely on the change events, or some of our data does not have a change event associated with it, we run a periodic job to fully reindex the data. As part of this regular reindexing job, we compare the new data being indexed with the data currently in our index. Keeping track of which fields changed can alert us to bugs such as change events not being emitted or hidden edges not modeled within GraphQL.

We built all of this logic for indexing, communicating with GraphQL, and handling changes into a search indexer service. In order to set up the search indexer there are a few requirements:

  1. Kafka. The indexer needs to know when changes happen. We use Kafka to handle change events, but any system that can notify the indexer of a change in the data would be sufficient.
  2. GraphQL. To act on the change, we need a GraphQL server that supports introspection. The graph has two requirements. First, each vertex must have a unique ID to make it easily identifiable by the search step. Second, for fan out to work, edges in the graph must be bidirectional.
  3. Elasticsearch. The data needs to be stored in a search database for quick retrieval. We use Elasticsearch, but there are many other options as well.
  4. Search Indexer. Our indexer combines the three items above. It is configured with an endpoint to our GraphQL server, a connection to our search database, and mappings from our Kafka events to the vertices in the graph.

After the initial setup, defining a new index and keeping it up to date is easy:

  1. GraphQL Query. We need to define the GraphQL query that retrieves the data we want to index.
  2. That’s it.

Once the initial setup is complete, defining a GraphQL query is the only requirement for building a new index. We can define as many indices as needed, each with its own query. Optionally, if we want to be able to reindex from scratch, we need to give the indexer a way to paginate through all of the data, or tell it to rely on the existing index to bootstrap itself. Also, if we need custom mappings for Elasticsearch, we would need to define those mappings to mirror the GraphQL query.

The GraphQL query defines the fields we want to index and allows the indexer to retrieve data for those fields. The relationships in GraphQL allow keeping the index up to date automatically.
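For illustration, an index-defining query might look like the following; the field names are invented, not the actual Netflix schema:

```typescript
// Hypothetical index-defining query: every field it selects becomes a
// field in the search index, and its relationships drive reindexing.
const creativeIndexQuery = `
  query CreativeForIndex($id: ID!) {
    creative(id: $id) {
      id
      title
      show {
        name
        rankings { country rank }
      }
      comments { count }
    }
  }
`;
```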

The output of the search indexer feeds into an Elasticsearch database, so we needed a way to utilize it. Before we indexed our data, our browser application would call our GraphQL server, asking it to aggregate all of the data, then we filtered it down on the client side.

After indexing, the browser can now call Elasticsearch directly (or via a thin wrapper to add security and abstract away database complexities). This setup allows the browser to fully utilize the search functionality of Elasticsearch instead of performing searches on the client. Since the data is the same shape as the original GraphQL query, we can rely on the same auto-generated Typescript types and don’t need major code changes.

One additional layer of abstraction we are considering, but haven’t implemented yet, is accessing Elasticsearch via GraphQL. The browser would continue to call the GraphQL server in the same way as before. The resolvers in GraphQL would call Elasticsearch directly if any search criteria are passed in. We can even implement the search indexer as middleware within our GraphQL server. It would enhance the schema for data that is indexed and intercept calls when searches need to be performed. This approach would turn search into a plugin that can be enabled on any GraphQL server with minimal configuration.

Automatically indexing key queries on our graph has yielded tremendously positive results, but there are a few caveats to consider.

Just like with any graph, supernodes may cause problems. A supernode is a vertex in the graph that has a disproportionately large number of edges. Any changes that affect a supernode will force the indexer to reindex many documents, blocking other changes from being reindexed. The indexer needs to throttle any changes that affect too many documents to keep the queue open for smaller changes that only affect a single document.

The relationships defined in GraphQL are key to determining what to reindex if a change occurred. A hidden edge, an edge not defined fully by one of the two vertices it connects, can prevent some changes from being detected. For example, if we model the relationship between creatives and shows via a third table containing tuples of creative IDs and show IDs, that table would either need to be represented in the graph or its changes attributed to one of the vertices it connects.

By indexing data into a single store, we lose the ability to differentiate user specific aspects of the data. For example, Elasticsearch cannot store unread comment count per user for each of the creatives. As a workaround, we store the total comment count per creative in Elasticsearch, then on page load make an additional call to retrieve the unread counts for the creatives with comments.

Many UI applications practice a pattern of read after write, asking the server to provide the latest version of a document after changes are made. Since the indexing process is asynchronous to avoid bottlenecks, clients would no longer be able to retrieve the latest data from the index immediately after making a modification. On the other hand, since our indexer is constantly aware of all changes, we can expose a websocket connection to the client that notifies it when certain documents change.

The performance savings from indexing come primarily from the fact that this approach shifts the workload of aggregating and searching data from read time to write time. If the application exhibits substantially more writes than reads, indexing the data might create more of a performance hit.

The underlying assumption of indexing data is that you need robust search functionality, such as sorting, grouping, and filtering. If your application doesn’t need to search across data, but merely wants the performance benefits of caching, there are many other options available that can effectively cache GraphQL queries.

Finally, if you don’t already use GraphQL or your data is not distributed across multiple databases, there are plenty of ways to quickly perform searches. A few table joins in a relational database provide pretty good results. For larger scale, we’re building a similar graph-based solution that multiple teams across Netflix can leverage which also keeps the search index up to date in real time.

There are many other ways to search across data, each with its own pros and cons. The best thing about using GraphQL to build and maintain our search index is its flexibility, ease of implementation, and low maintenance. The graphical representation of data in GraphQL makes it extremely powerful, even for use cases we hadn’t originally imagined.

If you’ve made it this far and you’re also interested in joining the Netflix Marketing Technology team to help conquer our unique challenges, check out the open positions listed on our page. We’re hiring!

Source: https://netflixtechblog.com/graphql-search-indexing-334c92e0d8d5

GraphQL vs Elasticsearch: what should I use for fast search performance that returns many different schemas?

You are comparing apples with oranges if you are comparing GraphQL with Elasticsearch. They are totally different technologies.

GraphQL is an API-layer technology, comparable to REST. It mainly defines the request/response format and structure of your HTTP-based API. It is not another NoSQL store that helps you store and query data efficiently.

If you are using GraphQL, you still need to fetch the data yourself; the data may actually be stored in and come from NoSQL, a SQL DB, Elasticsearch, another web service, and so on. GraphQL does not care where you store the data; it can even be spread across multiple data sources. What GraphQL cares about is that you tell it how to get the data.

Back to your case: you most probably can use Elasticsearch to store and search the data efficiently, and put GraphQL in front of Elasticsearch so that users and developers interact with the service through the GraphQL API and enjoy its benefits.

Source: https://stackoverflow.com/questions/54273965/graphql-vs-elasticsearch-what-should-i-use-for-fast-searching-performance-that-r

Since being introduced by Facebook, GraphQL has taken the API world by storm as an alternative to REST APIs. GraphQL fixes many problems that API developers and users have found with RESTful architecture. However, it also introduces a new set of challenges which need to be evaluated. Because GraphQL is not simply an evolutionary replacement for REST, this post will dive deep into the pros and cons of each and when GraphQL makes sense for your application.

History

Before RESTful APIs, we had RPC, SOAP, CORBA, and other less open protocols. Many pre-REST APIs required complex client libraries to serialize/deserialize the payload over the wire. A fundamental difference compared to today’s RESTful APIs is that SOAP is strongly typed, using formal contracts via WSDL (Web Services Description Language). This could wreak havoc on interoperability, since even relaxing a 32-bit integer restriction to accept a 64-bit long meant breaking upstream clients. SOAP is not an architecture but a full protocol implementation consisting of security, error handling, ACID transactions, etc. Some of the complexity is due to the many abstraction layers baked into SOAP. For example, SOAP is able to run on HTTP, TCP, UDP, etc. While the protocol implementation afforded much abstraction, the rigid contract between client and server at the application and data layer created a tight coupling between the two.

RESTful architecture was introduced in 2000 as a much simpler way to enable machine-to-machine communication using only the ubiquitous HTTP protocol, without additional layers, in a stateless and type-free way. This enabled systems to be loosely coupled and more forgiving of contract changes between systems, such as between companies.

REST today

REST APIs have become the de facto standard for companies deploying APIs and launching developer platforms. The beauty of REST is that a developer working with someone else’s API doesn’t need any special initialization or libraries. Requests can be simply sent via common software like cURL and web browsers.

REST uses the standard CRUD HTTP verbs (GET, POST, PUT, DELETE), leverages HTTP conventions, and is centered around data resources (HTTP URIs) rather than attempting to fight HTTP. Thus an e-commerce resource can behave similarly to a bank resource: both would probably need CRUD operations on that resource and would prefer cacheable queries. Third-party developers new to an API need only reason about the data model and leave the rest to HTTP convention, rather than digging deep into thousands of operations. In other words, REST is much more tightly coupled to HTTP and CRUD than SOAP, but provides loose data contracts.

Problems with REST

As a greater variety of APIs was placed into production use and scaled to extreme levels, certain problems with RESTful architecture emerged. You could even say GraphQL sits between SOAP and REST, taking pieces from each.

Server driven selection

In RESTful APIs, the server creates the representation of a resource to be responded back to a client.

However, what if the client wants something specific, such as returning the names of the friends of friends of a user, filtered to those whose job is engineer?

With REST, you might have something like:

GraphQL allows you to represent this query in a cleaner way:
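As a sketch of the comparison (endpoint paths and field names are hypothetical, not taken from any real API):

```typescript
// REST: several chained calls, with filtering finished on the client.
const restCalls = [
  "/users/1/friends",              // 1. fetch the user's friends
  "/users/2/friends?job=engineer", // 2..N. fetch each friend's friends
  "/users/3/friends?job=engineer",
];

// GraphQL: one query expressing the whole traversal and filter.
const gqlQuery = `
  query {
    user(id: 1) {
      friends {
        friends(job: "engineer") { name }
      }
    }
  }
`;
```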

Fetching multiple resources

One of the main benefits of GraphQL is making APIs less chatty. Many of us have seen an API where we first have to fetch a list of friends and then fetch each friend individually via a separate endpoint; this results in N+1 queries, a well-known performance issue in both API and database design. In other words, RESTful API calls are chained on the client before the final representation can be formed for display. GraphQL can reduce this by enabling the server to aggregate the data for the client in a single query.

More in-depth analytics

API analytics is currently also a weakness for GraphQL APIs, since there is very little tooling out there. However, the tools that do support GraphQL APIs can provide much deeper insight into queries than is possible for RESTful APIs.

Problems with GraphQL

Caching

Caching is built into the HTTP specification, which RESTful APIs are able to leverage. GET vs POST semantics related to caching are well defined, enabling browser caches, intermediate proxies, and server frameworks to follow them. The following guidelines can be followed:

  • GET requests can be cached
  • GET requests can stay in browser history
  • GET requests can be bookmarked
  • GET requests are idempotent

GraphQL doesn’t follow the HTTP spec for caching and instead uses a single endpoint. Thus, it’s up to the developer to ensure caching is implemented correctly for non-mutable queries that can be cached. The correct key has to be used for the cache which may include inspecting the body contents.
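A sketch of deriving such a key by normalizing whitespace in the query and including the variables (a production implementation would hash a properly canonicalized form):

```typescript
// Sketch: a cache key for a GraphQL POST. The URL alone (a single
// endpoint) is not enough, so the body has to be inspected.
function cacheKey(body: { query: string; variables?: Record<string, unknown> }): string {
  const query = body.query.replace(/\s+/g, " ").trim(); // collapse whitespace
  const variables = JSON.stringify(body.variables ?? {});
  return `${query}|${variables}`;
}

const a = cacheKey({ query: "{ user { name } }", variables: { id: 1 } });
const b = cacheKey({ query: "{  user  { name } }", variables: { id: 1 } });
```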

While you can use tools like Relay or DataLoader that understand GraphQL semantics, they still don’t cover things like browser and mobile caching.

Diminishes shared nothing architecture

The beauty of RESTful APIs is that they complement shared-nothing architecture well. For example, Moesif has a search endpoint and an alerting endpoint. Publicly, those two endpoints simply look like two different REST resources. Internally though, they point to two different microservices on isolated compute clusters. The search service is written in Scala and the alerting service is written in NodeJS. The complexity of routing HTTP requests via host or URL is much lower than inspecting a GraphQL query and performing multiple joins.

Exposed for arbitrary requests

While a main benefit of GraphQL is to enable clients to query for just the data they need, this can also be problematic especially for open APIs where an organization cannot control 3rd party client query behavior. Great care has to be taken to ensure GraphQL queries don’t result in expensive join queries that can bring down server performance or even DDoS the server. RESTful APIs can be constrained to match data model and indexing used.
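One common mitigation is limiting query depth before execution. A rough sketch that counts brace nesting (a real server should analyze the parsed AST and assign per-field costs instead):

```typescript
// Sketch: reject overly deep GraphQL queries to protect the server
// from expensive joins triggered by arbitrary client queries.
function maxDepth(query: string): number {
  let depth = 0;
  let max = 0;
  for (const ch of query) {
    if (ch === "{") max = Math.max(max, ++depth);
    if (ch === "}") depth -= 1;
  }
  return max;
}

const shallow = maxDepth("{ user { name } }");
const deep = maxDepth("{ a { b { c { d { e } } } } }");
const allowed = deep <= 3; // e.g. reject anything deeper than 3 levels
```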

Rigidness of queries

GraphQL removes the ability for custom query DSLs or side effect operations on top of an API. For example, the Elasticsearch API is RESTful, but also has a very powerful Elasticsearch DSL to perform advanced aggregations and metric calculations. Such aggregation queries may be harder to model within the GraphQL language.

Nonexistent monitoring

RESTful APIs have the benefit of following the HTTP spec with regard to resources, just like a website. This enables many tools to probe a URL and expect a 5xx response if something is not OK. For GraphQL APIs, you may not be able to leverage such tools unless you support placing the query in a URL parameter, as most ping tools don’t support HTTP request bodies.

Besides ping services, there are very few SaaS or open source tools that support API analytics or deeper analysis of your API calls. Client errors are presented as a 200 OK in a GraphQL API. Existing tools that expect 400 errors will not work, so you may miss errors happening on your API. Yet at the same time, the extra flexibility given to the client requires even more tooling to catch and understand problems with your API.

What is Moesif? Moesif is the most advanced REST and GraphQL analytics platform, used by thousands of platforms to measure how your queries are performing and understand what your most loyal customers are doing with your APIs.

Conclusion

GraphQL APIs can be exciting new technology, but it is important to understand the tradeoffs before making such architectural decisions. Some APIs such as those with very few entities and relationships across entities like analytics APIs may not be suited for GraphQL. Whereas applications with many different domain objects like e-commerce where you have items, users, orders, payments, and so on may be able to leverage GraphQL much more.

In fact, GraphQL vs REST is like comparing SQL technologies vs noSQL. There are certain applications where it makes sense to model complex entities in a SQL Db. Whereas other apps that only have “messages” as in high volume chat apps or analytics APIs where the only entity is an “event” may be more suited using something like Cassandra.


Derric Gilling

Co-founder & CEO @Moesif. Previously CTO @TroveMarket and Computer Architect @Intel. Studied @UMichigan.

Source: https://www.moesif.com/blog/technical/graphql/REST-vs-GraphQL-APIs-the-good-the-bad-the-ugly/

GraphQL and Elasticsearch: A Love Letter 💌

Our GraphQL API on Elasticsearch powers one of the largest websites dedicated to cooking in Switzerland, with thousands of recipes. It delivers relevant search results and personalized teasers in the blink of an eye by leveraging both the benefits of GraphQL as an aggregated API endpoint and Elasticsearch as a performant full-text search and recommendation engine.

A more detailed description of our GraphQL Elasticsearch API, in German, can be found on our company page as a one-pager:

This is our setup: We have several web frontends which need different data from our Elasticsearch backend and some other 3rd party REST APIs. We will look at an API for recipes which is used to display recipe teasers as search results and the recipe detail page.

We will focus on our primary goal, which is to provide our web frontends with an easily accessible GraphQL API endpoint for data from Elasticsearch.

We could have used one of the existing GraphQL to Elasticsearch libraries — but that would not have served our main goal of a simple end-user friendly API, because:

  • we don’t want to expose all possible Elasticsearch methods with its complex query language
  • we want to determine granularly which documents and fields should be available via the API and hide irrelevant fields (e.g. metadata used for scoring)
  • we only want to expose what’s needed by the end users — and keep as much flexibility as possible for further changes and enhancements without breaking the API
  • we want the ability to include 3rd party resources and APIs in a simple way
  • we want to make the GraphQL API for the end user as simple as possible and hide complex Elasticsearch queries (e.g. the ones needed for full-text search, boosting and personalization) under the hood

Therefore, we decided to build our GraphQL API granularly, based on the needs of our API customers. By explicitly defining the GraphQL API and its schema we are keeping the flexibility to extend and enhance.

One key benefit of GraphQL is the well defined schema which makes it easy to consume the API. But this schema must be explicitly defined.

From the data import into Elasticsearch up to the GraphQL endpoint, the data schema is mostly redundant. To minimize error-prone copy-pasting and code duplication, we used the code-first approach and shared the schema. Write once, use everywhere.

Therefore we defined our data schema by using Typescript models. These declarative models are then converted to Elasticsearch and GraphQL schema as needed. We open sourced part of this approach as @smartive/es-model.

For meaningful results, we gathered the most frequent queries of our frontends and tested them against the GraphQL API at scale. We used jMeter to set up the load tests and flood.io to parallelize the load onto different machines.

This uncovered performance issues and bottlenecks. That is why we implemented caching for expensive 3rd party requests. We also identified issues with our relational dataset and the way GraphQL works. By denormalizing our datasource we could resolve this issue and even improve the overall performance.

In the end, we have a GraphQL API running on two NodeJS nodes and two Elasticsearch nodes, serving up to 5000 requests per minute at an average of 50 ms.

Since we have many more reads than writes to our datasource, we could drastically increase performance by denormalizing our relational datasource in Elasticsearch.

Imagine you want to retrieve the top 10 recipes with their ingredients which match your search term “pie” from the GraphQL API.

If the ingredient details like name and image are stored in a separate index/table, then by default the GraphQL API will make one additional Elasticsearch query for every ingredient of every recipe in the result set.

So for one GraphQL API call you would have:

  • 1 Elasticsearch query which retrieves 10 recipes
  • 5 Elasticsearch queries per recipe to retrieve the recipe’s ingredient details (assuming a recipe has on average 5 ingredients)

This makes a total of 51 Elasticsearch queries per GraphQL query!

We could optimize this by parsing the AST context of the GraphQL Query to bring it down to one Elasticsearch query for all ingredients of the 10 recipes. But this is still too much. If we want to retrieve our result set with one single Elasticsearch query per GraphQL query we need to denormalize our datasource.

By denormalizing, we include the ingredients detail in every recipe dataset. This way the GraphQL API only needs one single Elasticsearch Query to get all the requested data.

This approach of course only makes sense if you have a high read/write ratio and your datasource changes infrequently. But if that’s the case, then denormalization can drastically improve performance.
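A sketch of the denormalization step, with illustrative field names (the real index schema is not shown in the post):

```typescript
// Sketch: embed ingredient details into each recipe document so one
// Elasticsearch query returns everything a recipe teaser needs.
interface Ingredient { id: string; name: string }
interface Recipe { id: string; title: string; ingredientIds: string[] }

function denormalize(recipe: Recipe, ingredients: Map<string, Ingredient>) {
  return {
    id: recipe.id,
    title: recipe.title,
    // Replace foreign keys with the full ingredient objects.
    ingredients: recipe.ingredientIds
      .map((id) => ingredients.get(id))
      .filter((i): i is Ingredient => i !== undefined),
  };
}

const ingredients = new Map<string, Ingredient>([
  ["i1", { id: "i1", name: "Apples" }],
  ["i2", { id: "i2", name: "Flour" }],
]);

const doc = denormalize(
  { id: "r1", title: "Apple Pie", ingredientIds: ["i1", "i2"] },
  ingredients
);
```

The trade-off: every ingredient update now has to rewrite all recipe documents that embed it, which is why this only pays off with a high read/write ratio.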

Caching GraphQL APIs is much more complex than caching REST endpoints. So to keep things simple, we only applied caching for resources which are not under our control as for example 3rd party APIs.

For expensive 3rd Party REST API calls which are not under our control, we’ve chosen to cache data in Redis. The cache is invalidated when data is updated or stale.
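A sketch of the cache-aside pattern involved, using an in-memory stand-in for Redis and an explicit clock so expiry is easy to follow; the nutrition API name is made up:

```typescript
// Sketch: cache-aside with TTL for an expensive 3rd-party call.
class TtlCache<T> {
  private store = new Map<string, { value: T; expiresAt: number }>();
  constructor(private ttlMs: number) {}

  get(key: string, now: number): T | undefined {
    const hit = this.store.get(key);
    if (!hit || hit.expiresAt <= now) return undefined; // stale or missing
    return hit.value;
  }

  set(key: string, value: T, now: number): void {
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

let fetches = 0;
function fetchNutritionData(id: string): string { // stand-in for the 3rd-party API
  fetches += 1;
  return `nutrition:${id}`;
}

const cache = new TtlCache<string>(60_000);
function getNutrition(id: string, now: number): string {
  const cached = cache.get(id, now);
  if (cached !== undefined) return cached;
  const fresh = fetchNutritionData(id);
  cache.set(id, fresh, now);
  return fresh;
}

const first = getNutrition("r1", 0);      // miss -> fetch
const second = getNutrition("r1", 1_000); // hit
const third = getNutrition("r1", 61_000); // expired -> fetch again
```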

GraphQL provides a great framework for organizing and combining your APIs. But it also introduces a lot of complexity when dealing with relational data. Elasticsearch offers professional full-text search and high performance data retrieval. Together they build a world class team. But be aware of your use cases and restrict the APIs to make sure to deliver the best performance.

Want to become a GraphQL pro? Follow us and read our whole series on enterprise-grade GraphQL applications.

Source: https://blog.smartive.ch/graphql-and-elasticsearch-a-love-letter-9ed64d5c094

Deprecated and no longer maintained

When I first started with GraphQL, this was one of the big advantages I found: generating a more user-friendly API out of already existing schema definitions. However, this particular package has been stale for a while and I don't have any time to work on it. Also, Elastic and GraphQL are moving forward fast, so keeping up with the APIs is very hard. Hopefully I will get back to this package in the future, since I think GraphQL is a very neat way to expose an API.

Until then, thanks for all the stars, and please contact me if you have any ideas or would like to work on this tool.

Schema and query builder for Elastic Search

  • Creates a statically typed GraphQL schema from an Elasticsearch mapping
  • Transforms your GraphQL query and creates an Elasticsearch body
  • Runs the search on your Elasticsearch index
  • Returns the results and calls your hits schema

For a working example, check out elasticsearch-graphql-server-example

Compatibility

This package is tested and working on

  • graphql version ^0.6.2 (should be okay from version 0.5.x)
  • ElasticSearch version ^2.3.1

Usage

Query Builder

It will fetch the current mapping from Elasticsearch and create a statically typed schema for you. Add the schema to your GraphQL server and the type helper will guide you. The hits field will resolve to whatever schema you send in, so you can use Elasticsearch for searching data and then easily get your real data from anywhere. See the full example in /examples

Example query

TODO

  • Support multiple indexes
  • Do smarter Elasticsearch queries
  • Add more options, like query type, etc.
  • Add tests
  • Allow more aggregation types
  • Allow more complex filters
Source: https://openbase.com/js/elasticsearch-graphql

Geo-GraphQL with ElasticSearch

The following takes heavy inspiration from the Sangria docs as well as the HowToGraphQL Scala tutorial. We will create GraphQL queries to filter users and coffee shops based on name and geolocation.

All final code is available on GitHub

ElasticSearch

The easiest setup is with the following docker-compose file, or by installing Elasticsearch locally and running it on the default port (9200).
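The compose file itself was not preserved in this excerpt; a minimal single-node setup along these lines should work (the image tag is an assumption, not necessarily the version the tutorial used):

```yaml
version: "3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.23
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"
```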

Once Elasticsearch is running, make sure you can get a response from it in your browser at localhost:9200

To load our test users and coffee shops, change directory into the scripts folder and run the load script:

$ cd src/main/resources/scripts/
$ ./load-test-data.sh

Navigating to http://localhost:9200/test-users/_search?pretty should give you a list of users.

Dependencies

Resources

Let’s add graphiql.html to our resources directory so we can have a playground to test our GraphQL queries. We will see a route below that serves this file from the resources directory.

There are just two routes: one that accepts a POST to /graphql and another that serves static resources. Right now, making a POST request to the graphql endpoint will just return a string.

Models can be divided into a few categories: variables, responses, and common. Variables are classes that will be mapped to GraphQL arguments. The responses are the responses from the GraphQL server and will tie directly into the GraphQL schema. Common models can be shared between these inputs and outputs; this will be made clear shortly.

Let’s add a models folder with nested directories and classes as follows:

├── Main.scala
└── models
    ├── common
    │   └── Location.scala
    ├── responses
    │   ├── CoffeeShop.scala
    │   ├── SearchResponse.scala
    │   └── User.scala
    └── variables
        ├── BBox.scala
        └── Filter.scala

Common

Location is a simple case class with latitude and longitude properties.

Variables

Let’s build our Bounding Box and Filter classes. BBox takes two Location objects, topLeft and bottomRight, as properties, and Filter takes a set of optional objects as properties. topLeft and bottomRight correlate to the corners of the screen that the map is rendered on.

Responses

Let’s create a generic SearchResponse class:

The User case class will have name, id, and location properties. The location property will be an object of type Location.

Our response case class will contain a total property and a list of users.

The CoffeeShop models are very similar to the User and user response ones:
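Since the case class bodies were elided from this excerpt, here is a sketch of what the response models plausibly look like given the property lists above (names and field order are assumptions):

```scala
// Common model shared by variables and responses.
case class Location(lat: Double, lon: Double)

// Response models tied to the GraphQL schema.
case class User(id: Int, name: String, location: Location)
case class CoffeeShop(id: Int, name: String, location: Location)

// A generic wrapper: the total hit count plus the typed hits,
// reusable for both users and coffee shops.
case class SearchResponse[T](total: Long, hits: List[T])
```

Making the wrapper generic means one class serves both the user search and the coffee-shop search later in the post.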

Let’s go ahead and add the following files alongside our models:

├── Elastic.scala
├── GraphQLSchema.scala
├── GraphQLServer.scala
├── Main.scala
└── models

GraphQL Schema

For an in-depth look at Sangria and GraphQL Schema definitions, please see the Sangria docs

Here we tie our models to a GraphQL Schema:

ElasticSearch

In the HowToGraphQL tutorial, the author uses Slick and an in-memory H2 database to save and query data. We will be using elastic4s instead. We create a trait to hold our Elastic config and methods (this makes it easier to test):

Next we will add a class that implements this trait and the methods we have defined:
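The elastic4s calls themselves aren't reproduced in this excerpt, but it may help to see the shape of the request body the search method ultimately sends to Elasticsearch: a bool query whose filter is a geo_bounding_box on the location field. This sketch builds that body as a raw JSON string (the "location" field name is an assumption):

```scala
case class Location(lat: Double, lon: Double)

// Build the geo_bounding_box body Elasticsearch expects; elastic4s
// produces an equivalent structure through its query DSL.
def geoBoundingBoxQuery(topLeft: Location, bottomRight: Location): String =
  s"""{
     |  "query": {
     |    "bool": {
     |      "filter": {
     |        "geo_bounding_box": {
     |          "location": {
     |            "top_left":     { "lat": ${topLeft.lat}, "lon": ${topLeft.lon} },
     |            "bottom_right": { "lat": ${bottomRight.lat}, "lon": ${bottomRight.lon} }
     |          }
     |        }
     |      }
     |    }
     |  }
     |}""".stripMargin
```

Putting the geo clause in `filter` rather than `must` skips scoring, which is the idiomatic choice for a pure yes/no geographic constraint.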

Sangria has a concept of a Context that flows with GraphQL queries. It’s super important for our use case, as it will hold an instance of our Elastic class and the BBox variable.

We can define our context as a simple case class:

case class MyContext(elastic: Elastic, bbox: Option[BBox] = None)

GraphQL Server

Now we will create a GraphQL server that will have an Akka HTTP route and an instance of our Elastic class. The endpoint method below parses the incoming request and returns a response.

You’ll see that we make a call to a method we haven’t defined yet. Let’s build that next:

Here is where we pass our Elastic instance as well as our previously defined context. Let’s not forget to update Main.scala to route requests to our GraphQLServer:

To run our server, we use sbt. From the root directory:

$ sbt ~reStart

Navigate to localhost:8080 and enter our query and bbox variable:

When we search with the following bounding box coordinates, we will see 3 users returned from the query. All of these users are located in Austin, TX.

{
  "data": {
    "geoSearch": {
      "users": {
        "hits": [
          {
            "id": 3,
            "name": "Ricky",
            "location": ...
          },
          {
            "id": 4,
            "name": "Carter",
            "location": ...
          },
          {
            "id": 5,
            "name": "Mitch",
            "location": ...
          }
        ]
      }
    }
  }
}

Let’s adjust our bounding box to cover more area:

We see a 4th user, who happens to be in San Diego, CA:

{
  "id": 1,
  "name": "Duane",
  "location": {
    "lat": "32.715736",
    "lon": "-117.161087"
  }
}

We adjust our bounding geo box one last time:

And we see our 5th and final user in Mexico City 🇲🇽

{
  "name": "Matt",
  "id": 2,
  "location": {
    "lat": "19.42847",
    "lon": "-99.12766"
  }
}

The last piece of the puzzle is adding an additional field to search and filter our users by. We are going to add a name filter so that we can make queries like this:

In order to filter by name, let’s update the search method:
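The updated elastic4s code isn't shown in this excerpt, but the behavior it adds is a case-insensitive name-prefix filter layered on top of the existing bounding-box results. An in-memory sketch of that logic (names and shapes are assumptions, not the tutorial's exact code):

```scala
case class Location(lat: Double, lon: Double)
case class User(id: Int, name: String, location: Location)

// Mirror of what a prefix query on the name field does: keep only users
// whose name starts with the given prefix; no prefix means no filtering.
def filterByName(users: List[User], prefix: Option[String]): List[User] =
  prefix match {
    case Some(p) => users.filter(_.name.toLowerCase.startsWith(p.toLowerCase))
    case None    => users
  }
```

Filtering the five test users with the prefix "M" keeps just Matt and Mitch, matching the response shown below.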

Now if we run the new query from above, we will see only Matt and Mitch returned! It’s super easy to add new functionality.

{
  "data": {
    "geoSearch": {
      "users": {
        "hits": [
          {
            "id": 2,
            "name": "Matt",
            "location": {
              "lat": "19.42847",
              "lon": "-99.12766"
            }
          },
          {
            "id": 5,
            "name": "Mitch",
            "location": {
              "lat": "30.366666",
              "lon": "-97.833330"
            }
          }
        ],
        "total": 2
      }
    }
  }
}

Finally, let’s say we want to search for users in our geo box with names that start with “M”, as well as coffee shops in the area with the name “Starbucks”:

{
  "data": {
    "geoSearch": {
      "users": {
        "hits": [
          {
            "id": 2,
            "name": "Matt",
            "location": {
              "lat": "19.42847",
              "lon": "-99.12766"
            }
          },
          {
            "id": 5,
            "name": "Mitch",
            "location": {
              "lat": "30.366666",
              "lon": "-97.833330"
            }
          }
        ],
        "total": 2
      },
      "coffeeShops": {
        "hits": [
          {
            "name": "Starbucks"
          },
          {
            "name": "Starbucks"
          }
        ],
        "total": 2
      }
    }
  }
}

We can quickly wire up a simple React app with Mapbox and Apollo to display some of our data (source code):

Source: https://towardsdatascience.com/geo-graphql-with-elasticsearch-b01a6bdf0dc8
