Open Source Search Engines

Table of Contents

Apache Lucene
- Lucene++
Apache Solr
Open Semantic Search
- Subprojects
  - Solr PHP UI
Elasticsearch
- Other Projects
  - dejavu
  - Fess
  - Searchkit
OpenSearch
- Other Projects
Gigablast
YaCy
- Articles
Vald
Weaviate
MWMBL
Alexandria
Wiby
OpenSearchServer
Metasearch
- MetaGer
Not Web Scale
- meilisearch
- Typesense
Smaller Engines
- Sonic
- ZincSearch

Apache Lucene

https://lucene.apache.org/
The open source Java library that powers Apache Solr and Elasticsearch, among many other search projects.

Lucene++

https://github.com/luceneplusplus/LucenePlusPlus
An open source C++ port of Lucene.

Apache Solr

https://solr.apache.org/
See also the dedicated pages on Solr.

Open Semantic Search

https://opensemanticsearch.org/
Under the hood it runs Apache Solr, but it makes enough significant changes that listing Open Semantic Search separately is worthwhile.¹

Subprojects

Solr PHP UI - Stars: 20 - Updated: 12/2021 - Checked: 2/2024
- A frontend for Open Semantic Search.
- GitHub Repo
Solr Ontology Tagger - Stars: 39 - Updated: 1/2022 - Checked: 5/2023
Solr Synonames - Stars: 5 - Updated: 10/2020 - Checked: 5/2023

Elasticsearch

https://elastic.co/
See also the dedicated pages on Elasticsearch.

Other Projects

dejavu - Open source, JS web-based UI for Elasticsearch and OpenSearch.
Fess - Open source, enterprise search server with web crawler and GUI. Written in Java.
Searchkit - Updated: 3/2023 - Checked: 3/2023 - Stars: 4.6k - Open source library for building search UIs with JS, React, Vue, Angular, etc. Written primarily in TypeScript.

OpenSearch

https://opensearch.org/
An open source fork of Elasticsearch started by Amazon.²
See also the dedicated pages on OpenSearch.

Other Projects

Please see Other Projects under Elasticsearch. Only projects that are for OpenSearch exclusively will be listed here.

Gigablast

https://gigablast.com/
GitHub Repo
Founded in 2000 by Matt Wells as a closed source search engine, it was later open sourced. It is written in C++, is distributed, and includes both the engine and a crawler.

YaCy

Please see the dedicated page on YaCy.

Vald

https://vald.vdaas.org/
GitHub Repo
An open source, distributed vector search engine written in Go and used by Yahoo Japan.

Weaviate

https://weaviate.io/
GitHub Repo
Open source vector search engine written in Go.
Semantic Search through Wikipedia with Weaviate

MWMBL

https://mwmbl.org/
GitHub Repo
Open source, non-profit search engine written in Python.³

Alexandria

https://www.alexandria.org/
GitHub Repo
Open source search engine that uses CommonCrawl and is written in C++.

Wiby

https://wiby.me/
GitHub Repo
Installation and Setup Instructions
Open source search engine written in PHP, C, and Go.

OpenSearchServer

https://www.opensearchserver.com/
GitHub Repo
Open source search engine written in Java; includes a bundled crawler.
Note: No updates since 8/2021 as of 3/2023.

Metasearch

MetaGer

https://metager.org/
Git Repo
Open source metasearch engine run by a nonprofit.

Not Web Scale

meilisearch

https://www.meilisearch.com/
GitHub Repo
An open source search engine written in Rust.

Typesense

https://typesense.org/
GitHub Repo
An open source Algolia alternative written in C/C++.⁴

Smaller Engines

Sonic - Updated: 1/2023 - Checked: 3/2023 - Stars: 18k - A lightweight, speedy search backend written in Rust.
ZincSearch - Updated: 3/2023 - Checked: 3/2023 - Stars: 14.7k - Lightweight alternative to Elasticsearch, written in Go. Includes a web UI.

Footnotes

¹ It isn't meant specifically for web search, but it offers a number of features that could be useful in a search engine, e.g. exploratory search as well as collaborative annotation and tagging.
² The fork was started following controversial licensing changes by Elasticsearch. For more on the history of this controversy see Graham Gillen's Elasticsearch vs OpenSearch series. For a brief evaluation of OpenSearch's progress see Matt Asay's One year of OpenSearch: Grading AWS’ open source effort.
³ The project has some similarities with what I'm looking to do with Phoebe. It is open source, a non-profit, and the code is written in Python.
⁴ Some interesting functionality includes tunable ranking, sorting, faceting & filtering, grouping & distinct, federated search, and curation. It doesn't appear to be in use at web scale, but they've expressed interest in benchmarking larger datasets, so I submitted an issue requesting that CommonCrawl be benchmarked.