Natural language processing (NLP), the field of AI that involves parsing text for tasks including summarization and generation, is a fast-growing technology. According to a 2021 survey from John Snow Labs and Gradient Flow, 60% of tech leaders indicated that their NLP budgets grew by at least 10% compared to 2020, while a third said that their spending climbed by more than 30%. Fortune Business Insights pegged the NLP market at $16.53 billion in 2020.
Against this backdrop, Deepset, the startup behind the open source NLP framework Haystack, today announced that it raised $14 million in a Series A round led by GV with participation from Harpoon Ventures, System.One, Lunar Ventures and Acequia Capital. The capital infusion arrived alongside Deepset Cloud, a new subscription product for building NLP-powered software.
CEO Milos Rusic co-founded Deepset with Malte Pietsch and Timo Möller in 2018. Pietsch and Möller -- who have data science backgrounds -- came from Plista, an adtech startup, where they worked on products including an AI-powered ad creation tool.
"Driven by [our] belief in open source, the Deepset team has … been contributing models and research outcomes to the open source NLP community [for years]," Rusic told TechCrunch via email. "Haystack, the company's flagship open source product, was born out of the experiences, expertise and know-how gained while building NLP for large organizations and the need for a proper set of building blocks for scalable, API-driven NLP back-end applications."
Haystack lets developers build pipelines for NLP use cases. Originally created for search applications, the framework can power engines that answer specific questions (e.g., "Why are startups moving to Berlin?") or sift through documents.
Haystack can also field "knowledge-based" searches that look for granular information on data-heavy websites or internal wikis. Rusic says that Haystack has been used to automate risk management workflows at financial services companies, returning results for queries like "What is the business outlook?" and "How did revenues evolve in the past years?" Other organizations, like Alcatel-Lucent Enterprise, have used Haystack to launch virtual assistants that recommend documents to field technicians.
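To make the idea of a question-answering pipeline concrete, here is a minimal sketch in plain Python. It is not Haystack's API: the names `Document`, `retrieve`, `read` and `answer`, the stopword list and the sample texts are all invented for illustration. The sketch assumes a common two-stage layout, in which a retriever narrows the document set by keyword overlap and a reader then picks the best-matching sentence. Production systems typically swap both stages for trained models.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    text: str


# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "in", "of",
             "why", "what", "how", "did", "where"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[Document], top_k: int = 2) -> list[Document]:
    """Stage 1: keep the top_k documents sharing the most terms with the query."""
    terms = tokenize(query) - STOPWORDS
    scored = [(len(terms & tokenize(doc.text)), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def read(query: str, documents: list[Document]) -> str | None:
    """Stage 2: return the single sentence that overlaps most with the query."""
    terms = tokenize(query) - STOPWORDS
    best_sentence, best_score = None, 0
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc.text):
            score = len(terms & tokenize(sentence))
            if score > best_score:
                best_sentence, best_score = sentence, score
    return best_sentence


def answer(query: str, documents: list[Document]) -> str | None:
    """Run the two stages in sequence; None means nothing matched."""
    return read(query, retrieve(query, documents))


# Invented sample texts, not real data.
DOCS = [
    Document("Berlin note", "Berlin has many engineers. "
             "Startups are moving to Berlin for its talent pool."),
    Document("Finance memo", "Revenues grew steadily over the past three years. "
             "The business outlook remains cautious."),
    Document("Cafeteria menu", "Soup is served on Mondays."),
]

if __name__ == "__main__":
    print(answer("Why are startups moving to Berlin?", DOCS))
    print(answer("How did revenues evolve in the past years?", DOCS))
```

Splitting retrieval from reading keeps the expensive step (reading) confined to a handful of candidate documents, which is one reason this layout is popular for search over large collections.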
A screenshot of the Haystack interface. Image Credits: Haystack
According to Rusic, the goal with Haystack was to enable developers and product divisions to build modern, API-driven NLP apps successfully -- and quickly. He notes that, while it's often straightforward for a data science team to come up with a prototype, challenges can arise in transitioning from prototype to production. About 80% of AI projects -- including NLP projects -- never make it into production, according to a 2019 Gartner survey.
"[With Haystack,] development teams … are equipped with all the components to build a full-stack NLP application and are guided with the proper workflows … Modern NLP moves very fast, and it’s much easier to bridge the gap between the cutting-edge research and the actual production-ready technologies through open source," Rusic said. "[Prebuilt NLP systems] are the basis [for Haystack] and often provide great results in pipelines without additional training. Customization, if needed, happens with end users and experts who provide feedback by testing and using new iterations of a [system] or a pipeline."
But not every company chooses -- or wishes -- to go the DIY route. For those preferring a managed solution, there's the aforementioned Deepset Cloud, which supports customers across the NLP service lifecycle. The service starts with experimentation -- i.e., testing and evaluating an app, adjusting it to a use case and building a proof of concept -- and ends with labeling and monitoring the app in production.
"All NLP services that are developed [with Deepset Cloud] can be used in any end application, simply by integrating an API," Rusic said. "Example applications are NLP-driven enterprise search (think 'modern Google-like' search) and knowledge management."
With the new financing secured, bringing its total raised to $15.6 million, Deepset aims to translate its open source success -- thousands of organizations currently use Haystack -- into increased revenue. Rusic says that the 30-person company, based in Berlin, Germany, was bootstrapped and breaking even before raising its first funding round in 2021, and now has large enterprise customers including Airbus.
"[With the new funding,] we'll continue to build the open source Haystack NLP project -- adding more features, making it even more straightforward for NLP-savvy back-end developers to create NLP services," Rusic said. "[We'll also] develop Deepset Cloud into a fully fledged enterprise software-as-a-service to build language-aware applications. This will include enabling more flexible workflows, more granular product lifecycle guidance, and offering essential and supplemental tools, like labeling and data integrations."