Database for AI apps and unstructured data

A complete backend for building AI apps and for processing, annotating, organizing, and searching unstructured data (text, PDF, image, audio, video, and more).

Introducing the world's first database designed specifically for unstructured data

Eighty percent of the world's data is unstructured, yet managing it efficiently remains challenging and often requires developers to build extensive systems. AI applications require robust management of both structured and unstructured data. BerryDB addresses this by natively supporting both structured and unstructured data types.

BerryDB Core Features

BerryDB offers a complete set of features for managing unstructured data, including search, query, processing, and annotation.

Natively manage PDF, text, images, audio, video, and JSON data types

BerryDB natively supports unstructured data types, including PDFs, images, audio, video, and JSON. With built-in AI models, it processes text, image, audio, and video data, making the resulting metadata automatically searchable.

Infinite flexibility to handle changing data with a JSON-native database

The JSON-native database adapts to your ever-changing data. AI applications need a data model that can evolve continuously, and a JSON-native store allows that without schema migrations.

Embedded semantic layer in the database

The built-in annotation studio enables the extraction and annotation of semantic information from unstructured data through APIs and a user-friendly interface. The extracted annotations are automatically indexed for search.

Unified search capabilities

BerryDB integrates metadata search (SQL), full-text search, vector (similarity) search and annotation search into a single platform. Users have the flexibility to combine these search queries as required.

Model structured and unstructured data in a dynamic schema

Built-in editor for modeling structured and unstructured data:

Supports image, text, PDF, audio, and video data types. Supports embedding of unstructured objects. Allows new fields, including nested objects and arrays, to be added to the schema dynamically with no downtime.

Supports embedding unstructured data types (PDFs, images, audio, videos) within the schema

For instance, a patient document may contain a patient photo, a CT scan image, and a text document.
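
The patient example can be sketched as a plain JSON-style document. This is only an illustration: the field names and the "type"/"uri" layout below are assumptions made for the sketch, not BerryDB's actual schema format or API.

```python
import json

# Hypothetical patient record. Field names and the "type"/"uri" layout are
# illustrative assumptions, not BerryDB's actual schema format.
patient = {
    "id": "patient-001",
    "name": "Jane Doe",
    "age": 54,
    "photo": {"type": "image", "uri": "files/jane_doe.jpg"},
    "ct_scan": {"type": "image", "uri": "files/ct_scan_01.png"},
    "clinical_note": {"type": "text", "content": "Patient reports mild chest pain."},
}

# With a dynamic schema, a new nested field can appear on a record at any time.
patient["allergies"] = [{"substance": "penicillin", "severity": "high"}]


def unstructured_fields(record):
    """Return the names of fields that hold embedded unstructured objects."""
    kinds = {"image", "text", "pdf", "audio", "video"}
    return sorted(
        key
        for key, value in record.items()
        if isinstance(value, dict) and value.get("type") in kinds
    )


print(json.dumps(patient, indent=2))
print(unstructured_fields(patient))
```

Structured fields (name, age) and unstructured objects (photo, CT scan, note) live side by side in one document, which is the point of the dynamic schema described above.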

Seamless management of large text documents with full text search capabilities

This eliminates the need for Elasticsearch or a separate text retrieval system. It also includes utilities for converting documents from PDF, PPT, and other formats to searchable text.

Extract semantic info from unstructured data

Embedded semantic layer in the database with built-in AI/ML models to process unstructured data

APIs and ML models that extract semantic info from text, image, video, and other data

Named entity recognition, object recognition, text labeling, text summarization, and 20+ other semantic extraction models are built in. The platform also supports integration of custom models, offering the flexibility required for AI applications.

Built-in annotation studio for manual and ML-based annotations of image, text, audio and video

Ability to review and approve ML-based annotations

Supports in-place annotation of data

Supports enrichment of data by adding annotation info in-place

Process and label PDF and other text documents

Extract data from PDFs and Excel files, and create records and annotations in BerryDB

Process text, PDF documents to create records

Built-in libraries and APIs to process text documents. For example, convert a PDF into a database of paragraph records.
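
As a rough illustration of the paragraph-database idea, the sketch below splits already-extracted text into one record per paragraph. It assumes the PDF-to-text step has happened elsewhere; the function name and record fields are hypothetical and do not come from BerryDB's libraries.

```python
import re


def text_to_paragraph_records(text, source):
    """Split extracted text on blank lines and emit one record per paragraph.

    Hypothetical helper for illustration; not part of any BerryDB library.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [
        # Collapse line wraps inside a paragraph into single spaces.
        {"source": source, "paragraph_index": i, "text": " ".join(p.split())}
        for i, p in enumerate(paragraphs)
    ]


sample = "First paragraph,\nwrapped over two lines.\n\nSecond paragraph.\n\n\nThird."
records = text_to_paragraph_records(sample, source="report.pdf")
for record in records:
    print(record)
```

Each record keeps its source and position, so a search hit on a paragraph can be traced back to the original document.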

Identify annotations and store them

Identify annotations in the text using built-in ML models and store the annotations

Supports all text annotations
Easy to use APIs
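
A minimal sketch of identifying annotations and storing them in place on the record. A couple of regular expressions stand in for the ML model here, purely so the example runs on its own; the labels, patterns, and the "annotations" field name are assumptions for illustration.

```python
import re

# Stand-in "model": two regex patterns. A real pipeline would call an ML model
# here; the labels and patterns are illustrative assumptions.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}


def annotate_in_place(record, text_field="text"):
    """Find labeled spans in the text and store them on the same record."""
    text = record[text_field]
    annotations = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            annotations.append(
                {
                    "label": label,
                    "start": match.start(),
                    "end": match.end(),
                    "value": match.group(),
                }
            )
    record["annotations"] = sorted(annotations, key=lambda a: a["start"])
    return record


record = annotate_in_place({"text": "Contact ana@example.com before 2024-03-15."})
for annotation in record["annotations"]:
    print(annotation)
```

Storing character offsets alongside each label keeps the annotation tied to the exact span of text it describes, so the record is enriched without copying the data elsewhere.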

Unified search capabilities

Integrates metadata search (SQL), full-text search, vector (similarity) search, annotation search, and LLM search into a single platform. Users have the flexibility to combine these search queries as required.

Metadata search

Search metadata fields using SQL and the API

Full text search

Search through large text documents embedded within JSON. No need for Elasticsearch.

Vector search, LLM based search

Similarity search or semantic search of data. No need for a separate vector database. As data changes, the system updates the built-in vector index.

Annotation search

Annotate data (manually or with ML models) and search through the annotations. No need for a separate annotation system.
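
To make the idea of combining search types concrete, here is a small in-memory sketch that applies a metadata filter, a naive full-text match, and an annotation filter, then ranks the survivors by vector similarity. It does not use BerryDB's query API; the records, field names, and three-dimensional embeddings are made up for the example.

```python
import math

# Made-up records; field names and embeddings are illustrative assumptions.
RECORDS = [
    {"id": 1, "category": "cardiology", "text": "Patient reports chest pain after exercise.",
     "labels": ["symptom"], "embedding": [0.9, 0.1, 0.0]},
    {"id": 2, "category": "cardiology", "text": "Routine follow-up, no complaints.",
     "labels": ["follow-up"], "embedding": [0.1, 0.9, 0.0]},
    {"id": 3, "category": "dermatology", "text": "Chest rash, mild pain.",
     "labels": ["symptom"], "embedding": [0.8, 0.2, 0.1]},
]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def combined_search(records, category, phrase, label, query_vector, top_k=5):
    """Filter by metadata, text, and annotation, then rank by similarity."""
    hits = [
        r for r in records
        if (category is None or r["category"] == category)  # metadata filter
        and phrase.lower() in r["text"].lower()              # naive full-text match
        and label in r["labels"]                             # annotation filter
    ]
    hits.sort(key=lambda r: cosine(r["embedding"], query_vector), reverse=True)
    return [r["id"] for r in hits[:top_k]]


print(combined_search(RECORDS, "cardiology", "chest pain", "symptom", [1.0, 0.0, 0.0]))
print(combined_search(RECORDS, None, "pain", "symptom", [1.0, 0.0, 0.0]))
```

The filters give deterministic inclusion or exclusion, while the similarity score only decides the order of what remains.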

Rapid AI Application Development

Radically simplifies the process of building RAG applications

An integrated Enterprise AI stack

BerryDB streamlines the creation of RAG applications by providing an integrated vector database, embedding model, and LLM integration. This eliminates the need for a separate vector database and embedding pipeline, allowing users to develop RAG applications rapidly.
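
For readers new to RAG, the retrieval and prompt-building steps can be sketched in a few lines. This is a generic illustration, not BerryDB's API: the word-count "embedding" is a toy stand-in for a real embedding model, the passages are invented, and the final LLM call is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts. A real system would use a model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, passages, top_k=2):
    """Return the top_k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: similarity(q, embed(p)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, passages):
    """Assemble the retrieved context and the question into one prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


# Invented passages for the example.
PASSAGES = [
    "BerryDB stores JSON documents with embedded images and audio.",
    "The annotation studio supports manual and ML-based labeling.",
    "Vector search returns semantically similar records.",
]

question = "How does vector search find similar records?"
top = retrieve(question, PASSAGES, top_k=1)
prompt = build_prompt(question, top)
print(prompt)  # this string would be sent to an LLM
```

In a hand-built stack, the embedding, vector store, and prompt assembly are separate services; the integrated approach described above collapses them into one system.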

Synchronized Vector Index with Database, Full-Text Search, and Annotation Indexes

Database search returns deterministic results, whereas vector search returns semantically relevant results. Most AI applications require both kinds of search, so the database indexes must stay synchronized with the vector index. BerryDB handles this automatically.
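
The synchronization requirement can be illustrated with a tiny store that updates its document map and its vector index in the same write, so the two can never drift apart. This is a conceptual sketch only: the class, the toy vocabulary embedding, and the method names are assumptions and say nothing about how BerryDB implements it internally.

```python
import math

VOCAB = ["invoice", "refund", "shipping"]  # toy vocabulary, an assumption


def toy_embed(text):
    """Count vocabulary words; stands in for a real embedding model."""
    words = text.lower().split()
    return [words.count(term) for term in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SyncedStore:
    """Hypothetical store that keeps documents and vectors in lockstep."""

    def __init__(self, embed):
        self._embed = embed
        self.documents = {}
        self.vectors = {}

    def put(self, doc_id, doc):
        # Every write updates the document and its vector together.
        self.documents[doc_id] = doc
        self.vectors[doc_id] = self._embed(doc["text"])

    def delete(self, doc_id):
        self.documents.pop(doc_id, None)
        self.vectors.pop(doc_id, None)

    def get(self, doc_id):
        """Deterministic lookup by key."""
        return self.documents.get(doc_id)

    def nearest(self, text, top_k=1):
        """Similarity lookup; ids with zero similarity are dropped."""
        q = self._embed(text)
        scored = [(cosine(q, v), doc_id) for doc_id, v in self.vectors.items()]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [doc_id for _, doc_id in scored[:top_k]]


store = SyncedStore(toy_embed)
store.put("a", {"text": "refund request"})
store.put("b", {"text": "shipping delay"})
before = store.nearest("refund status")
store.put("a", {"text": "shipping address change"})  # update rewrites the vector too
after = store.nearest("refund status")
print(before, after)
```

Once document "a" is rewritten, it stops matching the refund query immediately, because its vector was replaced in the same operation as its text.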

Low Latency User Experience for RAG Applications

Delivers an ultra-fast vector database built on FAISS and an in-memory database. This design removes the need to route data through multiple systems, such as a separate vector database, ensuring a low-latency user experience.

Multi-dimensional scaling and super fast performance

Horizontally scale data, query and index nodes

Scale them independently of each other

Super fast in-memory database

BerryDB is a JSON-native database, so data does not need to be converted to other formats. It is an in-memory database with eventual persistence to disk. As a result, it delivers 5-10x faster JSON reads and writes compared to other databases (MongoDB, MySQL, etc.).

BerryDB API

Powerful API to access, enrich and search through data

User friendly APIs for applications

APIs for populating unstructured data, search and SQL-based query APIs, annotation APIs, and bulk upload from CSV together provide a comprehensive set of APIs for building applications.
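
As an illustration of what a CSV bulk upload has to do before records reach a JSON database, the sketch below turns CSV rows into nested JSON-style records. It uses only the Python standard library; the dotted-column convention ("address.city") and the function name are assumptions for the example, and the actual upload call is not shown.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Convert CSV rows to dicts, expanding dotted column names into nesting.

    Hypothetical helper; the dotted-column convention is an assumption.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {}
        for column, value in row.items():
            *parents, leaf = column.split(".")
            target = record
            for key in parents:
                target = target.setdefault(key, {})
            target[leaf] = value
        records.append(record)
    return records


CSV_TEXT = "id,name,address.city,address.zip\n1,Ana,Lisbon,1000\n2,Raj,Pune,411001\n"
records = csv_to_records(CSV_TEXT)
print(json.dumps(records, indent=2))
```

Values stay as strings here; a real loader would also need a rule for typing numbers and dates.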

Notebook access and SDKs

Notebook UI for accessing/processing objects using Python or Java SDKs

Demo video

Four-minute introductory demo

Use cases

Example use cases

AI applications

Provides the document store, vector store, and label store needed for generative AI applications. It is also a fast, highly scalable backend, well suited to large AI apps.

Health care (FHIR) platform

Provides a database well suited to storing and searching FHIR resources, and handles FHIR structures natively. Most healthcare applications need to handle images, audio, and video, and BerryDB provides the right framework for these applications.

Ecommerce catalog management

Manage millions of objects and thousands of labels for each object. The BerryDB annotation studio and ML-based labeling provide a scalable solution. BerryDB's combined search makes it possible to search through labels and other metadata in a single query.

Customer support audio processing

Handle a large number of audio files, transcribe them, and search through the transcribed data. Perform manual and automated text labeling using the annotation studio.

Ready to get started?

Contact us or sign up for beta access