Database for unstructured data and AI applications

A database for AI applications and unstructured data (text, PDF, image, audio, video, and more)

The world's first database designed specifically for unstructured data

Eighty percent of the world's data is unstructured, yet managing it efficiently remains challenging and often requires developers to build extensive systems. AI applications require robust management of both structured and unstructured data. BerryDB addresses this by natively supporting both structured and unstructured data types.

Why BerryDB?

BerryDB is the first database purpose-built for handling unstructured data and building AI applications

Rapidly build highly scalable knowledge databases

Building knowledge databases with traditional databases is hard. Developers need to build multiple backends: an RDBMS for storing metadata, a full-text search engine such as Elasticsearch, an annotation system, a vector store, and a graph store. BerryDB consolidates these into a single flexible JSON database.

Natively manage PDF, text, images, audio, video, and JSON data types

BerryDB natively supports unstructured data types, including PDFs, text, images, audio, and video, alongside JSON.

Infinite flexibility to handle changing data with a JSON-native database.

JSON is the standard output format for AI models, so BerryDB can directly consume data produced by AI workflows, which makes it faster to build AI applications. The JSON format also gives you the flexibility to adapt as your data changes.

Built-in semantic layer

The built-in annotation system enables the extraction and annotation of unstructured data. Extracted annotations are automatically indexed for search

Unified and hybrid search capabilities

BerryDB integrates metadata search (SQL), full-text search, vector search and annotation search into a single database. Users have the flexibility to combine these search queries as needed

Model structured and unstructured data in a dynamic schema

Built-in editor for modeling structured and unstructured data:

Supports image, text, PDF, audio, and video data types, as well as embedding of unstructured objects. New fields, including nested objects and arrays, can be added to the schema dynamically with no downtime.

Supports embedding unstructured data types (PDFs, images, audio, videos) within the schema

For instance, a patient document may contain a patient photo, a CT scan image, and a text document.
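To make the patient example concrete, here is a minimal sketch of what such a document might look like as plain JSON, built in Python. All field names, file paths, and the `unstructured_fields` helper are hypothetical and chosen for illustration only; they are not BerryDB's actual schema or API.

```python
import json

# Hypothetical patient document. Every field name and path is illustrative,
# not BerryDB's actual schema.
patient = {
    "patient_id": "P-001",
    "name": "Jane Doe",
    "photo": {"type": "image", "uri": "files/jane_doe.jpg"},
    "ct_scan": {"type": "image", "uri": "files/ct_scan_01.png"},
    "clinical_note": {"type": "text", "uri": "files/note_01.txt"},
}

# With a dynamic schema, new fields (including nested objects and arrays)
# can be added to an existing document without a migration step.
patient["visits"] = [
    {"date": "2023-09-28", "vitals": {"heart_rate": 72, "temp_c": 36.8}},
]


def unstructured_fields(doc):
    """Return the names of top-level fields that reference unstructured objects."""
    return sorted(
        key for key, value in doc.items()
        if isinstance(value, dict) and "uri" in value
    )


print(unstructured_fields(patient))
print(json.dumps(patient, indent=2))
```

The structured metadata (`patient_id`, `name`) and the references to unstructured objects live side by side in one document, which is the modeling idea described above.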

Extract semantic info from unstructured data

Embedded semantic layer in the database with built-in AI/ML models to process unstructured data

ML-based annotations

Named entity recognition, object recognition, text labeling, text summarization, and 50+ semantic extraction models are built in. Users can also configure custom ML models for annotations.

Built-in annotation studio for manual curation

Ability to review and approve ML-based annotations

Supports in-place annotation of data

Supports in-place enrichment of data by adding annotations as JSON sub-trees
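In-place enrichment can be pictured as attaching an `annotations` sub-tree to the field being annotated, rather than storing annotations in a separate system. The sketch below uses plain Python dictionaries; the `annotate_in_place` helper, the field names, and the hard-coded entities (standing in for the output of a named entity recognition model) are all assumptions made for illustration, not BerryDB's API.

```python
def annotate_in_place(doc, field, annotations, source="ml", approved=False):
    """Attach annotations under doc[field]["annotations"] without moving the data."""
    target = doc[field]
    target.setdefault("annotations", []).extend(
        {**ann, "source": source, "approved": approved} for ann in annotations
    )
    return doc


doc = {
    "id": "note-1",
    "note": {"type": "text", "content": "Patient reports chest pain since Monday."},
}

# Stand-in for the output of a named entity recognition model (hypothetical).
entities = [
    {"label": "SYMPTOM", "text": "chest pain", "start": 16, "end": 26},
    {"label": "DATE", "text": "Monday", "start": 33, "end": 39},
]

annotate_in_place(doc, "note", entities)

# Manual curation step: a reviewer approves the ML-generated annotations.
for ann in doc["note"]["annotations"]:
    ann["approved"] = True

print(doc["note"]["annotations"])
```

Because the annotations sit inside the same JSON document as the text they describe, they can be indexed and queried together with the rest of the record.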

Unified search capabilities

Integrates metadata search (SQL), full-text search, vector (similarity) search, annotation search, and LLM search in a single system. Users have the flexibility to combine these search queries as needed

Metadata search

Search metadata fields using SQL or the API.

Full-text search

Search through large text documents embedded within JSON. No need for Elasticsearch.

Vector search and LLM-based search

Run similarity or semantic search over your data without a separate vector database. As data changes, the system updates the built-in vector database.

Annotation search

Annotate data (manual or ML-based) and search through annotations. No need for a separate annotation system
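The idea of combining these search types can be sketched with a toy in-memory example: filter by metadata, by keyword, and by annotation label, then rank the survivors by vector similarity. This is only an illustration of the concept in plain Python. The `hybrid_search` function, the sample documents, and the three-dimensional embeddings are hypothetical and do not represent BerryDB's query API or internals.

```python
import math

# Toy corpus; fields and embeddings are made up for illustration.
docs = [
    {"id": 1, "department": "cardiology", "text": "chest pain and shortness of breath",
     "embedding": [0.9, 0.1, 0.0], "annotations": ["SYMPTOM"]},
    {"id": 2, "department": "cardiology", "text": "routine follow-up visit",
     "embedding": [0.1, 0.9, 0.0], "annotations": []},
    {"id": 3, "department": "neurology", "text": "chest pain after exercise",
     "embedding": [0.8, 0.2, 0.1], "annotations": ["SYMPTOM"]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(docs, department=None, keyword=None, label=None,
                  query_vec=None, top_k=5):
    """Apply metadata, full-text, and annotation filters, then rank by similarity."""
    hits = []
    for d in docs:
        if department is not None and d["department"] != department:
            continue  # metadata filter
        if keyword is not None and keyword.lower() not in d["text"].lower():
            continue  # naive full-text filter
        if label is not None and label not in d["annotations"]:
            continue  # annotation filter
        score = cosine(query_vec, d["embedding"]) if query_vec is not None else 0.0
        hits.append((score, d["id"]))
    # Highest similarity first; ties broken by document id.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [doc_id for _, doc_id in hits[:top_k]]


result = hybrid_search(docs, keyword="chest pain", label="SYMPTOM",
                       query_vec=[1.0, 0.0, 0.0])
print(result)
```

A real engine would use inverted indexes and approximate nearest-neighbor structures rather than a linear scan, but the way the filters compose is the same.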

Multi-dimensional scaling and super fast performance

Horizontally scale data, query and index nodes

Scale them independent of each other

Super fast in-memory database

BerryDB is a JSON-native database, so data does not need to be converted to other formats. It is an in-memory database with eventual persistence to disk. As a result, it delivers 5-10x faster JSON reads and writes compared with other databases such as MongoDB and MySQL.

BerryDB API

Powerful API to access, enrich and search through data

User-friendly APIs for applications

BerryDB provides a comprehensive set of APIs for building applications: APIs to populate unstructured data, search and SQL-based APIs for queries, annotation APIs, and bulk upload from CSV.
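A bulk upload from CSV usually begins by turning each row into a JSON document. The sketch below shows only that conversion step with Python's standard library; the column names and the `rows_to_documents` helper are hypothetical, and the upload call itself is omitted because BerryDB's API is not described here.

```python
import csv
import io
import json

# Sample CSV content; the columns are made up for illustration.
csv_text = """patient_id,name,age
P-001,Jane Doe,42
P-002,John Roe,57
"""


def rows_to_documents(text):
    """Convert CSV text into a list of JSON-ready dictionaries, one per row."""
    reader = csv.DictReader(io.StringIO(text))
    documents = []
    for row in reader:
        doc = dict(row)
        doc["age"] = int(doc["age"])  # CSV values are strings; restore the number
        documents.append(doc)
    return documents


documents = rows_to_documents(csv_text)
print(json.dumps(documents, indent=2))
```

Each resulting dictionary can then be sent to whatever ingestion endpoint or SDK call the database exposes.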

Notebook access and SDKs

Notebook UI for accessing/processing objects using Python or Java SDKs

Ready to get started?

Contact us or sign up for beta access