Database for unstructured data and AI applications

A database for AI applications and unstructured data (text, PDF, image, audio, video, and more)

The world's first database designed specifically for unstructured data

Eighty percent of the world's data is unstructured, yet managing it efficiently remains challenging and often requires developers to build extensive systems. AI applications require robust management of both structured and unstructured data. BerryDB addresses this by natively supporting both structured and unstructured data types.

[Figure: architecture diagram]

Why BerryDB?

BerryDB is the first database purpose-built for handling unstructured data and building AI applications.

Rapidly build highly scalable knowledge lakes

Building knowledge databases on traditional databases is hard. Developers need to build multiple backends: an RDBMS for storing metadata, a full-text search engine such as Elasticsearch, an annotation system, a vector store, and a graph store. BerryDB consolidates these into a single flexible JSON database.

Natively manage PDF, text, images, audio, video, and JSON data types

BerryDB natively supports unstructured data types, including PDFs, text, images, audio, and video, alongside JSON.

Infinite flexibility to handle changing data with a JSON-native database

JSON is a common output format for AI models, so BerryDB can directly consume data produced by AI workflows, which makes AI applications faster to build. The JSON format also makes it easy to adapt the data model as the data changes.

Built-in semantic layer and multi-layered knowledge graphs

The built-in annotation system enables the extraction and annotation of unstructured data. Extracted annotations are stored as separate layers in a multi-layered knowledge graph and automatically indexed for search.

Unified and hybrid search capabilities

BerryDB integrates SQL-based search for metadata, full-text search, vector search, and annotation search into a single database. Users can combine these search types in one query as needed.

JSON
Image
Audio/Video
Text
PDF/Docs

Structured and unstructured data in a dynamic schema

Built-in editor for modeling structured and unstructured data:

Supports image, text, PDF, audio, and video data types, as well as embedding of unstructured objects. New fields, including nested objects and arrays, can be added to the schema dynamically.

Supports unstructured data types (PDFs, images, audio, videos) in the schema

For instance, a patient schema (shown here) may contain a patient photo, a CT scan image, and a large text document for medical history.
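As a rough illustration, a patient record of this shape could be expressed as a JSON document like the one built below in Python. Every field name, the `type`/`url` layout, and the example URLs are assumptions made for this sketch; they are not taken from BerryDB's actual schema format.

```python
import json

# Hypothetical patient document: all field names and URLs are illustrative
# assumptions, not BerryDB's actual schema.
patient = {
    "patient_id": "P-0001",
    "name": "Jane Doe",
    "age": 54,
    # Unstructured objects sit next to structured fields.
    "photo": {"type": "image", "url": "https://example.com/photos/p-0001.png"},
    "ct_scan": {"type": "image", "url": "https://example.com/scans/p-0001-ct.png"},
    "medical_history": {"type": "text", "content": "Long free-text history goes here."},
}

# With a dynamic schema, new fields (including nested arrays) can be added later.
patient["visits"] = [{"date": "2023-10-03", "reason": "follow-up"}]

document = json.dumps(patient, indent=2)
print(document)
```

The point of the sketch is only that structured values, references to binary objects, and long text can live in one document.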


Extract semantic info and build multi-layered knowledge graphs

Embedded semantic layer in the database with built-in AI/ML models to process unstructured data

ML-based annotations

Named entity recognition, object recognition, text labeling, text summarization, and 50+ other semantic extraction models are built in. Users can also configure custom ML models for annotations.

Built-in annotation studio for manual curation

Ability to review and approve ML-based annotations

Supports in-place annotation on a multi-layered knowledge graph

Supports in-place enrichment of data by adding annotations as JSON layers
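The idea of enriching a record by adding annotations as JSON layers can be sketched in a few lines of Python. The helper `add_annotation_layer`, the `annotations` key, and the layer names are hypothetical and exist only for this illustration; BerryDB's own storage layout may differ.

```python
def add_annotation_layer(document, layer_name, annotations):
    """Attach `annotations` to `document` in place as a named layer.

    Hypothetical helper: the "annotations" key and layer layout are
    assumptions for illustration, not BerryDB's storage format.
    """
    layers = document.setdefault("annotations", {})
    layers[layer_name] = annotations
    return document

record = {"id": "doc-1", "text": "Jane Doe visited Boston General in 2023."}

# One layer per extraction pass, e.g. entities from an NER model ...
add_annotation_layer(
    record,
    "ner",
    [
        {"label": "PERSON", "value": "Jane Doe"},
        {"label": "ORG", "value": "Boston General"},
    ],
)
# ... and a second layer holding a summary.
add_annotation_layer(record, "summary", {"text": "Hospital visit in 2023."})

print(sorted(record["annotations"]))  # ['ner', 'summary']
```

The original fields are left untouched, and each layer can be reviewed, replaced, or indexed on its own.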


Unified search on multi-layered JSON data

Integrates metadata search (SQL), full-text search, vector (similarity) search, annotation search, and LLM search in a single system. Users can combine these search types in one query as needed.

SQL search

Search on document fields using SQL or the API

Full-text search

Search through large text documents embedded within JSON. No need for Elasticsearch.

Annotation search

Annotate data (manually or with ML) and search through the annotations. No need for a separate annotation system.

Vector search, LLM-based search

Similarity search or semantic search of data. No need for a separate vector database.
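To make the idea of combining these search types concrete, here is a minimal in-memory sketch in plain Python: a metadata filter and a keyword match narrow the candidates, and cosine similarity ranks what remains. The function `hybrid_search`, the document fields, and the two-dimensional embeddings are all assumptions for illustration; this is not BerryDB's query API or its implementation.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_search(docs, metadata, keyword, query_vector, top_k=2):
    """Filter by metadata and keyword, then rank survivors by vector similarity."""
    matches = []
    for doc in docs:
        # Metadata filter: every requested field must match exactly.
        if any(doc.get(key) != value for key, value in metadata.items()):
            continue
        # Naive full-text filter: case-insensitive substring match.
        if keyword.lower() not in doc["text"].lower():
            continue
        matches.append((cosine_similarity(doc["embedding"], query_vector), doc["id"]))
    matches.sort(reverse=True)
    return [doc_id for _, doc_id in matches[:top_k]]

# Hypothetical documents with toy 2-dimensional embeddings.
docs = [
    {"id": "a", "dept": "radiology", "text": "CT scan shows no fracture", "embedding": [1.0, 0.0]},
    {"id": "b", "dept": "radiology", "text": "Follow-up CT scan scheduled", "embedding": [0.6, 0.8]},
    {"id": "c", "dept": "cardiology", "text": "CT scan of the chest", "embedding": [1.0, 0.0]},
]

results = hybrid_search(docs, {"dept": "radiology"}, "ct scan", [1.0, 0.0])
print(results)  # ['a', 'b']
```

A production system would use real indexes for each step; the sketch only shows how the three kinds of criteria compose in a single query.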

Multi-dimensional scaling and super fast performance

Horizontally scale data, query and index nodes

Scale each of them independently of the others

Super fast in-memory database

BerryDB is a JSON-native database, so data does not need to be converted to other formats. It is an in-memory database with eventual persistence to disk. As a result, it delivers 5-10x faster JSON reads and writes than other databases (MongoDB, MySQL, etc.).
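The general pattern of an in-memory store with eventual persistence can be sketched as follows. The class `WriteBehindStore` and its methods are hypothetical and say nothing about how BerryDB is actually implemented; the sketch only shows reads and writes served from memory, with a separate flush step that writes to disk.

```python
import json
import os
import tempfile

class WriteBehindStore:
    """Toy in-memory JSON store that persists to disk only when flushed.

    Hypothetical illustration of the pattern, not BerryDB's implementation.
    """

    def __init__(self, path):
        self.path = path
        self.data = {}      # all reads and writes hit this dict
        self.dirty = False  # True when memory is ahead of the file

    def put(self, key, document):
        self.data[key] = document
        self.dirty = True

    def get(self, key):
        return self.data.get(key)

    def flush(self):
        """Write the in-memory state to disk (a real system would do this in the background)."""
        if not self.dirty:
            return
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.path)  # swap in the complete file in one step
        self.dirty = False

path = os.path.join(tempfile.mkdtemp(), "store.json")
store = WriteBehindStore(path)
store.put("doc-1", {"title": "CT report"})
on_disk_before_flush = os.path.exists(path)  # the write is in memory only
store.flush()
on_disk_after_flush = os.path.exists(path)   # now persisted
```

The trade-off in this pattern is that writes return quickly, while anything not yet flushed is lost if the process stops first.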

[Figure: architecture diagram]

BerryDB API

Powerful API to ingest, enrich and search through data: See https://docs.berrydb.io/python-sdk/

User-friendly APIs for applications

APIs for processing unstructured data, search and SQL-based query APIs, annotation APIs, and bulk upload from CSV together provide a comprehensive toolkit for building applications.
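As a small example of preparing data for a bulk upload, the sketch below turns CSV rows into JSON-ready documents using only the Python standard library. The helper `csv_to_documents` and the sample columns are hypothetical, and no BerryDB API is called here; the actual upload calls are described in the SDK documentation linked above.

```python
import csv
import io
import json

def csv_to_documents(csv_text):
    """Turn each CSV row into a JSON-ready dict keyed by the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]

# Hypothetical sample data; the column names are illustrative only.
sample = (
    "patient_id,name,dept\n"
    "P-0001,Jane Doe,radiology\n"
    "P-0002,John Roe,cardiology\n"
)

documents = csv_to_documents(sample)
print(json.dumps(documents[0]))
```

Each resulting dict is one document, ready to be passed to whatever ingestion call the application uses.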

Notebook access and SDKs

Notebook UI for accessing and processing unstructured data and building knowledge layers using the Python or Java SDKs

Demo

Product introduction video

Ready to get started?

Contact us or sign up for beta access