Graphbook

The ML workflow framework

Github
Example Workflow with Llama

Build ML Workflows

Graphbook is a framework for building efficient, interactive DAG-structured ML workflows composed of nodes implemented in Python. It provides common ML processing features and a web UI to assemble, monitor, and execute data processing pipelines.

Get started

1. Build

Write processing nodes using Python in your favorite code editor

from transformers import AutoImageProcessor, AutoModel

class DinoV2(BatchStep):
    def __init__(self, id, logger):
        super().__init__(id, logger)
        self.processor = AutoImageProcessor.from_pretrained('facebook/dinov2-base')
        self.model = AutoModel.from_pretrained('facebook/dinov2-base')

    def on_item_batch(self, images, items, notes) -> dict:
        inputs = self.processor(images=images, return_tensors='pt')
        outputs = self.model(**inputs)
        last_hidden_states = outputs.last_hidden_state
        return {'embeddings': last_hidden_states}

2. Assemble

Assemble an ML workflow in our graph-based editor with your own processing nodes

Abstract workflow

3. Run

Run, monitor, and adjust parameters in your workflow

Elaborate workflow

Build AI/ML Workflows

Iterate, operate, and monitor an ML-based data processing pipeline all in one tool. Connect to any data source, use your own PyTorch or TensorFlow models, and maximize your GPU utilization without having to write tedious multiprocessing code.

Expedite Development

Graphbook gives users the ability to easily develop ML inference pipelines for a wide range of tasks. It facilitates development by offering interactivity, visualizations, and multiprocess I/O. Built to address diverse needs, Graphbook significantly reduces AI/ML workflow development time.

Always Open-Source

Do not trust a third party with your data. Graphbook is always free and open source. Deploy your own Graphbook instances on-premise or in the cloud, and start building.

Learn more

Core features

At its core, Graphbook is a framework for building interactive DAG-structured ML workflows. Many features combine to enable its power.

Web UI
Graphbook offers a visual workflow editor enabling users to effortlessly combine the functional units that were previously written in Python. This makes the editor accessible to everyone, from ML engineers to non-technical professionals, so that business logic can always be adjusted.
Extensible
Extend the capabilities of Graphbook by making your own functional nodes in Python. Create an ML processing node to annotate a data point, a data source node to ingest from a TCP stream, or a human-in-the-loop node that waits for human feedback.
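As a sketch of what a custom annotation node might look like, the snippet below follows the `BatchStep` interface from the example above. `BatchStep` is stubbed here so the snippet runs without Graphbook installed; the `CaptionLength` node itself is hypothetical, not a built-in.

```python
class BatchStep:
    # Minimal stub of the interface shown in the DinoV2 example,
    # so this sketch runs standalone without Graphbook installed.
    def __init__(self, id, logger):
        self.id = id
        self.logger = logger

class CaptionLength(BatchStep):
    # A toy annotation node: tags each item with the length of its caption.
    def __init__(self, id, logger):
        super().__init__(id, logger)

    def on_item_batch(self, images, items, notes) -> dict:
        lengths = [len(item.get("caption", "")) for item in items]
        return {"caption_length": lengths}

step = CaptionLength("caption-length", logger=None)
out = step.on_item_batch(
    images=[None, None],
    items=[{"caption": "a dog"}, {"caption": "two cats on a couch"}],
    notes=None,
)
# out["caption_length"] -> [5, 19]
```

The node only needs to return a dict of outputs; anything expressible in Python (a heuristic, a model call, a human prompt) can live inside `on_item_batch`.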
Data
Seamlessly connect to multiple data sources, including S3 and local storage, to process unstructured data within your workflow. Explore and analyze your entire dataset directly within the editor interface, simplifying data exploration and analysis.
Models
Integrate your own ML models or off-the-shelf models from Huggingface. Experiment with different models to find the best fit for your data processing workflow. Automatic UI elements on each node simplify parameter adjustments and checkpoint management, enabling quick and seamless experimentation.
Views
Gain valuable insights into your data processing workflow with automated visualizations of node outputs, facilitating human-in-the-loop analysis. Monitor model categorizations, sentiment analysis, segmentations, and other labelings directly within the editor interface, empowering informed decision-making based on real-time insights.
Logs
Access detailed logs from each node or the entire system to generate valuable reports and diagnose workflow issues effectively. Enhance visibility by inserting custom log statements into the code of any node's execution lifecycle, and conveniently view them within the editor interface for comprehensive monitoring and troubleshooting.
Interactive
While iterating on your workflow, you can run from an individual cell, which also executes its dependent cells. You can sample single batches of data by stepping through a node, or run the entire workflow with all of your data when you're ready. You can also pause and resume your data pipeline mid-execution.
Optimization
Graphbook prioritizes the optimization of ML workflow execution, handling multiprocessing I/O for reading and writing, batching inputs, and maximizing GPU and CPU utilization. ML engineers can concentrate on refining business logic, knowing that workflow performance is optimized for efficiency and scalability.
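To illustrate the pattern Graphbook automates, here is a minimal standalone sketch of worker-based prefetching plus fixed-size batching. This is not Graphbook's actual implementation (which uses multiprocessing workers), just the general shape of the work it takes off your hands:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def load_item(path):
    # Placeholder for slow I/O, e.g. reading and decoding an image.
    return f"decoded:{path}"

def batched(iterable, batch_size):
    # Group an iterable into fixed-size batches for model inference.
    it = iter(iterable)
    while batch := list(islice(it, batch_size)):
        yield batch

paths = [f"img_{i}.jpg" for i in range(10)]
with ThreadPoolExecutor(max_workers=4) as pool:
    # Items are loaded concurrently while batches are consumed in order,
    # so the GPU is not left idle waiting on single-threaded I/O.
    items = pool.map(load_item, paths)
    batches = list(batched(items, batch_size=4))
# batches -> 3 batches of sizes 4, 4, 2
```

With Graphbook, this loading, batching, and writing logic is handled by the framework, so a node like `DinoV2` above only has to implement the inference itself.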

Hosting Options

Self-Hosted | Always Free
The software is free. Visit our open-source repository, clone it, and deploy it however is most convenient for you.
Download
Graphbook Cloud
Get started quickly and forget about provisioning enough resources such as GPUs.
Coming soon

Our most commonly asked questions.

Is this no-code ML?

No. But you can build no-code ML for your customers and internal teams with this framework.

Can I use a VCS with Graphbook?

Yes. Your nodes are written in Python and workflows are serialized as .json files. You can track everything with Git.
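Because workflows are plain JSON files, diffs stay reviewable in Git, especially if the file is written deterministically. The schema below is purely illustrative, not Graphbook's actual serialization format:

```python
import json

# Hypothetical workflow structure; Graphbook's actual .json schema may differ.
workflow = {
    "nodes": [
        {"id": "load", "type": "LoadImages"},
        {"id": "embed", "type": "DinoV2"},
    ],
    "edges": [{"from": "load", "to": "embed"}],
}

# Sorted keys and fixed indentation keep diffs small and readable
# when the file is tracked in Git.
serialized = json.dumps(workflow, sort_keys=True, indent=2)
restored = json.loads(serialized)
```

Commit the workflow `.json` alongside the Python node code and both halves of a pipeline are versioned together.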

Can I write workflows in Python?

Not yet, but we plan on adding this feature soon. For now, you must assemble workflows in the UI.

How can we deploy to production?

In Graphbook, you can continue to use your workflow as-is or set new variables (directly in the workflow) such as where your production database is.

Can I use an LLM like GPT?

Yes. Graphbook is abstract enough that you can implement anything that can be written in Python, including sending API requests to OpenAI.
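For example, the body of a node could call OpenAI's Chat Completions REST API over plain HTTP. The endpoint and payload shape below match OpenAI's public API, but the function names and the model name are just illustrative, and you would wrap this in a Graphbook node like any other Python function:

```python
import json
import urllib.request

def build_chat_request(prompt, model="gpt-4o-mini"):
    # Assemble a Chat Completions payload; the model name is an example.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask_llm(prompt, api_key):
    # POST the payload to OpenAI's Chat Completions endpoint and
    # return the assistant's reply text.
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Any provider with an HTTP API (or any local model) can be integrated the same way.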

What makes Graphbook efficient?

The framework has a custom implementation of multiprocessing workers that run in the background for both loading and dumping to keep your GPU at max utilization.

How do I contribute?

You are very welcome to contribute! Visit our repo.