Feature/Idea: Document Management #259

toliver38 · 2025-01-07T11:01:20Z

Users transitioning from tools like RagFlow expect document management features alongside Knowledge Graph capabilities. While TrustGraph supports document loading through the Workbench and metadata integration, it lacks tools to manage documents once they have been ingested.

Discussed briefly here: Discord Discussion

Problem

TrustGraph does not provide:

- A way to list or search ingested documents.
- Tools to delete, move, or rename documents within collections.
- Status visibility for ingested documents.

These gaps make it difficult to curate and organize knowledge bases effectively.

Proposed Features

1. CLI Tools for Document Management

- tg-list-documents -c <collection>: Lists all documents in a specified collection.
- tg-delete-document -c <collection> -f <document_id>: Deletes a document by its ID within a collection.
- tg-rename-document -f <document_id> -n <new_name>: Renames a document.
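As a rough illustration of the argument surface these commands would need, here is a minimal sketch using Python's standard argparse. The command names and short flags come from the list above. The long option names (--collection, --document-id, --new-name) and the build_parser helper are hypothetical, not an existing TrustGraph interface.

```python
import argparse

# Hypothetical parsers for the three *proposed* commands.
# Short flags follow the proposal; long flag names are assumptions.


def build_parser(command: str) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog=command)
    if command == "tg-list-documents":
        parser.add_argument("-c", "--collection", required=True,
                            help="Collection whose documents are listed")
    elif command == "tg-delete-document":
        parser.add_argument("-c", "--collection", required=True,
                            help="Collection containing the document")
        parser.add_argument("-f", "--document-id", required=True,
                            help="ID of the document to delete")
    elif command == "tg-rename-document":
        parser.add_argument("-f", "--document-id", required=True,
                            help="ID of the document to rename")
        parser.add_argument("-n", "--new-name", required=True,
                            help="New display name for the document")
    else:
        raise ValueError(f"unknown command: {command}")
    return parser


if __name__ == "__main__":
    args = build_parser("tg-delete-document").parse_args(
        ["-c", "default", "-f", "abc123"])
    print(args.collection, args.document_id)  # prints: default abc123
```

argparse maps --document-id to the attribute document_id, so each command's handler would receive plain keyword-style values to pass on to whatever API backs it.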

2. Workbench GUI Enhancements

- Add document listing with search and sorting capabilities.
- Provide options to delete, move, rename, or tag documents in an intuitive interface.

3. API Endpoints to Support Document Management

List Documents

Endpoint: GET /api/v1/documents
Description: Retrieves a list of documents, optionally restricted to a single collection.
Query Parameters:
- collection: The name of the collection to list documents from (optional).

Response:

```
[
  {
    "document_id": "abc123",
    "name": "example.pdf",
    "metadata": { "key": "value" },
    "collection": "default",
    "user": "trustgraph"
  },
  {
    "document_id": "xyz456",
    "name": "example2.pdf",
    "metadata": { "key": "value" },
    "collection": "default",
    "user": "trustgraph"
  }
]
```
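A minimal client-side sketch of the proposed list endpoint, using only the Python standard library. The endpoint is a proposal, so base_url is a placeholder and the helper names are hypothetical. URL building and response parsing are kept separate from the network call so they can be checked without a running server.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional, Tuple

# Sketch of a client for the *proposed* GET /api/v1/documents endpoint.
# The endpoint does not exist yet; base_url is a placeholder.


def list_documents_url(base_url: str, collection: Optional[str] = None) -> str:
    """Build the request URL, adding ?collection=... only when given."""
    url = f"{base_url.rstrip('/')}/api/v1/documents"
    if collection is not None:
        url += "?" + urllib.parse.urlencode({"collection": collection})
    return url


def parse_document_list(body: str) -> List[Tuple[str, str, str]]:
    """Reduce the proposed JSON array to (document_id, name, collection)."""
    return [(d["document_id"], d["name"], d["collection"])
            for d in json.loads(body)]


def list_documents(base_url: str,
                   collection: Optional[str] = None) -> List[Tuple[str, str, str]]:
    """Perform the GET; needs a server that implements the proposal."""
    with urllib.request.urlopen(list_documents_url(base_url, collection)) as resp:
        return parse_document_list(resp.read().decode("utf-8"))
```

For example, list_documents("http://localhost:8088", "default") would request http://localhost:8088/api/v1/documents?collection=default, where the host and port are placeholders.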

Delete Document

Endpoint: DELETE /api/v1/documents/{document_id}
Description: Deletes a document by its unique ID.
Path Parameters:
- document_id: The ID of the document to delete.

Response:

```
{ "message": "Document deleted successfully." }
```
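A minimal sketch of a client call to the proposed delete endpoint, again with the standard library only. The base URL is a placeholder and the helper names are hypothetical. Percent-encoding the ID is a defensive choice so that an ID containing "/" cannot change which path is hit.

```python
import json
import urllib.parse
import urllib.request

# Sketch of a client for the *proposed* DELETE /api/v1/documents/{document_id}.
# The endpoint is not implemented yet; base_url is a placeholder.


def delete_document_request(base_url: str,
                            document_id: str) -> urllib.request.Request:
    """Build the DELETE request, encoding the ID as a single path segment."""
    doc = urllib.parse.quote(document_id, safe="")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/documents/{doc}", method="DELETE")


def delete_document(base_url: str, document_id: str) -> str:
    """Send the request and return the "message" field of the JSON response."""
    req = delete_document_request(base_url, document_id)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["message"]
```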

Rename Document

Endpoint: PUT /api/v1/documents/{document_id}/rename
Description: Renames a document by its unique ID.
Path Parameters:
- document_id: The ID of the document to rename.
Body:
```
{ "new_name": "new_document_name.pdf" }
```

Response:

```
{ "message": "Document renamed successfully." }
```
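The rename call differs from delete mainly in carrying a JSON body. A minimal sketch with the standard library follows. The endpoint path and the new_name field come from the proposal above. The base URL and helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Sketch of a client for the *proposed* PUT /api/v1/documents/{document_id}/rename.
# The endpoint is not implemented yet; base_url is a placeholder.


def rename_document_request(base_url: str, document_id: str,
                            new_name: str) -> urllib.request.Request:
    """Build the PUT request with the {"new_name": ...} JSON body."""
    doc = urllib.parse.quote(document_id, safe="")
    body = json.dumps({"new_name": new_name}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/documents/{doc}/rename",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def rename_document(base_url: str, document_id: str, new_name: str) -> str:
    """Send the request and return the "message" field of the JSON response."""
    req = rename_document_request(base_url, document_id, new_name)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["message"]
```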

Concern

I've found deleting nodes in graph-based tools can be resource-intensive, especially when nodes have complex relationships. This may impact TrustGraph, depending on the scale of the graph for each document and the efficiency of the deletion process.
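To make the concern concrete, here is a toy in-memory sketch. It assumes every extracted triple is tagged with the ID of the document it came from. That is an assumption for illustration; the issue does not describe how TrustGraph stores provenance. Under that assumption, deleting a document touches only that document's triples. Reference counts then show which entities are shared with other documents and must survive. Without such per-document bookkeeping, a deletion would likely have to scan far more of the graph.

```python
from collections import defaultdict

# Toy store for illustration only; TrustGraph's real storage model may differ.
# Assumption: each triple carries the ID of its source document.


class TripleStore:
    def __init__(self):
        self.triples = set()              # {(subject, predicate, object, doc_id)}
        self.by_doc = defaultdict(set)    # doc_id -> triples extracted from it
        self.refcount = defaultdict(int)  # node -> number of triples mentioning it

    def add(self, s, p, o, doc_id):
        t = (s, p, o, doc_id)
        if t in self.triples:
            return
        self.triples.add(t)
        self.by_doc[doc_id].add(t)
        self.refcount[s] += 1
        self.refcount[o] += 1

    def delete_document(self, doc_id):
        """Remove one document's triples; return nodes no longer referenced."""
        orphaned = set()
        for t in self.by_doc.pop(doc_id, set()):
            self.triples.discard(t)
            s, _, o, _ = t
            for node in (s, o):
                self.refcount[node] -= 1
                if self.refcount[node] == 0:
                    del self.refcount[node]
                    orphaned.add(node)
        return orphaned


store = TripleStore()
store.add("Alice", "worksAt", "Acme", "doc1")
store.add("Acme", "locatedIn", "Paris", "doc2")
print(store.delete_document("doc1"))  # prints: {'Alice'}  ("Acme" survives via doc2)
```

The work done is proportional to the number of triples the document contributed, which is why the per-document graph size matters for deletion cost.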


JackColquitt · 2025-01-07T17:08:47Z

Have you looked into or been using knowledge cores?

https://trustgraph.ai/docs/cores/

A lot of the data management you're talking about is on the roadmap for our knowledge core approach. Also, "knowledge core" is very much a placeholder term, so open to suggestions. 😆

toliver38 · 2025-01-07T18:57:30Z

I really like the Knowledge Core concept. It's really helpful for reuse. After playing with it a bit, I started to see the possibility of sharing knowledge cores from different subject matter experts between organizations.

My issue is that the other day I uploaded about 100 PDF files, and after some testing and evaluation I wanted to remove them from the backend because I deemed them irrelevant to the collection I was working with.

On another occasion, I couldn't easily list by name the documents already in the store, so I wasn't sure which documents had been processed and which had not. After digging into the logs, I found that the pipeline had halted on a malformed PDF. This is what motivated #243.

JackColquitt · 2025-01-08T17:09:56Z

I don't know if you've seen the word "collection" in places in the codebase, but as @cybermaggedon alluded to here:

https://discord.com/channels/1251652173201149994/1251652174270959798/1326535071187992648

we've been planning this ability to manage data in the system for quite a while. This approach enables many features, ranging from improved data storage management all the way to a scheme for controlling data access.

JackColquitt added the "enhancement" (New feature or request) label on Jan 7, 2025.
