New branch 1(data set optimized) #1143

Open
wants to merge 52 commits into
base: dev
9aafa71
added sup_cumulative branch changes
KunalTiwary Sep 23, 2024
715762d
minor changes in count
KunalTiwary Sep 23, 2024
d510c26
added schedule_update_SpeechConversation
KunalTiwary Sep 30, 2024
4256239
created a proxy endpoint for xlit api
kartikvirendrar Oct 9, 2024
394b169
Merge pull request #1123 from AI4Bharat/xlit-proxy-api
ishvindersethi22 Oct 10, 2024
30a62cb
added fix for ac_enabled_stage
KunalTiwary Oct 28, 2024
085b3ea
Merge pull request #1125 from AI4Bharat/minor_fix_ac_en_stage_bbm
aparna-aa Oct 28, 2024
2c36d9e
added new project type OCRTextlineSegmentation
kartikvirendrar Nov 28, 2024
7c65424
Update project_registry.yaml
kartikvirendrar Nov 28, 2024
286b385
added minor changes
KunalTiwary Dec 4, 2024
c43552c
Update views.py
ishvindersethi22 Dec 5, 2024
c56fb8e
Update views.py
tahirjmakhdoomi Dec 5, 2024
8f7a94f
Update annotation_registry.py
tahirjmakhdoomi Dec 5, 2024
9de45a0
Update annotation_registry.py
tahirjmakhdoomi Dec 5, 2024
600b0e8
Update annotation_registry.py
tahirjmakhdoomi Dec 5, 2024
491659d
Update annotation_registry.py
tahirjmakhdoomi Dec 5, 2024
af10b5a
Update views.py
tahirjmakhdoomi Dec 5, 2024
5c6b91d
Update views.py
tahirjmakhdoomi Dec 7, 2024
068a73c
Update views.py
tahirjmakhdoomi Dec 7, 2024
2182c7a
Update views.py
tahirjmakhdoomi Dec 7, 2024
8855826
added fix for draft_data_json
KunalTiwary Dec 8, 2024
31f79a8
added changes in download
KunalTiwary Dec 9, 2024
a6a8b04
Merge pull request #1128 from AI4Bharat/back-brnch-master-ante-changes
ishvindersethi22 Dec 10, 2024
8d9fd12
added minor changes for ocr_te
KunalTiwary Dec 12, 2024
dc78e76
small bug fix
KunalTiwary Dec 13, 2024
1bf9ab0
Merge branch 'back-branch-master' into xlit-proxy-api
ishvindersethi22 Dec 13, 2024
a9d758c
Merge pull request #1127 from AI4Bharat/xlit-proxy-api
ishvindersethi22 Dec 13, 2024
7cb99c9
Added Task Analytics Cron Setup
Shanks0465 Dec 21, 2024
3b5e9df
Added Task Analytics Caching
Shanks0465 Dec 21, 2024
0aed468
Added On Start Trigger for Task Count
Shanks0465 Dec 21, 2024
2c096f2
Added Workspace Task Analytics Cron
Shanks0465 Dec 27, 2024
fddfeb5
Added freeze_task to SpeechConversation and updated assign_new_tasks
Shanks0465 Dec 31, 2024
eae43c2
Added freeze task filter to assign review and supercheck tasks
Shanks0465 Dec 31, 2024
abdc6e6
Added OCRSegmentCategorizationEditing Task Count
Shanks0465 Jan 1, 2025
fed6705
Merge pull request #1132 from AI4Bharat/reports-caching-task-count
ishvindersethi22 Jan 2, 2025
ebf59f1
Updated Task Analytics Cron to 1 hour
Shanks0465 Jan 2, 2025
c5ba18c
Merge branch 'back-branch-master' into reports-caching-task-count
ishvindersethi22 Jan 2, 2025
e08409d
Merge pull request #1134 from AI4Bharat/reports-caching-task-count
ishvindersethi22 Jan 2, 2025
d305844
Updated Task Count Cron with minute set to 0
Shanks0465 Jan 3, 2025
e8bf2e9
Merge pull request #1136 from AI4Bharat/reports-caching-task-count
ishvindersethi22 Jan 3, 2025
e219314
Delete backend/projects/migrations/0053_alter_project_project_type.py
ishvindersethi22 Jan 3, 2025
7bafe1e
Delete backend/users/migrations/0034_alter_user_is_approved.py
ishvindersethi22 Jan 3, 2025
7f71004
Merge branch 'back-branch-master' into speech-task-freeze
ishvindersethi22 Jan 3, 2025
dd36877
Merge pull request #1135 from AI4Bharat/speech-task-freeze
ishvindersethi22 Jan 3, 2025
ffe5658
Update views.py
ishvindersethi22 Jan 4, 2025
2e996ea
all comments removed
munishmangla98 Jan 16, 2025
c4e5094
all comments removed
munishmangla98 Jan 16, 2025
689861b
optimized code dataset
munishmangla98 Jan 16, 2025
872cef8
removed 5 things from dataset serializer
munishmangla98 Jan 17, 2025
00ca005
Un Comment view list
munishmangla98 Jan 18, 2025
6cd9576
data set optimized
munishmangla98 Jan 24, 2025
77032ad
data set optimized
munishmangla98 Jan 24, 2025
2 changes: 1 addition & 1 deletion backend/dataset/admin.py
@@ -1,4 +1,4 @@
-import resource
+# import resource
 from django.contrib import admin
 from import_export.admin import ImportExportActionModelAdmin
 from .resources import *
18 changes: 18 additions & 0 deletions backend/dataset/migrations/0047_speechconversation_freeze_task.py
@@ -0,0 +1,18 @@
# Generated by Django 3.2.14 on 2024-12-31 01:54

from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('dataset', '0046_merge_20240416_2233'),
    ]

    operations = [
        migrations.AddField(
            model_name='speechconversation',
            name='freeze_task',
            field=models.BooleanField(default=False, help_text='Field to Indicate whether the current task is frozen by the administrator to prevent being annotated.', verbose_name='freeze_task'),
        ),
    ]
7 changes: 7 additions & 0 deletions backend/dataset/models.py
@@ -1,6 +1,7 @@
"""
Model definitions for Dataset Management
"""

from django.db import models
from users.models import User, LANG_CHOICES
from organizations.models import Organization
@@ -485,6 +486,12 @@ class SpeechConversation(DatasetBase):
        help_text=("Prepopulated prediction for the implemented models"),
    )

    freeze_task = models.BooleanField(
        verbose_name="freeze_task",
        default=False,
        help_text="Field to Indicate whether the current task is frozen by the administrator to prevent being annotated.",
    )

    def __str__(self):
        return str(self.id)

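The commit titles mention a freeze filter for task assignment, but that filter is not part of the visible diff. As a rough sketch of what a boolean flag like `freeze_task` enables, the snippet below skips frozen rows using plain dictionaries in place of `SpeechConversation` objects. `assignable_items` is a hypothetical helper, not code from this PR; with the Django ORM the equivalent would presumably be a `filter(freeze_task=False)` on the relevant queryset.

```python
def assignable_items(items):
    """Return only the items that are not frozen.

    A missing key is treated like the model default (False).
    """
    return [item for item in items if not item.get("freeze_task", False)]


# Hypothetical rows standing in for SpeechConversation records.
rows = [
    {"id": 1, "freeze_task": False},
    {"id": 2, "freeze_task": True},
    {"id": 3},  # no key: behaves like the default
]
unfrozen_ids = [row["id"] for row in assignable_items(rows)]
```

Because the field defaults to `False`, existing rows stay assignable after the migration unless an administrator freezes them explicitly.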
15 changes: 15 additions & 0 deletions backend/dataset/serializers.py
@@ -11,6 +11,21 @@ class DatasetInstanceSerializer(serializers.ModelSerializer):
    class Meta:
        model = DatasetInstance
        fields = "__all__"


class DatasetInstanceSerializerOptimized(serializers.ModelSerializer):
    class Meta:
        model = DatasetInstance
        fields = [
            "instance_id",
            "parent_instance_id",
            "instance_name",
            "instance_description",
            "dataset_type",
            "public_to_managers",
            "organisation_id"
        ]



class DatasetInstanceUploadSerializer(serializers.Serializer):
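To illustrate what the explicit `fields` list changes compared with `fields = "__all__"`, here is a minimal sketch in plain Python. It only mimics the effect on the response payload; `project_fields` and the two `hypothetical_extra_*` keys are made up for the example and do not correspond to real `DatasetInstance` columns.

```python
OPTIMIZED_FIELDS = [
    "instance_id",
    "parent_instance_id",
    "instance_name",
    "instance_description",
    "dataset_type",
    "public_to_managers",
    "organisation_id",
]


def project_fields(record, fields):
    """Keep only the listed keys, mirroring an explicit serializer field list."""
    return {name: record[name] for name in fields if name in record}


# Hypothetical full record; the extra keys stand in for whatever else
# an "__all__" serializer would emit.
full_record = {
    "instance_id": 7,
    "parent_instance_id": None,
    "instance_name": "demo",
    "instance_description": "example dataset",
    "dataset_type": "SpeechConversation",
    "public_to_managers": True,
    "organisation_id": 1,
    "hypothetical_extra_a": "not needed by the list view",
    "hypothetical_extra_b": "not needed by the list view",
}
slim_record = project_fields(full_record, OPTIMIZED_FIELDS)
```

Any client that relied on a field outside this list would need to keep using the original serializer.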
82 changes: 76 additions & 6 deletions backend/dataset/views.py
@@ -38,6 +38,11 @@
from . import resources
from .models import *
from .serializers import *
from django.db.models import Prefetch, Q, F
from utils.dataset_utils import get_batch_dataset_upload_status
from rest_framework.response import Response
from rest_framework.decorators import action
from rest_framework import status
from .tasks import upload_data_to_data_instance, deduplicate_dataset_instance_items
import dataset
from tasks.models import (
@@ -186,6 +191,22 @@ def get_dataset_upload_status(dataset_instance_pk):


# Create your views here.
# def get_batch_dataset_upload_status(instance_ids):
#     """
#     Batch fetch upload status for a list of dataset instance IDs.
#     Replace this with actual logic to retrieve status from your database.
#     """
#     # Mock data for testing
#     status_data = {}
#     for instance_id in instance_ids:
#         status_data[instance_id] = {
#             "last_upload_status": "Completed",
#             "last_upload_date": "2023-01-01",
#             "last_upload_time": "12:00:00",
#             "last_upload_result": "Success",
#         }
#     return status_data
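The commented-out mock above returns fixed values, and the real `get_batch_dataset_upload_status` imported from `utils.dataset_utils` is not shown in this diff. The sketch below is one hypothetical way a batch lookup could work: a single pass over upload records picks the latest record per requested instance, instead of one lookup per dataset. The function name `latest_upload_status` and the record keys `status`, `result`, and `finished_at` are assumptions for illustration only.

```python
from datetime import datetime


def latest_upload_status(instance_ids, upload_records):
    """Map each requested instance ID to its most recent upload record."""
    wanted = set(instance_ids)
    latest = {}
    for record in upload_records:
        instance_id = record["instance_id"]
        if instance_id not in wanted:
            continue
        current = latest.get(instance_id)
        # Keep the record with the newest completion timestamp.
        if current is None or record["finished_at"] > current["finished_at"]:
            latest[instance_id] = record
    return {
        instance_id: {
            "last_upload_status": record["status"],
            "last_upload_date": record["finished_at"].strftime("%Y-%m-%d"),
            "last_upload_time": record["finished_at"].strftime("%H:%M:%S"),
            "last_upload_result": record["result"],
        }
        for instance_id, record in latest.items()
    }


# Hypothetical upload history for three dataset instances.
records = [
    {"instance_id": 1, "status": "Failed", "result": "Error",
     "finished_at": datetime(2024, 1, 1, 9, 0, 0)},
    {"instance_id": 1, "status": "Completed", "result": "Success",
     "finished_at": datetime(2024, 1, 2, 12, 0, 0)},
    {"instance_id": 2, "status": "Completed", "result": "Success",
     "finished_at": datetime(2024, 1, 3, 8, 30, 0)},
]
statuses = latest_upload_status([1, 3], records)
```

Instances with no upload history are simply absent from the result, which matches the `if instance_id in status_data` guard used in `list_optimized`.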

class DatasetInstanceViewSet(viewsets.ModelViewSet):
    """
    ViewSet for Dataset Instance
@@ -244,6 +265,8 @@ def retrieve(self, request, pk, *args, **kwargs):
            ),
        ],
    )


    def list(self, request, *args, **kwargs):
        # Org Owners and superusers see all datasets
        if request.user.is_superuser:
@@ -257,7 +280,6 @@ def list(self, request, *args, **kwargs):
            queryset = DatasetInstance.objects.filter(
                organisation_id=request.user.organization
            ).filter(Q(public_to_managers=True) | Q(users__id=request.user.id))

        if "dataset_visibility" in request.query_params:
            dataset_visibility = request.query_params["dataset_visibility"]
            if dataset_visibility == "all_public_datasets":
@@ -267,18 +289,15 @@ def list(self, request, *args, **kwargs):
                    queryset = queryset.filter(public_to_managers=True)
            elif dataset_visibility == "my_datasets":
                queryset = queryset.filter(users__id=request.user.id)

        # Filter the queryset based on the query params
        if "dataset_type" in dict(request.query_params):
            queryset = queryset.filter(
                dataset_type__exact=request.query_params["dataset_type"]
            )

        # Serialize the distinct items and sort by instance ID
        serializer = DatasetInstanceSerializer(
            queryset.distinct().order_by("instance_id"), many=True
        )

        # Add status fields to the serializer data
        for dataset_instance in serializer.data:
            # Get the task statuses for the dataset instance
@@ -288,14 +307,65 @@ def list(self, request, *args, **kwargs):
                dataset_instance_time,
                dataset_instance_result,
            ) = get_dataset_upload_status(dataset_instance["instance_id"])

            # Add the task status and time to the dataset instance response
            dataset_instance["last_upload_status"] = dataset_instance_status
            dataset_instance["last_upload_date"] = dataset_instance_date
            dataset_instance["last_upload_time"] = dataset_instance_time
            dataset_instance["last_upload_result"] = dataset_instance_result

        return Response(serializer.data)


    # def get_queryset(self):
    @action(detail=False, methods=["get"], url_path="optimized-list")
    def list_optimized(self, request):
        # Base queryset determination based on user role
        queryset = DatasetInstance.objects.all()
        if request.user.is_superuser:
            queryset = queryset
        elif request.user.role == User.ORGANIZATION_OWNER:
            queryset = queryset.filter(
                organisation_id=request.user.organization
            )
        else:
            queryset = queryset.filter(
                organisation_id=request.user.organization
            ).filter(Q(public_to_managers=True) | Q(users__id=request.user.id))
        # Apply filters using request query parameters
        dataset_visibility = request.query_params.get("dataset_visibility")
        if dataset_visibility == "all_public_datasets":
            queryset = queryset.filter(public_to_managers=True)
        elif dataset_visibility == "my_datasets":
            queryset = queryset.filter(users__id=request.user.id)
        dataset_type = request.query_params.get("dataset_type")
        if dataset_type:
            queryset = queryset.filter(dataset_type__exact=dataset_type)
        archived_datasets = request.query_params.get("archived_datasets")
        if archived_datasets == "true":
            queryset = queryset.filter(is_archived=True)
        elif archived_datasets == "false":
            queryset = queryset.filter(is_archived=False)
        # Sort by criteria
        sort_type = request.query_params.get("sort_type")
        if sort_type == "recently_updated":
            queryset = queryset.order_by(F("last_updated").desc(nulls_last=True))
        else:
            queryset = queryset.order_by("instance_id")
        # Optimize related field loading
        queryset = queryset.prefetch_related(
            Prefetch("users"),  # Prefetch the related users
        )
        # Serialize the data
        serializer = DatasetInstanceSerializerOptimized(queryset.distinct(), many=True)
        # Batch process upload status for all datasets
        instance_ids = [instance["instance_id"] for instance in serializer.data]
        status_data = get_batch_dataset_upload_status(instance_ids)
        # Annotate upload status in the response
        for dataset_instance in serializer.data:
            instance_id = dataset_instance["instance_id"]
            if instance_id in status_data:
                dataset_instance.update(status_data[instance_id])
        return Response(serializer.data, status=status.HTTP_200_OK)
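The `recently_updated` branch orders by `F("last_updated").desc(nulls_last=True)`, so newest datasets come first and rows without a timestamp go to the end. A small plain-Python sketch of the same ordering rule, using hypothetical in-memory rows instead of a queryset (`sort_recently_updated` is an illustrative name, not part of the PR):

```python
from datetime import datetime


def sort_recently_updated(rows):
    """Sort newest first, placing rows whose timestamp is None last."""
    return sorted(
        rows,
        # The first tuple element ranks rows with a timestamp above rows
        # without one; the second orders timestamps. reverse=True then
        # yields newest first, with None values at the end.
        key=lambda row: (
            row["last_updated"] is not None,
            row["last_updated"] or datetime.min,
        ),
        reverse=True,
    )


rows = [
    {"instance_id": 1, "last_updated": datetime(2024, 1, 1)},
    {"instance_id": 2, "last_updated": datetime(2025, 1, 1)},
    {"instance_id": 3, "last_updated": None},
]
ordered_ids = [row["instance_id"] for row in sort_recently_updated(rows)]
```

Without the nulls-last rule, where never-updated rows land in a descending sort depends on the database backend, so stating it explicitly keeps the listing order predictable.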


    @is_organization_owner
    @action(methods=["GET"], detail=True, name="Download Dataset in CSV format")