This is the Python Client for the Sama API endpoints and Databricks Connector.
See our Python SDK tutorial and Databricks Connector tutorial.
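```python
from sama import Client

client = Client("your_api_key")
# Create a batch of tasks; each dictionary holds the inputs for one task
client.create_task_batch("project_id", [{"url": "https://yoururl.com/img.jpg", "input2": "value2"}])
# Fetch the tasks delivered since an RFC3339 timestamp
client.get_delivered_tasks("project_id", from_timestamp="2023-09-02T10:23:36.536167366Z")
```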
```python
from sama.databricks import Client

client = Client("your_api_key")
# spark_df contains inputs to tasks
client.create_task_batch_from_table("project_id", spark_df)
# spark is an existing SparkSession; deliveries are returned as a DataFrame
spark_df = client.get_delivered_tasks_to_table(spark, "project_id", from_timestamp="2023-09-02T10:23:36.536167366Z")
```
This class provides methods to interact with SamaHub API endpoints.
This method is the constructor to initialize the SamaHub API client.
- `api_key (str)`: The API key to use for authentication.
- `silent (bool, optional)`: Whether to suppress all print/log statements. Defaults to `True`.
- `logger (Logger, optional)`: The logger to use for logging. Defaults to `None`.
- `log_level (int, optional)`: The log level to use for logging. Defaults to `logging.INFO`.
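A minimal sketch of preparing a logger for the constructor, assuming the `logger` and `log_level` parameters accept a standard `logging.Logger` and a standard level constant, as the descriptions above suggest. The logger name `sama_client`, the helper `build_logger`, and the format string are illustrative choices. The `Client(...)` call is left commented out because it needs the package installed and a real API key; `silent=False` is presumably required for log output to appear.

```python
import logging


def build_logger(name: str = "sama_client", level: int = logging.DEBUG) -> logging.Logger:
    """Return a logger that writes timestamped records to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = build_logger()
logger.debug("logger configured")

# With the package installed, the logger would be passed in like this:
# from sama import Client
# client = Client("your_api_key", silent=False, logger=logger, log_level=logging.DEBUG)
```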
This method creates a batch of tasks on SamaHub using the asynchronous batch task creation API endpoints, specifically the tasks file upload approach.
Parameters:
- `proj_id (str)`: The project ID on SamaHub where tasks will be created.
- `task_data_records (List[Dict[str, Any]])`: A list of task "data" dictionaries, which can contain inputs and pre-annotations.
- `batch_priority (int, default=0)`: The priority of the batch. A negative number indicates a higher priority.
- `notification_email (Union[str, None], default=None)`: An email address where SamaHub will send notifications about the batch creation status.
- `submit (bool, default=False)`: A flag determining whether to create the tasks in a submitted state.
Returns:
- A JSON response from the last "create a batch of tasks from an uploaded file" endpoint call.
The method first constructs a tasks list with data, priority, and submission status. It then calls the 'create a batch of tasks' endpoint without providing the actual tasks list. This initial call returns a batch ID and a tasks_put_url (a pre-signed AWS S3 URL) to which the tasks can be uploaded directly. Uploading the tasks this way bypasses the 1000-task limit. After the upload to S3, a subsequent API call notifies the system that the tasks are uploaded and starts the task creation process.
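The first step described above (wrapping each data record with a priority and a submission flag) can be sketched as follows. The key names `data`, `priority`, and `submit` and the helper `build_tasks_payload` are assumptions made for illustration, not the client's confirmed wire format. The endpoint calls and the S3 upload are omitted.

```python
import json
from typing import Any, Dict, List


def build_tasks_payload(
    task_data_records: List[Dict[str, Any]],
    batch_priority: int = 0,
    submit: bool = False,
) -> List[Dict[str, Any]]:
    """Wrap each task "data" dictionary with the batch-level priority and submit flag.

    The key names used here are hypothetical.
    """
    return [
        {"data": record, "priority": batch_priority, "submit": submit}
        for record in task_data_records
    ]


records = [{"url": "https://yoururl.com/img.jpg", "input2": "value2"}]
payload = build_tasks_payload(records, batch_priority=-1)  # negative means higher priority

# Under these assumptions, the serialized list is the body that would be
# uploaded to the pre-signed URL in a single request.
body = json.dumps(payload)
print(body)
```

Building the whole list up front and uploading it as one file is what lets the batch size exceed what a single request body to the API would allow.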
Cancels an ongoing batch creation job.
Parameters:
- `proj_id (str)`: The project ID on SamaHub where the batch exists.
- `batch_id (str)`: The ID of the batch to cancel.
Returns:
- Response from the API endpoint.
Rejects a task on SamaHub to send it back for rework.
Parameters:
- `proj_id (str)`: The project ID on SamaHub where the task exists.
- `task_id (str)`: The ID of the task to reject.
- `reasons (List[str])`: List of reasons for rejecting the task.
Returns:
- Response from the API endpoint.
Updates the priority of tasks.
Parameters:
- `project_id (str)`: The project ID on SamaHub where the tasks exist.
- `task_ids (List[str])`: The IDs of the tasks whose priority should be updated.
- `priority (int)`: The new priority for the tasks.
Returns:
- Response from the API endpoint.
Deletes tasks.
Parameters:
- `project_id (str)`: The project ID on SamaHub where the tasks exist.
- `task_ids (List[str])`: The IDs of the tasks to delete.
Returns:
- Response from the API endpoint.
Fetches the status and details of a single task. More details can be found in the Sama documentation.
Parameters:
- `proj_id`: The project ID on SamaHub.
- `task_id`: The ID of the task to fetch.
- `same_as_delivery (default=True)`: Whether to fetch the task as it would be delivered.
Returns:
- Task details from the API.
Fetches status and details for multiple tasks. Returns a generator object. More details can be found in the Sama documentation.
Parameters:
- `proj_id (str)`: The unique identifier of the project on SamaHub. This parameter specifies the project under which the tasks reside.
- `batch_id (str, optional)`: The identifier for a batch within the project. If provided, it filters tasks that belong to this specific batch.
- `client_batch_id (str, optional)`: The client-specific identifier for a batch. This is useful for filtering tasks based on client-defined batches.
- `client_batch_id_match_type (str, optional)`: Specifies how the `client_batch_id` should be matched. Common options might include "exact" or "contains".
- `date_type (str, optional)`: Determines which date to use for the timestamp filters. Examples might include "creation_date" or "completion_date".
- `from_timestamp (str, optional)`: Filters tasks that have a date (specified by `date_type`) after this timestamp.
- `to_timestamp (str, optional)`: Filters tasks that have a date (specified by `date_type`) before this timestamp.
- `state (TaskStates, optional)`: An enum value that specifies the desired status of the tasks to filter. For example, "delivered" or "acknowledged".
- `omit_answers (bool, optional)`: Flag to determine if answers related to tasks should be omitted from the response. Defaults to True.
Returns:
- An iterable generator object with task details.
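Because the result is a generator, tasks can be consumed lazily rather than loaded all at once. The sketch below uses a stand-in generator, `fake_task_stream`, in place of the client call (the task dictionaries and their `id` and `state` keys are invented for illustration). It shows two common patterns: taking only the first few items, and processing the stream in fixed-size chunks.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def fake_task_stream(n: int) -> Iterator[Dict[str, Any]]:
    """Stand-in for the generator returned by the client (hypothetical task shape)."""
    for i in range(n):
        yield {"id": f"task-{i}", "state": "delivered"}


def chunked(items: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most `size` items without materializing the whole stream."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Take only the first three tasks; the remaining ones are never generated.
first_three = list(islice(fake_task_stream(10), 3))

# Process a stream of seven tasks in chunks of three.
chunk_sizes = [len(chunk) for chunk in chunked(fake_task_stream(7), 3)]
print(first_three, chunk_sizes)
```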
Get all task deliveries since a given timestamp (RFC3339 format).
Parameters:
- `proj_id (str)`: The unique identifier of the project on SamaHub. It specifies the project under which the deliveries reside.
- `batch_id (str, optional)`: The identifier for a batch within the project. If provided, it filters deliveries that belong to this specific batch.
- `client_batch_id (str, optional)`: The client-specific identifier for a batch. This is useful for filtering deliveries based on client-defined batches.
- `client_batch_id_match_type (str, optional)`: Specifies how the `client_batch_id` should be matched. Common options might include "exact" or "contains".
- `from_timestamp (str, optional)`: Filters deliveries that have a date after this timestamp.
- `task_id (str, optional)`: The unique identifier for a specific task. If provided, it fetches deliveries related to this specific task.
Returns:
- An iterable generator object with task deliveries.
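A `from_timestamp` value in RFC3339 format can be produced with the standard library. This is a sketch: `to_rfc3339` is a hypothetical helper, and it emits microsecond precision, which is valid RFC3339 but coarser than the nanosecond example shown earlier. Whether the API treats the two precisions identically is an assumption.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC3339 UTC timestamp ending in 'Z'."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


# Timestamp for "24 hours ago", suitable as a from_timestamp argument.
since = to_rfc3339(datetime.now(timezone.utc) - timedelta(hours=24))
print(since)

# A fixed instant, for comparison with the example timestamp used earlier.
fixed = to_rfc3339(datetime(2023, 9, 2, 10, 23, 36, 536167, tzinfo=timezone.utc))
print(fixed)
```

Rejecting naive datetimes avoids silently interpreting local time as UTC.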
Fetches all task deliveries since the last call based on a consumer token.
Parameters:
- `proj_id (str)`: The unique identifier of the project on SamaHub. It specifies the project under which the deliveries reside.
- `batch_id (str, optional)`: The identifier for a batch within the project. If provided, it filters deliveries that belong to this specific batch.
- `client_batch_id (str, optional)`: The client-specific identifier for a batch. This is useful for filtering deliveries based on client-defined batches.
- `client_batch_id_match_type (str, optional)`: Specifies how the `client_batch_id` should be matched. Common options might include "exact" or "contains".
- `consumer (str, optional)`: Token that identifies the caller, so different consumers can be in different places of the delivered tasks list.
Returns:
- An iterable generator object with task deliveries.
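The consumer token can be pictured as a named bookmark into the list of delivered tasks. The toy model below is not the SamaHub implementation; `DeliveryLog` and its methods are hypothetical and exist only to show how two consumers can read the same list from different positions.

```python
from typing import Dict, List


class DeliveryLog:
    """In-memory model of a delivery list with one read position per consumer."""

    def __init__(self) -> None:
        self._deliveries: List[str] = []
        self._positions: Dict[str, int] = {}

    def deliver(self, task_id: str) -> None:
        """Append a newly delivered task to the list."""
        self._deliveries.append(task_id)

    def since_last_call(self, consumer: str) -> List[str]:
        """Return deliveries this consumer has not seen yet, then advance its position."""
        start = self._positions.get(consumer, 0)
        new_items = self._deliveries[start:]
        self._positions[consumer] = len(self._deliveries)
        return new_items


log = DeliveryLog()
log.deliver("task-1")
log.deliver("task-2")
first_for_a = log.since_last_call("consumer-a")   # both deliveries so far
log.deliver("task-3")
second_for_a = log.since_last_call("consumer-a")  # only the new delivery
first_for_b = log.since_last_call("consumer-b")   # a new consumer starts from the beginning
print(first_for_a, second_for_a, first_for_b)
```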
Fetches information about a batch creation job.
Parameters:
- `proj_id`: The project ID on SamaHub.
- `batch_id`: The ID of the batch to fetch details for.
- `omit_failed_task_data (default=False)`: Whether to omit data about failed tasks.
Returns:
- Batch creation job details.
Fetches the JSON schema for task creation on SamaHub.
Parameters:
- `project_id (str)`: The project ID on SamaHub.
Returns:
- A dictionary containing the JSON schema for task creation.
Fetches the JSON schema for task deliveries on SamaHub.
Parameters:
- `project_id (str)`: The project ID on SamaHub.
Returns:
- A dictionary containing the JSON schema for task deliveries.
Fetches high-level information about a project from SamaHub.
Parameters:
- `project_id (str)`: The project ID on SamaHub.
Returns:
- A dictionary containing information about the project.
Fetches high-level statistics about a project's tasks within a specified time frame from SamaHub.
Parameters:
- `project_id (str)`: The project ID on SamaHub.
- `from_timestamp (str, optional)`: Filters tasks that have a date after this timestamp.
- `to_timestamp (str, optional)`: Filters tasks that have a date before this timestamp.
Returns:
- A dictionary containing project statistics.
Creates a batch of tasks using data from a DataFrame. Each DataFrame column is used as an input to task creation, e.g. url='https://wiki.com/img.jpg', client_batch_id='batch1'. Prepend 'output_' to a column name to specify pre-annotations.
Parameters:
- `spark_dataframe (DataFrame)`: The Spark DataFrame to be converted to task data records.
- `project_id (str)`: The project ID on SamaHub where tasks are to be created.
- `batch_priority (int)`: The priority of the batch. Defaults to 0. Negative numbers indicate higher priority.
- `notification_email (Union[str, None])`: The email address where SamaHub should send notifications about the batch creation status. Defaults to None.
- `submit (bool)`: Whether to create the tasks in submitted state. Defaults to False.
Returns:
- JSON containing the batch_id if successful.
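The column convention can be illustrated without Spark by treating each row as a plain dictionary. In this sketch, `split_row` is a hypothetical helper that separates ordinary input columns from `output_`-prefixed pre-annotation columns. How the connector actually nests pre-annotations inside a task record is not documented here, so the returned structure and the choice to strip the prefix are assumptions.

```python
from typing import Any, Dict, Tuple

PREFIX = "output_"


def split_row(row: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split one row into (inputs, pre_annotations) based on the column-name prefix."""
    inputs: Dict[str, Any] = {}
    pre_annotations: Dict[str, Any] = {}
    for column, value in row.items():
        if column.startswith(PREFIX):
            pre_annotations[column[len(PREFIX):]] = value  # strip the prefix
        else:
            inputs[column] = value
    return inputs, pre_annotations


row = {
    "url": "https://wiki.com/img.jpg",
    "client_batch_id": "batch1",
    "output_label": "cat",  # hypothetical pre-annotation column
}
inputs, pre_annotations = split_row(row)
print(inputs, pre_annotations)
```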
Fetches all deliveries since a given timestamp (RFC3339 format) and returns them as a DataFrame.
Parameters:
- `spark (SparkSession)`: A Spark session.
- `project_id (str)`: The unique identifier of the project on SamaHub. Specifies the project under which the deliveries reside.
- `batch_id (str, optional)`: The identifier for a batch within the project. If provided, filters deliveries that belong to this batch.
- `client_batch_id (str, optional)`: The client-specific identifier for a batch. Useful for filtering deliveries based on client-defined batches.
- `client_batch_id_match_type (str, optional)`: Specifies how the `client_batch_id` should be matched. Common options might include "exact" or "contains".
- `from_timestamp (str, optional)`: Filters deliveries that have a date after this timestamp.
- `task_id (str, optional)`: The unique identifier for a specific task. If provided, fetches deliveries related to this specific task.
Returns:
- A DataFrame containing the deliveries.
Fetches all deliveries since the last call based on a consumer token.
Parameters:
- `spark (SparkSession)`: A Spark session.
- `project_id (str)`: The unique identifier of the project on SamaHub. Specifies the project under which the deliveries reside.
- `batch_id (str, optional)`: The identifier for a batch within the project. If provided, filters deliveries that belong to this batch.
- `client_batch_id (str, optional)`: The client-specific identifier for a batch. Useful for filtering deliveries based on client-defined batches.
- `client_batch_id_match_type (str, optional)`: Specifies how the `client_batch_id` should be matched. Common options might include "exact" or "contains".
- `consumer (str, optional)`: Token that identifies the caller, so different consumers can be in different places of the delivered tasks list.
Returns:
- A DataFrame containing the deliveries.
Fetches the status and details of a single task. More details can be found in the Sama documentation.
Parameters:
- `spark (SparkSession)`: A Spark session.
- `project_id`: The project ID on SamaHub.
- `task_id`: The ID of the task to fetch.
- `same_as_delivery (default=True)`: Whether to fetch the task as it would be delivered.
Returns:
- A DataFrame containing the task status
Fetches status and details for multiple tasks. More details can be found in the Sama documentation.
Parameters:
- `spark (SparkSession)`: A Spark session.
- `project_id (str)`: The unique identifier of the project on SamaHub. This parameter specifies the project under which the tasks reside.
- `batch_id (str, optional)`: The identifier for a batch within the project. If provided, it filters tasks that belong to this specific batch.
- `client_batch_id (str, optional)`: The client-specific identifier for a batch. This is useful for filtering tasks based on client-defined batches.
- `client_batch_id_match_type (str, optional)`: Specifies how the `client_batch_id` should be matched. Common options might include "exact" or "contains".
- `date_type (str, optional)`: Determines which date to use for the timestamp filters. Examples might include "creation_date" or "completion_date".
- `from_timestamp (str, optional)`: Filters tasks that have a date (specified by `date_type`) after this timestamp.
- `to_timestamp (str, optional)`: Filters tasks that have a date (specified by `date_type`) before this timestamp.
- `state (TaskStates, optional)`: An enum value that specifies the desired status of the tasks to filter. For example, "delivered" or "acknowledged".
- `omit_answers (bool, optional)`: Flag to determine if answers related to tasks should be omitted from the response. Defaults to True.
Returns:
- A DataFrame with tasks and their status