Deploying the uploader to the cloud

While there is no end-to-end solution that you can deploy straight onto the cloud, the iridauploader does allow you to use its modules to simplify your code for cloud deployment.

Why can't I just deploy straight to the cloud?

The main difficulty is that each cloud storage solution maintains files differently, and it would not be feasible for us to support every cloud environment available.

How to deploy to the cloud

The simplest way is to incorporate the iridauploader modules from pip / PyPI.

pip install iridauploader

Example of creating a new instance of the API and a MiSeq parser:

import iridauploader.api as api
import iridauploader.parsers as parsers

api_instance = api.ApiCalls(client_id, client_secret, base_url, username, password, max_wait_time)
parser_instance = parsers.parser_factory("miseq")
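
In a cloud deployment, these parameters are usually supplied through application settings rather than a local config file. Below is a minimal sketch of loading them from environment variables; the variable names and the max_wait_time value are assumptions for illustration:

import os

import iridauploader.api as api

# illustrative variable names; set these in your deployment's application settings
client_id = os.environ["IRIDA_CLIENT_ID"]
client_secret = os.environ["IRIDA_CLIENT_SECRET"]
base_url = os.environ["IRIDA_BASE_URL"]
username = os.environ["IRIDA_USERNAME"]
password = os.environ["IRIDA_PASSWORD"]
max_wait_time = 20  # assumed value; tune for your environment

api_instance = api.ApiCalls(client_id, client_secret, base_url, username, password, max_wait_time)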

Examples for deployment on Azure Cloud

In these examples we have the following setup:

* We are using an Azure Function App using Python
* Files are stored in blob storage containers (in our example, myblobcontainer)
* We use a BlobTrigger to run when a new run is uploaded, with the path identifier myblobcontainer/{name}.csv

Example function.json file:

{
  "scriptFile": "__init__.py",
  "disabled": false,
  "bindings": [
      {
          "name": "myblob",
          "type": "blobTrigger",
          "direction": "in",
          "path": "myblobcontainer/{name}.csv",
          "connection":"AzureWebJobsStorage"
      }
  ]
}

For the following examples, we have this simple setup at the top of our __init__.py function app file.

import logging
import os
import posixpath

from azure.storage.blob import BlobServiceClient
from azure.storage.blob import BlobClient
from azure.storage.blob import ContainerClient
import azure.functions as func

from iridauploader import parsers


# connect to our blob storage
connect_str = os.getenv('AZURE_STORAGE_CONNECTION_STRING')
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
# the container name could be fetched dynamically, but a hard-coded value works for this example
container_name = "myblobcontainer"
container_client = blob_service_client.get_container_client(container_name)
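
The examples below also write the downloaded sample sheet to a local staging path using local_path and local_file_name, which are not defined by the trigger itself. Here is a minimal sketch of one way to define them, assuming the function can write to local temporary storage (both names are only illustrative):

import tempfile

# stage downloaded files in a temporary directory
local_path = tempfile.mkdtemp()
local_file_name = "SampleSheet.csv"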

MiSeq example

For this example, we will be fetching the entire folder of a MiSeq run as a set of blobs. When parsing runs from other sequencers, please consult the parser documentation for differences in file structure.

def main(myblob: func.InputStream):
    logging.info('Python blob trigger function %s', myblob.name)

    # download the sample sheet so it can be parsed
    download_sample_sheet_file_path = os.path.join(local_path, local_file_name)
    with open(download_sample_sheet_file_path, "wb") as download_file:
        download_file.write(myblob.read())
    logging.info("done downloading")

    # get run directory (getting the middle portion)
    # example 'myblobcontainer/miseq_run/SampleSheet.csv' -> 'miseq_run'
    run_directory_name = posixpath.split(posixpath.split(myblob.name)[0])[1]

    # we are going to use the miseq parser for this example
    my_parser = parsers.parser_factory("miseq")
    logging.info("built parser")

    # This example was tested locally on a Windows machine, so replacing \\ with / was needed for compatibility
    relative_data_path = my_parser.get_relative_data_directory().replace("\\", "/")
    full_data_dir = posixpath.join(
        run_directory_name, 
        relative_data_path)

    # list the blobs of the run directory
    blob_list = list(container_client.list_blobs(full_data_dir))
    file_list = []
    # The file_blob_tuple_list could be useful when moving to the uploading stage in the case where
    # you do not want to use the iridauploader.api module to upload to IRIDA; otherwise it can be ignored
    file_blob_tuple_list = []
    for file_blob in blob_list:
        file_name = remove_prefix(file_blob.name, full_data_dir)
        file_list.append(file_name)
        file_blob_tuple_list.append({"file_name": file_name, "blob": file_blob})

    # TODO: wrap this in a try/except with the parser exceptions (see the sketch after this example).
    # We can catch errors within the sample sheet or missing files here
    sequencing_run = my_parser.get_sequencing_run(
        sample_sheet=download_sample_sheet_file_path, 
        run_data_directory=full_data_dir, 
        run_data_directory_file_list=file_list)
    logging.info("built sequencing run")

    # move to upload / error handling when the parser finds an error in the run


def remove_prefix(text, prefix):
    if text.startswith(prefix):
        return text[len(prefix):]
    raise Exception("Expected '{}' to start with prefix '{}'".format(text, prefix))
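
As a sketch of the TODO above, the parsing step could be wrapped like the following. The exact exception classes raised by the parsers vary by parser and version, so consult the parser documentation; catching the broad Exception here is only a placeholder:

try:
    sequencing_run = my_parser.get_sequencing_run(
        sample_sheet=download_sample_sheet_file_path,
        run_data_directory=full_data_dir,
        run_data_directory_file_list=file_list)
except Exception as e:  # placeholder for the parser-specific exceptions
    logging.error("Failed to build sequencing run: %s", e)
    raise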

Directory example

In this example we will be using the basic file layout for a directory upload.

.directory_run
├── file_1.fastq.gz
├── file_2.fastq.gz
└── SampleList.csv

def main(myblob: func.InputStream):
    logging.info('Python blob trigger function %s', myblob.name)

    # download the sample sheet
    download_sample_sheet_file_path = os.path.join(local_path, local_file_name)
    with open(download_sample_sheet_file_path, "wb") as download_file:
        download_file.write(myblob.read())
    logging.info("done downloading")

    # get run directory (getting the middle portion)
    # example 'myblobcontainer/directory_run/SampleSheet.csv' -> 'directory_run'
    run_directory_name = posixpath.split(posixpath.split(myblob.name)[0])[1]

    # we are going to use the directory parser for this example
    my_parser = parsers.parser_factory("directory")
    logging.info("built parser")

    # list the blobs of the run directory
    blob_list = list(container_client.list_blobs(run_directory_name))
    file_list = []
    # The file_blob_tuple_list could be useful when moving to the uploading stage in the case where
    # you do not want to use the iridauploader.api module to upload to IRIDA; otherwise it can be ignored
    file_blob_tuple_list = []
    for file_blob in blob_list:
        file_name = remove_prefix(file_blob.name, run_directory_name)
        # trim the leading '/'
        file_name = file_name.lstrip("/")
        file_list.append(file_name)
        file_blob_tuple_list.append({"file_name": file_name, "blob": file_blob})

    # TODO: wrap this in a try/except with the parser exceptions, as sketched in the MiSeq example above.
    # We can catch errors within the sample sheet or missing files here
    sequencing_run = my_parser.get_sequencing_run(
        sample_sheet=download_sample_sheet_file_path, 
        run_data_directory_file_list=file_list)
    logging.info("built sequencing run")

    # move to upload / error handling when the parser finds an error in the run


def remove_prefix(text, prefix):
    if text.startswith(prefix):
        return text[len(prefix):]
    raise Exception("Expected '{}' to start with prefix '{}'".format(text, prefix))
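
After a sequencing run has been parsed, the read files themselves are still in blob storage. Below is a minimal sketch of how the file_blob_tuple_list built in the examples above could be used to stage the run files on local disk before uploading; the helper name and staging directory parameter are assumptions for illustration:

def stage_files_locally(container_client, file_blob_tuple_list, staging_dir):
    # download each blob in the run to a local staging directory
    for entry in file_blob_tuple_list:
        local_file_path = os.path.join(staging_dir, entry["file_name"])
        with open(local_file_path, "wb") as f:
            downloader = container_client.download_blob(entry["blob"].name)
            f.write(downloader.readall())

From there, an api_instance like the one shown earlier can be used to send the run to IRIDA; consult the iridauploader.api module documentation for the upload calls themselves.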