Apps

An 'App' is a versioned program that allows you to run code on ByteNite.

Your app can range from a single function to a complex system with multiple libraries and functions. Currently, we only support Python apps.

This guide details the components of an app's directory and explains where and how to write the code that gets embedded into a ByteNite app.

App Directory Overview

To create a new app directory in your local environment, execute the following ByteNite Dev CLI command:

bytenite app new [app-id]

A sample folder with pre-populated files and fields will be generated at your current path:

/[app-id]
├── app
│   ├── runner.py
│   ├── main.py
│   └── [libraries]
├── manifest.json
└── schema.json

Directory Structure:

  • app/runner.py: the main script implementing your app's core functionality (see Develop the Main Script below).

  • app/main.py: the app's entrypoint, as declared in the manifest's entrypoint field.

  • app/[libraries]: any additional libraries or modules invoked by runner.py.

  • manifest.json: the app's configuration, requirements, ID, and version information (see Initialize App Settings below).

  • schema.json: the JSON Schema used to validate app parameters (see Require Parameter Validation below).

Initialize App Settings: manifest.json

The app's manifest outlines your app's configuration, requirements, ID, and version information. Many applications depend heavily on the underlying hardware or container, so tailoring this configuration file to your app's needs will significantly affect how smoothly it runs.

While the manifest can vary between versions, allowing you to adjust configurations as your app evolves, the ID is permanently assigned during your initial app upload.

Below is an example configuration file, including ID, version, platform, platform config, and device requirements:

manifest.json
{
  "id": "my-first-stable-diffusion-app",
  "version": "0.4.rev0",
  "platform": "docker",
  "description": "A stable diffusion app using HuggingFace's diffusers",
  "entrypoint": "main.py",
  "platform_config": {
    "container": "huggingface/diffusers-pytorch-cuda:latest"
  },
  "device_requirements": {
    "min_cpu": 4,
    "min_memory": 8
  }
}

App Versioning

The manifest fields that control your app's ID and version are id and version. Use these fields to manage multiple uploads and maintain consistency across your deployments.

id string

Description:

The unique ID of your app, permanently assigned during the initial upload.

version string

Description:

A semantic version for managing updates.

Supported Format:

"[major].[minor].rev[revision]"

  • major int

  • minor int

  • revision int

Example:

"0.1.rev2"

Platform & Hardware Requirements

ByteNite integrates with container images pulled from the Docker container registry (Docker Hub), covering the most common container types.

The platform and platform_config fields let you choose the platform type and the container image reference that will be pulled and used by your app.

The device_requirements field lets you specify hardware requirements for your app. All the hardware specifications you can control on ByteNite live inside this field.

platform string

Description:

The platform required by the app to run.

Supported Values:

  • "docker"

platform_config object

Description:

An object containing platform configurations.

Supported Properties:

  • container string A Docker container image reference. Refer to the official Docker documentation to select the most suitable base image for your application. Examples:

    • "python:3.8-alpine"

    • "tensorflow/tensorflow:latest-gpu"

    • "blender/blender:latest"

  • private_image boolean Set this flag to true if the image repository is private.

  • username string Your Docker Hub username (not required if private_image is false).

  • token string Your Docker Hub Personal Access Token (PAT) (not required if private_image is false).

Example:

{
    "container": "huggingface/transformers-pytorch-cpu:latest",
    "private_image": true,
    "username": "4925k",
    "token": "dckr_pat_HgNOmERVLDm1YBSvAJELJeGOOAM"
}
device_requirements object

Description:

Minimum hardware requirements of the devices that will run the app.

Supported Properties:

  • min_cpu int The minimum number of vCPUs required for the machine running the app.

  • min_memory int The minimum amount of RAM in GiB required for the machine running the app.

Example:

{"min_cpu": 2, "min_memory": 2}

Additionally, you can check out a complete list of app metadata as returned by the API endpoint in the Apps API reference.


Develop the Main Script: runner.py

The runner.py script implements your app’s core functionality: reading input chunks, applying your logic, and producing an output. This script operates within a standardized framework to ensure smooth integration with ByteNite’s system.

Central to this script is the run function, where you will develop four parts:

1. Reading Inputs

2. Handling Parameters

3. Developing the Core Functionality

4. Saving Outputs

The following paragraphs contain a description and breakdown of the run function. Let's start by analyzing the arguments:

runner.py
def run(basepath, params, metadata):
basepath string

Description:

The default base path of the task runner environment. Use this variable to access the input file and to write outputs:

  • Input file location: os.path.join(basepath, 'data.bin')

  • Output directory location: os.path.join(basepath, 'result')

params dictionary

Description:

A dictionary containing the app parameters passed with the Create Job request (the contents of params.app; see Handling Parameters below).

metadata dictionary

Description:

A dictionary containing the data type and other relevant metadata.

1. Reading Inputs

Your run function should start by reading the input file, data.bin, located in the task directory. This file contains the raw binary input data for your task, as passed by your Partitioning Engine. If you're handling common types like strings, vectors, tables, or media, remember to decode them before moving to the next step. For instance:

runner.py > run
    # 1. Reading Inputs
    
    # Read the raw input data from basepath/data.bin
    with open(os.path.join(basepath, 'data.bin'), 'rb') as f:
        data = f.read()
        
    # Transform and process data accordingly. 
    data_decoded = data.decode('utf-8') # Example assuming the binary data chunk is a UTF-8 encoded string

If your app doesn't need to handle any input file, you may skip this step.

2. Handling Parameters

When your application receives parameters, they can be accessed through the params variable. This dictionary reflects the data provided in a Create Job request at params.app. For example, if your API request includes the following params object, everything inside params.app denotes the job parameters:

params
{
    "app": {
        "case": "lower"
    },
    "partitioner": {...},
    "assembler": {...}
}

The params.app content will be assigned to your params variable, which you can use as in this example:

runner.py > run
    # 2. Handling Parameters
    
    # Use parameters as expected in your app params object.
    case = params['case'] # Example assuming there is a "case" key in the params

3. Developing the Core Functionality

The primary functionality of your app should also be implemented within the run function. You can extend it through external libraries placed in the app folder and invoked from the run function.

runner.py > run
    # 3. Developing the Core Functionality
    
    # Develop your app's core processing functionality
    # For example, processing an input string according to the parameter 'case':
    if case == "upper":
        data_processed = data_decoded.upper()
    elif case == "lower":
        data_processed = data_decoded.lower()
    elif case == "title":
        data_processed = data_decoded.title()
    else:
        data_processed = data_decoded

4. Saving Outputs

Store the processed result as expected by your Assembling Engine in the output directory, os.path.join(basepath, OUTPUT_PATH). If you're using the passthrough engine, this will be your final output file. Example:

runner.py > run
    # 4. Saving Outputs
    
    # Save your output file at OUTPUT_PATH (the 'result' directory) within your basepath
    output_path = os.path.join(basepath, OUTPUT_PATH, 'hello_world.txt')
    with open(output_path, 'w') as f:
        f.write(data_processed)
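
Putting the four steps together, here is a minimal end-to-end sketch of the run function based on the snippets above. It assumes OUTPUT_PATH is the 'result' directory mentioned earlier and creates it if missing; your generated template may already define these details:

runner.py
import os

OUTPUT_PATH = 'result'  # Assumed to match the output directory described above

def run(basepath, params, metadata):
    # 1. Reading Inputs: load the raw binary chunk and decode it as UTF-8 text
    with open(os.path.join(basepath, 'data.bin'), 'rb') as f:
        data_decoded = f.read().decode('utf-8')

    # 2. Handling Parameters: read the "case" key from the job's app parameters
    case = params['case']

    # 3. Core Functionality: transform the string according to "case"
    if case == "upper":
        data_processed = data_decoded.upper()
    elif case == "lower":
        data_processed = data_decoded.lower()
    elif case == "title":
        data_processed = data_decoded.title()
    else:
        data_processed = data_decoded

    # 4. Saving Outputs: write the result where the Assembling Engine picks it up
    output_dir = os.path.join(basepath, OUTPUT_PATH)
    os.makedirs(output_dir, exist_ok=True)  # Assumption: create the folder if missing
    with open(os.path.join(output_dir, 'hello_world.txt'), 'w') as f:
        f.write(data_processed)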


Require Parameter Validation: schema.json

Receiving parameters in your app that are already 'validated', meaning they comply with a JSON Schema, can help prevent key-not-found errors or other issues arising from malformed parameter objects.

The schema.json file within your app directory contains your JSON schema. Input app parameters submitted to the Jobs API will be validated against this schema before the job can start.

Here is an example of a JSON schema that requires a single input string with a maximum length of 50 characters:

schema.json
{
  "$id": "db://image-generation-simple-inputs",
  "definitions": {
    "input": {
      "type": "string",
      "description": "A prompt for image generation",
      "maxLength": 50
    }
  },
  "properties": {
    "prompt": {
      "$ref": "#/definitions/input"
    }
  },
  "required": ["prompt"],
  "title": "Stable Diffusion App Schema"
}
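
For instance, a Create Job request carrying the following app parameters would pass validation against this schema; a missing prompt, or one longer than 50 characters, would be rejected:

params
{
    "app": {
        "prompt": "A lighthouse at dawn"
    },
    "partitioner": {...},
    "assembler": {...}
}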

Another benefit of schemas is that they generate a graphical interface in your Job Launch UI console, allowing you or your users to set up and launch jobs easily.
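
You can also sanity-check your schema locally before uploading. Here is a minimal sketch using the third-party jsonschema package (an illustration, not part of the ByteNite tooling):

import json
from jsonschema import ValidationError, validate  # pip install jsonschema

with open('schema.json') as f:
    schema = json.load(f)

try:
    # Validate a sample of the app parameters you'd submit to the Jobs API
    validate(instance={"prompt": "A lighthouse at dawn"}, schema=schema)
    print("Parameters are valid")
except ValidationError as err:
    print(f"Invalid parameters: {err.message}")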


💡 App Development Tips

Here are a few tips to get the most out of your ByteNite apps:

  • When you write code for your app, it will be executed by distributed task runners. This means your app’s code will run on each worker machine as a separate task based on your data partitioning setup. To ensure efficient execution, provide only the core logic that should run on each worker node. Do not include custom distributed computing logic—task distribution and resource orchestration are fully handled by ByteNite’s load-balancing system.

  • However, within each worker node, you can utilize multiple CPU cores to take advantage of multi-core architectures (see the sketch after this list).

  • Follow the guidelines in the 1. Reading Inputs and 4. Saving Outputs sections to manage data properly. Ensure your data sources are correctly set up: custom data implementations can cause errors or inefficiencies, lengthen your container runtimes, and increase your costs.

  • We recommend incorporating robust error handling and logging to debug your apps efficiently.
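
As referenced in the multi-core tip above, here is a minimal sketch using Python's standard multiprocessing module. The process_item function is a hypothetical stand-in for your app's per-item logic, and the pattern is optional:

import os
from multiprocessing import Pool

def process_item(item):
    """Hypothetical per-item logic; replace with your app's core function."""
    return item.upper()

def process_in_parallel(items):
    # Spread work across this single worker's vCPUs only; cross-machine
    # distribution is handled by ByteNite's load balancer, not your code.
    with Pool(processes=os.cpu_count()) as pool:
        return pool.map(process_item, items)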

And here are a few dos and don'ts for your app's code:

  • Should define the core functionality of your app.

  • Can distribute subtasks across multiple CPU cores on the worker machine.

  • Should not distribute tasks to other worker machines. Your app code is already the one running on each task runner.

  • Should not read from additional data sources. Data sources are configured in the Jobs API, and data ingestion is handled by the Partitioning Engine.

  • Should not write to additional data sources. Data sources are configured in the Jobs API, and data export is handled by the Assembling Engine.

  • ⚠️ Can interact with external data sources or call additional endpoints only if necessary, when the required functionality cannot be handled by the Data Engines or Data Sources.
