Viam’s managed training handles TensorFlow and TFLite classification and detection out of the box. For anything else, you can use a custom training script: a different framework, custom preprocessing, non-image data, or a training pipeline you want to share with your organization.
Before writing your own, check the registry for existing training scripts and pre-trained models you can deploy directly. If a training script there fits your needs, skip ahead to Submit a training job.
A training script is a Python project that the Viam platform runs in the cloud. The platform provides your script with a dataset and an output directory; your script produces a trained model.
my-training/
├── model/
│   ├── training.py
│   └── __init__.py
└── setup.py
setup.py declares your dependencies:
from setuptools import find_packages, setup

setup(
    name="my-training",
    version="0.1",
    packages=find_packages(),
    include_package_data=True,
    install_requires=[
        # Add your dependencies here, for example:
        # "tensorflow>=2.11",
        # "numpy",
    ],
)
Your script receives two required command-line arguments from the platform:
| Argument | Description |
|---|---|
| --dataset_file | Path to a JSONLines file containing dataset metadata: file paths and annotations for each data point. |
| --model_output_directory | Directory where your script must save its model artifacts. |
You can add custom arguments (like --num_epochs or --labels) and pass them when you submit the training job.
Here is the overall shape of a training script:
import argparse
import json
import os


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset_file", dest="data_json",
                        type=str, required=True)
    parser.add_argument("--model_output_directory", dest="model_dir",
                        type=str, required=True)
    # Add custom arguments as needed:
    parser.add_argument("--num_epochs", dest="num_epochs",
                        type=int, default=200)
    return parser.parse_args()


def load_dataset(data_json):
    """Parse the JSONLines dataset file."""
    entries = []
    with open(data_json, "r") as f:
        for line in f:
            entries.append(json.loads(line))
    return entries


if __name__ == "__main__":
    args = parse_args()
    dataset = load_dataset(args.data_json)

    # --- Your training logic goes here ---
    # Use the dataset entries to train a model with
    # whatever framework fits your use case.
    model = ...
    labels = ...

    # Save model artifacts to the output directory.
    # The format must match what your ML model service expects.
    # For example, tflite_cpu expects a .tflite file and labels.txt.
    with open(os.path.join(args.model_dir, "model.tflite"), "wb") as f:
        f.write(model)
    with open(os.path.join(args.model_dir, "labels.txt"), "w") as f:
        f.write("\n".join(labels))
The critical parts:
- Your script must parse --dataset_file and --model_output_directory at minimum.
- Each dataset entry contains an image_path and either classification_annotations, bounding_box_annotations, or both. Non-image datasets will have a different structure depending on how the data was captured.
- Files written to a tmp/ subdirectory of the output directory are excluded from the saved model artifacts: use it for intermediate work.
- If the script exits with a non-zero status or produces no files in the output directory, the training job is marked as failed.
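One pattern that follows from the rules above is to create the output directory up front and route scratch files through tmp/. A minimal sketch (the save_artifacts helper and the checkpoint filename are illustrative, not part of the platform API):

```python
import os


def save_artifacts(model_dir: str, model_bytes: bytes, labels: list) -> None:
    """Write final artifacts to the top level of the output directory,
    keeping intermediate files in tmp/ so they are excluded from the result."""
    os.makedirs(model_dir, exist_ok=True)
    tmp_dir = os.path.join(model_dir, "tmp")
    os.makedirs(tmp_dir, exist_ok=True)

    # Intermediate work (checkpoints, scratch files) goes in tmp/.
    with open(os.path.join(tmp_dir, "checkpoint.bin"), "wb") as f:
        f.write(model_bytes)

    # Final artifacts must land directly in the output directory.
    with open(os.path.join(model_dir, "model.tflite"), "wb") as f:
        f.write(model_bytes)
    with open(os.path.join(model_dir, "labels.txt"), "w") as f:
        f.write("\n".join(labels))
```

Writing at least one file directly under the output directory is what keeps the job from being marked as failed.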
Each line of the dataset file is a JSON object like this:
{
  "image_path": "/path/to/data/img1.jpeg",
  "classification_annotations": [{ "annotation_label": "blue_star" }],
  "bounding_box_annotations": [
    {
      "annotation_label": "blue_star",
      "x_min_normalized": 0.382,
      "x_max_normalized": 0.51,
      "y_min_normalized": 0.356,
      "y_max_normalized": 0.527
    }
  ]
}
Bounding box coordinates are normalized to the range 0.0-1.0 relative to image dimensions.
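If your framework expects pixel coordinates instead, you can convert from the normalized values with a small helper (an illustrative sketch, not part of the dataset format):

```python
def bbox_to_pixels(box: dict, img_width: int, img_height: int) -> tuple:
    """Convert normalized bounding-box coordinates (0.0-1.0) to pixel values."""
    return (
        round(box["x_min_normalized"] * img_width),
        round(box["y_min_normalized"] * img_height),
        round(box["x_max_normalized"] * img_width),
        round(box["y_max_normalized"] * img_height),
    )
```

For the sample annotation above on a 1000 x 1000 image, this yields (382, 356, 510, 527).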
For classification, read classification_annotations.
For object detection, read bounding_box_annotations.
See the example training script for complete parsing functions that handle both annotation types.
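As a rough sketch of such parsing, using the field names from the dataset format above (the function names here are illustrative, not taken from the example script):

```python
def parse_classification_labels(dataset_entries):
    """Return (image_path, label) pairs from classification annotations."""
    pairs = []
    for entry in dataset_entries:
        for ann in entry.get("classification_annotations", []):
            pairs.append((entry["image_path"], ann["annotation_label"]))
    return pairs


def parse_detection_boxes(dataset_entries):
    """Return (image_path, label, (x_min, y_min, x_max, y_max)) tuples
    with normalized coordinates from bounding-box annotations."""
    boxes = []
    for entry in dataset_entries:
        for ann in entry.get("bounding_box_annotations", []):
            boxes.append((
                entry["image_path"],
                ann["annotation_label"],
                (ann["x_min_normalized"], ann["y_min_normalized"],
                 ann["x_max_normalized"], ann["y_max_normalized"]),
            ))
    return boxes
```

Using .get() with a default tolerates entries that carry only one annotation type.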
The platform provides API_KEY and API_KEY_ID environment variables if your script needs to call Viam APIs during training, for example, to query additional data:
import os

from viam.rpc.dial import DialOptions
from viam.app.viam_client import ViamClient


async def connect() -> ViamClient:
    dial_options = DialOptions.with_api_key(
        os.environ.get("API_KEY"), os.environ.get("API_KEY_ID")
    )
    return await ViamClient.create_from_dial_options(dial_options)
For a full working training script, see the classification-tflite example on GitHub. It trains a TFLite single-label classification model using TensorFlow and Keras.
Before submitting a cloud training job, test your script locally against an exported dataset.
viam dataset export --destination=<destination> --dataset-id=<dataset-id>
This downloads the binary data files and a dataset.jsonl metadata file.
To download only the JSONL file without binary data, add --only-jsonl.
You can get the dataset ID from the DATASETS tab or by running viam dataset list.
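Before running your script against the export, it can help to sanity-check the dataset.jsonl file, for instance by counting how often each label appears. A quick illustrative check (the summarize_dataset helper is not part of the CLI):

```python
import json
from collections import Counter


def summarize_dataset(jsonl_path: str) -> Counter:
    """Count how often each annotation label appears in an exported dataset."""
    counts = Counter()
    with open(jsonl_path, "r") as f:
        for line in f:
            entry = json.loads(line)
            for ann in entry.get("classification_annotations", []):
                counts[ann["annotation_label"]] += 1
            for ann in entry.get("bounding_box_annotations", []):
                counts[ann["annotation_label"]] += 1
    return counts
```

A heavily skewed label distribution is easier to catch here than after a failed training run.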
The test-local command runs your training script inside the same Docker container that cloud training uses.
This catches problems that plain Python testing misses: missing system dependencies, Python version differences, and package conflicts.
viam training-script test-local \
--training-script-directory=my-training/ \
--dataset-file=dataset.jsonl \
--dataset-root=<destination> \
--model-output-directory=<output-dir>
The --dataset-file path is relative to --dataset-root.
The command mounts your script, dataset, and output directories into the container.
To match a specific cloud container version, use --container-version.
Run viam train containers list to list available container versions with their framework versions and end-of-life dates.
You can pass custom arguments with --custom-args:
viam training-script test-local \
--training-script-directory=my-training/ \
--dataset-file=dataset.jsonl \
--dataset-root=<destination> \
--model-output-directory=<output-dir> \
--custom-args=num_epochs=5,labels="label1 label2"
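On the script side, these custom arguments arrive as ordinary command-line flags. One way to handle a labels value passed as a single quoted, space-separated string is to split it in argparse (a sketch under that assumption; the quote-stripping lambda is illustrative):

```python
import argparse


def parse_custom_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset_file", dest="data_json",
                        type=str, required=True)
    parser.add_argument("--model_output_directory", dest="model_dir",
                        type=str, required=True)
    parser.add_argument("--num_epochs", dest="num_epochs",
                        type=int, default=200)
    # A space-separated string like "label1 label2" becomes a list of labels;
    # stray surrounding quotes from CLI quoting are stripped first.
    parser.add_argument("--labels", dest="labels",
                        type=lambda s: s.strip("'\" ").split(), default=[])
    return parser.parse_args(argv)
```

Called with --num_epochs=5 and --labels="label1 label2", this yields num_epochs as the integer 5 and labels as ["label1", "label2"].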
The training containers are built for linux/x86_64 (amd64). On ARM systems like Apple Silicon Macs, Docker uses Rosetta 2 emulation automatically, which may be slower but ensures your script runs in the same environment as cloud training.
If you prefer a quick check without Docker, you can run your script directly:
python3 -m model.training --dataset_file=<path/to/dataset.jsonl> \
--model_output_directory=<output-dir>
When your script passes local testing, package it as a tar.gz archive and upload it to the registry:
tar -czvf my-training.tar.gz my-training/
viam training-script upload --path=my-training.tar.gz \
--org-id=<org-id> --script-name=my-training-script
You can also specify --framework, --type, --visibility, and --description when uploading.
See the CLI reference for the full list of flags.
To find your organization ID, run viam organization list.
After uploading, your script appears in the registry.
Once a training script is in the registry, whether you uploaded it or are using someone else’s, submit a training job to run it against a dataset.
Every custom training job runs inside a container. You must select a container version that matches the framework your script requires (for example, a PyTorch container for a PyTorch script). To see the available containers with their framework versions and end-of-life dates, run:
viam train containers list
Use viam train submit custom from-registry:
viam train submit custom from-registry --dataset-id=<dataset-id> \
--org-id=<org-id> --model-name=my-model \
--model-version=1 --version=1 \
--script-name=<namespace>:<script-name> \
--container-version=<container-version> \
--args=num_epochs=100,labels="'label1 label2'"
You can get the dataset ID from the DATASETS tab or by running viam dataset list.
Use the ML Training Client API to submit training jobs programmatically.
In the Viam app, go to the DATA page and click the TRAINING tab. Click a job ID to view its logs.
List training jobs:
viam train list --org-id=<org-id> --job-status=unspecified
View logs for a specific job:
viam train logs --job-id=<job-id>
Training logs expire after 7 days. You will receive an email when your training job completes.
If a job fails, check the logs first: the error message usually indicates the problem. Note that training scripts may emit log lines at the error level and still succeed; check the final job status rather than individual log lines.