Sample formats

A sample is a data point you want to label. Samples come in different types, like an image, a 3D point cloud, or a video sequence. When uploading (client.add_sample()) or downloading (client.get_sample()) a sample using the Python SDK, the format of the attributes field depends on the type of sample. The different formats are described here.

The section Import data shows how you can obtain URLs for your assets.

Image

Supported image formats: jpeg, png, bmp.

{
    "image": {
        "url": "https://example.com/image.jpg"
    }
}

If the image file is on your local computer, you should first upload it to our asset storage service (using upload_asset()) or to another cloud storage service.
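
For example, here is a minimal Python sketch that uploads a local image and creates an image sample with the Python SDK. The API key, file name, dataset identifier, and sample name are placeholders.

from segments import SegmentsClient

client = SegmentsClient("YOUR_API_KEY")  # placeholder API key

# Upload the local file to the Segments.ai asset storage service.
with open("image.jpg", "rb") as f:
    asset = client.upload_asset(f, filename="image.jpg")

# Recent SDK versions return an asset object with a .url field
# (older versions return a dict, i.e. asset["url"]).
attributes = {"image": {"url": asset.url}}

# Add the sample to an existing dataset (identifier and name are placeholders).
client.add_sample("your_org/your_dataset", "image_00001", attributes)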

Image sequence

Supported image formats: jpeg, png, bmp.

{ 
  "frames": [
    {
      "image": {
        "url": "https://example.com/frame_00001.jpg"
      },
      "name": "frame_00001" // optional
    },
    {
      "image": {
        "url": "https://example.com/frame_00002.jpg"
      },
      "name": "frame_00002"
    },
    {
      "image": {
        "url": "https://example.com/frame_00003.jpg"
      },
      "name": "frame_00003"
    }
  ]
} 
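
The frames list can also be built programmatically. A small sketch, assuming the frame images are already hosted at sequential URLs (the URL pattern is a placeholder):

# Build the attributes of an image sequence sample from a list of frame URLs.
frame_urls = [f"https://example.com/frame_{i:05d}.jpg" for i in range(1, 4)]

attributes = {
    "frames": [
        {"image": {"url": url}, "name": f"frame_{i:05d}"}  # "name" is optional
        for i, url in enumerate(frame_urls, start=1)
    ]
}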

3D point cloud

On Segments.ai, the up direction is defined along the z-axis, i.e. the vector (0, 0, 1) points up. If you upload point clouds with a different up direction, you might have trouble navigating the point cloud.
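
If your point clouds use a different up axis (for example y-up), you can rotate the points before converting them to one of the supported formats. A minimal numpy sketch with random placeholder points, assuming the source data is y-up:

import numpy as np

# Rotate +90 degrees around the x-axis so that (0, 1, 0) maps to (0, 0, 1).
R = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
], dtype=np.float32)

points_y_up = np.random.rand(100, 3).astype(np.float32)  # placeholder N x 3 points
points_z_up = points_y_up @ R.T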

{
    "pcd": {
        "url": "https://example.com/pointcloud.pcd",
        "type": "pcd"
    },
    "images": [
        { ... },
        { ... },
        { ... }
    ], // optional
    "name": "frame_00001", // optional
    "timestamp": "00001", // optional
    "ego_pose": {
        "position": {
            "x": -2.7161461413869947,
            "y": 116.25822288149078,
            "z": 1.8348751887989483
        },
        "heading": {
            "qx": -0.02111296123795955,
            "qy": -0.006495469416730261,
            "qz": -0.008024565904865688,
            "qw": 0.9997181192298087
        }
    },
    "default_z": -1, // optional, 0 by default
    "bounds": { // optional
        "min_z": -1,
        "max_z": 3
    }
}

| Name | Type | Description |
| --- | --- | --- |
| pcd | Point cloud data | Required. Point cloud data. See Point cloud data below for the supported file formats. |
| images | array of camera images | Reference camera images. |
| name | string | Name of the sample. |
| timestamp | int, float, or string | Timestamp of the sample. Should be in nanoseconds for accurate velocity/acceleration calculations. Also used for interpolation, unless disabled in the dataset settings. |
| ego_pose | Ego pose | Pose of the sensor that captured the point cloud data. |
| default_z | float | Default z-value of the ground plane. 0 by default. Only valid in the point cloud cuboid editor. New cuboids are drawn on top of the ground plane, i.e. with the default value the z-position of a new cuboid is 0.5 (since the default height of a new cuboid is 1). |
| bounds | dict of <string, float> | Point cloud bounds: a dict with values used to initialize the limiting cuboid. The z-values are also used for height coloring when provided. Supported keys: min_x, max_x, min_y, max_y, min_z, and max_z. |

Point cloud data

{
    "url": "https://example.com/pointcloud.bin",
    "type": "kitti"
}

| Name | Type | Description |
| --- | --- | --- |
| url | string | Required. URL of the point cloud data. |
| type | string: "pcd" \| "binary-xyzi" \| "kitti" \| "binary-xyzir" \| "nuscenes" \| "ply" | Required. Type of the point cloud data. See 3D point cloud formats for the list of supported file formats. |

If the point cloud file is on your local computer, you should first upload it to our asset storage service (using upload_asset()) or to another cloud storage service.

Camera image

A calibrated or uncalibrated reference image corresponding to a point cloud. The reference images can be opened in a new tab from within the labeling interface. You can determine the layout of the images by setting the row and col attributes on each image. If you also supply the calibration parameters (and distortion parameters if necessary), the main point cloud view can be set to the viewpoint of the image to obtain a fused view.

{
    "name": "Camera example 1", // optional
    "url": "https://example.com/image.jpg",
    "row": 0,
    "col": 0,
    "intrinsics": { // optional
        "intrinsic_matrix": [
            [1266.417203046554, 0, 816.2670197447984],
            [0, 1266.417203046554, 491.50706579294757],
            [0, 0, 1]
        ]
    },
    "extrinsics": { // optional
        "translation": {
            "x": -0.012463384576629082,
            "y": 0.76486688894964,
            "z": -0.3109103442096661
        },
        "rotation": {
            "qx": 0.713640516187247,
            "qy": -0.001134052598226082,
            "qz": 0.0036449450274057696,
            "qw": 0.7005017073187271
        }
    },
    "distortion": { // optional
        "model": "fisheye",
        "coefficients": {
            "k1": -0.0539124,
            "k2": -0.0101993,
            "k3": -0.00202017,
            "k4": 0.00120938
        }
    },
    "camera_convention": "OpenCV", // optional
    "rotation": 1.5708 // optional
}

| Name | Type | Description |
| --- | --- | --- |
| name | string | Name of the camera image. |
| url | string | Required. URL of the camera image. |
| row | int | Required. Row of this image in the images viewer. |
| col | int | Required. Column of this image in the images viewer. |
| intrinsics | Camera intrinsics | Intrinsic parameters of the camera. |
| extrinsics | Camera extrinsics | Extrinsic parameters of the camera relative to the ego pose. |
| distortion | Distortion | Distortion parameters of the camera. |
| camera_convention | string: "OpenGL" \| "OpenCV" | Convention of the camera coordinates. We use the OpenGL/Blender coordinate convention for cameras: +X is right, +Y is up, and +Z is pointing back and away from the camera; -Z is the look-at direction. Other codebases may use the OpenCV convention, where the Y and Z axes are flipped but the +X axis remains the same. See diagram 1. |
| rotation | float | The rotation that needs to be applied when displaying the image. Valid options are 0, π/4, π/2, and 3π/4. Useful for when a camera is mounted upside-down. |

If the image file is on your local computer, you should first upload it to our asset storage service (using upload_asset()) or to another cloud storage service.
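
If your calibration was computed in the OpenCV convention and you want to store it in the default OpenGL convention (or the other way around), you can flip the Y and Z axes of the camera rotation. A sketch using scipy, under the assumption that the extrinsics quaternion encodes the camera-to-ego rotation as described under Camera extrinsics below:

import numpy as np
from scipy.spatial.transform import Rotation

# Flip between the OpenGL and OpenCV camera conventions:
# +X stays the same, +Y and +Z are negated. The flip is its own inverse,
# so the same function converts in both directions. Translation is unchanged.
FLIP_YZ = np.diag([1.0, -1.0, -1.0])

def flip_camera_convention(qx, qy, qz, qw):
    r = Rotation.from_quat([qx, qy, qz, qw]).as_matrix()  # camera -> ego
    r_flipped = r @ FLIP_YZ  # re-express the camera axes in the other convention
    qx, qy, qz, qw = Rotation.from_matrix(r_flipped).as_quat()
    return {"qx": float(qx), "qy": float(qy), "qz": float(qz), "qw": float(qw)}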

Camera intrinsics

{
    "intrinsic_matrix": [
        [1266.417203046554, 0, 816.2670197447984],
        [0, 1266.417203046554, 491.50706579294757],
        [0, 0, 1]
    ]
}

| Name | Type | Description |
| --- | --- | --- |
| intrinsic_matrix | 2D array of floats representing a 3x3 matrix in row-major order | Required. Intrinsic matrix used in the pinhole camera model. |

The intrinsic matrix has the form

$$
K = \begin{bmatrix} f_x & 0 & o_x \\ 0 & f_y & o_y \\ 0 & 0 & 1 \end{bmatrix}
$$

where f_x and f_y are the focal lengths in pixels (we assume square pixels, so f_x = f_y), and o_x and o_y are the offsets (in pixels) of the principal point from the top-left corner of the image frame.
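
To make the roles of these entries concrete, here is a short numpy sketch that projects a point given in (OpenCV-style) camera coordinates to pixel coordinates with the intrinsic matrix from the example above; the 3D point is a placeholder:

import numpy as np

K = np.array([
    [1266.417203046554, 0.0, 816.2670197447984],
    [0.0, 1266.417203046554, 491.50706579294757],
    [0.0, 0.0, 1.0],
])

point_camera = np.array([2.0, 0.5, 10.0])  # placeholder point, z > 0 (in front of the camera)
u, v, w = K @ point_camera
pixel = np.array([u / w, v / w])  # perspective division gives pixel coordinates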

Camera extrinsics

{
    "translation": {
        "x": -0.012463384576629082,
        "y": 0.76486688894964,
        "z": -0.3109103442096661
    },
    "rotation": {
        "qx": 0.713640516187247,
        "qy": -0.001134052598226082,
        "qz": 0.0036449450274057696,
        "qw": 0.7005017073187271
    }
}

| Name | Type | Description |
| --- | --- | --- |
| translation | object: { "x": float, "y": float, "z": float } | Required. Translation of the camera in lidar coordinates, i.e., relative to the ego pose. |
| rotation | object: { "qx": float, "qy": float, "qz": float, "qw": float } | Required. Rotation of the camera in lidar coordinates, i.e., relative to the ego pose (or equivalently: a transformation from camera frame to ego frame). Defined as a rotation quaternion. By default, we use the OpenGL/Blender coordinate convention for cameras: +X is right, +Y is up, and +Z is pointing back and away from the camera; -Z is the look-at direction. Other codebases may use the OpenCV convention, where the Y and Z axes are flipped but the +X axis remains the same. See diagram 1. You can specify the camera convention in camera_convention. |

Distortion

// Fisheye
{ 
    "model": "fisheye",
    "coefficients": {
        "k1": -0.0539124,
        "k2": -0.0101993,
        "k3": -0.00202017,
        "k4": 0.00120938
    }
}
// Brown-Conrady
{ 
    "model": "brown-conrady",
    "coefficients": {
        "k1": -0.2916058942,
        "k2": 0.0763231072,
        "k3": 0.0,
        "p1": 0.0014829263,
        "p2": -0.0019540316
    }
}

| Name | Type | Description |
| --- | --- | --- |
| model | string: "fisheye" \| "brown-conrady" | Required. Type of the distortion model: fisheye or brown-conrady. |
| coefficients | Fisheye: object: { "k1": float, "k2": float, "k3": float, "k4": float }. Brown-Conrady: object: { "k1": float, "k2": float, "k3": float, "p1": float, "p2": float } | Required. Coefficients of the distortion model: k1, k2, k3, k4 for fisheye (see the OpenCV fisheye model) and k1, k2, k3, p1, p2 for Brown-Conrady (see the OpenCV distortion model; note that k4 and k5 are not used). |
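
For reference, here is a sketch of how the two models distort a point in normalized image coordinates (x, y) = (X/Z, Y/Z), following the OpenCV formulations that the coefficients refer to; the coefficient values would come from the coefficients object above:

import numpy as np

def distort_fisheye(x, y, k1, k2, k3, k4):
    # OpenCV fisheye model: distort the angle theta between the ray and the optical axis.
    r = np.sqrt(x * x + y * y)
    theta = np.arctan(r)
    theta_d = theta * (1 + k1 * theta**2 + k2 * theta**4 + k3 * theta**6 + k4 * theta**8)
    scale = np.where(r > 1e-8, theta_d / r, 1.0)
    return x * scale, y * scale

def distort_brown_conrady(x, y, k1, k2, k3, p1, p2):
    # OpenCV standard model: radial terms (k1, k2, k3) plus tangential terms (p1, p2).
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d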

Ego pose

The pose of the sensor used to capture the 3D point cloud data. This can be helpful if you want to obtain cuboids in world coordinates, or when your sensor is moving. In the latter situation, supplying an ego pose with each frame will ensure that static objects do not move when switching between frames.

{
    "position": {
        "x": -2.7161461413869947,
        "y": 116.25822288149078,
        "z": 1.8348751887989483
    },
    "heading": {
        "qx": -0.02111296123795955,
        "qy": -0.006495469416730261,
        "qz": -0.008024565904865688,
        "qw": 0.9997181192298087
    }
}

| Name | Type | Description |
| --- | --- | --- |
| position | object: { "x": float, "y": float, "z": float } | Required. XYZ position of the sensor in world coordinates. |
| heading | object: { "qx": float, "qy": float, "qz": float, "qw": float } | Required. Orientation of the sensor. Defined as a rotation quaternion. |

Segments.ai uses 32-bit floats for the point positions. Keep in mind that 32-bit floats have limited precision. In fact, only 24 bits can be used to represent the number itself (the significand, excluding the sign bit), or about 7.22 decimal digits. If you want to keep two decimal places, this only leaves 5.22 decimal digits, so the numbers shouldn't be larger than 10^5.22 = 165958.

To avoid rounding problems, it is best practice to subtract the ego position of the first frame from all other ego positions. This way, the first ego position is set to (0, 0, 0) and the subsequent ego positions are relative to (0, 0, 0) . In your export script, you can add the ego position of the first frame back to the object positions.
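
A sketch of this best practice, assuming frames is a list of 3D point cloud attribute dicts (as described above), each containing an ego_pose:

# Make ego positions relative to the first frame to avoid 32-bit float rounding issues.
def make_ego_positions_relative(frames):
    origin = dict(frames[0]["ego_pose"]["position"])  # copy before mutating
    for frame in frames:
        position = frame["ego_pose"]["position"]
        for axis in ("x", "y", "z"):
            position[axis] -= origin[axis]
    return origin  # keep this to add back to the object positions at export time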

3D point cloud sequence

{ 
  "frames": [
    { ... },
    { ... },
    { ... }
  ]
} 

| Name | Type | Description |
| --- | --- | --- |
| frames | array of 3D point clouds | Required. List of 3D point cloud frames in the sequence. |
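
For example, the frames list can be assembled from per-frame point cloud URLs (the URL pattern below is a placeholder):

# Build the attributes of a 3D point cloud sequence sample.
pcd_urls = [f"https://example.com/pointcloud_{i:05d}.pcd" for i in range(1, 4)]

attributes = {
    "frames": [
        {
            "pcd": {"url": url, "type": "pcd"},
            "name": f"frame_{i:05d}",  # optional
        }
        for i, url in enumerate(pcd_urls, start=1)
    ]
}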

Multi-sensor sequence

{
  "sensors": [
    {
      "name": "Lidar", 
      "task_type": "pointcloud-cuboid-sequence",
      "attributes": { ... }
    },
    {
      "name": "Camera 1", 
      "task_type": "image-vector-sequence",
      "attributes": { ... } 
    },
    ...
  ]
}

| Name | Type | Description |
| --- | --- | --- |
| sensors | array of sensors | Required. List of the sensors that can be labeled. |

Sensor


| Name | Type | Description |
| --- | --- | --- |
| name | string | Required. The name of the sensor. |
| task_type | string | Required. The task type of the sensor. Currently, pointcloud-cuboid-sequence and image-vector-sequence are supported. |
| attributes | object | Required. The sample attributes for the sensor. Currently, 3D point cloud sequence and image sequence are supported. |
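
Putting it together, here is a sketch of the attributes of a multi-sensor sequence with one lidar and one camera sensor. The URLs are placeholders, and each sensor's attributes object follows the corresponding single-sensor sequence format described above.

attributes = {
    "sensors": [
        {
            "name": "Lidar",
            "task_type": "pointcloud-cuboid-sequence",
            "attributes": {
                "frames": [
                    {"pcd": {"url": "https://example.com/pointcloud_00001.pcd", "type": "pcd"}}
                ]
            },
        },
        {
            "name": "Camera 1",
            "task_type": "image-vector-sequence",
            "attributes": {
                "frames": [
                    {"image": {"url": "https://example.com/frame_00001.jpg"}}
                ]
            },
        },
    ]
}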

Diagram 1: camera convention for calibrated camera images on Segments.ai.