The video data available for this challenge was recorded by cameras aimed at intersections in urban areas. Videos were recorded under diverse conditions, including both daytime and nighttime.

Data Sources

The NVIDIA AI City Data Set consists of the following video data sources:
1. Silicon Valley Intersection Data - More than 70 hours of 1080p data at 30 frames per second captured from multiple vantage points.
2. Virginia Beach Intersection Data - More than 50 hours of 720x480 resolution data at 30 frames per second captured from traffic cameras.
3. Lincoln, Nebraska Data - More than 10 hours of 720x480 resolution data at 30 frames per second captured from handheld cameras.

The NVIDIA AI City Data Set will comprise the following annotations:
1. Collaborative annotation as applied to video data sources 1, 2 and 3 listed above

The NVIDIA AI City Challenge will also make available tools for the following purposes:
1. Annotation
2. Evaluation

Participants are also allowed to use the UA-DETRAC benchmark suite, which comprises 10 hours of labeled traffic video data from multiple locations in China, available at

Overview of Annotation Process

As part of the 2017 IEEE Smart World NVIDIA AI City Challenge, 28 teams collaboratively annotated more than 150,000 keyframes extracted from over 80 hours of traffic video captured at various intersections in Santa Clara and San Jose, California; Lincoln, Nebraska; and Virginia Beach, Virginia. More than 150 volunteers participated in the annotation phase. Each volunteer was asked to draw bounding boxes around, and label, objects in the following classes: Car, SUV, SmallTruck, MediumTruck, LargeTruck, Pedestrian, Bus, Van, GroupOfPeople, Bicycle, Motorcycle, TrafficSignal-Green, TrafficSignal-Yellow, TrafficSignal-Red, and Crossing. Annotators could use rectangles, ellipses, circles, and polygons to describe an object. Collaboratively, the teams contributed over 1.4M annotations. Some keyframes and annotations were removed following a quality review process. Moreover, since many of the videos were recorded at odd angles (not parallel to the road) and most popular frameworks expect rectangular bounding boxes, the "Crossing" objects led to bounding boxes that covered many other objects, so they were removed from this year's dataset.

Labeled and Processed Datasets

After cleaning, the annotation data was processed into three datasets, namely aic480, aic1080, and aic540. The aic480 dataset contains all videos and associated keyframes of size 720x480. Similarly, the aic1080 dataset contains all videos and associated keyframes of size 1920x1080. The aic540 dataset is a down-sampled version of the aic1080 dataset. All datasets are located in the /datasets directory.

Each dataset was split into three sections (train, val, test), which are stored in subdirectories with the same name. The test partition has been temporarily removed and will be provided to teams several days before the challenge deadline. The train and val directories each contain the following sub-directories:

- images: The directory contains all keyframe images extracted from videos. The file names generally follow the format "<intersection>_<date>_[num_]_<frame_ID>.jpeg". Keyframes were extracted at 0.5 second intervals. The video timestamp (in seconds) from which a keyframe was extracted can thus be computed as <frame_ID> * 0.5.

- labels: The directory contains an annotation file for each image in the images directory, with the same name but the ".txt" extension. The file contains the bounding boxes for each annotated object in the image, one per line, in the following format: <class> <xmin> <ymin> <xmax> <ymax>. The numbers are pixel coordinates within the image canvas. The bounding box coordinates were derived by circumscribing the user-annotated shape with a rectangle aligned with the canvas axes.

- video: The directory contains the videos that keyframes were extracted from.

- info: The directory contains log data captured from the keyframe extraction process. Note that video file names within the logs do not match the dataset video file names, but they can be associated based on the log file name.

- unlabeled: The directory contains keyframes that were not labeled by the annotators or were removed by our filtering process.

- odd_frames: The directory contains all odd numbered frames from the dataset. These were not included in the annotation dataset.

- labels.json: The file contains the original annotations for each of the keyframes in the images directory. Unlike the annotations in the labels directory and those in the derived datasets (see next section), these annotations contain the original user-defined shapes (polygon, circle, etc.).
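The images and labels conventions described above can be read with a few lines of Python. The following is only a minimal sketch, not part of the provided scripts: the function names are hypothetical, it assumes that <frame_ID> is the last underscore-separated token of the file name (as in san_tomas_20170602_028_00988.jpeg), and it parses coordinates as floats because the text does not say whether they are stored as integers.

```python
import os


def keyframe_timestamp(image_name, interval_s=0.5):
    """Return the video timestamp (seconds) of a keyframe file name.

    Assumption: <frame_ID> is the last underscore-separated token.
    """
    stem = os.path.splitext(os.path.basename(image_name))[0]
    frame_id = int(stem.rsplit("_", 1)[-1])
    return frame_id * interval_s


def parse_label_line(line):
    """Parse one '<class> <xmin> <ymin> <xmax> <ymax>' line."""
    parts = line.split()
    cls = parts[0]
    xmin, ymin, xmax, ymax = (float(v) for v in parts[1:5])
    return cls, (xmin, ymin, xmax, ymax)


def read_labels(path):
    """Read all bounding boxes from one annotation file in labels/."""
    with open(path) as fh:
        return [parse_label_line(line) for line in fh if line.strip()]
```

For example, keyframe_timestamp("san_tomas_20170602_028_00988.jpeg") evaluates 988 * 0.5 and returns 494.0 seconds.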

An additional dataset, ua-detrac, has been included in the ua-detrac directory. This data set has been made available by the University at Albany's UA-DETRAC benchmark system. It contains 30 fps keyframes from intersections in China, annotated using a different class set than the AIC datasets: it includes only the car, bus, van, and other classes. The dataset has not yet been processed into the AIC format or derived formats. Check the /datasets/ directories at a later date for an update.

Derived Formats

Each of the three AIC datasets has been processed into three popular derived formats: KITTI, Pascal VOC, and DarkNet. The processed datasets can be found in the <dataset>-{kitti, voc, darknet} directories. The KITTI and DarkNet formats are similar to the AIC format, containing images and labels in different directories for the train and val splits, but require file names to be numeric. A mapping between the numbered image/label files and the original file names is provided in the train/val directories. Note that the labels directory is named annotations in the darknet version of the dataset.
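To illustrate how a derived format differs from the AIC corner-based boxes, here is a hedged sketch of the box conversion commonly used for DarkNet, where each box is stored as a center point plus width and height, normalized by the image size. The function name is hypothetical, and whether the provided darknet datasets follow exactly this convention (and which class-index ordering they use) is an assumption that should be checked against the files themselves.

```python
def aic_to_darknet(box, img_w, img_h):
    """Convert (xmin, ymin, xmax, ymax) in pixels to the normalized
    (x_center, y_center, width, height) tuple used by DarkNet."""
    xmin, ymin, xmax, ymax = box
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / float(img_w)
    height = (ymax - ymin) / float(img_h)
    return x_center, y_center, width, height
```

For a 720x480 keyframe, a box with corners (180, 120) and (540, 360) is centered on the canvas and covers half of each dimension, so the function returns (0.5, 0.5, 0.5, 0.5).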

The VOC datasets have a different format. They contain both train and val images (annotations) in the same directory, JPEGImages (Annotations), and provide files in the ImageSets/Main subdirectory that specify which of these images are training vs. validation images. Additionally, per-class files list, for each keyframe in <split> (train or val), the number of objects of type <class> that the keyframe contains. NOTE that this format is different from the original VOC format, which contains -1, 0, or 1 values, depending on the presence or absence of the object.


The /datasets/scripts directory contains two useful scripts.
The first script will display an image from an AIC dataset (aic480, aic540, or aic1080) and its associated annotations.

The script requires the Matplotlib, Numpy, and Pillow libraries and can be invoked in several ways:

python2 <image_path>
python2 <image_path> <annotation_path>
python2 <image_file> [script must be executed from dataset root directory]
python2 -d <dataset_root_path> <image_file>
python2 <annotation_path>
python2 <annotation_file> [script must be executed from dataset root directory]

/datasets/aic540$ /datasets/scripts/ san_tomas_20170602_028_00988.jpeg
/ $ /datasets/scripts/


- format: The script assumes the format of the dataset is AIC. The script currently supports the AIC, KITTI, and Darknet formats. VOC format support may be added at a later time. You must specify the format via the -f parameter if the image/annotation pair is part of one of the derived datasets, e.g.,
python2 -d /datasets/aic540-kitti -f kitti 000001.jpeg

- output: The script assumes screen output. Use the -o parameter to output to an image file instead.
python2 -d /datasets/aic540-kitti -f kitti -o 000001-ann.jpeg 000001.jpeg
The second script helps create custom datasets by transforming AIC datasets into KITTI, DarkNet, and VOC datasets like those described above. Additionally, it can be used to re-scale the original keyframe images and annotations to a preferred image size (e.g., 300x300).

The script requires the Pillow library and can be executed in the following way:

python2 -d <AIC_dataset_path> [-f <format: kitti, voc, darknet> -w <width> -h <height>] <NEW_dataset_path>

The width and height parameters are optional and are only necessary if you wish to rescale the input data. Note that test samples will be provided in the original dataset canvas size and any bounding boxes detected by your methods will need to be converted to that size. If the image size is the same as the original size, symbolic links will be created to the original keyframes, reducing the amount of space required to store new datasets.
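Converting detections back to the original canvas size amounts to scaling each coordinate by the ratio between the two sizes. The sketch below is one way to do it; the function name and argument layout are hypothetical and not part of the provided scripts, and the sizes used in the example are only illustrative.

```python
def rescale_box(box, from_size, to_size):
    """Map (xmin, ymin, xmax, ymax) from a canvas of from_size
    (width, height) to a canvas of to_size (width, height)."""
    from_w, from_h = from_size
    to_w, to_h = to_size
    scale_x = to_w / float(from_w)  # horizontal scale factor
    scale_y = to_h / float(from_h)  # vertical scale factor
    xmin, ymin, xmax, ymax = box
    return (xmin * scale_x, ymin * scale_y, xmax * scale_x, ymax * scale_y)
```

For instance, a box (100, 50, 200, 150) detected on a 960x540 canvas maps to (200.0, 100.0, 400.0, 300.0) on a 1920x1080 canvas. Note that when the rescaled size changes the aspect ratio (e.g., 720x480 to 300x300), the horizontal and vertical scale factors differ, which is why they are computed separately.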

Besides creating dataset versions that down/up-scale the keyframe canvas, the script can also be used to create a different split of the dataset. For example, one could create a new dataset where the val split is itself randomly divided into a val and test split (making sure to symlink the train directory rather than copying it), then use the script to generate a KITTI version of the new dataset. The following script does exactly this. Set the DS (current dataset) and NDS (new dataset) variables accordingly before executing:

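# DS is the current dataset, NDS the new dataset (created in the current directory)
export DS="aic480"
export NDS="aic480"
mkdir "${NDS}"
cd "${NDS}/"
# Symlink the train split rather than copying it
ln -s "/datasets/${DS}/train/"
for f in val test; do mkdir $f; mkdir $f/images; mkdir $f/labels; done
# Write a shuffled list of the original val label files to a temporary file
tfile1=$(mktemp /tmp/foo.XXXXXXXXX)
cd "/datasets/${DS}/val/labels/"; ls | shuf > ${tfile1}; cd -
# Half of the shuffled files go to the new val split, the rest to test
nl=$(( $(wc -l < ${tfile1}) / 2 ))
# Generate the symlink commands into a second temporary file, then run them
tfile2=$(mktemp /tmp/foo.XXXXXXXXX)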
awk '{f = substr($1, 1, length($1)-4); if(NR < '${nl}'){print "ln -s /datasets/${DS}/val/labels/"f".txt val/labels/"f".txt"; print "ln -s /datasets/${DS}/val/images/"f".jpeg val/images/"f".jpeg"} else {print "ln -s /datasets/${DS}/val/labels/"f".txt test/labels/"f".txt"; print "ln -s /datasets/${DS}/val/images/"f".jpeg test/images/"f".jpeg"}}' ${tfile1} > ${tfile2}
sh ${tfile2}
rm -f ${tfile1} ${tfile2}
cd ../

One could then create a KITTI version of the new dataset, scaled down to 300x300, by executing:

/datasets/scripts/ -d ${NDS} -f kitti -h 300 -w 300 ${NDS}-300x300


Contact if you have trouble with the datasets and/or associated scripts. If you are able to diagnose the problem and suggest a fix, please do so to speed up the process.

For any questions, please email