ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.



This work has been published on arXiv: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.


  • train contains tools for training the network using various architectures. It can also be used to visualize the network's performance. This section is mainly for pixelwise segmentation and scene parsing.
  • visualize can be used to view the performance of a trained network on any video/image as an overlay. (Will be added soon)

Trained model

Find a trained model here:

Implementations in other frameworks:

Thank you for your contributions. We have not verified the results of the above two implementations, but we feel that researchers working with these frameworks might still find them useful.


  • Testing the network

    Hi, I was able to successfully train the encoder and the decoder. I want to test the network with an image and am writing a script in Lua for that. I am a little confused about how the network works as an end-to-end system. When I load the .net file from the decoder's training session, I see the following model:

    nn.ConcatTable {
        |`-> (1): cudnn.SpatialConvolution(3 -> 13, 3x3, 2,2, 1,1)
        |`-> (2): nn.SpatialMaxPooling(2x2, 2,2)
         ... -> output

    However, when I load the .net file from the encoder's training session, the model is much bigger. Am I supposed to pass the test image through the encoder and then connect the encoder's output to the decoder?

    The following snippet shows how I'm loading the .net files.

    require 'nn'
    require 'image'
    require 'cunn'
    require 'cudnn'
    test_img = '/path/to/image/test.png'
    network = '/path/to/train/trained_decoder/model-299.net'
    net = torch.load(network,'b64')
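    Whichever .net file turns out to hold the full network, the last step of testing is the same. For one image the network produces a score volume of size #classes x H x W (a later issue in this thread prints a decoder batch output of 10 x 12 x 360 x 480), and each pixel's predicted label is the index of its largest score. In Torch that index map is the second value returned by `output:max(1)`. The sketch below spells out the same argmax on nested Lua tables so it runs without Torch; the `scores[c][h][w]` layout and the name `argmaxClasses` are illustrative assumptions, not part of ENet-training.

```lua
-- Per-pixel argmax over the class dimension of a score volume stored as
-- nested tables scores[c][h][w] (C classes, H rows, W columns).
-- Returns labels[h][w] holding 1-based class indices, as Torch uses.
local function argmaxClasses(scores)
  local nClasses = #scores
  local height, width = #scores[1], #scores[1][1]
  local labels = {}
  for h = 1, height do
    labels[h] = {}
    for w = 1, width do
      -- Start from class 1 and keep the first class with the highest score.
      local best, bestScore = 1, scores[1][h][w]
      for c = 2, nClasses do
        if scores[c][h][w] > bestScore then
          best, bestScore = c, scores[c][h][w]
        end
      end
      labels[h][w] = best
    end
  end
  return labels
end
```

    Ties go to the lower class index in this sketch.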
  • difficulty in reproducing your result

    Your team mentioned that the batch size is significant in training. Could you explain why the batch size affects the final result so strongly?

    I trained with a batch size of 2 and adjusted my learning rate to 1e-5. I also modified the original code by adding an iterSize of 4, so the effective batch size is 8. However, it does not reach the performance of the pre-trained model you provided.

    Here are the error curves:

    [Figure: encoder error curve]

    [Figure: decoder error curve]

    My result:


    Thank you for answering.
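    With batchSize 2 and iterSize 4, gradients from four minibatches are averaged before each update, so the averaged gradient equals that of a batch of 8. The toy sketch below checks that equality on a scalar model (`meanGrad` and `accumulatedGrad` are hypothetical helpers, not ENet code). The equality does not carry over to batch normalization, which ENet places between its convolutions: its statistics are computed within each forward pass, i.e. over 2 images here rather than 8, which is one plausible reason accumulated training still behaves differently from a true batch of 8.

```lua
-- Gradient of the mean squared error of the toy model pred = w * x over
-- samples[first..last]; each sample is {x, target}.
local function meanGrad(w, samples, first, last)
  local g = 0
  for i = first, last do
    local x, t = samples[i][1], samples[i][2]
    g = g + 2 * (w * x - t) * x
  end
  return g / (last - first + 1)
end

-- Gradient accumulation: average the gradients of iterSize minibatches of
-- batchSize samples each, as if one step were taken on all of them at once.
local function accumulatedGrad(w, samples, batchSize, iterSize)
  local g = 0
  for k = 0, iterSize - 1 do
    local first = k * batchSize + 1
    g = g + meanGrad(w, samples, first, first + batchSize - 1)
  end
  return g / iterSize
end
```

    Because every minibatch has the same size, the mean of the minibatch means is exactly the mean over all 8 samples; with unequal minibatches the two would differ.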

  • Early convergence Issue

    I found that the several models I trained could not reach the performance of the model-best.net you provided. I set opt.lua according to your documentation, except that I used a batch size of 2.

    Here's my result: [Figure: my result]

    and here is the result obtained with your model-best.net:


    I trained it on Cityscapes several times. The training process tended to converge at an early stage (epochs 80-100). Here is a graph of the test error trend: [Figure: test error trend]

    For the parameter settings, I basically followed your defaults and your documentation.

    Encoder settings:

    smallNet : false
    learningRate : 0.001
    datahistClasses :   810274
    [torch.FloatTensor of size 20]
    batchSize : 2
    dataconClasses : table: 0x401fe608
    dataClasses : table: 0x401fe4d0
    channels : 3
    printNorm : false
    save : savemodel/
    CNNEncoder : historymodel/enc_1/model-best.net
    labelHeight : 32
    labelWidth : 64
    plot : false
    nGPU : 2
    lrDecayEvery : 100
    weightDecay : 0.0005
    imHeight : 256
    dataset : cs
    momentum : 0.9
    devid : 1
    cachepath : historymodel/
    datapath : datasets/Cityscapes/
    threads : 8
    maxepoch : 300
    noConfusion : all
    learningRateDecay : 1e-07
    model : models/encoder.lua
    imWidth : 512

    and I got Best test error: 0.46744307547808, in epoch: 88

    Decoder settings:

    smallNet : false
    learningRate : 0.001
    datahistClasses :   45323724
    [torch.FloatTensor of size 20]
    batchSize : 2
    dataconClasses : table: 0x400dd688
    dataClasses : table: 0x400dd550
    channels : 3
    printNorm : false
    save : savemodel/
    CNNEncoder : historymodel/enc_2_728/model-best.net
    labelHeight : 256
    labelWidth : 512
    plot : false
    nGPU : 2
    lrDecayEvery : 100
    weightDecay : 0.0005
    imHeight : 256
    dataset : cs
    momentum : 0.9
    devid : 1
    cachepath : historymodel/dec
    datapath : /home/eeb433/Documents/Yuhang/dilation/datasets/Cityscapes/
    threads : 8
    maxepoch : 300
    noConfusion : all
    learningRateDecay : 1e-07
    model : models/decoder.lua
    imWidth : 512

    and I got Best test error: 0.77709275662899, in epoch: 95

    Thanks for your answer!
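    Both configurations use learningRate 0.001 with learningRateDecay 1e-07. How train.lua applies learningRateDecay and lrDecayEvery is not shown in this thread, so the sketch below only evaluates one common rule, as an assumption: the inverse-time decay lr / (1 + t * lrd) that torch/optim's sgd applies, with t counting optimizer steps. Under that rule, and assuming the 2975 images of the Cityscapes fine training split with batchSize 2, the rate after 100 epochs is still about 0.000985, i.e. it has barely decayed. `decayedLR` and `stepsAfter` are illustrative names.

```lua
-- Inverse-time learning-rate decay, lr / (1 + t * lrd), as torch/optim's sgd
-- applies it after t optimizer steps.
-- ASSUMPTION: that the training script uses learningRateDecay in this form.
local function decayedLR(lr, lrd, t)
  return lr / (1 + t * lrd)
end

-- Optimizer steps taken after `epochs` passes over nTrain samples, assuming
-- one step per minibatch, including a final partial minibatch per epoch.
local function stepsAfter(epochs, nTrain, batchSize)
  return epochs * math.ceil(nTrain / batchSize)
end
```

    If the script instead changes the rate once every lrDecayEvery epochs, the effect at epoch 100 would be very different, so it is worth reading that part of train.lua before attributing the plateau to the schedule.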

  • Performance Analysis

    I used a video as input; the frame rate is shown in the lower right corner. It reports 23 fps at an input resolution of 512x272 px, running on a Titan X (Pascal) with cuDNN v5.1. So I cannot reproduce the inference speed reported in the paper (150 fps). Is there a trick to get much better performance?

    This is what I typed into the terminal:

    qlua demo.lua -i /home/timo/SegNet/Farbvideo.avi -d /home/timo/ENet-training/model/ -r 0.5 
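    The on-screen counter most likely times the whole demo loop (reading the video, resizing with -r 0.5, drawing the overlay) rather than the forward pass alone, so it is not directly comparable with a network-only inference time; that is an assumption about demo.lua, not something shown here. To measure the network by itself, time only the forward call after a few warm-up runs, and read the clock only after the GPU has finished: CUDA kernels are queued asynchronously, so without a synchronize the timer can stop before the work is done. The `benchmark` helper below is a sketch; in a Torch session one would pass `cutorch.synchronize` as `sync` and a wall clock such as `function() return timer:time().real end`, with `timer = torch.Timer()`, as `clock`.

```lua
-- Average per-frame time (seconds) and frames per second of `forward`,
-- measured over nIter calls after nWarmup untimed calls.
-- `sync` should block until queued GPU work is done; `clock` returns seconds.
local function benchmark(forward, nWarmup, nIter, sync, clock)
  sync = sync or function() end
  -- os.clock measures this process's CPU time only; pass a wall clock for GPU work.
  clock = clock or os.clock
  for _ = 1, nWarmup do forward() end
  sync()
  local t0 = clock()
  for _ = 1, nIter do forward() end
  sync()
  local elapsed = clock() - t0
  return elapsed / nIter, nIter / elapsed
end
```

    Keeping `sync` and `clock` as parameters lets the same loop time only the network call, leaving video decoding and display out of the measurement.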
  • Training with other datasets with different image size

    I'm trying to train ENet with a different dataset where images and labels have size 500 (width) x 210 (height). I developed my own loadDataset.lua file and added this option in run.lua. However, I'm getting the following error when the data is being loaded:

    ==> Training: epoch # 1 [batchSize = 10]
    /root/torch/install/bin/luajit: ...all/share/lua/5.1/cudnn/SpatialCrossEntropyCriterion.lua:41: input and target should be of same size
    stack traceback:
        [C]: in function 'assert'
        ...all/share/lua/5.1/cudnn/SpatialCrossEntropyCriterion.lua:41: in function 'forward'
        ./train.lua:99: in function 'opfunc'
        /root/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
        ./train.lua:112: in function 'train'
        run.lua:61: in main chunk
        [C]: in function 'dofile'
        /root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00406670

    I would like to load and use the dataset images at their original size (no resizing). What should I use as the image/label width and height for training the encoder in this case? I tried the following values, assuming that encoder labels are 1/8 of the original image size, as in other datasets (e.g., Cityscapes):

    --imHeight 210 --imWidth 500 --labelHeight 27 --labelWidth 63

    Could anyone give some advice?

    Thank you.
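    The 1/8 rule only holds exactly when every stride-2 stage halves the size cleanly. The sketch below walks one side length through the initial block shown in the first issue (a 3x3, stride-2, pad-1 convolution concatenated with 2x2, stride-2 max pooling) and then two further halvings, assuming those also use 2x2, stride-2 windows. Under that assumption 256x512 gives the 32x64 labels used in the configs above, 210x500 gives 26x62 rather than 27x63, and a side length that is not a multiple of 8 cannot be restored exactly if the decoder upsamples by exactly 2 three times. `windowOut`, `encoderLabelSize` and `roundTrips` are illustrative helpers; check the assumptions against models/encoder.lua before relying on them.

```lua
-- Output length of a convolution or pooling window along one axis.
local function windowOut(n, kernel, stride, pad)
  return math.floor((n + 2 * pad - kernel) / stride) + 1
end

-- Encoder output (label) length for an input side length n, ASSUMING:
-- initial block = 3x3/stride-2/pad-1 conv concatenated with 2x2/stride-2 max
-- pooling, then two downsampling stages using 2x2/stride-2 windows.
-- Returns nil plus a message if the two initial-block branches disagree,
-- which would break their concatenation.
local function encoderLabelSize(n)
  local conv = windowOut(n, 3, 2, 1)
  local pool = windowOut(n, 2, 2, 0)
  if conv ~= pool then
    return nil, ("initial block mismatch for %d: conv %d vs pool %d"):format(n, conv, pool)
  end
  local m = pool
  for _ = 1, 2 do
    m = windowOut(m, 2, 2, 0)
  end
  return m
end

-- True if three exact 2x upsamplings (assumed for the decoder) get back to n.
local function roundTrips(n)
  local m = encoderLabelSize(n)
  return m ~= nil and m * 8 == n
end
```

    Under these assumptions, padding or cropping the images to multiples of 8 (e.g. 208 or 216 rows, 504 columns) avoids both the rounding and the decoder mismatch.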

  • demo.lua:221: attempt to call method 'squeeze' (a nil value)

    Hi, when I run my self-trained model, I get the following error for every processed frame, and the output window stays white:

    demo.lua:221: attempt to call method 'squeeze' (a nil value)
    stack traceback:
        demo.lua:221: in function <demo.lua:186>

    When I run your provided model, everything works. When I run my self-trained encoder alone, it also works (with reduced output resolution, of course).

    This is how I proceed:

    First I train the encoder:

    th run.lua --dataset cs --datapath /home/udo/CityScapes --model models/encoder.lua --save save/trained/model/ --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64 --nGPU 1

    Then, after renaming the data file data.t7 to data_enc.t7 (so that a new one is created for the decoder with its correct output resolution), I train the decoder:

    th run.lua --dataset cs --datapath /home/udo/CityScapes --model models/decoder.lua --save save/trained/model-dec/ --imHeight 256 --imWidth 512 --labelHeight 256 --labelWidth 512 --nGPU 1 --CNNEncoder /home/udo/enet/ENet-training/train/save/trained/model/model-best.net

    Both run fine and converge as they should.

    This is how I run the demo:

    qlua demo.lua -i ~/CityScapes/leftImg8bit/test -d ~/enet/ENet-training/train/save/trained/model-dec/

    I am pointing the demo directly at the saved model from the decoder training. Is some preprocessing step necessary? The trained decoder model is a bit smaller than the encoder (2983638 vs 2916948). Your model is bigger: 3230016. Do the encoder and decoder have to be "fused" in an intermediate step?

    Best regards, Udo
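    In LuaJIT/Lua 5.1, `x:squeeze()` raises "attempt to call method 'squeeze' (a nil value)" whenever `x` has no `squeeze` method, i.e. when the value reaching line 221 is not a tensor (for example a plain table of outputs). Before looking for a preprocessing step, it can help to print what the self-trained decoder model actually returns for one frame and compare it with the provided model. `canSqueeze` and `describeOutput` are hypothetical helpers written for that check; in a Torch session `torch.type` can be passed as `typeFn` to get names such as 'torch.CudaTensor'.

```lua
-- True if `out` exposes a callable :squeeze() method, as torch tensors do.
-- Plain Lua tables without that field, and nil, do not.
local function canSqueeze(out)
  local ok, fn = pcall(function() return out.squeeze end)
  return ok and type(fn) == 'function'
end

-- Short description of a forward-pass result; tables are described
-- element by element (array part only).
local function describeOutput(out, typeFn)
  typeFn = typeFn or type
  local t = typeFn(out)
  if t == 'table' then
    local parts = {}
    for _, v in ipairs(out) do
      parts[#parts + 1] = describeOutput(v, typeFn)
    end
    return 'table{' .. table.concat(parts, ', ') .. '}'
  end
  return t
end
```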

  • Where to set #classes

    Hi, I have a dataset similar to CamVid except that it has two classes. I assume the only place that needs modifying is loadCamvid.lua. I changed classes and conClasses, and I also changed the line mask = rawImg:eq(13):float() to mask = rawImg:eq(3):float().

    I could train the encoder, but the decoder's result has 13 classes.

    Is there anywhere else I should change?

    -Best Mina
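    In the CamVid runs elsewhere in this thread, the decoder's batch output has 12 channels (y: 10 12 360 480) while the confusion matrix lists 11 classes, which is consistent with one output channel per entry of classes and one matrix row per entry of conClasses (the extra class is presumably the unlabeled one). So for a two-class dataset the decoder should end with as many channels as the edited classes list has entries; a decoder that still emits 13 was presumably built from a different class list, and a cached data file from an earlier run (the demo.lua issue above moves data.t7 aside for a similar reason) is one possibility to rule out. The check below is a hypothetical helper, not ENet-training code; `nOutputs` would be `y:size(2)` for a batch output `y`.

```lua
-- Consistency checks between a loader's class lists and a network's output.
-- classes: every class the loader can produce; conClasses: the subset shown in
-- the confusion matrix; nOutputs: channels of the network's final layer.
local function checkClassConfig(classes, conClasses, nOutputs)
  local known = {}
  for _, name in ipairs(classes) do known[name] = true end
  for _, name in ipairs(conClasses) do
    if not known[name] then
      return false, ("conClasses entry '%s' is not in classes"):format(name)
    end
  end
  if nOutputs ~= #classes then
    return false, ("network has %d outputs but there are %d classes"):format(nOutputs, #classes)
  end
  return true
end
```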

  • module 'fastimage' not found

    I want to test ENet. I enter the following in the terminal:

    qlua demo.lua -i /home/timo/example_image/004.png -m /home/timo/ENet-training/model/model-best.net

    Then I get the following output:

    Found Environment variable CUDNN_PATH = /usr/local/cuda/lib64/libcudnn.so.5
    GPU # 1 selected
    Loading model from: /home/timo/ENet-training/model/model-best.net
    No stat file found in directory: /home/timo/ENet-training/model/home/timo/ENet-training/model/model-best.net
    newcatdir= /home/timo/ENet-training/model/categories.txt
    Loading categories file from: /home/timo/ENet-training/model/categories.txt
    Network has this list of categories, targets:
    1   Unlabeled   true
    2   Road    true
    3   Sidewalk    true
    4   Building    true
    5   Wall    true
    6   Fence   true
    7   Pole    true
    8   TrafficLight    true
    9   TrafficSign true
    10  Vegetation  true
    11  Terrain true
    12  Sky true
    13  Person  true
    14  Rider   true
    15  Car true
    16  Truck   true
    17  Bus true
    18  Train   true
    19  Motorcycle  true
    20  Bicycle true
    qlua: ./frame/frameimage.lua:17: module 'fastimage' not found:
        no field package.preload['fastimage']
        no file '/home/timo/.luarocks/share/lua/5.1/fastimage.lua'
        no file '/home/timo/.luarocks/share/lua/5.1/fastimage/init.lua'
        no file '/home/timo/torch/install/share/lua/5.1/fastimage.lua'
        no file '/home/timo/torch/install/share/lua/5.1/fastimage/init.lua'
        no file './fastimage.lua'
        no file '/home/timo/torch/install/share/luajit-2.1.0-beta1/fastimage.lua'
        no file '/usr/local/share/lua/5.1/fastimage.lua'
        no file '/usr/local/share/lua/5.1/fastimage/init.lua'
        no file '/home/timo/torch/install/lib/fastimage.so'
        no file '/home/timo/.luarocks/lib/lua/5.1/fastimage.so'
        no file '/home/timo/torch/install/lib/lua/5.1/fastimage.so'
        no file './fastimage.so'
        no file '/usr/local/lib/lua/5.1/fastimage.so'
        no file '/usr/local/lib/lua/5.1/loadall.so'
    stack traceback:
        [C]: at 0x7f5bf93969c0
        [C]: in function 'require'
        ./frame/frameimage.lua:17: in function 'init'
        demo.lua:147: in main chunk

    Unfortunately, "luarocks install fastimage" does not work. Does anybody have an idea? Thank you in advance!
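    The list in the error is produced by expanding package.path (Lua files) and package.cpath (compiled .so modules), replacing each `?` with the module name. If fastimage is installed some other way than `luarocks install`, the sketch below can confirm whether it landed in one of those locations for the interpreter that runs demo.lua. `candidatePaths` and `findModule` are illustrative helpers; the real loader turns dots in module names into the platform's directory separator, which the sketch hard-codes as `/` (POSIX).

```lua
-- Expands a Lua search path (package.path or package.cpath) into the file
-- names `require` would try for module `name`, like the list in the error.
local function candidatePaths(name, searchPath)
  local fname = name:gsub('%.', '/')  -- dots in module names become directories
  local paths = {}
  for template in searchPath:gmatch('[^;]+') do
    paths[#paths + 1] = (template:gsub('%?', fname))
  end
  return paths
end

-- Returns the first candidate file that can be opened, or nil.
local function findModule(name, searchPath)
  for _, p in ipairs(candidatePaths(name, searchPath)) do
    local f = io.open(p, 'r')
    if f then
      f:close()
      return p
    end
  end
  return nil
end

-- Usage, run with the same qlua/luajit as demo.lua:
-- print(findModule('fastimage', package.path) or findModule('fastimage', package.cpath))
```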

  • Training on CamVid with 14 classes instead of 12

    I am trying to train ENet on CamVid after adding 2 additional classes, Lanes and Traffic Signals. I recreated the annotations and changed the training, validation and test split of the 701-image dataset. I created the new train.txt and test.txt files exactly as they are created for the default CamVid dataset. The only change I made was in the loadCamVid file, where I changed the classes list according to the new dataset. However, when running run.lua with the correct paths to the dataset and model, I get this error. It would be great if someone could help point out what the problem might be.
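    Since the error text itself did not survive, one first check after re-annotating is that every label value the loader produces lies in 1..14 (Torch class indices are 1-based) and that both new classes actually occur. Judging by its name and its size of 20 for Cityscapes in the configs printed in another issue, the loader's datahistClasses tensor is such a per-class histogram. The sketch below computes one over a flat array of label values; `labelHistogram` is a hypothetical helper, not the loader's code.

```lua
-- Counts how often each label value occurs and separately collects values
-- outside 1..nClasses, which a Torch criterion would reject.
local function labelHistogram(labels, nClasses)
  local hist, unexpected = {}, {}
  for c = 1, nClasses do hist[c] = 0 end
  for _, v in ipairs(labels) do
    if hist[v] ~= nil then
      hist[v] = hist[v] + 1
    else
      unexpected[v] = (unexpected[v] or 0) + 1
    end
  end
  return hist, unexpected
end
```

    A class whose count stays at 0 over the whole training set, or any entry in `unexpected`, points at the annotations or the class list rather than at the network.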

  • Training on SUN RGB-D dataset

    I want to train the ENet model on the SUN RGB-D dataset, but I found that the ground truth is not consistent across images.

    Following the source code, I load the label of each image with:

    m = require 'matio'
    label = m.load('/path/to/folders/seg.mat').seglabel

    Then I draw an output image from the label, giving each label index a different color.

    But, for example, beds are labelled with different colors/indices in the following images: [Figures: 0000001, 0000001_color_gt, 0000002, 0000002_color_gt]

    Other objects also have different indices in different images. Also, the SUN RGB-D dataset has 38 classes (including the unlabelled class), so the indices should lie in [0, 37] or [1, 38]. But some seg.mat files contain indices larger than 38, for example 45 and 46.

    I wonder what is wrong with the ground truth.

    Many thanks.
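    Indices such as 45 and 46 suggest that seglabel in these files is not already in the 38-class indexing, so one option is to convert every label map through an explicit table from raw id to class index before training, sending unknown ids to the unlabeled class. Which raw ids belong to which of the 38 classes has to come from the dataset's own metadata and is not established here; the lookup entries in the example are placeholders, and `remapLabels` is a hypothetical helper.

```lua
-- Maps raw label ids to compact class indices through `lookup`
-- (lookup[rawId] = classIndex); ids without an entry map to `unlabeled`.
local function remapLabels(labels, lookup, unlabeled)
  local out = {}
  for i, v in ipairs(labels) do
    out[i] = lookup[v] or unlabeled
  end
  return out
end

-- Example with placeholder ids (NOT the real SUN RGB-D mapping):
local lookup = { [5] = 2, [45] = 3 }
local mapped = remapLabels({ 5, 45, 46, 0 }, lookup, 1)
-- mapped is { 2, 3, 1, 1 }
```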

  • Class Accuracy is 0

    I'm training both the encoder and the decoder on the CamVid dataset with --noConfusion all. It works fine for the encoder, but for the decoder the class accuracies for Column-Pole, Sign-Symbol, Pedestrian and Bicyclist are 0.000% during both training and testing. I'm not sure why this happens. The dataset was downloaded from the SegNet GitHub repository, and the image size is kept as it is (360x480). Here is the confusion matrix for one model (testing):

    [[ 6378220   69799      10      23     124  301373       0    3156   11950       0       0]   94.287% 	[class: Sky]
     [  137783 8384576       4   12455   44300  470470       0  397151  318598       3       0]   85.861% 	[class: Building]
     [   44235  287893       2     255    3949   56334       0   36627   41310       0       0]   0.000% 	[class: Column-Pole]
     [      10   20881       0 9232678  701740      26       0   28850  281855       0       0]   89.934% 	[class: Road]
     [       2   42951       0  656156 2133744       7       0   41084  820141       0       0]   57.761% 	[class: Sidewalk]
     [  281655 1948040       3     127    1128 2107336       0  118706   53577       3       0]   46.720% 	[class: Tree]
     [    7756  339229       0       6     132   44766       0    6926    2599       0       0]   0.000% 	[class: Sign-Symbol]
     [     910  262004       1    2599    6494    4922       0  108354   81889       0       0]   23.194% 	[class: Fence]
     [    5352  131591       1   64122   61088   11641       0   68830 1224900       0       0]   78.142% 	[class: Car]
     [      47  165030       0     303     944      93       0   62262   25160       0       0]   0.000% 	[class: Pedestrian]
     [       3   24571       0    1233    2030     177       0   29364   16965       0       0]]  0.000% 	[class: Bicyclist]
     + average row correct: 43.263584944676% 
     + average rowUcol correct (VOC measure): 33.552783618056% 
     + global correct: 77.335819603064%
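    Reading the matrix by columns (predictions) as well as rows (targets) shows that the four classes are almost never predicted at all: the Column-Pole column sums to 21 pixels, the Pedestrian column to 6, and the Sign-Symbol and Bicyclist columns to 0, while most of their pixels are assigned to Building, Tree, Fence or Car. So the 0.000% figures reflect the decoder (almost) never choosing those classes rather than a bookkeeping error. The sketch below recomputes the printed statistics from such a matrix, with rows as targets and columns as predictions, which reproduces the 94.287% printed for Sky; "average row correct" (43.26%) is the plain mean of the eleven per-class accuracies. `confusionStats` is an illustrative helper.

```lua
-- Per-class accuracy (diagonal / row sum), per-class IoU
-- (diagonal / (row sum + column sum - diagonal)) and global accuracy from a
-- square confusion matrix m[target][prediction] of pixel counts.
local function confusionStats(m)
  local n = #m
  local colSum, total, correct = {}, 0, 0
  for j = 1, n do colSum[j] = 0 end
  for i = 1, n do
    for j = 1, n do
      colSum[j] = colSum[j] + m[i][j]
      total = total + m[i][j]
    end
    correct = correct + m[i][i]
  end
  local acc, iou = {}, {}
  for i = 1, n do
    local rowSum = 0
    for j = 1, n do rowSum = rowSum + m[i][j] end
    local tp = m[i][i]
    acc[i] = rowSum > 0 and tp / rowSum or 0
    local union = rowSum + colSum[i] - tp
    iou[i] = union > 0 and tp / union or 0
  end
  return acc, iou, correct / total
end
```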
  • input and target should be of same size

    I'm trying to run your training code with the CamVid dataset (annotations from SegNet, as described) on Google Colab. The encoder works fine, but I get an error when training the decoder with the previously trained encoder loaded:

    /content/torch/install/bin/luajit: ...all/share/lua/5.1/cudnn/SpatialCrossEntropyCriterion.lua:28: input and target should be of same size
    stack traceback:
        [C]: in function 'assert'
        ...all/share/lua/5.1/cudnn/SpatialCrossEntropyCriterion.lua:28: in function 'forward'
        ./train.lua:101: in function 'opfunc'
        /content/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
        ./train.lua:116: in function 'train'
        run.lua:59: in main chunk
        [C]: in function 'dofile'
        ...tent/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x563cbe00b570

    I have printed the sizes of x, y and yt:

    ==> x: 10 3 360 480 [torch.LongStorage of size 4]
    ==> y: 10 12 360 480 [torch.LongStorage of size 4]
    ==> yt: 10 45 60 [torch.LongStorage of size 3]

    It looks like y and yt do not have the same size in err = loss:forward(y,yt) when running the decoder.

    Am I doing something wrong?
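    The printed sizes pinpoint the mismatch: 360/45 = 480/60 = 8, so the targets yt are still at the encoder's 1/8 resolution while the decoder predicts at full resolution. That points at the label size used when the targets were built (the labelHeight/labelWidth options, or a data file cached during the encoder run, which the demo.lua issue above renames for the same reason) rather than at the network; which of the two applies here is an assumption. The hypothetical helper below computes that ratio from the two size lists, e.g. `{y:size(1), y:size(2), y:size(3), y:size(4)}` and `{yt:size(1), yt:size(2), yt:size(3)}`.

```lua
-- Ratio between the network output's spatial size (N x C x H x W) and the
-- target's (N x H x W); both sizes are given as plain arrays of numbers.
local function spatialRatio(outSize, targetSize)
  assert(outSize[1] == targetSize[1], 'batch sizes differ')
  return outSize[3] / targetSize[2], outSize[4] / targetSize[3]
end
```

    A ratio of 1 on both axes is what the loss expects; 8 means the decoder is being trained against encoder-sized labels.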

  • Training ENet using own Dataset

    Thank you for providing the code.

    I have a problem training ENet on my own dataset. A 'no search file' error occurs even though the data are in the folder.

    The same thing happens if I change to an absolute path. How can I fix this error? Please help.
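    One way to narrow down such an error is to check each expected image and label path directly, from the same directory training is launched in: relative paths are resolved against the process's current working directory, not against the folder containing run.lua. `missingFiles` is a hypothetical helper; building the list of expected paths is left to the dataset's own layout.

```lua
-- Returns the entries of `paths` that cannot be opened for reading.
-- Relative entries are resolved against the current working directory.
local function missingFiles(paths)
  local missing = {}
  for _, p in ipairs(paths) do
    local f = io.open(p, 'rb')
    if f then
      f:close()
    else
      missing[#missing + 1] = p
    end
  end
  return missing
end
```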

  • Assertion `t >= 0 && t < n_classes` failed,

    When I train on my data, this error is shown. I have checked the labels; they have been mapped to [0, classes-1]. The problem appears in train.lua (err = loss:forward(y,yt) -- updateOutput). Can anyone help me?
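    Torch's classification criteria use 1-based class indices: the CUDA (Spatial)ClassNLLCriterion kernels subtract an index base of 1 from each target before checking `t >= 0 && t < n_classes`, so a label of 0 trips exactly this assertion. Labels stored as 0..classes-1 therefore need to be shifted by one (on a tensor, `labels:add(1)`). The sketch below does the shift on a plain Lua array and also rejects out-of-range or non-integer values, which interpolated resizing of label images can produce; `toTorchLabels` is a hypothetical helper.

```lua
-- Converts 0-based labels (0 .. nClasses-1) to the 1-based indices Torch
-- criteria expect, raising an error for anything outside the valid range.
local function toTorchLabels(labels, nClasses)
  local out = {}
  for i, v in ipairs(labels) do
    local t = v + 1
    if t < 1 or t > nClasses or t % 1 ~= 0 then
      error(("label %s at position %d is outside 0..%d"):format(tostring(v), i, nClasses - 1))
    end
    out[i] = t
  end
  return out
end
```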

  • Cityscapes Test Result

    Does your Cityscapes test set have labels? The labels I downloaded from the official website do not include the test set, so I can't compute the test mean IoU.

  • Which encoder weights should I use as CNNEncoder?

    I'm wondering whether there is a reason you used "model-100.net" as the encoder initialization when training the decoder, in this line? When I use "model-best.net" as the pre-trained encoder, my best decoder result is: Best test error: 0.75259215056896, in epoch: 79. Is it similar to what you get?

