Abstract
The core functionality of advanced driver assistance systems and self-driving cars depends on the ability to recognize drivable road areas. Because the road area contains many categories of useful markings, lane detection and classification are a major step toward taking appropriate actions and, ultimately, toward truly autonomous driving. Existing datasets provide neither a rich classification of lane types nor the granularity needed to localize lane markings precisely. This paper presents a new dataset for semantic segmentation of 11 lane classes, obtained by reannotating the BDD100K dataset. The reannotation converts the lane markings of 76,000 BDD100K images, originally represented as sequences of coordinate points, into pixel-level labels. This enables high spatial and semantic resolution for lane understanding in autonomous driving. Baseline results are presented using the Bilateral Segmentation Network (BiSeNetV2), which handles spatial detail and categorical semantics in separate branches. The BiSeNetV2 baseline reaches an accuracy of about 85% during testing. The dataset is expected to open new research directions on problems such as severe class imbalance and the segmentation of multiple classes of texture-less and eccentric features (lanes). Potential applications include lane-centric activity interpretation, future event prediction, and continuous learning.
Introduction
A lane is the part of a roadway used to guide and control vehicles and to reduce traffic conflicts, and it is an essential element of driving scene understanding. Knowing the lane position tells a vehicle where and how to move, reducing the risk of colliding with another vehicle or object on the road and preventing the vehicle from drifting out of its lane. Lane detection is therefore a critical component of advanced driver assistance systems (ADAS) and autonomous systems ( 1 , 2 ). Detecting these multiple features is a real challenge for computer vision and deep learning techniques. Lane detection can be performed on real-time video, either on single frames or across frames. At the single-frame level, vehicles, pedestrians, markers, and other objects can be detected and recognized. Semantic scene segmentation can be used to detect the drivable road area and, from it, to detect lane markings and segment lanes ( 3 ). All of these tasks require data pre-processing.
Data annotation, the process of adding metadata to a dataset, is a crucial stage of data pre-processing. The metadata take the form of tags that can be attached to any data type, including images, text, and video. Supervised machine learning models learn to recognize patterns in annotated data; once an algorithm has processed enough annotated data, it can recognize the same patterns in new, unannotated data. Clean, accurately annotated data is therefore essential. Image annotation is the human-powered task of attaching labels to an image, ranging from a single label for the entire image to a label for every group of pixels within it. The labels tell the computer vision model what the image shows. Driving images contain many objects on the road, such as vehicles, barriers, street lights, and traffic lights, and these scenes change from frame to frame. For the lane detection problem, such objects are irrelevant and are ignored in the driving scene.
The main contributions of this study are the introduction of a semantically labeled lane classification dataset, obtained by extensively reannotating the BDD100K dataset, and the implementation of a lane detection and classification framework on this dataset. Because the dataset is class imbalanced, the framework uses a weighted cross-entropy loss; it achieves 85% accuracy.
Semantic Segmentation
In semantic segmentation, each pixel of an image is labeled with a corresponding class. This differs from instance segmentation, in which different objects of the same class receive different labels. For autonomous systems, segmentation lets a self-driving car identify the image regions that belong to each class. Object detection must cope with background noise and distractions; segmentation separates the background from the objects of interest automatically, which can substantially improve object recognition accuracy and computational efficiency. Image segmentation may also give insight into how the human visual system performs the same task. Moreover, segmentation does not require the object, or its visual concept, to be known beforehand.
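To make the per-pixel labeling concrete, the following is a minimal NumPy sketch, assuming a network that outputs one score (logit) per class for every pixel; the function name and the toy scores are illustrative and not part of the study. Semantic segmentation takes the highest-scoring class at each pixel, so two separate markings of the same class receive the same label, whereas instance segmentation would assign them different IDs.

```python
import numpy as np

def semantic_labels(logits: np.ndarray) -> np.ndarray:
    """Return an (H, W) label map from (H, W, C) per-pixel class scores."""
    # Each pixel gets the index of its highest-scoring class.
    return np.argmax(logits, axis=-1)

# Toy 2 x 3 image, C = 3 classes (0 = background, 1 and 2 = hypothetical lane classes).
logits = np.array([
    [[5.0, 1.0, 0.0], [0.2, 3.0, 0.1], [0.0, 0.5, 2.0]],
    [[4.0, 0.0, 1.0], [0.1, 2.5, 0.3], [3.0, 0.0, 0.0]],
])
labels = semantic_labels(logits)
print(labels.tolist())  # [[0, 1, 2], [0, 1, 0]]
```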
Segmentation also has limitations. It is unclear how well some top-performing algorithms work on general imagery ( 4 – 6 ), because they are fine-tuned for specific situations or contexts, which leaves their generality in doubt. They require enormous amounts of training data, which makes them hard to apply where few examples are available, and segmentation is computationally heavy. Moreover, in some systems the effect of a segmentation error can be critical, so understanding incorrect segmentation is of great importance.
Dataset for Lane Detection and Classification
Existing datasets do not cover all aspects of lane marking classification. Caltech, TuSimple, and CULane are the most widely used datasets for both deep learning and classical techniques ( 7 – 9 ). They consist of red-green-blue (RGB) images from conventional cameras, and TuSimple annotates only the final frame of each video. With such images, lane detection performance is affected by variations in illumination and by motion blur. Event cameras, a separate type of sensor distinguished by high dynamic range and low latency, address these problems of conventional cameras. The Mapillary dataset aims at a better understanding of street scenes around the world ( 10 ). The Cambridge-Driving Labeled Video Database, with a resolution of 960 × 720, is also used to understand road situations ( 11 ). ApolloScape is a dataset for autonomous driving research ( 12 ). LLAMAS is an unsupervised (automatically labeled) dataset with pixel-level annotations of dashed lane markers ( 13 ). Because all of these datasets focus on the whole road scene, they treat lane markings as a single entity rather than dividing them into subclasses. KITTI covers urban driving conditions and may therefore be of limited use in other environments ( 14 ). DET captures traffic scene data with a dynamic vision sensor, an event camera that offers low latency and high dynamic range ( 15 ). However, because neuromorphic vision sensors are new and limited in specification by current standards, applying them to autonomous driving requires further research with better hardware. VPGNet was released with its own dataset and algorithms that predict lane markings using the vanishing point ( 16 ). In addition to vanishing point labels, numerous lane markers and road signs are carefully annotated, and the dataset includes images captured at night, in varying levels of rainfall, and under harsh lighting conditions.
CurveLanes compensates for the lack of curve scenes in previous datasets by including curved lane lines in more than 90% of its images ( 13 ). VIL-100 supports lane detection by instance segmentation, with up to eight lanes per frame ( 17 ). ApolloScape, VIL-100, and VPGNet classify lanes by color and continuity, but they do not consider lane direction from the perspective of the ego vehicle, nor weather, time, city, and scene conditions. Moreover, VIL-100 has relatively few examples, which makes training a model on it demanding. Each lane line has its own significance, and for autonomous vehicles each lane type has an associated action: whether the vehicle may overtake, whether the marking is a pedestrian crosswalk, and therefore whether the vehicle must slow down or stop. Learning such actions requires training the model on data that carries this information. The need to study this variety of lane markings with more examples motivated this reannotation work.
BDD100K Dataset
With 100,000 videos annotated for 10 perception tasks in autonomous driving, including lane detection, the Berkeley DeepDrive collection is the largest and most diverse driving video dataset ( 18 ). It contains high-resolution images and global positioning system/inertial measurement unit (GPS/IMU) data from scenarios such as city streets, residential areas, and highways, in various weather conditions and at different times of day. Each of the 100,000 videos is about 40 s long, recorded at 720p and 30 frames per second, and each image is a 1,280 × 720 RGB image. In the original annotation, each lane marking is represented as a single line whose vertices define a lane line equation of the form y = mx + p (or of higher degree). A key frame is sampled from each video at the 10th second, and annotations are provided for these key frames at several levels: image tagging, road object bounding boxes, drivable areas, lane markings, and full-frame instance segmentation. These annotations help in understanding the diversity of the data and the object statistics in different types of scenes.
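For a straight lane segment, two vertices are enough to recover the slope m and intercept p of y = mx + p. The sketch below is a minimal illustration in Python; the function name and vertex values are hypothetical, it ignores higher-degree (curved) segments, and it rejects perfectly vertical image lines, for which y = mx + p is undefined.

```python
from typing import Tuple

Point = Tuple[float, float]

def line_from_vertices(a: Point, b: Point) -> Tuple[float, float]:
    """Return (m, p) of the line y = m*x + p through vertices a and b (pixel coordinates)."""
    (x1, y1), (x2, y2) = a, b
    if x1 == x2:
        # A vertical image line cannot be written as y = m*x + p.
        raise ValueError("vertical line: slope is undefined")
    m = (y2 - y1) / (x2 - x1)
    p = y1 - m * x1
    return m, p

# Two vertices of a hypothetical lane polyline.
print(line_from_vertices((100.0, 700.0), (400.0, 400.0)))  # (-1.0, 800.0)
```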
Lane Markings in the BDD100K Dataset
Lane markings are important road instructions for drivers. They indicate the driving direction and support localization for autonomous driving systems when GPS or maps lack accurate global coverage. The BDD100K dataset suits long-term autonomy because it includes significant variation in illumination (day and night), weather, season, dynamic objects, construction, and so forth. It also provides several attributes for each lane marking: the lane category, the lane direction, and the lane continuity.
Figure 1 shows the distribution of the lane categories, lane continuity, and lane direction. The lane categories comprise single and double lines in white, yellow, or another color depending on the country, as well as curbs and pedestrian crosswalks. By direction, lane markings are divided into two types according to how they instruct vehicles. Parallel markings run along the driving direction of the passing car; they guide vehicles and separate lanes. Vertical markings run across the driving direction and can be treated as a sign to decelerate or stop. Lane continuity indicates whether a line is solid (full) or dashed: a solid line means overtaking is not allowed, whereas a dashed line means overtaking is permitted with attention to other road users. The lane markings are labeled with eight main categories: road curb, crosswalk, double white, double yellow, double other color, single white, single yellow, and single other color. The remaining categories are ignored during evaluation. The continuity (full or dashed) and direction (parallel or vertical, that is, perpendicular) attributes are also labeled.

Number of instances for: (a) lane category distribution, (b) lane continuity distribution, and (c) lane direction distribution of BDD100K dataset.
Workflow Logic
Figure 2 depicts the workflow of the proposed model for lane annotation, detection, and classification. The proposed lane class labels are generated from the original RGB images and the original BDD annotations. The original annotation provides only the coordinate points that define each lane line, not a lane class, yet detailed lane understanding by autonomous vehicles requires the finer lane classes proposed in this paper. Using the generated labels and their attributes, the lane lines are reclassified into the new lane classes. For lane lines in the straight-line category, slopes are computed; two lines of the same category should have a slope ratio of approximately 1, so a tolerance of ±0.5 on the slope ratio is set. For curved lane classes, a threshold of 40 pixels is set on the distance between the end points of two adjacent lane lines of the same class. This threshold was chosen experimentally on images from the dataset. If lines farther apart than this threshold were merged, lanes could be assigned to the wrong class, or part of the drivable area could mistakenly be included as lane marking. Once two lane lines are merged, the region between them is filled with the label of the lane marking to produce a semantic lane label. Lane lines that are not merged remain single and are thickened by dilation with a 5 × 5 pixel structuring element. This enhanced pixel-level semantic information is intended to help the model learn the features quickly. Mask images are generated from these new lane labels, and the masks together with the original images form the input for model training. The model processes spatial details and category semantics separately. The spatial details are extracted by the detail branch, which uses wide channels and shallow layers.
The semantic branch uses narrow channels and deep layers to capture the semantics. A guided aggregation layer strengthens the mutual connection between the two branches and fuses both types of feature representation. The booster training strategy adds auxiliary segmentation heads during training to improve segmentation performance without any additional inference cost. Because the dataset is heavily imbalanced, a weighted cross-entropy loss is used, with the weight of each lane class based on its number of instances.
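The text states only that the class weights are based on the number of instances per lane class; the exact formula is not given. The following NumPy sketch assumes one common choice, inverse-frequency weights scaled to average 1, and computes a weighted cross-entropy over pixels. Function names and counts are hypothetical, and a TensorFlow pipeline such as the one used in the study would rely on its own loss utilities instead.

```python
import numpy as np

def class_weights(counts) -> np.ndarray:
    """Inverse-frequency weights, scaled so that they average to 1.

    counts[c] is the number of instances (or pixels) of class c.
    This weighting scheme is an assumption, not the study's stated formula.
    """
    counts = np.asarray(counts, dtype=np.float64)
    inv = 1.0 / np.maximum(counts, 1.0)  # guard against classes with zero count
    return inv * len(counts) / inv.sum()

def weighted_cross_entropy(logits, labels, weights) -> float:
    """Weighted mean cross-entropy.

    logits: (N, C) class scores for N pixels; labels: (N,) integer class ids;
    weights: (C,) per-class weights. The weighted sum is divided by the total weight.
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    z = logits - logits.max(axis=1, keepdims=True)          # stabilize the softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]        # per-pixel negative log-likelihood
    w = weights[labels]                                     # weight of each pixel's true class
    return float((w * nll).sum() / w.sum())

# Hypothetical counts: background dominates, the last lane class is rare.
weights = class_weights([900, 90, 10])
loss = weighted_cross_entropy(np.zeros((4, 3)), [0, 1, 2, 2], weights)
```

With these weights, a misclassified pixel of the rare class costs far more than a misclassified background pixel, which counteracts the dominance of the background in the loss.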

Workflow logic of the proposed model.
Re-Categorization of the BDD100K Dataset
Lane direction is treated as the primary attribute, lane style as the second level under it, and lane type as the third level under lane style. The lane type is classified as road curb, double, single, or crosswalk. Yellow lines separate traffic moving in opposite directions, whereas white lines divide lanes carrying traffic in the same direction. Because color carries little significance for this task, all double variants have been merged into a single type called double, and likewise all single variants have been merged into a single type called single. Thus, the lane direction variants are parallel and vertical, the lane style variants are solid and dashed, and the lane type variants are road curb, double, single, and crosswalk. Combining these attributes yields a list of candidate categories. A curb is typically painted alternately black and white, while a crosswalk is a stripe pattern, so the solid/dashed distinction does not apply to it in the usual way. Moreover, a curb is mostly seen as part of an elevated pavement rather than as a lane marking. Eliminating the curb and the dashed crosswalk therefore gives the effective list of categories shown in Figure 3.

Re-categorization of the BDD100K dataset.
As shown in Figure 3, re-categorizing the existing dataset yields the following classes:
parallel_solid_double
parallel_solid_single
parallel_crosswalk
parallel_dashed_double
parallel_dashed_single
vertical_solid_double
vertical_solid_single
vertical_crosswalk
vertical_dashed_double
vertical_dashed_single
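The derivation of these classes from the attribute combinations can be checked mechanically. The Python sketch below enumerates direction × style × type, drops the road curb, drops the dashed crosswalk, and omits the style for crosswalks; the attribute spellings and function name are illustrative. It reproduces the ten classes listed above. If background is added as a separate label, this gives 11 labels, which would be consistent with the 11 classes configured for training, although the text does not state this explicitly.

```python
from itertools import product

DIRECTIONS = ["parallel", "vertical"]
STYLES = ["solid", "dashed"]
TYPES = ["double", "single", "crosswalk", "road_curb"]

def lane_classes():
    """Derive the re-categorized lane classes from the attribute combinations."""
    classes = []
    for d, s, t in product(DIRECTIONS, STYLES, TYPES):
        if t == "road_curb":
            continue  # curbs are treated as pavement edges, not lane markings
        if t == "crosswalk":
            if s == "dashed":
                continue  # a crosswalk is a stripe pattern; no dashed variant is kept
            classes.append(f"{d}_crosswalk")  # style is dropped for crosswalks
        else:
            classes.append(f"{d}_{s}_{t}")
    return classes

print(len(lane_classes()))  # 10
```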
Annotation Strategy
The annotation procedure was first developed in parts. Each image contained different kinds of lane markings: some straight lines, some curves, and a few crosswalks. In the initial stage, the lanes were therefore handled according to their geometry. In the original annotation, straight lines are represented by two coordinate points and curves by four. Each geometry was studied, annotated, and visualized separately; once each geometric pattern could be handled, whole images were annotated. Straight lines and curves were handled first, followed by crosswalks. The methods for all three were then combined, and the dataset was annotated in a single pass. The input to this process is the original JavaScript Object Notation (JSON) file, from which the label data (labeldata) and the corresponding image name (imname) are read. The slope is determined for each line-type lane; the range of the slope, together with the number of coordinates (coor), determines whether the lane is a line or a curve. The slope, the coordinate count, and the label data from the JSON thus categorize each lane. The individual strategies are described with flowcharts in the following sections.
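The first sorting step can be sketched as follows. This Python example is illustrative only: it keeps the names labeldata, imname, and coor from the text, but the per-label fields "category" and "coords" are hypothetical, since the actual BDD100K JSON may store lane vertices in a different, nested structure. It groups labels by crosswalk name and by point count only, leaving out the slope-range test.

```python
import json

def classify_geometry(labeldata):
    """Group lane labels by geometry: crosswalks, two-point lines, four-point curves.

    Assumes each label is a dict with a "category" string and a "coords" list of
    [x, y] points (hypothetical field names).
    """
    groups = {"line": [], "curve": [], "crosswalk": []}
    for label in labeldata:
        coor = label["coords"]
        if "crosswalk" in label["category"]:
            groups["crosswalk"].append(label)  # label name read from the JSON
        elif len(coor) == 2:
            groups["line"].append(label)       # straight segment: two points
        elif len(coor) == 4:
            groups["curve"].append(label)      # curve: four points
        # Labels with other point counts are left unassigned in this sketch.
    return groups

record = json.loads("""
{"imname": "example.jpg",
 "labeldata": [
   {"category": "single white", "coords": [[100, 700], [400, 400]]},
   {"category": "single yellow", "coords": [[0, 600], [50, 550], [120, 500], [200, 470]]},
   {"category": "crosswalk", "coords": [[300, 650], [900, 650]]}
 ]}
""")
groups = classify_geometry(record["labeldata"])
print({k: len(v) for k, v in groups.items()})  # {'line': 1, 'curve': 1, 'crosswalk': 1}
```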
Annotation Flow Process for Line-Type Lane
Once the lanes are categorized, those classified as lines are processed. Figure 4 depicts the workflow for annotating straight lines. For each image, the number of labels present is noted, and the number of coordinate points of each label is determined. If a label has two coordinate points, its slope is computed. Once the slopes of all two-point labels are known, they are compared with one another. If the slopes of two labels fall within the same range, the two lines are taken to describe one lane marking; they are joined, and the region between them is filled to create the lane marking pattern, which is labeled as a double line. Otherwise, each line is treated as a single line, and single lines are dilated to make them detectable.
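A minimal sketch of the line-type pairing step, assuming the ±0.5 slope-ratio tolerance from the workflow description and greedy pairing in index order (the study does not state how multiple candidate partners are resolved). Function names and coordinates are hypothetical, and the intercept check (consdiff) that appears in the combined algorithm is omitted here for brevity.

```python
from itertools import combinations

def slope(line):
    """Slope of a two-point line ((x1, y1), (x2, y2)); assumes x1 != x2."""
    (x1, y1), (x2, y2) = line
    return (y2 - y1) / (x2 - x1)

def pair_lines(lines, tol=0.5):
    """Pair two-point lines whose slope ratio lies within 1 +/- tol.

    Returns (doubles, singles): index pairs to be joined and filled as one
    marking, and indices of unpaired lines, which are dilated instead.
    """
    used, doubles = set(), []
    for i, j in combinations(range(len(lines)), 2):
        if i in used or j in used:
            continue
        mi, mj = slope(lines[i]), slope(lines[j])
        if mj == 0:
            continue  # ratio undefined; zero-slope lines are never paired here
        if abs(mi / mj - 1.0) <= tol:
            doubles.append((i, j))
            used.update((i, j))
    singles = [k for k in range(len(lines)) if k not in used]
    return doubles, singles

lines = [((100, 700), (400, 400)),   # slope -1.0
         ((110, 700), (405, 410)),   # slope about -0.98, pairs with line 0
         ((800, 700), (600, 400))]   # slope 1.5, stays single
print(pair_lines(lines))  # ([(0, 1)], [2])
```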

Flowchart for lane annotation strategy for line-type lanes.
Annotation Flow Process for Curve-Type Lane
Figure 5 shows the workflow for curved lanes. If the slope range from the JSON data does not categorize a lane as a line, the lane is treated as a curve, which is determined from its number of coordinate points. If a lane has four coordinate points, the coordinates are first sorted. Then the distances between the lower end points and between the upper end points of each curve and another curve in the image are computed. If these distances fall within the threshold, the two curves are considered a single lane marking: they are joined, and the region between them is filled. Otherwise, they are treated as single curves and dilated.
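The curve test can be sketched with the 40-pixel threshold from the workflow description. In this Python sketch, sorting the points by image row y, and treating the lower (near-camera) end as the starting point, are assumptions, since the text does not state the sort key; function names and coordinates are hypothetical. It uses math.dist, available from Python 3.8, the version reported for the study.

```python
import math

def endpoints(curve):
    """Return (upper, lower) end points of a curve, sorting its points by image row y."""
    pts = sorted(curve, key=lambda p: p[1])
    return pts[0], pts[-1]

def same_marking(c1, c2, max_dist=40.0):
    """True if both the lower and the upper end points of two curves are within max_dist px."""
    u1, l1 = endpoints(c1)
    u2, l2 = endpoints(c2)
    d1 = math.dist(l1, l2)  # starting (lower) end points
    d2 = math.dist(u1, u2)  # ending (upper) end points
    return d1 <= max_dist and d2 <= max_dist

c1 = [(100, 700), (150, 600), (220, 500), (300, 420)]
c2 = [(125, 700), (170, 600), (240, 500), (320, 425)]  # about 20 to 25 px from c1
c3 = [(600, 700), (560, 600), (520, 500), (480, 420)]  # a different lane
print(same_marking(c1, c2), same_marking(c1, c3))  # True False
```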

Flowchart for lane annotation strategy for curve-type lanes.
Annotation Flow Process for Crosswalk-Type Lane
Figure 6 depicts the annotation of crosswalks. For a crosswalk, the label name is read directly from the original JSON. First, the lines are joined based on their number of coordinates. A mouse event is then created: the annotator compares the original image with the plotted lines and clicks if a crosswalk lies between the two lines. The click registers a valid event, which allows the two lines to be joined and the region between them to be filled. The process continues until a break event is triggered.
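The interactive step can be sketched by abstracting the mouse click as a callback. In this hedged Python sketch, pairing consecutive lines and the decisions "accept", "skip", and "break" are assumptions for illustration, as is the function name. In an OpenCV-based tool, the callback could be backed by a handler registered with cv2.setMouseCallback, but the study does not specify its implementation.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, int]
Line = Tuple[Point, Point]

def confirm_crosswalks(lines: Sequence[Line],
                       confirm: Callable[[Line, Line], str]) -> List[Tuple[int, int]]:
    """Walk through consecutive pairs of crosswalk lines and keep confirmed pairs.

    `confirm` stands in for the annotator's mouse event. It returns "accept"
    if a crosswalk lies between the two lines, "skip" otherwise, or "break"
    to end the session.
    """
    accepted = []
    for i in range(len(lines) - 1):
        decision = confirm(lines[i], lines[i + 1])
        if decision == "break":
            break                        # the break event ends annotation
        if decision == "accept":
            accepted.append((i, i + 1))  # pair to be joined and filled
    return accepted

# Non-interactive usage: replay a scripted sequence of annotator decisions.
lines = [((0, 600), (1280, 600)), ((0, 640), (1280, 640)),
         ((0, 700), (1280, 700)), ((0, 740), (1280, 740))]
decisions = iter(["accept", "skip", "accept"])
print(confirm_crosswalks(lines, lambda a, b: next(decisions)))  # [(0, 1), (2, 3)]
```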

Flowchart for lane annotation strategy for crosswalk-type lanes.
Figure 7 depicts the annotated images: Figure 7a shows the annotation for line-type lanes, Figure 7b for curve-type lanes, and Figure 7c for crosswalk-type lane markings.

Output of lane annotation strategy for: (a) line-type lane, (b) curve-type lane, and (c) crosswalk-type lane.
Pseudocode for Combined Lane Annotation
In this section, the procedures for the different lane types are combined and adapted so that the input is the RGB image with its JSON file and the output is an annotated image. If a single image contains a straight line, a curved line, and a crosswalk, all of them are annotated in one procedure that produces a single annotated image. The algorithm generates all lane labels by running as a single program. Here, all the classes as mentioned
where
slopediv = the ratio of the slope of lane1 and lane2,
consdiff = the difference between intercepts for lane1 and lane2 lines,
d1 = the distance between the starting point of lane1 and lane2, and
d2 = the distance between the ending point of lane1 and lane2.
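Writing the two candidate lines as y = m_1 x + p_1 and y = m_2 x + p_2, with starting points s_1, s_2 and ending points e_1, e_2, the quantities above and the merge tests from the workflow description can be summarized as follows. This is a hedged reconstruction: the ±0.5 slope-ratio tolerance and the 40-pixel end-point threshold come from the workflow description, the absolute value in consdiff is assumed, and the intercept threshold τ_c is a placeholder because no value is stated.

$$
\mathrm{slopediv} = \frac{m_1}{m_2}, \qquad
\mathrm{consdiff} = \lvert p_1 - p_2 \rvert, \qquad
d_1 = \lVert \mathbf{s}_1 - \mathbf{s}_2 \rVert, \qquad
d_2 = \lVert \mathbf{e}_1 - \mathbf{e}_2 \rVert
$$

$$
\text{merge lines} \iff \lvert \mathrm{slopediv} - 1 \rvert \le 0.5 \ \wedge\ \mathrm{consdiff} \le \tau_c,
\qquad
\text{merge curves} \iff d_1 \le 40\ \text{px} \ \wedge\ d_2 \le 40\ \text{px}
$$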
Applying this logic yields a combined lane annotation, shown in Figure 8, that incorporates all lane categories: parallel or vertical, solid or dashed, and single or double. The result is a filled and dilated label image for a particular lane scenario.
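The dilation of unmerged single lines with the 5 × 5 structuring element described in the workflow can be sketched in plain NumPy as a maximum filter over each pixel's 5 × 5 neighborhood. This is an illustrative sketch, not the study's code: the function name is hypothetical, and for non-negative masks it should behave like OpenCV's cv2.dilate with a 5 × 5 kernel of ones. Because it keeps the maximum label, the higher class ID wins where dilated lines of different classes overlap, a case the text does not address.

```python
import numpy as np

def dilate(mask: np.ndarray, k: int = 5) -> np.ndarray:
    """Dilate a label mask with a k x k square structuring element.

    Each output pixel takes the maximum value in its k x k neighborhood,
    which thickens a one-pixel lane line to k pixels.
    """
    r = k // 2
    padded = np.pad(mask, r, mode="constant", constant_values=0)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            # Compare against every shifted copy of the mask within the window.
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

mask = np.zeros((9, 9), dtype=np.uint8)
mask[4, :] = 3  # a one-pixel-thick line labeled with hypothetical class id 3
thick = dilate(mask)
print(int((thick == 3).sum()))  # 45
```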

Output of combined lane annotation strategy.
On reannotation, a total of 76,000 images from the original BDD100K dataset were retained. The remaining images had occluded lanes or lane markings that were not visible. Lane markings are thin structures that are already difficult to learn, and training would be harder still on images in which they are not visible, so those images were excluded from this initial study. Of the annotated images, 57,000 were allocated for training, 9,500 for validation, and 9,500 for testing.
Table 1 gives the number of instances of each class across the 76,000 images. The images were split randomly in a 75:25 ratio between training and validation plus test, despite the class imbalance. Background pixels far outnumber lane pixels, and the lane classes are also imbalanced among themselves.
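The reported split can be reproduced arithmetically: 75% of 76,000 images is 57,000, and dividing the remaining 19,000 equally gives 9,500 for validation and 9,500 for testing. The Python sketch below performs such a random split; the function name, the fixed seed, and the placeholder file names are assumptions, and, like the split described, it is not stratified by class.

```python
import random

def split_dataset(names, train_frac=0.75, seed=0):
    """Shuffle image names and split them into train / validation / test.

    The non-training share is divided equally between validation and test.
    """
    names = list(names)
    random.Random(seed).shuffle(names)  # reproducible random order
    n_train = round(len(names) * train_frac)
    n_val = (len(names) - n_train) // 2
    return (names[:n_train],
            names[n_train:n_train + n_val],
            names[n_train + n_val:])

names = [f"img_{i:05d}.jpg" for i in range(76_000)]  # placeholder file names
train, val, test = split_dataset(names)
print(len(train), len(val), len(test))  # 57000 9500 9500
```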
Number of Images Found for Each Class for Each of the Training, Validation, and Test Datasets
Figure 9 shows the number of instances of each lane class before regrouping and reannotation. The crosswalk class has more instances than any other class because each crosswalk is identified by two parallel lines, which is not necessary for the other lane classes. Figure 10 compares the number of images per class in the training, validation, and test sets after reannotation. Compared with Figure 9, the number of labels is reduced because lanes annotated as double markings are combined into a single class. These statistics, gathered to check the availability and proportion of each class, show that the resulting dataset, including the background, is imbalanced; this imbalance is addressed by using the weighted loss function.

Comparison of the number of instances for each lane class before regrouping.

Comparison of the number of images for each lane class for training, validation, and test datasets after reannotation.
Evaluation
To evaluate the dataset, the Bilateral Segmentation Network (BiSeNetV2) architecture ( 19 ) was used. BiSeNetV2 consists of a detail branch and a semantic branch: the detail branch captures low-level features, while the semantic branch obtains high-level semantic context. A guided aggregation layer enhances the mutual connection between the branches and fuses both types of feature representation. Training was implemented in TensorFlow on Ubuntu 18.04.6, on a machine with an AMD processor and an NVIDIA RTX 2080 GPU, using CUDA with cuDNN (version 11.6), TensorFlow 2.4, and Python 3.8. After the environment was configured, the model was trained with the following parameters: number of classes 11, batch size 64, learning rate 1e-5, steps_per_epoch equal to the length of the image list divided by the batch size, total number of iterations 1200, learning_momentum 0.9, and weight_decay 0.0001.
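The reported hyperparameters can be collected into a configuration, as in the Python sketch below. The dictionary keys are illustrative, and two details are assumptions: the text does not say how a fractional steps_per_epoch is rounded (rounding up is used here so that every image is seen once per epoch), and the image list is taken to be the 57,000 training images.

```python
import math

# Hyperparameters as reported in the text; the dictionary keys are illustrative.
CONFIG = {
    "num_classes": 11,
    "batch_size": 64,
    "learning_rate": 1e-5,
    "total_iterations": 1200,
    "learning_momentum": 0.9,
    "weight_decay": 1e-4,
}

def steps_per_epoch(num_images: int, batch_size: int) -> int:
    """Batches needed to cover the image list once (the last batch may be partial)."""
    return math.ceil(num_images / batch_size)

print(steps_per_epoch(57_000, CONFIG["batch_size"]))  # 891
```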
The network was trained on the reannotated dataset. Figure 11 plots training accuracy, validation loss, and validation accuracy against epochs. The accuracy was about 85%, while the validation loss ranged between 0.02 and 0.15. This result may be attributed to the class imbalance in the dataset, so a more detailed study is required to address it.

Training accuracy, validation loss, and validation accuracy during training of the BiSeNetV2 architecture.
Conclusion and Future Work
A reannotated dataset was introduced to enable lane classification and thereby overcome limitations that vehicles face in making driving decisions. Experiments on this dataset demonstrated the efficiency of the algorithm and also the need for a more robust one for real-world application. This work aims to foster new studies on lane classification learning and to shed light in that direction. The dataset will be continuously improved and evaluated with state-of-the-art approaches to illustrate its usefulness.
Footnotes
Author Contributions
The authors confirm contribution to the paper as follows: study conception and design: T. Rajalakshmi, R. Senthilnathan; data collection: T. Rajalakshmi; analysis and interpretation of results: T. Rajalakshmi, R. Senthilnathan; draft manuscript preparation: T. Rajalakshmi. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
