State-of-the-art (SoTA) object detectors based on convolutional neural networks have improved detection accuracy by a large margin, yet they still perform poorly on small objects. Moreover, these models are trained mainly on the COCO dataset, whose backgrounds are more complex than those of road environments, which degrades their accuracy on small road objects. In contrast to COCO, the background of a surveillance video is relatively stable and can be exploited to improve the accuracy of road object detection. This paper designs a computationally efficient mixed stage partial (MSP) network to detect road objects. A second contribution is a mixed background data augmentation method that improves detection accuracy without requiring additional labelling effort. During inference, only the input image is used to detect road objects, without any additional subtraction information. Extensive experiments on the KITTI and UA-DETRAC benchmarks show that the proposed method achieves SoTA results in both accuracy and efficiency for road object detection.