The success of deep learning has accelerated technological progress at an unprecedented rate. In particular, deep convolutional neural networks (CNNs) have gained considerable attention owing to their extraordinary performance in a wide range of computer vision applications. Despite this performance, CNNs have always been challenging to implement because they are both computationally intensive and memory-access intensive, which is especially problematic on resource-constrained embedded platforms. In this paper, we propose a novel reduced-parameter CNN architecture for image classification that significantly reduces network model size. Our reduction method, inspired by SqueezeNet, replaces convolutional-layer kernels with smaller kernels and removes all fully connected layers except the final classification layer. The proposed architecture also has lower computational complexity when deployed in hardware. We implemented it with all trained network parameters stored on-chip, using Xilinx Vivado and targeting the Zynq XC7Z020-1CLG484C FPGA device. Compared to LeNet, the proposed architecture has 11.2× fewer parameters and a 2.8× improvement in area-delay product, resulting in an efficient hardware deployment.
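The following sketch illustrates why shrinking kernels and dropping hidden fully connected layers cuts the parameter count. It is not the proposed architecture and does not reproduce the reported 11.2× figure. It assumes a classic LeNet-5 layout on a 32×32 grayscale input (with a fully connected C3 layer, as in many modern reimplementations) as the baseline, and a *hypothetical* reduced variant that switches to 3×3 kernels and keeps only the final classifier. The helpers `conv_params` and `fc_params` are illustrative names, not part of any library.

```python
def conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Parameters of a k x k convolution mapping c_in channels to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def fc_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Parameters of a fully connected layer with n_in inputs and n_out outputs."""
    return n_in * n_out + (n_out if bias else 0)


# Baseline (assumed): classic LeNet-5 on a 32x32 grayscale input.
lenet5 = [
    conv_params(5, 1, 6),        # C1: 6 maps, 5x5 kernels
    conv_params(5, 6, 16),       # C3: 16 maps, full connectivity assumed
    fc_params(16 * 5 * 5, 120),  # C5/F5: 400 -> 120
    fc_params(120, 84),          # F6: hidden fully connected layer
    fc_params(84, 10),           # output classifier
]

# Hypothetical reduced variant: 3x3 kernels, hidden FC layers removed,
# only the classifier kept. With valid 3x3 convolutions and 2x2 pooling,
# the spatial size goes 32 -> 30 -> 15 -> 13 -> 6, giving 16*6*6 features.
reduced = [
    conv_params(3, 1, 6),
    conv_params(3, 6, 16),
    fc_params(16 * 6 * 6, 10),
]

ratio = sum(lenet5) / sum(reduced)
print(sum(lenet5), sum(reduced), f"{ratio:.1f}")  # 61706 6710 9.2
```

Most of the baseline's parameters sit in the 400→120 fully connected layer, so removing the hidden fully connected layers matters more than the kernel change here; the 5×5→3×3 swap alone cuts each convolution's weights by a factor of 25/9.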