Deep learning is regarded as one of the latest advances in data science and analytics because of its ability to learn from data [1]. As a result, it is widely applied in human crowd analysis [2]. Although deep learning has achieved remarkable success in this area, a fast and robust model for detecting pushing behavior in human crowds is still unavailable. This paper proposes a model that allows crowd-monitoring systems to detect pushing behavior early, helping organizers make timely decisions before dangerous situations arise. Detection is particularly challenging on real-time video streams of crowded events; the proposed model handles such streams with reasonable latency. To achieve this, the model employs a hybrid deep neural network.