3D human pose estimation has received increasing attention because it underpins many downstream tasks. However, the task remains challenging due to inherent depth ambiguity and occlusion. Because diffusion models can generate multiple hypotheses, they are promising for reducing the uncertainty of estimation results. Motivated by this, we propose DTCPose, a diffusion-based temporal-constraint framework for 3D human pose estimation, which generates multiple 3D candidate poses conditioned on 2D poses and aggregates them into a final pose to improve estimation accuracy. To further ensure the temporal stability of the output 3D sequences, we introduce temporal constraints into the model to reduce jitter in the results. Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets demonstrate that our approach achieves superior performance in both single-hypothesis and multi-hypothesis 3D human pose estimation.