Multimodal Sensing and Data Processing for Speaker and Emotion Recognition using Deep Learning Models with Audio, Video and Biomedical Sensors