Multimodal Semantic Understanding And Navigation In Outdoor Scenes