Inferring object states and articulation modes from egocentric videos