Research papers, master's theses, and doctoral dissertations published by Jonathan Malmaud

What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision

Jonathan Malmaud, Jonathan Huang, Vivek Rathod, 2015

We present a novel method for aligning a sequence of instructions to a video of someone carrying out a task. In particular, we focus on the cooking domain, where the instructions correspond to the recipe. Our technique relies on an HMM to align the recipe steps to the (automatically generated) speech transcript. We then refine this alignment using a state-of-the-art visual food detector, based on a deep convolutional neural network. We show that our technique outperforms simpler techniques based on keyword spotting. It also enables interesting applications, such as automatically illustrating recipes with keyframes, and searching within a video for events of interest.
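
To make the alignment idea concrete, here is a minimal sketch of a left-to-right HMM decoded with the Viterbi algorithm. It is not the authors' model: the word-overlap emission score, the single `p_stay` transition parameter, and the names `emission_score` and `align_steps` are assumptions made for illustration only. The hidden state is the index of the current recipe step, each transcript segment is one observation, and the state may only stay on the same step or advance to the next one.

```python
import math


def emission_score(step, segment, smoothing=0.1):
    """Assumed emission score: log of the smoothed count of shared words."""
    step_words = set(step.lower().split())
    segment_words = set(segment.lower().split())
    return math.log(len(step_words & segment_words) + smoothing)


def align_steps(steps, segments, p_stay=0.5):
    """Return, for each transcript segment, the index of its aligned step.

    Viterbi decoding over a monotonic HMM that starts at step 0 and, at
    each segment, either stays on the current step or moves to the next.
    """
    n_steps, n_segs = len(steps), len(segments)
    log_stay, log_next = math.log(p_stay), math.log(1.0 - p_stay)
    neg_inf = float("-inf")

    score = [[neg_inf] * n_steps for _ in range(n_segs)]
    back = [[0] * n_steps for _ in range(n_segs)]
    score[0][0] = emission_score(steps[0], segments[0])

    for t in range(1, n_segs):
        for s in range(n_steps):
            stay = score[t - 1][s] + log_stay
            advance = score[t - 1][s - 1] + log_next if s > 0 else neg_inf
            if advance > stay:
                best, back[t][s] = advance, s - 1
            else:
                best, back[t][s] = stay, s
            if best > neg_inf:  # skip states that are not reachable yet
                score[t][s] = best + emission_score(steps[s], segments[t])

    # Backtrace from the best-scoring state at the final segment.
    state = max(range(n_steps), key=lambda s: score[-1][s])
    path = [state]
    for t in range(n_segs - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    path.reverse()
    return path


recipe = ["chop the onions", "fry the onions in oil", "add the rice"]
transcript = [
    "first we chop some onions",
    "now fry them in hot oil",
    "keep stirring the oil",
    "finally add rice",
]
print(align_steps(recipe, transcript))  # [0, 1, 1, 2]
```

In this toy example the third transcript segment shares only "the" with the first and last steps, so a per-segment keyword match would be weak evidence on its own; the monotonic transition structure is what keeps it attached to the frying step.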

Computation and Language, Computer Vision and Pattern Recognition, Information Retrieval
