Weakly Supervised Learning from Multiple Modalities: Exploiting Video, Audio and Text for Video Understanding.(English) (Paperback) | Released: 01 Sep 2011

Name: Weakly Supervised Learning from Multiple Modalities: Exploiting Video, Audio and Text for Video Understanding.(English)
Brand: Bookswagon
Price: 5948.34 INR
Availability: OutOfStock

By: Timothee Cour (Author) | Publisher: Proquest, Umi Dissertation Publishing | Publisher Imprint: Proquest, Umi Dissertation Publishing

ISBN-10: 124361966X
ISBN-13: 9781243619662
Pages: 192
Language: English
Imprint: Proquest, Umi Dissertation Publishing
Weight (g): 354
Dimensions (mm): 246 x 10 x 189

About the Book

As web and personal content become ever more enriched by videos, there is an increasing need for semantic video search and indexing. A main challenge for this task is the lack of supervised data for learning models. In this dissertation we propose weakly supervised algorithms for video content analysis, focusing on recovering video structure, retrieving actions and identifying people. Key components of the algorithms we present are (1) alignment between multiple modalities: video, audio and text, and (2) a unified convex formulation for learning under weak supervision from easily accessible data.

At a coarse level, we focus on the task of recovering scene structure in movies and TV series. We present a weakly supervised algorithm that parses a movie into a hierarchy of scenes, threads and shots. Movie scene boundaries are aligned with screenplay scenes and shots are reordered into threads. We present a unified generative model and novel hierarchical dynamic programming inference.

At a finer level, we aim at resolving person identity in video using images, screenplay and closed captions. We consider a partially supervised multiclass classification setting where each instance is labeled ambiguously with more than one label. The set of potential labels for each face is the set of character names mentioned in the corresponding screenplay scene. We propose a novel convex formulation based on minimization of a surrogate loss. We show theoretical analysis and strong empirical evidence that effective learning is possible even when all examples are ambiguously labeled.

We also investigate the challenging scenario of naming people in video without a screenplay. Our only source of (indirect) supervision is person references mentioned in dialog, such as "Hey, Jack!" We resolve identities by learning a classifier from partial label constraints, incorporating multiple-instance constraints from dialog, gender and local grouping constraints, in a unified convex learning formulation. Grouping constraints are provided by a novel temporal grouping model that integrates appearance, synchrony and film-editing cues to partition faces across multiple shots. We present dynamic programming inference and discriminative learning for this partitioning model.

We have deployed our framework on hundreds of hours of movies and TV, and present quantitative and qualitative results for each component.
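
The ambiguous-label setting in the abstract (each face carries a set of candidate names, only one of which is correct) can be illustrated with a small sketch. The code below is not taken from the dissertation: the particular loss (a logistic penalty on the mean score of the candidate labels plus a logistic penalty on each non-candidate score), the linear model, the toy data and all function names are assumptions chosen only to show what a convex surrogate for partially labeled data might look like.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def partial_label_loss_and_grad(W, X, candidates):
    """Illustrative convex loss for ambiguously labeled data (an assumption).

    X: (n, d) features; candidates: (n, K) boolean mask of allowed labels.
    Loss per example: psi(mean score over candidates) + sum of psi(-score)
    over non-candidates, with psi(z) = log(1 + exp(-z)).
    """
    scores = X @ W                                    # (n, K) linear scores
    n_cand = candidates.sum(axis=1, keepdims=True)    # candidate set sizes
    mean_cand = (scores * candidates).sum(axis=1, keepdims=True) / n_cand
    loss = (np.logaddexp(0.0, -mean_cand).sum()
            + (np.logaddexp(0.0, scores) * ~candidates).sum())
    # Gradient with respect to each score, then chain rule through X @ W.
    grad_scores = np.where(candidates,
                           -sigmoid(-mean_cand) / n_cand,
                           sigmoid(scores))
    n = X.shape[0]
    return loss / n, X.T @ grad_scores / n


def fit(X, candidates, steps=500, lr=0.5):
    # Plain gradient descent; the loss is convex in W for a linear model.
    W = np.zeros((X.shape[1], candidates.shape[1]))
    for _ in range(steps):
        _, grad = partial_label_loss_and_grad(W, X, candidates)
        W -= lr * grad
    return W


def make_toy_data(n_per_class=100, seed=0):
    # Hypothetical data: three 2-D clusters, every example gets its true
    # label plus one random wrong label, so no example is labeled cleanly.
    rng = np.random.default_rng(seed)
    means = np.array([[3.0, 0.0], [-1.5, 2.6], [-1.5, -2.6]])
    y = np.repeat(np.arange(3), n_per_class)
    X = means[y] + 0.5 * rng.standard_normal((y.size, 2))
    X = np.hstack([X, np.ones((y.size, 1))])          # bias feature
    candidates = np.zeros((y.size, 3), dtype=bool)
    candidates[np.arange(y.size), y] = True
    distractor = (y + rng.integers(1, 3, size=y.size)) % 3
    candidates[np.arange(y.size), distractor] = True
    return X, y, candidates


if __name__ == "__main__":
    X, y, candidates = make_toy_data()
    W = fit(X, candidates)
    accuracy = (np.argmax(X @ W, axis=1) == y).mean()
    print(f"training accuracy from ambiguous labels only: {accuracy:.2f}")
```

In this toy setup the wrong label in each candidate set is drawn at random, so across many examples only the true label is consistently rewarded. That is the intuition behind learning when every example is ambiguous; the dissertation's own formulation and guarantees may differ from this sketch.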

Product Details

ISBN-13: 9781243619662
Publisher: Proquest, Umi Dissertation Publishing
Publisher Imprint: Proquest, Umi Dissertation Publishing
Height: 246 mm
No of Pages: 192
Series Title: English
Sub Title: Exploiting Video, Audio and Text for Video Understanding.
Width: 189 mm

ISBN-10: 124361966X
Publisher Date: 01 Sep 2011
Binding: Paperback
Language: English
Returnable: N
Spine Width: 10 mm
Weight: 354 gr
