Video annotation progress. Included in the metadata field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.
Video annotation request.
Video annotation response. Included in the response field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.
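Progress arrives in the Operation's metadata field and results arrive in its response field. A client therefore typically polls GetOperation until the operation reports that it is done. The sketch below shows that loop under stated assumptions. The `Operation` dataclass is a hypothetical, simplified stand-in for google.longrunning.Operation, and `get_operation` is any caller-supplied function that fetches an operation by name. It is not the client library's API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Operation:
    """Hypothetical, simplified stand-in for google.longrunning.Operation."""
    name: str
    done: bool = False
    metadata: Optional[dict] = None  # progress payload while running
    response: Optional[dict] = None  # results payload once done
    error: Optional[str] = None      # simplified; the real field is a status message


def wait_for_annotation(
    get_operation: Callable[[str], Operation],
    name: str,
    poll_seconds: float = 5.0,
    on_progress: Callable[[dict], None] = lambda metadata: None,
) -> dict:
    """Poll an operation by name until done; return its response payload."""
    while True:
        op = get_operation(name)
        if op.metadata is not None:
            on_progress(op.metadata)  # report progress from the metadata field
        if op.done:
            if op.error is not None:
                raise RuntimeError(f"annotation failed: {op.error}")
            return op.response or {}
        time.sleep(poll_seconds)
```

In a test, `get_operation` can return canned operations whose metadata looks like `{"annotationProgress": [{"progressPercent": 50}]}`. That shape is an assumption modeled on the REST JSON form of the progress message.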
A generic detected attribute represented by name in string format.
A generic detected landmark represented by name in string format and a 2D location.
Detected entity from video analysis.
Explicit content annotation (based on per-frame visual signals only).
If no explicit content has been detected in a frame, no annotations are
present for that frame.
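Frames without explicit content are simply absent. A consumer therefore only needs to scan the frames that are present and apply a likelihood threshold. The sketch below assumes a Likelihood enum ordered from VERY_UNLIKELY to VERY_LIKELY and a per-frame `pornography_likelihood` value. `ExplicitFrame` is a hypothetical stand-in type, and the `LIKELY` default threshold is an illustrative choice.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Likelihood(IntEnum):
    # Assumed ordering, so that ">=" means "at least this likely".
    LIKELIHOOD_UNSPECIFIED = 0
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5


@dataclass
class ExplicitFrame:
    """Hypothetical stand-in for one frame-level explicit content result."""
    time_offset_s: float
    pornography_likelihood: Likelihood


def flagged_times(frames: List[ExplicitFrame],
                  threshold: Likelihood = Likelihood.LIKELY) -> List[float]:
    """Time offsets of frames at or above the threshold.

    Frames missing from the list had no explicit content detected,
    so they can never be flagged here.
    """
    return [f.time_offset_s for f in frames
            if f.pornography_likelihood >= threshold]
```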
Config for EXPLICIT_CONTENT_DETECTION.
Video frame level annotation results for explicit content.
Deprecated. No effect.
Face detection annotation.
Config for FACE_DETECTION.
Deprecated. No effect.
Video segment level annotation results for face detection.
Label annotation.
Config for LABEL_DETECTION.
Video frame level annotation results for label detection.
Video segment level annotation results for label detection.
Annotation corresponding to one detected, tracked and recognized logo class.
Normalized bounding box.
The normalized vertex coordinates are relative to the original image.
Range: [0, 1].
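Because the coordinates are fractions of the original image size, you map them to pixels by multiplying by that image's width and height. The sketch below assumes the box exposes `left`, `top`, `right`, and `bottom` fields. `NormalizedBox` and `to_pixels` are hypothetical names. The clamp is a defensive choice in case a value falls slightly outside [0, 1].

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class NormalizedBox:
    """Hypothetical stand-in for a normalized bounding box."""
    left: float
    top: float
    right: float
    bottom: float


def _clamp(v: float) -> float:
    return min(max(v, 0.0), 1.0)


def to_pixels(box: NormalizedBox, width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a normalized box to pixel coordinates (x0, y0, x1, y1)."""
    return (
        round(_clamp(box.left) * width),
        round(_clamp(box.top) * height),
        round(_clamp(box.right) * width),
        round(_clamp(box.bottom) * height),
    )
```

For example, on a 1920x1080 frame, a box with left 0.25, top 0.5, right 0.75, and bottom 1.0 covers pixels (480, 540) to (1440, 1080).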
Normalized bounding polygon for text (which might not be aligned with the axes).
Contains a list of the corner points in clockwise order, starting from the
top-left corner. For example, for a rectangular bounding box around
horizontal text, it might look like:
    0----1
    |    |
    3----2
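In image coordinates the y axis points down. Under that convention, a polygon listed clockwise on screen (0 top-left, 1 top-right, 2 bottom-right, 3 bottom-left) has a positive signed area under the shoelace formula. The sketch below uses this property to check vertex order. Vertices are plain `(x, y)` tuples here, which is an assumption made for illustration.

```python
from typing import List, Tuple

Vertex = Tuple[float, float]  # (x, y), normalized; y grows downward


def signed_area(vertices: List[Vertex]) -> float:
    """Shoelace formula over the closed polygon."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x0 * y1 - x1 * y0
    return total / 2.0


def is_clockwise(vertices: List[Vertex]) -> bool:
    """True if the points run clockwise as seen on screen (y down)."""
    return signed_area(vertices) > 0
```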
A vertex represents a 2D point in the image.
NOTE: the normalized vertex coordinates are relative to the original image
and range from 0 to 1.
Annotations corresponding to one tracked object.
Config for OBJECT_TRACKING.
Video frame level annotations for object detection and tracking. This field
stores per frame location, time offset, and confidence.
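Tracked frames are sampled at discrete time offsets. To find an object's location at an arbitrary time, one option is to pick the nearest sampled frame. The sketch below does that with a binary search. `TrackedFrame` is a hypothetical stand-in type, and the sketch assumes the frames are sorted by time offset.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TrackedFrame:
    """Hypothetical stand-in for one tracked-object frame."""
    time_offset_s: float
    box: Tuple[float, float, float, float]  # (left, top, right, bottom), normalized


def frame_at(frames: List[TrackedFrame], t: float) -> Optional[TrackedFrame]:
    """Return the frame closest in time to t (frames must be sorted)."""
    if not frames:
        return None
    times = [f.time_offset_s for f in frames]
    i = bisect.bisect_left(times, t)
    # Only the neighbors on either side of the insertion point can be closest.
    candidates = frames[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda f: abs(f.time_offset_s - t))
```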
Person detection annotation per video.
Config for PERSON_DETECTION.
Config for SHOT_CHANGE_DETECTION.
Provides “hints” to the speech recognizer to favor specific words and phrases
in the results.
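Phrase hints are supplied as part of the speech transcription configuration in the request's video context. The sketch below builds such a request body as a plain dict. The camelCase keys (`inputUri`, `videoContext`, `speechTranscriptionConfig`, `speechContexts`, `phrases`, `languageCode`) follow the REST JSON form as best understood, so verify them against the API reference before use. The URI in the usage example is hypothetical, and the deduplication step is an illustrative choice.

```python
from typing import Iterable


def speech_request(input_uri: str, phrases: Iterable[str],
                   language_code: str = "en-US") -> dict:
    """Build an annotate-video request body that carries phrase hints."""
    # Trim whitespace, drop empty strings, and deduplicate while keeping order.
    unique = list(dict.fromkeys(p.strip() for p in phrases if p.strip()))
    return {
        "inputUri": input_uri,
        "features": ["SPEECH_TRANSCRIPTION"],
        "videoContext": {
            "speechTranscriptionConfig": {
                "languageCode": language_code,
                "speechContexts": [{"phrases": unique}],
            }
        },
    }
```

For example, `speech_request("gs://example-bucket/video.mp4", ["Kubernetes", "BigQuery"])` returns a dict that can be serialized with `json.dumps` and sent as the request body.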
Alternative hypotheses (a.k.a. n-best list).
A speech recognition result corresponding to a portion of the audio.
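Each result covers only a portion of the audio and may carry several alternative hypotheses. A common way to get one transcript for the whole video is to take one alternative per result and join them in order. The sketch below assumes the first alternative is the most probable one. `Alternative` and `Transcription` are hypothetical stand-in types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alternative:
    """Hypothetical stand-in for one recognition hypothesis."""
    transcript: str
    confidence: float


@dataclass
class Transcription:
    """Hypothetical stand-in for one result (a portion of the audio)."""
    alternatives: List[Alternative] = field(default_factory=list)


def best_transcript(results: List[Transcription]) -> str:
    """Join the first (assumed most probable) alternative of each result."""
    parts = [r.alternatives[0].transcript.strip()
             for r in results if r.alternatives]
    return " ".join(p for p in parts if p)
```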
Config for SPEECH_TRANSCRIPTION.
Annotations related to one detected OCR text snippet. This will contain the
corresponding text, confidence value, and frame level information for each
detection.
Config for TEXT_DETECTION.
Video frame level annotation results for text annotation (OCR).
Contains information regarding timestamp and bounding box locations for the
frames containing detected OCR text snippets.
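Text locations are polygons that may be rotated. Many downstream tools (cropping, drawing overlays) want an axis-aligned box instead. The sketch below computes the enclosing box from the polygon's corner points, represented here as `(x, y)` tuples. The clamp to [0, 1] is a defensive assumption.

```python
from typing import List, Tuple


def _clamp(v: float) -> float:
    return min(max(v, 0.0), 1.0)


def enclosing_box(vertices: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Axis-aligned (left, top, right, bottom) box around a text polygon."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (_clamp(min(xs)), _clamp(min(ys)), _clamp(max(xs)), _clamp(max(ys)))
```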
Video segment level annotation results for text detection.
For tracking-related features. An object at time_offset with attributes, located with normalized_bounding_box.
A track of an object instance.
Annotation progress for a single video.
Annotation results for a single video.
Video context and/or feature-specific parameters.
Video segment.
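A segment is bounded by start and end time offsets. If those offsets are protobuf-style durations (whole seconds plus nanoseconds), the segment length is the difference of the two values converted to seconds. The sketch below assumes fields named `start_time_offset` and `end_time_offset`. `Duration` and `Segment` are hypothetical stand-ins, not the library's types.

```python
from dataclasses import dataclass


@dataclass
class Duration:
    """Hypothetical stand-in for a seconds-plus-nanos duration."""
    seconds: int = 0
    nanos: int = 0

    def total_seconds(self) -> float:
        return self.seconds + self.nanos / 1e9


@dataclass
class Segment:
    """Hypothetical stand-in for a video segment."""
    start_time_offset: Duration
    end_time_offset: Duration

    def length_seconds(self) -> float:
        return self.end_time_offset.total_seconds() - self.start_time_offset.total_seconds()
```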
Word-specific information for recognized words. Word information is only
included in the response when certain request parameters are set, such
as enable_word_time_offsets.
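Once per-word time offsets are enabled, you can find the words spoken in any time window by checking which word intervals overlap it. The sketch below does exactly that. `Word` is a hypothetical stand-in with start and end times already converted to seconds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical stand-in for word-level info with time offsets."""
    word: str
    start_s: float
    end_s: float


def words_between(words: List[Word], t0: float, t1: float) -> List[str]:
    """Words whose spoken interval overlaps the half-open window [t0, t1)."""
    return [w.word for w in words if w.start_s < t1 and w.end_s > t0]
```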