The proliferation of UAVs and fighters equipped with stabilized, high-magnification video pods and imaging radars has a number of corollary consequences. Bandwidth has become a key battlefield constraint.
Specialized reconnaissance fighter aircraft are a dead concept. And at the other end, some poor analyst has to sift through the video tsunami to find items of interest.
The USA is using a number of approaches to help deal with the flood, and one unconventional approach involves a DARPA project called VIRAT (Video Image Retrieval and Analysis Tool). It doesn’t recognize faces, perform before/after analysis, or rely on rewinds. Instead, it aims to distinguish certain types of behaviors, so it can provide alerts to intelligence operatives or ground forces during live operations.
DARPA has this to say about what VIRAT is, and isn’t:
“First, VIRAT seeks to enable military analysts to establish alerts that continuously query a real-time video stream. Generated alerts will cue analysts to dangers or opportunities in real-time and will provide actionable information even as events are unfolding on the ground during a combat mission. Second, VIRAT is developing tools that will put an entire petabyte-scale video archive at the fingertips of a military analyst, allowing them to very rapidly retrieve, with high precision and recall, video content that was gathered previously…. The fundamental enabling technology being pursued under VIRAT is the development of representations of actions and activities that are robust…”
That last sentence is key. VIRAT is about recognizing and reporting actions: someone entering a building, shooting, a car accelerating, a group meeting in progress, etc. That goes beyond simple video rewind capability, all the way to providing play-by-play descriptions. Chick Hearn, call your office.
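The alerting concept DARPA describes amounts to standing queries evaluated continuously against a stream of recognized activity events. A minimal sketch of that idea follows; the event labels, the `ActivityEvent` type, and the confidence threshold are illustrative assumptions, not VIRAT's actual interface.

```python
# Hypothetical sketch of VIRAT-style standing alert queries over an
# activity-event stream. Labels and data types are assumptions for
# illustration, not the program's real API.
from dataclasses import dataclass

@dataclass
class ActivityEvent:
    label: str        # e.g. "person_enters_building", "vehicle_accelerates"
    timestamp: float  # seconds into the video stream
    confidence: float # detector confidence, 0.0 - 1.0

def run_standing_queries(events, watched_labels, min_confidence=0.8):
    """Return (timestamp, label) alerts for events matching any standing query."""
    alerts = []
    for ev in events:
        if ev.label in watched_labels and ev.confidence >= min_confidence:
            alerts.append((ev.timestamp, ev.label))
    return alerts

stream = [
    ActivityEvent("person_enters_building", 12.5, 0.91),
    ActivityEvent("group_meeting", 47.0, 0.62),        # below threshold: no alert
    ActivityEvent("vehicle_accelerates", 88.2, 0.85),
]
print(run_standing_queries(stream, {"person_enters_building", "vehicle_accelerates"}))
# → [(12.5, 'person_enters_building'), (88.2, 'vehicle_accelerates')]
```

The point of the sketch is the cueing model: the analyst registers the query once, and matching events surface as they occur rather than after a rewind-and-review pass.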
Now, here’s what VIRAT is not:
”....The primary focus of VIRAT is activity-based and dynamic information…. The VIRAT program will not support the development of new algorithms for tracking, moving target detection and indication, image-based change detection, geo-registration, motion pattern learning, anomaly detection, and sensor fusion. While it is expected that such algorithms may be useful to VIRAT, the system will use existing capabilities in these areas…. Face recognition, gait recognition, human identification, or any form of biometrics will not be funded or used in any way within this program.”
The VIRAT program will be conducted in 3 phases.
Phase 1 – Prototype Algorithm Development and System Design. They’ll use government-provided Day TV and IR data from a controlled collection made by “Predator-type sensors.” The Phase 1 system must have a coherent system architecture, and work with existing military systems used by the program’s “transition partners.” The goal is 85% identification of target events or activities from the controlled archives, with no more than 8 false positives per video stream hour. When using real-time streaming, the goal is 75% identification from the streaming video, with no more than 12 false positives per video stream hour. The system should be able to change to a different stored query within 1 second, and set up a new query for viewing in 10 minutes.
Phase 2 – Algorithm Optimization and System Integration. Basically, improve performance and identify a wider variety of event types. The goal is now 90% identification of target events or activities from the controlled archives, with no more than 4 false positives per video stream hour. When using real-time streaming, the goal is 85% identification from the streaming video, with no more than 6 false positives per video stream hour. The system should be able to change to a different stored query within 1 second, and set up a new query for viewing in 5 minutes.
Phase 3 – Integration, Demonstration, and Transition to the military services. Phase 3 efforts must handle query refinement and complex searches involving multiple event types, cope with faster streaming rates and larger video archives, and use actual operational Day TV and IR data from a wide range of UAV platforms. The goal is now 95% identification of target events or activities, with no more than 2 false positives per video stream hour, whether the query involves archives or streaming video. The system should be able to change to a different stored query within 1 second, and set up a new query for viewing in 1 minute.
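The phase targets above boil down to two rates: the fraction of target events identified, and false positives per hour of video. A small sketch makes the arithmetic concrete; the goal numbers are taken from the article, but the function itself is just illustrative bookkeeping, not DARPA's evaluation harness.

```python
# Phase targets as quoted in the article:
# (archive ID rate, archive FP/hr, streaming ID rate, streaming FP/hr)
PHASE_GOALS = {
    1: (0.85, 8, 0.75, 12),
    2: (0.90, 4, 0.85, 6),
    3: (0.95, 2, 0.95, 2),  # Phase 3 applies one target to archive and stream alike
}

def meets_goal(phase, identified, total_events, false_positives, stream_hours, streaming):
    """Check a hypothetical test run against the quoted phase targets."""
    id_goal_a, fp_goal_a, id_goal_s, fp_goal_s = PHASE_GOALS[phase]
    id_goal, fp_goal = (id_goal_s, fp_goal_s) if streaming else (id_goal_a, fp_goal_a)
    id_rate = identified / total_events       # fraction of target events found
    fp_rate = false_positives / stream_hours  # false positives per video hour
    return id_rate >= id_goal and fp_rate <= fp_goal

# e.g. a Phase 1 archive run: 870 of 1,000 events found, 60 FPs over 10 hours
print(meets_goal(1, 870, 1000, 60, 10, streaming=False))  # 87% ID, 6 FP/hr → True
```

Note how the same run that clears Phase 1 (87% ≥ 85%, 6 ≤ 8 FP/hr) would fail Phase 2, where the bar rises to 90% identification and 4 false positives per hour.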
Advance warning from DARPA: this won’t be easy:
“The focus of VIRAT is down-linked aerial video, which should be carefully taken into account by proposed approaches. Spatial resolution is, at most, 10cm ground sample distance and more typically 20-30cm. The sensor is moving rapidly and is distant from the scene. Video quality can vary considerably due to sun angle, haze, rain and other environmental conditions. Sensor gimbal motion, sensor field of view changes, and sensor jitter will influence the presence and appearance of objects within each image. Obscuration and occlusion will vary with ground activity, changes in viewing perspective, and site-specific obstructions. Operational video sources may utilize visible or infrared wavelengths, with infrared display options including white-hot and black-hot settings.”
To read more about the history of VIRAT, click here.