The Carnegie Mellon system has been able to automatically follow the movements of 13 people within a nursing home, even though individuals sometimes slipped out of view of the cameras. The researchers made use of multiple cues from the video feed: apparel color, person detection, trajectory and, perhaps most significantly, facial recognition.
Multi-camera, multi-object tracking has been an active field of research for a decade, but automated techniques have focused only on well-controlled lab environments. The Carnegie Mellon team, by contrast, demonstrated its technique with actual residents and employees in a nursing facility, where camera views were compromised by long hallways, doorways, people mingling in the hallways, variations in lighting and too few cameras to provide comprehensive, overlapping views.
The performance of the Carnegie Mellon algorithm significantly improved on two of the leading algorithms in multi-camera, multi-object tracking. It located individuals within one metre of their actual position 88 per cent of the time, compared with 35 per cent and 56 per cent for the other algorithms.
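To make the "within one metre" figure concrete, the sketch below computes that kind of score: the share of video frames in which a tracker's estimated floor position for a person falls within a distance threshold of the true position. It is an illustration under stated assumptions, not the researchers' evaluation code. The function name `fraction_within`, the use of 2D floor coordinates in metres, and the choice to count a frame with no estimate (the person is lost) as a miss are all hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

# Hypothetical representation: an (x, y) floor position in metres.
Point = Tuple[float, float]


def fraction_within(
    predicted: Sequence[Optional[Point]],
    truth: Sequence[Point],
    threshold_m: float = 1.0,
) -> float:
    """Share of frames whose predicted position lies within threshold_m of ground truth.

    A frame with no prediction (None) is counted as a miss, on the assumption
    that a tracker that has lost the person is not "on track" for that frame.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must cover the same frames")
    if not truth:
        return 0.0
    hits = sum(
        1
        for p, t in zip(predicted, truth)
        if p is not None and math.dist(p, t) <= threshold_m
    )
    return hits / len(truth)
```

For example, predictions of (0, 0), (1, 1), None and (2, 0) against true positions (0, 0.5), (1, 2.5), (5, 5) and (2, 0.9) are 0.5 m, 1.5 m, missing and 0.9 m off, so two of the four frames count and the score is 0.5.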
These automated tracking techniques would also be useful in airports, public facilities and other areas where security is a concern. Despite the importance of cameras in identifying perpetrators after this spring's Boston Marathon bombing and the 2005 London bombings, much of the video analysis needed to track people is still done manually, researchers said.
The researchers—Alexander Hauptmann, principal systems scientist in the Computer Science Department (CSD); Shoou-I Yu, a Ph.D. student in the Language Technologies Institute; and Yi Yang, a CSD post-doctoral researcher—will present their findings June 27 at the Computer Vision and Pattern Recognition Conference in Portland, Ore.
Carnegie Mellon researchers developed their tracking technique as part of an effort to monitor the health of nursing home residents.
“The goal is not to be Big Brother, but to alert the caregivers of subtle changes in activity levels or behaviors that indicate a change of health status,” Hauptmann said. “All of the people in this study consented to being tracked.”
The CMU work on monitoring nursing home residents began in 2005 as part of a National Institutes of Health-sponsored project called CareMedia, which is now associated with the Quality of Life Technology Center, a National Science Foundation engineering research center at CMU and the University of Pittsburgh.
“We thought it would be easy,” Hauptmann said of multi-camera tracking, “but it turned out to be incredibly challenging.”
Even something as simple as tracking by clothing color proved difficult, because the same apparel can look like a different color to cameras in different locations, depending on the lighting. Likewise, a camera's view of an individual is often blocked by other people passing in hallways or by furniture, and individuals drop out of view entirely when they enter a room or other area not covered by cameras, so the system must regularly re-identify them.
Face detection helps immensely in re-identifying individuals on different cameras. But Yang noted that faces can be recognized in fewer than 10 per cent of the video frames, so the researchers developed mathematical models that combine several kinds of information, such as appearance, facial recognition and motion trajectories.
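The article does not describe the researchers' mathematical models, but the basic problem they address can be illustrated with a much simpler technique: a weighted average of per-cue similarity scores that skips any cue not available in a given frame. Because a face is visible in only a small fraction of frames, the weights are renormalized over whichever cues are present, so a missing face does not automatically lower a candidate's score. Everything here is a hypothetical sketch: the cue names, the weights in `DEFAULT_WEIGHTS`, the [0, 1] score scale and the functions `fused_score` and `best_identity` are illustrative assumptions, not the CMU method.

```python
from typing import Mapping, Optional

# Hypothetical weights; face similarity is weighted most heavily, echoing the
# finding that facial recognition helped most. These are NOT published values.
DEFAULT_WEIGHTS: Mapping[str, float] = {"appearance": 0.3, "face": 0.5, "trajectory": 0.2}


def fused_score(
    cues: Mapping[str, Optional[float]],
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
) -> float:
    """Weighted average of the available cue similarities (each assumed in [0, 1]).

    Cues that are absent or None (e.g. no face visible in this frame) are
    skipped, and the remaining weights are renormalized to sum to one.
    """
    total = 0.0
    total_weight = 0.0
    for name, weight in weights.items():
        score = cues.get(name)
        if score is None:
            continue
        total += weight * score
        total_weight += weight
    return total / total_weight if total_weight > 0 else 0.0


def best_identity(candidates: Mapping[str, Mapping[str, Optional[float]]]) -> str:
    """Return the candidate identity with the highest fused score for one detection."""
    return max(candidates, key=lambda person: fused_score(candidates[person]))
```

As an example, a detection scored against two hypothetical residents might give resident_A appearance 0.6, face 0.9 and trajectory 0.7 (fused score 0.77), and resident_B appearance 0.8 and trajectory 0.5 with no face visible (fused score 0.68 after renormalizing over the two available cues), so `best_identity` picks resident_A. A real system would also need to enforce consistency across frames and cameras, which this per-detection sketch does not attempt.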
Using all of the information is key to the tracking process, but Yu said facial recognition proved to be the greatest help. When the researchers removed facial recognition information from the mix, their on-track performance in the nursing home data dropped from 88 per cent to 58 per cent, not much better than one of the existing tracking algorithms.
The nursing home video analyzed by the researchers was recorded in 2005 using 15 cameras; the recordings are just over six minutes long. Further work will be necessary to extend the technique to longer periods of time and to enable real-time monitoring.
The researchers are also looking at other ways to use video to monitor resident activity while preserving privacy, such as recording only the outlines of people, together with distance information from depth cameras similar to the Microsoft Kinect.