Those of you who have read my previous blog posts on Augmented Reality (AR) (Ref1, Ref2, Ref3) know that one of the main challenges of AR is tracking the user’s position and orientation in real time. Overlaying a 3D model at approximately the right location on top of the physical world is relatively easy: a GPS and a compass provide enough accuracy for rough alignment, which is sufficient for several types of applications, such as finding the nearest restaurant, locating your friends, or displaying the nutritional info of a product you are holding at the grocery store. That part is easy.
The real difficulty arises when one wants accurate augmentation – the kind of augmentation that engineers would require. Let’s say, for instance, that an electrical engineer uses a smartphone AR app, aims the phone at an electric cable, and “clicks” on it with a crosshair to display – say – the live voltage being measured on that cable. In such a situation, it is extremely important that the AR system display the voltage associated with that specific cable, and not the one located 5 cm to the right, because the engineer’s life may depend on it… Talking with several of our users, we came to realize that accuracy is paramount if we want to develop AR apps for engineers. An inaccurate AR app might indeed be “cool” for a while, but serious users would quickly abandon it as soon as they realized they could not rely on it.
In our previous work, we worked around the user tracking problem by doing the augmentation on panoramic images. Since an image is static, no tracking is required, and the augmentation is very precise (no jittering is observed). That enabled us to develop prototypes for testing hypotheses that could not easily be tested with standard AR technology, which requires real-time tracking. Having said that, augmenting images is far from ideal: an image is, by definition, out of date from the moment it is captured, so it may not match the current state of the surrounding world; and, most importantly, it is static, so it cannot display any live event taking place in the scene (such as a user trying to interact with the augmentation). That is rather limiting. Live augmentation is what most researchers in the field are trying to achieve. After all, reality is about the present – so augmenting it should take place now…
We looked again at the reasons why we chose panoramas in the first place. First, a panorama represents an environment: if we are to augment reality using a static image, it had better be one with a very wide field of view, to show enough of the environment and thereby partly compensate for the fact that the camera cannot be moved. Second, an image is static, so no tracking is required, as discussed above, making the augmentation very precise. But there was also another reason: a panorama provides image data all around the camera. That is important, because that image data is used to calculate the camera position, from features such as building corners and windows. Now think of a typical standard camera, with its relatively narrow field of view: if it gets too close to a wall, for instance, all it can see is a featureless wall surface, with no distinctive feature from which to calculate the position. In the same situation, a panoramic camera has a much better chance of seeing other features, reducing the risk of ending up in a situation where the camera position cannot be calculated. That means more accurate augmentations. In summary, panoramas are good, but they would be even better if they were not static…
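To give a sense of what “features” means here: feature detectors find corner-like points (window edges, building corners) that can be matched across frames. Here is a minimal OpenCV sketch, assuming ORB as the detector and an illustrative threshold (our actual pipeline may differ), showing how a system could test whether a view contains enough features to compute a pose:

```python
import cv2

MIN_FEATURES = 50  # illustrative threshold; below this, pose estimation gets unreliable


def enough_features_for_pose(frame_bgr):
    """Count corner-like features in a frame; a blank wall yields very few of them."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=500)
    keypoints = orb.detect(gray, None)
    return len(keypoints) >= MIN_FEATURES
```

A narrow-field-of-view camera aimed at a blank wall would fail such a test, whereas a panoramic camera would still see corners and windows elsewhere in the room.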
So we proposed a combination of the two: augmenting live panoramic video. We used a nice panoramic video camera from Point Grey Research as the basis of our system, which works as follows. The camera is installed on a tripod, at a stationary position. In an “initialization” phase, the live panoramic stream is first aligned with the 3D model, so that the augmentation can be displayed at the right location on the panoramic stream; this initialization actually calculates the camera position within the model. From that point on, augmentation can take place: the user can augment any area surrounding the camera, as long as it is visible from the camera position. Since the camera is stationary, the augmentation is jitter free, and potentially much more accurate than with systems that require live camera tracking. Now suppose the user wants to augment a different location in the building: he simply moves the tripod. During the move, the system “tracks” surrounding features, recalculating the camera position every frame. When the user puts the tripod back on the ground, the system already knows where the camera is located (since it tracked the camera while it was being moved), which means the user does not have to re-initialize the system and can resume augmentation right away.
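Under the hood, this initialization is a classic camera pose problem: given a few correspondences between points in the image and points in the 3D model, compute the camera’s position and orientation. Here is a minimal sketch of that idea, assuming OpenCV’s solvePnP and a simple pinhole model (illustrative assumptions on our part – a real panoramic camera needs a different projection model, and our actual method is described in the papers below):

```python
import numpy as np
import cv2


def estimate_camera_pose(model_points_3d, image_points_2d, camera_matrix):
    """Compute the camera's rotation and translation from 2D-3D correspondences.

    model_points_3d: Nx3 array of points picked in the 3D building model.
    image_points_2d: Nx2 array of the same points clicked in the image.
    camera_matrix:   3x3 intrinsic matrix (pinhole approximation).
    """
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(model_points_3d, dtype=np.float64),
        np.asarray(image_points_2d, dtype=np.float64),
        camera_matrix,
        None,  # assume no lens distortion for this sketch
    )
    if not ok:
        raise RuntimeError("pose estimation failed; pick more correspondences")
    return rvec, tvec


def project_model_points(points_3d, rvec, tvec, camera_matrix):
    """With the pose known, project 3D model geometry onto the video frame."""
    pts_2d, _ = cv2.projectPoints(
        np.asarray(points_3d, dtype=np.float64), rvec, tvec, camera_matrix, None
    )
    return pts_2d.reshape(-1, 2)
```

Once the pose is known, every element of the 3D model can be projected onto the live stream, which is what makes the augmentation land on the right pipe or cable.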
Our system is composed of a panoramic camera, a tripod, and two laptops. The camera produces 75 Mb of data per second, so quite a lot of processing power is needed to augment that sort of video stream…
The system in action is shown in the following video:
[Video: the system in action]
In short, this is very close to true live AR: the augmentation is done in real time, it can be done from anywhere (not just from specific panorama locations), and as a bonus we get very steady (jitter-free) augmentations because the camera does not move. But something is still missing. In the photos below, extracted from the video demo, the user does not see the augmentation. He actually relies on verbal instructions from his colleague, who is holding the laptop, to position his hand correctly with respect to the pipe / duct. That is not ideal.
So we thought of adding a tablet to our setup:
In addition to being displayed on the laptop, the augmentation is also broadcast to a tablet held by the user. So the user sees what the panoramic camera system is augmenting. The tablet is portable, so the user can walk around and see what is being augmented – but he sees the augmentation from the camera position. That may seem a bit strange, but look at the second video, below…
[Video: the tablet setup in action]
As you could see, the system we propose is different from the typical AR app, in which the user augments what is directly in front of the tablet’s camera. In our system, the user sees what another camera sees, which means he may see himself in the augmentation. Consequently, he can use that video as feedback to position his hand correctly and thereby interact with the augmentation. And the system has a major advantage over typical AR systems: since the camera is stationary, the user gets no jitter on the augmentation, regardless of how much he moves the tablet. That could prove to be quite an advantage for engineers who require high augmentation accuracy...
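We have not described the tablet link in detail, but the idea is simple: the laptop that renders the augmented panorama pushes compressed frames over the network, and the tablet just displays them. Here is a minimal sketch of that idea, assuming an MJPEG-over-HTTP stream (an illustrative choice, not our actual implementation; render_augmented_frame is a hypothetical stand-in for the renderer):

```python
import cv2
import numpy as np
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_augmented_frame():
    """Hypothetical stand-in: the real system would return the augmented panoramic view."""
    return np.zeros((480, 640, 3), dtype=np.uint8)


class MJPEGHandler(BaseHTTPRequestHandler):
    """Serve augmented frames as a multipart MJPEG stream a tablet browser can display."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "multipart/x-mixed-replace; boundary=frame")
        self.end_headers()
        while True:
            ok, jpeg = cv2.imencode(".jpg", render_augmented_frame())
            if not ok:
                continue  # skip frames that fail to encode
            self.wfile.write(b"--frame\r\nContent-Type: image/jpeg\r\n\r\n")
            self.wfile.write(jpeg.tobytes())
            self.wfile.write(b"\r\n")


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), MJPEGHandler).serve_forever()
```

Because all the heavy lifting (pose calculation and rendering) stays on the laptops, the tablet only needs to display a video stream, which is why even a modest device works as the user’s window into the augmentation.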
Our results lead us to conclude that panoramic video augmentation is not only possible, but also has many advantages over typical AR apps, such as accurate and guaranteed jitter-free augmentation. Using the system, we realized that it is probably better aligned with the needs of engineers, architects, operators, and other professionals who design, build, maintain and operate infrastructure.
I agree, our setup is a bit complex – two laptops, a fancy panoramic camera, a tablet... Of course, this is research. But we are working with devices that will, in the future, become widely available in smaller and cheaper forms: laptops will be more powerful, and small, portable panoramic cameras are now appearing on the market. Our study shows the possibility is real.
Through the various augmented reality projects we have done, we have come to realize the enormous potential of that technology in the infrastructure world. There is an enormous amount of data available for infrastructure, and AR represents a solution for making use of that data more easily, on site, where it is needed most. Time will tell when AR will be sufficiently developed to be used by professionals. But we are convinced that that future is not very far away...
Want to read more? Check our papers:
Côté S., Trudel P., Desbiens M.-A., Giguère M., Snyder R., 2013. Live mobile panoramic high accuracy augmented reality for engineering and construction. Proceedings of the 13th International Conference on Construction Applications of Virtual Reality (CONVR) 2013, London, November 2013. PDF
Côté S., Trudel P., 2013. Third Person Perspective Augmented Reality for High Accuracy Applications. Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR) 2013, Adelaide, October 2013. PDF
As always, stay tuned!
And many thanks to Chuck Fields, owner of the Paddy Wagon Irish Pub in Richmond, Kentucky, for giving us access to his building and permission to share our results!