Ask most people how a phone knows where it is outdoors and they will say GPS. Indoors, GPS mostly stops working, so a range of other methods fill the gap. One of the more striking is the visual positioning system (VPS), which locates a user from what their phone camera sees rather than from any radio signal. It is the technology behind the arrows-on-the-floor augmented-reality wayfinding demos that look almost magical, and it deserves a clear, honest explanation rather than either hype or dismissal.

This post explains VPS end to end: how it matches camera imagery against a pre-built map to work out where you are standing, why that capability powers augmented-reality wayfinding, where it struggles, and the privacy question that comes with putting a live camera in the loop. It closes by contrasting a camera-free approach, because the presence or absence of a camera is the single biggest difference between VPS and the alternative many venues actually want. For how VPS sits against every other indoor method, see how positioning methods compare.
What is a visual positioning system for indoor navigation?
A visual positioning system (VPS) locates a user by matching what their phone camera sees against a pre-built visual map of the space. The device recognizes visual features (signage, structure, layout) and computes where the camera is standing and which way it points, which is what makes phone-held augmented-reality wayfinding possible. VPS can be accurate when the environment is well-mapped and well-lit, but it depends on the user holding up a camera, works less well in feature-poor or dim spaces, and raises the privacy question of pointing a live camera around a venue.
The sections below unpack each part of that: the feature-matching mechanism, the augmented-reality use case it enables, the conditions where it degrades, and the camera-in-the-loop privacy trade that shapes where VPS is a good fit.
How VPS works: feature matching against a pre-built visual map
VPS is built on two stages, and the first happens long before any visitor arrives. Someone walks the space with a capture device and records imagery from many positions and angles. Software processes that imagery into a visual map: not a picture, but a database of distinctive visual features and their positions in three dimensions. A corner of a sign, the edge where a wall meets a floor, the pattern of a structural column, each becomes a landmark with known coordinates.
The second stage runs live on the visitor's phone. When the user opens the wayfinding app and raises the camera, the app extracts features from the live view and searches for the same features in the pre-built map. When enough of them match, the system can solve backwards for the one camera position and orientation that would produce that particular view. In effect it asks: given that I can see these known landmarks arranged exactly this way, where must the camera be standing, and which way must it point? The answer is the user's pose (position plus facing direction), often accurate to well under a metre when the match is strong.
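The "solve backwards" step can be sketched in a deliberately simplified form. Production systems typically work in three dimensions from 2D image points, usually with a perspective-n-point solver. The sketch below instead assumes a flat 2D floor plan and assumes each matched landmark is already expressed in the camera's own frame, then recovers the pose with a closed-form rigid alignment. The function names `estimate_pose_2d` and `to_camera_frame` are hypothetical and exist only for illustration.

```python
import math


def estimate_pose_2d(map_points, camera_points):
    """Estimate camera pose (x, y, heading in radians) in map coordinates.

    map_points[i] and camera_points[i] describe the same landmark, first in
    the map frame and then in the camera frame. Simplified 2D stand-in for
    the 3D solve a real system would run.
    """
    n = len(map_points)
    if n < 2 or n != len(camera_points):
        raise ValueError("need at least two matched landmark pairs")

    # Centroids of both point sets.
    mx = sum(p[0] for p in map_points) / n
    my = sum(p[1] for p in map_points) / n
    cx = sum(p[0] for p in camera_points) / n
    cy = sum(p[1] for p in camera_points) / n

    # Accumulate the dot and cross terms that determine the best rotation.
    s_cos = 0.0
    s_sin = 0.0
    for (px, py), (qx, qy) in zip(map_points, camera_points):
        ax, ay = qx - cx, qy - cy  # centred camera-frame point
        bx, by = px - mx, py - my  # centred map-frame point
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)

    # Translation that carries the rotated camera centroid onto the map centroid.
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    tx = mx - (cos_t * cx - sin_t * cy)
    ty = my - (sin_t * cx + cos_t * cy)
    return tx, ty, theta


def to_camera_frame(point, pose):
    """Express a map point in the frame of a camera at pose (x, y, heading)."""
    x, y, theta = pose
    dx, dy = point[0] - x, point[1] - y
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return (cos_t * dx + sin_t * dy, -sin_t * dx + cos_t * dy)


# Made-up landmarks and a made-up true pose, used only to exercise the solver.
landmarks = [(10.0, 5.0), (6.0, 9.0), (3.0, 3.0)]
true_pose = (4.0, 2.0, math.radians(30))
observed = [to_camera_frame(p, true_pose) for p in landmarks]

x, y, heading = estimate_pose_2d(landmarks, observed)
print(f"position=({x:.2f}, {y:.2f}) heading={math.degrees(heading):.1f} deg")
```

With noise-free observations the solver recovers the pose the observations were generated from. With real, noisy matches the same formula gives a least-squares fit, which is one reason more matched landmarks tend to produce a steadier fix.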
That recovery of facing direction is what sets VPS apart from radio methods. A signal-based system can tell you roughly where you are but struggles to know which way you are looking. VPS solves both at once, because a camera image encodes direction as well as location. Knowing exactly which way the user faces is precisely what the next capability needs.
Why VPS powers AR wayfinding
Augmented-reality wayfinding overlays directions onto the live camera view: a floating arrow on the floor, a pin hovering over the right doorway, a path that bends around the corner ahead. For that overlay to sit correctly in the world, the app must know not just where the user is but exactly which way the phone is pointing, to within a degree or two. Get the direction wrong and the arrow points at the wrong shop.
This is why VPS and AR wayfinding are so closely linked. Because VPS recovers full pose, position and orientation together, from the camera image, it gives AR the anchor it needs to place graphics convincingly in the real scene. A radio-only system rarely knows facing direction well enough to keep an overlay glued to the floor as the user turns. VPS does, which is what makes the arrows-on-the-floor experience feel like it belongs to the building rather than floating loosely on the screen.
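The sensitivity to heading error can be put in rough numbers. The sketch below assumes simple flat-floor geometry, where an overlay anchored some distance ahead shifts sideways by that distance times the tangent of the heading error. The function name `overlay_offset_m` and the sample distances are illustrative assumptions, not measurements from any real deployment.

```python
import math


def overlay_offset_m(distance_m, heading_error_deg):
    """Sideways drift of an AR overlay anchored distance_m ahead of the user.

    Assumes flat-floor geometry and a heading error below 90 degrees.
    """
    if not -90.0 < heading_error_deg < 90.0:
        raise ValueError("heading error must be between -90 and 90 degrees")
    return distance_m * math.tan(math.radians(heading_error_deg))


# Compare a small and a large heading error for an arrow 10 m ahead.
for error_deg in (2.0, 10.0):
    offset = overlay_offset_m(10.0, error_deg)
    print(f"{error_deg:>4.1f} deg error -> {offset:.2f} m sideways at 10 m")
```

Under these assumptions a 2 degree error moves the arrow by roughly a third of a metre at 10 m, while a 10 degree error moves it by well over a metre and a half, which is enough to point at the neighbouring doorway.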
For that reason, VPS is at its most compelling in exactly the moment a lost visitor holds up a phone and asks "which way." It is a foreground, on-demand experience: the user actively points the camera and follows the overlay. That framing matters, because it is also the source of the technology's main limits.
The limits: lighting, feature-poor spaces, mapping upkeep, battery, and the camera in hand
VPS is impressive when conditions suit it and noticeably fragile when they do not. The constraints are worth stating plainly.
- Lighting. The live camera view has to resemble the conditions the map was built under. Dim spaces, harsh glare, and strong changes between day and night lighting can all reduce how many features match, which degrades or breaks the position fix.
- Feature-poor spaces. Matching needs distinctive visual detail. A long corridor of identical white walls, a plain glazed atrium, or a repetitive facade gives the system little to lock onto, and repeated patterns can even produce confident but wrong matches.
- Change over time. The visual map is a snapshot. Refit a store, move signage, run a seasonal display, or change the layout, and the map drifts out of date until someone recaptures the space. That upkeep is an ongoing cost, not a one-time build.
- Battery and effort. Running the camera and live feature matching draws power, and more to the point, the user has to hold the phone up and point it. That is fine for a short "which way" moment and tiring as a continuous mode of navigation.
- The camera in hand. VPS only works while the camera is actively pointed at the scene. It is not a background, hands-in-pocket method, and it cannot locate anyone who is not deliberately holding up a device.
None of this makes VPS a bad technology. It makes it a foreground, on-demand tool with real environmental dependencies, best judged by whether your space and use case suit those conditions. For how accuracy claims hold up once a method meets a real building, see indoor accuracy in real conditions.
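The feature-poor failure mode in the list above can be illustrated with a ratio test, a common way to reject ambiguous feature matches. Whether any particular VPS product uses it is an assumption here. Descriptors are reduced to short tuples of numbers for readability, and `match_features` is a hypothetical helper, not a library call.

```python
import math


def match_features(live, mapped, ratio=0.75):
    """Return (live_index, map_index) pairs that pass a nearest-neighbour ratio test.

    A live descriptor is accepted only if its best map candidate is clearly
    closer than the second best. Repetitive scenes fail this check because
    several map features look almost equally similar.
    """
    matches = []
    for i, descriptor in enumerate(live):
        ranked = sorted(
            (math.dist(descriptor, candidate), j)
            for j, candidate in enumerate(mapped)
        )
        if len(ranked) < 2:
            continue  # cannot judge ambiguity with a single candidate
        (best, best_index), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, best_index))
    return matches


# Made-up descriptors: one map with distinctive features, one with near-duplicates.
distinctive_map = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
repetitive_map = [(1.0, 1.0), (1.01, 1.0), (1.0, 1.01)]

print(match_features([(0.5, 0.2)], distinctive_map))
print(match_features([(1.02, 1.03)], repetitive_map))
```

In the distinctive map the live feature has one clearly best candidate and is kept. In the repetitive map the candidates are nearly tied, so the match is dropped rather than trusted. Too few surviving matches means no reliable fix, which is the trade a system makes to avoid a confident but wrong position.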
The privacy question of a camera-in-the-loop method
There is one more limit that is not about accuracy, and it deserves its own heading because it shapes where VPS is acceptable at all. VPS works by processing a live camera feed of a public space. Even when the app only extracts features and keeps no images, the method still requires pointing a working camera around a venue full of other people, and users, venue operators, and regulators increasingly ask what that camera can see and where the imagery goes.
The concern sharpens in sensitive settings. A camera raised in a hospital corridor, in a school, near a changing area, or in a busy transit hall captures bystanders who never chose to be in frame. Whether the system stores those frames, processes them on-device, or sends them to a server changes the risk profile, and those distinctions are exactly the kind of thing a data protection assessment has to pin down. This is the same fault line that separates camera and biometric methods from sensing that never forms an image at all, discussed in camera and biometric methods and, for the counting side, computer vision versus depth sensing.
The point is not that VPS is inherently unsafe. Well-designed implementations minimise what they retain. The point is that a camera in the loop is a decision with privacy weight, and for some venues that weight alone rules it out regardless of how accurate it is. For the wider treatment of locating people without a camera, see positioning without a camera.
The camera-free alternative for continuous visitor positioning
Where a venue wants continuous, background positioning without a camera pointed at anyone, a different method is worth knowing about, and it is worth being exact about what it is not. Ariadne does not run a visual positioning system. It does not use the phone camera, build a visual map, or process any image of the space. The contrast is the whole point: VPS is camera-based, Ariadne is camera-free.
Ariadne measures visitor movement with Hybrid Fusion, its patented camera-free method. Time-of-Flight depth sensing counts every visitor at the entrances, capturing geometry rather than images. Patented phone signal sensing follows movement through the interior to about one-metre precision by detecting the signals a phone emits, even in airplane mode. The sensor streams both feeds to Ariadne, where Hybrid Fusion combines them into one trajectory per visit and computes counts, dwell, and paths. The streams carry no identifier: no MAC address, no device ID, no biometric data, and no camera is involved. Identifiers are stored only when a visitor explicitly opts in, which keeps the method GDPR-friendly and outside biometric territory.
The two methods answer different questions. VPS is a foreground experience: a user raises a camera and follows an overlay, and it excels at the phone-held moment. Ariadne's method runs in the background and never forms an image, which suits continuous, venue-wide flow measurement where pointing a camera at the public is neither practical nor wanted. Because Time-of-Flight captures geometry rather than pictures, there is no image of anyone to hold, so the privacy question that shapes a camera-in-the-loop method does not arise in the same way. If your requirement is continuous positioning without a camera, see camera-free indoor navigation.
FAQ
What is a visual positioning system (VPS)?
A visual positioning system locates a user by matching what their phone camera sees against a pre-built visual map of the space. It recognizes visual features and computes where the camera is standing and which way it points, which is what makes phone-held augmented-reality wayfinding possible.
How accurate is VPS indoors?
When the space is well-mapped and well-lit and enough features match, VPS can locate a user to well under a metre and, unlike radio methods, also recover which way they are facing. Accuracy falls in dim, feature-poor, or changed environments, or when the live view does not resemble the captured map.
Why does AR wayfinding use VPS?
Augmented-reality overlays have to sit correctly in the live camera view, which needs both position and facing direction to within a degree or two. VPS recovers full camera pose from the image, so it gives AR the anchor it needs to keep an arrow glued to the right spot as the user turns.
Is a visual positioning system a privacy risk?
It depends on the implementation, but a camera in the loop always carries privacy weight because it processes a live view of a public space and the people in it. Where frames go, how long they are kept, and where they are processed all shape the risk, which is why some venues rule out camera-based methods regardless of accuracy.
Do I need cameras for indoor positioning?
No. Ariadne counts with Hybrid Fusion: Time-of-Flight depth sensing plus patented phone signal sensing, never cameras. Time-of-Flight captures geometry rather than images, and signal sensing captures no MAC address by default, so the measurement involves no video, no faces, and no biometric data.
Does Ariadne use a visual positioning system?
No. Ariadne does not use the phone camera, build a visual map, or process any image. It is a camera-free approach based on Time-of-Flight depth sensing and patented phone signal sensing, fused centrally, designed for continuous, anonymous venue-wide flow rather than foreground AR wayfinding.



