Stereo vs ToF vs thermal sensor: a vendor-neutral guide to picking a people counter

Jun 3, 2026 · 15 min read

Three sensor families do most of the work

Almost every credible people counter sold into retail, transport, public buildings, and industrial sites today is built on one of three sensing methods. Stereo vision pairs two image sensors and triangulates depth from the difference between them. Time-of-Flight, or ToF, fires invisible infrared pulses and measures how long they take to return. Thermal, sometimes labelled as passive infrared imaging, reads the heat signature of a body against a colder background. Each family has been refined for more than a decade, and each makes a different set of trade-offs across accuracy, lighting, install, calibration, group detection, and privacy.

Infographic: stereo, ToF, and thermal people-counting sensors compared, with icons and short feature labels.

This is a vendor-neutral guide. The goal is not to name a winner, because the winning sensor depends on the building you intend to install it in. The goal is to give a buyer enough understanding of the three families to ask the right questions of any supplier, and to know which family fits a cold-storage warehouse, a daylight-flooded atrium, a low-lit boutique, or an outdoor public square. The companion Ariadne hardware overview sets out where our own sensors fit in this map; this article is the underlying physics.

How each family works in one paragraph

Stereo vision

Two image sensors a few centimetres apart capture the same scene from slightly different angles. A processor compares the two images and computes depth from the parallax, the way human eyes do. The output is a depth map of the scene, from which a counting algorithm separates people from the floor and from each other. Stereo vision is an image-based method, which has two consequences worth flagging up front: it needs reasonable light to work well, and it produces video frames at some point in the pipeline that may or may not be retained, depending on how the device is engineered.
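The parallax relationship described above can be written as a one-line formula for a rectified stereo pair: depth equals focal length times baseline, divided by disparity. The sketch below is a minimal illustration only; the function name, the 700-pixel focal length, and the 6 cm baseline are assumptions chosen for the example, not figures from any device.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_px * baseline_m / disparity_px


# Hypothetical rig, not taken from any datasheet.
FOCAL_PX = 700.0     # focal length expressed in pixels
BASELINE_M = 0.06    # 6 cm between the two image sensors

# A small shift between the two images means a far point, a large shift a near one.
floor_depth_m = depth_from_disparity(FOCAL_PX, BASELINE_M, 14.0)
head_depth_m = depth_from_disparity(FOCAL_PX, BASELINE_M, 30.0)

print(f"floor at about {floor_depth_m:.2f} m, head at about {head_depth_m:.2f} m")
```

With these assumed numbers the floor sits roughly 3.0 m from the sensor and a head roughly 1.4 m from it, which is the gap a counting algorithm uses to separate a person from the floor.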

Time-of-Flight

A ToF sensor fires a brief pulse of infrared light from a small emitter and measures how long the reflected pulse takes to come back from whatever is in front of it. That round-trip time gives the distance to every point in the field of view, typically to within a few centimetres. There is no image of the scene, only a depth grid: a low-resolution map of heights and shapes. People are detected as objects rising above the floor plane, and counted as they cross a virtual line under the sensor. The signal is invisible to the eye and works the same whether the room is lit or dark.
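The round-trip arithmetic is simple enough to sketch. Light travels out and back, so the distance is the speed of light times the round-trip time, divided by two; subtracting that from the mounting height gives the height of whatever reflected the pulse. The function names, the 3 m mounting height, and the timings below are assumptions for illustration, and the height calculation only holds for a point directly below the sensor.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def distance_from_round_trip(round_trip_s: float) -> float:
    """One-way distance: the pulse travels out and back, hence the division by two."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


def height_above_floor(mount_height_m: float, round_trip_s: float) -> float:
    """Height of the reflecting surface, for a point directly below the sensor."""
    return mount_height_m - distance_from_round_trip(round_trip_s)


MOUNT_HEIGHT_M = 3.0      # hypothetical ceiling mount

floor_return_s = 20.0e-9  # about 20 ns round trip to the bare floor
head_return_s = 8.7e-9    # shorter round trip: something tall is in the way

print(f"floor reading: {height_above_floor(MOUNT_HEIGHT_M, floor_return_s):.2f} m")
print(f"head reading:  {height_above_floor(MOUNT_HEIGHT_M, head_return_s):.2f} m")
```

The timings are in nanoseconds, which is why centimetre-level depth needs very fast timing electronics in the sensor.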

Thermal infrared

A thermal sensor reads the long-wave infrared radiation that warm bodies emit. A person, with skin at roughly 33 to 36 degrees Celsius, shows up as a bright blob against a cooler floor or background. The output is a low-resolution heat image. Thermal does not need any visible light, and it does not care about the colour or pattern of the floor or the clothing. It does care, a lot, about the temperature of the environment and about anything else in the scene that is roughly body-warm.
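A toy example shows why background temperature matters so much. The sketch below flags pixels that are warmer than the background by a fixed margin, which is a deliberately simplified stand-in for whatever detection a real thermal counter uses. The 4x4 frame, the 31 degree apparent temperature for a clothed person seen from above, and the 3 degree contrast threshold are all assumptions.

```python
def warm_pixels(frame_c, background_c, min_contrast_c=3.0):
    """Return (row, col) of pixels at least min_contrast_c warmer than the background."""
    return [
        (r, c)
        for r, row in enumerate(frame_c)
        for c, temp_c in enumerate(row)
        if temp_c - background_c >= min_contrast_c
    ]


def frame_with_person(background_c, person_c=31.0):
    """Build a hypothetical 4x4 heat image with a 2x2 person-sized warm patch."""
    frame = [[background_c] * 4 for _ in range(4)]
    for r in (1, 2):
        for c in (1, 2):
            frame[r][c] = person_c
    return frame


# Same person, two floors: a cool 20 C floor and a sunlit 29 C floor.
cool_floor = warm_pixels(frame_with_person(20.0), background_c=20.0)
sunlit_floor = warm_pixels(frame_with_person(29.0), background_c=29.0)

print(len(cool_floor), "pixels detected on the cool floor")
print(len(sunlit_floor), "pixels detected on the sunlit floor")
```

On the cool floor all four person pixels clear the threshold; on the sunlit floor none do, even though the person has not changed.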

Accuracy: what you can realistically expect

Across the three families, top-tier modern sensors all claim very high counting accuracy in their datasheets, often quoted in the high 90s as a percentage. Those numbers are real in the conditions a vendor controls for and are a poor guide to the conditions a real building presents. A more honest comparison is to look at where each family loses accuracy.

  • Stereo vision. Counts accurately in well-lit scenes with enough visual texture for the two images to be matched. Loses accuracy when light drops, when shadows from windows or fixtures move across the scene, when repetitive floor patterns or reflective surfaces confuse the disparity calculation, and when groups bunch closely so that the depth map merges several people into one blob.
  • Time-of-Flight. Counts accurately regardless of light level, because it carries its own infrared illumination. The depth map separates people cleanly from the floor and, with adequate ceiling height, separates adjacent people from each other. Loses some accuracy outdoors in strong direct sunlight, which carries enough infrared to wash out the return pulse on cheaper sensors, and at very long ranges where signal-to-noise falls.
  • Thermal. Counts accurately in cold or dark spaces, where a body stands out sharply against the background. Loses accuracy when the background warms toward body temperature (a sunlit floor, an overhead heater, a hot piece of equipment), when bodies are heavily insulated by clothing, and at any range where the thermal blob shrinks toward a single pixel and becomes hard to separate from another body next to it.

A useful mental model: in a steady indoor environment, all three families will hit similar headline accuracy. The differences appear at the edges, and the edges are where most install regret comes from.

Lighting independence

Lighting is the single biggest separator. Stereo vision is a visible-light technology and behaves like a camera: it works well from bright daylight down to good office lighting, and degrades as illumination falls. It can be paired with infrared illuminators for low-light environments, which helps but adds parts and cost. Time-of-Flight is active infrared and thermal is passive infrared, so neither depends on visible light. A ToF sensor performs the same at 2 a.m. in a closed shop as at noon in a sunlit lobby, subject to the caveat above about strong direct sunlight outdoors. A thermal sensor works in a pitch-black warehouse without any modification.

If the install has variable light through the day, or runs into the night with the lights off, ToF and thermal are the safer choices. Stereo vision is workable when illumination is controlled, predictable, and bright enough to read a printed page.

Install complexity

All three families ship as small ceiling-mounted devices that can be mounted over a doorway or in a corridor. The differences come from what they need around them.

  • Mounting height. Stereo and ToF typically install between 2.4 and 4 metres above the floor, with thermal often sitting in a similar range. Stereo depth comes from parallax, which shrinks as distance grows, so very high ceilings reduce depth resolution. The strength of the ToF return signal falls roughly with the square of distance, so above roughly 5 to 6 metres the accuracy starts to drop. Thermal can mount higher than stereo in some cases, because a warm body remains visible to a thermal sensor as long as the resolution is enough to separate it from neighbours.
  • Field of view and coverage. ToF tends to give the narrowest, most predictable footprint, which is useful at a single doorway but means more devices for a wide entrance. Stereo and thermal sensors with wide-angle optics can cover a larger area per device, at the cost of edge accuracy.
  • Power and data. All three are most often deployed over Power over Ethernet (PoE), which delivers power and data on one cable. Battery variants exist for retrofit in buildings with no convenient cabling, but they trade install simplicity for a maintenance commitment.
  • Outdoor rating. Most stereo and ToF devices are designed for indoor use. Outdoor counting at a square, a transit interchange, or an open-air pedestrian zone usually wants either an IP-rated outdoor variant or a thermal sensor, which is naturally well suited to weather-exposed environments.
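The field-of-view trade-off in the list above can be estimated with basic trigonometry before talking to a supplier. The sketch below computes the width a downward-facing sensor covers at shoulder height and how many devices a wide entrance would need. The function names, the 1.5 m shoulder height, and the 60 and 90 degree fields of view are assumptions for illustration; the estimate ignores the overlap between neighbouring sensors that real installs usually require.

```python
import math


def footprint_width_m(mount_height_m, fov_deg, target_height_m=1.5):
    """Width covered at target height by a sensor pointing straight down."""
    drop_m = mount_height_m - target_height_m
    return 2.0 * drop_m * math.tan(math.radians(fov_deg) / 2.0)


def sensors_needed(entrance_width_m, mount_height_m, fov_deg, target_height_m=1.5):
    """Rough device count for an entrance, with no overlap between footprints."""
    width_m = footprint_width_m(mount_height_m, fov_deg, target_height_m)
    if width_m <= 0:
        raise ValueError("sensor must be mounted above the target height")
    return math.ceil(entrance_width_m / width_m)


# Hypothetical 4 m wide entrance with a 3 m ceiling.
narrow = sensors_needed(entrance_width_m=4.0, mount_height_m=3.0, fov_deg=60.0)
wide = sensors_needed(entrance_width_m=4.0, mount_height_m=3.0, fov_deg=90.0)

print(f"60 degree optics: {narrow} sensors, 90 degree optics: {wide} sensors")
```

Measuring coverage at shoulder height rather than at the floor matters: the footprint at floor level is wider, but a person at the edge of it would have their head and shoulders outside the sensor's view.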

Calibration cadence

Calibration is the difference between a sensor that holds its accuracy for years and one that quietly drifts six months after install.

  • Stereo vision. Sensitive to the geometry of the two image sensors and to scene changes. A bumped sensor, a new floor finish, or moved fixtures can require a re-zero. Light conditions also matter: a stereo system commissioned in summer daylight may need recommissioning when winter overcast settles in.
  • Time-of-Flight. Largely self-calibrating against the floor plane. A ToF sensor reads the height of every point continuously and adapts to small changes in mounting angle. Major scene changes (new partition, moved doorway) still call for a check, but day-to-day drift is low.
  • Thermal. Sensitive to ambient temperature drift. The contrast that defines a person depends on how warm the background is, so a thermal install in a building with seasonal heating and cooling extremes may need its detection thresholds revisited across the year.

Across all three families, the right cadence is the same in spirit: a documented post-install commissioning count, periodic spot checks against a manual reference, and a full recommissioning after any physical change to the entrance, the floor finish, or the fixtures around the counting zone. A separate people counter calibration guide sets out the procedure in detail.
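A spot check against a manual reference comes down to one comparison. The sketch below computes counting accuracy from a sensor count and a manual count over the same period and flags when it falls under a threshold. The function names and the 95 percent threshold are assumptions for illustration, not a standard; the right threshold is whatever the buyer and vendor agreed at commissioning.

```python
def count_accuracy(sensor_count: int, manual_count: int) -> float:
    """Accuracy relative to a manual reference count over the same period."""
    if manual_count <= 0:
        raise ValueError("manual reference count must be positive")
    return 1.0 - abs(sensor_count - manual_count) / manual_count


def needs_recommissioning(sensor_count: int, manual_count: int,
                          min_accuracy: float = 0.95) -> bool:
    """True when the spot check falls below the agreed accuracy threshold."""
    return count_accuracy(sensor_count, manual_count) < min_accuracy


# Hypothetical spot checks: 200 people counted by hand in each.
print(f"{count_accuracy(197, 200):.3f}", needs_recommissioning(197, 200))
print(f"{count_accuracy(188, 200):.3f}", needs_recommissioning(188, 200))
```

A net count can hide offsetting errors, such as a missed entry cancelled out by a double-counted one, so it is worth running the check separately for entries and exits and at both quiet and peak times.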

Group detection

A surprising number of people-counting problems come from one issue: a group of three or four people crossing a doorway shoulder to shoulder and being counted as one or two. Each family handles this differently.

  • Stereo vision. Disparity-based depth has a real chance of separating adjacent bodies, provided light is good and the algorithm has learned the shapes. Performance falls when the scene is busy enough that the depth map merges people.
  • Time-of-Flight. A high-resolution ToF depth map is the strongest performer here, because each head shows up as its own peak in the height map, separated from its neighbour by lower shoulders, and the sensor reads them as distinct objects. ToF is the most reliable family for tight queues, families with strollers, and groups of children.
  • Thermal. Adjacent bodies often merge into a single warm blob, especially at higher mounting heights, because the resolution of a low-cost thermal array is limited. Thermal is reliable for counting individuals but weaker at separating tight groups.

If group counts matter (entry to a paid event, a stroller-heavy mall, a family-friendly museum), ToF tends to be the safest bet, with stereo a workable second under controlled lighting.
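To see why a height map helps with groups, consider a toy version of the problem. The sketch below counts local peaks in a small grid of heights: cells that are taller than a minimum and strictly taller than all of their neighbours. This is a simplified illustration under assumed values, not any vendor's algorithm; real depth data is noisy and needs smoothing and tracking on top.

```python
def count_head_peaks(height_grid_m, min_height_m=1.0):
    """Count cells above min_height_m that are strictly taller than all neighbours."""
    rows = len(height_grid_m)
    cols = len(height_grid_m[0]) if rows else 0
    peaks = 0
    for r in range(rows):
        for c in range(cols):
            h = height_grid_m[r][c]
            if h < min_height_m:
                continue
            neighbours = [
                height_grid_m[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(h > n for n in neighbours):
                peaks += 1
    return peaks


# Hypothetical grid: an adult (1.7 m head, 1.4 m shoulders) walking
# shoulder to shoulder with a child (1.5 m head, 1.2 m shoulders).
grid = [
    [0.0, 1.4, 1.4, 1.4, 1.2, 1.2, 1.2],
    [0.0, 1.4, 1.7, 1.4, 1.2, 1.5, 1.2],
    [0.0, 1.4, 1.4, 1.4, 1.2, 1.2, 1.2],
]

print(count_head_peaks(grid), "people found by peak counting")
```

Every occupied cell here touches another occupied cell, so a method that only asks whether something is above the floor would see one merged blob. Counting peaks recovers two people, which is the advantage height data gives over a flat silhouette.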

Privacy posture

Privacy is where the three families diverge most sharply, and where buyer due diligence is most often skipped. The question to ask of any sensor is not what it ships to the cloud, but what it captures in the first place. The privacy posture of a device is set by its physics, not by a settings page.

  • Stereo vision. Captures images of the scene as part of its sensing pipeline. Even when the unit discards the raw frames locally and sends only count events, the question for a data protection officer is whether the unit is processing identifiable images at any point. Many vendors process and discard frames inside the hardware; some do not. A buyer should ask for documentation in writing.
  • Time-of-Flight. Captures a depth map only. The output is a grid of heights, not pixels of a face. There is no image, no recognisable person, and no biometric data in the signal. From a GDPR standpoint, ToF is the cleanest of the three families to defend because there is nothing identifying in the first place.
  • Thermal. Captures a low-resolution heat image. At the resolutions typical of people-counting thermal sensors, an individual is a blob of pixels with no facial features. Thermal is generally treated as non-personal data, though the analysis is less settled than for ToF and depends on the resolution of the device.

The line worth holding to during procurement is the same one a museum, hospital, or public-sector buyer will hold to: a system that captures no images and no identifiers is easier to deploy, easier to explain to a board, and easier to defend if a complaint is ever raised. ToF wins this comparison clearly. Stereo can be deployed responsibly with the right vendor commitments. Thermal sits in between.

Where each sensor family fits which environment

Putting the trade-offs together, the question of which family to install in which space largely answers itself.

Cold storage and freezer warehouses

Below zero, with workers in insulated clothing, thermal contrast collapses and people stop standing out from the background. Visible light may be limited or harsh from a small number of bright lamps. ToF is the right family here: the infrared depth pulse is unaffected by temperature, the depth map separates a heavily clothed worker from the floor cleanly, and the sensor itself runs reliably in a cold environment.

Outdoor pedestrian zones and squares

Direct sunlight, weather, and unpredictable backgrounds make stereo vision and standard indoor ToF less reliable. Thermal sensors, in suitable enclosures, handle weather well and read bodies clearly against a cool ground. For mixed conditions, outdoor-rated ToF variants exist and can be the right answer at sheltered transit entrances; for fully open spaces, thermal is often the more durable choice.

Low-light retail, restaurants, and bars

Mood lighting kills stereo accuracy. ToF works the same at any light level and is the natural choice for a boutique with dim spotlights, a restaurant where the lighting drops sharply in the evening, or a bar where ambient light is low by design. Thermal also works, but ToF gives better group separation in tight entrances.

High-ceiling atriums and transit halls

At 6 metres and above, ToF range starts to limit accuracy and stereo depth resolution starts to fall. Two practical options: a multi-sensor array of either family at lower mounting positions over each entry point, or a thermal sensor mounted higher with looser group-separation expectations. The right choice depends on whether the building wants per-doorway accuracy or only an aggregate building count.
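Both height limits follow from simple scaling, which a short calculation makes concrete. For stereo, the depth error for a fixed matching error grows with the square of distance. For ToF, the strength of the returned signal from a surface falls roughly with the square of distance. The sketch below compares a 3 m and a 6 m mount; the focal length, baseline, and quarter-pixel matching error are assumptions for illustration, and real sensors add further effects such as ambient light and optics quality.

```python
def stereo_depth_error_m(depth_m, focal_px, baseline_m, disparity_error_px=0.25):
    """Approximate depth error for a given matching error: dZ = Z^2 * dd / (f * B)."""
    return depth_m ** 2 * disparity_error_px / (focal_px * baseline_m)


def relative_tof_signal(distance_m, reference_m=3.0):
    """Return-signal strength relative to the reference distance (inverse square)."""
    return (reference_m / distance_m) ** 2


# Hypothetical stereo rig: 700 px focal length, 6 cm baseline.
error_at_3m = stereo_depth_error_m(3.0, focal_px=700.0, baseline_m=0.06)
error_at_6m = stereo_depth_error_m(6.0, focal_px=700.0, baseline_m=0.06)

print(f"stereo depth error: {error_at_3m * 100:.1f} cm at 3 m, "
      f"{error_at_6m * 100:.1f} cm at 6 m")
print(f"ToF return signal at 6 m: {relative_tof_signal(6.0):.0%} of the 3 m level")
```

Doubling the height quadruples the stereo depth error and cuts the ToF return to a quarter, which is why lower mounting positions over each entry point are often the more accurate option.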

Daylight-flooded office lobbies and shopping centres

All three families work here. Stereo can be excellent because light is plentiful and the scene is controlled. ToF is excellent because the infrared signal is well within range. Thermal is workable but may struggle on a hot summer day when the floor warms and contrast falls. The decision often comes down to privacy posture and install logistics rather than physics.

How Ariadne approaches the choice

Ariadne is camera-free by design. Our hardware is built around Time-of-Flight depth at entries and patented phone signal sensing in the interior, fused centrally in the platform rather than at the device.

Hybrid Fusion is Ariadne's patented camera-free method. Time-of-Flight depth sensing counts every visitor at the entrances, capturing geometry rather than images, while patented phone signal sensing follows movement through the interior, detecting the signals a phone emits even in airplane mode. The sensor streams both feeds to Ariadne, where Hybrid Fusion combines them into one trajectory per visit and computes counts, dwell, and paths. The streams carry no identifier: no MAC address, no device ID, and no biometric data. No camera is involved. Identifiers are stored only when a visitor explicitly opts in, which keeps the method GDPR-friendly and outside biometric territory.

We chose ToF for entry counting because it gives the strongest group separation, holds its calibration through changes in light and scene, and captures no images at any point in the pipeline. Combined with signal sensing for interior journeys, the same physical sensor reports counts, dwell, and paths without any biometric data crossing the wire. The hardware sits in the Ariadne sensor lineup, the method is described on the how-it-works page, and the data handling is set out in the privacy policy.

That does not make stereo or thermal the wrong answer for every site. A cold-storage operator with no need for journey analytics will be well served by ToF alone. An open-air pedestrian programme may genuinely want a thermal device in an outdoor enclosure. The point of a vendor-neutral comparison is to make those calls on physics, not on marketing.

A short procurement checklist

If you take one thing from this guide into a vendor conversation, take this list. Any serious supplier of any sensor family should answer all of these clearly and in writing.

  1. What does the sensor capture? Images, depth map, thermal image, or something else. Ask what reaches the processor, not only what reaches the cloud.
  2. Is any of it personal data under GDPR? Get the answer in writing, ideally with a reference to the Article 29 Working Party or EDPB guidance the vendor relies on.
  3. How does the sensor perform in your lighting? Ask for documented accuracy at your actual lux range, not at a controlled lab condition.
  4. How does it handle groups? Ask about counting accuracy at peak doorway density, not at one-person-per-second throughput.
  5. What is the calibration cadence? Ask how often a sensor of this family typically needs recommissioning, and what triggers it (light change, scene change, temperature drift).
  6. What is the mounting envelope? Ceiling height range, supported door widths, multi-doorway coverage, indoor versus outdoor rating.
  7. What is the maintenance commitment? Power source (PoE or battery), expected service life, firmware update process, and what happens when a sensor fails.

Those seven questions sort credible vendors from glossy ones across all three sensor families. Once you have the answers, the choice of family for your building tends to be obvious. The hub linked above and the people counting solution page set out what that looks like in practice for an Ariadne deployment.

FAQ

Which sensor family is most accurate for people counting?

Top-tier sensors of all three families reach broadly comparable headline accuracy in steady indoor conditions. The real differences appear at the edges: stereo loses accuracy in poor light, thermal loses accuracy as backgrounds warm toward body temperature, and ToF holds accuracy across a wide range of lighting and tends to be the strongest for separating people in tight groups.

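Do any of these sensor families use a camera?

Stereo vision does: it is image-based, so image frames exist at some point in the pipeline even if the device discards them. Time-of-Flight captures a depth grid and thermal captures a low-resolution heat image, neither of which is a conventional camera picture. Ariadne counts with Hybrid Fusion: Time-of-Flight depth sensing plus patented phone signal sensing, never cameras. Time-of-Flight captures geometry rather than images, and signal sensing captures no MAC address by default, so the measurement involves no video, no faces, and no biometric data.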

Is Time-of-Flight better than stereo for privacy?

Generally yes, because ToF captures a depth grid rather than an image of the scene. Stereo systems can be deployed responsibly when the vendor documents how raw frames are handled, but ToF avoids the question entirely because there is no image to handle.

Can a single sensor type cover every environment in our estate?

Often, yes, particularly if the environments are all indoor and similar in scale. ToF covers indoor sites with the widest range of lighting and ceiling conditions. For cold-storage or outdoor sites, a mixed deployment is sometimes the right answer. The procurement checklist above is the practical way to decide.
