This was a pretty short and minor project I did for my "gymnasiearbete" (the Swedish high-school diploma project) way back. The idea started as, "What if a top-down reversing camera highlighted obstacles in red?"

After politely begging the school to buy a stereoscopic camera for my project, I could get to work. Unfortunately, I don't have many pictures from this project, so this write-up is based on the paper I wrote on it at the time.

I start by introducing how stereoscopic camera feeds can be used to determine a depth buffer. This was also a bit before computer vision got very machine-learning oriented, so it was just good old-fashioned comparing of two pictures: take a small subsection of one image, try different horizontal offsets against the other image, and pick the offset where the two differ the least. That offset is the disparity, which is inversely proportional to depth. Then some filtering is applied to the result.
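In spirit, the matching was something like the sketch below. This is not my original code; the window size and disparity range are placeholders, and real implementations are far more optimized:

```python
import numpy as np

def disparity_map(left, right, max_disparity=64, window=5):
    """Naive block matching: for each pixel in the left image, slide a
    small window along the same row of the right image and keep the
    horizontal offset with the smallest sum of absolute differences."""
    h, w = left.shape
    half = window // 2
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1].astype(np.int32)
            best_cost, best_d = np.inf, 0
            for d in range(min(max_disparity, x - half) + 1):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1].astype(np.int32)
                cost = np.abs(patch - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp  # depth ~ baseline * focal_length / disparity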

I projected the pixels into 3D space with the acquired depth to enable projections from different angles, a fancier version of those car cameras that just warp the picture to make it look top-down. If I did this today, I'd probably build a mesh from the points and render that to avoid gaps in the point cloud. At the time I instead used a morphology-type filter to fill the empty gaps between the points in the point cloud.
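The back-projection is the standard pinhole camera model; roughly the sketch below, where fx, fy, cx, cy stand in for the camera intrinsics. The gap filling at the end uses a grayscale morphological closing as a stand-in for whatever filter I actually used:

```python
import numpy as np
from scipy.ndimage import grey_closing

def pixels_to_points(depth, fx, fy, cx, cy):
    """Back-project every pixel (u, v) with depth z into camera space
    using the pinhole model: X = (u - cx) * z / fx, Y = (v - cy) * z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def fill_gaps(rendered, size=5):
    """Morphological closing (dilation then erosion) to fill the small
    holes left between splatted points in a single-channel render."""
    return grey_closing(rendered, size=(size, size))
```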

Then I get into how to determine the height of the ground, so that I can later classify things above and below ground level as obstacles. I start by stepping from the right towards the left, which in the case of this graph means from below ground and upwards. Once I find the peak, I grab the samples close to it and fit a Gaussian distribution curve to those samples. The center of the curve is used as the ground height, with some margin for error.
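The same idea in code might look like this. It's a sketch: I've simplified the peak search to a plain argmax over a height histogram, and the bin count and neighborhood size are made up:

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(h, a, mu, sigma):
    return a * np.exp(-((h - mu) ** 2) / (2 * sigma ** 2))

def estimate_ground_height(heights, bins=200, neighborhood=10):
    """Histogram the point heights, find the dominant peak, then fit a
    Gaussian to the bins around it; the fitted mean is the ground height."""
    counts, edges = np.histogram(heights, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    peak = int(np.argmax(counts))
    lo, hi = max(0, peak - neighborhood), min(bins, peak + neighborhood + 1)
    guess = [counts[peak], centers[peak], neighborhood * (edges[1] - edges[0])]
    (_, mu, sigma), _ = curve_fit(gaussian, centers[lo:hi], counts[lo:hi], p0=guess)
    return mu, abs(sigma)  # ground height plus a spread to use as margin
```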

I then classify each pixel as either obstacle or ground and apply a red or green hue based on the classification, producing an image like the one above. Conveniently, the camera I used had a built-in IMU that I used to determine orientation and to help calculate the normal of the ground plane, so the fitting worked in real time without manual adjustments.
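The classification step itself is just a threshold around the fitted ground height; something like the following, where the 5 cm threshold and the blend factor are invented for illustration:

```python
import numpy as np

def tint_by_classification(image, height_above_ground, threshold=0.05):
    """Blend each pixel towards red if it sits further than `threshold`
    (metres) from the fitted ground plane, otherwise towards green."""
    obstacle = np.abs(height_above_ground) > threshold
    tint = np.where(obstacle[..., None],
                    np.array([255.0, 0.0, 0.0]),   # obstacles: red
                    np.array([0.0, 255.0, 0.0]))   # ground: green
    return (0.6 * image + 0.4 * tint).astype(np.uint8)
```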

I could then project this image into a top-down view. Admittedly, the real-world testing I did seemed to indicate that most people preferred the camera POV with red/green obstacle highlighting over the other three variants I tested: top-down POV with obstacle highlighting, plain-color camera POV, and plain-color top-down POV.
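The top-down projection amounts to orthographically flattening the classified point cloud onto the ground plane. Something like this sketch, where the axis conventions, extent, and grid resolution are all assumptions:

```python
import numpy as np

def top_down_view(points, colors, extent=10.0, resolution=256):
    """Orthographically splat classified 3D points onto the ground plane.
    Assumes x is right, z is forward, y is up; keeps the highest point
    per grid cell so obstacles win over the ground beneath them."""
    img = np.zeros((resolution, resolution, 3), dtype=np.uint8)
    best_y = np.full((resolution, resolution), -np.inf)
    col = ((points[:, 0] / extent + 0.5) * resolution).astype(int)
    row = (resolution - 1 - (points[:, 2] / extent) * resolution).astype(int)
    valid = (col >= 0) & (col < resolution) & (row >= 0) & (row < resolution)
    for i in np.flatnonzero(valid):
        r, c = row[i], col[i]
        if points[i, 1] > best_y[r, c]:
            best_y[r, c] = points[i, 1]
            img[r, c] = colors[i]
    return img
```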

I'll also include this picture of some top-down tests I did. This version completely filled the gaps but couldn't run in real time, as it was written in Python (sorry) and wasn't hardware accelerated.


I remember this project being a lot of fun to work on. Being 18 at the time, I found it quite challenging and fun to have a crack at, and looking back with the skill set I have today, I would do things very differently. Although I would probably not work on autonomous car tech in the first place today: if you think about it, it's really just emulating trains, but less efficiently. That is of course an incredibly boring answer to cool tech, but infrastructure is better when it's boring and reliable than when it's cool but experimental and potentially unreliable.