The growth of this particular field was made possible by technological progress in the imaging market (high-resolution cameras, storage for thousands of pictures and movies) and in embedded systems (high computing power in our phones, cars, cameras, watches, even fridges...). Computer vision itself, however, is not really about hardware; hardware belongs to the optoelectronics and electronics markets. Computer vision draws on a lot of medical research into human cognition. After extracting whatever data they need from an image, CV algorithms try to outperform a human in recognizing, analyzing and categorizing images. It seems like a really simple idea at first sight. If there is a chair in the image, it seems only natural to write a program that says so. In fact... recognizing a chair is a big problem for engineers. A human not only perceives on hundreds of levels of recognition, but can also fill in data that is not there. Our biological vision reconstructs most of the data we need to see a chair. I will try to bring the topic closer to you with some real-life applications that are in use now or will be in a few years.
"What is It?" nr 1. |
"What is It?" nr 2. |
"What is It?" nr 3. |
The first big wave of modern computer vision came with face detection algorithms. Nowadays cameras have it, Facebook has it, every intelligent CCTV camera has it. Face detection was the first computer vision technology successful enough to make its way into ordinary people's homes. Of course, even before Facebook there were military-grade systems for tracking and intrusion detection, but even the best of them had to be supervised by humans. The false positive rate was too high, and such systems were hugely expensive because computers of the time had so little computing power. Industry had computer vision too, though it was (and in many fields still is) limited to very simple operations such as blob detection and color segmentation. Unfortunately, industry needs visual cognition systems with a success rate above 99.9%, and in most cases that cannot be achieved with current state-of-the-art algorithms. So the biggest market for computer vision became the mass market, where each new visual function is well received even if it fails in 20% of cases. There are also technologies that are nowadays seen as computer vision but are really optical systems or sensors, such as laser barcode scanners.
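To give a feel for how simple those industrial operations are, here is a minimal sketch of blob detection in plain Python. It assumes the image has already been reduced to a binary mask (for example by a color threshold), and the names `find_blobs` and `centroid` are my own, chosen for illustration rather than taken from any library.

```python
from collections import deque


def find_blobs(mask):
    """Group 4-connected foreground cells of a binary mask into blobs.

    Returns a list of blobs, each a list of (row, col) pixels.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(pixels)
    return blobs


def centroid(pixels):
    """Mean (row, col) position of a blob."""
    n = len(pixels)
    return (sum(y for y, _ in pixels) / n, sum(x for _, x in pixels) / n)


# Made-up 4x5 mask with two separate objects.
mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
blobs = find_blobs(mask)
for blob in blobs:
    print(len(blob), centroid(blob))
```

An industrial check could then be as plain as "is there exactly one blob of roughly the expected size at roughly the expected position", which is why such systems can be made very reliable.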
Why face detection? Because a face is so easy to detect. It is so distinctive, both in nature and in synthetic (human-made) graphics, that a face detector achieves a high detection rate even in the dumbest implementations seen in the first “intelligent” cameras. In comparison, there is no known descriptor for a mug... In fact, people have problems with it as well. They recognize the surroundings as background (table, kitchen, spoon, etc.) and assume that something of such a simple shape has to be a mug. Without the background it is a very hard task even for a human.
Detection and recognition are two different tasks. When we detect a face, we only find a part of the image that looks like a face. We don't know whether it's “Brad” or “Eva”. Recognition technology is just starting to evolve; you can see it on Facebook, though they are still working to raise the recognition rate. Its roots are in biometrics, which you probably know from science fiction films, fingerprint readers and iris scanners.
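The difference can be sketched in a few lines. Assume a detector has already cut out a face and some hypothetical feature extractor has turned it into a short list of numbers; recognition is then a nearest-neighbour search over the people we already know. The `gallery` values, the 0.6 threshold and the function name are all made up for illustration, and real systems use much longer descriptors.

```python
import math


def recognize(descriptor, gallery, max_distance=0.6):
    """Return the name of the closest known face, or None if none is close enough."""
    best_name, best_dist = None, float("inf")
    for name, known in gallery.items():
        dist = math.dist(descriptor, known)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None


# Hypothetical descriptors of two known people.
gallery = {
    "Brad": [0.1, 0.9, 0.3],
    "Eva": [0.8, 0.2, 0.5],
}

print(recognize([0.15, 0.85, 0.3], gallery))  # close to Brad's descriptor
print(recognize([5.0, 5.0, 5.0], gallery))    # far from everyone
```

The threshold is what separates "this is Brad" from "this is a face, but nobody I know", and choosing it trades false matches against missed ones.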
So face detection is really a simple task. If you want to analyze images of buildings, crowds, cars, different actions and objects, or moving bodies, it's a whole new world to research. So what can we really observe nowadays in the state-of-the-art computer vision industry?
Let's start with surveillance. You probably know that London already has hundreds of thousands of CCTV cameras installed. Each London citizen is recorded more than 300 times a day. At first the footage was used mainly to track down suspects “manually”. What can computer vision do nowadays? Here is the kind of data that can be extracted from a camera in real time for business intelligence purposes: people crossing a scan line, baggage, car tracking, and whether people are sitting, walking or running. And that's only CCTV. Do you know what data some companies collect from cameras in big malls? The spots that attract the most attention, or the walking speed between specific shelves. Some cameras can tell where you move your hand while taking a product from a shelf. In most cases such systems have low accuracy, around 50-70%.
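Counting people crossing a scan line shows how simple the last step can be once tracking works. The sketch below assumes a tracker (not shown) already gives us one person's (x, y) position in each frame, and for simplicity it treats the scan line as infinite. The function names and the sample coordinates are mine.

```python
def side_of_line(point, a, b):
    """Return +1 or -1 depending on which side of the line a->b the point lies, 0 if on it."""
    (px, py), (ax, ay), (bx, by) = point, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    return (cross > 0) - (cross < 0)


def count_crossings(track, a, b):
    """Count how many times a track of (x, y) positions switches sides of the line."""
    crossings = 0
    last_side = 0
    for point in track:
        side = side_of_line(point, a, b)
        if side == 0:
            continue  # exactly on the line: decide on a later frame
        if last_side != 0 and side != last_side:
            crossings += 1
        last_side = side
    return crossings


# Horizontal scan line at y = 5 and one person walking across it.
line_a, line_b = (0, 5), (10, 5)
track = [(2, 1), (2, 3), (2, 6), (2, 8)]
print(count_crossings(track, line_a, line_b))
```

The hard part in practice is the tracker, not this counter: if the tracker loses a person or swaps two people in a crowd, the count is wrong, which fits the low accuracy figures quoted above.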
On the other hand, the U.S. requires new cars to have rearview cameras by 2018. Some time ago Nvidia created a new embedded system with huge computational power suited for cars (Tegra K1) for that reason. Car accidents will be recorded together with exact information about license plate numbers, GPS data and environmental conditions. If there are also side and front cameras, automatic parking systems will become common. The best vision systems (Google has been a pioneer for a long time) can drive a car through a city full of people better than a human driver:
http://www.technologyreview.com/news/520746/data-shows-googles-robot-cars-are-smoother-safer-drivers-than-you-or-i/
Can you see a car and a pedestrian? The pedestrian is just the car image copied, rotated and moved. Do we want to build an algorithm that "thinks" there is a pedestrian, or a rotated car?
Computer vision is 3D imaging as well. That means 3D scanners using structured light, Time of Flight (ToF) technology, lasers and infrared: stuff most ordinary people have never heard of. We can even build 3D models with a single-lens camera. So now... imagine that we could make a sparse 3D model of every town. Sounds like Google Street View? Nope. Google wants to create real 3D towns. And now they can be built from mere... Flickr photos.
https://www.youtube.com/watch?v=ofHFOr2nRxU
OCR technology is almost everywhere. We can build document scanners (such as a passport scanner) with a simple webcam and some programming skills. We can analyze a whole image and tell which elements make it attractive to people.
Google indexes all the images on the internet using many levels of graphical descriptors. If you use licensed images on your website, Google knows, even if the image was resized, cut in half, compressed or had its color space altered. In the USA, if you earn a lot of money using such images, you will meet the police faster than it took you to buy your server.
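How can a resized or recompressed image still be matched? One simple and well-known idea is a perceptual hash such as the "average hash". I am not claiming this is what Google uses; it is only a sketch of the general principle. It works on small grayscale images stored as lists of lists, assumes the image dimensions are multiples of the hash size, and the function names are illustrative.

```python
def shrink(gray, size=8):
    """Block-average a 2D grayscale image down to size x size.

    Assumes height and width are multiples of `size`.
    """
    bh, bw = len(gray) // size, len(gray[0]) // size
    small = []
    for r in range(size):
        row = []
        for c in range(size):
            block = [
                gray[y][x]
                for y in range(r * bh, (r + 1) * bh)
                for x in range(c * bw, (c + 1) * bw)
            ]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(gray, size=8):
    """One bit per cell of the shrunken image: 1 if brighter than the mean."""
    flat = [v for row in shrink(gray, size) for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]


def hamming(h1, h2):
    """Number of differing bits; a small value means 'probably the same picture'."""
    return sum(a != b for a, b in zip(h1, h2))


# A synthetic 16x16 gradient, an upscaled copy, a brighter copy and a negative.
original = [[(x + y) * 8 for x in range(16)] for y in range(16)]
resized = [[original[y // 2][x // 2] for x in range(32)] for y in range(32)]
brighter = [[v + 10 for v in row] for row in original]
inverted = [[255 - v for v in row] for row in original]

print(hamming(average_hash(original), average_hash(resized)))
print(hamming(average_hash(original), average_hash(inverted)))
```

A hash like this survives resizing and uniform brightness changes, but not cropping; finding an image that was cut in half needs local descriptors computed around individual keypoints, which is a much heavier technique.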
Did you know that with a single smartphone camera you can measure your own heart rate? That's called micro moves. Just watch it on YouTube. That's a better introduction than reading about it...
https://www.youtube.com/watch?v=3rWycBEHn3s
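As far as I know, measuring heart rate with a camera usually comes down to tracking tiny periodic brightness changes of the skin and finding their dominant frequency. Here is a sketch of that last step only. It assumes we already have the mean brightness of a skin region for every frame; the signal below is synthetic, and the function name and the 40-200 bpm search band are my own assumptions.

```python
import math


def estimate_bpm(signal, fps, min_bpm=40, max_bpm=200):
    """Return the strongest frequency in the plausible heart-rate band, in beats per minute."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the constant brightness level
    best_bpm, best_power = None, -1.0
    for k in range(1, n // 2):
        bpm = 60.0 * k * fps / n  # frequency of DFT bin k, converted to bpm
        if not (min_bpm <= bpm <= max_bpm):
            continue
        # Naive discrete Fourier transform of a single bin.
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(centered))
        im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


# Synthetic example: 10 seconds at 30 fps with a faint 1.2 Hz (72 bpm) pulse.
fps = 30
signal = [100 + 0.5 * math.sin(2 * math.pi * 1.2 * i / fps) for i in range(10 * fps)]
print(estimate_bpm(signal, fps))
```

Real footage is much messier (motion, lighting changes, camera auto-exposure), so practical implementations filter the signal first and need a steady face or fingertip in view.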
That was really interesting. I'm glad to be brought up to date by your post on this subject. For my Bachelor of Science thesis I implemented the Hough transform aided by automatic histogram analysis for irregular patterns. I haven't managed to continue any work connected with computer vision since then, but it was a really satisfying field of programming as the algorithm got better and better. I must admit that I envy you your work :)
Grzegorz, that’s very interesting. I had no idea it’s used so commonly and that in London you’re recorded 300 times a day... What do they do with the data? Do they use it to prevent crime or to carry out investigations? Or do they just like to spy on people?
Are you maybe aware whether such systems are also common in Poland? In your presentation you wrote that the field wasn’t so popular in our country, so I expect we’re not so advanced yet, are we?
I’ve been to a lecture where anti-facial-recognition makeup was presented: http://www.theartblog.org/2011/04/interview-adam-harvey-and-the-anti-face/
I even tried to do it and test it on Facebook’s facial recognition engine, but my photographer didn’t want to cooperate, saying it was a stupid idea... Well, maybe it’s better that I haven’t got such pictures on FB ;)
I don't have any knowledge of real systems incorporated into the CCTV network in London that work in real time (I do know about ones that work on demand with a little human help, though). Those statistics include proprietary cameras in shops as well, and those are not connected to any bigger IT infrastructure. Their recordings are only used if the police ask the shop owner to disclose them. It's really difficult to find any comprehensive data on government systems and the scale of their usage.
In Poland we do not have such systems. More importantly, most of what we have does not include recording. It's just an "electronic eye" for security. However, we do have some minor experimental systems calibrated for human tracking at a few of the bigger road crossings in Warsaw and at airports. Unfortunately I don't remember the names. Such systems consist of around 5-20 CCTV cameras. I don't know how they are used in practice.
@Kasia - it would be really very hard to wear one of those CV Dazzle looks every day:)
I’ve really enjoyed this topic! Great idea and great presentation!
Just to give you some more examples of where this (or similar) technology is used… Do you have a camera? Some of them already have not only face recognition but smile detection as well: they take a picture when they detect your facial movement. Some machines also try to recognize your mood based on your facial expression (https://www.youtube.com/watch?v=45eLpzk6N34), but what’s most important for me is the commercial use of facial motion capture. While I was preparing my graduation diploma I did some research and came across this software: https://www.youtube.com/watch?v=24qUFDdZAG8. It helped me a lot with facial animations, using only a Kinect camera. It’s not Avatar (https://www.youtube.com/watch?v=1wK1Ixr-UmM), but still it was useful.
Great post with lots of knowledge!
I did an OCR assignment project at PJWSTK, a simple application recognizing letters drawn on a 16x16 grid, and it proved to be a pain in the neck, absorbing loads of my free time :) I wonder what the architecture (both logical and physical) of the Google car's self-driving system looks like.
Speaking of Google - recently they announced that their captcha recognition algorithm (which will be used to recognize precise building addresses in Google Street View) proved successful in 99.8% of cases - that's way better (some 10%, I think) than human efficiency...
Nice topic Grzegorz - thanks for that ;)
It was especially interesting for me, as I have nothing to do with computer vision or graphics. What you have written there is like magic to me :)
Thanks for the interesting presentation! I didn't know much about this topic, so I was surprised by what this technology can do, especially the micro moves! Maybe thanks to such technology we will be able to detect when someone has a heart attack or suddenly stops breathing on the street; this could be very useful.
Thanks, Grzesiek, for the interesting presentation. I was not aware that some computer vision inventions are mounted in cars. I hope this will solve some issues with car accidents, especially when parking.
Some time ago I watched an episode of Top Gear in which they tested a new Mercedes. They presented the car's ability to follow another car using its software. The driver did not have to touch the brake pedal to stop the car. It did it by itself. That was amazing. I have also heard about augmented reality made by Google. I think you should take a look at it; from some teasers I got the impression that this is more than anything I could imagine.
https://www.youtube.com/watch?v=Vb2uojqKvFM
Google has some fantastic vids with Google Glass. The problem is that... it's only a rendering. Nevertheless we should expect such functions in the near future (working perfectly fine).
I know that Google's car is already in, if we can call it that, a production state. Furthermore, I have seen a lot of applications that are more sophisticated, like searching for your keys with your smartphone's camera when you urgently need to go out and cannot find them ;)
Google Glass, to make a long story short, is connected with CV as well as augmented reality, as it has to recognize places too (though it may mostly rely on GPS location...)
A couple of years ago I saw that one person had his notebook locked with a camera/face combo: if it didn't recognize your face, you could not log on... Sometimes even the owner got angry because it was not able to recognize his face... The fun part is that another colleague made a stupid smile and it let him in... Since that moment I have been very sceptical about those algorithms. However, in my recent experience, McDonald's face recognition system, which is based on some very advanced stuff, was able to tell whether a person was old, young, a mother with a child, etc., and show relevant adverts on the screen.
As for the "spy" stuff... under the pretext of tracking "bad guys", ordinary citizens can also be tracked easily, which scares me a lot... Is that what we call freedom? :)
This topic is very promising and still developing.
Thanks to this technique, in the future we will be able to visit any place while sitting in front of the computer.
The development of such techniques, and monitoring like that in London, should have a big impact on catching criminals.
I think that we are also constantly observed in the big cities in Poland, but certainly not as often as in London.
The number of cameras in Polish cities is still growing, with most of them at intersections and traffic lights.
The advantage of this observation is certainly increased safety.
This post was a little bit frightening - I cannot imagine being recorded 300 times a day (although I will probably never notice when I start being recorded..:).
I don't know how closely it is connected to your topic from a scientific point of view, but after reading your post I recalled a news story that I read a couple of years ago about a Google AI which learned to recognize cats in YouTube videos without any prior clues or data - you can read about it here.
As for the pictures you posted - I did not have a problem recognizing two of them, 2 and 3 (even in the cropped version, although with no. 2 I had to take a second look), but I still have a problem with picture no. 1 - is it a water hose?
It's a fire hydrant.
Welcome to the Matrix :-) Actually, advanced vision solutions are futuristic toys in the branch that I work in. Glasses with a map of the warehouse, for example in a big logistics centre, would really help workers to very quickly find the products they are looking for during picking and then packaging. I didn't really know that this technology is so advanced and really so "Matrix-like", because it seems to me a very complicated thing to work with an image, or thousands of images, to mash them up and analyse them with some rocket-science algorithm. Anyway, I would love to try some Google products like Glass. Augmented reality is a very interesting topic.
This is very interesting for me! I didn't know many details about this part of IT until I read your article. I really like it. This is one of those things which can help people. I cannot recognize many of the things in the photos, and I have to touch a man to know whether he is breathing. If a computer can help me with those things, I'm fascinated by it. Thanks a lot for this article.
Interesting read, thanks a lot!
Do you also have the impression that the development of computer vision, or rather the application of this capability, is tightly connected with artificial intelligence?
Humans will not be able to analyze such huge amounts of data (the London cameras example).
I share some of the opinions here that such technologies could be seen as scary, but at the same time real-world experience tells us that people do the worst things when they think that nobody can see them. So maybe it would help if people did not have that sense of security? Maybe if they thought (even if it's not true) that they are constantly being recorded when doing crazy things in London, they would think more before acting?
I'd like to share this video, just for "fun": https://www.youtube.com/watch?v=1U8KsQPIrY0
Thanks for the reply. Almost all state-of-the-art algorithms contain algorithms from the AI family, but mostly statistics-based ones from the data mining field. AI-specific methods, such as neural networks or genetic algorithms, are rarely used in practice because they are hard to maintain and usually even researchers don't know exactly why they work. Such systems are hard to ship to market.
I didn’t realize that computer vision has so many different uses… These days it plays an important role in every sector of life, and I can’t wait to see how these solutions progress in the future. For example, these movements invisible to the human eye… this discovery could be a first step towards revealing more about the human body.
The biggest concern for me is surveillance, because I’m not sure whether our data is used for our safety or for other purposes. For example, the government or administration could easily use this data to fight their political opponents. As we all know, you can always find some dirt on someone. Apart from this one concern, I’m impressed by this technology and I’m sure it’ll bring us many benefits.