Friday, August 12, 2016


AIR 037 | Fuchun Sun of Tsinghua University: the "eight-year war" of visual and auditory cognition

On August 13, during the Intelligent Driving session on the second day of the CCF-GAIR summit, Professor Fuchun Sun of Tsinghua University presented the eight-year story of the National Natural Science Foundation of China (NSFC) major research plan on cognitive computing of visual and auditory information.

Fuchun Sun said that after eight years of deliberation, the project was established at the NSFC in 2008 as a major research plan on cognitive computing of visual and auditory information. Spanning 2008 to 2017, its aim is to study the human cognitive mechanisms of vision and hearing, develop new computational models, and improve the efficiency with which computers understand the images, speech, and text involved in human visual and auditory perception, organized around the basic scientific questions of how cognitive processes are expressed and computed.

What are the human cognitive mechanisms of vision and hearing? The plan centers on three problems:

Extraction, expression, and integration of the basic features of perception

Machine learning and comprehension of sensory data

Collaborative computing of cross-modal information

The main work is to study how visual and auditory information is encoded in the brain, how brain regions cooperate to integrate information, and how to turn these mechanisms into quantifiable models, so that visual and auditory information can be encoded and processed by computational models to perceive and understand the environment. That machine understanding is then compared with human understanding, and the technology is ultimately applied to unmanned driving.

In his speech, Sun reviewed eight years of major achievements in visual and auditory perception since 2008, along with the seven editions of the unmanned-vehicle challenge held since 2009, and set out three hopes:

Publishing more results in cognitive science.

Integrating natural language understanding and brain-computer interfaces into the vehicle platform; many results still remain in the laboratory.

Using the important progress achieved on unmanned-vehicle platforms to further promote innovation and lead the development of the unmanned-vehicle industry.

Outlook for the future:

1. Research on cognitive mechanisms: how to turn them into computable models. Many approaches have been explored and need further improvement.

2. Expression and understanding of the topological information of environmental perception in the cognitive process.

3. Exploring emerging multimodal sensors, such as those integrating voice and video information.

4. Hybrid human-machine intelligence, just raised in the national Artificial Intelligence 2.0 initiative: human-machine hybrid intelligent systems must be studied.

The hope is that the unmanned-vehicle challenge platform will yield more results in cognitive science.

The following is the speech transcript:

Distinguished guests, ladies and gentlemen, good morning! If one day you sit in a car and see that the cab has no driver, or that the driver is not holding the steering wheel, do not be shocked: we have entered the era of unmanned driving. You may already have imagined that on the 286-kilometer route from Changsha to Wuhan, in weather that mixed rain and sun, manual intervention accounted for only 0.75% of the whole journey; and that on the 150-kilometer road from Beijing to Tianjin, fully autonomous driving was achieved without any human intervention. What may be harder to imagine is that on one autonomous section the vehicle passed through a 2.08-meter gap between barriers with only 11 centimeters to spare, five times as efficiently as a human driver.

Today I will take you through the eight-year story of the NSFC major research plan on cognitive computing of visual and auditory information.

Cognitive computing of visual and auditory information: I compare these eight years to a long war, as long as the War of Liberation and the Korean War. After eight years of deliberation, the project was established at the NSFC in 2008 as the major research plan on cognitive computing of visual and auditory information. Here we would like to thank academicians Zheng Nanning, Li Deyi, Chen Lin, and Sun Jiaguang.

Let us start with this picture, taken from an article published in 1997. You can see that nature has given vision special favor: the pathway from our eyes to the visual cortex curves gently, and this link between perception and information processing is remarkably long. The corresponding pathways for touch and hearing are not nearly as long. This is why we say the eyes are the windows of the heart: we obtain 80% of our information through sight, and 60% of our cortex is associated with vision.


Hearing is also a very important part. Natural images, after sparse coding, yield sparse basis functions. In recent years, research has increasingly found that touch and vision are isomorphic, which made us think that the blind have especially good "eyes": in the future we may use artificial cameras to convert visual coding into tactile coding, letting the blind perceive the external world. In the past two years artificial retinas have also appeared, seeking to restore vision through another channel. And we have found in these two years that the denoising characteristics of speech under sparse coding are particularly good. Does speech also have a tactile-like underlying structure? That is a question we must study. In this plan, audiovisual perceptual information refers to the speech, images, and text related to human visual and auditory perception.
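As a rough illustration of the sparse-coding idea mentioned here, the sketch below learns a small dictionary of basis functions from signal patches with scikit-learn and encodes each patch as a sparse combination of those bases. The data is synthetic noise standing in for real image or audio patches, and the whole example is my own, not the plan's code:

```python
# Minimal sparse-coding sketch (illustrative only, not the speaker's method).
# Learn a dictionary of basis functions, then encode each patch sparsely.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.RandomState(0)
# Stand-in data: 500 patches of length 64 (real work would use image/audio patches).
patches = rng.randn(500, 64)

learner = MiniBatchDictionaryLearning(
    n_components=32,               # number of basis functions in the dictionary
    transform_algorithm="omp",     # orthogonal matching pursuit -> sparse codes
    transform_n_nonzero_coefs=5,   # at most 5 active bases per patch
    random_state=0,
)
codes = learner.fit(patches).transform(patches)

# Sparsity check: most coefficients should be exactly zero.
print("non-zero fraction:", (codes != 0).mean())
```

For natural images such a dictionary tends toward localized, oriented edge-like bases, which is what makes the sparse representation useful for denoising and for cross-modal comparisons.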

Audiovisual perceptual information is everywhere in our daily lives, gathered by all kinds of means: cell phones, cameras, network cameras, satellite remote sensing, and so on. You may think of it this way: we now live in a ternary world. Which three worlds? The online world, the physical world, and the human world that shares knowledge between them. Before the network, we lived in a binary world. How do the three worlds work together? I once asked a student to translate a passage into English; he handed it to me almost at once, because he had run it through Google Translate. In the past it was hard to speak of global intelligence for a robot, because its intelligence was only local; today, through the online world, a robot can obtain global intelligence.

Take the simplest example: an autonomous vehicle arrives in a completely unfamiliar environment, say from the airport to this venue. It simply needs to find a path. Sogou Maps can plan the route, and a camera with pattern recognition can find the venue entrance; this is what the network brings us. As we know, the network holds an enormous amount of what we call massive audiovisual perceptual sensory data. How to quickly and effectively mine it and turn it into usable knowledge is very important for unmanned driving.

As we know, there is video monitoring and voice monitoring. First look at video monitoring: Beijing, for example, now has more than one million cameras. How much information is that? One hour of footage equals the sum of all the programs CCTV has ever broadcast, so such a large amount of information is hard to process in time with basic means. As for voice monitoring, take a simple number: monitored calls can reach 500,000 a day, with a daily call volume of 400 million minutes. Handling that information effectively is also very difficult.
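To make the scale concrete, here is a rough back-of-envelope calculation; the per-camera bitrate is my assumption, not a figure from the talk:

```python
# Rough scale estimate for city-wide monitoring (bitrate is an assumption).
cameras = 1_000_000            # "more than one million cameras" (from the talk)
bitrate_mbps = 2               # assumed per-camera stream, ~2 Mbit/s
seconds_per_hour = 3600

bits_per_hour = cameras * bitrate_mbps * 1e6 * seconds_per_hour
petabytes_per_hour = bits_per_hour / 8 / 1e15
print(f"~{petabytes_per_hour:.1f} PB of raw video per hour")   # ~0.9 PB/hour

# Voice monitoring figure from the talk: 400 million call-minutes per day.
call_minutes_per_day = 400_000_000
years_of_audio_per_day = call_minutes_per_day / (60 * 24 * 365)
print(f"= ~{years_of_audio_per_day:.0f} years of audio arriving every day")
```

Under these assumptions, roughly a petabyte of video per hour and centuries' worth of audio per day arrive faster than any manual process could review them, which is the point of the argument.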

Now compare robots and people. We know very well that for structured information, such as forms or student registration records, machine capacity far exceeds ours. But for unstructured information, such as auditory information or the visual information in driving, people far outperform machines: we can quickly find a familiar friend in a crowd, and a person can drive in any complex environment. Although machines compute faster, the computer's cognitive ability lags far behind, below that of a three-year-old child.

Another comparison: machines have hit a wall in their approach, while human perception is particularly good with cross-modal information. We know that in the cat, the visual cortex and auditory cortex are both important, but they are separate. How do vision, touch, and hearing work together? Why are people so capable? In both integration and selectivity, people are stronger. A person can pick out a mother in a crowd; a mother can tell her own child apart from a twin while a machine gets it wrong; and she knows what the child means, which a machine cannot do.

Our guiding ideology is to study visual and auditory perception: how is this information encoded in the brain, and how do brain regions cooperate to integrate information? We turn these mechanisms into quantifiable models, so that visual and auditory information can be encoded and processed by a computational model to perceive and understand the environment. We compare that understanding with human understanding, and finally apply the technology to unmanned driving.

Our major plan runs from 2008 to 2017, eight years in all. Our objective is to study the human cognitive mechanisms of vision and hearing, develop new computational models, improve the efficiency with which computers understand the images, speech, and text involved in human visual and auditory perception, and make an important contribution to national security and the national economy. Focused on this need, our goal revolves around the basic scientific questions of how cognitive processes are expressed and computed.

First, extraction, expression, and integration of perceptual features. We explore the mechanisms by which humans extract, express, and integrate the basic features of visual and auditory information, laying the foundation for building efficient computational models.

Second, machine learning and comprehension of sensory data. Image, speech, and language data are unstructured or semi-structured, which makes it hard for computers to move from the data level to the semantic level; establishing new machine learning methods is an effective way to achieve this transformation.

Third, collaborative computing of cross-modal information. Visual and auditory information comes as dynamic sequences. How can they be expressed? As motion manifolds, for example: one form for visual information, another for auditory information. Visual-auditory integration must first find the common part of the two information flows; only then can the integrated information be processed.

Now look at multimodal fusion: are the basis functions of two kinds of sensory information the same? The basis functions of images and of sound are not the same, which leads to the concept of sparseness. If the difference between the two sets of basis functions is relatively small, we can find their common portion; that is the principle behind sparse-coding-based fusion. Our expected results are organized around these three core scientific questions to lead basic research. After an eight-year effort we have made much progress on the basic theory of visual and auditory cognition and breakthroughs in three key technologies: collaborative computing of visual and auditory information, natural language understanding, and cognitive brain-machine interfaces related to audiovisual perception. We also created two international competitions: the Future Challenge for unmanned vehicles and a brain-computer interface contest.
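As a loose illustration of finding this "common portion" across modalities, the sketch below uses canonical correlation analysis on synthetic features; this is my stand-in, not the plan's sparse-coding method:

```python
# Finding shared components between two modalities with CCA
# (an illustrative stand-in for the sparse-coding fusion described above).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.RandomState(0)
shared = rng.randn(200, 2)                # latent signal present in both modalities
visual = shared @ rng.randn(2, 30) + 0.1 * rng.randn(200, 30)   # "video" features
audio = shared @ rng.randn(2, 20) + 0.1 * rng.randn(200, 20)    # "sound" features

cca = CCA(n_components=2)
v_proj, a_proj = cca.fit_transform(visual, audio)

# High correlation along each component indicates a recovered common part.
for k in range(2):
    r = np.corrcoef(v_proj[:, k], a_proj[:, k])[0, 1]
    print(f"component {k}: correlation {r:.2f}")
```

Sparse-coding fusion replaces the linear projections here with learned dictionaries, but the goal is the same: isolate the component the two information flows share before integrating them.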

Now a look at our achievements: we have won three National Natural Science Awards.

(PPT)

The "driving brain" is our outstanding research achievement of recent years. Its main feat is to simulate our driving experience: one part models how a person makes decisions in a given environment, learning human cognitive abilities from long experience. At the same time, when learning from audiovisual information we remove the emotional influences people have while driving; whatever emotional effects a person may experience at the wheel are stripped out of our cognitive process.

Let us continue. This is our driving brain. It has long-term memory, and personality: one's character decides whether one drives conservatively or assertively. Long-term memory holds the experience and skills formed over years of driving. Motivation completes path planning for a one-time task from start to finish. Short-term memory represents the driver's attention: the immediate surroundings and the recent situation. Emotion: the emotional part of the brain is denied entry to the driver's brain, which is never distracted by feelings; the robot is always dedicated. You may have heard of a driver who crashed after passing a street building carrying a picture of a pretty girl; robotic unmanned vehicles can now put an end to that. Finally there is learning and thinking: on the basis of SLAM, for example, memory matching and quadratic programming are completed to decide the next move. This is the concept of the driving brain.
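A minimal sketch of how these components might be organized in code; the names and structure are my own illustration of the concept, not the project's actual design:

```python
# Conceptual sketch of the "driving brain" components described above
# (illustrative only; names and structure are assumptions).
from dataclasses import dataclass, field

@dataclass
class LongTermMemory:
    """Experience and skills accumulated over years of driving."""
    experiences: list = field(default_factory=list)   # past situation -> action pairs
    personality: str = "conservative"                  # or "assertive"

@dataclass
class ShortTermMemory:
    """The driver's attention: current surroundings and recent situation."""
    recent_situations: list = field(default_factory=list)

@dataclass
class DrivingBrain:
    long_term: LongTermMemory
    short_term: ShortTermMemory
    motivation: str          # the one-time task, e.g. a start-to-finish route plan
    # Deliberately no "emotion" field: per the talk, the emotional part of the
    # brain is denied entry to the driver's brain.

    def decide(self, situation):
        """Match the current situation against long-term experience."""
        self.short_term.recent_situations.append(situation)
        # (learning-and-thinking step, e.g. SLAM + memory matching, would go here)
        return "action chosen from best-matching past experience"
```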

We realize this concept through an uplink-downlink pattern. Our eyes and ears perceive the outside environment; based on that perceptual information, such as where we are and whether there are obstacles or targets nearby, the long-term memory area decides how to drive in this situation, which we call action. Between perception and action, checking whether the intended result was achieved forms a closed loop: from situational awareness, to situation analysis and decision-making, to precise control and online action. So we form the driver's brain: the part we earlier called perception is the perceptual domain, the planning component is the cognitive domain, and this last part is the action domain.

So, for example, our first driving car used GPS, radar, and optical systems (in competitions we generally do not use GPS). It forms long-term and short-term memory, integrates the sensory information, and builds a driving situation map. One very important concept is the right of way: the space the vehicle itself occupies and claims as it moves, on which its decisions are based. Deciding, say, how much the speed should change or how wide to take a corner forms the decision memory pool. Control modules then drive the unmanned vehicle, so that from perception to decision to control we have a closed loop. This was implemented on the NVIDIA DrivePX autopilot system.
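Here is a toy sketch of that perception-decision-control loop; the function names and logic are placeholders of my own, not the DrivePX system's API:

```python
# Sketch of the perception -> decision -> control closed loop described above
# (illustrative; names are placeholders, not the project's API).
import time

def perceive(sensors):
    """Fuse radar/camera (and, outside competition, GPS) into a situation map."""
    return {"obstacles": sensors.get("radar", []), "lane": sensors.get("camera")}

def decide(situation_map, memory_pool):
    """Pick speed and steering changes, e.g. by consulting the decision memory pool."""
    speed_delta = -1.0 if situation_map["obstacles"] else 0.5
    return {"speed_delta": speed_delta, "steer_delta": 0.0}

def act(command):
    """Send the command to the actuators; the report back closes the loop."""
    return command

memory_pool = []
sensors = {"radar": [], "camera": "lane-centered"}
for _ in range(3):                            # a few iterations of the loop
    situation = perceive(sensors)
    command = decide(situation, memory_pool)
    feedback = act(command)
    memory_pool.append((situation, feedback))  # remember for later matching
    time.sleep(0.01)                           # stand-in for the control period
print(f"{len(memory_pool)} perceive-decide-act cycles completed")
```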

(PPT)   

This is the final form of the wheeled robot: it learns to drive from experienced drivers. The left side shows past driving experience, this side a current driving experience, and on the far right is our driverless car. Through perception it forms the driver's awareness of the situation, and through careful extraction it forms memories. This is current cognition, the cognition a person has while driving: the driving situation map formed by integrating, say, the terrain and vision is matched against experienced situations. To complete a given task in this environment, how should I drive? I search my experience library: I have done this before, and in this situation doing it that way may work best. Once a match is found, that experience is used for learning and for awareness, and then the steering wheel is operated.
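As a rough illustration of matching a current situation map against an experience library, the sketch below does nearest-neighbor retrieval over toy feature vectors; a real system would use far richer situation features than these three numbers:

```python
# Sketch of matching the current driving-situation map against an experience
# library (illustrative; features and actions are made up for the example).
import numpy as np

# Each stored experience: (situation feature vector, action that worked best).
experience_library = [
    (np.array([0.0, 1.0, 0.2]), "slow down, keep lane"),
    (np.array([1.0, 0.0, 0.9]), "change lane left"),
    (np.array([0.5, 0.5, 0.1]), "maintain speed"),
]

def match_experience(current_situation):
    """Return the action from the most similar past situation."""
    dists = [np.linalg.norm(current_situation - feat)
             for feat, _ in experience_library]
    best = int(np.argmin(dists))
    return experience_library[best][1]

now = np.array([0.1, 0.9, 0.3])     # current situation-map features
print(match_experience(now))        # -> "slow down, keep lane"
```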

This process can be realized with deep learning, as we have done. Our second model is for reasoning: facing certain obstacles in this environment, what do I do, how much should the driving speed change, how much should the steering angle change? We build a model for this, and that model too can be expressed with deep learning.
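A minimal sketch of such a model, assuming a single camera frame as input and changes in speed and steering angle as output; the architecture is my own illustration, not the team's network:

```python
# Minimal deep-learning sketch: from a camera image, regress the change in
# speed and steering angle (illustrative architecture, not the team's model).
import torch
import torch.nn as nn

class DrivingPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)   # outputs: [speed change, steering change]

    def forward(self, image):
        x = self.features(image).flatten(1)
        return self.head(x)

model = DrivingPolicy()
frame = torch.randn(1, 3, 66, 200)      # one stand-in camera frame
speed_delta, steer_delta = model(frame)[0]
print(float(speed_delta), float(steer_delta))
```

Trained on logged (image, human action) pairs, such a network imitates the experienced driver's mapping from perception to control, which is the idea the talk describes.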

We also have a very important group working on vehicle detection. Since vehicle detection work began in 1998, no method had done entirely without training samples; here we present one. Its triple reasoning lets two-dimensional and three-dimensional space interact and confirm each other, stepping completely outside the training-sample framework and making full use of three-dimensional scenes and images.

Over the past eight years we have also done a great deal of work on the cognitive mechanisms of visual and auditory information, for example in neuroscience, with papers in Neuron in 2012 and at IEEE CVPR 2010. The CVPR work greatly improved the efficiency of its method; it concerns temporal segmentation for vision and touch, using the different temporal structures of noise to segment brain processes in time, optimizing over two time scales.

As we all know, there is an international brain-imaging congress whose plenary reports are hard to come by, since they are selected by the academic committee of the organizing committee. For more than 20 years our country never had one; at the 18th congress, Academician Chen Lin delivered the first keynote report of the conference.

We also have very good work on multi-channel brain-computer interfaces; for two consecutive years, this article was listed among the journal's most-cited papers.

(PPT)

This is an article on efficient character input with a non-invasive brain-machine interface, doubling character-input speed; it was published in the US Proceedings of the National Academy of Sciences and is by far the best work in this area. We have also taken the brain-computer interface into unmanned driving, controlling unmanned vehicles through imagined movement and parking automatically through the brain-computer interface. We have kept the brain-computer interface contest going since 2008, and our non-invasive brain-computer interfaces now lead the world.
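Noninvasive spellers of this kind typically tag each character with a distinct flicker frequency and detect which frequency dominates the EEG. Below is a toy sketch of that detection step with simulated data; real systems use filter banks and correlation methods rather than a bare FFT, so this is only my illustration of the principle:

```python
# Toy sketch of SSVEP-style frequency detection for a BCI speller
# (illustrative only; sampling rate and frequencies are assumptions).
import numpy as np

fs = 250                                  # assumed EEG sampling rate, Hz
t = np.arange(0, 2.0, 1 / fs)             # a 2-second analysis window
target_freqs = [8.0, 10.0, 12.0, 15.0]    # one flicker frequency per character

# Simulated single-channel EEG: a 10 Hz response buried in noise.
eeg = np.sin(2 * np.pi * 10.0 * t) + 2.0 * np.random.randn(t.size)

spectrum = np.abs(np.fft.rfft(eeg))
freqs = np.fft.rfftfreq(t.size, 1 / fs)

# Score each candidate by spectral power near its flicker frequency.
scores = [spectrum[np.argmin(np.abs(freqs - f))] for f in target_freqs]
print("detected frequency:", target_freqs[int(np.argmax(scores))], "Hz")
```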

Here are a few demonstrations. This figure shows the July 2011 drive from Changsha to Wuhan: 286 km in total, lasting 3 hours and 22 minutes, through stretches of rain and overtaking, with human intervention totaling only 2,140 meters. This later work, completed on November 25, 2014, is the long-distance highway autonomous driving test from Beijing to Tianjin, lasting 1 hour and 30 minutes.

(Video)

In my last two minutes, let me introduce our unmanned-vehicle challenge. Starting in 2009, seven contests have been held, the most recent last year. The first was in the Chanba ecological district of Xi'an, on 2.6 kilometers of road. The 2010 event was organized by a university in Xi'an, again on 2.6 kilometers, testing curves and so on. In 2011 in Ordos the course grew to 10 kilometers; behind me is a table listing the basics of the contests. From 2013, and again in 2014 and 2015, the contest was held in Changshu, Jiangsu Province. All seven contests are listed in this table: the number of teams has grown over seven years, peaking at 22; the scenarios have become more complex, from 2.6 km to 6.7 km to 10 km to 13.5 km. Judging from the results, manual intervention has largely disappeared and speeds keep rising, including the Changsha-to-Wuhan and Beijing-to-Tianjin drives done without human intervention that I just mentioned. Our contests have thus moved from closed roads to real road environments.

To summarize: eight years have brought many achievements, and some work we consider very important lies ahead. First, research on cognitive mechanisms: how to turn them into computable models; we have explored many approaches, and they need further improvement. Second, the expression and understanding of the topological information of environmental perception in the cognitive process. Third, exploring emerging multimodal sensors, such as those integrating voice and video information. Fourth, the problem of hybrid human-machine intelligence, just raised in the national Artificial Intelligence 2.0 initiative; we must study human-machine hybrid intelligent systems.

Finally, our hopes. First, we want to use this platform to publish more results in cognitive science. Second, to integrate natural language understanding and brain-computer interfaces into the vehicle platform; many results still remain in the laboratory. Third, to use the important progress achieved on unmanned-vehicle platforms to further promote innovation and lead the development of the unmanned-vehicle industry.

Finally, I close by sharing a short poem with you today, celebrating the Global Summit on Artificial Intelligence and Robotics. (PPT)

Thank you very much!
