Speech is the basic means of communication between humans. Through speech, humans can convey their thoughts and feelings to others in a far more intricate way than any other animal species can. The human speech system is correspondingly the most complex one, and it comprises a number of organs: the lungs, trachea (windpipe), larynx and vocal folds, the oral cavity with the tongue, teeth and lips, and the nasal cavity.
Speech, considered as a sound signal, carries many kinds of information. Besides what has been said, it includes information about the speaker that reveals their emotional state, the identity of a known speaker, or the gender and age of an unknown one. We understand the meaning and perceive the speaker's dialect, education level and culture. We understand what has been said by relying on our knowledge of the language and on context. Thus, segmentation of the sequence of sounds that we hear is possible only if we are familiar with the language. Speech perception is, therefore, not an inherited but a learned ability. Furthermore, one can focus on a particular speaker among many, estimate the position of the speech source, and often understand things that have not actually been said, but only implied.
Acquisition of the sound signal is the first step in speech perception. The brain has to determine whether the received sound indeed originates from speech, because speech is processed in a way fundamentally different from music or ambient noise. The brain also has to identify whether the language used is one the listener is familiar with. A real-time phonetic analysis of the content is then carried out, without waiting for the speaker to finish the utterance, and ignoring non-speech sounds such as filled pauses and throat clearing.
The reconstruction of the entire utterance is performed based on the sequence of the obtained phones, taking into account semantic context as well. The meaning of the utterance will thus most probably be reconstructed correctly even if certain phones are missing or are poorly articulated, which is often the case in spontaneous speech.
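The idea that an utterance can be recovered from an incomplete phone sequence with the help of context can be illustrated with a small sketch. It is only a toy analogy, not a model of how the brain works: the lexicon, the phone symbols, and the names `LEXICON`, `edit_distance`, `reconstruct` and `context_bonus` are all assumptions made for illustration. Each candidate word is scored by the edit distance between the heard phones and the word's phones, and a context bonus is subtracted for words that fit the surrounding meaning.

```python
from typing import Dict, List, Optional

# Hypothetical toy lexicon: word -> phone sequence (symbols chosen for illustration).
LEXICON: Dict[str, List[str]] = {
    "speech": ["S", "P", "IY", "CH"],
    "speed": ["S", "P", "IY", "D"],
    "peach": ["P", "IY", "CH"],
    "teach": ["T", "IY", "CH"],
}


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # phone in a has no match in b
                            curr[j - 1] + 1,      # phone in b is missing from a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def reconstruct(phones: List[str],
                context_bonus: Optional[Dict[str, float]] = None) -> str:
    """Return the lexicon word that best explains the heard phones.

    context_bonus is an assumed stand-in for semantic context: a larger
    value means the word fits the surrounding utterance better.
    """
    bonus = context_bonus or {}

    def score(word: str) -> float:
        return edit_distance(phones, LEXICON[word]) - bonus.get(word, 0.0)

    return min(LEXICON, key=score)


# The final phone was not heard, so "speech" and "speed" are equally close;
# context decides between them.
heard = ["S", "P", "IY"]
print(reconstruct(heard, {"speed": 0.5}))
print(reconstruct(heard, {"speech": 0.5}))
```

In this sketch the acoustic evidence alone leaves the choice ambiguous, and the context term resolves it, which mirrors the role the passage above assigns to semantic context when phones are missing or poorly articulated.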