Geez, Bones, this is basic stuff and you're actually requiring me to type a textbook here. There are innate switches in our brains for the facilitation of language. They get switched on and off during the acquisition phase. The switches control what sounds are used in your native language's phonemic inventory. The ones that could have been switched on as a child, but are not required in the native language, get switched off. You don't use them. And they become difficult to learn in adulthood.
The Indian retroflex 't' is one example. Indians make the 't' sound by curling the tongue backwards and hitting the roof of the mouth. Our 't' is made by our tongues touching the alveolar ridge, which is further forward. Once you get past a certain age that ability is either cemented in or missing. And learning later in life is never a natural process. Many actors have done a great job of imitating accents and dialects with much work (although Pygmalion was based on false assumptions about language acquisition, and Shaw had many crazy, pre-linguistics ideas about language). But that's not natural, and it's hard to do consistently or to know the exceptions to another dialect's pronunciation rules.
Children add new words to their lexicon at a slow pace which gradually increases, and then goes through the roof after the age of 5 and continues throughout most of our lives. Again, these are not things misunderstood by linguists. If I mentioned half the equation to someone else studying linguistics, because I don't want to write a book, we would still have a common understanding of exactly what language acquisition means and what it doesn't mean.
Everything affecting language is 'human', but some factors are innate, some are conscious, and some are unconscious. 98% of our thoughts are unconscious. To use cognitive linguistics and George Lakoff specifically, our speech is formed in frames. We know the central point of what we're going to say before we can even formulate the sentence that carries that thought. Ideas do not exist in sentences. Our accompanying hand gesture to indicate the peak of a mountain can start in the sentence before we get to the mention of the mountain peak. Our hand already knows where the sentence/grammar/words are going because we don't think in those terms, we think in ideas, concepts, frames. Forming the sentences and phrases is partly unconscious and partly conscious. We consciously modify or choose better words as the words are coming out of our mouths. We can also plan what we say, but sometimes that comes out differently than we planned.
Those rules of grammar and sentence construction are acquired. What consonants sound like next to other phonemes and how they change in different settings are not learned from reading. Orthography doesn't even capture sounds as we make them. Take the 'knight' example: there is no silent 'k'. There is simply no 'k' in 'knight' in spoken English, only in the orthography, where it's left over from earlier forms of the language that did pronounce the word as it was written.
For the majority of human existence there was spoken language and no written language. Books can only influence language because you also hear the spoken version in your head. And many times, like Sartre's self-taught man in Nausea, if you haven't heard people actually say the words you're reading you'll come up with the wrong pronunciation (and look stupid). There are no consistent pronunciation rules for written English. Orthography, like all writing, is a byproduct of spoken language. And it certainly doesn't drive changes in pronunciation or prosody when it does such a poor and inaccurate job of representing them.