
It’s amazing how often I see people ruin the grammar of their web pages by stuffing sentences with keywords in the hope of improving visibility. In this issue of the Manchester SEO blog, I will reveal how Google reads your text, explain why keyword stuffing can be a false economy, and outline the role grammar plays in Search Engine Optimisation (SEO).
It is widely accepted that Google can identify the language a document is written in, weed out pages of spammy, nonsensical gibberish, and even tell whether the text on a page duplicates a page it has read previously. By all accounts, you might say Google meets the criteria for being able to read.
So How Does Google Read Text?
Believe it or not, the basic principle is actually quite simple. A word n-gram is just a sequence of n consecutive words, so "search engine optimisation" is a 3-gram. By applying word n-grams, Google is able to predict the statistically most likely next word in any given sequence, based on patterns it has become familiar with. By keeping giant tables of n-grams that record how often each sequence of words occurs, Google can track trends in the use of the language itself and measure how closely the patterns of words in your text match those in widespread use. It will know something is amiss when it finds extremely rare sequences occurring regularly throughout your document. Using this method, it can quantify how standard your use of language is and therefore (to some degree) how grammatically correct it is.
The beauty of this method is that Google’s working language model will adapt as human languages evolve. It will always be aware of contemporary trends and new slang terms and phrases (and how to use them in a sentence!). Indeed, it learns in much the same way a child would, and although Google may be a passive user and librarian of the Internet, I would say that (in terms of language acquisition) it goes halfway to passing the Turing Test:
The Turing Test
A human judge engages in a natural language conversation with one human and one machine, each of which tries to appear human. All participants are placed in isolated locations. If the judge cannot reliably tell the machine from the human, the machine is said to have passed the test.
Google Labs have taken this to its logical extreme, harnessing billions of web pages gathered from around the world in a tool that identifies the most likely language of any block of text you enter.
The Impact of Language and Grammar on SEO
Obviously there is more to it than this, but because Google combines this method with several others, you will be rewarded for using proper English. While it helps to feature important words and phrases, this must be balanced against the readability and language of your document for the best results!
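To make the n-gram idea concrete, here is a toy sketch in Python. It is not Google's implementation. It assumes a tiny hypothetical reference corpus (`CORPUS`) and counts word bigrams (n = 2). It then uses those counts to predict the most frequent next word and to score a text by the fraction of its bigrams the corpus has never seen. The function names are illustrative only.

```python
from collections import Counter
from typing import Optional


def bigrams(text: str) -> list[tuple[str, str]]:
    """Lowercase the text, split it on whitespace and pair each word with the next."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def train(corpus: list[str]) -> Counter:
    """Count how often each word bigram appears across a reference corpus."""
    counts: Counter = Counter()
    for sentence in corpus:
        counts.update(bigrams(sentence))
    return counts


def predict_next(word: str, counts: Counter) -> Optional[str]:
    """Return the word most often seen after `word`, or None if it was never seen."""
    followers = {second: n for (first, second), n in counts.items() if first == word.lower()}
    return max(followers, key=followers.get) if followers else None


def unseen_bigram_rate(text: str, counts: Counter) -> float:
    """Fraction of the text's bigrams that never occur in the reference counts."""
    pairs = bigrams(text)
    if not pairs:
        return 0.0
    # Indexing a Counter with a missing key returns 0 without adding the key.
    unseen = sum(1 for pair in pairs if counts[pair] == 0)
    return unseen / len(pairs)


# Hypothetical reference corpus; a real language model is trained on vastly more text.
CORPUS = [
    "we offer search engine optimisation services in manchester",
    "our team can help your business grow online",
    "search engine optimisation helps your business get found",
    "we can help you improve your website",
]

counts = train(CORPUS)
print(predict_next("search", counts))  # engine
print(unseen_bigram_rate("our team can help your business get found online", counts))  # 0.125
print(unseen_bigram_rate("seo manchester seo services manchester seo cheap seo manchester", counts))  # 1.0
```

The natural sentence has only one unfamiliar pair ("found online"). The keyword-stuffed one is built entirely from sequences the corpus never contains. A production model would use longer n-grams, smoothing for unseen sequences and far more data, but the principle is the same.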
Tags: content, Google, keyword stuffing, language, manchester seo


[...] How #Google #Reads #Text, Importance of #Language, #Grammar to #SEO http://just.roger-it.co.uk/how.....-language/ [...]
Is Google multilingual? Can Google read other languages as well as English?
P.S. – Sorry, I forgot to say: great post!
Hi there, Singapore SEO Consultant. Yes, Google can apply this method to any language, although some languages present additional problems. For instance, Chinese and Japanese do not have the luxury of whitespace marking the start and end of words, which makes it difficult to know how to break a string of input into its individual words.
I think Google Labs are showing off with their language detector (featured in this post), which defaults to Chinese, perhaps as their way of showing us that they have solved this problem too!
If you do the n-gram modelling at the character level, as opposed to the word level, it will start to highlight common sequences of characters, enabling identification of words.
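Here is a minimal Python sketch of that character-level idea. It uses English with its spaces stripped out as a stand-in for an unsegmented script. It only counts frequent character trigrams and does not attempt full word segmentation. The helper name `char_ngrams` is hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count every run of n consecutive characters in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Hypothetical example: removing the spaces mimics a script such as Chinese or
# Japanese, where whitespace does not mark word boundaries.
unsegmented = "the dog the cat the dog the bird".replace(" ", "")
print(unsegmented)                                 # thedogthecatthedogthebird
print(char_ngrams(unsegmented, 3).most_common(1))  # [('the', 4)]
```

Even in this tiny sample, the recurring word "the" stands out from one-off runs such as "eca". A real segmenter would go further and use statistics like these to score candidate ways of splitting the string.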
The language detection probably works by compressing the text using n-gram models of various languages: the model that compresses the text best points to the most probable language. I remember seeing experiments where this worked reasonably well back in the ’90s, so it will have come along a fair bit since then.
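The compression approach can be sketched in Python with the standard `zlib` module. Note that this swaps in DEFLATE, which reuses repeated substrings rather than building an explicit n-gram model. The short reference samples in `SAMPLES` are also hypothetical. The underlying idea is unchanged: the language whose sample lets the new text compress most cheaply is the likely candidate.

```python
import zlib

# Hypothetical reference samples, far too short for real use.
SAMPLES = {
    "english": (
        "the quick brown fox jumps over the lazy dog while the dog was sleeping "
        "in the garden and the children were playing near the old house by the river"
    ),
    "german": (
        "der schnelle braune fuchs springt über den faulen hund während der hund "
        "im garten schlief und die kinder in der nähe des alten hauses am fluss spielten"
    ),
    "french": (
        "le renard brun rapide saute par-dessus le chien paresseux pendant que le chien "
        "dormait dans le jardin et les enfants jouaient près de la vieille maison au bord de la rivière"
    ),
}


def compressed_size(text: str) -> int:
    """Length in bytes of the text after UTF-8 encoding and zlib compression."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def guess_language(text: str, samples: dict[str, str] = SAMPLES) -> str:
    """Return the language whose sample makes the text cheapest to add."""
    extra_cost = {
        # Extra bytes needed to encode `text` once the compressor has seen `sample`.
        language: compressed_size(sample + " " + text) - compressed_size(sample)
        for language, sample in samples.items()
    }
    return min(extra_cost, key=extra_cost.get)


print(guess_language("the dog was sleeping in the garden"))  # english
print(guess_language("der hund schlief im garten"))  # german
```

With samples this short, the guess is easily swayed by shared fragments. A real detector would therefore need large samples per language and, ideally, a compressor built on character context models such as PPM.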