Using machine translation to create ‘original’ content

25 December 2009

This is speculation prompted by a query submitted to one of the forums in which I regularly participate.

The original poster insisted that his or her blog was 100% original, yet it had been penalized for duplicate or unoriginal (i.e. copied) content. In such cases, the first thing I do is search for duplicate sources. Many people claim to write original content when in fact they have simply lifted passages from elsewhere. If two or more sentences match another source verbatim, then the blogger or site owner most probably copied the passage. There used to be those who merely altered the syntax a little or changed a few words, but the copying was still easy enough to spot.
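The "two or more identical sentences" rule of thumb can be sketched in a few lines of Python. This is only an illustration of my own manual check, not how any search engine works; the function names and the threshold of two are assumptions made for the example.

```python
import re

def sentences(text):
    """Split text into sentences, lowercased with whitespace collapsed."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(p.lower().split()) for p in parts if p.strip()]

def shared_sentences(candidate, source):
    """Return the sentences of `candidate` that also appear verbatim in `source`."""
    source_set = set(sentences(source))
    return [s for s in sentences(candidate) if s in source_set]

def looks_copied(candidate, source, threshold=2):
    """Flag the candidate when at least `threshold` sentences match (assumed cut-off)."""
    return len(shared_sentences(candidate, source)) >= threshold
```

A check like this only catches verbatim copying. Passages with a few words changed, or translated from another language, would slip straight past it.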

In this case, there were no obvious duplicates. Yet something struck me as strange. The prose was odd, and consistently odd. There were also a few words that were not in English. Anyone who had actually written or edited the passages would have spotted them. So I had the feeling that these passages had been copied from non-English sources and translated into English using an online translation service. Some machine translations are impressively good, but they are still not perfect, and they are often clumsy in style. This would explain why some words were left untranslated (presumably they were not in the translator's database) and why the oddities in the English were so consistent.
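The untranslated-word clue could be checked roughly as follows. This is a minimal sketch that assumes you supply your own English word list; the `vocabulary` argument and the function name are hypothetical, and a real word list would need to be far larger than the toy one in the usage example.

```python
import re

def unknown_words(text, vocabulary):
    """List words in `text` missing from `vocabulary`, in order of first appearance."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    found = []
    for word in words:
        if word not in vocabulary and word not in found:
            found.append(word)
    return found

# Usage with a toy vocabulary (an assumption for illustration only).
english = {"the", "house", "is", "very", "and", "old"}
leftovers = unknown_words("The house is very schön and old.", english)
```

Names, slang and typos would be flagged too, so a hit is only a hint that a passage deserves a closer look.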

These are only suspicions, yet to be substantiated. If they are correct, however, this represents an attempt to beat the penalty on duplicate content. But perhaps search engines are wise to it and already detect such attempts, as seems to be the case in the particular example I encountered. In any case, search engines are likely to have a better grasp of languages in the future, and even robots will be able to notice mistakes and oddities in grammar, syntax and style. By the time that happens, though, machine translation may have become too good to detect, and this stratagem may actually work.