A Weed by Any Name: Duplicate Content, Scraping, Plagiarism

https://www.convinceandconvert.com/content-marketing/is-social-media-creating-a-plagiarism-problem-infographic/
Hearing Health & Technology Matters
August 18, 2015

HHTM editors have been spending their time uneconomically of late, weeding out problems (and learning new words) that we didn’t even know existed back in 2011 when HHTM began.  Economics is defined as the allocation of scarce resources, a familiar concept to any audiologist who runs a practice; ditto for any editor responsible for gathering original content. Productivity and bottom lines plummet when resources (e.g., time, expertise, money, knowledge) are frittered away on tangential, even trivial, tasks (e.g., poring over spreadsheets, punching holes in chart paper, chasing duplicate content).

Frittering away resources (opportunity costs in econ lingo) means forgoing essential tasks like testing hearing, fitting hearing aids, keeping abreast of new technology, or reading and writing new post material.  Frittering diminishes market position, income, credibility, curiosity, peace of mind.  Witnessing such waste makes for anxious owner/audiologists and irritable editors.  Today’s post is a step toward restoring order in the editorial House, not just at HHTM but in our profession.

A Few Weeks in the Lives of Web-Based Editors and Webmasters

 

Upwards of two million blog posts are written daily on the Internet, many of them copied.  Web content plagiarism was thought to exceed 63% in 2014, up from 25% in 2009 and 44% in 2011.  HHTM’s Internet contribution is tiny (just under one millionth of the daily Internet content) but mighty and copyrighted.  We aim for 0% plagiarism and minimal duplicate content on our site and elsewhere.

As it turns out, it’s mighty difficult to achieve either goal without frittering away those previously mentioned scarce resources.  What, you ask, is the difference between plagiarism and duplicate content from the perspective of the Internet?

 

Duplicate Content:  It’s a Google Thing, Sometimes It’s a Legal Thing

 

Online empires, like Google, know how important authentic content is, and it is well known that a website that has copied content is penalized with a lower ranking. (quote from Emilia Sukhova)

Case 1.  A well-respected non-profit (NPO) hearing care website, replete with a top-name board of directors, used HHTM content with abandon on its site for over three years before it was discovered.  While we appreciated their spreading the word and acknowledging authorship, HHTM content was simply copied and pasted onto their site and submitted to Google, creating identical copies of the same Web page.  In addition to infringing on HHTM’s copyright, the NPO had ignored Google’s quality guidelines, which encourage Website owners to deliver high-quality, original content.

Google calls such antics “duplicate content,” or “scraping{{1}}[[1]]Scraping is simply a mechanism, usually automated rather than manual, used for “data mining” and can arguably be used for legitimate purposes.  But when the data is someone else’s content, it’s infringement.[[1]].”  When it finds scraping (and it always does), Google sends algorithms with misleadingly cute names{{2}}[[2]]E.g., Panda.[[2]] to mete out punishment by taking one of the websites to the SEO woodshed for a downgrade or even a blacklisting.  Those who ignore Google, whether through laziness, lack of oversight, or gaming of search rankings, can find their Web pages languishing in search results or their entire Websites penalized, and recovery can be very difficult and costly. Google is pretty good at figuring out which website to penalize, but relying on Google for justice is not a good business practice.{{3}}[[3]]Identical copies of the same Web page can affect the original articles’ performance in search results and risk worse for the scraper.[[3]]
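
For editors who would rather not wait on Google, a rough self-check is possible. The Python sketch below compares two texts by their overlapping five-word sequences ("shingles") and reports how much they share. It is only an illustration under stated assumptions: it is not Google's method, and the function names and the five-word window are our own hypothetical choices.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences of length `size`.

    The 5-word default is an arbitrary assumption for illustration.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short texts yield a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=5):
    """Share of shingles two texts have in common, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A copied-and-pasted post scores 1.0 against its original, and unrelated text scores 0.0. Anything in between calls for a human editor's judgment, not a verdict from the script.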

We contacted the NPO.  An administrator sent an apology for the “mistake by a former employee who is no longer with us,” removed our posts, and assumed that was that. Which it was not.  At the extremes, she faced stark choices that created new consumption problems for the NPO’s own scarce resources:

  • Door 1:  Take down the entire NPO website.
  • Door 2:  Take necessary steps to remove infringing articles from Google’s search index.

Choosing Door 1 is draconian, not a realistic choice, but it does fix the problem.  That leaves Door 2 as the only choice, which preserves the site but is costly in time, money, and expertise.   Eventually, the NPO found and hired a webmaster who resolved the problem and set the NPO website on a new and better content  path.

HHTM’s take:

  • Resources frittered:  35 emails, extra billing hours from our webmaster.
  • Weed level:  Nuisance infestation, call the Orkin man for a one-off.
  • Future planning:  Use experience to create a protocol for enforcing HHTM’s copyrights under the Digital Millennium Copyright Act (DMCA).

Plagiarism:  It’s an Ethical Thing

 

Not everybody, especially aggressive Internet marketers,  thinks duplicate content is a bad idea, but everybody agrees that stealing is a major violation.

There is a big difference between scraped content and copyright infringement. Sometimes, a company will copy your content … and claim the credit of creation. Plagiarism is the practice of someone else taking your work and passing it off as their own. Scrapers aren’t doing this. But others will, signing their name to your work. It’s illegal, and it’s why you have a copyright symbol in your footer.  If it happens to you, you’ll be thinking about lawyers, not search engines.

Case 2.  This was a case study of how a Website owner can get into trouble by trying to game Google to get search ranking, and the potential fall-out faced by other Website owners as a result.  More importantly, it is a case study of pure plagiarism by doctoral-level professionals in our field.

Since 2012, a US private-practice audiology website’s blog had contained multiple unauthorized republications of copyrighted articles from HHTM as well as from other hearing healthcare websites and the Huffington Post.  Articles appeared to be cherry-picked for high-value keywords.  All were manually stripped of bylines, copyrights, links and other identifying information.  All were represented as original content on the renegade audiology website.

HHTM contacted the owner/Audiologist.  A practice employee sent an apology, explaining that they’d had an “outsourced company do blog posts for the bulk of this and were unaware of the infringement issues.”  Posts were taken down, and the painful Door 1/Door 2 process was repeated.  Ultimately, the practice hired a webmaster who informed them it would take “a few weeks” to kill the webpages completely.  Ninety unattributed HHTM posts were removed from their site.

HHTM’s take:

  • Resources frittered away by HHTM:  53 emails, extra billing hours from our webmaster, a beer owed to a competitor who helped track down the practice owner’s email address.
  • Weed level:  Severe kudzu.  Structural termites, too.  Set up long-term exterminator contracts.
  • Future planning:  Get an intellectual property attorney. Develop Cease and Desist letter. Establish protocol to seek injunctive relief.

Professionals Pass the Buck

 

“We must be doing something right if people want to steal our stuff.”  (comment by one of HHTM’s more sanguine editors)

Cases 1 and 2 are just two of numerous instances of scraping and plagiarizing “our stuff.”  Though they differ in severity, they share similarities that reflect on our profession:

  • Every responsible party involved has been a professional individual, business or organization.
  • Responsible parties (owners, Board members) never responded to HHTM or took ownership.  Instead, they delegated the tasks of apologizing and correcting to administrative (non-Audiology) employees.
  • The blame was shifted to non-Audiology persons or entities that were no longer associated with the practice or organization.

This sets up two puzzles.

The Professional Puzzle:  Presumably the Website owners or decision makers learned about plagiarism and copyright infringement in high school or college, whenever they took their first English course that required writing a research paper. Certainly, all individuals with a PhD or AuD after their names know full well when they are cheating.  Surely they accept the responsibilities that come with their professional titles, certifications, licenses, and bank accounts.

The SEO Puzzle:  Most Audiology practice owners are not Webmasters or well-versed in Google lore, but most if not all are de facto managers and marketers who include Websites in their marketing plans and budgets.  Surely owner/managers accept the responsibility of developing or hiring expertise to follow Google’s best practices to earn greater visibility and reputation in search results.

Why?

 

  • Why did PhDs and AuDs assume that it’s okay to take a smorgasbord approach to gathering other people’s content from around the Web to publish on their own Websites? And not just any people, but other professionals in their own profession?
  • When it wasn’t intentional, why weren’t the responsible professionals paying closer attention to what they were publishing on their own sites?
  • Why did owners/managers not perform due diligence before hiring or outsourcing for essential non-audiological practice development?
  • Is our training deficient, or not keeping up with technology?
  • Is the medium to blame?  Are professionals confusing marketing with academic writing on the Web?
  • “Borrowing” content has been a problem since the Web was born, and will continue to be a problem. Setting aside the SEO issues, it’s unethical.  Why isn’t it taken more seriously?
  • Who EVER thought Audiologists would become familiar with terms like injunctive relief, much less seek it from their own brethren?

Readers are encouraged (actually begged) to submit posts and write comments to flesh out our understanding of the conundrums posed by these examples and questions.

Fig 1. From Wynne, Basic Computer Principles, p 425.  In Hosford-Dunn, Roeser & Valente (Eds) Audiology: Practice Management (1st Edition, 2000).  New York:  Thieme.

A Rose by Any Name

 

Going from specific cases to general principles, it’s clear that the real issue and the right answer are square in front of us.  The problem is one of ethics, not weeds in the Google garden.  There are plenty of online instructions on how to avoid plagiarism{{4}}[[4]]Including “self-plagiarism.”[[4]] and behave ethically on the Web{{5}}[[5]]For instance, see WordPress’s simple description and directions for avoiding plagiarism.[[5]].

The right answer is easy:  If you act ethically by writing original content on your site, the SEO problem is a non-issue.

feature image from BigStock.com; Fig 1 from Audiology Practice Management (1st Ed)

  1. As a provider of web services to the world of Audiology, we see this issue constantly. I understand your frustration and questions in relation to the owner professional. However, we have taken over sites whose running owner professionals had entrusted to established professionals. The actions undertaken by these people have been badly advised at best and downright disgraceful at worst.

    We have seen scraped content direct from RSS feeds in a supposed blog. We have seen duplicate content with absolutely no attempt to hide it. We have also seen sites being used as link farms to support other sites for other Audiology Practices in the US. This was done with buried anchor text links in blog posts; for instance, the phrase “the best hearing aids” would be linked to the website of another Practice. This link strategy is one that will be hammered by Google when it eventually cottons on that the links are unnatural.

    The reason Google hasn’t yet is that it does not know enough about the Audiology world. However, in the case of the sites we have taken over, we have removed all the outbound links and we have disavowed incoming links from other audiology sites that were also acting as link farms. This is the second one we have done and Google is going to work out why. We feel it is important to protect our Client, so we took the decision to disavow, and we did not do this lightly. It sends a signal to Google that we feel the inbound links are, shall we say, spurious. They are.

    My partner and I personally believe that when we are entrusted with something as important to a Practice as a website, it is our responsibility to act in the best manner for that Practice. We spend a lot of time and energy generating original content for each website design, and we also write 40 articles a month for blogs belonging to Practices. We ensure that these articles do not fall foul of the plagiarism algorithms. It isn’t easy, but that is what we are paid to provide.

    I am glad that you have posted this article, because the ongoing practices of some web providers are pretty bad. However, it is not these companies that will be penalised by Google; it is the Practice owners who have entrusted them with their websites who will feel the pain.

    1. mjaudseo

      Thanks for your thoughtful, supportive comments. Yes, it is practice owners who feel the pain, just as they do if they hire inexperienced, unethical or under-educated hearing practitioners to provide services to patients. Hiring and outsourcing are important parts of practice management, including hiring web providers.

  2. Great article Holly!

    I’d like to offer a few free resources for practices interested in verifying the uniqueness of their web content. Copyscape and Plagspotter are two free tools to use for a quick check on your content. You will have to check every URL that you are concerned with. Blog posts and manufacturer-specific content are typically the most duplicated in our experience.

    Best,

    Stephanie

    1. mjaudseo

      Thank you for your comment, Stephanie. Our editors are instructed to use Duplichecker to check content, but perhaps your suggestions are an improvement. In your experience, how does Duplichecker compare to Copyscape and Plagspotter?
