How to Clean HTML from Google Docs (2020)

Google Docs is a great tool for collaboration when you create content in teams. It makes life much easier for editors, and it means that you can avoid using Microsoft Word’s cumbersome Track Changes feature.

But if you need to put content into WordPress, it’s a bad idea to copy and paste straight from Google Docs, because the pasted text tends to bring stray formatting code along with it. And although Google Docs has an Export to HTML function, it spits out some surprisingly messy code.

You could paste everything into a plain-text editor first and then copy it into WordPress, but that strips out all of your formatting. If you’ve gone to all the trouble of adding styles in Google Docs, you probably don’t want to do this.

There is an easier way to get clean code out of Google Docs without recreating all of your formatting.

Omar al Zabir created GDoc2HTML, a script that runs from the Google Apps Script editor. Jim Birch subsequently forked it and improved the code output. I recommend you use Jim’s version, as it’s the most up-to-date. There’s just one thing it can’t handle: links, which it turns into plain underlined text. But it’s still pretty useful.

How to Use GDoc2HTML

Don’t panic. You don’t need to write any code to use this script. It’s a simple copy-and-paste operation that takes a few minutes:

  1. Open the Google Docs document that you want to export as clean HTML.
  2. Click Tools -> Script Editor in the menu bar. A new tab or window will open containing the Script Editor.
  3. Delete the existing code in the Script Editor.
  4. In a third tab, open the raw GDoc2HTML script. Copy and paste the entire script into the Script Editor.
  5. Click on the name field at the top of the Script Editor and give the script any name you like.
  6. Click the Run button at the top of the Script Editor.
  7. Click Review Permissions to authorize the script, and be sure to click Approve.
  8. When you come back to the Script Editor, click the Function drop-down list and select ConvertGoogleDocToCleanHTML.
  9. Run the script again.
  10. Check your email for a completely clean HTML version of your document.

One thing to note: you won’t get a notification when the script finishes. You’ll know it worked when the email arrives.

If the email doesn’t arrive, check that:

  • It isn’t in your Spam folder.
  • You’ve allowed a few minutes for it to arrive.
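
If you’re curious what the script does while you wait for that email: roughly, it walks through your document paragraph by paragraph, turns each one into a bare HTML tag based on its style, and mails you the result. The real script is written in Google Apps Script and reads your document through Google’s own API. The Python sketch below only models the conversion step, and it assumes a made-up input format (a list of hypothetical `(style, text)` pairs) in place of a real Google Doc, so treat it as an illustration of the idea rather than GDoc2HTML’s actual code.

```python
from html import escape

# Hypothetical stand-in for a Google Docs paragraph: (style, text), where
# style is "NORMAL", "HEADING1" to "HEADING6", or "LIST_ITEM".
Paragraph = tuple[str, str]

HEADING_LEVELS = {"1", "2", "3", "4", "5", "6"}


def paragraphs_to_html(paragraphs: list[Paragraph]) -> str:
    """Map each (style, text) pair to a bare HTML tag with no inline styles."""
    out: list[str] = []
    in_list = False  # are we inside an open <ul>?
    for style, text in paragraphs:
        body = escape(text)  # escape &, <, > and quotes in the text
        if style == "LIST_ITEM":
            if not in_list:  # first item of a run opens the list
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{body}</li>")
            continue
        if in_list:  # any non-list paragraph closes the open list
            out.append("</ul>")
            in_list = False
        level = style[len("HEADING"):] if style.startswith("HEADING") else ""
        if level in HEADING_LEVELS:
            out.append(f"<h{level}>{body}</h{level}>")
        else:  # NORMAL and anything unrecognised become a plain paragraph
            out.append(f"<p>{body}</p>")
    if in_list:  # close a list that runs to the end of the document
        out.append("</ul>")
    return "\n".join(out)


doc = [
    ("HEADING1", "My Post"),
    ("NORMAL", "Tips & tricks"),
    ("LIST_ITEM", "First"),
    ("LIST_ITEM", "Second"),
]
print(paragraphs_to_html(doc))
```

This prints an `<h1>`, a `<p>` with the ampersand escaped as `&amp;`, and a single `<ul>` wrapping both list items, with no `style` or `class` attributes anywhere. That absence of styling is the whole point of a clean export.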

How to Reuse GDoc2HTML

A script added this way is bound to the document you added it to. So when you start a new document in Google Drive, you’ll need to repeat the entire process.

While this isn’t ideal, it’s far better than manually editing HTML code. Once you get used to the process of running the script, it only takes a few minutes each time. And it has probably saved me hours of admin, so I’d say that’s a good compromise.

I do recommend that you bookmark the raw script link so that you can easily access it the next time you need to export a document. Alternatively, bookmark the GDoc2HTML GitHub page.

How Do You Clean Code?

Messy code can be problematic for bloggers. Do you know any other methods for cleaning up Google Docs HTML? Let me know about them in the comments, and we’ll feature them on the blog.
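
Here’s one to start things off, if you don’t mind running a little Python: use Export to HTML anyway, then strip the clutter out afterwards. The sketch below relies only on Python’s standard library (`html.parser`). The set of tags it keeps is an assumption about what a typical blog post needs, not something taken from GDoc2HTML. It discards the `<head>` (including the exported `<style>` block), drops every attribute except a link’s `href`, and unwraps everything else, such as `<span>` tags, keeping their text. Unlike GDoc2HTML, it keeps links. The trade-off is that any formatting the export expresses only through CSS classes, which often includes bold and italic text, is lost.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: the tags worth keeping for a blog post. Everything else is unwrapped.
KEEP_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6",
             "ul", "ol", "li", "a", "strong", "em", "b", "i", "br"}
# Tags whose contents are dropped entirely (page title, CSS, scripts).
SKIP_CONTENT = {"head", "title", "style", "script"}


class CleanHTML(HTMLParser):
    """Rebuild HTML keeping only whitelisted tags and a link's href."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside a SKIP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in KEEP_TAGS:
            return  # unwrap: drop the tag, keep any text inside it
        href = dict(attrs).get("href") if tag == "a" else None
        if href:
            self.out.append(f'<a href="{escape(href)}">')
        else:
            self.out.append(f"<{tag}>")  # every other attribute is dropped

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in KEEP_TAGS or tag == "br":
            return  # <br> is a void tag, so it never gets a closing tag
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_google_docs_html(html: str) -> str:
    """Return a stripped-down copy of an exported Google Docs HTML page."""
    parser = CleanHTML()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = ('<html><head><style>.c1{font-weight:700}</style></head>'
          '<body class="c4"><p class="c2"><span class="c1">Hello</span> '
          '<a href="https://example.com">link</a></p></body></html>')
print(clean_google_docs_html(sample))
```

The sample stands in for a real export; in practice you would read the downloaded HTML file and pass its contents to `clean_google_docs_html`.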

