Clean up HTML on paste in CKEditor
We use CKEditor at FiveFilters.org for our PastePad service. The idea is to allow users to paste content that’s not currently publicly available on the web for processing with one of our web tools. This can be content that’s in a Word document, an email, or behind a paywall.
CKEditor can automatically clean up HTML it identifies as coming from MS Word, but there’s no way to force cleanup on all pasted content. By default, HTML cleanup occurs in the following two cases:
User clicks the ‘Paste from Word’ toolbar icon
User pastes content copied from MS Word itself
In the second case, CKEditor looks for signs of MS Word formatting. It does this by testing whatever you paste against the following regular expression:
/(class=\"?Mso|style=\"[^\"]*\bmso\-|w:WordDocument)/
If there’s a match, the pasted content is cleaned up. Otherwise it is pasted as normal, with no cleanup.
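To make the detection step concrete, here is a minimal sketch that runs the regular expression quoted above against a few made-up clipboard fragments. The sample strings and the looksLikeWord helper are my own illustrations, not part of CKEditor.

```javascript
// The detection pattern, copied verbatim from the text above.
const wordMarkup = /(class=\"?Mso|style=\"[^\"]*\bmso\-|w:WordDocument)/;

// Hypothetical clipboard fragments, invented for illustration only.
const fromWordClass = '<p class="MsoNormal">Hello</p>';
const fromWordStyle = '<span style="color:red; mso-fareast-language:EN-GB">Hi</span>';
const fromBrowser = '<p class="intro">Hello</p>';

// Hypothetical helper: true if the HTML would be treated as Word content.
function looksLikeWord(html) {
  return wordMarkup.test(html);
}

console.log(looksLikeWord(fromWordClass)); // true  (class="Mso...)
console.log(looksLikeWord(fromWordStyle)); // true  (style="... mso-...)
console.log(looksLikeWord(fromBrowser));   // false (no Word markers)
```

Ordinary HTML copied from a web page contains none of these markers, which is why it is pasted without cleanup.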
I want to avoid editing core files, so my solution is simply to ensure that this regular expression always matches pasted content. Here’s what I’ve come up with:
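CKEDITOR.on('instanceReady', function(ev) {
    // Priority 9 runs this listener before the Paste from Word plugin's (default priority 10).
    ev.editor.on('paste', function(evt) {
        // Prepend a marker that the Word-detection regular expression will match.
        evt.data['html'] = '<!--class="Mso"-->' + evt.data['html'];
    }, null, null, 9);
});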
I haven’t tested extensively, but this appears to work as expected (CKEditor 3.6.2). You can try it out.
The code registers a new listener for the paste event, just as the Paste from Word plugin does. When it receives the pasted HTML, it prepends an HTML comment containing one of the strings the Paste from Word plugin looks for. The listener has a priority of 9 so that it runs before the plugin’s own listener (default priority of 10), which triggers the actual cleaning.
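The ordering argument can be checked outside the editor with a toy dispatcher. This is only a sketch: createDispatcher is a hypothetical stand-in that calls listeners in ascending priority order, not CKEditor’s actual event code, and the "plugin" listener here merely records whether the regular expression matched.

```javascript
const wordMarkup = /(class=\"?Mso|style=\"[^\"]*\bmso\-|w:WordDocument)/;

// Hypothetical stand-in for an event dispatcher: lower priority numbers run first.
function createDispatcher() {
  const listeners = [];
  return {
    on(fn, priority = 10) {
      listeners.push({ fn, priority });
      listeners.sort((a, b) => a.priority - b.priority);
    },
    fire(data) {
      for (const { fn } of listeners) fn(data);
      return data;
    }
  };
}

const paste = createDispatcher();

// Stand-in for the Paste from Word plugin: registered first, default priority 10.
paste.on(function (data) {
  data.cleaned = wordMarkup.test(data.html);
});

// Our listener: registered later, but priority 9 means it runs first.
paste.on(function (data) {
  data.html = '<!--class="Mso"-->' + data.html;
}, 9);

const result = paste.fire({ html: '<p class="intro">Hello</p>' });
console.log(result.cleaned); // true
```

If the second listener were registered with the default priority instead of 9, it would run after the detection step in this sketch and the marker would be added too late.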
Note: I posted this solution on StackOverflow as an alternative to another solution, titled “CKEditor – use pastefromword filtering on all pasted content.” StackOverflow recently deleted some of my answers (and hid them from me) so I’m moving the rest of my meagre contributions over to my own blog.