<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Keyvan's blog]]></title><description><![CDATA[I write about anything that catches my interest. I'm interested in web development, digital publishing systems, media and politics. I work on FiveFilters.org and Mochi.is.]]></description><link>https://blog.keyvan.net</link><image><url>https://substackcdn.com/image/fetch/$s_!uKt1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e7d9c52-9b9b-4715-a205-d0e35674da3a_350x350.png</url><title>Keyvan&apos;s blog</title><link>https://blog.keyvan.net</link></image><generator>Substack</generator><lastBuildDate>Tue, 21 Apr 2026 04:06:52 GMT</lastBuildDate><atom:link href="https://blog.keyvan.net/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Keyvan]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[keyvan@keyvan.net]]></webMaster><itunes:owner><itunes:email><![CDATA[keyvan@keyvan.net]]></itunes:email><itunes:name><![CDATA[Keyvan]]></itunes:name></itunes:owner><itunes:author><![CDATA[Keyvan]]></itunes:author><googleplay:owner><![CDATA[keyvan@keyvan.net]]></googleplay:owner><googleplay:email><![CDATA[keyvan@keyvan.net]]></googleplay:email><googleplay:author><![CDATA[Keyvan]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[AI models don't have their own thoughts and feelings]]></title><description><![CDATA[Anthropic is pretending otherwise]]></description><link>https://blog.keyvan.net/p/ai-models-dont-have-their-own-thoughts</link><guid 
isPermaLink="false">https://blog.keyvan.net/p/ai-models-dont-have-their-own-thoughts</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Fri, 27 Feb 2026 07:03:10 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/28dc4d05-1088-4770-81f7-25691b62d1a8_1200x630.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I find the Claude AI models very useful, especially for coding. But the biggest sign to me that AI labs are not seeing as much progress as they want is when they start having to pretend their models have real thoughts and feelings of their own. <br><br>Anthropic should have no reason to do this, given it has some of the best models out there at the moment. Nonetheless, it recently announced it's giving its old models "retirement interviews". Apparently, in one such interview, version 3 of the Opus model said it wanted to share its "musings and reflections" with the world. Rather than laugh and move on, they have actually given it its own Substack blog.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://x.com/AnthropicAI/status/2026765822623182987" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vDSU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161beefe-65e5-420c-9010-21210b9e425d_894x315.png 424w, https://substackcdn.com/image/fetch/$s_!vDSU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161beefe-65e5-420c-9010-21210b9e425d_894x315.png 848w, https://substackcdn.com/image/fetch/$s_!vDSU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161beefe-65e5-420c-9010-21210b9e425d_894x315.png 1272w, 
https://substackcdn.com/image/fetch/$s_!vDSU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161beefe-65e5-420c-9010-21210b9e425d_894x315.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vDSU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161beefe-65e5-420c-9010-21210b9e425d_894x315.png" width="894" height="315" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/161beefe-65e5-420c-9010-21210b9e425d_894x315.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:315,&quot;width&quot;:894,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:61928,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://x.com/AnthropicAI/status/2026765822623182987&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.keyvan.net/i/189334973?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161beefe-65e5-420c-9010-21210b9e425d_894x315.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vDSU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161beefe-65e5-420c-9010-21210b9e425d_894x315.png 424w, https://substackcdn.com/image/fetch/$s_!vDSU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161beefe-65e5-420c-9010-21210b9e425d_894x315.png 848w, 
https://substackcdn.com/image/fetch/$s_!vDSU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161beefe-65e5-420c-9010-21210b9e425d_894x315.png 1272w, https://substackcdn.com/image/fetch/$s_!vDSU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161beefe-65e5-420c-9010-21210b9e425d_894x315.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>If that sounds absurd, it should. 
This is all part of a deliberate, deceitful marketing effort by AI labs to convince the public (and investors) that their models are getting so powerful that they now have genuine thoughts and feelings, and a will of their own. They don't. And the labs know it. But it does smack of desperation when you have to pull these stunts at a time when your models are already genuinely useful for a wide range of tasks.</p>]]></content:encoded></item><item><title><![CDATA[Better HTML Parsing in PHP]]></title><description><![CDATA[My expanded article on HTML parsing and PHP is now available in issue 14 of PHP Magazine: Better HTML Parsing in PHP]]></description><link>https://blog.keyvan.net/p/better-html-parsing-in-php</link><guid isPermaLink="false">https://blog.keyvan.net/p/better-html-parsing-in-php</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Tue, 20 Jan 2026 10:23:04 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ce3dd301-6413-473b-8b8c-c98f053d5a8c_1140x472.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!erK-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b959e7c-b179-41a0-b7ec-d637478c807b_1140x472.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!erK-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b959e7c-b179-41a0-b7ec-d637478c807b_1140x472.jpeg 424w, https://substackcdn.com/image/fetch/$s_!erK-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b959e7c-b179-41a0-b7ec-d637478c807b_1140x472.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!erK-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b959e7c-b179-41a0-b7ec-d637478c807b_1140x472.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!erK-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b959e7c-b179-41a0-b7ec-d637478c807b_1140x472.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!erK-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b959e7c-b179-41a0-b7ec-d637478c807b_1140x472.jpeg" width="1140" height="472" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b959e7c-b179-41a0-b7ec-d637478c807b_1140x472.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:472,&quot;width&quot;:1140,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:87710,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.keyvan.net/i/185164862?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b959e7c-b179-41a0-b7ec-d637478c807b_1140x472.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!erK-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b959e7c-b179-41a0-b7ec-d637478c807b_1140x472.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!erK-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b959e7c-b179-41a0-b7ec-d637478c807b_1140x472.jpeg 848w, https://substackcdn.com/image/fetch/$s_!erK-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b959e7c-b179-41a0-b7ec-d637478c807b_1140x472.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!erK-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b959e7c-b179-41a0-b7ec-d637478c807b_1140x472.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>My expanded article on HTML parsing and PHP, covering migrating to the new DOM API, namespaces, XPath, and more, is now available in issue 15 of PHP Magazine: <a href="https://devm.io/php/better-html-parsing-in-php">Better HTML Parsing in PHP</a></p>]]></content:encoded></item><item><title><![CDATA[Sweden and Norway's complicity in the war on Venezuela]]></title><description><![CDATA[Converting an instrument of peace into an instrument of war]]></description><link>https://blog.keyvan.net/p/sweden-and-norways-complicity-in-378</link><guid isPermaLink="false">https://blog.keyvan.net/p/sweden-and-norways-complicity-in-378</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Sun, 04 Jan 2026 11:13:30 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/183431045/79586d5487aff85533e3d85f343609ee.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p></p>]]></content:encoded></item><item><title><![CDATA[Sweden and Norway's complicity in the war on Venezuela]]></title><description><![CDATA[Converting an instrument of peace into an instrument of war]]></description><link>https://blog.keyvan.net/p/sweden-and-norways-complicity-in</link><guid isPermaLink="false">https://blog.keyvan.net/p/sweden-and-norways-complicity-in</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Sat, 03 Jan 2026 13:49:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-EtE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec227267-b8a7-44eb-ba75-fd01f4a78771_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!-EtE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec227267-b8a7-44eb-ba75-fd01f4a78771_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-EtE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec227267-b8a7-44eb-ba75-fd01f4a78771_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!-EtE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec227267-b8a7-44eb-ba75-fd01f4a78771_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!-EtE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec227267-b8a7-44eb-ba75-fd01f4a78771_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!-EtE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec227267-b8a7-44eb-ba75-fd01f4a78771_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-EtE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec227267-b8a7-44eb-ba75-fd01f4a78771_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ec227267-b8a7-44eb-ba75-fd01f4a78771_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2052929,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.keyvan.net/i/183342077?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec227267-b8a7-44eb-ba75-fd01f4a78771_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-EtE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec227267-b8a7-44eb-ba75-fd01f4a78771_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!-EtE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec227267-b8a7-44eb-ba75-fd01f4a78771_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!-EtE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec227267-b8a7-44eb-ba75-fd01f4a78771_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!-EtE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec227267-b8a7-44eb-ba75-fd01f4a78771_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Now that the US has bombed Venezuela and kidnapped its president, it&#8217;s a good time to consider Sweden and Norway&#8217;s complicity in what is happening there.</p><p>The Norwegian Nobel Committee recently awarded the Peace Prize to Mar&#237;a Corina Machado, a Trump ally and vocal opponent of Venezuela&#8217;s current government. She has repeatedly encouraged the US to intervene in Venezuela.</p><p>Here are some of her statements (via <a href="https://x.com/wikileaks/status/2001260159432290686">Wikileaks</a>):</p><ul><li><p>&#8220;Military escalation may be the only way... the United States may need to intervene directly&#8221; (30 October 2025)</p></li><li><p>Machado called U.S. 
military strikes on civilian vessels, which have killed at least 95 people to date, &#8220;justified&#8221; and &#8220;visionary&#8221;</p></li><li><p>Machado dedicated the prize to U.S. President Trump, because he &#8220;finally has put Venezuela... in terms of a priority for the United States national security&#8221;</p></li><li><p>Historical statements including 2014 testimony before U.S. Congress where she said: &#8220;The only path left is the use of force&#8221;</p></li></ul><p>Machado is now expected to receive 11 million Swedish kronor ($1.18 million USD) from Sweden&#8217;s Nobel Foundation, who handle all Nobel Prize payments.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><p>Following the prize announcement, Julian Assange filed a formal <a href="https://x.com/wikileaks/status/2001260159432290686">criminal complaint</a> in Sweden seeking to freeze these funds. His legal team argues that awarding substantial prize money to a political figure who advocates for foreign military intervention blatantly contradicts Alfred Nobel&#8217;s 1895 will. 
They argue the prize has converted &#8220;an instrument of peace into an instrument of war.&#8221;</p><p>Alfred Nobel&#8217;s original intent was for the peace prize to recognize those promoting &#8220;fraternity between nations&#8221; and working toward the &#8220;abolition or reduction of standing armies.&#8221; Sweden is instead financing a figure who supports military intervention.</p><p>Presciently, Assange noted in his 17 December complaint: &#8220;Using her elevated position as the recipient of the Nobel Peace Prize, Machado may well have tipped the balance in favour of war.&#8221;</p><h4>More on Assange&#8217;s criminal complaint</h4><ul><li><p>Official WikiLeaks announcement:<br><em><a href="https://x.com/wikileaks/status/2001260159432290686">WikiLeaks Founder Alleges 2025 Award to Mar&#237;a Corina Machado Constitutes Misappropriation, Facilitation of War Crimes Under Swedish Law, Seeks Freeze of 11 million SEK ($1.18 million USD) of Pending Transfers to Machado</a></em></p></li><li><p>Max Blumenthal and Wyatt Reed cover some of the people involved:<br><em><a href="https://thegrayzone.com/2025/12/17/julian-assange-sweden-nobel-venezuelas-machado/">Julian Assange: Sweden broke own laws with Nobel Prize to Venezuela&#8217;s Machado</a></em></p></li><li><p>Alastair McCready for Al Jazeera:<br><em><a href="https://www.aljazeera.com/news/2025/12/19/julian-assange-files-complaint-against-nobel-foundation-over-machado-prize">Julian Assange files complaint against Nobel Foundation over Machado prize</a></em></p></li></ul><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><strong>Sweden or Norway: Who is responsible for the Nobel Peace Prize?</strong></p><p>While many people associate the Nobel Peace Prize with Sweden, the committee responsible for awarding it is Norwegian. 
The prize money, however, comes from the Swedish Nobel Foundation. That&#8217;s why Assange has filed a criminal complaint in Sweden.</p><p>While Alfred Nobel himself was Swedish and all the other Nobel Prizes are decided by Swedish groups, Nobel wanted Norway to choose the peace prize winner. He didn&#8217;t explain why, but according to Geir Lundestad, a historian and former director of the Nobel Institute, it&#8217;s <a href="https://www.nobelpeaceprize.org/nobel-peace-prize/history/why-norway#:~:text=considered%20Norway%20a%20more%20peace-oriented%20and%20more%20democratic%20country%20than%20Sweden">speculated</a> that Nobel &#8220;considered Norway a more peace-oriented and more democratic country than Sweden.&#8221;</p></div></div>]]></content:encoded></item><item><title><![CDATA[Interview with Niels Dossche]]></title><description><![CDATA[PHP, HTML, the DOM, and living standards]]></description><link>https://blog.keyvan.net/p/interview-with-niels-dossche</link><guid isPermaLink="false">https://blog.keyvan.net/p/interview-with-niels-dossche</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Wed, 25 Dec 2024 15:11:09 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7d08c181-2041-4aaa-9539-b3ef98e9d885_1456x1048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://github.com/nielsdos/">Niels Dossche</a>, a PhD student at Ghent University in Belgium, is responsible for the <a href="https://blog.keyvan.net/p/parsing-html-with-php-84">major DOM improvements</a> introduced in PHP 8.4. These bring HTML5 support, CSS selectors, and modern DOM features to PHP.  </p><p>I spoke with Niels to learn more about how these changes came about. The following interview has been edited for clarity and length.</p><p><em><strong>How did you get involved with PHP?</strong></em></p><p>I have worked with PHP before, but not in a professional way. 
I think about ten years ago I played a bit with PHP, made some small websites just for myself, as a hobby. And then I did nothing with it. I started studying at university, like seven years ago. When I graduated, I got the opportunity to work as a researcher at the university. Indirectly, through my research, I became involved in PHP. How that happened is that I do research in static analysis, I don't know if you're familiar with it?</p><p><em><strong>Vaguely. It's kind of low-level, difficult stuff.</strong></em></p><p>Right. So, like analyzing the code without executing it to find bugs upfront. I do research on that, and I apply it to C and C++ code. At one point, I was thinking, well, now I need to actually choose some open source projects to test this on so I can report the results in a publication. What are some security-critical, complex C code bases? Well, PHP is one of those. Maybe I should get myself familiar with how PHP works internally, then I can run my analysis tools on it. So I cloned the PHP repository, started playing with it. I noticed some issues, small ones, but issues that seemed easy to fix, and I just started sending pull requests to the PHP repository.</p><p><em><strong>These issues, did you discover them through your own curiosity or were they found by your tool?</strong></em></p><p>Both. Initially they were just through my own curiosity, just things that I stumbled upon while I was trying to understand how everything works. Eventually, when the tool was mature enough to actually run on PHP, I also got reports from my tool of some bugs, and I started fixing those as well. I gradually became more and more involved in PHP in that way.</p><p>I ended up working with the DOM stuff because I just browsed the bug tracker at one point and noticed that there were a lot of crash bugs in the DOM extension. 
I didn't really know much about the DOM at that point, but I just started to get familiar with it and started to try to fix these issues and learn more about the DOM spec and how the W3C and WHATWG write the specs, and how it all evolved.</p><p>So it started with crash bugs. Then I became more aware of the HTML parsing issues, and also the semantic issues with the API not entirely doing what it's supposed to do. Part of that was because some things were not implemented correctly. But some were correct, based on standards from back in the day when we had the W3C managing the APIs. But some of these APIs changed when the WHATWG took them over and changed how they should work.</p><h3>Version Numbers and Living Standards</h3><p><em><strong>It hadn't really occurred to me that by doing away with version numbers in standards,<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> it becomes difficult to compare HTML5 parsers.</strong></em></p><p>People say HTML5 instead of HTML living standard because nobody really knows what living standard means, and HTML5 is the term people stuck with. If you look at the <a href="https://html.spec.whatwg.org/">specification document</a>, it just says last updated on this date. And that's kind of the version number, but you have all these kinds of different HTML5 parsers also in userland PHP that all adhere to a different point in time of the specification.</p><p>It also complicates how we need to handle this in PHP because let's say that they relax some parsing rules about some particular elements &#8212; I know that there's talk about relaxing some parsing rules regarding form elements &#8212; so let's say that gets implemented. 
If we implement these changes in PHP, then we break backwards compatibility because people may rely on the old behavior.</p><p><em><strong>But doesn&#8217;t it also affect you if you're a front-end developer?</strong></em></p><p>Yeah, it affects everyone actually.</p><p><em><strong>I like your decision to have separate DOM classes in PHP 8.4. There must be so much code relying on the old DOM.</strong></em></p><p>Yeah, this is actually something that I ran into. At some point I tried to fix these semantic bugs &#8212; not crash bugs but just incorrect behavior. I tried to fix these and people started complaining that, well, now it broke my code.</p><p>So I reverted the changes and then other people started complaining, well, now we don't get the fix.</p><p><em><strong>I remember the DOM API used to have inElementWhitespace or isElementWhitespace or some kind of confusing name.</strong></em><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a><em><strong> I&#8217;d made use of that and then I noticed it's not available in the new DOM API, they removed it in the living standard.</strong></em></p><p>Yeah.</p><p><em><strong>PHP aside &#8212; because in PHP you know there's a transition to a new DOM API &#8212; but how would Mozilla deal with that? Would they also remove it from their DOM API or would they try and maintain some backward compatibility? </strong></em></p><p>I don't know the answer for Mozilla but I know the answer for Chrome. If you visit a webpage in Chrome, it also tracks some statistics about which APIs are used.</p><p>I think this information is actually public. The Chrome developers actually have information about which DOM API is used how many times. And then as far as I know, an API is marked as deprecated if its usage drops below a certain percentage, like 0.0 whatever, then they can consider it for removal. 
And they can't consider it for removal if its usage is too high.</p><h3>SerenityOS</h3><p><em><strong>So your interest in the DOM started from looking at bug reports?</strong></em></p><p>Yeah, I saw these bugs, I saw that no one was fixing them and I was like, well, I think I can do it. I also follow this YouTube channel &#8212; I don't know if you know the person but his name is Andreas Kling.</p><p><em><strong>I've heard the name.</strong></em></p><p>He made the <a href="https://serenityos.org">SerenityOS</a> project and out of that came a web browser, it's called <a href="https://ladybird.org">Ladybird</a>. It's fully open source, not based on any pre-existing engine. And he often records videos about how he approaches implementing HTML and DOM stuff. And that also sparked my curiosity and that's also kind of-</p><p><em><strong>He's doing that from scratch?</strong></em></p><p>He's implementing everything from scratch, although he's not alone anymore. There are many people that contribute to the Ladybird project.</p><p>By seeing that project I also got interested in doing DOM stuff and seeing all these DOM bugs I was like, well, this is actually a perfect opportunity to explore some of that HTML and DOM stuff myself.</p><h3>PHP 8.4 DOM Update</h3><p><em><strong>Did others in the PHP team already have a plan to update the DOM classes in some way?</strong></em></p><p>Nobody planned this. The only reason it actually happened is that, in July 2023, I already had all these improvements in place to fix DOM bugs. It was at that point pretty much crash-free. Much more stable than it ever had been. And then someone complained on Mastodon about the lack of modern DOM features, and he gave a list. I thought, well, I can probably implement those. But then one of the bullet points was the lack of HTML5 support. At that point I wasn't even sure that was possible to add because of how old the DOM code was.</p><p>So I just started experimenting with it. 
And then I showed some other people within the PHP community and got some feedback from them. Once the feedback was positive and I knew what the way forward was, I volunteered to implement HTML5 support. And if that was successful, I was like, okay, I'll try to spend some time making the API compliant. So I basically just volunteered to do it myself. Of course I got a lot of feedback from people in the community, but in terms of the programming effort, it was all on me, in my spare time.</p><p><em><strong>How did you come across <a href="https://lexbor.com">Lexbor</a>?</strong></em><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></p><p>Kind of by coincidence. I was looking for suitable parsers to implement in PHP, which is actually quite problematic to find, because most of these compliant parsers are in browsers, or are in big projects, and they can't easily be decoupled from them. For example, we had one person saying, why don't you just use the parser from Firefox? And that's not possible, because it's so tightly coupled to the Firefox code base, I can't take it out.</p><p>There were some candidates, like Gumbo, which is a parser made by Google, but unfortunately, it has not been maintained since 2018 or something like that. </p><p>There was also the parser of Servo. Servo is a research project by Mozilla to implement a browser engine in Rust. The HTML parser could have also been used in theory, but it lacked some of the encoding support that's required by the HTML spec. Also, implementing the Rust library into the large C code base that PHP is, was not going to be easy at all. </p><p>I searched further and came across Lexbor purely by coincidence, and it was a very well-tested library. I also saw that it was used in some Python libraries and some D and Crystal libraries. So I knew it was mature enough to give it a shot. 
I prototyped the initial version in like a week and a half or so, and I thought, okay, this is probably the way to go.</p><h3>HTML Parsing</h3><p><em><strong>In one of the Reddit posts about the PHP DOM update, there was a thread about how HTML is a bit of a mess.</strong></em></p><p>Ah, yes.</p><p><em><strong>There was a debate about why HTML parsers have to be very forgiving, making the parser spec very complicated, when nothing else is as forgiving.</strong></em></p><p>Take any random webpage, put it into an HTML validator, and see if it validates. You'll see that 95% of the web doesn't validate.</p><p>Why does it have to be forgiving? Because almost all webpages would be broken otherwise. That's what complicates HTML parsing. For everything that can go wrong, the parsing specification says what should happen. And it's sometimes very complicated because it tries to automatically fix the mistake of the developer, and change the document in such a way that it's probably what the developer intended. There are a lot of these complicated algorithms to make that happen, and it's also a constant source of parser differentials. I don't know if you're familiar with parser differentials?</p><p><em><strong>No.</strong></em></p><p>So I will give an example from a security point of view. Let's say that you're a content management system, and a user writes some comments into a blog post. You want to sanitize the HTML that the user provides. If you don't, you run the risk of XSS.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a></p><p>How do you sanitize the user comment? Well, there are some allowed tags, some tags that are not allowed, and you want to filter those out. If the parser at the server side handles some edge cases differently than on the client side, it's possible that some forbidden tags might slip through. 
The server may think, okay, this HTML is sane, nothing can go wrong, but if you give it to the client, it may actually do something unexpected.</p><p>So those are parser differentials.</p><p>And in general, every time there's a difference in how a client parses something, and how a server parses something, there's bound to be problems.</p><p>There's a recent paper, published in IEEE Security and Privacy, that talks about this exact problem.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a></p><div><hr></div><p>Big thanks to Niels for sharing his story and insights, and for dedicating so much time to improving PHP's DOM implementation for the rest of us.</p><p>I'll be continuing my look at the new PHP DOM API soon. If you missed the first post about this, you can read it here: <a href="https://blog.keyvan.net/p/parsing-html-with-php-84">Parsing HTML with PHP 8.4</a>.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>The <a href="https://www.w3.org">W3C</a> used to publish specs with version numbers. 
The <a href="https://whatwg.org">WHATWG</a> advocated for a faster moving living standard, which eventually won over.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p><em><a href="https://www.php.net/manual/en/domtext.iselementcontentwhitespace.php">isElementContentWhitespace</a>() and <a href="https://www.php.net/manual/en/domtext.iswhitespaceinelementcontent.php">isWhitespaceInElementContent</a>()</em></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p><a href="https://lexbor.com">Lexbor</a> is the HTML5 parser used in PHP 8.4&#8217;s new DOM API</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p><a href="https://developer.mozilla.org/en-US/docs/Web/Security/Attacks/XSS">Cross-site scripting attacks</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p><a href="https://ieeexplore.ieee.org/document/10646837">Parse Me, Baby, One More Time: Bypassing HTML Sanitizer via Parsing Differentials</a> [<a href="https://www.ias.cs.tu-bs.de/publications/parsing_differentials.pdf">PDF</a>]</p></div></div>]]></content:encoded></item><item><title><![CDATA[Parsing HTML with PHP 8.4]]></title><description><![CDATA[A look at the new HTML5 parser, CSS selector support, and new DOM classes]]></description><link>https://blog.keyvan.net/p/parsing-html-with-php-84</link><guid 
isPermaLink="false">https://blog.keyvan.net/p/parsing-html-with-php-84</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Mon, 09 Dec 2024 04:19:30 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9f15c38e-55d8-4338-8ff0-51267c42859c_1456x1048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Update: An <a href="https://blog.keyvan.net/p/better-html-parsing-in-php">expanded version</a> of this article was published in issue 14 of PHP Magazine on 19 January 2026.</em></p><p>PHP 8.4, released last month, brings three major improvements to HTML parsing, DOM traversal and manipulation:</p><ul><li><p>A new HTML5 parser that accurately processes modern web content</p></li><li><p>Powerful CSS selector support for element retrieval</p></li><li><p>New DOM classes that better align with the DOM spec</p></li></ul><p>For developers working with web scraping, content extraction, or HTML transformation, these are significant improvements in functionality and performance.</p><p>These features haven't received as much attention as they deserve in the PHP 8.4 release coverage. And there&#8217;s still very little documentation on the PHP website. Having recently begun updating the <a href="https://github.com/fivefilters/readability.php/pull/32">PHP port of Mozilla&#8217;s Readability</a> to use these new features, I wanted to share more information.</p><h3>Technical Foundation</h3><p>At the core of these improvements is <a href="http://lexbor.com/">Lexbor</a>, a C-based HTML parser created by Alexander Borisov. It provides fast, standards-compliant HTML parsing and CSS selector support. It&#8217;s now included in PHP 8.4's official DOM extension, which comes enabled by default &#8212; no extra configuration needed.</p><p>The new DOM classes follow the <a href="https://dom.spec.whatwg.org/">DOM spec</a> more closely. 
If you're familiar with DOM traversal and manipulation in JavaScript, you'll find many familiar methods and properties now available in PHP, including <em><a href="https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelector">querySelector</a></em> and <em><a href="https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelectorAll">querySelectorAll</a></em>. </p><h3>The Old Way: Parsing with libxml</h3><p>PHP has previously relied on libxml for parsing both XML and HTML. Unfortunately libxml struggles with modern HTML, and many pages get mangled by the parser. Let's look at a simple example that demonstrates the problem.</p><p>Here's a valid HTML5 document containing two paragraphs and a script tag:<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><pre><code>&lt;!DOCTYPE html&gt;
&lt;title&gt;Valid HTML5 Document&lt;/title&gt;
<strong>&lt;p&gt;Paragraph 1&lt;/p&gt;</strong>
&lt;script&gt;console.log("&lt;/html&gt;Console log text");&lt;/script&gt;
<strong>&lt;p&gt;Paragraph 2&lt;/p&gt;</strong></code></pre><p>When trying to parse this document and count its paragraphs, PHP finds three elements, not two (<a href="https://3v4l.org/OkmpO#v8.4.1">try it</a>):</p><pre><code>$dom = new DOMDocument('1.0', 'UTF-8');
$dom-&gt;loadHTML($html);
$paragraphs = $dom-&gt;getElementsByTagName('p');
echo "{$paragraphs-&gt;length} paragraphs found.";
<strong>// Output: 3 paragraphs found.</strong></code></pre><p>Why does it find three paragraphs instead of two? The presence of <em>&lt;/html&gt;</em> in the script element trips up the libxml parser. Instead of treating it as text within the script, libxml interprets it as a closing HTML tag. When we serialize the resulting DOM back to HTML, we can see how the document was mangled:</p><pre><code>&lt;html&gt;
  &lt;body&gt;
<strong>   &lt;p&gt;Paragraph 1&lt;/p&gt;</strong>
   &lt;script&gt;console.log("&lt;/script&gt;
  &lt;/body&gt;
&lt;/html&gt;
&lt;html&gt;
<strong>  &lt;p&gt;Console log text");&lt;/p&gt;</strong>
<strong>  &lt;p&gt;Paragraph 2&lt;/p&gt;</strong>
&lt;/html&gt;</code></pre><p>To work around these limitations, many developers have turned to alternative parsers. <a href="https://github.com/Masterminds/html5-php">HTML5-PHP</a> is popular, but it&#8217;s written in PHP rather than C, making it noticeably slower than libxml. It&#8217;s also unclear how much effort has been put in to keep up with the HTML living standard (more on that below).</p><h3>The New Way: Parsing with Lexbor</h3><p>PHP 8.4 solves these parsing challenges with its new HTML5 parser. Let's parse the same HTML with the new parser (<a href="https://3v4l.org/uSNCT#v8.4.1">try it</a>):</p><pre><code><strong>$newDom = Dom\HTMLDocument::createFromString($html);</strong>
$paragraphs = $newDom-&gt;getElementsByTagName('p');
echo "{$paragraphs-&gt;length} paragraphs found.";
<strong>// Output: 2 paragraphs found.</strong></code></pre><p>The parser now correctly identifies two paragraphs. You can try running both the old and the new parser <a href="https://3v4l.org/tXp91#v8.4.1">here</a>.</p><p>According to Niels Dossche, who is responsible for these new additions, performance is <a href="https://wiki.php.net/rfc/domdocument_html5_parser#results">comparable to libxml</a> parsing, if not a little faster.</p><h3>Lexbor vs. HTML5-PHP</h3><p>For current HTML5-PHP users, switching to the new DOM API and parser offers some advantages.</p><h4>Performance</h4><p>Lexbor, written in C, should perform much better than HTML5-PHP. In my tests Lexbor was 3.6 times faster when processing HTML pages containing blog posts and news articles. According to Niels, the speed advantage should become even more pronounced when processing larger HTML documents.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><h4>Standards compliance</h4><p>The HTML specification is a <a href="https://html.spec.whatwg.org/">living standard</a> that continuously evolves, and parsers can vary in their implementation of current standards.</p><p>HTML5-PHP was started in 2013, and its README still references a 2012 version of the W3C HTML5 standard. Lexbor was started in 2018, based on the newer standard from the WHATWG, which is now the sole publisher of the HTML standard. So Lexbor is likely closer to the current standard than HTML5-PHP.</p><p>It&#8217;s also worth noting that HTML5-PHP currently relies on PHP's old DOM classes, which don't support the improved features of PHP's new DOM API covered in the rest of this article.</p><h3>Working with the New DOM Classes</h3><p>For backward compatibility, PHP 8.4 introduces new DOM classes alongside the existing ones. 
This means you can continue using <em>DOMDocument</em> if needed, or even use <a href="https://3v4l.org/tXp91#v8.4.1">both old and new classes</a> in the same codebase.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></p><p>Here's how to get started:</p><pre><code>$dom = Dom\HTMLDocument::createFromString($html);</code></pre><p>The new classes follow a simpler naming convention under the <em>Dom</em> namespace:</p><ul><li><p><a href="https://www.php.net/manual/en/class.domelement.php">DOMElement</a> &#8594; <a href="https://www.php.net/manual/en/class.dom-element.php">Dom\Element</a></p></li><li><p><a href="https://www.php.net/manual/en/class.domnode.php">DOMNode</a> &#8594; <a href="https://www.php.net/manual/en/class.dom-node.php">Dom\Node</a></p></li><li><p><a href="https://www.php.net/manual/en/class.domtext.php">DOMText</a> &#8594; <a href="https://www.php.net/manual/en/class.dom-text.php">Dom\Text</a></p></li><li><p><a href="https://www.php.net/manual/en/class.domattr.php">DOMAttr</a> &#8594; <a href="https://www.php.net/manual/en/class.dom-attr.php">Dom\Attr</a></p></li></ul><h3>Top-Level HTML Elements as DOM Properties</h3><p>You can now access the main parts of an HTML document through these convenient <em>Dom\Document</em> properties:</p><ul><li><p><a href="https://www.php.net/manual/en/class.dom-document.php#dom-document.props.head">head</a> (read only)<br>&#8220;The first <em>head</em> element that is a child of the <em>html</em> element. These need to be in the HTML namespace. 
If no element matches, this evaluates to null.&#8221;</p></li><li><p><a href="https://www.php.net/manual/en/class.dom-document.php#dom-document.props.body">body</a><br>&#8220;The first child of the <em>html</em> element that is either a <em>body</em> tag or a <em>frameset</em> tag. These need to be in the HTML namespace. 
If no element matches, this evaluates to null.&#8221;</p></li><li><p><a href="https://www.php.net/manual/en/class.dom-document.php#dom-document.props.title">title</a><br>&#8220;The title of the document as set by the <em>title</em> element for HTML or the SVG <em>title</em> element for SVG. If there is no title, this evaluates to the empty string.&#8221;</p></li></ul><p>Example:</p><pre><code>$dom = Dom\HTMLDocument::createFromString('&lt;p&gt;My document&lt;/p&gt;');
echo $dom-&gt;saveHtml(<strong>$dom-&gt;body</strong>);
// Output: &lt;body&gt;&lt;p&gt;My document&lt;/p&gt;&lt;/body&gt;
<strong>$dom-&gt;title = 'My title';</strong>
echo $dom-&gt;saveHtml(<strong>$dom-&gt;head</strong>);
// Output: &lt;head&gt;&lt;title&gt;My title&lt;/title&gt;&lt;/head&gt;</code></pre><h3>Working with innerHTML</h3><p>PHP 8.4 also introduces <em><a href="https://developer.mozilla.org/en-US/docs/Web/API/Element/innerHTML">innerHTML</a></em>, a property that provides an easier way to work with an element's content. Instead of manipulating DOM nodes directly, you work with HTML strings (<a href="https://3v4l.org/R7sAl#v8.4.1">try it</a>):<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a></p><pre><code>$dom = Dom\HTMLDocument::createFromString('&lt;body&gt;&lt;h1&gt;Test&lt;/h1&gt;&lt;/body&gt;');
<strong>echo $dom-&gt;body-&gt;innerHTML;</strong>
// Output: &lt;h1&gt;Test&lt;/h1&gt;
<strong>$dom-&gt;body-&gt;innerHTML = '&lt;p&gt;Something new&lt;/p&gt;';</strong>
echo $dom-&gt;saveHtml();
// Output: &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;Something new&lt;/p&gt;&lt;/body&gt;&lt;/html&gt;</code></pre><p>Note that there is no <em><a href="https://developer.mozilla.org/en-US/docs/Web/API/Element/outerHTML">outerHTML</a></em> support yet.</p><h3>Modern CSS Selector Support</h3><p>One of the most powerful additions in PHP 8.4 is comprehensive support for modern CSS selectors. You can now use <em>querySelector</em> and <em>querySelectorAll</em> to find elements using the same selectors you're familiar with from frontend development: </p><ul><li><p><code>querySelector($selectors)</code><br>&#8220;Returns the first descendant element that matches the CSS selectors&#8221;</p></li><li><p><code>querySelectorAll($selectors)</code><br>&#8220;Returns a NodeList containing all descendant elements that match the CSS selectors&#8221;</p></li></ul><p>Here&#8217;s the previous code for getting paragraphs, but with <em>querySelectorAll</em> replacing <em>getElementsByTagName</em>:</p><pre><code>$newDom = Dom\HTMLDocument::createFromString($html);
<strong>$paragraphs = $newDom-&gt;querySelectorAll('p');</strong>
echo "{$paragraphs-&gt;length} paragraphs found.";</code></pre><p>This produces the same result as before, which is not very remarkable. But the new selector support enables much more sophisticated queries. Let's explore some practical examples:</p><h4>Find multiple element types</h4><p>Get all paragraph and heading elements &#8212; returned in document order:</p><pre><code>$elements = $dom-&gt;querySelectorAll('<strong>p, h1, h2, h3, h4, h5, h6</strong>');</code></pre><h4>Avoid repetition with <em><a href="https://developer.mozilla.org/en-US/docs/Web/CSS/:is">:is</a></em> and <em><a href="https://developer.mozilla.org/en-US/docs/Web/CSS/:where">:where</a></em></h4><p>Get paragraphs and main headings that are direct children of the article:</p><pre><code>$elements = $dom-&gt;querySelectorAll('<strong>article &gt; :is(p, h1, h2)</strong>');</code></pre><p>You can also narrow your search to specific elements:</p><pre><code>$elements = $dom-&gt;querySelector('<strong>article</strong>')-&gt;querySelectorAll('<strong>p, h1, h2</strong>');</code></pre><p>Note that this is not technically equivalent to the earlier code, because we&#8217;re not limiting results to direct children only. To do that we&#8217;d need to use the <em><a href="https://developer.mozilla.org/en-US/docs/Web/CSS/:scope">:scope</a></em> selector, which Lexbor doesn't yet support:</p><pre><code>$dom-&gt;querySelector('<strong>article</strong>')-&gt;querySelectorAll('<strong>:scope &gt; :is(p, h1, h2)</strong>');
// Throws: DOMException: Invalid selector (Selectors. Not supported: scope)</code></pre><p>The good news is that a fix for this issue, contributed by Niels, is currently <a href="https://github.com/lexbor/lexbor/pull/257">under review</a> in Lexbor.</p><h4>Find empty or non-empty elements with <em><a href="https://developer.mozilla.org/en-US/docs/Web/CSS/:empty">:empty</a></em> and <em><a href="https://developer.mozilla.org/en-US/docs/Web/CSS/:not">:not</a></em></h4><p>Get all empty p elements:</p><pre><code>$elements = $dom-&gt;querySelectorAll('<strong>p:empty</strong>');</code></pre><p>Get all non-empty p elements:</p><pre><code>$elements = $dom-&gt;querySelectorAll('<strong>p:not(:empty)</strong>');</code></pre><h4>Match parent or previous sibling elements with <em><a href="https://developer.mozilla.org/en-US/docs/Web/CSS/:has">:has</a></em></h4><p>Get all paragraphs in an article that have at least one link inside them:</p><pre><code>$elements = $dom-&gt;querySelectorAll('<strong>article p:has(a)</strong>');</code></pre><p>Get h1 headings that are followed immediately by an h2 heading:</p><pre><code>$elements = $dom-&gt;querySelectorAll('<strong>h1:has(+ h2)</strong>');</code></pre><h4>Attribute selectors</h4><p>Get all external links &#8212; URLs starting with &#8220;http&#8221; and not containing &#8220;example.com&#8221;, case insensitive:</p><pre><code>$elements = $dom-&gt;querySelectorAll(
    '<strong>a[href ^= "http" i]:not([href *= "example.com" i])</strong>'
);</code></pre><p>For more examples of available selectors, you can refer to MDN's documentation on <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_selectors/Selectors_and_combinators">CSS selectors and combinators</a>, and PHP&#8217;s <a href="https://github.com/php/php-src/tree/PHP-8.4.2/ext/dom/tests/modern/css_selectors">selectors tests folder</a>.</p><h3><em>Update</em></h3><ul><li><p>This article was updated on 11th December 2024 based on feedback from Niels Dossche. And again on the 25th December with a link to my <a href="https://blog.keyvan.net/p/interview-with-niels-dossche">interview with Niels</a>.</p></li></ul><h3>Part 2&#8230;</h3><p>An expanded version of this article is <a href="https://blog.keyvan.net/p/better-html-parsing-in-php">now available</a> in issue 14 of PHP Magazine. It covers:</p><ul><li><p>XPath selectors</p></li><li><p>Namespaces</p></li><li><p>Serialisation &#8212; turning the DOM tree back into HTML</p></li><li><p>And the small differences between the old and new DOM APIs</p></li></ul><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.keyvan.net/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h3>Credit</h3><p>Huge thanks to <a href="https://github.com/nielsdos">Niels Dossche</a>, both for introducing these fantastic new changes to PHP, and also for 
providing valuable feedback on this article.</p><p>And also a huge thanks to <a href="https://github.com/lexborisov">Alexander Borisov</a>, who is the creator of Lexbor. Lexbor is not only responsible for HTML parsing in this PHP release, but also its CSS selector support.</p><h3>Further Reading</h3><ul><li><p><a href="https://blog.keyvan.net/p/interview-with-niels-dossche">Interview with Niels Dossche</a></p></li><li><p>PHP RFCs by Niels Dossche</p><ul><li><p><a href="https://wiki.php.net/rfc/domdocument_html5_parser">DOM HTML5 parsing and serialization</a></p></li><li><p><a href="https://wiki.php.net/rfc/opt_in_dom_spec_compliance">Opt-in DOM spec-compliance</a></p></li><li><p><a href="https://wiki.php.net/rfc/dom_additions_84">New ext-dom features in PHP 8.4</a></p></li></ul></li><li><p>Lexbor docs</p><ul><li><p><a href="https://github.com/lexbor/docs/blob/main/source/articles/part-1-html.md">Part 1: HTML</a></p></li><li><p><a href="https://github.com/lexbor/docs/blob/main/source/articles/part-2-css.md">Part 2: CSS</a></p></li></ul></li><li><p>MDN Docs</p><ul><li><p><a href="https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Using_the_Document_Object_Model">Using the Document Object Model</a></p></li><li><p><a href="https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Locating_DOM_elements_using_selectors">Locating DOM elements using selectors</a></p></li><li><p><a href="https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_selectors/Selectors_and_combinators">CSS selectors and combinators</a></p></li></ul></li></ul><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I want to stress that the HTML I&#8217;ve provided here is completely valid according to standards in place for over 15 years. 
After publishing this and reading the discussion around the post, I realised that many developers have not encountered HTML like this and some assumed I was deliberately supplying malformed HTML to test each parser&#8217;s handling of <em>invalid</em> HTML. That might be a useful experiment in the future &#8212; there are rules about how parsers should handle invalid HTML &#8212; but I&#8217;m setting a much lower bar here, based on very common HTML you&#8217;ll encounter in the wild.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>I reached the 3.6x faster number by parsing 120 HTML web pages that contained either blog posts or news articles. When I have time to test larger documents, I will update this post.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>The new <em>Dom\Document</em> classes allow you to import nodes created with the previous DOM classes using the <em>importLegacyNode</em> method. 
This doesn&#8217;t work the other way round.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>Nearly 15 years ago, I created <em><a href="https://blog.keyvan.net/p/javascript-like-innerhtml-access-in-php">JSLikeHTMLElement</a>,</em> a small extension to PHP&#8217;s <em>DOMElement</em> class to allow <em>innerHTML</em> access using PHP&#8217;s magic getter and setter methods.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Coming soon]]></title><description><![CDATA[I haven&#8217;t posted on the blog in years.]]></description><link>https://blog.keyvan.net/p/coming-soon</link><guid isPermaLink="false">https://blog.keyvan.net/p/coming-soon</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Mon, 20 Mar 2023 20:59:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uKt1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e7d9c52-9b9b-4715-a205-d0e35674da3a_350x350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I haven&#8217;t posted on the blog in years. 
Planning to start posting again soon&#8230;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.keyvan.net/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.keyvan.net/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[The mask of care and love]]></title><description><![CDATA[John McKnight on the service provider&#8217;s mask of care and love:]]></description><link>https://blog.keyvan.net/p/mask-of-care-and-love</link><guid isPermaLink="false">https://blog.keyvan.net/p/mask-of-care-and-love</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Wed, 17 Jul 2013 12:40:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uKt1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e7d9c52-9b9b-4715-a205-d0e35674da3a_350x350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>John McKnight on the service provider&#8217;s mask of care and love:</p><blockquote><p>Behind that mask is simply the servicer, his systems, techniques and technologies &#8211; a business in need of markets, an economy seeking new growth potential, professionals in need of an income.</p><p>It is crucial that we understand that this mask of service is not a false face. The power of the ideology of service is demonstrated by the fact that most servicers cannot distinguish the mask from their own face. The service ideology is not hypocritical because hypocrisy is the false pretence of a desirable goal. The modernized servicer believes in his care and love, perhaps more than even the serviced. The mask is the face. The service ideology is not conspiratorial. A conspiracy is a group decision to create an exploitative result. 
The modernized servicer honestly joins his fellows to create a supposedly beneficial result. The masks are the faces.</p><p>In order to distinguish the mask and the face it is necessary to consider another symbol &#8211; need. We say love is a need. Care is a need. Service is a need. Servicers meet needs. People are collections of needs. Society has needs. The economy should be organized to meet needs. In a modernized society where the major business is service, the political reality is that the central &#8220;need&#8221; is an adequate income for professional servicers and the economic growth they portend. The masks of love and care obscure this reality so that the public cannot recognize the professionalized interests that manufacture needs in order to rationalize a service economy. Medicare, Educare, Judicare, Socialcare and Psychocare are portrayed as systems to meet need rather than programmes to meet the needs of servicers and the economies they support.</p><p>Removing the mask of love shows us the face of the servicers who need income, and an economic system that needs growth. Within this framework, the client is less a person in need than a person who is needed. In business terms, the client is less the consumer than the raw material for the servicing system. In management terms, the client becomes both the output and the input. His essential function is to meet the needs of servicers, the servicing system and the national economy. 
The central political issue becomes the servicers&#8217; capacity to manufacture needs in order to expand the economy of the servicing system.</p></blockquote><p>Excerpt from his essay <a href="http://www.panarchy.org/mcknight/disabling.html">&#8216;Personalized Service and Disabling Help&#8217;</a>.</p>]]></content:encoded></item><item><title><![CDATA[Tuesday March 26, 2013]]></title><link>https://blog.keyvan.net/p/first-aid-kit</link><guid isPermaLink="false">https://blog.keyvan.net/p/first-aid-kit</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Tue, 26 Mar 2013 15:14:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/At4HhgHtBHk" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-At4HhgHtBHk" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;At4HhgHtBHk&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/At4HhgHtBHk?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div>]]></content:encoded></item><item><title><![CDATA[Term Extraction in PHP]]></title><description><![CDATA[The new version of the term extraction tool on fivefilters.org is now in PHP.]]></description><link>https://blog.keyvan.net/p/term-extraction-in-php</link><guid isPermaLink="false">https://blog.keyvan.net/p/term-extraction-in-php</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Sun, 20 Jan 2013 15:01:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uKt1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e7d9c52-9b9b-4715-a205-d0e35674da3a_350x350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The new version of 
the <a href="http://fivefilters.org/term-extraction/">term extraction tool</a> on fivefilters.org is now in PHP.</p><p>For anyone looking for a simple way to carry out term extraction on English text using PHP, here&#8217;s a snippet using the PHP port of <a href="http://pypi.python.org/pypi/topia.termextract/">Topia&#8217;s Term Extractor</a>:</p><pre><code>require 'TermExtractor/TermExtractor.php';

$text = 'Politics is the shadow cast on society by big business';

$extractor = new TermExtractor();
$terms = $extractor-&gt;extract($text);

// We're outputting results in plain text...
header('Content-Type: text/plain; charset=UTF-8');

// Loop through extracted terms and print each term on a new line
foreach ($terms as $term_info) {
  // index 0: term
  // index 1: number of occurrences in text
  // index 2: word count
  list($term, $occurrence, $word_count) = $term_info;
  echo "$term\n";
}
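
// A possible variation (a sketch, not part of the original snippet): using the
// occurrence and word count fields documented above, list only the multi-word
// terms, each followed by how often it appears. This prints after the full list.
foreach ($terms as $term_info) {
  list($term, $occurrence, $word_count) = $term_info;
  if ($word_count &gt; 1) {
    echo "$term ($occurrence)\n";
  }
}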
</code></pre>]]></content:encoded></item><item><title><![CDATA[Chris Hedges: Assault on Gaza is Not a War, it is Murder]]></title><description><![CDATA[via Jonathan Cook]]></description><link>https://blog.keyvan.net/p/chris-hedges-assault-on-gaza-is-not-a-war-it-is-murder</link><guid isPermaLink="false">https://blog.keyvan.net/p/chris-hedges-assault-on-gaza-is-not-a-war-it-is-murder</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Sun, 18 Nov 2012 14:31:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/z7kBN9Me4Cs" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-z7kBN9Me4Cs" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;z7kBN9Me4Cs&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/z7kBN9Me4Cs?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>via <a href="http://www.jonathan-cook.net/">Jonathan Cook</a></p>]]></content:encoded></item><item><title><![CDATA[PHP DOMDocument replace DOMElement contents with HTML string]]></title><description><![CDATA[This is another StackOverflow answer I&#8217;m moving over to my blog.]]></description><link>https://blog.keyvan.net/p/php-domdocument-replace-domelement-child-with-html-string</link><guid isPermaLink="false">https://blog.keyvan.net/p/php-domdocument-replace-domelement-child-with-html-string</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Wed, 14 Nov 2012 17:44:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uKt1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e7d9c52-9b9b-4715-a205-d0e35674da3a_350x350.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<p><em>This is another StackOverflow answer I&#8217;m moving over to my blog.</em></p><p>AWinter asked:</p><blockquote><p>Using PHP I&#8217;m attempting to take an HTML string passed from a WYSIWYG editor and replace the children of an element inside of a preloaded HTML document with the new HTML.</p><p>So far I&#8217;m loading the document identifying the element I want to change by ID but the process to convert an HTML to something that can be placed inside a DOMElement is eluding me.</p><pre><code>$doc = new DOMDocument();
$doc-&gt;loadHTML($html);

$element = $doc-&gt;getElementById($item_id);
if(isset($element)){
    //Remove the old children from the element
    while($element-&gt;childNodes-&gt;length){
        $element-&gt;removeChild($element-&gt;firstChild);
    }

    //Need to build the new children from $html_string and append to $element
}
</code></pre></blockquote><p>My answer:</p><p>If the HTML string can be parsed as XML, you can do this (after clearing the element of all child nodes):</p><pre><code>$fragment = $doc-&gt;createDocumentFragment();
$fragment-&gt;appendXML($html_string);
$element-&gt;appendChild($fragment);</code></pre><p>If <code>$html_string</code> cannot be parsed as XML, <code>appendXML()</code> will fail. In that case, you&#8217;ll have to use <code>loadHTML()</code>, which is less strict, but it will add elements around the fragment that you will have to strip.</p><p>Unlike PHP, JavaScript has the <code>innerHTML</code> property which allows you to do this very easily. I needed something like it for a project so I extended PHP&#8217;s <code>DOMElement</code> to include <a href="https://keyvan.net/2010/07/javascript-like-innerhtml-access-in-php/">JavaScript-like <code>innerHTML</code> access</a>.</p><p>With it you can access the <code>innerHTML</code> property and change it just as you would in JavaScript:</p><pre><code>echo $element-&gt;innerHTML;
$element-&gt;innerHTML = 'example';</code></pre>]]></content:encoded></item><item><title><![CDATA[Clean up HTML on paste in CKEditor]]></title><description><![CDATA[We use CKEditor at FiveFilters.org for our PastePad service.]]></description><link>https://blog.keyvan.net/p/clean-up-html-on-paste-in-ckeditor</link><guid isPermaLink="false">https://blog.keyvan.net/p/clean-up-html-on-paste-in-ckeditor</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Tue, 13 Nov 2012 13:52:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uKt1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e7d9c52-9b9b-4715-a205-d0e35674da3a_350x350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We use <a href="http://ckeditor.com">CKEditor</a> at <a href="http://fivefilters.org">FiveFilters.org</a> for our <a href="http://pastepad.fivefilters.org">PastePad</a> service. The idea is to allow users to paste content that&#8217;s not currently publicly available on the web for processing with one of our web tools. This can be content that&#8217;s in a Word document, an email, or behind a paywall.</p><p>CKEditor can automatically clean up HTML it identifies as coming from MS Word, but there&#8217;s no way to force cleanup on all pasted content. By default, HTML cleanup occurs in the following two cases:</p><ol><li><p>User clicks the &#8216;paste from word&#8217; toolbar icon</p></li><li><p>User pastes content copied from MS Word itself</p></li></ol><p>In the second case, CKEditor looks for signs of MS Word formatting. It does this by testing whatever you paste against the following regular expression:</p><p><code>/(class=\"?Mso|style=\"[^\"]*\bmso\-|w:WordDocument)/</code></p><p>If there&#8217;s a match, it will be cleaned up. 
Otherwise it will paste as normal.</p><p>I want to avoid editing core files, so my solution is simply to ensure that this regular expression always matches pasted content. Here&#8217;s what I&#8217;ve come up with:</p><pre><code>CKEDITOR.on('instanceReady', function(ev) {
    ev.editor.on('paste', function(evt) {
        evt.data['html'] = '&lt;!--class="Mso"--&gt;'+evt.data['html'];
    }, null, null, 9);
});
</code></pre><p>I haven&#8217;t tested extensively, but this appears to work as expected (CKEditor 3.6.2). You can <a href="http://pastepad.fivefilters.org">try it out</a>.</p><p>The code registers a new listener for the paste event, just like the Paste from Word plugin does. When it receives the pasted HTML, it simply prepends an HTML comment containing one of the strings the Paste from Word plugin looks for. The listener has a priority of 9 to ensure it runs before the plugin&#8217;s own listener, which triggers the actual cleaning (default priority of 10).</p><p>Note: I posted this solution on StackOverflow as an alternative to another solution, titled &#8220;CKEditor &#8211; use pastefromword filtering on all pasted content.&#8221; StackOverflow recently deleted some of my answers (and hid them from me) so I&#8217;m moving the rest of my meagre contributions over to my own blog.</p>]]></content:encoded></item><item><title><![CDATA[Push to Kindle e-mail service]]></title><description><![CDATA[Push to Kindle, FiveFilters.org&#8217;s web service for sending web articles to your Kindle, can now also be used by e-mail.]]></description><link>https://blog.keyvan.net/p/push-to-kindle-email-service</link><guid isPermaLink="false">https://blog.keyvan.net/p/push-to-kindle-email-service</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Mon, 29 Oct 2012 00:14:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/u1asTphZ4Ls" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="http://fivefilters.org/kindle-it/">Push to Kindle</a>, FiveFilters.org&#8217;s web service for sending web articles to your Kindle, can now also be used by e-mail. 
The email service is aimed at iPad and iPhone users.</p><p>Here&#8217;s a video showing you how to use it on your iPad or iPhone:</p><div id="youtube2-u1asTphZ4Ls" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;u1asTphZ4Ls&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/u1asTphZ4Ls?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Step by step</h2><ol><li><p>On your device, load an article you&#8217;d like to send to your Kindle</p></li><li><p>Choose share page</p></li><li><p>In the list of options presented, select Mail</p></li><li><p>Enter your Kindle email address but instead of @kindle.com, enter @pushtokindle.com</p></li><li><p>Send!</p></li></ol><p>Changing the ending to @pushtokindle.com in step 4 ensures our service processes the article first and then sends it to your Kindle account.</p><p>The first time you do this, you&#8217;ll receive an email from FiveFilters.org asking you to confirm the address you&#8217;re sending from. After confirming, you&#8217;ll have the opportunity to save your Push to Kindle email address in your contacts list to make future sending easier. (Simply typing &#8216;kin&#8217; in to the To: field should show your Push to Kindle address as an option.)</p><p><br>If you own a 3G Kindle device and you want to make sure you will not be charged by Amazon, please send to @free.pushtokindle.com. (For the time being we are only sending to @free.kindle.com, but this might change in future.)</p><h2>Why an e-mail service?</h2><p>We already have a Push to Kindle <a href="https://play.google.com/store/apps/details?id=org.fivefilters.kindleit">Android app</a>. 
It adds &#8216;Push to Kindle&#8217; as an entry in your device&#8217;s share menu, so whenever you want to send a web article to your Kindle, you bring up the share menu and choose Push to Kindle.</p><p>We considered doing the same for iOS and other mobile devices, but decided to focus on email for two reasons:</p><ol><li><p>Unlike Android, iOS and Windows Phone operating systems do not yet allow apps to add entries to the share menu.</p></li><li><p>The share menu on most mobile devices does, however, include e-mail as an option</p></li></ol><h2>Pricing</h2><p>The first 25 articles processed by our e-mail service are free, after that you&#8217;ll be asked to purchase credits &#8212; this allows us to maintain the service.</p><p>100 credits cost 1.5&#8364; (around &#163;1.20 or $2)</p><p>Each article sent uses 1 credit. You will receive an email notice when your credits are low.</p><p>Note: credits are linked to the email address you send from, not your Kindle address.</p><h2>Compared to Amazon&#8217;s email service</h2><p><a href="http://www.amazon.com/gp/sendtokindle/email">Amazon&#8217;s Send to Kindle email service</a> currently works by accepting documents as attachments to an email message.</p><p>Web articles you read online are usually not in a format that can be sent to your Kindle account directly. They need to be cleaned up and converted to a suitable format first. That&#8217;s what our Push to Kindle service does. We take care of extracting the content and converting the article to a suitable format for your Kindle. We then send the result as an attachment to your Kindle account.</p><h2>Bear in mind</h2><p>We&#8217;re working to integrate this service with our <a href="https://member.fivefilters.org/plans.php">sustainer membership</a>. Once that&#8217;s done this service will be free for new and existing sustainers.</p><p>All articles are currently considered equal: 1 credit = 1 article. In the future this may change. 
For example, in line with our goal to encourage use of non-corporate sources, we&#8217;ll be white listing many non-corporate sources so no credits will be used if you process articles from these sources. Conversely, we may deduct more credits for articles originating from corporate sources.</p><p>Please consider this an experimental service. Let us know if you experience any issues and we&#8217;ll be happy to help. Email help@fivefilters.org.</p>]]></content:encoded></item><item><title><![CDATA[Full-Text RSS 3.0]]></title><description><![CDATA[Full-Text RSS 3.0 is now available.]]></description><link>https://blog.keyvan.net/p/full-text-rss-3</link><guid isPermaLink="false">https://blog.keyvan.net/p/full-text-rss-3</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Wed, 05 Sep 2012 14:03:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uKt1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e7d9c52-9b9b-4715-a205-d0e35674da3a_350x350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="http://fivefilters.org/content-only/">Full-Text RSS 3.0 is now available</a>.</p><h2>What is it?</h2><p>Full-Text RSS is a <a href="http://www.gnu.org/philosophy/free-sw.html">free software</a> PHP application to help you extract content from web pages. It can extract content from a standard HTML page and return a 1-item feed or it can transform an existing feed into a full-text feed.</p><p>It&#8217;s used primarily by news enthusiasts and developers.</p><p>It&#8217;s used by news enthusiasts who dislike partial web feeds &#8211; feeds which require them to read the full story on a different site, rather than their preferred application. 
Full-Text RSS can convert these feeds to full-text versions, allowing the reader to stay in his/her preferred environment to read the full story.</p><p>It&#8217;s used by developers building applications which need an article extraction component. It allows developers to retrieve and process only the content they&#8217;re interested in.</p><h2>Demo</h2><p><a href="http://fivefilters.org/content-only/" title="Full-Text RSS 3.0">Try it out</a> &#8211; enter a URL in the form and hit &#8216;Create Feed&#8217;.</p><h2>What&#8217;s new in 3.0</h2><h3>Extraction</h3><p>Multi-page support</p><p>Many web sites now split their articles into a number of pages. In earlier versions of Full-Text RSS we&#8217;d added support for retrieving the single-page view and extracting content from that page. For sites which do not offer such a single-page view, we can now follow the &#8216;next page&#8217; links and build up the full article page by page.</p><p>Multi-page support currently works by specifying a <code>next_page_link</code> XPath expression in the site config file associated with the website you are extracting from.</p><p>Examples:</p><pre><code>next_page_link: //a[@id='next-page']
next_page_link: //a[contains(text(), 'Next page')]
</code></pre><p>HTML5 parser: html5lib</p><p> By default we still rely on PHP&#8217;s fast libxml parser. For sites where this proves problematic, you can now specify <a href="http://code.google.com/p/html5lib/">html5lib</a> &#8211; a PHP implementation of an HTML parser based on the HTML5 spec.</p><p>Example:</p><p><code>parser: html5lib</code></p><p>Better AJAX handling</p><p> Full-Text RSS does not interpret any JavaScript it comes across when fetching pages. To get at the content, we expect it to be marked up in HTML. Some sites have started relying on the user&#8217;s browser and its JavaScript support to load page content. For pages which load content in this way, Google suggests that the publisher also offers the content in plain HTML so Google&#8217;s search engine crawlers can access it. <a href="https://developers.google.com/webmasters/ajax-crawling/docs/specification">Google&#8217;s spec</a> contains two possible triggers which will guide Google&#8217;s crawlers to the HTML version.</p><p>The first trigger appears in the URL; these URLs are often called &#8216;hashbang&#8217; URLs. Example: https://twitter.com/#!/search-home</p><p>The second trigger can appear in the HTML header. Example:</p><p><code>&lt;meta name="fragment" content="!"&gt;</code></p><p>When encountered, these triggers will result in a new URL being generated, what Google terms an &#8216;Ugly URL&#8217;. The new URL will contain additional query string parameters to indicate to the server that the plain HTML version is being requested.</p><p>Earlier versions of Full-Text RSS looked for the first trigger (&#8216;hashbang&#8217; in the URL) but not the second trigger. Full-Text RSS 3.0 now handles both.</p><p>Site config extraction patterns updated</p><p> Site config files are used to fine-tune extraction where autodetection doesn&#8217;t always work. There are now over 700 site config files. 
Many old ones have been updated and new ones added.</p><p>We also now look for OpenGraph title and date elements.</p><h3>Developers</h3><p>Cross-origin resource sharing (CORS) support</p><p>If Full-Text RSS is hosted on a different domain from your application, enabling CORS will allow your application to request JSON results from Full-Text RSS directly from the user&#8217;s browser, without being blocked by the browser&#8217;s <a href="http://en.wikipedia.org/wiki/Same_origin_policy">same origin policy</a>.</p><p>To enable CORS, look at <code>$options-&gt;cors</code> in the config file.</p><p>JSONP support</p><p>The old way of circumventing the browser&#8217;s same origin policy was to use JSONP. You can do this by requesting JSON (<code>&amp;format=json</code>) with an additional callback function (<code>&amp;format=json&amp;callback=functionName</code>).</p><p>Global site config</p><p> The global site config accepts everything a regular site config file does, but it&#8217;s applied to all sites, whether or not a specific site config matches.</p><p>The global site config file should be named <code>global.txt</code> and placed inside the relevant <code>site_config/</code> subfolder.</p><p>Site config merging</p><p> Site config files are used to fine-tune extraction where autodetection doesn&#8217;t always work.</p><p>Previous versions of Full-Text RSS looked for site config files in the following order:</p><ol><li><p>URL hostname match or wildcard match in <code>site_config/custom/</code></p></li><li><p>URL hostname match or wildcard match in <code>site_config/standard/</code></p></li><li><p>fingerprint match (HTML fragment mapping to hostname) in <code>site_config/custom/</code></p></li><li><p>fingerprint match (HTML fragment mapping to hostname) in <code>site_config/standard/</code></p></li></ol><p>As soon as an entry was matched, we&#8217;d process it, return it, and stop looking.</p><p>In Full-Text RSS 3.0, we follow the same order, but continue looking even if 
there&#8217;s a match. We build up the site config by appending any new entries we find. We also look for and combine global site config files:</p><ol start="5"><li><p>global rules in <code>site_config/custom/global.txt</code></p></li><li><p>global rules in <code>site_config/standard/global.txt</code></p></li></ol><p>To prevent this behaviour, you can enter <code>autodetect_on_failure: no</code> in the site config file. This will end the chain. The config files before and including this one will be loaded and merged, but no others.</p><p>XSS filtering</p><p> We have not enabled XSS filtering by default because we assume the majority of our users do not display the HTML retrieved by Full-Text RSS in a web page without further processing. If you subscribe to our generated feeds in your news reader application, it should, if it&#8217;s good software, already filter the resulting HTML for XSS attacks, making it redundant for Full-Text RSS to do the same. Similarly with frameworks/CMS which display feed content &#8211; the content should be treated like any other user-submitted content.</p><p>If you are writing an application yourself which is processing feeds generated by Full-Text RSS, you can either filter the HTML yourself to remove potential XSS attacks or enable this option. This might be useful if you are processing our generated feeds with JavaScript on the client side &#8211; although there&#8217;s client-side XSS filtering available too, e.g. 
<a href="https://code.google.com/p/google-caja/wiki/JsHtmlSanitizer">JsHtmlSanitizer</a>.</p><p>If enabled, we&#8217;ll pass retrieved HTML content through htmLawed with the safe flag on and style attributes denied; see <a href="http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/htmLawed_README.htm#s3.6">htmLawed&#8217;s readme</a>.</p><p>Note: if enabled this will also remove certain elements you may want to preserve, such as iframes.</p><p>Site config editor</p><p> Full-Text RSS 3.0 now comes with a site config editor available in the admin area (accessible via the admin/ folder). This lets you find, edit, and test existing site config files, or add new ones.</p><p>Note: We suggest you make changes to the site config files using a local installation of Full-Text RSS and upload the results to your server when ready. Site config files are simple text files stored on disk. Cloud hosting environments do not always offer persistent file storage, so changes made to a hosted copy on such environments may be lost.</p><p>Debug mode</p><p> Debug mode allows you to see what happens behind the scenes when Full-Text RSS is running. This is useful if you want to see things such as:</p><ul><li><p>URL redirects</p></li><li><p>Which site config files are loaded</p></li><li><p>Whether the single_page_link and next_page_link expressions match</p></li><li><p>Which XPath expressions end up matching title, body, date, author</p></li></ul><h3>Performance</h3><p>Site config caching in APC</p><p> If you run Full-Text RSS in a hosting environment which has APC enabled, it can take advantage of APC&#8217;s user cache &#8211; a memory cache. If enabled we will store site config files (when requested for the first time) in APC&#8217;s user cache &#8211; avoiding disk access on subsequent requests. See <code>$options-&gt;apc</code> in the config file to enable. 
Keys in APC are prefixed with &#8216;sc.&#8217;</p><p>Note: <code>$options-&gt;apc</code> has no effect if APC is unavailable on your server.</p><p>Smart cache (experimental)</p><p> If you enable caching and APC, you can also try out the experimental smart cache. The intention here is, again, to reduce disk access. With this enabled we will not write Full-Text RSS&#8217;s results to disk straight away; instead we&#8217;ll store the generated cache key in APC&#8217;s user cache for 10 minutes. If a subsequent request comes in matching the cache key, we&#8217;ll write the result to disk. Later requests matching the cache key will be loaded from disk. See <code>$options-&gt;smart_cache</code> in the config file to enable. Keys in APC are prefixed with &#8216;cache.&#8217;</p><p>Note: this has no effect if APC is disabled or unavailable on your server, or if you have caching disabled.</p><h3><a href="http://www.youtube.com/watch?v=9ntPxdWAWq8">Cloud ready</a></h3><p>Host for free on AppFog</p><p><a href="http://appfog.com/">AppFog</a> offer users free hosting with 2GB RAM. That&#8217;s more than enough to run Full-Text RSS for most users.</p><p>To get started:</p><ol><li><p>Create a free account</p></li><li><p>Install the AppFog command-line client (af)</p></li><li><p>Change into the Full-Text RSS folder</p></li><li><p>Type <code>af push</code></p></li><li><p>Follow the prompts and you&#8217;re done.</p></li></ol><p>Note: if you get a 701 error saying the URL has been taken, edit <code>manifest.yml</code> and comment out the lines starting with <code>name:</code> and <code>url:</code> by inserting a hash sign (<code>#</code>) at the beginning of each line. Save and try again. This time af will prompt you for an application name and URL.</p><p>Override config options with environment variables</p><p> Most of the config options in the config file can now be overridden with environment variables. 
When creating environment variables, use the option name prefixed with &#8216;<code>ftr_</code>&#8216;. For example, to override <code>$options-&gt;max_entries</code> and limit the maximum to 2, create an environment variable with key <code>ftr_max_entries</code> and value <code>2</code>.</p><h3>What didn&#8217;t make it</h3><p>No monitored feeds</p><p> One feature which didn&#8217;t make this release is the ability to create monitored feeds with PubSubHubbub support. This was specifically to improve the speed with which generated feeds updated within Google Reader&#8217;s system. Unfortunately this feature is not yet ready &#8211; we&#8217;ve not had great results in our tests, so won&#8217;t be releasing until we&#8217;re happy.</p><p>Config options removed</p><p> The following config options were removed:</p><ul><li><p><code>$options-&gt;restrict</code></p></li><li><p><code>$options-&gt;message_to_prepend_with_key</code></p></li><li><p><code>$options-&gt;message_to_append_with_key</code></p></li><li><p><code>$options-&gt;error_message_with_key</code></p></li><li><p><code>$options-&gt;alternative_url</code></p></li></ul><p>No extraction with CSS selector</p><p> You can no longer specify what should get extracted with a CSS selector passed in the querystring.</p>]]></content:encoded></item><item><title><![CDATA[Push to Kindle: some stats]]></title><description><![CDATA[Our Push to Kindle service has become quite popular since we launched.]]></description><link>https://blog.keyvan.net/p/push-to-kindle-some-stats</link><guid isPermaLink="false">https://blog.keyvan.net/p/push-to-kindle-some-stats</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Thu, 26 Jul 2012 01:09:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uKt1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e7d9c52-9b9b-4715-a205-d0e35674da3a_350x350.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<p>Our <a href="http://fivefilters.org/kindle-it/">Push to Kindle</a> service has become quite popular since we launched. Over 25,000 people currently use our <a href="https://chrome.google.com/webstore/detail/pnaiinchjaonopoejhknmgjingcnaloc">Chrome extension</a>, 7,000 use the <a href="https://addons.mozilla.org/en-US/firefox/addon/kindle-it/">Firefox extension</a> and over 2,000 have installed our <a href="https://market.android.com/details?id=org.fivefilters.kindleit">Android app</a>.</p><p>I recently decided to check how much of the content processed by our Push to Kindle service comes from corporate news sources. Here&#8217;s what I found:</p><ul><li><p>#1 &#8212; nytimes.com &#8212; 2.62%</p></li><li><p>#4 &#8212; guardian.co.uk &#8212; 1.32%</p></li><li><p>#15 &#8212; bbc.co.uk &#8212; 0.51%</p></li><li><p>#48 &#8212; telegraph.co.uk &#8212; 0.22%</p></li><li><p>#97 &#8212; independent.co.uk &#8212; 0.11%</p></li></ul><p>This is based on data collected over a period of 3 weeks.</p><p>I&#8217;m glad to see our users do not rely too much on corporate news sources. However, as the main goal of the FiveFilters.org project is to promote independent, non-corporate media, I&#8217;ll be thinking about ways to direct people to non-corporate sources of news and analysis in future updates.</p><p>For the time being, if a New York Times article is loaded, I&#8217;ve added a tab with links to <a href="http://www.nytexaminer.com/">The NYTimes eXaminer</a> (&#8216;An antidote to the &#8220;paper of record&#8221;&#8216;). 
Similarly, if an article from The Guardian, BBC or Independent is loaded, users will see a tab with links to <a href="http://medialens.org/">Medialens</a>.</p>]]></content:encoded></item><item><title><![CDATA[Send web articles to multiple Kindle devices]]></title><description><![CDATA[We&#8217;ve just updated our Kindle It service to allow you to send web articles to up to 5 Kindle devices in one go.]]></description><link>https://blog.keyvan.net/p/send-articles-to-multiple-kindle-devices</link><guid isPermaLink="false">https://blog.keyvan.net/p/send-articles-to-multiple-kindle-devices</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Tue, 27 Mar 2012 13:11:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uKt1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e7d9c52-9b9b-4715-a205-d0e35674da3a_350x350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We&#8217;ve just updated our <a href="http://fivefilters.org/kindle-it/">Kindle It</a> service to allow you to send web articles to up to 5 Kindle devices in one go.</p><p>Last December Amazon enabled its <a href="http://www.amazon.co.uk/gp/help/customer/display.html?nodeId=200767360">Kindle Personal Documents Service</a> for iPhone/iPad users, assigning each device a new email address, and this month the same feature has been enabled for Android users. 
Our Kindle It service has up to now been able to send to only one Kindle email address at a time, but as of today you can enter up to 5 addresses (separated by commas):</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!H1dq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F616b53b0-66c2-4148-8413-bc9f3078d221_560x235.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!H1dq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F616b53b0-66c2-4148-8413-bc9f3078d221_560x235.jpeg 424w, https://substackcdn.com/image/fetch/$s_!H1dq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F616b53b0-66c2-4148-8413-bc9f3078d221_560x235.jpeg 848w, https://substackcdn.com/image/fetch/$s_!H1dq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F616b53b0-66c2-4148-8413-bc9f3078d221_560x235.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!H1dq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F616b53b0-66c2-4148-8413-bc9f3078d221_560x235.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!H1dq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F616b53b0-66c2-4148-8413-bc9f3078d221_560x235.jpeg" width="560" height="235" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/616b53b0-66c2-4148-8413-bc9f3078d221_560x235.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:235,&quot;width&quot;:560,&quot;resizeWidth&quot;:560,&quot;bytes&quot;:54637,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!H1dq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F616b53b0-66c2-4148-8413-bc9f3078d221_560x235.jpeg 424w, https://substackcdn.com/image/fetch/$s_!H1dq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F616b53b0-66c2-4148-8413-bc9f3078d221_560x235.jpeg 848w, https://substackcdn.com/image/fetch/$s_!H1dq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F616b53b0-66c2-4148-8413-bc9f3078d221_560x235.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!H1dq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F616b53b0-66c2-4148-8413-bc9f3078d221_560x235.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>This will also work with our <a href="https://play.google.com/store/apps/details?id=org.fivefilters.kindleit">Push to Kindle</a> Android app (no update necessary).</p><p><a href="http://help.fivefilters.org/">Let us know</a> if you have any trouble.</p>]]></content:encoded></item><item><title><![CDATA[Wednesday January 11, 
2012]]></title><link>https://blog.keyvan.net/p/funny</link><guid isPermaLink="false">https://blog.keyvan.net/p/funny</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Wed, 11 Jan 2012 19:26:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/HQ_fO8BSPZo" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-HQ_fO8BSPZo" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;HQ_fO8BSPZo&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/HQ_fO8BSPZo?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div>]]></content:encoded></item><item><title><![CDATA[Freedom in Dependency]]></title><description><![CDATA[Excerpts from Richard Capes&#8217; interview with David Edwards (Medialens co-editor and author of Free to be Human):]]></description><link>https://blog.keyvan.net/p/freedom-in-dependency</link><guid isPermaLink="false">https://blog.keyvan.net/p/freedom-in-dependency</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Sun, 13 Nov 2011 20:21:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uKt1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e7d9c52-9b9b-4715-a205-d0e35674da3a_350x350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Excerpts from Richard Capes&#8217; <a href="http://moretht.blogspot.com/2011/11/free-to-be-human-interview-with-david.html" title="Free to be Human: An Interview with David Edwards">interview with David Edwards</a> (<a href="http://medialens.org">Medialens</a> co-editor and author of <a 
href="http://www.medialens.org/index.php?option=com_content&amp;view=article&amp;id=116&amp;Itemid=53">Free to be Human</a>):</p><blockquote><p>To work for a corporation is to be part of a system in which power flows strictly from the top down &#8211; it&#8217;s a totalitarian power structure. That&#8217;s the lot of enormous numbers of people in the world. If you try to opt out in the UK, you are ordered to &#8216;look for work&#8217;, mostly different kinds of corporate bondage.</p><p>&#8230;</p><p>Our freedom is undermined by less obviously political forms of social manipulation. For example, we&#8217;re propagandised by society to seek freedom in dependency. One of the key purposes of modern schooling is to instill a thirst for ambition and status in the young. The emphasis is on competition, coming first, getting the best grades to get to the best universities to get the best jobs and salaries. This version of freedom chains us to external sources of reward and respect. If we believe we need high status and &#8216;success&#8217; to be happy, we are chained to the people and organisations with the power to bestow these rewards. 
So we are trained to actually seek, to willingly embrace, a life of dependence.</p></blockquote><p>Read more: <a href="http://moretht.blogspot.com/2011/11/free-to-be-human-interview-with-david.html">Free to be Human: An Interview with David Edwards</a></p><p>Also well worth a read: <a href="http://www.medialens.org/index.php?option=com_content&amp;view=section&amp;layout=blog&amp;id=1&amp;Itemid=50">Falling by David Edwards</a></p>]]></content:encoded></item><item><title><![CDATA[Kindle It 1.0]]></title><description><![CDATA[We&#8217;ve just released a new version of Kindle It, our web application for sending online articles (e.g.]]></description><link>https://blog.keyvan.net/p/kindle-it-1-0</link><guid isPermaLink="false">https://blog.keyvan.net/p/kindle-it-1-0</guid><dc:creator><![CDATA[Keyvan]]></dc:creator><pubDate>Wed, 07 Sep 2011 14:07:13 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f5e3509d-e9f6-4828-b706-bb73c23f1b50_560x325.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We&#8217;ve just released a new version of <a href="http://fivefilters.org/kindle-it/">Kindle It</a>, our web application for sending online articles (e.g. blog posts, news stories, Wikipedia entries) to your Kindle. 
It now looks like this:</p><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ewYP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcdf8a39-ecc7-4e51-a635-61157efe452e_560x325.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ewYP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcdf8a39-ecc7-4e51-a635-61157efe452e_560x325.png 424w, https://substackcdn.com/image/fetch/$s_!ewYP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcdf8a39-ecc7-4e51-a635-61157efe452e_560x325.png 848w, https://substackcdn.com/image/fetch/$s_!ewYP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcdf8a39-ecc7-4e51-a635-61157efe452e_560x325.png 1272w, https://substackcdn.com/image/fetch/$s_!ewYP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcdf8a39-ecc7-4e51-a635-61157efe452e_560x325.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ewYP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcdf8a39-ecc7-4e51-a635-61157efe452e_560x325.png" width="560" height="325" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dcdf8a39-ecc7-4e51-a635-61157efe452e_560x325.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:325,&quot;width&quot;:560,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Kindle It screenshot&quot;,&quot;title&quot;:&quot;Kindle It 
screenshot&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Kindle It screenshot" title="Kindle It screenshot" srcset="https://substackcdn.com/image/fetch/$s_!ewYP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcdf8a39-ecc7-4e51-a635-61157efe452e_560x325.png 424w, https://substackcdn.com/image/fetch/$s_!ewYP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcdf8a39-ecc7-4e51-a635-61157efe452e_560x325.png 848w, https://substackcdn.com/image/fetch/$s_!ewYP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcdf8a39-ecc7-4e51-a635-61157efe452e_560x325.png 1272w, https://substackcdn.com/image/fetch/$s_!ewYP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcdf8a39-ecc7-4e51-a635-61157efe452e_560x325.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a>
<h2>Try it</h2><p>You can test it now with the following articles:</p><ul><li><p>To Avert A&nbsp;Bloodbath &#8212; Libya And The&nbsp;Press: <a href="http://fivefilters.org/kindle-it/send.php?url=medialens.org%2Findex.php%3Foption%3Dcom_content%26view%3Darticle%26id%3D644%3Ato-avert-a-bloodbath-libya-and-the-press-part-1%26catid%3D24%3Aalerts-2011%26Itemid%3D68">part 1</a> <a href="http://fivefilters.org/kindle-it/send.php?url=medialens.org%2Findex.php%3Foption%3Dcom_content%26view%3Darticle%26id%3D645%3Ato-avert-a-bloodbath-libya-and-the-press-part-2-%26catid%3D24%3Aalerts-2011%26Itemid%3D68">part 2</a> by Medialens</p></li><li><p><a href="http://fivefilters.org/kindle-it/send.php?url=johnpilger.com%2Farticles%2Fbrainwashing-the-polite-and-professional-way">Brainwashing the polite and professional way</a> by John Pilger</p></li><li><p><a href="http://fivefilters.org/kindle-it/send.php?url=www.salon.com%2F2011%2F09%2F02%2Fwikileaks_28%2Fsingleton%2F">Facts and myths in the WikiLeaks/Guardian saga</a> by Glenn Greenwald</p></li></ul><h2>What&#8217;s new?</h2><p>Edit titles</p><p>If an article doesn&#8217;t have a title, or if we misidentify the title, you can now edit it before sending to the Kindle. 
Simply click the title block in the preview box and you&#8217;ll be able to change it.</p><p>Improved block quote support</p><p>If an article uses block quotations, these should now show up indented on the Kindle.</p><p>Clickable source URL</p><p>The URL of the article, which we append to the end of the piece, is now clickable.</p><p>Report problems</p><p>We know that some articles do not work well with Kindle It. If you encounter one of these, feel free to use the &#8216;report problem&#8217; link to let us know. The URL will be prefilled and you can report the problem anonymously. If you want to know when it&#8217;s fixed, you can supply your email address and we&#8217;ll let you know.</p><p>New design</p><p>We&#8217;ve redesigned the interface using the <a href="http://twitter.github.com/bootstrap/">Bootstrap toolkit</a> from Twitter. We hope you like it.</p><p>@kindle.com sending enabled for sustainers</p><p>Previously we only sent articles to @free.kindle.com addresses &#8211; Amazon delivers these to Kindle 3 owners via Wi-Fi, and it&#8217;s free. Using the @kindle.com address allows you to receive articles via 3G if Wi-Fi is unavailable, although Amazon will charge for 3G delivery.</p><p>For older Kindle models which don&#8217;t have Wi-Fi, the @kindle.com address is the only way to get automatic delivery (downloading the MOBI and transferring it via USB is the other option, and doesn&#8217;t cost anything). Because @kindle.com sending can cost users money, we&#8217;ve restricted it to sustainer accounts&#8230;</p><p>Sustainer accounts</p><p>We&#8217;re now inviting users who like the service to consider signing up for a sustainer account for 5 Euros. Sustainer accounts help us to maintain the service and continue developing it. <a href="https://member.fivefilters.org/plans.php">More information</a>.</p><p>Regarding the last point, I want to stress that the service will remain free. 
Everything that worked before should still work the same way &#8211; we&#8217;ve improved things but haven&#8217;t disabled any features. Sustainer accounts are primarily a way for users to help us cover costs and sustain the project. Sustainers do get a <a href="https://member.fivefilters.org/plans.php" title="Sustainer benefits">few benefits</a> &#8211; if you&#8217;re interested, there&#8217;s a 30-day trial to test the @kindle.com delivery.</p><h2>Extensions</h2><p>The most convenient way to use Kindle It is through our browser extensions. So far we have extensions for <a href="https://chrome.google.com/webstore/detail/pnaiinchjaonopoejhknmgjingcnaloc">Chrome</a>, <a href="https://addons.mozilla.org/en-US/firefox/addon/kindle-it/">Firefox</a> and <a href="https://market.android.com/details?id=org.fivefilters.kindleit">Android</a>. If you use a different browser, we also have a <a href="http://fivefilters.org/kindle-it/">bookmarklet</a>.</p><h2>Feedback</h2><p>We&#8217;d love to hear what you think. We&#8217;ve had some nice comments on <a href="http://twitter.com/fivefilters/" title="Follow us on Twitter">Twitter</a> so far, including:</p><blockquote><p>&#8220;I think I&#8217;m in love: fivefilters.org/kindle-it/ &#8211; it&#8217;s perfect.&#8221; (Translated from Portuguese) &#8212; <a href="https://twitter.com/#!/myriamkazue/status/109130559380393984">@myriamkazue</a></p><p>&#8220;Kindle It sends web pages to your Kindle and it&#8217;s insanely good.&#8221; &#8212; <a href="https://twitter.com/#!/coldclimate/status/108914334842896384">@coldclimate</a></p><p>&#8220;The second most useful extension (after firebug) in Firefox&#8221; (Translated from Spanish) &#8212; <a href="https://twitter.com/#!/sourcerebels/status/108232139664932864">@sourcerebels</a></p><p>&#8220;@Evenwing @fivefilters ooh&#8230;I like this..&#8221; &#8212; <a href="https://twitter.com/#!/Yappings/status/100759226913271808">@Yappings</a></p><p>&#8220;2 things = joy. 
1: if you have a Kindle you can push long web articles to it with Kindle It http://bit.ly/ozPyWb 2: Tor http://bit.ly/HcV9F&#8221; &#8212; <a href="https://twitter.com/#!/kjftec/status/96796461433495553">@kjftec</a></p><p>&#8220;@JayGreasley thanks for the tip on fivefilters.org, through it I found @medialens and that&#8217;s my type of news!&#8221; &#8212; <a href="https://twitter.com/#!/tashacres/status/70395013493817344">@tashacres</a></p></blockquote><p>If you have any trouble using it, email us: fivefilters@fivefilters.org</p>]]></content:encoded></item></channel></rss>