Using the tt-rss FeedIron plugin to clean up posts


I read a lot of RSS feeds with tt-rss (ever since Google Reader died a couple of years ago), but not every site provides an RSS feed for its articles. And even when one exists, the feed is sometimes a mess because the content is not really tailored to it. Luckily there is a plugin for tt-rss called “FeedIron” which takes care of this and allows heavy customization of the article, for example

  • cleanup of HTML content (getting rid of unnecessary elements in the feed, like comments, social sharing buttons, extended author information)
  • replacing the article with original content from the page
  • modification of the content with regex

I use this plugin to get full articles from Ars Technica in my feed. Unfortunately, after replacing the original article in the feed with the multi-page content from the original page (get the configuration for that here), the images are broken because Ars uses JavaScript to put images into the article. To fix this, I modified the configuration for the page:

"arstechnica.com": {
    "type": "xpath",
    "multipage": {
        "xpath": "span[@class='numbers']\/a",
        "append": true,
        "recursive": true
    },
    "xpath": [
        "section[@class='article-guts']"
    ],
    "cleanup": [
        "aside",
        "div[@class='article-expander']",
        "nav"
    ],
    "modify": [{
        "type": "regex",
        "pattern": "(data-thumb=\"(.+?)\".*?data-src=\"(.+?)\".*?>)",
        "replace": "><a href=\"$2\"><img src=\"$1\" \/><\/a>"
    }]
}

The interesting part here is the modify section:

"modify": [{
    "type": "regex",
    "pattern": "(data-thumb=\"(.+?)\".*?data-src=\"(.+?)\".*?>)",
    "replace": "><a href=\"$2\"><img src=\"$1\" \/><\/a>"
}]

The configuration has a regex pattern which matches the image markup, captures the URL of the thumbnail (from data-thumb, as $1) and the URL of the original image (from data-src, as $2), and replaces the JavaScript-dependent attributes with a regular HTML link wrapping an image. This way, when I read the feed with TTRSS-Reader, my Android client of choice for tt-rss, I see the images even without JavaScript.
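
To see what the substitution does outside of tt-rss, here is a minimal sketch in Python (not the PHP that FeedIron itself runs). It makes a few assumptions: the outer parentheses in the configured pattern are taken to be PCRE delimiters rather than a capture group, which is what the $1/$2 usage in the replacement suggests, so they are dropped here. Python writes backreferences as \1 and \2 instead of $1 and $2. The sample markup and the example.com URLs are made up for illustration and are not copied from a real Ars Technica page.

```python
import re

# The pattern from the FeedIron config, without the outer parentheses
# (assumed to be PCRE delimiters in the original configuration).
PATTERN = re.compile(r'data-thumb="(.+?)".*?data-src="(.+?)".*?>')

# Same replacement as in the config, with Python-style backreferences:
# \1 = thumbnail URL, \2 = URL of the original image.
REPLACEMENT = r'><a href="\2"><img src="\1" /></a>'


def rewrite_images(html: str) -> str:
    """Turn data-thumb/data-src attributes into a plain link plus image."""
    return PATTERN.sub(REPLACEMENT, html)


# Hypothetical markup, only loosely modelled on a script-driven image container.
sample = (
    '<div class="image" data-thumb="https://example.com/thumb.jpg" '
    'data-src="https://example.com/full.jpg" data-responsive="x"></div>'
)

print(rewrite_images(sample))
```

The match starts at data-thumb and runs up to the closing > of the tag, so the leading > in the replacement closes the original element before the new link is inserted. Attributes that appear before data-thumb (here, class) are left untouched.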
