Using the tt-rss FeedIron plugin to clean up posts


I read a lot of RSS feeds with tt-rss (after the death of Google Reader a couple of years ago), but not all sites provide an RSS feed for their articles. And even when they do, the feed is sometimes a mess because the content is not really tailored to it. Luckily, there is a tt-rss plugin called “FeedIron” that takes care of this and allows heavy customization of each article, for example:

  • cleaning up the HTML content (removing elements that don't belong in the feed, such as comments, social sharing buttons, or extended author information)
  • replacing the article with the original content from the page
  • modifying the content with regular expressions

I use this plugin to get full articles from Ars Technica into my feed. Unfortunately, after replacing the feed's article with the multi-page content from the original page (get the configuration for that here), the images are broken because Ars uses JavaScript to insert the images into the article. To fix this, I modified the configuration for the site:

"arstechnica.com": {
    "type": "xpath",
    "multipage": {
        "xpath": "span[@class='numbers']\/a",
        "append": true,
        "recursive": true
    },
    "xpath": [
        "section[@class='article-guts']"
    ],
    "cleanup": [
        "aside",
        "div[@class='article-expander']",
        "nav"
    ],
    "modify": [{
        "type": "regex",
        "pattern": "(data-thumb=\"(.+?)\".*?data-src=\"(.+?)\".*?>)",
        "replace": "><a href=\"$2\"><img src=\"$1\" \/><\/a>"
    }]
}
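To make the "xpath" and "cleanup" entries more concrete, here is a rough Python approximation of what they do to a page, using only the standard library's ElementTree (which understands a small XPath subset). This is not FeedIron's code (FeedIron is a PHP plugin for tt-rss) and only sketches the idea: the sample page is hypothetical and assumed to be well-formed XML, which real pages usually are not, and the "multipage" handling is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified article page (well-formed XML so ElementTree can parse it).
PAGE = """<html><body>
<nav>Site menu</nav>
<section class="article-guts">
  <p>First paragraph.</p>
  <aside>Related stories</aside>
  <div class="article-expander">Read more</div>
  <nav>Page 2</nav>
  <p>Second paragraph.</p>
</section>
</body></html>"""

# Mirrors the "xpath" and "cleanup" entries of the arstechnica.com config.
CONTENT_XPATH = ".//section[@class='article-guts']"
CLEANUP_XPATHS = ["aside", "div[@class='article-expander']", "nav"]


def extract_article(page: str) -> str:
    """Keep only the article section and strip the unwanted elements from it."""
    root = ET.fromstring(page)
    article = root.find(CONTENT_XPATH)
    if article is None:
        return ""
    for path in CLEANUP_XPATHS:
        # ElementTree has no parent pointers, so walk every element inside the
        # article and remove matching direct children from it.
        for parent in list(article.iter()):
            for child in parent.findall(path):
                parent.remove(child)
    return ET.tostring(article, encoding="unicode")


if __name__ == "__main__":
    print(extract_article(PAGE))
```

The page-level nav outside the section is dropped simply because only the selected section is kept; the nav inside it is removed by the cleanup list.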

The interesting part here is the modify section:

"modify": [{
    "type": "regex",
    "pattern": "(data-thumb=\"(.+?)\".*?data-src=\"(.+?)\".*?>)",
    "replace": "><a href=\"$2\"><img src=\"$1\" \/><\/a>"
}]

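The regex matches the image placeholders, captures the URLs of the thumbnail (data-thumb) and of the full-size image (data-src), and replaces the JavaScript-dependent markup with a regular HTML link wrapping an image. Note that the outer parentheses of the pattern are not a capture group: FeedIron hands the pattern to PHP's preg functions, which accept bracket-style delimiters, so $1 refers to the thumbnail URL and $2 to the full-size URL. This way, when I read the feed with TTRSS-Reader, my Android client of choice for tt-rss, I see the images even without JavaScript.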
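To try the rule without setting up tt-rss, the same substitution can be reproduced in Python. This is only an approximation: Python's re module has no pattern delimiters, so the outer parentheses are dropped and $1/$2 become \1/\2. The sample markup and the example.com URLs are hypothetical stand-ins for Ars Technica's placeholders, not their actual HTML.

```python
import re

# FeedIron pattern without its outer "( ... )" delimiters.
PATTERN = re.compile(r'data-thumb="(.+?)".*?data-src="(.+?)".*?>')
# FeedIron replacement with PHP's $1/$2 written as Python group references.
REPLACEMENT = r'><a href="\2"><img src="\1" /></a>'

# Hypothetical placeholder: the image URLs only live in data attributes.
SAMPLE = ('<figure class="image" data-thumb="https://example.com/thumb.jpg" '
          'data-src="https://example.com/full.jpg"></figure>')


def inline_images(html: str) -> str:
    """Replace JS-only image placeholders with a plain linked <img>."""
    return PATTERN.sub(REPLACEMENT, html)


if __name__ == "__main__":
    print(inline_images(SAMPLE))
```

Because the match starts inside the opening tag and consumes that tag's closing ">", the replacement begins with ">" to close the tag again; the figure then contains a link to the full-size image wrapping the thumbnail.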
