Commons Base64OutputStream – Principle of least surprise?

Yesterday I had the requirement to write base64 encoded content mixed with non-base64 encoded content. I have to deal with potentially large files (hundreds of MBs), so I wanted to work with streams to avoid memory issues. Also I am bound to an API of an external product, which hands an InputStream into my code and expects an InputStream back. (As a side note, did you know about PipedInputStream and PipedOutputStream?)

I am already using the commons-codec Base64 encoder for small pieces of data, so I quickly discovered Base64OutputStream to write base64 encoded data in a streaming manner. I wrote a quick JUnit test to verify that my usage of this class produces the correct result and was somewhat surprised that 4 characters in the base64 encoded data were missing…

See the full git repository of the code at https://gist.github.com/MoriTanosuke/7129937.

As the test in the gist shows, not closing the Base64OutputStream does not produce the correct base64 encoded data. Only when I call #close() after writing to the stream are the missing bytes encoded and written. Because I wrapped the Base64OutputStream around a PipedOutputStream and have to write additional non-base64 encoded data into it afterwards, I cannot close the stream.

The solution for me was to switch to Base64InputStream and have commons-codec do the encoding when I read from my original InputStream source. That way I get valid base64 data for all my tested inputs. I had also tried calling #flush() instead, but that does not write the missing bytes to the stream. A quick peek into the source code revealed that the code in question is only executed in the method #close().

This was kind of surprising, because the official javadoc of OutputStream#flush says:

Flushes this output stream and forces any
buffered output bytes to be written out. The
general contract of flush is that calling it
is an indication that, if any bytes previously
written have been buffered by the implementation
of the output stream, such bytes should
immediately be written to their intended
destination.

It took me only a couple of minutes to switch from Base64OutputStream to Base64InputStream, but had the test string in my unit test been just one character different, I would not have caught this before integration testing – which is always a headache when you’re trying to make systems talk to each other and this kind of unexpected behavior sits somewhere deep down in the guts of your code.
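
For illustration, here is a minimal sketch of the flush-versus-close behavior and of one possible way to keep the underlying stream open. It assumes the JDK's java.util.Base64 (Java 8 and later) instead of commons-codec, so it is an analogy and not the code from the gist; the JDK's encoding stream also writes the last, incomplete group only on close(). KeepOpenOutputStream and the method names are hypothetical helpers made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64CloseDemo {

    /** Hypothetical helper: passes writes through, but close() only flushes the target. */
    static final class KeepOpenOutputStream extends FilterOutputStream {
        KeepOpenOutputStream(OutputStream target) {
            super(target);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len); // skip FilterOutputStream's byte-by-byte default
        }

        @Override
        public void close() throws IOException {
            flush(); // deliberately leave the wrapped stream open
        }
    }

    /** Encodes data, calls flush() but not close(), and returns what reached the target. */
    public static String encodeWithFlushOnly(byte[] data) throws IOException {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        OutputStream encoder = Base64.getEncoder().wrap(target);
        encoder.write(data);
        encoder.flush(); // does not emit the buffered, incomplete 3-byte group
        return target.toString("US-ASCII");
    }

    /** Writes a plain prefix, a Base64 body and a plain suffix into one target stream. */
    public static String writeMixed(String prefix, byte[] body, String suffix) throws IOException {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        target.write(prefix.getBytes(StandardCharsets.US_ASCII));
        try (OutputStream encoder = Base64.getEncoder().wrap(new KeepOpenOutputStream(target))) {
            encoder.write(body);
        } // close() emits the last group and padding; target itself is not closed
        target.write(suffix.getBytes(StandardCharsets.US_ASCII));
        return target.toString("US-ASCII");
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encodeWithFlushOnly(data));
        System.out.println(writeMixed("<data>", data, "</data>"));
    }
}
```

With "Hello" as input, the flush-only variant delivers SGVs, while the complete encoding is SGVsbG8=, so exactly four characters are missing. Whether such a close-shielding wrapper is preferable to encoding on the read side depends on who owns the underlying stream.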

Running Ghost behind Apache Reverse Proxy

After migrating to Ghost a couple of days ago, I wanted to hide the application behind my Apache HTTP Server. Because I’m running my blog on a subdomain I already had a VirtualHost configured, so the question was how to set up mod_proxy to forward all requests to Ghost.

Here’s my Apache configuration for the virtual host that serves my blog:

<Virtualhost *:80>
  ServerName blog.myserver.tld
  ServerAlias www.myserver.tld blog.myserver.tld
  
  ServerAdmin webmaster@localhost
  
  RewriteEngine On
  RewriteOptions Inherit
  # rewrite jekyll URLs to Ghost URLs
  RewriteRule ^/\d{4}-\d{2}-\d{2}-(.+?)/?$ $1 [R]
  RewriteRule ^/\d{4}/\d{2}/\d{2}/(.+?)/?$ $1 [R]
  # previous feed address to ghost feed
  RewriteRule ^/atom\.xml$ /rss [R]
  
  <ifmodule mod_proxy.c>
    ProxyVia On
    ProxyRequests Off
    ProxyPass / http://www.kopis.de:2368/
    ProxyPassReverse / http://www.kopis.de:2368/
    ProxyPreserveHost on
    <proxy *>
      AllowOverride All
      Order allow,deny
      allow from all
    </proxy>
  </ifmodule>
</Virtualhost>

You can also see my RewriteRules here. They help me keep the old URLs from my Jekyll blog working, so I don’t lose any search results due to broken URLs. That was quite important to me, because I think you shouldn’t break URLs when migrating your underlying software. Unfortunately, Ghost does not yet provide a mechanism to customize the URLs, so I had to go with these redirects as suggested by W3C.

Additionally, with real redirects I can instruct Disqus to crawl my website again and auto-fix all comments, which are tied to URLs in their system. No more URL mapping spreadsheets to edit! 🙂
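
The date-based patterns in the rewrite rules can be sanity-checked outside of Apache. The following is a minimal sketch in Java, whose regex syntax agrees with Apache's for these constructs (\d stands for a digit); the class name, the sample paths and the assumption that the redirect target ends up as "/" plus the captured slug are mine, not taken from the server.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JekyllUrlRewrite {

    // /YYYY-MM-DD-slug/ style, as used for the old Jekyll post URLs
    private static final Pattern DASHED =
            Pattern.compile("^/\\d{4}-\\d{2}-\\d{2}-(.+?)/?$");

    // /YYYY/MM/DD/slug/ style
    private static final Pattern NESTED =
            Pattern.compile("^/\\d{4}/\\d{2}/\\d{2}/(.+?)/?$");

    /** Returns the assumed redirect target for an old path, or the path itself if no rule matches. */
    public static String rewrite(String path) {
        if ("/atom.xml".equals(path)) {
            return "/rss"; // old feed address to Ghost feed
        }
        for (Pattern pattern : new Pattern[] {DASHED, NESTED}) {
            Matcher matcher = pattern.matcher(path);
            if (matcher.matches()) {
                return "/" + matcher.group(1); // keep only the slug
            }
        }
        return path;
    }

    public static void main(String[] args) {
        // Hypothetical sample paths
        System.out.println(rewrite("/2013-10-24-my-post/"));
        System.out.println(rewrite("/2013/10/24/my-post"));
        System.out.println(rewrite("/atom.xml"));
    }
}
```

The lazy group (.+?) together with the optional trailing slash means the slug is captured without a trailing "/", whether or not the old URL had one.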

Migrating from Jekyll to Ghost

Yesterday I decided to make the move from Jekyll to the new blogging platform Ghost. I had been interested in the project since the Kickstarter campaign, but I didn’t join the backers then.

After Ghost was released to the public, I downloaded the published version and ran it on my local machine. Ghost is based on Node.js, which I’m already familiar with, so using npm and forever to run it was not much of a problem. After playing around with the admin interface for a bit, I decided to research ways to migrate from Jekyll to Ghost.

Both blog platforms use Markdown for editing posts, so I figured it wouldn’t be too hard to convert the files. Unfortunately, Ghost’s documentation says very little about its APIs or its import/export formats. I tried to come up with my own Node.js module to import files, but quickly discovered the great Jekyll plugin Jekyll-To-Ghost.

All you have to do is:

  • copy the plugin file into your _plugins folder
  • make sure that your _config.yml includes safe: false (to enable plugins)
  • run jekyll build

After your site is generated, you have a file _site/ghost-exported.json with all your posts in an acceptable JSON format which you can import into your installation of Ghost via http://yourdomain:2368/ghost/debug/.

I forked and modified the plugin a bit to convert links generated by the tag post_url into hard relative links (Ghost doesn’t support custom URL formats yet) and fixed the conversion of timestamps which was broken for me.

After importing all my posts I decided to run my own theme, so I copied the default theme casper and made some changes to it. The next step is to modify the theme further and add Disqus for comments. I’m curious how the comment migration will go, because I have to modify the identifiers in Disqus to use the new URL structure too.

Converting file content to base64 with Apache Camel 2.10.2

Recently I wanted to convert input files of various formats into an XML message with certain meta information and the content of the input file as a base64-encoded element. Because I’m working on enterprise integration at the moment and we’re using Apache Camel a lot, I first tried to find an available solution from Camel.

Sure enough, if you’re using Camel 2.11.0 or later, you can just use the shipped Base64DataFormat and be done. It’s as simple as this XML configuration:

<camelContext id="camel" xmlns="http://camel.apache.org/schema/spring">
  <dataFormats>
    <base64 id="base64" />
  </dataFormats>
  <route>
    <from uri="file://inputDirectory" />
    <marshal ref="base64" />
    <to uri="file://outputDirectory" />
  </route>
</camelContext>

Unfortunately, I’m stuck with Camel Version 2.10.2 – so no Base64DataFormat for me. I decided to roll my own DataFormat and use Apache Commons Base64.

My route (using the Spring DSL) now looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:camel="http://camel.apache.org/schema/spring"
  xsi:schemaLocation="
     http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
     http://camel.apache.org/schema/spring http://camel.apache.org/schema/spring/camel-spring.xsd">

  <bean id="base64Dataformat" class="my.package.Base64DataFormat" />
  
  <camel:camelContext xmlns="http://camel.apache.org/schema/spring">
  
      <route>
          <from uri="file://inputDirectory" />
          <!-- convert input to base64 encoded string -->
          <marshal>
              <custom ref="base64Dataformat" />
          </marshal>
          <!-- convert into string, the velocity template isn't doing that -->
          <convertBodyTo type="java.lang.String" />
          <!-- wrap into mapping message schema -->
          <to uri="velocity:META-INF/templates/mappingmessage.vm" />
          <to uri="file://archive" />
      </route>
  </camel:camelContext>
</beans>

The actual Base64 encoding looks much like the one in Camel 2.11.x. Because I want to use a Velocity template and wrap the base64 encoded string in XML, I have to convert the body into a java.lang.String first – no idea why.

Now I can convert file content into base64 encoded strings and pass them to other (mostly legacy) systems.
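
The custom data format itself is not shown here. As a rough idea of what its core could look like, here is a minimal sketch that assumes the JDK's java.util.Base64 (Java 8 and later) in place of Apache Commons Base64, so that it runs without extra dependencies. The class and method names are hypothetical; a real Camel DataFormat would wrap this logic in its own marshal and unmarshal methods, which also receive the Exchange.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Codec {

    /** Reads raw bytes from in and writes their Base64 form to out. */
    public static void marshal(InputStream in, OutputStream out) throws IOException {
        OutputStream encoder = Base64.getEncoder().wrap(out);
        copy(in, encoder);
        encoder.close(); // needed for the last group and padding; note that it also closes out
    }

    /** Returns a stream that decodes Base64 text while it is being read. */
    public static InputStream unmarshal(InputStream in) {
        return Base64.getDecoder().wrap(in);
    }

    /** Copies in fixed-size chunks so large files never sit in memory as a whole. */
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }

    /** Convenience wrapper for small inputs and tests. */
    public static String encodeToString(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        marshal(new ByteArrayInputStream(raw), out);
        return out.toString("US-ASCII");
    }

    /** Convenience wrapper for small inputs and tests. */
    public static String decodeToString(String base64) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] text = base64.getBytes(StandardCharsets.US_ASCII);
        try (InputStream decoder = unmarshal(new ByteArrayInputStream(text))) {
            copy(decoder, out);
        }
        return out.toString("UTF-8");
    }
}
```

The two convenience wrappers are only there to try the sketch on small values; in a route the streams would come from the file endpoint.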

My first article

So, Ghost has been released and can be downloaded. Of course I tried it right away, and what you are reading here is the first article in my own installation. The current release is version 0.3.2, and the download is publicly available. At first I could not find a link to the source code, but it is available at https://github.com/TryGhost/Ghost.

Anyone who has seen Node.js before will have no trouble with Ghost. After downloading, you simply unpack the archive and then install and start it with the following commands:

unzip ghost-0.3.2.zip -d ghost
cd ghost
npm install --production
npm start --production

After that, your own instance of Ghost is running at http://localhost:2368, and you can sign up at http://localhost:2368/ghost and then log in.

I can’t see any permission management in the dashboard yet, which means every user can create articles. Whether you can also edit, delete or publish other users’ articles, I don’t know yet. I will be running it as a single-user installation.

The online editor seems stable. The Markdown of this article is rendered immediately in the second panel, so you see right away how the article will look later. The interface is minimal, but exactly right for me. Anyone expecting a WordPress-style dashboard will be disappointed, though.

If you don’t want the trouble of running your own blog, you can also have it hosted – but who still wants that these days? 😉

Ingress – Level 8

I have been playing Ingress for a few months now, and it has finally happened: I am level 8! 🙂

[Image: Ingress - Level 8]

If you don’t know Ingress yet, this article by Fiona Krakenbürger on ZEIT Online is a good place to start.

Converting a mercurial repository into a git repository

Today I cloned a Mercurial repository, because I’m using an Android application that I may have some contributions for. The project is hosted on Google Code and has used a Mercurial repository from the beginning. I’m a git user myself, and I don’t want to give up my practiced workflows.

After trying hg-git without success, I found hg-fast-export. With this little tool I was able to convert the cloned mercurial repository into a git repository:

git clone git://repo.or.cz/fast-export.git

hg clone URL_TO_ORIGINAL_REPOSITORY
mkdir my-new-git-repository
cd my-new-git-repository
git init
PATH_TO_FASTEXPORT/hg-fast-export.sh -r ../NAME_OF_ORIGINAL_REPOSITORY --force

After this little dance I got a valid git repository with history, tags and branches. hg-fast-export should be able to do incremental imports, so I’m curious how it will handle my own local changes and incoming changes from the mercurial upstream later.

My first book review: Instant Camel

A few weeks ago Packt Publishing contacted me and asked whether I would be willing to review a few chapters of a new book. I was somewhat surprised, but once it turned out that the book would be about Apache Camel and that one of my blog posts had put me on the shortlist of reviewers, I agreed. All the more so because I am currently working intensively on enterprise integration at the office and often use Apache Camel for it.

Packt Publishing then sent a few documents with the reviewer guide and the criteria the review was supposed to check (not spelling, but content and technical accuracy). My review then consisted mainly of remarks like “The code example should describe XYZ in a bit more detail” or “This subordinate clause should be explained at greater length/include a link to the official documentation at http://aaa.bbb.ccc”.

A short time later I received an email in which Packt Publishing thanked me again for the review and asked for my address so they could send a complimentary copy. And finally the book Instant Apache Camel Message Routing landed on my desk:

[Image: First review: Instant Camel]