28 Mar 2011

TYPO3 to Drupal migration - KNR Case Study

The client

Our client, the Greenlandic Broadcasting Corporation (KNR), is the official TV and radio broadcasting agency of Greenland. Their website, www.knr.gl, was originally built in TYPO3 (the old site is retained at: http://typo3.knr.codespry.com/).

KNR Home Page screenshot

We had won an international bid to re-design and re-build the website in TYPO3. We had worked extensively in TYPO3 from 2004 to 2009, which is what won us the project. However, during the post-award analysis and design phase, we recognised that the project would be better off done in Drupal instead of TYPO3. We had built www.indiaenvironmentportal.org.in, an environment news portal, in Drupal, and that experience led us to believe that news and online media publishing sites should be built in Drupal. We had the benefit of having seen both systems at close quarters.

Key Reasons for choosing Drupal over TYPO3

We'll start with a disclaimer: we really liked working in TYPO3. It works very well for some applications, such as intranets. Our reasons for choosing Drupal over TYPO3 for this website were the following.

  1. The concept of "Taxonomy" is central to Drupal. A taxonomy can serve, for example, as a "tag library" used for tagging all stories written on a website. TYPO3 has had no such concept of a Taxonomy to date.
  2. Availability of an Apache Solr integration module. An equivalent module came much later to TYPO3, is only now reaching maturity, and is offered in a near-commercial model by the agency which wrote it. The Taxonomy concepts hook easily onto Apache Solr for filtering based on meta-information such as tag library, authors, publications, etc. This can be seen in action at www.indiaenvironmentportal.org.in
  3. The TYPO3 backend forms, for a news publishing website with hundreds and thousands of articles, added a level of usability complexity.
  4. Meta-information such as "Authors" in tt_news (the de facto standard TYPO3 news extension - actually the heart of many TYPO3 sites) is a simple label entry; in Drupal, however, an Author can be part of a Taxonomy and complete User Profiles can be made for these authors by default.
  5. Article/news creation by way of simple forms, and easy bulk publish/unpublish of news.
  6. Complex news-news and news-article relations are easily managed using in-built relationships in Drupal.

We must acknowledge another key reason. After the www.indiaenvironmentportal.org.in experience, many of our developers came to prefer working with Drupal over TYPO3 because of the "control" they experienced, such as the ability to write themes using PHPTemplate instead of depending upon TypoScript, the native TYPO3 configuration language used for theming TYPO3 websites. TypoScript has advantages for non-technical people, but developers and themers often dislike it.

Management Challenges

We faced many challenges during the design and ideation of the project. These related to management of the project (at the client's end and ours) and to attrition at Srijan. Both we and the client were concerned about the future of the project. However, both the client and Srijan stayed committed to the relationship. Soon, the client appointed a full-time Project Manager for the project, and Srijan re-committed itself to the project.

Enter OpenPublish

Open Publish logo image

We soon realised that our product had become unstable over several months of stop-start work. We had wanted to work with OpenPublish for a long time, and saw the KNR site as an ideal case for a move to OpenPublish. Srijan's commitment to its clients showed here: we invested in a research team to work on OpenPublish using KNR as a case, and gave a 3-person team 3 weeks. All this was done at Srijan's own investment and initiative, with minimal investment from the client (only in the form of a regular maintenance signup).

What is OpenPublish

As the OpenPublish website describes it:

"OpenPublish has been designed to meet the needs of any publisher – whether large newspaper, TV news site, niche information publication or something in between. It is a flexible solution easily tailored to fit any organization’s needs."

It is based on Pressflow, a performance-tuned distribution of Drupal, has Memcached and Varnish support implemented by default, and has Apache Solr integration built in. See the complete feature list on the OpenPublish website.

Research complete; time to roll

It took the same team another 5-6 weeks from research completion to get the website live at www.knr.gl, including migration from the TYPO3 website to OpenPublish.

Migration from TYPO3 to Drupal

Our starting point was the Drupal Migrate module and the case study written on the migration of The Economist magazine to Drupal.

Analysis of the data to be migrated

We studied the data in TYPO3 that needed to be migrated. Here's the breakdown of the content we identified for migration, and eventually migrated.

TYPO3 to Drupal migration screenshot

Do note that gallery images were migrated in a different manner, which is why the above screenshot shows 0 in the "migrated" column. For migrating the gallery images we used simple PHP scripts, which also took care of "incremental migrations".

Challenges with TYPO3 DB migration

There were several challenges in mapping the TYPO3 database structure onto Drupal. These challenges were magnified by a poorly implemented TYPO3 setup on the KNR website.

Poor TYPO3 implementation

The TYPO3 implementation done for KNR was a BIG mess. Here are some examples:

  1. The site allowed photographers to register and upload their photographs through a TYPO3 extension called smooth_gallery. However, instead of one instance of the gallery managing all photographers and their photos in albums, a separate TYPO3 page was created for each photographer under their name, and an instance of smooth_gallery was created and embedded into that page. smooth_gallery in turn created a folder with the photographer's name, in which all their images were finally stored.
  2. There were 1800 pages, one for each of the 1800 registered photographers of the KNR website.
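
To make the folder-per-photographer layout concrete, here is a minimal Python sketch of flattening such a tree into a single manifest of (photographer, image path) rows. It is an illustration only, not the script used on the project; the function name, the extension list and the assumption that each photographer has exactly one folder directly under the gallery root are ours.

```python
from pathlib import Path

# Assumption: only these extensions count as gallery images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def build_gallery_manifest(gallery_root):
    """Return one (photographer, relative_path) row per image.

    Assumes one sub-folder per photographer directly under gallery_root,
    named after the photographer (hypothetical layout).
    """
    root = Path(gallery_root)
    rows = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        for image in sorted(folder.rglob("*")):
            if image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS:
                rows.append((folder.name, image.relative_to(root).as_posix()))
    return rows
```

A manifest like this can then be loaded into one table, instead of being spread over 1800 pages.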

Differences between the TYPO3 DB structure and Drupal DB structure

  1. An "Author" (an internal user at KNR) of a tt_news news story is a simple label entry. Therefore, while the same author may have entered several news items, the author's name is stored again with each item, simply as a field value, and the email stored with it can differ from item to item. While migrating this to Drupal, we had to ensure data integrity in the author profiles created for internal users as well as for external users, i.e. the photographers who registered on the site to upload their photos.
  2. These photos resided independently in folders, and had to be made available to the news editors for use in news stories on the website. Therefore a Digital Assets repository had to be implemented.
  3. In Drupal, however, an Author can be part of a Taxonomy and complete User Profiles can be made for these authors. Also, the photographs and photo galleries they made had to be associated with their profiles.
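
The author problem described above amounts to collapsing repeated label entries into one profile per person. The following Python sketch shows one way to do that, assuming rows that carry `uid`, `author` and `author_email` keys and assuming that two names refer to the same person when they match after whitespace and case normalisation. Both assumptions, and the function names, are ours for illustration; this is not the project's code.

```python
def normalise_name(name):
    """Collapse whitespace and case so that ' jane  doe ' matches 'Jane Doe'."""
    return " ".join(name.split()).casefold()


def build_author_profiles(news_rows):
    """Group news rows into one profile per distinct author name.

    news_rows: iterable of dicts with 'uid', 'author' and 'author_email'.
    Returns a list of dicts with 'name', 'emails' and 'news_uids',
    in order of first appearance.
    """
    profiles = {}
    for row in news_rows:
        key = normalise_name(row["author"])
        if not key:
            continue  # story without an author label
        profile = profiles.setdefault(
            key,
            {"name": " ".join(row["author"].split()), "emails": [], "news_uids": []},
        )
        email = (row.get("author_email") or "").strip().lower()
        if email and email not in profile["emails"]:
            profile["emails"].append(email)  # keep every variant for review
        profile["news_uids"].append(row["uid"])
    return list(profiles.values())
```

Keeping all email variants, rather than picking one silently, leaves the final choice to a human reviewer before the user accounts are created.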

Intermediate Database design

To handle the above situations an intermediate database schema had to be prepared. This would allow a clean migration of content between TYPO3 and Drupal, each according to its own structure.
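
As an illustration of what such an intermediate schema can look like, here is a small Python sketch using SQLite. The table and column names are hypothetical and are not the schema used for KNR; the point is only that authors become rows of their own, referenced by news items and images, before anything is written to Drupal.

```python
import sqlite3

# Hypothetical staging schema: authors are normalised into their own table.
STAGING_SCHEMA = """
CREATE TABLE staging_author (
    author_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE,
    email     TEXT
);
CREATE TABLE staging_news (
    typo3_uid INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    body      TEXT,
    created   INTEGER,
    author_id INTEGER REFERENCES staging_author(author_id)
);
CREATE TABLE staging_image (
    image_id  INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL UNIQUE,
    caption   TEXT,
    author_id INTEGER REFERENCES staging_author(author_id)
);
"""


def create_staging_db(path=":memory:"):
    """Create an empty staging database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(STAGING_SCHEMA)
    return conn


def stage_news(conn, typo3_uid, title, body, created, author_name, author_email=None):
    """Stage one news item, reusing the author row if the name already exists."""
    conn.execute(
        "INSERT OR IGNORE INTO staging_author (name, email) VALUES (?, ?)",
        (author_name, author_email),
    )
    author_id = conn.execute(
        "SELECT author_id FROM staging_author WHERE name = ?", (author_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR REPLACE INTO staging_news "
        "(typo3_uid, title, body, created, author_id) VALUES (?, ?, ?, ?, ?)",
        (typo3_uid, title, body, created, author_id),
    )
    return author_id
```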

TYPO3 tt_news (news) table structure

DB structure screenshot

Incremental Migration

Since the KNR website (TYPO3) was in production, content would have to be migrated continually after UAT, so that the new Drupal website could start serving live, real-time content. For this, an incremental migration process had to be followed for news stories (including images), for the photo gallery and for any new photographer registrations.

Here's a sample of the code we used for such incremental migrations (this excerpt handles gallery images):

        // Strip the source directory so that only the bare file name remains.
        $data['name'] = str_replace($data['path'], '', $data['name']);

        // Create one Drupal node per TYPO3 gallery image.
        $node->type = $type;
        $node->title = $data['caption'];
        $node->uid = $uid;
        $node->status = $status;
        $node->created = $data['crdate'];
        node_save($node);
        $nid = $node->nid;
        // Clear the nid so that the next iteration saves a new node.
        $node->nid = '';

        // Register the image file; placeholders let db_query() escape the values.
        db_query("INSERT INTO {files} (uid, filemime, filename, filepath, status, timestamp)
                  VALUES (%d, '%s', '%s', '%s', %d, %d)",
                 $uid, $fileMime, $data['caption'], $filePath . $sourcePath . $data['name'],
                 $status, $data['crdate']);
        $fid = db_last_insert_id('files', 'fid');

        // Attach the file to its gallery.
        db_query("INSERT INTO {node_galleries} (gid, nid, fid) VALUES (%d, %d, %d)", $gid, $nid, $fid);

        // Record the last migrated TYPO3 uid so that the next run resumes after it.
        db_query("UPDATE " . $typo3Db . "migrate_image_status SET data = %d WHERE name LIKE 'image_migrate'",
                 $data['uid']);
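
The excerpt above stores the uid of the last migrated image in a status row. The overall pattern, often called a high-water mark, can be sketched in Python with SQLite as follows. The `migrate_image_status` table and its `image_migrate` row mirror the excerpt; the `source_image` table, the function name and the `import_image` callback are assumptions made for illustration.

```python
import sqlite3

# Hypothetical tables: source_image stands in for the TYPO3 gallery data.
SETUP_SQL = """
CREATE TABLE IF NOT EXISTS source_image (
    uid INTEGER PRIMARY KEY, caption TEXT, name TEXT
);
CREATE TABLE IF NOT EXISTS migrate_image_status (
    name TEXT PRIMARY KEY, data INTEGER NOT NULL
);
INSERT OR IGNORE INTO migrate_image_status (name, data) VALUES ('image_migrate', 0);
"""


def migrate_new_images(conn, import_image):
    """Import every source image whose uid is above the stored mark.

    import_image(uid, caption, name) does the actual import. The mark is
    advanced and committed after each image, so an interrupted run
    resumes where it stopped. Returns the number of images imported.
    """
    (last_uid,) = conn.execute(
        "SELECT data FROM migrate_image_status WHERE name = 'image_migrate'"
    ).fetchone()
    pending = conn.execute(
        "SELECT uid, caption, name FROM source_image WHERE uid > ? ORDER BY uid",
        (last_uid,),
    ).fetchall()
    for uid, caption, name in pending:
        import_image(uid, caption, name)
        conn.execute(
            "UPDATE migrate_image_status SET data = ? WHERE name = 'image_migrate'",
            (uid,),
        )
        conn.commit()
    return len(pending)
```

Running the function repeatedly, for example from cron, picks up only the images added since the previous run. It does not detect edits to images that were already migrated.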
 

Converting Latin1 charset tables with UTF8 data set

The TYPO3 site was multilingual: English, Danish and Greenlandic. The TYPO3 DB had tables declared with the Latin1 charset but holding UTF8-encoded data, and these needed to be converted to UTF8 tables for the Drupal database.

Our initial approach was to change the DB and table charset to UTF8, which we expected would convert the Latin1 data to UTF8, with commands like:

  1. ALTER TABLE {tablename} MODIFY {table column} CHAR(20) CHARACTER SET utf8
  2. ALTER TABLE {tablename} DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci

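But this did not work. When a column declared as Latin1 is converted this way, MySQL treats each stored byte as a Latin1 character and re-encodes it, so data that is already UTF8 ends up double-encoded. After searching through Google we came upon this post - http://bit.ly/1RAqTO - which gave us the breakthrough. The solution was to first convert the fields to BLOB, which drops the charset label without touching the bytes, and then convert the BLOB fields to UTF8. A more detailed write-up of this is available at: http://www.srijan.in/blog/converting-latin1-charset-tables-utf8-data-set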
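
The effect of the two approaches can be reproduced in a few lines of Python. This is a sketch under stated assumptions: Python's `latin-1` codec stands in for MySQL's Latin1 charset (the two differ for a few byte values, which does not matter for this example), and the helper names are ours. The last function only builds the two ALTER statements of the BLOB approach as strings; the column type passed in must match the original column, and large text columns may need a larger BLOB type.

```python
def simulate_direct_conversion(text):
    """Show what a direct Latin1-to-UTF8 conversion does to UTF8 bytes.

    The bytes stored in the column are already UTF8, but the column says
    Latin1, so each byte is read as one Latin1 character.
    """
    stored_bytes = text.encode("utf-8")
    return stored_bytes.decode("latin-1")


def repair_double_encoded(text):
    """Undo the damage done by simulate_direct_conversion()."""
    return text.encode("latin-1").decode("utf-8")


def blob_roundtrip_sql(table, column, column_type):
    """Build the two statements of the BLOB approach for one column.

    Step 1 drops the charset label while keeping the bytes as they are;
    step 2 relabels the same bytes as UTF8.
    """
    return [
        f"ALTER TABLE {table} MODIFY {column} BLOB",
        f"ALTER TABLE {table} MODIFY {column} {column_type} CHARACTER SET utf8",
    ]
```

For example, `simulate_direct_conversion("på")` yields the familiar garbled form `pÃ¥`, and `repair_double_encoded` turns it back into `på`.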

The TYPO3 migrate module

We wrote a TYPO3 migrate module during this KNR website migration, as well as for another client, East West Center (coming up soon).

Conclusion

This migration has been an exciting project for us, and for the Drupal community as well, as this is probably the first migration from TYPO3 to Drupal, and certainly the first release of a generic TYPO3 migration module.

Learnings for the future

Updated Content Migrations

We did not use a feature of the Migrate module which allows for migrating updated content records, such as news stories and articles, that had already been migrated and were then updated in TYPO3. We instead compiled all such updated records based on a date indicator, and migrated them separately by first rolling these stories back and then re-migrating them using the Migrate module itself.

In our next migration we would like to use the update feature of the Migrate module.
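
The decision behind such an update pass can be sketched in Python as below. This is not the Migrate module's API, only the underlying idea: compare each source record's modification time with the time recorded when it was last migrated. The function name and the data shapes are assumptions made for illustration.

```python
def plan_updates(source_rows, migrated_tstamps):
    """Sort source records into those to insert, update or skip.

    source_rows: iterable of (uid, tstamp), where tstamp is the record's
        last-modified time in the source system.
    migrated_tstamps: dict mapping uid to the tstamp seen when that
        record was last migrated.
    """
    plan = {"insert": [], "update": [], "skip": []}
    for uid, tstamp in source_rows:
        if uid not in migrated_tstamps:
            plan["insert"].append(uid)   # never migrated
        elif tstamp > migrated_tstamps[uid]:
            plan["update"].append(uid)   # changed since last migration
        else:
            plan["skip"].append(uid)     # unchanged
    return plan
```

Records in the "update" list can then be overwritten in place, rather than rolled back and migrated again.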

Entrepreneur/ Blogger/ Drupal/ Agile & Open source evangelist/ Green activist/ Pilgrim/ Lover of the idea of India/ Blogs on life at http://danceofshiva.wordpress.com