The importer is a big deal

@migration #planning #importer #xmlrpc

A discussion Nacin linked me to on WP Tavern indicates that a lot of people are having trouble with the WordPress Importer plugin.

Chief complaints include:

  • no indication of progress
  • attachments are rarely fetched successfully
  • attachments become detached from their parent posts
  • out-of-memory or timeout problems

I posted a few comments to let people know that I'm going to take these into consideration.

I also attempted a migration from EC2 to another VPS host, using only the WordPress WXR export file and the aforementioned WordPress Importer plugin. It was a spectacular failure. Even on an nginx + php-fpm setup that I've used with great success, I confirmed what others were seeing:

  • no indication of progress
  • the script timed out and did not pull attachments successfully
  • all custom post types (e.g. member profiles, which I'll eventually switch to BuddyPress; front page sliders; etc.) failed because the themes and plugins that defined them were not installed
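
The custom post type failure could in principle be caught before the import starts. Below is a minimal sketch, in Python rather than the importer's own PHP, of a pre-flight check that scans a WXR file and lists post types the destination site has not registered. The sample document, the namespace URL in it, the post type names (`member_profile`, `slider`) and the function names are all made up for illustration; tags are matched by local name so the check does not depend on the WXR version.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def count_post_types(wxr_text):
    """Count <item> elements in a WXR document by their post_type child."""
    counts = Counter()
    root = ET.fromstring(wxr_text)
    for item in root.iter("item"):
        for child in item:
            if local_name(child.tag) == "post_type":
                counts[(child.text or "").strip()] += 1
                break
    return counts


def unregistered_types(counts, registered):
    """Return the post types in the export that the target site does not know."""
    return sorted(t for t in counts if t not in registered)


# Hypothetical, heavily trimmed export used only to exercise the functions.
SAMPLE_WXR = """<rss xmlns:wp="http://wordpress.org/export/1.1/">
  <channel>
    <item><title>Hello</title><wp:post_type>post</wp:post_type></item>
    <item><title>About</title><wp:post_type>page</wp:post_type></item>
    <item><title>Jane</title><wp:post_type>member_profile</wp:post_type></item>
    <item><title>Slide 1</title><wp:post_type>slider</wp:post_type></item>
  </channel>
</rss>"""

counts = count_post_types(SAMPLE_WXR)
# The registered set is an assumption; a real check would ask the target site.
missing = unregistered_types(counts, registered={"post", "page", "attachment"})
print(missing)
```

With a list like this in hand, the importer could warn up front that certain plugins or themes need to be installed, instead of failing partway through.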

And, as I expected, a few other issues:

  • user accounts not automatically migrated
    (important for the site in question and for many others, because of user meta and associated details we might be interested in, like custom capabilities)
  • settings not automatically migrated
  • themes and plugins completely out of the picture
  • post IDs completely out of whack, breaking custom CSS (post/page specific) -- this is an issue potentially out of the scope of this type of exporter/importer; see my UUID proposal for GSoC
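
One way to think about the post ID problem is an explicit old-to-new ID map that the importer builds as it goes. The sketch below is plain Python with invented field names (`id`, `parent`, `type`), not the plugin's actual data model. It assigns new IDs in a first pass and rewrites parent references in a second pass, so an attachment that appears before its parent in the export still ends up attached.

```python
def import_with_id_map(posts, next_id):
    """Assign new IDs, then rewrite parent references through an old->new map."""
    id_map = {}
    imported = []

    # Pass 1: give every post a new ID and remember where the old one went.
    for post in posts:
        imported.append(dict(post, id=next_id))
        id_map[post["id"]] = next_id
        next_id += 1

    # Pass 2: remap parents. 0 means "no parent"; a parent that was not part
    # of the export also falls back to 0 (detached).
    for post in imported:
        post["parent"] = id_map.get(post["parent"], 0)

    return imported, id_map


# Hypothetical export order: the attachment comes before the post it belongs to.
posts = [
    {"id": 57, "type": "attachment", "parent": 42},
    {"id": 42, "type": "post", "parent": 0},
]
imported, id_map = import_with_id_map(posts, next_id=1)
print(id_map)
```

The same map could be saved alongside the import and used afterwards to rewrite anything else that refers to old IDs, such as post-specific CSS selectors. It does not make IDs stable across sites, which is what the UUID proposal is about.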

So fixing the importer seems to be an inevitable part of this project, even though I initially planned to avoid it. Suggestions included using the XML-RPC API built into WordPress to fetch content from the source site. This is a possibility -- but I am concerned that:

  1. These APIs are turned off by default for security reasons
  2. Such a method is unreliable for migration (e.g. a site running on localhost cannot be reached by the destination server)

In any case, one of the first things the importer needs is some notion of state: breaking the task into smaller chunks that can each finish within a single request (e.g. internally running a routine on 10 posts at a time, sending out a progress ping with WebSockets or something similar, then continuing). That would address the timeouts and failures. Another thing to look at is a way to keep post IDs in sync when importing into a fresh installation.
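
The chunking idea above can be sketched roughly as follows. This is an illustration in Python, not importer code. The state file, the `run_chunk` function and the `import_one` callback are hypothetical names, and the loop at the bottom stands in for a series of separate requests. Progress is written to disk after every item, so a run that times out can pick up where it stopped.

```python
import json
import os
import tempfile


def load_state(path):
    """Read saved progress, or start from zero if there is none yet."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {"done": 0}


def save_state(path, state):
    with open(path, "w") as fh:
        json.dump(state, fh)


def run_chunk(items, state_path, import_one, chunk_size=10):
    """Import at most chunk_size items, checkpointing progress; return (done, total)."""
    state = load_state(state_path)
    start = state["done"]
    for item in items[start:start + chunk_size]:
        import_one(item)
        state["done"] += 1
        save_state(state_path, state)  # checkpoint after every item
    return state["done"], len(items)


# Stand-in data: 25 fake posts, "imported" by appending them to a list.
items = [f"post-{n}" for n in range(25)]
imported = []
state_path = os.path.join(tempfile.mkdtemp(), "import_state.json")

progress = []
while True:
    # Each iteration models one short request; the returned counts are what a
    # progress ping would report back to the browser.
    done, total = run_chunk(items, state_path, imported.append)
    progress.append(done)
    if done >= total:
        break
print(progress)
```

How the progress number reaches the user (WebSockets, polling, a page refresh) is a separate choice. The important part is that each request does a bounded amount of work and the state survives between requests.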