Perplexing WP Importer behaviour

@migration #importer #attachments

In my brief testing of the WP Importer plugin, its behaviour with attachments has been nothing short of frustrating, perplexing and bewildering.

As others have experienced, the importer often runs into timeouts. Neither PHP nor (probably) frontend servers like nginx are willing to wait 5 minutes for a single script to finish. So while attempting to fetch attachments, the Importer gets cut off:

[Screenshot: importer cut off during script execution]

nginx's error.log tells us:

[error] 1667#0: *4746 upstream timed out (110: Connection timed out) while reading upstream, client: {ip redacted}, server: www.example.com, request: "POST /wp-admin/admin.php?import=wordpress&step=2 HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "wp.example.com", referrer: "http://wp.example.com/wp-admin/admin.php?import=wordpress&step=1&_wpnonce={nonce redacted}"

Meanwhile... checking the media library reveals that, since the importer script uses:

set_time_limit(0); /* line 91 at plugin repo rev 718670 */

... attachments continued to be imported in the background for as long as the php5-fpm child kept running (I doubt an Apache mod_php installation would behave the same way).

[Screenshot: media library with successful imports]

Refreshing this page showed more and more attachments until I manually killed the FPM process.

To make things worse, all of these attachments are orphaned, with no association to their parent posts, probably because I never let the process finish and the script never reached backfill_parents().

Solutions?

The first thing that comes to mind is breaking up the import into discrete steps (e.g. chunks of 10 attachments) that are executed in sequence, triggered by some JavaScript-based function in the admin interface that also shows progress. Backfilling parents might also be better done in shorter segments rather than after all attachments are processed.
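
To make the idea concrete, here is a rough sketch of what the client-side driver could look like. Nothing in it exists in the plugin today: chunk(), importInChunks(), importChunk and onProgress are all hypothetical names, and importChunk stands in for whatever short request would import (and ideally backfill parents for) one batch on the server.

```javascript
// Hypothetical sketch, not part of the WP Importer plugin.

// Split a list of attachment IDs into batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Import the batches strictly one after another.
// `importChunk(batch)` is an assumed callback returning a promise; in the
// admin UI it would wrap one short HTTP request per batch, so no single
// request comes anywhere near the server's timeout.
// `onProgress(done, total)` is an assumed callback for updating the UI.
async function importInChunks(attachmentIds, importChunk, onProgress, size = 10) {
  let done = 0;
  for (const batch of chunk(attachmentIds, size)) {
    await importChunk(batch); // wait for this batch before starting the next
    done += batch.length;
    onProgress(done, attachmentIds.length);
  }
  return done;
}
```

In the admin interface, importChunk could be a POST to admin-ajax.php with a new (so far non-existent) action, and onProgress could update a progress bar. If a batch fails, the loop stops at that batch, so at most one chunk of attachments would be left in a half-imported state instead of the whole library.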