MigrationWTF: Specified key was too long; max key length is 3072 bytes

Published onSaturday 2, December 2017

I recently hit a curveball with a custom migration to import location data into Drupal. I am helping a family friend merge their bespoke application into Drupal. Part of the application involves managing Locations. Locations have a parent location, so I used taxonomy terms to harness the relationship capabilities. Importing levels one, two, three, and four worked great! Then, all the sudden, when I began to import levels five and six I ran into an error: Specified key was too long; max key length is 3072 bytes

Migration error output

False positives

My first reaction was a subtle table flip and bang on the table. I had just spent the past thirty minutes writing a Kernel test which executed my migrations and verified the imported data. Everything was green. Term hierarchy was preserved. Run it live: table flip breakage. But, why?

When I run tests I do so with PHPUnit directly to just execute my one test class or a single module. Here is the snippet from my project's phpunit.xml

    <!-- Example SIMPLETEST_DB value: mysql://username:[email protected]/databasename#table_prefix -->
    <env name="SIMPLETEST_DB" value="sqlite://localhost/sites/default/files/.ht.sqlite"/>

Turns out when running SQLite there are no constraints on the index key length, or there's some other deeper explanation I didn't research. The actual site is running on MySQL/MariaDB. As soon as I ran my Console command to run the migrations the environment change revealed the failure.

Finding a possible fix

Following the wrong path to fixing a bug, I went in a frantic Google search trying to find an answer which asserted by expected fix: there is some bogus limitation on the byte size that I can somehow bypass. So reviewed the fact that InnoDB has the limit of 3072 bytes, and that MyISAM has a different one (later, I found it is worse at 1000 bytes and one reason why we're all using MyISAM.)

I found an issue on Drupal.org which was reported about the index key being too long. The bug occurs by having too many keys listed in the Migration source plugin. Keys identify unique rows in the migration source. They are also used as the index in the create migrate_map_MIGRATION_ID. So if you require too many keys for identifying unique rows, you will experience this error. In most cases, you can probably break up your CSV and normalize it to be it easier to parse.

Indexes were added in Drupal 8.3 to improve performance. So, I had to find a fix. First I tried swapping back to MyISAM and realizing that was a fool's errand, but I was desperate.

I thought about trying to normalize the CSV data and make different, smaller, files. But there was a problem: some locations share the same name but have different parent hierarchies. A location in Illinois could share the same name as a location in a Canadian territory. I needed to preserve the Country -> Administrative Area -> Locality values in a single CSV.

A custom idMap plugin to the rescue

If you have worked with the Migrate module in Drupal you are familiar with process plugins and possibly familiar with source plugins. The former helps you transform data and the latter brings data into the migration. The migrate also has an id_map plugin. There is one single plugin provided by Drupal core: sql. I never knew or thought about this plugin because it is not defined. In fact, we never have to.

  /**
   * {@inheritdoc}
   */
  public function getIdMap() {
    if (!isset($this->idMapPlugin)) {
      $configuration = $this->idMap;
      $plugin = isset($configuration['plugin']) ? $configuration['plugin'] : 'sql';
      $this->idMapPlugin = $this->idMapPluginManager->createInstance($plugin, $configuration, $this);
    }
    return $this->idMapPlugin;
  }

If a migration does not provide an idMap definition it defaults to the core default sql mapping.

Hint: if you want to migrate into a non-SQL database you're going to need a custom id_map plugin!

Once I found this plugin I was able to find out where it created its table and the index information. Bingo! \Drupal\migrate\Plugin\migrate\id_map\Sql::ensureTables

  /**
   * Create the map and message tables if they don't already exist.
   */
  protected function ensureTables() {

In this method, it creates the schema that the table will use. There is some magic in here. By default, all keys are treated as a varchar field with a length of 64. But, then, it matches up those keys with known destination field values. So if you have a source value going to a plain text field it will change to a length of 255.

      // Generate appropriate schema info for the map and message tables,
      // and map from the source field names to the map/msg field names.
      $count = 1;
      $source_id_schema = [];
      $indexes = [];
      foreach ($this->migration->getSourcePlugin()->getIds() as $id_definition) {
        $mapkey = 'sourceid' . $count++;
        $indexes['source'][] = $mapkey;
        $source_id_schema[$mapkey] = $this->getFieldSchema($id_definition);
        $source_id_schema[$mapkey]['not null'] = TRUE;
      }

      $source_ids_hash[static::SOURCE_IDS_HASH] = [
        'type' => 'varchar',
        'length' => '64',
        'not null' => TRUE,
        'description' => 'Hash of source ids. Used as primary key',
      ];
      $fields = $source_ids_hash + $source_id_schema;

      // Add destination identifiers to map table.
      // @todo How do we discover the destination schema?
      $count = 1;
      foreach ($this->migration->getDestinationPlugin()->getIds() as $id_definition) {
        // Allow dest identifier fields to be NULL (for IGNORED/FAILED cases).
        $mapkey = 'destid' . $count++;
        $fields[$mapkey] = $this->getFieldSchema($id_definition);
        $fields[$mapkey]['not null'] = FALSE;
      }

This was my issue. The destination fields are the term name field, which has a length of 255. Great. But there is no way to interject here and change that value. All of the field schemas are coming from field and typed data information.

The solution? Make my own plugin. The following is my sql_large_key ID mapping class.

<?php

namespace Drupal\mahmodule\Plugin\migrate\id_map;

use Drupal\migrate\Plugin\migrate\id_map\Sql;

/**
 * Defines the sql based ID map implementation.
 *
 * It creates one map and one message table per migration entity to store the
 * relevant information.
 *
 * @PluginID("sql_large_key")
 */
class LargeKeySql extends Sql {

  protected function getFieldSchema(array $id_definition) {
    $schema =  parent::getFieldSchema($id_definition);
    if ($schema['type'] == 'varchar') {
      $schema['length'] = 100;
    }
    return $schema;
  }

}

The following is an example migration using my custom idMap definition.

id: company_location6
status: true
migration_tags:
  - company
idMap:
  plugin: sql_large_key
source:
  plugin: csv_by_key
  path: data/company.csv
  header_row_count: 1
 # Need each unique key to build proper hierarchy
  keys:
    - Location1
    - Location2
    - Location3
    - Location4
    - Location5
    - Location6
process:
  name:
    plugin: skip_on_empty
    method: row
    source: Location6
  vid:
    plugin: default_value
    default_value: locations
  # Find parent using key from previous level migration.
  parent_id:
    -
      plugin: migration
      migration:
        - company_location5
      source_ids:
        company_location5:
          - Location1
          - Location2
          - Location3
          - Location4
          - Location5
  parent:
    plugin: default_value
    default_value: 0
    source: '@parent_id'
destination:
  plugin: 'entity:taxonomy_term'
migration_dependencies: {  }

And, voila! 6,000 locations later there is a preserved hierarchy!