
Extracting NetSuite: The SQL Migration Pipeline

A technical overview of migrating legacy Oracle NetSuite data into a sovereign ERPNext instance with zero downtime.

Migrating from a legacy cloud ERP like Oracle NetSuite to a sovereign ERPNext instance on Google Cloud Platform is often perceived as a high-risk data extraction problem. The primary friction points are not just schema mapping, but the rate limiting and latency inherent in the legacy vendor’s API.

At Infinary, we bypass the standard CSV export/import workflows in favor of a high-concurrency ingestion pipeline that treats migration as a systems engineering problem rather than a clerical task.

Bypassing SuiteTalk Limitations

NetSuite’s standard SuiteTalk REST API is notoriously throttled. To achieve the throughput required for multi-terabyte datasets, we utilize SuiteQL. By executing direct SQL-like queries against the NetSuite analytics engine, we can extract relational data in flattened JSON structures, which are then streamed directly to a Cloud Run normalization service.
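A SuiteQL call is a POST with the query in the request body. The following is a minimal request-builder sketch; the account ID, query, and page size are illustrative, and OAuth signing is omitted entirely:

```python
import json

SUITEQL_PATH = "/services/rest/query/v1/suiteql"

def build_suiteql_request(account_id, query, limit=1000, offset=0):
    """Build the URL, headers, and body for a paginated SuiteQL call."""
    url = (
        f"https://{account_id}.suitetalk.api.netsuite.com"
        f"{SUITEQL_PATH}?limit={limit}&offset={offset}"
    )
    headers = {
        # NetSuite requires this header on SuiteQL requests
        "Prefer": "transient",
        "Content-Type": "application/json",
    }
    body = json.dumps({"q": query})
    return url, headers, body

# Illustrative: pull transaction headers in pages of 1000
url, headers, body = build_suiteql_request(
    "123456", "SELECT id, entity, trandate FROM transaction", limit=1000
)
```

Pagination via `limit`/`offset` lets multiple workers extract disjoint pages concurrently, which is where the throughput gain over record-by-record SuiteTalk calls comes from.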

The Normalization Pipeline

We deploy a Python-based processing layer on Cloud Run. This service performs the following operations:

  1. Normalization: Converting NetSuite’s internal ID structures into human-readable, ERPNext-compliant naming conventions.
  2. Schema Alignment: Mapping complex Oracle schemas (e.g., TRANSACTION_LINES) to Frappe’s DocType hierarchy.
  3. Parquet Conversion: Serializing normalized records into Apache Parquet. Parquet’s columnar storage format allows for rapid schema validation in BigQuery before the final load.

A simplified version of the service’s core routine:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

def normalize_and_store(json_data, destination_path):
    # Mapping NetSuite internal keys to Frappe DocType fields
    mapping_schema = {
        'entity': 'supplier',
        'trandate': 'posting_date',
        'memo': 'remarks',
        'amount': 'base_grand_total'
    }
    
    df = pd.DataFrame(json_data)
    df = df.rename(columns=mapping_schema)
    
    # Enforce data types for BigQuery validation
    df['posting_date'] = pd.to_datetime(df['posting_date'])
    
    # Write to Parquet for efficient buffer management
    table = pa.Table.from_pandas(df)
    pq.write_table(table, destination_path, compression='snappy')
    
    return f"Serialized {len(df)} records to {destination_path}"

Direct Seeding vs. API Ingestion

The final stage of the pipeline avoids the Frappe REST API entirely for initial seeding. To circumvent API overhead, we utilize a direct MariaDB load from the GCP environment. By preparing the SQL dump files within the VPC, we can seed millions of records into the ERPNext database in a fraction of the time it would take to POST individual records.
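A minimal sketch of the seeding step. Frappe stores each DocType in a table named `tab<DocType>`; the table, column list, and staging path below are illustrative, and in practice the statement runs against the ERPNext MariaDB instance from inside the VPC:

```python
def build_load_statement(table, columns, csv_path):
    """Construct a LOAD DATA LOCAL INFILE statement for bulk seeding.

    The column list must match the CSV field order exactly;
    a header row is skipped via IGNORE 1 LINES.
    """
    cols = ", ".join(f"`{c}`" for c in columns)
    return (
        f"LOAD DATA LOCAL INFILE '{csv_path}' "
        f"INTO TABLE `{table}` "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' "
        f"IGNORE 1 LINES ({cols});"
    )

stmt = build_load_statement(
    "tabPurchase Invoice",
    ["name", "supplier", "posting_date", "base_grand_total"],
    "/mnt/staging/purchase_invoices.csv",
)
```

Bulk loads bypass Frappe’s document lifecycle (validation hooks, naming series), so the staged CSVs must already be ERPNext-compliant, which is exactly what the normalization stage guarantees.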

This method ensures that when the “switch” is flipped, the enterprise has a fully seeded, sovereign system of record that is architecturally leaner than the legacy monolith.

Principal Architect
Infinary Engineering Group

#Migration #NetSuite #GCP #ERPNext
