Migrating from a legacy Cloud ERP like Oracle NetSuite to a sovereign ERPNext instance on Google Cloud Platform is often perceived as a high-risk data extraction problem. The primary friction points are not just schema mapping, but the inherent rate-limiting and UI-induced latency of the legacy vendor’s API.
At Infinary, we bypass the standard CSV export/import workflows in favor of a high-concurrency ingestion pipeline that treats migration as a systems engineering problem rather than a clerical task.
Bypassing SuiteTalk Limitations
NetSuite’s standard SuiteTalk REST API is notoriously throttled. To achieve the throughput required for multi-terabyte datasets, we utilize SuiteQL. By executing direct SQL-like queries against the NetSuite analytics engine, we can extract relational data in flattened JSON structures, which are then streamed directly to a Cloud Run normalization service.
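As a sketch of the extraction side, SuiteQL queries are posted to NetSuite's `/services/rest/query/v1/suiteql` REST endpoint and paginated with `limit`/`offset` query parameters, walking pages until the response reports `hasMore: false`. The account subdomain, the auth token, and error handling below are placeholders, not a production client:

```python
import json
import urllib.request

# Placeholder account subdomain; substitute the real NetSuite account ID.
BASE_URL = "https://ACCOUNT_ID.suitetalk.api.netsuite.com/services/rest/query/v1/suiteql"

def build_suiteql_request(query, offset=0, limit=1000, token="PLACEHOLDER"):
    """Build one paginated SuiteQL request. The Authorization value is a
    stand-in; real use requires OAuth 1.0 token-based authentication."""
    url = f"{BASE_URL}?limit={limit}&offset={offset}"
    body = json.dumps({"q": query}).encode("utf-8")
    headers = {
        "Prefer": "transient",              # required by the SuiteQL endpoint
        "Content-Type": "application/json",
        "Authorization": token,             # placeholder credential
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

def extract_all(query, page_size=1000):
    """Drain the paginated result set into a flat list of row dicts."""
    offset, rows = 0, []
    while True:
        req = build_suiteql_request(query, offset=offset, limit=page_size)
        with urllib.request.urlopen(req) as resp:
            page = json.load(resp)
        rows.extend(page.get("items", []))
        if not page.get("hasMore"):
            return rows
        offset += page_size
```

In the actual pipeline, each page would be streamed to the Cloud Run normalization service rather than accumulated in memory.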
The Normalization Pipeline
We deploy a Python-based processing layer on Cloud Run. This service performs the following operations:
- Normalization: Converting NetSuite’s internal ID structures into human-readable, ERPNext-compliant naming conventions.
- Schema Alignment: Mapping complex Oracle schemas (e.g., TRANSACTION_LINES) to Frappe’s DocType hierarchy.
- Parquet Conversion: Serializing normalized records into Apache Parquet. Parquet’s columnar storage format allows for rapid schema validation in BigQuery before the final load.
```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

def normalize_and_store(json_data, destination_path):
    # Mapping NetSuite internal keys to Frappe DocType fields
    mapping_schema = {
        'entity': 'supplier',
        'trandate': 'posting_date',
        'memo': 'remarks',
        'amount': 'base_grand_total',
    }
    df = pd.DataFrame(json_data)
    df = df.rename(columns=mapping_schema)

    # Enforce data types for BigQuery validation
    df['posting_date'] = pd.to_datetime(df['posting_date'])

    # Write to Parquet for efficient buffer management
    table = pa.Table.from_pandas(df)
    pq.write_table(table, destination_path, compression='snappy')
    return f"Serialized {len(df)} records to {destination_path}"
```
Direct Seeding vs. API Ingestion
The final stage of the pipeline avoids the Frappe REST API entirely for initial seeding. To circumvent API overhead, we utilize a direct MariaDB load from the GCP environment. By preparing the SQL dump files within the VPC, we can seed millions of records into the ERPNext database in a fraction of the time it would take to POST individual records.
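The dump-preparation step can be sketched as follows. Frappe stores each DocType in a MariaDB table named `tab<DocType>`, so the generator below emits multi-row INSERT statements against those tables. The naive string escaping is for illustration only; a production dump should be produced with a proper client library or mysqldump-compatible tooling:

```python
def escape(value):
    """Naive SQL literal escaping for the sketch only."""
    if value is None:
        return "NULL"
    return "'" + str(value).replace("\\", "\\\\").replace("'", "''") + "'"

def build_insert_dump(doctype, records, chunk_size=5000):
    """Emit multi-row INSERT statements for the Frappe table `tab<doctype>`.

    records: list of dicts sharing the same keys (column names).
    chunk_size: rows per INSERT, to keep each statement a manageable size.
    """
    if not records:
        return ""
    columns = list(records[0].keys())
    col_list = ", ".join(f"`{c}`" for c in columns)
    statements = []
    for i in range(0, len(records), chunk_size):
        chunk = records[i:i + chunk_size]
        rows = ",\n".join(
            "(" + ", ".join(escape(r.get(c)) for c in columns) + ")"
            for r in chunk
        )
        statements.append(
            f"INSERT INTO `tab{doctype}` ({col_list})\nVALUES\n{rows};"
        )
    return "\n".join(statements)
```

The resulting dump file is then piped through the `mysql` client inside the VPC, so no record ever transits the public Frappe REST API during initial seeding.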
This method ensures that when the “switch” is flipped, the enterprise has a fully seeded, sovereign system of record, free of the API throttling and latency that constrained the legacy monolith.
Principal Architect
Infinary Engineering Group