Custom Model Conversion Guide
This document describes how to use a custom embedding model for address embeddings.
Model Source
The custom model is available at:
This model is a fine-tuned version of all-MiniLM-L6-v2 specifically trained on address data.
Converting to ONNX Format
Because the model is not published with a pre-converted ONNX export, you need to convert it yourself using Python.
Prerequisites
Install the required packages (the quotes keep shells such as zsh from interpreting the square brackets):
pip install "optimum[exporters]" transformers torch
Conversion Steps
- Run the conversion script:
  cd VectorSearchApp/Models
  python download-convert-model.py
  This will:
  - Download the model from HuggingFace
  - Convert it to ONNX format using Optimum
  - Save the model to Models/custom-model/
  - Copy the main model file to Models/address-embedding-model.onnx
- Update configuration:
  Edit VectorSearchApp/appsettings.json:
  {
    "Embedding": {
      "ModelName": "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3",
      "Dimension": 384,
      "ApiToken": "",
      "UseLocalInference": true
    }
  }
  Or use the shorter alias:
  {
    "Embedding": {
      "ModelName": "custom-all-MiniLM-L6-v2-address",
      "Dimension": 384,
      "ApiToken": "",
      "UseLocalInference": true
    }
  }
- Run the application:
  cd VectorSearchApp
  dotnet run
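The contents of download-convert-model.py are not shown in this guide. As a rough sketch of what the download, convert, save, and copy steps could look like, the example below assumes the script relies on Optimum's ORTModelForFeatureExtraction (the class used in the Troubleshooting section) and is run from VectorSearchApp/Models. The function names export_to_onnx and copy_main_model are hypothetical, and the actual script may differ.

```python
import shutil
from pathlib import Path

# Model ID as it appears in appsettings.json in this guide.
MODEL_ID = "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3"


def export_to_onnx(model_id: str, export_dir: Path) -> None:
    """Download the model and tokenizer, then write an ONNX export to export_dir."""
    # Imported lazily so the rest of the module can be used without these packages.
    from optimum.onnxruntime import ORTModelForFeatureExtraction
    from transformers import AutoTokenizer

    export_dir.mkdir(parents=True, exist_ok=True)
    # export=True converts the PyTorch checkpoint to ONNX while loading it.
    model = ORTModelForFeatureExtraction.from_pretrained(model_id, export=True)
    model.save_pretrained(export_dir)
    AutoTokenizer.from_pretrained(model_id).save_pretrained(export_dir)


def copy_main_model(export_dir: Path, target: Path) -> Path:
    """Copy export_dir/model.onnx to target and return the target path."""
    source = export_dir / "model.onnx"
    if not source.is_file():
        raise FileNotFoundError(f"expected ONNX file not found: {source}")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    return target


if __name__ == "__main__":
    # Assumes the current directory is VectorSearchApp/Models.
    out = Path("custom-model")
    export_to_onnx(MODEL_ID, out)
    print(copy_main_model(out, Path("address-embedding-model.onnx")))
```

Keeping the copy step separate from the export makes it possible to re-run the copy alone if the export directory already exists.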
Output Files
After conversion, the following files will be created:
VectorSearchApp/Models/
├── custom-model/
│ ├── config.json
│ ├── model.onnx
│ ├── special_tokens_map.json
│ ├── tokenizer.json
│ ├── tokenizer_config.json
│ └── vocab.txt
└── address-embedding-model.onnx (copy of model.onnx for easy access)
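To confirm that a conversion run produced everything in the listing above, a small check along these lines may help. It is a sketch only: the helper name missing_outputs is hypothetical, and the expected file names are copied from the listing, so they should be adjusted if your export produces a different set.

```python
from pathlib import Path

# File names taken from the directory listing in this guide.
EXPECTED_FILES = [
    "config.json",
    "model.onnx",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.txt",
]


def missing_outputs(models_dir: Path) -> list[str]:
    """Return the relative paths of expected conversion outputs that are absent."""
    expected = [f"custom-model/{name}" for name in EXPECTED_FILES]
    expected.append("address-embedding-model.onnx")
    return [rel for rel in expected if not (models_dir / rel).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_outputs(Path("VectorSearchApp/Models"))
    print("all outputs present" if not missing else f"missing: {missing}")
```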
Troubleshooting
CUDA/GPU Support
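To use GPU acceleration during conversion, pass the CUDA execution provider when loading the model:
from optimum.onnxruntime import ORTModelForFeatureExtraction
# Same model ID as in appsettings.json
model_id = "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3"
# CUDAExecutionProvider requires a GPU build of ONNX Runtime (onnxruntime-gpu)
model = ORTModelForFeatureExtraction.from_pretrained(
    model_id,
    export=True,
    provider="CUDAExecutionProvider",  # Use CUDA instead of CPU
)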
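Requesting CUDAExecutionProvider on a machine without a CUDA-enabled ONNX Runtime build will not work, so it can be useful to check what is available first. The sketch below uses onnxruntime.get_available_providers() and falls back to the CPU provider; the helper name choose_provider is hypothetical and not part of this project.

```python
def choose_provider(available: list[str]) -> str:
    """Prefer CUDA when ONNX Runtime reports it, otherwise fall back to CPU."""
    if "CUDAExecutionProvider" in available:
        return "CUDAExecutionProvider"
    return "CPUExecutionProvider"


if __name__ == "__main__":
    # Requires onnxruntime (CPU) or onnxruntime-gpu to be installed.
    import onnxruntime as ort

    print(choose_provider(ort.get_available_providers()))
```

The returned string can then be passed as the provider argument of from_pretrained.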
Large Model Download
The first conversion may take several minutes because it downloads the full model (~90 MB) and the tokenizer files.
Memory Requirements
Conversion requires approximately 4GB of RAM. If you encounter memory issues, try closing other applications.