Custom Model Conversion Guide

This document describes how to use a custom embedding model for address embeddings.

Model Source

The custom model is available at:

HuggingFace: jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3

This model is a fine-tuned version of all-MiniLM-L6-v2 specifically trained on address data.

Converting to ONNX Format

The model is not published with a pre-converted ONNX export, so you need to convert it yourself using Python.

Prerequisites

Install the required packages:

```
pip install "optimum[exporters]" transformers torch
```

Conversion Steps

Run the conversion script:
```
cd VectorSearchApp/Models
python download-convert-model.py
```
This will:
- Download the model from HuggingFace
- Convert it to ONNX format using Optimum
- Save the exported model and tokenizer files to VectorSearchApp/Models/custom-model/
- Copy the main model file to VectorSearchApp/Models/address-embedding-model.onnx
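The script itself is not reproduced in this guide. As a rough sketch of what the steps above could look like, assuming the script uses Optimum's `ORTModelForFeatureExtraction` for the export (the function names `export_to_onnx` and `copy_main_model` are hypothetical, not taken from download-convert-model.py):

```python
import shutil
from pathlib import Path

MODEL_ID = "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3"


def export_to_onnx(model_id: str, output_dir: Path) -> None:
    """Download the model and tokenizer, export to ONNX, and save both."""
    # Imported lazily so the copy helper below works without these packages.
    from optimum.onnxruntime import ORTModelForFeatureExtraction
    from transformers import AutoTokenizer

    output_dir.mkdir(parents=True, exist_ok=True)
    # export=True converts the downloaded PyTorch weights to ONNX on load.
    model = ORTModelForFeatureExtraction.from_pretrained(model_id, export=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)


def copy_main_model(output_dir: Path, target: Path) -> Path:
    """Copy model.onnx out of the export directory to an easy-to-find path."""
    source = output_dir / "model.onnx"
    if not source.is_file():
        raise FileNotFoundError(f"expected exported model at {source}")
    shutil.copyfile(source, target)
    return target


def main() -> None:
    # Paths are relative to VectorSearchApp/Models, where the script is run.
    output_dir = Path("custom-model")
    export_to_onnx(MODEL_ID, output_dir)
    copy_main_model(output_dir, Path("address-embedding-model.onnx"))
```

Calling `main()` from VectorSearchApp/Models would perform the download, export, and copy in one pass. The actual script may differ in details such as paths or logging.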

Update configuration:

Edit VectorSearchApp/appsettings.json:

```
{
  "Embedding": {
    "ModelName": "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3",
    "Dimension": 384,
    "ApiToken": "",
    "UseLocalInference": true
  }
}
```

Or use the shorter alias:

```
{
  "Embedding": {
    "ModelName": "custom-all-MiniLM-L6-v2-address",
    "Dimension": 384,
    "ApiToken": "",
    "UseLocalInference": true
  }
}
```
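
A mismatch between the configured `Dimension` and the model's actual output width is easy to miss. The following is a minimal sketch of a sanity check, assuming appsettings.json is plain JSON without comments and that the exported config.json exposes the embedding width as `hidden_size` (as BERT-style configs usually do). The helper name `check_dimension` is hypothetical.

```python
import json
from pathlib import Path


def check_dimension(appsettings_path: Path, model_config_path: Path) -> int:
    """Return the configured dimension, raising if it disagrees with the model."""
    # utf-8-sig tolerates the byte-order mark some editors add to JSON files.
    settings = json.loads(appsettings_path.read_text(encoding="utf-8-sig"))
    model_config = json.loads(model_config_path.read_text(encoding="utf-8-sig"))

    configured = settings["Embedding"]["Dimension"]
    # Assumption: the model config stores the embedding width as hidden_size.
    actual = model_config["hidden_size"]
    if configured != actual:
        raise ValueError(
            f"appsettings Dimension is {configured}, but the model reports {actual}"
        )
    return configured
```

It could be called as `check_dimension(Path("VectorSearchApp/appsettings.json"), Path("VectorSearchApp/Models/custom-model/config.json"))` once the conversion has finished.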

Run the application:
```
cd VectorSearchApp
dotnet run
```

Output Files

After conversion, the following files will be created:

```
VectorSearchApp/Models/
├── custom-model/
│   ├── config.json
│   ├── model.onnx
│   ├── special_tokens_map.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── vocab.txt
└── address-embedding-model.onnx  (copy of model.onnx for easy access)
```
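
To confirm that the conversion produced everything listed above, a small check along these lines may help. It is only a sketch: the file list is copied from the tree above, and the helper name `missing_outputs` is hypothetical.

```python
from pathlib import Path
from typing import List

# Paths relative to VectorSearchApp/Models, taken from the listing above.
EXPECTED_FILES = (
    "custom-model/config.json",
    "custom-model/model.onnx",
    "custom-model/special_tokens_map.json",
    "custom-model/tokenizer.json",
    "custom-model/tokenizer_config.json",
    "custom-model/vocab.txt",
    "address-embedding-model.onnx",
)


def missing_outputs(models_dir: Path) -> List[str]:
    """Return the expected output files that are absent under models_dir."""
    return [rel for rel in EXPECTED_FILES if not (models_dir / rel).is_file()]
```

An empty list from `missing_outputs(Path("VectorSearchApp/Models"))` means every expected file is present. Any names returned point to a step that did not complete.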

Troubleshooting

CUDA/GPU Support

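If you want to use GPU acceleration during conversion, pass the CUDA execution provider when loading the model. This requires a CUDA-capable GPU and the onnxruntime-gpu package:

```
from optimum.onnxruntime import ORTModelForFeatureExtraction

model_id = "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3"

# export=True converts the downloaded PyTorch weights to ONNX on load.
model = ORTModelForFeatureExtraction.from_pretrained(
    model_id,
    export=True,
    provider="CUDAExecutionProvider",  # Use CUDA instead of CPU
)
```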
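
Requesting the CUDA provider on a machine without GPU support will fail, so it can be useful to check what ONNX Runtime reports first. The sketch below uses `onnxruntime.get_available_providers()` and falls back to the CPU provider. The helper name `pick_provider` is hypothetical.

```python
from typing import Optional, Sequence


def pick_provider(available: Optional[Sequence[str]] = None) -> str:
    """Prefer CUDA when ONNX Runtime reports it, otherwise fall back to CPU."""
    if available is None:
        # Imported lazily so the function can be exercised without onnxruntime.
        import onnxruntime

        available = onnxruntime.get_available_providers()
    if "CUDAExecutionProvider" in available:
        return "CUDAExecutionProvider"
    return "CPUExecutionProvider"
```

The result can be passed as the `provider` argument, for example `provider=pick_provider()`.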

Large Model Download

The first conversion may take several minutes as it downloads the full model (~90 MB) and tokenizer files.

Memory Requirements

Conversion requires approximately 4 GB of RAM. If you encounter memory issues, try closing other applications.
