Custom Model Conversion Guide

This document describes how to use a custom embedding model for address embeddings.

Model Source

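The custom model is available on HuggingFace under the model ID jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3.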

This model is a fine-tuned version of all-MiniLM-L6-v2 specifically trained on address data.

Converting to ONNX Format

The model is not published with a pre-converted ONNX export, so you need to convert it yourself using Python.

Prerequisites

Install the required packages:

pip install "optimum[exporters]" transformers torch
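
Before running the conversion, it can help to confirm that the packages are importable from the Python environment you intend to use. The following is a minimal sketch, not part of the project; the helper name missing_packages is hypothetical, and the import names are assumed to match the pip package names above.

```python
import importlib.util

# Import names assumed to correspond to the pip packages listed above.
REQUIRED = ("optimum", "transformers", "torch")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All conversion prerequisites are importable.")
```

An empty result only shows that the packages can be found; it does not check their versions.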

Conversion Steps

  1. Run the conversion script:

    cd VectorSearchApp/Models
    python download-convert-model.py
    

    This will:

    • Download the model from HuggingFace
    • Convert it to ONNX format using Optimum
    • Save the model to Models/custom-model/
    • Copy the main model file to Models/address-embedding-model.onnx
  2. Update configuration:

    Edit VectorSearchApp/appsettings.json:

    {
      "Embedding": {
        "ModelName": "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3",
        "Dimension": 384,
        "ApiToken": "",
        "UseLocalInference": true
      }
    }
    

    Or use the shorter alias:

    {
      "Embedding": {
        "ModelName": "custom-all-MiniLM-L6-v2-address",
        "Dimension": 384,
        "ApiToken": "",
        "UseLocalInference": true
      }
    }
    
  3. Run the application:

    cd VectorSearchApp
    dotnet run
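
The contents of download-convert-model.py are not reproduced in this guide. As a rough sketch of what the four actions listed under step 1 could look like, assuming the script uses Optimum's ORTModelForFeatureExtraction (as in the troubleshooting snippet) and that ONNX Runtime is installed, something like the following would work. The function names convert and copy_main_model are hypothetical, not taken from the actual script.

```python
import shutil
from pathlib import Path

# Model ID taken from the appsettings.json example in this guide.
MODEL_ID = "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3"


def copy_main_model(export_dir: Path, target: Path) -> Path:
    """Copy export_dir/model.onnx to target and return the target path."""
    source = export_dir / "model.onnx"
    if not source.is_file():
        raise FileNotFoundError(f"Expected exported model at {source}")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    return target


def convert(model_id: str = MODEL_ID, models_dir: Path = Path(".")) -> Path:
    """Download the model, export it to ONNX, save it, and copy the main file."""
    # Imported lazily so copy_main_model works without the heavy dependencies.
    from optimum.onnxruntime import ORTModelForFeatureExtraction
    from transformers import AutoTokenizer

    export_dir = models_dir / "custom-model"
    # export=True converts the downloaded PyTorch weights to ONNX.
    model = ORTModelForFeatureExtraction.from_pretrained(model_id, export=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model.save_pretrained(export_dir)
    tokenizer.save_pretrained(export_dir)
    return copy_main_model(export_dir, models_dir / "address-embedding-model.onnx")
```

Calling convert() from VectorSearchApp/Models would write to custom-model/ and address-embedding-model.onnx in that directory, matching the layout described under Output Files.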
    

Output Files

After conversion, the following files will be created:

VectorSearchApp/Models/
├── custom-model/
│   ├── config.json
│   ├── model.onnx
│   ├── special_tokens_map.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── vocab.txt
└── address-embedding-model.onnx  (copy of model.onnx for easy access)
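
To confirm that a conversion run produced everything in the tree above, a small check like the following may be useful. It is a sketch under the assumption that it is run from VectorSearchApp/Models; the helper name missing_outputs is hypothetical.

```python
from pathlib import Path

# Relative paths copied from the output tree above.
EXPECTED_FILES = (
    "custom-model/config.json",
    "custom-model/model.onnx",
    "custom-model/special_tokens_map.json",
    "custom-model/tokenizer.json",
    "custom-model/tokenizer_config.json",
    "custom-model/vocab.txt",
    "address-embedding-model.onnx",
)


def missing_outputs(models_dir: Path) -> list[str]:
    """Return the expected files that do not exist under models_dir."""
    return [rel for rel in EXPECTED_FILES if not (models_dir / rel).is_file()]


if __name__ == "__main__":
    missing = missing_outputs(Path("."))
    print("Missing:", missing if missing else "none")
```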

Troubleshooting

CUDA/GPU Support

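To use GPU acceleration during conversion, pass the CUDA execution provider when exporting the model (this assumes a CUDA-enabled ONNX Runtime build, such as the onnxruntime-gpu package, is installed):

from optimum.onnxruntime import ORTModelForFeatureExtraction

# Model ID from appsettings.json
model_id = "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3"

model = ORTModelForFeatureExtraction.from_pretrained(
    model_id,
    export=True,  # Convert the PyTorch weights to ONNX on load
    provider="CUDAExecutionProvider",  # Use CUDA instead of CPU
)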
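
Requesting the CUDA provider on a machine without a CUDA-enabled ONNX Runtime build will typically fail, so it may be safer to pick the provider at runtime. The sketch below assumes the onnxruntime package is installed; pick_provider and detect_provider are hypothetical helper names, while get_available_providers is the ONNX Runtime function that lists usable providers.

```python
def pick_provider(available):
    """Prefer CUDA when it is listed, otherwise fall back to CPU."""
    if "CUDAExecutionProvider" in available:
        return "CUDAExecutionProvider"
    return "CPUExecutionProvider"


def detect_provider():
    """Ask ONNX Runtime which providers are usable and choose one."""
    # Imported lazily so pick_provider can be used without onnxruntime.
    import onnxruntime

    return pick_provider(onnxruntime.get_available_providers())
```

The result of detect_provider() can then be passed as the provider argument in the snippet above.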

Large Model Download

The first conversion may take several minutes because it downloads the full model (~90 MB) and the tokenizer files.

Memory Requirements

Conversion requires approximately 4 GB of RAM. If you run out of memory, close other applications and run the conversion again.