# Custom Model Conversion Guide

This document describes how to use a custom embedding model for address embeddings.

## Model Source

The custom model is available at:

- **HuggingFace**: [jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3](https://huggingface.co/jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3)

This model is a fine-tuned version of all-MiniLM-L6-v2, trained specifically on address data.

## Converting to ONNX Format

The model is not distributed as a pre-converted ONNX file, so you need to convert it using Python.

### Prerequisites

Install the required packages:

```bash
pip install "optimum[exporters]" transformers torch
```

### Conversion Steps

1. **Run the conversion script**:

   ```bash
   cd VectorSearchApp/Models
   python download-convert-model.py
   ```

   This will:
   - Download the model from HuggingFace
   - Convert it to ONNX format using Optimum
   - Save the model to `Models/custom-model/`
   - Copy the main model file to `Models/address-embedding-model.onnx`

2. **Update configuration**:

   Edit `VectorSearchApp/appsettings.json`:

   ```json
   {
     "Embedding": {
       "ModelName": "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3",
       "Dimension": 384,
       "ApiToken": "",
       "UseLocalInference": true
     }
   }
   ```

   Or use the shorter alias:

   ```json
   {
     "Embedding": {
       "ModelName": "custom-all-MiniLM-L6-v2-address",
       "Dimension": 384,
       "ApiToken": "",
       "UseLocalInference": true
     }
   }
   ```

3. **Run the application**:

   ```bash
   cd VectorSearchApp
   dotnet run
   ```

## Output Files

After conversion, the following files will be created:

```
VectorSearchApp/Models/
├── custom-model/
│   ├── config.json
│   ├── model.onnx
│   ├── special_tokens_map.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── vocab.txt
└── address-embedding-model.onnx  (copy of model.onnx for easy access)
```

## Troubleshooting

### CUDA/GPU Support

To use GPU acceleration during conversion, pass the CUDA execution provider when loading the model. This requires the GPU build of ONNX Runtime (`onnxruntime-gpu`) and a working CUDA installation.

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction

# Model ID from the "Model Source" section
model_id = "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3"

model = ORTModelForFeatureExtraction.from_pretrained(
    model_id,
    export=True,  # Convert the checkpoint to ONNX while loading
    provider="CUDAExecutionProvider",  # Use CUDA instead of CPU
)
```

### Large Model Download

The first conversion may take several minutes because it downloads the full model (~90 MB) and the tokenizer files.

### Memory Requirements

Conversion requires approximately 4 GB of RAM. If you encounter memory issues, try closing other applications.
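
### Missing Output Files

If the application cannot find the model after conversion, it can help to confirm that every file listed under Output Files was actually written. The following is a minimal sketch rather than part of the project: the helper name `missing_output_files` is hypothetical, the expected file names are copied from the directory listing in this guide, and the path in the `__main__` block assumes the script is run from the repository root.

```python
from pathlib import Path

# File names copied from the "Output Files" listing in this guide.
EXPECTED_FILES = [
    "custom-model/config.json",
    "custom-model/model.onnx",
    "custom-model/special_tokens_map.json",
    "custom-model/tokenizer.json",
    "custom-model/tokenizer_config.json",
    "custom-model/vocab.txt",
    "address-embedding-model.onnx",
]


def missing_output_files(models_dir):
    """Return the expected files that are absent from models_dir, in listing order."""
    root = Path(models_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumption: the current directory is the repository root.
    missing = missing_output_files("VectorSearchApp/Models")
    if missing:
        print("Missing files:")
        for name in missing:
            print(f"  {name}")
    else:
        print("All expected output files are present.")
```

If any file is reported missing, rerunning the conversion script is probably the simplest fix, since it is the step that writes both `custom-model/` and the copied `address-embedding-model.onnx`.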