# Custom Model Conversion Guide
This document describes how to use a custom embedding model for address embeddings.
## Model Source
The custom model is available at:
- **HuggingFace**: [jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3](https://huggingface.co/jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3)
This model is a version of all-MiniLM-L6-v2 fine-tuned specifically on address data.
## Converting to ONNX Format
Since the model is not distributed in ONNX format, you need to convert it yourself using Python.
### Prerequisites
Install the required packages:
```bash
pip install optimum[exporters] transformers torch
```
### Conversion Steps
1. **Run the conversion script**:
```bash
cd VectorSearchApp/Models
python download-convert-model.py
```
This will:
- Download the model from HuggingFace
- Convert it to ONNX format using Optimum
- Save the model to `Models/custom-model/`
- Copy the main model file to `Models/address-embedding-model.onnx`
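The steps above can be sketched in Python. This is a hedged sketch of what `download-convert-model.py` presumably does, built from the `ORTModelForFeatureExtraction` API that the troubleshooting section below also uses; the actual script may differ. The heavy imports are kept inside the function so merely defining it does not require `optimum`/`transformers`:

```python
import shutil
from pathlib import Path

MODEL_ID = "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3"

def convert_model(output_dir: str = "custom-model") -> Path:
    """Download the model, export it to ONNX, and copy out the main model file."""
    from optimum.onnxruntime import ORTModelForFeatureExtraction
    from transformers import AutoTokenizer

    out = Path(output_dir)
    # Download from HuggingFace and export to ONNX in one step.
    model = ORTModelForFeatureExtraction.from_pretrained(MODEL_ID, export=True)
    model.save_pretrained(out)  # writes model.onnx and config.json
    # Save the tokenizer files (tokenizer.json, vocab.txt, ...) alongside the model.
    AutoTokenizer.from_pretrained(MODEL_ID).save_pretrained(out)
    # Copy the main model file next to the output directory for easy access.
    copy = out.parent / "address-embedding-model.onnx"
    shutil.copy(out / "model.onnx", copy)
    return copy
```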
2. **Update configuration**:
Edit `VectorSearchApp/appsettings.json`:
```json
{
  "Embedding": {
    "ModelName": "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3",
    "Dimension": 384,
    "ApiToken": "",
    "UseLocalInference": true
  }
}
```
Or use the shorter alias:
```json
{
  "Embedding": {
    "ModelName": "custom-all-MiniLM-L6-v2-address",
    "Dimension": 384,
    "ApiToken": "",
    "UseLocalInference": true
  }
}
```
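The shorter alias presumably maps to the full HuggingFace model id somewhere inside the application; a minimal Python sketch of such a lookup, with a hypothetical alias table (the real mapping lives in the app's embedding configuration and may differ):

```python
# Hypothetical alias table mapping short names to full HuggingFace ids.
MODEL_ALIASES = {
    "custom-all-MiniLM-L6-v2-address":
        "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3",
}

def resolve_model_name(name: str) -> str:
    """Return the full model id for a known alias, or the name unchanged."""
    return MODEL_ALIASES.get(name, name)
```

Either spelling in `appsettings.json` would then resolve to the same underlying model.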
3. **Run the application**:
```bash
cd VectorSearchApp
dotnet run
```
## Output Files
After conversion, the following files will be created:
```
VectorSearchApp/Models/
├── custom-model/
│   ├── config.json
│   ├── model.onnx
│   ├── special_tokens_map.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── vocab.txt
└── address-embedding-model.onnx   (copy of model.onnx for easy access)
```
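After the conversion finishes, you can sanity-check that every expected file was produced before running the application. A small stdlib-only sketch, with the file names taken from the tree above:

```python
from pathlib import Path

# Expected conversion outputs, relative to the Models/ directory.
EXPECTED_FILES = [
    "custom-model/config.json",
    "custom-model/model.onnx",
    "custom-model/special_tokens_map.json",
    "custom-model/tokenizer.json",
    "custom-model/tokenizer_config.json",
    "custom-model/vocab.txt",
    "address-embedding-model.onnx",
]

def missing_outputs(models_dir: str) -> list[str]:
    """Return the expected conversion outputs missing under models_dir."""
    root = Path(models_dir)
    return [f for f in EXPECTED_FILES if not (root / f).exists()]
```

An empty return value means the conversion produced everything the application needs.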
## Troubleshooting
### CUDA/GPU Support
If you want to use GPU acceleration during conversion:
```python
from optimum.onnxruntime import ORTModelForFeatureExtraction

model_id = "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3"

model = ORTModelForFeatureExtraction.from_pretrained(
    model_id,
    export=True,
    provider="CUDAExecutionProvider",  # use CUDA instead of the default CPU provider
)
```
### Large Model Download
The first conversion may take several minutes as it downloads the full model (~90MB) and tokenizer files.
### Memory Requirements
Conversion requires approximately 4GB of RAM. If you encounter memory issues, try closing other applications.