# Custom Model Conversion Guide
This document describes how to use a custom embedding model for address embeddings.
## Model Source
The custom model is available at:
- **HuggingFace**: [jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3](https://huggingface.co/jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3)
This model is a version of all-MiniLM-L6-v2 fine-tuned specifically on address data.
## Converting to ONNX Format
Since the model is not distributed in ONNX format, you need to convert it yourself using Python.
### Prerequisites
Install the required packages:
```bash
pip install optimum[exporters] transformers torch
```
### Conversion Steps
1. **Run the conversion script**:
```bash
cd VectorSearchApp/Models
python download-convert-model.py
```
This will:
- Download the model from HuggingFace
- Convert it to ONNX format using Optimum
- Save the model to `Models/custom-model/`
- Copy the main model file to `Models/address-embedding-model.onnx`
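The steps above can be sketched in Python. This is a hedged sketch of what `download-convert-model.py` presumably does, built from the `ORTModelForFeatureExtraction` API that the troubleshooting section below also uses; the actual script may differ. The heavy imports are kept inside the function so merely defining it does not require `optimum`/`transformers`:

```python
import shutil
from pathlib import Path

MODEL_ID = "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3"

def convert_model(output_dir: str = "custom-model") -> Path:
    """Download the model, export it to ONNX, and copy out the main model file."""
    from optimum.onnxruntime import ORTModelForFeatureExtraction
    from transformers import AutoTokenizer

    out = Path(output_dir)
    # Download from HuggingFace and export to ONNX in one step.
    model = ORTModelForFeatureExtraction.from_pretrained(MODEL_ID, export=True)
    model.save_pretrained(out)  # writes model.onnx and config.json
    # Save the tokenizer files (tokenizer.json, vocab.txt, ...) alongside the model.
    AutoTokenizer.from_pretrained(MODEL_ID).save_pretrained(out)
    # Copy the main model file next to the output directory for easy access.
    copy = out.parent / "address-embedding-model.onnx"
    shutil.copy(out / "model.onnx", copy)
    return copy
```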
2. **Update configuration**:
Edit `VectorSearchApp/appsettings.json`:
```json
{
  "Embedding": {
    "ModelName": "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3",
    "Dimension": 384,
    "ApiToken": "",
    "UseLocalInference": true
  }
}
```
Or use the shorter alias:
```json
{
  "Embedding": {
    "ModelName": "custom-all-MiniLM-L6-v2-address",
    "Dimension": 384,
    "ApiToken": "",
    "UseLocalInference": true
  }
}
```
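The shorter alias presumably maps to the full HuggingFace model id somewhere inside the application; a minimal Python sketch of such a lookup, with a hypothetical alias table (the real mapping lives in the app's embedding configuration and may differ):

```python
# Hypothetical alias table mapping short names to full HuggingFace ids.
MODEL_ALIASES = {
    "custom-all-MiniLM-L6-v2-address":
        "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3",
}

def resolve_model_name(name: str) -> str:
    """Return the full model id for a known alias, or the name unchanged."""
    return MODEL_ALIASES.get(name, name)
```

Either spelling in `appsettings.json` would then resolve to the same underlying model.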
3. **Run the application**:
```bash
cd VectorSearchApp
dotnet run
```
## Output Files
After conversion, the following files will be created:
```
VectorSearchApp/Models/
├── custom-model/
│   ├── config.json
│   ├── model.onnx
│   ├── special_tokens_map.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── vocab.txt
└── address-embedding-model.onnx   (copy of model.onnx for easy access)
```
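After the conversion finishes, you can sanity-check that every expected file was produced before running the application. A small stdlib-only sketch, with the file names taken from the tree above:

```python
from pathlib import Path

# Expected conversion outputs, relative to the Models/ directory.
EXPECTED_FILES = [
    "custom-model/config.json",
    "custom-model/model.onnx",
    "custom-model/special_tokens_map.json",
    "custom-model/tokenizer.json",
    "custom-model/tokenizer_config.json",
    "custom-model/vocab.txt",
    "address-embedding-model.onnx",
]

def missing_outputs(models_dir: str) -> list[str]:
    """Return the expected conversion outputs missing under models_dir."""
    root = Path(models_dir)
    return [f for f in EXPECTED_FILES if not (root / f).exists()]
```

An empty return value means the conversion produced everything the application needs.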
## Troubleshooting
### CUDA/GPU Support
If you want to use GPU acceleration during conversion:
```python
from optimum.onnxruntime import ORTModelForFeatureExtraction

model_id = "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3"

model = ORTModelForFeatureExtraction.from_pretrained(
    model_id,
    export=True,
    provider="CUDAExecutionProvider",  # use CUDA instead of the default CPU provider
)
```
### Large Model Download
The first conversion may take several minutes as it downloads the full model (~90MB) and tokenizer files.
### Memory Requirements
Conversion requires approximately 4GB of RAM. If you encounter memory issues, try closing other applications.