# Custom Model Conversion Guide

This document describes how to use a custom embedding model for address embeddings.

## Model Source

The custom model is available at:

- **HuggingFace**: [jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3](https://huggingface.co/jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3)

This model is all-MiniLM-L6-v2 fine-tuned specifically on address data.

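If you want to try the model before converting it, a quick check with the `sentence-transformers` package works. This is optional and not part of the conversion; the sample addresses and the cosine-similarity helper below are purely illustrative:

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    # Downloads the model from HuggingFace on first use (~90 MB).
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(
        "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3"
    )
    a, b = model.encode(["123 Main St, Springfield", "123 Main Street, Springfield"])
    print(f"dimension: {a.shape[0]}")  # MiniLM-L6 models produce 384-dim vectors
    print(f"similarity: {cosine_similarity(a, b):.3f}")
```

Near-identical addresses should score close to 1.0, which is a quick way to confirm the fine-tuning behaves sensibly on your data.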
## Converting to ONNX Format

Since the model doesn't ship with a pre-converted ONNX file, you need to convert it yourself using Python.

### Prerequisites

Install the required packages:

```bash
pip install optimum[exporters] transformers torch
```

### Conversion Steps

1. **Run the conversion script**:

   ```bash
   cd VectorSearchApp/Models
   python download-convert-model.py
   ```

   This will:

   - Download the model from HuggingFace
   - Convert it to ONNX format using Optimum
   - Save the model to `Models/custom-model/`
   - Copy the main model file to `Models/address-embedding-model.onnx`

2. **Update configuration**:

   Edit `VectorSearchApp/appsettings.json`:

   ```json
   {
     "Embedding": {
       "ModelName": "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3",
       "Dimension": 384,
       "ApiToken": "",
       "UseLocalInference": true
     }
   }
   ```

   Or use the shorter alias:

   ```json
   {
     "Embedding": {
       "ModelName": "custom-all-MiniLM-L6-v2-address",
       "Dimension": 384,
       "ApiToken": "",
       "UseLocalInference": true
     }
   }
   ```

3. **Run the application**:

   ```bash
   cd VectorSearchApp
   dotnet run
   ```

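The conversion script itself is not reproduced here, but its core, assuming `download-convert-model.py` uses Optimum's `ORTModelForFeatureExtraction` export as described in step 1, might be sketched as follows. The path helper is a hypothetical convenience, not part of the real script:

```python
import shutil
from pathlib import Path

MODEL_ID = "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3"


def output_paths(models_dir: str) -> tuple[Path, Path]:
    """Return the export directory and the convenience copy of the ONNX file."""
    base = Path(models_dir)
    return base / "custom-model", base / "address-embedding-model.onnx"


if __name__ == "__main__":
    # Heavy part: downloads the model from HuggingFace and exports it to ONNX.
    from optimum.onnxruntime import ORTModelForFeatureExtraction
    from transformers import AutoTokenizer

    export_dir, onnx_copy = output_paths(".")
    model = ORTModelForFeatureExtraction.from_pretrained(MODEL_ID, export=True)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model.save_pretrained(export_dir)      # writes model.onnx and config.json
    tokenizer.save_pretrained(export_dir)  # writes tokenizer and vocab files
    shutil.copy(export_dir / "model.onnx", onnx_copy)
```

Exporting with `export=True` converts the PyTorch checkpoint on the fly; saving the tokenizer alongside the model is what produces the `tokenizer.json` and `vocab.txt` files listed below.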
## Output Files

After conversion, the following files will be created:

```
VectorSearchApp/Models/
├── custom-model/
│   ├── config.json
│   ├── model.onnx
│   ├── special_tokens_map.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── vocab.txt
└── address-embedding-model.onnx (copy of model.onnx for easy access)
```

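To verify that the exported files load and produce embeddings of the expected dimension, you can run the model directly with `onnxruntime`. This is a sketch: it assumes the file layout above and applies the mean pooling typical of MiniLM-style sentence models, which may differ from how the application itself pools:

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len).
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


if __name__ == "__main__":
    import onnxruntime as ort
    from transformers import AutoTokenizer

    session = ort.InferenceSession("VectorSearchApp/Models/address-embedding-model.onnx")
    tokenizer = AutoTokenizer.from_pretrained("VectorSearchApp/Models/custom-model")

    encoded = tokenizer(["123 Main St, Springfield"], return_tensors="np")
    # Feed only the inputs the ONNX graph actually declares.
    input_names = {i.name for i in session.get_inputs()}
    feeds = {k: v for k, v in encoded.items() if k in input_names}
    outputs = session.run(None, feeds)

    embedding = mean_pool(outputs[0], encoded["attention_mask"])
    print(embedding.shape)  # should match the configured Dimension of 384
```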
## Troubleshooting

### CUDA/GPU Support

If you want to use GPU acceleration during conversion, pass a CUDA execution provider when exporting (this requires the `onnxruntime-gpu` package):

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction

model_id = "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3"

model = ORTModelForFeatureExtraction.from_pretrained(
    model_id,
    export=True,
    provider="CUDAExecutionProvider",  # use CUDA instead of the default CPU provider
)
```

### Large Model Download

The first conversion may take several minutes as it downloads the full model (~90 MB) and tokenizer files.

### Memory Requirements

Conversion requires approximately 4 GB of RAM. If you encounter memory issues, try closing other applications.