Committing too many files but the app runs possibly with a new model.

VectorSearchApp/Models/CUSTOM_MODEL_README.md (new file, 112 lines)
# Custom Model Conversion Guide

This document describes how to use a custom embedding model for address embeddings.

## Model Source

The custom model is available at:

- **HuggingFace**: [jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3](https://huggingface.co/jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3)

This model is a fine-tuned version of all-MiniLM-L6-v2, trained specifically on address data.
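For context on how such embeddings are typically used: this model produces 384-dimensional vectors, and address similarity is usually scored with cosine similarity. A minimal pure-Python sketch (illustrative only, not the application's actual search code):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors; real embeddings from this model have 384 dimensions.
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 0.0, 1.0]
v3 = [0.0, 1.0, 0.0]
print(cosine_similarity(v1, v2))  # identical vectors -> 1.0
print(cosine_similarity(v1, v3))  # orthogonal vectors -> 0.0
```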
## Converting to ONNX Format

Since the model is not published with a pre-converted ONNX version, you need to convert it yourself using Python.

### Prerequisites

Install the required packages (the extras specifier is quoted so it also works in shells like zsh):

```bash
pip install "optimum[exporters]" transformers torch
```
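Before running the conversion, you can confirm the packages are importable. A small helper (not part of the repository, shown only as a convenience check):

```python
from importlib.util import find_spec

def missing_packages(names: list[str]) -> list[str]:
    """Return the subset of package names that cannot be imported."""
    return [n for n in names if find_spec(n) is None]

# The conversion needs optimum, transformers, and torch.
print(missing_packages(["optimum", "transformers", "torch"]))  # [] when everything is installed
```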
### Conversion Steps

1. **Run the conversion script**:

   ```bash
   cd VectorSearchApp/Models
   python download-convert-model.py
   ```

   This will:

   - Download the model from HuggingFace
   - Convert it to ONNX format using Optimum
   - Save the converted model to `Models/custom-model/`
   - Copy the main model file to `Models/address-embedding-model.onnx`
2. **Update configuration**:

   Edit `VectorSearchApp/appsettings.json`:

   ```json
   {
     "Embedding": {
       "ModelName": "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3",
       "Dimension": 384,
       "ApiToken": "",
       "UseLocalInference": true
     }
   }
   ```

   Or use the shorter alias:

   ```json
   {
     "Embedding": {
       "ModelName": "custom-all-MiniLM-L6-v2-address",
       "Dimension": 384,
       "ApiToken": "",
       "UseLocalInference": true
     }
   }
   ```
3. **Run the application**:

   ```bash
   cd VectorSearchApp
   dotnet run
   ```
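The `Embedding` section edited in step 2 can be sanity-checked before launching the app. A short sketch (the key names mirror the JSON above; the validation helper itself is hypothetical and not part of the repository):

```python
import json

REQUIRED_KEYS = {"ModelName", "Dimension", "ApiToken", "UseLocalInference"}

def validate_embedding_config(raw: str) -> dict:
    """Parse appsettings-style JSON and check the Embedding section."""
    settings = json.loads(raw)
    embedding = settings["Embedding"]
    missing = REQUIRED_KEYS - embedding.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if embedding["Dimension"] != 384:
        raise ValueError("this model produces 384-dimensional embeddings")
    return embedding

sample = """
{
  "Embedding": {
    "ModelName": "custom-all-MiniLM-L6-v2-address",
    "Dimension": 384,
    "ApiToken": "",
    "UseLocalInference": true
  }
}
"""
cfg = validate_embedding_config(sample)
print(cfg["ModelName"])  # custom-all-MiniLM-L6-v2-address
```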
## Output Files

After conversion, the following files will be created:

```
VectorSearchApp/Models/
├── custom-model/
│   ├── config.json
│   ├── model.onnx
│   ├── special_tokens_map.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── vocab.txt
└── address-embedding-model.onnx (copy of model.onnx for easy access)
```
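After conversion you can check that everything in the tree above was actually produced. A small helper using `pathlib` (a convenience sketch, not part of the repository):

```python
from pathlib import Path

EXPECTED = [
    "custom-model/config.json",
    "custom-model/model.onnx",
    "custom-model/special_tokens_map.json",
    "custom-model/tokenizer.json",
    "custom-model/tokenizer_config.json",
    "custom-model/vocab.txt",
    "address-embedding-model.onnx",
]

def missing_outputs(models_dir: str) -> list[str]:
    """Return expected conversion outputs that do not exist under models_dir."""
    root = Path(models_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]

# After a successful conversion this should print [].
print(missing_outputs("VectorSearchApp/Models"))
```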
## Troubleshooting

### CUDA/GPU Support

If you want to use GPU acceleration during conversion:

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction

model_id = "jarredparrett/all-MiniLM-L6-v2_tuned_on_deepparse_address_mutations_comb_3"

model = ORTModelForFeatureExtraction.from_pretrained(
    model_id,
    export=True,
    provider="CUDAExecutionProvider",  # use CUDA instead of the default CPU provider
)
```
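If CUDA is not available on the machine, conversion still works: the CPU provider is simply ONNX Runtime's default. A hypothetical helper (not part of the repository) that picks the best execution provider from the available list:

```python
# Preference order: GPU first, CPU as the universal fallback.
PREFERRED = ["CUDAExecutionProvider", "CPUExecutionProvider"]

def pick_provider(available: list[str]) -> str:
    """Return the most preferred execution provider that is available."""
    for provider in PREFERRED:
        if provider in available:
            return provider
    raise RuntimeError("no supported execution provider available")

# With onnxruntime installed, `available` would come from
# onnxruntime.get_available_providers().
print(pick_provider(["CPUExecutionProvider"]))  # CPUExecutionProvider
```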
### Large Model Download

The first conversion may take several minutes because it downloads the full model (~90 MB) and tokenizer files from HuggingFace.

### Memory Requirements

Conversion requires approximately 4 GB of RAM. If you run into memory issues, close other applications before running the script.