Vector Quantization

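Atlas Vector Search supports automatic quantization of your float vector embeddings (both 32-bit and 64-bit). It also supports ingesting and indexing your pre-quantized scalar and binary vectors from certain embedding models.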

About Quantization

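Quantization is the process of shrinking full-fidelity vectors into fewer bits. It reduces the amount of main memory required to store each vector in an Atlas Vector Search index by indexing the reduced representation vectors instead. This allows for storage of more vectors or vectors with higher dimensions. Therefore, quantization reduces resource consumption and improves speed. We recommend quantization for applications with a large number of vectors, such as over 100,000.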

Scalar Quantization

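Scalar quantization involves first identifying the minimum and maximum values for each dimension of the indexed vectors to establish a range of values for that dimension. Then, the range is divided into equally sized intervals or bins. Finally, each float value is mapped to a bin, which converts the continuous float values into discrete integers. In Atlas Vector Search, this quantization reduces the RAM cost of the vector embeddings to about one fourth (1/3.75) of the pre-quantization cost.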
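The following Go sketch illustrates the three steps above on a small set of float32 vectors. It assumes 256 equal-width bins per dimension, shifted into the int8 range, and the function name `scalarQuantize` is hypothetical; Atlas Vector Search's internal implementation may differ.

```go
package main

import "fmt"

// scalarQuantize maps each dimension onto 256 equal-width bins between that
// dimension's min and max, and stores the bin index shifted into [-128, 127].
// Hypothetical helper for illustration only.
func scalarQuantize(vectors [][]float32) [][]int8 {
	if len(vectors) == 0 {
		return nil
	}
	dims := len(vectors[0])
	mins := make([]float32, dims)
	maxs := make([]float32, dims)
	copy(mins, vectors[0])
	copy(maxs, vectors[0])

	// Step 1: find the value range of every dimension.
	for _, v := range vectors[1:] {
		for d, x := range v {
			if x < mins[d] {
				mins[d] = x
			}
			if x > maxs[d] {
				maxs[d] = x
			}
		}
	}

	// Steps 2 and 3: split each range into equal bins and map values to bins.
	const bins = 256
	out := make([][]int8, len(vectors))
	for i, v := range vectors {
		q := make([]int8, dims)
		for d, x := range v {
			width := (maxs[d] - mins[d]) / bins
			bin := 0
			if width > 0 {
				bin = int((x - mins[d]) / width)
				if bin > bins-1 {
					bin = bins - 1 // the maximum value falls into the last bin
				}
			}
			q[d] = int8(bin - 128)
		}
		out[i] = q
	}
	return out
}

func main() {
	vectors := [][]float32{
		{-1.0, 0.0},
		{0.0, 0.5},
		{1.0, 1.0},
	}
	fmt.Println(scalarQuantize(vectors)) // [[-128 -128] [0 0] [127 127]]
}
```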

Binary Quantization

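Binary quantization involves assuming a midpoint of 0 for each dimension, which is typically appropriate for embeddings normalized to length 1, such as OpenAI's text-embedding-3-large. Then, each value in the vector is compared to the midpoint and assigned a binary value of 1 if it's greater than the midpoint and a binary value of 0 if it's less than or equal to the midpoint. In Atlas Vector Search, this quantization reduces the RAM cost of the vector embeddings to one twenty-fourth (1/24) of the pre-quantization cost. The reduction isn't 1/32 because the data structure containing the Hierarchical Navigable Small Worlds graph itself, separate from the vector values, isn't compressed.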
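A minimal Go sketch of this thresholding follows. It assumes the bits are packed eight per byte with the first dimension in the most significant bit, an ordering chosen for the example; `binaryQuantize` is a hypothetical helper, not an Atlas API.

```go
package main

import "fmt"

// binaryQuantize compares every value with a midpoint of 0 and packs the
// resulting bits eight per byte, most significant bit first.
// Values greater than 0 become 1; values less than or equal to 0 become 0.
// Hypothetical helper for illustration only.
func binaryQuantize(v []float32) []byte {
	packed := make([]byte, (len(v)+7)/8)
	for i, x := range v {
		if x > 0 {
			packed[i/8] |= 1 << (7 - uint(i%8))
		}
	}
	return packed
}

func main() {
	v := []float32{0.12, -0.40, 0.0, 0.33, -0.05, 0.91, -0.27, 0.08}
	fmt.Printf("%08b\n", binaryQuantize(v)) // [10010101]
}
```

Eight float32 values (32 bytes) collapse into a single byte, which is the 32x reduction in vector values that the text refers to.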

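When you run a query, Atlas Vector Search converts the float values of the query vector into a binary vector by using the same midpoint, for efficient comparison between the query vector and the indexed binary vectors. It then rescores the candidates identified by the binary comparison by reevaluating them with the original float values associated with those results from the binary index, which further refines the results. The full-fidelity vectors are stored in their own data structure on disk and are referenced only during rescoring when you configure binary quantization, or when you perform exact search against either binary or scalar quantized vectors.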
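The two-stage search can be sketched in Go as follows. This is a simplified illustration under stated assumptions: a linear scan stands in for the Hierarchical Navigable Small Worlds graph, Hamming distance is used for the binary comparison, and dot product is used for rescoring. The type and function names are hypothetical; `numCandidates` and `limit` only mirror the `$vectorSearch` parameters of the same names.

```go
package main

import (
	"fmt"
	"math/bits"
	"sort"
)

// doc is a hypothetical in-memory record holding the packed binary vector
// (first pass) and the full-fidelity vector (rescoring).
type doc struct {
	id     string
	packed []byte
	full   []float32
}

// quantize packs each value into one bit using a midpoint of 0, MSB first.
func quantize(v []float32) []byte {
	out := make([]byte, (len(v)+7)/8)
	for i, x := range v {
		if x > 0 {
			out[i/8] |= 1 << (7 - uint(i%8))
		}
	}
	return out
}

func newDoc(id string, full []float32) doc {
	return doc{id: id, packed: quantize(full), full: full}
}

// hamming counts the bits that differ between two packed vectors.
func hamming(a, b []byte) int {
	n := 0
	for i := range a {
		n += bits.OnesCount8(a[i] ^ b[i])
	}
	return n
}

// dot computes the dot product of two full-fidelity vectors.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// search keeps the numCandidates docs closest to the query by Hamming
// distance, rescores them with the float dot product, and returns the IDs
// of the top limit docs.
func search(query []float32, docs []doc, numCandidates, limit int) []string {
	qBits := quantize(query)
	cands := append([]doc(nil), docs...)

	// First pass: cheap comparison on the binary vectors.
	sort.SliceStable(cands, func(i, j int) bool {
		return hamming(qBits, cands[i].packed) < hamming(qBits, cands[j].packed)
	})
	if len(cands) > numCandidates {
		cands = cands[:numCandidates]
	}

	// Second pass: rescore the candidates with the original float values.
	sort.SliceStable(cands, func(i, j int) bool {
		return dot(query, cands[i].full) > dot(query, cands[j].full)
	})
	if len(cands) > limit {
		cands = cands[:limit]
	}

	ids := make([]string, len(cands))
	for i, c := range cands {
		ids[i] = c.id
	}
	return ids
}

func main() {
	docs := []doc{
		newDoc("b", []float32{0.1, 0.9, -0.1, 0.9}),
		newDoc("a", []float32{0.8, 0.2, -0.4, 0.1}),
		newDoc("c", []float32{-0.7, -0.2, 0.6, -0.3}),
	}
	query := []float32{0.9, 0.1, -0.5, 0.2}
	fmt.Println(search(query, docs, 2, 2)) // [a b]
}
```

Documents `a` and `b` have identical bit patterns, so the binary pass alone can't separate them; rescoring with the original float values decides the order.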

Tip

See also: What Is Vector Quantization?

Requirements

The following table shows the requirements for automatically quantizing vectors and for ingesting pre-quantized vectors.

Note

Atlas stores all float values internally as the double data type. Therefore, both 32-bit and 64-bit embeddings are compatible with automatic quantization without conversion.

Requirement	For `int1` Ingestion	For `int8` Ingestion	For Automatic Scalar Quantization	For Automatic Binary Quantization
Requires index definition settings	No	No	Yes	Yes
Requires BSON `binData` format	Yes	Yes	No	No
Storage on mongod	`binData(int1)`	`binData(int8)`	`binData(float32)` or `array(double)`	`binData(float32)` or `array(double)`
Supported similarity methods	`euclidean`	`cosine`, `euclidean`, `dotProduct`	`cosine`, `euclidean`, `dotProduct`	`cosine`, `euclidean`, `dotProduct`
Supported number of dimensions	Multiple of 8	1 to 8192	1 to 8192	Multiple of 8
Supports approximate nearest neighbor (ANN) and exact nearest neighbor (ENN) search	Yes	Yes	Yes	Yes
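As a quick way to apply the table when you plan an index, the following Go sketch checks a field's similarity method and dimension count against the rules above. The `kind` labels and the `checkVectorField` function are hypothetical, and the sketch covers only the similarity and dimension rows.

```go
package main

import "fmt"

// checkVectorField applies the similarity and dimension rules from the
// requirements table. kind is one of "int1", "int8", "scalar", or "binary"
// (hypothetical labels for the four table columns).
func checkVectorField(kind, similarity string, dims int) error {
	switch kind {
	case "int1":
		if similarity != "euclidean" {
			return fmt.Errorf("int1 vectors support only euclidean similarity, got %q", similarity)
		}
	case "int8", "scalar", "binary":
		if similarity != "cosine" && similarity != "euclidean" && similarity != "dotProduct" {
			return fmt.Errorf("unsupported similarity %q", similarity)
		}
	default:
		return fmt.Errorf("unknown kind %q", kind)
	}

	switch kind {
	case "int1", "binary":
		if dims <= 0 || dims%8 != 0 {
			return fmt.Errorf("%s requires a multiple of 8 dimensions, got %d", kind, dims)
		}
	default:
		if dims < 1 || dims > 8192 {
			return fmt.Errorf("%s requires 1 to 8192 dimensions, got %d", kind, dims)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkVectorField("int1", "euclidean", 1024))  // <nil>
	fmt.Println(checkVectorField("int1", "dotProduct", 1024)) // prints an error
}
```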

How to Enable Automatic Quantization of Vectors

You can configure Atlas Vector Search to automatically quantize float vector embeddings in your collection to reduced representation types, such as int8 (scalar) and binary, in your vector indexes.

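To set or change the quantization type, specify a quantization field value of either scalar or binary in your index definition. This triggers an index rebuild, just like any other index definition change. The specified quantization type applies to all indexed vectors and to query vectors at query time.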
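For example, the following Go sketch builds an index definition with the quantization field set, using only the standard library to print the JSON. The path `embedding`, 1024 dimensions, and `dotProduct` similarity are placeholder assumptions, and `quantizedIndexDefinition` is a hypothetical helper. The same structure could be expressed as a `bson.D` for `mongo.SearchIndexModel.Definition` in the Go driver, or pasted into the Atlas UI JSON editor.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// vectorField mirrors one entry in the "fields" array of an
// Atlas Vector Search index definition.
type vectorField struct {
	Type          string `json:"type"`
	Path          string `json:"path"`
	NumDimensions int    `json:"numDimensions"`
	Similarity    string `json:"similarity"`
	Quantization  string `json:"quantization,omitempty"`
}

type indexDefinition struct {
	Fields []vectorField `json:"fields"`
}

// quantizedIndexDefinition returns a definition that asks Atlas Vector Search
// to quantize the float vectors at path automatically ("scalar" or "binary").
func quantizedIndexDefinition(path string, dims int, similarity, quantization string) ([]byte, error) {
	if quantization != "scalar" && quantization != "binary" {
		return nil, fmt.Errorf("unsupported quantization %q", quantization)
	}
	def := indexDefinition{Fields: []vectorField{{
		Type:          "vector",
		Path:          path,
		NumDimensions: dims,
		Similarity:    similarity,
		Quantization:  quantization,
	}}}
	return json.MarshalIndent(def, "", "  ")
}

func main() {
	def, err := quantizedIndexDefinition("embedding", 1024, "dotProduct", "binary")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(def))
}
```

Changing `"binary"` to `"scalar"` in an existing definition and saving it would trigger the index rebuild described above.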

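For most embedding models, we recommend binary quantization with rescoring. If you want to use lower-dimension models that are not QAT (quantization-aware trained), use scalar quantization, which has less representational loss and therefore incurs less loss of representational capacity.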

Benefits

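Atlas Vector Search provides native capabilities for scalar quantization as well as binary quantization with rescoring. Automatic quantization increases scalability and cost savings for your applications by reducing the computational resources needed for efficient processing of your vectors. Automatic quantization reduces the RAM for mongot by 3.75x for scalar and by 24x for binary; the vector values shrink by 4x and 32x respectively, but the Hierarchical Navigable Small Worlds graph itself does not shrink. This improves performance, even at the highest volume and scale.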
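The following back-of-the-envelope Go sketch shows where the 4x and 32x figures for the vector values come from, assuming a 4-byte float32 baseline per dimension. It deliberately excludes the Hierarchical Navigable Small Worlds graph, which is why the overall RAM reductions quoted above (3.75x and 24x) are smaller. The function name and the workload size are hypothetical.

```go
package main

import "fmt"

// vectorValueBytes estimates the memory used by the vector values alone
// (no HNSW graph) for n vectors with dims dimensions. It assumes 4 bytes per
// dimension for float32, 1 byte per dimension after scalar (int8)
// quantization, and 1 bit per dimension, rounded up to whole bytes per
// vector, after binary quantization.
func vectorValueBytes(n, dims int64) (float32Bytes, scalarBytes, binaryBytes int64) {
	float32Bytes = n * dims * 4
	scalarBytes = n * dims
	binaryBytes = n * ((dims + 7) / 8)
	return float32Bytes, scalarBytes, binaryBytes
}

func main() {
	// Hypothetical workload: one million 1024-dimensional embeddings.
	f, s, b := vectorValueBytes(1_000_000, 1024)
	fmt.Printf("float32: %d bytes\n", f)
	fmt.Printf("scalar:  %d bytes (%.0fx smaller)\n", s, float64(f)/float64(s))
	fmt.Printf("binary:  %d bytes (%.0fx smaller)\n", b, float64(f)/float64(b))
}
```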

Use Cases

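We recommend automatic quantization if you have a large number of full-fidelity vectors, typically over 100,000 vectors. After quantization, you index reduced representation vectors without compromising accuracy when retrieving vectors.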

Procedure

To enable automatic quantization:

Specify the type of quantization you want in your Atlas Vector Search index.

In a new or existing Atlas Vector Search index, specify one of the following quantization types in the fields.quantization field of your index definition:

scalar: produces byte vectors from float input vectors.
binary: produces bit vectors from float input vectors.

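If you specify automatic quantization on data that is not an array of float values, Atlas Vector Search silently ignores that vector instead of indexing it, and those vectors are skipped. Because Atlas stores both 32-bit and 64-bit float values internally as the double type, embeddings from models that output either precision work with automatic quantization.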

Create or update the index.

The index should take about one minute to build. While it builds, the index is in an initial sync state. When it finishes building, you can start querying the data in your collection.

The specified quantization type applies to all indexed vectors and to query vectors at query time.

Considerations

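When you view your quantized index in the Atlas UI, your index size might appear larger than an index without quantization. This is because the Size metric represents the total data stored, which includes the Hierarchical Navigable Small Worlds graph (in memory), the quantized vectors (in memory), and the full-fidelity vectors (on disk). To estimate the amount of memory used by the index at query time, refer to the Required Memory metric.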

How to Ingest Pre-Quantized Vectors

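Atlas Vector Search also supports ingesting and indexing scalar and binary quantized vectors from certain embedding models. If you don't already have quantized vectors, you can convert your embeddings to BSON BinData vectors with the float32, int1, or int8 subtype.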

Use Cases

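We recommend ingesting quantized BSON binData vectors for the following use cases:

You need to index quantized vector output from embedding models.
You have a large number of float vectors and want to reduce the storage and WiredTiger footprint (such as disk and memory usage) in mongod.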

Benefits

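BinData is a BSON data type that stores binary data. It compresses your vector embeddings and requires about three times less disk space in your cluster compared to embeddings that use a standard float32 array. To learn more, see Vector Compression.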
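To see where the roughly threefold saving comes from, the following Go sketch counts the bytes BSON uses for the value of a 1024-dimensional embedding stored two ways: as an array of doubles and as a binary subtype 9 (vector) value with float32 elements. It counts only the field value, because the field name and element type byte are the same in both cases; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// bsonDoubleArrayBytes returns the encoded size of a BSON array of dims
// doubles: int32 length + one element per value + trailing 0x00. Each
// element is a type byte, the array index as a NUL-terminated string key
// ("0", "1", ...), and an 8-byte double.
func bsonDoubleArrayBytes(dims int) int {
	size := 4 + 1
	for i := 0; i < dims; i++ {
		size += 1 + len(strconv.Itoa(i)) + 1 + 8
	}
	return size
}

// bsonFloat32VectorBytes returns the encoded size of a BSON binary value
// (subtype 9) holding a float32 vector: int32 length + subtype byte +
// 2-byte vector header (dtype, padding) + 4 bytes per value.
func bsonFloat32VectorBytes(dims int) int {
	return 4 + 1 + 2 + 4*dims
}

func main() {
	a := bsonDoubleArrayBytes(1024)
	v := bsonFloat32VectorBytes(1024)
	fmt.Printf("array of doubles: %d bytes\n", a) // 13231
	fmt.Printf("binData float32:  %d bytes\n", v) // 4103
	fmt.Printf("ratio: %.2fx\n", float64(a)/float64(v)) // 3.22x
}
```

Most of the array overhead comes from storing every index as a string key and every value as an 8-byte double.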

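This subtype also lets you index your vectors with alternate types, such as int1 or int8 vectors, which reduces the memory needed to build the Atlas Vector Search index for your collection. It reduces the RAM for mongot by 3.75x for scalar and by 24x for binary; the vector values shrink by 4x and 32x respectively, but the Hierarchical Navigable Small Worlds graph itself does not shrink.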

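If you don't already have binData vectors, you can convert your embeddings to this format by using any supported driver before writing your data to a collection. The following procedure walks you through the steps for converting your embeddings into BinData vectors with the float32, int8, and int1 subtypes.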

Supported Drivers

BinData vectors with the BSON float32, int1, and int8 subtypes are supported by the following drivers:

C++ Driver v4.1.0 or later
C#/.NET Driver v3.2.0 or later
Go Driver v2.1.0 or later
PyMongo Driver v4.10 or later
Node.js Driver v6.11 or later
Java Driver v5.3.1 or later


Prerequisites

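To quantize your BSON binData vectors, you must have the following:

An Atlas cluster running MongoDB version 6.0.11, 7.0.2, or later.
Your IP address included in your Atlas project's access list.
Access to an embedding model that supports byte vector output.
The outputs from the following embedding models can be used to generate BSON binData vectors with a supported MongoDB driver.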
Embedding Model Provider	Embedding Model
VoyageAI	voyage-3-large
Cohere	embed-english-v3.0
Nomic	nomic-embed-text-v1.5
Jina	jina-embeddings-v2-base-en
Mixedbread	mxbai-embed-large-v1
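Scalar quantization preserves recall for these models because they are all trained to be quantization-aware. Therefore, recall degradation for scalar-quantized embeddings produced by these models is minimal, even at lower dimensions such as 384.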

A terminal and code editor to run your Go project.
Go installed.

Procedure

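The examples in this procedure use either new data or existing data, with embeddings generated by Cohere's embed-english-v3.0 model. The example for new data uses sample text strings, which you can replace with your own data. The example for existing data uses a subset of documents without any embeddings from the listingsAndReviews collection in the sample_airbnb database, which you can replace with your own database and collection (with or without any embeddings).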

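The steps that follow first cover quantizing binData vectors for new data, and then for data that you already have in your Atlas cluster. Follow the set of steps that matches your data.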

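New Data

Initialize your Go project.

Run the following commands in your terminal window to create a new directory named ingest-binary-vectors-project and initialize your project:

```
mkdir ingest-binary-vectors-project
cd ingest-binary-vectors-project
go mod init ingest-binary-vectors-project
```

Install the required libraries.

Run the following command from your project directory to install the MongoDB Go Driver. This operation might take a few minutes to complete.

```
go get go.mongodb.org/mongo-driver/v2/mongo
```

You must install the Go driver v2.1 or later. If necessary, you can also install libraries from your embedding model provider. For example, to generate float32, int8, and int1 embeddings by using Cohere as demonstrated on this page, install Cohere:

```
go get github.com/cohere-ai/cohere-go/v2/client@v2.13.0
```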

Set the environment variables in your terminal.

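To access the embedding model provider for generating and converting embeddings, set the environment variable for the embedding model provider's API key, if necessary.
To use embeddings from Cohere, set the COHERE_API_KEY environment variable.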
```
export COHERE_API_KEY="<COHERE-API-KEY>"
```
To access your Atlas cluster, set the MONGODB_URI environment variable.
```
export MONGODB_URI="<CONNECTION-STRING>"
```
Your connection string should use the following format:
```
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
```

(Conditional) Generate embeddings from your data.

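You can use an embedding model provider to generate float32, int8, and int1 embeddings for your data and then use the MongoDB Go driver to convert your native vector embeddings to BSON vectors. The following sample code uses Cohere's embed API to generate full-precision vectors.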

Create a new file named GenerateAndConvertEmbeddings.go in your Go project.
```
touch GenerateAndConvertEmbeddings.go
```

Copy and paste the following code into the GenerateAndConvertEmbeddings.go file.

This code does the following:

Generates the float32, int8, and ubinary vector embeddings by using Cohere's embed API.
Converts the embeddings to BSON binData vectors by using the MongoDB Go driver.
Creates a file named embeddings.json and saves the data with embeddings in the file.

GenerateAndConvertEmbeddings.go

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"os"

	cohere "github.com/cohere-ai/cohere-go/v2"
	cohereclient "github.com/cohere-ai/cohere-go/v2/client"
	"go.mongodb.org/mongo-driver/v2/bson"
)

// Sample data for embedding
var data = []string{
	"The Great Wall of China is visible from space.",
	"The Eiffel Tower was completed in Paris in 1889.",
	"Mount Everest is the highest peak on Earth at 8,848m.",
	"Shakespeare wrote 37 plays and 154 sonnets during his lifetime.",
	"The Mona Lisa was painted by Leonardo da Vinci.",
}

func main() {
	apiKey := os.Getenv("COHERE_API_KEY")
	if apiKey == "" {
		log.Fatal("Ensure COHERE_API_KEY is set.")
	}

	client := cohereclient.NewClient(cohereclient.WithToken(apiKey))

	embeddings, err := fetchEmbeddingsFromCohere(client)
	if err != nil {
		log.Fatalf("Error fetching embeddings: %v", err)
	}

	documents := convertEmbeddingsToBSON(data, embeddings)

	err = writeJSONToFile("embeddings.json", documents)
	if err != nil {
		log.Fatalf("Error writing embeddings to file: %v", err)
	}

	fmt.Println("Embeddings successfully stored in embeddings.json")
}

// Fetch float, int8, and ubinary embeddings for the sample data by using the Cohere API
func fetchEmbeddingsFromCohere(client *cohereclient.Client) (*cohere.EmbedByTypeResponse, error) {
	model := "embed-english-v3.0"
	response, err := client.V2.Embed(context.TODO(), &cohere.V2EmbedRequest{
		Texts:     data,
		Model:     model,
		InputType: cohere.EmbedInputTypeSearchDocument,
		EmbeddingTypes: []cohere.EmbeddingType{
			cohere.EmbeddingTypeFloat,
			cohere.EmbeddingTypeInt8,
			cohere.EmbeddingTypeUbinary,
		},
	})
	if err != nil {
		return nil, fmt.Errorf("failed to fetch embeddings: %w", err)
	}
	if response.Embeddings == nil {
		return nil, fmt.Errorf("no embeddings found in the API response")
	}
	return response, nil
}

// Convert embeddings to BSON binary vectors
func convertEmbeddingsToBSON(sentences []string, embeddings *cohere.EmbedByTypeResponse) []bson.M {
	var documents []bson.M

	for i, sentence := range sentences {
		float32Emb := convertFloat64ToFloat32(embeddings.Embeddings.Float[i])
		int8Emb := convertIntToInt8(embeddings.Embeddings.Int8[i])
		ubinaryEmb := convertIntToBytes(embeddings.Embeddings.Ubinary[i])

		floatVector := bson.NewVector(float32Emb)
		int8Vector := bson.NewVector(int8Emb)
		// Each ubinary byte already packs 8 dimensions; padding is 0 because
		// the number of dimensions is a multiple of 8.
		ubinaryVector, err := bson.NewPackedBitVector(ubinaryEmb, 0)
		if err != nil {
			log.Fatalf("Error creating PackedBitVector: %v", err)
		}

		document := bson.M{
			"text":               sentence,
			"embeddings_float32": floatVector.Binary(),
			"embeddings_int8":    int8Vector.Binary(),
			"embeddings_int1":    ubinaryVector.Binary(),
		}
		documents = append(documents, document)
	}

	return documents
}

// Write a JSON file from in-memory BSON documents, using Extended JSON
// so that the binary vectors survive the round trip
func writeJSONToFile(filename string, documents []bson.M) error {
	file, err := os.Create(filename)
	if err != nil {
		return fmt.Errorf("failed to create file: %w", err)
	}
	defer file.Close()

	var jsonData []json.RawMessage
	for _, document := range documents {
		jsonBytes, err := bson.MarshalExtJSON(document, false, false)
		if err != nil {
			return fmt.Errorf("error marshaling BSON to JSON: %w", err)
		}
		jsonData = append(jsonData, jsonBytes)
	}

	marshaledData, err := json.MarshalIndent(jsonData, "", "  ")
	if err != nil {
		return fmt.Errorf("failed to marshal JSON: %w", err)
	}

	_, err = file.Write(marshaledData)
	if err != nil {
		return fmt.Errorf("failed to write JSON to file: %w", err)
	}

	return nil
}

// Convert a slice of float64 to a slice of float32
func convertFloat64ToFloat32(f64s []float64) []float32 {
	f32s := make([]float32, len(f64s))
	for i, v := range f64s {
		f32s[i] = float32(v)
	}
	return f32s
}

// Convert a slice of int to a slice of int8
func convertIntToInt8(ints []int) []int8 {
	ints8 := make([]int8, len(ints))
	for i, val := range ints {
		ints8[i] = int8(val)
	}
	return ints8
}

// Convert a slice of int to a slice of bytes
func convertIntToBytes(ints []int) []byte {
	bytes := make([]byte, len(ints))
	for i, val := range ints {
		bytes[i] = byte(val)
	}
	return bytes
}
```

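The code reads your Cohere API key from the `COHERE_API_KEY` environment variable and has no other placeholder values. If you haven't set that variable, set it before you run the program, and then save the file.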

Run the program by using the following command. If you're using a terminal, this command compiles and runs your program:

```
go run GenerateAndConvertEmbeddings.go
```

```
Embeddings successfully stored in embeddings.json
```

Verify the embeddings in the embeddings.json file.
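To spot-check embeddings.json, you can decode the base64 payload of each `$binary` value (subtype `09`) and read the two-byte header that the BSON binary vector format places before the data. The following Go sketch assumes the dtype codes from the BSON binary vector subtype specification (0x03 for int8, 0x27 for float32, 0x10 for packed bit); `describeVector` is a hypothetical helper, and the sample payload in `main` is made up. For the 1024-dimensional embed-english-v3.0 output, you would expect 1024 dimensions for each of the three fields.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// describeVector decodes the base64 payload of a BSON binary subtype 9
// (vector) value, as found under "$binary.base64" in embeddings.json, and
// reports its element type, padding, and number of dimensions.
func describeVector(b64 string) (string, byte, int, error) {
	raw, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return "", 0, 0, err
	}
	if len(raw) < 2 {
		return "", 0, 0, fmt.Errorf("vector payload too short: %d bytes", len(raw))
	}

	// Byte 0 is the dtype, byte 1 is the padding; the rest is the data.
	dtype, padding, data := raw[0], raw[1], raw[2:]
	switch dtype {
	case 0x03:
		return "int8", padding, len(data), nil
	case 0x27:
		if len(data)%4 != 0 {
			return "", 0, 0, fmt.Errorf("float32 data length %d is not a multiple of 4", len(data))
		}
		return "float32", padding, len(data) / 4, nil
	case 0x10:
		// Packed bits: 8 dimensions per byte, minus the unused padding bits.
		return "packed_bit", padding, len(data)*8 - int(padding), nil
	default:
		return "", 0, 0, fmt.Errorf("unknown vector dtype 0x%02x", dtype)
	}
}

func main() {
	// Hypothetical payload: a packed-bit vector holding one byte (8 dimensions).
	dtype, padding, dims, err := describeVector("EAD/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(dtype, padding, dims) // packed_bit 0 8
}
```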

To learn more about generating embeddings and converting them to binData vectors, see How to Create Vector Embeddings.

Ingest the data and create an Atlas Vector Search index.

You must upload your data and embeddings to a collection in your Atlas cluster and create an Atlas Vector Search index on the data before you can run $vectorSearch queries against it.

Create a new file named UploadDataAndCreateIndex.go in your Go project.
```
touch UploadDataAndCreateIndex.go
```

Copy and paste the following code into the UploadDataAndCreateIndex.go file.

This code does the following:

Uploads the float32, int8, and int1 embeddings in the embeddings.json file to your Atlas cluster.
Creates an Atlas Vector Search index on the embeddings_float32, embeddings_int8, and embeddings_int1 fields.

UploadDataAndCreateIndex.go

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"time"

	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
)

var (
	mongodbURI         = os.Getenv("MONGODB_URI")
	dbName             = "<DATABASE-NAME>"
	collectionName     = "<COLLECTION-NAME>"
	indexName          = "<INDEX-NAME>"
	numberOfDimensions = 1024
	embeddingFields    = []string{"embeddings_float32", "embeddings_int8", "embeddings_int1"}
	// int1 (packed bit) vectors support only euclidean similarity
	embeddingSimilarity = []string{"dotProduct", "dotProduct", "euclidean"}
)

func main() {
	if mongodbURI == "" {
		log.Fatal("MongoDB URI not found. Set MONGODB_URI in your environment.")
	}

	clientOpts := options.Client().ApplyURI(mongodbURI)
	client, err := mongo.Connect(clientOpts)
	if err != nil {
		log.Fatalf("Failed to connect to MongoDB: %v", err)
	}

	defer func() {
		if err := client.Disconnect(context.TODO()); err != nil {
			log.Fatalf("Failed to disconnect MongoDB client: %v", err)
		}
	}()

	storeEmbeddings(client)
	setupVectorSearchIndex(client)
}

// Reads the JSON data and stores it in MongoDB
func storeEmbeddings(client *mongo.Client) {
	collection := client.Database(dbName).Collection(collectionName)

	data, err := os.ReadFile("embeddings.json")
	if err != nil {
		log.Fatalf("Failed to read file: %v", err)
	}

	// Parse Extended JSON so that the $binary values become BSON binary vectors
	var documents []bson.M
	if err := bson.UnmarshalExtJSON(data, false, &documents); err != nil {
		log.Fatalf("Failed to unmarshal JSON data: %v", err)
	}

	if _, err := collection.InsertMany(context.TODO(), documents); err != nil {
		log.Fatalf("Failed to insert documents: %v", err)
	}

	fmt.Println("Inserted documents into MongoDB")
}

// Creates the Atlas Vector Search index and waits until it is queryable
func setupVectorSearchIndex(client *mongo.Client) {
	collection := client.Database(dbName).Collection(collectionName)
	ctx := context.TODO()

	type vectorDefinitionField struct {
		Type          string `bson:"type"`
		Path          string `bson:"path"`
		NumDimensions int    `bson:"numDimensions"`
		Similarity    string `bson:"similarity"`
	}

	type vectorDefinition struct {
		Fields []vectorDefinitionField `bson:"fields"`
	}

	// One vector field per embedding type
	fields := make([]vectorDefinitionField, len(embeddingFields))
	for i, field := range embeddingFields {
		fields[i] = vectorDefinitionField{
			Type:          "vector",
			Path:          field,
			NumDimensions: numberOfDimensions,
			Similarity:    embeddingSimilarity[i],
		}
	}

	opts := options.SearchIndexes().SetName(indexName).SetType("vectorSearch")

	indexModel := mongo.SearchIndexModel{
		Definition: vectorDefinition{
			Fields: fields,
		},
		Options: opts,
	}

	// Create the index
	log.Println("Creating the index.")
	searchIndexName, err := collection.SearchIndexes().CreateOne(ctx, indexModel)
	if err != nil {
		log.Fatalf("Failed to create the search index: %v", err)
	}

	// Poll until the index is listed and queryable
	log.Println("Polling to confirm successful index creation.")
	log.Println("NOTE: This may take up to a minute.")
	searchIndexes := collection.SearchIndexes()

	queryable := false
	for !queryable {
		cursor, err := searchIndexes.List(ctx, options.SearchIndexes().SetName(searchIndexName))
		if err != nil {
			log.Fatalf("failed to list search indexes: %v", err)
		}

		if cursor.Next(ctx) {
			name := cursor.Current.Lookup("name").StringValue()
			queryable = name == searchIndexName && cursor.Current.Lookup("queryable").Boolean()
		}
		cursor.Close(ctx)

		if !queryable {
			time.Sleep(5 * time.Second)
		}
	}

	log.Println("Name of Index Created: " + searchIndexName)
}
```

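Replace the following placeholder values in the code with valid values and save the file. The code reads your Atlas cluster connection string from the `MONGODB_URI` environment variable.

`<DATABASE-NAME>`	Name of the database in your Atlas cluster.
`<COLLECTION-NAME>`	Name of the collection where you want to upload the data.
`<INDEX-NAME>`	Name of the Atlas Vector Search index for the collection.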

Run the program by using the following command. If you're using a terminal, this command compiles and runs your program:

```
go run UploadDataAndCreateIndex.go
```

```
Inserted documents into MongoDB
Creating the index.
Polling to confirm successful index creation.
NOTE: This may take up to a minute.
Name of Index Created: <INDEX-NAME>
```

Log in to your Atlas cluster and verify the following:
- The data in the namespace.
- The Atlas Vector Search index for the collection.

Create and run a query against the collection.

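To test your embeddings, you can run a query against your collection. Use an embedding model provider to generate float32, int8, and int1 embeddings for your query text. The following sample code uses Cohere's embed API to generate full-precision vectors. After generating the embeddings, use the MongoDB Go driver to convert your native vector embeddings to BSON binary vectors and run a $vectorSearch query against the collection.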

Create a new file named CreateEmbeddingsAndRunQuery.go in your Go project.
```
touch CreateEmbeddingsAndRunQuery.go
```

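Copy and paste the following code into the CreateEmbeddingsAndRunQuery.go file.

This code does the following:

Generates the float32, int8, and ubinary vector embeddings by using Cohere's embed API.
Converts the embeddings to BSON binData vectors by using the MongoDB Go driver.
Runs a query against your collection and returns the results.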

CreateEmbeddingsAndRunQuery.go

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	cohere "github.com/cohere-ai/cohere-go/v2"
	cohereclient "github.com/cohere-ai/cohere-go/v2/client"
	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
)

const (
	dbName          = "<DATABASE-NAME>"
	collectionName  = "<COLLECTION-NAME>"
	vectorIndexName = "<INDEX-NAME>"
	dataFieldName   = "<TEXT-FIELD-NAME>"
	queryText       = "<QUERY-TEXT>"
)

func main() {
	apiKey := os.Getenv("COHERE_API_KEY")
	mongodbURI := os.Getenv("MONGODB_URI")

	if apiKey == "" {
		log.Fatal("API key not found. Set COHERE_API_KEY in your environment.")
	}
	if mongodbURI == "" {
		log.Fatal("MongoDB URI not found. Set MONGODB_URI in your environment.")
	}

	embeddingsData, err := generateAndConvertEmbeddings(apiKey, queryText)
	if err != nil {
		log.Fatalf("Error generating embeddings: %v", err)
	}

	err = runVectorSearchQuery(mongodbURI, embeddingsData)
	if err != nil {
		log.Fatalf("Error running vector search query: %v", err)
	}
}

// Generate embeddings for the query text by using Cohere's embed API
func generateAndConvertEmbeddings(apiKey, text string) (map[string]bson.Binary, error) {
	client := cohereclient.NewClient(cohereclient.WithToken(apiKey))

	model := "embed-english-v3.0"
	response, err := client.V2.Embed(context.TODO(), &cohere.V2EmbedRequest{
		Texts:     []string{text},
		Model:     model,
		InputType: cohere.EmbedInputTypeSearchQuery,
		EmbeddingTypes: []cohere.EmbeddingType{
			cohere.EmbeddingTypeFloat,
			cohere.EmbeddingTypeInt8,
			cohere.EmbeddingTypeUbinary,
		},
	})
	if err != nil {
		return nil, fmt.Errorf("failed to fetch embeddings: %w", err)
	}

	if response.Embeddings == nil || len(response.Embeddings.Float) == 0 {
		return nil, fmt.Errorf("no embeddings found in the API response")
	}

	return createBSONVectorEmbeddings(response.Embeddings)
}

// Convert embeddings to BSON vectors by using the MongoDB Go driver
func createBSONVectorEmbeddings(embeddings *cohere.EmbedByTypeResponseEmbeddings) (map[string]bson.Binary, error) {
	bsonVectorEmbeddings := make(map[string]bson.Binary)

	// Convert float embeddings to a float32 BSON vector
	if len(embeddings.Float) > 0 {
		floatData := convertFloat64ToFloat32(embeddings.Float[0])
		floatVector := bson.NewVector(floatData)
		bsonVectorEmbeddings["float32"] = floatVector.Binary()
	}

	// Convert int8 embeddings to an int8 BSON vector
	if len(embeddings.Int8) > 0 {
		int8Data := convertIntToInt8(embeddings.Int8[0])
		int8Vector := bson.NewVector(int8Data)
		bsonVectorEmbeddings["int8"] = int8Vector.Binary()
	}

	// Convert ubinary embeddings to a packed bit (int1) BSON vector
	if len(embeddings.Ubinary) > 0 {
		int1Data := convertIntToBytes(embeddings.Ubinary[0])
		ubinaryVector, err := bson.NewPackedBitVector(int1Data, 0)
		if err != nil {
			return nil, fmt.Errorf("failed to create packed bit vector: %w", err)
		}
		bsonVectorEmbeddings["int1"] = ubinaryVector.Binary()
	}

	return bsonVectorEmbeddings, nil
}

// Run a $vectorSearch query against each embedding field
func runVectorSearchQuery(mongodbURI string, embeddingsData map[string]bson.Binary) error {
	ctx := context.Background()
	clientOptions := options.Client().ApplyURI(mongodbURI)
	client, err := mongo.Connect(clientOptions)
	if err != nil {
		return fmt.Errorf("failed to connect to MongoDB: %w", err)
	}
	defer func() { _ = client.Disconnect(ctx) }()

	collection := client.Database(dbName).Collection(collectionName)

	// Map each embedding type to the field that stores it
	pathMap := map[string]string{
		"float32": "embeddings_float32",
		"int8":    "embeddings_int8",
		"int1":    "embeddings_int1",
	}

	for pathKey, queryVector := range embeddingsData {
		path, ok := pathMap[pathKey]
		if !ok {
			return fmt.Errorf("invalid path key: %s", pathKey)
		}

		pipeline := mongo.Pipeline{
			{
				{"$vectorSearch", bson.D{
					{"queryVector", queryVector},
					{"index", vectorIndexName},
					{"path", path},
					{"numCandidates", 5},
					{"limit", 2},
				}},
			},
			{
				{"$project", bson.D{
					{"_id", 1},
					{dataFieldName, 1},
					{"score", bson.D{
						{"$meta", "vectorSearchScore"},
					}},
				}},
			},
		}

		cursor, err := collection.Aggregate(ctx, pipeline)
		if err != nil {
			return fmt.Errorf("failed to run vector search aggregation query: %w", err)
		}

		// cursor.All decodes every result and closes the cursor
		var results []bson.M
		if err = cursor.All(ctx, &results); err != nil {
			return fmt.Errorf("failed to parse aggregation query results: %w", err)
		}

		fmt.Printf("Results from %v embeddings:\n", path)
		for _, result := range results {
			// Print each result as canonical Extended JSON
			jsonResult, err := bson.MarshalExtJSON(result, true, false)
			if err != nil {
				return fmt.Errorf("failed to marshal result to JSON: %w", err)
			}
			fmt.Println(string(jsonResult))
		}
	}

	return nil
}

// Utility functions to handle slice conversion
func convertFloat64ToFloat32(f64s []float64) []float32 {
	f32s := make([]float32, len(f64s))
	for i, v := range f64s {
		f32s[i] = float32(v)
	}
	return f32s
}

func convertIntToInt8(ints []int) []int8 {
	ints8 := make([]int8, len(ints))
	for i, val := range ints {
		ints8[i] = int8(val)
	}
	return ints8
}

func convertIntToBytes(ints []int) []byte {
	bytes := make([]byte, len(ints))
	for i, val := range ints {
		bytes[i] = byte(val)
	}
	return bytes
}
```

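Replace the following placeholder values in the code and save the file. The code reads your Atlas cluster connection string from the `MONGODB_URI` environment variable and your Cohere API key from the `COHERE_API_KEY` environment variable, and exits if either variable isn't set.

`<DATABASE-NAME>`	Name of the database in your Atlas cluster.
`<COLLECTION-NAME>`	Name of the collection where you ingested the data.
`<INDEX-NAME>`	Name of the Atlas Vector Search index for the collection.
`<TEXT-FIELD-NAME>`	Name of the field that contains the text from which you generated the embeddings. For this example, use `text`.
`<QUERY-TEXT>`	Text for the query. For this example, use `science fact`.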

Compile and run the file by using your application run configuration. If you're using a terminal, run the following command to compile and run your program:

```
go run CreateEmbeddingsAndRunQuery.go
```

```
Results from embeddings_int1 embeddings:
{"_id":{"$oid":"68129070e7f516cc19658bc9"},"text":"Mount Everest is the highest peak on Earth at 8,848m.","score":{"$numberDouble":"0.642578125"}}
{"text":"The Great Wall of China is visible from space.","score":{"$numberDouble":"0.61328125"},"_id":{"$oid":"68129070e7f516cc19658bc7"}}
Results from embeddings_float32 embeddings:
{"_id":{"$oid":"68129070e7f516cc19658bc9"},"text":"Mount Everest is the highest peak on Earth at 8,848m.","score":{"$numberDouble":"0.6583383083343506"}}
{"_id":{"$oid":"68129070e7f516cc19658bc7"},"text":"The Great Wall of China is visible from space.","score":{"$numberDouble":"0.6536108255386353"}}
Results from embeddings_int8 embeddings:
{"_id":{"$oid":"68129070e7f516cc19658bc9"},"text":"Mount Everest is the highest peak on Earth at 8,848m.","score":{"$numberDouble":"0.5149773359298706"}}
{"_id":{"$oid":"68129070e7f516cc19658bc7"},"text":"The Great Wall of China is visible from space.","score":{"$numberDouble":"0.5146723985671997"}}
```

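Existing Data

Initialize your Go project.

Run the following commands in your terminal window to create a new directory named ingest-binary-vectors-project and initialize your project:

```
mkdir ingest-binary-vectors-project
cd ingest-binary-vectors-project
go mod init ingest-binary-vectors-project
```

Install the required libraries.

Run the following commands from your project directory to install the MongoDB Go Driver and the Cohere Go library. This operation might take a few minutes to complete.

```
go get go.mongodb.org/mongo-driver/v2/mongo
go get github.com/cohere-ai/cohere-go/v2/client@v2.13.0
```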

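Set the environment variables in your terminal.

To access the embedding model provider for generating and converting embeddings, set the environment variable for the embedding model provider's API key, if necessary.
To use embeddings from Cohere, set the COHERE_API_KEY environment variable.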
```
export COHERE_API_KEY="<COHERE-API-KEY>"
```
To access your Atlas cluster, set the MONGODB_URI environment variable.
```
export MONGODB_URI="<CONNECTION-STRING>"
```
Your connection string should use the following format:
```
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
```

(Conditional) Generate embeddings from your data.

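You can use an embedding model provider to generate float32, int8, and int1 embeddings for your data and then use the MongoDB Go driver to convert your native vector embeddings to BSON vectors. The following sample code uses Cohere's embed API to generate full-precision vectors from the data in the sample_airbnb.listingsAndReviews namespace.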

Create a new file named GenerateAndConvertEmbeddings.go in your Go project.
```
touch GenerateAndConvertEmbeddings.go
```

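Copy and paste the following code into the GenerateAndConvertEmbeddings.go file.

This code does the following:

Gets the summary field from 50 documents in the sample_airbnb.listingsAndReviews namespace.
Generates the float32, int8, and ubinary vector embeddings by using Cohere's embed API.
Converts the embeddings to BSON binData vectors by using the MongoDB Go driver.
Creates a file named embeddings.json and saves the data with embeddings in the file.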

GenerateAndConvertEmbeddings.go

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"os"

	cohere "github.com/cohere-ai/cohere-go/v2"
	cohereclient "github.com/cohere-ai/cohere-go/v2/client"
	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
)

const (
	batchSize  = 96 // number of texts sent to Cohere per request
	dbName     = "sample_airbnb"
	collName   = "listingsAndReviews"
	embedModel = "embed-english-v3.0"
)

func main() {
	apiKey := os.Getenv("COHERE_API_KEY")
	mongodbURI := os.Getenv("MONGODB_URI")

	if apiKey == "" || mongodbURI == "" {
		log.Fatal("Ensure COHERE_API_KEY and MONGODB_URI are set.")
	}

	summaries, err := fetchSummariesFromMongoDB(mongodbURI)
	if err != nil {
		log.Fatalf("Error fetching summaries: %v", err)
	}

	client := cohereclient.NewClient(cohereclient.WithToken(apiKey))

	// Embed the summaries in batches and collect every document before
	// writing the file, so that later batches don't overwrite earlier ones.
	var documents []bson.M
	for start := 0; start < len(summaries); start += batchSize {
		end := start + batchSize
		if end > len(summaries) {
			end = len(summaries)
		}

		embeddingsData, err := fetchEmbeddingsFromCohere(client, summaries[start:end])
		if err != nil {
			log.Fatalf("Error fetching embeddings: %v", err)
		}

		if embeddingsData.Embeddings == nil {
			continue
		}

		documents = append(documents, convertCohereResponseToStructs(summaries[start:end], embeddingsData)...)
	}

	if err := writeJSONToFile("embeddings.json", documents); err != nil {
		log.Fatalf("Error writing embeddings to JSON: %v", err)
	}

	fmt.Println("Embeddings successfully saved to embeddings.json")
}

// Fetch up to 50 non-empty summary values from the sample collection
func fetchSummariesFromMongoDB(uri string) ([]string, error) {
	ctx := context.TODO()
	clientOpts := options.Client().ApplyURI(uri)

	client, err := mongo.Connect(clientOpts)
	if err != nil {
		return nil, fmt.Errorf("failed to connect to MongoDB: %w", err)
	}
	defer func() {
		if err := client.Disconnect(ctx); err != nil {
			log.Fatalf("Failed to disconnect MongoDB client: %v", err)
		}
	}()

	collection := client.Database(dbName).Collection(collName)
	filter := bson.M{"summary": bson.M{"$nin": []interface{}{nil, ""}}}

	cursor, err := collection.Find(ctx, filter, options.Find().SetLimit(50))
	if err != nil {
		return nil, fmt.Errorf("error finding documents: %w", err)
	}
	defer cursor.Close(ctx)

	var summaries []string
	for cursor.Next(ctx) {
		var result struct {
			Summary string `bson:"summary"`
		}
		if err := cursor.Decode(&result); err != nil {
			return nil, fmt.Errorf("error decoding document: %w", err)
		}
		if summary := result.Summary; summary != "" {
			summaries = append(summaries, summary)
		}
	}

	if err := cursor.Err(); err != nil {
		return nil, fmt.Errorf("cursor error: %w", err)
	}

	return summaries, nil
}

// Fetch float, int8, and ubinary embeddings for one batch of texts
func fetchEmbeddingsFromCohere(client *cohereclient.Client, batchData []string) (*cohere.EmbedByTypeResponse, error) {
	response, err := client.V2.Embed(context.TODO(), &cohere.V2EmbedRequest{
		Texts:     batchData,
		Model:     embedModel,
		InputType: cohere.EmbedInputTypeSearchDocument,
		EmbeddingTypes: []cohere.EmbeddingType{
			cohere.EmbeddingTypeFloat,
			cohere.EmbeddingTypeInt8,
			cohere.EmbeddingTypeUbinary,
		},
	})
	if err != nil {
		return nil, fmt.Errorf("failed to fetch embeddings: %w", err)
	}

	return response, nil
}

// Convert the embeddings for each summary into BSON binary vectors
func convertCohereResponseToStructs(summaries []string, embeddings *cohere.EmbedByTypeResponse) []bson.M {
	var documents []bson.M

	for i, summary := range summaries {
		float32Emb := convertFloat64ToFloat32(embeddings.Embeddings.Float[i])
		int8Emb := convertIntToInt8(embeddings.Embeddings.Int8[i])
		ubinaryEmb := convertIntToBytes(embeddings.Embeddings.Ubinary[i])

		floatVector := bson.NewVector(float32Emb)
		int8Vector := bson.NewVector(int8Emb)
		ubinaryVector, err := bson.NewPackedBitVector(ubinaryEmb, 0)
		if err != nil {
			log.Fatalf("Error creating PackedBitVector: %v", err)
		}

		document := bson.M{
			"text":               summary,
			"embeddings_float32": floatVector.Binary(),
			"embeddings_int8":    int8Vector.Binary(),
			"embeddings_int1":    ubinaryVector.Binary(),
		}

		documents = append(documents, document)
	}

	return documents
}

// Write the documents to a file as an array of Extended JSON documents
func writeJSONToFile(filename string, docs []bson.M) error {
	file, err := os.Create(filename)
	if err != nil {
		return fmt.Errorf("failed to create file: %w", err)
	}
	defer file.Close()

	var jsonDocuments []json.RawMessage
	for _, document := range docs {
		jsonBytes, err := bson.MarshalExtJSON(document, false, false)
		if err != nil {
			return fmt.Errorf("error marshaling BSON to JSON: %w", err)
		}
		jsonDocuments = append(jsonDocuments, jsonBytes)
	}

	jsonData, err := json.MarshalIndent(jsonDocuments, "", "  ")
	if err != nil {
		return fmt.Errorf("failed to marshal JSON: %w", err)
	}

	_, err = file.Write(jsonData)
	if err != nil {
		return fmt.Errorf("failed to write JSON to file: %w", err)
	}

	return nil
}

func convertFloat64ToFloat32(f64s []float64) []float32 {
	f32s := make([]float32, len(f64s))
	for i, v := range f64s {
		f32s[i] = float32(v)
	}
	return f32s
}

func convertIntToInt8(ints []int) []int8 {
	ints8 := make([]int8, len(ints))
	for i, val := range ints {
		ints8[i] = int8(val)
	}
	return ints8
}

func convertIntToBytes(ints []int) []byte {
	bytes := make([]byte, len(ints))
	for i, val := range ints {
		bytes[i] = byte(val)
	}
	return bytes
}
```

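If you didn't set the environment variables, replace the following placeholder values in the code and save the file.
MONGODB_URI
Your Atlas cluster connection string, if you didn't set the environment variable.
COHERE_API_KEY
Your Cohere API key, if you didn't set the environment variable.
Compile and run the file using your application run configuration.
If you're using a terminal, run the following command to compile and run your program.
go run GenerateAndConvertEmbeddings.go
Embeddings successfully saved to embeddings.json
Verify the embeddings in the embeddings.json file.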

For more information on generating embeddings and converting them to binData vectors, see How to Create Vector Embeddings.
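The helper functions convertIntToInt8 and convertIntToBytes above cast each value directly, so a value outside the target range would wrap around silently instead of failing. If you would rather have the conversion fail loudly, the following is a minimal, standard-library-only sketch of checked variants. checkedIntsToInt8 and checkedIntsToBytes are hypothetical names, and the sketch assumes that int8 values lie in [-128, 127] and that each ubinary value is one packed byte in [0, 255].

```go
package main

import (
	"fmt"
	"math"
)

// checkedIntsToInt8 is a hypothetical helper: it converts int8 embedding
// values returned as []int, rejecting anything outside [-128, 127]
// instead of letting the cast wrap around.
func checkedIntsToInt8(ints []int) ([]int8, error) {
	out := make([]int8, len(ints))
	for i, v := range ints {
		if v < math.MinInt8 || v > math.MaxInt8 {
			return nil, fmt.Errorf("value %d at index %d is outside the int8 range", v, i)
		}
		out[i] = int8(v)
	}
	return out, nil
}

// checkedIntsToBytes is a hypothetical helper for ubinary values
// (one packed byte per int), rejecting anything outside [0, 255].
func checkedIntsToBytes(ints []int) ([]byte, error) {
	out := make([]byte, len(ints))
	for i, v := range ints {
		if v < 0 || v > math.MaxUint8 {
			return nil, fmt.Errorf("value %d at index %d is outside the byte range", v, i)
		}
		out[i] = byte(v)
	}
	return out, nil
}

func main() {
	int8s, err := checkedIntsToInt8([]int{-128, 0, 127})
	fmt.Println(int8s, err) // [-128 0 127] <nil>

	_, err = checkedIntsToInt8([]int{200})
	fmt.Println(err) // value 200 at index 0 is outside the int8 range
}
```

Returning an error rather than calling log.Fatalf keeps the decision about how to handle bad data with the caller.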

Ingest the data and create an Atlas Vector Search index.

You must upload your data and embeddings to a collection in your Atlas cluster and create an Atlas Vector Search index on the data to run $vectorSearch queries.

Create a new file named UploadDataAndCreateIndex.go in your Go project.
```
touch UploadDataAndCreateIndex.go
```

Copy and paste the following code into the UploadDataAndCreateIndex.go file.

This code does the following:

Uploads the float32, int8, and int1 embeddings in the embeddings.json file to your Atlas cluster.
Creates an Atlas Vector Search index on the embeddings_float32, embeddings_int8, and embeddings_int1 fields.

UploadDataAndCreateIndex.go

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"time"

	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
)

var (
	mongodbURI          = os.Getenv("MONGODB_URI")
	dbName              = "sample_airbnb"
	collectionName      = "listingsAndReviews"
	indexName           = "<INDEX-NAME>"
	numberOfDimensions  = 1024
	embeddingFields     = []string{"embeddings_float32", "embeddings_int8", "embeddings_int1"}
	embeddingSimilarity = []string{"dotProduct", "dotProduct", "euclidean"}
)

func main() {
	if mongodbURI == "" {
		log.Fatal("MONGODB_URI environment variable not set")
	}

	clientOptions := options.Client().ApplyURI(mongodbURI)
	client, err := mongo.Connect(clientOptions)
	if err != nil {
		log.Fatalf("Error connecting to MongoDB: %v", err)
	}
	defer func() {
		if err = client.Disconnect(context.TODO()); err != nil {
			log.Fatal(err)
		}
	}()

	if err := uploadEmbeddingsData(client); err != nil {
		log.Fatalf("Error uploading embeddings data: %v", err)
	}

	setupVectorSearchIndex(client)
}

func uploadEmbeddingsData(client *mongo.Client) error {
	collection := client.Database(dbName).Collection(collectionName)

	// Load the embeddings.json file (os.ReadFile replaces the deprecated ioutil.ReadFile)
	fileContent, err := os.ReadFile("embeddings.json")
	if err != nil {
		return fmt.Errorf("error reading file: %w", err)
	}

	// Parse the Extended JSON file content into BSON documents.
	// The $binary values in the file decode back to BSON binary (binData) values.
	var documents []bson.M
	if err := bson.UnmarshalExtJSON(fileContent, false, &documents); err != nil {
		return fmt.Errorf("failed to unmarshal JSON data: %w", err)
	}

	// Update documents in MongoDB
	for _, doc := range documents {
		summary, ok := doc["text"].(string)
		if !ok {
			return fmt.Errorf("missing 'text' field in document")
		}

		// Match listings whose summary equals the embedded text and set the
		// embedding fields on them; upsert inserts a new document if none match.
		filter := bson.M{"summary": summary}
		update := bson.M{
			"$set": doc,
		}

		// Set the upsert option
		opts := options.UpdateMany().SetUpsert(true)

		_, err = collection.UpdateMany(context.TODO(), filter, update, opts)
		if err != nil {
			return fmt.Errorf("failed to update documents: %w", err)
		}
	}

	return nil
}

// Sets up vector search index in MongoDB
func setupVectorSearchIndex(client *mongo.Client) {
	database := client.Database(dbName)
	collection := database.Collection(collectionName)

	ctx := context.TODO()

	type vectorDefinitionField struct {
		Type          string `bson:"type"`
		Path          string `bson:"path"`
		NumDimensions int    `bson:"numDimensions"`
		Similarity    string `bson:"similarity"`
	}

	type vectorDefinition struct {
		Fields []vectorDefinitionField `bson:"fields"`
	}

	fields := make([]vectorDefinitionField, len(embeddingFields))
	for i, field := range embeddingFields {
		fields[i] = vectorDefinitionField{
			Type:          "vector",
			Path:          field,
			NumDimensions: numberOfDimensions,
			Similarity:    embeddingSimilarity[i],
		}
	}

	opts := options.SearchIndexes().SetName(indexName).SetType("vectorSearch")

	indexModel := mongo.SearchIndexModel{
		Definition: vectorDefinition{
			Fields: fields,
		},
		Options: opts,
	}

	// Create the index
	log.Println("Creating the index.")
	searchIndexName, err := collection.SearchIndexes().CreateOne(ctx, indexModel)
	if err != nil {
		log.Fatalf("Failed to create the search index: %v", err)
	}

	// Polling to confirm successful index creation
	log.Println("Polling to confirm successful index creation.")
	log.Println("NOTE: This may take up to a minute.")
	searchIndexes := collection.SearchIndexes()
	var doc bson.Raw

	for doc == nil {
		cursor, err := searchIndexes.List(ctx, options.SearchIndexes().SetName(searchIndexName))
		if err != nil {
			log.Fatalf("failed to list search indexes: %v", err)
		}

		if !cursor.Next(ctx) {
			cursor.Close(ctx)
			break
		}

		name := cursor.Current.Lookup("name").StringValue()
		queryable := cursor.Current.Lookup("queryable").Boolean()
		if name == searchIndexName && queryable {
			doc = cursor.Current
		}
		// Close each cursor before the next poll so it is not leaked
		cursor.Close(ctx)
		if doc == nil {
			time.Sleep(5 * time.Second)
		}
	}

	log.Println("Name of Index Created: " + searchIndexName)
}

Replace the following placeholder values in the code and save the file.
MONGODB_URI
Your Atlas cluster connection string, if you didn't set the environment variable.
<INDEX-NAME>
The name of the Atlas Vector Search index for the collection.
Compile and run the file using your application run configuration.
If you're using a terminal, run the following command to compile and run your program.
go run UploadDataAndCreateIndex.go
Creating the index.
Polling to confirm successful index creation.
NOTE: This may take up to a minute.
Name of Index Created: <INDEX-NAME>
Log in to your Atlas cluster and verify the following:
- The data in the namespace.
- The Atlas Vector Search index for the collection.

Create and run a query against the collection.

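To test your embeddings, you can run a query against your collection. Use an embedding model provider to generate float, int8, and int1 embeddings for your query text. The following sample code uses Cohere's embed API to generate full-precision vectors. After generating the embeddings, use the MongoDB Go Driver to convert your native vector embeddings to BSON vectors and run a $vectorSearch query against the collection.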

Create a new file named CreateEmbeddingsAndRunQuery.go in your Go project.
```
touch CreateEmbeddingsAndRunQuery.go
```

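Copy and paste the following code into the CreateEmbeddingsAndRunQuery.go file.

This code does the following:

Generates float32, int8, and ubinary vector embeddings by using Cohere's embed API.
Converts the embeddings to BSON binData vectors by using the MongoDB Go Driver.
Runs a query against your collection and returns the results.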

CreateEmbeddingsAndRunQuery.go

package main

import (
	"context"
	"fmt"
	"log"
	"os"

	cohere "github.com/cohere-ai/cohere-go/v2"
	cohereclient "github.com/cohere-ai/cohere-go/v2/client"
	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
)

const (
	dbName          = "<DATABASE-NAME>"
	collectionName  = "<COLLECTION-NAME>"
	vectorIndexName = "<INDEX-NAME>"
	dataFieldName   = "<TEXT-FIELD-NAME>"
	queryText       = "<QUERY-TEXT>"
)

func main() {
	apiKey := os.Getenv("COHERE_API_KEY")
	mongodbURI := os.Getenv("MONGODB_URI")

	if apiKey == "" {
		log.Fatal("API key not found. Set COHERE_API_KEY in your environment.")
	}
	if mongodbURI == "" {
		log.Fatal("MongoDB URI not found. Set MONGODB_URI in your environment.")
	}

	embeddingsData, err := generateAndConvertEmbeddings(apiKey, queryText)
	if err != nil {
		log.Fatalf("Error generating embeddings: %v", err)
	}

	err = runVectorSearchQuery(mongodbURI, embeddingsData)
	if err != nil {
		log.Fatalf("Error running vector search query: %v", err)
	}
}

// Generate embeddings using Cohere's embed API from the query text
func generateAndConvertEmbeddings(apiKey, text string) (map[string]bson.Binary, error) {
	client := cohereclient.NewClient(cohereclient.WithToken(apiKey))

	model := "embed-english-v3.0"
	response, err := client.V2.Embed(context.TODO(), &cohere.V2EmbedRequest{
		Texts:     []string{text},
		Model:     model,
		InputType: cohere.EmbedInputTypeSearchQuery,
		EmbeddingTypes: []cohere.EmbeddingType{
			cohere.EmbeddingTypeFloat,
			cohere.EmbeddingTypeInt8,
			cohere.EmbeddingTypeUbinary,
		},
	})
	if err != nil {
		return nil, fmt.Errorf("failed to fetch embeddings: %w", err)
	}

	if response.Embeddings == nil || len(response.Embeddings.Float) == 0 {
		return nil, fmt.Errorf("no embeddings found in the API response")
	}

	return createBSONVectorEmbeddings(response.Embeddings)
}

// Convert embeddings to BSON vectors using MongoDB Go Driver
func createBSONVectorEmbeddings(embeddings *cohere.EmbedByTypeResponseEmbeddings) (map[string]bson.Binary, error) {
	bsonVectorEmbeddings := make(map[string]bson.Binary)

	// Convert float embeddings
	if len(embeddings.Float) > 0 {
		floatData := convertFloat64ToFloat32(embeddings.Float[0])
		floatVector := bson.NewVector(floatData) // bson.NewVector creates a BSON vector from float32 data.
		bsonVectorEmbeddings["float32"] = floatVector.Binary()
	}

	// Convert int8 embeddings
	if len(embeddings.Int8) > 0 {
		int8Data := convertIntToInt8(embeddings.Int8[0])
		int8Vector := bson.NewVector(int8Data) // The same function creates a BSON vector from int8 data.
		bsonVectorEmbeddings["int8"] = int8Vector.Binary()
	}

	// Convert ubinary embeddings to a packed bit vector
	if len(embeddings.Ubinary) > 0 {
		int1Data := convertIntToBytes(embeddings.Ubinary[0])
		// Padding is 0 because the dimension count (1024) is a multiple of 8.
		ubinaryVector, err := bson.NewPackedBitVector(int1Data, 0)
		if err != nil {
			return nil, fmt.Errorf("failed to create packed bit vector: %w", err)
		}
		bsonVectorEmbeddings["int1"] = ubinaryVector.Binary()
	}

	return bsonVectorEmbeddings, nil
}

// Run $vectorSearch query using the embeddings
func runVectorSearchQuery(mongodbURI string, embeddingsData map[string]bson.Binary) error {
	ctx := context.Background()
	clientOptions := options.Client().ApplyURI(mongodbURI)
	client, err := mongo.Connect(clientOptions)
	if err != nil {
		return fmt.Errorf("failed to connect to MongoDB: %w", err)
	}
	defer func() { _ = client.Disconnect(ctx) }()

	db := client.Database(dbName)
	collection := db.Collection(collectionName)

	// Query each embedding type in a fixed order; ranging over the map
	// directly would print the results in a random order.
	queries := []struct{ key, path string }{
		{"float32", "embeddings_float32"},
		{"int8", "embeddings_int8"},
		{"int1", "embeddings_int1"},
	}

	for _, q := range queries {
		queryVector, ok := embeddingsData[q.key]
		if !ok {
			continue
		}
		path := q.path

		pipeline := mongo.Pipeline{
			{
				{"$vectorSearch", bson.D{
					{"queryVector", queryVector},
					{"index", vectorIndexName},
					{"path", path},
					{"numCandidates", 5},
					{"limit", 2},
				}},
			},
			{
				{"$project", bson.D{
					{"_id", 1},
					{dataFieldName, 1},
					{"score", bson.D{
						{"$meta", "vectorSearchScore"},
					}},
				}},
			},
		}

		cursor, err := collection.Aggregate(ctx, pipeline)
		if err != nil {
			return fmt.Errorf("failed to run vector search aggregation query: %w", err)
		}

		// cursor.All reads every result and closes the cursor.
		var results []bson.M
		if err = cursor.All(ctx, &results); err != nil {
			return fmt.Errorf("failed to parse aggregation query results: %w", err)
		}

		fmt.Printf("Results from %v embeddings:\n", path)
		for _, result := range results {
			// Print each result as canonical Extended JSON.
			jsonResult, err := bson.MarshalExtJSON(result, true, false)
			if err != nil {
				return fmt.Errorf("failed to marshal result: %w", err)
			}
			fmt.Println(string(jsonResult))
		}
	}

	return nil
}

// Utility functions to handle list conversion
func convertFloat64ToFloat32(f64s []float64) []float32 {
	f32s := make([]float32, len(f64s))
	for i, v := range f64s {
		f32s[i] = float32(v)
	}
	return f32s
}

func convertIntToInt8(ints []int) []int8 {
	ints8 := make([]int8, len(ints))
	for i, val := range ints {
		ints8[i] = int8(val)
	}
	return ints8
}

func convertIntToBytes(ints []int) []byte {
	bytes := make([]byte, len(ints))
	for i, val := range ints {
		bytes[i] = byte(val)
	}
	return bytes
}

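Replace the following placeholder values in the code and save the file.

`MONGODB_URI`	Your Atlas cluster connection string, if you didn't set the environment variable.
`COHERE_API_KEY`	Your Cohere API key, if you didn't set the environment variable.
`<DATABASE-NAME>`	The name of the database in your Atlas cluster. This example uses `sample_airbnb`.
`<COLLECTION-NAME>`	The name of the collection into which you ingested the data. This example uses `listingsAndReviews`.
`<INDEX-NAME>`	The name of the Atlas Vector Search index for the collection.
`<TEXT-FIELD-NAME>`	The name of the field that contains the text from which you generated the embeddings. This example uses `summary`.
`<QUERY-TEXT>`	The text for the query. This example uses `ocean view`.

Compile and run the file using your application run configuration.

If you're using a terminal, run the following command to compile and run your program.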

go run CreateEmbeddingsAndRunQuery.go

Results from embeddings_float32 embeddings:
{"_id":"10266175","summary":"A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.","score":{"$numberDouble":"0.7278661131858826"}}
{"summary":"A short distance from Honolulu's billion dollar mall, and the same distance to Waikiki. Parking included. A great location that work perfectly for business, education, or simple visit. Experience Yacht Harbor views and 5 Star Hilton Hawaiian Village.","score":{"$numberDouble":"0.688639760017395"},"_id":"1001265"}
Results from embeddings_int8 embeddings:
{"_id":"10266175","summary":"A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.","score":{"$numberDouble":"0.5215557217597961"}}
{"_id":"1001265","summary":"A short distance from Honolulu's billion dollar mall, and the same distance to Waikiki. Parking included. A great location that work perfectly for business, education, or simple visit. Experience Yacht Harbor views and 5 Star Hilton Hawaiian Village.","score":{"$numberDouble":"0.5179016590118408"}}
Results from embeddings_int1 embeddings:
{"_id":"10266175","summary":"A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.","score":{"$numberDouble":"0.6591796875"}}
{"_id":"1001265","summary":"A short distance from Honolulu's billion dollar mall, and the same distance to Waikiki. Parking included. A great location that work perfectly for business, education, or simple visit. Experience Yacht Harbor views and 5 Star Hilton Hawaiian Village.","score":{"$numberDouble":"0.6337890625"}}

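For more information on generating embeddings and converting them to binData vectors, see How to Create Vector Embeddings.

After you create a Java project in your IDE with the dependencies for the MongoDB Java Driver configured, perform the following steps in the project. To try the example, replace the placeholders with valid values.

Create a Java project and install dependencies.

In your IDE, create a Java project using Maven or Gradle.

Add the following dependencies, depending on your package manager:

If you are using Maven, add the following dependencies to the dependencies element in your project's pom.xml file: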

pom.xml

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.13.2</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.mongodb</groupId>
        <artifactId>mongodb-driver-sync</artifactId>
        <version>5.3.1</version>
    </dependency>
    <dependency>
        <groupId>com.cohere</groupId>
        <artifactId>cohere-java</artifactId>
        <version>1.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>2.0.16</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>2.0.16</version>
        <scope>test</scope>
    </dependency>
</dependencies>

If you are using Gradle, add the following to the dependencies block in your project's build.gradle file:

build.gradle

dependencies {
    // MongoDB Java Sync Driver v5.3.1 or later
    implementation 'org.mongodb:mongodb-driver-sync:[5.3.1,)'
    // Java library for working with Cohere models
    implementation 'com.cohere:cohere-java:1.6.0'
    // SLF4J (The Simple Logging Facade for Java)
    testImplementation("org.slf4j:slf4j-simple:2.0.16")
    implementation("org.slf4j:slf4j-api:2.0.16")
}

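Run your package manager to install the dependencies in your project.

Set your environment variables.

Note

This example sets the variables for the project in the IDE. Production applications might manage environment variables through a deployment configuration, a CI/CD pipeline, or a secrets manager, but you can adapt the provided code to fit your use case.

In your IDE, create a new configuration template and add the following variables to your project:

If you are using IntelliJ IDEA, create a new Application run configuration template, then add your variables as semicolon-separated values in the Environment variables field (for example, FOO=123;BAR=456). Apply the changes and click OK.
To learn more, see the Create a run/debug configuration from a template section of the IntelliJ IDEA documentation.
If you are using Eclipse, create a new Java Application launch configuration, then add each variable as a new key-value pair in the Environment tab. Apply the changes and click OK.
To learn more, see the Creating a Java application launch configuration section of the Eclipse IDE documentation.

Environment variables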

COHERE_API_KEY=<api-key>
MONGODB_URI=<connection-string>

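Update the placeholders with the following values:

Replace the <api-key> placeholder value with your Cohere API key.
Replace the <connection-string> placeholder value with the SRV connection string for your Atlas cluster.
Your connection string should use the following format: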
```
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
```
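As an optional early check that MONGODB_URI follows this format, the following standard-library sketch parses it with net/url and verifies the mongodb+srv scheme, that a host is present, and that no port is given (SRV connection strings must not specify one). checkSRVURI is a hypothetical helper; it does not validate credentials or perform the DNS lookup, and it deliberately omits the parser's error details so the password in the string is never printed.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// checkSRVURI is a hypothetical pre-flight check for a mongodb+srv connection string.
func checkSRVURI(uri string) error {
	u, err := url.Parse(uri)
	if err != nil {
		// Do not wrap err: its message would echo the full URI, including the password.
		return fmt.Errorf("cannot parse connection string")
	}
	if u.Scheme != "mongodb+srv" {
		return fmt.Errorf("expected the mongodb+srv scheme, got %q", u.Scheme)
	}
	if u.Hostname() == "" {
		return fmt.Errorf("connection string has no host")
	}
	if u.Port() != "" {
		return fmt.Errorf("mongodb+srv connection strings must not include a port")
	}
	return nil
}

func main() {
	if err := checkSRVURI(os.Getenv("MONGODB_URI")); err != nil {
		fmt.Fprintln(os.Stderr, "MONGODB_URI:", err)
		os.Exit(1)
	}
	fmt.Println("MONGODB_URI looks like an SRV connection string")
}
```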

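Generate embeddings from your data.

You can use an embedding model provider to generate float, int8, and int1 embeddings for your data, and then use the MongoDB Java Driver to convert your native vector embeddings to BSON vectors. The following sample code uses Cohere's embed API to generate full-precision vectors.

Create a new file named GenerateAndConvertEmbeddings.java in your Java project.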
```
touch GenerateAndConvertEmbeddings.java
```

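Copy and paste the following code into the GenerateAndConvertEmbeddings.java file.

This code does the following:

Generates float32, int8, and ubinary vector embeddings by using Cohere's embed API.
Converts the embeddings to BSON binData vectors by using the MongoDB Java Driver.
Creates a file named embeddings.json and saves the data with its embeddings to that file for upload to Atlas.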
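The ubinary embeddings that Cohere returns are already packed: each integer holds 8 dimensions as one unsigned byte, so a 1024-dimension embedding arrives as 128 values, which the code stores as a packed bit vector with a padding of 0. The following sketch illustrates the idea behind such a vector: it applies binary quantization with a midpoint of 0 (1 if a value is greater than 0, otherwise 0) and packs the bits most significant bit first. It is an illustration under those assumptions, not necessarily the exact procedure Cohere uses, and quantizeAndPack is a hypothetical name.

```go
package main

import "fmt"

// quantizeAndPack maps each value to 1 if it is greater than 0 and to 0
// otherwise, then packs 8 dimensions per byte, most significant bit first.
func quantizeAndPack(v []float32) ([]byte, error) {
	if len(v)%8 != 0 {
		return nil, fmt.Errorf("dimension count %d is not a multiple of 8", len(v))
	}
	out := make([]byte, len(v)/8)
	for i, x := range v {
		if x > 0 {
			out[i/8] |= 1 << (7 - uint(i%8))
		}
	}
	return out, nil
}

func main() {
	packed, err := quantizeAndPack([]float32{0.3, -0.1, 0, 0.9, -2, 0.01, 0.5, -0.7})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%08b\n", packed[0]) // 10010110

	full, _ := quantizeAndPack(make([]float32, 1024))
	fmt.Println(len(full)) // 128
}
```

Requiring a multiple of 8 keeps the padding at 0; a dimension count that is not a multiple of 8 would need a nonzero padding value to mark the unused low bits of the last byte.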

GenerateAndConvertEmbeddings.java

import com.cohere.api.Cohere;
import com.cohere.api.requests.EmbedRequest;
import com.cohere.api.types.EmbedByTypeResponse;
import com.cohere.api.types.EmbedByTypeResponseEmbeddings;
import com.cohere.api.types.EmbedInputType;
import com.cohere.api.types.EmbedResponse;
import com.cohere.api.types.EmbeddingType;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import org.bson.BinaryVector;
import org.bson.Document;

public class GenerateAndConvertEmbeddings {

    // List of text data to embed
    private static final List<String> DATA = List.of(
        "The Great Wall of China is visible from space.",
        "The Eiffel Tower was completed in Paris in 1889.",
        "Mount Everest is the highest peak on Earth at 8,848m.",
        "Shakespeare wrote 37 plays and 154 sonnets during his lifetime.",
        "The Mona Lisa was painted by Leonardo da Vinci."
    );

    public static void main(String[] args) {
        // Cohere API key for authentication
        String apiKey = System.getenv("COHERE_API_KEY");

        // Fetch embeddings from the Cohere API
        EmbedByTypeResponseEmbeddings embeddings = fetchEmbeddingsFromCohere(apiKey);
        Document bsonEmbeddings = convertEmbeddingsToBson(embeddings);

        writeEmbeddingsToFile(bsonEmbeddings, "embeddings.json");
    }

    // Fetches embeddings based on input data from the Cohere API
    private static EmbedByTypeResponseEmbeddings fetchEmbeddingsFromCohere(String apiKey) {
        if (Objects.isNull(apiKey) || apiKey.isEmpty()) {
            throw new RuntimeException("API key not found. Please set COHERE_API_KEY in your environment.");
        }

        Cohere cohere = Cohere.builder().token(apiKey).clientName("embed-example").build();

        try {
            EmbedRequest request = EmbedRequest.builder()
                .model("embed-english-v3.0")
                .inputType(EmbedInputType.SEARCH_DOCUMENT)
                .texts(DATA)
                .embeddingTypes(List.of(EmbeddingType.FLOAT, EmbeddingType.INT_8, EmbeddingType.UBINARY))
                .build();

            EmbedResponse response = cohere.embed(request);
            Optional<EmbedByTypeResponse> optionalEmbeddingsWrapper = response.getEmbeddingsByType();

            return optionalEmbeddingsWrapper.orElseThrow().getEmbeddings();
        } catch (Exception e) {
            System.err.println("Error fetching embeddings: " + e.getMessage());
            throw e;
        }
    }

    // Converts embeddings to BSON binary vectors using MongoDB Java Driver
    private static Document convertEmbeddingsToBson(EmbedByTypeResponseEmbeddings embeddings) {
        List<List<Double>> floatEmbeddings = embeddings.getFloat().orElseThrow();
        List<List<Integer>> int8Embeddings = embeddings.getInt8().orElseThrow();
        List<List<Integer>> ubinaryEmbeddings = embeddings.getUbinary().orElseThrow();

        List<Document> bsonEmbeddings = new ArrayList<>();
        for (int i = 0; i < floatEmbeddings.size(); i++) {
            Document bsonEmbedding = new Document()
                .append("text", DATA.get(i))
                .append("embeddings_float32", BinaryVector.floatVector(listToFloatArray(floatEmbeddings.get(i))))
                .append("embeddings_int8", BinaryVector.int8Vector(listToByteArray(int8Embeddings.get(i))))
                .append("embeddings_int1", BinaryVector.packedBitVector(listToByteArray(ubinaryEmbeddings.get(i)), (byte) 0));

            bsonEmbeddings.add(bsonEmbedding);
        }

        return new Document("data", bsonEmbeddings);
    }

    // Writes embeddings to a JSON file (UTF-8 encoded)
    private static void writeEmbeddingsToFile(Document bsonEmbeddings, String fileName) {
        try (FileOutputStream fos = new FileOutputStream(fileName)) {
            fos.write(bsonEmbeddings.toJson().getBytes(StandardCharsets.UTF_8));
            System.out.println("Embeddings saved to " + fileName);
        } catch (IOException e) {
            System.out.println("Error writing embeddings to file: " + e.getMessage());
        }
    }

    // Convert List of Doubles to an array of floats
    private static float[] listToFloatArray(List<Double> list) {
        float[] array = new float[list.size()];
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i).floatValue();
        }
        return array;
    }

    // Convert List of Integers to an array of bytes
    private static byte[] listToByteArray(List<Integer> list) {
        byte[] array = new byte[list.size()];
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i).byteValue();
        }
        return array;
    }
}

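If you didn't set the environment variable, replace the COHERE_API_KEY placeholder value in the code and save the file.
Compile and run the file using your application run configuration.
If you're using a terminal, run the following commands to compile and run your program.
javac GenerateAndConvertEmbeddings.java
java GenerateAndConvertEmbeddings
Embeddings saved to embeddings.json
Verify the embeddings in the embeddings.json file.

For more information on generating embeddings and converting them to binData vectors, see How to Create Vector Embeddings.

Ingest the data and create an Atlas Vector Search index.

You must upload your data and embeddings to a collection in your Atlas cluster and create an Atlas Vector Search index on the data to run $vectorSearch queries.

Create a new file named UploadDataAndCreateIndex.java in your Java project.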
```
touch UploadDataAndCreateIndex.java
```

Copy and paste the following code into the UploadDataAndCreateIndex.java file.

This code does the following:

Uploads the data in the embeddings.json file to your Atlas cluster.
Creates an Atlas Vector Search index on the embeddings_float32, embeddings_int8, and embeddings_int1 fields.

UploadDataAndCreateIndex.java

1 import com.mongodb.client.MongoClient;
2 import com.mongodb.client.MongoClients;
3 import com.mongodb.client.MongoCollection;
4 import com.mongodb.client.MongoDatabase;
5 import com.mongodb.client.model.SearchIndexModel;
6 import com.mongodb.client.model.SearchIndexType;
7 import org.bson.Document;
8 import org.bson.conversions.Bson;
9 
10 import java.io.IOException;
11 import java.nio.file.Files;
12 import java.nio.file.Path;
13 import java.util.Collections;
14 import java.util.List;
15 import java.util.concurrent.TimeUnit;
16 import java.util.stream.StreamSupport;
17 
18 public class UploadDataAndCreateIndex {
19 
20     private static final String MONGODB_URI = System.getenv("MONGODB_URI");
21     private static final String DB_NAME = "<DATABASE-NAME>";
22     private static final String COLLECTION_NAME = "<COLLECTION-NAME>";
23     private static final String INDEX_NAME = "<INDEX-NAME>";
24 
25     public static void main(String[] args) {
26         try (MongoClient mongoClient = MongoClients.create(MONGODB_URI)) {
27             storeEmbeddings(mongoClient);
28             setupVectorSearchIndex(mongoClient);
29         } catch (IOException | InterruptedException e) {
30             e.printStackTrace();
31         }
32     }
33 
34     public static void storeEmbeddings(MongoClient client) throws IOException {
35         MongoDatabase database = client.getDatabase(DB_NAME);
36         MongoCollection<Document> collection = database.getCollection(COLLECTION_NAME);
37 
38         String fileContent = Files.readString(Path.of("embeddings.json"));
39         List<Document> documents = parseDocuments(fileContent);
40 
41         collection.insertMany(documents);
42         System.out.println("Inserted documents into MongoDB");
43     }
44 
45     private static List<Document> parseDocuments(String jsonContent) throws IOException {
46         Document rootDoc = Document.parse(jsonContent);
47         return rootDoc.getList("data", Document.class);
48     }
49 
50     public static void setupVectorSearchIndex(MongoClient client) throws InterruptedException {
51         MongoDatabase database = client.getDatabase(DB_NAME);
52         MongoCollection<Document> collection = database.getCollection(COLLECTION_NAME);
53         
54         Bson definition = new Document(
55             "fields",
56             List.of(
57                 new Document("type", "vector")
58                     .append("path", "embeddings_float32")
                    .append("numDimensions", 1024)
                    .append("similarity", "dotProduct"),
                new Document("type", "vector")
                    .append("path", "embeddings_int8")
                    .append("numDimensions", 1024)
                    .append("similarity", "dotProduct"),
                new Document("type", "vector")
                    .append("path", "embeddings_int1")
                    .append("numDimensions", 1024)
                    .append("similarity", "euclidean")
            )
        );

        SearchIndexModel indexModel = new SearchIndexModel(
            INDEX_NAME,
            definition,
            SearchIndexType.vectorSearch()
        );

        List<String> result = collection.createSearchIndexes(Collections.singletonList(indexModel));
        System.out.println("Successfully created vector index named: " + result.get(0));
        System.out.println("It may take up to a minute for the index to leave the BUILDING status and become queryable.");

        System.out.println("Polling to confirm the index has changed from the BUILDING status.");
        waitForIndex(collection, INDEX_NAME);
    }

    public static <T> boolean waitForIndex(final MongoCollection<T> collection, final String indexName) {
        long startTime = System.nanoTime();
        long timeoutNanos = TimeUnit.SECONDS.toNanos(60);
        while (System.nanoTime() - startTime < timeoutNanos) {
            Document indexRecord = StreamSupport.stream(collection.listSearchIndexes().spliterator(), false)
                    .filter(index -> indexName.equals(index.getString("name")))
                    .findAny().orElse(null);
            if (indexRecord != null) {
                if ("FAILED".equals(indexRecord.getString("status"))) {
                    throw new RuntimeException("Search index has FAILED status.");
                }
                // Treat a missing "queryable" field as not yet queryable
                if (Boolean.TRUE.equals(indexRecord.getBoolean("queryable"))) {
                    System.out.println(indexName + " index is ready to query");
                    return true;
                }
            }
            try {
                Thread.sleep(100); // fixed 100 ms polling interval; avoid in production
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return false;
    }
}
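The `waitForIndex` method above checks `listSearchIndexes()` every 100 ms for up to 60 seconds. The following is a minimal, JDK-only sketch of a polling helper with exponential backoff that you could adapt for that loop. `BackoffPoller` and `pollWithBackoff` are hypothetical names, not part of the MongoDB Java driver; in practice, the condition would wrap the `queryable` check from `waitForIndex`.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the MongoDB Java driver.
public class BackoffPoller {

    /**
     * Evaluates the condition until it returns true or the timeout elapses.
     * The wait between attempts starts at initialDelay, doubles after each
     * failed attempt, and is capped at maxDelay. Returns false on timeout, or
     * if the thread is interrupted (the interrupt flag is restored).
     */
    public static boolean pollWithBackoff(BooleanSupplier condition, Duration timeout,
                                          Duration initialDelay, Duration maxDelay) {
        long deadline = System.nanoTime() + timeout.toNanos();
        long delayMillis = Math.max(1, initialDelay.toMillis());
        long maxMillis = Math.max(delayMillis, maxDelay.toMillis());
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000;
            if (remainingMillis <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(delayMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            delayMillis = Math.min(delayMillis * 2, maxMillis);
        }
    }

    public static void main(String[] args) {
        int[] checks = {0};
        // This condition becomes true on its third evaluation.
        boolean ready = pollWithBackoff(() -> ++checks[0] >= 3,
                Duration.ofSeconds(5), Duration.ofMillis(10), Duration.ofMillis(200));
        System.out.println(ready + " after " + checks[0] + " checks");
    }
}
```

Capping the delay keeps the helper responsive once the index is nearly ready, while the doubling avoids issuing many `listSearchIndexes` calls during a long build.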

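Replace the following placeholder values in the code and save the file.

`MONGODB_URI`	Your Atlas cluster connection string, if you didn't set the environment variable.
`<DATABASE-NAME>`	The name of the database in your Atlas cluster.
`<COLLECTION-NAME>`	The name of the collection where you want to upload the data.
`<INDEX-NAME>`	The name of the Atlas Vector Search index for the collection.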

Compile and run the file by using your application run configuration.

If you're using a terminal, run the following commands to compile and run the program.

javac UploadDataAndCreateIndex.java
java UploadDataAndCreateIndex

Inserted documents into MongoDB
Successfully created vector index named: <INDEX_NAME>
It may take up to a minute for the index to leave the BUILDING status and become queryable.
Polling to confirm the index has changed from the BUILDING status.
<INDEX_NAME> index is ready to query

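Log in to your Atlas cluster and verify the following:
- The data in the namespace.
- The Atlas Vector Search index for the collection.

Create and run a query against the collection.

To test your embeddings, you can run a query against your collection. Use an embedding model provider to generate float, int8, and int1 embeddings for your query text. The following sample code uses Cohere's embed API to generate full-precision vectors. After generating the embeddings, use the MongoDB Java driver to convert your native vector embeddings to BSON vectors and run a $vectorSearch query against the collection.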
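To make the storage difference between the three query vectors concrete, the following sketch computes the raw data size of a 1024-dimensional embedding (the `numDimensions` used in the index definition) for each type: 4 bytes per dimension for float32, 1 byte for int8, and 1 bit for int1, packed eight bits per byte. It counts only the vector values. It ignores the small per-vector BSON header and the index's graph structures, which is why the RAM savings described at the top of this page (about 1/3.75 and 1/24) are smaller than these raw ratios. The class and method names are illustrative.

```java
// Illustrative helper; counts only the raw vector values, not BSON or index overhead.
public class VectorSizes {

    public static int float32Bytes(int dimensions) {
        return dimensions * Float.BYTES; // 4 bytes per dimension
    }

    public static int int8Bytes(int dimensions) {
        return dimensions * Byte.BYTES; // 1 byte per dimension
    }

    public static int packedBitBytes(int dimensions) {
        return (dimensions + 7) / 8; // 1 bit per dimension, rounded up to whole bytes
    }

    public static void main(String[] args) {
        int dims = 1024;
        int f = float32Bytes(dims);
        System.out.println("float32: " + f + " bytes");
        System.out.println("int8:    " + int8Bytes(dims) + " bytes (1/" + f / int8Bytes(dims) + ")");
        System.out.println("int1:    " + packedBitBytes(dims) + " bytes (1/" + f / packedBitBytes(dims) + ")");
    }
}
```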

Create a new file named CreateEmbeddingsAndRunQuery.java in your Java project.
```
touch CreateEmbeddingsAndRunQuery.java
```

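Copy and paste the following code into the CreateEmbeddingsAndRunQuery.java file.

This code does the following:

- Generates float32, int8, and ubinary vector embeddings by using Cohere's embed API.
- Converts the embeddings to BSON binData vectors by using the MongoDB Java driver.
- Runs a query against your collection.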

CreateEmbeddingsAndRunQuery.java

import com.cohere.api.Cohere;
import com.cohere.api.requests.EmbedRequest;
import com.cohere.api.types.EmbedResponse;
import com.cohere.api.types.EmbedByTypeResponse;
import com.cohere.api.types.EmbedByTypeResponseEmbeddings;
import com.cohere.api.types.EmbeddingType;
import com.cohere.api.types.EmbedInputType;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import org.bson.conversions.Bson;
import org.bson.BinaryVector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

import static com.mongodb.client.model.Aggregates.project;
import static com.mongodb.client.model.Aggregates.vectorSearch;
import static com.mongodb.client.model.Projections.fields;
import static com.mongodb.client.model.Projections.include;
import static com.mongodb.client.model.Projections.exclude;
import static com.mongodb.client.model.Projections.metaVectorSearchScore;
import static com.mongodb.client.model.search.SearchPath.fieldPath;
import static com.mongodb.client.model.search.VectorSearchOptions.approximateVectorSearchOptions;
import static java.util.Arrays.asList;

public class CreateEmbeddingsAndRunQuery {
    private static final String COHERE_API_KEY = System.getenv("COHERE_API_KEY");
    private static final String MONGODB_URI = System.getenv("MONGODB_URI");
    private static final String DB_NAME = "<DATABASE-NAME>";
    private static final String COLLECTION_NAME = "<COLLECTION-NAME>";
    private static final String VECTOR_INDEX_NAME = "<INDEX-NAME>";
    private static final String DATA_FIELD_NAME = "<DATA-FIELD>";

    public static void main(String[] args) {
        String queryText = "<QUERY-TEXT>";

        try {
            CreateEmbeddingsAndRunQuery processor = new CreateEmbeddingsAndRunQuery();
            Map<String, BinaryVector> embeddingsData = processor.generateAndConvertEmbeddings(queryText);
            processor.runVectorSearchQuery(embeddingsData);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Generate embeddings using Cohere's embed API from the query text
    public Map<String, BinaryVector> generateAndConvertEmbeddings(String text) throws Exception {
        if (COHERE_API_KEY == null || COHERE_API_KEY.isEmpty()) {
            throw new RuntimeException("API key not found. Set COHERE_API_KEY in your environment.");
        }

        Cohere cohere = Cohere.builder().token(COHERE_API_KEY).build();

        EmbedRequest request = EmbedRequest.builder()
                .model("embed-english-v3.0")
                .inputType(EmbedInputType.SEARCH_QUERY)
                .texts(List.of(text))
                .embeddingTypes(List.of(EmbeddingType.FLOAT, EmbeddingType.INT_8, EmbeddingType.UBINARY))
                .build();

        EmbedResponse response = cohere.embed(request);
        Optional<EmbedByTypeResponse> optionalEmbeddingsWrapper = response.getEmbeddingsByType();
        if (optionalEmbeddingsWrapper.isEmpty()) {
            throw new RuntimeException("No embeddings found in the API response.");
        }

        EmbedByTypeResponseEmbeddings embeddings = optionalEmbeddingsWrapper.get().getEmbeddings();
        return createBinaryVectorEmbeddings(embeddings);
    }

    // Convert embeddings to BSON binary vectors using MongoDB Java Driver
    private static Map<String, BinaryVector> createBinaryVectorEmbeddings(EmbedByTypeResponseEmbeddings embeddings) {
        Map<String, BinaryVector> binaryVectorEmbeddings = new HashMap<>();

        // Convert float embeddings
        List<Double> floatList = embeddings.getFloat().orElseThrow().get(0);
        if (floatList != null) {
            float[] floatData = listToFloatArray(floatList);
            BinaryVector floatVector = BinaryVector.floatVector(floatData);
            binaryVectorEmbeddings.put("float32", floatVector);
        }

        // Convert int8 embeddings
        List<Integer> int8List = embeddings.getInt8().orElseThrow().get(0);
        if (int8List != null) {
            byte[] int8Data = listToByteArray(int8List);
            BinaryVector int8Vector = BinaryVector.int8Vector(int8Data);
            binaryVectorEmbeddings.put("int8", int8Vector);
        }

        // Convert ubinary embeddings
        List<Integer> ubinaryList = embeddings.getUbinary().orElseThrow().get(0);
        if (ubinaryList != null) {
            byte[] int1Data = listToByteArray(ubinaryList);
            BinaryVector packedBitsVector = BinaryVector.packedBitVector(int1Data, (byte) 0);
            binaryVectorEmbeddings.put("int1", packedBitsVector);
        }

        return binaryVectorEmbeddings;
    }

    // Define and run $vectorSearch query using the embeddings
    public void runVectorSearchQuery(Map<String, BinaryVector> embeddingsData) {
        if (MONGODB_URI == null || MONGODB_URI.isEmpty()) {
            throw new RuntimeException("MongoDB URI not found. Set MONGODB_URI in your environment.");
        }

        try (MongoClient mongoClient = MongoClients.create(MONGODB_URI)) {
            MongoDatabase database = mongoClient.getDatabase(DB_NAME);
            MongoCollection<Document> collection = database.getCollection(COLLECTION_NAME);

            for (String path : embeddingsData.keySet()) {
                BinaryVector queryVector = embeddingsData.get(path);

                List<Bson> pipeline = asList(
                        vectorSearch(
                                fieldPath("embeddings_" + path),
                                queryVector,
                                VECTOR_INDEX_NAME,
                                2,
                                approximateVectorSearchOptions(5)
                        ),
                        project(
                                fields(
                                        exclude("_id"),
                                        include(DATA_FIELD_NAME),
                                        metaVectorSearchScore("vectorSearchScore")
                                )
                        )
                );

                List<Document> results = collection.aggregate(pipeline).into(new ArrayList<>());

                System.out.println("Results from " + path + " embeddings:");
                for (Document result : results) {
                    System.out.println(result.toJson());
                }
            }
        }
    }

    private static float[] listToFloatArray(List<Double> list) {
        float[] array = new float[list.size()];
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i).floatValue();
        }
        return array;
    }

    private static byte[] listToByteArray(List<Integer> list) {
        byte[] array = new byte[list.size()];
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i).byteValue();
        }
        return array;
    }
}
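Cohere returns both the int8 and the ubinary embeddings as `List<Integer>`, and `listToByteArray` narrows each value with `Integer.byteValue()`, which keeps only the low 8 bits. For int8 values (-128 to 127) this is lossless. For ubinary values (0 to 255), anything above 127 becomes a negative Java `byte`, but the bit pattern, which is what a packed bit vector stores, is unchanged. The following standalone sketch demonstrates this; `ByteConversionDemo` is an illustrative name.

```java
import java.util.List;

// Illustrative class showing that byteValue() preserves the 8-bit pattern.
public class ByteConversionDemo {

    // Same conversion as listToByteArray in the listing: keep the low 8 bits of each value.
    public static byte[] toBytes(List<Integer> values) {
        byte[] out = new byte[values.size()];
        for (int i = 0; i < values.size(); i++) {
            out[i] = values.get(i).byteValue();
        }
        return out;
    }

    // Reads a byte back as an unsigned value in the range 0..255.
    public static int unsigned(byte b) {
        return Byte.toUnsignedInt(b);
    }

    public static void main(String[] args) {
        byte[] packed = toBytes(List.of(0, 127, 128, 255));
        for (byte b : packed) {
            // Print the signed Java value, the unsigned value, and the 8-bit pattern.
            String bits = String.format("%8s", Integer.toBinaryString(unsigned(b))).replace(' ', '0');
            System.out.println(b + " -> " + unsigned(b) + " (" + bits + ")");
        }
    }
}
```

Here 128 and 255 print as -128 and -1, while their unsigned values and bit patterns stay 128 (`10000000`) and 255 (`11111111`).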

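Replace the following placeholder values in the code and save the file.

`MONGODB_URI`	Your Atlas cluster connection string, if you didn't set the environment variable.
`COHERE_API_KEY`	Your Cohere API key, if you didn't set the environment variable.
`<DATABASE-NAME>`	The name of the database in your Atlas cluster.
`<COLLECTION-NAME>`	The name of the collection where you ingested the data.
`<INDEX-NAME>`	The name of the Atlas Vector Search index for the collection.
`<DATA-FIELD>`	The name of the field that contains the text from which you generated the embeddings. This example uses `text`.
`<QUERY-TEXT>`	The text for the query. This example uses `science fact`.

Compile and run the file by using your application run configuration.

If you're using a terminal, run the following commands to compile and run the program.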

javac CreateEmbeddingsAndRunQuery.java
java CreateEmbeddingsAndRunQuery

Results from int1 embeddings:
{"text": "Mount Everest is the highest peak on Earth at 8,848m.", "score": 0.642578125}
{"text": "The Great Wall of China is visible from space.", "score": 0.61328125}
Results from int8 embeddings:
{"text": "Mount Everest is the highest peak on Earth at 8,848m.", "score": 0.5149773359298706}
{"text": "The Great Wall of China is visible from space.", "score": 0.5146723985671997}
Results from float32 embeddings:
{"text": "Mount Everest is the highest peak on Earth at 8,848m.", "score": 0.6583383083343506}
{"text": "The Great Wall of China is visible from space.", "score": 0.6536108255386353}
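The index definition uses `euclidean` similarity for `embeddings_int1`, the only similarity the requirements table lists for int1 vectors. For vectors whose components are all 0 or 1, each term (a_i - b_i)^2 is 1 exactly where the bits differ and 0 elsewhere, so the squared Euclidean distance equals the Hamming distance. The following sketch checks this on packed bytes. It illustrates the math only and does not reproduce how Atlas Vector Search turns distances into the scores shown in the output; `PackedBitDistance` is an illustrative name.

```java
// Illustrative sketch: distances between two packed bit vectors of equal length.
public class PackedBitDistance {

    // Number of bit positions where the two vectors differ.
    public static int hamming(byte[] a, byte[] b) {
        requireSameLength(a, b);
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            // XOR sets a bit wherever a and b differ; mask to 8 bits before counting.
            distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return distance;
    }

    // Squared Euclidean distance over the unpacked 0/1 components.
    public static int squaredEuclidean(byte[] a, byte[] b) {
        requireSameLength(a, b);
        int sum = 0;
        for (int i = 0; i < a.length * 8; i++) {
            int bitA = (a[i / 8] >> (7 - i % 8)) & 1;
            int bitB = (b[i / 8] >> (7 - i % 8)) & 1;
            int diff = bitA - bitB;
            sum += diff * diff;
        }
        return sum;
    }

    private static void requireSameLength(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same number of bytes");
        }
    }

    public static void main(String[] args) {
        byte[] a = {(byte) 0b10110000, (byte) 0xFF};
        byte[] b = {(byte) 0b10011111, (byte) 0x0F};
        System.out.println("Hamming: " + hamming(a, b));
        System.out.println("Squared Euclidean: " + squaredEuclidean(a, b));
    }
}
```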

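To learn more about generating embeddings and converting them to binData vectors, see How to Create Vector Embeddings.

Create a Java project and install the dependencies.

In your IDE, create a Java project by using Maven or Gradle.

Add the following dependencies, depending on your package manager.

If you're using Maven, add the following dependencies to the dependencies element in your project's pom.xml file.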

pom.xml

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.13.2</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.mongodb</groupId>
        <artifactId>mongodb-driver-sync</artifactId>
        <version>5.3.1</version>
    </dependency>
    <dependency>
        <groupId>com.cohere</groupId>
        <artifactId>cohere-java</artifactId>
        <version>1.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>2.0.16</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>2.0.16</version>
        <scope>test</scope>
    </dependency>
</dependencies>

If you're using Gradle, add the following to the dependencies block in your project's build.gradle file.

build.gradle

dependencies {
    // MongoDB Java Sync Driver v5.3.1 or later
    implementation 'org.mongodb:mongodb-driver-sync:[5.3.1,)'
    // Java library for working with Cohere models
    implementation 'com.cohere:cohere-java:1.6.0'
    // SLF4J (The Simple Logging Facade for Java)
    testImplementation("org.slf4j:slf4j-simple:2.0.16")
    implementation("org.slf4j:slf4j-api:2.0.16")
}

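Run your package manager to install the dependencies in your project.

Set your environment variables.

Note

In your IDE, create a new configuration template and add the following variables to your project.

If you're using IntelliJ IDEA, create a new Application run configuration template, then add your variables as semicolon-separated values in the Environment variables field (for example, FOO=123;BAR=456). Apply the changes and click OK.
To learn more, see the Create a run/debug configuration from a template section of the IntelliJ IDEA documentation.
If you're using Eclipse, create a new Java Application launch configuration, then add each variable as a new key-value pair in the Environment tab. Apply the changes and click OK.
To learn more, see the Creating a Java application launch configuration section of the Eclipse IDE documentation.

Environment variables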

COHERE_API_KEY=<api-key>
MONGODB_URI=<connection-string>

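Update the placeholders with the following values:

Replace the <api-key> placeholder value with your Cohere API key.
Replace the <connection-string> placeholder value with the SRV connection string for your Atlas cluster.
Your connection string should use the following format: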
```
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
```
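Because the sample programs read `COHERE_API_KEY` and `MONGODB_URI` with `System.getenv`, a missing variable only surfaces when the code first uses it. The following sketch shows one way to fail fast at startup. `EnvCheck` is an illustrative helper, and its `mongodb+srv://` prefix check is only a rough sanity test of the format above, not full connection-string validation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative startup check for the environment variables used on this page.
public class EnvCheck {

    // Returns the names of required variables that are unset or blank.
    public static List<String> missing(Map<String, String> env, List<String> required) {
        List<String> absent = new ArrayList<>();
        for (String name : required) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                absent.add(name);
            }
        }
        return absent;
    }

    // Rough check that the value uses the SRV connection-string scheme.
    public static boolean looksLikeSrvUri(String uri) {
        return uri != null && uri.startsWith("mongodb+srv://");
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        List<String> absent = missing(env, List.of("COHERE_API_KEY", "MONGODB_URI"));
        if (!absent.isEmpty()) {
            System.err.println("Missing environment variables: " + absent);
            System.exit(1);
        }
        if (!looksLikeSrvUri(env.get("MONGODB_URI"))) {
            System.err.println("MONGODB_URI does not start with mongodb+srv://");
            System.exit(1);
        }
        System.out.println("Environment variables look OK");
    }
}
```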

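(Conditional) Generate embeddings from your data.

You can use an embedding model provider to generate float, int8, and int1 embeddings for your data, and then use the MongoDB Java driver to convert your native vector embeddings to BSON vectors. The following sample code uses Cohere's embed API to generate full-precision vectors from the data in the sample_airbnb.listingsAndReviews namespace.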
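Cohere returns the int8 embeddings already quantized, so the sample code only converts them. To show what scalar quantization does, as described at the top of this page, the following sketch learns each dimension's minimum and maximum from a set of float vectors, splits that range into 256 equal-width bins, and stores each value's bin as a signed byte. It illustrates the technique only; it is not the algorithm Cohere or Atlas Vector Search uses, and `ScalarQuantizer` is a hypothetical class.

```java
import java.util.Arrays;

// Hypothetical class illustrating min/max scalar quantization to signed 8-bit integers.
public class ScalarQuantizer {
    private final float[] min;
    private final float[] max;

    // Learns the value range of each dimension from a set of vectors.
    public ScalarQuantizer(float[][] vectors) {
        if (vectors.length == 0) {
            throw new IllegalArgumentException("need at least one vector");
        }
        int dims = vectors[0].length;
        min = new float[dims];
        max = new float[dims];
        Arrays.fill(min, Float.POSITIVE_INFINITY);
        Arrays.fill(max, Float.NEGATIVE_INFINITY);
        for (float[] v : vectors) {
            for (int d = 0; d < dims; d++) {
                min[d] = Math.min(min[d], v[d]);
                max[d] = Math.max(max[d], v[d]);
            }
        }
    }

    // Maps each value to one of 256 equal-width bins and stores the bin as a byte in -128..127.
    public byte[] quantize(float[] v) {
        if (v.length != min.length) {
            throw new IllegalArgumentException("expected " + min.length + " dimensions");
        }
        byte[] out = new byte[v.length];
        for (int d = 0; d < v.length; d++) {
            float range = max[d] - min[d];
            // Relative position within [min, max]; values outside the learned range are clamped.
            float t = range == 0f ? 0f : (v[d] - min[d]) / range;
            t = Math.max(0f, Math.min(1f, t));
            int bin = Math.min(255, (int) (t * 256)); // bin index 0..255
            out[d] = (byte) (bin - 128);
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] vectors = {{-1f, 0f}, {1f, 2f}, {0f, 1f}};
        ScalarQuantizer quantizer = new ScalarQuantizer(vectors);
        for (float[] v : vectors) {
            System.out.println(Arrays.toString(v) + " -> " + Arrays.toString(quantizer.quantize(v)));
        }
    }
}
```

Each dimension's minimum maps to -128 and its maximum to 127, so a dimension's precision depends on how spread out its values are across the indexed vectors.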

Create a new file named GenerateAndConvertEmbeddings.java in your Java project.
```
touch GenerateAndConvertEmbeddings.java
```

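Copy and paste the following code into the GenerateAndConvertEmbeddings.java file.

This code does the following:

- Gets the summary field from 50 documents in the sample_airbnb.listingsAndReviews namespace.
- Generates float32, int8, and ubinary vector embeddings by using Cohere's embed API.
- Converts the embeddings to BSON binData vectors by using the MongoDB Java driver.
- Creates a file named embeddings.json and saves the data with the embeddings to that file.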

GenerateAndConvertEmbeddings.java

import com.cohere.api.Cohere;
import com.cohere.api.requests.EmbedRequest;
import com.cohere.api.types.EmbedByTypeResponse;
import com.cohere.api.types.EmbedResponse;
import com.cohere.api.types.EmbeddingType;
import com.cohere.api.types.EmbedInputType;
import com.cohere.api.types.EmbedByTypeResponseEmbeddings;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.FindIterable;
import org.bson.BsonArray;
import org.bson.Document;
import org.bson.BinaryVector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

public class GenerateAndConvertEmbeddings {
    private static final Logger logger = LoggerFactory.getLogger(GenerateAndConvertEmbeddings.class);
    private static final String COHERE_API_KEY = System.getenv("COHERE_API_KEY");
    private static final String MONGODB_URI = System.getenv("MONGODB_URI");

    public static void main(String[] args) {
        try {
            List<String> summaries = fetchSummariesFromMongoDB();
            if (summaries.isEmpty()) {
                throw new RuntimeException("No summaries retrieved from MongoDB.");
            }
            EmbedByTypeResponseEmbeddings embeddingsData = fetchEmbeddingsFromCohere(COHERE_API_KEY, summaries);
            if (embeddingsData == null) {
                throw new RuntimeException("Failed to fetch embeddings.");
            }
            convertAndSaveEmbeddings(summaries, embeddingsData);
        } catch (Exception e) {
            logger.error("Unexpected error: {}", e.getMessage(), e);
        }
    }

    private static List<String> fetchSummariesFromMongoDB() {
        List<String> summaries = new ArrayList<>();
        if (MONGODB_URI == null || MONGODB_URI.isEmpty()) {
            throw new RuntimeException("MongoDB URI is not set.");
        }
        logger.info("Connecting to MongoDB at URI: {}", MONGODB_URI);
        try (MongoClient mongoClient = MongoClients.create(MONGODB_URI)) {
            String dbName = "sample_airbnb";
            String collName = "listingsAndReviews";
            MongoDatabase database = mongoClient.getDatabase(dbName);
            MongoCollection<Document> collection = database.getCollection(collName);
            Document filter = new Document("summary", new Document("$nin", Arrays.asList(null, "")));
            FindIterable<Document> documentsCursor = collection.find(filter).limit(50);
            for (Document doc : documentsCursor) {
                String summary = doc.getString("summary");
                if (summary != null && !summary.isEmpty()) {
                    summaries.add(summary);
                }
            }
            logger.info("Retrieved {} summaries from MongoDB.", summaries.size());
        } catch (Exception e) {
            logger.error("Error fetching from MongoDB: {}", e.getMessage(), e);
            throw new RuntimeException("Failed to fetch data from MongoDB", e);
        }
        return summaries;
    }

    private static EmbedByTypeResponseEmbeddings fetchEmbeddingsFromCohere(String apiKey, List<String> data) {
        if (Objects.isNull(apiKey) || apiKey.isEmpty()) {
            throw new RuntimeException("API key is not set.");
        }
        Cohere cohere = Cohere.builder().token(apiKey).clientName("embed-example").build();
        try {
            EmbedRequest request = EmbedRequest.builder()
                    .model("embed-english-v3.0")
                    .inputType(EmbedInputType.SEARCH_DOCUMENT)
                    .texts(data)
                    .embeddingTypes(List.of(EmbeddingType.FLOAT, EmbeddingType.INT_8, EmbeddingType.UBINARY))
                    .build();
            EmbedResponse response = cohere.embed(request);
            Optional<EmbedByTypeResponse> optionalEmbeddingsWrapper = response.getEmbeddingsByType();
            if (optionalEmbeddingsWrapper.isPresent()) {
                return optionalEmbeddingsWrapper.get().getEmbeddings();
            } else {
                logger.warn("No embeddings were returned.");
            }
        } catch (Exception e) {
            logger.error("Error fetching embeddings: {}", e.getMessage(), e);
        }
        return null;
    }

    private static void convertAndSaveEmbeddings(List<String> summaries, EmbedByTypeResponseEmbeddings embeddings) {
        try {
            Document doc = new Document();
            BsonArray array = new BsonArray();
            for (int i = 0; i < summaries.size(); i++) {
                String summary = summaries.get(i);

                // Retrieve the embeddings for the current index
                List<Double> floatList = embeddings.getFloat().orElseThrow().get(i);
                List<Integer> int8List = embeddings.getInt8().orElseThrow().get(i);
                List<Integer> ubinaryList = embeddings.getUbinary().orElseThrow().get(i);

                // Convert lists to arrays
                float[] floatData = listToFloatArray(floatList);
                byte[] int8Data = listToByteArray(int8List);
                byte[] int1Data = listToByteArray(ubinaryList);

                // Create BinaryVector objects
                BinaryVector floatVector = BinaryVector.floatVector(floatData);
                BinaryVector int8Vector = BinaryVector.int8Vector(int8Data);
                BinaryVector packedBitsVector = BinaryVector.packedBitVector(int1Data, (byte) 0);

                Document document = new Document()
                        .append("text", summary)
                        .append("embeddings_float32", floatVector)
                        .append("embeddings_int8", int8Vector)
                        .append("embeddings_int1", packedBitsVector);
                array.add(document.toBsonDocument());
            }
            doc.append("data", array);
            try (FileOutputStream fos = new FileOutputStream("embeddings.json")) {
                // Write explicitly as UTF-8 so non-ASCII summaries survive regardless of the platform charset
                fos.write(doc.toJson().getBytes(StandardCharsets.UTF_8));
            }
            logger.info("Embeddings with BSON vectors have been saved to embeddings.json");
        } catch (IOException e) {
            logger.error("Error writing embeddings to file: {}", e.getMessage(), e);
        }
    }

    private static float[] listToFloatArray(List<Double> list) {
        float[] array = new float[list.size()];
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i).floatValue();
        }
        return array;
    }

    private static byte[] listToByteArray(List<Integer> list) {
        byte[] array = new byte[list.size()];
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i).byteValue();
        }
        return array;
    }
}
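The code passes a padding of `(byte) 0` to `BinaryVector.packedBitVector` because 1024 dimensions fill exactly 128 bytes, leaving no unused bits in the last byte. The following sketch shows binary quantization with a midpoint of 0, as described at the top of this page, along with the padding calculation for a dimension count that is not a multiple of 8 (the requirements table requires a multiple of 8 for int1 indexing, so in practice the padding is 0). It packs bits most significant bit first, which is an assumption of this sketch. `BinaryQuantizer` is a hypothetical class; Cohere's ubinary output is already packed, so the sample code does not need this step.

```java
// Hypothetical sketch: binary quantization with a midpoint of 0, packed eight bits per byte.
public class BinaryQuantizer {

    // Number of unused low-order bits in the last byte; 0 when dimensions is a multiple of 8.
    public static int padding(int dimensions) {
        return (8 - dimensions % 8) % 8;
    }

    // Values greater than 0 become 1; values less than or equal to 0 become 0.
    // Dimension i is stored in byte i / 8, most significant bit first (an assumption of this sketch).
    public static byte[] quantizeAndPack(float[] vector) {
        byte[] packed = new byte[(vector.length + 7) / 8];
        for (int i = 0; i < vector.length; i++) {
            if (vector[i] > 0f) {
                packed[i / 8] |= (byte) (1 << (7 - i % 8));
            }
        }
        return packed;
    }

    public static void main(String[] args) {
        float[] v = {0.5f, -0.2f, 0f, 1.3f, -0.7f, 0.1f, 0.9f, -0.4f, 0.3f, -0.1f};
        byte[] packed = quantizeAndPack(v);
        System.out.println(v.length + " dimensions -> " + packed.length + " bytes, padding " + padding(v.length));
        for (byte b : packed) {
            // Print each byte as an 8-bit pattern.
            System.out.println(String.format("%8s", Integer.toBinaryString(Byte.toUnsignedInt(b))).replace(' ', '0'));
        }
    }
}
```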

If you didn't set the environment variables, replace the following values in the code and save the file.

`MONGODB_URI`	Your Atlas cluster connection string.
`COHERE_API_KEY`	Your Cohere API key.

Compile and run the file by using your application run configuration.

If you're using a terminal, run the following commands to compile and run the program.

javac GenerateAndConvertEmbeddings.java
java GenerateAndConvertEmbeddings

[main] INFO GenerateAndConvertEmbeddings - Connecting to MongoDB at URI: <CONNECTION-STRING>
...
[main] INFO GenerateAndConvertEmbeddings - Retrieved 50 summaries from MongoDB.
[main] INFO GenerateAndConvertEmbeddings - Embeddings with BSON vectors have been saved to embeddings.json

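Verify the embeddings in the embeddings.json file.

To learn more about generating embeddings and converting them to binData vectors, see How to Create Vector Embeddings.

Ingest the data and create the Atlas Vector Search index.

You must upload your data and embeddings to a collection in your Atlas cluster and create an Atlas Vector Search index on the data before you can run $vectorSearch queries against it.

Create a new file named UploadDataAndCreateIndex.java in your Java project.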
```
touch UploadDataAndCreateIndex.java
```

Copy and paste the following code into the UploadDataAndCreateIndex.java file.

This code does the following:

- Uploads the float32, int8, and int1 embeddings in the embeddings.json file to your Atlas cluster.
- Creates an Atlas Vector Search index on the embeddings_float32, embeddings_int8, and embeddings_int1 fields.

UploadDataAndCreateIndex.java

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.SearchIndexModel;
import com.mongodb.client.model.SearchIndexType;
import com.mongodb.client.model.UpdateOptions;

import org.bson.Document;
import org.bson.conversions.Bson;
import org.bson.BinaryVector; // Import the BinaryVector

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.StreamSupport;

public class UploadDataAndCreateIndex {

    private static final String MONGODB_URI = System.getenv("MONGODB_URI");
    private static final String DB_NAME = "sample_airbnb";
    private static final String COLLECTION_NAME = "listingsAndReviews";
    private static final String INDEX_NAME = "<INDEX-NAME>";

    public static void main(String[] args) {
        try (MongoClient mongoClient = MongoClients.create(MONGODB_URI)) {
            uploadEmbeddingsData(mongoClient);
            setupVectorSearchIndex(mongoClient);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void uploadEmbeddingsData(MongoClient mongoClient) throws IOException {
        MongoDatabase database = mongoClient.getDatabase(DB_NAME);
        MongoCollection<Document> collection = database.getCollection(COLLECTION_NAME);
        String filePath = "embeddings.json";
        String fileContent = Files.readString(Path.of(filePath));

        Document rootDoc = Document.parse(fileContent);
        List<Document> embeddingsDocs = rootDoc.getList("data", Document.class);

        for (Document doc : embeddingsDocs) {
            // Retrieve the string value from the document
            String summary = doc.getString("text");

            // Get the BinaryVector objects from the document
            BinaryVector embeddingsFloat32 = doc.get("embeddings_float32", BinaryVector.class);
            BinaryVector embeddingsInt8 = doc.get("embeddings_int8", BinaryVector.class);
            BinaryVector embeddingsInt1 = doc.get("embeddings_int1", BinaryVector.class);

            // Create filter and update documents
            Document filter = new Document("summary", summary);
            Document update = new Document("$set", new Document("summary", summary)
                    .append("embeddings_float32", embeddingsFloat32)
                    .append("embeddings_int8", embeddingsInt8)
                    .append("embeddings_int1", embeddingsInt1));

            // Perform update operation with upsert option
            collection.updateOne(filter, update, new UpdateOptions().upsert(true));
            System.out.println("Processed document with summary: " + summary);
        }
    }

    public static void setupVectorSearchIndex(MongoClient client) throws InterruptedException {
        MongoDatabase database = client.getDatabase(DB_NAME);
        MongoCollection<Document> collection = database.getCollection(COLLECTION_NAME);
        // Define the index details
        Bson definition = new Document(
            "fields",
            List.of(
                new Document("type", "vector")
                    .append("path", "embeddings_float32")
                    .append("numDimensions", 1024)
                    .append("similarity", "dotProduct"),
                new Document("type", "vector")
                    .append("path", "embeddings_int8")
                    .append("numDimensions", 1024)
                    .append("similarity", "dotProduct"),
                new Document("type", "vector")
                    .append("path", "embeddings_int1")
                    .append("numDimensions", 1024)
                    .append("similarity", "euclidean")
            )
        );
        // Define the index model
        SearchIndexModel indexModel = new SearchIndexModel(
            INDEX_NAME,
            definition,
            SearchIndexType.vectorSearch()
        );
        // Create the index using the defined model
        List<String> result = collection.createSearchIndexes(Collections.singletonList(indexModel));
        System.out.println("Successfully created vector index named: " + result.get(0));
        System.out.println("It may take up to a minute for the index to leave the BUILDING status and become queryable.");
        // Wait for Atlas to build the index
        System.out.println("Polling to confirm the index has changed from the BUILDING status.");
        waitForIndex(collection, INDEX_NAME);
    }

    public static <T> boolean waitForIndex(final MongoCollection<T> collection, final String indexName) {
        long startTime = System.nanoTime();
        long timeoutNanos = TimeUnit.SECONDS.toNanos(60);
        while (System.nanoTime() - startTime < timeoutNanos) {
            Document indexRecord = StreamSupport.stream(collection.listSearchIndexes().spliterator(), false)
                    .filter(index -> indexName.equals(index.getString("name")))
                    .findAny().orElse(null);
            if (indexRecord != null) {
                if ("FAILED".equals(indexRecord.getString("status"))) {
                    throw new RuntimeException("Search index has FAILED status.");
                }
                // Treat a missing "queryable" field as not yet queryable
                if (Boolean.TRUE.equals(indexRecord.getBoolean("queryable"))) {
                    System.out.println(indexName + " index is ready to query");
                    return true;
                }
            }
            try {
                Thread.sleep(100); // fixed 100 ms polling interval; avoid in production
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return false;
    }
}
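Before creating the index, it can help to check each quantized field's `numDimensions` and `similarity` against the requirements table at the top of this page: int1 fields need a multiple of 8 dimensions and `euclidean` similarity, and int8 fields accept 1 to 8192 dimensions with `cosine`, `euclidean`, or `dotProduct`. The following sketch encodes only those two rows. `VectorFieldCheck` is a hypothetical helper, and Atlas remains the authority on whether an index definition is valid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper encoding the int1 and int8 rows of the requirements table.
public class VectorFieldCheck {

    private static final Set<String> INT8_SIMILARITIES = Set.of("cosine", "euclidean", "dotProduct");

    // Returns a list of problems; an empty list means the field passes these checks.
    public static List<String> check(String vectorType, int numDimensions, String similarity) {
        List<String> problems = new ArrayList<>();
        switch (vectorType) {
            case "int1":
                if (numDimensions <= 0 || numDimensions % 8 != 0) {
                    problems.add("int1 requires numDimensions to be a positive multiple of 8");
                }
                if (!"euclidean".equals(similarity)) {
                    problems.add("int1 supports only euclidean similarity");
                }
                break;
            case "int8":
                if (numDimensions < 1 || numDimensions > 8192) {
                    problems.add("int8 requires numDimensions between 1 and 8192");
                }
                if (similarity == null || !INT8_SIMILARITIES.contains(similarity)) {
                    problems.add("int8 does not support similarity: " + similarity);
                }
                break;
            default:
                throw new IllegalArgumentException("This sketch checks only int1 and int8, got: " + vectorType);
        }
        return problems;
    }

    public static void main(String[] args) {
        // The int8 and int1 fields from the index definition in UploadDataAndCreateIndex.
        System.out.println("embeddings_int8: " + check("int8", 1024, "dotProduct"));
        System.out.println("embeddings_int1: " + check("int1", 1024, "euclidean"));
        // A deliberately invalid int1 field.
        System.out.println("invalid int1:    " + check("int1", 1020, "dotProduct"));
    }
}
```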

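Replace the following placeholder values in the code and save the file.

`MONGODB_URI`	Your Atlas cluster connection string, if you didn't set the environment variable.
`<INDEX-NAME>`	The name of the Atlas Vector Search index for the collection.

Compile and run the file by using your application run configuration.

If you're using a terminal, run the following commands to compile and run the program.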

javac UploadDataAndCreateIndex.java
java UploadDataAndCreateIndex

Successfully created vector index named: <INDEX_NAME>
It may take up to a minute for the index to leave the BUILDING status and become queryable.
Polling to confirm the index has changed from the BUILDING status.
<INDEX_NAME> index is ready to query

Log in to your Atlas cluster and verify the following:
- The data in the namespace.
- The Atlas Vector Search index for the collection.

Create and run a query against the collection.

Create a new file named CreateEmbeddingsAndRunQuery.java in your Java project.
```
touch CreateEmbeddingsAndRunQuery.java
```

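Copy and paste the following code into the CreateEmbeddingsAndRunQuery.java file.

This code does the following:

- Generates float32, int8, and ubinary vector embeddings by using Cohere's embed API.
- Converts the embeddings to BSON binData vectors by using the MongoDB Java driver.
- Runs a query against your collection and returns the results.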

CreateEmbeddingsAndRunQuery.java

import com.cohere.api.Cohere;
import com.cohere.api.requests.EmbedRequest;
import com.cohere.api.types.EmbedResponse;
import com.cohere.api.types.EmbedByTypeResponse;
import com.cohere.api.types.EmbedByTypeResponseEmbeddings;
import com.cohere.api.types.EmbeddingType;
import com.cohere.api.types.EmbedInputType;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import org.bson.conversions.Bson;
import org.bson.BinaryVector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

import static com.mongodb.client.model.Aggregates.project;
import static com.mongodb.client.model.Aggregates.vectorSearch;
import static com.mongodb.client.model.Projections.fields;
import static com.mongodb.client.model.Projections.include;
import static com.mongodb.client.model.Projections.exclude;
import static com.mongodb.client.model.Projections.metaVectorSearchScore;
import static com.mongodb.client.model.search.SearchPath.fieldPath;
import static com.mongodb.client.model.search.VectorSearchOptions.approximateVectorSearchOptions;
import static java.util.Arrays.asList;

public class CreateEmbeddingsAndRunQuery {
    private static final String COHERE_API_KEY = System.getenv("COHERE_API_KEY");
    private static final String MONGODB_URI = System.getenv("MONGODB_URI");
    private static final String DB_NAME = "<DATABASE-NAME>";
    private static final String COLLECTION_NAME = "<COLLECTION-NAME>";
    private static final String VECTOR_INDEX_NAME = "<INDEX-NAME>";
    private static final String DATA_FIELD_NAME = "<DATA-FIELD>";

    public static void main(String[] args) {
        String queryText = "<QUERY-TEXT>";

        try {
            CreateEmbeddingsAndRunQuery processor = new CreateEmbeddingsAndRunQuery();
            Map<String, BinaryVector> embeddingsData = processor.generateAndConvertEmbeddings(queryText);
            processor.runVectorSearchQuery(embeddingsData);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Generate embeddings using Cohere's embed API from the query text
    public Map<String, BinaryVector> generateAndConvertEmbeddings(String text) throws Exception {
        if (COHERE_API_KEY == null || COHERE_API_KEY.isEmpty()) {
            throw new RuntimeException("API key not found. Set COHERE_API_KEY in your environment.");
        }

        Cohere cohere = Cohere.builder().token(COHERE_API_KEY).build();

        EmbedRequest request = EmbedRequest.builder()
                .model("embed-english-v3.0")
                .inputType(EmbedInputType.SEARCH_QUERY)
                .texts(List.of(text))
                .embeddingTypes(List.of(EmbeddingType.FLOAT, EmbeddingType.INT_8, EmbeddingType.UBINARY))
                .build();

        EmbedResponse response = cohere.embed(request);
        Optional<EmbedByTypeResponse> optionalEmbeddingsWrapper = response.getEmbeddingsByType();
        if (optionalEmbeddingsWrapper.isEmpty()) {
            throw new RuntimeException("No embeddings found in the API response.");
        }

        EmbedByTypeResponseEmbeddings embeddings = optionalEmbeddingsWrapper.get().getEmbeddings();
        return createBinaryVectorEmbeddings(embeddings);
    }

    // Convert embeddings to BSON binary vectors using MongoDB Java Driver
    private static Map<String, BinaryVector> createBinaryVectorEmbeddings(EmbedByTypeResponseEmbeddings embeddings) {
        Map<String, BinaryVector> binaryVectorEmbeddings = new HashMap<>();

        // Convert float embeddings
        List<Double> floatList = embeddings.getFloat().orElseThrow().get(0);
        if (floatList != null) {
            float[] floatData = listToFloatArray(floatList);
            BinaryVector floatVector = BinaryVector.floatVector(floatData);
            binaryVectorEmbeddings.put("float32", floatVector);
89         }
90 
91         // Convert int8 embeddings
92         List<Integer> int8List = embeddings.getInt8().orElseThrow().get(0);
93         if (int8List != null) {
94             byte[] int8Data = listToByteArray(int8List);
95             BinaryVector int8Vector = BinaryVector.int8Vector(int8Data);
96             binaryVectorEmbeddings.put("int8", int8Vector);
97         }
98 
99         // Convert ubinary embeddings
100         List<Integer> ubinaryList = embeddings.getUbinary().orElseThrow().get(0);
101         if (ubinaryList != null) {
102             byte[] int1Data = listToByteArray(ubinaryList);
103             BinaryVector packedBitsVector = BinaryVector.packedBitVector(int1Data, (byte) 0);
104             binaryVectorEmbeddings.put("int1", packedBitsVector);
105         }
106 
107         return binaryVectorEmbeddings;
108     }
109 
110     // Define and run $vectorSearch query using the embeddings
111     public void runVectorSearchQuery(Map<String, BinaryVector> embeddingsData) {
112         if (MONGODB_URI == null || MONGODB_URI.isEmpty()) {
113             throw new RuntimeException("MongoDB URI not found. Set MONGODB_URI in your environment.");
114         }
115 
116         try (MongoClient mongoClient = MongoClients.create(MONGODB_URI)) {
117             MongoDatabase database = mongoClient.getDatabase(DB_NAME);
118             MongoCollection<Document> collection = database.getCollection(COLLECTION_NAME);
119 
120             for (String path : embeddingsData.keySet()) {
121                 BinaryVector queryVector = embeddingsData.get(path);
122 
123                 List<Bson> pipeline = asList(
124                         vectorSearch(
125                                 fieldPath("embeddings_" + path),
126                                 queryVector,
127                                 VECTOR_INDEX_NAME,
128                                 2,
129                                 approximateVectorSearchOptions(5)
130                         ),
131                         project(
132                                 fields(
133                                         exclude("_id"),
134                                         include(DATA_FIELD_NAME),
135                                         metaVectorSearchScore("vectorSearchScore")
136                                 )
137                         )
138                 );
139 
140                 List<Document> results = collection.aggregate(pipeline).into(new ArrayList<>());
141 
142                 System.out.println("Results from " + path + " embeddings:");
143                 for (Document result : results) {
144                     System.out.println(result.toJson());
145                 }
146             }
147         }
148     }
149 
150     private static float[] listToFloatArray(List<Double> list) {
151         float[] array = new float[list.size()];
152         for (int i = 0; i < list.size(); i++) {
153             array[i] = list.get(i).floatValue();
154         }
155         return array;
156     }
157 
158     private static byte[] listToByteArray(List<Integer> list) {
159         byte[] array = new byte[list.size()];
160         for (int i = 0; i < list.size(); i++) {
161             array[i] = list.get(i).byteValue();
162         }
163         return array;
164     }
165 }

Replace the following placeholder values in the code and save the file.

`MONGODB_URI`	Your Atlas cluster connection string, if you didn't set the environment variable.
`COHERE_API_KEY`	Your Cohere API key, if you didn't set the environment variable.
`<DATABASE-NAME>`	Name of the database in your Atlas cluster. This example uses `sample_airbnb`.
`<COLLECTION-NAME>`	Name of the collection where you ingested the data. This example uses `listingsAndReviews`.
`<INDEX-NAME>`	Name of the Atlas Vector Search index for the collection.
`<DATA-FIELD>`	Name of the field that contains the text from which you generated the embeddings. This example uses `summary`.
`<QUERY-TEXT>`	Text for the query. This example uses `ocean view`.

Compile and run the file by using your application run configuration.

If you're using a terminal, run the following commands to compile and run your program.

javac CreateEmbeddingsAndRunQuery.java
java CreateEmbeddingsAndRunQuery

Results from int1 embeddings:
{"summary": "A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.", "vectorSearchScore": 0.6591796875}
{"summary": "A short distance from Honolulu's billion dollar mall, and the same distance to Waikiki. Parking included. A great location that work perfectly for business, education, or simple visit. Experience Yacht Harbor views and 5 Star Hilton Hawaiian Village.", "vectorSearchScore": 0.6337890625}
Results from int8 embeddings:
{"summary": "A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.", "vectorSearchScore": 0.5215557217597961}
{"summary": "A short distance from Honolulu's billion dollar mall, and the same distance to Waikiki. Parking included. A great location that work perfectly for business, education, or simple visit. Experience Yacht Harbor views and 5 Star Hilton Hawaiian Village.", "vectorSearchScore": 0.5179016590118408}
Results from float32 embeddings:
{"summary": "A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.", "vectorSearchScore": 0.7278661131858826}
{"summary": "A short distance from Honolulu's billion dollar mall, and the same distance to Waikiki. Parking included. A great location that work perfectly for business, education, or simple visit. Experience Yacht Harbor views and 5 Star Hilton Hawaiian Village.", "vectorSearchScore": 0.688639760017395}

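To learn more about generating embeddings and converting them to binData vectors, see How to Create Vector Embeddings.

Install the required libraries.

Run the following command to install the MongoDB Node.js driver. This operation might take a few minutes to complete.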

npm install mongodb

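You must install version 6.11 or later of the Node.js driver. If necessary, you must also install libraries from your embedding model provider. For example, to generate float32, int8, and int1 embeddings by using Cohere as demonstrated on this page, install Cohere: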

npm install cohere-ai dotenv
npm show cohere-ai version

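Set the environment variables in your terminal.

To access the embedding model provider for generating and converting embeddings, set the environment variable for the provider's API key, if necessary.
To use embeddings from Cohere, set the COHERE_API_KEY environment variable.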
```
export COHERE_API_KEY="<COHERE-API-KEY>"
```
If you don't set the environment variable, replace <COHERE-API-KEY> in the sample code with your API key before running the code.
To access your Atlas cluster, set the MONGODB_URI environment variable.
```
export MONGODB_URI="<CONNECTION-STRING>"
```
Your connection string must use the following format:
```
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
```
If you don't set the environment variable, replace <CONNECTION-STRING> in the sample code with your connection string before running the code.
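Characters such as `@`, `:`, or `/` in the username or password must be percent-encoded, or a connection string in the format above won't parse correctly. The following minimal sketch assumes a hypothetical helper named `buildAtlasUri` and made-up credential and host values; it only illustrates the encoding step and is not part of the tutorial code.

```javascript
// Hypothetical helper: assemble an Atlas SRV connection string in the
// documented format, percent-encoding the credentials so that reserved
// characters in a username or password don't break URI parsing.
function buildAtlasUri({ username, password, clusterName, hostname }) {
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `mongodb+srv://${user}:${pass}@${clusterName}.${hostname}.mongodb.net`;
}

// Example with made-up values; you would export the result as MONGODB_URI.
const uri = buildAtlasUri({
  username: 'appUser',
  password: 'p@ss:word/1',
  clusterName: 'cluster0',
  hostname: 'abcde',
});
console.log(uri); // mongodb+srv://appUser:p%40ss%3Aword%2F1@cluster0.abcde.mongodb.net
```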

Generate the vector embeddings for your data.

Create a file named get-embeddings.js to generate float32, int8, and int1 vector embeddings by using Cohere's embed API.
```
touch get-embeddings.js
```

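Copy and paste the following code into the get-embeddings.js file.

This code does the following:

Generates float32, int8, and int1 embeddings for the given data by using Cohere's embed-english-v3.0 embedding model.
Stores the float32, int8, and int1 embeddings in fields named float, int8, and ubinary, respectively.
Creates a file named embeddings.json and saves the embeddings in it.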

get-embeddings.js

// Use 'require' for modules in a Node.js environment
const { CohereClient } = require('cohere-ai');
const { writeFile } = require('fs/promises');

// Retrieve API key from environment variables or default placeholder
const apiKey = process.env.COHERE_API_KEY || '<COHERE-API-KEY>';

if (!apiKey) {
  throw new Error('API key not found. Please set COHERE_API_KEY in your environment.');
}

// Instantiate the CohereClient with the API key
const cohere = new CohereClient({ token: apiKey });

async function main() {
  try {
    // Data to embed
    const data = [
      "The Great Wall of China is visible from space.",
      "The Eiffel Tower was completed in Paris in 1889.",
      "Mount Everest is the highest peak on Earth at 8,848m.",
      "Shakespeare wrote 37 plays and 154 sonnets during his lifetime.",
      "The Mona Lisa was painted by Leonardo da Vinci.",
    ];

    // Fetch float, int8, and ubinary embeddings for the data using the Cohere API
    const response = await cohere.v2.embed({
      model: 'embed-english-v3.0',
      inputType: 'search_document',
      texts: data,
      embeddingTypes: ['float', 'int8', 'ubinary'],
    });

    // Extract embeddings from the API response
    const { float, int8, ubinary } = response.embeddings;

    // Map the embeddings to the text data
    const embeddingsData = data.map((text, index) => ({
      text,
      embeddings: {
        float: float[index],
        int8: int8[index],
        ubinary: ubinary[index],
      },
    }));

    // Write the embeddings data to a JSON file
    await writeFile('embeddings.json', JSON.stringify(embeddingsData, null, 2));
    console.log('Embeddings saved to embeddings.json');
  } catch (error) {
    console.error('Error fetching embeddings:', error);
  }
}

// Execute the main function
main();

If you didn't set your Cohere API key as an environment variable, replace the <COHERE-API-KEY> placeholder, and then save the file.
Run the code to generate the embeddings.
node get-embeddings.js
Embeddings saved to embeddings.json
Verify the generated embeddings in the embeddings.json file.
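When you inspect embeddings.json, each `int8` array has the same length as its `float` array, while each `ubinary` array is one-eighth as long, because every byte packs eight dimensions. Cohere computes both quantized forms server-side and this page doesn't describe its exact calibration, so the following sketch only illustrates the two techniques from the quantization overview. The helper names `scalarQuantize` and `packSignBits` are hypothetical. The scalar sketch uses one shared [min, max] range instead of per-dimension ranges for brevity, and the bit order (first dimension in the most significant bit) is an assumption.

```javascript
// Illustrative only: not Cohere's actual quantization procedure.

// Scalar quantization: split [min, max] into 256 equal-width bins and map
// each value to its bin, shifted into the signed int8 range [-128, 127].
function scalarQuantize(values, min, max) {
  const bins = 256;
  const width = (max - min) / bins;
  return Int8Array.from(values, (v) => {
    const clamped = Math.min(Math.max(v, min), max);
    // max itself would fall into bin 256, so cap it at the last bin.
    const bin = Math.min(Math.floor((clamped - min) / width), bins - 1);
    return bin - 128;
  });
}

// Binary quantization: 1 if the value is greater than the midpoint 0,
// otherwise 0, packed 8 dimensions per byte with the first dimension in the
// most significant bit (assumed bit order).
function packSignBits(values) {
  const bytes = new Uint8Array(Math.ceil(values.length / 8));
  values.forEach((v, i) => {
    if (v > 0) bytes[i >> 3] |= 0x80 >> (i & 7);
  });
  return bytes;
}

const sample = [0.3, -0.1, 0.0, 0.7, -0.5, 0.2, 0.9, -0.8];
console.log(Array.from(scalarQuantize([-1, 0, 0.5, 1], -1, 1))); // [ -128, 0, 64, 127 ]
console.log(Array.from(packSignBits(sample))); // [ 150 ]
```

A 1024-dimension float embedding packed this way takes 128 bytes with no leftover padding bits.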

Convert the vector embeddings to `binData` vectors.

Create a file named convert-embeddings.js to convert the float32, int8, and int1 vector embeddings from Cohere to BSON binData vectors by using the MongoDB Node.js driver.
```
touch convert-embeddings.js
```

Copy and paste the following code into the convert-embeddings.js file.

This code does the following:

Generates BSON binData vectors for the float32, int8, and int1 embeddings.
Adds the float32, int8, and packed-bit (ubinary) BSON binData vectors to the embeddings.json file.

convert-embeddings.js

const fs = require('fs/promises');
const { BSON } = require('mongodb');
const { Binary } = BSON;

async function main() {
  try {
    // Read and parse the contents of 'embeddings.json' file
    const fileContent = await fs.readFile('embeddings.json', 'utf8');
    const embeddingsData = JSON.parse(fileContent);

    // Map the embeddings data to add BSON binary vector representations (subtype 9)
    const convertEmbeddingsData = embeddingsData.map(({ text, embeddings }) => {
      // Create a float32 vector Binary from the float embedding
      const bsonFloat32 = Binary.fromFloat32Array(new Float32Array(embeddings.float));

      // Create an int8 vector Binary from the int8 embedding
      const bsonInt8 = Binary.fromInt8Array(new Int8Array(embeddings.int8));

      // Create a packed-bit vector Binary from the ubinary embedding (8 dimensions per byte)
      const bsonPackedBits = Binary.fromPackedBits(new Uint8Array(embeddings.ubinary));

      return {
        text,
        embeddings: {
          float: embeddings.float, // Original float data
          int8: embeddings.int8, // Original int8 data
          ubinary: embeddings.ubinary, // Original packed bits data
        },
        bsonEmbeddings: {
          float32: bsonFloat32,
          int8: bsonInt8,
          packedBits: bsonPackedBits,
        },
      };
    });

    // Serialize the updated data to canonical EJSON so the Binary values survive the round trip
    const ejsonSerializedData = BSON.EJSON.stringify(convertEmbeddingsData, null, null, { relaxed: false });

    // Write the serialized data to 'embeddings.json'
    await fs.writeFile('embeddings.json', ejsonSerializedData);
    console.log('Embeddings with BSON vectors have been saved to embeddings.json');
  } catch (error) {
    console.error('Error processing embeddings:', error);
  }
}

main();

Run the program to generate the BSON binData vectors.
node convert-embeddings.js
Embeddings with BSON vectors have been saved to embeddings.json
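Verify the generated BSON embeddings in the embeddings.json file.

Connect to the Atlas cluster and upload the data to a collection.

Create a file named upload-data.js to connect to the Atlas cluster and create a collection in a database for the data in the embeddings.json file.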
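In the EJSON output, each BSON vector appears as a base64-encoded binary value with subtype `09`. The following sketch shows what the decoded payload bytes look like: a dtype byte, a padding byte, and then the vector data. It assumes the layout from the BSON binary vector specification (dtype codes `0x03` for int8, `0x27` for float32, `0x10` for packed bits, with float32 stored little-endian). The `vectorPayload` helper is hypothetical; in real code, use `Binary.fromFloat32Array`, `Binary.fromInt8Array`, and `Binary.fromPackedBits` as convert-embeddings.js does.

```javascript
// Dtype codes assumed from the BSON binary vector specification.
const DTYPE = { INT8: 0x03, FLOAT32: 0x27, PACKED_BIT: 0x10 };

// Build the payload of a BSON binary subtype 9 (vector) value:
// byte 0 = dtype, byte 1 = padding (unused bits in the last byte, only
// meaningful for packed bits), followed by the vector data.
function vectorPayload(dtype, data, padding = 0) {
  let body;
  if (dtype === DTYPE.FLOAT32) {
    // float32 values: 4 bytes each, little-endian.
    const view = new DataView(new ArrayBuffer(data.length * 4));
    data.forEach((v, i) => view.setFloat32(i * 4, v, true));
    body = new Uint8Array(view.buffer);
  } else if (dtype === DTYPE.INT8) {
    // int8 values: one signed byte each (two's complement).
    body = new Uint8Array(Int8Array.from(data).buffer);
  } else {
    // packed bits: the bytes are stored as-is.
    body = Uint8Array.from(data);
  }
  return Uint8Array.from([dtype, padding, ...body]);
}

console.log(Array.from(vectorPayload(DTYPE.INT8, [1, -1]))); // [ 3, 0, 1, 255 ]
console.log(Array.from(vectorPayload(DTYPE.FLOAT32, [1.0]))); // [ 39, 0, 0, 0, 128, 63 ]
```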
```
touch upload-data.js
```

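Copy and paste the following code into the upload-data.js file.

This code does the following:

Connects to your Atlas cluster and creates a namespace with the database and collection names that you specify.
Uploads the data, including the embeddings from the embeddings.json file, to the specified namespace.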

upload-data.js

const fs = require('fs/promises'); // Use fs/promises for asynchronous operations
const { MongoClient, BSON } = require('mongodb'); // Import from the 'mongodb' package

async function main() {
    const MONGODB_URI = process.env.MONGODB_URI || "<CONNECTION-STRING>";
    const DB_NAME = "<DB-NAME>";
    const COLLECTION_NAME = "<COLLECTION-NAME>";

    let client;
    try {
        client = new MongoClient(MONGODB_URI);
        await client.connect();
        console.log("Connected to MongoDB");

        const db = client.db(DB_NAME);
        const collection = db.collection(COLLECTION_NAME);

        // Parse 'embeddings.json' with EJSON so the BSON Binary vectors are restored
        const fileContent = await fs.readFile('embeddings.json', 'utf8');
        const embeddingsData = BSON.EJSON.parse(fileContent);

        // Keep only the BSON vectors and store the packed bits under the int1 key
        const documents = embeddingsData.map(({ text, bsonEmbeddings }) => {
            return {
                text,
                bsonEmbeddings: {
                    float32: bsonEmbeddings.float32,
                    int8: bsonEmbeddings.int8,
                    int1: bsonEmbeddings.packedBits
                }
            };
        });

        const result = await collection.insertMany(documents);
        console.log(`Inserted ${result.insertedCount} documents into MongoDB`);

    } catch (error) {
        console.error('Error storing embeddings in MongoDB:', error);
    } finally {
        if (client) {
            await client.close();
        }
    }
}

// Run the store function
main();

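Replace the following settings and save the file.

`<CONNECTION-STRING>`	Connection string for the Atlas cluster where you want to create the database and collection. Replace this value only if you didn't set the `MONGODB_URI` environment variable.
`<DB-NAME>`	Name of the database where you want to create the collection.
`<COLLECTION-NAME>`	Name of the collection where you want to store the generated embeddings.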

Run the following command to upload the data.
```
node upload-data.js
```
Verify that the documents exist in the collection on your Atlas cluster.

Create the Atlas Vector Search index on the collection.

Create a file named create-index.js to define an Atlas Vector Search index on the collection.
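The index defined in the next step declares `numDimensions: 1024` for all three fields, which matches the output size of embed-english-v3.0. If a query returns nothing for one of the fields, one thing worth checking is whether every stored vector has that size. A minimal sketch, assuming a hypothetical `findDimensionProblems` helper that reads the plain `embeddings` arrays written by get-embeddings.js:

```javascript
// Hypothetical check: every entry's embeddings must match the numDimensions
// declared in the index (1024 for embed-english-v3.0). ubinary (int1)
// vectors pack 8 dimensions per byte, so they hold dims / 8 bytes.
function findDimensionProblems(entries, dims = 1024) {
  const problems = [];
  for (const { text, embeddings } of entries) {
    const checks = [
      ['float', embeddings.float.length, dims],
      ['int8', embeddings.int8.length, dims],
      ['ubinary', embeddings.ubinary.length * 8, dims],
    ];
    for (const [name, actual, expected] of checks) {
      if (actual !== expected) {
        problems.push(`${name} for "${text}": ${actual} dimensions, expected ${expected}`);
      }
    }
  }
  return problems;
}

// Possible usage (not run here). embeddings.json is canonical EJSON after
// convert-embeddings.js, so parse it with EJSON:
// const { BSON } = require('mongodb');
// const entries = BSON.EJSON.parse(require('fs').readFileSync('embeddings.json', 'utf8'));
// console.log(findDimensionProblems(entries));
```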
```
touch create-index.js
```

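Copy and paste the following code into the create-index.js file to create the index.

This code does the following:

Connects to your Atlas cluster and creates an index with the specified name for the specified namespace.
Indexes the bsonEmbeddings.float32 and bsonEmbeddings.int8 fields as vector type fields that use the dotProduct similarity function, and indexes the bsonEmbeddings.int1 field as a vector type field that uses the euclidean function.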
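These similarity choices follow the requirements table earlier on this page: `int1` vectors support only `euclidean` and need a dimension count that is a multiple of 8, while the other types accept `cosine`, `euclidean`, or `dotProduct`. The following sketch assumes a hypothetical `vectorField` helper that builds the same `fields` array as create-index.js while rejecting combinations the table rules out. The 1 to 8192 bound comes from the table for int8; applying it to float32 here is an assumption.

```javascript
// Hypothetical helper: build one vector field for an Atlas Vector Search
// index definition, validating it against the requirements table.
function vectorField(path, numDimensions, similarity, kind) {
  const allowed = ['cosine', 'euclidean', 'dotProduct'];
  if (!allowed.includes(similarity)) {
    throw new Error(`Unknown similarity: ${similarity}`);
  }
  if (kind === 'int1') {
    if (similarity !== 'euclidean') throw new Error('int1 vectors support only euclidean');
    if (numDimensions % 8 !== 0) throw new Error('int1 dimensions must be a multiple of 8');
  } else if (numDimensions < 1 || numDimensions > 8192) {
    // Bound taken from the table for int8; assumed to hold for float32 too.
    throw new Error(`${kind} dimensions must be between 1 and 8192`);
  }
  return { type: 'vector', numDimensions, path, similarity };
}

// Same fields as the index definition in create-index.js.
const fields = [
  vectorField('bsonEmbeddings.float32', 1024, 'dotProduct', 'float32'),
  vectorField('bsonEmbeddings.int8', 1024, 'dotProduct', 'int8'),
  vectorField('bsonEmbeddings.int1', 1024, 'euclidean', 'int1'),
];
console.log(fields.length); // 3
```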

create-index.js

const { MongoClient } = require("mongodb");
const { setTimeout } = require("timers/promises"); // Import from timers/promises

// Connect to your Atlas deployment
const uri = process.env.MONGODB_URI || "<CONNECTION-STRING>";

const client = new MongoClient(uri);

async function main() {
  try {
    const database = client.db("<DB-NAME>");
    const collection = database.collection("<COLLECTION-NAME>");

    // Define your Atlas Vector Search index
    const index = {
      name: "<INDEX-NAME>",
      type: "vectorSearch",
      definition: {
        fields: [
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.float32",
            similarity: "dotProduct",
          },
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.int8",
            similarity: "dotProduct",
          },
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.int1",
            similarity: "euclidean",
          },
        ],
      },
    };

    // Run the helper method
    const result = await collection.createSearchIndex(index);
    console.log(`New search index named ${result} is building.`);

    // Wait for the index to be ready to query
    console.log("Polling to check if the index is ready. This may take up to a minute.");
    let isQueryable = false;

    // Use filtered search for index readiness
    while (!isQueryable) {
      const [indexData] = await collection.listSearchIndexes(index.name).toArray();

      if (indexData) {
        isQueryable = indexData.queryable;
        if (!isQueryable) {
          await setTimeout(5000); // Wait for 5 seconds before checking again
        }
      } else {
        // Handle the case where the index might not be found
        console.log(`Index ${index.name} not found.`);
        await setTimeout(5000); // Wait for 5 seconds before checking again
      }
    }

    console.log(`${result} is ready for querying.`);
  } catch (error) {
    console.error("Error:", error);
  } finally {
    await client.close();
  }
}

main().catch((err) => {
  console.error("Unhandled error:", err);
});

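Replace the following settings and save the file.

`<CONNECTION-STRING>`	Connection string for the Atlas cluster where you want to create the index. Replace this value only if you didn't set the `MONGODB_URI` environment variable.
`<DB-NAME>`	Name of the database that contains the collection.
`<COLLECTION-NAME>`	Name of the collection where you stored the generated embeddings.
`<INDEX-NAME>`	Name of the index for the collection.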

Create the index.
```
node create-index.js
```
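The polling loop in create-index.js has no upper bound, so the script keeps waiting if the index never becomes queryable. The following minimal sketch shows a bounded alternative. It assumes a hypothetical `waitUntil` helper, and the commented usage shows how it could wrap the existing `listSearchIndexes` check.

```javascript
const { setTimeout: sleep } = require('timers/promises');

// Poll an async predicate until it returns true or the timeout elapses.
// Resolves to true if the predicate succeeded, or false on timeout.
async function waitUntil(check, { intervalMs = 5000, timeoutMs = 120000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return true;
    await sleep(intervalMs);
  }
  return false;
}

// Possible usage inside create-index.js, replacing the unbounded loop:
// const ready = await waitUntil(async () => {
//   const [indexData] = await collection.listSearchIndexes(index.name).toArray();
//   return Boolean(indexData && indexData.queryable);
// });
// if (!ready) throw new Error(`Index ${index.name} is not queryable yet.`);
```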

Generate the embeddings for the query text.

Create a file named get-query-embeddings.js.
```
touch get-query-embeddings.js
```

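Copy and paste the following code into the get-query-embeddings.js file.

The sample code does the following:

Generates float32, int8, and int1 embeddings for the query text by using Cohere.
Converts the generated embeddings to BSON binData vectors by using the MongoDB Node.js driver.
Saves the generated embeddings to a file named query-embeddings.json.

get-query-embeddings.js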

const { CohereClient } = require('cohere-ai');
const { BSON } = require('mongodb');
const { writeFile } = require('fs/promises');
const dotenv = require('dotenv');
const process = require('process');

// Load environment variables
dotenv.config();

const { Binary } = BSON;

// Get the API key from environment variables or set the key here
const apiKey = process.env.COHERE_API_KEY || '<COHERE-API-KEY>';

if (!apiKey) {
  throw new Error('API key not found. Provide the COHERE_API_KEY.');
}

// Initialize CohereClient
const cohere = new CohereClient({ token: apiKey });

async function main(queryText) {
  try {
    if (typeof queryText !== 'string' || queryText.trim() === '') {
      throw new Error('Invalid query text. It must be a non-empty string.');
    }

    const data = [queryText];

    // Request embeddings from the Cohere API
    const response = await cohere.v2.embed({
      model: 'embed-english-v3.0',
      inputType: 'search_query',
      texts: data,
      embeddingTypes: ['float', 'int8', 'ubinary'], // Request all required embedding types
    });

    if (!response.embeddings) {
      throw new Error('Embeddings not found in the API response.');
    }

    const { float, int8, ubinary } = response.embeddings;

    const updatedEmbeddingsData = data.map((text, index) => {
      // Create the BSON vector Binary (subtype 9) objects for each embedding type
      const float32Binary = Binary.fromFloat32Array(new Float32Array(float[index])); // float32 vector
      const int8Binary = Binary.fromInt8Array(new Int8Array(int8[index])); // int8 vector
      const packedBitsBinary = Binary.fromPackedBits(new Uint8Array(ubinary[index])); // packed-bit vector

      return {
        text,
        embeddings: {
          float: float[index],
          int8: int8[index],
          ubinary: ubinary[index],
        },
        bsonEmbeddings: {
          float32: float32Binary,
          int8: int8Binary,
          int1: packedBitsBinary,
        },
      };
    });

    // Serialize the embeddings using BSON EJSON for BSON compatibility
    const outputFileName = 'query-embeddings.json';
    const ejsonSerializedData = BSON.EJSON.stringify(updatedEmbeddingsData, null, null, { relaxed: false });
    await writeFile(outputFileName, ejsonSerializedData);
    console.log(`Embeddings with BSON data have been saved to ${outputFileName}`);
  } catch (error) {
    console.error('Error processing query text:', error);
  }
}

// Main function that takes a query string
(async () => {
  const queryText = "<QUERY-TEXT>"; // Replace with your actual query text
  await main(queryText);
})();

Replace the following settings and save the file.

`<COHERE-API-KEY>`	Your Cohere API key. Replace this value only if you didn't set the environment variable.
`<QUERY-TEXT>`	Your query text. This tutorial uses `science fact`.

Run the code to generate the embeddings for the query text.
node get-query-embeddings.js
Embeddings with BSON data have been saved to query-embeddings.json
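The query's int1 embedding is compared with the stored int1 vectors by using the euclidean similarity declared in the index. For vectors whose components are all 0 or 1, every differing bit adds exactly 1 to the squared euclidean distance, so the squared euclidean distance equals the Hamming distance (the number of differing bits). On packed bytes, you can compute it with XOR and a bit count. This is a sketch with hypothetical helper names; it doesn't reproduce how Atlas converts the distance into `vectorSearchScore`.

```javascript
// Hamming distance between two packed-bit vectors: XOR the bytes and count
// the set bits.
function hammingDistance(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same byte length');
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    let diff = a[i] ^ b[i];
    while (diff) {
      diff &= diff - 1; // clear the lowest set bit
      distance++;
    }
  }
  return distance;
}

// Squared euclidean distance on the unpacked bits (MSB first), for comparison.
function squaredEuclideanOnBits(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length * 8; i++) {
    const bitA = (a[i >> 3] >> (7 - (i & 7))) & 1;
    const bitB = (b[i >> 3] >> (7 - (i & 7))) & 1;
    sum += (bitA - bitB) ** 2;
  }
  return sum;
}

const queryBits = Uint8Array.from([0b10110000, 0b00001111]);
const docBits = Uint8Array.from([0b10010000, 0b00001100]);
console.log(hammingDistance(queryBits, docBits)); // 3
console.log(squaredEuclideanOnBits(queryBits, docBits)); // 3
```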

Run an Atlas Vector Search query.

Create a file named run-query.js.
```
touch run-query.js
```

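Copy and paste the following sample $vectorSearch query into the run-query.js file.

The sample query does the following:

Connects to your Atlas cluster and runs the $vectorSearch query against the bsonEmbeddings.float32, bsonEmbeddings.int8, and bsonEmbeddings.int1 fields in the specified collection, using the embeddings from the query-embeddings.json file.
Prints the results from the Float32, Int8, and packed binary (Int1) embeddings to the console.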

run-query.js

const { MongoClient, BSON } = require('mongodb'); // BSON provides EJSON parsing
const fs = require('fs/promises');
const dotenv = require('dotenv');

dotenv.config();

// MongoDB connection details
const mongoUri = process.env.MONGODB_URI || '<CONNECTION-STRING>';
const dbName = '<DB-NAME>'; // Update with your actual database name
const collectionName = '<COLLECTION-NAME>'; // Update with your actual collection name

// Index name and paths must match your Atlas Vector Search index definition
const VECTOR_INDEX_NAME = '<INDEX-NAME>'; // Replace with your actual index name
const NUM_CANDIDATES = 5; // Number of nearest-neighbor candidates to consider
const LIMIT = 2; // Number of documents to return

// Keys under bsonEmbeddings that hold the BSON query vectors (binary subtype 9)
const FIELDS = ['float32', 'int8', 'int1'];

// Function to read BSON vectors from JSON and run vector search
async function main() {
  // Initialize MongoDB client
  const client = new MongoClient(mongoUri);

  try {
    await client.connect();
    console.log("Connected to MongoDB");

    const db = client.db(dbName);
    const collection = db.collection(collectionName);

    // Load query embeddings from JSON file using EJSON parsing
    const fileContent = await fs.readFile('query-embeddings.json', 'utf8');
    const embeddingsData = BSON.EJSON.parse(fileContent);

    // Define and run the query for each embedding type
    const results = {};

    for (const path of FIELDS) {
      // Use the parsed BSON Binary object directly as the query vector
      const bsonQueryVector = embeddingsData[0]?.bsonEmbeddings?.[path];

      if (!bsonQueryVector) {
        console.warn(`BSON embedding for ${path} not found in the JSON.`);
        continue;
      }

      const pipeline = [
        {
          $vectorSearch: {
            index: VECTOR_INDEX_NAME,
            path: `bsonEmbeddings.${path}`,
            queryVector: bsonQueryVector,
            numCandidates: NUM_CANDIDATES,
            limit: LIMIT,
          }
        },
        {
          $project: {
            _id: 0,
            text: 1, // Adjust projection fields as necessary to match your document structure
            score: { $meta: 'vectorSearchScore' }
          }
        }
      ];

      results[path] = await collection.aggregate(pipeline).toArray();
    }

    return results;
  } catch (error) {
    console.error('Error during vector search:', error);
  } finally {
    await client.close();
  }
}

// Main execution block
(async () => {
  try {
    const results = await main();

    if (results) {
      console.log("Results from Float32 embeddings:");
      console.table(results.float32 || []);
      console.log("--------------------------------------------------------------------------");

      console.log("Results from Int8 embeddings:");
      console.table(results.int8 || []);
      console.log("--------------------------------------------------------------------------");

      console.log("Results from Packed Binary (PackedBits) embeddings:");
      console.table(results.int1 || []);
    }
  } catch (error) {
    console.error('Error executing main function:', error);
  }
})();

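Replace the following settings and save the run-query.js file.

`<CONNECTION-STRING>`	Connection string for the Atlas cluster where you want to run the query. Replace this value only if you didn't set the `MONGODB_URI` environment variable.
`<DB-NAME>`	Name of the database that contains the collection.
`<COLLECTION-NAME>`	Name of the collection that you want to query.
`<INDEX-NAME>`	Name of the index for the collection.

Run the following command to execute the query.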

node run-query.js

Connected to MongoDB
Results from Float32 embeddings:
┌─────────┬─────────────────────────────────────────────────────────┬────────────────────┐
│ (index) │                          text                           │       score        │
├─────────┼─────────────────────────────────────────────────────────┼────────────────────┤
│    0    │ 'Mount Everest is the highest peak on Earth at 8,848m.' │ 0.6583383083343506 │
│    1    │    'The Great Wall of China is visible from space.'     │ 0.6536108255386353 │
└─────────┴─────────────────────────────────────────────────────────┴────────────────────┘
--------------------------------------------------------------------------
Results from Int8 embeddings:
┌─────────┬─────────────────────────────────────────────────────────┬────────────────────┐
│ (index) │                          text                           │       score        │
├─────────┼─────────────────────────────────────────────────────────┼────────────────────┤
│    0    │ 'Mount Everest is the highest peak on Earth at 8,848m.' │ 0.5149773359298706 │
│    1    │    'The Great Wall of China is visible from space.'     │ 0.5146723985671997 │
└─────────┴─────────────────────────────────────────────────────────┴────────────────────┘
--------------------------------------------------------------------------
Results from Packed Binary (PackedBits) embeddings:
┌─────────┬─────────────────────────────────────────────────────────┬─────────────┐
│ (index) │                          text                           │    score    │
├─────────┼─────────────────────────────────────────────────────────┼─────────────┤
│    0    │ 'Mount Everest is the highest peak on Earth at 8,848m.' │ 0.642578125 │
│    1    │    'The Great Wall of China is visible from space.'     │ 0.61328125  │
└─────────┴─────────────────────────────────────────────────────────┴─────────────┘
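Scores from different embedding types appear to be on different scales (compare the Int8 and Float32 scores above), so comparing result sets tells you more than comparing raw scores. A common measure is recall@k: the fraction of the full-fidelity (float32) top-k results that a quantized query also returns. The following minimal sketch uses a hypothetical `recallAtK` helper and the texts from the sample output. Two results from one query say little, so in practice you would run many queries with a larger `limit`.

```javascript
// Recall@k: share of the baseline's top-k results that also appear in the
// candidate's top-k. Results are identified by their text here.
function recallAtK(baseline, candidate, k = baseline.length) {
  const expected = new Set(baseline.slice(0, k));
  if (expected.size === 0) return 1; // nothing to recall
  const hits = candidate.slice(0, k).filter((text) => expected.has(text)).length;
  return hits / expected.size;
}

// Texts from the sample output above (all three embedding types agree).
const everest = 'Mount Everest is the highest peak on Earth at 8,848m.';
const greatWall = 'The Great Wall of China is visible from space.';
const float32Top = [everest, greatWall];
const int8Top = [everest, greatWall];
const int1Top = [everest, greatWall];

console.log(recallAtK(float32Top, int8Top)); // 1
console.log(recallAtK(float32Top, int1Top)); // 1
```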

Install the required libraries.

Run the following command to install the MongoDB Node.js driver. This operation might take a few minutes to complete.

npm install mongodb

npm install cohere-ai dotenv
npm show cohere-ai version

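Set the environment variables in your terminal.

To access the embedding model provider for generating and converting embeddings, set the environment variable for the provider's API key, if necessary.
To use embeddings from Cohere, set the COHERE_API_KEY environment variable.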
```
export COHERE_API_KEY="<COHERE-API-KEY>"
```
If you don't set the environment variable, replace <COHERE-API-KEY> in the sample code with your API key before running the code.
To access your Atlas cluster, set the MONGODB_URI environment variable.
```
export MONGODB_URI="<CONNECTION-STRING>"
```
Your connection string must use the following format:
```
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
```
If you don't set the environment variable, replace <CONNECTION-STRING> in the sample code with your connection string before running the code.

Fetch the data from your Atlas cluster.

Create a file named get-data.js.
```
touch get-data.js
```

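Copy and paste the following sample code to fetch the data from the sample_airbnb.listingsAndReviews namespace in your Atlas cluster.

The sample code does the following:

Connects to your Atlas cluster and finds documents whose summary field is neither null nor empty.
Writes a subset of 50 of those documents to a file named subset.json.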

get-data.js

const { MongoClient } = require('mongodb');
const fs = require('fs'); // Import the fs module for file system operations

async function main() {
  // Replace with your Atlas connection string
  const uri = process.env.MONGODB_URI || '<CONNECTION-STRING>';

  // Create a new MongoClient instance
  const client = new MongoClient(uri);

  try {
    // Connect to your Atlas cluster
    await client.connect();

    // Specify the database and collection
    const db = client.db('sample_airbnb');
    const collection = db.collection('listingsAndReviews');

    // Filter to exclude null, missing, or empty summary fields
    const filter = { summary: { $nin: [null, ''] } };

    // Get a subset of documents in the collection
    const documentsCursor = collection.find(filter).limit(50);

    // Convert the cursor to an array to get the documents
    const documents = await documentsCursor.toArray();

    // Log the documents to verify their content
    console.log('Documents retrieved:', documents);

    // Write the documents to a local file called "subset.json"
    const outputFilePath = './subset.json';
    fs.writeFileSync(outputFilePath, JSON.stringify(documents, null, 2), 'utf-8');

    console.log(`Subset of documents written to: ${outputFilePath}`);
  } catch (error) {
    console.error('An error occurred:', error);
  } finally {
    // Ensure the client is closed when finished
    await client.close();
  }
}

main().catch(console.error);

If you didn't set the environment variable for your Atlas connection string, replace the <CONNECTION-STRING> placeholder and save the file.
Run the following command to fetch the data:
node get-data.js
Subset of documents written to: ./subset.json

Generate vector embeddings for your data.

If your collection already has float32, int8, or int1 vector embeddings, skip this step.

Create a file named get-embeddings.js to generate float32, int8, and int1 vector embeddings by using Cohere's embed API.
```
touch get-embeddings.js
```

Copy and paste the following code into the get-embeddings.js file.

This code does the following:

Generates float32, int8, and int1 embeddings for the given data by using Cohere's embed-english-v3.0 embedding model.
Stores the float32, int8, and int1 embeddings in the float, int8, and ubinary fields, respectively.
Creates a file named embeddings.json and saves the embeddings to it.
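Cohere returns the ubinary embeddings already packed, 8 dimensions per unsigned byte, so a 1024-dimensional embedding arrives as 128 integers from 0 to 255. To illustrate what that packing means, here is a minimal sketch of binary quantization as described earlier on this page (a value greater than the midpoint 0 becomes 1, anything else becomes 0). The most-significant-bit-first order within each byte is an assumption made for illustration; this is not a reimplementation of Cohere's API, and `packBinary` is a hypothetical name.

```javascript
// Sketch: binary-quantize a float vector (value > 0 -> 1, otherwise 0) and pack
// 8 dimensions into each byte, most significant bit first (assumed bit order).
function packBinary(floats) {
  if (floats.length % 8 !== 0) {
    throw new Error('Dimension count must be a multiple of 8 for packed bits.');
  }
  const packed = new Uint8Array(floats.length / 8);
  for (let i = 0; i < floats.length; i++) {
    if (floats[i] > 0) {
      // Set bit (7 - i % 8) of byte floor(i / 8)
      packed[i >> 3] |= 1 << (7 - (i & 7));
    }
  }
  return packed;
}

// Bits 1,0,0,1,0,1,1,0 -> 0b10010110
const packed = packBinary([0.12, -0.4, 0.0, 0.9, -0.1, 0.3, 0.05, -0.7]);
console.log(Array.from(packed)); // [ 150 ]
```

The multiple-of-8 check mirrors the requirement that int1 vectors have a dimension count divisible by 8.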

get-embeddings.js

// Import necessary modules using the CommonJS syntax
const { CohereClient } = require('cohere-ai');
const { readFile, writeFile } = require('fs/promises');

// Retrieve the API key from environment variables or provide a placeholder
const apiKey = process.env.COHERE_API_KEY || '<COHERE-API-KEY>';

if (!apiKey || apiKey === '<COHERE-API-KEY>') {
  throw new Error('API key not found. Please set COHERE_API_KEY in your environment.');
}

// Initialize the Cohere client with the API key
const cohere = new CohereClient({ token: apiKey });

async function main() {
  try {
    // Read and parse the contents of 'subset.json'
    const subsetData = await readFile('subset.json', 'utf-8');
    const documents = JSON.parse(subsetData);

    // Extract the 'summary' fields that are non-empty strings
    const data = documents
      .map(doc => doc.summary)
      .filter(summary => typeof summary === 'string' && summary.length > 0);

    if (data.length === 0) {
      throw new Error('No valid summary texts available in the data.');
    }

    // Request embeddings from the Cohere API
    const response = await cohere.v2.embed({
      model: 'embed-english-v3.0',
      inputType: 'search_document',
      texts: data,
      embeddingTypes: ['float', 'int8', 'ubinary'],
    });

    // Extract embeddings from the API response
    const { float, int8, ubinary } = response.embeddings;

    // Structure the embeddings data
    const embeddingsData = data.map((text, index) => ({
      text,
      embeddings: {
        float: float[index],
        int8: int8[index],
        ubinary: ubinary[index],
      },
    }));

    // Write the embeddings data to 'embeddings.json'
    await writeFile('embeddings.json', JSON.stringify(embeddingsData, null, 2));
    console.log('Embeddings saved to embeddings.json');
  } catch (error) {
    console.error('Error fetching embeddings:', error);
  }
}

// Execute the main function
main();

If you didn't set the environment variable for your Cohere API key, replace the <COHERE-API-KEY> placeholder and save the file.
Run the code to generate the embeddings:
node get-embeddings.js
Embeddings saved to embeddings.json
Open the generated embeddings.json file to verify the embeddings.

Convert the vector embeddings to `binData` vectors.

Create a file named convert-embeddings.js to convert the float32, int8, and int1 vector embeddings from Cohere to BSON binData vectors.
```
touch convert-embeddings.js
```

Copy and paste the following code into the convert-embeddings.js file.

This code does the following:

Generates BSON binData vectors for the float32, int8, and int1 embeddings.
Writes the float32, int8, and packed-bit (ubinary) BSON binData vectors back to the embeddings.json file, alongside the original embeddings.

convert-embeddings.js

const fs = require('fs/promises');
const { BSON } = require('mongodb');
const { Binary } = BSON;

async function main() {
  try {
    // Read and parse the contents of 'embeddings.json' file
    const fileContent = await fs.readFile('embeddings.json', 'utf8');
    const embeddingsData = JSON.parse(fileContent);

    // Map the embeddings data to add BSON vector representations (binary subtype 9)
    const convertEmbeddingsData = embeddingsData.map(({ text, embeddings }) => {
      // Create a float32 BSON vector from a Float32Array
      const bsonFloat32 = Binary.fromFloat32Array(new Float32Array(embeddings.float));

      // Create an int8 BSON vector from an Int8Array
      const bsonInt8 = Binary.fromInt8Array(new Int8Array(embeddings.int8));

      // Create a packed-bit BSON vector from the ubinary bytes (Uint8Array)
      const bsonPackedBits = Binary.fromPackedBits(new Uint8Array(embeddings.ubinary));

      return {
        text,
        embeddings: {
          float: embeddings.float, // Original float data
          int8: embeddings.int8, // Original int8 data
          ubinary: embeddings.ubinary, // Original packed bits data
        },
        bsonEmbeddings: {
          float32: bsonFloat32,
          int8: bsonInt8,
          packedBits: bsonPackedBits,
        },
      };
    });

    // Serialize the updated data to canonical EJSON so the Binary values survive the round trip
    const ejsonSerializedData = BSON.EJSON.stringify(convertEmbeddingsData, null, null, { relaxed: false });

    // Write the serialized data to 'embeddings.json'
    await fs.writeFile('embeddings.json', ejsonSerializedData);
    console.log('Embeddings with BSON vectors have been saved to embeddings.json');
  } catch (error) {
    console.error('Error processing embeddings:', error);
  }
}

main();
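`Binary.fromFloat32Array`, `Binary.fromInt8Array`, and `Binary.fromPackedBits` all produce BSON binary subtype 9 (vector) values. As we understand the BSON binary vector specification, the payload starts with a dtype byte (`0x27` for float32, `0x03` for int8, `0x10` for packed bit) and a padding byte (the number of unused trailing bits, used only for packed bits), followed by the vector data, with float32 values stored little-endian. The following sketch builds that payload by hand with plain Node.js Buffers only to show the layout and resulting sizes; in practice, use the driver helpers. `encodeVectorPayload` and `DTYPE` are hypothetical names.

```javascript
// Sketch of the BSON binary vector (subtype 9) payload layout:
// byte 0 = dtype, byte 1 = padding (unused trailing bits, packed bits only), then data.
const DTYPE = { INT8: 0x03, FLOAT32: 0x27, PACKED_BIT: 0x10 };

function encodeVectorPayload(dtype, values, padding = 0) {
  let data;
  if (dtype === DTYPE.FLOAT32) {
    // float32 values are stored little-endian, 4 bytes each
    data = Buffer.alloc(values.length * 4);
    values.forEach((v, i) => data.writeFloatLE(v, i * 4));
  } else if (dtype === DTYPE.INT8) {
    // one signed byte per dimension
    data = Buffer.from(Int8Array.from(values).buffer);
  } else if (dtype === DTYPE.PACKED_BIT) {
    // eight dimensions per unsigned byte
    data = Buffer.from(Uint8Array.from(values));
  } else {
    throw new Error(`Unknown dtype ${dtype}`);
  }
  return Buffer.concat([Buffer.from([dtype, padding]), data]);
}

// Payload sizes for a 1024-dimensional embedding in each representation
console.log(encodeVectorPayload(DTYPE.FLOAT32, new Array(1024).fill(0.5)).length); // 4098
console.log(encodeVectorPayload(DTYPE.INT8, new Array(1024).fill(-3)).length); // 1026
console.log(encodeVectorPayload(DTYPE.PACKED_BIT, new Array(128).fill(255)).length); // 130
```

The sizes make the storage trade-off concrete: the packed-bit vector is roughly 1/32 the size of the float32 vector.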

Run the program to generate the BSON binData vectors:
node convert-embeddings.js
Embeddings with BSON vectors have been saved to embeddings.json
Verify the generated BSON embeddings in the embeddings.json file.

Connect to your Atlas cluster and upload the data to the namespace.

Create a file named upload-data.js to connect to your Atlas cluster and upload the data to the sample_airbnb.listingsAndReviews namespace.
```
touch upload-data.js
```

Copy and paste the following code into the upload-data.js file.

This code does the following:

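Connects to your Atlas cluster and accesses the namespace with the database and collection names that you specify.
Upserts the data, including the embeddings, into the sample_airbnb.listingsAndReviews namespace, matching existing documents on the summary field.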
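The upload script sends one `updateOne` call per document. For larger subsets, a common alternative is to batch the same upserts into a single `collection.bulkWrite()` call. A minimal sketch of building those operations, assuming a hypothetical helper `toUpsertOps` and, as the upload script does, that `summary` identifies each listing:

```javascript
// Hypothetical helper: turn { summary, bsonEmbeddings } documents into
// upsert operations for collection.bulkWrite(), matching on the summary field.
function toUpsertOps(documents) {
  return documents.map((doc) => ({
    updateOne: {
      filter: { summary: doc.summary },
      update: { $set: doc },
      upsert: true,
    },
  }));
}

const ops = toUpsertOps([
  { summary: 'Ocean view condo', bsonEmbeddings: {} },
  { summary: 'City loft', bsonEmbeddings: {} },
]);
console.log(ops.length); // 2
```

With a connected driver, `await collection.bulkWrite(ops, { ordered: false })` would send all upserts in one round trip, and the result's `matchedCount` and `upsertedCount` summarize how many documents were updated versus inserted.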

upload-data.js

const fs = require('fs/promises'); // Use fs/promises for asynchronous operations
const { MongoClient, BSON } = require('mongodb'); // Import from the 'mongodb' package
const { EJSON } = BSON; // Use the EJSON parser bundled with the driver

async function main() {
  const MONGODB_URI = process.env.MONGODB_URI || "<CONNECTION-STRING>";
  const DB_NAME = "sample_airbnb";
  const COLLECTION_NAME = "listingsAndReviews";

  let client;
  try {
    // Connect to MongoDB
    client = new MongoClient(MONGODB_URI);
    await client.connect();
    console.log("Connected to MongoDB");

    // Access database and collection
    const db = client.db(DB_NAME);
    const collection = db.collection(COLLECTION_NAME);

    // Load embeddings from JSON; EJSON.parse restores the BSON Binary vectors
    const fileContent = await fs.readFile('embeddings.json', 'utf8');
    const embeddingsData = EJSON.parse(fileContent);

    // Map embeddings data to the document shape to store
    const documents = embeddingsData.map(({ text, bsonEmbeddings }) => {
      return {
        summary: text,
        bsonEmbeddings: {
          float32: bsonEmbeddings.float32,
          int8: bsonEmbeddings.int8,
          int1: bsonEmbeddings.packedBits
        }
      };
    });

    // Iterate over documents and upsert each into the MongoDB collection
    for (const doc of documents) {
      const filter = { summary: doc.summary };
      const update = { $set: doc };

      // Update the document with the BSON binary data
      const result = await collection.updateOne(filter, update, { upsert: true });
      if (result.matchedCount > 0) {
        console.log(`Updated document with summary: ${doc.summary}`);
      } else {
        console.log(`Inserted new document with summary: ${doc.summary}`);
      }
    }

    console.log("Embeddings stored in MongoDB successfully.");
  } catch (error) {
    console.error('Error storing embeddings in MongoDB:', error);
  } finally {
    if (client) {
      await client.close();
    }
  }
}

// Run the main function to load the data
main();

If you didn't set the environment variable for your Atlas connection string, replace the <CONNECTION-STRING> placeholder and save the file.
Run the following command to upload the data:
node upload-data.js
Connected to MongoDB
Updated document with summary: ...
...
Embeddings stored in MongoDB successfully.
Log in to your Atlas cluster and verify the namespace in Data Explorer.

Create an Atlas Vector Search index on the collection.

Create a file named create-index.js.
```
touch create-index.js
```

Copy and paste the following code into the create-index.js file to create the index.

This code does the following:

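Connects to your Atlas cluster and creates an index with the specified name for the specified namespace.
Indexes the bsonEmbeddings.float32 and bsonEmbeddings.int8 fields as the vector type by using the dotProduct similarity function, and indexes the bsonEmbeddings.int1 field as the vector type by using the euclidean function.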
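The requirements table earlier on this page constrains these fields: int1 (packed-bit) vectors support only euclidean similarity and need a dimension count that is a multiple of 8, while int8 and float vectors accept 1 to 8192 dimensions with cosine, euclidean, or dotProduct. A minimal sketch of a pre-flight check over the `fields` array before calling `createSearchIndex`, assuming a hypothetical `validateVectorFields` helper that you tell which paths hold packed bits:

```javascript
// Hypothetical pre-flight check for vector index fields, based on the
// requirements table: packed-bit (int1) fields need euclidean similarity and a
// dimension count that is a multiple of 8; other fields need 1 to 8192 dimensions.
const SIMILARITIES = ['cosine', 'euclidean', 'dotProduct'];

function validateVectorFields(fields, packedBitPaths) {
  const errors = [];
  for (const { path, numDimensions, similarity } of fields) {
    if (!SIMILARITIES.includes(similarity)) {
      errors.push(`${path}: unknown similarity "${similarity}"`);
    }
    if (packedBitPaths.includes(path)) {
      if (similarity !== 'euclidean') errors.push(`${path}: int1 vectors support only euclidean`);
      if (numDimensions % 8 !== 0) errors.push(`${path}: int1 dimensions must be a multiple of 8`);
    } else if (numDimensions < 1 || numDimensions > 8192) {
      errors.push(`${path}: dimensions must be between 1 and 8192`);
    }
  }
  return errors;
}

// The same field definitions that create-index.js uses
const fields = [
  { type: 'vector', numDimensions: 1024, path: 'bsonEmbeddings.float32', similarity: 'dotProduct' },
  { type: 'vector', numDimensions: 1024, path: 'bsonEmbeddings.int8', similarity: 'dotProduct' },
  { type: 'vector', numDimensions: 1024, path: 'bsonEmbeddings.int1', similarity: 'euclidean' },
];
console.log(validateVectorFields(fields, ['bsonEmbeddings.int1'])); // []
```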

create-index.js

const { MongoClient } = require("mongodb");
const { setTimeout } = require("timers/promises"); // Import from timers/promises

// Connect to your Atlas deployment
const uri = process.env.MONGODB_URI || "<CONNECTION-STRING>";

const client = new MongoClient(uri);

async function main() {
  try {
    const database = client.db("<DB-NAME>");
    const collection = database.collection("<COLLECTION-NAME>");

    // Define your Atlas Vector Search index
    const index = {
      name: "<INDEX-NAME>",
      type: "vectorSearch",
      definition: {
        fields: [
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.float32",
            similarity: "dotProduct",
          },
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.int8",
            similarity: "dotProduct",
          },
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.int1",
            similarity: "euclidean",
          },
        ],
      },
    };

    // Run the helper method
    const result = await collection.createSearchIndex(index);
    console.log(`New search index named ${result} is building.`);

    // Wait for the index to be ready to query
    console.log("Polling to check if the index is ready. This may take up to a minute.");
    let isQueryable = false;

    // Use filtered search for index readiness
    while (!isQueryable) {
      const [indexData] = await collection.listSearchIndexes(index.name).toArray();

      if (indexData) {
        isQueryable = indexData.queryable;
        if (!isQueryable) {
          await setTimeout(5000); // Wait for 5 seconds before checking again
        }
      } else {
        // Handle the case where the index might not be found
        console.log(`Index ${index.name} not found.`);
        await setTimeout(5000); // Wait for 5 seconds before checking again
      }
    }

    console.log(`${result} is ready for querying.`);
  } catch (error) {
    console.error("Error:", error);
  } finally {
    await client.close();
  }
}

main().catch((err) => {
  console.error("Unhandled error:", err);
});

Replace the following settings and save the file.

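`<CONNECTION-STRING>`	Connection string for the Atlas cluster where you want to create the database and collection. Replace this value only if you didn't set the `MONGODB_URI` environment variable.
`<DB-NAME>`	Name of the database, which is `sample_airbnb`.
`<COLLECTION-NAME>`	Name of the collection, which is `listingsAndReviews`.
`<INDEX-NAME>`	Name of the index for the collection.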

Create the index:
node create-index.js
New search index named vector_index is building.
Polling to check if the index is ready. This may take up to a minute.
vector_index is ready for querying.

Generate embeddings for the query text.

Create a file named get-query-embeddings.js.
```
touch get-query-embeddings.js
```

Copy and paste the following code into the get-query-embeddings.js file.

The sample code does the following:

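Generates float32, int8, and int1 embeddings for the query text by using Cohere.
Converts the generated embeddings to BSON binData vectors by using the BSON library bundled with the MongoDB Node.js driver.
Saves the generated embeddings to a file named query-embeddings.json.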
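The query vectors must have the same number of dimensions as the indexed fields: for embed-english-v3.0 that is 1024 float32 values, 1024 int8 values, or 1024 bits packed into 128 ubinary bytes. A minimal sketch of a sanity check on the Cohere response before converting it, assuming a hypothetical `checkQueryEmbeddings` helper and the response shape that the script destructures (`float`, `int8`, and `ubinary` arrays of vectors):

```javascript
// Hypothetical sanity check: confirm each query embedding matches the index's
// numDimensions before it is converted to a BSON vector.
function checkQueryEmbeddings(embeddings, numDimensions) {
  const expected = {
    float: numDimensions, // one float32 per dimension
    int8: numDimensions, // one int8 per dimension
    ubinary: numDimensions / 8, // eight dimensions packed into each byte
  };
  for (const [type, length] of Object.entries(expected)) {
    const vector = embeddings[type]?.[0];
    if (!vector || vector.length !== length) {
      throw new Error(`${type} query embedding has ${vector?.length ?? 0} values; expected ${length}`);
    }
  }
  return true;
}

// Dummy vectors shaped like a 1024-dimensional response
const dummy = {
  float: [new Array(1024).fill(0.1)],
  int8: [new Array(1024).fill(1)],
  ubinary: [new Array(128).fill(0)],
};
console.log(checkQueryEmbeddings(dummy, 1024)); // true
```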

get-query-embeddings.js

const { CohereClient } = require('cohere-ai');
const { BSON } = require('mongodb');
const { writeFile } = require('fs/promises');
const dotenv = require('dotenv');

// Load environment variables from a .env file, if present
dotenv.config();

const { Binary } = BSON;

// Get the API key from environment variables or set the key here
const apiKey = process.env.COHERE_API_KEY || '<COHERE-API-KEY>';

if (!apiKey || apiKey === '<COHERE-API-KEY>') {
  throw new Error('API key not found. Provide the COHERE_API_KEY.');
}

// Initialize CohereClient
const cohere = new CohereClient({ token: apiKey });

async function main(queryText) {
  try {
    if (typeof queryText !== 'string' || queryText.trim() === '') {
      throw new Error('Invalid query text. It must be a non-empty string.');
    }

    const data = [queryText];

    // Request query embeddings from the Cohere API
    const response = await cohere.v2.embed({
      model: 'embed-english-v3.0',
      inputType: 'search_query',
      texts: data,
      embeddingTypes: ['float', 'int8', 'ubinary'], // Request all required embedding types
    });

    if (!response.embeddings) {
      throw new Error('Embeddings not found in the API response.');
    }

    const { float, int8, ubinary } = response.embeddings;

    const updatedEmbeddingsData = data.map((text, index) => {
      // Create BSON vectors (binary subtype 9) for each embedding type
      const float32Binary = Binary.fromFloat32Array(new Float32Array(float[index])); // dtype float32
      const int8Binary = Binary.fromInt8Array(new Int8Array(int8[index])); // dtype int8
      const packedBitsBinary = Binary.fromPackedBits(new Uint8Array(ubinary[index])); // dtype packed bit

      return {
        text,
        embeddings: {
          float: float[index],
          int8: int8[index],
          ubinary: ubinary[index],
        },
        bsonEmbeddings: {
          float32: float32Binary,
          int8: int8Binary,
          int1: packedBitsBinary,
        },
      };
    });

    // Serialize to canonical EJSON so the BSON Binary values are preserved
    const outputFileName = 'query-embeddings.json';
    const ejsonSerializedData = BSON.EJSON.stringify(updatedEmbeddingsData, null, null, { relaxed: false });
    await writeFile(outputFileName, ejsonSerializedData);
    console.log(`Embeddings with BSON data have been saved to ${outputFileName}`);
  } catch (error) {
    console.error('Error processing query text:', error);
  }
}

// Run main with the query text
(async () => {
  const queryText = "<QUERY-TEXT>"; // Replace with your actual query text
  await main(queryText);
})();

Replace the following settings and save the file.

`<COHERE-API-KEY>`	API key for Cohere. Replace this value only if you didn't set the key as an environment variable.
`<QUERY-TEXT>`	Query text. This example uses `ocean view`.

Run the code to generate the embeddings for the query text:
node get-query-embeddings.js
Embeddings with BSON data have been saved to query-embeddings.json

Run an Atlas Vector Search query.

Create a file named run-query.js.
```
touch run-query.js
```

Copy and paste the following sample $vectorSearch query into the run-query.js file.

The sample query does the following:

Connects to your Atlas cluster and, by using the embeddings in the query-embeddings.json file, runs a $vectorSearch query against the bsonEmbeddings.float32, bsonEmbeddings.int8, and bsonEmbeddings.int1 fields in the sample_airbnb.listingsAndReviews namespace.
Prints the results for the Float32, Int8, and packed binary (Int1) embeddings to the console.
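run-query.js builds the same `$vectorSearch` stage once per embedding type. A minimal sketch of factoring that into a helper that also checks the numeric options, assuming a hypothetical `buildVectorSearchPipeline` function: Atlas Vector Search requires `numCandidates` to be no smaller than `limit` and no larger than 10000, so catching a bad value locally gives a clearer error than a failed aggregation.

```javascript
// Hypothetical helper: build a $vectorSearch pipeline for one embedding path,
// checking numCandidates against limit before sending the query.
function buildVectorSearchPipeline({ index, path, queryVector, numCandidates, limit }) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error('limit must be a positive integer');
  }
  if (!Number.isInteger(numCandidates) || numCandidates < limit || numCandidates > 10000) {
    throw new Error('numCandidates must be an integer between limit and 10000');
  }
  return [
    { $vectorSearch: { index, path, queryVector, numCandidates, limit } },
    { $project: { _id: 0, name: 1, summary: 1, score: { $meta: 'vectorSearchScore' } } },
  ];
}

const pipeline = buildVectorSearchPipeline({
  index: 'vector_index',
  path: 'bsonEmbeddings.int8',
  queryVector: [0.1, 0.2], // a BSON Binary vector in run-query.js; a plain array here for illustration
  numCandidates: 20,
  limit: 5,
});
console.log(pipeline[0].$vectorSearch.path); // bsonEmbeddings.int8
```

The resulting array can be passed to `collection.aggregate(pipeline)` exactly as the script does.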

run-query.js

const { MongoClient, BSON } = require('mongodb'); // BSON provides EJSON parsing
const fs = require('fs/promises');
const dotenv = require('dotenv');

dotenv.config();

// MongoDB connection details
const mongoUri = process.env.MONGODB_URI || '<CONNECTION-STRING>';
const dbName = 'sample_airbnb'; // Update with your actual database name
const collectionName = 'listingsAndReviews'; // Update with your actual collection name

// The index name and paths must match your Atlas Vector Search index definition
const VECTOR_INDEX_NAME = '<INDEX-NAME>'; // Replace with your actual index name
const NUM_CANDIDATES = 20; // Number of nearest neighbors to consider during the search
const LIMIT = 5; // Maximum number of documents to return

// Keys under bsonEmbeddings that hold the BSON query vectors (binary subtype 9)
const PATHS = ['float32', 'int8', 'int1'];

// Function to read BSON vectors from JSON and run vector search
async function main() {
  // Initialize MongoDB client
  const client = new MongoClient(mongoUri);

  try {
    await client.connect();
    console.log("Connected to MongoDB");

    const db = client.db(dbName);
    const collection = db.collection(collectionName);

    // Load query embeddings from JSON file using EJSON parsing
    const fileContent = await fs.readFile('query-embeddings.json', 'utf8');
    const embeddingsData = BSON.EJSON.parse(fileContent);

    // Define and run the query for each embedding type
    const results = {};

    for (const path of PATHS) {
      // EJSON.parse restores the BSON Binary object, so use it directly as the query vector
      const bsonQueryVector = embeddingsData[0]?.bsonEmbeddings?.[path];

      if (!bsonQueryVector) {
        console.warn(`BSON embedding for ${path} not found in the JSON.`);
        continue;
      }

      const pipeline = [
        {
          $vectorSearch: {
            index: VECTOR_INDEX_NAME,
            path: `bsonEmbeddings.${path}`,
            queryVector: bsonQueryVector,
            numCandidates: NUM_CANDIDATES,
            limit: LIMIT,
          }
        },
        {
          $project: {
            _id: 0,
            name: 1,
            summary: 1, // Adjust projection fields as necessary to match your document structure
            score: { $meta: 'vectorSearchScore' }
          }
        }
      ];

      results[path] = await collection.aggregate(pipeline).toArray();
    }

    return results;
  } catch (error) {
    console.error('Error during vector search:', error);
  } finally {
    await client.close();
  }
}

// Main execution block
(async () => {
  try {
    const results = await main();

    if (results) {
      console.log("Results from Float32 embeddings:");
      (results.float32 || []).forEach((result, index) => {
        console.log(`Result ${index + 1}:`, result);
      });

      console.log("Results from Int8 embeddings:");
      (results.int8 || []).forEach((result, index) => {
        console.log(`Result ${index + 1}:`, result);
      });

      console.log("Results from Packed Binary (PackedBits) embeddings:");
      (results.int1 || []).forEach((result, index) => {
        console.log(`Result ${index + 1}:`, result);
      });
    }
  } catch (error) {
    console.error('Error executing main function:', error);
  }
})();

Replace the following settings and save the run-query.js file.

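`<CONNECTION-STRING>`	Connection string for the Atlas cluster where you want to create the database and collection. Replace this value only if you didn't set the `MONGODB_URI` environment variable.
`<INDEX-NAME>`	Name of the index for the collection.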

Run the query.

To run the query, run the following command:

node run-query.js

Connected to MongoDB
Results from Float32 embeddings:
Result 1: {
name: 'Makaha Valley Paradise with OceanView',
summary: "A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.",
score: 0.7278661131858826
}
Result 2: {
name: 'Ocean View Waikiki Marina w/prkg',
summary: "A short distance from Honolulu's billion dollar mall, and the same distance to Waikiki. Parking included. A great location that work perfectly for business, education, or simple visit. Experience Yacht Harbor views and 5 Star Hilton Hawaiian Village.",
score: 0.688639760017395
}
Result 3: {
name: 'A Casa Alegre é um apartamento T1.',
summary: 'Para 2 pessoas. Vista de mar a 150 mts. Prédio com 2 elevadores. Tem: - quarto com roupeiro e cama de casal (colchão magnetizado); - cozinha: placa de discos, exaustor, frigorifico, micro-ondas e torradeira; casa de banho completa; - sala e varanda.',
score: 0.6831139326095581
}
Result 4: {
name: 'Your spot in Copacabana',
summary: 'Having a large airy living room. The apartment is well divided. Fully furnished and cozy. The building has a 24h doorman and camera services in the corridors. It is very well located, close to the beach, restaurants, pubs and several shops and supermarkets. And it offers a good mobility being close to the subway.',
score: 0.6802051663398743
}
Result 5: {
name: 'LAHAINA, MAUI! RESORT/CONDO BEACHFRONT!! SLEEPS 4!',
summary: 'THIS IS A VERY SPACIOUS 1 BEDROOM FULL CONDO (SLEEPS 4) AT THE BEAUTIFUL VALLEY ISLE RESORT ON THE BEACH IN LAHAINA, MAUI!! YOU WILL LOVE THE PERFECT LOCATION OF THIS VERY NICE HIGH RISE! ALSO THIS SPACIOUS FULL CONDO, FULL KITCHEN, BIG BALCONY!!',
score: 0.6779564619064331
}
Results from Int8 embeddings:
Result 1: {
name: 'Makaha Valley Paradise with OceanView',
summary: "A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.",
score: 0.5215557217597961
}
Result 2: {
name: 'Ocean View Waikiki Marina w/prkg',
summary: "A short distance from Honolulu's billion dollar mall, and the same distance to Waikiki. Parking included. A great location that work perfectly for business, education, or simple visit. Experience Yacht Harbor views and 5 Star Hilton Hawaiian Village.",
score: 0.5179016590118408
}
Result 3: {
name: 'A Casa Alegre é um apartamento T1.',
summary: 'Para 2 pessoas. Vista de mar a 150 mts. Prédio com 2 elevadores. Tem: - quarto com roupeiro e cama de casal (colchão magnetizado); - cozinha: placa de discos, exaustor, frigorifico, micro-ondas e torradeira; casa de banho completa; - sala e varanda.',
score: 0.5173280239105225
}
Result 4: {
name: 'Your spot in Copacabana',
summary: 'Having a large airy living room. The apartment is well divided. Fully furnished and cozy. The building has a 24h doorman and camera services in the corridors. It is very well located, close to the beach, restaurants, pubs and several shops and supermarkets. And it offers a good mobility being close to the subway.',
score: 0.5170232057571411
}
Result 5: {
name: 'LAHAINA, MAUI! RESORT/CONDO BEACHFRONT!! SLEEPS 4!',
summary: 'THIS IS A VERY SPACIOUS 1 BEDROOM FULL CONDO (SLEEPS 4) AT THE BEAUTIFUL VALLEY ISLE RESORT ON THE BEACH IN LAHAINA, MAUI!! YOU WILL LOVE THE PERFECT LOCATION OF THIS VERY NICE HIGH RISE! ALSO THIS SPACIOUS FULL CONDO, FULL KITCHEN, BIG BALCONY!!',
score: 0.5168724060058594
}
Results from Packed Binary (PackedBits) embeddings:
Result 1: {
name: 'Makaha Valley Paradise with OceanView',
summary: "A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.",
score: 0.6591796875
}
Result 2: {
name: 'Ocean View Waikiki Marina w/prkg',
summary: "A short distance from Honolulu's billion dollar mall, and the same distance to Waikiki. Parking included. A great location that work perfectly for business, education, or simple visit. Experience Yacht Harbor views and 5 Star Hilton Hawaiian Village.",
score: 0.6337890625
}
Result 3: {
name: 'A Casa Alegre é um apartamento T1.',
summary: 'Para 2 pessoas. Vista de mar a 150 mts. Prédio com 2 elevadores. Tem: - quarto com roupeiro e cama de casal (colchão magnetizado); - cozinha: placa de discos, exaustor, frigorifico, micro-ondas e torradeira; casa de banho completa; - sala e varanda.',
score: 0.62890625
}
Result 4: {
name: 'LAHAINA, MAUI! RESORT/CONDO BEACHFRONT!! SLEEPS 4!',
summary: 'THIS IS A VERY SPACIOUS 1 BEDROOM FULL CONDO (SLEEPS 4) AT THE BEAUTIFUL VALLEY ISLE RESORT ON THE BEACH IN LAHAINA, MAUI!! YOU WILL LOVE THE PERFECT LOCATION OF THIS VERY NICE HIGH RISE! ALSO THIS SPACIOUS FULL CONDO, FULL KITCHEN, BIG BALCONY!!',
score: 0.6279296875
}
Result 5: {
name: 'Be Happy in Porto',
summary: 'Be Happy Apartment is an amazing space. Renovated and comfortable apartment, located in a building dating from the nineteenth century in one of the most emblematic streets of the Porto city "Rua do Almada".  Be Happy Apartment is located in the city center, able you to visit the historic center only by foot, being very close of majority points of interesting of the Porto City. Be Happy Apartment is located close of central Station MetroTrindade.',
score: 0.619140625
}

Because the generated embeddings can vary depending on your environment, your results might differ.
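One way to compare the quantized results with the full-fidelity ones is the overlap of their top-k lists. A minimal sketch using the listing names from the sample output above (your own results may differ); the helper `topKOverlap` is hypothetical:

```javascript
// Hypothetical helper: fraction of the reference top-k results that also
// appear in another result list (order is ignored).
function topKOverlap(reference, candidate) {
  const candidateSet = new Set(candidate);
  const shared = reference.filter((name) => candidateSet.has(name)).length;
  return shared / reference.length;
}

// Listing names from the sample float32 and int1 (packed binary) results
const float32Top5 = [
  'Makaha Valley Paradise with OceanView',
  'Ocean View Waikiki Marina w/prkg',
  'A Casa Alegre é um apartamento T1.',
  'Your spot in Copacabana',
  'LAHAINA, MAUI! RESORT/CONDO BEACHFRONT!! SLEEPS 4!',
];
const int1Top5 = [
  'Makaha Valley Paradise with OceanView',
  'Ocean View Waikiki Marina w/prkg',
  'A Casa Alegre é um apartamento T1.',
  'LAHAINA, MAUI! RESORT/CONDO BEACHFRONT!! SLEEPS 4!',
  'Be Happy in Porto',
];
console.log(topKOverlap(float32Top5, int1Top5)); // 0.8
```

In the sample output, the int8 list matches the float32 list exactly (overlap 1.0), while the int1 list replaces one listing. The raw scores sit on visibly different scales for each embedding type, so comparing rankings is more meaningful than comparing scores directly.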

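Create an interactive Python notebook by saving a file with the .ipynb extension, and then perform the following steps in the notebook. To try the example, replace the placeholders with valid values.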

Work with a runnable version of this tutorial as a Python notebook.

Install the required libraries.

Run the following command to install the PyMongo driver. If necessary, you can also install libraries from your embedding model provider. This operation might take a few minutes to complete.

pip install pymongo

You must install PyMongo v4.10 or later.

Example

Install PyMongo and Cohere

pip install --quiet --upgrade pymongo cohere

Load the data for which you want to generate BSON vectors into your notebook.

Example

Sample data to import

data = [
   "The Great Wall of China is visible from space.",
   "The Eiffel Tower was completed in Paris in 1889.",
   "Mount Everest is the highest peak on Earth at 8,848m.",
   "Shakespeare wrote 37 plays and 154 sonnets during his lifetime.",
   "The Mona Lisa was painted by Leonardo da Vinci.",
]

(Conditional) Generate embeddings from your data.

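This step is required if you haven't yet generated embeddings from your data. If you've already generated embeddings, skip this step. To learn more about generating embeddings from your data, see How to Create Vector Embeddings.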

Example

Generate embeddings from the sample data by using Cohere

Placeholder	Valid Value
`<COHERE-API-KEY>`	API key for Cohere.

import os
import cohere
# Specify your Cohere API key
os.environ["COHERE_API_KEY"] = "<COHERE-API-KEY>"
cohere_client = cohere.Client(os.environ["COHERE_API_KEY"])
# Generate embeddings using the embed-english-v3.0 model
generated_embeddings = cohere_client.embed(
   texts=data,
   model="embed-english-v3.0",
   input_type="search_document",
   embedding_types=["float", "int8", "ubinary"]
).embeddings
float32_embeddings = generated_embeddings.float_  # the Python SDK exposes the "float" field as float_
int8_embeddings = generated_embeddings.int8
int1_embeddings = generated_embeddings.ubinary

Generate BSON vectors from your embeddings.

You can use the PyMongo driver to convert your native vector embeddings to BSON vectors.

Example

Define and run a function to generate BSON vectors

from bson.binary import Binary, BinaryVectorDtype
def generate_bson_vector(vector, vector_dtype):
   return Binary.from_vector(vector, vector_dtype)
# For all vectors in your collection, generate BSON vectors of float32, int8, and int1 embeddings
bson_float32_embeddings = []
bson_int8_embeddings = []
bson_int1_embeddings = []
for i, (f32_emb, int8_emb, int1_emb) in enumerate(zip(float32_embeddings, int8_embeddings, int1_embeddings)):
   bson_float32_embeddings.append(generate_bson_vector(f32_emb, BinaryVectorDtype.FLOAT32))
   bson_int8_embeddings.append(generate_bson_vector(int8_emb, BinaryVectorDtype.INT8))
   bson_int1_embeddings.append(generate_bson_vector(int1_emb, BinaryVectorDtype.PACKED_BIT))

Create documents with the BSON vector embeddings.

If the documents in your collection already contain BSON vector embeddings, skip this step.

Example

Create documents from the sample data

Placeholder	Valid Value
`<FIELD-NAME-FOR-FLOAT32-TYPE>`	Name of the field with `float32` values.
`<FIELD-NAME-FOR-INT8-TYPE>`	Name of the field with `int8` values.
`<FIELD-NAME-FOR-INT1-TYPE>`	Name of the field with `int1` values.

# Specify the field names for the float32, int8, and int1 embeddings
float32_field = "<FIELD-NAME-FOR-FLOAT32-TYPE>"
int8_field = "<FIELD-NAME-FOR-INT8-TYPE>"
int1_field = "<FIELD-NAME-FOR-INT1-TYPE>"
# Define function to create documents with BSON vector embeddings
def create_docs_with_bson_vector_embeddings(bson_float32_embeddings, bson_int8_embeddings, bson_int1_embeddings, data):
  docs = []
  for i, (bson_f32_emb, bson_int8_emb, bson_int1_emb, text) in enumerate(zip(bson_float32_embeddings, bson_int8_embeddings, bson_int1_embeddings, data)):
     doc = {
          "_id": i,
          "data": text,
          float32_field: bson_f32_emb,
          int8_field: bson_int8_emb,
          int1_field: bson_int1_emb
     }
     docs.append(doc)
  return docs
# Create the documents
documents = create_docs_with_bson_vector_embeddings(bson_float32_embeddings, bson_int8_embeddings, bson_int1_embeddings, data)

Load the data into your Atlas cluster.

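You can load your data from the Atlas UI or programmatically. To learn how to load your data from the Atlas UI, see Insert Data. The following steps and associated examples demonstrate how to load your data programmatically by using the PyMongo driver.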

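Connect to your Atlas cluster.

Placeholder	Valid Value
`<ATLAS-CONNECTION-STRING>`	Atlas connection string. To learn more, see Connect via Drivers.

Example

import os
import pymongo
# Read the connection string from the MONGO_URI environment variable, if it is set
MONGO_URI = os.environ.get("MONGO_URI", "<ATLAS-CONNECTION-STRING>")
if MONGO_URI == "<ATLAS-CONNECTION-STRING>":
    print("MONGO_URI not set in environment variables")
mongo_client = pymongo.MongoClient(MONGO_URI)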

Load the data into your Atlas cluster.

Placeholder	Valid Value
`<DB-NAME>`	Name of the database.
`<COLLECTION-NAME>`	Name of the collection in the specified database.

Example

# Insert documents into a new database and collection
db = mongo_client["<DB-NAME>"]
collection_name = "<COLLECTION-NAME>"
db.create_collection(collection_name)
collection = db[collection_name]
collection.insert_many(documents)

Create an Atlas Vector Search index on the collection.

You can create Atlas Vector Search indexes by using the Atlas UI, the Atlas CLI, the Atlas Administration API, and MongoDB drivers. To learn more, see How to Index Fields for Vector Search.

Example

Create an index for the sample collection

Placeholder	Valid Value
`<INDEX-NAME>`	Name of the `vector` type index.

from pymongo.operations import SearchIndexModel
import time
# Define and create the vector search index
index_name = "<INDEX-NAME>"
search_index_model = SearchIndexModel(
  definition={
    "fields": [
      {
        "type": "vector",
        "path": float32_field,
        "similarity": "dotProduct",
        "numDimensions": 1024
      },
      {
        "type": "vector",
        "path": int8_field,
        "similarity": "dotProduct",
        "numDimensions": 1024
      },
      {
        "type": "vector",
        "path": int1_field,
        "similarity": "euclidean",
        "numDimensions": 1024
      }
    ]
  },
  name=index_name,
  type="vectorSearch"
)
result = collection.create_search_index(model=search_index_model)
print("New search index named " + result + " is building.")
# Wait for initial sync to complete
print("Polling to check if the index is ready. This may take up to a minute.")
predicate=None
if predicate is None:
  predicate = lambda index: index.get("queryable") is True
while True:
  indices = list(collection.list_search_indexes(index_name))
  if len(indices) and predicate(indices[0]):
    break
  time.sleep(5)
print(result + " is ready for querying.")

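Define a function to run Atlas Vector Search queries.

The function that runs Atlas Vector Search queries must perform the following actions:

Generate embeddings for the query text.
Convert the query embeddings to BSON vectors.
Define the pipeline for the Atlas Vector Search query.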
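The pipeline in the following example takes `numCandidates` and `limit` as placeholders. Atlas requires `numCandidates` to be greater than or equal to `limit` and no more than 10000, so checking both before building the stage catches mistakes before the query reaches the server. The sketch below is one way to do that; `build_vector_search_pipeline` is a hypothetical helper, and a plain Python list stands in for the BSON query vector.

```python
def build_vector_search_pipeline(index_name, path, query_vector, num_candidates, limit, projection=None):
    """Return a $vectorSearch aggregation pipeline after basic parameter checks."""
    # Atlas rejects numCandidates values below limit or above 10000.
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if not limit <= num_candidates <= 10000:
        raise ValueError("numCandidates must be between limit and 10000")
    pipeline = [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        }
    ]
    # Default projection mirrors the tutorial: hide _id, keep the text and the score.
    if projection is None:
        projection = {"_id": 0, "data": 1, "score": {"$meta": "vectorSearchScore"}}
    pipeline.append({"$project": projection})
    return pipeline


# Hypothetical index and field names; a real query would pass a BSON vector.
pipeline = build_vector_search_pipeline("vector_index", "float32_field", [0.1, 0.2], num_candidates=5, limit=2)
print(pipeline[0]["$vectorSearch"]["numCandidates"])  # 5
```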

Example

Placeholder	Valid Value
`<NUMBER-OF-CANDIDATES-TO-CONSIDER>`	Number of nearest neighbors to use during the search.
`<NUMBER-OF-DOCUMENTS-TO-RETURN>`	Number of documents to return in the results.

# Define a function to run a vector search query
def run_vector_search(query_text, collection, path):
  query_text_embeddings = cohere_client.embed(
    texts=[query_text],
    model="embed-english-v3.0",
    input_type="search_query",
    embedding_types=["float", "int8", "ubinary"]
  ).embeddings
  if path == float32_field:
    query_vector = query_text_embeddings.float[0]
    vector_dtype = BinaryVectorDtype.FLOAT32
  elif path == int8_field:
    query_vector = query_text_embeddings.int8[0]
    vector_dtype = BinaryVectorDtype.INT8
  elif path == int1_field:
    query_vector = query_text_embeddings.ubinary[0]
    vector_dtype = BinaryVectorDtype.PACKED_BIT
  bson_query_vector = generate_bson_vector(query_vector, vector_dtype)
  pipeline = [
    {
      '$vectorSearch': {
        'index': index_name,
        'path': path,
        'queryVector': bson_query_vector,
        'numCandidates': <NUMBER-OF-CANDIDATES-TO-CONSIDER>, # for example, 5
        'limit': <NUMBER-OF-DOCUMENTS-TO-RETURN> # for example, 2
       }
     },
     {
       '$project': {
         '_id': 0,
         'data': 1,
         'score': { '$meta': 'vectorSearchScore' }
        }
     }
  ]
  return collection.aggregate(pipeline)

Run the Atlas Vector Search query.

You can run Atlas Vector Search queries programmatically. To learn more, see Run Vector Search Queries.

Example

from pprint import pprint
# Run the vector search query on the float32, int8, and int1 embeddings
query_text = "tell me a science fact"
float32_results = run_vector_search(query_text, collection, float32_field)
int8_results = run_vector_search(query_text, collection, int8_field)
int1_results = run_vector_search(query_text, collection, int1_field)
print("results from float32 embeddings")
pprint(list(float32_results))
print("--------------------------------------------------------------------------")
print("results from int8 embeddings")
pprint(list(int8_results))
print("--------------------------------------------------------------------------")
print("results from int1 embeddings")
pprint(list(int1_results))

results from float32 embeddings
[{'data': 'Mount Everest is the highest peak on Earth at 8,848m.',
  'score': 0.6578356027603149},
 {'data': 'The Great Wall of China is visible from space.',
  'score': 0.6420407891273499}]
--------------------------------------------------------------------------
results from int8 embeddings
[{'data': 'Mount Everest is the highest peak on Earth at 8,848m.',
  'score': 0.5149182081222534},
 {'data': 'The Great Wall of China is visible from space.',
  'score': 0.5136760473251343}]
--------------------------------------------------------------------------
results from int1 embeddings
[{'data': 'Mount Everest is the highest peak on Earth at 8,848m.',
  'score': 0.62109375},
 {'data': 'The Great Wall of China is visible from space.',
  'score': 0.61328125}]

Work with a runnable version of this tutorial as a Python notebook.

Install the required libraries.

pip install pymongo

You must install PyMongo driver version 4.10 or later.
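If you are not sure which PyMongo version is installed, a quick check like the following sketch can catch an older driver before the tutorial reaches code that needs 4.10 or later. `parse_release` and `pymongo_meets_minimum` are hypothetical helpers; the parser only handles numeric release strings such as `4.10.1` and ignores trailing text such as `rc1` within a part.

```python
from importlib.metadata import PackageNotFoundError, version

MINIMUM_PYMONGO = (4, 10)


def parse_release(version_string):
    """Convert a version string such as '4.10.1' into a tuple of ints (4, 10, 1).

    Stops at the first part with no leading digits; trailing text in a part
    (for example the 'rc1' in '0rc1') is ignored.
    """
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def pymongo_meets_minimum(minimum=MINIMUM_PYMONGO):
    """Return True if the installed PyMongo is at least `minimum`, else False."""
    try:
        installed = version("pymongo")
    except PackageNotFoundError:
        return False
    return parse_release(installed)[: len(minimum)] >= minimum


print(pymongo_meets_minimum())
```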

Example

Install PyMongo and Cohere

pip install --quiet --upgrade pymongo cohere

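Define the functions to generate vector embeddings and convert the embeddings to BSON-compatible format.

You must define functions that use an embedding model to do the following:

Generate embeddings from your existing data, if your existing data doesn't have any embeddings.
Convert the embeddings to BSON vectors.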
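The `ubinary` variant below passes Cohere's `ubinary` output straight to `BinaryVectorDtype.PACKED_BIT`. As we understand Cohere's format, those embeddings are already packed eight dimensions per unsigned byte, so a 1024-dimension embedding (the `numDimensions` used for `int1` fields on this page) arrives as 128 integers from 0 to 255. If your bits come from somewhere else, the sketch below shows one way to pack and unpack them. It assumes most-significant-bit-first order, which is also NumPy's `packbits` default; confirm the order against the BSON binary vector specification before relying on it. `pack_bits` and `unpack_bits` are hypothetical helpers.

```python
def pack_bits(bits):
    """Pack 0/1 values into bytes, most significant bit first.

    Returns (packed_bytes, padding), where padding is the number of unused
    low-order bits in the final byte.
    """
    if any(bit not in (0, 1) for bit in bits):
        raise ValueError("bits must contain only 0 and 1")
    packed = bytearray()
    for start in range(0, len(bits), 8):
        chunk = list(bits[start:start + 8])
        chunk += [0] * (8 - len(chunk))  # zero-fill the last byte if needed
        value = 0
        for bit in chunk:
            value = (value << 1) | bit
        packed.append(value)
    padding = (-len(bits)) % 8
    return bytes(packed), padding


def unpack_bits(packed, padding=0):
    """Inverse of pack_bits: expand bytes back into 0/1 values."""
    bits = [(byte >> shift) & 1 for byte in packed for shift in range(7, -1, -1)]
    return bits[:len(bits) - padding]


bits = [1, 0, 1, 1, 0, 0, 1, 0] * 128  # 1024 hypothetical bits
packed, padding = pack_bits(bits)
print(len(packed), padding, packed[0])  # 128 0 178
```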

Example

Functions to generate and convert embeddings

Placeholder	Valid Value
`<COHERE-API-KEY>`	API key for Cohere.

import os
import pymongo
import cohere
from bson.binary import Binary, BinaryVectorDtype
# Specify your Cohere API key
os.environ["COHERE_API_KEY"] = "<COHERE-API-KEY>"
cohere_client = cohere.Client(os.environ["COHERE_API_KEY"])
# Define function to generate embeddings using the embed-english-v3.0 model
def get_embedding(text):
    response = cohere_client.embed(
      texts=[text],
      model='embed-english-v3.0',
      input_type='search_document',
      embedding_types=["float"]
    )
    embedding = response.embeddings.float[0]
    return embedding
# Define function to convert embeddings to BSON-compatible format
def generate_bson_vector(vector, vector_dtype):
    return Binary.from_vector(vector, vector_dtype)

import os
import pymongo
import cohere
from bson.binary import Binary, BinaryVectorDtype
# Specify your Cohere API key
os.environ["COHERE_API_KEY"] = "<COHERE-API-KEY>"
cohere_client = cohere.Client(os.environ["COHERE_API_KEY"])
# Define function to generate embeddings using the embed-english-v3.0 model
def get_embedding(text):
    response = cohere_client.embed(
      texts=[text],
      model='embed-english-v3.0',
      input_type='search_document',
      embedding_types=["int8"]
    )
    embedding = response.embeddings.int8[0]
    return embedding
# Define function to convert embeddings to BSON-compatible format
def generate_bson_vector(vector, vector_dtype):
    return Binary.from_vector(vector, vector_dtype)

import os
import pymongo
import cohere
from bson.binary import Binary, BinaryVectorDtype
# Specify your Cohere API key
os.environ["COHERE_API_KEY"] = "<COHERE-API-KEY>"
cohere_client = cohere.Client(os.environ["COHERE_API_KEY"])
# Define function to generate embeddings using the embed-english-v3.0 model
def get_embedding(text):
    response = cohere_client.embed(
      texts=[text],
      model='embed-english-v3.0',
      input_type='search_document',
      embedding_types=["ubinary"]
    )
    embedding = response.embeddings.ubinary[0]
    return embedding
# Define function to convert embeddings to BSON-compatible format
def generate_bson_vector(vector, vector_dtype):
    return Binary.from_vector(vector, vector_dtype)

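Connect to your Atlas cluster and retrieve the existing data.

You must provide the following:

Connection string to connect to the Atlas cluster that contains the database and collection for which you want to generate embeddings.
Name of the database that contains the collection for which you want to generate embeddings.
Name of the collection for which you want to generate embeddings.

Example

Connect to the Atlas cluster to access the data

Placeholder	Valid Value
`<ATLAS-CONNECTION-STRING>`	Atlas connection string. To learn more, see Connect via Drivers.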

# Connect to your Atlas cluster
mongo_client = pymongo.MongoClient("<ATLAS-CONNECTION-STRING>")
db = mongo_client["sample_airbnb"]
collection = db["listingsAndReviews"]

# Filter to exclude null or empty summary fields
filter = { "summary": {"$nin": [None, ""]} }

# Get a subset of documents in the collection
documents = collection.find(filter).limit(50)

# Initialize the count of updated documents
updated_doc_count = 0

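Generate, convert, and load embeddings into your collection.

Generate embeddings from your data by using an embedding model, if your data doesn't already have embeddings. To learn more about generating embeddings from your data, see How to Create Vector Embeddings.
Convert the embeddings to BSON vectors, as shown by the `generate_bson_vector` call in the following example.
Upload the embeddings to the collection in your Atlas cluster.

This operation might take a few minutes to complete.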
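Part of the wait comes from the loops in the following example making one embedding request and one `update_one` call per document. The Cohere `embed` calls on this page already accept a list through `texts=[...]`, so one option is to embed documents in batches. The sketch below shows the batching logic only; `batched` and `embed_in_batches` are hypothetical helpers, and the default batch size of 96 is an assumption that you should check against your embedding provider's per-request limit.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def embed_in_batches(texts, embed_batch, size=96):
    """Call embed_batch once per batch and return all embeddings in input order.

    embed_batch is any function that takes a list of strings and returns one
    embedding per string (hypothetical; wrap your provider's client here).
    """
    embeddings = []
    for batch in batched(texts, size):
        result = embed_batch(batch)
        if len(result) != len(batch):
            raise RuntimeError("embedding count does not match batch size")
        embeddings.extend(result)
    return embeddings


# Toy stand-in for an embedding call: "embeds" each text as its length.
print(embed_in_batches(["a", "bb", "ccc"], lambda batch: [[len(t)] for t in batch], size=2))  # [[1], [2], [3]]
```

In the tutorial, `embed_batch` could wrap `cohere_client.embed(texts=batch, ...)`, and the resulting BSON vectors could be written with one `collection.bulk_write` call containing `UpdateOne` operations instead of one `update_one` call per document.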

Example

Generate, convert, and load embeddings into the collection

for doc in documents:
    # Generate embeddings based on the summary
    summary = doc["summary"]
    embedding = get_embedding(summary)  # Get float32 embedding
    # Convert the float32 embedding to BSON format
    bson_float32 = generate_bson_vector(embedding, BinaryVectorDtype.FLOAT32)
    # Update the document with the BSON embedding
    collection.update_one(
        {"_id": doc["_id"]},
        {"$set": {"embedding": bson_float32}}
    )
    updated_doc_count += 1
print(f"Updated {updated_doc_count} documents with BSON embeddings.")

for doc in documents:
    # Generate embeddings based on the summary
    summary = doc["summary"]
    embedding = get_embedding(summary)  # Get int8 embedding
    # Convert the int8 embedding to BSON format
    bson_int8 = generate_bson_vector(embedding, BinaryVectorDtype.INT8)
    # Update the document with the BSON embedding
    collection.update_one(
        {"_id": doc["_id"]},
        {"$set": {"embedding": bson_int8}}
    )
    updated_doc_count += 1
print(f"Updated {updated_doc_count} documents with BSON embeddings.")

for doc in documents:
    # Generate embeddings based on the summary
    summary = doc["summary"]
    embedding = get_embedding(summary)  # Get int1 embedding
    # Convert the int1 embedding to BSON format
    bson_int1 = generate_bson_vector(embedding, BinaryVectorDtype.PACKED_BIT)
    # Update the document with the BSON embedding
    collection.update_one(
        {"_id": doc["_id"]},
        {"$set": {"embedding": bson_int1}}
    )
    updated_doc_count += 1
print(f"Updated {updated_doc_count} documents with BSON embeddings.")

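Create the Atlas Vector Search index on the collection.

You can create Atlas Vector Search indexes by using the Atlas UI, Atlas CLI, Atlas Administration API, and MongoDB drivers in your preferred language. To learn more, see How to Index Fields for Vector Search.

Example

Create the index for the collection

Placeholder	Valid Value
`<INDEX-NAME>`	Name of the `vector` type index.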

from pymongo.operations import SearchIndexModel
import time

# Define and create the vector search index
index_name = "<INDEX-NAME>"
search_index_model = SearchIndexModel(
  definition={
    "fields": [
      {
        "type": "vector",
        "path": "embedding",
        "similarity": "euclidean",
        "numDimensions": 1024
      }
    ]
  },
  name=index_name,
  type="vectorSearch"
)
result = collection.create_search_index(model=search_index_model)
print("New search index named " + result + " is building.")

# Wait for initial sync to complete
print("Polling to check if the index is ready. This may take up to a minute.")
predicate=None
if predicate is None:
  predicate = lambda index: index.get("queryable") is True
while True:
  indices = list(collection.list_search_indexes(index_name))
  if len(indices) and predicate(indices[0]):
    break
  time.sleep(5)
print(result + " is ready for querying.")
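The polling loop above checks every five seconds with no upper bound, so it never returns if the index does not become queryable. The Java example on this page stops after 60 seconds instead. The sketch below bounds the wait the same way; `wait_until` is a hypothetical helper, not a PyMongo API.

```python
import time


def wait_until(predicate, timeout_seconds=60.0, interval_seconds=5.0):
    """Call predicate() until it returns True or timeout_seconds elapse.

    Returns True if the predicate succeeded and False if the wait timed out.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval_seconds, remaining))
```

In the tutorial, the predicate could be `lambda: any(ix.get("queryable") is True for ix in collection.list_search_indexes(index_name))`, and a `False` result is the point at which to inspect the index status in Atlas rather than keep waiting.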

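Define a function to run Atlas Vector Search queries.

The function that runs Atlas Vector Search queries must perform the following actions:

Generate embeddings for the query text.
Convert the query embeddings to BSON vectors.
Define the pipeline for the Atlas Vector Search query.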
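In each variant of the function that follows, the query embedding is converted with the same `BinaryVectorDtype` as the embeddings stored in the queried field. To show how much smaller each type is, the sketch below estimates the payload size of a BSON binary vector per type. It assumes the layout in the BSON binary vector specification (one dtype byte and one padding byte, followed by the values); `bson_vector_size` is a hypothetical helper. These figures describe per-document storage, not index memory, which shrinks less because, as noted at the top of this page, the HNSW graph itself is not compressed.

```python
import math

# Bits needed to store one dimension for each vector type.
BITS_PER_DIMENSION = {"float32": 32, "int8": 8, "packed_bit": 1}
HEADER_BYTES = 2  # assumed: 1 dtype byte + 1 padding byte


def bson_vector_size(num_dimensions, dtype):
    """Estimate the bytes of a BSON binary vector payload for the given type."""
    if num_dimensions < 1:
        raise ValueError("num_dimensions must be positive")
    data_bytes = math.ceil(num_dimensions * BITS_PER_DIMENSION[dtype] / 8)
    return HEADER_BYTES + data_bytes


for dtype in BITS_PER_DIMENSION:
    print(dtype, bson_vector_size(1024, dtype))
# float32 4098
# int8 1026
# packed_bit 130
```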

Example

Function to run an Atlas Vector Search query

Placeholder	Valid Value
`<NUMBER-OF-CANDIDATES-TO-CONSIDER>`	Number of nearest neighbors to use during the search.
`<NUMBER-OF-DOCUMENTS-TO-RETURN>`	Number of documents to return in the results.

def run_vector_search(query_text, collection, path):
  query_embedding = get_embedding(query_text)
  bson_query_vector = generate_bson_vector(query_embedding, BinaryVectorDtype.FLOAT32)
  pipeline = [
    {
      '$vectorSearch': {
        'index': index_name,
        'path': path,
        'queryVector': bson_query_vector,
        'numCandidates': <NUMBER-OF-CANDIDATES-TO-CONSIDER>, # for example, 20
        'limit': <NUMBER-OF-DOCUMENTS-TO-RETURN> # for example, 5
       }
     },
     {
       '$project': {
         '_id': 0,
         'name': 1,
         'summary': 1,
         'score': { '$meta': 'vectorSearchScore' }
        }
     }
  ]
  return collection.aggregate(pipeline)

def run_vector_search(query_text, collection, path):
  query_embedding = get_embedding(query_text)
  bson_query_vector = generate_bson_vector(query_embedding, BinaryVectorDtype.INT8)
  pipeline = [
    {
      '$vectorSearch': {
        'index': index_name,
        'path': path,
        'queryVector': bson_query_vector,
        'numCandidates': <NUMBER-OF-CANDIDATES-TO-CONSIDER>, # for example, 20
        'limit': <NUMBER-OF-DOCUMENTS-TO-RETURN> # for example, 5
       }
     },
     {
       '$project': {
         '_id': 0,
         'name': 1,
         'summary': 1,
         'score': { '$meta': 'vectorSearchScore' }
        }
     }
  ]
  return collection.aggregate(pipeline)

def run_vector_search(query_text, collection, path):
  query_embedding = get_embedding(query_text)
  bson_query_vector = generate_bson_vector(query_embedding, BinaryVectorDtype.PACKED_BIT)
  pipeline = [
    {
      '$vectorSearch': {
        'index': index_name,
        'path': path,
        'queryVector': bson_query_vector,
        'numCandidates': <NUMBER-OF-CANDIDATES-TO-CONSIDER>, # for example, 20
        'limit': <NUMBER-OF-DOCUMENTS-TO-RETURN> # for example, 5
       }
     },
     {
       '$project': {
         '_id': 0,
         'name': 1,
         'summary': 1,
         'score': { '$meta': 'vectorSearchScore' }
        }
     }
  ]
  return collection.aggregate(pipeline)

Run the Atlas Vector Search query.

You can run Atlas Vector Search queries programmatically. To learn more, see Run Vector Search Queries.

Example

Run a sample Atlas Vector Search query

from pprint import pprint
query_text = "ocean view"
query_results = run_vector_search(query_text, collection, "embedding")
print("query results:")
pprint(list(query_results))

query results:
[{'name': 'Your spot in Copacabana',
  'score': 0.5468248128890991,
  'summary': 'Having a large airy living room. The apartment is well divided. '
             'Fully furnished and cozy. The building has a 24h doorman and '
             'camera services in the corridors. It is very well located, close '
             'to the beach, restaurants, pubs and several shops and '
             'supermarkets. And it offers a good mobility being close to the '
             'subway.'},
 {'name': 'Twin Bed room+MTR Mongkok shopping&My',
  'score': 0.527062714099884,
  'summary': 'Dining shopping conveniently located Mongkok subway E1, airport '
             'shuttle bus stops A21. Three live two beds, separate WC, 24-hour '
             'hot water. Free WIFI.'},
 {'name': 'Quarto inteiro na Tijuca',
  'score': 0.5222363471984863,
  'summary': 'O quarto disponível tem uma cama de solteiro, sofá e computador '
             'tipo desktop para acomodação.'},
 {'name': 'Makaha Valley Paradise with OceanView',
  'score': 0.5175154805183411,
  'summary': 'A beautiful and comfortable 1 Bedroom Air Conditioned Condo in '
             'Makaha Valley - stunning Ocean & Mountain views All the '
             'amenities of home, suited for longer stays. Full kitchen & large '
             "bathroom.  Several gas BBQ's for all guests to use & a large "
             'heated pool surrounded by reclining chairs to sunbathe.  The '
             'Ocean you see in the pictures is not even a mile away, known as '
             'the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  '
             'paddle boarding, surfing are all just minutes from the front '
             'door.'},
 {'name': 'Cozy double bed room 東涌鄉村雅緻雙人房',
  'score': 0.5149975419044495,
  'summary': 'A comfortable double bed room at G/F. Independent entrance. High '
             'privacy. The room size is around 100 sq.ft. with a 48"x72" '
             'double bed. The village house is close to the Hong Kong Airport, '
             'AsiaWorld-Expo, HongKong-Zhuhai-Macau Bridge, Disneyland, '
             'Citygate outlets, 360 Cable car, shopping centre, main tourist '
             'attractions......'}]

Your results might vary depending on the vector data type that you specified in the previous steps.
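One reason the results differ is that each type keeps a different amount of information. The sketch below applies the binary quantization rule described at the top of this page (a value greater than the midpoint 0 becomes 1, otherwise 0) to small made-up vectors, and compares a full-precision dot product with the Hamming distance between the resulting bits. The vectors are hypothetical toy data, not Cohere embeddings; Cohere produces its pre-quantized embeddings itself and may use a different rule, and this does not reproduce how Atlas scores `int1` vectors.

```python
def binarize(vector, midpoint=0.0):
    """Binary quantization: 1 if a value is greater than the midpoint, else 0."""
    return [1 if value > midpoint else 0 for value in vector]


def dot(a, b):
    """Full-precision dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def hamming(a, b):
    """Count the positions where two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("bit lists must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 4-dimensional vectors, for illustration only.
query = [0.9, -0.2, 0.4, -0.7]
doc_a = [0.8, -0.1, 0.3, -0.6]
doc_b = [0.1, 0.05, -0.02, -0.9]

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, round(dot(query, doc), 3), hamming(binarize(query), binarize(doc)))
```

Here `doc_a` matches the query's bit pattern exactly, while `doc_b` differs in two of four bits even though its full-precision dot product (about 0.70) is still well above zero. The bits keep only the sign of each dimension, so small values near the midpoint can flip the comparison.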

For an advanced demonstration of this procedure on sample data that uses the Cohere `embed-english-v3.0` embedding model, see this notebook.

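Evaluate Your Query Results

You can measure the accuracy of your Atlas Vector Search query by evaluating how closely the results of an ANN search match the results of an ENN search against your quantized vectors. That is, you can compare the results of an ANN search with the results of an ENN search for the same query criteria, and measure how frequently the ANN search results include the nearest neighbors from the ENN search results.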
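The sketch below shows one way to compute that overlap, often called recall@k, from the `_id` values returned by the two searches, together with a helper that builds the ANN and ENN variants of the `$vectorSearch` stage. An ENN search sets `exact: true` and omits `numCandidates`. `vector_search_stage` and `recall_at_k` are hypothetical helpers, and the `_id` values in the example are made up.

```python
def vector_search_stage(index, path, query_vector, limit, num_candidates=None, exact=False):
    """Build a $vectorSearch stage for ANN search, or ENN search when exact=True."""
    stage = {"index": index, "path": path, "queryVector": query_vector, "limit": limit}
    if exact:
        stage["exact"] = True  # exhaustive ENN search; numCandidates is omitted
    else:
        if num_candidates is None:
            raise ValueError("ANN search needs num_candidates")
        stage["numCandidates"] = num_candidates
    return {"$vectorSearch": stage}


def recall_at_k(ann_ids, enn_ids, k):
    """Fraction of the top-k ENN results that also appear in the top-k ANN results."""
    if k < 1:
        raise ValueError("k must be at least 1")
    exact_top = list(enn_ids)[:k]
    if not exact_top:
        raise ValueError("enn_ids must not be empty")
    approx_top = set(list(ann_ids)[:k])
    return sum(1 for doc_id in exact_top if doc_id in approx_top) / len(exact_top)


# Hypothetical _id values returned by the two searches.
print(recall_at_k(ann_ids=[3, 1, 7, 4, 9], enn_ids=[1, 3, 4, 8, 9], k=5))  # 0.8
```

To collect `_id` values with the pipelines in this tutorial, include `_id` in the `$project` stage, which currently sets `'_id': 0`.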

For a demonstration of how to evaluate your query results, see How to Measure the Accuracy of Your Query Results.


Embedding Model Provider	Embedding Model
VoyageAI	`voyage-3-large`
Cohere	`embed-english-v3.0`
Nomic	`nomic-embed-text-v1.5`
Jina	`jina-embeddings-v2-base-en`
Mixedbread	`mxbai-embed-large-v1`

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"os"

	cohere "github.com/cohere-ai/cohere-go/v2"
	cohereclient "github.com/cohere-ai/cohere-go/v2/client"
	"go.mongodb.org/mongo-driver/v2/bson"
)

// Sample data for embedding
var data = []string{
	"The Great Wall of China is visible from space.",
	"The Eiffel Tower was completed in Paris in 1889.",
	"Mount Everest is the highest peak on Earth at 8,848m.",
	"Shakespeare wrote 37 plays and 154 sonnets during his lifetime.",
	"The Mona Lisa was painted by Leonardo da Vinci.",
}

func main() {
	apiKey := os.Getenv("COHERE_API_KEY")
	if apiKey == "" {
		log.Fatal("Ensure COHERE_API_KEY is set.")
	}

	client := cohereclient.NewClient(cohereclient.WithToken(apiKey))

	embeddings, err := fetchEmbeddingsFromCohere(client)
	if err != nil {
		log.Fatalf("Error fetching embeddings: %v", err)
	}

	documents := convertEmbeddingsToBSON(data, embeddings)

	err = writeJSONToFile("embeddings.json", documents)
	if err != nil {
		log.Fatalf("Error writing embeddings to file: %v", err)
	}

	fmt.Println("Embeddings successfully stored in embeddings.json")
}

// Fetch float, int8, and ubinary embeddings using the Cohere API.
// NewClient returns a *Client, and the response is returned as a pointer.
func fetchEmbeddingsFromCohere(client *cohereclient.Client) (*cohere.EmbedByTypeResponse, error) {
	model := "embed-english-v3.0"
	response, err := client.V2.Embed(context.TODO(), &cohere.V2EmbedRequest{
		Texts:     data,
		Model:     model,
		InputType: cohere.EmbedInputTypeSearchDocument,
		EmbeddingTypes: []cohere.EmbeddingType{
			cohere.EmbeddingTypeFloat,
			cohere.EmbeddingTypeInt8,
			cohere.EmbeddingTypeUbinary,
		},
	})
	if err != nil {
		return nil, fmt.Errorf("failed to fetch embeddings: %w", err)
	}
	return response, nil
}

// Convert embeddings to BSON binary vectors
func convertEmbeddingsToBSON(sentences []string, embeddings *cohere.EmbedByTypeResponse) []bson.M {
	var documents []bson.M

	for i, sentence := range sentences {
		float32Emb := convertFloat64ToFloat32(embeddings.Embeddings.Float[i])
		int8Emb := convertIntToInt8(embeddings.Embeddings.Int8[i])
		ubinaryEmb := convertIntToBytes(embeddings.Embeddings.Ubinary[i])

		floatVector := bson.NewVector(float32Emb)
		int8Vector := bson.NewVector(int8Emb)
		ubinaryVector, err := bson.NewPackedBitVector(ubinaryEmb, 0)
		if err != nil {
			log.Fatalf("Error creating PackedBitVector: %v", err)
		}

		document := bson.M{
			"text":               sentence,
			"embeddings_float32": floatVector.Binary(),
			"embeddings_int8":    int8Vector.Binary(),
			"embeddings_int1":    ubinaryVector.Binary(),
		}
		documents = append(documents, document)
	}

	return documents
}

// Write JSON file from in-memory BSON documents
func writeJSONToFile(filename string, documents []bson.M) error {
	file, err := os.Create(filename)
	if err != nil {
		return fmt.Errorf("failed to create file: %w", err)
	}
	defer file.Close()

	var jsonData []json.RawMessage
	for _, document := range documents {
		jsonBytes, err := bson.MarshalExtJSON(document, false, false)
		if err != nil {
			return fmt.Errorf("error marshaling BSON to JSON: %w", err)
		}
		jsonData = append(jsonData, jsonBytes)
	}

	marshaledData, err := json.MarshalIndent(jsonData, "", " ")
	if err != nil {
		return fmt.Errorf("failed to marshal JSON: %w", err)
	}

	_, err = file.Write(marshaledData)
	if err != nil {
		return fmt.Errorf("failed to write JSON to file: %w", err)
	}

	return nil
}

// Convert a slice of float64 to a slice of float32
func convertFloat64ToFloat32(f64s []float64) []float32 {
	f32s := make([]float32, len(f64s))
	for i, v := range f64s {
		f32s[i] = float32(v)
	}
	return f32s
}

// Convert a slice of int to a slice of int8
func convertIntToInt8(ints []int) []int8 {
	ints8 := make([]int8, len(ints))
	for i, val := range ints {
		ints8[i] = int8(val)
	}
	return ints8
}

// Convert a slice of int to a slice of bytes
func convertIntToBytes(ints []int) []byte {
	bytes := make([]byte, len(ints))
	for i, val := range ints {
		bytes[i] = byte(val)
	}
	return bytes
}

import com.cohere.api.Cohere;
import com.cohere.api.requests.EmbedRequest;
import com.cohere.api.types.EmbedByTypeResponse;
import com.cohere.api.types.EmbedByTypeResponseEmbeddings;
import com.cohere.api.types.EmbedInputType;
import com.cohere.api.types.EmbedResponse;
import com.cohere.api.types.EmbeddingType;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import org.bson.BinaryVector;
import org.bson.Document;

public class GenerateAndConvertEmbeddings {

    // List of text data to embed
    private static final List<String> DATA = List.of(
            "The Great Wall of China is visible from space.",
            "The Eiffel Tower was completed in Paris in 1889.",
            "Mount Everest is the highest peak on Earth at 8,848m.",
            "Shakespeare wrote 37 plays and 154 sonnets during his lifetime.",
            "The Mona Lisa was painted by Leonardo da Vinci."
    );

    public static void main(String[] args) {
        // Cohere API key for authentication
        String apiKey = System.getenv("COHERE_API_KEY");

        // Fetch embeddings from the Cohere API
        EmbedByTypeResponseEmbeddings embeddings = fetchEmbeddingsFromCohere(apiKey);
        Document bsonEmbeddings = convertEmbeddingsToBson(embeddings);

        writeEmbeddingsToFile(bsonEmbeddings, "embeddings.json");
    }

    // Fetches embeddings based on input data from the Cohere API
    private static EmbedByTypeResponseEmbeddings fetchEmbeddingsFromCohere(String apiKey) {
        if (Objects.isNull(apiKey) || apiKey.isEmpty()) {
            throw new RuntimeException("API key not found. Please set COHERE_API_KEY in your environment.");
        }

        Cohere cohere = Cohere.builder().token(apiKey).clientName("embed-example").build();

        try {
            EmbedRequest request = EmbedRequest.builder()
                    .model("embed-english-v3.0")
                    .inputType(EmbedInputType.SEARCH_DOCUMENT)
                    .texts(DATA)
                    .embeddingTypes(List.of(EmbeddingType.FLOAT, EmbeddingType.INT_8, EmbeddingType.UBINARY))
                    .build();

            EmbedResponse response = cohere.embed(request);
            Optional<EmbedByTypeResponse> optionalEmbeddingsWrapper = response.getEmbeddingsByType();

            return optionalEmbeddingsWrapper.orElseThrow().getEmbeddings();
        } catch (Exception e) {
            System.err.println("Error fetching embeddings: " + e.getMessage());
            throw e;
        }
    }

    // Converts embeddings to BSON binary vectors using MongoDB Java Driver
    private static Document convertEmbeddingsToBson(EmbedByTypeResponseEmbeddings embeddings) {
        List<List<Double>> floatEmbeddings = embeddings.getFloat().orElseThrow();
        List<List<Integer>> int8Embeddings = embeddings.getInt8().orElseThrow();
        List<List<Integer>> ubinaryEmbeddings = embeddings.getUbinary().orElseThrow();

        List<Document> bsonEmbeddings = new ArrayList<>();
        for (int i = 0; i < floatEmbeddings.size(); i++) {
            Document bsonEmbedding = new Document()
                    .append("text", DATA.get(i))
                    .append("embeddings_float32", BinaryVector.floatVector(listToFloatArray(floatEmbeddings.get(i))))
                    .append("embeddings_int8", BinaryVector.int8Vector(listToByteArray(int8Embeddings.get(i))))
                    .append("embeddings_int1", BinaryVector.packedBitVector(listToByteArray(ubinaryEmbeddings.get(i)), (byte) 0));

            bsonEmbeddings.add(bsonEmbedding);
        }

        return new Document("data", bsonEmbeddings);
    }

    // Writes embeddings to JSON file
    private static void writeEmbeddingsToFile(Document bsonEmbeddings, String fileName) {
        try (FileOutputStream fos = new FileOutputStream(fileName)) {
            fos.write(bsonEmbeddings.toJson().getBytes());
            System.out.println("Embeddings saved to " + fileName);
        } catch (IOException e) {
            System.out.println("Error writing embeddings to file: " + e.getMessage());
        }
    }

    // Convert List of Doubles to an array of floats
    private static float[] listToFloatArray(List<Double> list) {
        float[] array = new float[list.size()];
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i).floatValue();
        }
        return array;
    }

    // Convert List of Integers to an array of bytes
    private static byte[] listToByteArray(List<Integer> list) {
        byte[] array = new byte[list.size()];
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i).byteValue();
        }
        return array;
    }
}

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.SearchIndexModel;
import com.mongodb.client.model.SearchIndexType;
import org.bson.Document;
import org.bson.conversions.Bson;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.StreamSupport;

public class UploadDataAndCreateIndex {

    private static final String MONGODB_URI = System.getenv("MONGODB_URI");
    private static final String DB_NAME = "<DATABASE-NAME>";
    private static final String COLLECTION_NAME = "<COLLECTION-NAME>";
    private static final String INDEX_NAME = "<INDEX-NAME>";

    public static void main(String[] args) {
        try (MongoClient mongoClient = MongoClients.create(MONGODB_URI)) {
            storeEmbeddings(mongoClient);
            setupVectorSearchIndex(mongoClient);
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        }
    }

    public static void storeEmbeddings(MongoClient client) throws IOException {
        MongoDatabase database = client.getDatabase(DB_NAME);
        MongoCollection<Document> collection = database.getCollection(COLLECTION_NAME);

        String fileContent = Files.readString(Path.of("embeddings.json"));
        List<Document> documents = parseDocuments(fileContent);

        collection.insertMany(documents);
        System.out.println("Inserted documents into MongoDB");
    }

    private static List<Document> parseDocuments(String jsonContent) throws IOException {
        Document rootDoc = Document.parse(jsonContent);
        return rootDoc.getList("data", Document.class);
    }

    public static void setupVectorSearchIndex(MongoClient client) throws InterruptedException {
        MongoDatabase database = client.getDatabase(DB_NAME);
        MongoCollection<Document> collection = database.getCollection(COLLECTION_NAME);

        Bson definition = new Document(
                "fields",
                List.of(
                        new Document("type", "vector")
                                .append("path", "embeddings_float32")
                                .append("numDimensions", 1024)
                                .append("similarity", "dotProduct"),
                        new Document("type", "vector")
                                .append("path", "embeddings_int8")
                                .append("numDimensions", 1024)
                                .append("similarity", "dotProduct"),
                        new Document("type", "vector")
                                .append("path", "embeddings_int1")
                                .append("numDimensions", 1024)
                                .append("similarity", "euclidean")
                )
        );

        SearchIndexModel indexModel = new SearchIndexModel(
                INDEX_NAME,
                definition,
                SearchIndexType.vectorSearch()
        );

        List<String> result = collection.createSearchIndexes(Collections.singletonList(indexModel));
        System.out.println("Successfully created vector index named: " + result.get(0));
        System.out.println("It may take up to a minute for the index to leave the BUILDING status and become queryable.");

        System.out.println("Polling to confirm the index has changed from the BUILDING status.");
        waitForIndex(collection, INDEX_NAME);
    }

    public static <T> boolean waitForIndex(final MongoCollection<T> collection, final String indexName) {
        long startTime = System.nanoTime();
        long timeoutNanos = TimeUnit.SECONDS.toNanos(60);
        while (System.nanoTime() - startTime < timeoutNanos) {
            Document indexRecord = StreamSupport.stream(collection.listSearchIndexes().spliterator(), false)
                    .filter(index -> indexName.equals(index.getString("name")))
                    .findAny().orElse(null);
            if (indexRecord != null) {
                if ("FAILED".equals(indexRecord.getString("status"))) {
                    throw new RuntimeException("Search index has FAILED status.");
                }
                if (indexRecord.getBoolean("queryable")) {
                    System.out.println(indexName + " index is ready to query");
                    return true;
                }
            }
            try {
                Thread.sleep(100); // busy-wait, avoid in production
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return false;
    }
}

// Use 'require' for modules in a Node.js environment
const { CohereClient } = require('cohere-ai');
const { writeFile } = require('fs/promises');

// Retrieve API key from environment variables or default placeholder
const apiKey = process.env.COHERE_API_KEY || '<COHERE-API-KEY>';

if (!apiKey) {
  throw new Error('API key not found. Please set COHERE_API_KEY in your environment.');
}

// Instantiate the CohereClient with the API key
const cohere = new CohereClient({ token: apiKey });

async function main() {
  try {
    // Data to embed
    const data = [
      "The Great Wall of China is visible from space.",
      "The Eiffel Tower was completed in Paris in 1889.",
      "Mount Everest is the highest peak on Earth at 8,848m.",
      "Shakespeare wrote 37 plays and 154 sonnets during his lifetime.",
      "The Mona Lisa was painted by Leonardo da Vinci.",
    ];

    // Fetch embeddings for the data using the Cohere API
    const response = await cohere.v2.embed({
      model: 'embed-english-v3.0',
      inputType: 'search_document',
      texts: data,
      embeddingTypes: ['float', 'int8', 'ubinary'],
    });

    // Extract embeddings from the API response
    const { float, int8, ubinary } = response.embeddings;

    // Map the embeddings to the text data
    const embeddingsData = data.map((text, index) => ({
      text,
      embeddings: {
        float: float[index],
        int8: int8[index],
        ubinary: ubinary[index],
      },
    }));

    // Write the embeddings data to a JSON file
    await writeFile('embeddings.json', JSON.stringify(embeddingsData, null, 2));
    console.log('Embeddings saved to embeddings.json');
  } catch (error) {
    console.error('Error fetching embeddings:', error);
  }
}

// Execute the main function
main();

const fs = require('fs/promises');
const { BSON } = require('mongodb');
const { Binary } = BSON;

async function main() {
  try {
    // Read and parse the contents of 'embeddings.json' file
    const fileContent = await fs.readFile('embeddings.json', 'utf8');
    const embeddingsData = JSON.parse(fileContent);

    // Map the embeddings data to add BSON binary representations with subtype 9
    const convertEmbeddingsData = embeddingsData.map(({ text, embeddings }) => {
      // Create Binary for Float32Array with subtype 9
      const bsonFloat32 = Binary.fromFloat32Array(new Float32Array(embeddings.float));

      // Create Binary for Int8Array with subtype 9
      const bsonInt8 = Binary.fromInt8Array(new Int8Array(embeddings.int8));

      // Create Binary for PackedBits (Uint8Array) with subtype 9
      const bsonPackedBits = Binary.fromPackedBits(new Uint8Array(embeddings.ubinary));

      return {
        text,
        embeddings: {
          float: embeddings.float, // Original float data
          int8: embeddings.int8, // Original int8 data
          ubinary: embeddings.ubinary, // Original packed bits data
        },
        bsonEmbeddings: {
          float32: bsonFloat32,
          int8: bsonInt8,
          packedBits: bsonPackedBits,
        },
      };
    });

    // Serialize the updated data to EJSON for BSON compatibility
    const ejsonSerializedData = BSON.EJSON.stringify(convertEmbeddingsData, null, null, { relaxed: false });

    // Write the serialized data to 'embeddings.json'
    await fs.writeFile('embeddings.json', ejsonSerializedData);
    console.log('Embeddings with BSON vectors have been saved to embeddings.json');
  } catch (error) {
    console.error('Error processing embeddings:', error);
  }
}

main();

const fs = require('fs/promises'); // Use fs/promises for asynchronous operations
const { MongoClient, BSON } = require('mongodb'); // Import from the 'mongodb' package

const { Binary } = BSON; // Ensure the Binary class is imported correctly

async function main() {
  const MONGODB_URI = process.env.MONGODB_URI || "<CONNECTION-STRING>";
  const DB_NAME = "<DB-NAME>";
  const COLLECTION_NAME = "<COLLECTION-NAME>";

  let client;
  try {
    client = new MongoClient(MONGODB_URI);
    await client.connect();
    console.log("Connected to MongoDB");

    const db = client.db(DB_NAME);
    const collection = db.collection(COLLECTION_NAME);

    // Read and parse the contents of 'embeddings.json' file using EJSON
    const fileContent = await fs.readFile('embeddings.json', 'utf8');
    const embeddingsData = BSON.EJSON.parse(fileContent);

    // Map embeddings data to recreate BSON binary representations with the correct subtype
    const documents = embeddingsData.map(({ text, bsonEmbeddings }) => {
      return {
        text,
        bsonEmbeddings: {
          float32: bsonEmbeddings.float32,
          int8: bsonEmbeddings.int8,
          int1: bsonEmbeddings.packedBits
        }
      };
    });

    const result = await collection.insertMany(documents);
    console.log(`Inserted ${result.insertedCount} documents into MongoDB`);

  } catch (error) {
    console.error('Error storing embeddings in MongoDB:', error);
  } finally {
    if (client) {
      await client.close();
    }
  }
}

// Run the store function
main();

const { MongoClient } = require("mongodb");
const { setTimeout } = require("timers/promises"); // Promise-based setTimeout

// Connect to your Atlas deployment
const uri = process.env.MONGODB_URI || "<CONNECTION-STRING>";

const client = new MongoClient(uri);

async function main() {
  try {
    const database = client.db("<DB-NAME>");
    const collection = database.collection("<COLLECTION-NAME>");

    // Define your Atlas Vector Search index
    const index = {
      name: "<INDEX-NAME>",
      type: "vectorSearch",
      definition: {
        fields: [
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.float32",
            similarity: "dotProduct",
          },
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.int8",
            similarity: "dotProduct",
          },
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.int1",
            similarity: "euclidean", // int1 vectors support only euclidean
          },
        ],
      },
    };

    // Create the index
    const result = await collection.createSearchIndex(index);
    console.log(`New search index named ${result} is building.`);

    // Wait for the index to be ready to query
    console.log("Polling to check if the index is ready. This may take up to a minute.");
    let isQueryable = false;

    // Poll the index by name until it reports queryable
    while (!isQueryable) {
      const [indexData] = await collection.listSearchIndexes(index.name).toArray();

      if (indexData) {
        isQueryable = indexData.queryable;
        if (!isQueryable) {
          await setTimeout(5000); // Wait for 5 seconds before checking again
        }
      } else {
        // Handle the case where the index might not be found
        console.log(`Index ${index.name} not found.`);
        await setTimeout(5000); // Wait for 5 seconds before checking again
      }
    }

    console.log(`${result} is ready for querying.`);
  } catch (error) {
    console.error("Error:", error);
  } finally {
    await client.close();
  }
}

main().catch((err) => {
  console.error("Unhandled error:", err);
});
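Every field in the index above declares `numDimensions: 1024`, so it can help to check stored vectors against that value. The sketch below is a hypothetical helper that derives the dimension count from the two-byte header of a BSON binary vector (subtype 9). It assumes the dtype markers `0x27` (float32), `0x03` (int8), and `0x10` (packed bits), and treats the second byte as the number of unused trailing bits in the last packed byte.

```javascript
// Hypothetical helper: dimension count of a BSON binary vector (subtype 9)
// payload, derived from its header. Assumed dtypes: 0x27 float32,
// 0x03 int8, 0x10 packed bits (int1).
function vectorDimensions(bytes) {
  const dtype = bytes[0];
  const padding = bytes[1];
  const dataLength = bytes.length - 2;
  switch (dtype) {
    case 0x27: // float32: 4 bytes per dimension
      return dataLength / 4;
    case 0x03: // int8: 1 byte per dimension
      return dataLength;
    case 0x10: // packed bits: 8 dimensions per byte, minus unused trailing bits
      return dataLength * 8 - padding;
    default:
      throw new Error(`Unknown vector dtype 0x${dtype.toString(16)}`);
  }
}

// Throws when a stored vector does not match the index's numDimensions.
function assertMatchesIndex(bytes, numDimensions) {
  const dims = vectorDimensions(bytes);
  if (dims !== numDimensions) {
    throw new Error(`Vector has ${dims} dimensions; index expects ${numDimensions}`);
  }
}

// A 1024-dimension int1 vector packs into 128 data bytes plus the header.
const packed = new Uint8Array(2 + 128);
packed[0] = 0x10;
console.log(vectorDimensions(packed)); // 1024
assertMatchesIndex(packed, 1024); // passes silently
```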

const { MongoClient } = require('mongodb');
const fs = require('fs/promises');
const { BSON } = require('bson'); // Use BSON's functionality for EJSON parsing
const dotenv = require('dotenv');

dotenv.config();

// MongoDB connection details
const mongoUri = process.env.MONGODB_URI || '<CONNECTION-STRING>';
const dbName = '<DB-NAME>'; // Update with your actual database name
const collectionName = '<COLLECTION-NAME>'; // Update with your actual collection name

// Indices and paths should match your MongoDB vector search configuration
const VECTOR_INDEX_NAME = '<INDEX-NAME>'; // Replace with your actual index name
const NUM_CANDIDATES = 5; // Number of candidate documents for the search
const LIMIT = 2; // Limit for the number of documents to return

// Fields in the collection that contain the BSON query vectors.
// Subtype 9 is the standard BSON binary vector subtype for all three fields.
const FIELDS = [
  { path: 'float32', subtype: 9 },
  { path: 'int8', subtype: 9 },
  { path: 'int1', subtype: 9 }
];

// Function to read BSON vectors from JSON and run vector search
async function main() {
  // Initialize MongoDB client
  const client = new MongoClient(mongoUri);

  try {
    await client.connect();
    console.log("Connected to MongoDB");

    const db = client.db(dbName);
    const collection = db.collection(collectionName);

    // Load query embeddings from JSON file using EJSON parsing
    const fileContent = await fs.readFile('query-embeddings.json', 'utf8');
    const embeddingsData = BSON.EJSON.parse(fileContent);

    // Define and run the query for each embedding type
    const results = {};

    for (const { path } of FIELDS) {
      const queryEmbeddings = embeddingsData[0]?.bsonEmbeddings ?? {};
      // If the query file uses the packedBits key (as the document conversion script does), use it for int1
      const bsonQueryVector = queryEmbeddings[path] ?? (path === 'int1' ? queryEmbeddings.packedBits : undefined);

      if (!bsonQueryVector) {
        console.warn(`BSON embedding for ${path} not found in the JSON.`);
        continue;
      }

      const pipeline = [
        {
          $vectorSearch: {
            index: VECTOR_INDEX_NAME,
            path: `bsonEmbeddings.${path}`,
            queryVector: bsonQueryVector, // Pass the BSON Binary vector directly
            numCandidates: NUM_CANDIDATES,
            limit: LIMIT,
          }
        },
        {
          $project: {
            _id: 0,
            text: 1, // Adjust projection fields as necessary to match your document structure
            score: { $meta: 'vectorSearchScore' }
          }
        }
      ];

      results[path] = await collection.aggregate(pipeline).toArray();
    }

    return results;
  } catch (error) {
    console.error('Error during vector search:', error);
  } finally {
    await client.close();
  }
}

// Main execution block
(async () => {
  try {
    const results = await main();

    if (results) {
      console.log("Results from Float32 embeddings:");
      console.table(results.float32 || []);
      console.log("--------------------------------------------------------------------------");

      console.log("Results from Int8 embeddings:");
      console.table(results.int8 || []);
      console.log("--------------------------------------------------------------------------");

      console.log("Results from Packed Binary (PackedBits) embeddings:");
      console.table(results.int1 || []);
    }
  } catch (error) {
    console.error('Error executing main function:', error);
  }
})();
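The query above runs an approximate (ANN) search with `numCandidates: 5` and `limit: 2`. The same stage can also run an exact (ENN) search. The sketch below is a hypothetical helper, `buildVectorSearchStage`, that produces either form. It assumes the `$vectorSearch` rules as documented for Atlas: for ANN, `numCandidates` must be at least `limit` and at most 10000; for ENN, you set `exact: true` and omit `numCandidates`. The plain-number query vector is a placeholder for the BSON Binary vector used in practice.

```javascript
// Hypothetical helper: builds an ANN or ENN $vectorSearch stage.
// Assumed rules: ANN needs limit <= numCandidates <= 10000;
// ENN sets exact: true and omits numCandidates.
function buildVectorSearchStage({ index, path, queryVector, limit, numCandidates, exact = false }) {
  const stage = { index, path, queryVector, limit };
  if (exact) {
    stage.exact = true; // ENN: every indexed vector is scored
  } else {
    if (!Number.isInteger(numCandidates) || numCandidates < limit || numCandidates > 10000) {
      throw new RangeError(`numCandidates must be an integer between limit (${limit}) and 10000`);
    }
    stage.numCandidates = numCandidates; // ANN: size of the candidate pool
  }
  return { $vectorSearch: stage };
}

const common = {
  index: '<INDEX-NAME>',
  path: 'bsonEmbeddings.float32',
  queryVector: [0.1, 0.2, 0.3], // placeholder; use a BSON Binary vector in practice
  limit: 2,
};
const annStage = buildVectorSearchStage({ ...common, numCandidates: 5 });
const ennStage = buildVectorSearchStage({ ...common, exact: true });
console.log(JSON.stringify(annStage));
console.log(JSON.stringify(ennStage));
```

Either stage can replace the first element of `pipeline` in the query script.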

const { MongoClient } = require('mongodb');
const fs = require('fs'); // Import the fs module for file system operations

async function main() {
  // Replace with your Atlas connection string
  const uri = process.env.MONGODB_URI || '<CONNECTION-STRING>';

  // Create a new MongoClient instance
  const client = new MongoClient(uri);

  try {
    // Connect to your Atlas cluster
    await client.connect();

    // Specify the database and collection
    const db = client.db('sample_airbnb');
    const collection = db.collection('listingsAndReviews');

    // Filter to exclude null or empty summary fields
    const filter = { summary: { $nin: [null, ''] } };

    // Get a subset of documents in the collection
    const documentsCursor = collection.find(filter).limit(50);

    // Convert the cursor to an array to get the documents
    const documents = await documentsCursor.toArray();

    // Log the documents to verify their content
    console.log('Documents retrieved:', documents);

    // Write the documents to a local file called "subset.json"
    const outputFilePath = './subset.json';
    fs.writeFileSync(outputFilePath, JSON.stringify(documents, null, 2), 'utf-8');

    console.log(`Subset of documents written to: ${outputFilePath}`);
  } catch (error) {
    console.error('An error occurred:', error);
  } finally {
    // Ensure the client is closed when finished
    await client.close();
  }
}

main().catch(console.error);

// Import necessary modules using the CommonJS syntax
const { CohereClient } = require('cohere-ai');
const { readFile, writeFile } = require('fs/promises');

// Retrieve the API key from environment variables or provide a placeholder
const apiKey = process.env.COHERE_API_KEY || '<COHERE-API-KEY>';

if (!apiKey || apiKey === '<COHERE-API-KEY>') {
  throw new Error('API key not found. Please set COHERE_API_KEY in your environment.');
}

// Initialize the Cohere client with the API key
const cohere = new CohereClient({ token: apiKey });

async function main() {
  try {
    // Read and parse the contents of 'subset.json'
    const subsetData = await readFile('subset.json', 'utf-8');
    const documents = JSON.parse(subsetData);

    // Extract the 'summary' fields that are non-empty strings
    const data = documents
      .map(doc => doc.summary)
      .filter(summary => typeof summary === 'string' && summary.length > 0);

    if (data.length === 0) {
      throw new Error('No valid summary texts available in the data.');
    }

    // Request float, int8, and packed-binary (ubinary) embeddings from the Cohere API
    const response = await cohere.v2.embed({
      model: 'embed-english-v3.0',
      inputType: 'search_document',
      texts: data,
      embeddingTypes: ['float', 'int8', 'ubinary'],
    });

    // Extract embeddings from the API response
    const { float, int8, ubinary } = response.embeddings;

    // Structure the embeddings data
    const embeddingsData = data.map((text, index) => ({
      text,
      embeddings: {
        float: float[index],
        int8: int8[index],
        ubinary: ubinary[index],
      },
    }));

    // Write the embeddings data to 'embeddings.json'
    await writeFile('embeddings.json', JSON.stringify(embeddingsData, null, 2));
    console.log('Embeddings saved to embeddings.json');
  } catch (error) {
    console.error('Error fetching embeddings:', error);
  }
}

// Execute the main function
main();
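The `ubinary` embedding type requested above returns each embedding as unsigned bytes, and each byte carries eight dimensions. For the 1024 dimensions of `embed-english-v3.0`, that is 128 numbers per embedding. The sketch below expands such bytes back into one 0 or 1 per dimension so you can inspect them. The helper name `unpackBits` is hypothetical, and the most-significant-bit-first order is an assumption: confirm the bit order in your embedding provider's documentation.

```javascript
// Hypothetical helper: expands packed bytes (such as ubinary output) into one
// 0/1 value per dimension. Assumes most-significant-bit-first packing.
function unpackBits(packed, numDimensions = packed.length * 8) {
  const bits = new Array(numDimensions);
  for (let i = 0; i < numDimensions; i++) {
    // Byte i >> 3 holds dimension i; bit (7 - i % 8) within that byte.
    bits[i] = (packed[i >> 3] >> (7 - (i & 7))) & 1;
  }
  return bits;
}

const ubinaryExample = [150, 255]; // two bytes = 16 dimensions
console.log(unpackBits(ubinaryExample).join('')); // 1001011011111111
console.log(unpackBits(new Uint8Array(128)).length); // 1024
```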

const fs = require('fs/promises'); // Use fs/promises for asynchronous operations
const { MongoClient } = require('mongodb'); // Import from the 'mongodb' package
const { EJSON } = require('bson'); // Import EJSON from bson

async function main() {
  const MONGODB_URI = process.env.MONGODB_URI || "<CONNECTION-STRING>";
  const DB_NAME = "sample_airbnb";
  const COLLECTION_NAME = "listingsAndReviews";

  let client;
  try {
    // Connect to MongoDB
    client = new MongoClient(MONGODB_URI);
    await client.connect();
    console.log("Connected to MongoDB");

    // Access database and collection
    const db = client.db(DB_NAME);
    const collection = db.collection(COLLECTION_NAME);

    // Load embeddings from JSON; EJSON.parse restores the BSON Binary vectors
    const fileContent = await fs.readFile('embeddings.json', 'utf8');
    const embeddingsData = EJSON.parse(fileContent);

    // Map embeddings data to the document shape; packedBits is stored under int1
    const documents = embeddingsData.map(({ text, bsonEmbeddings }) => {
      return {
        summary: text,
        bsonEmbeddings: {
          float32: bsonEmbeddings.float32,
          int8: bsonEmbeddings.int8,
          int1: bsonEmbeddings.packedBits
        }
      };
    });

    // Iterate over documents and upsert each into the MongoDB collection
    for (const doc of documents) {
      const filter = { summary: doc.summary };
      const update = { $set: doc };

      // Add the BSON binary data to the matching document, or insert it if none matches
      const result = await collection.updateOne(filter, update, { upsert: true });
      if (result.matchedCount > 0) {
        console.log(`Updated document with summary: ${doc.summary}`);
      } else {
        console.log(`Inserted new document with summary: ${doc.summary}`);
      }
    }

    console.log("Embeddings stored in MongoDB successfully.");
  } catch (error) {
    console.error('Error storing embeddings in MongoDB:', error);
  } finally {
    if (client) {
      await client.close();
    }
  }
}

// Run the main function to load the data
main();

import pymongo

# Connect to your Atlas cluster
mongo_client = pymongo.MongoClient("<ATLAS-CONNECTION-STRING>")
db = mongo_client["sample_airbnb"]
collection = db["listingsAndReviews"]

# Filter to exclude null or empty summary fields
filter = { "summary": {"$nin": [None, ""]} }

# Get a subset of documents in the collection
documents = collection.find(filter).limit(50)

# Initialize the count of updated documents
updated_doc_count = 0

from pymongo.operations import SearchIndexModel
import time

# Define and create the vector search index
index_name = "<INDEX-NAME>"
search_index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "similarity": "euclidean",
                "numDimensions": 1024
            }
        ]
    },
    name=index_name,
    type="vectorSearch"
)
result = collection.create_search_index(model=search_index_model)
print("New search index named " + result + " is building.")

# Wait for initial sync to complete
print("Polling to check if the index is ready. This may take up to a minute.")
predicate = None
if predicate is None:
    predicate = lambda index: index.get("queryable") is True
while True:
    indices = list(collection.list_search_indexes(index_name))
    if len(indices) and predicate(indices[0]):
        break
    time.sleep(5)
print(result + " is ready for querying.")

About Quantization

Scalar Quantization
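The scalar quantization procedure described at the top of this page has three steps: find each dimension's minimum and maximum, split that range into equal-width bins, and map every float to an integer bin. The sketch below walks through those steps. It assumes 256 bins, so that each bin corresponds to one int8 value, and it fits the ranges only on the vectors passed in. It illustrates the technique and is not Atlas's internal implementation. Each 4-byte float32 value becomes a single int8 byte.

```javascript
// Hypothetical sketch of per-dimension scalar quantization into int8.
const BINS = 256; // one bin per int8 value (assumption)

// Per-dimension min and max across all vectors.
function fitRanges(vectors) {
  const dims = vectors[0].length;
  const min = new Array(dims).fill(Infinity);
  const max = new Array(dims).fill(-Infinity);
  for (const vec of vectors) {
    vec.forEach((v, d) => {
      if (v < min[d]) min[d] = v;
      if (v > max[d]) max[d] = v;
    });
  }
  return { min, max };
}

// Map each value to its bin index (0..255), then shift into -128..127.
function scalarQuantize(vec, { min, max }) {
  return Int8Array.from(vec, (v, d) => {
    const width = (max[d] - min[d]) / BINS;
    if (width === 0) return -128; // constant dimension: everything lands in bin 0
    const bin = Math.floor((v - min[d]) / width);
    return Math.min(BINS - 1, Math.max(0, bin)) - 128; // clamp out-of-range values
  });
}

const vectors = [[0, -1], [0.5, 0], [1, 1]];
const ranges = fitRanges(vectors);
console.log(vectors.map((v) => Array.from(scalarQuantize(v, ranges))));
// [ [ -128, -128 ], [ 0, 0 ], [ 127, 127 ] ]
```

Values outside the fitted range, such as those from a later query vector, are clamped to the first or last bin.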

Binary Quantization
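The binary quantization flow described earlier has two stages: a cheap comparison of binary vectors selects candidates, and the original floating-point values then rescore them. The sketch below is a brute-force illustration of that flow. It assumes midpoint 0, MSB-first bit packing, Hamming distance for the binary comparison, and dot product for rescoring. A linear scan stands in for Atlas's Hierarchical Navigable Small Worlds graph, and all function names are hypothetical.

```javascript
// Hypothetical sketch: binary candidate selection followed by float rescoring.

// Binary quantization with midpoint 0, packed MSB-first into bytes.
function pack(floats) {
  const bytes = new Uint8Array(Math.ceil(floats.length / 8));
  floats.forEach((v, i) => {
    if (v > 0) bytes[i >> 3] |= 0x80 >> (i & 7);
  });
  return bytes;
}

// Number of differing bits between two packed vectors.
function hamming(a, b) {
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];
    while (x) {
      distance += x & 1;
      x >>= 1;
    }
  }
  return distance;
}

const dot = (a, b) => a.reduce((sum, v, i) => sum + v * b[i], 0);

// "Index time": keep the full-fidelity vector next to its binary form.
function buildIndex(docs) {
  return docs.map((doc) => ({ ...doc, bits: pack(doc.vector) }));
}

function searchWithRescore(query, indexed, numCandidates, limit) {
  const queryBits = pack(query); // quantize the query with the same midpoint
  return indexed
    .map((doc) => ({ doc, distance: hamming(queryBits, doc.bits) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, numCandidates) // binary comparison picks the candidates
    .map(({ doc }) => ({ id: doc.id, score: dot(query, doc.vector) }))
    .sort((x, y) => y.score - x.score) // rescore with the float values
    .slice(0, limit);
}

const docs = [
  { id: 'b', vector: [0.1, 0.9, -0.2, 0.1, -0.1, 0.1, 0.3, -0.3] },
  { id: 'a', vector: [0.8, 0.2, -0.1, 0.3, -0.4, 0.5, 0.2, -0.1] },
  { id: 'c', vector: [-0.7, -0.2, 0.6, -0.1, 0.5, -0.3, -0.2, 0.4] },
];
const query = [0.9, 0.1, -0.3, 0.2, -0.5, 0.4, 0.1, -0.2];
const indexed = buildIndex(docs);

console.log(searchWithRescore(query, indexed, 2, 2).map((r) => r.id)); // [ 'a', 'b' ]
```

Here `a` and `b` have identical bit patterns, so the binary stage cannot tell them apart. The float rescoring ranks `a` first even though `b` came first in the candidate list.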

Tip

Requirements
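The requirements table at the top of this page can be encoded as a pre-flight check for a vector field definition. The sketch below follows that table as written: `int1` allows only `euclidean` and needs a multiple of 8 dimensions, and `int8` and automatic scalar quantization allow 1 to 8192 dimensions. The kind names `int1`, `int8`, `scalar`, and `binary`, and the function `checkVectorField`, are hypothetical labels for the table's four columns.

```javascript
// Hypothetical validator built from the requirements table.
// 'int1'/'int8' = ingesting pre-quantized vectors; 'scalar'/'binary' = automatic quantization.
const ALL_SIMILARITIES = ['cosine', 'euclidean', 'dotProduct'];
const REQUIREMENTS = {
  int1: { similarities: ['euclidean'], dims: (n) => n > 0 && n % 8 === 0 },
  int8: { similarities: ALL_SIMILARITIES, dims: (n) => n >= 1 && n <= 8192 },
  scalar: { similarities: ALL_SIMILARITIES, dims: (n) => n >= 1 && n <= 8192 },
  binary: { similarities: ALL_SIMILARITIES, dims: (n) => n > 0 && n % 8 === 0 },
};

// Returns a list of problems; an empty list means the field looks valid.
function checkVectorField(kind, { similarity, numDimensions }) {
  const rule = REQUIREMENTS[kind];
  if (!rule) throw new Error(`Unknown kind: ${kind}`);
  const problems = [];
  if (!rule.similarities.includes(similarity)) {
    problems.push(`${kind} does not support similarity "${similarity}"`);
  }
  if (!Number.isInteger(numDimensions) || !rule.dims(numDimensions)) {
    problems.push(`${kind} does not support ${numDimensions} dimensions`);
  }
  return problems;
}

console.log(checkVectorField('int1', { similarity: 'euclidean', numDimensions: 1024 })); // []
console.log(checkVectorField('int1', { similarity: 'dotProduct', numDimensions: 1020 }));
```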

Note

How to Enable Automatic Quantization of Vectors

Benefits

Use Cases

Procedure

Specify the type of quantization you want in your Atlas Vector Search index.
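To specify the quantization type, you add it to the vector field of the index definition. The sketch below is a hypothetical helper that builds such a definition. It assumes the vector field option `quantization`, which accepts `none`, `scalar`, or `binary`, and it uses the same object shape that `collection.createSearchIndex()` takes in the Node.js examples on this page. The field path `embedding` is a placeholder for a field that holds float32 or double vectors.

```javascript
// Hypothetical helper: index definition with automatic quantization.
// Assumes the vector field option `quantization` ('none' | 'scalar' | 'binary').
function quantizedIndex({ name, path, numDimensions, similarity, quantization }) {
  const allowed = ['none', 'scalar', 'binary'];
  if (!allowed.includes(quantization)) {
    throw new Error(`quantization must be one of ${allowed.join(', ')}`);
  }
  return {
    name,
    type: 'vectorSearch',
    definition: {
      fields: [{ type: 'vector', path, numDimensions, similarity, quantization }],
    },
  };
}

const index = quantizedIndex({
  name: '<INDEX-NAME>',
  path: 'embedding', // placeholder: a float32 or double vector field
  numDimensions: 1024,
  similarity: 'dotProduct',
  quantization: 'scalar',
});
console.log(JSON.stringify(index.definition));
```

With the Node.js driver, you would pass `index` to `collection.createSearchIndex(index)` for a new index. To change an existing index, you would pass `index.definition` to `collection.updateSearchIndex(index.name, index.definition)`.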

Create or update the index.

Considerations

How to Ingest Pre-Quantized Vectors

Use Cases

Benefits

Supported Drivers

Prerequisites

Procedure

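Install the required libraries.

Initialize your Go project.

Set the environment variables in your terminal.

(Conditional) Generate embeddings from your data.

Ingest the data and create an Atlas Vector Search index.

Create and run a query against the collection.

Install the required libraries.

Initialize your Go project.

Set the environment variables in your terminal.

(Conditional) Generate embeddings from your data.

Ingest the data and create an Atlas Vector Search index.

Create and run a query against the collection.

Create a Java project and install the dependencies.

Set your environment variables.

Note

Generate embeddings from your data.

Ingest the data and create an Atlas Vector Search index.

Create and run a query against the collection.

Create a Java project and install the dependencies.

Set your environment variables.

Note

(Conditional) Generate embeddings from your data.

Ingest the data and create an Atlas Vector Search index.

Create and run a query against the collection.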

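Install the required libraries.

Set the environment variables in your terminal.

Generate vector embeddings for your data.

Convert the vector embeddings to binData vectors.

Connect to your Atlas cluster and upload the data to a collection.

Create an Atlas Vector Search index on the collection.

Generate embeddings for the query text.

Run an Atlas Vector Search query.

Install the required libraries.

Set the environment variables in your terminal.

Fetch the data from your Atlas cluster.

Generate vector embeddings for your data.

Convert the vector embeddings to binData vectors.

Connect to your Atlas cluster and upload the data to the namespace.

Create an Atlas Vector Search index on the collection.

Generate embeddings for the query text.

Run an Atlas Vector Search query.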

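Install the required libraries.

Example

Install PyMongo and Cohere

Load the data for which you want to generate BSON vectors into your notebook.

Example

Sample data to import

(Conditional) Generate embeddings from your data.

Example

Generate embeddings from the sample data by using Cohere

Generate BSON vectors from the embeddings.

Example

Define and run a function to generate BSON vectors

Create documents with the BSON vector embeddings.

Example

Create documents from the sample data

Load the data into your Atlas cluster.

Example

Example

Create an Atlas Vector Search index on the collection.

Example

Convert the vector embeddings to `binData` vectors.

Convert the vector embeddings to `binData` vectors.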