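I am setting up a recommender system. After training my neural network, I want to find the nearest neighbors and use them to make recommendations to customers.
My question is: what is the best way to evaluate this part?
I would like to use one or more metrics that show how "good" or "bad" the neighbors I found, or the resulting recommendations, are.
Which metrics do you know of, and how would I implement them?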
Dataframe:
import pandas as pd

d = {'purchaseid': [0, 0, 0, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, 9, 9, 9],
     'itemid': [3, 8, 2, 10, 3, 10, 4, 12, 3, 12, 3, 4, 8, 6, 3, 0, 5, 12, 9, 9, 13, 1, 7, 11, 11]}
df = pd.DataFrame(data=d)
     purchaseid  itemid
0             0       3
1             0       8
2             0       2
3             1      10
4             2       3
...         ...     ...
Finding the nearest neighbors:
import numpy as np
from keras.models import load_model
from sklearn.cluster import KMeans

# this is a nice rock/oldies playlist
desired_user_id = 500
model_path = 'spotify_NCF_8_[64, 32, 16, 8].h5'
print('using model: %s' % model_path)
model = load_model(model_path)
print('Loaded model!')

# weights of the user embedding layer of the MLP branch
mlp_user_embedding_weights = model.get_layer(name='mlp_user_embedding').get_weights()

# get the latent embedding for your desired user
user_latent_matrix = mlp_user_embedding_weights[0]
one_user_vector = user_latent_matrix[desired_user_id, :]
# reshape to (1, embedding_dim); -1 avoids hard-coding the size (32 here)
one_user_vector = np.reshape(one_user_vector, (1, -1))

print('\nPerforming kmeans to find the nearest users/playlists...')
# split all users into 100 clusters; the users that share a cluster
# with the desired user are treated as its neighbors
kmeans = KMeans(n_clusters=100, random_state=0, verbose=0).fit(user_latent_matrix)
desired_user_label = kmeans.predict(one_user_vector)[0]
user_labels = kmeans.labels_
neighbors = []
for user_id, user_label in enumerate(user_labels):
    if user_label == desired_user_label:
        neighbors.append(user_id)
print('Found {0} neighbor users/playlists.'.format(len(neighbors)))

# get the tracks in similar users' playlists
# NOTE: 'pid' and 'trackindex' are the playlist-dataset column names;
# for the dataframe above they would presumably be 'purchaseid' and 'itemid'
tracks = []
for user_id in neighbors:
    tracks += list(df[df['pid'] == int(user_id)]['trackindex'])
print('Found {0} neighbor tracks from these users.'.format(len(tracks)))

# pair the desired user with every candidate track
users = np.full(len(tracks), desired_user_id, dtype='int32')
items = np.array(tracks, dtype='int32')

print('\nRanking most likely tracks using the NeuMF model...')
# and predict tracks for my user
results = model.predict([users, items], batch_size=100, verbose=0)
results = results.tolist()
print('Ranked the tracks!')
.
.
.
# And now loop through and get the probability Note: This part has been removed because it is not part of the code
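One common way to judge a ranked list like the one produced above is precision@k and recall@k against items that were held out of training. The following is only a minimal sketch in plain Python, not part of the original code: the function name `precision_recall_at_k`, the example item ids, the scores, and the `held_out_items` list are all hypothetical, and it assumes the model scores have been flattened to one float per candidate item.

```python
def precision_recall_at_k(ranked_items, relevant_items, k):
    """Return (precision@k, recall@k) for a single user."""
    top_k = ranked_items[:k]
    relevant = set(relevant_items)
    # a "hit" is a recommended item that the user really interacted with
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: candidate items with their predicted scores
candidate_items = [3, 8, 2, 10, 4, 12]
scores = [0.9, 0.2, 0.7, 0.4, 0.8, 0.1]
# Hypothetical held-out items: bought by the user but hidden during training
held_out_items = [3, 12, 4]

# sort the candidates by score, highest first
ranked = [item for item, _ in sorted(zip(candidate_items, scores),
                                     key=lambda pair: pair[1],
                                     reverse=True)]

p, r = precision_recall_at_k(ranked, held_out_items, k=3)
print('precision@3 = %.3f, recall@3 = %.3f' % (p, r))
```

Averaging these two numbers over many users gives a single figure for the whole recommender. If the candidate list contains duplicate items, it is probably worth de-duplicating before ranking.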
metrics.auc([1,1,1,0,0,0], [0.9,0.8,0.4,0.5,0.2,0.1]). Can you tell me what exactly these lists are? Are they my original dataframe and my recommendation dataframe? - user14253628
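As a hedged reading of the call quoted in this comment: the first list looks like binary relevance labels (1 = the user really has the item, 0 = not) and the second looks like the model's predicted scores for the same items, in the same order. Under that assumption, neither list is a dataframe; each is one column per user. The sketch below computes the ROC AUC for those two lists in plain Python. The helper name `roc_auc` is hypothetical and is not a library function.

```python
def roc_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError('need at least one positive and one negative label')
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 1, 0, 0, 0]               # assumed: ground-truth relevance
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]   # assumed: model scores, same order
auc_value = roc_auc(labels, scores)
print('AUC = %.3f' % auc_value)
```

In scikit-learn, `sklearn.metrics.roc_auc_score(y_true, y_score)` computes this quantity from labels and scores. `sklearn.metrics.auc(x, y)` is instead a general trapezoidal-area function over the points of a curve, so it is usually fed the output of `roc_curve` rather than raw labels and scores.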