I was able to reproduce your issue with the code below.
Code to reproduce the issue -
import numpy as np
import tensorflow as tf
print(tf.__version__)
from tensorflow.keras.preprocessing.text import Tokenizer
label_tokenizer = Tokenizer()
fit_text = "Tensorflow warriors are awesome people"
label_tokenizer.fit_on_texts(fit_text)
train_labels = "Tensorflow warriors are great people"
training_label_list = np.array(label_tokenizer.texts_to_sequences(train_labels))
print(training_label_list)
print(type(training_label_list))
print(type(training_label_list[0]))
Output -
2.2.0
[list([9]) list([1]) list([10]) list([5]) list([3]) list([2]) list([11])
list([7]) list([3]) list([6]) list([]) list([6]) list([4]) list([2])
list([2]) list([12]) list([3]) list([2]) list([5]) list([]) list([4])
list([2]) list([1]) list([]) list([4]) list([2]) list([1]) list([])
list([]) list([2]) list([1]) list([4]) list([9]) list([]) list([8])
list([1]) list([3]) list([8]) list([7]) list([1])]
<class 'numpy.ndarray'>
<class 'list'>
Solution -
- Replacing np.array with np.hstack will fix your issue. Your model.fit() should then work.
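- If you want the same output as the one expected in your question, training_label_list = label_tokenizer.texts_to_sequences(train_labels) gives you a list of lists. You can convert it to an array of arrays with np.array([np.array(i) for i in training_label_list]). This produces a regular 2-D array only when every inner list has the same number of elements; otherwise you get a 1-D array of objects, and newer NumPy versions require dtype=object for that.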
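To compare the two options without installing TensorFlow, here is a minimal NumPy-only sketch. The `sequences` list is a made-up stand-in for what texts_to_sequences returns (an assumption for illustration, not your data). It also shows why the np.hstack result further down is printed as floats: the empty inner lists become empty float64 arrays, which promotes the whole result.

```python
import numpy as np

# Hypothetical stand-in for the output of texts_to_sequences: a list of lists
# in which some inner lists are empty (tokens the tokenizer does not know).
sequences = [[9], [1], [], [4], [2]]

# Option 1: np.hstack concatenates the inner lists into one flat 1-D array.
# Empty lists contribute no elements, so the result can be shorter than the input.
flat = np.hstack(sequences)
print(flat, flat.shape, flat.dtype)

# Option 2: an array of arrays. Filling a pre-allocated object array avoids
# relying on NumPy guessing the shape of ragged input.
ragged = np.empty(len(sequences), dtype=object)
for i, seq in enumerate(sequences):
    ragged[i] = np.array(seq, dtype=np.int64)
print(ragged.shape, type(ragged[0]))
```

Only the flat array from option 1 has a shape that Keras can match against the features; the object array from option 2 keeps one entry per input text but is not a numeric tensor.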
np.hstack code - code for point 1 of the solution.
import numpy as np
import tensorflow as tf
print(tf.__version__)
from tensorflow.keras.preprocessing.text import Tokenizer
label_tokenizer = Tokenizer()
fit_text = "Tensorflow warriors are awesome people"
label_tokenizer.fit_on_texts(fit_text)
train_labels = "Tensorflow warriors are great people"
training_label_list = np.hstack(label_tokenizer.texts_to_sequences(train_labels))
print(training_label_list)
print(type(training_label_list))
print(type(training_label_list[0]))
Output -
2.2.0
[ 9. 1. 10. 4. 2. 3. 11. 7. 2. 5. 5. 6. 3. 3. 12. 2. 3. 4.
6. 3. 1. 3. 1. 6. 9. 8. 1. 2. 8. 7. 1.]
<class 'numpy.ndarray'>
<class 'numpy.float64'>
Expected output as described in the question - code for point 2 of the solution.
import numpy as np
import tensorflow as tf
print(tf.__version__)
from tensorflow.keras.preprocessing.text import Tokenizer
label_tokenizer = Tokenizer()
fit_text = "Tensorflow warriors are awesome people"
label_tokenizer.fit_on_texts(fit_text)
train_labels = "Tensorflow warriors are great people"
training_label_list = label_tokenizer.texts_to_sequences(train_labels)
print(training_label_list)
print(type(training_label_list))
print(type(training_label_list[0]))
training_label_list = np.array([np.array(i) for i in training_label_list])
print(training_label_list)
print(type(training_label_list))
print(type(training_label_list[0]))
Output -
2.2.0
[[9], [1], [10], [4], [2], [3], [11], [7], [2], [5], [], [5], [6], [3], [3], [12], [2], [3], [4], [], [6], [3], [1], [], [], [3], [1], [6], [9], [], [8], [1], [2], [8], [7], [1]]
<class 'list'>
<class 'list'>
[array([9]) array([1]) array([10]) array([4]) array([2]) array([3])
array([11]) array([7]) array([2]) array([5]) array([], dtype=float64)
array([5]) array([6]) array([3]) array([3]) array([12]) array([2])
array([3]) array([4]) array([], dtype=float64) array([6]) array([3])
array([1]) array([], dtype=float64) array([], dtype=float64) array([3])
array([1]) array([6]) array([9]) array([], dtype=float64) array([8])
array([1]) array([2]) array([8]) array([7]) array([1])]
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
Hope this answers your question. Happy learning.
Update 2/6/2020 - Anirudh_k07, as per our discussion, I went through your program and found that, after using np.hstack for the labels, model.fit() raises the following error.
ValueError: Data cardinality is ambiguous:
x sizes: 41063
y sizes: 41429
Please provide data which shares the same first dimension.
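This error occurs because some labels contain special characters such as - and /. The Tokenizer filters these characters out by default and splits the label at them, so np.hstack(label_tokenizer.texts_to_sequences(train_labels)) produces extra rows for those labels. You can print the unique labels with print(set(train_labels)).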
Here is the gist of what I mean -
train_labels = ['Bio-PesticidesandBio-Fertilizers','Old/SenileOrchardRejuvenation']
training_label_seq = np.hstack(label_tokenizer.texts_to_sequences(train_labels))
print("Two labels are converted to Five :",training_label_seq)
train_labels = ['SoilHealthCard', 'PostHarvestPreservation', 'FertilizerUseandAvailability']
training_label_seq = np.hstack(label_tokenizer.texts_to_sequences(train_labels))
print("Three labels are remain three :",training_label_seq)
Output -
Two labels are converted to Five : [17 18 19 51 52]
Three labels are remain three : [20 36 5]
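The extra rows can be explained with a small pure-Python sketch that roughly imitates how the Keras Tokenizer splits text into words. The `approx_word_split` helper is hypothetical and only an approximation written for illustration; the filter string is the default value documented for the Tokenizer's `filters` argument, and both - and / are in it.

```python
# Default value of the Tokenizer `filters` argument, per the Keras documentation.
DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def approx_word_split(text, filters=DEFAULT_FILTERS):
    """Approximate Keras word splitting: lowercase, map filter chars to spaces, split."""
    table = str.maketrans({ch: " " for ch in filters})
    return text.lower().translate(table).split()


labels = [
    "Bio-PesticidesandBio-Fertilizers",
    "Old/SenileOrchardRejuvenation",
    "SoilHealthCard",
]
for label in labels:
    words = approx_word_split(label)
    print(label, "->", len(words), "token(s):", words)
```

Under this approximation the first label yields three words and the second yields two, which matches the five values printed above for two labels, while a label without special characters stays a single word.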
Please preprocess train_labels to remove these special characters, then process the labels with np.hstack(label_tokenizer.texts_to_sequences(train_labels)). After that your model.fit() should run fine.
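One possible way to do that preprocessing is sketched below, assuming it is acceptable to drop every character that is not a letter or digit. The `clean_label` helper is a hypothetical name used for illustration and is not part of Keras or of your program.

```python
import re


def clean_label(label):
    """Remove every character that is not a letter or digit so the label stays one token."""
    return re.sub(r"[^0-9A-Za-z]", "", label)


train_labels = [
    "Bio-PesticidesandBio-Fertilizers",
    "Old/SenileOrchardRejuvenation",
    "SoilHealthCard",
]
cleaned_labels = [clean_label(label) for label in train_labels]
print(cleaned_labels)

# Sanity check before tokenizing: every cleaned label must be non-empty and purely
# alphanumeric, otherwise it could still map to zero or several tokens.
assert all(label.isalnum() for label in cleaned_labels)
```

After tokenizing the cleaned labels, comparing len() of the features with len() of the label array is a cheap way to confirm that both share the same first dimension before calling model.fit().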
Hope this answers your question. Happy learning.
What does label_tokenizer.texts_to_sequences do? - fireball.1