I have three pandas dataframes similar to these:
#0
A C G T uA uC uG uT cmA cmC cmG cmT
seq_1_0 47.0 47.0 54.0 52.0 100.978723 100.957447 100.370370 99.788462 5147.0 5144.0 5055.0 4968.0
seq_1_50 47.0 47.0 54.0 52.0 101.829787 101.680851 99.092593 99.692308 5279.0 5256.0 4864.0 4953.0
seq_2_0 47.0 47.0 54.0 52.0 100.978723 100.957447 100.370370 99.788462 5147.0 5144.0 5055.0 4968.0
seq_2_50 47.0 47.0 54.0 52.0 101.468085 101.425532 99.000000 100.346154 5223.0 5216.0 4850.0 5052.0
seq_3_0 47.0 47.0 54.0 52.0 100.212766 99.680851 100.870370 101.115385 5030.0 4952.0 5131.0 5169.0
seq_3_50 46.0 47.0 53.0 54.0 100.173913 100.978723 100.924528 99.944444 5026.0 5148.0 5139.0 4990.0
seq_4_0 45.0 47.0 54.0 54.0 99.044444 99.000000 101.407407 102.111111 4856.0 4851.0 5214.0 5323.0
seq_4_50 47.0 47.0 53.0 53.0 101.872340 104.382979 97.849057 98.490566 5285.0 5686.0 4684.0 4776.0
seq_5_0 54.0 34.0 37.0 75.0 90.462963 91.647059 90.756757 116.546667 3700.0 3848.0 3737.0 7915.0
seq_5_50 48.0 33.0 37.0 82.0 94.937500 113.636364 113.162162 92.756098 4277.0 7337.0 7245.0 3990.0
seq_6_0 60.0 50.0 48.0 42.0 98.500000 93.900000 106.125000 104.785714 4777.0 4139.0 5976.0 5752.0
seq_6_50 59.0 46.0 52.0 43.0 98.338983 98.826087 102.615385 102.697674 4754.0 4825.0 5402.0 5415.0
#1
A C G T uA uC uG uT cmA cmC cmG cmT
seq_1_0 47.0 47.0 54.0 52.0 100.978723 100.957447 100.370370 99.788462 5147.0 5144.0 5055.0 4968.0
seq_1_50 47.0 47.0 54.0 52.0 101.829787 101.680851 99.092593 99.692308 5279.0 5256.0 4864.0 4953.0
seq_2_0 47.0 47.0 54.0 52.0 100.978723 100.957447 100.370370 99.788462 5147.0 5144.0 5055.0 4968.0
seq_2_50 47.0 47.0 54.0 52.0 101.468085 101.425532 99.000000 100.346154 5223.0 5216.0 4850.0 5052.0
seq_3_0 47.0 47.0 54.0 52.0 100.212766 99.680851 100.870370 101.115385 5030.0 4952.0 5131.0 5169.0
seq_3_50 46.0 47.0 53.0 54.0 100.173913 100.978723 100.924528 99.944444 5026.0 5148.0 5139.0 4990.0
seq_4_0 45.0 47.0 54.0 54.0 99.044444 99.000000 101.407407 102.111111 4856.0 4851.0 5214.0 5323.0
seq_4_50 47.0 47.0 53.0 53.0 101.872340 104.382979 97.849057 98.490566 5285.0 5686.0 4684.0 4776.0
seq_5_0 54.0 34.0 37.0 75.0 90.462963 91.647059 90.756757 116.546667 3700.0 3848.0 3737.0 7915.0
seq_5_50 48.0 33.0 37.0 82.0 94.937500 113.636364 113.162162 92.756098 4277.0 7337.0 7245.0 3990.0
#2
A C G T uA uC uG uT cmA cmC cmG cmT
seq_1_0 48.0 48.0 53.0 51.0 100.291667 99.208333 101.943396 100.411765 5042.0 4882.0 5297.0 5062.0
seq_1_50 48.0 47.0 54.0 51.0 100.083333 101.680851 99.092593 101.294118 5012.0 5256.0 4864.0 5196.0
seq_2_0 47.0 47.0 54.0 52.0 100.978723 100.957447 100.370370 99.788462 5147.0 5144.0 5055.0 4968.0
seq_2_50 47.0 47.0 54.0 52.0 101.468085 101.425532 99.000000 100.346154 5223.0 5216.0 4850.0 5052.0
seq_3_0 50.0 47.0 53.0 50.0 98.980000 99.680851 101.490566 101.740000 4847.0 4952.0 5226.0 5265.0
seq_3_50 49.0 47.0 52.0 52.0 95.857143 100.978723 102.519231 102.423077 4403.0 5148.0 5387.0 5371.0
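I want to compare all columns of the first dataframe (#0) against all columns of the other two dataframes (#1 and #2) to identify which indices have differing column values. For example, the indices seq_6_0 and seq_6_50 appear in dataframe #0 but not in the other two dataframes.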
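However, I also want to allow a tolerance on each column, so that values from different dataframes can still be treated as equal. For example:
index seq_1_0 in dataframe #0 has the following values: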
A C G T uA uC uG uT cmA cmC cmG cmT
47.0 47.0 54.0 52.0 100.978723 100.957447 100.370370 99.788462 5147.0 5144.0 5055.0 4968.0
while index seq_1_0 in dataframe #2 has these values:
A C G T uA uC uG uT cmA cmC cmG cmT
48.0 48.0 53.0 51.0 100.291667 99.208333 101.943396 100.411765 5042.0 4882.0 5297.0 5062.0
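I want to set a different tolerance for each column. For example, for the columns ["A","C","T","G"] I need a 90% tolerance between the compared values, but for the other columns I need different percentages.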
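One way to express per-column tolerances is a Series of allowed relative differences indexed by column name; pandas then aligns it with a row automatically. This is a sketch under two assumptions: "90% tolerance" is read as "at least 90% similar", i.e. a relative difference of at most 10% measured against the #0 value, and the 5% used for every other column is an arbitrary placeholder. The row values are copied from the seq_1_0 example above:

```python
import pandas as pd

cols = ["A", "C", "G", "T", "uA", "uC", "uG", "uT", "cmA", "cmC", "cmG", "cmT"]
# seq_1_0 in dataframe #0 and in dataframe #2 (values from the question)
row0 = pd.Series([47.0, 47.0, 54.0, 52.0, 100.978723, 100.957447, 100.370370,
                  99.788462, 5147.0, 5144.0, 5055.0, 4968.0], index=cols)
row2 = pd.Series([48.0, 48.0, 53.0, 51.0, 100.291667, 99.208333, 101.943396,
                  100.411765, 5042.0, 4882.0, 5297.0, 5062.0], index=cols)

# Assumed tolerances: maximum allowed relative difference per column.
tol = pd.Series(0.05, index=cols)            # placeholder 5% for most columns
tol.loc[["A", "C", "G", "T"]] = 0.10         # "90%" read as <= 10% difference

# Relative difference measured against the #0 value, aligned on column names
rel_diff = (row2 - row0).abs() / row0.abs()
within = rel_diff <= tol

print(within.all())                    # False: one column is out of tolerance
print(within[~within].index.tolist())  # the offending column(s)
```

With these particular tolerances only cmC fails (|4882 - 5144| / 5144 is about 5.1%, just above 5%), so the two rows would count as different.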
Is there any pandas function I can use to achieve this?
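As far as I know there is no single built-in pandas function that takes per-column tolerances and reports differing rows (pandas.testing.assert_frame_equal accepts only scalar rtol/atol and raises instead of returning the rows). A small helper built from vectorized operations can do it, though. The function name rows_outside_tolerance is hypothetical, and it assumes the same relative-difference reading of "tolerance" (measured against the #0 value):

```python
import pandas as pd

def rows_outside_tolerance(ref: pd.DataFrame, other: pd.DataFrame,
                           tol: pd.Series) -> pd.Index:
    """Hypothetical helper: labels of `ref` missing from `other`, or whose
    relative difference exceeds `tol` (indexed by column name) in any column."""
    common = ref.index.intersection(other.index)
    a = ref.loc[common, tol.index]
    b = other.loc[common, tol.index]
    # Relative difference w.r.t. the reference; a zero reference value gives
    # inf (flagged) or NaN for 0/0 (not flagged, since NaN > x is False).
    rel_diff = (b - a).abs().div(a.abs())
    # gt(..., axis="columns") aligns the tolerance Series on column names
    bad = rel_diff.gt(tol, axis="columns").any(axis=1)
    missing = ref.index.difference(other.index)
    return missing.union(bad[bad].index)

# Toy data taken from the question (two columns only, for brevity)
df0 = pd.DataFrame({"A": [47.0, 60.0], "cmC": [5144.0, 4139.0]},
                   index=["seq_1_0", "seq_6_0"])
df1 = df0.loc[["seq_1_0"]]  # identical row, seq_6_0 absent
df2 = pd.DataFrame({"A": [48.0], "cmC": [4882.0]}, index=["seq_1_0"])
tol = pd.Series({"A": 0.10, "cmC": 0.05})  # assumed tolerances

for name, other in [("#1", df1), ("#2", df2)]:
    print(name, list(rows_outside_tolerance(df0, other, tol)))
```

Here seq_6_0 is reported against both frames because it is missing, and seq_1_0 is reported only against #2 because its cmC value differs by about 5.1%. Swapping the denominator (for example, the larger of the two absolute values) gives a symmetric comparison if that fits the data better.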
Thanks!