使用位向量来确定所有字符是否唯一的含义解释

203

我对位向量如何实现此操作感到困惑(对位向量不太熟悉)。 这是给出的代码。 请有人能帮我解释一下吗?

public static boolean isUniqueChars(String str) {
    int checker = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - 'a';
        if ((checker & (1 << val)) > 0) return false;
        checker |= (1 << val);
    }
    return true;
}

具体而言,checker在做什么?


132
这段代码已经从《程序员面试金典》中摘取。 - Dejell
3
你测试过这个吗?看起来它无法检测到重复的 'a' 字符,因为它被设置为 0 并且左移后仍将保持为 0。 - Riz
3
请注意,该解决方案用于小写字母a-z,这意味着我们将其用于查找26个字符的重复性。因此,在这里可以使用32位的int。如果范围更大,则该解决方案将无法工作。 - a3.14_Infinity
如果你的字母表比向量中使用的位数更大,那么假设使用类似 BitSet 的类会更好吗? - losmescaleros
3
人们常犯的错误是混淆左移运算符语法——它是将数字1向左移动x(=str.charAt(i) - 'a')个位置,而不是将x的位向左移动1个位置。 - nanosoft
显示剩余2条评论
12个回答

286
我有一种隐约的感觉,你从我正在阅读的同一本书中获取了这段代码...这里的代码本身并不像运算符 |=, &, 和 << 那样晦涩难懂,这些运算符通常我们这些平民不会用到——作者没有花费额外的时间来解释过程或实际机制。起初我对这个主题的先前回答感到满意,但只是在抽象层面上。我回来是因为我觉得需要一个更具体的解释——缺乏解释总是让我感到不安。
这个运算符 << 是一个左位移运算符,它将该数字或操作数的二进制表示向左移动指定的位数,就像十进制数中的那样,只不过是在二进制数中进行的,当我们向上移动多少位时,我们在以2为底的情况下进行乘法运算,而不是在以10为基数的情况下,所以右侧的数字是指数,左侧的数字是2的基础倍数。
这个运算符 |=(称为按位或赋值)将左侧的操作数与右侧的操作数进行或运算,并将结果赋给左侧的操作数(x |= y相当于 x = x | y)。同样地,运算符('&')将运算符的左侧与右侧进行'与'运算。这也有一个按位与赋值(x &= y相当于x = x & y)。
所以在这里,我们有一个哈希表,它被存储在一个32位二进制数中,每次检查器与分配给字母的指定二进制值进行或运算时,对应的位就被设置为true。字符的值与检查器进行'与'运算(checker & (1 << val)) > 0)-如果它大于0,我们知道我们有一个重复项-因为两个相同的位都设置为true并'd'在一起将返回true或'1'。
有26个二进制位置,每个位置对应一个小写字母——作者确实说要假设字符串只包含小写字母——这是因为我们只剩下6个(在32位整数中)位置可用了,然后我们就会发生冲突。
00000000000000000000000000000001 a 2^0

00000000000000000000000000000010 b 2^1

00000000000000000000000000000100 c 2^2

00000000000000000000000000001000 d 2^3

00000000000000000000000000010000 e 2^4

00000000000000000000000000100000 f 2^5

00000000000000000000000001000000 g 2^6

00000000000000000000000010000000 h 2^7

00000000000000000000000100000000 i 2^8

00000000000000000000001000000000 j 2^9

00000000000000000000010000000000 k 2^10

00000000000000000000100000000000 l 2^11

00000000000000000001000000000000 m 2^12

00000000000000000010000000000000 n 2^13

00000000000000000100000000000000 o 2^14

00000000000000001000000000000000 p 2^15

00000000000000010000000000000000 q 2^16

00000000000000100000000000000000 r 2^17

00000000000001000000000000000000 s 2^18

00000000000010000000000000000000 t 2^19

00000000000100000000000000000000 u 2^20

00000000001000000000000000000000 v 2^21

00000000010000000000000000000000 w 2^22

00000000100000000000000000000000 x 2^23

00000001000000000000000000000000 y 2^24

00000010000000000000000000000000 z 2^25

因此,对于输入字符串'azya',当我们逐步移动时,
字符串'a'
a      =00000000000000000000000000000001
checker=00000000000000000000000000000000

checker='a' or checker;
// checker now becomes = 00000000000000000000000000000001
checker=00000000000000000000000000000001

a and checker=0 no dupes condition

字符串 'az'

checker=00000000000000000000000000000001
z      =00000010000000000000000000000000

z and checker=0 no dupes 

checker=z or checker;
// checker now becomes 00000010000000000000000000000001  
      

字符串 'azy'

checker= 00000010000000000000000000000001    
y      = 00000001000000000000000000000000 

checker and y=0 no dupes condition 

checker= checker or y;
// checker now becomes = 00000011000000000000000000000001

字符串 'azya'

checker= 00000011000000000000000000000001
a      = 00000000000000000000000000000001

a and checker=1 we have a dupe

现在,它声明了一个副本

@ivan-tichy 你测试过这个吗?看起来它将无法检测到重复的 'a' 字符,因为它被设置为0并且左移它仍然保持为0。 - Riz
2
@Riz 不,它始终从“1”开始,算法基于字母进行移位。因此,如果字母“a”出现一次,它将是1,即(....000001)。 - Taylor Halliday
5
@Ivan Man,我也在想同样的事情。即使选定的答案没有解释运算符,感谢您提供了详细的信息。 - WowBow
我应该假设上述唯一性检查仅适用于排序字符集(abcd...z)?而不是(bcad...)? - abdul rashid
我有一种隐秘的怀疑,你从我正在阅读的同一本书中得到了这段代码。同感 :) 让我笑了。 - backbone

116
这里的int checker被用作位的存储器。整数值中的每一位都可以被视为一个标志,因此int实际上是一个位(标志)数组。您可以使用位向量来代替int,原因如下:
  • 大小int有固定的大小, 通常为4个字节,即8个* 4 = 32位(标志)。位向量通常可以具有不同的大小或在构造函数中指定大小。

  • API:使用位向量可以让您编写更易读的代码,类似于:

    vector.SetFlag(4, true); //将索引4处的标志设置为true

    对于int,您需要编写较低级别的位逻辑代码:

    checker |= (1 << 5); //将索引5处的标志设置为true

此外,int可能会更快一些,因为位操作非常底层,可以由CPU直接执行。BitVector允许编写稍微不那么难懂的代码,而且它可以存储更多的标志。
供以后参考:位向量也称为bitSet或bitArray。以下是不同语言/平台的此数据结构的一些链接:

Java有BitVector类吗?我找不到任何相关文档! - Dejell
大小具有固定的大小,为32位。这是否意味着它只能测试32个字符的唯一性?我已经测试过了,这个函数可以测试"abcdefgZZ"是false,但"abcdefg@@"返回true。 - alwaysday1
2
谷歌把我带到了这里。@Dejel 这是你可以使用的Java数据结构:http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html。希望这能帮助那些穿越互联网的人。 - nattyddubbs
@nattyddubbs,谢谢,我已经将这个和其他几个链接添加到答案中。 - Snowbear

88

我认为所有这些答案都解释了它是如何工作的,然而我想给出我的意见,通过重新命名一些变量,添加一些其他变量,并对其添加注释,让它更易于理解:

public static boolean isUniqueChars(String str) {

    /*
    checker is the bit array, it will have a 1 on the character index that
    has appeared before and a 0 if the character has not appeared, you
    can see this number initialized as 32 0 bits:
    00000000 00000000 00000000 00000000
     */
    int checker = 0;

    //loop through each String character
    for (int i = 0; i < str.length(); ++i) {
        /*
        a through z in ASCII are charactets numbered 97 through 122, 26 characters total
        with this, you get a number between 0 and 25 to represent each character index
        0 for 'a' and 25 for 'z'

        renamed 'val' as 'characterIndex' to be more descriptive
         */
        int characterIndex = str.charAt(i) - 'a'; //char 'a' would get 0 and char 'z' would get 26

        /*
        created a new variable to make things clearer 'singleBitOnPosition'

        It is used to calculate a number that represents the bit value of having that 
        character index as a 1 and the rest as a 0, this is achieved
        by getting the single digit 1 and shifting it to the left as many
        times as the character index requires
        e.g. character 'd'
        00000000 00000000 00000000 00000001
        Shift 3 spaces to the left (<<) because 'd' index is number 3
        1 shift: 00000000 00000000 00000000 00000010
        2 shift: 00000000 00000000 00000000 00000100
        3 shift: 00000000 00000000 00000000 00001000

        Therefore the number representing 'd' is
        00000000 00000000 00000000 00001000

         */
        int singleBitOnPosition = 1 << characterIndex;

        /*
        This peforms an AND between the checker, which is the bit array
        containing everything that has been found before and the number
        representing the bit that will be turned on for this particular
        character. e.g.
        if we have already seen 'a', 'b' and 'd', checker will have:
        checker = 00000000 00000000 00000000 00001011
        And if we see 'b' again:
        'b' = 00000000 00000000 00000000 00000010

        it will do the following:
        00000000 00000000 00000000 00001011
        & (AND)
        00000000 00000000 00000000 00000010
        -----------------------------------
        00000000 00000000 00000000 00000010

        Since this number is different than '0' it means that the character
        was seen before, because on that character index we already have a 
        1 bit value
         */
        if ((checker & singleBitOnPosition) > 0) {
            return false;
        }

        /* 
        Remember that 
        checker |= singleBitOnPosition is the same as  
        checker = checker | singleBitOnPosition
        Sometimes it is easier to see it expanded like that.

        What this achieves is that it builds the checker to have the new 
        value it hasnt seen, by doing an OR between checker and the value 
        representing this character index as a 1. e.g.
        If the character is 'f' and the checker has seen 'g' and 'a', the 
        following will happen

        'f' = 00000000 00000000 00000000 00100000
        checker(seen 'a' and 'g' so far) = 00000000 00000000 00000000 01000001

        00000000 00000000 00000000 00100000
        | (OR)
        00000000 00000000 00000000 01000001
        -----------------------------------
        00000000 00000000 00000000 01100001

        Therefore getting a new checker as 00000000 00000000 00000000 01100001

         */
        checker |= singleBitOnPosition;
    }
    return true;
}

5
很好的解释。谢谢! - Hormigas
2
清晰的解释。谢谢。 - Prabhaker A
2
很棒的解释。易于理解。谢谢。 - Anil Kumar
4
这正是评论被发明的原因。 - Mr. Suryaa Jha
2
这一定是被接受的答案。非常感谢你提供这个很棒的解释! - Mohamad Ghaith Alzin
显示剩余6条评论

31
"I also assume that your example comes from the book Cracking The Code Interview and my answer is related to this context."
"To use this algorithm to solve the problem, we only need to pass lowercase characters from a to z. Since there are only 26 letters and they are properly sorted in the encoding table we use, this guarantees that all potential differences str.charAt(i) - 'a' will be less than 32 (the size of the int variable checker)."
"As explained by Snowbear, we will use the checker variable as an array of bits. Let's look at an example:"
"Suppose str equals "test"."
"On the first pass (i = t):"
"
" "

checker == 0 (00000000000000000000000000000000)

" "
"
In ASCII, val = str.charAt(i) - 'a' = 116 - 97 = 19
What about 1 << val ?
1          == 00000000000000000000000000000001
1 << 19    == 00000000000010000000000000000000
checker |= (1 << val) means checker = checker | (1 << val)
so checker = 00000000000000000000000000000000 | 00000000000010000000000000000000
checker == 524288 (00000000000010000000000000000000)
  • 第二遍循环 (i = e)

checker == 524288 (00000000000010000000000000000000)

val = 101 - 97 = 4
1          == 00000000000000000000000000000001
1 << 4     == 00000000000000000000000000010000
checker |= (1 << val) 
so checker = 00000000000010000000000000000000 | 00000000000000000000000000010000
checker == 524304 (00000000000010000000000000010000)

一直这样循环,直到我们通过条件在checker中找到特定字符的已设置位为止。

(checker & (1 << val)) > 0

希望能帮到您

2
在我看来,这个解释比其他的好多了。但是有一件事我还是不理解,那就是 checker = 00000000000010000000000000000000 | 00000000000000000000000000010000 中的“|=”不是按位或运算符吗?难道不应该选取一个值或另一个值吗?为什么要同时使用和设置两个位呢? - CodeCrack
@CodeCrack 你说它是按位或运算。它是在比较位级别而不是位数组级别。注意:int是位数组。 - MusicMan

8
阅读Ivan上面的回答真的帮助了我,尽管我会用稍微不同的措辞来表达。在(1 << val)中的<<是一个位移运算符。它将1(在二进制中表示为000000001,并且有越多的前导零/内存分配,您就可以使用/)向左移动val个位置。因为我们假设只有a-z并且每次减去a,所以每个字母的值都是0-25,这将是该字母在checker整数的布尔表示中从右侧计算的索引,因为我们将1向左移动val次。
在每次检查结束时,我们看到|=运算符。这合并了两个二进制数,如果在该索引处存在于任一操作数中,则将所有0替换为1。这里,这意味着在(1 << val)中存在1的任何位置,该1将被复制到checker中,同时保留checker现有的所有1。
您可能已经猜到,1在这里作为true的布尔标志。当我们检查字符是否已经在字符串中表示时,我们将checker与当前字符的索引处具有1标志的布尔值数组进行比较,此时checker基本上是已经表示的字符的索引处的布尔标志数组(1值)。
&运算符完成此检查。类似于|=,&运算符仅在两个操作数在该索引处都具有1时才复制1。因此,在checker中已经存在并且在(1 << val)中也表示的标志将被复制过来。在这种情况下,仅当当前字符已经表示时,checker & (1 << val)的结果中会有1存在。如果该操作的结果中存在1,则返回的布尔值的值>0,并且该方法返回false。
这就是我猜测位向量也称为位数组的原因。因为尽管它们不属于数组数据类型,但可以类似于使用数组的方式来存储布尔标志。

2
非常有帮助,谢谢你的Java信息点滴。 - Bachiri Taoufiq Abderrahman

7
已经有几个非常好的答案了,所以我不想重复已经说过的内容。但是,我想添加一些东西来帮助解决上面的程序,因为我刚刚完成了同样的程序,并且有几个问题,但是经过一段时间的研究,我对这个程序更加清楚了。
首先,“checker”用于跟踪在字符串中已经遍历过的字符,以查看是否有任何字符被重复使用。
现在,“checker”是一个int数据类型,因此它只能有32位或4字节(取决于平台),因此该程序只能正确地处理范围内有32个字符的字符集。这就是为什么该程序从每个字符中减去'a',以使该程序仅针对小写字符运行。但是,如果您混合使用小写和大写字符,则该程序将无法正常工作。
顺便说一句,如果您不从每个字符中减去'a'(请参见下面的语句),则该程序仅对仅包含大写字符或仅包含小写字符的字符串正确工作。因此,上述程序的范围从仅限小写字符扩展到了大写字符,但它们不能混合使用。
int val = str.charAt(i) - 'a'; 

然而,我想编写一个通用程序,使用位运算来处理任何ASCII字符,而不用担心大小写、数字或任何特殊字符。为了做到这一点,我们的“检查器”应该足够大,能够存储256个字符(ASCII字符集大小)。但是,在Java中,int类型无法胜任此项工作,因为它只能存储32位。因此,在下面的程序中,我使用了JDK中可在实例化BitSet对象时传递任何用户定义大小的BitSet类。

以下是一个程序,它使用位运算符执行与上述程序相同的操作,但该程序将适用于具有ASCII字符集中任何字符的字符串。

public static boolean isUniqueStringUsingBitVectorClass(String s) {

    final int ASCII_CHARACTER_SET_SIZE = 256;

    final BitSet tracker = new BitSet(ASCII_CHARACTER_SET_SIZE);

    // if more than  256 ASCII characters then there can't be unique characters
    if(s.length() > 256) {
        return false;
    }

    //this will be used to keep the location of each character in String
    final BitSet charBitLocation = new BitSet(ASCII_CHARACTER_SET_SIZE);

    for(int i = 0; i < s.length(); i++) {

        int charVal = s.charAt(i);
        charBitLocation.set(charVal); //set the char location in BitSet

        //check if tracker has already bit set with the bit present in charBitLocation
        if(tracker.intersects(charBitLocation)) {
            return false;
        }

        //set the tracker with new bit from charBitLocation
        tracker.or(charBitLocation);

        charBitLocation.clear(); //clear charBitLocation to store bit for character in the next iteration of the loop

    }

    return true;

}

1
我一直在寻找这个解决方案,但是没有必要使用两个BitSet变量,只需要一个tracker就足够了。更新的for循环代码:for(int i = 0; i < s.length(); i++) { int charVal = s.charAt(i); if(tracker.get(charVal)) { return false; } tracker.set(charVal); } - zambro

5

简单的解释(下面附有JS代码)

  • 每个机器码中的整数变量是一个32位数组
  • 所有位运算都是32位
  • 它们不关心语言的操作系统/ CPU架构或所选数字系统,例如JS的DEC64
  • 这种查找重复项的方法类似于在大小为32的数组中存储字符,其中,如果在字符串中找到a,则设置0th索引,对于b设置1st等等。
  • 字符串中的重复字符将占用其对应的位,或者在这种情况下,设置为1。
  • Ivan已经解释了:如何在先前的答案中计算此索引

操作摘要:

  • 在字符的checkerindex之间执行AND操作
  • 内部两者均为Int-32-Arrays
  • 这是两者之间的位运算。
  • 检查操作的输出是否为1
  • 如果output == 1
    • checker变量在两个数组中都设置了该特定索引位
    • 因此它是重复的。
  • 如果output == 0
    • 到目前为止还没有找到这个字符
    • 在字符的checkerindex之间执行OR操作
    • 从而将索引位更新为1
    • 将输出分配给checker

假设:

  • 我们假设我们会得到所有小写字符
  • 并且,大小为32足够了
  • 因此,我们从a的ASCII码为97考虑96作为参考点开始计数索引

下面是JavaScript源代码。

function checkIfUniqueChars (str) {

    var checker = 0; // 32 or 64 bit integer variable 

    for (var i = 0; i< str.length; i++) {
        var index = str[i].charCodeAt(0) - 96;
        var bitRepresentationOfIndex = 1 << index;

        if ( (checker & bitRepresentationOfIndex) > 1) {
            console.log(str, false);
            return false;
        } else {
            checker = (checker | bitRepresentationOfIndex);
        }
    }
    console.log(str, true);
    return true;
}

checkIfUniqueChars("abcdefghi");  // true
checkIfUniqueChars("aabcdefghi"); // false
checkIfUniqueChars("abbcdefghi"); // false
checkIfUniqueChars("abcdefghii"); // false
checkIfUniqueChars("abcdefghii"); // false

注意:在JS中,尽管整数为64位,但按位操作始终在32位上执行。

示例: 如果字符串是aa,则:

// checker is intialized to 32-bit-Int(0)
// therefore, checker is
checker= 00000000000000000000000000000000

i = 0

str[0] is 'a'
str[i].charCodeAt(0) - 96 = 1

checker 'AND' 32-bit-Int(1) = 00000000000000000000000000000000
Boolean(0) == false

// So, we go for the '`OR`' operation.

checker = checker OR 32-bit-Int(1)
checker = 00000000000000000000000000000001

i = 1

str[1] is 'a'
str[i].charCodeAt(0) - 96 = 1

checker= 00000000000000000000000000000001
a      = 00000000000000000000000000000001

checker 'AND' 32-bit-Int(1) = 00000000000000000000000000000001
Boolean(1) == true
// We've our duplicate now

3

让我们逐行分解代码。

int checker = 0; 我们初始化了一个checker,它将帮助我们找到重复的值。

int val = str.charAt(i) - 'a'; 我们正在获取字符串中“i”位置处字符的ASCII值,并将其与“a”的ASCII值相减。由于假设该字符串仅包含小写字母,因此字符数限制为26个。因此,“val”的值始终为≥0。

if ((checker & (1 << val)) > 0) return false;

checker |= (1 << val);

现在是棘手的部分。让我们以字符串“abcda”为例。这应该理想地返回false。

for循环迭代1:

Checker:00000000000000000000000000000000

val:97-97=0

1 << 0:00000000000000000000000000000001

checker & (1 << val):00000000000000000000000000000000不大于0

因此checker:00000000000000000000000000000001

for循环迭代2:

Checker:00000000000000000000000000000001

val:98-97=1

1 << 1:00000000000000000000000000000010

checker & (1 << val):00000000000000000000000000000000不大于0

因此checker:00000000000000000000000000000011

for循环迭代3:

Checker:00000000000000000000000000000011

val:99-97=2

1 << 2:00000000000000000000000000000100

checker & (1 << val):00000000000000000000000000000000不大于0

因此checker:00000000000000000000000000000111

for循环迭代4:

Checker:00000000000000000000000000000111

val:100-97=3

1 << 3:00000000000000000000000000001000

检查器& (1 << val) :00000000000000000000000000000000 不大于 0

因此,检查器:00000000000000000000000000001111

For循环迭代5:

检查器:00000000000000000000000000001111

val:97-97 = 0

1 << 0: 00000000000000000000000000000001

检查器 & (1 << val) :00000000000000000000000000000001 大于 0

因此返回false。


1
val: 99-97 = 0 应改为 val: 99-97 = 2,val: 100-97 = 0 应改为 3。 - Brosef

2
public static void main (String[] args)
{
    //In order to understand this algorithm, it is necessary to understand the following:

    //int checker = 0;
    //Here we are using the primitive int almost like an array of size 32 where the only values can be 1 or 0
    //Since in Java, we have 4 bytes per int, 8 bits per byte, we have a total of 4x8=32 bits to work with

    //int val = str.charAt(i) - 'a';
    //In order to understand what is going on here, we must realize that all characters have a numeric value
    for (int i = 0; i < 256; i++)
    {
        char val = (char)i;
        System.out.print(val);
    }

    //The output is something like:
    //             !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
    //There seems to be ~15 leading spaces that do not copy paste well, so I had to use real spaces instead

    //To only print the characters from 'a' on forward:
    System.out.println();
    System.out.println();

    for (int i=0; i < 256; i++)
    {
        char val = (char)i;
        //char val2 = val + 'a'; //incompatible types. required: char found: int
        int val2 = val + 'a';  //shift to the 'a', we must use an int here otherwise the compiler will complain
        char val3 = (char)val2;  //convert back to char. there should be a more elegant way of doing this.
        System.out.print(val3);
    }

    //Notice how the following does not work:
    System.out.println();
    System.out.println();

    for (int i=0; i < 256; i++)
    {
        char val = (char)i;
        int val2 = val - 'a';
        char val3 = (char)val2;
        System.out.print(val3);
    }
    //I'm not sure why this spills out into 2 lines:
    //EDIT I cant seem to copy this into stackoverflow!

    System.out.println();
    System.out.println();

    //So back to our original algorithm:
    //int val = str.charAt(i) - 'a';
    //We convert the i'th character of the String to a character, and shift it to the right, since adding shifts to the right and subtracting shifts to the left it seems

    //if ((checker & (1 << val)) > 0) return false;
    //This line is quite a mouthful, lets break it down:
    System.out.println(0<<0);
    //00000000000000000000000000000000
    System.out.println(0<<1);
    //00000000000000000000000000000000
    System.out.println(0<<2);
    //00000000000000000000000000000000
    System.out.println(0<<3);
    //00000000000000000000000000000000
    System.out.println(1<<0);
    //00000000000000000000000000000001
    System.out.println(1<<1);
    //00000000000000000000000000000010 == 2
    System.out.println(1<<2);
    //00000000000000000000000000000100 == 4
    System.out.println(1<<3);
    //00000000000000000000000000001000 == 8
    System.out.println(2<<0);
    //00000000000000000000000000000010 == 2
    System.out.println(2<<1);
    //00000000000000000000000000000100 == 4
    System.out.println(2<<2);
    // == 8
    System.out.println(2<<3);
    // == 16
    System.out.println("3<<0 == "+(3<<0));
    // != 4 why 3???
    System.out.println(3<<1);
    //00000000000000000000000000000011 == 3
    //shift left by 1
    //00000000000000000000000000000110 == 6
    System.out.println(3<<2);
    //00000000000000000000000000000011 == 3
    //shift left by 2
    //00000000000000000000000000001100 == 12
    System.out.println(3<<3);
    // 24

    //It seems that the -  'a' is not necessary
    //Back to if ((checker & (1 << val)) > 0) return false;
    //(1 << val means we simply shift 1 by the numeric representation of the current character
    //the bitwise & works as such:
    System.out.println();
    System.out.println();
    System.out.println(0&0);    //0
    System.out.println(0&1);       //0
    System.out.println(0&2);          //0
    System.out.println();
    System.out.println();
    System.out.println(1&0);    //0
    System.out.println(1&1);       //1
    System.out.println(1&2);          //0
    System.out.println(1&3);             //1
    System.out.println();
    System.out.println();
    System.out.println(2&0);    //0
    System.out.println(2&1);       //0   0010 & 0001 == 0000 = 0
    System.out.println(2&2);          //2  0010 & 0010 == 2
    System.out.println(2&3);             //2  0010 & 0011 = 0010 == 2
    System.out.println();
    System.out.println();
    System.out.println(3&0);    //0    0011 & 0000 == 0
    System.out.println(3&1);       //1  0011 & 0001 == 0001 == 1
    System.out.println(3&2);          //2  0011 & 0010 == 0010 == 2, 0&1 = 0 1&1 = 1
    System.out.println(3&3);             //3 why?? 3 == 0011 & 0011 == 3???
    System.out.println(9&11);   // should be... 1001 & 1011 == 1001 == 8+1 == 9?? yay!

    //so when we do (1 << val), we take 0001 and shift it by say, 97 for 'a', since any 'a' is also 97

    //why is it that the result of bitwise & is > 0 means its a dupe?
    //lets see..

    //0011 & 0011 is 0011 means its a dupe
    //0000 & 0011 is 0000 means no dupe
    //0010 & 0001 is 0011 means its no dupe
    //hmm
    //only when it is all 0000 means its no dupe

    //so moving on:
    //checker |= (1 << val)
    //the |= needs exploring:

    int x = 0;
    int y = 1;
    int z = 2;
    int a = 3;
    int b = 4;
    System.out.println("x|=1 "+(x|=1));  //1
    System.out.println(x|=1);     //1
    System.out.println(x|=1);      //1
    System.out.println(x|=1);       //1
    System.out.println(x|=1);       //1
    System.out.println(y|=1); // 0001 |= 0001 == ?? 1????
    System.out.println(y|=2); // ??? == 3 why??? 0001 |= 0010 == 3... hmm
    System.out.println(y);  //should be 3?? 
    System.out.println(y|=1); //already 3 so... 0011 |= 0001... maybe 0011 again? 3?
    System.out.println(y|=2); //0011 |= 0010..... hmm maybe.. 0011??? still 3? yup!
    System.out.println(y|=3); //0011 |= 0011, still 3
    System.out.println(y|=4);  //0011 |= 0100.. should be... 0111? so... 11? no its 7
    System.out.println(y|=5);  //so we're at 7 which is 0111, 0111 |= 0101 means 0111 still 7
    System.out.println(b|=9); //so 0100 |= 1001 is... seems like xor?? or just or i think, just or... so its 1101 so its 13? YAY!

    //so the |= is just a bitwise OR!
}

public static boolean isUniqueChars(String str) {
    int checker = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - 'a';  //the - 'a' is just smoke and mirrors! not necessary!
        if ((checker & (1 << val)) > 0) return false;
        checker |= (1 << val);
    }
    return true;
}

public static boolean is_unique(String input)
{
    int using_int_as_32_flags = 0;
    for (int i=0; i < input.length(); i++)
    {
        int numeric_representation_of_char_at_i = input.charAt(i);
        int using_0001_and_shifting_it_by_the_numeric_representation = 1 << numeric_representation_of_char_at_i; //here we shift the bitwise representation of 1 by the numeric val of the character
        int result_of_bitwise_and = using_int_as_32_flags & using_0001_and_shifting_it_by_the_numeric_representation;
        boolean already_bit_flagged = result_of_bitwise_and > 0;              //needs clarification why is it that the result of bitwise & is > 0 means its a dupe?
        if (already_bit_flagged)
            return false;
        using_int_as_32_flags |= using_0001_and_shifting_it_by_the_numeric_representation;
    }
    return true;
}

1

之前的帖子已经很好地解释了代码块的功能,我想使用Java数据结构BitSet来添加我的简单解决方案:

private static String isUniqueCharsUsingBitSet(String string) {
  BitSet bitSet =new BitSet();
    for (int i = 0; i < string.length(); ++i) {
        int val = string.charAt(i);
        if(bitSet.get(val)) return "NO";
        bitSet.set(val);
    }
  return "YES";
}

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接