String subscripting and performance in Swift
A String is subscripted with a String.Index to get a Character. The String.Index must be obtained from the String itself.
    let greeting = "Guten Tag!"
    greeting[greeting.startIndex]                         // Character "G"
    greeting[greeting.index(before: greeting.endIndex)]   // Character "!"
    greeting[greeting.index(after: greeting.startIndex)]  // Character "u"
    let index = greeting.index(greeting.startIndex, offsetBy: 7)
    greeting[index]                                       // Character "a"
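Moving an index past the end of the string is a runtime error, so offset-based access is often wrapped in a bounds check. The following is a minimal sketch, assuming a Swift 4 or later toolchain. The helper name `character(in:atOffset:)` is hypothetical and not part of the standard library; it relies on the standard `index(_:offsetBy:limitedBy:)` method, which returns nil instead of trapping when the limit is passed.

```swift
// Hypothetical helper (not a standard-library API): returns the Character
// at an integer offset, or nil when the offset is out of bounds.
func character(in text: String, atOffset offset: Int) -> Character? {
    guard offset >= 0,
          let i = text.index(text.startIndex, offsetBy: offset, limitedBy: text.endIndex),
          i < text.endIndex
    else { return nil }
    return text[i]
}

let sampleGreeting = "Guten Tag!"
print(character(in: sampleGreeting, atOffset: 7) as Any)   // Optional("a")
print(character(in: sampleGreeting, atOffset: 42) as Any)  // nil
```

Each call still walks the string from its start, so the cost grows with the offset; the helper only adds safety, not speed.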
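Subscripting a String with a Range<String.Index> or a ClosedRange<String.Index> (both simply called Range below) yields a String. (From Swift 4 on, the result is a Substring, which can be turned back into a String with String(...).)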
    let str = "abc"
    str[str.startIndex..<str.index(after: str.startIndex)]  // String "a"
    str[str.startIndex...str.index(after: str.startIndex)]  // String "ab"

Character
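The characters property of a String returns a String.CharacterView, which represents the content as it is displayed on screen. A String.CharacterView is subscripted with a String.CharacterView.Index to get a Character, and that index must be obtained from the String.CharacterView itself. (This is the Swift 3 API. From Swift 4 on, String is itself a collection of Character and the characters property is deprecated.)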
    let str = "abc"
    let characters = str.characters       // String.CharacterView
    characters[characters.startIndex]     // Character "a"
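Note that String.CharacterView does not conform to RandomAccessCollection, so subscripting with a String.CharacterView.Index is not random access: reaching the index at offset n takes O(n) time. Also, String.CharacterView.Index and String.Index are the same type, a struct. The documentation for String.Index is found under the String documentation: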
    typealias Index = String.CharacterView.Index
Subscripting a String.CharacterView with a Range<String.CharacterView.Index> yields a String.CharacterView. A String can be created from either a Character or a String.CharacterView.
    let str = "abc"
    let characters = str.characters  // String.CharacterView
    let characters2 = characters[characters.startIndex..<characters.index(after: characters.startIndex)]  // String.CharacterView
    String(characters.first!) == String(characters2)  // true; characters.first! is a Character
An Array<Character> created from a String.CharacterView can be subscripted with an Int or a Range<Int>. A String can also be created from an Array<Character>.
    let str = "abc"
    let arr = Array(str.characters)  // Array<Character> ["a", "b", "c"]
    arr[1]                           // Character "b"
    arr[1...2]                       // ArraySlice<Character> ["b", "c"]
    String(arr)                      // String "abc"
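Converting to an array costs one O(n) pass plus extra memory, after which every Int subscript is O(1), so it pays off mainly when the same string is indexed many times. The following is a minimal sketch, assuming Swift 4 or later, where `Array(text)` takes the place of `Array(text.characters)`. The function `isPalindrome` is a hypothetical example chosen only to show repeated Int indexing.

```swift
let word = "abcdef"
let chars = Array(word)               // [Character], built in one O(n) pass

// Int subscripts on the array are constant time.
let middle = chars[chars.count / 2]   // "d"
let tail = String(chars[3...])        // "def"
print(middle, tail)

// Hypothetical example: a palindrome check driven by two Int cursors.
func isPalindrome(_ text: String) -> Bool {
    let c = Array(text)
    var lo = 0
    var hi = c.count - 1
    while lo < hi {
        if c[lo] != c[hi] { return false }
        lo += 1
        hi -= 1
    }
    return true
}

print(isPalindrome("level"))   // true
print(isPalindrome("swift"))   // false
```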
A Character can be compared directly with a literal such as "a".
    let str = "abc"
    let a = str[str.startIndex]                          // Character "a"
    let b = str[str.index(str.startIndex, offsetBy: 1)]  // Character "b"
    a == "a"                                             // true
    b > "a"                                              // true

UTF-8
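The utf8 property of a String returns a String.UTF8View, which represents the UTF-8 encoded content. Subscripting a String.UTF8View with a String.UTF8View.Index yields a UTF8.CodeUnit, which is really a UInt8; subscripting with a Range<String.UTF8View.Index> yields a String.UTF8View. The index must be obtained from the String.UTF8View itself. String.UTF8View does not conform to RandomAccessCollection, so subscripting with a String.UTF8View.Index is not random access. An Array<UInt8> created from a String.UTF8View can be subscripted with an Int or a Range<Int>. A String can be created from a String.UTF8View. A String can also be created from a UInt8 or an Array<UInt8>, but the result is the textual form of the number or the array of numbers, not the text that those numbers encode in UTF-8.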
    let str = "abc"
    let utf8 = str.utf8            // String.UTF8View
    let n = utf8[utf8.startIndex]  // UInt8 97
    let a = utf8[utf8.startIndex..<utf8.index(after: utf8.startIndex)]   // String.UTF8View "a"
    let ab = utf8[utf8.startIndex...utf8.index(after: utf8.startIndex)]  // String.UTF8View "ab"
    String(n)                      // "97", NOT "a"
    String(a)                      // "a"
    String(ab)                     // "ab"
    let arr = Array(utf8)          // Array<UInt8> [97, 98, 99]
    let n2 = arr[0]                // UInt8 97
    let arr2 = arr[0...1]          // ArraySlice<UInt8> [97, 98]
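To go the other way and turn code units back into text, the bytes have to be decoded rather than described. The following is a minimal sketch, assuming Swift 4 or later, where the standard library provides `String(decoding:as:)`; the variable names are illustrative only.

```swift
let byte: UInt8 = 97
let bytes: [UInt8] = [97, 98, 99]

// Describing the numbers gives their textual form.
let numberText = String(byte)               // "97"
let arrayText = String(describing: bytes)   // "[97, 98, 99]"

// Decoding the same numbers as UTF-8 code units gives the text they encode.
let letter = String(Character(UnicodeScalar(byte)))     // "a"
let decoded = String(decoding: bytes, as: UTF8.self)    // "abc"

print(numberText, arrayText, letter, decoded)
```

`String(decoding:as:)` replaces invalid byte sequences with U+FFFD instead of failing, so it is a reasonable default when the input is not guaranteed to be well-formed.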
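The utf8CString property of a String returns a ContiguousArray<CChar>, which is really a ContiguousArray<Int8>. It holds the UTF-8 encoded content followed by a terminating 0, so it is one element longer than the utf8 view. A ContiguousArray<Int8> can be subscripted with an Int or a Range<Int>, yielding an Int8 or an ArraySlice<Int8> respectively. ContiguousArray conforms to RandomAccessCollection, so subscripting with an Int is random access.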
    let str = "abc"
    let utf8 = str.utf8CString  // ContiguousArray<Int8> [97, 98, 99, 0]
    let a = utf8[0]             // Int8 97
    let ab = utf8[0...1]        // ArraySlice<Int8> [97, 98]

UTF-16
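The utf16 property of a String returns a String.UTF16View, which represents the UTF-16 encoded content. Subscripting a String.UTF16View with a String.UTF16View.Index yields a UTF16.CodeUnit, which is really a UInt16; subscripting with a Range<String.UTF16View.Index> yields a String.UTF16View. The index must be obtained from the String.UTF16View itself. String.UTF16View conforms to RandomAccessCollection, so subscripting with a String.UTF16View.Index is random access. An Array<UInt16> created from a String.UTF16View can be subscripted with an Int or a Range<Int>. A String can be created from a String.UTF16View. A String can also be created from a UInt16 or an Array<UInt16>, but the result is the textual form of the number or the array of numbers, not the text that those numbers encode in UTF-16.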
    let str = "abc"
    let utf16 = str.utf16            // String.UTF16View
    let n = utf16[utf16.startIndex]  // UInt16 97
    let a = utf16[utf16.startIndex..<utf16.index(after: utf16.startIndex)]   // String.UTF16View "a"
    let ab = utf16[utf16.startIndex...utf16.index(after: utf16.startIndex)]  // String.UTF16View "ab"
    String(n)                        // "97", NOT "a"
    String(a)                        // "a"
    String(ab)                       // "ab"
    let arr = Array(utf16)           // Array<UInt16> [97, 98, 99]
    let n2 = arr[0]                  // UInt16 97
    let arr2 = arr[0...1]            // ArraySlice<UInt16> [97, 98]

Performance comparison
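The following test measures the execution time of four operations on String, String.CharacterView, Array<Character>, String.UTF8View, Array<UInt8>, ContiguousArray<Int8>, String.UTF16View and Array<UInt16>: checking for emptiness (isEmpty), getting the length (count), subscripting a single position ([index]), and subscripting a range ([range]).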
Define the test types, a function that prints and updates the time, and the String under test:
    import Foundation

    // The operation being timed.
    enum TestType {
        case isEmpty
        case count
        case index
        case range
    }

    // Prints the seconds elapsed since `date`, then resets `date` to now.
    func printAndUpdateTime(_ date: inout Date) {
        let now = Date()
        print(now.timeIntervalSince(date))
        date = now
    }

    let s = "aasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafcpiluioufnlkqjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjliopjktyuljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasderwytwghfsdfsdfgfdsgvrutj7edbj7fdgotuyoergcwhmkl5lknjklqawkyrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcvcnvbwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjknfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgiopiouvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkfghngdljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmbkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqasdfsdwkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljdqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasddfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbsdfdsrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfsadfsdgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqsdfasjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdafgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlkasdfsdfsdfgfdsgvrutj7edbj7ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafciluioufnlkjvjakjnfnvjalgkhlkdkjlk"
Test code:
    let loopCount = 10000
    let index = s.characters.count / 2
    let testType: TestType = .range
    print(testType)
    var date = Date()

    // String itself (it has no count here, so the count case is skipped)
    forLoop: for _ in 0..<loopCount {
        switch testType {
        case .isEmpty: _ = s.isEmpty
        case .count: break forLoop
        case .index: _ = s[s.index(s.startIndex, offsetBy: index)]
        case .range:
            let endIndex = s.index(s.startIndex, offsetBy: index)
            _ = s[s.startIndex..<endIndex]
        }
    }
    if testType == .count {
        date = Date()
    } else {
        print("String")
        printAndUpdateTime(&date)
    }

    // String.CharacterView
    let characters = s.characters
    for _ in 0..<loopCount {
        switch testType {
        case .isEmpty: _ = characters.isEmpty
        case .count: _ = characters.count
        case .index: _ = characters[characters.index(characters.startIndex, offsetBy: index)]
        case .range:
            let endIndex = characters.index(characters.startIndex, offsetBy: index)
            _ = characters[characters.startIndex..<endIndex]
        }
    }
    print("Characters")
    printAndUpdateTime(&date)

    // Array<Character>
    let characterArr = Array(characters)
    for _ in 0..<loopCount {
        switch testType {
        case .isEmpty: _ = characterArr.isEmpty
        case .count: _ = characterArr.count
        case .index: _ = characterArr[index]
        case .range: _ = characterArr[0..<index]
        }
    }
    print("Characters array")
    printAndUpdateTime(&date)

    // String.UTF8View
    let utf8 = s.utf8
    for _ in 0..<loopCount {
        switch testType {
        case .isEmpty: _ = utf8.isEmpty
        case .count: _ = utf8.count
        case .index: _ = utf8[utf8.index(utf8.startIndex, offsetBy: index)]
        case .range:
            let endIndex = utf8.index(utf8.startIndex, offsetBy: index)
            _ = utf8[utf8.startIndex..<endIndex]
        }
    }
    print("UTF-8")
    printAndUpdateTime(&date)

    // Array<UInt8>
    let utf8Arr = Array(utf8)
    for _ in 0..<loopCount {
        switch testType {
        case .isEmpty: _ = utf8Arr.isEmpty
        case .count: _ = utf8Arr.count
        case .index: _ = utf8Arr[index]
        case .range: _ = utf8Arr[0..<index]
        }
    }
    print("UTF-8 array")
    printAndUpdateTime(&date)

    // ContiguousArray<Int8>
    let utf8CString = s.utf8CString
    for _ in 0..<loopCount {
        switch testType {
        case .isEmpty: _ = utf8CString.isEmpty
        case .count: _ = utf8CString.count
        case .index: _ = utf8CString[index]
        case .range: _ = utf8CString[0..<index]
        }
    }
    print("UTF-8 C string")
    printAndUpdateTime(&date)

    // String.UTF16View
    let utf16 = s.utf16
    for _ in 0..<loopCount {
        switch testType {
        case .isEmpty: _ = utf16.isEmpty
        case .count: _ = utf16.count
        case .index: _ = utf16[utf16.index(utf16.startIndex, offsetBy: index)]
        case .range:
            let endIndex = utf16.index(utf16.startIndex, offsetBy: index)
            _ = utf16[utf16.startIndex..<endIndex]
        }
    }
    print("UTF-16")
    printAndUpdateTime(&date)

    // Array<UInt16>
    let utf16Arr = Array(utf16)
    for _ in 0..<loopCount {
        switch testType {
        case .isEmpty: _ = utf16Arr.isEmpty
        case .count: _ = utf16Arr.count
        case .index: _ = utf16Arr[index]
        case .range: _ = utf16Arr[0..<index]
        }
    }
    print("UTF-16 array")
    printAndUpdateTime(&date)
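The repeated loop-then-print pattern above can be folded into a single helper. The following is a minimal sketch, assuming Swift 5. The function `measure(_:iterations:_:)` and the `sample` string are hypothetical stand-ins, not part of the original test, and the timings it prints depend on the toolchain and optimization level (an optimized build may remove work whose result is unused).

```swift
import Foundation

// Hypothetical helper: runs `body` `iterations` times, prints and returns
// the elapsed wall-clock time in seconds.
@discardableResult
func measure(_ label: String, iterations: Int = 10_000, _ body: () -> Void) -> TimeInterval {
    let start = Date()
    for _ in 0..<iterations {
        body()
    }
    let elapsed = Date().timeIntervalSince(start)
    print(label, elapsed)
    return elapsed
}

// Stand-in test data (1000 ASCII characters), not the string used above.
let sample = String(repeating: "abcdefghij", count: 100)
let half = sample.utf8.count / 2
let sampleUTF8 = sample.utf8
let sampleUTF8Array = Array(sample.utf8)

measure("UTF-8 view, single position") {
    _ = sampleUTF8[sampleUTF8.index(sampleUTF8.startIndex, offsetBy: half)]
}
measure("UTF-8 array, single position") {
    _ = sampleUTF8Array[half]
}
```

Passing the work as a closure keeps each case to one call, at the price of a small closure-call overhead in every iteration.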
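Test results
[Figure: timings for the empty check]
[Figure: timings for getting the length]
[Figure: timings for subscripting a single position]
[Figure: timings for subscripting a range]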
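In the comparison above, the fastest emptiness check is isEmpty on the String itself. For the other operations, the types that conform to RandomAccessCollection (ContiguousArray<Int8>, String.UTF16View and the Array types) are more efficient.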
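The role of RandomAccessCollection can also be seen at compile time by writing a generic function that only accepts conforming types. The following is a minimal sketch, assuming Swift 5; the function name `middleElement(of:)` is hypothetical.

```swift
// Hypothetical helper: returns the element at the middle position.
// Because C is a RandomAccessCollection, index(_:offsetBy:) and count
// are guaranteed to be O(1).
func middleElement<C: RandomAccessCollection>(of collection: C) -> C.Element? {
    guard !collection.isEmpty else { return nil }
    let i = collection.index(collection.startIndex, offsetBy: collection.count / 2)
    return collection[i]
}

let helloBytes = Array("hello".utf8)      // Array<UInt8>
let helloCString = "hello".utf8CString    // ContiguousArray<CChar>

print(middleElement(of: helloBytes) as Any)    // Optional(108), the byte for "l"
print(middleElement(of: helloCString) as Any)  // Optional(108)

// String.UTF8View is not a RandomAccessCollection, so the following line
// would be rejected by the compiler:
// middleElement(of: "hello".utf8)
```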
A closer look at the emptiness check:
    let loopCount = 10000
    var date = Date()
    for _ in 0..<loopCount {
        _ = s.isEmpty
    }
    print("isEmpty")
    printAndUpdateTime(&date)
    for _ in 0..<loopCount {
        _ = s == ""
    }
    print("== \"\"")
    printAndUpdateTime(&date)
Compared with reading isEmpty on the String, checking whether the String equals the empty String is even faster!
The documentation describes the Range subscripts of String.UTF8View and String.UTF16View as follows:
    subscript(bounds: Range<String.UTF8View.Index>) -> String.UTF8View { get }
    subscript(bounds: Range<String.UTF16View.Index>) -> String.UTF16View { get }
    Complexity: O(n) if the underlying string is bridged from Objective-C, where n is the length of the string; otherwise, O(1).
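That is, if the String is bridged from an Objective-C NSString the time complexity is O(n), and otherwise it is O(1). How should this be read? As noted earlier, String.UTF8View does not conform to RandomAccessCollection while String.UTF16View does, so the two should have different time complexities. Why, then, does the documentation tie the complexity to whether the String is bridged from NSString? The following test looks into this.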
    let s2 = NSString(string: s) as String
    let loopCount = 10000
    let index = s.characters.count / 2
    let index2 = s.characters.count - 1

    func test(_ s: String) {
        var date = Date()

        let utf8 = s.utf8
        for _ in 0..<loopCount {
            _ = utf8[utf8.startIndex..<utf8.index(utf8.startIndex, offsetBy: index)]
        }
        print("UTF-8 index")
        printAndUpdateTime(&date)
        for _ in 0..<loopCount {
            _ = utf8[utf8.startIndex..<utf8.index(utf8.startIndex, offsetBy: index2)]
        }
        print("UTF-8 index2")
        printAndUpdateTime(&date)

        let utf16 = s.utf16
        for _ in 0..<loopCount {
            _ = utf16[utf16.startIndex..<utf16.index(utf16.startIndex, offsetBy: index)]
        }
        print("UTF-16 index")
        printAndUpdateTime(&date)
        for _ in 0..<loopCount {
            _ = utf16[utf16.startIndex..<utf16.index(utf16.startIndex, offsetBy: index2)]
        }
        print("UTF-16 index2")
        printAndUpdateTime(&date)
    }

    print("String")
    test(s)
    print("\nString bridged from NSString")
    test(s2)
Test results
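First compare index with index2. The offset index2 is about twice index, and for UTF-8 the time for index2 is also about twice the time for index. For UTF-16, index and index2 take about the same time. This matches whether each view conforms to RandomAccessCollection.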
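Then compare String with NSString. A String bridged from NSString takes longer than a native String, most noticeably for UTF-8. This is presumably the situation the documentation describes: when subscripting with a Range, a String bridged from NSString does some extra work that costs an additional O(n) of time. It does not mean that the subscript itself has O(n) complexity.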
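Application

In a concrete application, which encoding and which way of subscripting should be chosen? First, the encoding depends on the use case. The same string can have different lengths under different encodings. If the string contains only English text, the choice is easy. If it contains Chinese or emoji, the encoding has to be chosen with care. Note that the length property of NSString returns the length in UTF-16 code units.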
    let str = "abc"
    str.characters.count             // 3
    str.unicodeScalars.count         // 3
    str.utf16.count                  // 3
    (str as NSString).length         // 3
    str.utf8.count                   // 3
    str.utf8CString.count - 1        // 3
    strlen(str)                      // 3

    let emojiStr = ""                // the emoji literal is missing here; the counts below are those given for the original emoji
    emojiStr.characters.count        // 1
    emojiStr.unicodeScalars.count    // 2
    emojiStr.utf16.count             // 4
    (emojiStr as NSString).length    // 4
    emojiStr.utf8.count              // 8
    emojiStr.utf8CString.count - 1   // 8
    strlen(emojiStr)                 // 8

    let chineseStr = "中文"
    chineseStr.characters.count      // 2
    chineseStr.unicodeScalars.count  // 2
    chineseStr.utf16.count           // 2
    (chineseStr as NSString).length  // 2
    chineseStr.utf8.count            // 6
    chineseStr.utf8CString.count - 1 // 6
    strlen(chineseStr)               // 6
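Because an NSString length or location is measured in UTF-16 code units, it cannot be used directly as a Character offset. The following is a minimal sketch, assuming Swift 5, where the standard library provides `String.Index(utf16Offset:in:)`. The flag emoji is an illustrative choice written with explicit scalars (it is not necessarily the emoji used in the listing above), and the offset 5 is a made-up value standing in for a location reported by an NSString-based API.

```swift
// One flag emoji built from two regional-indicator scalars.
let flag = "\u{1F1E8}\u{1F1F3}"
print(flag.count, flag.unicodeScalars.count, flag.utf16.count, flag.utf8.count)
// prints "1 2 4 8"

// Converting a UTF-16 offset into a String.Index.
let mixed = "a" + flag + "b"
let utf16Offset = 5   // assumed location in UTF-16 code units
let position = String.Index(utf16Offset: utf16Offset, in: mixed)
print(mixed[position])                                         // prints "b"
print(mixed.distance(from: mixed.startIndex, to: position))    // prints "2"
```

The same character sits at offset 5 in UTF-16 units but at offset 2 in Characters, which is why the unit of every offset needs to be known before it is used as a subscript.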