C++ String Tokenization: A Complete Guide


수달정보보호 2024. 7. 2. 20:57

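1. The concept of string tokenization

Tokenizing a string means splitting it into pieces (tokens) at one or more delimiter characters. There are several ways to tokenize a string; this post covers four of them.

① Using stringstream

A stringstream associates a string object with a stream, so the string can be read as if it were a stream.

Example: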

// Tokenizing a string using stringstream
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

int main()
{
    string line = "Please see my tistory";

    // Vector of strings to hold the tokens
    vector<string> tokens;

    // Wrap the string in a stringstream so it can be read like a stream
    stringstream check1(line);

    string intermediate;

    // Tokenizing w.r.t. space ' '
    while (getline(check1, intermediate, ' '))
    {
        tokens.push_back(intermediate);
    }

    // Printing the token vector
    for (size_t i = 0; i < tokens.size(); i++)
        cout << tokens[i] << '\n';
}

Output:

Please
see
my
tistory

② Using strtok()

Example:

// C/C++ program for splitting a string
// using strtok()
#include <stdio.h>
#include <string.h>

int main()
{
    char str[] = "Guti-for-Guti";

    // Returns first token
    char *token = strtok(str, "-");

    // Keep printing tokens while one of the
    // delimiters is present in str[].
    while (token != NULL)
    {
        printf("%s\n", token);
        token = strtok(NULL, "-");
    }

    return 0;
}

Output:

Guti
for
Guti

Example 2:

// C code to demonstrate working of
// strtok
#include <string.h>
#include <stdio.h>

// Driver function
int main()
{
    // Declaration of string
    char gfg[100] = " Guti - oh - my - Guti";

    // Declaration of delimiter
    const char s[4] = "-";
    char* tok;

    // Use of strtok
    // get first token
    tok = strtok(gfg, s);

    // Loop until strtok finds no more tokens
    while (tok != NULL) {
        printf(" %s\n", tok);

        // Use of strtok
        // go through other tokens
        tok = strtok(NULL, s);
    }

    return 0;
}

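Output (only '-' is a delimiter, so each token keeps its surrounding spaces, and printf adds one more leading space):

  Guti 
  oh 
  my 
  Guti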

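③ Using strtok_r()

Like C's strtok(), strtok_r() parses a string into a sequence of tokens. It is the reentrant variant defined by POSIX: instead of hidden static state, it keeps its position in a pointer supplied by the caller as the third argument.

Example: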

// C/C++ program to demonstrate working of strtok_r()
// by splitting string based on space character.
#include <stdio.h>
#include <string.h>

int main()
{
    char str[] = "Guti is so good";
    char *token;
    char *rest = str;

    // rest is updated on every call to point past the token just returned
    while ((token = strtok_r(rest, " ", &rest)))
        printf("%s\n", token);

    return 0;
}

Output:

Guti
is
so
good

④ Using std::sregex_token_iterator

In this approach, tokenization is driven by regex matches. It is a good fit when the input can contain several different delimiters.

Example:

// C++ program that tokenizes with std::sregex_token_iterator
#include <algorithm>
#include <iostream>
#include <regex>
#include <string>
#include <vector>

/**
 * @brief Tokenize the given string according to the regex
 * and remove the empty tokens.
 *
 * @param str the string to split
 * @param re the regex that matches the delimiters
 * @return std::vector<std::string>
 */
std::vector<std::string> tokenize(const std::string& str, const std::regex& re)
{
    // Submatch index -1 selects the text between matches of re
    std::sregex_token_iterator it{ str.begin(), str.end(), re, -1 };
    std::vector<std::string> tokenized{ it, {} };

    // Additional check to remove empty strings
    tokenized.erase(
        std::remove_if(tokenized.begin(), tokenized.end(),
                       [](std::string const& s) { return s.empty(); }),
        tokenized.end());

    return tokenized;
}

// Driver Code
int main()
{
    const std::string str = "Break string   a,spaces,and,commas";
    // Delimiter: one or more whitespace characters or commas
    const std::regex re(R"([\s,]+)");

    // Function Call
    const std::vector<std::string> tokenized = tokenize(str, re);

    for (const std::string& token : tokenized)
        std::cout << token << std::endl;
    return 0;
}

Output:

Break
string
a
spaces
and
commas
