R Strings and Factors

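Besides numeric data, text is one of the most common kinds of data you will handle, and raw, first-hand data in particular tends to contain a large amount of it. Factors, in turn, store categorical data, and they behave like something between integers and character values. This article introduces how to work with strings and factors in R.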

Strings

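In R, text data is stored in character vectors, and each element of a character vector is a complete string. Throughout this article, "string" refers to a single element of a character vector.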

Creating and Printing Strings

Like any other vector, a character vector can be created with the c function. Strings can be enclosed in either double or single quotes:

c("Hello", 'World')

[1] "Hello" "World"

If a string itself contains double or single quotes, escape them with a backslash (\):

c("Hello, \"World\"")

[1] "Hello, \"World\""

Alternatively, enclose a string that contains double quotes in single quotes (or the other way around):

c('Hello, "World"')

[1] "Hello, \"World\""
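
The backslashes in the printed output are only how R displays the string; they are not stored in it. The following is a minimal sketch for checking this, assuming base R only. The functions nchar and writeLines and the variable names s1 and s2 are not from the text above and are used purely for illustration.

```r
# Both quoting styles produce exactly the same string
s1 <- "Hello, \"World\""
s2 <- 'Hello, "World"'
identical(s1, s2)   # TRUE

# The backslashes are not part of the string: it has 14 characters
nchar(s1)           # 14

# writeLines shows the raw contents, without escapes or surrounding quotes
writeLines(s1)
```

Printing the vector is convenient for inspecting its structure, while writeLines is closer to what would end up in a file.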

To join several strings together, use the paste function:

paste("Hello", "World")

[1] "Hello World"

By default, paste joins the strings with a single space as the separator. The sep argument specifies a different separator:

paste("Hello", "World", sep = "-")

[1] "Hello-World"

If you do not want any separator at all, use paste0:

paste0("Hello", "World")

[1] "HelloWorld"

When the character vectors have different lengths, the shorter one is recycled:

paste(c("red", "green"), "apple")

[1] "red apple"   "green apple"
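
Recycling is what makes paste handy for generating labels. The sketch below assumes base R only; the variable names labels and pairs are hypothetical and chosen for illustration.

```r
# A length-1 string is recycled against a length-3 integer vector
labels <- paste0("x", 1:3)
labels   # "x1" "x2" "x3"

# Recycling also applies when neither vector has length 1:
# c("a", "b") is reused twice to match the four numbers
pairs <- paste(c("a", "b"), 1:4, sep = "_")
pairs    # "a_1" "b_2" "a_3" "b_4"
```

Non-character arguments such as 1:3 are converted to strings before they are joined.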

If the collapse argument is given, paste additionally joins all the resulting strings into one single string, using the value of collapse as the separator:

paste(c("red", "green"), "apple", collapse = ", ")

[1] "red apple, green apple"
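
Because sep and collapse are easy to confuse, here is a small side-by-side sketch, assuming base R only. The variable names fruits, by_sep, by_both, and joined are illustrative.

```r
fruits <- c("red", "green")

# sep joins the arguments element by element; the result still has length 2
by_sep <- paste(fruits, "apple", sep = "-")
length(by_sep)   # 2

# collapse then joins those elements into one string
by_both <- paste(fruits, "apple", sep = "-", collapse = " / ")
by_both          # "red-apple / green-apple"

# collapse on a single vector simply flattens it
joined <- paste(fruits, collapse = "+")
joined           # "red+green"
```

In short, sep works across arguments and collapse works across the elements of the result.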

toString is a function similar to paste; it converts a vector of any type into a single string:

x <- (1:10)^2
toString(x)

[1] "1, 4, 9, 16, 25, 36, 49, 64, 81, 100"

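With the width argument, toString limits the maximum length of the output string; a longer result is truncated and ends with "....":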

toString(x, width = 20)

[1] "1, 4, 9, 16, 25,...."
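
For an ordinary vector, the default behaviour of toString can be reproduced with paste and collapse. The sketch below assumes base R only and compares the two; the variable names full and short are illustrative.

```r
x <- (1:10)^2

# Without width, toString gives the same result as paste with collapse = ", "
full <- toString(x)
identical(full, paste(x, collapse = ", "))   # TRUE

# With width = 20, the result is cut so that, including the trailing "....",
# it is exactly 20 characters long
short <- toString(x, width = 20)
nchar(short)   # 20
```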

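cat is a lower-level function similar to paste. Most users rarely need to call it directly, but many print methods use it internally to produce their output, so it is worth understanding. cat writes all of its arguments straight to the output as text, flattening vectors regardless of their length: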

cat(c("red", "green"), "apple", 1:3)

red green apple 1 2 3
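
The difference between cat and paste is easiest to see in what they return. The sketch below assumes base R only; the sep argument of cat and the variable names p and r are not from the text above.

```r
# paste returns a character vector that can be stored and reused
p <- paste("a", "b")
p            # "a b"

# cat writes to the console and returns NULL invisibly,
# so there is nothing useful to store
r <- cat("a", "b", "\n")
is.null(r)   # TRUE

# sep controls the separator; "\n" puts each element on its own line
cat("red", "green", "apple", sep = "\n")
```

Note that cat does not add a trailing newline on its own, which is why "\n" is passed explicitly in the second call.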

When R prints a string, it normally wraps it in double quotes. To print strings without the quotes, use the noquote function:

x <- c("If", "people", "do", "not", "believe",
  "that", "mathematics", "is", "simple,",
  "it", "is", "only", "because", "they",
  "do", "not", "realize", "how",
  "complicated", "life", "is")
x

 [1] "If"          "people"      "do"          "not"        
 [5] "believe"     "that"        "mathematics" "is"         
 [9] "simple,"     "it"          "is"          "only"       
[13] "because"     "they"        "do"          "not"        
[17] "realize"     "how"         "complicated" "life"       
[21] "is"

noquote(x)

 [1] If          people      do          not         believe    
 [6] that        mathematics is          simple,     it         
[11] is          only        because     they        do         
[16] not         realize     how         complicated life       
[21] is
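
noquote changes only how the vector is printed, not the strings themselves. The following is a minimal sketch assuming base R only; the variable names words and nq and the quote argument of print are not from the text above.

```r
words <- c("If", "people", "do")

# noquote tags the vector with the "noquote" class, which affects printing
nq <- noquote(words)
inherits(nq, "noquote")          # TRUE

# Removing the class gives back the original character vector
identical(unclass(nq), words)    # TRUE

# For a one-off print without quotes, print(quote = FALSE) is an alternative
print(words, quote = FALSE)
```

noquote is convenient when an object should always print without quotes, while print(quote = FALSE) leaves the object itself untouched.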
