# Regular Expression Tokenizer

Tokenizes strings that represent regular expressions.

[![Build Status](https://secure.travis-ci.org/fent/ret.js.svg)](http://travis-ci.org/fent/ret.js)
[![Dependency Status](https://david-dm.org/fent/ret.js.svg)](https://david-dm.org/fent/ret.js)
[![codecov](https://codecov.io/gh/fent/ret.js/branch/master/graph/badge.svg)](https://codecov.io/gh/fent/ret.js)

# Usage

```js
var ret = require('ret');

var tokens = ret(/foo|bar/.source);
```

`tokens` will contain the following object:

```js
{
  "type": ret.types.ROOT,
  "options": [
    [ { "type": ret.types.CHAR, "value": 102 },
      { "type": ret.types.CHAR, "value": 111 },
      { "type": ret.types.CHAR, "value": 111 } ],
    [ { "type": ret.types.CHAR, "value": 98 },
      { "type": ret.types.CHAR, "value": 97 },
      { "type": ret.types.CHAR, "value": 114 } ]
  ]
}
```
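
Because each `CHAR` value is a character code, a result like this can be read back as text with `String.fromCharCode`. The following is a minimal sketch that works on a hand-written copy of the object above, so it runs without ret installed. The local `types` object (and its numeric values) is only a stand-in for `ret.types`, and `alternativesToStrings` is a hypothetical helper, not part of ret.

```js
// Stand-in for ret.types; the numeric values here are placeholders.
var types = { ROOT: 0, CHAR: 7 };

// Hand-written copy of the token tree for /foo|bar/.
var tokens = {
  type: types.ROOT,
  options: [
    [ { type: types.CHAR, value: 102 },
      { type: types.CHAR, value: 111 },
      { type: types.CHAR, value: 111 } ],
    [ { type: types.CHAR, value: 98 },
      { type: types.CHAR, value: 97 },
      { type: types.CHAR, value: 114 } ]
  ]
};

// Hypothetical helper: convert each alternative that consists only of
// CHAR tokens back into the string it spells.
function alternativesToStrings(root) {
  return root.options.map(function (alternative) {
    return alternative.map(function (token) {
      return String.fromCharCode(token.value);
    }).join('');
  });
}

console.log(alternativesToStrings(tokens)); // [ 'foo', 'bar' ]
```
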
# Token Types

`ret.types` is a collection of the various token types exported by ret.

### ROOT

Only used in the root of the regexp. This is needed due to the possibility of the root containing a pipe `|` character. In that case, the token will have an `options` key that is an array of arrays of tokens. If not, it will contain a `stack` key that is an array of tokens.

```js
{
  "type": ret.types.ROOT,
  "stack": [token1, token2...],
}
```

```js
{
  "type": ret.types.ROOT,
  "options": [
    [token1, token2...],
    [othertoken1, othertoken2...]
    ...
  ],
}
```
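
Code that walks the tree usually has to handle both shapes. One possible approach, sketched here with a hypothetical helper named `alternativesOf` (not part of ret), is to treat a `stack` as a single alternative so callers always get an array of token arrays.

```js
// Hypothetical helper: return the token lists of a ROOT (or GROUP) token
// as an array of alternatives, whichever key is present.
function alternativesOf(token) {
  return token.options ? token.options : [token.stack];
}

// Placeholder tokens; strings stand in for real token objects.
var withStack = { stack: ['t1', 't2'] };
var withOptions = { options: [['t1'], ['t2']] };

console.log(alternativesOf(withStack).length);   // 1
console.log(alternativesOf(withOptions).length); // 2
```
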
### GROUP

Groups contain tokens that are inside parentheses. If the group begins with `?` followed by another character, it's a special type of group. A `:` tells the group not to be remembered when `exec` is used. `=` means the previous token matches only if followed by this group, and `!` means the previous token matches only if NOT followed by it.

Like root, it can contain an `options` key instead of `stack` if there is a pipe.

```js
{
  "type": ret.types.GROUP,
  "remember": true,
  "followedBy": false,
  "notFollowedBy": false,
  "stack": [token1, token2...],
}
```

```js
{
  "type": ret.types.GROUP,
  "remember": true,
  "followedBy": false,
  "notFollowedBy": false,
  "options": [
    [token1, token2...],
    [othertoken1, othertoken2...]
    ...
  ],
}
```
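
The three flags map back onto the opening syntax of the group. As a rough sketch, a hypothetical helper `groupPrefix` (not part of ret) could recover that syntax from a group token. It assumes lookahead flags take precedence over `remember`.

```js
// Hypothetical helper: rebuild the opening delimiter of a group
// from the flags on a GROUP token.
function groupPrefix(group) {
  if (group.followedBy) return '(?=';    // positive lookahead
  if (group.notFollowedBy) return '(?!'; // negative lookahead
  return group.remember ? '(' : '(?:';   // capturing vs. non-capturing
}

console.log(groupPrefix({ remember: true, followedBy: false, notFollowedBy: false }));  // (
console.log(groupPrefix({ remember: false, followedBy: false, notFollowedBy: false })); // (?:
console.log(groupPrefix({ remember: false, followedBy: true, notFollowedBy: false }));  // (?=
```
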
### POSITION

`\b`, `\B`, `^`, and `$` specify positions in the regexp.

```js
{
  "type": ret.types.POSITION,
  "value": "^",
}
```

### SET

Contains a key `set` specifying which tokens are allowed and a key `not` specifying whether the set should be negated. A set can contain other sets, ranges, and characters.

```js
{
  "type": ret.types.SET,
  "set": [token1, token2...],
  "not": false,
}
```

### RANGE

Used in set tokens to specify a character range. `from` and `to` are character codes.

```js
{
  "type": ret.types.RANGE,
  "from": 97,
  "to": 122,
}
```
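
To illustrate how `SET`, `RANGE`, and `CHAR` tokens fit together, here is a minimal sketch of testing a character code against a set token. The `types` object is a local stand-in for `ret.types` with placeholder values, `setMatches` is a hypothetical helper, and the sample token is hand-written to resemble what a set such as `[^a-z_]` might produce.

```js
// Stand-in for ret.types; the numeric values here are placeholders.
var types = { SET: 3, RANGE: 4, CHAR: 7 };

// Hypothetical helper: does the character code satisfy the set token?
function setMatches(setToken, code) {
  var found = setToken.set.some(function (item) {
    switch (item.type) {
      case types.CHAR:  return item.value === code;
      case types.RANGE: return item.from <= code && code <= item.to;
      case types.SET:   return setMatches(item, code); // nested set
      default:          return false;
    }
  });
  // A negated set matches exactly when none of its members do.
  return setToken.not ? !found : found;
}

// Hand-written token: anything except a-z and underscore.
var notLowerOrUnderscore = {
  type: types.SET,
  not: true,
  set: [
    { type: types.RANGE, from: 97, to: 122 },
    { type: types.CHAR, value: 95 }
  ]
};

console.log(setMatches(notLowerOrUnderscore, 'q'.charCodeAt(0))); // false
console.log(setMatches(notLowerOrUnderscore, '7'.charCodeAt(0))); // true
```
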
### REPETITION

```js
{
  "type": ret.types.REPETITION,
  "min": 0,
  "max": Infinity,
  "value": token,
}
```
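
Reading the example above, `min` and `max` bound how many times the token in `value` repeats, with `Infinity` meaning no upper bound. The sketch below shows one way to turn those bounds back into a quantifier. `quantifierOf` is a hypothetical helper, and the mapping of `*`, `+`, and `?` to these bounds follows standard regexp semantics rather than anything stated in this document.

```js
// Hypothetical helper: render the quantifier implied by min/max.
function quantifierOf(rep) {
  if (rep.min === 0 && rep.max === Infinity) return '*';
  if (rep.min === 1 && rep.max === Infinity) return '+';
  if (rep.min === 0 && rep.max === 1) return '?';
  if (rep.max === Infinity) return '{' + rep.min + ',}';
  if (rep.min === rep.max) return '{' + rep.min + '}';
  return '{' + rep.min + ',' + rep.max + '}';
}

console.log(quantifierOf({ min: 0, max: Infinity })); // *
console.log(quantifierOf({ min: 1, max: 3 }));        // {1,3}
```
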
### REFERENCE

References a group token. `value` is 1-9.

```js
{
  "type": ret.types.REFERENCE,
  "value": 1,
}
```

### CHAR

Represents a single character token. `value` is the character code. Emitting one token per character may seem more cluttered than concatenating characters together. However, a repetition token repeats only the last token, not the whole clause the way a pipe splits it, so it's simpler to keep each character separate.

```js
{
  "type": ret.types.CHAR,
  "value": 123,
}
```

## Errors

ret.js will throw an error if given a string that is not a valid regular expression. The possible errors are:

* Invalid group. Thrown when a group's opening `?` is followed by an invalid character. It can only be followed by `!`, `=`, or `:`. Example: `/(?_abc)/`
* Nothing to repeat. Thrown when a repetition token is used as the first token in the current clause, that is, at the very beginning of the regexp or of a group, or right after a pipe. Examples: `/foo|?bar/`, `/{1,3}foo|bar/`, `/foo(+bar)/`
* Unmatched ). A group was closed but never opened. Example: `/hello)2u/`
* Unterminated group. A group was not closed. Example: `/(1(23)4/`
* Unterminated character class. A custom character set was not closed. Example: `/[abc/`
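
Since invalid input is reported by throwing, callers that accept untrusted patterns may want to wrap the call. The sketch below is one possible pattern, not part of ret: `tryTokenize` is a hypothetical wrapper that takes the tokenizer function as an argument, and `fakeTokenize` is a stand-in used only so the example runs without ret installed. In real use you would pass the function returned by `require('ret')`.

```js
// Hypothetical wrapper: call a tokenizer and report failure as a value
// instead of letting the exception propagate.
function tryTokenize(tokenize, source) {
  try {
    return { ok: true, tokens: tokenize(source) };
  } catch (err) {
    return { ok: false, message: err.message };
  }
}

// Stand-in tokenizer for this sketch only; it rejects an unclosed "[".
function fakeTokenize(source) {
  if (source.indexOf('[') !== -1 && source.indexOf(']') === -1) {
    throw new SyntaxError('Unterminated character class');
  }
  return { stack: [] };
}

console.log(tryTokenize(fakeTokenize, '[abc').ok);  // false
console.log(tryTokenize(fakeTokenize, '[abc]').ok); // true
```
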

# Install

    npm install ret

# Tests

Tests are written with [vows](http://vowsjs.org/).

```bash
npm test
```

# License

MIT