Saturday, December 3, 2016

Remove spaces and nonprinting characters from text

Remove spaces and nonprinting characters from text

Sometimes text values contain leading, trailing, or multiple embedded space characters (Unicode character set values 32 and 160), or non-printing characters (Unicode character set values 0 to 31, 127, 129, 141, 143, 144, and 157). These characters can sometimes cause unexpected results when you sort, filter, or search. For example, users may make typographical errors by inadvertently adding extra space characters, or imported text data from external sources may contain nonprinting characters embedded in the text. Because these characters are not easily noticed, the unexpected results may be difficult to understand. To remove these unwanted characters, you can use a combination of the TRIM, CLEAN, and SUBSTITUTE functions.

The TRIM function removes spaces from text except for single spaces between words. The CLEAN function removes all nonprintable characters from text. Both functions were designed to work with 7-bit ASCII, which is a subset of the ANSI character set. It's important to understand that the first 128 values (0 to 127) in 7-bit ASCII represent the same characters as the first 128 values in the Unicode character set.

The TRIM function was designed to trim the 7-bit ASCII space character (value 32) from text. In the Unicode character set, there is an additional space character called the nonbreaking space character that has a decimal value of 160. This character is commonly used in Web pages as the HTML entity,  . By itself, the TRIM function does not remove this nonbreaking space character.

The CLEAN function was designed to remove the first 32 non-printing characters in the 7 bit ASCII code (values 0 through 31) from text. In the Unicode character set, there are additional nonprinting characters (values 127, 129, 141, 143, 144, and 157). By itself, the CLEAN function does not remove these additional nonprinting characters.

To do this task, use the SUBSTITUTE function to replace the higher value Unicode characters with the 7-bit ASCII characters for which the TRIM and CLEAN functions were designed.

Example

The example may be easier to understand if you copy it to a blank worksheet.

How to copy an example

  1. Create a blank workbook or worksheet.

  2. Select the example in the Help topic.

    Note: Do not select the row or column headers.

    Selecting an example from Help

    Selecting an example from Help

  3. Press CTRL+C.

  4. In the worksheet, select cell A1, and press CTRL+V.

  5. To switch between viewing the results and viewing the formulas that return the results, press CTRL+` (grave accent), or on the Formulas tab, in the Formula Auditing group, click the Show Formulas button.

1

2

3

4

5

A

Data

BD 122

="XY"&CHAR(7)&"453"

 BD    122

MN987

Formula

Description (Result)

=TRIM(A2)

Removes the trailing space from the string "BD 122 " (BD 112)

=CLEAN(A3)

Removes the nonprinting BEL character (ASCII value of 7) from the string value created by the expression ="XY"&CHAR(7)&"453" (XY453)

=TRIM(SUBSTITUTE(A4,CHAR(160),CHAR(32)))

Replaces each nonbreaking space character (Unicode value of 160) with a space character (ASCII value of 32) by using the SUBSTITUTE function, and then removes the leading and multiple embedded spaces from the string " BD   122" (BD 112)

=CLEAN(SUBSTITUTE(A5,CHAR(127),CHAR(7)))

Replaces the nonprinting DEL character (ASCII value of 127) with a BEL character (ASCII value of 7) by using the SUBSTITUTE function, and then removes the BEL character from the string "MN987" (MN987)

Function details

TRIM

CLEAN

SUBSTITUTE

No comments:

Post a Comment