
Q: Extract All Unique Words - Java task

+8 votes

On the first line of the console input you are given a piece of text. Extract all words from it and print them in alphabetical order. Consider each non-letter character a word separator. Take repeated words only once. Ignore character casing. Print the resulting words on a single line, separated by spaces.

Examples:

extract words in java

asked in Java category by user sam
edited by user golearnweb

2 Answers

+3 votes
 
Best answer

First, you'd better read this article about \\W+ (you will need it for splitting the input string into an array): http://stackoverflow.com/questions/9760909/split-string-with-regex-w-w-w
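To make the splitting step concrete, here is a small sketch comparing two patterns. The class name SplitDemo, its helper methods and the sample sentence are made up for illustration. It assumes the task's "non-letter" rule should also treat digits and underscores as separators, which \\W+ does not do because \W only excludes [a-zA-Z0-9_].

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits on runs of non-word characters; digits and '_' count as word characters.
    static String[] splitNonWord(String text) {
        return text.split("\\W+");
    }

    // Splits on runs of non-letters, which follows the task statement more literally.
    static String[] splitNonLetter(String text) {
        return text.split("[^\\p{L}]+");
    }

    public static void main(String[] args) {
        String sample = "Hello, world_2 and hello!";
        System.out.println(Arrays.toString(splitNonWord(sample)));   // [Hello, world_2, and, hello]
        System.out.println(Arrays.toString(splitNonLetter(sample))); // [Hello, world, and, hello]
    }
}
```

With either pattern, text that starts with a separator produces an empty first element, so it is worth filtering empty strings before collecting the words.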

Regarding the sorting, please read this: http://stackoverflow.com/questions/708698/how-can-i-sort-a-list-alphabetically
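For comparison, this is roughly what the List-based route looks like. It is only a sketch: SortedListDemo and sortedUnique are hypothetical names, and the duplicate check is done by hand because a List does not do it for you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortedListDemo {
    // Collects the non-empty words once each, then sorts them alphabetically.
    static List<String> sortedUnique(String[] words) {
        List<String> result = new ArrayList<>();
        for (String word : words) {
            // Manual duplicate check: contains() scans the whole list every time.
            if (!word.isEmpty() && !result.contains(word)) {
                result.add(word);
            }
        }
        Collections.sort(result); // natural (alphabetical) order for Strings
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "pear", "fig"};
        System.out.println(sortedUnique(words)); // [apple, fig, pear]
    }
}
```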

Side note on why I prefer the TreeSet:

  • It's simply shorter. Only one line shorter, though.
  • You never have to worry about whether the collection is really sorted right now, because a TreeSet is always sorted, no matter what you do.
  • You cannot have duplicate entries. Depending on your situation this may be a pro or a con. If you need duplicates, stick to your List.
  • An experienced programmer looks at TreeSet<String> countyNames and instantly knows: this is a sorted collection of Strings without duplicates, and I can be sure that this is true at every moment. So much information in a short declaration.
  • Real performance win in some cases. If you use a List and insert values very often, and the list may be read between those insertions, then you have to sort the list after every insertion. The set keeps its elements ordered too, but does it much faster: an insertion into a TreeSet costs O(log n), while re-sorting a list costs O(n log n).
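The points above can be seen in a few lines. This is a minimal sketch with made-up names (TreeSetDemo, collect); it only shows that the set is sorted and duplicate-free after every insertion, without any explicit sort call.

```java
import java.util.TreeSet;

public class TreeSetDemo {
    // Adds every word to a TreeSet; ordering and de-duplication happen on insertion.
    static TreeSet<String> collect(String... words) {
        TreeSet<String> set = new TreeSet<>();
        for (String word : words) {
            set.add(word); // returns false and changes nothing if the word is already present
        }
        return set;
    }

    public static void main(String[] args) {
        TreeSet<String> names = collect("pear", "apple");
        System.out.println(names); // [apple, pear]

        names.add("apple"); // duplicate, ignored
        names.add("fig");
        System.out.println(names); // [apple, fig, pear]
    }
}
```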

Using the right collection for the right task is key to writing short and bug-free code. It's not as evident in this case, because you just save one line. But I've stopped counting how often I see someone using a List when they want to ensure there are no duplicates, and then building that functionality themselves. Or even worse, using two Lists when they really need a Map.

Don't get me wrong: Using Collections.sort is not an error or a flaw. But there are many cases when the TreeSet is much cleaner.

...and the solution:

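import java.util.Scanner;
import java.util.TreeSet;

public class Pr_08_ExtractAllUniqueWords {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);

        // Lower-case first so casing is ignored, then split on runs of non-letters.
        // ("\\W+" would keep digits and underscores inside words.)
        String[] words = scanner.nextLine().toLowerCase().split("[^\\p{L}]+");

        // A TreeSet keeps its elements sorted and ignores duplicates on add().
        TreeSet<String> uniqueWords = new TreeSet<>();
        for (String word : words) {
            // split() yields an empty first element when the text starts with a separator.
            if (!word.isEmpty()) {
                uniqueWords.add(word);
            }
        }

        // Print the words on a single line, separated by spaces.
        System.out.println(String.join(" ", uniqueWords));
    }
}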
answered by user john7
selected by user golearnweb

+1 vote

To better understand sets in Java, watch this video:

answered by user nikole