JUnit: P2J11Test.java Strings are a stark random access data structure that stores linear text, the universal and portable representation of all information. Under the hood, strings are character arrays, but the primitive nature of such arrays is hidden behind a better interface of public methods. Surprising algorithms have been developed over the years to quickly search and analyze both the string itself and the semantic information encoded in it. Even the seemingly simple task of searching for a pattern inside the given text can be optimized in countless ways, let alone higher level tasks on strings. Sometimes the string data structure can be augmented with additional data that speeds up operations that would be inefficient when given only the raw and unadorned text string itself. Even though this data in principle adds nothing new to the mix in that its structure is fully determined by the original string itself, having that information available as yet another form of space-time tradeoff can provide significant speedups.  Computed from the given text string with n characters, its suffix array is an n-element array of integers that contains the position indices 0, …, n-1 to the text. Since each position is stored in the suffix array exactly once, the suffix array is automatically some permutation of order n. The suffix array lists these n positions sorted in the lexicographic order (also called the “dictionary order”) of the suffixes that start from each position. Note that the presentation on the linked Wikipedia page on suffix arrays uses one-based position indexing to the string, whereas here we naturally use Java’s zero-based indexing to remain consistent to our principles. For example, the non-empty suffixes of the string "hello" are "hello", "ello", "llo", "lo" and "o", corresponding to positions ranging from 0 to 4 in the original string. Sorting these positions to the lexicographic order of the corresponding suffixes gives these positions in the order [1, 0, 2, 3, 4], which then becomes the suffix array of the original string. Since all n suffixes of text have a different length, the possibility of equality in their lexicographic comparison can never arise. This guarantees the uniqueness of the suffix array for the given text. For reasons of simplicity and convenience, suffix arrays are represented in this lab as instances of some subtype of List, despite the use of the standard technical term of “suffix array”. Create a new class named P2J11, and there first the method public static List buildSuffixArray(String text) that builds and returns the suffix array for the given text. In this lab, this can be done with the naive algorithm, since the structure of our test case strings guarantees that the worst case scenario of this algorithm, a string whose all characters are the same, is never realized. The easiest way to implement this naive algorithm in Java should be to define yet another custom subtype of Comparator whose method compareTo lexicographically compares the two substrings that start from the positions that it receives as parameters. In this method, first define the local variable ArrayList result, and fill it up the brim with the integers from 0 to n-1. Then, just use the utility method Collections.sort to sort this result list with your custom Comparator. This discipline should allow your buildSuffixArray method to be reasonably fast even when building the suffix array for the entire War and Peace, as performed by the JUnit test class using the warandpeace.txt data file. Once the suffix array has been constructed for the given fixed text, it can be used to rapidly find all positions inside text where the given pattern occurs. These would be precisely the very same positions whose suffixes start with that pattern! Having already done the work of sorting these suffixes in lexicographic order in the previous preprocessing step allows us to use a slightly modified binary search performed to the suffix array to find the lexicographically first such suffix. Since all suffixes that start with pattern must necessarily follow each other as a bunch in the sorted array of such suffixes, looping through all these positions is a straightforward while-loop once the binary search has determined the first such position. To achieve all that, write the second method public static List find(String pattern, String text, List suffix) that creates and returns a list of positions of the original text that contain the given pattern, using the given suffix array to speed up this search. Note that this returned list of positions must be sorted in ascending order of the positions themselves, instead of the lexicographic order of the suffixes of text that start at these positions.

Database System Concepts
7th Edition
ISBN:9780078022159
Author:Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher:Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Chapter1: Introduction
Section: Chapter Questions
Problem 1PE
icon
Related questions
Question

JUnit: P2J11Test.java
Strings are a stark random access data structure that stores linear text, the universal and portable
representation of all information. Under the hood, strings are character arrays, but the primitive
nature of such arrays is hidden behind a better interface of public methods. Surprising algorithms
have been developed over the years to quickly search and analyze both the string itself and the
semantic information encoded in it. Even the seemingly simple task of searching for a pattern inside
the given text can be optimized in countless ways, let alone higher level tasks on strings.
Sometimes the string data structure can be augmented with additional data that speeds up
operations that would be inefficient when given only the raw and unadorned text string itself. Even
though this data in principle adds nothing new to the mix in that its structure is fully determined by
the original string itself, having that information available as yet another form of space-time
tradeoff can provide significant speedups. 
Computed from the given text string with n characters, its suffix array is an n-element array of
integers that contains the position indices 0, …, n-1 to the text. Since each position is stored in the
suffix array exactly once, the suffix array is automatically some permutation of order n. The suffix
array lists these n positions sorted in the lexicographic order (also called the “dictionary order”) of
the suffixes that start from each position. Note that the presentation on the linked Wikipedia page
on suffix arrays uses one-based position indexing to the string, whereas here we naturally use Java’s
zero-based indexing to remain consistent to our principles.
For example, the non-empty suffixes of the string "hello" are "hello", "ello", "llo", "lo"
and "o", corresponding to positions ranging from 0 to 4 in the original string. Sorting these
positions to the lexicographic order of the corresponding suffixes gives these positions in the order
[1, 0, 2, 3, 4], which then becomes the suffix array of the original string. Since all n suffixes
of text have a different length, the possibility of equality in their lexicographic comparison can
never arise. This guarantees the uniqueness of the suffix array for the given text.
For reasons of simplicity and convenience, suffix arrays are represented in this lab as instances of
some subtype of List<Integer>, despite the use of the standard technical term of “suffix array”.
Create a new class named P2J11, and there first the method
public static List<Integer> buildSuffixArray(String text)
that builds and returns the suffix array for the given text. In this lab, this can be done with the
naive algorithm, since the structure of our test case strings guarantees that the worst case scenario
of this algorithm, a string whose all characters are the same, is never realized. The easiest way to
implement this naive algorithm in Java should be to define yet another custom subtype of
Comparator<Integer> whose method compareTo lexicographically compares the two
substrings that start from the positions that it receives as parameters. In this method, first define
the local variable ArrayList<Integer> result, and fill it up the brim with the integers from 0
to n-1. Then, just use the utility method Collections.sort to sort this result list with your
custom Comparator<Integer>. This discipline should allow your buildSuffixArray
method to be reasonably fast even when building the suffix array for the entire War and Peace, as
performed by the JUnit test class using the warandpeace.txt data file.
Once the suffix array has been constructed for the given fixed text, it can be used to rapidly find all
positions inside text where the given pattern occurs. These would be precisely the very same
positions whose suffixes start with that pattern! Having already done the work of sorting these
suffixes in lexicographic order in the previous preprocessing step allows us to use a slightly
modified binary search performed to the suffix array to find the lexicographically first such suffix.
Since all suffixes that start with pattern must necessarily follow each other as a bunch in the
sorted array of such suffixes, looping through all these positions is a straightforward while-loop
once the binary search has determined the first such position.
To achieve all that, write the second method
public static List<Integer> find(String pattern, String text,
List<Integer> suffix) that creates and returns a list of positions of the original text that contain the given pattern, using the given suffix array to speed up this search. Note that this returned list of positions must be sorted in ascending order of the positions themselves, instead of the lexicographic order of the suffixes of text that start at these positions.

Expert Solution
trending now

Trending now

This is a popular solution!

steps

Step by step

Solved in 2 steps with 2 images

Blurred answer
Follow-up Questions
Read through expert solutions to related follow-up questions below.
Follow-up Question

When testing, I get the following error, Can you please help

===================================================

java.lang.OutOfMemoryError: Java heap space

    at java.base/java.util.Arrays.copyOfRange(Arrays.java:3822)
    at java.base/java.lang.StringLatin1.newString(StringLatin1.java:769)
    at java.base/java.lang.String.substring(String.java:2709)
    at java.base/java.lang.String.substring(String.java:2677)
    at P2J11.buildSuffixArray(P2J11.java:9)
    at P2J11Test.testUsingWarAndPeaceAsData(P2J11Test.java:69)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
    at com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
    at com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
    at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
    at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:235)

=============================================================

Junit test

@Test public void testUsingWarAndPeaceAsData() {
// Construct a version of War and Peace with everything in lowercase,
// with newlines converted into single whitespaces.
StringBuilder sb = new StringBuilder();
try(Scanner scan = new Scanner(new File("warandpeace.txt"))) {
while(scan.hasNext()) {
String line = scan.next();
sb.append(line.toLowerCase());
sb.append(" ");
}
}
catch(Exception e) {
System.out.println("Unable to read file warandpeace.txt.");
fail();
}

String text = sb.toString();
List<Integer> suffix = P2J11.buildSuffixArray(text);

String[] pats = {
"hairpin", "dearest", "chicken", "germany", "soup",
"when the butler with a bottle"
};
List<List<Integer>> expected = Arrays.asList(
// hairpin
Collections.emptyList(),
// dearest
Arrays.asList(401455, 589251, 673175, 756728, 762806, 824448, 824892,
876326, 1874525, 2097431, 2824422, 2824638, 3001502, 3069811, 3070789),
// chicken
Arrays.asList(1000200, 1322792, 1323345, 1709728, 1858789, 2112805),
// germany
Arrays.asList(149169, 1625813, 2387228, 2621602),
// soup
Arrays.asList(147783, 546991, 772564, 1954975, 2370800, 2534921,
2667437, 2751268, 3010037, 3010169),
// when the butler with a bottle
Collections.singletonList(149207)
);
for(int i = 0; i < pats.length; i++) {
String pat = pats[i];
List<Integer> expect = expected.get(i);
List<Integer> find = P2J11.find(pat, text, suffix);
for(int pos: find) {
assertTrue(text.substring(pos).startsWith(pat));
}
assertEquals(expect, find);
}

CRC32 check = new CRC32();
for(int i = 0; i < text.length(); i++) {
check.update(text.charAt(i));
}

Random rng = new Random(12345);
for(int i = 0; i < 100; i++) {
int pos = rng.nextInt(1000000);
int len = rng.nextInt(20) + 5;
String pat = text.substring(pos, pos + len);
List<Integer> find = P2J11.find(pat, text, suffix);
//System.out.println("<" + pat + ">: " + find.size());
for(int j: find) { check.update(j); }
}
assertEquals(3893756230L, check.getValue());
}

==========================================================

Warandpeace.txt sample

"Well, Prince, so Genoa and Lucca are now just family estates of the
Buonapartes. But I warn you, if you don't tell me that this means war,
if you still try to defend the infamies and horrors perpetrated by that
Antichrist--I really believe he is Antichrist--I will have nothing more
to do with you and you are no longer my friend, 

Solution
Bartleby Expert
SEE SOLUTION
Similar questions
  • SEE MORE QUESTIONS
Recommended textbooks for you
Database System Concepts
Database System Concepts
Computer Science
ISBN:
9780078022159
Author:
Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher:
McGraw-Hill Education
Starting Out with Python (4th Edition)
Starting Out with Python (4th Edition)
Computer Science
ISBN:
9780134444321
Author:
Tony Gaddis
Publisher:
PEARSON
Digital Fundamentals (11th Edition)
Digital Fundamentals (11th Edition)
Computer Science
ISBN:
9780132737968
Author:
Thomas L. Floyd
Publisher:
PEARSON
C How to Program (8th Edition)
C How to Program (8th Edition)
Computer Science
ISBN:
9780133976892
Author:
Paul J. Deitel, Harvey Deitel
Publisher:
PEARSON
Database Systems: Design, Implementation, & Manag…
Database Systems: Design, Implementation, & Manag…
Computer Science
ISBN:
9781337627900
Author:
Carlos Coronel, Steven Morris
Publisher:
Cengage Learning
Programmable Logic Controllers
Programmable Logic Controllers
Computer Science
ISBN:
9780073373843
Author:
Frank D. Petruzella
Publisher:
McGraw-Hill Education