Java Corpus Examples

Programming language: Java

Namespace/package name: corpus

Class/type: Corpus

Examples at hotexamples.com: 5

Java Corpus - 5 examples found. These are the top-rated real-world Java examples of corpus.Corpus extracted from open source projects.

Frequently used methods

extractSample(1)

getFolderName(1)

getName(1)

getSize(1)

setFolderName(1)

setName(1)

splitSample(1)

Example #1

File: Core.java Project: dmolter/tagging-sample-manager

  public static void main(String[] args) {
    // fall back to "brown" when no argument (or an empty one) is given;
    // reading args[0] without checking args.length would throw ArrayIndexOutOfBoundsException
    String param1 = args.length > 0 && !args[0].isEmpty() ? args[0] : "brown";

    Corpus c;
    if (param1.equals("brown")) {
      c = new Brown();
    } else if (param1.equals("negra")) {
      c = new Negra();
    } else {
      System.err.println(
          "Illegal parameter! Using standard parameter " + STANDARD_PARAMETER_STRING + " instead.");
      param1 = "brown";
      c = new Brown();
    }

    int sizeCorpus = c.getSize();
    int sizeSample = (int) (sizeCorpus * CORPUS_PERCENTAGE);

    /*
     * extract a sample from the corpus and split it into a training set and a test set
     * (currently only the line numbers; the lines themselves are written to file)
     */
    c.extractSample(sizeSample);
    c.splitSample("tree");
  }
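
The comment in Example #1 describes drawing a sample of line numbers from the corpus and splitting it into a training set and a test set. The project's own extractSample/splitSample implementations are not shown, so the following is only a minimal sketch under stated assumptions: the class name SampleSplitter, the seeded shuffle, and the trainingFraction parameter are hypothetical, and the meaning of the "tree" argument is not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: NOT the dmolter/tagging-sample-manager implementation.
class SampleSplitter {

    // Picks sampleSize distinct line numbers from [0, corpusSize) via a seeded shuffle,
    // returned in ascending (file) order.
    static List<Integer> extractSample(int corpusSize, int sampleSize, long seed) {
        if (sampleSize < 0 || sampleSize > corpusSize) {
            throw new IllegalArgumentException("sampleSize must be in [0, corpusSize]");
        }
        List<Integer> lines = new ArrayList<>(corpusSize);
        for (int i = 0; i < corpusSize; i++) {
            lines.add(i);
        }
        Collections.shuffle(lines, new Random(seed));
        List<Integer> sample = new ArrayList<>(lines.subList(0, sampleSize));
        Collections.sort(sample);
        return sample;
    }

    // Shuffles a copy of the sample, puts the first trainingFraction into the training set
    // and the rest into the test set; both parts are returned sorted.
    static List<List<Integer>> splitSample(List<Integer> sample, double trainingFraction, long seed) {
        List<Integer> shuffled = new ArrayList<>(sample);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) (shuffled.size() * trainingFraction);
        List<Integer> training = new ArrayList<>(shuffled.subList(0, cut));
        List<Integer> test = new ArrayList<>(shuffled.subList(cut, shuffled.size()));
        Collections.sort(training);
        Collections.sort(test);
        return List.of(training, test);
    }
}
```

For instance, extractSample(100, 10, 42L) yields 10 distinct sorted line numbers below 100, and splitSample(sample, 0.8, 7L) divides them 8/2. Shuffling before the cut keeps the test set from simply being the tail end of the file.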

Example #2

File: DepContextCounter.java Project: wblacoe/cdt

  // marginalise over all corpus files using threads
  public synchronized void count() {
    Helper.report("[ContextCounter] Counting over all corpus files...");

    File corpusFolder = new File(DepNeighbourhoodSpace.getProjectFolder(), Corpus.getFolderName());
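    String[] corpusFilenames = corpusFolder.list();
    if (corpusFilenames == null) { // list() returns null if the folder is missing or unreadable
      Helper.report("[ContextCounter] Corpus folder not found: " + corpusFolder);
      return;
    }
    Arrays.sort(corpusFilenames);

    // start one counter thread per corpus file
    for (String corpusFilename : corpusFilenames) {
      DepContextCounterThread ccThread =
          new DepContextCounterThread(
              this, corpusFilename, new File(corpusFolder, corpusFilename), amountOfSentences);
      threads.add(ccThread);
      (new Thread(ccThread)).start();
    }

    // wait for all threads to finish; the loop only ends once the worker threads remove
    // themselves from `threads` and call notify()/notifyAll() on this object
    try {
      while (!threads.isEmpty()) {
        wait();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // restore the interrupt flag instead of swallowing it
    }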

    Helper.report("[ContextCounter] ...Finished counting over all corpus files...");
  }
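
count() only returns once `threads` is empty, which relies on each DepContextCounterThread removing itself and notifying the waiting counter; that callback is not part of the example. The sketch below shows the general wait/notify completion pattern under that assumption. WorkerPool, finished() and runAll() are hypothetical names, not the project's API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the wait/notify completion pattern; names are illustrative only.
class WorkerPool {
    private final Set<Runnable> running = new HashSet<>();
    private int completed = 0;

    // Called by each worker when it is done: removes it and wakes the waiting coordinator.
    synchronized void finished(Runnable worker) {
        running.remove(worker);
        completed++;
        notifyAll();
    }

    // Starts n workers and blocks until every one of them has called finished().
    synchronized int runAll(int n) throws InterruptedException {
        for (int i = 0; i < n; i++) {
            Runnable worker = new Runnable() {
                @Override
                public void run() {
                    // ... per-file counting work would go here ...
                    finished(this); // `this` is the anonymous Runnable itself
                }
            };
            running.add(worker);
            new Thread(worker).start();
        }
        while (!running.isEmpty()) {
            wait(); // releases the monitor so workers can enter finished()
        }
        return completed;
    }
}
```

Because runAll() holds the monitor while it starts the workers, a worker that finishes early simply blocks in finished() until wait() releases the lock, so no notification is lost. In newer code, ExecutorService.invokeAll from java.util.concurrent is a common alternative to hand-written wait/notify.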

Example #3

File: DepContextCounts.java Project: wblacoe/cdt

  public static DepContextCounts importFromReader(BufferedReader in) throws IOException {
    Helper.report("[ContextCounts] Importing context word counts...");
    DepContextCounts dmc = new DepContextCounts();

    String line;
    while ((line = in.readLine()) != null) {

      if (line.startsWith("<contextcounts")) {
        Matcher matcher = contextCountsPattern.matcher(line);
        if (matcher.find()) { // group 1 holds the corpus name
          Corpus.setName(matcher.group(1));
        }

      } else if (line.startsWith("<deprelation")) {
        Matcher matcher = depRelationPattern.matcher(line);
        if (matcher.find()) { // group 1 holds the dependency relation name
          String depRelationString = matcher.group(1);
          importDepRelationCounts(in, dmc, depRelationString);
        }

      } else if (line.equals("</contextcounts>")) {
        break;
      }
    }

    Helper.report("[ContextCounts] ...Finished importing context word counts.");
    return dmc;
  }
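
Example #3 reads the corpus name and each dependency relation from capture group 1 of two regular expressions, but the definitions of contextCountsPattern and depRelationPattern are not shown. Assuming header lines of the form <contextcounts name="..."> and <deprelation name="..."> (a guess about the file format, as are the class and constant names below), the matching step could look like this:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical patterns: the real contextCountsPattern/depRelationPattern are not shown.
class HeaderParser {
    static final Pattern CONTEXT_COUNTS = Pattern.compile("<contextcounts\\s+name=\"([^\"]*)\"");
    static final Pattern DEP_RELATION = Pattern.compile("<deprelation\\s+name=\"([^\"]*)\"");

    // Returns capture group 1 (the quoted name), or null if the line does not match.
    static String extractName(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        return matcher.find() ? matcher.group(1) : null;
    }
}
```

Returning null for a non-matching line mirrors the original's behaviour of silently skipping header lines that matcher.find() rejects.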

Example #4

File: DepContextCounter.java Project: wblacoe/cdt

  public DepContextCounter(String corpusFolderName, int amountOfSentences) {
    Corpus.setFolderName(corpusFolderName);
    threads = new HashSet<>();
    counts = new DepContextCounts();
    // how many sentences per corpus file should be included in the count?
    this.amountOfSentences = amountOfSentences;
  }
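
Examples #2 to #5 call getFolderName, setFolderName, getName and setName on the Corpus class itself, so in wblacoe/cdt those values appear to be static, class-wide state. The actual Corpus source is not shown, so this is a hypothetical sketch of such a holder, named CorpusSettings here to avoid implying it is the real class. The volatile modifier is an added assumption so that counter threads see values set by the constructing thread.

```java
// Hypothetical static settings holder; NOT the real corpus.Corpus source.
final class CorpusSettings {
    // volatile (an assumption) makes writes visible to other threads, e.g. counter threads
    private static volatile String name;
    private static volatile String folderName;

    private CorpusSettings() {} // no instances: all state is class-level

    static String getName() { return name; }

    static void setName(String newName) { name = newName; }

    static String getFolderName() { return folderName; }

    static void setFolderName(String newFolderName) { folderName = newFolderName; }
}
```

One consequence of this design: because the folder name is static, two DepContextCounter instances created with different folders would overwrite each other's setting.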

Example #5

File: DepContextCounts.java Project: wblacoe/cdt

  @Override
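  public String toString() {
    // StringBuilder avoids creating a new String on every concatenation inside the loops
    StringBuilder s = new StringBuilder("CORPUS \"" + Corpus.getName() + "\"\n");
    for (String depRelationString : depRelationWordCountMap.keySet()) {
      HashMap<String, Long> wordCountMap = depRelationWordCountMap.get(depRelationString);
      s.append("DEPRELATION \"").append(depRelationString).append("\"\n");
      // list at most 5 context words per relation, in HashMap iteration order (not sorted by count)
      int i = 0;
      for (String contextWord : wordCountMap.keySet()) {
        long count = wordCountMap.get(contextWord);
        s.append(count).append('\t').append(contextWord).append('\n');
        if (++i >= 5) break;
      }
    }

    return s.toString();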
  }
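
toString() in Example #5 prints up to five context words per relation in HashMap iteration order, which is effectively arbitrary rather than the five most frequent. If a top-five listing is what is wanted, one alternative (not the project's code; TopContextWords and topN are hypothetical names) is to sort the entries by count before limiting:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Alternative sketch, not the project's code: returns the n most frequent context words
// formatted as "count<TAB>word", highest count first.
class TopContextWords {
    static List<String> topN(Map<String, Long> wordCountMap, int n) {
        return wordCountMap.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Long>comparingByKey())) // ties: alphabetical
            .limit(n)
            .map(e -> e.getValue() + "\t" + e.getKey())
            .collect(Collectors.toList());
    }
}
```

Ties are broken alphabetically so the output is deterministic regardless of the map's iteration order.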