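Given text as input, write a program that counts its words.
"Character counting" and "word counting" are arguably the problems that show whether someone has made it past the beginner stage of programming.
Many of you could probably write this with your toes by now, but beginners find it surprisingly hard.
Let's take a trip down memory lane.
Create a text file with the contents below in advance; when the program runs, it reads the file's contents (source: Wikipedia).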
As the country became embroiled in a domestic crisis, the first government was dislodged and succeeded by several different administrations. Bolikango served as Deputy Prime Minister in one of the new governments before a partial state of stability was reestablished in 1961. He mediated between warring factions in the Congo and briefly served once again as Deputy Prime Minister in 1962 before returning to the parliamentary opposition. After Joseph-Desire Mobutu took power in 1965, Bolikango became a minister in his government. Mobutu soon dismissed him but appointed him to the political bureau of the Mouvement Populaire de la Revolution. Bolikango left the bureau in 1970. He left Parliament in 1975 and died seven years later. His grandson created the Jean Bolikango Foundation in his memory to promote social progress. The President of the Congo posthumously awarded Bolikango a medal in 2005 for his long career in public service.
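The separators are the period '.', the comma ',', and the space ' '.
Print the 10 most frequent words together with their frequencies, most frequent first.
The order among words with the same frequency does not matter.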
in 12
the 10
Bolikango 5
a 4
of 4
and 3
to 3
his 3
became 2
government 2
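Because tied words may come out in any order, a fixed expected listing is awkward to compare against. As a rough sketch (the helper names `tokenize` and `is_valid_top` are made up for this illustration and are not part of the problem), an answer could be checked as follows, assuming the text contains only the three separators above:

```python
import re
from collections import Counter

def tokenize(text):
    # Split on runs of the three separators and drop the empty strings
    # that appear when the text starts or ends with a separator.
    return [w for w in re.split(r"[ .,]+", text) if w]

def is_valid_top(pairs, counts, n=10):
    """Return True if pairs is an acceptable top-n answer for counts."""
    expected_freqs = sorted(counts.values(), reverse=True)[:n]
    words = [w for w, _ in pairs]
    return (
        [c for _, c in pairs] == expected_freqs          # frequencies in descending order
        and len(set(words)) == len(words)                # no word listed twice
        and all(counts.get(w) == c for w, c in pairs)    # every reported count is correct
    )

counts = Counter(tokenize("a b a, c b. a d"))
print(is_valid_top([("a", 3), ("b", 2), ("c", 1)], counts, 3))  # True
print(is_valid_top([("a", 3), ("b", 2), ("d", 1)], counts, 3))  # True: c and d are tied
```

Only the sequence of frequencies is fixed by the problem; which of the tied words fills the last slots is left open.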
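Write programs that do the same job in at least three completely different styles.
Reference: 프로그래밍 패턴 - 프로그램을 작성하는 33가지 방법 (Korean edition of Exercises in Programming Style), Cristina Lopes, translated by 이상주, Wikibooks, 2015
Example) Take a moment to think about what your own tendencies are.
Addendum:
The explanation seemed a bit thin, so here is a footnote. This is not an ad for the book, but...
The book above takes one simple problem, "word counting," and implements it in Python in dozens of patterns.
It argues that programming patterns change with the constraints (the level of abstraction available, hardware limits, and so on),
and it is interesting because it covers everything from code close to assembly to functional programming, concurrency, and object-oriented design.
Before reading the book closely, I got curious about how far I could vary the solution on my own.
So this is an easy problem, but not entirely easy.
Please keep in mind that it was written with the idea of "approaching one problem from various perspectives."
There are 54 solutions.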
An example from the book: apparently there is something called collections.Counter.
from collections import Counter
with open('input.txt', 'r') as f:
    words = [w.strip('.,') for w in f.read().split()]
for w, c in Counter(words).most_common(10):
    print(w, c)
Using Counter is no fun, so I'll do without it.
Going a little crazy and making it as short as possible:
W=[w.strip('.,') for w in open('input.txt').read().split()]
for c,w in sorted((W.count(w),w) for w in set(W))[-1:-11:-1]:print(w,c)
procedural: a rewrite of the one above.
with open('input.txt', 'r') as f:
    text = f.read()
word_lst = []
for word in text.split():
    word_lst.append(word.strip('.,'))
word_cnt = []
for word in set(word_lst):
    cnt = word_lst.count(word)
    word_cnt.append((cnt, word))
word_cnt.sort(reverse=True)
for cnt, word in word_cnt[:10]:
    print(word, cnt)
O(N^2) is a bit embarrassing, so... this is about where my taste lies:
with open('input.txt', 'r') as f:
    text = f.read()
words = text.replace(',', ' ').replace('.',' ').split()
dic = {}
for w in words:
    if w in dic:
        dic[w] += 1
    else:
        dic[w] = 1
lst = sorted(dic.items(), key=lambda kv: kv[1], reverse=True)
for w, c in lst[:10]:
    print(w, c)
.
functional: I don't have much experience with it, but this is probably how it's done.
def wordcnt(f):
    def fread(path):
        # read the file named by the argument instead of a hard-coded name
        with open(path, 'r') as fp:
            return fp.read()
    def wordlst(text):
        return text.replace('.', ' ').replace(',', ' ').split()
    def count(wset, wlst):
        #return [(wlst.count(w), w) for w in wset]
        return [] if not wset else [(wlst.count(wset[0]), wset[0])] + count(wset[1:], wlst)
    def tostr(clst):
        #return ''.join('{} {}\n'.format(*reversed(c)) for c in clst)
        return '' if not clst else '{} {}\n'.format(*reversed(clst[0])) + tostr(clst[1:])
    return \
        tostr(
            sorted(
                count(
                    list(set(wordlst(fread(f)))),
                    wordlst(fread(f))
                ),
                reverse=True
            )[:10]
        )
print(wordcnt("input.txt"))
.
procedural/structural (C):
It's been a while, so this was hard... I considered building a hash table, but ended up just borrowing a bloom filter.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "libbloom-master/bloom.h" // https://github.com/jvirkki/libbloom
#define BUFSIZE 10000
#define MAXWORD 1000
#define MAX(a, b) ((a)>=(b) ? (a):(b))
#define MIN(a, b) ((a)<(b) ? (a):(b))
typedef struct {
char *word;
int len;
int cnt;
} Wordcnt;
Wordcnt* counter[MAXWORD];
int cntsize = 0;
// comparator for qsort()
int cmp_f(const void *a, const void *b) {
const Wordcnt* wc1 = *(const Wordcnt**) a;
const Wordcnt* wc2 = *(const Wordcnt**) b;
return wc2->cnt - wc1->cnt;
}
// fetch a word from buf
char* fetch_word(char *buf, int *plen)
{
char* word;
if ((word = strtok(buf, " ")) == NULL)
return NULL;
// removing '.', ','
int len = strlen(word);
if (word[len-1] == ',' || word[len-1] == '.') {
word[len-1] = '\0';
len--;
}
// set length
*plen = len;
return word;
}
// allocate new counter
void counter_alloc(char* word, int len)
{
Wordcnt* newcounter = (Wordcnt *)malloc(sizeof(Wordcnt));
newcounter->word = word; newcounter->len = len; newcounter->cnt = 1;
counter[cntsize] = newcounter;
cntsize++;
}
// add a counter, or increase existing counter
void counter_add(char* word, int len)
{
static struct bloom blm;
static int blm_init_OK = 0;
if (!blm_init_OK) {
bloom_init(&blm, MAX(MAXWORD, 1000), 0.1);
blm_init_OK = 1;
}
if (bloom_check(&blm, (void *)word, len) == 0) { // negative
bloom_add(&blm, (void *)word, len);
counter_alloc(word, len);
} else {
for (int i = 0; i < cntsize; i++) { // positive
if (strncmp(word, counter[i]->word, MAX(len, counter[i]->len)) == 0) {
counter[i]->cnt++;
return;
}
}
// false positive
bloom_add(&blm, (void *)word, len);
counter_alloc(word, len);
}
}
int main()
{
FILE *fp = fopen("input.txt", "r");
char buf[BUFSIZE];
fgets(buf, sizeof(buf), fp);
fclose(fp);
char* word;
int len;
word = fetch_word(buf, &len);
while (word != NULL) { // also covers an empty buffer
counter_add(word, len);
word = fetch_word(NULL, &len); // continue tokenizing the same buffer
}
qsort(counter, cntsize, sizeof(Wordcnt*), cmp_f);
for (int i = 0; i < MIN(10, cntsize); i++)
printf("%s %d\n", counter[i]->word, counter[i]->cnt);
for (int i = 0; i < cntsize; i++)
free(counter[i]);
return 0;
}
.
OOP: this is C#.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace WordCount
{
public static class StringExtension
{
public static string[] ToWordArray(this string str) { return str.Split(new[] { ' ', '.', ',', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries); }
}
class Counter
{
private int value;
public int Value { get => value; }
public Counter() { value = 1; }
public void Increase() { value++; }
}
class WordCountComparer : IComparer<KeyValuePair<string, Counter>>
{
public int Compare(KeyValuePair<string, Counter> x, KeyValuePair<string, Counter> y)
{
Counter counter1 = x.Value, counter2 = y.Value;
return counter2.Value - counter1.Value;
}
}
class WordCounter
{
private Dictionary<string, Counter> dict;
public WordCounter(string[] wordArray)
{
dict = new Dictionary<string, Counter>();
foreach (var word in wordArray)
this.Add(word);
}
public void Add(String word)
{
if (dict.ContainsKey(word))
dict[word].Increase();
else
dict.Add(word, new Counter());
}
public KeyValuePair<string, Counter>[] Top(int numItem)
{
var arr = dict.ToArray();
Array.Sort(arr, new WordCountComparer());
int n = Math.Min(numItem, arr.Length);
var result = new KeyValuePair<string, Counter>[n];
Array.Copy(arr, result, n);
return result;
}
}
class Program
{
static void Main(string[] args)
{
string text = System.IO.File.ReadAllText("..\\..\\input.txt");
var wordCounter = new WordCounter(text.ToWordArray());
foreach (var kv in wordCounter.Top(10))
{
Console.WriteLine(kv.Key + " " + kv.Value.Value);
}
Console.ReadLine();
}
}
}
.
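If memory is very plentiful:
Proven hash functions that are used in practice carry more overhead than you might expect. The same goes for dict() and set().
If we assume infinite memory, every word can be mapped 1:1 to an array element, which gives a collision-free hash table.
Because the mapping is 1:1, each entry can store the word's counter directly instead of a key.
.
In practice memory is limited, so the code below maps only the first 5 letters of each word 1:1 and simply loses the rest (in other words, the code is incomplete).
Another reason is that a large table gets slow because of cache effects...
To be more realistic, the first few characters of the string could be used as a tag in place of a hash function, but building that would make the structure too complicated, so I skipped it.
.
The hash table index could be obtained directly by converting char->byte->int, but below each string is converted to a base-27 number (with base 26, '' and 'a' would both map to 0 and could not be told apart).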
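A small sketch of why base 27 is needed. The helper `encode` is hypothetical, written only for this illustration, and it assumes the word consists of the letters a-z:

```python
def encode(word, base):
    # Treat each letter as one digit.
    # base 27: 'a'..'z' become digits 1..26, and digit 0 is never used.
    # base 26: 'a'..'z' become digits 0..25, so leading 'a's vanish like leading zeros.
    offset = 1 if base == 27 else 0
    i = 0
    for ch in word.lower():
        i = i * base + (ord(ch) - ord('a') + offset)
    return i

print(encode('', 26), encode('a', 26), encode('aa', 26))   # 0 0 0
print(encode('', 27), encode('a', 27), encode('aa', 27))   # 0 1 28
print(encode('zzzzz', 27) == 27**5 - 1)                    # True
```

With base 27 the largest 5-letter index is 27**5 - 1, so a table of 27**5 entries covers every word of up to 5 letters.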
.
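I also did the parsing by hand... and for the final sort of the words I tried a bucket sort.
To cut the cost of traversing the table, the non-empty entries are kept in a list.
The maximum frequency maxcnt is recorded during processing,
and an array of size maxcnt is created in which words are stored with their frequency as the index.
For the problem's input no frequency exceeds 20, so this is very efficient.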
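The bucket idea on its own, as a minimal sketch over a plain dict rather than the 27**5 table; `top_by_bucket` is a name invented here, and every frequency is assumed to be at least 1:

```python
def top_by_bucket(counts, n):
    # counts: dict mapping word -> frequency
    maxcnt = max(counts.values(), default=0)
    # One bucket per possible frequency; bucket 0 stays empty.
    buckets = [[] for _ in range(maxcnt + 1)]
    for word, cnt in counts.items():
        buckets[cnt].append(word)
    # Walk the buckets from the highest frequency down and stop after n words.
    result = []
    for cnt in range(maxcnt, 0, -1):
        for word in buckets[cnt]:
            if len(result) == n:
                return result
            result.append((word, cnt))
    return result

print(top_by_bucket({'in': 12, 'the': 10, 'a': 4, 'of': 4, 'x': 1}, 3))
# [('in', 12), ('the', 10), ('a', 4)]
```

No comparison sort is involved: the work is proportional to the number of distinct words plus maxcnt, which is why a small maximum frequency makes it cheap.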
class WordCountHash:
    def __init__(self):
        self.cntarr = [0] * 27**5  # from 27**6 on, the allocation is not actually feasible
        self.ent = []  # indices of the counters whose value is greater than 0
        self.maxcnt = 0
    # string of length k => k-digit base-27 number
    def idx(self, word):
        i = 0
        for ch in word:
            i = i * 27 + (ord(ch.lower()) - ord('a') + 1)
        return i
    # k-digit base-27 number => string of length k
    def str(self, idx):
        w = ''
        while idx:
            w = chr(idx % 27 - 1 + ord('a')) + w
            idx = idx // 27
        return w
    def add(self, word):
        i = self.idx(word[:5])
        self.cntarr[i] += 1
        self.maxcnt = max(self.maxcnt, self.cntarr[i])
        if self.cntarr[i] == 1:
            self.ent.append(i)
    # bucket sort using cntarr
    def top(self, n):
        bucket = [list() for _ in range(self.maxcnt + 1)]
        # put the entries into buckets,
        for i in self.ent:
            cnt, word = self.cntarr[i], self.str(i)
            bucket[cnt].append((word, cnt))
        # then take only n from the top. Assumes (n > number of words) never happens.
        bkt_idx = self.maxcnt
        for _ in range(n):
            while not bucket[bkt_idx]:
                bkt_idx -= 1
            yield bucket[bkt_idx].pop()
# Reads the file and yields words made up of alphabetic characters.
def word_gen(filename):
    with open(filename, 'r') as f:
        word = ''
        for ch in f.read():
            if ch.isalpha():
                word += ch
            else:
                if word:
                    yield word
                word = ''
        if word:
            yield word
counter = WordCountHash()
for word in word_gen("input.txt"):
    counter.add(word)
for word, cnt in counter.top(10):
    print(word, cnt)
.
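Conversely, if memory is very small:
The example input is a little under 1KB. Let's make the following assumptions.
If every word is one character + a null char (2 bytes), 1KB holds at most 512 words.
So a short (2 bytes) is enough for a word index, and since the number of words is between 0 and 512, a short is enough for that too.
=> After reading the 1KB block, the remaining 128 bytes can hold at most 128 characters, or 64 short variables.
In the code below, assume every integer variable is a short.
.
1) It won't happen, but in the extreme case of a word of length 300, copying it even once blows the memory.
=> Words cannot be copied. They are identified only by buffer indices (start, end).
2) There is no room to store "words already counted" separately.
=> Once a word has been counted, it is erased right away by overwriting the buffer with spaces.
3) Remembering a frequency for each word would by itself need up to 512 bytes.
=> Only the 10 most frequent words are remembered. Since there are only 10, they are inserted insertion-sort style.
.
The results come out slightly off, but they are roughly right, so I'm leaving it as is.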
# If a word is text[i:j], yields (i, j).
def word_gen(text, startidx):
    length = len(text)
    s = startidx
    for i in range(startidx, length):
        if not text[i].isalpha():
            if s < i:
                yield (s, i)  # text[s:i]
            s = i + 1
    if s < length:
        yield (s, length)  # text[s:length]
# Counts how many times text[i:j] occurs in text[i:].
# While searching, erases the words that duplicate text[i:j] from text[].
def count(text, i, j):
    cnt = 1
    for m, n in word_gen(text, j+1):
        if text[i:j] == text[m:n]:
            cnt += 1
            for k in range(m, n): text[k] = ' '
    return cnt
# insertion sort. Remembers only the 10 most frequent words.
def insert(arr, item):
    for i in range(len(arr)):
        if arr[i] < item:
            # insert at arr[i]
            for j in range(len(arr)-1, i, -1):
                arr[j] = arr[j-1]
            arr[i] = item
            return
    return  # do nothing if no entry is smaller
if __name__ == '__main__':
    with open("input.txt", 'r') as f:
        text = list(f.read().lower())  # sizeof(text)==1KB; think of it as a char array.
    top10 = [(0,0,0)] * 10  # sizeof(top10) = 2*3*10 = 60Byte
    for i, j in word_gen(text, 0):
        cnt = count(text, i, j)
        insert(top10, (cnt, i, j))
    for cnt, i, j in top10:
        print(''.join(text[i:j]), cnt)
This is python3.
I saved the text to temp.txt and loaded it from there.
with open('temp.txt', 'r') as f:
    text = f.read()
word_list = text.replace(',', ' ').replace('.', ' ').split()
word_list_no_duplicate = list(set(word_list))
word_count = []
for word in word_list_no_duplicate:
    word_count.append((word_list.count(word), word))
n = 0
for result in sorted(word_count, reverse=True):
    n += 1
    print(result[1], ':', result[0])
    if n == 10:
        break
in : 12
the : 10
Bolikango : 5
of : 4
a : 4
to : 3
his : 3
and : 3
was : 2
served : 2
python3.6.5
import re
path = 'input.txt'  # assumed file name; the original snippet never defined path
txt = open(path, 'r').read()
result = {}
while txt:
    word = re.search(r'[^\s,.]+', txt)
    if word is None:  # only separators are left
        break
    txt,n = re.subn(f'([ ,.]|^)+{re.escape(word.group())}([ ,.]|$)+', lambda x: '' if not(x.group(1) and x.group(2)) else ' ', txt)
    result[word.group()] = n
for word in sorted(result, key = lambda x: result[x], reverse = True)[:10]:
    print(word,result[word])
I tried it with regular expressions.
Just wrote whatever came to hand.
f = open('input.txt', 'r')
text = f.read()
f.close()
words = text.replace('.', ' ').replace(',', ' ').split()
#print(words)
dic = dict()
for word in words:
    if word in dic:
        dic[word] = dic[word]+1
    else:
        dic[word] = 1
#print(dic)
sorted_x = sorted(dic, key=dic.get, reverse=True)
#print(sorted_x)
sorted_list = [(key, dic[key]) for key in sorted_x]
#print(sorted_list)
for n in range(10):
    i, j = sorted_list[n]
    print(i, j)
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
public class SpiralArray {
public static void main(String[] args) {
String input = "As the country became embroiled in a domestic crisis, the first government was dislodged and succeeded by several different administrations. Bolikango served as Deputy Prime Minister in one of the new governments before a partial state of stability was reestablished in 1961. He mediated between warring factions in the Congo and briefly served once again as Deputy Prime Minister in 1962 before returning to the parliamentary opposition. After Joseph-Désiré Mobutu took power in 1965, Bolikango became a minister in his government. Mobutu soon dismissed him but appointed him to the political bureau of the Mouvement Populaire de la Révolution. Bolikango left the bureau in 1970. He left Parliament in 1975 and died seven years later. His grandson created the Jean Bolikango Foundation in his memory to promote social progress. The President of the Congo posthumously awarded Bolikango a medal in 2005 for his long career in public service.";
String[] str = input.replaceAll("[^a-zA-Z0-9 ]", "").split(" ");
Arrays.sort(str);
ArrayList<String> result = new ArrayList<>();
for (int i = 0; i < str.length; i++)
if (i + 1 < str.length) {
int count = check(str, i);
if (count > 0)
result.add(str[i] + " " + (count + 1));
i += count;
}
Collections.sort(result, new Comparator<String>() {
@Override
public int compare(String o1, String o2) {
return new Integer(o2.replaceAll("[^0-9]", "")).compareTo(new Integer(o1.replaceAll("[^0-9]", "")));
}
});
for (int i = 0; i < 10; i++)
System.out.println(result.get(i));
}
private static int check(String[] str, int i) {
return str[i].equals(str[i + 1]) ? 1 + check(str, i + 1) : 0;
}
}
def firstway(stri):
    a = stri.replace('.', '')
    a = a.replace(',', '')
    a = a.split()
    b = list(set(a))
    d = []
    for c in b:
        d.append((a.count(c), c))
    d.sort(reverse=True)
    for e in d[:10]:
        print(e[1], e[0])
def secondway(strin):
    a = strin.replace('.', '').replace(',', '')
    b = a.split()
    counting = {}
    for c in b:
        if c in counting: counting[c] += 1
        else: counting[c] = 1
    ke = list(counting.keys())
    va = list(counting.values())
    fi = reversed(sorted(list(zip(va, ke)))[-10:])
    for p in fi: print(p[1], p[0])
def thirdway(string):
    a = string.replace(',', '').replace('.', '').split()
    b = list(set(a))
    li = []
    for c in b:
        li.append(0)
        while c in a:
            a.remove(c)
            li[-1] += 1
        li[-1] = (li[-1], c)
    li.sort(reverse=True)
    for p in li[:10]: print(p[1], p[0])
z = input()
firstway(z)
print()
secondway(z)
print()
thirdway(z)
I wrote three approaches in python, but it turns out there was a file-input requirement...
This is python 3.
with open("c:\\temp\\wlist.txt", "r") as f:
    wcnt = f.read()
wcnt1 = wcnt.replace(",", " ").replace(".", " ").split()
wcnt2 = list(set([(i, wcnt1.count(i)) for i in wcnt1]))
wcnt2.sort(key = lambda x : x[1], reverse = True)
for i in range(10):
    print(*wcnt2[i])
A="As the country became embroiled in a domestic crisis, the first government was dislodged and succeeded by several different administrations. Bolikango served as Deputy Prime Minister in one of the new governments before a partial state of stability was reestablished in 1961. He mediated between warring factions in the Congo and briefly served once again as Deputy Prime Minister in 1962 before returning to the parliamentary opposition. After Joseph-Désiré Mobutu took power in 1965, Bolikango became a minister in his government. Mobutu soon dismissed him but appointed him to the political bureau of the Mouvement Populaire de la Révolution. Bolikango left the bureau in 1970. He left Parliament in 1975 and died seven years later. His grandson created the Jean Bolikango Foundation in his memory to promote social progress. The President of the Congo posthumously awarded Bolikango a medal in 2005 for his long career in public service."
b={}
a=A.replace("."," ").replace(","," ").split()
for i in range(len(a)):
    count=a.count(a[i])
    #print(a[i],count)
    b[a[i]]=count
c=list(b.values())
c.sort()
c.reverse()
d=list(b)
for i in range(10):
    for j in range (len(list(b.items()))):
        if c[i]==list(b.items())[j][1]:
            print(list(b.items())[j][0], list(b.items())[j][1])
This is a partial solution. The result is:
in 12
the 10
Bolikango 5
a 4
of 4
a 4
of 4
and 3
to 3
his 3
and 3
to 3
his 3
and 3
to 3
his 3
became 2
government 2
was 2
served 2
as 2
Deputy 2
Prime 2
Minister 2
before 2
He 2
Congo 2
Mobutu 2
him 2
bureau 2
left 2
became 2
government 2
was 2
served 2
as 2
Deputy 2
Prime 2
Minister 2
before 2
He 2
Congo 2
Mobutu 2
him 2
bureau 2
left 2
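The line a=A.replace("."," ").replace(","," ").split() is borrowed from an earlier poster's answer ^^;;
I tried to solve this using only the material in Jump to Python (점프투파이썬) up to just before functions.
Using a dictionary, I set up keys and values, then scanned the values and, whenever a value matched an entry in the frequency list,
printed its key.
I think I need to look further into how to limit the words with frequency 2. ;;
Looking at it now, and also comes out several times.. I'll look into it some more haha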
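One possible cause of the repeated lines: the sorted frequency list contains the same number several times (4, 4 and 3, 3, 3), and each copy prints every word with that frequency again. A minimal sketch that visits each distinct frequency only once, still without defining any functions; the short `text` is a stand-in for the real input, and `freq`, `distinct` and `lines` are names chosen for this example:

```python
text = "b a b c a b d c"   # tiny stand-in for the input text
words = text.replace(".", " ").replace(",", " ").split()

# Count each word once.
freq = {}
for w in words:
    if w in freq:
        freq[w] += 1
    else:
        freq[w] = 1

# Visit each distinct frequency once, from high to low.
distinct = sorted(set(freq.values()), reverse=True)
lines = []
for c in distinct:
    for w in freq:
        # Stop adding once 10 lines have been collected.
        if freq[w] == c and len(lines) < 10:
            lines.append(w + " " + str(c))

for line in lines:
    print(line)
```

The `len(lines) < 10` check is also what limits how many of the frequency-2 words get printed.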
l,c = [],[]
for x in open("C:/sentences.txt", 'r').readline().split():
    l.append(x)
for x in list(sorted(set(l))):
    c.append([x,l.count(x)])
for x in list(reversed(sorted(c,key = lambda x : x[1])))[0:10]:
    print(x[0],x[1])
Ruby
def max_freq_words_of(file, size=10)
freqs = IO.read(file).gsub(/[,.]/, ' ').split.reduce(Hash.new 0) {|h,k| h[k+' ']+=1; h}
puts freqs.max_by(size, &:last).map(&:join)
end
Test
require 'rspec/mocks/standalone'
contents = 'As the country became embroiled in a domestic crisis, the first government was dislodged and succeeded by several different administrations. Bolikango served as Deputy Prime Minister in one of the new governments before a partial state of stability was reestablished in 1961. He mediated between warring factions in the Congo and briefly served once again as Deputy Prime Minister in 1962 before returning to the parliamentary opposition. After Joseph-Desire Mobutu took power in 1965, Bolikango became a minister in his government. Mobutu soon dismissed him but appointed him to the political bureau of the Mouvement Populaire de la Revolution. Bolikango left the bureau in 1970. He left Parliament in 1975 and died seven years later. His grandson created the Jean Bolikango Foundation in his memory to promote social progress. The President of the Congo posthumously awarded Bolikango a medal in 2005 for his long career in public service.'
allow(IO).to receive(:read).with('sample.txt').and_return(contents)
expect{ max_freq_words_of('sample.txt') }.to output("in 12\n" +
"the 10\n" +
"Bolikango 5\n" +
"a 4\n" +
"of 4\n" +
"his 3\n" +
"and 3\n" +
"to 3\n" +
"as 2\n" +
"government 2\n").to_stdout
C#
P.S. Many other word-count sites and apps count words without distinguishing case. As in this problem's example, the solution below treats uppercase and lowercase as different words.
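For comparison, a short Python sketch of the two conventions; `count_words` and its `ignore_case` flag are hypothetical names used only for this illustration:

```python
from collections import Counter

def count_words(text, ignore_case=False):
    # Replace the separators with spaces, then split on whitespace.
    words = text.replace(".", " ").replace(",", " ").split()
    if ignore_case:
        # Fold everything to lowercase so "The" and "the" share one counter.
        words = [w.lower() for w in words]
    return Counter(words)

sample = "The the He he the."
print(count_words(sample)["the"])                    # 2
print(count_words(sample, ignore_case=True)["the"])  # 3
```

On the problem's input the case-sensitive count is the one that matches the expected output, since "The" and "the" are listed separately there.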
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace CD205
{
class Program
{
static void Main()
{
// read text from file
string inputFileName = Environment.CurrentDirectory + @"\input.txt";
string[] inputText =
File.ReadAllText(inputFileName)
.Split(' ', '.', ',').Select(s => s.Trim()).Where(e => e.Length > 0).ToArray();
// count each word
Dictionary<string, int> wordCount = new Dictionary<string, int>();
foreach (var s in inputText)
{
if (wordCount.ContainsKey(s)) { wordCount[s] += 1; }
else { wordCount[s] = 1; }
}
// display result
StringBuilder sb = new StringBuilder();
var result = from pair in wordCount orderby pair.Value descending select pair;
int displayCount = 0;
foreach (var pair in result)
{
sb.AppendLine($"{pair.Key} {pair.Value}");
if (++displayCount >= 10) { break; }
}
Console.WriteLine(sb.ToString());
}
}
}
Challenge
<Solution 1>
This solution is written in an OOP style, with the file input and the word counting each turned into a class.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace CD205AltOOP
{
class Program
{
static void Main()
{
string aFileName = @"\input.txt";
var altCase = new WordCount(new InputFile(aFileName).ContentsArray);
altCase.DisplayTop10();
}
}
sealed class InputFile // class for handling the input file
{
private readonly string inputFileName;
public InputFile(string aFileName)
{
inputFileName = Environment.CurrentDirectory + aFileName;
}
// array of the word elements in the text file
public string[] ContentsArray => File.ReadAllText(inputFileName)
.Split(' ', '.', ',').Select(s => s.Trim()).Where(e => e.Length > 0).ToArray();
}
sealed class WordCount // word-count class for a text array
{
private readonly string[] stringArray; // input text array
public WordCount(string[] aStringArray)
{
stringArray = aStringArray;
}
// count pairs (word - frequency) for every word in the text array
private Dictionary<string, int> WordCountPair
{
get
{
Dictionary<string, int> count = new Dictionary<string, int>();
foreach (var s in stringArray)
{
if (count.ContainsKey(s)) { count[s] += 1; }
else { count[s] = 1; }
}
return count;
}
}
public void DisplayTop10() // sort and print the top 10
{
StringBuilder sb = new StringBuilder();
var result = from pair in WordCountPair orderby pair.Value descending select pair;
int displayCount = 0;
foreach (var pair in result)
{
sb.AppendLine($"{pair.Key} {pair.Value}");
if (++displayCount >= 10) { break; }
}
Console.WriteLine(sb.ToString());
}
}
}
<Solution 2>
Personally, this is my favorite method.
Solved at https://wordcounttools.com/.
You can paste text or load a text file, and it shows various statistics about sentences, words, and so on. It also analyzes word counts as this problem requires (up to 2-word and 3-word phrases), both with and without grammar words (in, the, a, his, of, etc. in this example).
python3
import collections
with open('input.txt', 'r') as f:
    d = collections.defaultdict(int)
    for x in f.read().split():
        # strip punctuation from each word, not from the ends of the whole text
        d[x.strip(',.')] += 1
for word, cnt in sorted(d.items(), key=lambda x: x[1], reverse=True)[:10]:
    print(word, cnt)
from collections import Counter
counter = Counter()
with open("data/wikitest.txt","r") as wiki:
    for line in wiki:
        for test in line.split():
            # drop the trailing '.' or ',' so "government." counts as "government"
            counter[test.strip('.,')] += 1
for word,index in counter.most_common(10):
    print(word," : ",index)
text_raw <- readline()
text_blank <- strsplit(text_raw, split = ' ')
text_sblank <- text_blank[[1]]
text_comma <- strsplit(text_sblank, split = ',')
text_scomma <- rep(NA)
for (i in seq(text_comma)){
text_scomma[i] <- text_comma[[i]]
}
text_period <- strsplit(text_scomma, split = '.', fixed = T)
text_speriod <- rep(NA)
for (i in seq(text_period)){
text_speriod[i] <- text_period[[i]]
}
text_set <- union(text_speriod, NULL)
text_df <- NULL
for (i in seq(text_set)){
temp <- c(text_set[i], sum(text_speriod == text_set[i]))
text_df <- rbind(text_df, temp)
}
head(text_df[order(text_df[, 2], decreasing = T), ], 10)
f = open('test.txt','w')
f.write('As the country became embroiled in a domestic crisis, the first government was dislodged and succeeded by several different administrations. Bolikango served as Deputy Prime Minister in one of the new governments before a partial state of stability was reestablished in 1961. He mediated between warring factions in the Congo and briefly served once again as Deputy Prime Minister in 1962 before returning to the parliamentary opposition. After Joseph-Desire Mobutu took power in 1965, Bolikango became a minister in his government. Mobutu soon dismissed him but appointed him to the political bureau of the Mouvement Populaire de la Revolution. Bolikango left the bureau in 1970. He left Parliament in 1975 and died seven years later. His grandson created the Jean Bolikango Foundation in his memory to promote social progress. The President of the Congo posthumously awarded Bolikango a medal in 2005 for his long career in public service.')
f.close()
with open('test.txt', 'r') as f:
    data = f.read()
lists = data.replace(',', ' ').replace('.', ' ').split()
lists_중복제거= list(set(lists))
count = []
n=0
for word in lists_중복제거:
    count.append((lists.count(word), word))
for result in sorted(count, reverse=True):
    n += 1
    print(result[1], ':', result[0])
    if n == 10:
        break
#HiCode
split_t = text.replace(',', '').replace('.', '').split(' ')
count={ x:0 for x in split_t }
for i in count.keys():
count[i] = split_t.count(i)
words = sorted(count, key=count.get, reverse=True)
[(key, count[key]) for key in words][:10]
import java.util.*;
import java.io.*;
public class word_count {
public static void main(String[] args) throws FileNotFoundException
{
PrintStream ps = new PrintStream(new File("output.txt"));
Scanner cs = new Scanner(System.in);
String fName = cs.next();
File f = new File(fName);
Scanner sc = new Scanner(f);
ArrayList<String> strings = new ArrayList<String>();
ArrayList<String> sorted = new ArrayList<String>();
int[] counts = new int[10000];
String tmp = "";
int k = 0;
while (sc.hasNext())
{
tmp = sc.next();
tmp = tmp.replace(".", "");
tmp = tmp.replace(",", "");
strings.add(tmp);
k++;
}
int total = k;
Collections.sort(strings);
int cnt = 1;
int tt = 0;
int t = 0;
while (tt < total)
{
if (t == 0)
{
sorted.add(strings.get(0));
counts[0]++;
t++;
tt++;
continue;
}
int tm = 0;
int check = 0;
tmp = strings.get(tt);
while (tm < cnt)
{
if (tmp.compareTo(sorted.get(tm)) == 0)
{
counts[tm]++;
check = 1;
break;
}
tm++;
}
if (check == 0)
{
sorted.add(tmp);
counts[tm]++;
cnt++;
}
tt++;
}
for (int i = cnt - 1; i > 0; i--)
{
for (int j = 0; j < i; j++)
{
if (counts[j] > counts[j+1])
{
int a = counts[j];
counts[j] = counts[j+1];
counts[j+1] = a;
String b = sorted.get(j);
sorted.remove(j);
sorted.add(j+1, b);
}
}
}
for (int i = cnt-1; i > cnt-11 ; i--)
{
ps.printf("%s : %d\n", sorted.get(i), counts[i]);
}
}
}
with open('test.txt', mode='rt', encoding='utf-8') as f:
    doc = f.read()
import re
re_comp = re.compile('[.,]')  # inside [], '|' would be a literal character, so it is left out
doc=re_comp.sub('', doc)
doc = doc.split(' ')
init_dict = {k:0 for k in doc}
for i in doc:
    init_dict[i] += 1
list_dic = sorted(init_dict.items(), key = lambda x: x[1], reverse=True)
for i, v in list_dic:
    print(i,":",v)
---- result ----
in : 12
the : 10
Bolikango : 5
a : 4
of : 4
and : 3
to : 3
his : 3
became : 2
government : 2
was : 2
served : 2
as : 2
Deputy : 2
Prime : 2
Minister : 2
before : 2
He : 2
Congo : 2
Mobutu : 2
him : 2
bureau : 2
left : 2
As : 1
country : 1
embroiled : 1
domestic : 1
crisis : 1
first : 1
dislodged : 1
succeeded : 1
by : 1
several : 1
different : 1
administrations : 1
one : 1
new : 1
governments : 1
partial : 1
state : 1
stability : 1
reestablished : 1
1961 : 1
mediated : 1
between : 1
warring : 1
factions : 1
briefly : 1
once : 1
again : 1
1962 : 1
returning : 1
parliamentary : 1
opposition : 1
After : 1
Joseph-Desire : 1
took : 1
power : 1
1965 : 1
minister : 1
soon : 1
dismissed : 1
but : 1
appointed : 1
political : 1
Mouvement : 1
Populaire : 1
de : 1
la : 1
Revolution : 1
1970 : 1
Parliament : 1
1975 : 1
died : 1
seven : 1
years : 1
later : 1
His : 1
grandson : 1
created : 1
Jean : 1
Foundation : 1
memory : 1
promote : 1
social : 1
progress : 1
The : 1
President : 1
posthumously : 1
awarded : 1
medal : 1
2005 : 1
for : 1
long : 1
career : 1
public : 1
service : 1
import operator
def CountWord(string):
    res={}
    for i in string.replace('.',' ').replace(',',' ').split():
        if i not in res.keys():
            res[i]=1
        else:
            res[i]+=1
    res = sorted(res.items(), key=operator.itemgetter(1), reverse=True)
    # slicing avoids an IndexError when there are fewer than 10 distinct words
    for word, cnt in res[:10]:
        print("%s %s"%(word,cnt))
string = ""
CountWord(string)
import re
from functools import reduce
with open('/home/sssunda/example.txt') as f:
    s=f.read()
li=re.split(r'[.,\s]',s)
li=[x for x in li if x]
result=reduce(lambda x, y: x.update({y:x.get(y,0)+1}) or x,li,{})
result_s=sorted(result.items(), key=lambda result: result[1],reverse=True)
for i in range(0,10):
    print(f'{result_s[i][0]} {result_s[i][1]}')
with open('No.205 text.txt', 'r') as file:
    script = file.read()
# str.strip returns a new string, so the stripped words have to be collected
words = [word.strip('.,') for word in script.split()]
set_words = list(set(words))
temp = []
count = []
result = []
for i in set_words:
    temp.append([i, words.count(i)])
    count.append(words.count(i))
for i in range(max(count), 0, -1):
    for j in range(len(temp)):
        if len(result) > 9:
            break
        if temp[j][1] == i:
            result.append(temp[j])
for i in result:
    print(i[0], i[1])
var text = 'As the country became embroiled in a domestic crisis, the first government was dislodged and succeeded by several different administrations. Bolikango served as Deputy Prime Minister in one of the new governments before a partial state of stability was reestablished in 1961. He mediated between warring factions in the Congo and briefly served once again as Deputy Prime Minister in 1962 before returning to the parliamentary opposition. After Joseph-Desire Mobutu took power in 1965, Bolikango became a minister in his government. Mobutu soon dismissed him but appointed him to the political bureau of the Mouvement Populaire de la Revolution. Bolikango left the bureau in 1970. He left Parliament in 1975 and died seven years later. His grandson created the Jean Bolikango Foundation in his memory to promote social progress. The President of the Congo posthumously awarded Bolikango a medal in 2005 for his long career in public service.';
var arr = text.split(/[\s,.]+/).filter(Boolean); // drop the empty string left after the final period
var obj = {};
for (var i = 0; i < arr.length; i++) {
var key = arr[i];
if (obj[key]) {
obj[key]++;
} else {
obj[key] = 1;
}
}
var keys = Object.keys(obj).sort(function (a, b) {
return obj[b] - obj[a]
});
for (var j = 0; j < 10; j++) {
var value = obj[keys[j]];
console.log(keys[j], value);
}
def f(s):
    L = list(map(lambda x : x.strip('.,'), s.split(' ')))
    T = []
    S = set(L)
    for word in S:
        T.append([word, L.count(word)])
    T.sort(key=lambda x : x[1], reverse=True)
    for i in range(10):
        print(T[i][0], T[i][1])
using System;
using System.Collections;
using System.Collections.Generic;
namespace ConsoleApp1
{
struct WordCount
{
public string str;
public int num;
}
class Program
{
static void Main(string[] args)
{
string a = Console.ReadLine();
Console.WriteLine(a.Length);
string[] str = a.Split(' ');
WordCount[] wc;
wc = new WordCount[str.Length];
for (int i = 0; i < str.Length; i++)
{
for (int j = 0; j < wc.Length; j++)
{
if (wc[j].str == str[i])
{
wc[j].num++;
break;
}
else if (wc[j].str == null)
{
Console.WriteLine(str[i]);
wc[j].str = str[i];
wc[j].num++;
break;
}
}
}
// print each word and its count
for (int j = 0; j < wc.Length; j++)
{
if(wc[j].str != null)
Console.WriteLine("{0} count : {1}", wc[j].str, wc[j].num);
}
}
}
}
import operator
f = open('C:/doit/test.txt', 'r')
txt = f.read()
w = []
for x in txt.split('.'):
    for y in x.split(','):
        for z in y.split(' '):
            w.append(z)
n = dict()
for x in list(set(w)):
    n[x] = w.count(x)
n = sorted(n.items(), key = operator.itemgetter(1), reverse=True)
for i in range(10):
    print(n[i][0], n[i][1])
f.close()
in 12
11
the 10
Bolikango 5
a 4
of 4
to 3
and 3
his 3
Minister 2
That's my solution. The problem is that '' shows up 11 times in the dictionary. I'd be grateful if someone could tell me why this happens...
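A likely explanation, shown on a shortened sample: splitting on one separator at a time leaves an empty string wherever two separators are adjacent (", " or ". ") and after the final period. The sample string and the names `pieces`, `empties` and `words` are chosen just for this illustration:

```python
sample = "crisis, the first. Bolikango served."

pieces = []
for x in sample.split('.'):
    for y in x.split(','):
        for z in y.split(' '):
            pieces.append(z)

print(pieces)
# ['crisis', '', 'the', 'first', '', 'Bolikango', 'served', '']

# Number of empty strings produced by adjacent or trailing separators.
empties = pieces.count('')
print(empties)   # 3

# split() with no argument treats runs of whitespace as one separator
# and never returns empty strings.
words = sample.replace('.', ' ').replace(',', ' ').split()
print(words)
# ['crisis', 'the', 'first', 'Bolikango', 'served']
```

The full text has two commas and nine periods, each followed by a space or by the end of the text, which would account for the 11, assuming the file has no trailing newline.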
import re
file = open("D:word counter.txt", 'r')
read_data = file.readlines()
print(read_data)
com = re.compile("[a-zA-Z0-9]+")
so = sorted(com.findall(read_data[0]))
result0, count = [], 1
for da in range(0, len(so)) :
    # the first word has no predecessor; so[-1] would wrap around to the last word
    if da == 0 or so[da] != so[da-1] :
        count = 1
        result0.append([so[da], count])
    else :
        count += 1
        result0[-1][1] = count
result0.sort(key = lambda x:x[1], reverse = True)
for k in range(0, 10) :
    print("%s : %s"%(result0[k][0], result0[k][1]))
My skills are still lacking, so this is about all I can manage ㅠㅠ
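I used ArrayList indiscriminately, so the code is hard to follow.....
import java.util.ArrayList;
import java.util.Collections;
import java.util.Scanner;
public class 단어세기WordCounting {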
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
String str = scan.nextLine().replaceAll("[.,]", "");
String[] words = str.split(" ");
ArrayList<String> list = new ArrayList<String>(); // first, the contents of words[] are moved into list
ArrayList<String> list2 = new ArrayList<String>(); // words that occur more than once are appended to list2,
for(int i=0; i<words.length; i++) { // each followed immediately by its count, e.g. (in, 12, the, 10, ...)
list.add(words[i]);
}
for(int i=0; i<list.size(); i++) {
int count = 1;
if(!list.get(i).equals("0")) {
for(int j=i+1; j<list.size(); j++) {
if(!list.get(j).equals("0")&&list.get(i).equals(list.get(j))) {
count++;
list.set(j, "0");
}
}
}
if(count>1) {
list2.add(list.get(i));
list2.add(Integer.toString(count));
}
}
ArrayList<Integer> integer = new ArrayList<Integer>(); // a separate list that holds only the counts
for(int i=1; i<list2.size(); i=i+2) {
integer.add(Integer.parseInt(list2.get(i)));
}
Collections.sort(integer);
Collections.reverse(integer); // the list is now sorted in descending order
for(int i=0; i<10; i++) { // run 10 times
for(int j=1; j<list2.size(); j=j+2) { // find and print an entry whose count equals the i-th value of the list
if(Integer.parseInt(list2.get(j))==integer.get(i)&&!list2.get(j-1).equals("0")) {
System.out.println(list2.get(j-1) + " " + list2.get(j));
list2.set(j-1, "0");
break;
}
}
}
}
}
with open("foo.txt", 'r') as f:
    a = f.read()
b = str(a).split()
c = set(b)
count = dict()
for i in c:
    count[i] = 0
for i in b:
    if i in c:
        count[i] += 1
num = list()
for i in c:
    num.append(count[i])
num.sort()
num.reverse()
num = num[0:10]
word = list()
for i in num:
    for j in count:
        # Compare values with ==; `is` checks identity and is unreliable for ints
        if count[j] == i:
            if j not in word:
                word.append(j)
for i in range(len(num)):
    print(word[i], num[i])
fn = input("Enter the file name: ")  # word count.txt
dict = {}
fh = open(fn)
for ln in fh:
    ln = ln.rstrip()  # strip trailing whitespace (including the newline)
    wds = ln.split()
    for w in wds:
        dict[w] = dict.get(w, 0) + 1
elst = []
for k, v in dict.items():
    elst.append([v, k])  # build [value, key] pairs so sorting orders by frequency
# print(sorted(elst)[::-1])
for i in range(10):
    print(sorted(elst)[::-1][i][1], sorted(elst)[::-1][i][0])
N = input().split()
M = {}
test = list()
for i in N:
    check = 0
    go = 0
    for j in range(len(test)):
        if test[j] == i:
            check = 1
            go = j
    if check == 0:
        M[i] = 1
        test.append(i)
    elif check == 1:
        M[test[go]] = M[test[go]] + 1
finalcheck = 0
for i in sorted(M.items(), key=lambda x: x[1], reverse=True):
    finalcheck += 1
    if finalcheck == 11:
        break
    print(i)
#include <iostream>
#include <vector>
#include <string>
#include <fstream>
#include <sstream>
#include <algorithm>
using namespace std;
// Split a string on whitespace.
vector<string> split(string s) {
    string temp;
    stringstream ss;
    ss.str(s);
    vector<string> v;
    while (ss >> temp) {
        v.push_back(temp);
    }
    return v;
}
// Higher count first; ties broken alphabetically.
bool cmp(const pair<int, string> &a, const pair<int, string> &b) {
    if (a.first == b.first)
        return a.second < b.second;
    return a.first > b.first;
}
int main() {
    ifstream in("text.txt");
    string s;
    int size;
    vector<string> v;
    if (in.is_open()) {
        in.seekg(0, ios::end);
        size = in.tellg();
        s.resize(size);
        in.seekg(0, ios::beg);
        in.read(&s[0], size);
        // Turn the separators ',' and '.' into spaces.
        for (int i = 0; i < (int)s.length(); i++)
            if (s[i] == ',' || s[i] == '.') { s[i] = ' '; }
    }
    else { cout << "Failed to open file" << endl; return -1; }
    v = split(s);
    sort(v.begin(), v.end());
    vector<pair<int, string>> r;
    int count = 1;
    for (vector<string>::iterator iter = v.begin(); iter != v.end(); iter++) {
        // Close the current run at the last element or when the next word differs,
        // so the final word is recorded too.
        if ((iter + 1) == v.end() || *iter != *(iter + 1)) {
            r.push_back(make_pair(count, *iter));
            count = 1;
        }
        else { count++; }
    }
    sort(r.begin(), r.end(), cmp);
    cout << "=============Top 10 word frequencies in text.txt===============" << endl << endl;
    for (int i = 0; i < 10 && i < (int)r.size(); i++)
        cout << "word:" << r[i].second << " count:" << r[i].first << endl;
}
def count(s):
    f = open(s, 'r')
    data = f.read()
    f.close()
    table = str.maketrans('!,.', '   ')
    data = data.translate(table)
    words_list = data.split(' ')  # build a list of words
    words_set = set(words_list)
    words_set.discard('')  # discard() does not fail when '' is absent
    dic = {}
    for i in words_set:
        dic[i] = words_list.count(i)
    dic_list = []
    for a, b in dic.items():
        dic_list.append(b)
    count = 0
    while count != 10:
        for a, b in dic.items():
            if b == max(dic_list):
                print(a, b)
                dic_list.remove(b)
                count += 1
                if count == 10:
                    break
count('example.txt')
import re
def word_count(text):
    dict = {}
    for s in text:
        if dict.get(s):
            dict[s] += 1
        else:
            dict[s] = 1
    r = list(sorted(dict.items(), key=lambda x: x[1], reverse=True))
    result = r[:10]
    for i, j in result:
        print(i, j)
if __name__ == '__main__':
    with open('./word counting.txt', 'r') as f:
        t = f.read()
    text = re.findall('[a-zA-Z]+', t)
    word_count(text)
# Python
# I'm still a beginner, so the source ended up a bit long.
# My result differs from the one in the problem statement.
# For example, I get 11 for "the", and searching the text in a word processor also finds 11.
# The problem statement says 10.
# A dictionary would probably make this simpler, but I'm not comfortable with dictionaries yet, so I used lists.
fileopen = open("text.txt")
text = fileopen.read()
text = text.split(' ')
for i in range(len(text)):
    if '.' in text[i]:
        text[i] = text[i].split('.')
        if '' in text[i]:
            text[i].remove('')
        text[i] = text[i][0]
    if ',' in text[i]:
        text[i] = text[i].split(',')
        if '' in text[i]:
            text[i].remove('')
        text[i] = text[i][0]
    if '-' in text[i]:
        text[i] = text[i].split('-')
        if '' in text[i]:
            text[i].remove('')
        text[i] = text[i][0]
    text[i] = text[i].lower()
text_keys, text_count = [], []
for i in range(len(text)):
    if text[i] not in text_keys:
        text_keys.append(text[i])
        text_count.append(1)
    else:
        text_count[text_keys.index(text[i])] += 1
temp = list(text_count)
temp.sort(reverse=True)
tenth = temp[9]  # the 10th-largest count
text_keys2, text_count2 = [], []
for i in range(len(text_keys)):
    if text_count[i] >= tenth:
        text_keys2.append(text_keys[i])
        text_count2.append(text_count[i])
# Bubble sort by count, descending; swap the keys together with the counts
for k in range(0, len(text_keys2)):
    for i in range(len(text_keys2) - 1):
        if text_count2[i] < text_count2[i+1]:
            text_count2[i], text_count2[i+1] = text_count2[i+1], text_count2[i]
            text_keys2[i], text_keys2[i+1] = text_keys2[i+1], text_keys2[i]
for i in range(len(text_keys2)):
    print(text_keys2[i], ':', text_count2[i])
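On the 11-versus-10 question raised in the comments above: the most likely cause is the `lower()` call, which merges the sentence-initial "The" with "the", while the expected output in the problem statement counts them case-sensitively. A small sketch of the difference, using a made-up `sample` string and `collections.Counter` (neither is part of the submission above):

```python
from collections import Counter

# Hypothetical sample text, not the original input file
sample = "The bureau and the Congo. The end of the story."
tokens = sample.replace('.', ' ').replace(',', ' ').split()

# Case-sensitive: "The" and "the" are separate keys
case_sensitive = Counter(tokens)

# Case-insensitive: lowercasing merges them into one key
case_insensitive = Counter(t.lower() for t in tokens)

print(case_sensitive['the'], case_sensitive['The'])
print(case_insensitive['the'])
```

Which behaviour is right depends on how the problem is read; the statement only lists the separators and does not mention case.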
Python
a = input()
b = a.replace(',', ' ').replace('.', ' ').split()
c = {}
for i in range(len(b)):
    c[b[i]] = b.count(b[i])
d = sorted(c.items(), key=lambda x: x[1], reverse=True)
for j in range(10):
    print(d[j][0], ':', d[j][1])
import java.util.*;
public class test8 {
    public static void main(String[] args) {
        String input = "Apple Inc. is an American multinational technology company headquartered in Cupertino, California, that designs, develops, and sells consumer electronics, computer software, and online services. It is considered one of the Big Tech technology companies, alongside Amazon, Google, Microsoft, and Facebook.";
        Map<String, Integer> map = new LinkedHashMap<>();
        for (String a : input.split("[ ,.]")) {
            if (!a.equals("")) {
                if (map.containsKey(a)) {
                    map.replace(a, map.get(a) + 1);
                } else {
                    map.put(a, 1);
                }
            }
        }
        LinkedHashMap<String, Integer> result = SortMapByValue(map);
        for (int i = 0; i < result.size(); i++) {
            System.out.println(result.keySet().toArray()[i] + " : " + result.get(result.keySet().toArray()[i]));
        }
    }
    public static LinkedHashMap<String, Integer> SortMapByValue(Map<String, Integer> input) {
        List<Map.Entry<String, Integer>> entries = new LinkedList<>(input.entrySet());
        Collections.sort(entries, (o1, o2) -> o2.getValue().compareTo(o1.getValue()));
        LinkedHashMap<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : entries) {
            result.put(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
import re
import operator
f = open("wiki.txt", 'r')
text = f.read()
txt = re.findall(r"[\w']+", text.lower())
lst = {}
for i in txt:
    if i in lst:
        lst[i] += 1
    elif i not in lst:
        lst[i] = 1
print(sorted(lst.items(), key=lambda lst: lst[1], reverse=True)[0:10:])
with open("wordCounting.txt", 'r') as f:
    temp = []
    for i in f.read().split(". "):
        temp += i.split(" ")
x = set(temp)
result = {}
for i in x:
    result[i] = temp.count(i)
v = list(result.values())
v.sort()
v.reverse()
printed = []  # words already printed, so ties are not printed twice
for n in range(10):
    for i, k in result.items():
        if k == v[n] and i not in printed:
            print(i + " : " + str(v[n]))
            printed.append(i)
            break
f = open('E:/Word Counting.txt', 'r')
line = f.read()
c = list(line.split())
count = 0
d = []
e = {}
a = {}
w = 0
for i in c:  # strip a trailing ',' or '.' from each word
    if i[-1] == ',' or i[-1] == '.':
        i = i[:-1]
        d.append(i)
    else:
        d.append(i)
for i in set(d):  # count duplicates by looping over the set and the list
    count = 0
    for j in d:
        if i == j:
            count += 1
    e[i] = count
for key, value in e.items():  # drop entries whose count is 1
    if value != 1:
        a[key] = value
a = sorted(a.items(), reverse=True, key=lambda item: item[1])
while w < 10:
    print(a[w][0], a[w][1])
    w += 1
a = input('Enter some text: ')
공백포함 = len(a)
공백재외 = len(a) - a.count(' ') - a.count(',') - a.count('.')
단어수 = a.count(' ') + 1
print('Length including spaces: %d' % 공백포함)
print('Length excluding separators: %d' % 공백재외)
print('Word count: %d' % 단어수)
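Counting spaces and adding one only gives the word count when words are separated by exactly one space. A small sketch of where the two approaches diverge, using a made-up `text` with a doubled space (an assumption for illustration, not the original input):

```python
# Hypothetical input with two consecutive spaces after "country"
text = "As the country  became embroiled, in a crisis."

# Space counting: every extra space is counted as an extra word
by_spaces = text.count(' ') + 1

# Splitting on whitespace after turning separators into spaces
by_split = len(text.replace(',', ' ').replace('.', ' ').split())

print(by_spaces, by_split)
```

`split()` with no argument collapses runs of whitespace, so it tends to be the more robust choice for free-form input.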
txt='As the country became embroiled in a domestic crisis, the first government was dislodged and succeeded by several different administrations. Bolikango served as Deputy Prime Minister in one of the new governments before a partial state of stability was reestablished in 1961. He mediated between warring factions in the Congo and briefly served once again as Deputy Prime Minister in 1962 before returning to the parliamentary opposition. After Joseph-Desire Mobutu took power in 1965, Bolikango became a minister in his government. Mobutu soon dismissed him but appointed him to the political bureau of the Mouvement Populaire de la Revolution. Bolikango left the bureau in 1970. He left Parliament in 1975 and died seven years later. His grandson created the Jean Bolikango Foundation in his memory to promote social progress. The President of the Congo posthumously awarded Bolikango a medal in 2005 for his long career in public service.'
# str.strip returns a new string, so the stripped words must be stored
txt2 = [w.strip('.,') for w in txt.split()]
txt3 = []
txt4 = []
for i in txt2:
    if i not in txt3:
        txt3.append(i)
for i in txt3:
    co = txt2.count(i)
    txt4.append([co, i])
txt4.sort()
txt4.reverse()
count = 0
for j, h in txt4:
    print(h, j)
    count += 1
    if count == 10:
        break
def strp(a):
    result = a.strip(',').strip('.')
    return result
f = open('memo.txt', 'r')
r = f.read().split(" ")
all_words = list(map(strp, r))
words = set(all_words)
cnt_arr = []
for i in words:
    cnt_arr.append([all_words.count(i), i])
cnt_arr.sort()
cnt_arr.reverse()
for k in range(10):
    print(cnt_arr[k][1], " ", cnt_arr[k][0])
// Rust
use std::collections::HashMap;

fn word_counting() {
    let input = "As the country became embroiled in a domestic crisis,
...
Bolikango a medal in 2005 for his long career in public service.";
    let words = input.split_whitespace().map(str::to_lowercase);
    // Store word frequencies in a map
    let mut map: HashMap<String, u32> = HashMap::new();
    for word in words {
        let word_ = word.replace(|c: char| !c.is_alphanumeric(), "");
        let count = map.entry(word_).or_default();
        *count += 1;
    }
    // Invert the (word, count) pairs into a map of (count, [word])
    let mut map_: HashMap<u32, Vec<String>> = HashMap::new();
    for (k, v) in map {
        let word_vec = map_.entry(v).or_default();
        word_vec.push(k);
    }
    // Print in descending order of count
    let mut counts: Vec<&u32> = map_.keys().collect();
    counts.sort_by_key(|&n| -(*n as i32));
    for count in counts {
        println!("{:2} {:?}", count, map_[count]);
    }
}
article='As the country became embroiled in a domestic crisis, the first government was dislodged and succeeded by several different administrations. Bolikango served as Deputy Prime Minister in one of the new governments before a partial state of stability was reestablished in 1961. He mediated between warring factions in the Congo and briefly served once again as Deputy Prime Minister in 1962 before returning to the parliamentary opposition. After Joseph-Desire Mobutu took power in 1965, Bolikango became a minister in his government. Mobutu soon dismissed him but appointed him to the political bureau of the Mouvement Populaire de la Revolution. Bolikango left the bureau in 1970. He left Parliament in 1975 and died seven years later. His grandson created the Jean Bolikango Foundation in his memory to promote social progress. The President of the Congo posthumously awarded Bolikango a medal in 2005 for his long career in public service.'
lower = article.lower()
words = lower.replace('.', '').replace(',', '').split()
wordsx = list(set(words))
ans = {}
while len(wordsx) > 0:
    a = words.count(wordsx[0])
    ans[wordsx[0]] = a
    wordsx.remove(wordsx[0])
print(ans)
k = list(ans.keys())
v = list(ans.values())
for i in range(1, 11):
    print(k[v.index(max(v))], max(v))
    k.remove(k[v.index(max(v))])
    v.remove(max(v))
# Open the file
tx = open("test.txt", 'r')
data = tx.read()
tx.close()
# Split the string
data = data.replace(',', ' ').replace('.', ' ').split()
# Add entries to the dictionary as word: count
dict = {}
for i in data:
    if i not in dict:
        dict[i] = 1
    else:
        dict[i] += 1
# Sort by value in descending order, converting to a list
dict = sorted(dict.items(), key=lambda x: x[1], reverse=True)
# Print the first 10 entries (indices 0 through 9)
count = 0
while count < 10:
    print(dict[count][0], dict[count][1])
    count += 1
I googled how to sort a dictionary by value in descending order, haha.
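For reference, sorting a dictionary by value in descending order can be sketched as below. The `counts` dictionary is a hand-written stand-in that borrows a few values from the expected output in the problem statement; `operator.itemgetter(1)` is equivalent to the `lambda x: x[1]` used above.

```python
from operator import itemgetter

# Hypothetical, hand-written counts (not computed from the file)
counts = {"a": 4, "Bolikango": 5, "in": 12, "the": 10}

# items() yields (word, count) pairs; sort on the count, largest first
by_value = sorted(counts.items(), key=itemgetter(1), reverse=True)

# The result is a list of tuples, so slicing gives the top entries
top2 = by_value[:2]
print(top2)
```

Note that `sorted` returns a list of `(key, value)` tuples rather than a dictionary, which is why the code above indexes it with `[count][0]` and `[count][1]`.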
a="As the country became embroiled in a domestic crisis, the first government was dislodged and succeeded by several different administrations. Bolikango served as Deputy Prime Minister in one of the new governments before a partial state of stability was reestablished in 1961. He mediated between warring factions in the Congo and briefly served once again as Deputy Prime Minister in 1962 before returning to the parliamentary opposition. After Joseph-Desire Mobutu took power in 1965, Bolikango became a minister in his government. Mobutu soon dismissed him but appointed him to the political bureau of the Mouvement Populaire de la Revolution. Bolikango left the bureau in 1970. He left Parliament in 1975 and died seven years later. His grandson created the Jean Bolikango Foundation in his memory to promote social progress. The President of the Congo posthumously awarded Bolikango a medal in 2005 for his long career in public service."
b = a.replace(",", "")
a = b.replace(".", "")
li = a.split(" ")
count = 0
dic = {i: 0 for i in li}
for i in li:
    for j in dic:
        if i == j:
            dic[j] += 1
li1 = sorted(dic, key=dic.get, reverse=True)
li2 = []
for i in li1[0:10]:
    count = 0
    for j in li:
        if i == j:
            count += 1
    li2.append(i)
    li2.append(count)
print(li2)

print(li)
se = set(li)
print(se)
li1 = list(se)
print(li1)
li2 = []
for i in li1:
    count = 0
    for j in li:
        if i == j:
            count += 1
    li2.append((count, i))
li2.sort(reverse=True)
print(li2)
for i in li2[:10]:
    print(i)

b = a.replace(",", "").replace(".", "")
li = b.split()
dic = {i: li.count(i) for i in li}
dic = sorted(dic, key=dic.get, reverse=True)
print(dic)
for i in dic[:10]:
    print(i, li.count(i))
i = 0
f = open("count_word", "r")
char = f.readline()
word = char.split()
print(len(char))
print(len(word))
dic = dict()
for wd in word:
    try: dic[wd] = dic[wd] + 1
    except: dic[wd] = 1
sort_dic = sorted(dic.items(), key=lambda x: x[1], reverse=True)
for key, value in sort_dic:
    print(key, ":", value)
    i += 1
    if i == 10:
        break
f.close()
f = open("example.txt", 'r')
data = f.read()
f.close()
data = data.replace(".", " ")
data = data.replace(",", " ")
lis = data.split()
data_set = set(lis)
count_dict = {key: 0 for key in data_set}
for word in lis:
    count_dict[word] += 1
for k, v in count_dict.items():
    print("key: %15s value: %3d" % (k, v))
Yay!
words = "As the country became embroiled in a domestic crisis, " \
"the first government was dislodged and succeeded by several" \
" different administrations. Bolikango served as Deputy Prime " \
"Minister in one of the new governments before a partial state of " \
"stability was reestablished in 1961. He mediated between warring " \
"factions in the Congo and briefly served once again as Deputy Prime" \
" Minister in 1962 before returning to the parliamentary opposition." \
" After Joseph-Desire Mobutu took power in 1965, Bolikango became " \
"a minister in his government. Mobutu soon dismissed him but appointed " \
"him to the political bureau of the Mouvement Populaire de la Revolution." \
" Bolikango left the bureau in 1970. He left Parliament in 1975 and died" \
" seven years later. His grandson created the Jean Bolikango Foundation" \
" in his memory to promote social progress. The President of the Congo" \
" posthumously awarded Bolikango a medal in 2005 for his long career in" \
" public service."
w = ''
dic = {}
for s in words:
    # The problem lists ' ', '.' and ',' as separators
    if s == ' ' or s == '.' or s == ',':
        if len(w) > 0:
            if w not in dic:
                dic[w] = 0
            dic[w] += 1
        w = ''
    else:
        w += s
lst = sorted(dic.items(), key=lambda kv: kv[1], reverse=True)
for k, v in lst[:10]:
    print('{0}: {1}'.format(k, v))