CN101986310A - Method and device for updating cyberword dictionary - Google Patents

Publication number
CN101986310A
Authority
CN
China
Prior art keywords
dictionary
word
lexical
textual analysis
cyberspeak
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201010545864
Other languages
Chinese (zh)
Inventor
陈淮琰
席溪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inventec Besta Xian Co Ltd
Original Assignee
Inventec Besta Xian Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2010-11-16
Filing date
2010-11-16
Publication date
2011-03-16
Application filed by Inventec Besta Xian Co Ltd
Priority to CN 201010545864
Publication of CN101986310A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method and a device for updating a cyberword dictionary. The device comprises an input unit, a dictionary database, a query unit and a processing unit, wherein the input unit can input queried words and displays a query result; the dictionary database is used for storing keywords and paraphrases thereof; the query unit can search needed words in webpage source codes; the processing unit can grab word keywords and the paraphrases thereof and stores the word keywords and the paraphrases thereof into the dictionary database or deletes the word keywords and the paraphrases thereof from the dictionary database; and the input unit, the query unit, the processing unit and the dictionary database are connected with one another sequentially. By the method and the device, the keywords and the paraphrases thereof in the cyberword can be automatically acquired from a specified web address according to a set format, and the acquired keywords and the paraphrases thereof in the cyberword are added into the cyberword dictionary according to a dictionary format so as to fulfill the aims of updating the cyberword into the latest popular words from a network at any time and updating the dictionary in real time.

Description

A method and device for updating a cyberword dictionary
Technical field
The present invention relates to a method and a device for automatically updating a personalized dictionary of Internet slang (cyberwords).
Background art
As the Internet becomes ever more deeply embedded in people's lives, Internet slang has become increasingly common in everyday use. More and more cyberwords appear not only in daily life but also in many newspapers and magazines, for example:
FB (from fubai, literally "corruption"): a get-together dinner among online friends is often called an FB; it is usually a dinner party where everyone pays their own share.
NB (from niubi): awesome or formidable; a person who is NB is a "niu ren", an impressively capable person.
PF (from peifu): to admire. As Wei Xiaobao says, "a thousand things may wear out, ten thousand things may wear out, but flattery never wears out", and the same holds online. It used to be popular to recite "My admiration for you is like a surging river, flowing on without end...", but in the online world, which prizes brevity, the two letters "PF" state your position crisply. For more emphasis, you can say "PFPF". Companion abbreviations include PMP (to flatter) and MPJ (a flatterer), which the person you PF may use in reply.
Most cyberwords are coined from homophones or from deliberately miswritten characters; others come from pictographic characters or words, or from classic quotations that became popular on forums. A reader who does not know what they mean may well fail to understand what others are saying, or be baffled when reading a newspaper or magazine. How to collect the ever-growing body of cyberwords, and how to keep updating the vocabulary so that the dictionary stays current and personalized, is the problem this invention sets out to solve.
Existing electronic learning dictionary devices cannot meet this requirement: they cannot automatically update, supplement or delete cyberwords or popular vocabulary.
Summary of the invention
To solve the problem that the prior art cannot update a cyberword dictionary automatically, the invention provides a method and a device that pull the latest popular words from the network at any time and automatically update a personalized cyberword dictionary.
The technical solution of the invention is a method for updating a cyberword dictionary, characterized in that it comprises the following steps:
1) accessing a specified web address and searching the webpage source code for the required words;
2) capturing the keyword and the definition of each word found;
3) determining, from the captured keyword and definition, whether the word is new; if it is new, proceeding to step 4); if it is not, returning to step 2);
4) storing the captured keyword and definition in the dictionary database.
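The four steps can be sketched as a single loop. This is only an illustrative sketch, not the patented implementation: the function name update_dictionary, the injected fetch_entries callable (standing in for steps 1 and 2) and the use of a plain Python dict as the dictionary database are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def update_dictionary(
    fetch_entries: Callable[[], Iterable[Tuple[str, str]]],
    dictionary: Dict[str, str],
) -> List[str]:
    """Run steps 1) to 4) once and return the keywords that were added."""
    added = []
    # Steps 1) and 2): fetch_entries is a hypothetical stand-in for visiting
    # the specified web address and capturing (keyword, definition) pairs.
    for keyword, definition in fetch_entries():
        # Step 3): here a word counts as new if its keyword is not stored yet.
        if keyword in dictionary:
            continue  # not new: move on to the next captured word
        # Step 4): store the new keyword and its definition.
        dictionary[keyword] = definition
        added.append(keyword)
    return added


def sample_entries():
    # Hypothetical captured data, loosely based on the background examples.
    return [("FB", "a get-together dinner"), ("PF", "to admire")]


dictionary = {"FB": "a get-together dinner"}
print(update_dictionary(sample_entries, dictionary))  # prints ['PF']
```

Passing the capture step in as a callable keeps the sketch runnable without network access; a real device would replace sample_entries with code that reads the specified page.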
Step 1) above specifically comprises:
1.1) determining whether the network is connected; if not, connecting to the network; if so, accessing the specified web address;
1.2) setting the search format;
1.3) searching the webpage source code for the required words according to the set format.
Step 2) above specifically comprises:
2.1) capturing the keyword from the words found in step 1.3);
2.2) looking up, from the captured keyword, the web address of its detailed definition;
2.3) capturing the definition.
The method further comprises step 5): refreshing the dictionary list.
The method further comprises step 6): deleting a keyword and its definition from the dictionary database.
A device for updating a cyberword dictionary, characterized in that the device comprises: an input unit, for entering the words to be queried and displaying the query result; a dictionary database, for storing keywords and definitions; a query unit, for searching webpage source code for the required words; and a processing unit, for capturing a word's keyword and definition and either storing them in the dictionary database or deleting them from it; the input unit, the query unit, the processing unit and the dictionary database being connected in sequence.
The method and device provided by the invention can automatically obtain the keywords and definitions of cyberwords from a specified web address according to a set format, and then add the obtained keywords and definitions to the cyberword dictionary in dictionary format, so that the latest popular words can be pulled from the network at any time and the dictionary is updated in real time.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the invention;
Fig. 2 is a flowchart for adding an entry;
Fig. 3 is a flowchart for deleting an entry;
Fig. 4 is a schematic structural diagram of the device of the invention;
Fig. 5 is a functional diagram of the invention.
Detailed description of the embodiments
Referring to Figs. 1, 2 and 3, a preferred embodiment of the method for updating a cyberword dictionary according to the invention is as follows.
The method comprises the following steps:
Step 1) accessing a specified web address and searching the webpage source code for the required words;
1.1) determining whether the network is connected; if not, connecting to the network; if so, accessing the specified web address;
1.2) setting the search format;
1.3) searching the webpage source code for the required words according to the set format;
After connecting to the Internet, the specified web address, for example Baidu's search ranking list, is opened automatically, and the webpage source code is searched for words in the preset format, for example "popular search".
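Step 1.3) can be illustrated with a small sketch that scans page source for items listed under a section label. The patent does not specify what a "set format" looks like, so the regular-expression pattern, the function name find_words and the sample markup below are assumptions chosen only to make the idea concrete.

```python
import re
from typing import List


def find_words(page_source: str, section_label: str, item_pattern: str) -> List[str]:
    """Return the words listed after section_label in the page source."""
    start = page_source.find(section_label)
    if start == -1:
        return []  # the page does not contain the expected section
    # Only look at the source that follows the label, so items that belong
    # to earlier sections are ignored.
    section = page_source[start:]
    return re.findall(item_pattern, section)


# Hypothetical page source; a real page would be downloaded first.
SAMPLE_SOURCE = """
<h2>Site news</h2><ul><li class="item">ignored</li></ul>
<h2>popular search</h2>
<ul><li class="item">FB</li><li class="item">NB</li><li class="item">PF</li></ul>
"""

# The "set format": one capture group holding the text of each list item.
ITEM_PATTERN = r'<li class="item">([^<]+)</li>'

print(find_words(SAMPLE_SOURCE, "popular search", ITEM_PATTERN))
# prints ['FB', 'NB', 'PF']
```

A regular expression is enough for a fixed, known page layout; for markup that changes often, an HTML parser would be a sturdier choice.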
Step 2) capturing the keyword and the definition of each word found;
2.1) capturing the keyword from the words found in step 1.3);
2.2) looking up, from the captured keyword, the web address of its detailed definition;
2.3) capturing the definition;
After the words under "popular search" have been found, every keyword under that heading is extracted, the web address of each keyword's detailed definition is looked up from the keyword, and once the address is found the page is opened and the detailed definition is captured.
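Steps 2.2) and 2.3) might look like the sketch below. The URL template, the dictionary.example host, the "definition" CSS class and the fake_fetch helper are all hypothetical; the patent names no particular definition site or page layout.

```python
import re
from typing import Callable, Optional
from urllib.parse import quote

# Hypothetical address pattern for a keyword's detailed-definition page.
DEFINITION_URL_TEMPLATE = "https://dictionary.example/entry?word={word}"


def definition_url(keyword: str) -> str:
    """Step 2.2): build the definition address from the captured keyword."""
    # quote() percent-encodes non-ASCII keywords as UTF-8.
    return DEFINITION_URL_TEMPLATE.format(word=quote(keyword))


def grab_definition(keyword: str, fetch_page: Callable[[str], str]) -> Optional[str]:
    """Step 2.3): open the definition page and capture the definition text."""
    page = fetch_page(definition_url(keyword))
    match = re.search(r'<p class="definition">(.*?)</p>', page, re.S)
    return match.group(1).strip() if match else None


# Stand-in for a real HTTP download so the sketch runs offline.
FAKE_PAGES = {
    "https://dictionary.example/entry?word=PF": '<p class="definition">to admire</p>',
}


def fake_fetch(url: str) -> str:
    return FAKE_PAGES.get(url, "")


print(grab_definition("PF", fake_fetch))  # prints to admire
```

Returning None when no definition is found lets the caller skip the word instead of storing an empty entry.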
Step 3) determining, from the captured keyword and definition, whether the word is new; if it is new, proceeding to step 4); if it is not, returning to step 2);
The captured keywords and detailed definitions are screened. If the system determines that a keyword and its definition are new content not yet stored in the dictionary database, the word is a new word and the storage operation of the next step is performed. If the word is determined not to be new, that is, a record of it already exists in the database, the method returns to step 2) to look for new words.
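The screening in step 3), followed by the storage in step 4), can be sketched against a small SQLite table. The table name entries, its two columns and the helper names are assumptions; the patent does not describe the database schema, and this sketch treats a word as new when its keyword has no stored record.

```python
import sqlite3


def open_dictionary() -> sqlite3.Connection:
    """Create an in-memory dictionary database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (keyword TEXT PRIMARY KEY, definition TEXT NOT NULL)"
    )
    return conn


def is_new_word(conn: sqlite3.Connection, keyword: str) -> bool:
    """Step 3): a word is new if no record with this keyword exists."""
    row = conn.execute(
        "SELECT 1 FROM entries WHERE keyword = ?", (keyword,)
    ).fetchone()
    return row is None


def store_if_new(conn: sqlite3.Connection, keyword: str, definition: str) -> bool:
    """Step 4): store the entry only when step 3) judged it new."""
    if not is_new_word(conn, keyword):
        return False  # already recorded: the caller goes back to step 2)
    conn.execute(
        "INSERT INTO entries (keyword, definition) VALUES (?, ?)",
        (keyword, definition),
    )
    conn.commit()
    return True


conn = open_dictionary()
print(store_if_new(conn, "PF", "to admire"))  # prints True
print(store_if_new(conn, "PF", "to admire"))  # prints False
```

The primary key on keyword also guards against duplicates if two updates race, although the explicit check keeps the control flow close to the wording of step 3).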
Step 4) storing the captured keyword and definition in the dictionary database;
Step 5) refreshing the dictionary list;
Step 6) deleting a keyword and its definition from the dictionary database. If a word is no longer a popular term, its stored definition can be read from the dictionary database using its keyword; the keyword and its definition are then deleted, the dictionary list is refreshed, and the storage in the dictionary database is updated.
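Steps 5) and 6) can be sketched as follows, again with a plain dict standing in for the dictionary database. The names refresh_list and delete_entry are illustrative, and treating the "dictionary list" as a sorted list of keywords is an assumption.

```python
from typing import Dict, List, Optional, Tuple


def refresh_list(dictionary: Dict[str, str]) -> List[str]:
    """Step 5): rebuild the displayed list of keywords."""
    return sorted(dictionary)


def delete_entry(
    dictionary: Dict[str, str], keyword: str
) -> Tuple[Optional[str], List[str]]:
    """Step 6): read the stored definition by keyword, delete the entry,
    then refresh the list. Returns (removed definition, refreshed list)."""
    removed = dictionary.pop(keyword, None)  # None if the keyword is absent
    return removed, refresh_list(dictionary)


dictionary = {"FB": "a get-together dinner", "PF": "to admire"}
removed, word_list = delete_entry(dictionary, "FB")
print(removed)    # prints a get-together dinner
print(word_list)  # prints ['PF']
```

Returning the removed definition mirrors the description, which reads the stored definition before deleting it, so a caller could show it to the user for confirmation.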
Referring to Figs. 4 and 5, a preferred embodiment of the device for updating a cyberword dictionary according to the invention is as follows.
The device comprises:
an input unit, for entering the words to be queried and displaying the query result;
a dictionary database, for storing keywords and definitions;
a query unit, for searching webpage source code for the required words;
a processing unit, for capturing a word's keyword and definition and either storing them in the dictionary database or deleting them from it;
wherein the input unit is connected to the query unit, and the query unit, the processing unit and the dictionary database are connected in sequence.
The processing unit of the invention is also provided with an entry interface, through which it can be connected to an external storage device.
The user sets the search format and enters the word to be searched through the input unit, which passes the entered information to the query unit. The query unit searches the webpage source code for the required word according to the set format, captures the keyword and definition it contains, and sends the result to the processing unit. The processing unit queries the dictionary database and determines whether the word found by the query unit is already stored there. If it is, the operation ends. If it is not, the captured keyword and definition are stored in the dictionary database and the dictionary list is refreshed, which completes this update of the database. If, after some time, words in the dictionary database need to be added or removed, the input unit and the query unit can be used again to find the words to be deleted; the processing unit deletes their keywords and definitions from the dictionary database and refreshes the dictionary list again, completing the update and deletion.
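The chain of units described above (input unit, query unit, processing unit, dictionary database, connected in sequence) can be modelled as four small classes. This is a structural sketch only: the class and method names are invented for illustration, the lookup callable is a hypothetical stand-in for searching webpage source code, and the entry interface is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class DictionaryDatabase:
    """Stores keywords and their definitions."""
    entries: Dict[str, str] = field(default_factory=dict)


@dataclass
class ProcessingUnit:
    """Stores entries in, or deletes them from, the dictionary database."""
    database: DictionaryDatabase

    def store(self, keyword: str, definition: str) -> bool:
        if keyword in self.database.entries:
            return False  # already stored: nothing to update
        self.database.entries[keyword] = definition
        return True

    def delete(self, keyword: str) -> bool:
        return self.database.entries.pop(keyword, None) is not None

    def word_list(self) -> List[str]:
        return sorted(self.database.entries)  # the refreshed dictionary list


@dataclass
class QueryUnit:
    """Finds a word's definition and hands the result to the processing unit."""
    processor: ProcessingUnit
    lookup: Callable[[str], Optional[str]]  # hypothetical page-search stand-in

    def query(self, word: str) -> bool:
        definition = self.lookup(word)
        if definition is None:
            return False
        return self.processor.store(word, definition)


@dataclass
class InputUnit:
    """Accepts the queried word and reports the result."""
    query_unit: QueryUnit

    def submit(self, word: str) -> str:
        added = self.query_unit.query(word)
        return f"{word}: added" if added else f"{word}: not added"


FAKE_WEB = {"PF": "to admire"}  # stand-in for pages found on the network
processor = ProcessingUnit(DictionaryDatabase())
input_unit = InputUnit(QueryUnit(processor, FAKE_WEB.get))
print(input_unit.submit("PF"))  # prints PF: added
print(input_unit.submit("PF"))  # prints PF: not added
print(processor.word_list())    # prints ['PF']
```

Each unit holds a reference only to the next unit in the chain, which mirrors the "connected in sequence" wording of the device description.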

Claims (7)

1. A method for updating a cyberword dictionary, characterized in that it comprises the following steps:
1) accessing a specified web address and searching the webpage source code for the required words;
2) capturing the keyword and the definition of each word found;
3) determining, from the captured keyword and definition, whether the word is new; if it is new, proceeding to step 4); if it is not, returning to step 2);
4) storing the captured keyword and definition in the dictionary database.
2. The method for updating a cyberword dictionary according to claim 1, characterized in that step 1) specifically comprises:
1.1) determining whether the network is connected; if not, connecting to the network; if so, accessing the specified web address;
1.2) setting the search format;
1.3) searching the webpage source code for the required words according to the set format.
3. The method for updating a cyberword dictionary according to claim 2, characterized in that step 2) specifically comprises:
2.1) capturing the keyword from the words found in step 1.3);
2.2) looking up, from the captured keyword, the web address of its detailed definition;
2.3) capturing the definition.
4. The method for updating a cyberword dictionary according to claim 3, characterized in that the method further comprises step 5): refreshing the dictionary list.
5. The method for updating a cyberword dictionary according to claim 1, 2, 3 or 4, characterized in that the method further comprises step 6): deleting a keyword and its definition from the dictionary database.
6. A device for updating a cyberword dictionary, characterized in that the device comprises: an input unit, for entering the words to be queried and displaying the query result; a dictionary database, for storing keywords and definitions; a query unit, for searching webpage source code for the required words; and a processing unit, for capturing a word's keyword and definition and either storing them in the dictionary database or deleting them from it; the input unit, the query unit, the processing unit and the dictionary database being connected in sequence.
7. The device for updating a cyberword dictionary according to claim 6, characterized in that the processing unit includes an entry interface.
CN 201010545864 2010-11-16 2010-11-16 Method and device for updating cyberword dictionary Pending CN101986310A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010545864 CN101986310A (en) 2010-11-16 2010-11-16 Method and device for updating cyberword dictionary

Publications (1)

Publication Number Publication Date
CN101986310A true CN101986310A (en) 2011-03-16

Family

ID=43710656

Country Status (1)

Country Link
CN (1) CN101986310A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1261180A (en) * 1998-12-11 2000-07-26 英业达集团(上海)电子技术有限公司 Method for updating electronic dictionary for higher capacity
CN1647072A (en) * 2002-07-24 2005-07-27 卡西欧计算机株式会社 Electronic dictionary terminal, electronic dictionary server, and recording medium
CN1770149A (en) * 2004-11-02 2006-05-10 英业达股份有限公司 System and method for expanding local electronic dictionary by online dictionary
CN1983271A (en) * 2005-12-16 2007-06-20 国际商业机器公司 System and method for defining and translating chat abbreviations

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014206186A1 (en) * 2013-06-28 2014-12-31 百度在线网络技术(北京)有限公司 Method and device for generating entry information
CN104252487A (en) * 2013-06-28 2014-12-31 百度在线网络技术(北京)有限公司 Method and device for generating entry information
CN104252487B (en) * 2013-06-28 2019-05-03 百度在线网络技术(北京)有限公司 A kind of method and apparatus for generating entry information
CN106484729A (en) * 2015-08-31 2017-03-08 华为技术有限公司 A kind of vocabulary generation, sorting technique and device
CN106874363A (en) * 2016-12-30 2017-06-20 北京光年无限科技有限公司 The multi-modal output intent and device of intelligent robot
CN109213993A (en) * 2018-07-20 2019-01-15 沈文策 A kind of method and apparatus for adding customized participle
CN110825840A (en) * 2019-11-08 2020-02-21 北京声智科技有限公司 Word bank expansion method, device, equipment and storage medium
CN110825840B (en) * 2019-11-08 2023-02-17 北京声智科技有限公司 Word bank expansion method, device, equipment and storage medium
CN111178050A (en) * 2019-12-24 2020-05-19 浙江旅游职业学院 Korean new word capturing software and working process thereof
CN111966702A (en) * 2020-08-17 2020-11-20 中国银行股份有限公司 Spark-based financial information bag-of-words model incremental updating method and system
CN111966702B (en) * 2020-08-17 2023-08-18 中国银行股份有限公司 Spark-based financial information word bag model increment updating method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20110316