15 Mar
10:00 - 17:00

DKE-Master Artificial Intelligence Seminar

During this seminar, students from the Master Artificial Intelligence will present their theses to the audience. Artificial intelligence focuses on the design and creation of intelligent systems that can, for example, play games, control robots or analyse data. During this Master's programme, students learn to develop computational solutions to problems and to provide up-to-date scientific and technological advice.

 

Programme

10:00 - 11:00   SESSION 1 - Chaired by: Mena Habib

  • Yannik Hermey, Named entity extraction and disambiguation from microposts
  • Darius Schneider, Tweets Normalization
  • Max Uppenkamp, Automated high-level feature discovery for text classification

11:30 - 13:00   SESSION 2 - Chaired by: Kurt Driessens

  • Sallil Bhat, Investigating the influence of colour and spatial frequency content on classification performance of Deep Convolutional Neural Networks
  • Daniel Brüggeman, Identification of Tissue with Cancer Cells Using AI Techniques
  • Maarten Weber, title t.b.a.
  • Chang Sun, Classification of Types of Twitter Accounts

13:30 - 14:30   SESSION 3 - Chaired by: Jerry Spanakis

  • Jeroen Boonen, Comparing Machine Learning Algorithms for classifying small text sections using semantic representations
  • Joeri Hermans, On Scalable Deep Learning and Parallelizing Gradient Descent
  • Josephine Rutten, Convolutional Neural Networks for Sentence Classification in Dutch Language

15:30 - 17:00   SESSION 4 - Chaired by: Stelios Asteriades

  • Lando Kroes, Combining computer vision and feedback control in a NAO robot for interactive game-playing
  • Matthias Löbach, Centralized Global Task Planning with Temporal Aspects on a Group of Three Robots in the RoboCup Logistics
  • Carsten Orth, Hierarchical Path Finding in a Real Time Flight Simulation System
  • Justus Schwan, Automatic Recognition of Screen Worker's Engagement

 

Abstracts/Research goals

The research is focused on training classifiers for small text sections (paragraphs). One of the most widely used classification algorithms for text is the Support Vector Machine (SVM). There are, however, several problems in using SVMs for paragraph classification:

  • When training the SVM on small text sections, the training data generally shows a high class imbalance.
  • SVMs usually generalize poorly to documents that were not included in the initial TF-IDF calculation.

Previous research provides several possible solutions to these problems. The research involves the following research questions and steps:

  • Derive a general-purpose decision tree using a Maximum Entropy Model to classify text sections from other sources; this serves as the baseline algorithm.
  • Generate a semantic representation for the training texts.
  • Use the semantic data representation to train SVMs and (if time permits) deep-learning algorithms.
  • How can the class imbalance problem be solved?
  • These classifiers will then be used to classify unknown text from other documents; the aim is to outperform the decision tree algorithm.
  • How could deep-learning algorithms and architectures be used for text-section classification, and what are the requirements with respect to the training data?

The data used in this research consists of legal contracts; the aim is to classify the different types of clauses in these contracts.
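As a purely illustrative sketch (not the author's actual pipeline), paragraph classification with TF-IDF features and a class-weighted linear SVM, one common way to counter class imbalance, might look as follows in scikit-learn; the clause texts and labels are made up:

```python
# Hypothetical sketch: TF-IDF features + class-weighted linear SVM for
# clause (paragraph) classification. The example clauses and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

paragraphs = [
    "The supplier shall indemnify the customer against all third-party claims.",
    "This agreement may be terminated by either party with thirty days notice.",
    "This agreement is governed by the laws of the Netherlands.",
]
labels = ["indemnification", "termination", "governing_law"]

# class_weight='balanced' re-weights rare clause types to mitigate class imbalance.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LinearSVC(class_weight="balanced"),
)
model.fit(paragraphs, labels)

print(model.predict(["Either party may terminate this contract upon written notice."]))
```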

Jeroen Boonen
Comparing Machine Learning Algorithms for classifying small text sections using semantic representations

In this work we try to obtain a better understanding of asynchronous data parallelism, a technique to parallelize gradient descent in the presence of large models or datasets. We explore the convergence properties of several known distributed optimization schemes and try to explain why they fail in particular situations. Finally, using the obtained insights, we attempt to construct a distributed optimizer that is more robust to its hyperparameterization.
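As a toy illustration of the stale-gradient effect that asynchronous data parallelism introduces (a simplified single-process simulation, not the author's implementation), consider the sketch below; the linear model, data, and staleness value are made up:

```python
# Toy simulation of asynchronous data parallelism: "workers" compute gradients
# on stale copies of the parameters, and a central server applies them on arrival.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(5)          # shared parameters held by the "parameter server"
lr, staleness = 0.01, 4  # workers read parameters `staleness` updates late

history = [w.copy()]
for step in range(500):
    # Each simulated worker reads a possibly stale parameter copy.
    w_read = history[max(0, len(history) - 1 - staleness)]
    i = rng.integers(0, len(X), size=32)                  # mini-batch indices
    grad = 2 * X[i].T @ (X[i] @ w_read - y[i]) / len(i)   # gradient on stale params
    w = w - lr * grad                                     # server applies it anyway
    history.append(w.copy())

print("parameter error:", np.linalg.norm(w - true_w))
```

Increasing `staleness` in this toy setup is one way to see how delayed gradients slow or destabilize convergence.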

Joeri Hermans
On Scalable Deep Learning and Parallelizing Gradient Descent

Microposts, also known as tweets, are a highly popular medium to share facts, opinions and emotions. They are therefore highly valuable in terms of the data they contain.

Named entity extraction from tweets has become an important topic in recent years. The possibility of obtaining information about recent events in real time through tweets is very appealing, and since it is impossible to investigate everything manually, the problem has quickly drawn the attention of industry and research communities alike. This attention has resulted in different approaches and algorithms to automatically extract semantics from tweets and link the extracted entities accordingly.

In this presentation, rule-based approaches such as FASTUS and LaSIE, machine learning-based approaches such as Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs), and hybrid approaches will be explained, and their advantages and disadvantages will be discussed.

Furthermore, an outlook for future work will be provided.
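For illustration only, a minimal CRF-based tagger over hand-crafted token features might look like the sketch below; it uses the third-party sklearn-crfsuite package (not necessarily the tool used in this thesis), and the sentences, features, and labels are invented:

```python
# Hypothetical sketch of a CRF-based named entity tagger for tweet tokens.
import sklearn_crfsuite

def token_features(tokens, i):
    """Simple made-up features for token i of a tokenized tweet."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_hashtag": tok.startswith("#"),
        "is_mention": tok.startswith("@"),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
    }

sentences = [["Obama", "visits", "Berlin"], ["new", "iPhone", "released"]]
labels = [["B-PER", "O", "B-LOC"], ["O", "B-PROD", "O"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))
```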

Yannik Hermey
Named entity extraction and disambiguation from microposts

The main research goal of my thesis is converting the noisy text of tweets into formal English using machine learning techniques. This is needed to improve the results of Natural Language Processing techniques applied to the text: the closer the text is to formal English, the better the results. To achieve this, the focus will be on deep learning techniques that use word vectors (vector representations learned by models that predict words from their neighbouring words) as input.
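As an illustrative sketch (not the thesis pipeline), word vectors for tweet tokens could be trained with gensim's skip-gram Word2Vec; nearest neighbours in the resulting space can then suggest normalization candidates. The tweets are made up and the parameter names follow gensim 4.x (older versions use `size` instead of `vector_size`):

```python
# Illustrative sketch: learning word vectors from tokenized tweets so that
# noisy spellings end up close to their formal counterparts.
from gensim.models import Word2Vec

tweets = [
    ["gr8", "game", "2nite"],
    ["great", "game", "tonight"],
    ["see", "u", "tomorrow"],
    ["see", "you", "tomorrow"],
]

model = Word2Vec(sentences=tweets, vector_size=50, window=2, min_count=1, sg=1)

# Nearest neighbours in the embedding space can serve as normalization candidates.
print(model.wv.most_similar("gr8", topn=3))
```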

Darius Schneider
Tweets Normalization

Big data is a very interesting data source for official statistics. Examples of big-data-based applications are the use of road sensor data for traffic intensity statistics and the association between the sentiment in social media messages and consumer confidence in the Netherlands. However, when big data is considered from the perspective of the units producing it, it has been found that the data in some of these sources are not produced by homogeneous groups (i.e. similar types) of units.

This is certainly the case for Twitter. The messages on this platform are produced by at least two different types of units: persons and companies. The ability to differentiate between these types of accounts on Twitter is essential from a statistical point of view, as the two types may display different behaviour. Hence, there is a need to develop an approach (or approaches) to identify the different types of Twitter accounts, starting by identifying accounts used by either persons or companies. Machine learning algorithms will be applied to distinguish company Twitter accounts from other types.
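A purely hypothetical sketch of such a classifier (not the actual study), using made-up profile features and scikit-learn's logistic regression:

```python
# Hypothetical sketch: classifying Twitter accounts as person vs. company
# from simple, invented profile features.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up features per account: [followers/following ratio, tweets per day,
# fraction of tweets containing URLs, profile description length]
X = np.array([
    [0.8,   2.0, 0.10,  60],   # person
    [1.2,   5.0, 0.20,  80],   # person
    [45.0, 12.0, 0.85, 150],   # company
    [30.0,  8.0, 0.70, 140],   # company
])
y = np.array(["person", "person", "company", "company"])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[20.0, 10.0, 0.6, 120]]))
```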

Chang Sun
Classification of Types of Twitter Accounts